STK121 SUT 2.4.1-2.4.4 Nonparametric Methods (2022) MBS

STK 120/121/161
Chapter 18
(Study unit 2.4)
Nonparametric Methods
1
Chapter 18
• Sign Test (SUT 2.4.1)
• Wilcoxon Signed Rank Test (NOT PART of this curriculum)
• Mann-Whitney-Wilcoxon Test (SUT 2.4.2)
• Kruskal-Wallis Test (SUT 2.4.3)
• Rank Correlation (SUT 2.4.4)
2
LEARNING AREA: Multivariate Statistical Methods (Textbook)

STUDY UNIT / STUDY UNIT THEME
2.1 Tests of Goodness of Fit and Independence (Chapter 12)
    2.1.1 Conditional probabilities in a two-way table
    2.1.2 Tests of Independence
2.2 Analysis of Variance and Experimental Design (Chapter 13)
    2.2.1 An Introduction to Analysis of Variance
    2.2.2 Analysis of Variance and the Completely Randomized Design
2.3 Time Series and Forecasting (Chapter 17)
    2.3.1 Time Series Patterns
    2.3.2 Smoothing Methods: Moving Averages and Forecasting
    2.3.3 Trend Projection
    2.3.4 Time Series Decomposition
2.4 Nonparametric Methods (Chapter 18), 4 Lectures
    2.4.1 Sign test
    2.4.2 Mann-Whitney-Wilcoxon test
    2.4.3 Kruskal-Wallis test
    2.4.4 Spearman rank-correlation coefficient
2.5 Index Numbers (Notes)
    2.5.1 Price Relatives
    2.5.2 Aggregate Price Indexes
    2.5.3 Computing an aggregate price index from price relatives
    2.5.4 Some important price indexes
    2.5.5 Deflating a series by price indexes
    2.5.6 Quantity Indexes
3
Study Unit Theme 2.4.1: (Sign Test)
Learning Outcomes
On completion of this study unit it will be expected of you to:
• know what the difference in approach between parametric and nonparametric tests is.
• be able to understand, explain and execute the hypothesis test about a single population
median.
• be able to calculate a p-value by hand and by using EXCEL’s BINOM.DIST function.
• be able to calculate a p-value for larger sample sizes, where the normal distribution
approximation of the binomial distribution is used in hand calculations and by using EXCEL’s
NORM.S.DIST and NORM.DIST functions.
• know what the requirements for the sample size are in order to do a normal approximation.
• be able to apply the correction factors when approximating the discrete binomial distribution
with the continuous normal distribution.
4
• Most of the statistical methods referred to as parametric require the
use of interval- or ratio-scaled data.
• Nonparametric methods are often the only way to analyze categorical
(nominal or ordinal) data and draw statistical conclusions.
• Nonparametric methods require no assumptions about the
population probability distributions.
• Nonparametric methods are often called distribution-free methods.
• Whenever the data are quantitative, we will transform the data into
categorical data in order to conduct the nonparametric test.
5
Sign Test
• The sign test is a versatile method for hypothesis testing
that uses the binomial distribution with p = .50 as the
sampling distribution.
• We present two applications of the sign test:
• A hypothesis test about a population median.
Remember the median is the datapoint in the middle of
the array of observations with half of the values (0.5)
less than it and half of the values (0.5) greater than it.
• A matched-sample test about the difference between
two populations (Not part of the Curriculum)
6
Hypothesis Test about a Population Median
• We can apply the sign test by:
• Using a plus sign whenever the data in the sample
are above the hypothesized value of the median
• Using a minus sign whenever the data in the
sample are below the hypothesized value of the
median
• Discarding (deleting) any data exactly equal to the
hypothesized median
7
Hypothesis Test about a Population Median
The assigning of the plus and minus signs makes the situation into a
binomial distribution application.
• The sample size is the number of trials.
• There are two outcomes possible per trial, a plus sign or a minus sign.
• The trials are independent.
• We let p denote the probability of a plus sign.
• If the population median is in fact a particular value, p should equal .5.
8
Hypothesis Test about a Population Median:
Small-Sample Case
• The small-sample case for this sign test should be used whenever n ≤ 20.
• The hypotheses are
• 𝐻0 : 𝑝 = .50 The population median equals the value
assumed.
• 𝐻𝑎 : 𝑝 ≠ .50 The population median is different from the value
assumed.
• The number of plus signs is our test statistic.
• Assuming H0 is true, the sampling distribution for the test statistic is a
binomial distribution with p = .5.
• H0 is rejected if the p-value < level of significance, α.
9
Small-Sample Case
Example: Lawler Sample data
Lawler’s Grocery Store made the decision to carry Cape May Potato
Chips based on the manufacturer’s estimate that the median sales
should be $450 per week on a per-store basis.
Lawler’s has been carrying the potato chips for three months. Data
showing one-week sales at 10 randomly selected Lawler’s stores are
shown on the next slide.
10
Small-Sample Case
❑ One week sales at 10 Lawler grocery stores

Store Number   Weekly Sales ($)   Sign
56             485                +
19             562                +
36             415                -
128            860                +
12             426                -
63             474                +
39             662                +
84             380                -
102            515                +
44             721                +
11
Small-Sample Case
Lawler’s management requested the following hypothesis test about the
population median weekly sales of Cape May Potato Chips (using α = .10).
H0: Median Sales = $450
Ha: Median Sales ≠ $450
In terms of the binomial probability p:

H0: p = .50
Ha: p ≠ .50
12
Small-Sample Case
Example: Lawler Sample data
❑ Binomial Probabilities with n = 10 and p = .50

Number of + signs   Probability
0                   .0010
1                   .0098
2                   .0439
3                   .1172
4                   .2051
5                   .2461
6                   .2051
7                   .1172
8                   .0439
9                   .0098
10                  .0010
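
The binomial probabilities for n = 10 and p = .50 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `binomial_pmf` and the variable `table` are illustrative choices, not part of the course material. In Excel the same values would come from BINOM.DIST.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k plus signs in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 10  # number of stores (trials) in the Lawler example
# Probability of each possible number of plus signs, rounded to 4 decimals
table = {k: round(binomial_pmf(k, n), 4) for k in range(n + 1)}

for k, prob in table.items():
    print(f"{k:>2}  {prob:.4f}")
```

Because p = .50, the distribution is symmetric, so the probability of k plus signs equals the probability of n - k plus signs.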
13
Small-Sample Case
Because the observed number of plus signs is 7, we begin by computing the
probability of obtaining 7 or more plus signs.
The probability of 7, 8, 9, or 10 plus signs is:
.1172 + .0439 + .0098 + .0010 = .1719.
We are using a two-tailed hypothesis test, so:

p-value = 2(.1719) = .3438.
With p-value > α, (.3438 > .10), we cannot reject H0.
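
The small-sample sign test on the Lawler data can be sketched in Python as follows. This is an illustration under stated assumptions, not the prescribed method: the helper names `upper_tail` and `lower_tail` are hypothetical, and the two-tailed p-value is taken as twice the smaller tail, as on the slide. The exact result is .34375; the slide's .3438 comes from doubling the rounded table value .1719.

```python
from math import comb

weekly_sales = [485, 562, 415, 860, 426, 474, 662, 380, 515, 721]
hypothesized_median = 450
alpha = 0.10

# Assign signs; observations equal to the hypothesized median are discarded
plus = sum(1 for x in weekly_sales if x > hypothesized_median)
minus = sum(1 for x in weekly_sales if x < hypothesized_median)
n = plus + minus

def upper_tail(k: int, n: int) -> float:
    """P(X >= k) for a binomial distribution with p = .5."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2**n

def lower_tail(k: int, n: int) -> float:
    """P(X <= k) for a binomial distribution with p = .5."""
    return sum(comb(n, j) for j in range(0, k + 1)) / 2**n

# Two-tailed p-value: double the smaller tail probability (capped at 1)
p_value = min(1.0, 2 * min(upper_tail(plus, n), lower_tail(plus, n)))
reject_h0 = p_value < alpha

print(plus, minus, round(p_value, 4), reject_h0)
```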

14
Small-Sample Case
• Conclusion
Because the p-value > α, we cannot reject H0. There is insufficient evidence in
the sample to reject the assumption that the median weekly sales is $450.
15
Larger Sample Size
• With larger sample sizes, we rely on the normal distribution
approximation of the binomial distribution to compute the p-value,
which makes the computations quicker and easier.
• Normal Approximation of the Number of Plus Signs when H0: p = .50

Mean: $\mu = .50n$
Standard Deviation: $\sigma = \sqrt{.25n}$
Distribution Form: Approximately normal for n > 20
16
Larger Sample Size
Example: Median price of new homes
A hypothesis test is being conducted about the median price of a new
home.
H0: Median ≥ 236,000
Ha: Median < 236,000
A random sample of 61 recent new home sales found 22 homes sold
for more than $236,000, 38 homes sold for less than $236,000 and one
home sold for $236,000.
Is there sufficient evidence to reject H0? Use α = .05.
17
Larger Sample Size
• Letting x denote the number of plus signs, we will use the normal
distribution to approximate the binomial probability P(x ≤ 22).
• Remember that the binomial distribution is discrete and the normal
distribution is continuous.
• To account for this, the binomial probability of 22 is computed by the
normal probability interval 21.5 to 22.5.
18
Larger Sample Size
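• Mean and Standard Deviation
(n = 60, because the one home that sold for exactly $236,000 is discarded)

$\mu = .5n = .5(60) = 30$
$\sigma = \sqrt{.25n} = \sqrt{.25(60)} = 3.873$

• Test Statistic
$z = (x - \mu)/\sigma = (22.5 - 30)/3.873 = -1.94$
• p-Value
p-Value = P(z ≤ −1.94) = .0262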
19
Larger Sample Size
• Rejection Rule
Using .05 level of significance:
Reject H0 if p-value < .05
• Conclusion
Reject H0.
Median price of a new home is less than the $236,000 median price
a year ago.
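
The large-sample calculation for the new-home example can be checked with Python's standard library. This is a sketch, assuming a lower-tailed test and the +.5 continuity correction used on the slides; the variable names are illustrative. Because z is not rounded to −1.94 here, the p-value comes out at about .0264 rather than the table value .0262. `NormalDist().cdf(z)` plays the role of Excel's NORM.S.DIST.

```python
from math import sqrt
from statistics import NormalDist

n_plus, n_minus = 22, 38   # the sale at exactly $236,000 is discarded
n = n_plus + n_minus       # effective sample size: 60
alpha = 0.05

mu = 0.5 * n               # mean of the number of plus signs under H0
sigma = sqrt(0.25 * n)     # standard deviation under H0

# Lower-tailed test: P(x <= 22) is approximated by the normal area below 22.5
z = (n_plus + 0.5 - mu) / sigma
p_value = NormalDist().cdf(z)
reject_h0 = p_value < alpha

print(round(z, 2), round(p_value, 4), reject_h0)
```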
20
Study Unit Theme 2.4.2: (Mann-Whitney-Wilcoxon Test)
Learning Outcomes
• be able to understand, explain and execute the hypothesis test about the difference
between two populations based on independent samples.
• be able to calculate a p-value by hand and by using EXCEL’s BINOM.DIST function.
• be able to calculate a p-value for larger sample sizes, where the normal distribution
approximation of the binomial distribution is used in hand calculations and by using
EXCEL’s NORM.S.DIST and NORM.DIST functions.
• know what the requirements for the sample size are in order to do a normal
approximation.
• be able to apply the correction factors when approximating the discrete binomial
distribution with the continuous normal distribution.
22
Mann-Whitney-Wilcoxon Test
• This test is another nonparametric method for determining whether
there is a difference between two populations.
• This test is based on two independent samples.
• Advantages of this procedure are:
✓ It can be used with either ordinal data or quantitative data.
✓ It does not require the assumption that the populations have a normal
distribution.
23
• Instead of testing for the difference between the medians of two
populations, this method tests to determine whether the two
populations are identical.
• The hypotheses are:
H0: The two populations are identical
Ha: The two populations are not identical
24
Example: Third National Bank
The manager is monitoring the balances maintained in checking
accounts at two branch banks and wants to know if the populations of
account balances at the two branch banks are identical.
Two independent samples of checking accounts are taken with sample
sizes n1 = 12 at branch 1 and n2 = 10 at branch 2.
25
Example: Third National Bank
❑ Account balances

Branch 1
Account   Balance ($)
1         1095
2         955
3         1200
4         1195
5         925
6         950
7         805
8         945
9         875
10        1055
11        1025
12        975

Branch 2
Account   Balance ($)
1         885
2         850
3         915
4         950
5         800
6         750
7         865
8         1000
9         1050
10        935
26
• Hypotheses
H0: Both the branches are identical in terms of populations of
account balances
Ha: The branches are not identical in terms of populations of
account balances
27
• First, rank the combined data from the lowest to the highest values,
with tied values being assigned the average of the tied rankings.
• Then, compute W, the sum of the ranks for the first sample.
• Then, compare the observed value of W to the sampling distribution
of W for identical populations. The value of the standardized test
statistic z will provide the basis for deciding whether to reject H0.
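
The ranking step, including the averaging of tied ranks, can be sketched in Python as follows. The function name `average_ranks` is a hypothetical helper written for this illustration; the balances are those of the Third National Bank example.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) to highest; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next sorted value is tied with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = ((pos + 1) + (end + 1)) / 2  # average of the 1-based positions
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks

branch1 = [1095, 955, 1200, 1195, 925, 950, 805, 945, 875, 1055, 1025, 975]
branch2 = [885, 850, 915, 950, 800, 750, 865, 1000, 1050, 935]

ranks = average_ranks(branch1 + branch2)   # rank the combined data
W = sum(ranks[:len(branch1)])              # sum of ranks for the first sample
W2 = sum(ranks[len(branch1):])             # sum of ranks for the second sample

print(W, W2)
```

The two balances of 950 occupy positions 12 and 13 in the combined ordering, so each receives the rank 12.5.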
28
• Sampling Distribution of W with Identical Populations
• Mean
$\mu_W = \frac{1}{2} n_1 (n_1 + n_2 + 1)$
• Standard Deviation
$\sigma_W = \sqrt{\frac{1}{12} n_1 n_2 (n_1 + n_2 + 1)}$
• Distribution Form
Approximately normal, provided n1 > 7 and n2 > 7
29
Mann-Whitney-Wilcoxon Test
Example: Third National Bank
❑ Combined ranking

Branch 1
Account   Balance ($)   Rank
1         1095          20
2         955           14
3         1200          22
4         1195          21
5         925           9
6         950           12.5
7         805           3
8         945           11
9         875           6
10        1055          19
11        1025          17
12        975           15
Sum of ranks            169.5

Branch 2
Account   Balance ($)   Rank
1         885           7
2         850           4
3         915           8
4         950           12.5
5         800           2
6         750           1
7         865           5
8         1000          16
9         1050          18
10        935           10
Sum of ranks            83.5
30
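❑ Sampling distribution of W (figure not reproduced)
With n1 = 12 and n2 = 10:
$\mu_W = \frac{1}{2}(12)(12 + 10 + 1) = 138$
$\sigma_W = \sqrt{\frac{1}{12}(12)(10)(12 + 10 + 1)} = 15.1658$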
31
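• Rejection Rule
Using .05 level of significance: Reject H0 if p-value < .05
• Test Statistic
$P(W \geq 169.5) = P\left(z \geq \frac{169 - 138}{15.1658}\right) = P(z \geq 2.04) = P(z \leq -2.04)$
(The continuity correction of .5 moves 169.5 to 169.)
• p-Value
$2(1 - P(z \leq 2.04)) = 2(1 - 0.9793) = 2(0.0207) = 0.0414$
OR $2P(z \leq -2.04) = 2(0.0207) = 0.0414$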
32
• Conclusion
p-value = 0.0414 < α = 0.05
∴ Reject H0.
There is sufficient evidence in the sample data to conclude that
the two populations of account balances are not identical.
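
The standardized test statistic and two-tailed p-value for the bank example can be checked in Python. This is a sketch under the assumptions of the slides (normal approximation, continuity correction of .5 toward the mean); the variable names are illustrative. With z left unrounded (about 2.044) the p-value is about .0409, slightly below the table-based .0414.

```python
from math import sqrt
from statistics import NormalDist

n1, n2 = 12, 10
W = 169.5          # sum of ranks for branch 1, taken from the combined ranking
alpha = 0.05

mu_w = 0.5 * n1 * (n1 + n2 + 1)               # mean of W under H0
sigma_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # standard deviation of W under H0

# Continuity correction: move W half a unit toward the mean
correction = -0.5 if W > mu_w else 0.5
z = (W + correction - mu_w) / sigma_w

# Two-tailed p-value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value < alpha

print(mu_w, round(sigma_w, 4), round(z, 2), round(p_value, 4), reject_h0)
```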
33
Study Unit Theme 2.4.3: (Kruskal-Wallis Test)
Learning Outcomes
• be able to understand, explain and execute the hypothesis test about the
difference of more than two populations based on independent samples.
• know what the requirements for the sample size are in order to do a
normal approximation.
• be able to calculate a p-value by hand and by using EXCEL’s CHISQ.DIST.RT
function.
35
Kruskal-Wallis Test
• The Mann-Whitney-Wilcoxon test has been extended by Kruskal and
Wallis for cases of three or more populations.
H0: All populations are identical
Ha: Not all populations are identical
• The Kruskal-Wallis test can be used with ordinal data as well as with
interval or ratio data.
• Also, the Kruskal-Wallis test does not require the assumption of
normally distributed populations.
36
Kruskal-Wallis Test
• Test Statistic
$$H = \left[\frac{12}{n_T(n_T + 1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i}\right] - 3(n_T + 1)$$
where:
k = number of populations
ni = number of observations in sample i
nT = total number of observations in all samples
Ri = sum of the ranks for sample i
37
Kruskal-Wallis Test
• When the populations are identical, the sampling distribution of the
test statistic H can be approximated by a chi-square distribution with
k – 1 degrees of freedom.
• This approximation is acceptable if each of the sample sizes ni is > 5.
• This test is always expressed as an upper-tailed test.
• The rejection rule is: Reject H0 if p-value < α
38
Kruskal-Wallis Test
Example: Williams Manufacturing Company
Williams Manufacturing Company hires employees for its management
staff from three different colleges. The director wants to know whether
there are differences in the performance ratings among the managers
who graduated from the three colleges.
Performance rating data are available for independent samples of
seven managers who graduated from college A, six managers who
graduated from college B and seven managers who graduated from
college C.
39
Kruskal-Wallis Test
❑ Performance evaluation ratings

College A   College B   College C
25          60          50
70          20          70
60          30          60
85          15          80
95          40          90
90          35          70
80                      75
40
Kruskal-Wallis Test
❑ Combined rankings

College A   Rank    College B   Rank    College C   Rank
25          3       60          9       50          7
70          12      20          2       70          12
60          9       30          4       60          9
85          17      15          1       80          15.5
95          20      40          6       90          18.5
90          18.5    35          5       70          12
80          15.5                        75          14
Sum         95      Sum         27      Sum         88
41
Kruskal-Wallis Test
• Rejection rules
Using p-value: Reject H0 if p-value < .05
• Kruskal-Wallis Test Statistic

k = 3 populations, n1 = 7, n2 = 6, n3 = 7, nT = 20
$$H = \left[\frac{12}{n_T(n_T + 1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i}\right] - 3(n_T + 1)$$
$$H = \frac{12}{20(20 + 1)} \left[\frac{(95)^2}{7} + \frac{(27)^2}{6} + \frac{(88)^2}{7}\right] - 3(20 + 1) = 8.92$$
42
Kruskal-Wallis Test
• Conclusion
From the χ² table (Table 3) it follows that for 2 degrees of
freedom 7.378 (α = 0.025) < H = 8.92 < 9.210 (α = 0.01), and thus
the p-value satisfies 0.01 < p-value < 0.025, which is less than 0.05.
∴ We reject H0. There is sufficient evidence to conclude that
the populations are not identical.
OR
Using Excel shows that the p-value for H = 8.92 is .0116. As the
p-value is < α = .05, we reject H0. There is sufficient evidence to
conclude that the populations are not identical.
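
The Kruskal-Wallis statistic for the Williams example can be recomputed from the rank sums in Python. This is a minimal sketch with illustrative variable names. One assumption deserves emphasis: the p-value line uses the closed form exp(−H/2), which is the chi-square upper-tail probability for exactly 2 degrees of freedom only; for other values of k a chi-square routine (such as Excel's CHISQ.DIST.RT) would be needed.

```python
from math import exp

# Rank sums and sample sizes taken from the combined-rankings table
rank_sums = {"A": 95.0, "B": 27.0, "C": 88.0}
sizes = {"A": 7, "B": 6, "C": 7}
alpha = 0.05

n_t = sum(sizes.values())   # total number of observations
k = len(sizes)              # number of populations

# H = [12 / (nT (nT + 1)) * sum(Ri^2 / ni)] - 3 (nT + 1)
H = 12 / (n_t * (n_t + 1)) * sum(rank_sums[c] ** 2 / sizes[c] for c in sizes) - 3 * (n_t + 1)

# Upper-tail chi-square probability; this closed form holds ONLY for 2 df
assert k - 1 == 2
p_value = exp(-H / 2)
reject_h0 = p_value < alpha

print(round(H, 2), round(p_value, 4), reject_h0)
```

A quick consistency check on the ranking is that the rank sums add up to nT(nT + 1)/2 = 210.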
43
Study Unit Theme 2.4.4: (Rank Correlation)
Learning Outcomes
• know what the difference between the Pearson correlation coefficient and
Spearman rank correlation coefficient is.
• be able to understand and calculate the rank-correlation coefficient.
• be able to calculate a p-value by hand and by using EXCEL’s NORM.S.DIST
and NORM.DIST functions.
• know what the requirements for the sample size are in order to do a
normal approximation.
45
Rank Correlation
• The Pearson correlation coefficient, r, is a measure of the linear
association between two variables for which interval or ratio data are
available.
• The Spearman rank-correlation coefficient, rs , is a measure of
association between two variables when only ordinal data are
available.
• Values of rs can range from –1.0 to +1.0, where
• values near 1.0 indicate a strong positive association between the rankings,
and
• values near -1.0 indicate a strong negative association between the rankings.
46
Rank Correlation
• Spearman Rank-Correlation Coefficient, rs
$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
where:
n = number of observations being ranked
xi = rank of observation i with respect to the first variable
yi = rank of observation i with respect to the second variable
di = xi − yi
47
Test for Significant Rank Correlation
• We may want to use sample results to make an inference about the
population rank correlation ρs.
• To do so, we must test the hypotheses:
𝐻0 : ρ𝑠 = 0 (No rank correlation exists)
𝐻𝑎 : ρ𝑠 ≠ 0 (Rank correlation exists)
48
Rank Correlation
Sampling Distribution of rs when ρs = 0
• Mean
$\mu_{r_s} = 0$
• Standard Deviation
$\sigma_{r_s} = \sqrt{\frac{1}{n - 1}}$
• Distribution Form
Approximately normal, provided n ≥ 10
49
Rank Correlation
Example: Sales Potential and Sales Performance
A company wants to determine whether individuals who had a greater
potential at the time of employment turn out to have higher sales
records. Two rankings were assigned to 10 employees: one on the basis
of sales potential and the other on the basis of actual sales performance.
Use rank correlation, with α = .05, to comment on the agreement of the
two rankings.
50
Rank Correlation
Sales person             A   B   C   D   E   F   G   H   I   J
Ranking of potential     2   4   7   1   6   3  10   9   8   5
Ranking on sales         1   3   5   6   7   4  10   8   9   2

Hypotheses
𝐻0 : ρ𝑠 = 0 (No rank correlation exists)
𝐻𝑎 : ρ𝑠 ≠ 0 (Rank correlation exists)
51
Rank Correlation
Sales person             A   B   C   D   E   F   G   H   I   J   Sum
Ranking of potential     2   4   7   1   6   3  10   9   8   5
Ranking on sales         1   3   5   6   7   4  10   8   9   2
Difference in ranking    1   1   2  -5  -1  -1   0   1  -1   3
Squared difference       1   1   4  25   1   1   0   1   1   9    44
52
Rank Correlation
• Rejection Rule
With .05 level of significance: Reject H0 if p-value < .05
• Test Statistic
$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(44)}{10(100 - 1)} = 0.733$$
$$z = \frac{r_s - \mu_{r_s}}{\sigma_{r_s}} = \frac{.733 - 0}{\sqrt{1/(10 - 1)}} = \frac{.733 - 0}{.333} = 2.20$$
• p-Value
p-Value = 2(1 − P(z ≤ 2.20)) = 2(1.0000 − .9861) = .0278
53
Rank Correlation
• Conclusion
Reject H0.
The p-value is < α. Thus we reject the null hypothesis that the
population rank-correlation coefficient is 0.
There is a significant agreement (relationship) between the
rankings of Potential and Sales Performance.
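
The rank-correlation example can be reproduced in Python with the standard library. This is a sketch that assumes the rankings contain no ties (as in this example) and uses the normal approximation from the slides; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Rankings for sales persons A through J
potential = [2, 4, 7, 1, 6, 3, 10, 9, 8, 5]
performance = [1, 3, 5, 6, 7, 4, 10, 8, 9, 2]
alpha = 0.05

n = len(potential)
sum_d2 = sum((x - y) ** 2 for x, y in zip(potential, performance))

# Spearman rank-correlation coefficient
r_s = 1 - 6 * sum_d2 / (n * (n**2 - 1))

# Under H0 (rho_s = 0): mean 0, standard deviation sqrt(1 / (n - 1))
sigma_rs = sqrt(1 / (n - 1))
z = (r_s - 0) / sigma_rs

# Two-tailed p-value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value < alpha

print(sum_d2, round(r_s, 3), round(z, 2), round(p_value, 4), reject_h0)
```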
54
End of Chapter 18
55
