Download as pdf or txt
Download as pdf or txt
You are on page 1of 55

STK 120/121/161

Chapter 18
(Study unit 2.4)
Nonparametric Methods

1
Chapter 18
Nonparametric Methods
• Sign Test (SUT 2.4.1)
• Wilcoxon Signed Rank Test (NOT PART of this curriculum)
• Mann-Whitney-Wilcoxon Test (SUT 2.4.2)
• Kruskal-Wallis Test (SUT 2.4.3)
• Rank Correlation (SUT 2.4.4)

2
LEARNING
STUDY UNIT STUDY UNIT THEME
AREA

Multivariate 2.1 Tests of Goodness of Fit and 2.1.1 Conditional probabilities in a two-way table
Statistical Independence 2.1.2 Tests of Independence
Methods (Chapter 12)
Textbook 2.2 Analysis of Variance and Experimental 2.2.1 An Introduction to Analysis of Variance
Design 2.2.2 Analysis of Variance and the Completely Randomized design
(Chapter 13)

2.3 Time Series and Forecasting 2.3.1 Time Series Patterns


(Chapter 17) 2.3.2 Smoothing Methods: Moving Averages and Forecasting
2.3.3 Trend Projection
2.3.4 Time Series Decomposition
2.4 Nonparametric Methods 2.4.1 Sign test
(Chapter 18) 2.4.2 Mann-Whitney-Wilcoxon test
4 Lectures 2.4.3 Kruskal-Wallis test
2.4.4 Spearman rank-correlation coefficient

2.5 Index Numbers 2.5.1 Price Relatives


(Notes) 2.5.2 Aggregate Price Indexes
2.5.3 Computing an aggregate price index from price relatives
2.5.4 Some important price indexes
2.5.5 Deflating a series by price indexes
2.5.6 Quantity Indexes

3
Study Unit Theme 2.4.1: (Sign Test)

Learning Outcomes
On completion of this study unit it will be expected of you to:
• know what the difference in approah between parametric and nonparametric tests are.
• be able to understand, explain and execute the hypothesis test about a single population
median.
• be able to calculate a p-value by hand and by using EXCEL’s BINOM.DIST function.
• be able to calculate a p-value for larger sample sizes, where the normal distribution
approximation of the binomial distribution is used in hand calculations and by using EXCEL’s
NORM.S.DIST and NORM.DIST functions.
• know what the requiremments for the sample size is in order to do a normal approximation.
• be able to apply the correction factors when approximating the discrete binomial distribution
with the continous normal distribution.

4
Nonparametric Methods
• Most of the statistical methods referred to as parametric require the
use of interval- or ratio-scaled data.
• Nonparametric methods are often the only way to analyze categorical
( nominal or ordinal) data and draw statistical conclusions.
• Nonparametric methods require no assumptions about the
population probability distributions.
• Nonparametric methods are often called distribution-free methods.
• Whenever the data are quantitative, we will transform the data into
categorical data in order to conduct the nonparametric test.

5
Sign Test
• The sign test is a versatile method for hypothesis testing
that uses the binomial distribution with p = .50 as the
sampling distribution.
• We present two applications of the sign test:
• A hypothesis test about a population median.
Remember the median is the datapoint in the middle of
the array of observations with half of the values (0.5)
less than it and half of the values (0.5) greater than it.
• A matched-sample test about the difference between
two populations (Not part of the Curriculum)

6
Hypothesis Test about a Population Median
• We can apply the sign test by:
• Using a plus sign whenever the data in the sample
are above the hypothesized value of the median
• Using a minus sign whenever the data in the
sample are below the hypothesized value of the
median
• Discarding (Delete) any data exactly equal to the
hypothesized median
7
Hypothesis Test about a Population Median
The assigning of the plus and minus signs makes the situation into a
binomial distribution application.
• The sample size is the number of trials.
• There are two outcomes possible per trial, a plus sign or a minus sign.
• The trials are independent.
• We let p denote the probability of a plus sign.
• If the population median is in fact a particular value, p should equal .5.

8
Hypothesis Test about a Population Median:
Small-Sample Case
• The small-sample case for this sign test should be used whenever n < 20.
• The hypotheses are
• 𝐻0 : 𝑝 = .50 The population median equals the value
assumed.
• 𝐻𝑎 : 𝑝 ≠ .50 The population median is different than the value
assumed.
• The number of plus signs is our test statistic.
• Assuming H0 is true, the sampling distribution for the test statistic is a
binomial distribution with p = .5.
• H0 is rejected if the p-value < level of significance, a.

9
Hypothesis Test about a Population Median:
Small-Sample Case
Example: Lawler Sample data
Lawler’s Grocery Store made the decision to carry Cape May Potato
Chips based on the manufacturer’s estimate that the median sales
should be $450 per week on a per-store basis.
Lawler’s has been carrying the potato chips for three months. Data
showing one-week sales at 10 randomly selected Lawler’s stores are
shown on the next slide.

10
Hypothesis Test about a Population Median:
Small-Sample Case Store Weekly Sign
Number Sales ($)
Example: Lawler Sample data
56 485 +
19 562 +
❑ One week sales at 10 Lawler grocery 36 415 -
stores 128 860 +
12 426 -
63 474 +
39 662 +
84 380 -
102 515 +
44 721 +

11
Hypothesis Test about a Population Median:
Small-Sample Case
Example: Lawler Sample data
Lawler’s management requested the following hypothesis test about the
population median weekly sales of Cape May Potato Chips (using a = .10).
H0: Median Sales = $450
Ha: Median Sales ≠ $450

In terms of the binomial probability p:


H0: p = .50
Ha: p ≠ .50

12
Hypothesis Test about a Population Median:
Small-Sample Case Number of +
signs
Probability

0 .0010
Example: Lawler Sample data 1 .0098
2 .0439
❑ Binomial Probabilities 3 .1172
4 .2051
with n = 10 and p = .50
5 .2461
6 .2051
7 .1172
8 .0439
9 .0098
10 .0010

13
Hypothesis Test about a Population Median:
Small-Sample Case
Because the observed number of plus signs is 7, we begin by computing the
probability of obtaining 7 or more plus signs.

The probability of 7, 8, 9, or 10 plus signs is:

.1172 + .0439 + .0098 + .0010 = .1719.

We are using a two-tailed hypothesis test, so:


p-value = 2(.1719) = .3438.

With p-value > a, (.3438 > .10), we cannot reject H0.


14
Hypothesis Test about a Population Median:
Small-Sample Case
• Conclusion
Because the p-value > a, we cannot reject H0. There is insufficient evidence in
the sample to reject the assumption that the median weekly sales is $450.

15
Hypothesis Test about a Population Median:
Larger Sample Size
• With larger sample sizes, we rely on the normal distribution
approximation of the binomial distribution to compute the p-value,
which makes the computations quicker and easier.

• Normal Approximation of the Number of Plus Signs when H0: p = .50


Mean:  = .50n
Standard Deviation: 𝜎 = .25𝑛
Distribution Form: Approximately normal for n > 20

16
Hypothesis Test about a Population Median:
Larger Sample Size
Example: Median price of new homes
A hypothesis test is being conducted about the median price of a new
home.
H0: Median ≥ 236,000
Ha: Median < 236,000
A random sample of 61 recent new home sales found 22 homes sold
for more than $236,000, 38 homes sold for less than $236,000 and one
home sold for $236,000.
Is there sufficient evidence to reject H0? Use a = .05.

17
Hypothesis Test about a Population Median:
Larger Sample Size
Example: Median price of new homes

• Letting x denote the number of plus signs, we will use the normal
distribution to approximate the binomial probability P(x < 22).
• Remember that the binomial distribution is discrete and the normal
distribution is continuous.
• To account for this, the binomial probability of 22 is computed by the
normal probability interval 21.5 to 22.5.

18
Hypothesis Test about a Population Median:
Larger Sample Size
Example: Median price of new homes

• Mean and Standard Deviation


 = .5(n) = .5(60) = 30

𝜎 = .25𝑛 = .25(60) = 3. 873


• Test Statistic
z = (x – )/s = (22.5 – 30)/3.873 = - 1.94
• p-Value

p-Value = .0262

19
Hypothesis Test about a Population Median:
Larger Sample Size
• Rejection Rule
Using .05 level of significance:
Reject H0 if p-value < .05

• Conclusion
Reject H0.

Median price of a new home is less than the $236,000 median price
a year ago.

20
LEARNING
STUDY UNIT STUDY UNIT THEME
AREA

Multivariate 2.1 Tests of Goodness of Fit and 2.1.1 Conditional probabilities in a two-way table
Statistical Independence 2.1.2 Tests of Independence
Methods (Chapter 12)
Textbook 2.2 Analysis of Variance and Experimental 2.2.1 An Introduction to Analysis of Variance
Design 2.2.2 Analysis of Variance and the Completely Randomized design
(Chapter 13)

2.3 Time Series and Forecasting 2.3.1 Time Series Patterns


(Chapter 17) 2.3.2 Smoothing Methods: Moving Averages and Forecasting
2.3.3 Trend Projection
2.3.4 Time Series Decomposition
2.4 Nonparametric Methods 2.4.1 Sign test
(Chapter 18) 2.4.2 Mann-Whitney-Wilcoxon test
4 Lectures 2.4.3 Kruskal-Wallis test
2.4.4 Spearman rank-correlation coefficient

2.5 Index Numbers 2.5.1 Price Relatives


(Notes) 2.5.2 Aggregate Price Indexes
2.5.3 Computing an aggregate price index from price relatives
2.5.4 Some important price indexes
2.5.5 Deflating a series by price indexes
2.5.6 Quantity Indexes

21
Study Unit Theme 2.4.2: (Mann-Whitney-Wilcoxon Test)

Learning Outcomes
On completion of this study unit it will be expected of you to:
• be able to understand, explain and execute the hypothesis test about the difference
between two populations based on independent samples.
• be able to calculate a p-value by hand and by using EXCEL’s BINOM.DIST function.
• be able to calculate a p-value for larger sample sizes, where the normal distribution
approximation of the binomial distribution is used in hand calculations and by using
EXCEL’s NORM.S.DIST and NORM.DIST functions.
• know what the requiremments for the sample size is in order to do a normal
approximation.
• be able to apply the correction factors when approximating the discrete binomial
distribution with the continous normal distribution.

22
Mann-Whitney-Wilcoxon Test
• This test is another nonparametric method for determining whether
there is a difference between two populations.
• This test is based on two independent samples.
• Advantages of this procedure are;
✓It can be used with either ordinal data or quantitative data.
✓It does not require the assumption that the populations have a normal
distribution.

23
Mann-Whitney-Wilcoxon Test
• Instead of testing for the difference between the medians of two
populations, this method tests to determine whether the two
populations are identical.
• The hypotheses are:
H0: The two populations are identical
Ha: The two populations are not identical

24
Mann-Whitney-Wilcoxon Test
Example: Third national bank

The manager is monitoring the balances maintained in checking


accounts at two branch banks and wants to know if the populations of
account balances at the two branch banks are identical.
Two independent samples of checking accounts are taken with sample
sizes n1 = 12 at branch 1 and n2 = 10 at branch 2.

25
Mann-Whitney-Wilcoxon Test
Example: Third national bank Branch 2
Branch 1 Branch 1 Account Balance ($)
Accoun Balance Account Balance 1 885
❑ Account balances t ($) ($) 2 850
1 1095 9 875 3 915
2 955 10 1055 4 950
3 1200 11 1025 5 800
4 1195 12 975 6 750
5 925 7 865
6 950 8 1000
7 805 9 1050
8 945 10 935

26
Mann-Whitney-Wilcoxon Test
Example: Third national bank

• Hypotheses

H0: Both the branches are identical in terms of populations of


account balances
Ha: The branches are not identical in terms of populations of
account balances

27
Mann-Whitney-Wilcoxon Test
• First, rank the combined data from the lowest to the highest values,
with tied values being assigned the average of the tied rankings.
• Then, compute W, the sum of the ranks for the first sample.
• Then, compare the observed value of W to the sampling distribution
of W for identical populations. The value of the standardized test
statistic z will provide the basis for deciding whether to reject H0.

28
Mann-Whitney-Wilcoxon Test
• Sampling Distribution of W with Identical Populations

• Mean
𝛍W = 1/2 n1(n1 + n2 + 1)
• Standard Deviation

𝛔𝑊 = 1ൗ12 𝑛1 𝑛2 (𝑛1 + 𝑛2 + 1)
• Distribution Form
Approximately normal, provided n1 > 7 and n2 > 7

29
Mann-Whitney-Wilcoxon Test Branch 2
Account Balance Rank
($)
Example: Third national bank 1 885 7
Branch 1 Branch 1 2 850 4
Accoun Balance Rank Account Balance Rank 3 915 8
t ($) ($) 4 950 12.5
1 1095 20 9 875 6
❑Combined 2 955 14 10 1055 19
5 800 2
6 750 1
ranking 3 1200 22 11 1025 17 7 865 5
4 1195 21 12 975 15 8 1000 16
5 925 9 Sum of 169.5 9 1050 18
6 950 12.5 ranks
10 935 10
7 805 3 Sum of 83.5
8 945 11 ranks

30
Mann-Whitney-Wilcoxon Test
Example: Third national bank

❑ Sampling
distribution of w

31
Mann-Whitney-Wilcoxon Test
Example: Third national bank
• Rejection Rule
Using .05 level of significance,
Reject H0 if p-value < .05

• Test Statistic
169−138
𝑃(W ≥ 169.5)= 𝑃 𝑧 ≥ 15.1658 = 𝑃(𝑧 ≥ 2.04) = 𝑃(𝑧 < − 2.04)
• p-Value
2(1-P(z ≤ 2.04) =2(1 - 0.9793) = 2(0.0207) = 0.0414
OR 2𝑃(𝑧 < − 2.04) = 2(0.0207) = 0.0414

32
Mann-Whitney-Wilcoxon Test
Example: Third national bank
• Conclusion
p-value = 0.0414 < α=0.05
∴ Reject H0.

There is insufficient evidence in the sample data to conclude that


there is no difference in the two populations of account
balances.

33
LEARNING
STUDY UNIT STUDY UNIT THEME
AREA

Multivariate 2.1 Tests of Goodness of Fit and 2.1.1 Conditional probabilities in a two-way table
Statistical Independence 2.1.2 Tests of Independence
Methods (Chapter 12)
Textbook 2.2 Analysis of Variance and Experimental 2.2.1 An Introduction to Analysis of Variance
Design 2.2.2 Analysis of Variance and the Completely Randomized design
(Chapter 13)

2.3 Time Series and Forecasting 2.3.1 Time Series Patterns


(Chapter 17) 2.3.2 Smoothing Methods: Moving Averages and Forecasting
2.3.3 Trend Projection
2.3.4 Time Series Decomposition
2.4 Nonparametric Methods 2.4.1 Sign test
(Chapter 18) 2.4.2 Mann-Whitney-Wilcoxon test
4 Lectures 2.4.3 Kruskal-Wallis test
2.4.4 Spearman rank-correlation coefficient

2.5 Index Numbers 2.5.1 Price Relatives


(Notes) 2.5.2 Aggregate Price Indexes
2.5.3 Computing an aggregate price index from price relatives
2.5.4 Some important price indexes
2.5.5 Deflating a series by price indexes
2.5.6 Quantity Indexes

34
Study Unit Theme 2.4.3: (Kruskal-Wallis Test)

Learning Outcomes
On completion of this study unit it will be expected of you to:
• be able to understand, explain and execute the hypothesis test about the
difference of more than two populations based on independent samples.
• know what the requiremments for the sample size is in order to do a
normal approximation.
• be able to calculate a p-value by hand and by using EXCEL’s CHI.DIST.RT
function.

35
Kruskal-Wallis Test
• The Mann-Whitney-Wilcoxon test has been extended by Kruskal and
Wallis for cases of three or more populations.
H0: All populations are identical
Ha: Not all populations are identical

• The Kruskal-Wallis test can be used with ordinal data as well as with
interval or ratio data.
• Also, the Kruskal-Wallis test does not require the assumption of
normally distributed populations.

36
Kruskal-Wallis Test
• Test Statistic
𝑘
12 𝑅𝑖 2
𝐻= ෍ − 3 𝑛𝑇 + 1
𝑛𝑇 𝑛𝑇 + 1 𝑛𝑖
𝑖=1

where:
k = number of populations
ni = number of observations in sample i
nT = total number of observations in all samples
Ri = sum of the ranks for sample i

37
Kruskal-Wallis Test
• When the populations are identical, the sampling distribution of the
test statistic H can be approximated by a chi-square distribution with
k – 1 degrees of freedom.
• This approximation is acceptable if each of the sample sizes ni is > 5.
• This test is always expressed as an upper-tailed test.
• The rejection rule is: Reject H0 if p-value < a

38
Kruskal-Wallis Test
Example: Williams Manufacturing Company

Williams Manufacturing Company hires employees for its management


staff from three different colleges. The director wants to know whether
there are differences in the performance ratings among the managers
who graduated from the three colleges.

Performance rating data are available for independent samples of


seven managers who graduated from college A, Six managers who
graduated from college B and seven managers who graduated from
college C.
39
Kruskal-Wallis Test
Example: Williams Manufacturing Company
College A College B College C
❑ Performance 25 60 50

Evaluation ratings 70 20 70
60 30 60
85 15 80
95 40 90
90 35 70
80 75

40
Kruskal-Wallis Test
Example: Williams Manufacturing Company
College Rank College Rank College Rank
A B C
❑ Combined rankings 25 3 60 9 50 7
70 12 20 2 70 12
60 9 30 4 60 9
85 17 15 1 80 15.5
95 20 40 6 90 18.5
90 18.5 35 5 70 12
80 15.5 75 14
Sum 95 27 88

41
Kruskal-Wallis Test
Example: Williams Manufacturing Company
• Rejection rules
Using p-value: Reject H0 if p-value < .05

• Kruskal-Wallis Test Statistic


k = 3 populations, n1 = 7, n2 = 6, n3 = 7, nT = 20

𝑘
12 𝑅𝑖 2
𝐻= ෍ − 3 𝑛𝑇 + 1
𝑛𝑇 𝑛𝑇 + 1 𝑛𝑖
𝑖=1

12 (95)2 (27)2 (88)2


𝐻= + + − 3 20 + 1 = 8.92
20(20 + 1) 7 6 7

42
Kruskal-Wallis Test
Example: Williams Manufacturing Company
• Conclusion
From the ꭓ2 - table (Table 3) it follows that for 2 degrees of
freedom 7.378 (α=0,025)< H = 8.92 < 9.21 (α=0,01) and thus
the p-value will be: 0.01 < p-value < 0.025 which is less than 0.05
∴ we Reject H0. There is insufficient evidence to conclude that
the populations are identical.
OR
Using excel will show that p-value for H = 8.92 is .0116. As p-
value is < α = .05, we reject H0. There is insufficient evidence to
conclude that the populations are identical.

43
LEARNING
STUDY UNIT STUDY UNIT THEME
AREA

Multivariate 2.1 Tests of Goodness of Fit and 2.1.1 Conditional probabilities in a two-way table
Statistical Independence 2.1.2 Tests of Independence
Methods (Chapter 12)
Textbook 2.2 Analysis of Variance and Experimental 2.2.1 An Introduction to Analysis of Variance
Design 2.2.2 Analysis of Variance and the Completely Randomized design
(Chapter 13)

2.3 Time Series and Forecasting 2.3.1 Time Series Patterns


(Chapter 17) 2.3.2 Smoothing Methods: Moving Averages and Forecasting
2.3.3 Trend Projection
2.3.4 Time Series Decomposition
2.4 Nonparametric Methods 2.4.1 Sign test
(Chapter 18) 2.4.2 Mann-Whitney-Wilcoxon test
4 Lectures 2.4.3 Kruskal-Wallis test
2.4.4 Spearman rank-correlation coefficient

2.5 Index Numbers 2.5.1 Price Relatives


(Notes) 2.5.2 Aggregate Price Indexes
2.5.3 Computing an aggregate price index from price relatives
2.5.4 Some important price indexes
2.5.5 Deflating a series by price indexes
2.5.6 Quantity Indexes

44
Study Unit Theme 2.4.4: (Rank Correlation)

Learning Outcomes
On completion of this study unit it will be expected of you to:
• know what the difference between the Pearson correlation coefficient and
Spearman rank correlation coefficient are.
• be able to understand and calculate the rank-correlation coefficient.
• be able to calculate a p-value by hand and by using EXCEL’s NORM.S.DIST
and NORM.DIST.RT function.
• know what the requiremments for the sample size is in order to do a
normal approximation.

45
Rank Correlation
• The Pearson correlation coefficient, r, is a measure of the linear
association between two variables for which interval or ratio data are
available.
• The Spearman rank-correlation coefficient, rs , is a measure of
association between two variables when only ordinal data are
available.
• Values of rs can range from –1.0 to +1.0, where
• values near 1.0 indicate a strong positive association between the rankings,
and
• values near -1.0 indicate a strong negative association between the rankings.

46
Rank Correlation
• Spearman Rank-Correlation Coefficient, rs

6 σ 𝑑𝑖 2
𝑟𝑠 = 1 −
𝑛 𝑛2 − 1

where:
n = number of observations being ranked
xi = rank of observation i with respect to the first variable
yi = rank of observation i with respect to the second variable
d i = x i - yi

47
Test for Significant Rank Correlation
• We may want to use sample results to make an inference about the
population rank correlation ρs.
• To do so, we must test the hypotheses:
𝐻0 : ρ𝑠 = 0 (No rank correlation exists)
𝐻𝑎 : ρ𝑠 ≠ 0 (Rank correlation exists)

48
Rank Correlation
Sampling Distribution of rs when ρs = 0
• Mean
𝜇𝑟𝑠 = 0

• Standard Deviation
1
𝜎𝑟𝑠 =
𝑛−1
• Distribution Form
Approximately normal, provided n > 10

49
Rank Correlation
Example: Sales Potential and Sales Performance

A company wants to determine whether individuals who had a greater


potential at the time of employment turn out to have higher sales
records. Two rankings were assigned to 10 employees. One on the basis
of sales potential and other on the basis of actual sales performance.
Use rank correlation, with a =.05, to comment on the agreement of the
two rankings.

50
Rank Correlation
Example: Sales Potential and Sales Performance
Sales person A B C D E F G H I J
Ranking of 2 4 7 1 6 3 10 9 8 5
potential
Ranking on 1 3 5 6 7 4 10 8 9 2
sales

Hypotheses
𝐻0 : ρ𝑠 = 0 (No rank correlation exists)
𝐻𝑎 : ρ𝑠 ≠ 0 (Rank correlation exists)

51
Rank Correlation
Example: Sales Potential and Sales Performance
Sales person A B C D E F G H I J
Ranking of 2 4 7 1 6 3 10 9 8 5
potential
Ranking on 1 3 5 6 7 4 10 8 9 2
sales
Difference in 1 1 2 -5 -1 -1 0 1 -1 3
ranking
Squared 1 1 4 25 1 1 0 1 1 9 44
difference

52
Rank Correlation
Example: Sales Potential and Sales Performance
• Rejection Rule
With .05 level of significance:
Reject H0 if p-value < .05

• Test Statistic
6 σ 𝑑𝑖 2 6(44)
𝑟𝑠 = 1 − 2 = 1- = 0.733
𝑛 𝑛 −1 10(100−1)

z = (rs - rs)/rs = (.733 - 0)/ 1


ൗ(10−1) = (.733 - 0)/.333 = 2.20
• p-Value
p-Value = 2(1-P(z ≤ 2.20) = 2(1.0000 - .9861) = .0278
53
Rank Correlation
Example: Sales Potential and Sales Performance
• Conclusion

Reject H0.

The p-value is < a. Thus we reject the null hypothesis that the
population rank-correlation coefficient is 0.
There is a significant agreement (relationship) between the
rankings of Potential and Sales Performance.

54
End of Chapter 18

55

You might also like