STA 2020 Non Parametrics
Original Authors:
Allan Clark, Kutlwano Ramaboa, Karl Stielau, Christien Thiart, Melvin Varaghuse
Revised for ERT by Ehsaan Rajak with help from slides as prepared by Neil Watson
Department of Statistical Sciences
University of Cape Town
Contents
1 Introduction
5 k-Independent Samples
7 Tests of Association
9 Additional Exercises
1 Introduction
The inferential statistics that you have encountered thus far, such as the t-test and ANOVA, are
examples of parametric tests. A parametric test makes many assumptions about the nature of the
population from which the observations or data were drawn (e.g. normal distribution; two samples
of data drawn from populations having the same variance (σ²), etc.). Parametric tests are more
powerful¹ when all the assumptions required by a particular statistical test are met.
In contrast, non-parametric tests make very few and less stringent assumptions about the underlying
distribution of the data. Thus, non-parametric statistics are useful when the required distribution
of the data is unknown or other assumptions required by a parametric test are not met. This is
because the majority of the non-parametric tests do not focus on the numerical values of the scores
but rather on the rank of the scores.
Data comes in different forms and it is important to know about these because it is part of what
determines the statistical test that you can use to analyse your data.
There are two main classifications of scales of measurement, and four scales of measurement that
must be considered:
Data that represent categories, such as nominal and ordinal observations, are collectively called
categorical (or qualitative) data. Data that are counted or measured using a numerically defined
method are called numerical (or quantitative) data.
Nominal-scale Data that is measured on a nominal scale can be placed into categories, but the
categories do not have a natural order. For example, colour (black, white, red, yellow, etc.)
Ordinal-scale Data that is measured on an ordinal scale can be placed into categories, and the
categories have a natural order: the objects in one category are not only different from the
objects in the other categories but also stand in some kind of relation to them. However, the
sizes of the differences between categories cannot be defined. Examples include students’ grades
on a course, where possible grades are A, B, C, D, etc.; the classification of companies by size
into ‘small’, ‘medium’ and ‘large’; Likert scales; etc.
¹ Power in the statistical sense refers to how likely a test is to reject a false null hypothesis.
Interval-scale Data that is measured on an interval scale has all the characteristics of an ordinal
scale and, in addition, the differences between any two numbers on the scale have meaning.
Ratios between numbers on this scale are not meaningful, so operations such as multiplication
and division cannot be carried out directly, although ratios of differences can be expressed. The
zero point is arbitrary and does not indicate the absence of the characteristic being measured.
Examples of interval scales are rare, but include calendar time and temperature in degrees
centigrade (zero degrees is not ‘no temperature’).
Ratio-scale Data that is measured on a ratio scale has all the characteristics of an interval scale
and, in addition, has a true zero point as its origin (i.e. a zero does indicate the absence of the
characteristic being measured). Examples include length; weight; etc.
Note that for most statistical procedures, the distinction between interval-scale and ratio-scale
does not matter and it is common to use the term “interval” to refer to ratio data as well.
Knowledge Check
Determine the type of data for the following:
3. The rating (Extremely poor[1], Very Poor[2], Poor[3], Unsure[4], Good[5], Very Good[6],
Excellent[7]) reported for a particular television program by each of a sample of viewers.
4. The weekly closing price of gold throughout the year.
5. The month of highest sales for each firm in a sample.
6. The socio-economic status of people who reside in Cape Town (upper class, middle class, lower
class).
7. The responses by citizens on a 5-point rating scale (where 1=Strongly Disagree, 2=Disagree,
3=Unsure, 4=Agree, 5=Strongly Agree) to the statement:
“South Africa should be divided into two time zones”.
Solution
1. Ratio-Scale
2. Nominal
3. Ordinal
4. Ratio-Scale
5. Nominal
6. Ordinal
7. Ordinal
8. Nominal
9. Interval-Scale
10. Ordinal
Different tests require an assumption about the measurement scale of the data. The following tables
are a summary of the various nonparametric tests that will be studied in this course.
1.3.4 Tests of association
2 Single Sample Tests
In every statistical test and estimation procedure that we have encountered so far, we have assumed
that the data comprise a random sample from the population. It is possible, however, for a data
set not to be a random sample from a population, but to have some internal sequential pattern.
Most statistical tests require that the data be random.
Random means that the process generating the sample produces a sequence of data in which the
values are independent of each other. The statistical test that enables us to determine
whether the data are random is called the Runs test. The Runs test is based on the order or sequence
in which the data were originally obtained.
Data Assumptions
• The data must be observed and recorded in some natural or chronological order.
• The data (either originally or after some transformation) consists of two mutually exclusive
and exhaustive categories. For example, the following sequence of data has two categories only:
M M M F M M F F
Terminology
The Runs test is based on the number of runs which a sample exhibits.
• A run is defined as any sequence of observations of one type (i.e. a succession of identical
categories), bounded by observations of the other type, or by no observations. For example,
the following sequence:
M M M F M M F F
has R = 4 runs in n = 8 observations. The sample begins with a run of three Ms followed by
a run which consists of one F, then another run which consists of two Ms, followed by a run of
two Fs. Separating and numbering each succession of identical observations shows
the 4 runs.
M M M | F | M M | F F
  1     2    3     4
The total number of runs in a sample can give an indication of whether or not the sample is
random. Non-randomness is observed when either:
1. Too many runs occur. This feature indicates that observations in one category tend to
follow observations of the other category, and form a repeated alternating pattern within
the observed data sequence.
For example,
N D N D N D N D N D N D N D N N
2. Too few runs occur. This feature indicates that observations tend to have the same category
as their predecessors, and hence definite grouping or clustering is present within the
observed data sequence. For example,
N N N N N N N N N D D D D D D D
• The length of the run, l = number of observations in the run. For example, in the following
sequence:
M M M F M M F F
there are 4 runs in 8 observations, and the length of the first run l1 = 3.
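The definitions of a run and its length can be made concrete with a short sketch. The function name `count_runs` is hypothetical and chosen only for illustration; it walks through the sequence once and starts a new run whenever the category changes.

```python
def count_runs(sequence):
    """Return (R, run_lengths) for a sequence of category labels."""
    run_lengths = []
    previous = None
    for label in sequence:
        if run_lengths and label == previous:
            run_lengths[-1] += 1      # same category: extend the current run
        else:
            run_lengths.append(1)     # category changed: start a new run
        previous = label
    return len(run_lengths), run_lengths


R, lengths = count_runs("M M M F M M F F".split())
print(R, lengths)  # 4 [3, 1, 2, 2]
```

The first entry of `lengths` is the length of the first run, l1 = 3, as stated above.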
Hypotheses
Data Summary
Test Statistic
Test statistic = R (number of runs)
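For large samples (i.e. n1 > 20 AND/OR n2 > 20), the sampling distribution of R is approximately
Gaussian (normal), and the standardised test statistic is
$$z = \frac{R - \mu_R}{\sigma_R}$$
where µR and σR are the mean and standard deviation of R when H0 is true.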
Critical Region
• For small samples, the critical values for the Runs test are given in the Runs Test table.
• There are two critical values: we reject H0 if R is smaller than or equal to RL, the lower
critical value, or if R is greater than or equal to RU, the upper critical value, and infer that
the sequence of data is not random (that is, there is either clustering or systematic switching
of categories).
• For large samples, we reject H0 if |z| ≥ c, where c is the critical value, or if the calculated
p-value ≤ α, where p-value = Pr(|Z| ≥ |z|).
Class Examples
Example 1: Consider an industrial production line which produces television sets. As each
successive set leaves the production line it is subjected to a quality control check and
classified as either non-defective (N) or defective (D).
We expect the production sequence of sets to be random. If not, it may indicate a systematic
problem in production which should be corrected.
The following results from a two-hour period were recorded in the order in which the television
sets were checked:
N N N D N N D D N N N N N D D
1. Find n, n1 , n2 and R
2. Test at the 5% significance level whether the sequence of data is random.
Example 2: Assume that the results of the quality control check were extended to cover an entire
8-hour shift. The following results were recorded:
N N D N D D N N D D D N N N N N D N N
N D N N N D N D N N N N N D D D N D N
N D N N N N D N D N N N D D D N N N N
N N N
1. Find n, n1 , n2 and R
2. Test at the 2.5% significance level whether the full sequence of data is random
3. What is the p-value of the test?
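For a sequence as long as the one in Example 2, the large-sample calculation can be sketched as follows. This is a minimal sketch, not the official solution: the function name `runs_test_large_sample` is hypothetical, and the mean and variance of R used here are the standard textbook expressions for the Runs test, which are assumed because they are not shown in the text above.

```python
import math


def runs_test_large_sample(sequence):
    """Large-sample Runs test for a list of labels with exactly two categories."""
    labels = sorted(set(sequence))
    if len(labels) != 2:
        raise ValueError("the Runs test needs exactly two categories")
    n1 = sequence.count(labels[0])
    n2 = sequence.count(labels[1])
    n = n1 + n2
    # A run starts at the first observation and at every change of category.
    R = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    # Standard mean and variance of R under H0 (assumed, not given in the notes).
    mu_R = 2 * n1 * n2 / n + 1
    var_R = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (R - mu_R) / math.sqrt(var_R)
    # Two-sided p-value Pr(|Z| >= |z|) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {"counts": {labels[0]: n1, labels[1]: n2},
            "n": n, "R": R, "z": z, "p_value": p_value}


# The 8-hour shift sequence from Example 2, in the order recorded.
shift = (
    "N N D N D D N N D D D N N N N N D N N "
    "N D N N N D N D N N N N N D D D N D N "
    "N D N N N N D N D N N N D D D N N N N "
    "N N N"
).split()

result = runs_test_large_sample(shift)
print(result)
```

The returned p-value is two-sided, matching the definition Pr(|Z| ≥ |z|) in the Critical Region above.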
1. Determine the position of the score with respect to the median by assigning a + if above
median, or − if below the median
2. Find n, n1 , n2 and R
3. Test the hypothesis of random order
Child  Score  Position of the score with respect to the median
1 31
2 23
3 36
4 43
5 51
6 44
7 12
8 26
9 43
10 75
11 2
12 3
13 15
14 18
15 78
16 24
17 13
18 27
19 86
20 61
21 13
22 7
23 6
24 8
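The three steps listed before the table (dichotomise about the median, count n, n1, n2 and R, then test for randomness) can be sketched for the first two steps as below. One assumption is made that the text does not state: a score exactly equal to the median would be dropped, although no score in this data set equals the median.

```python
import statistics

# Scores of the 24 children, in the order listed in the table.
scores = [31, 23, 36, 43, 51, 44, 12, 26, 43, 75, 2, 3,
          15, 18, 78, 24, 13, 27, 86, 61, 13, 7, 6, 8]

median = statistics.median(scores)

# "+" if the score is above the median, "-" if below.
# Assumption: scores equal to the median are dropped.
signs = ["+" if s > median else "-" for s in scores if s != median]

n1 = signs.count("+")
n2 = signs.count("-")
n = n1 + n2
# A run starts at the first sign and at every change of sign.
R = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

print("median:", median)
print(" ".join(signs))
print("n =", n, " n1 =", n1, " n2 =", n2, " R =", R)
```

The resulting R would then be compared with the Runs Test table values for the observed n1 and n2.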
3 Two Sample Tests
The Sign test gets its name from the fact that it is based on the sign of the difference between two
related observations. The test is used when we wish to compare two matched or paired populations
when the data are ordinal.
Data Assumptions
Hypotheses
The null hypothesis for the Sign test is that there is no difference between the two populations. If
this assumption is true, then the number of + signs (or − signs) should have a binomial distribution
with p = 0.5.
Null Hypothesis: H0 : The two population locations (medians) are the same (OR p = 0.5)
Alternative Hypothesis: H1 : The two population locations (medians) are not the same
(OR p ≠ 0.5) (Two-tailed test)
Data Summary
• calculate the difference for each pair (i.e. for each observation, subtract the 2nd value from
the 1st value).
• eliminate all pairs whose difference = 0, and effectively reduce the sample size. NOTE: n =
number of non-zero differences
• Record the sign of all the paired-differences (we usually designate a plus sign (+) for a positive
difference, and a minus sign (−) for a negative difference)
Test Statistic
$$\mu_S = \frac{n}{2}, \qquad \sigma_S^2 = \frac{n}{4}$$
$$z = \frac{(S \pm 0.5) - \mu_S}{\sigma_S}$$
where (S + 0.5) is used when S < n/2, and (S − 0.5) is used when S > n/2. The 0.5 is a “correction
for continuity”. The correction is necessary because the Gaussian distribution is continuous while
the Sign test involves discrete values.
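The continuity-corrected statistic can be sketched in a few lines. The function name `sign_test_z` is hypothetical, and S is taken to be the number of + signs among the n non-zero differences. The illustration uses the counts from the concrete-additive class example in this section (77 of 100 batches favouring additive A).

```python
import math


def sign_test_z(S, n):
    """Continuity-corrected z statistic for the large-sample Sign test."""
    mu_S = n / 2
    sigma_S = math.sqrt(n / 4)
    if S < mu_S:
        corrected = S + 0.5   # move S half a unit towards the mean
    elif S > mu_S:
        corrected = S - 0.5
    else:
        corrected = S         # S equals n/2: no correction needed
    return (corrected - mu_S) / sigma_S


z = sign_test_z(77, 100)
print(round(z, 2))  # 5.3
```

The correction always moves S towards n/2, so it makes the test slightly more conservative.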
Critical Region
• We refer to the binomial tables to find the critical region which is defined by the alternative
hypothesis
• Reject H0 if p-value ≤ α
Class Examples
Example 1: In an experiment to determine which car is perceived to have the more comfortable
ride, 25 people took two rides:
− One ride in a European model.
− One ride in a North American car.
Each person ranked the cars on a scale of 1 (ride is very uncomfortable) to 5 (ride is very
comfortable). Do the data on the next page allow us to conclude that the European car is
perceived to be more comfortable? Test at the 1% significance level.
Example 2: Two different additives were compared to see which one is better for improving the
durability of concrete. 100 small batches of concrete were mixed under various conditions, and
during the mixing each batch was divided into two parts. One part received additive A and
the other part received additive B. After the concrete hardened, the two parts in each batch
were crushed against each other, and an observer rated the two parts to determine the part
that appeared to be most durable. In 77 cases the concrete with additive A was rated more
durable; in 23 cases the concrete with additive B was rated more durable. Is there a significant
difference between the effects of the two additives?
Example 3: Some 22 customers in a grocery store were asked to taste each of two types of cheese
and declare their preference. 7 customers preferred one kind, 12 preferred the other kind, and
3 had no preference. Does this data set indicate a significant difference in preference? Test at
the 5% significance level.
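In Example 3 the three customers with no preference are eliminated, leaving n = 19 non-zero differences, which is small enough to use the binomial distribution directly rather than the tables. A minimal sketch of the exact two-tailed calculation follows; the function name `sign_test_exact_p` is hypothetical, and doubling the smaller tail is one common convention for a two-tailed binomial p-value.

```python
from math import comb


def sign_test_exact_p(n_plus, n_minus):
    """Exact two-tailed Sign test p-value under H0: p = 0.5."""
    n = n_plus + n_minus                 # ties are already excluded
    k = min(n_plus, n_minus)
    # Pr(X <= k) for X ~ Binomial(n, 0.5)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)      # double the tail, cap at 1


p_value = sign_test_exact_p(7, 12)
print(p_value)
```

H0 is rejected if the printed p-value is less than or equal to α.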
The Sign test discussed above utilizes information only about the direction of the differences within
paired observations. The Wilcoxon Signed Rank Sum test is used if we wish to consider the direction
as well as the size of the difference within paired observations. Thus the test is used when we wish
to compare two matched or paired populations when the data are quantitative (i.e. either interval
or ratio-scaled).
The Wilcoxon Signed Rank Sum test is a non-parametric counterpart of the parametric matched or
paired t-test.
Data Assumptions
Hypotheses
Data Summary
• rank the absolute value of the differences
NOTE: Data are ranked by ordering them from lowest to highest and assigning them, in order,
the integer values from 1 to n (e.g. numbers 11, 16, 17, 25, 31 are assigned ranks 1, 2, 3, 4,
and 5). Ties are resolved by assigning any tied values the mean of the ranks they would have
received if there were no ties (e.g. numbers 11, 17, 17, 25, 31 are assigned ranks 1, 2.5, 2.5, 4,
and 5. The two tied numbers, which would have been assigned rank 2 and 3, are assigned the
mean of 2 and 3 = 2.5).
• record the sign of the paired difference (we usually designate a plus sign (+) for a positive
difference, and a minus sign (−) for a negative difference)
• calculate the sum of the signed ranks = T .
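The ranking rule in the NOTE, and the signed rank sum T built from it, can be sketched as below. The helper name `average_ranks` and the list `differences` are hypothetical: the differences are made-up values used purely to illustrate the steps, and dropping zero differences is an assumption carried over from the Sign test.

```python
def average_ranks(values):
    """Rank from 1 (smallest) to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + 1 + end + 1) / 2   # positions are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


print(average_ranks([11, 16, 17, 25, 31]))  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(average_ranks([11, 17, 17, 25, 31]))  # [1.0, 2.5, 2.5, 4.0, 5.0]

# Hypothetical paired differences, for illustration only.
differences = [3, -1, 0, 5, -3, 2]
nonzero = [d for d in differences if d != 0]          # assumption: drop zeros
ranks = average_ranks([abs(d) for d in nonzero])      # rank the absolute values
T = sum(r if d > 0 else -r for d, r in zip(nonzero, ranks))
print(T)  # 6.0
```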
Test Statistic
$$\mu_T = 0, \qquad \sigma_T^2 = \frac{n(n+1)(2n+1)}{6}$$
$$z = \frac{T - \mu_T}{\sigma_T}$$
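The mean and variance of T can be motivated by a short derivation. This is a sketch under assumptions the notes do not spell out: there are no ties or zero differences, and under H0 each rank is equally likely to carry a plus or a minus sign, independently of the others. The symbol $s_i$ (the sign attached to rank $i$) is introduced here only for illustration.

```latex
T = \sum_{i=1}^{n} s_i \, i,
\qquad \Pr(s_i = +1) = \Pr(s_i = -1) = \tfrac{1}{2}

\mathrm{E}[s_i] = 0, \quad \mathrm{Var}(s_i) = 1
\;\Longrightarrow\;
\mu_T = \sum_{i=1}^{n} i \,\mathrm{E}[s_i] = 0

\sigma_T^2 = \sum_{i=1}^{n} i^2 \,\mathrm{Var}(s_i)
           = \sum_{i=1}^{n} i^2
           = \frac{n(n+1)(2n+1)}{6}
```

For example, with n = 12 this gives $\sigma_T^2 = 12 \cdot 13 \cdot 25 / 6 = 650$.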
Critical Region
Large Sample Critical Region (for n ≥ 10)
Class Examples
Example 2: A regional water authority has been carrying out some new pollution-control measures
on one of the main rivers under its control.
Pollution was measured at 12 sites before new controls were implemented, and then again four
years later at the same 12 sites.
The water authority wants to determine whether the new controls have been effective. Use a
2.5% significance level.
Example 3: Does a flexi-time work-schedule help reduce the travel time of workers to work?
A random sample of 32 workers was selected, and workers recorded their travel time before
and after the program was implemented.
4 Two Independent Samples
The Wilcoxon Rank Sum test is also called the U-test, the Mann-Whitney test, the
Wilcoxon-Mann-Whitney test, or just the Rank Sum test.
It is used to determine whether two independent random samples
have been drawn from the same population, and is a counterpart to the parametric t-test for two
independent random samples from normal populations.
Data Assumptions
• the distributions of the two populations differ with respect to location only (if they differ at
all).
Hypotheses
Data Summary
Test Statistic
Large Sample Test statistic (for n1 or n2 or both > 10): For large samples, the
sampling distribution of T is approximately Gaussian with
$$\mu_T = \frac{n_1(n_1 + n_2 + 1)}{2}, \qquad \sigma_T^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$$
$$z = \frac{T - \mu_T}{\sigma_T}$$
Critical Region
• The critical values for the small sample Wilcoxon Rank Sum test are given in the Wilcoxon
Rank Sum Test Tables.
• There are two critical values, and we reject H0 if T is smaller than or equal to TL , the lower
critical value, or if T is greater than or equal to TU , the upper critical value.
• The upper critical value can be calculated by TU = n1 (n1 + n2 + 1) − TL
Class Examples
Example 1: Based on the two independent samples shown below, can we infer at the 5% significance
level that the location of population 1 is to the left of the location of population 2?
Sample 1: 20, 23, 22, 18, 24
Sample 2: 22, 27, 26, 28, 25
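The rank sums for Example 1 can be computed with the sketch below. It assumes that T denotes the rank sum of sample 1, which the notes do not state explicitly, and the helper `average_ranks` is a hypothetical name. Because n1 = n2 = 5, the result would be compared with the table critical values rather than with a z statistic.

```python
def average_ranks(values):
    """Rank from 1 (smallest) to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


sample_1 = [20, 23, 22, 18, 24]
sample_2 = [22, 27, 26, 28, 25]

# Rank the pooled observations, then split the ranks back by sample.
ranks = average_ranks(sample_1 + sample_2)
T1 = sum(ranks[:len(sample_1)])   # assumed: T is the rank sum of sample 1
T2 = sum(ranks[len(sample_1):])

n = len(sample_1) + len(sample_2)
print(T1, T2, n * (n + 1) / 2)  # 17.5 37.5 55.0
```

The two rank sums add up to n(n + 1)/2, which is a useful arithmetic check.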
Example 2: The ABC Company has sent 13 of its employees to a privately-run programme that
provides word-processing skills training. Six of the employees were randomly chosen from the
data-processing (DP) department and the others were from the typing (T) pool.
At the end of the programme the company received a report indicating the score received by
each of its employees out of a total possible score of 100. The scores of the 13 employees of
ABC are given in the following table:
Is there a difference in the performance of the two groups in the word-processing programme?
Test at a 5% significance level.
DP T
70 59
52 70
46 75
65 85
60 50
40 82
64
Each participant was asked to indicate which one of the following five statements best repre-
sented the effectiveness of the drug they took.
New Rank Aspirin Rank
3 4
5 1
4 3
3 2
2 4
5 1
1 3
4 4
5 2
3 2
3 2
5 4
5 3
5 4
4 5
Example 4: The number of defective products from each of two production lines was recorded daily
for a period of 14 days on each production line (the measurements were taken over different
14-day periods for the two production lines). The results are shown below:
Day A B
1 172 201
2 165 180
3 206 159
4 185 192
5 175 177
6 142 170
7 190 182
8 169 179
9 161 169
10 184 192
11 191 180
12 170 174
13 138 159
14 172 166
Assume that both production lines produce the same daily output. Test whether production
line B produces more defective products than production line A.
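Since both samples in Example 4 have 14 observations, the large-sample statistic applies. The following is a minimal sketch rather than a model answer: it assumes T is the rank sum of production line A, and it reports the lower-tail probability Pr(Z ≤ z), which is the relevant tail if line B tends to produce more defectives (so that A's ranks tend to be small). The helper name `average_ranks` is hypothetical.

```python
import math


def average_ranks(values):
    """Rank from 1 (smallest) to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


line_a = [172, 165, 206, 185, 175, 142, 190, 169, 161, 184, 191, 170, 138, 172]
line_b = [201, 180, 159, 192, 177, 170, 182, 179, 169, 192, 180, 174, 159, 166]
n1, n2 = len(line_a), len(line_b)

ranks = average_ranks(line_a + line_b)
T = sum(ranks[:n1])                      # assumed: rank sum of line A

mu_T = n1 * (n1 + n2 + 1) / 2
sigma_T = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (T - mu_T) / sigma_T

# One-tailed p-value Pr(Z <= z) for a standard normal Z.
p_lower = 0.5 * math.erfc(-z / math.sqrt(2))
print("T =", T, " z =", round(z, 3), " p =", round(p_lower, 4))
```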
5 k-Independent Samples
The Kruskal-Wallis test is used to determine whether k independent samples are from different
populations; it is thus an extension of the Wilcoxon Rank Sum test for two independent samples.
The test is the non-parametric equivalent of a single factor ANOVA, and is used to determine
whether differences among three or more groups are significant in situations that do not meet
the assumptions necessary for single factor/one-way ANOVA.
Data Assumptions
• The data are either ordinal or quantitative but not necessarily Gaussian.
• The random samples (i.e. treatment levels) and observations within the random samples (treat-
ment levels) are independent.
• nj ≥ 3 (i.e. at least three observations per sample).
• The distributions of all the populations differ with respect to location only (if they
differ at all).
Hypotheses
Null Hypothesis: H0 : The locations of all the k populations (groups) are the same
Alternative Hypothesis: H1 : At least two population locations differ
Data Summary
• combine observations from all k groups to form one sample ($n = \sum_j n_j$)
• rank the observations from 1 (smallest) to n (largest)
• calculate Tj, the sum of ranks within each of the k treatment levels (check that
$\sum_{j=1}^{k} T_j = n(n + 1)/2$)
Test Statistic
$$H = \frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{T_j^2}{n_j} - 3(n+1)$$
Critical Region
The statistic H has approximately a chi-squared (χ²) distribution, with degrees of freedom equal to
k − 1. Therefore, we use the χ² tables.
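The Data Summary steps and the H statistic can be sketched as follows. The function names are hypothetical, and the illustration borrows the three groups of examination grades from Additional Exercise 4 at the end of these notes, on the assumption that a Kruskal-Wallis analysis is intended there. The formula is applied exactly as written above, with no adjustment for ties, since the notes do not give one.

```python
def average_ranks(values):
    """Rank from 1 (smallest) to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, rank_sums) for a list of independent samples."""
    pooled = [x for g in groups for x in g]      # combine all k groups
    n = len(pooled)
    ranks = average_ranks(pooled)                # rank 1 (smallest) to n
    rank_sums, start = [], 0
    for g in groups:                             # T_j for each group
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    h = 12 / (n * (n + 1)) * sum(
        t ** 2 / len(g) for t, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return h, rank_sums


method_a = [94, 88, 91, 74, 87, 97]
method_b = [85, 82, 79, 84, 61, 72, 80]
method_c = [89, 67, 72, 76, 69]

H, T = kruskal_wallis_h([method_a, method_b, method_c])
print("rank sums:", T)
print("H =", round(H, 3), "with", 3 - 1, "degrees of freedom")
```

H would then be compared with the χ² critical value for k − 1 = 2 degrees of freedom.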
Class Examples
Example 1: How do customers rate three shifts with respect to speed of service in a particular
restaurant?
Three samples of 10 customer response-cards were randomly selected, one sample from each
shift, and customer ratings were recorded:
Can we conclude at a 5% significance level that customers perceive the speed of service to be
different among the three shifts?
Example 2: To determine whether absentee rates are the same amongst three different levels of
employees in a company, samples comprising 4 top managers, 5 middle managers and 5
workers were selected and their records examined to determine how many days they reported
sick in the last year.
Is there evidence that the absentee rates differ from one level of employee to another?
Example 3: The ages of executives in top management in 4 firms – A, B, C, and D are as follows:
If these 4 samples are considered representative of the firms’ top management age structure,
test whether the average age of the executives varies from firm to firm. Conduct the test at a
10% significance level.
When we wish to compare k matched samples (where k > 2), the Friedman test is used. The test is
an extension of both the Sign test and the Wilcoxon Signed Rank Sum test, and is an alternative to
the parametric randomised block design two-way ANOVA.
Data Assumptions
• The data are either ordinal or quantitative but not necessarily Gaussian (normal)
• The data are generated from a blocked experiment with b blocks and k treatments
• The measurements within a block are dependent or related
• The measurements from different blocks are independent
• The patterns within blocks are random
Hypotheses
Null Hypothesis: H0 : The locations of all the k populations are the same
Alternative Hypothesis: H1 : At least two population locations differ
Data Summary
• identify the blocks and treatment levels
• rank the observations from smallest to largest within each block
• calculate Tj , the sum of ranks within each of the k treatment levels
Test Statistic
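$$F_r = \frac{12}{bk(k+1)} \sum_{j=1}^{k} T_j^2 - 3b(k+1)$$
where b is the number of blocks, k is the number of treatments, and Tj is the sum of ranks
for treatment j.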
Critical Region
The statistic Fr has approximately a chi-squared (χ²) distribution (provided that either b or k ≥ 5),
with degrees of freedom equal to k − 1. Therefore the critical values (and corresponding probability
levels) from the χ² tables will be used to draw conclusions about the null hypothesis.
Class Examples
Example 1: Four managers evaluate applicants for a job in an accounting firm on several dimen-
sions. Eight applicants were selected, and their evaluations by the four managers recorded.
There are 5 possibilities:
1) The candidate is in the top 5% of applicants
2) The candidate is in the top 10% of applicants, but not in the top 5%
3) The candidate is in the top 25% of applicants, but not in the top 10%
4) The candidate is in the top 50% of applicants, but not in the top 25%
5) The candidate is in the bottom 50% of applicants
Can we conclude that there are differences in the way managers evaluate applicants?
Manager
Applicant 1 2 3 4
1 2 1 2 2
2 4 2 3 2
3 2 2 2 3
4 3 1 3 2
5 3 2 3 5
6 2 2 3 4
7 4 1 5 5
8 3 2 5 3
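For the manager evaluations in Example 1, the Friedman calculation can be sketched as below. It treats the 8 applicants as blocks and the 4 managers as treatments, ranks within each applicant using average ranks for ties, and applies the Fr formula as written above with no tie adjustment. The helper name `average_ranks` is hypothetical.

```python
def average_ranks(values):
    """Rank from 1 (smallest) to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


# Rows are applicants (blocks); columns are managers 1 to 4 (treatments).
ratings = [
    [2, 1, 2, 2],
    [4, 2, 3, 2],
    [2, 2, 2, 3],
    [3, 1, 3, 2],
    [3, 2, 3, 5],
    [2, 2, 3, 4],
    [4, 1, 5, 5],
    [3, 2, 5, 3],
]
b, k = len(ratings), len(ratings[0])

# T_j: sum over blocks of the within-block rank given to treatment j.
T = [0.0] * k
for row in ratings:
    for j, rank in enumerate(average_ranks(row)):
        T[j] += rank

Fr = 12 / (b * k * (k + 1)) * sum(t ** 2 for t in T) - 3 * b * (k + 1)
print("rank sums:", T)
print("Fr =", round(Fr, 3), "with", k - 1, "degrees of freedom")
```

As a check, the rank sums should add up to b·k(k + 1)/2 = 80.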
Example 2: Four property development companies enter sealed bids for a number of vacant plots
of land on auction. From the random sample of plots and bids (in thousands of Rands) made
below, does it appear as if some firms tend to make higher bids on average than other firms?
Test at the 0.5% significance level.
Home-owner
Grass 1 2 3 4 5 6 7 8 9 10 11 12
1 4 4 3 3 4 2 1 2 3.5 4 4 3.5
2 3 2 1.5 1 2 2 3 4 1 1 2 1
3 2 3 1.5 2 1 2 2 1 2 3 3 2
4 1 1 4 4 3 4 4 3 3.5 2 1 3.5
7 Tests of Association
The tests introduced in this section can be used with variables whose joint distribution is any specified
distribution, including the bivariate normal, or whose joint distribution is completely unknown and
therefore not specified.
Recall from INTROSTAT Chapter 12 that if an association exists between two variables, no matter
how the association is measured, this association cannot and should not be interpreted as implying
a cause and effect relationship between the two variables. Two variables may be associated because:
• they are interacting with each other (i.e. one (or both) variable(s) affects the other), or
• of mere coincidence, or
• both variables are affected by other variables that have not been measured in the
study.
The Spearman rank-order correlation coefficient test is used when we wish to measure the degree of
association between two variables that are measured on at least an ordinal scale.
Data Assumptions
Hypotheses
Null Hypothesis: H0 : ρs = 0
(no rank correlation exists between the two variables)
Alternative Hypothesis: H1 : ρs ≠ 0
(significant rank correlation exists between the two variables)
(Two-tailed test)
OR H1 : ρs > 0
(the rank correlation between the two variables is positive)
(One-tailed test)
OR H1 : ρs < 0
(the rank correlation between the two variables is negative)
(One-tailed test)
Data Summary
2. Calculate the difference d for each pair of ranks, that is, d = rank(xi ) − rank(yi ).
Test Statistic
$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
For large samples (n ≥ 10), rs is approximately normally distributed, and the test
statistic is
$$z = r_s \sqrt{n - 1}$$
Critical Region
Class Examples
Example 1: A company runs a large fleet of trucks, which vary in age from 1 year old to 12 years
old. The annual running costs (in thousands of rands) and age of a random sample of 8 trucks
are given in the following table:
Can the company infer at the 2.5% significance level that age of trucks is positively correlated
with running costs?
A random sample of 20 production workers was selected. The test scores and performance
rating scores were recorded for each person.
Can the firm’s manager infer at the 1% significance level that aptitude test scores are correlated
with performance rating?
Aptitude Rank Aptitude Performance Rank Performance
59 9 3 10.5
47 3 2 3.5
58 8 4 17
66 14 3 10.5
77 20 2 3.5
57 7 4 17
62 12 3 10.5
68 16 3 10.5
69 17 5 19.5
36 1 1 1
48 4 3 10.5
65 13 3 10.5
51 5 2 3.5
61 11 3 10.5
40 2 3 10.5
67 15 4 17
60 10 2 3.5
56 6 3 10.5
76 19 3 10.5
71 18 5 19.5
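The rank columns in this table, and the resulting rs and z, can be reproduced with the sketch below. It applies the rs formula from the Test Statistic section exactly as written, even though the performance ratings contain many ties; the helper name `average_ranks` is hypothetical.

```python
import math


def average_ranks(values):
    """Rank from 1 (smallest) to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


aptitude = [59, 47, 58, 66, 77, 57, 62, 68, 69, 36,
            48, 65, 51, 61, 40, 67, 60, 56, 76, 71]
performance = [3, 2, 4, 3, 2, 4, 3, 3, 5, 1,
               3, 3, 2, 3, 3, 4, 2, 3, 3, 5]
n = len(aptitude)

rank_x = average_ranks(aptitude)
rank_y = average_ranks(performance)

# d = rank(x_i) - rank(y_i) for each pair, then the sum of squared differences.
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))

r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
z = r_s * math.sqrt(n - 1)          # large-sample statistic (n >= 10)
print("sum of d^2 =", sum_d2)
print("r_s =", round(r_s, 4), " z =", round(z, 3))
```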
Example 3: After several semesters without much success, Pat Statstud (a student in the lowest
quarter of a statistics course) decided to try and improve. Pat needed to know the secret of
success for university students. After many hours of discussion with other more successful
students, Pat postulated a rather radical theory: The longer one studied, the better one’s
grade.
To test the theory, Pat took a random sample of 35 students in an economics course and
asked each to report the average amount of time he or she studied economics, and the final
percentage mark obtained.
Test to determine whether grade and study time are positively related.
Time Rank Time Mark (%) Rank Mark
30 17 71 9
5 4 30 4
36 30.5 82 17.5
37 32 98 34
32 22.5 78 14
23 7 73 10.5
34 28 82 17.5
2 2.5 25 3
34 28 94 32
43 35 99 35
34 28 85 22
32 22.5 74 12
30 17 79 15
36 30.5 82 17.5
40 34 88 26
24 8.5 55 5
0 1 7 1
25 10.5 62 6
29 13.5 91 29.5
21 5 66 8
31 20.5 86 23
30 17 73 10.5
33 25 90 28
30 17 88 26
33 25 91 29.5
22 6 64 7
29 13.5 83 20
24 8.5 87 24
30 17 96 33
2 2.5 16 2
31 20.5 84 21
33 25 92 31
25 10.5 82 17.5
38 33 88 26
26 12 75 13
8 Advantages and Disadvantages of Non-parametric Tests
• The tests can be used when parametric methods are inapplicable, or the validity of their
parametric assumptions is uncertain.
• The tests are useful when sample sizes are small as there may be no equivalent parametric
test, unless the population distribution is known exactly.
• The tests often involve less computational work and are therefore sometimes easier and quicker
to apply than a corresponding parametric test.
• The assumptions are usually few and easily met, in contrast to assumptions for parametric
techniques.
• The tests are not just restricted to quantitative data i.e. the tests can be used on all types of
data.
9 Additional Exercises
1. The human resource manager of a large company wanted to compare how long business and
non-business graduates worked for the company before quitting. Two samples of 25 business
graduates and 20 non-business graduates were randomly selected from the lists of former
employees. The data representing their time with the company were recorded (in months):
Business Non-Business
60 25
11 60
18 22
19 24
5 23
25 36
60 39
7 15
8 35
17 16
37 28
4 9
8 60
28 29
27 16
11 22
60 60
25 17
5 60
13 32
22
11
17
9
4
Can the personnel manager conclude at a 5% significance level that a difference in duration of
employment exists between business and non-business graduates?
2. Two kinds of emergency flares are compared on the basis of the following burning times
(rounded to the nearest tenth of a minute):
Brand A 14.9 11.3 13.2 16.6 17.0 14.1 15.4 13.0 16.9
Brand B 15.2 19.8 14.7 18.3 16.2 21.2 18.9 12.2 15.3 19.4
Test whether the average burning time of Brand A flares is less than that of Brand B flares.
Use a 5% significance level.
3. The data below show IQ scores (Wechsler IQ test) of children with severe learning problems
after taking a placebo and after taking a drug (Ethosuximide). The order in which the placebo
and the drug were administered was randomized.
Test whether the drug has a significant effect on measured IQ, in particular that the drug has
an adverse effect on IQ. Use a 2.5% significance level.
4. The following are the final examination grades of samples from three groups of students who
were taught Swahili by one of three different methods (classroom instruction and language
laboratory (A), only classroom instruction (B), and only self-study in language laboratory (C)):
A 94 88 91 74 87 97
B 85 82 79 84 61 72 80
C 89 67 72 76 69
5. Until its recent indictment as a possible carcinogen, cyclamate was a widely used sweetener
in soft drinks. The following data show a comparison of three laboratory methods for
determining the percentage of sodium cyclamate in commercially produced orange drink. All three
procedures were applied to each of 12 samples.
Method
Sample A B C
1 0.598 0.628 0.632
2 0.614 0.628 0.630
3 0.600 0.600 0.622
4 0.580 0.612 0.584
5 0.596 0.600 0.650
6 0.592 0.628 0.606
7 0.616 0.628 0.644
8 0.614 0.644 0.644
9 0.604 0.644 0.624
10 0.608 0.612 0.619
11 0.602 0.628 0.632
12 0.614 0.644 0.616
6. The data below represent the monthly sales and the promotional expenses for a women’s
apparel store that specializes in sportswear for younger women.
Calculate rs , the Spearman’s rank correlation between monthly sales and promotional ex-
penses. Use a 1% significance level to test for evidence of rank association.
7. Monthly returns of Intel for the period January 1993 to December 1995 are shown below:
Test whether positive returns are more likely to follow positive returns (i.e. determine whether
there is evidence against the data sequence being random).
8. A well known soft drink manufacturer has used the same secret recipe for its product since
its introduction more than 100 years ago. In response to decreasing market share, however,
the president of the company is contemplating changing the recipe. She has developed two
alternative recipes. In a preliminary study, she asked 15 randomly selected people to taste the
original recipe and the two new recipes. Each person was then asked to evaluate the product
on a 5-point scale, where 1=awful, 2=poor, 3=fair, 4=good and 5=wonderful. The data is
shown below:
Original New Recipe 1 New Recipe 2
5 5 5
3 4 5
4 5 5
2 4 4
3 3 5
2 2 3
3 3 2
4 3 5
1 1 1
1 3 2
2 4 3
3 3 4
5 3 4
3 2 3
4 3 5
Can we conclude at the 1% significance level that there are differences in the ratings of the
three recipes?
9. A cell phone user suspects that instances of satisfactory (S) and poor (P) signal reception do
not occur at random, but rather in periods lasting several minutes. Over an hour, the cell
phone user checks reception at one minute intervals, making the following 28 observations:
S S P P S S P S S S S S P P S S S P S P S S S P
S S P S
Test whether the observations support the suspicions of the cell phone user.
10. The high cost of medical care makes it imperative that hospitals operate efficiently and
effectively. As part of a larger study, a random sample of 60 patients leaving a hospital were
surveyed. They were asked how satisfied they were with the treatment they received. The
responses were recorded with a measure of the degree of severity of their illness (as determined
by the admitting physician) and the length of their stay.
Satisfaction levels were coded from 1 for very unsatisfied to 5 for very satisfied and severity of
illness was coded from 1 for least severe to 10 for most severe.
The Spearman rank correlation between all the variables is presented below:
Is the satisfaction level affected by the severity of the illness? Conduct a suitable test.