
Sultan Kudarat State University

Access Campus, EJC Montilla, Tacurong City

In Partial Fulfillment of the Requirements

in

Analytical Statistics
Educ. 600

Cashmira Balabagan-Ibrahim
MAT-English

October 2010
McNemar's Test Of Change

McNemar's Test of Change is a two-sample dependent test for proportions. The test involves two evaluations
of a single set of items, where each item falls into one of two classifications on each evaluation. If we use a
Pass/Fail classification scheme, McNemar's Test evaluates the difference between the number that Passed
on the first evaluation and Failed on the second, versus the number that Passed on the second evaluation and
Failed on the first.

The frequencies corresponding to the two classifications and two evaluations may be placed in a 2 x 2 table as
seen below:

                        Evaluation 2
                        Pass    Fail
Evaluation 1   Pass      a       b
               Fail      c       d

The cells of interest in this table are the b and c cells. In these cells, b and c, differences were
found in the classifications from the first evaluation to the second. McNemar's Test evaluates the change in the
number of misclassifications in one direction versus the number of misclassifications in the other. The
comparison is then the number of observations falling in the b cell versus the number falling in the c cell.

Hypotheses

The following hypotheses may be tested:

   H0: pb = pc
   Ha: pb ≠ pc

where pb is the population proportion that would Pass on the first evaluation and Fail on the second, and pc is
the population proportion that would Pass on the second evaluation and Fail on the first.

Assumptions

1. The samples have been randomly drawn from two dependent populations either through matching or
repeated measures (Critical)

2. Each item evaluation yields one of two classifications (Critical)

3. Each observation is independent of every other observation, other than the given paired dependency
(Critical)

Test Statistics
McNemar's Test of Change may be reduced to a One-Sample Binomial Test, with the following:

Let p = c/(b + c), n = b + c, and let the hypothesized proportion be p0 = 0.5.

The exact or approximate One-Sample Binomial Test is then performed using these values.
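As an illustration of this reduction, here is a minimal sketch using only the Python standard library. The function name and the example cell counts are illustrative, not from the source:

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact two-sided McNemar test via the one-sample binomial test.

    b: count that Passed on evaluation 1 and Failed on evaluation 2.
    c: count that Passed on evaluation 2 and Failed on evaluation 1.
    Under H0, c ~ Binomial(b + c, 0.5); only the discordant cells matter.
    """
    n = b + c
    k = max(b, c)  # two-sided: double the tail beyond the larger cell
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

p = mcnemar_exact(5, 15)  # 5 changed one way, 15 the other
```

With 5 changes in one direction and 15 in the other, the exact two-sided p-value is about 0.041, so the change would be flagged at alpha = 0.05.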

Output

Note

The p-value is flagged with an asterisk (*) when p <= alpha.

Kappa

Kappa is a measure of agreement. While McNemar may reject the null hypothesis, the level of agreement may
also be of interest.

The following statistics are output.

Agreement
Proportion Agreement = 0.520
Proportion Chance Agreement = 0.392
Kappa (Max) = 0.211
Kappa = 0.211
Here are the methods of calculation:

• N=A+B+C+D [Total Sample Size]


• Po=(A+D)/N [Proportion Agreement]
• Pc=((A+B)*(A+C)+(C+D)*(B+D))/N/N [Proportion Chance Agreement]
• Pom=(Minimum(A+C,A+B)+Minimum(B+D,C+D))/N
• Kappa (Max)=(Pom-Pc)/(1-Pc) [Maximum value of Kappa, given marginal values]
• Kappa=(Po-Pc)/(1-Pc)
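The calculations above can be sketched directly; the cell counts in the example call are made up for illustration:

```python
def kappa_stats(A, B, C, D):
    """Agreement statistics for a 2x2 table with agreement cells A and D."""
    N = A + B + C + D
    po = (A + D) / N                                     # proportion agreement
    pc = ((A + B) * (A + C) + (C + D) * (B + D)) / N**2  # proportion chance agreement
    pom = (min(A + C, A + B) + min(B + D, C + D)) / N    # max attainable agreement
    kappa_max = (pom - pc) / (1 - pc)                    # Kappa (Max), given margins
    kappa = (po - pc) / (1 - pc)
    return po, pc, kappa_max, kappa

po, pc, kmax, kap = kappa_stats(40, 10, 20, 30)
```

For this illustrative table, Po = 0.70, Pc = 0.50, Kappa (Max) = 0.80, and Kappa = 0.40.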

Mann–Whitney U
In statistics, the Mann–Whitney U test (also called the Mann–Whitney–Wilcoxon (MWW), Wilcoxon rank-
sum test, or Wilcoxon–Mann–Whitney test) is a non-parametric test for assessing whether two independent
samples of observations have equally large values. It is one of the best-known non-parametric significance
tests. It was proposed initially by the Irish-born US statistician Frank Wilcoxon in 1945, for equal sample sizes,
and extended to arbitrary sample sizes and in other ways by the Austrian-born US mathematician Henry
Berthold Mann and the US statistician Donald Ransom Whitney. MWW is virtually identical to performing an
ordinary parametric two-sample t test on the data after ranking over the combined samples.

Assumptions and formal statement of hypotheses

Although Mann and Whitney (1947) developed the MWW test under the assumption of continuous responses
with the alternative hypothesis being that one distribution is stochastically greater than the other, there are
many other ways to formulate the null and alternative hypotheses such that the MWW test will give a valid
test.[1]

A very general formulation is to assume that:

1. All the observations from both groups are independent of each other,
2. The responses are ordinal or continuous measurements (i.e. one can at least say, of any two
observations, which is the greater),
3. Under the null hypothesis the distributions of both groups are equal, so that the probability of an
observation from one population (X) exceeding an observation from the second population (Y) equals
the probability of an observation from Y exceeding an observation from X, that is, there is a symmetry
between populations with respect to probability of random drawing of a larger observation.
4. Under the alternative hypothesis the probability of an observation from one population (X) exceeding an
observation from the second population (Y) (after correcting for ties) is not equal to 0.5. The alternative
may also be stated in terms of a one-sided test, for example: P(X > Y) + 0.5 P(X = Y) > 0.5.

If we add more strict assumptions than those above such that the responses are assumed continuous and the
alternative is a location shift (i.e. F1(x) = F2(x + δ)), then we can interpret a significant MWW test as showing a
significant difference in medians. Under this location shift assumption, we can also interpret the MWW as
assessing whether the Hodges–Lehmann estimate of the difference in central tendency between the two
populations differs significantly from zero. The Hodges–Lehmann estimate for this two-sample problem is the
median of all possible differences between an observation in the first sample and an observation in the second
sample.

Calculations
The test involves the calculation of a statistic, usually called U, whose distribution under the null hypothesis is
known. In the case of small samples, the distribution is tabulated, but for sample sizes above ~20 there is a
good approximation using the normal distribution. Some books tabulate statistics equivalent to U, such as the
sum of ranks in one of the samples, rather than U itself.

The U test is included in most modern statistical packages. It is also easily calculated by hand, especially for
small samples. There are two ways of doing this.

For small samples a direct method is recommended. It is very quick, and gives an insight into the meaning of
the U statistic.

1. Choose the sample for which the ranks seem to be smaller (The only reason to do this is to make
computation easier). Call this "sample 1," and call the other sample "sample 2."
2. Taking each observation in sample 1, count the number of observations in sample 2 that are smaller
than it (count a half for any that are equal to it).
3. The total of these counts is U.
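The direct method above can be sketched in a few lines; the function name and sample values are illustrative:

```python
def u_direct(sample1, sample2):
    """Direct Mann-Whitney U: for each observation in sample 1, count the
    observations in sample 2 that are smaller (count a half for each tie)."""
    return sum(
        sum(1.0 for y in sample2 if y < x) + sum(0.5 for y in sample2 if y == x)
        for x in sample1
    )

u = u_direct([1, 4, 6], [2, 3, 5])
```

Swapping the roles of the two samples gives the "other" U; the two values always sum to n1 * n2.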

For larger samples, a formula can be used:

1. Arrange all the observations into a single ranked series. That is, rank all the observations without
regard to which sample they are in.
2. Add up the ranks for the observations which came from sample 1. The sum of ranks in sample 2 follows
by calculation, since the sum of all the ranks equals N(N + 1)/2 where N is the total number of
observations.
3. U is then given by:

   U1 = R1 − n1(n1 + 1)/2

where n1 is the sample size for sample 1, and R1 is the sum of the ranks in sample 1.
Note that there is no specification as to which sample is considered sample 1. An equally valid formula
for U is

   U2 = R2 − n2(n2 + 1)/2

The smaller value of U1 and U2 is the one used when consulting significance tables. The sum of the two
values is given by

   U1 + U2 = R1 + R2 − n1(n1 + 1)/2 − n2(n2 + 1)/2

Knowing that R1 + R2 = N(N + 1)/2 and N = n1 + n2, and doing some algebra, we find that the sum is

   U1 + U2 = n1 n2
The maximum value of U is the product of the sample sizes for the two samples; in that case, the "other" U
would be 0. The Mann–Whitney U is equivalent to the area under the receiver operating characteristic (ROC)
curve, which can be readily calculated as U/(n1 n2).

Examples

Illustration of calculation methods

Suppose that Aesop is dissatisfied with his classic experiment in which one tortoise was found to beat one
hare in a race, and decides to carry out a significance test to discover whether the results could be extended to
tortoises and hares in general. He collects a sample of 6 tortoises and 6 hares, and makes them all run his
race. The order in which they reach the finishing post (their rank order, from first to last) is as follows, writing T
for a tortoise and H for a hare:

THHHHHTTTTTH

What is the value of U?

• Using the direct method, we take each tortoise in turn, and count the number of hares it is beaten by
(lower rank), getting 0, 5, 5, 5, 5, 5, which means U = 25. Alternatively, we could take each hare in turn,
and count the number of tortoises it is beaten by. In this case, we get 1, 1, 1, 1, 1, 6. So U = 6 + 1 + 1 +
1 + 1 + 1 = 11. Note that the sum of these two values for U is 36, which is 6 × 6.
• Using the indirect method:

the sum of the ranks achieved by the tortoises is 1 + 7 + 8 + 9 + 10 + 11 = 46.


Therefore U = 46 − (6×7)/2 = 46 − 21 = 25.
the sum of the ranks achieved by the hares is 2 + 3 + 4 + 5 + 6 + 12 = 32, leading to U = 32 − 21 = 11.
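As a quick check, the rank-sum formula reproduces both values of U for this race (values taken from the worked example above):

```python
# Finishing positions (1 = first home) from the race T H H H H H T T T T T H
tortoises = [1, 7, 8, 9, 10, 11]   # ranks of the tortoises
hares = [2, 3, 4, 5, 6, 12]        # ranks of the hares

n1, n2 = len(tortoises), len(hares)
R1 = sum(tortoises)                # sum of ranks in sample 1 (= 46)
U1 = R1 - n1 * (n1 + 1) / 2        # U for the tortoises
U2 = n1 * n2 - U1                  # since U1 + U2 = n1 * n2
```

This gives U1 = 25 and U2 = 11, matching the direct counts.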

Illustration of object of test

A second example illustrates the point that the Mann–Whitney does not test for equality of medians. Consider
another hare and tortoise race, with 19 participants of each species, in which the outcomes are as follows:

HHHHHHHHHTTTTTTTTTTHHHHHHHHHHTTTTTTTTT

The median tortoise here comes in at position 19, and thus actually beats the median hare, which comes in at
position 20.
However, the value of U (for hares) is 100:

(9 hares beaten by 0 tortoises) + (10 hares beaten by 10 tortoises) = 0 + 100 = 100

The value of U (for tortoises) is 261:

(10 tortoises beaten by 9 hares) + (9 tortoises beaten by 19 hares) = 90 + 171 = 261

Consulting tables, or using the approximation below, shows that this U value gives significant evidence that
hares tend to do better than tortoises (p < 0.05, two-tailed). Obviously this is an extreme distribution that would
be spotted easily, but in a larger sample something similar could happen without it being so apparent. Notice
that the problem here is not that the two distributions of ranks have different variances; they are mirror images
of each other, so their variances are the same, but they have very different skewness.

Normal approximation

For large samples, U is approximately normally distributed. In that case, the standardized value

   z = (U − mU) / σU

where mU and σU are the mean and standard deviation of U, is approximately a standard normal deviate whose
significance can be checked in tables of the normal distribution. mU and σU are given by

   mU = n1 n2 / 2
   σU = sqrt( n1 n2 (n1 + n2 + 1) / 12 )

The formula for the standard deviation is more complicated in the presence of tied ranks; the full formula is
given in the text books referenced below. However, if the number of ties is small (and especially if there are no
large tie bands) ties can be ignored when doing calculations by hand. The computer statistical packages will
use the correctly adjusted formula as a matter of routine.

Note that since U1 + U2 = n1 n2, the mean n1 n2/2 used in the normal approximation is the mean of the two
values of U. Therefore, the absolute value of the z statistic calculated will be same whichever value of U is
used.
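For the second race example (n1 = n2 = 19, U = 100 for the hares), the normal approximation can be sketched with the standard library:

```python
from math import sqrt
from statistics import NormalDist

# Second race example: n1 = n2 = 19, U = 100 for the hares
n1 = n2 = 19
U = 100
mU = n1 * n2 / 2                              # mean of U under H0
sigmaU = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # sd of U (no tie correction)
z = (U - mU) / sigmaU
p_two_sided = 2 * NormalDist().cdf(-abs(z))
```

This gives z of about -2.35 and a two-tailed p below 0.05, agreeing with the conclusion drawn from tables in the example.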

Relation to other tests

Comparison to Student's t-test


The U test is useful in the same situations as the independent samples Student's t-test, and the question
arises of which should be preferred.

Ordinal data
U remains the logical choice when the data are ordinal but not interval scaled, so that the spacing
between adjacent values cannot be assumed to be constant.
Robustness
As it compares the sums of ranks,[2] the Mann–Whitney test is less likely than the t-test to spuriously
indicate significance because of the presence of outliers; i.e., Mann–Whitney is more robust.

Efficiency
When normality holds, MWW has an (asymptotic) efficiency of 3/π, or about 0.95, when compared to
the t test.[3] For distributions sufficiently far from normal and for sufficiently large sample sizes, the
MWW can be considerably more efficient than the t test.[4]

Overall, the robustness makes the MWW more widely applicable than the t test, and for large samples from the
normal distribution, the efficiency loss compared to the t test is only 5%, so one can recommend MWW as the
default test for comparing interval or ordinal measurements with similar distributions.

The relation between efficiency and power in concrete situations isn't trivial though. For small sample sizes one
should investigate the power of the MWW vs t.

Different distributions

If one is only interested in stochastic ordering of the two populations (i.e., the concordance probability
P(Y > X)), the Wilcoxon–Mann–Whitney test can be used even if the shapes of the distributions are different.
The concordance probability is exactly equal to the area under the receiver operating characteristic curve
(AUC) that is often used in this context. If one desires a simple shift interpretation, the U test should
not be used when the distributions of the two samples are very different, as it can give erroneously significant
results.

Alternatives

In that situation, the unequal variances version of the t test is likely to give more reliable results, but only if
normality holds.

Alternatively, some authors (e.g. Conover) suggest transforming the data to ranks (if they are not already
ranks) and then performing the t test on the transformed data, the version of the t test used depending on
whether or not the population variances are suspected to be different. Rank transformations do not preserve
variances so it is difficult to see how this would help.
The Brown–Forsythe test has been suggested as an appropriate non-parametric equivalent to the F test for
equal variances.

Kendall's τ

The U test is related to a number of other non-parametric statistical procedures. For example, it is equivalent to
Kendall's τ correlation coefficient if one of the variables is binary (that is, it can only take two values).

ρ statistic

A statistic called ρ that is linearly related to U and widely used in studies of categorization (discrimination
learning involving concepts) is calculated by dividing U by its maximum value for the given sample sizes, which
is simply n1 × n2. ρ is thus a non-parametric measure of the overlap between two distributions; it can take
values between 0 and 1, and it is an estimate of P(Y > X) + 0.5 P(Y = X), where X and Y are randomly chosen
observations from the two distributions. Both extreme values represent complete separation of the
distributions, while a ρ of 0.5 represents complete overlap. This statistic was first proposed by Richard
Herrnstein (see Herrnstein et al., 1976). The usefulness of the ρ statistic can be seen in the case of the odd
example used above, where two distributions that were significantly different on a U-test nonetheless had
nearly identical medians: the ρ value in this case is approximately 0.723 in favour of the hares, correctly
reflecting the fact that even though the median tortoise beat the median hare, the hares collectively did better
than the tortoises collectively.
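For the race example, ρ follows immediately from U and the sample sizes:

```python
# rho = U / (n1 * n2); here U = 261 is the number of hare-beats-tortoise
# pairs out of the 19 * 19 possible pairs in the second race example
n1 = n2 = 19
U = 261
rho = U / (n1 * n2)
```

This reproduces the value of approximately 0.723 in favour of the hares quoted above.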

Example statement of results

In reporting the results of a Mann–Whitney test, it is important to state:

• A measure of the central tendencies of the two groups (means or medians; since the Mann–Whitney is
an ordinal test, medians are usually recommended)
• The value of U
• The sample sizes
• The significance level.

In practice some of this information may already have been supplied and common sense should be used in
deciding whether to repeat it. A typical report might run,

"Median latencies in groups E and C were 153 and 247 ms; the distributions in the two groups differed
significantly (Mann–Whitney U = 10.5, n1 = n2 = 8, P < 0.05 two-tailed)."

A statement that does full justice to the statistical status of the test might run,
"Outcomes of the two treatments were compared using the Wilcoxon–Mann–Whitney two-sample rank-
sum test. The treatment effect (difference between treatments) was quantified using the Hodges–
Lehmann (HL) estimator, which is consistent with the Wilcoxon test (ref. 5 below). This estimator (HLΔ)
is the median of all possible differences in outcomes between a subject in group B and a subject in
group A. A non-parametric 0.95 confidence interval for HLΔ accompanies these estimates as does ρ,
an estimate of the probability that a randomly chosen subject from population B has a higher weight
than a randomly chosen subject from population A. The median [quartiles] weight for subjects on
treatment A and B respectively are 147 [121, 177] and 151 [130, 180] Kg. Treatment A decreased
weight by HLΔ = 5 Kg. (0.95 CL [2, 9] Kg., 2P = 0.02, ρ = 0.58)."

However it would be rare to find so extended a report in a document whose major topic was not statistical
inference.

CHI-SQUARE INDEPENDENCE TEST

If we have N observations with two variables where each observation can be classified into one of R mutually
exclusive categories for variable one and one of C mutually exclusive categories for variable two, then a cross-
tabulation of the data results in a two-way contingency table (also referred to as an RxC contingency table).
The resulting contingency table has R rows and C columns.

A common question with regards to a two-way contingency table is whether we have independence. By
independence, we mean that the row and column variables are unassociated (i.e., knowing the value of the
row variable will not help us predict the value of column variable and likewise knowing the value of the column
variable will not help us predict the value of the row variable).

A more technical definition for independence is that

P(row i, column j) = P(row i)*P(column j) for all i,j

One such test is the chi-square test for independence.

H0: The row and column variables are independent

Ha: The row and column variables are not independent
Test Statistic:

   T = Σi Σj (Oij − Eij)² / Eij

where

r = the number of rows in the contingency table

c = the number of columns in the contingency table
Oij = the observed frequency of the ith row and jth column
Eij = the expected frequency of the ith row and jth column
    = Ri*Cj/N

Ri = the sum of the observed frequencies for row i


Cj = the sum of the observed frequencies for column j
N = the total sample size
Significance Level: α
Critical Region: T > CHSPPF(alpha,(r-1)*(c-1))

where CHSPPF is the percent point function of the chi-square distribution and
(r-1)*(c-1) is the degrees of freedom
Conclusion: Reject the independence hypothesis if the value of the test statistic is greater
than the chi-square value.

This test statistic can also be formulated as

   T = Σi Σj dij²

where

   dij = (Oij − Eij) / sqrt(Eij)

The dij are referred to as the standardized residuals, and they show the contribution of each cell to the
chi-square test statistic.
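Both formulations can be sketched together; the function name is illustrative and the example table is made up:

```python
from math import sqrt

def chi_square_independence(table):
    """Chi-square statistic T, expected frequencies E, and standardized
    residuals d_ij = (O_ij - E_ij)/sqrt(E_ij) for an RxC table."""
    N = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]          # Ri
    col_tot = [sum(col) for col in zip(*table)]    # Cj
    E = [[r * c / N for c in col_tot] for r in row_tot]  # Eij = Ri*Cj/N
    d = [[(o - e) / sqrt(e) for o, e in zip(orow, erow)]
         for orow, erow in zip(table, E)]
    T = sum(x * x for row in d for x in row)       # T = sum of d_ij^2
    return T, E, d

T, E, d = chi_square_independence([[10, 20], [20, 10]])
```

For this 2x2 table, every expected frequency is 15 and T is 20/3 (about 6.67) on (2−1)(2−1) = 1 degree of freedom.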

Syntax 1:
CHI-SQUARE INDEPENDENCE TEST <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used for the case where you have raw data (i.e., the data has not yet been cross
tabulated into a two-way table).

Syntax 2:
CHI-SQUARE INDEPENDENCE TEST <m>
<SUBSET/EXCEPT/FOR qualification>
where <m> is a matrix containing the two-way table;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used for the case where the data have already been cross-tabulated into a two-way
contingency table.

Syntax 3:
CHI-SQUARE INDEPENDENCE TEST <n11> <n12> <n21> <n22>
where <n11> is a parameter containing the value for row 1, column 1 of a 2x2 table;
<n12> is a parameter containing the value for row 1, column 2 of a 2x2 table;
<n21> is a parameter containing the value for row 2, column 1 of a 2x2 table;
and <n22> is a parameter containing the value for row 2, column 2 of a 2x2 table.

This syntax is used for the special case where you have a 2x2 table. In this case, you can enter the 4
values directly, although you do need to be careful that the parameters are entered in the order
expected above.

Examples:
CHI-SQUARE INDEPENDENCE TEST Y1 Y2
CHI-SQUARE INDEPENDENCE TEST M
CHI-SQUARE INDEPENDENCE TEST N11 N12 N21 N22
Note:
The chi-square approximation is asymptotic. This means that the critical values may not be valid if the
expected frequencies are too small.

Cochran suggests that if the minimum expected frequency is less than 1 or if 20% of the expected frequencies
are less than 5, the approximation may be poor. However, Conover suggests that this is probably too
conservative, particularly if r and c are not too small. He suggests that the minimum expected frequency should
be 0.5 and at least half the expected frequencies should be greater than 1.

In any event, if there are too many low expected frequencies, you can do one of the following:

1. If rows or columns with small expected frequencies can be intelligently combined, then this may
result in expected frequencies that are sufficiently large.
2. Use Fisher's exact test.

Note:
Conover points out that there are really 3 distinct tests:

1. Only N is fixed. The row and column totals are not fixed (i.e., they are random).
2. Either the row totals or the column totals are fixed beforehand.
3. Both the row totals and the column totals are fixed beforehand.

Note that in all three cases, the test statistic and the chi-square approximation are the same. What differs is the
exact distribution of the test statistic. When either the row or column totals (or both) are fixed, the possible
number of contingency tables is reduced.

As long as the expected frequencies are sufficiently large, the chi-square approximation should be adequate
for practical purposes.

Note:
Some authors recommend using a continuity correction for this test. In this case, 0.5 is added to the observed
frequency in each cell. Dataplot performs this test both with the continuity correction and without the continuity
correction.
Note:
The following information is written to the file dpst1f.dat (in the current directory):
Column 1 - row id
Column 2 - column id
Column 3 - row total
Column 4 - column total
Column 5 - expected frequency (Eij)
Column 6 - observed frequency (Oij)

To read this information into Dataplot, enter

SKIP 1
READ DPST1F.DAT ROWID COLID ROWTOT COLTOT ...
EXPFREQ OBSFREQ
Note:
The ASSOCIATION PLOT command can be used to plot the standardized residuals of the chi-square analysis.

The ODDS RATIO INDEPENDENCE TEST is an alternative test for independence based on the LOG(odds
ratio).

Related Commands:
ODDS RATIO INDEPENDENCE TEST = Perform a log(odds ratio) test for independence.
FISHER EXACT TEST = Perform Fisher's exact test.
ASSOCIATION PLOT = Generate an association plot.
SIEVE PLOT = Generate a sieve plot.
ROSE PLOT = Generate a Rose plot.
BINARY TABULATION PLOT = Generate a binary tabulation plot.
ROC CURVE = Generate a ROC curve.
ODDS RATIO = Compute the bias corrected odds ratio.
LOG ODDS RATIO = Compute the bias corrected log(odds ratio).

FRIEDMAN TEST

The Friedman test is a non-parametric test for analyzing randomized complete block designs. It is an extension
of the sign test when there may be more than two treatments.

The Friedman test assumes that there are k experimental treatments (k ≥ 2). The observations are
arranged in b blocks, that is

Treatment
Block 1 2 ... k
1 X11 X12 ... X1k
2 X21 X22 ... X2k
3 X31 X32 ... X3k
... ... ... ... ...
b Xb1 Xb2 ... Xbk

Let R(Xij) be the rank assigned to Xij within block i (i.e., ranks within a given row). Average ranks are
used in the case of ties. The ranks are summed to obtain

   Rj = Σi R(Xij)   for j = 1, ..., k

Then the Friedman test is

H0: The treatments have identical effects


Ha: At least one treatment is different from at least one other treatment
Test Statistic:

   T1 = [12 / (b k (k + 1))] Σj (Rj − b(k + 1)/2)²

If there are ties, then

   T1 = (k − 1) Σj (Rj − b(k + 1)/2)² / (A1 − C1)

where A1 = Σi Σj R(Xij)² (the sum of the squares of the ranks) and C1 = b k (k + 1)²/4 (the
correction factor).
Note that Conover recommends the statistic

   T2 = (b − 1) T1 / (b(k − 1) − T1)

since it has a more accurate approximate distribution. The T2 statistic is the two-way
analysis of variance statistic computed on the ranks R(Xij).
Significance Level: α

Critical Region:

   T2 > F(1 − α; k − 1, (b − 1)(k − 1))

where F is the percent point function of the F distribution. Equivalently, for the original statistic,

   T1 > χ²(1 − α; k − 1)

where χ² is the percent point function of the chi-square distribution.

The T1 approximation is sometimes poor, so the T2 approximation is typically preferred.


Conclusion: Reject the null hypothesis if the test statistic is in the critical region.

If the hypothesis of identical treatment effects is rejected, it is often desirable to determine which
treatments are different (i.e., multiple comparisons). Treatments i and j are considered different if

   |Rj − Ri| > t(1 − α/2; (b − 1)(k − 1)) · sqrt( 2b(A1 − C1)/((b − 1)(k − 1)) ) · sqrt( 1 − T1/(b(k − 1)) )

where t is the percent point function of the Student t distribution.
Syntax:
FRIEDMAN TEST <y> <block> <treat>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<block> is a variable that identifies the block;
<treat> is a variable that identifies the treatment;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
FRIEDMAN TEST Y BLOCK TREATMENT
FRIEDMAN TEST Y X1 X2
FRIEDMAN TEST Y BLOCK TREATMENT SUBSET BLOCK > 2
Note:
In Dataplot, the variables should be given as:
Y BLOCK TREAT
X11 1 1
X12 1 2
... 1 ...
X1k 1 k
X21 2 1
X22 2 2
... 2 ...
X2k 2 k
... ... ...
Xb1 b 1
Xb2 b 2
... b ...
Xbk b k

If your data are in a format similar to that given in the DESCRIPTION section (i.e., you have columns Y1
to Yk, each with b rows), you can convert it to the format required by Dataplot with the commands:

LET K = 5
LET NBLOCK = SIZE Y1
LET NTOTAL = K*NBLOCK
LET BLOCK = SEQUENCE 1 K 1 NBLOCK
LET TREAT = SEQUENCE 1 1 K FOR I = 1 1 NTOTAL
LET Y2 = STACK Y1 Y2 ... YK
FRIEDMAN TEST Y2 BLOCK TREAT
Note:
The response, ranked response, block, and treatment are written to the file dpst1f.dat in the current
directory.

The treatment ranks and multiple comparisons are written to the file dpst2f.dat in the current directory.
Comparisons that are statistically significant at the 95% level are flagged with a single asterisk while
comparisons that are statistically significant at the 99% level are flagged with two asterisks.

Note:
The Friedman test is based on the following assumptions:
1. The b rows are mutually independent. That is, the results within one block (row) do not affect
the results within other blocks.
2. The data can be meaningfully ranked.

Default:
None
Synonyms:
None
Related Commands:
ANOVA = Perform an analysis of variance.
SIGN TEST = Perform a sign test.
MEDIAN POLISH = Perform a robust ANOVA.
T TEST = Perform a t test.
RANK SUM TEST = Perform a rank sum test.
SIGNED RANK TEST = Perform a signed rank test.
BLOCK PLOT = Generate a block plot.
DEX SCATTER PLOT = Generate a dex scatter plot.
DEX ... PLOT = Generate a dex plot for a statistic.
DEX ... EFFECTS PLOT = Generate a dex effects plot for a statistic.
Reference:
Conover, W.J. (1999), "Practical Nonparametric Statistics", Third Edition, Wiley, pp. 367-373.
Applications:
Analysis of Variance
Implementation Date:
2004/1
Program:
SKIP 1
READ CONOVER.DAT Y BLOCK TREAT
FRIEDMAN Y BLOCK TREAT

The following output is generated.


FRIEDMAN TEST FOR TWO-WAY ANOVA

1. STATISTICS
NUMBER OF OBSERVATIONS = 48
NUMBER OF BLOCKS = 12
NUMBER OF TREATMENTS = 4
FRIEDMAN TEST STATISTIC (ORIGINAL) = 8.097345
A1 (SUM OF SQUARES OF RANKS) = 356.5000
C1 (CORRECTION FACTOR) = 300.0000
FRIEDMAN TEST STATISTIC (CONOVER) = 3.192198
2. PERCENT POINTS OF THE F REFERENCE DISTRIBUTION
FOR FRIEDMAN TEST STATISTIC
0 % POINT = 0.000000
50 % POINT = 0.8052071
75 % POINT = 1.435732
90 % POINT = 2.257744
95 % POINT = 2.891563
99 % POINT = 4.436786
99.9 % POINT = 6.882786

96.37845 % Point: 3.192198

3. CONCLUSION (AT THE 5% LEVEL):


THE 4 TREATMENTS DO NOT HAVE IDENTICAL EFFECTS

Kruskal-Wallis test

This is a method for comparing several independent random samples and can be used as a non-parametric
alternative to the one way ANOVA.

The Kruskal-Wallis test statistic for k samples, each of size ni, is:

   T = (1/S²) [ Σi Ri²/ni − N(N + 1)²/4 ]

- where N is the total number of observations (the sum of all the ni), Ri is the sum of the ranks (from all
samples pooled) for the ith sample, and:

   S² = [1/(N − 1)] [ Σ R² − N(N + 1)²/4 ]

- where Σ R² is the sum of the squares of all N pooled ranks. When there are no ties, S² = N(N + 1)/12 and T
reduces to the familiar form 12/(N(N + 1)) Σi Ri²/ni − 3(N + 1).

The null hypothesis of the test is that all k distribution functions are equal. The alternative hypothesis is that at
least one of the populations tends to yield larger values than at least one of the other populations.

Assumptions:
• random samples from populations
• independence within each sample
• mutual independence among samples
• measurement scale is at least ordinal
• either k population distribution functions are identical, or else some of the populations tend to yield
larger values than other populations
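A sketch of the statistic in its familiar no-tie form (the function name and the data are illustrative; a tie-adjusted variant would rescale by S² rather than the constant N(N + 1)/12):

```python
def kruskal_wallis(*samples):
    """Kruskal-Wallis statistic in the no-tie form:
    H = 12/(N(N+1)) * sum(Ri^2 / ni) - 3(N+1)."""
    pooled = sorted(x for s in samples for x in s)

    def rank(x):
        # average rank of x in the pooled sample (averages over ties)
        first = pooled.index(x) + 1
        last = len(pooled) - pooled[::-1].index(x)
        return (first + last) / 2

    N = len(pooled)
    H = 12 / (N * (N + 1)) * sum(
        sum(rank(x) for x in s) ** 2 / len(s) for s in samples
    ) - 3 * (N + 1)
    return H

H = kruskal_wallis([1, 2], [3, 4])
```

Under the null hypothesis, H is referred to a chi-square distribution with k − 1 degrees of freedom.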

If the test is significant, you can make multiple comparisons between the samples. You may choose the level
of significance for these comparisons (default is α = 0.05). All pairwise comparisons are made and the
probability of each presumed "non-difference" is indicated (Conover, 1999; Critchlow and Fligner, 1991;
Hollander and Wolfe, 1999). Two alternative methods are used to make all possible pairwise comparisons
between groups: Dwass-Steel-Critchlow-Fligner and Conover-Inman. In most situations, you should
use the Dwass-Steel-Critchlow-Fligner result.

By the Dwass-Steel-Critchlow-Fligner procedure, a contrast is considered significant if the following inequality


is satisfied:

- where q is a quantile from the normal range distribution for k groups, ni is size of the ith group, nj is the size of
the jth group, tb is the number of ties at rank b and Wij is the sum of the ranks for the ith group where
observations for both groups have been ranked together. The values either side of the greater than sign are
displayed in parentheses in StatsDirect results.

The Conover-Inman procedure is simply Fisher's least significant difference method performed on ranks. A
contrast is considered significant if the following inequality is satisfied:

- where t is a quantile from the Student t distribution on N-k degrees of freedom. The values either side of the
greater than sign are displayed in parentheses in StatsDirect results.

An alternative to Kruskal-Wallis is to perform a one way ANOVA on the ranks of the observations.

StatsDirect also gives you a homogeneity of variance test option with Kruskal-Wallis; this is marked as
"Equality of variance (squared ranks)". Please refer to homogeneity of variance for more details.
Technical Validation

The test statistic is an extension of the Mann-Whitney test and is calculated as above. In the presence of tied
ranks the test statistic is given in adjusted and unadjusted forms, (opinion varies concerning the handling of
ties). The test statistic follows approximately a chi-square distribution with k-1 degrees of freedom; P values
are derived from this. For small samples you may wish to refer to tables of the Kruskal-Wallis test statistic but
the chi-square approximation is highly satisfactory in most cases (Conover, 1999).

Example

From Conover (1999, p. 291).

Test workbook (ANOVA worksheet: Method 1, Method 2, Method 3, Method 4).

The following data represent corn yields per acre from four different fields where different farming methods
were used.

Method 1   Method 2   Method 3   Method 4
83         91         101        78
91         90         100        82
94         81         91         81
89         83         93         77
89         84         96         79
96         83         95         81
91         88         94         80
92         91                    81
90         89
           84

To analyse these data in StatsDirect you must first prepare them in four workbook columns appropriately
labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select
Kruskal-Wallis from the Non-parametric section of the analysis menu. Then select the columns marked
"Method 1", "Method 2", "Method 3" and "Method 4" in one selection action.

Example:

Adjusted for ties: T = 25.62883 P < 0.0001

All pairwise comparisons (Dwass-Steel-Critchlow-Fligner)

Method 1 and Method 2 , P = 0.1529

Method 1 and Method 3 , P = 0.0782


Method 1 and Method 4 , P = 0.0029

Method 2 and Method 3 , P = 0.0048

Method 2 and Method 4 , P = 0.0044

Method 3 and Method 4 , P = 0.0063

All pairwise comparisons (Conover-Inman)

Method 1 and Method 2, P = 0.0078

Method 1 and Method 3, P = 0.0044

Method 1 and Method 4, P < 0.0001

Method 2 and Method 3, P < 0.0001

Method 2 and Method 4, P = 0.0001

Method 3 and Method 4, P < 0.0001

From the overall T we see a statistically highly significant tendency for at least one group to give higher values than at least one of the others. The subsequent contrasts show a significant separation of all groups with the Conover-Inman method, and of all but Method 1 vs. Methods 2 and 3 with the Dwass-Steel-Critchlow-Fligner method. In most situations, it is best to use only the Dwass-Steel-Critchlow-Fligner result.
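The overall result can be checked outside StatsDirect. The following sketch assumes Python with SciPy, whose kruskal function applies the ties adjustment automatically:

```python
from scipy.stats import kruskal

# Corn yields per acre for four farming methods (Conover, 1999, p. 291)
method1 = [83, 91, 94, 89, 89, 96, 91, 92, 90]
method2 = [91, 90, 81, 83, 84, 83, 88, 91, 89, 84]
method3 = [101, 100, 91, 93, 96, 95, 94]
method4 = [78, 82, 81, 77, 79, 81, 80, 81]

# scipy.stats.kruskal reports the tie-adjusted statistic and its
# chi-square (k - 1 = 3 df) P value
stat, p = kruskal(method1, method2, method3, method4)
print(round(stat, 5), p)  # statistic is approximately 25.62883, P < 0.0001
```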


CRAMER CONTINGENCY COEFFICIENT

If we have N observations with two variables where each observation can be classified into one of R mutually
exclusive categories for variable one and one of C mutually exclusive categories for variable two, then a cross-
tabulation of the data results in a two-way contingency table (also referred to as an RxC contingency table).
The resulting contingency table has R rows and C columns.

A common question regarding a two-way contingency table is whether the two variables are independent. By
independence, we mean that the row and column variables are unassociated (i.e., knowing the value of the
row variable does not help us predict the value of the column variable, and likewise knowing the value of the
column variable does not help us predict the value of the row variable).
A more technical definition for independence is that

P(row i, column j) = P(row i)*P(column j) for all i,j

The standard test statistic for determining independence is the chi-square test statistic:

T = SUM over i,j of (O(i,j) - E(i,j))^2 / E(i,j)

where O(i,j) is the observed count in cell (i,j) and E(i,j) = (row i total)*(column j total)/N is the expected
count under independence.

One criticism of this statistic is that it does not give a meaningful description of the degree of dependence (or
strength of association). That is, it is useful for determining whether there is dependence. However, since the
strength of that association depends on the degrees of freedom as well as on the value of the test statistic, the
strength of association is not easy to interpret from T alone.

Cramer's contingency coefficient is one method of providing an easier-to-interpret measure of the strength of
association. Specifically, it is:

V = sqrt( T / (N*(q - 1)) )

where

T = the chi-square test statistic given above
N = the total sample size
q = minimum(number of rows, number of columns)

This statistic is based on the fact that the maximum value of T is:

N (q - 1)

So this statistic basically scales the chi-square statistic to a value between 0 (no association) and 1 (maximum
association). It has the desirable property of scale invariance. That is, if the sample size increases, the value of
Cramer's contingency coefficient does not change as long as values in the table change the same relative to
each other.
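The scaling described above can be sketched in Python (assuming SciPy is available; the contingency table here is a hypothetical example, not from the source):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table of counts
table = np.array([[10, 20, 30],
                  [25, 15, 10]])

# T = chi-square test statistic, N = total sample size,
# q = min(number of rows, number of columns)
T, p, dof, expected = chi2_contingency(table, correction=False)
N = table.sum()
q = min(table.shape)

# Cramer's coefficient scales T by its maximum possible value N(q - 1),
# giving a value between 0 (no association) and 1 (maximum association)
V = np.sqrt(T / (N * (q - 1)))
print(V)
```

Doubling every cell count leaves V unchanged, which illustrates the scale-invariance property noted above.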

The data for the contingency table can be specified in either of the following two ways:

1. raw data

In this case, you will have two variables. The first will contain r distinct values and the second
will contain c distinct values. Dataplot will automatically perform the cross-tabulation to obtain
the counts for each cell. Although the distinct values will typically be integers, this is not strictly
required.

2. table data

If you only have the resulting contingency table (i.e., the counts for each cell), then you can use
the READ MATRIX (or CREATE MATRIX) command to create a matrix with the data. This is
demonstrated in the example program below.

In this case, your data should contain non-negative integers since they represent the counts for
each cell.
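For the raw-data case, the cross-tabulation that Dataplot performs automatically can be sketched in Python using only the standard library (the variable values below are hypothetical):

```python
from collections import Counter

# Two raw variables of equal length: the first has r distinct values,
# the second has c distinct values (hypothetical example data)
y1 = [1, 1, 2, 2, 2, 1, 2, 1]
y2 = [1, 2, 1, 2, 2, 1, 1, 2]

# Count each (row value, column value) pair
counts = Counter(zip(y1, y2))

# Build the r x c contingency table of cell counts
rows = sorted(set(y1))
cols = sorted(set(y2))
table = [[counts[(r, c)] for c in cols] for r in rows]
print(table)  # → [[2, 2], [2, 2]]
```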

Syntax 1:
LET <par> = CRAMER CONTINGENCY COEFFICIENT <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed Cramer contingency coefficient is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

Use this syntax for raw data.

Syntax 2:
LET <p> = MATRIX GRAND CRAMER CONTINGENCY COEFFICIENT <m>
<SUBSET/EXCEPT/FOR qualification>
where <m> is a matrix containing the contingency table;
<p> is a parameter where the computed Cramer contingency coefficient is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

Use this syntax if your data is a contingency table.

Examples:
LET A = CRAMER CONTINGENCY COEFFICIENT Y1 Y2
LET A = MATRIX GRAND CRAMER CONTINGENCY COEFFICIENT M
Note:
For the raw data case, the two variables should have the same number of elements.
Note:
The following additional commands are supported
TABULATE CRAMER CONTINGENCY COEFFICIENT Y1 Y2 X
CROSS TABULATE CRAMER CONTINGENCY COEFFICIENT ...
Y1 Y2 X1 X2

CRAMER CONTINGENCY COEFFICIENT PLOT Y1 Y2 X

CROSS TABULATE CRAMER CONTINGENCY COEFFICIENT PLOT ...
Y1 Y2 X1 X2

BOOTSTRAP CRAMER CONTINGENCY COEFFICIENT PLOT Y1 Y2

JACKKNIFE CRAMER CONTINGENCY COEFFICIENT PLOT Y1 Y2

The above commands expect the variables to have the same number of observations.

Note that the above commands are only available if you have raw data.

Default:
None
Synonyms:
None
Related Commands:
PEARSON CONTINGENCY COEFFICIENT = Compute Pearson's contingency coefficient.
CHI-SQUARE INDEPENDENCE TEST = Perform a chi-square test for independence.
ODDS RATIO INDEPENDENCE TEST = Perform a log(odds ratio) test for independence.
FISHER EXACT TEST = Perform Fisher's exact test.
ASSOCIATION PLOT = Generate an association plot.
SIEVE PLOT = Generate a sieve plot.
ROSE PLOT = Generate a Rose plot.
BINARY TABULATION PLOT = Generate a binary tabulation plot.
ROC CURVE = Generate a ROC curve.
ODDS RATIO = Compute the bias corrected odds ratio.
LOG ODDS RATIO = Compute the bias corrected log(odds ratio).
