Professional Documents
Culture Documents
Statistical Methods For Environmental Pollution Monitoring
Statistical Methods For Environmental Pollution Monitoring
Statistical Methods For Environmental Pollution Monitoring
Statistical Methods
for Environmental ·
Pollution Monitoring
Richard 0. Gilbert
Pacific Northwest Laboratory
LOS ALAMOS
NATIONAL LABORATORY
DEC 0 'J b~l
LIBRARIES
PROPERTY
2 Sam plin g
Van Nostrand Reinhold 2.1 Samp
115 Fifth Avenue 2.? .... ,._rgc
New York, New York I<XX>3 2.. pn
Chapman & Hall 2.4 Choo
2-6 Boundary Row 2.5 Varia
i.A?rxt~ SEI 8HN, En~and 2.6 Case
: f,
· 'Jboqtas Nelson Australia · . , 2.7 Sumr.
102 Dodds Street ' . : : ·, ~· I
South Melbourne, Victoria 3205, Aus 3 Env iron n
tralia
Nelson Canada
1120 Birchmount Road
3.1 lntJO{
Scarborough, Ontario MIK 504 , Can 3.2 Crite
ada 3.3
·' • • ' I Meth
3.4 Sumr
16 IS 14 13 12 II 10 9 8 7
6 5 4 4 Sim ple R
4.1 Basic
4.2 Esti n
4.3 Effec
4.4 Num
Lib rary of Con gres s Cat alog ing-
in-P ubli cati on Dat a 4.5 Num'
Gilbert, Richard 0.
Statistical methods for environm 4.6 Estin
ental pollution 4.7 Sum1
monitoring.
Bibliography: p.
Includes index. 5 Stratifiell
I. Poll utio n-E nvir onm enta l aspe
cts- Stat istic al 5.1 Basi<
methods. 2. Environmental mon
itor ing- Stat istic al 5.~ ·tin
methods. I. Title.
TDI 93.G 55 5.. dn
1987 628 .5'0 28 86-26758
ISBN 0-442-23050-8 5.4 Arbit
5.5 All0<
J direction for
i ~tion for
station-season
18 Comparing Populations
, > 6.63, we
~ions over all
241
242 Comparing Populations
Sta•
B :s l - 1 or B =::: u Day 28
where l and u are integers taken from Table A14 for the appropriate n and
chosen ex. 2
For example, suppose there are n = 34 differences, and we choose to test 3
at the ex = 0.05 level. Then we see from Table A14 that we reject H and 4
0 5
accept HA if B :s 10 or if B =::: 24.
6
8
EXAMP LE 18.1
9
Grivet ( 1980) reports average and maximum oxidant pollution con- 10
centrations at several air monitoring stations in California. The daily Data ar~
11
'Tied co
Tests Using Paired Data 243
~ig.n test maximum (of hourly averag
'· .... ~ese e) oxidant concentrations (par
hundred million) at 2 stations ts per
.ny for the first 20 days in July 197
given in Table 18.1. The data 2 are
he latter at the 2 stations are paired and
correlated because they were perh aps
nonnal taken on the same day. This
correlation is permitted, but cor type of
relation between the pairs, that
between observations taken on is,
different days, should not be
for the test to be completely present
valid. If this latter type of pos
correlation is present, the test itive
would indicate more than the
:aderlying lOOa% of the time a signific allowed
ant difference between the 2
detected) when none actually exists. Thi stations
s problem is discussed by Gas
since the and Rubin (1971) and by Alb twirth
ers (l97 8a) . •
)ugh not We test the null hypothesis that
the median difference in maximu
JXOn test concentrations between the two m
stations is zero, that is, there
between tendency for the oxidant concen is no
trations at one station to be
:1putation than at the oth er station. Since larger
concentrations are tied on 3 day
equals 17 rather than 20. The s, n
number of + signs is B = 9,
..::h X1; < number of days that the max the
imum concentration at station
of the D; exceeds that at station 28783. 41541
Suppose we use a = 0.0 5.
+ or - from Table A14 for n = 17 Then
we find l = 6 and u = 15.
and n is is not less than or equal to l Sin ce B
- 1 = 5 nor greater than or
15, we cannot reject H equal to
0.
Table A14 gives values of lan
d u for n :S 75. When n > 20, we may use
the following approximate test
procedure:
1. Compute
18.1
B- n/2 2B - n
lSO n z -
8 - -=
.,J;J=-
4 18.3
.Jn
2. Reject H0 and accept HA. (Eq
. 18.2) if Z8 -2 1 _""2 or if Z8
where Z 1 _a12 is obtained from
Table Al.
:S ~ z._a,2 ,
'.!, to test
EXAMPLE 18.2
Using the data in Example 18.
1, we test H0 versus HA., usin
We have Z8 = (18 - 17) /..Jl g Z8 .
? = 0.243. For a = 0.0 5, Tab
gives Zo. 975 = 1.96. Since Z is le Al
18.2 8 not less than or equal to -1. 96
Table 18.1 Maximum Oxi
dant Concentrationsa at Two Sta
tions in July 1972
Station Station Sign of
Day 28783 Station Station Sign of
41541 Difference Day 28783 41541 Difference
iate n and I 8 10 + II
2 5 II 13
7 + 12 +
1 se to test 3 6 7
12 14 +
+ 13 13
.::t Ho and 4 7 7 20 +
5 14 14 28
4 6 + +
6 15 12 6
4 6
7 + 16 12 7
3 3
8 17 13 7
5 4
9 18 14 6
5 5
10 19 12 4
:on - 6 4 20 15 5
:ail y "Dala are parts per hundred
million.
"Tied concentrations.
244 Comparing Populations
' ., nor greater than or equal to 1.96, we cannot reject H • This
0 becal
conclusion is the same as that obtained by using Table A14 in have
Example 18. 1. test
test ·
One-Sided Test
Thus far we have considered only a two-sided alternative hypothesis (Eq. 18.2). 18.
One-sided tests may also be used. There are two such tests:
Frie<
1. Test H0 versus the alternative hypothesis, HA, that the x measurements tend k rel
2
to exceed the x 1 measurements more often than the reverse. In this case sym;
reject H0 and accept HA if B ~ u, where u is obtained from Table A14. with
Alternatively, if n > 20, reject H0 and accept HA if Z ~ Z _ ,., where ZB are ;
8 1
is computed by Eq. 18.3 and Z 1 -a is from Table Al.
2. Test H0 versus the alternative hypothesis that the x measurements tend to
1
exceed the x 2 measurements more often than the reverse. If n ::5 75, use
Table A14 and reject H0 and accept HA if B ::5 l - 1. Alternatively, if n
> 20, reject H0 and accept HA if Z8 ::5 -Z1 -a• where Z8 is computed by The
II Eq. 18.3.
When one-sided tests are conducted with Table A14, the a levels indicated
in the table are divided by 2. Hence, Table A14 may only be used to make I
one-sided tests at the 0.025 and 0.005 significance levels. .ner
of •
Trace Concentrations stat
ana
The sign test can be conducted even though some data are missing or are ND out
concentrations. See Table 18.2 for a summary of the types of data that can
occur, whether or not the sign can be determined, and the effect on n. The
effect of decreasing n is to lower the power of the test to indicate differences
between the two populations.
Can Sign Be
Type of Data Computed? Decreases n? 2.
One or both members of a pair
3.
No Yes
are absent
X2i = xli No Yes
One member of a pair is ND Yes" No
Both members of a pair are ND No Yes
• If the numerical value is greater than the detection limit of the 4.
ND value.
Tests Using Paired Data 245
1iS because it requires computing and rank
ing the D;. In most situations it should
in have greater pow er to find differences
in two populations than does the sign
test. The null and alternative hypothese
s are the same as for the sign test. The
test is described by Hollander and Wol
fe (1973).
2 3 n
Population 1 XII x,2 Xn
Population 2 X21 X22 X2J
Fr = [ I2
2:k RJ
nk(k + 1) j= I
J- 3n(k + I) 18.4
4. If tied values are present within one
or more blocks, compute the Friedman
statistic as follows:
246 Comparing Populations
r - n(k + 1)]
k
12 j~l LRj
2 Table 18
2
F, = ----- ----- ----- ----- ----- ----- - 18.5 Blocks
1
;~
+
1 1 ~ ti.J
1
n [( g;
nk(k 1) - k _ )
- k
]
1 2
3
where g; is the numbe r of tied groups in block i and t;, 5
tied data in the jth tied group in block i. Each untied value1 is the numbe r of
within block i 6
is considered to be a "grou p" of ties of size 1. The quantit
y in braces { }
in the denom inator of Eq. 18.5 is zero for any block that
contains no ties.
This method of handling ties is illustrated in Example 18.3.
Equation 18.5 ha·
reduces to Eq. 18.4 when there are no ties in any block.
5. For an a level test, reject H and accept HA if F, sh<
0 ~ x~ -O<,k- ,, where in
x~ -O<,k-I is the 1 - a quantile of the chi-square distrib
ution with k - 1
df, as obtained from Table A19, where k is the numbe r of
populations. The 18
chi-square distribution is appropriate only if n is reasonably
large. Hollan der de
and Wolfe provide exact critical values (their Table A.15)
for testing F, for co
the following combinations of k and n: k = 3, n = 2, 3,
. . . , 13; k =
4, n = 2, 3, . . . , 8; k = 5, n = 3, 4, 5. Odeh et al.
(1977) extend k,
these tables to k = 5, n = 6, 7, 8; k = 6, n = 2, 3, 4,
5, 6. These tables St
will give only an approximate test if ties are present. The use
of F, computed
by Eq. 18.5 and evaluated using the chi-square tables may
be preferred in
this situation. The foregoing tests are completely valid only
if the observations
in different blocks are not correlated.
EXAMPLE 18.3
The data in Table 18.3 are daily maximum oxidant air concen 18.2
trations
(parts per hundred million) at k = 5 monitoring stations in
California We di
for the first n = 6 days in July 1973 (from Grivet , 1980).
We shall rank S1
use Friedm an's procedure to test at the a = 0.025 significance
level in Set
the null hypothesis, H0 , that there is no tendency for any
station to Wilco:
Table 18.3 Daily Maximum Air Concentrations 8 in California During
July 1973
18.2
Day (Block)
Station
Number Sum of The V
2 3 4 5 6 Ranks (R1) two i·
28783 7 (2)" 5 (4) 7 (3.5) 12 (2) 4 (3) 4 (4)
tend t
41541 18.5
5 (I) 3 ( 1.5) 3 (I) 8 (I) 3 This ·
( 1.5) 2 (I) 7
43382 II (4) 4 (3) 7 (3.5) 17 (4) 5 (4.5) 4 (4) 23 t d~
60335 13 (5) 6 (5) 12 (5) 21 (5) 5 (4.5) 4 (4) 28.5 AW,
60336 8 (3) 3 (1.5) 4 (2) 13 (3) 3 ( 1.5) 3 (2) 13 test f<
"Data are parts per hundred millions .
85.)
• Rank of the measure ment.
Independent Data Sets 247
Table 18.4 Computing the Correction for Ties in Eq.
18.5 for Frie dma n's Test
,,
.5 Blocks g, L; t;',J - k
t,,j
;=I
2 4
3
lz. 1 = 2, 12,2 = lz.J = 12.4 = II - 5 = 6
4 13.1 = 2, 13,2 = 13.3 = 13.4 =
5 II - 5 = 6
1 mbe r of 3 Is, I = ls.2 = 2, Is. 3 = I
6 3 17 - 5 = 12
:1 block i 16. I = 3, 16,2 = /6,3 = I 29 - 5 = 24
races { } Sum = 48
> no ties.
lion 18.5 have oxidant levels grea ter or smal ler
than any othe r station. Also
shown in Table 18.3 are the ranks of the
measurements· obtained as
1 , where in step I and the sum of the ranks for
th k - 1 each station (step 2).
Since there are ties in blocks 2, 3, 5,
.ions. The and 6, we must use Eq.
18.5 to com pute F,. First com pute the
Holl ande r quantity in braces { } in the
deno mina tor of Eq. 18.5 for all bloc
ing F, for ks that contain ties. This
computation is done in Tabl e 18.4.
. 13; k = Note that the values of all the t;./s in
77) exte nd each block must sum to
k, the num ber of populations (stations)
. Since k = 5, n = 6, and
1ese tables Sum = 48, Eq. 18.5 is
computed
referred in 12[(18.5 - 18) 2 + (7 - 18) 2 + · · · + (13 - I8f]
F =~~----~~~~-
-~--------~-
Jservations r -- -~
6(5)(6) - 48/4
= 20.1
rank 1. If For a = 0.02 5 we find from Table Al9
that x~. 975 • 4 = 11.14.
values and Since F, > 11.1 4, we reject H and
0 accept the HA that at least I
tin ·ock , station tends to have daily max imum
oxidant concentrations at a
~, Jd of different level than the othe r stations.
From Table 18.3 it appears
:r than the that stations 41541 and 60336 have cons
istently lowe r concentrations
than the othe r stations.
The rank sum test has two main advantages over the independent-sa
mple t 8. For a
test: (i) The two data sets need not be drawn from normal distribu
tions, and from 1
(ii) the rank sum test can handle a moderate number of NO values
by treating accept
them as ties (illustrated in Example 18.4). Howev er, both tests assume
that the
distributions of the two populations are identical in shape (varian
ce), but the
distributions need not be symmetric. Modifications to the t test to
account for KX
unequal variances can be made as described in Snedecor and Cochra
n (1980, In
p. 96). Evidently, no such modification exists for the rank sum test.
Reckhow col
and Chapra (1983) illustrate the use of the rank sum test on chlorop
hyll data fro•
in two lakes.
Tw
Suppose there are n 1 and n 2 data in data sets 1 and 2, respectively
(n 1 need Wi
not equal n2 ). We test
COl
H 0 : The populations from which the two data sets have tha
been drawn have the same mean the
18.6
7.
versus the alternative hypothesis sh1
HA: The populations have different means 18.7
is,
The Wilcoxon rank sum test procedure is as follows:
g 18.9 0.0059
m + 1- L::
·=I J
f.(t~ -
J
1)
} 1/2
0.0074
0.015
m(m - 1) 0,018
0.019
where g is the number of tied groups and ti is the number of tied
data in 0.019
the jth group. Equation 18.9 reduces to Eq. 18.8 when there are 0.024
no ties.
6. For an a level two-tailed test, reject H (Eq. 18.6) and accept 0.031
0 HA (Eq. 18.7)
if Z"' :s -Z 1 _ 2 or if Zrs ~ Z 1 -a12 • 0,031
0"
average rank of 2, which would not have changed the value of W,... If NDs
occur in both populations, they can be treated as tied values all less than the
smallest numerical value in the combined data set. Hence, they would each
receive the average rank value for that group of NDs, and the Wilcoxon test
could still be conducted. (See Exercise 18.4.) wher
the jt
5. For <
2
Xl-a
18.2.2 Kruskai-Wallis Test
df, a~
The Kruskal-Wallis test is an extension of the Wilcoxon rank sum test from Quad
two to k independent data sets. These data sets need not be drawn from follo'
underlying distributions that are normal or even symmetric, but the k distributions
are assumed to be identical in shape. A moderate number of tied and ND values
can be accommodated. The null hypothesis is
H 0 : The populations from which the k data sets have
been drawn have the same mean 18.10
The alternative hypothesis is
HA: At least one population has a mean larger or Les~
smaller than at least one other population 18.11 Wol
The data take the form
E
Population
,.
a
2·
1 2 3 k
g
XII X21 XJI xk,
1
x,2 Xn x32 ... xk2
1. Rank the m data from smallest to largest, that is, assign the rank 1 to the
smallest datum, the rank 2 to the next largest, and so on. If ties occur,
assign the midrank (illustrated in Example 18.5). If NDs occur, treat these
as a group of tied values that are less than the smallest numerical value in
the data set (assuming the detection limit of the ND values is less than the
smallest numerical value).
2. Compute the sum of the ranks for each data set. Denote this sum for the
jth data set by Rj.
3. If there are no tied or ND values, compute the Kruskal-Wallis statistic as
follows:
Kw =
12 k
2:: -
RJ] - 3(m + 1) 18.12
[ m(m+l)j=lnj
n1 = n2 = n 3 = 7
n1 = n2 = n 3 = 8
18.10
k =4
k =5
Less extensive exact tables are given in
18.11 Conover (1980) and Hollander and
Wolfe (1973) for k = 3 data sets.
EXAMPLE 18.5
An aliquot-size variability study is cond
ucted in which multiple soil
aliquots of sizes I g, 10 g, 25 g, 50 g,
241 and 100 g are analyzed for
Am. A portion of the data for aliquot
sizes I g, 25 g, and 100
g is used in this example. (Two ND valu
es are added for illustration.)
The full data set is discussed by Gilbert
and Doctor (1985). We test
the null hypothesis that the concentrations
from all 3 aliquot sizes
have the same mean. The alternative hypo
thesis is that the concen-
trations for at least I aliquot size tend
to be larger or smaller than
those for at least I other aliquot size.
We test at the a = 0.05
level. The data, ranks, and rank sums are
given in Table 18.6.
Pair
1 2 3 4 5 6 7 8 9 10 11
x, ND 7 ND M 3 M 3 7 12 10 15
x2 ND
" 6 6 M
0 that x 1 measurements
tend to exceed x2 measurements more often than
the reverse. Use a
0.025.
18.3 Compute Friedman's test, using the data
in Example 18.3 and a =
0.025. Ignore the correction for ties.
18.4 Suppose all 241 Am concentrations less than 0.02
pCi/g in the 2 populations
in Example 18.4 were reported by the analytical labora
tory as ND. Use
the Wilcoxon rank sum test to test H : means of
0 both populations are
equal versus HA: the offsite population has a smaller
mean than the onsite
Answers 253
rhe
population. Use a = 0.025. Wh
at effect do the large number of
have on Zrs? Is it more difficult NDs
to reject H0 if NDs are present?
6 18.5 Suppose that all 241 Am mea
surements less than 1.5 nCi/g
18.5 were reported by the laborato in Example
ry as ND. Use the Kruskal-Wallis
we on the resulting data set. Use a test
= 0.05. (Retain the 2 NDs in Example
18.5.)
~ in
1 is
are ANSWERS
18.1 Denote station 28781 data as x
data, and station 415 4l data as
B = 5. From Table Al4 , u = 17. Sinc x2 data.
e B < 7, we cannot reject H at
the a = 0.025 level. 0
uct ~ one-
l)\ ~sis
to "·~::.Aceed
od (ND =
10 II
10 15
8 3
1 surements
Use a
and a ==
,pulations
ND. Use
ations are
the onsite