Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

7

Using Transformations

KEY WORDS antilog, arcsin, bacterial counts, Box-Cox transformation, cadmium, confidence inter-
val, geometric mean, transformations, linearization, logarithm, nonconstant variance, plankton counts,
power function, reciprocal, square root, variance stabilization.

There is usually no scientific reason why we should insist on analyzing data in their original scale of
measurement. Instead of doing our analysis on y it may be more appropriate to look at log(y), y, 1/y,
or some other function of y. These re-expressions of y are called transformations. Properly used trans-
formations eliminate distortions and give each observation equal power to inform.
Making a transformation is not cheating. It is a common scientific practice for presenting and inter-
+
preting data. A pH meter reads in logarithmic units, pH = – log 10 [ H ] and not in hydrogen ion concen-
tration units. The instrument makes a data transformation that we accept as natural. Light absorbency
is measured on a logarithmic scale by a spectrophotometer and converted to a concentration with the
aid of a calibration curve. The calibration curve makes a transformation that is accepted without
hesitation. If we are dealing with bacterial counts, N, we think just as well in terms of log(N ) as N itself.
There are three technical reasons for sometimes doing the calculations on a transformed scale: (1) to
make the spread equal in different data sets (to make the variances uniform); (2) to make the distribution
1
of the residuals normal; and (3) to make the effects of treatments additive (Box et al., 1978). Equal
variance means having equal spread at the different settings of the independent variables or in the different
data sets that are compared. The requirement for a normal distribution applies to the measurement errors
and not to the entire sample of data. Transforming the data makes it possible to satisfy these requirements
when they are not satisfied by the original measurements.

Transformations for Linearization


Transformations are sometimes used to obtain a straight-line relationship between two variables. This
may involve, for example, using reciprocals, ratios, or logarithms. The left-hand panel of Figure 7.1 shows
the exponential growth of bacteria. Notice that the variance (spread) of the counts increases as the population
density increases. The right-hand panel shows that the data can be described by a straight line when plotted
on a log scale. Plotting on a log scale is equivalent to making a log transformation of the data.
The important characteristic of the original data is the nonconstant variance, not nonlinearity. This is
a problem when the curve or line is fitted to the data using regression. Regression tries to minimize the
distance between the data points and the line described by the model. Points that are far from the line
exert a strong effect because the regression mathematics wants to reduce the square of this distance. The result
is that the precisely measured points at time t = 1 will have less influence on the position of the regression
line than the poorly measured data at t = 3. This gives too much influence to the least reliable data. We
would prefer for each data point to have about the same amount of influence on the location of the line.
In this example, the log-transformed data have constant variance at the different population levels. Each data

1 a b
For example, if y = x z , a log transformation gives log y = a log x + b log z. Now the effects of factors x and z are additive.
See Box et al. (1978) for an example of how this can be useful.

© 2002 By CRC Press LLC


1200 10000
1000

MPN count
800 1000
600
400 100
200
0 10
0 1 2 3 4 0 1 2 3 4
Time Time

FIGURE 7.1 An example of how a transformation can create constant variance. Constant variance at all levels is important
so each data point will carry equal weight in locating the position of the fitted curve.

100 100

80
Concentration

60
10
40

20

0 1
0 2 4 6 8 10 12 0 2 4 6 8 10 12

FIGURE 7.2 An example of how a transformation could create nonconstant variance.

value has roughly equal weight in determining the position of the line. The log transformation is used to
achieve this equal weighting and not because it gives a straight line.
A word of warning is in order about using transformations to obtain linearity. A transformation can
turn a good situation into a bad one by distorting the variances and making them unequal (see Chapter 45).
Figure 7.2 shows a case where the constant variance of the original data is destroyed by an inappropriate
logarithmic transformation.
In the examples above it was easy to check the variances at the different levels of the independent variables
because the measurements had been replicated. If there is no replication, this check cannot be made. This
is only one reason why replication is always helpful and why it is recommended in experimental and moni-
toring work.
Lacking replication, should one assume that the variances are originally equal or unequal? Sometimes
the nature of the measurement process gives a hint as to what might be the case. If dilutions or concentrations
are part of the measurement process, or if the final result is computed from the raw measurements, or
if the concentration levels are widely different, it is not unusual for the variances to be unequal and to
be larger at high levels of the independent variable. Biological counts frequently have nonconstant
variance. These are not justifications to make transformations indiscriminately. Do not avoid making
transformations, but use them wisely and with care.

Transformations to Obtain Constant Variance


When the variance changes over the range of experimental observations, the variance is said to be non-
constant, or unstable. Common situations that tend to create this pattern are (1) measurements that involve
making dilutions or other steps that introduce multiplicative errors, (2) using instruments that read out on
a log scale which results in low values being recorded more precisely than high values, and (3) biological
counts. One of the transformations given in Table 7.1 should be suitable to obtain constant variance.
© 2002 By CRC Press LLC
TABLE 7.1
Transformations that are Useful to Obtain Uniform Variance
Condition Replace y by
σ uniform over range of y no transformation needed
σ ∝ y2 x = 1/y
σ ∝ y 3/2 x = 1/ y
σ ∝ y ( σ2 > y ) all y > 0 x = log(y)
some y = 0 x = log(y + c)
σ ∝y 1/2
all y > 0 x = y
some y < 0 x = y+c
σ >y
2
p = ratio or percentage x = arcsin p

Source: Box, G. E. P., W. G. Hunter, and J. S. Hunter (1978). Statistics for Experimenters:
An Introduction to Design, Data Analysis, and Model Building, New York, Wiley Interscience.

TABLE 7.2
Plankton Counts on 20 Replicate Water Samples from Five Stations in a Reservoir
Station 1 0 2 1 0 0 1 1 0 1 1 0 2 1 0 0 2 3 0 1 1
Station 2 3 1 1 1 4 0 1 4 3 3 5 3 2 2 1 1 2 2 2 0
Station 3 6 1 5 7 4 1 6 5 3 3 5 3 4 3 8 4 2 2 4 2
Station 4 7 2 6 9 5 2 7 6 4 3 5 3 6 4 8 5 2 3 4 1
Station 5 12 7 10 15 9 6 13 11 8 7 10 8 11 8 14 9 6 7 9 5
Source: Elliot, J. (1977). Some Methods for the Statistical Analysis of Samples of Benthic Invertebrates, 2nd ed.,
Ambleside, England, Freshwater Biological Association.

TABLE 7.3
Statistics Computed from the Data in Table 7.2
Station 1 2 4 5 6
Untransformed data y = 0.85 2.05 3.90 4.60 9.25
2
s y = 0.77 1.84 3.67 4.78 7.57
Transformed x = y+c x = 1.10 1.54 2.05 2.20 3.09
2
s x = 0.14 0.20 0.22 0.22 0.19

The effect of square root and logarithmic transformations is to make the larger values less important
relative to the small ones. For example, the square root converts the values (0, 1, 4) to (0, 1, 2). The 4,
which tends to dominate on the original scale, is made relatively less important by the transformation.
The log transformation is a stronger transformation than the square root transformation. “Stronger” means
that the range of the transformed variables is relatively smaller for a log transformation that for the square
root. When the sample contains some zero values, the log transformation is x = log(y + c), where c is a
constant. Usually the value of c is arbitrarily chosen to be 1 or 0.5. The larger the value of c, the less severe
the transformation. Similarly, for square root transformations, y + c is less severe than y.
The arcsin transformation is used for decimal fractions and is most useful when the sample includes values
near 0.00 and 1.00. One application is in bioassys where the data are fractions of organisms showing an effect.

Example 7.1

Twenty replicate samples from five stations were counted for plankton, with the results given in
Table 7.2. The computed averages and variances are in Table 7.3. The computed means and variance
on the original data show that variance is not uniform; it is ten times larger at station 5 than at
2
station 1. Also, the variance increases as the average increases and s y seems to be proportional to
y. This indicates that a square root transformation may be suitable. Because most of the counts

© 2002 By CRC Press LLC


10
Station 1

0
10
Station 2

0
10
Frequency
Station 3

0
10

Station 4

0
10

Station 5

0
0 4 8 12 16 0 1 2 3 4
Plankton Count Count + 0.5

FIGURE 7.3 Original and transformed plankton counts.

TABLE 7.4
Eight Replicate Measurements on Bacteria at Three Sampling Stations
y = Bacteria/100 mL x = log10(Bacteria/100 mL)
1 2 3 1 2 3
27 225 1020 1.431 2.352 3.009
11 99 136 1.041 1.996 2.134
48 41 317 1.681 1.613 2.501
36 60 161 1.556 1.778 2.207
120 190 130 2.079 2.279 2.114
85 240 601 1.929 2.380 2.779
18 90 760 1.255 1.954 2.889
130 112 240 2.144 2.049 2.380
y = 59.4 132 420.6 x = 1.636 2.050 2.502
2 2
s y = 2156 5771 111,886 s x = 0.151 0.076 0.124

are small, the transform used was y + 0.5 . Figure 7.3 shows the distribution of the original and
the transformed data. The transformed distributions are more symmetrical and normal-like than the
originals. The variances computed from the transformed data are uniform.

Example 7.2

Table 7.4 shows eight replicate measurements of bacterial density that were made at three
locations to study the spatial pattern of contamination in an estuary. The data show that s > y and
2

that s increases in proportion to y. Table 7.1 suggests a logarithmic transformation. The improvement
due to the log transformation is shown in Table 7.4. Note that the transformation could be done
using either loge or log10 because they differ by only a constant (loge = 2.303 log10).
© 2002 By CRC Press LLC
Confidence Intervals and Transformations
After summary statistics (means, standard deviations, etc.) have been calculated on the transformed
scale, it is often desirable to translate the results back to the original scale of measurement. This can
create some confusion. For example, if the average x has been estimated using x = log(y), the simple
back-transformation of antilog( x ) does not give an unbiased estimate of y. The antilogarithm of x is
the geometric mean of the original data (y); that is, antilog( x ) = y g . The correct estimate of the arithmetic
2
mean on the original y scale is y = antilog( x + 0.5 s ) (Gilbert, 1987).
If the transformation produced a near-normal distribution, the standard deviations and standard errors
computed from the transformed data will be symmetric about the mean on the transformed scale. But
they will be asymmetric on the original scale. The options are to:

1. Quote symmetric confidence limits on the transformed scale.


2. Quote asymmetric confidence limits on the original scale (recognizing that in the case of a
log transformation they apply to the geometric mean and not to the arithmetic average).
3. Give two sets of results, one with standard errors and symmetric confidence limits on the
transformed scale and a corresponding set of means (arithmetic and geometric) on the original
scale. The reader can judge the statistical significance on the transformed scale and the
practical importance on the original scale.

Two examples illustrate the use of log-transformed data to construct confidence limits for the geometric
mean.

Example 7.3
2
A sample of n = 5 observations [95, 20, 74, 195, 71] gives y = 91 and s y = 4,140. Clearly,
s y > y and a log transformation should be tried. The x = log10(y) values are 1.97772, 1.30103,
2
2
1.86923, 2.29003, and 1.85126. This gives x = 1.85786 and s x = 0.12784. The value of t for
ν = n −1 = 4 degrees of freedom and α /2 = 0.025 is 2.776. Therefore, the 95% confidence interval
for the mean on the log-transformed scale, ηx, is:

2
s 0.1278
x ± t ----x = 1.85786 ± 2.776 ---------------
- = 1.85786 ± 0.44381
n 5
and

1.4140 < η x < 2.3017

Transforming ηx back to the original scale gives an estimate of the geometric mean of the y’s:

y g = antilog ( x ) = antilog ( 1.85786 ) = 72.09

The asymmetric 95% confidence limits for the true value of the geometric mean, ηx, are obtained
by taking antilogarithms of the upper and lower confidence limits of ηx, which gives:

25.94 ≤ η g ≤ 200.29

Note that the upper and lower confidence limits in the log metric are x + β and x – β , where
β = t α /2 s /n. The upper confidence limit on the original scale is antilog ( x + β ) = antilog ( x ) ⋅
2

antilog( β ), which becomes y g ⋅ β ′ where β ′ = antilog(β). Likewise, the lower confidence limit is
y g / β ′. For this example, β = 0.44381, antilog (0.44381) = 2.778, and the 95% confidence limits
for the geometric mean on the original scale are 72.09(2.7785) = 200.29 and 72.09/2.7785 = 25.94.
© 2002 By CRC Press LLC
Example 7.4

A log transformation is needed on a sample of n = 6 that includes some zeros: y = [0, 2, 1, 0,


3, 9]. Because of the zeros, use the transformation x = log(y + 1) to find the transformed values:
x = [0, 0.47712, 0.30103, 0.60206, 1.0]. This gives x = 0.39670 with standard deviation
2
s x = 0.14730 . For ν = n −1 = 5 and α /2 = 0.025, t5,0.025 = 2.571 and the 95% confidence limits
on the transformed scale are:

LCL ( x ) = 0.39670 – 2.571 0.14730/6 = – 0.006 (say zero)

and

UCL ( x ) = 0.39670 + 2.571 0.14730/6 = 0.79954

Transforming back to the original metric gives the geometric mean:

y g = antilog 10 ( 0.39670 ) – 1 = 2.5 – 1 = 1.5

The −1 is due to using x = log(y + 1). The similar inverse of the confidence limits gives:

LCL ( y ) = 0 and UCL ( y ) = 5.3

Confidence Intervals on the Original Scale


Notice that the above examples are for the geometric mean ηg and not the arithmetic mean η. The work
becomes more difficult if we want the asymmetric confidence intervals of the arithmetic mean on the
original scale. This will be shown for the lognormal distribution.
A simple method of estimating the mean η and variance σ of the two-parameter lognormal distribution
2

is to use:

2
s
η̂ = exp  x + ----x σ̂ = η̂ [ exp ( s x ) – 1 ]
2 2 2
and
2

where η̂ and σ̂ are the estimated mean and variance. x and s x are calculated in the usual way shown
2 2

in Chapter 2 (Gilbert, 1987).


These estimates are slightly biased upward by about 5% for n = 20 and 1% for n = 100. The importance
of this when using η̂ to judge compliance with environmental standards, or when comparing estimates
based on unequal n, is discussed by Gilbert (1987) and Landwehr (1978).
Computing confidence limits involves more than simply adding a multiple of the standard deviation
(as we do for the normal distribution). Confidence limits for the mean, η, are estimated using the
following equations (Land, 1971):

syH 1 – α 
UCL 1 – α = exp  x + 0.5s x + ----------------
2
-
 n – 1
syH α 
LCL α = exp  x + 0.5s x + ---------------
2
-
 n – 1

The quantities H1–α and Hα depend on sx , n, and the confidence level α. Land (1975) provides the
necessary tables; a subset of these may be found in Gilbert (1987).

© 2002 By CRC Press LLC


The Box-Cox Power Transformations
A power transformation model developed by Box and Cox (1964) can, so far as possible, satisfy the
conditions of normality and constant variance simultaneously. The method is applicable for almost any
(λ)
kind of statistical model and any kind of transformation. The transformed value Y i of the original vari-
able yi is:

λ
(λ) yi – 1
Yi = -------------
λ–1
-
λ yg

where y g is the geometric mean of the original data series, and λ expresses the power of the transfor-
mation. The geometric mean is obtained by averaging ln(y) and taking the exponential (antilog) of the
(0)
result. The special case when λ = 0 is the log transformation: Y i = y g ln ( y i ). λ = −1 is a reciprocal
transformation, λ = 1/2 is a square root transformation, and λ = 1 is no transformation. Example
applications of this transformation are given in Box et al. (1978).

Example 7.5

Table 7.5 lists 36 measurements on cadmium (Cd) in soil, and their logarithms. The Cd concen-
trations range from 0.005 to 0.094 mg/kg. The limit of detection was 0.01. Values below this
were arbitrarily replaced with 0.005. Comparisons must be made with other sets of similar data
and some transformation is needed before this can be done. Experience with environmental data
suggests that a log transformation may be useful, but something better might be discovered if
we make the Box-Cox transformation for several values of λ and compare the variances of the
transformed data.
The variance of the log-transformed values in Table 7.5 is σ ln ( y ) = 0.549. This cannot be
2

compared directly with the variance from, for instance, a square root transformation unless the
calculations are normalized to keep them on the same relative scale. The denominator of the Box-
λ –1
Cox transformation, λ y g , is a normalizing factor to make the variances comparable across different

TABLE 7.5
Cadmium Concentrations in Soil
Cadmium ln
(mg/kg) [Cadmium]
0.023 0.005 0.005 −3.7723 −5.2983 −5.2983
0.020 0.005 0.032 −3.9120 −5.2983 −3.4420
0.010 0.005 0.031 −4.6052 −5.2983 −3.4738
0.020 0.013 0.005 −3.9120 −4.3428 −4.2687
0.020 0.005 0.014 −3.9120 −5.2983 −3.9120
0.020 0.094 0.020 −3.9120 −2.3645 −5.2983
0.010 0.011 0.005 −4.6052 −4.5099 −3.6119
0.010 0.005 0.027 −4.6052 −5.2983 −4.1997
0.010 0.005 0.015 −4.6052 −5.2983 −3.3814
0.010 0.028 0.034 −4.6052 −3.5756 −5.2983
0.010 0.010 0.005 −4.6052 −4.6052 −5.2983
0.005 0.018 0.013 −5.2983 −4.0174 −4.3428
Average = 0.0161 Average of ln[Cd] = −4.42723
Variance = 0.000255 Variance = 0.549
Geo. mean = 0.01195
Note: Concentrations in mg/kg.

© 2002 By CRC Press LLC


TABLE 7.6
Transformed Values, Means, and Variances as a Function of λ
λ = −1 λ = − 0.5 λ=0 λ = 0.5 λ=1
−0.0061 −0.0146 −0.0451 −0.1855 −0.9770
−0.0070 −0.0159 −0.0467 −0.1877 −0.9800
−0.0141 −0.0235 −0.0550 −0.1967 −0.9900
… … … … …
… … … … …
−0.0284 −0.0343 −0.0633 −0.2032 −0.9950
−0.0108 −0.0203 −0.0519 −0.1937 −0.9870
yλ = −0.01496 −0.0228 −0.0529 −0.1930 −0.9839
λ
Var ( y ) = 0.000093 0.000076 0.000078 0.000113 0.000255

log transformation
0.3
Variance (x1000)

No transformation
0.2

0.1

0.0
-1.0 -0.5 0.0 0.5 1.0
λ

FIGURE 7.4 Variance of the transformed cadmium data as a function of λ.

values of λ. The geometric mean of the untransformed data is y g = exp ( – 4.42723 ) = 0.01195.
The denominator for λ = 0.5, for example, is 0.5(0.01195)
0.5−1
= 4.5744; the denominator for
λ = −0.5 is −765.747.
Table 7.6 gives some of the power-transformed data for λ = −1, −0.5, 0, 0.5, and 1. λ = 1 is
(1)
no transformation ( Y i = y i – 1 ) except for scaling to be comparable to the other transforma-
(0)
tions. λ = 0 is the log transformation, calculated from Y i = y g ln ( y i ) , which is again scaled
so the variance can be compared directly with variances of other power-transformed values.
The two bottom rows of Table 7.6 give the mean and the variance of the power-transformed
values. The suitable transformations give small variances. Rather than pick the smallest value
from the table, make a plot (Figure 7.4) that shows how the variance changes with λ. The smooth
curve is drawn as a reminder that these variances are estimates and that small differences between
them should not be taken seriously. Do not seek an optimal value of λ that minimizes the variance.
Such a value is likely to be awkward, like λ = 0.23. The data do not justify such detail, especially
because the censored values (y < 0.01) were arbitrarily replaced with 0.005. (This inflates the variance
from whatever it would be if the censored values were known.) Values of λ = −0.5, λ = 0, or λ = 0.5
are almost equally effective transformations. Any of these will be better than no transformation
(λ = 1). The log transformation (λ = 0) is very satisfactory and is our choice as a matter of
convenience.
Figure 7.5 shows dot diagrams for the original data, the square root (λ = 0.5), the logarithms
(λ = 0), and reciprocal square root (λ = −0.5). The log transformation is most symmetric, but
it is not normal because of the 11 non-detect data that were replaced with 0.005 (i.e., 1/2 the
MDL).

© 2002 By CRC Press LLC


λ = - 0.5
1// √ y
2 4 6 8 10 12 14 16

λ=0
y
0.00 0.02 0.04 0.06 0.08 0.10

λ = 0.5
√y
0.0 0.1 0.2 0.3 0.4

λ=1
In (y)
-6 -5 -4 -3 -2

FIGURE 7.5 Dot diagrams of the data and the square root, log, and reciprocal square root transformed values. The eye-
catching spike of 11 points are “non-detects” that were arbitrarily assigned values of 0.005 mg/kg.

Comments
Transformations are not tricks to reduce variation or to convert a complicated nonlinear model into a
simple linear form. There are often statistical justifications for making transformations and then analyzing
the transformed data. They may be needed to stabilize the variance or to make the distribution of the
errors normal. The most common and important use is to stabilize (make uniform) the variance.
It can be tempting to use a transformation to make a nonlinear function linear so that it can be fitted
using simple linear regression methods. Sometimes the transformation that gives a linear model will
coincidentally produce uniform variance. Beware, however, because linearization can also produce the
opposite effect of making constant variance become nonconstant (see Chapter 45).
When the analysis has been done on transformed data, the analyst must consider carefully whether
to report the final results on the transformed or the original scale of measurement. Confidence intervals
that are symmetrical on the transformed scale will not be symmetric when transformed back to the
original scale. Care must also be taken when converting the mean on the transformed scale back to the
original scale. A simple back-transformation typically does not give an unbiased estimate, as was
demonstrated in the case of the logarithmic transformation.

References
Box, G. E. P. and D. R. Cox (1964). “An Analysis of Transformations,” J. Royal Stat. Soc., Series B, 26, 211.
Box, G. E. P., W. G. Hunter, and J. S. Hunter (1978). Statistics for Experimenters: An Introduction to Design,
Data Analysis, and Model Building, New York, Wiley Interscience.
Elliot, J. (1977). Some Methods for the Statistical Analysis of Samples of Benthic Invertebrates, 2nd ed.,
Ambleside, England, Freshwater Biological Association.
Gilbert, R. O. (1987). Statistical Methods for Environmental Pollution Monitoring, New York, Van Nostrand
Reinhold.

© 2002 By CRC Press LLC


Land, C. E. (1975). “Tables of Confidence Limits for Linear Functions of the Normal Mean and Variance,”
in Selected Tables in Mathematical Statistics, Vol. III, Am. Math. Soc., Providence, RI, 358–419.
Landwehr, J. M. (1978). “Some Properties of the Geometric Mean and its Use in Water Quality Standards,”
Water Resources Res., 14, 467–473.

Exercises
7.1 Plankton Counts. Transform the plankton data in Table 7.2 using a square root transformation
x = sqrt( y) and a logarithmic transformation x = log( y) and compare the results with those
shown in Figure 7.3.
7.2 Lead in Soil. Examine the distribution of the 36 measurements of lead (mg/kg) in soil and
recommend a transformation that will make the data nearly symmetrical and normal.

7.6 32 5 4.2 14 18 2.3 52 10 3.3 38 3.4 4.3 0.10 5.7 0.10 0.10 4.4
0.42 0.10 16 2.0 1.2 0.10 3.2 0.43 1.4 5.9 0.23 0.10 0.10 0.23 0.29 5.3 2.0 1.0

7.3 Box-Cox Transformation. Use the Box-Cox power function to find a suitable value of λ to
transform the 48 lead measurements given below. Note: All < MDL values were replaced by
0.05.

7.6 32 5.0 4.2 14 18 2.3 52 10 3.3 38 3.4 4.3 0.05 0.05 0.10
0.10 0.05 0.05 0.05 0.0 0.05 1.2 0.10 0.10 0.10 0.10 0.10 0.23 4.4 0.42 0.10
16. 2.0 2.0 1.0 3.2 0.43 1.4 0.10 5.9 0.10 0.10 0.23 0.29 5.3 5.7 0.10

7.4 Are Transformations Necessary? Which of the following are correct reasons for transforming
data? (a) Facilitate interpretation in a natural way. (b) Promote symmetry in a data sample.
(c) Promote constant variance in several sets of data. (d) Promote a straight-line relationship
between two variables. (e) Simplify the structure so that a simple additive model can help us
understand the data.
7.5 Power Transformations. Which of the following statements about power transformations are
correct? (a) The order of the data in the sample is preserved. (b) Medians are transformed to
medians, and quartiles are transformed to quartiles. (c) They are continuous functions. (d)
Points very close together in the raw data will be close together in the transformed data, at
least relative to the scale being used. (e) They are smooth functions. (f) They are elementary
functions so the calculations of re-expression are quick and easy.

© 2002 By CRC Press LLC

You might also like