Box-Cox Transformation: An Overview
Introduction
Since the seminal paper by Box and Cox (1964), Box-Cox-type power
transformations have generated a great deal of interest, both in
theoretical work and in practical applications. In this presentation,
I intend to go over the following topics:
◮ The following are Q-Q normal plots for a random sample of size 500
from the Exp(1000) distribution.
[Figure: four Q-Q normality plots — original data (Exp(1000));
lambda = 0.2 (Weibull(5, 1000)); lambda = 0.275 (Weibull(3.64, 1000));
lambda = 0.35 (Weibull(2.86, 1000)).]
Since the work of Box and Cox (1964), many modifications have been
proposed.
◮ Manly (1971) proposed the following exponential transformation:
$$ y(\lambda) = \begin{cases} \dfrac{e^{\lambda y} - 1}{\lambda}, & \text{if } \lambda \neq 0; \\[4pt] y, & \text{if } \lambda = 0, \end{cases} $$
where
$$ \operatorname{Sign}(y) = \begin{cases} 1, & \text{if } y \geq 0; \\ -1, & \text{if } y < 0. \end{cases} $$
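A minimal sketch of Manly's exponential transformation (the function name is mine, not from the slides); unlike the original power family, it is defined for negative $y$ as well:

```python
import numpy as np

def manly(y, lam):
    """Manly's (1971) exponential transformation: (e^(lam*y) - 1)/lam,
    with y itself as the continuous limit when lam = 0."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return y.copy()
    return (np.exp(lam * y) - 1.0) / lam
```

The `lam == 0` branch is the limit of $(e^{\lambda y} - 1)/\lambda$ as $\lambda \to 0$, so the family is continuous in $\lambda$.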
◮ The profile log-likelihood is
$$ l_P(\lambda) = C - \frac{n}{2}\,\log(s_\lambda^2), $$
where $s_\lambda^2$ is the residual sum of squares divided by $n$ from fitting
the linear model $y(\lambda, g) \sim N(X\beta, \sigma^2 I_n)$. So to maximize the profile
log-likelihood, we only need to find a $\lambda$ that minimizes
$$ s_\lambda^2 = \frac{y(\lambda, g)'(I_n - G)\,y(\lambda, g)}{n}. $$
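This maximization is easy to carry out on a grid. In the sketch below (mine, not the slides' code) $y(\lambda, g)$ is taken to be the usual geometric-mean-scaled Box-Cox transform $(y^\lambda - 1)/(\lambda\, g^{\lambda-1})$, and the quadratic form $z'(I_n - G)z$ is computed by least squares rather than by forming the hat matrix $G$ explicitly:

```python
import numpy as np

def scaled_boxcox(y, lam):
    """y(lambda, g): Box-Cox transform scaled by the geometric mean g,
    so residual sums of squares are comparable across lambda."""
    g = np.exp(np.mean(np.log(y)))          # geometric mean of y
    if lam == 0:
        return g * np.log(y)
    return (y ** lam - 1.0) / (lam * g ** (lam - 1.0))

def profile_loglik(y, X, lam):
    """l_P(lambda) = C - (n/2) log(s2_lambda), with the constant C dropped.
    s2_lambda = z'(I - G)z / n is obtained from a least-squares fit of z on X."""
    n = len(y)
    z = scaled_boxcox(y, lam)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    rss = float(np.sum((z - X @ beta) ** 2))
    return -0.5 * n * np.log(rss / n)
```

The MLE $\hat\lambda$ is then the grid value maximizing `profile_loglik`.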
◮ Without any further effort, just using standard likelihood
methods, we can easily obtain a likelihood ratio test. For testing
$H_0: \lambda = \lambda_0$, the test statistic is $W = 2[l_P(\hat\lambda) - l_P(\lambda_0)]$.
Asymptotically, $W$ is distributed as $\chi^2_1$. Note carefully that $W$ is a
function of both the data (through $\hat\lambda$) and $\lambda_0$.
◮ A large-sample CI for $\lambda$ is easily obtained by inverting the
likelihood ratio test. Let $\hat\lambda$ be the MLE of $\lambda$; then an approximate
$(1-\alpha)100\%$ CI for $\lambda$ is
$$ \Big\{\lambda \;\Big|\; n \log\Big(\frac{\mathrm{SSE}(\lambda)}{\mathrm{SSE}(\hat\lambda)}\Big) \le \chi^2_1(1-\alpha)\Big\}, $$
where $\mathrm{SSE}(\lambda) = y(\lambda, g)'(I_n - G)\,y(\lambda, g)$. The accuracy of the
approximation is given by the following fact:
$$ P\big(W \le \chi^2_1(1-\alpha)\big) = 1 - \alpha + O(n^{-1/2}). $$
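Inverting the test on a grid is a one-liner once $\mathrm{SSE}(\lambda)$ is available. A sketch under the same assumptions as before (geometric-mean-scaled transform, $\mathrm{SSE}$ via least squares; function names are mine):

```python
import numpy as np
from scipy.stats import chi2

def sse(y, X, lam):
    """SSE(lambda) for the geometric-mean-scaled Box-Cox response."""
    g = np.exp(np.mean(np.log(y)))
    z = g * np.log(y) if lam == 0 else (y ** lam - 1.0) / (lam * g ** (lam - 1.0))
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return float(np.sum((z - X @ beta) ** 2))

def boxcox_lrt_ci(y, X, grid, alpha=0.05):
    """Approximate (1-alpha)100% CI for lambda: keep every grid value with
    n * log(SSE(lambda)/SSE(lambda_hat)) <= chi2_1(1-alpha)."""
    n = len(y)
    s = np.array([sse(y, X, l) for l in grid])
    keep = n * np.log(s / s.min()) <= chi2.ppf(1.0 - alpha, df=1)
    return grid[keep]
```

Since $\mathrm{SSE}(\hat\lambda)$ is the grid minimum, the MLE itself always lies inside the returned set.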
◮ It is also not hard to derive a test using Rao's score statistic.
Atkinson (1973) first proposed a score-type statistic for testing
$H_0: \lambda = \lambda_0$, although the derivation was not based on likelihood
theory. Lawrence (1987) modified Atkinson's (1973) result by
employing standard likelihood theory.
◮ Box and Cox (1964) chose $b_\lambda = J(\lambda, y)^{1/n} = g^{\lambda-1}$, a choice they
admitted was “somewhat arbitrary”. This gives the Box-Cox version
of the prior distribution.
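The identity $J(\lambda, y)^{1/n} = g^{\lambda-1}$ follows because the Jacobian of the power transform is $J(\lambda, y) = \prod_i y_i^{\lambda-1}$ and $g$ is the geometric mean of the $y_i$. A quick numerical check (the sample here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.uniform(0.5, 5.0, size=50)      # arbitrary positive sample
lam = 0.7

J = np.prod(y ** (lam - 1.0))           # Jacobian J(lam, y) = prod_i y_i^(lam-1)
g = np.exp(np.mean(np.log(y)))          # geometric mean g of y
assert np.isclose(J ** (1.0 / y.size), g ** (lam - 1.0))
```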
◮ Pericchi (1981) followed exactly the same argument, with the
exception that Jeffreys' prior for $(\theta, \sigma)$ was used instead of the
invariant non-informative prior.
◮ Clearly, the Box-Cox prior is “outcome-dependent”, which
seems to be an undesirable property.
◮ It is not hard to see that the posterior distribution for the Box-Cox
prior is
$$ \pi_1(\lambda, \theta, \sigma \mid y, A) \propto \frac{1}{\sigma^{n+1}} \exp\Big(-\frac{S_\lambda + (\theta - \hat\theta)' A' A (\theta - \hat\theta)}{2\sigma^2}\Big) \times g^{(\lambda-1)(n-r)} \times \pi(\lambda), $$
where $S_\lambda = (y(\lambda) - A\hat\theta)'(y(\lambda) - A\hat\theta)$.
The posterior distribution for Pericchi's prior is
$$ \pi_2(\lambda, \theta, \sigma \mid y, A) \propto \frac{1}{\sigma^{n+r+1}} \exp\Big(-\frac{S_\lambda + (\theta - \hat\theta)' A' A (\theta - \hat\theta)}{2\sigma^2}\Big) \times g^{(\lambda-1)n} \times \pi(\lambda). $$
◮ Integrating out $(\theta, \sigma)$, we can then get the posterior log-
likelihood for $\lambda$ alone. For the Box-Cox prior,
$$ l_1(\lambda) = C - \frac{1}{2}(n-r)\log\Big(\frac{S_\lambda}{n-r} \times \frac{1}{g^{2(\lambda-1)}}\Big), $$
and for Pericchi's prior,
$$ l_2(\lambda) = C - \frac{1}{2}\, n \log\Big(\frac{S_\lambda}{n} \times \frac{1}{g^{2(\lambda-1)}}\Big). $$
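Both criteria are easy to evaluate on a grid. A sketch (names and least-squares shortcut are mine): $S_\lambda$ uses the unscaled transform $y(\lambda) = (y^\lambda - 1)/\lambda$, $r$ is the number of columns of the design matrix $A$, and the constant $C$ is dropped.

```python
import numpy as np

def posterior_loglik(y, A, lam, prior="boxcox"):
    """l1(lambda) for the Box-Cox prior, l2(lambda) for Pericchi's prior,
    additive constants dropped.  S_lambda comes from least squares."""
    n, r = A.shape
    g = np.exp(np.mean(np.log(y)))                    # geometric mean of y
    z = np.log(y) if lam == 0 else (y ** lam - 1.0) / lam
    beta, *_ = np.linalg.lstsq(A, z, rcond=None)
    S = float(np.sum((z - A @ beta) ** 2))            # S_lambda
    if prior == "boxcox":
        return -0.5 * (n - r) * np.log(S / ((n - r) * g ** (2.0 * (lam - 1.0))))
    return -0.5 * n * np.log(S / (n * g ** (2.0 * (lam - 1.0))))
```

For moderate $n$ the two criteria differ only through the factor $n - r$ versus $n$, so their maximizers are typically close.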
[Figure: a log-likelihood curve for λ over 0.5 to 2.5 (values from
about −614.5 to −613.0), with the 95% cutoff level marked.]
Pengfei Li, Apr 11, 2005
[Figure: three panels against λ — log of fitted slope, log of residual
standard deviation, and ratio of fitted slope to residual standard
deviation (0.30 to 0.38).]
[Figure: regression diagnostic plots — residuals, standardized
residuals, and Cook's distance; observations 17, 18, 33, 64, and 77
are flagged.]
[Figure: three panels against λ — fitted slope (9.0 to 11.5), residual
standard deviation (29.6 to 30.1), and ratio of fitted slope to
residual standard deviation (0.30 to 0.38).]
◮ However, such scaling has little effect other than producing
comparable estimates. The test statistic remains the same as for
the unscaled data. Thus it may not be useful for the ANOVA model.