Self-Study - The Difference Between Link Functions and Data Transformations
by Kim Love
Using a natural log link function, as in Poisson regression, can lead to confusion because on the surface it looks very similar to what happens when we transform the dependent variable in a linear model, such as a linear regression.
The key thing to understand is that the natural log link function is a function of
the mean of y, not the y values themselves.
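A tiny numeric illustration of that distinction, using three made-up values of y chosen only so the arithmetic is easy to follow. A log link takes the log of the mean; a log transformation takes the mean of the logs.

```python
import math
from statistics import mean

# Three illustrative (made-up) values of y.
y = [1.0, 10.0, 100.0]

# Log link style: take the mean of y first, then the log of that mean.
log_of_mean = math.log(mean(y))  # ln(37)

# Transformation style: take the log of each y value first, then the mean.
mean_of_log = mean(math.log(v) for v in y)  # (ln 1 + ln 10 + ln 100) / 3 = ln 10

print(f"ln(mean(y)) = {log_of_mean:.3f}")
print(f"mean(ln(y)) = {mean_of_log:.3f}")
```

Because the log is concave, the log of the mean is never smaller than the mean of the logs (Jensen's inequality), so the two approaches are modeling different quantities.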
Transformations of Y
Consider a linear model in which the original dependent variable, y, has been natural log transformed. That is, the natural log has been taken of each individual value of y, and that is being used as the dependent variable.
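Written out for observation i with k predictors, such a model looks like the following. The symbols β, x_{ji}, and σ² are notation assumed here, together with the usual linear-regression assumption of independent normal errors with constant variance:

```latex
\ln(y_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)
```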
The linear model with the log transformation provides an equation for an individual value of ln(y). We could also write it as a model for the mean of ln(y), in which case the error term is no longer present.
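In assumed notation with k predictors, the mean form of the log-transformed model is shown below; the error term drops out because its mean is zero:

```latex
E\left[\ln(y_i) \mid x_{1i}, \ldots, x_{ki}\right]
  = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki}
```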
This makes the difference a bit clearer. When we transform the data in a linear model, we are no longer claiming that y is normally distributed around a mean, given the x values; we are claiming that our new outcome variable, ln(y_i), is normally distributed around its mean, given the x values.
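To see one consequence of modeling ln(y) rather than y, here is a small simulation. It assumes, purely for illustration, that for one fixed set of x values ln(y) is normal with mean m = 2 and standard deviation σ = 1. Exponentiating the mean of ln(y) recovers roughly exp(m), which is the median (geometric mean) of y, while the mean of y itself is exp(m + σ²/2).

```python
import math
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical setting: for one fixed set of x values, ln(y) ~ Normal(m, sigma).
m, sigma = 2.0, 1.0
n = 200_000
log_y = [rng.gauss(m, sigma) for _ in range(n)]
y = [math.exp(v) for v in log_y]

# Back-transforming the mean of ln(y) estimates exp(m): the median / geometric mean of y.
back_transformed = math.exp(math.fsum(log_y) / n)

# The mean of y itself estimates exp(m + sigma**2 / 2), which is larger.
mean_of_y = math.fsum(y) / n

print(f"exp(mean of ln y) = {back_transformed:.2f}  (theory: {math.exp(m):.2f})")
print(f"mean of y         = {mean_of_y:.2f}  (theory: {math.exp(m + sigma**2 / 2):.2f})")
```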
In the case of the Poisson model, however, the link function does not transform the actual observations into something other than Poisson-distributed values. Instead, the link function defines the relationship of the x variables directly to the mean of the Poisson-distributed y. The individual observations then vary around this expected value according to the Poisson distribution.
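A minimal sketch of that idea, assuming hypothetical coefficients b0 = 0.5 and b1 = 0.3 and a single predictor value x = 2 (none of these come from a real model). The log link only sets the mean count; the simulated counts themselves are untouched integers that scatter around that mean, with variance close to the mean as a Poisson distribution requires. Python's standard `random` module has no Poisson sampler, so the hypothetical helper `poisson_draw` implements Knuth's multiplication method, which is adequate for small means.

```python
import math
import random


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Hypothetical helper: one Poisson(lam) draw via Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical coefficients for a Poisson model with a log link: ln(mu) = b0 + b1 * x.
b0, b1 = 0.5, 0.3
x = 2.0
mu = math.exp(b0 + b1 * x)  # the link relates x to the MEAN count, not to each count

# The observations stay Poisson: nonnegative integer counts varying around mu.
counts = [poisson_draw(mu, rng) for _ in range(100_000)]
sample_mean = sum(counts) / len(counts)
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (len(counts) - 1)

print(f"mu = {mu:.3f}, sample mean = {sample_mean:.3f}, sample variance = {sample_var:.3f}")
```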
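Exponentiating the predicted mean of ln(y) from a log-transformed linear model does not give back the mean of y; it gives the geometric mean. You might be surprised to know, though, that a link function does let you back-transform to the mean on the original scale. If you have specific values of your x variables, you can calculate the predicted average count, μ_y, based on those x values by inverting the natural log, that is, by exponentiating the linear predictor.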
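Assuming k predictors (notation chosen for illustration), the log link and its inverse are shown below. Note that the left side is the log of the mean, μ_y, and there is no error term:

```latex
\ln(\mu_y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
\quad\Longrightarrow\quad
\mu_y = \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)
```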
This ability to back-transform means (and regression coefficients) to a more
intuitive scale is part of what makes generalized linear models so useful.
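A minimal sketch of both kinds of back-transformation, with made-up coefficients b0 = 0.5 and b1 = 0.3 standing in for a fitted Poisson model (they are not from any real analysis, and `predicted_mean_count` is a hypothetical helper). Exponentiating the linear predictor gives the predicted mean count; exponentiating a coefficient gives the multiplicative change in the mean count for a one-unit increase in that predictor, regardless of the starting value.

```python
import math

# Hypothetical fitted coefficients from a Poisson regression with a log link.
b0, b1 = 0.5, 0.3


def predicted_mean_count(x: float) -> float:
    """Inverse of the log link: mu = exp(b0 + b1 * x)."""
    return math.exp(b0 + b1 * x)


# Back-transformed means at specific x values.
for x in (0.0, 1.0, 2.0):
    print(f"x = {x}: predicted mean count = {predicted_mean_count(x):.3f}")

# Back-transformed coefficient: exp(b1) is the ratio of mean counts
# for a one-unit increase in x.
rate_ratio = math.exp(b1)
print(f"exp(b1) = {rate_ratio:.3f}")
print(f"mu(2) / mu(1) = {predicted_mean_count(2.0) / predicted_mean_count(1.0):.3f}")
```

The same ratio holds between any x and x + 1, which is why the exponentiated coefficient is usually reported instead of the raw coefficient.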