Functional Data Analysis
[Figure 1 appears here: left panel, heights in centimeters (roughly 60 to 180 cm); right panel, height accelerations in centimeters/year² (roughly −3.5 to 1.5); both plotted against age in years.]
Figure 1. The left panel shows irregularly spaced observations of the heights of ten girls, along
with the smooth increasing curves fit to each set of data. The right panel shows the second
derivative or height acceleration functions computed from the smooth curves in the left panel. The
heavy dashed line shows the mean of these acceleration curves.
the mth derivative of function x at argument value t.

The right panel of Figure 1 shows the estimated second derivative D²h(t), or acceleration of height, for each girl. Height acceleration displays more directly than height itself the dynamics of growth in terms of the input of hormonal stimulation and other factors into these biological systems, just as acceleration in mechanics reflects exogenous forces acting on a moving body.

We also use derivatives to impose smoothness or regularity on estimated functions by controlling a measure of the size of a derivative. For example, it is common practice to control the size of the second derivative or curvature of a function using the total curvature measure ∫[D²x(t)]² dt. The closer to zero this measure is, the more like a straight line the curve will be. More sophisticated measures are both possible and of genuine practical value.

A differential equation is an equation involving one or more derivatives as well as possibly the function value. For example, the differential equation defining harmonic or sinusoidal behavior is ωx + D²x = 0 for some positive constant ω. Differential equations can define a much wider variety of function behaviors than conventional parameter-based approaches, and this is especially so for nonlinear differential equations. Hence differential equations offer potentially powerful modelling strategies.

For example, returning to the need for a height function h to be strictly increasing or monotonic, it can be shown that any such function satisfies the differential equation

w(t)Dh(t) + D²h(t) = 0, (1)

where w(t) is some function that is itself unconstrained in any way. This equation transforms the difficult problem of estimating h(t) into the much easier one of estimating w(t), and also plays a role in curve registration, discussed below. See [6] for other examples of differential equation models for functional parameters in statistics.

Finally, given actual functional data, can we estimate a differential equation whose solution will approximate the data closely? The technique of principal differential analysis (PDA) for doing this can be an important part of a functional data analyst's toolbox. See the entry for this topic.
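A consequence of equation (1) is worth seeing numerically: whatever w(t) is, the solution has Dh(t) = C1 exp(−∫₀ᵗ w(u) du), which is positive whenever C1 > 0, so the reconstructed h is strictly increasing. A minimal sketch (NumPy assumed; the particular w, constants, and grid are illustrative, not from the article):

```python
import numpy as np

# Given an arbitrary, unconstrained weight function w(t), build the solution of
#   w(t) Dh(t) + D^2 h(t) = 0          (equation (1))
# via  Dh(t) = C1 * exp(-W(t)),  W(t) = integral of w from 0 to t,
# and check that h is strictly increasing whenever C1 > 0.

def monotone_h(w, t, C0=0.0, C1=1.0):
    """Numerically integrate eq. (1); w maps arrays of t to arrays."""
    # cumulative trapezoid rule for W(t) = ∫ w
    W = np.concatenate([[0.0], np.cumsum(0.5 * (w(t[1:]) + w(t[:-1])) * np.diff(t))])
    Dh = C1 * np.exp(-W)                      # Dh = C1 exp(-W) solves eq. (1)
    h = C0 + np.concatenate([[0.0], np.cumsum(0.5 * (Dh[1:] + Dh[:-1]) * np.diff(t))])
    return h

t = np.linspace(0.0, 18.0, 500)
w = lambda u: np.sin(u) - 0.3 * u             # deliberately wild, unconstrained w
h = monotone_h(w, t, C0=60.0, C1=1.0)
assert np.all(np.diff(h) > 0)                 # strictly increasing, as eq. (1) guarantees
```

The point of the transformation is visible here: w is entirely unconstrained, yet every h produced this way is automatically monotone.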
deal with, is still subject to limits on the energy required to produce it. These issues are of practical importance when we consider hypothesis testing and other inferential issues.

FIRST STEPS IN FUNCTIONAL DATA ANALYSES

Techniques for Representing Functions

The first task in a functional data analysis is to capture the essential features of the data in a function. This we can always do by interpolating the data with a piecewise linear curve; this not only contains the original data within it, but also provides interpolating values between data points. However, some smoothing can be desirable as a means of reducing observational error, and if one or more derivatives are desired, then the fitted curve should also be differentiable at least up to the level desired, and perhaps a few derivatives beyond. Additional constraints may also be required, such as positivity and monotonicity. The smoothing or regularization aspect of fitting can often be deferred, however, by applying it to the functional parameters to be estimated from the data rather than to the data-fitting curve itself.

Functions are traditionally defined by a small number of parameters that seem to capture the main shape features in the data, and ideally are also motivated by substantive knowledge about the process. For the height data, for example, several models have been proposed, and the most satisfactory require eight or more parameters. But when the curve has a lot of detail or high accuracy is required, parametric methods usually break down. In this case two strategies dominate most of the applied work.

One approach is to use a system of basis functions φk(t) in the linear combination

x(t) = Σ(k=1…K) ck φk(t). (2)

The basis system must have enough resolving power so that, with K large enough, any interesting feature in the data can be captured by x. Polynomials, where φk(t) = t^(k−1), have been more or less replaced by spline functions (see article for details) for nonperiodic data, but the Fourier series remains the basis of choice for periodic problems. Both splines and Fourier series offer excellent control over derivative estimation. Wavelet bases are especially important for rendering sharp localized features.

In general, basis function expansions are easy to fit to data and easy to regularize. In the typical application the discrete observations yj are fit by minimizing a penalized least squares criterion such as

Σj [yj − x(tj)]² + λ ∫[D²x(t)]² dt. (3)

When x(t) is defined by (2), solving for the coefficients ck that minimize the criterion implies solving a matrix linear equation. The larger the penalty parameter λ is, the more x(t) will be like a straight line, and the closer λ is to zero, the more nearly x(t) will fit the data. There are a variety of data-dependent methods for finding an appropriate value of λ; see entry ?? for more details. It is also possible to substitute other, more sophisticated differential operators for D², and consequently smooth toward targets other than straight lines; see [3] for examples and details.

For multi-dimensional arguments, the choice of bases becomes more limited. Radial and tensor-product spline bases are often used, but are problematical over regions with complex boundaries or regions where large areas do not contain observations. On the other hand, the triangular mesh and local linear bases associated with finite element methods for solving partial differential equations are extremely flexible, and software tools for manipulating them are widely available. Moreover, many data-fitting problems can be expressed directly as partial differential equation solutions; see [7] and [9] for examples.

The other function-fitting strategy is to use various types of kernel methods (see [4]), which are essentially convolutions of the data sequence with a specified kernel K(s − t). The convolution of an image with a Gaussian kernel is a standard procedure in image analysis, and also corresponds to a solution of the heat or diffusion partial differential equation.
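Criterion (3) can be sketched in a few lines of code. This is an illustrative implementation, not code from the article: it uses a monomial basis for (2), approximates the roughness matrix by a Riemann sum, and shows how λ trades data fit against total curvature.

```python
import numpy as np

# Penalized least squares smoothing, criterion (3), with an illustrative
# monomial basis Phi[:, k] = t^k, k = 0, ..., K-1, on [0, 1].
rng = np.random.default_rng(0)
n = 40
tj = np.sort(rng.uniform(0.0, 1.0, n))
yj = np.sin(2 * np.pi * tj) + 0.1 * rng.standard_normal(n)   # noisy observations

K = 8
k = np.arange(K)
Phi = tj[:, None] ** k                        # n x K basis matrix at the data
grid = np.linspace(0.0, 1.0, 2001)
D2 = k * (k - 1) * grid[:, None] ** np.clip(k - 2, 0, None)  # D² of each basis fn
R = D2.T @ D2 * (grid[1] - grid[0])           # R[j,k] ≈ ∫ D²φj(t) D²φk(t) dt

def fit(lam):
    """Minimize Σ[yj − x(tj)]² + λ ∫[D²x(t)]² dt over the coefficients c."""
    return np.linalg.solve(Phi.T @ Phi + lam * R, Phi.T @ yj)

c_rough, c_smooth = fit(1e-6), fit(1e2)
sse = lambda c: np.sum((yj - Phi @ c) ** 2)   # data-fit term of (3)
curv = lambda c: c @ R @ c                    # total curvature ∫[D²x(t)]² dt

assert sse(c_rough) <= sse(c_smooth)          # small λ tracks the data closely
assert curv(c_smooth) <= curv(c_rough)        # large λ pushes toward a straight line
```

The two assertions express exactly the trade-off described in the text: as λ grows, the curvature measure shrinks and the residual sum of squares grows.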
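The kernel strategy can likewise be sketched briefly. The following is a generic Gaussian-kernel weighted average (a Nadaraya-Watson style smoother), an assumed stand-in for the methods surveyed in [4]; the data and bandwidth are illustrative:

```python
import numpy as np

# Kernel smoothing: the fitted value at t is a convolution-style weighted
# average of the observations, with weights K(s - t) from a Gaussian kernel.
rng = np.random.default_rng(5)
tj = np.sort(rng.uniform(0.0, 1.0, 60))
yj = np.sin(2 * np.pi * tj) + 0.1 * rng.standard_normal(60)

def kernel_smooth(t, tj, yj, h=0.05):
    """Gaussian-kernel weighted average of (tj, yj) at each point of t."""
    K = np.exp(-0.5 * ((t[:, None] - tj[None, :]) / h) ** 2)
    return (K @ yj) / K.sum(axis=1)           # normalized weights at each t

t = np.linspace(0.0, 1.0, 201)
xhat = kernel_smooth(t, tj, yj)

assert xhat.shape == t.shape
assert np.max(np.abs(xhat - np.sin(2 * np.pi * t))) < 0.5   # tracks the signal
```

The bandwidth h plays the role that λ plays for criterion (3): larger h averages over wider windows and gives smoother, flatter estimates.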
Basis function and convolution methods share many aspects, and each can be made to look like the other. They do, however, impose limits on what kind of function behavior can be accommodated, in the sense that they cannot do more than is possible with the bases or kernels that are used. For example, there are situations where sines and cosines are, in a sense, too smooth, being infinitely differentiable, and for this reason are being replaced by wavelet bases in some applications, especially in image compression, where the image combines local sharp features with large expanses of slow variation.

Expressing both functional data and functional parameters as differential equations greatly expands the horizons of function representation. Even rather simple differential equations can define behavior that would be difficult to mimic in any basis system, and references on chaos theory and nonlinear dynamics offer plenty of examples; see [1]. Linear differential equations are already used in fields such as control theory to represent functional data, typically with constant coefficients. Linear equations with nonconstant coefficients are even more flexible tools, as we saw in equation (1) for growth curves, and nonlinear equations are even more so. More generally, functional equations of all sorts will probably emerge as models for functional data.

Registration and Descriptive Statistics

Once the data are represented by samples of functions, the next step might be to display some descriptive statistics. However, if we inspect a sample of curves such as the acceleration estimates in the right panel of Figure 1, we see that the timing of the pubertal growth spurt varies from child to child. The pointwise mean, indicated by the heavy dashed line, is a poor summary of the data; it has less amplitude variation and a longer pubertal growth period than those of any curve. The reason is that the average is taken over children doing different things at any point in time; some are pre-pubertal, some are in the middle, and some are terminating growth.

Registration is the one-to-one transformation of the argument domain in order to align salient curve or image features. That is, clock time is transformed for each child to biological time, so that all children are in the same growth phase with respect to the biological time scale. This transformation of time h(t) is often called a time warping function, and it must, like a growth curve, be both smooth and strictly increasing. As a consequence, the differential equation (1) also defines h(t) and, through the estimation of the weight function w(t), can be used to register a set of curves.

The mean function x̄(t) summarizes the location of a set of curves, possibly after registration. Variation among functions of a single argument is described by covariance and correlation surfaces, v(s, t) and r(s, t), respectively. These are bivariate functions that are the analogs of the corresponding matrices V and R in multivariate statistics. Of course, the simpler pointwise standard deviation curve s(t) may also be of interest.

FUNCTIONAL DATA ANALYSES

FDA involves functional counterparts of multivariate methods such as principal components analysis, canonical correlation analysis, and various linear models. In principal components analysis, for example, the functional counterparts of the eigenvectors of a correlation matrix R are the eigenfunctions of a correlation function r(s, t). Functional linear models use regression coefficient functions β(s) or β(s, t) rather than β1, …, βp. Naive application of standard estimation methods, however, can often result in parameters such as regression coefficient functions and eigenfunctions which are unacceptably rough or complex, even when the data are fairly smooth. Consequently, it can be essential to impose smoothness on the estimate, and this can be achieved through regularization methods, typically involving attaching a term to the fitting criterion that penalizes excessive roughness. Reference [5] offers a number of examples realized in the context of basis function representations of the parameters.

Few methods for hypothesis testing, interval estimation, and other inferential problems in the context of FDA have been developed. Part of the problem is conceptual; what does
it mean to test a hypothesis about an infinite dimensional state of nature on the basis of finite information? On a technical level, although the theory of random functions or stochastic processes is well developed, testing procedures based on a white noise model for ignorable information are both difficult and seem unrealistic.

Much progress has been made in [10], however, in adapting t, F, and other tests to locating regions in multidimensional domains where the data indicate significant departures from a zero mean Gaussian random field. This work points the way to defining inference as a local decision problem rather than a global one.

Principal Components Analysis

Suppose that we have a sample of N curves xi(t), where the curves may have been registered and the mean function has been subtracted from each curve. Let v(s, t) be their covariance function. The goal of principal components analysis (PCA) is to identify an important mode of variation among the curves, and we can formalize this as the search for a weighting function ξ(t), called the eigenfunction, such that the amount of this variation, expressed as

µ = ∫∫ ξ(s)v(s, t)ξ(t) ds dt, (4)

is maximized subject to a normalizing constraint on the size of ξ. As noted above, naively estimated eigenfunctions can be unacceptably rough; smoothing the observed curves in advance would certainly help, but an alternate approach that may be applied to virtually all functional data analyses is to apply a roughness penalty directly to the estimated weight functions or eigenfunctions ξ(t). This is achieved by changing the normalizing condition to ∫{ξ²(t) + λ[D²ξ(t)]²} dt = 1. In this way the smoothing or regularization process, controlled by roughness penalty parameter λ, may be deferred to the step at which we estimate the functional parameter that interests us.

In multivariate work it is common practice to apply a rotation matrix T to principal components, once a suitable number have been selected, in order to aid interpretability. Matrix T is often required to maximize the VARIMAX criterion. The strategy of rotating functional principal components also frequently assists interpretation.

Linear Models

The possibilities for functional linear models are much wider than for the multivariate case. The simplest case arises when the dependent variable y is functional, the covariates are multivariate, and their values are contained in an N by p design matrix Z. Then the linear model is

E[yi(t)] = Σ(j=1…p) zij βj(t). (5)

A functional version of the usual error sum of squares criterion is Σi ∫[yi(t) − ŷi(t)]² dt. When the covariates are themselves functions xij(t), the same structure gives the model

E[yi(t)] = Σj xij(t) βj(t). (7)
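The fixed-covariate model (5) can be estimated by ordinary least squares applied at each value of t. A minimal sketch on simulated data (the design matrix, true coefficient functions, and noise level are all illustrative assumptions, not from the article):

```python
import numpy as np

# Estimating the coefficient functions beta_j(t) in model (5),
#   E[y_i(t)] = sum_j z_ij * beta_j(t),
# by ordinary least squares applied pointwise in t.
rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 101)
N, p = 30, 2
Z = np.column_stack([np.ones(N), rng.integers(0, 2, N)])   # intercept + group code
beta = np.vstack([np.sin(2 * np.pi * t),                   # true beta_1(t)
                  0.5 * np.cos(2 * np.pi * t)])            # true beta_2(t)
Y = Z @ beta + 0.05 * rng.standard_normal((N, t.size))     # N response curves

# Because Z does not vary with t, a single least squares solve against the
# whole curve matrix Y recovers beta_hat(t) at every grid value at once.
beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)

assert beta_hat.shape == (p, t.size)
assert np.max(np.abs(beta_hat - beta)) < 0.2               # close to the truth
```

With functional covariates as in (7), the same pointwise idea applies, but the design matrix itself then changes with t.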
In model (7), yi depends on the xij's only in terms of their simultaneous behavior. But much more generally, a dependent variable function yi can be fit by a single functional covariate xi with the linear model

E[yi(t)] = α(t) + ∫(Ωt) xi(s)β(s, t) ds, (8)

where α(t) is the intercept function, β(s, t) is the bivariate regression function, and Ωt is a domain of integration that can depend on t. The notation and theorems of functional analysis can be used to derive the functional counterpart of a least squares fit of this model. In practice, moreover, we can expect that there will be multiple covariates, and that some will be multivariate and some functional.

No matter what the linear model, we can attach one or more roughness penalties to the error sum of squares criterion being minimized to control the roughness of the regression functions. Indeed, we must exercise this option when the dependent variable is either scalar or finite dimensional and the covariate is functional, because the dimensionality of the covariate is then potentially infinite, and we can almost always find a function β(s) that will provide a perfect fit. In this situation, controlling the roughness of the functional parameter also ensures that the solution is meaningful as well as unique.

Canonical Correlation Analysis

In canonical correlation analysis (CCA), the covariation between two sets of curves xi(s) and yi(t), with covariance functions vXX(s, t) and vYY(s, t), is summarized by a pair of canonical weight functions ξ and η, chosen so that the covariation between the scores ∫ξ(s)xi(s) ds and ∫η(t)yi(t) dt is maximized subject to the two normalizing constraints

∫∫ ξ(s)vXX(s, t)ξ(t) ds dt = 1,
∫∫ η(s)vYY(s, t)η(t) ds dt = 1.

Further matching pairs of canonical weight functions can also be computed by requiring that they be orthogonal to previously computed weight functions, as in PCA.

However, CCA is especially susceptible to picking up high frequency uninteresting variation in curves, and by augmenting the normalization conditions in the same way as for PCA by a roughness penalty, the canonical weight functions and the modes of covariation that they define can be kept interpretable. As in PCA, once a set of interesting modes of covariation have been selected, rotation can aid interpretation.

Principal Differential Analysis

Principal differential analysis (PDA) is a technique for estimating a linear variable-coefficient differential equation from one or more functional observations. That is, functional data are used to estimate an equation of the form

w0(t)x(t) + w1(t)Dx(t) + … + w(m−1)(t)D^(m−1)x(t) + D^m x(t) = f(t). (10)
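As a minimal illustration of the idea (a sketch, not the article's software), the weight functions in a homogeneous second order version of (10) can be estimated by least squares across curves at each t. The simulated curves below satisfy D²x + ω²x = 0, so the estimates should recover w0(t) ≈ ω² and w1(t) ≈ 0:

```python
import numpy as np

# PDA sketch for the homogeneous m = 2 case of (10):
#   w0(t) x(t) + w1(t) Dx(t) + D^2 x(t) = 0.
# At each t, choose w0(t), w1(t) to make the left side as small as possible
# across the sample of curves, i.e. a pointwise least squares regression.
# Derivatives are analytic here; with real data they would come from
# smoothed curve estimates.
rng = np.random.default_rng(3)
omega = 3.0
t = np.linspace(0.0, 2.0, 80)
a, b = rng.standard_normal((2, 10, 1))            # random amplitudes, 10 curves
X = a * np.sin(omega * t) + b * np.cos(omega * t)
DX = omega * (a * np.cos(omega * t) - b * np.sin(omega * t))
D2X = -omega ** 2 * X

w = np.empty((t.size, 2))
for j in range(t.size):
    A = np.column_stack([X[:, j], DX[:, j]])      # N x 2 design at t_j
    w[j] = np.linalg.lstsq(A, -D2X[:, j], rcond=None)[0]

assert np.allclose(w[:, 0], omega ** 2, atol=1e-6)  # w0(t) recovers omega^2
assert np.allclose(w[:, 1], 0.0, atol=1e-6)         # w1(t) recovers 0
```

Once the weight functions are estimated, the fitted differential equation can be solved to obtain smooth approximations of the observed curves, or examined directly as a description of the system's dynamics.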
Acknowledgments
The author’s work was prepared under a grant from the Natural Sciences and Engineering Research Council of Canada.
REFERENCES
J. O. RAMSAY