A Hierarchical Approach to
Motion Analysis and Synthesis
for Articulated Figures
February 18, 2000
Jehee Lee
1 Introduction 1
2 Preliminary 5
4.3 Preliminary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3.1 Displacement Mapping . . . . . . . . . . . . . . . . . . . . . . 33
4.3.2 Multilevel B-spline Approximation . . . . . . . . . . . . . . . 34
4.4 Hierarchical Motion Fitting . . . . . . . . . . . . . . . . . . . . . . . 36
4.4.1 Hierarchical Displacement Mapping . . . . . . . . . . . . . . 36
4.4.2 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.4.3 Motion Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.4.4 Knot Spacing . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4.5 Initial Guesses . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.5 Inverse Kinematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.5.1 A Numerical Approach . . . . . . . . . . . . . . . . . . . . . 41
4.5.2 A Hybrid Approach . . . . . . . . . . . . . . . . . . . . . . . 42
4.5.3 Arm and Leg Postures . . . . . . . . . . . . . . . . . . . . . . 44
4.5.4 Derivation for Equation (4.7) . . . . . . . . . . . . . . . . . . 45
4.6 Joint Limit Specification . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.7 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6 Conclusion 80
6.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Bibliography 85
List of Figures
4.9 Adaptation to environment change . . . . . . . . . . . . . . . . . . . 53
4.10 Adaptation to character change and motion transition . . . . . . . . 54
List of Tables
Chapter 1
Introduction
Crafting animations with a set of available motion clips requires a rich set of spe-
cialized tools such as interactive editing, retargetting, blending, stitching, smooth-
ing, enhancing, up-sampling, down-sampling, compression and so on. Motion clips
are typically short and specific to particular characters, environments, and the context of animation. With those tools, animators combine motion clips into animations of arbitrary length with great variety in character size, environment geometry, and scenario.
Motion data consist of a bundle of motion signals. Each signal represents a
sequence of sampled values that correspond to either the position or orientation of
a body segment. We will denote the position by a 3-dimensional vector and the
orientation by a unit quaternion. It is well-known that unit quaternions can repre-
sent 3-dimensional orientations smoothly and compactly without singularity. Those
signals are sampled at a sequence of discrete time instances at uniform intervals to
form a motion clip that consists of a sequence of frames. In each frame, the sampled
values from the signals determine the configuration of an articulated figure at that
frame.
A great deal of research has been devoted to processing regularly sampled
vector-valued data, for example, images which consist of RGB values sampled on
regular grids. An abundance of signal processing tools in both spatial and frequency domains has been developed for image processing. Unfortunately, it is hard to employ those tools for processing motion data without significant modification, for two reasons. The first reason is due to the orientation components of motion data, such as joint angles and the orientation of the root segment. An orientation in 3-dimensional space cannot be parameterized by a vector in the 3-dimensional space without yielding singularities. The non-singular representations, such as rotation matrices and unit quaternions, form a Lie group, which introduces a complication into signal processing techniques. The second reason is the lack of consideration of articulated structures, which yield kinematic relationships among the motion signals that comprise a motion clip. For example, if the signal that corresponds to each individual joint, such as the shoulder and the elbow, is considered independently, we may obtain undesirable trajectories of end-effectors, that is, the hands of a synthetic character.
A typical process of producing animation from live-captured data includes three
major steps: The first step is to filter the raw input signal received from a motion
capture system to obtain smoother, pleasing motion data. The next is to adapt
the motion data to fit a specific character or a virtual environment that may differ from the live puppeteer or the environment, respectively, in which the motion takes place. Finally, the motion segments thus obtained are combined into a seamless animation. In this thesis, we elaborate fundamental techniques that facilitate such tasks with proper consideration of orientation components and articulated structures. The specific issues addressed in the thesis can be summarized as follows:
closed-form solution to compute the joint angles of a limb linkage. This analytical
method greatly reduces the burden of a numerical optimization to find the solutions
for full degrees of freedom of a human-like articulated figure.
The remainder of the thesis is organized as follows. Chapter 2 gives a brief review of unit quaternions and their relation to orientation representation. Chapter 3 describes a general scheme to design spatial filters applicable to motion data that comprise both translation and rotation components. Chapter 4 describes a hierarchical motion representation by which we can not only manipulate a motion adaptively to satisfy a large set of constraints within a specified error tolerance, but also edit an arbitrary portion of the motion through direct manipulation. Chapter 5 presents an analysis scheme to decompose motion into a hierarchy of detail levels, followed by a synthesis scheme which produces a new motion by hierarchically combining detail coefficients of prescribed example motions. Finally, Chapter 6 concludes the thesis and describes directions for future work.
Chapter 2
Preliminary
Quaternion Basics
The four-dimensional space of quaternions is spanned by a real axis and three orthogonal imaginary axes, denoted by î, ĵ, and k̂, which obey Hamilton's rules: î² = ĵ² = k̂² = î ĵ k̂ = −1.
The product of two quaternions q1 = (w1, v1) and q2 = (w2, v2) can be written in several forms:

q1 q2 = w1 w2 − x1 x2 − y1 y2 − z1 z2 +
        (x1 w2 + w1 x2 + y1 z2 − z1 y2)î +
        (y1 w2 + z1 x2 + w1 y2 − x1 z2)ĵ +
        (z1 w2 − y1 x2 + x1 y2 + w1 z2)k̂,

and equivalently

q1 q2 = (w1, v1)(w2, v2)
      = (w1 w2 − v1 · v2, w1 v2 + w2 v1 + v1 × v2).    (2.3)
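The product of Equation (2.3) translates directly into code. A minimal sketch in Python (the helper name `qmul` and the (w, x, y, z) storage order are our own conventions, not the thesis's):

```python
import numpy as np

def qmul(q1, q2):
    # Quaternion product, Equation (2.3):
    # q1 q2 = (w1 w2 - v1 . v2, w1 v2 + w2 v1 + v1 x v2)
    w1, v1 = q1[0], np.asarray(q1[1:], dtype=float)
    w2, v2 = q2[0], np.asarray(q2[1:], dtype=float)
    return np.concatenate(([w1 * w2 - v1 @ v2],
                           w1 * v2 + w2 * v1 + np.cross(v1, v2)))

# Hamilton's rule i j = k as a sanity check:
print(qmul([0, 1, 0, 0], [0, 0, 1, 0]))  # -> [0. 0. 0. 1.]
```

The vector form avoids expanding the sixteen scalar products by hand and makes the non-commutativity (the cross-product term) explicit.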
[Figure: the exponential map exp(v) and the logarithm log(q) relate S3 to its tangent space T1 S3 ≡ R3]
real axis and thus it must be a purely imaginary quaternion, which corresponds to
a vector in R3 . In physics terminology, the purely imaginary tangent is identical to
the angular velocity ω(t) ∈ R3 of q(t):
ω(t) = 2 q(t)⁻¹ q̇(t),    (2.5)
which is measured in the local coordinate frame specified by q(t). Here, the direction of ω(t) gives the instantaneous axis of rotation, and its magnitude ‖ω(t)‖ is the rate of change of the rotation angle about that axis. Since the unit quaternion space is folded by the
antipodal equivalence, the angular velocity measured in S3 is twice as fast as the
angular velocity measured in SO(3). The constant factor 2 in Equation (2.5) keeps
consistency between the unit quaternion space and the rotation space.
One of the main connections between vectors and unit quaternions is the exponential
mapping. Quaternion exponentiation is defined in the standard way as:
exp(q) = 1 + q/1! + q²/2! + · · · + qⁿ/n! + · · ·    (2.6)
If the real part of q is zero, then exponential mapping gives a unit quaternion which
can be expressed in a closed-form.
exp(q) = exp(0, v) = (cos ‖v‖, (sin ‖v‖/‖v‖) v) ∈ S3.    (2.7)
For simplicity, we often denote exp(0, v) as exp(v). This map is onto but not one-to-one. To define its inverse function, we limit the domain such that ‖v‖ < π. Then, the exponential map becomes one-to-one and thus its inverse map log : S3\(−1, 0, 0, 0) → R3 is well-defined.
log(q) = log(w, v) =
    (π/2) (v/‖v‖),            if w = 0,
    tan⁻¹(‖v‖/w) (v/‖v‖),     if 0 < |w| < 1,    (2.8)
    0,                         if w = 1.
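Equations (2.7) and (2.8) admit a direct implementation. The sketch below (our own helper names, quaternions stored as (w, x, y, z) arrays) uses arctan2 to cover both branches of Equation (2.8):

```python
import numpy as np

def quat_exp(v):
    # Exponential map R^3 -> S^3, Equation (2.7):
    # exp(v) = (cos||v||, (sin||v||/||v||) v)
    v = np.asarray(v, dtype=float)
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])  # exp(0) is the identity
    return np.concatenate(([np.cos(theta)], (np.sin(theta) / theta) * v))

def quat_log(q):
    # Logarithmic map S^3 \ {(-1,0,0,0)} -> R^3, Equation (2.8);
    # arctan2(||v||, w) reduces to tan^-1(||v||/w) for w > 0 and to pi/2 for w = 0.
    w, v = q[0], np.asarray(q[1:], dtype=float)
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)  # log(1) = 0
    return (np.arctan2(n, w) / n) * v

# Round trip: log(exp(v)) = v for ||v|| < pi.
print(quat_log(quat_exp([0.3, -0.1, 0.2])))  # approximately [0.3, -0.1, 0.2]
```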
Geodesics
The path traversed by the object gives the shortest path on S3 between q1 and q2, and is called a geodesic of S3. In that sense, the geodesic norm
dist(q1, q2) = ‖log(q1⁻¹ q2)‖    (2.9)
[Figure 2.2: the geodesic curve from q1 to q2 on S3, with v = log(q1⁻¹ q2) and slerpt(q1, q2) = q1 exp(tv)]
is a natural distance metric in the unit quaternion space. This metric is bi-invariant, that is,

dist(a q1 b, a q2 b) = dist(q1, q2)

for any a, b ∈ S3.
The slerp (spherical linear interpolation) introduced by Shoemake [88] param-
eterizes the points on a geodesic curve to compute the in-betweens of two given
orientations q1 and q2 (see Figure 2.2). That is,

slerpt(q1, q2) = q1 exp(t log(q1⁻¹ q2)),    0 ≤ t ≤ 1.
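The parameterization shown in Figure 2.2 can be sketched as a small routine; the block repeats the product/exp/log helpers so it stands alone (all names are our own):

```python
import numpy as np

def qmul(a, b):
    # Quaternion product (w, x, y, z order), Equation (2.3).
    w1, v1 = a[0], np.asarray(a[1:], dtype=float)
    w2, v2 = b[0], np.asarray(b[1:], dtype=float)
    return np.concatenate(([w1 * w2 - v1 @ v2],
                           w1 * v2 + w2 * v1 + np.cross(v1, v2)))

def qinv(q):
    # The inverse of a unit quaternion is its conjugate.
    return np.array([q[0], -q[1], -q[2], -q[3]], dtype=float)

def qexp(v):
    v = np.asarray(v, dtype=float)
    t = np.linalg.norm(v)
    if t < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(t)], (np.sin(t) / t) * v))

def qlog(q):
    v = np.asarray(q[1:], dtype=float)
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)
    return (np.arctan2(n, q[0]) / n) * v

def slerp(q1, q2, t):
    # slerp_t(q1, q2) = q1 exp(t log(q1^-1 q2)):
    # move a fraction t along the geodesic from q1 toward q2.
    return qmul(q1, qexp(t * qlog(qmul(qinv(q1), q2))))
```

At t = 0.5 this yields the geodesic midpoint of the two orientations; t = 0 and t = 1 recover q1 and q2 exactly.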
Chapter 3
General Construction of
Spatial Filters for
Orientation Data
Efforts have been increasing to develop signal processing tools for filtering digitized
motion data, which comprise both translation and rotation components. A great deal
of research has been devoted to processing translation data, whereas the research
on orientation data is now emerging. In this chapter, we present a general scheme
of constructing spatial filters that perform smoothing and sharpening on orientation
data.
3.1 Motivation
Spatial filtering (as opposed to frequency domain filtering) has a variety of utilities
in digital signal processing including smoothing, sharpening, predicting, warping,
and so on [34, 46, 47]. Given a vector-valued signal (· · · , pi−1 , pi , pi+1 , · · · ) and a
spatial mask (a−k , · · · , a0 , · · · , ak ), the basic approach of spatial filtering is to sum
the products between the mask coefficients and the sample values under the mask
at a specific position on the signal. The i-th filter response is

F(pi) = Σ_{m=−k}^{k} am pi+m.    (3.1)
This type of filtering is very popular for vector signals. However, if such a mask
is applied to a unit quaternion signal, then the response of the mask will not, in
general, be a quaternion of unit length, because the unit quaternion space is not
closed under addition. Azuma and Bishop had a similar problem in applying a
Kalman filter to unit quaternion signals [1]. They simply normalize filter responses,
which causes undesirable side effects such as singularity and unexpected distortion.
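For vector signals, the masking of Equation (3.1) is just a windowed weighted sum; a minimal sketch (the function name and example mask are ours):

```python
import numpy as np

def apply_mask(P, mask):
    # F(p_i) = sum_{m=-k}^{k} a_m p_{i+m}: weighted sum of the samples
    # under the mask. Only interior positions are produced here; boundary
    # handling is a separate concern (cf. Section 3.5).
    k = len(mask) // 2
    P = np.asarray(P, dtype=float)
    return np.array([sum(a * P[i + m] for m, a in zip(range(-k, k + 1), mask))
                     for i in range(k, len(P) - k)])

# An affine-invariant smoothing mask: the coefficients sum to one.
signal = np.array([[0.0], [1.0], [4.0], [9.0], [16.0]])
print(apply_mask(signal, [0.25, 0.5, 0.25]))
```

Applying the same code to unit quaternion samples would break exactly as the text describes: the weighted sum leaves S3, so the responses would need to be renormalized.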
There have been efforts to develop digital signal processing techniques for ori-
entation signals. Lee and Shin [64] suggested smoothing operators derived from a
series of fairness functionals defined on orientation data. Such an operator can be
applied to orientation data for incrementally constructing a smooth angular motion.
Fang et al. [23] applied a low-pass filter to the estimated angular velocity of an input
signal to reconstruct a smooth angular motion by integrating the filter responses.
Hsieh et al. [43, 54] formulated the problem of smoothing orientation data as a non-linear optimization whose objective function is defined in terms of the squared magnitude of angular acceleration. They modified the traditional gradient-descent method to enforce the unitariness of a quaternion signal during optimization.
All those techniques successfully exclude brute-force normalization. Their com-
mon idea is to employ exponential and logarithmic maps that have been used for
handling orientation data in recent literature [16, 40, 55, 88]. The exponential and logarithmic maps provide a natural, non-singular parameterization for "small" angular
displacements. This parameterization allows us to draw analogies between quater-
nion algebra and its vector counterpart. For example, the slerp (spherical linear interpolation), which gives an intermediate point between two unit quaternion points, is an analogue of the linear interpolation between two vector points [88].
There are a variety of possible schemes to draw a quaternion analogue of Equa-
tion (3.1). Many of the variations suffer from lack of crucial properties of spatial
filters such as shift-invariance and symmetry, or from limited applicability. Smooth-
ing operators presented by Lee and Shin [64] are shift-invariant but not symmetric.
The scheme of Fang et al. [23] is neither shift-invariant nor symmetric. The numerical optimization technique of Hsieh et al. [43, 54] provides a special filter for smoothing orientation data. However, this idea is not based on spatial masking and thus can hardly be generalized to other types of spatial masking.
Our goal is to find a better analogue of spatial masking given in Equation (3.1),
which satisfies some desirable properties, i.e., coordinate-invariance, shift-invariance,
and symmetry. The basic idea is to convert the orientation data into their analogies
in a vector space to apply a spatial mask, and then to bring the result back to
the orientation space. We do not focus on designing a specific filter but propose
a general scheme applicable to a large class of spatial filters. Our interest lies in
affine-invariant spatial masks, whose coefficients sum to one, due to their wide applicability.
In the following section, we give a brief introduction to spatial masking and ori-
entation representation. In Section 3.3, we present an orientation filtering scheme in
detail. Some examples of orientation filters are derived in Section 3.4. In Section 3.5,
we discuss detailed implementation issues and illustrate relevant experimental results. In Section 3.6, we compare our filter design scheme with others. Finally, we conclude this chapter in Section 3.7.
holds for any given scalar values a and b, and vector-valued signals pi, p′i ∈ Rd. The filter is shift-invariant if its response does not depend on the position in the signal.
We can formulate this property more elegantly by introducing the shift operator S^l that translates the signal in the time domain by l steps:

F ◦ S^l = S^l ◦ F.    (3.3)

The filter F is symmetric if it commutes with the time-reversal operator R:

F ◦ R = R ◦ F,    (3.4)

where R(pi) = p−i. Since asymmetric filters could shift the moments of a signal, symmetric filters are preferred in many cases [34, 46, 47].
The difficulty of filtering unit quaternion data stems from the non-linear nature
of the unit quaternion space. Since the unit quaternion space is not closed under
addition and scalar multiplication, the weighted sum of unit quaternion points is
generally not a quaternion of unit length. This implies that the masking operation
given in Equation (3.1) may not be effective for unit quaternion data.
The exponential and logarithmic maps provide a clue to address this problem.
Let {qi ∈ S3 |i ≥ 0} be a unit quaternion signal that forms a piecewise slerp curve on
S3 . Through a simple derivation, each point qi can be represented as a cumulation
Figure 3.1: The transform between an angular signal in S3 and a linear signal in R3
qi = (q0 q0⁻¹)(q1 q1⁻¹) · · · (qi−1 qi−1⁻¹) qi
   = q0 (q0⁻¹ q1)(q1⁻¹ q2) · · · (qi−1⁻¹ qi)
   = q0 ∏_{j=0}^{i−1} exp(ωj),    (3.5)

where ωj = log(qj⁻¹ qj+1) ∈ R3. Note that the angular displacement between two
Then, P and Q can be transformed to each other as follows (see Figure 3.1):

pi = p0 + Σ_{j=0}^{i−1} log(qj⁻¹ qj+1), and    (3.6)

qi = q0 ∏_{j=0}^{i−1} exp(pj+1 − pj).    (3.7)
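Equations (3.6) and (3.7) are cumulative sums and products of the per-step displacements; a self-contained sketch (helper names ours, (w, x, y, z) storage):

```python
import numpy as np

def qmul(a, b):
    w1, v1 = a[0], np.asarray(a[1:], dtype=float)
    w2, v2 = b[0], np.asarray(b[1:], dtype=float)
    return np.concatenate(([w1 * w2 - v1 @ v2],
                           w1 * v2 + w2 * v1 + np.cross(v1, v2)))

def qinv(q):
    return np.array([q[0], -q[1], -q[2], -q[3]], dtype=float)

def qexp(v):
    v = np.asarray(v, dtype=float)
    t = np.linalg.norm(v)
    if t < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(t)], (np.sin(t) / t) * v))

def qlog(q):
    v = np.asarray(q[1:], dtype=float)
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)
    return (np.arctan2(n, q[0]) / n) * v

def to_vector_signal(Q, p0=np.zeros(3)):
    # Equation (3.6): p_i = p_0 + sum_{j<i} log(q_j^-1 q_{j+1})
    omega = [qlog(qmul(qinv(Q[j]), Q[j + 1])) for j in range(len(Q) - 1)]
    return np.vstack([p0] + list(p0 + np.cumsum(omega, axis=0)))

def to_quaternion_signal(P, q0):
    # Equation (3.7): q_i = q_0 * prod_{j<i} exp(p_{j+1} - p_j)
    Q = [np.asarray(q0, dtype=float)]
    for j in range(len(P) - 1):
        Q.append(qmul(Q[-1], qexp(np.asarray(P[j + 1]) - np.asarray(P[j]))))
    return Q
```

The two functions are mutual inverses (given p0 and q0), which is exactly the correspondence Figure 3.1 depicts.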
Now consider a spatial filter F given by a mask (a−k, · · · , a0, · · · , ak) as in Equation (3.1), where Σ_{m=−k}^{k} am = 1. If we apply F to the vector signal P, then each point is displaced by F(pi) − pi. The key idea of our approach is to exploit the one-to-one correspondence between linear displacements (or linear velocity) and angular displacements (or angular velocity) to construct a meaningful analogue of F in the unit quaternion space. We define the corresponding orientation filter H in such a way that it yields the angular displacement log(qi⁻¹ H(qi)) which equals the linear displacement F(pi) − pi, that is,

H(qi) = qi exp(F(pi) − pi).
The unitariness of filter responses is guaranteed, since the unit quaternion space is
closed under the quaternion multiplication. Conceptually, our filtering scheme is
to transform the input orientation signal Q to its analogue P in a vector space, to
apply a mask to P in a normal way, and finally to generate a filter response through
inverse transformation (See Figure 3.2).
The notion of local support is important for designing a filter that corresponds
to a mask of finite size. Since we are dealing with a sequence of displacements, the
filter with an infinite support may cause a large discrepancy at the end of the signal
[Figure 3.2: the orientation filter response H(qi) = qi exp(F(pi) − pi) corresponds to the vector-space response F(pi) through the displacement F(pi) − pi]
even for a small deviation at each time instance. Letting ∆p^i_m = pi+m − pi,

H(qi) = qi exp((Σ_{m=−k}^{k} am pi+m) − pi)
      = qi exp(Σ_{m=−k}^{k} am (pi+m − pi))
      = qi exp(Σ_{m=−k}^{k} am ∆p^i_m).    (3.10)
Clearly, H(qi ) is locally supported by the neighboring points (qi−k , · · · , qi , · · · , qi+k ).
The size of support of H is identical to that of F. One interesting observation is
that the explicit evaluation of pi and F(pi ) is not actually needed for computing
H(qi ) although we originally define H in terms of them.
Letting ωi = log(qi⁻¹ qi+1), we can further simplify Equation (3.10):

H(qi) = qi exp( a1 ωi + a2 (ωi + ωi+1) + · · · + ak (ωi + · · · + ωi+k−1)
        − a−1 ωi−1 − a−2 (ωi−1 + ωi−2) − · · · − a−k (ωi−1 + · · · + ωi−k) )
      = qi exp( (a1 + a2 + · · · + ak) ωi + (a2 + a3 + · · · + ak) ωi+1 + · · · + ak ωi+k−1
        − (a−1 + · · · + a−k) ωi−1 − (a−2 + · · · + a−k) ωi−2 − · · · − a−k ωi−k )
      = qi exp( Σ_{m=−k}^{k−1} bm ωi+m ) = qi exp( Σ_{m=−k}^{k−1} bm log(qi+m⁻¹ qi+m+1) ),    (3.12)
where

bm = Σ_{j=m+1}^{k} aj,      if 0 ≤ m ≤ k − 1,
bm = Σ_{j=−k}^{m} (−aj),    if −k ≤ m < 0.    (3.13)
Equation (3.12) will be used in the next section for proving crucial properties of the
filter.
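Equations (3.12) and (3.13) lead to a compact implementation: derive the bm once from the mask, then apply them to the local angular displacements ωi. A self-contained sketch (names ours; interior frames only):

```python
import numpy as np

def qmul(a, b):
    w1, v1 = a[0], np.asarray(a[1:], dtype=float)
    w2, v2 = b[0], np.asarray(b[1:], dtype=float)
    return np.concatenate(([w1 * w2 - v1 @ v2],
                           w1 * v2 + w2 * v1 + np.cross(v1, v2)))

def qinv(q):
    return np.array([q[0], -q[1], -q[2], -q[3]], dtype=float)

def qexp(v):
    v = np.asarray(v, dtype=float)
    t = np.linalg.norm(v)
    if t < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(t)], (np.sin(t) / t) * v))

def qlog(q):
    v = np.asarray(q[1:], dtype=float)
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)
    return (np.arctan2(n, q[0]) / n) * v

def b_coeffs(mask):
    # Equation (3.13): b_m = sum_{j=m+1}^{k} a_j     for 0 <= m <= k-1,
    #                  b_m = sum_{j=-k}^{m} (-a_j)   for -k <= m < 0.
    k = len(mask) // 2
    a = dict(zip(range(-k, k + 1), mask))
    return {m: (sum(a[j] for j in range(m + 1, k + 1)) if m >= 0
                else -sum(a[j] for j in range(-k, m + 1)))
            for m in range(-k, k)}

def orientation_filter(Q, mask):
    # Equation (3.12): H(q_i) = q_i exp(sum_m b_m log(q_{i+m}^-1 q_{i+m+1}))
    k = len(mask) // 2
    b = b_coeffs(mask)
    omega = [qlog(qmul(qinv(Q[j]), Q[j + 1])) for j in range(len(Q) - 1)]
    return [qmul(Q[i], qexp(sum(b[m] * omega[i + m] for m in range(-k, k))))
            for i in range(k, len(Q) - k)]
```

For a symmetric mask, Σ bm = 0, so a geodesic signal (constant angular velocity) is a fixed point of the filter.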
As mentioned earlier, the exponential map exp(v) is defined for all v ∈ R3 but its inverse, that is, the logarithm is not well-defined at −I = (−1, 0, 0, 0). To evaluate ωi reliably, we need an assumption that the angle between any pair of consecutive points is smaller than π/2, that is, ‖log(qi⁻¹ qi+1)‖ < π/2 for all i. An angle of π/2 in the unit quaternion space is equivalent to π in the orientation space, and thus our assumption is reasonable in practice.
The orientation filter H inherits important properties from spatial masking, since the angular displacement Σ_{m=−k}^{k} am ∆p^i_m caused by H is represented as a masking operation on a vector signal. The first property we will prove in this section is
coordinate-invariance. Due to this property, our filter gives the same results inde-
pendent of the coordinate system in which the orientation data are represented.
Proposition 1 H is coordinate-invariant, that is, H(a qi b) = a H(qi) b for any a, b ∈ S3.
Proof: The first step of the proof is to show that exp(b⁻¹ v b) = b⁻¹ exp(v) b for any v ∈ R3 and b ∈ S3. Since ‖v‖ = ‖b⁻¹ v b‖,

exp(b⁻¹ v b) = (cos ‖v‖, (sin ‖v‖/‖v‖) b⁻¹ v b)
             = (cos ‖v‖, 0) + (sin ‖v‖/‖v‖)(0, b⁻¹ v b)
             = b⁻¹ (cos ‖v‖, 0) b + (sin ‖v‖/‖v‖) b⁻¹ (0, v) b
             = b⁻¹ (cos ‖v‖, (sin ‖v‖/‖v‖) v) b
             = b⁻¹ exp(v) b.
Similarly, we can show that log(b⁻¹ q b) = b⁻¹ log(q) b for any q and b ∈ S3. Then, we have

H(a qi b) = a qi b exp( Σ_{m=−k}^{k−1} bm log(b⁻¹ qi+m⁻¹ a⁻¹ a qi+m+1 b) )
          = a qi b exp( b⁻¹ ( Σ_{m=−k}^{k−1} bm log(qi+m⁻¹ qi+m+1) ) b )
          = a qi exp( Σ_{m=−k}^{k−1} bm log(qi+m⁻¹ qi+m+1) ) b
          = a H(qi) b.
Since the support of H is finite, we can show that H is shift-invariant.
Proposition 2 H is shift-invariant.
Proof: Using Equation (3.12), we show that H commutes with S^l for any l:

S^l ◦ H(qi) = qi−l exp( Σ_{m=−k}^{k−1} bm log(q(i+m)−l⁻¹ q(i+m+1)−l) )
            = qi−l exp( Σ_{m=−k}^{k−1} bm log(q(i−l)+m⁻¹ q(i−l)+m+1) )
            = H(qi−l) = H ◦ S^l(qi).
Now, we show that H is symmetric for any given symmetric coefficients.
Proof: We will be done if we show that H commutes with R. We first expand R ◦ H using Equation (3.12):

R ◦ H(qi) = q−i exp( Σ_{m=−k}^{k−1} bm log(q−i−m⁻¹ q−i−m−1) )
          = q−i exp( Σ_{m=−k}^{k−1} bm log((q−i−m−1⁻¹ q−i−m)⁻¹) )
          = q−i exp( Σ_{m=−k}^{k−1} −bm log(q−i−m−1⁻¹ q−i−m) ).

Substituting n = −m − 1 and using bn = −b−n−1 for symmetric coefficients,

R ◦ H(qi) = q−i exp( Σ_{n=−k}^{k−1} −b−n−1 log(q−i+n⁻¹ q−i+n+1) )
          = q−i exp( Σ_{n=−k}^{k−1} bn log(q−i+n⁻¹ q−i+n+1) )
          = H(q−i) = H ◦ R(qi).
3.4 Examples
In this section, we provide some examples of orientation filters that correspond to
popular spatial filters such as smoothing, blurring and sharpening filter masks.
Smoothing: Our first example is a smoothing filter mask that is of practical use in signal processing [59, 60]. The smoothness measure ∫ ‖p″(t)‖² dt is minimized if the corresponding Euler-Lagrange equation p⁗(t) = 0 holds [28]. The discrete version of the Euler-Lagrange equation, ∆⁴pi = 0, is obtained by replacing differential operators with forward divided difference operators. It is well-known that an iterative scheme using a local update rule

pi ← pi − λ ∆⁴pi    (3.14)

gradually adjusts the data points to approach the optimal solution [59, 60]. Here,
λ is a damping factor that controls the rate of convergence. This update rule yields an affine-invariant spatial mask (−λ/16, 4λ/16, (16−6λ)/16, 4λ/16, −λ/16) and the corresponding orientation filter

HS(qi) = qi exp( (−λ/16) ∆p^i_{−2} + (4λ/16) ∆p^i_{−1} + ((16−6λ)/16) ∆p^i_0 + (4λ/16) ∆p^i_1 + (−λ/16) ∆p^i_2 )
       = qi exp( (λ/16)(−(−ωi−2 − ωi−1) + 4(−ωi−1) + 4ωi − (ωi + ωi+1)) )
       = qi exp( (λ/16)(ωi−2 − 3ωi−1 + 3ωi − ωi+1) ).    (3.15)
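The closed form in Equation (3.15) needs only the neighboring angular displacements; one smoothing pass over the interior frames can be sketched as follows (helper names ours):

```python
import numpy as np

def qmul(a, b):
    w1, v1 = a[0], np.asarray(a[1:], dtype=float)
    w2, v2 = b[0], np.asarray(b[1:], dtype=float)
    return np.concatenate(([w1 * w2 - v1 @ v2],
                           w1 * v2 + w2 * v1 + np.cross(v1, v2)))

def qinv(q):
    return np.array([q[0], -q[1], -q[2], -q[3]], dtype=float)

def qexp(v):
    v = np.asarray(v, dtype=float)
    t = np.linalg.norm(v)
    if t < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(t)], (np.sin(t) / t) * v))

def qlog(q):
    v = np.asarray(q[1:], dtype=float)
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)
    return (np.arctan2(n, q[0]) / n) * v

def smooth_pass(Q, lam=1.0):
    # One pass of H_S, Equation (3.15):
    # H_S(q_i) = q_i exp((lam/16)(w_{i-2} - 3 w_{i-1} + 3 w_i - w_{i+1}))
    w = [qlog(qmul(qinv(Q[j]), Q[j + 1])) for j in range(len(Q) - 1)]
    out = [np.asarray(q, dtype=float) for q in Q]
    for i in range(2, len(Q) - 2):
        d = (lam / 16.0) * (w[i - 2] - 3 * w[i - 1] + 3 * w[i] - w[i + 1])
        out[i] = qmul(Q[i], qexp(d))
    return out
```

Repeated passes progressively remove high-frequency noise, as in the incremental refinement of Figure 3.3; boundary frames are left untouched in this sketch.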
There is a strong analogy between an orientation signal Q and its vector counterpart P in the sense that the estimated velocity and acceleration of the linear motion represented by P are identical to the estimated angular velocity and acceleration, respectively, of the angular motion represented by Q. Accordingly, if a given filter mask is able to minimize Σi ‖p″(ti)‖², then its orientation counterpart HS can be expected to minimize the corresponding measure Σi ‖ω′(ti)‖² to give a fair angular motion. Here, the angular acceleration at a discrete time instance is estimated as
follows [43, 64]:

ω′(ti) = (log(qi⁻¹ qi+1) − log(qi−1⁻¹ qi)) / h²,    (3.16)

where h is the sampling interval.
Blurring: A more popular class of filters are derived from the binomial distribu-
tion [46]. Binomial coefficients give a low-pass filter that suppresses Gaussian noise
and blurs the details of the signal. The coefficients of an odd-sized (2k + 1) binomial
mask can be written:

B^{2k+1}_i = (1/2^{2k}) · (2k)! / ((k−i)! (k+i)!),    −k ≤ i ≤ k.    (3.17)

For example, substituting the coefficients of the mask B⁵ = (1/16, 4/16, 6/16, 4/16, 1/16) into Equation (3.12) gives the orientation filter

HB(qi) = qi exp( (1/16)(−ωi−2 − 5ωi−1 + 5ωi + ωi+1) ).    (3.18)
If the binomial mask B⁵ is used for blurring the original signal, then the coefficients (−λ/16, −4λ/16, (16+10λ)/16, −4λ/16, −λ/16) for sharpening are obtained. By substituting the coefficients in Equation (3.10), we have the orientation filter

HU(qi) = qi exp( (λ/16)(ωi−2 + 5ωi−1 − 5ωi − ωi+1) ).    (3.19)
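The mask coefficients in this section are easy to generate programmatically; a sketch (function names ours) that reproduces B⁵ and the sharpening coefficients above:

```python
from math import comb

def binomial_mask(k):
    # Equation (3.17): B_i^{2k+1} = (1/2^{2k}) (2k)! / ((k-i)! (k+i)!)
    return [comb(2 * k, k + i) / 4**k for i in range(-k, k + 1)]

def sharpening_mask(k, lam):
    # p + lam (p - B p): boost what the binomial blur removes.
    b = binomial_mask(k)
    return [(1 + lam) * (i == k) - lam * c for i, c in enumerate(b)]

print(binomial_mask(2))        # B^5 -> [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sharpening_mask(2, 1.0)) # (-lam/16, -4lam/16, (16+10lam)/16, -4lam/16, -lam/16)
```

Both masks sum to one, so they remain affine-invariant and can be fed to the orientation filter of Equation (3.12) unchanged.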
3.5 Experiments
Sampling Rate: Yet another ambiguity would stem from the slow sampling rate of
a motion capture system. Consider an object which spins at the rate of 2π times the
sampling rate. Then, motion data obtained from the object may be indistinguishable
from the one captured from a stationary object. In practice, the sampling rate is
fast enough not to cause such a problem and thus the angle between two consecutive
orientations in a signal is sufficiently small. In our experiments, we assume that each
subsequence (qi−k , · · · , qi , · · · , qi+k ) of an input signal is inside an open half-sphere
whose center is at qi for a given spatial mask of size (2k + 1). With this assumption,
the geodesic distance from qi to any of its neighboring points under the mask is less than π/2.
Boundary Conditions: In general, the input signal is neither infinite nor peri-
odic. The signal has boundary points, and the left boundary seldom has anything
to do with the right boundary. A periodic extension can be expected to have a dis-
continuity. The natural way to avoid this discontinuity is to reflect the signal at its
endpoints to seamlessly extend the signal [90]. Let (q0 , · · · , qn ) be a unit quaternion
signal and ωi = log(qi⁻¹ qi+1), 0 ≤ i < n, be the angular displacements of the signal.
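In the vector picture the reflection is p−i = 2p0 − pi, which in displacement terms means ω−i−1 = ωi; the left-end extension can then be built by walking backwards with negated displacements. A sketch of this reading (our own formulation; the thesis's precise construction may differ):

```python
import numpy as np

def qmul(a, b):
    w1, v1 = a[0], np.asarray(a[1:], dtype=float)
    w2, v2 = b[0], np.asarray(b[1:], dtype=float)
    return np.concatenate(([w1 * w2 - v1 @ v2],
                           w1 * v2 + w2 * v1 + np.cross(v1, v2)))

def qinv(q):
    return np.array([q[0], -q[1], -q[2], -q[3]], dtype=float)

def qexp(v):
    v = np.asarray(v, dtype=float)
    t = np.linalg.norm(v)
    if t < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(t)], (np.sin(t) / t) * v))

def qlog(q):
    v = np.asarray(q[1:], dtype=float)
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)
    return (np.arctan2(n, q[0]) / n) * v

def reflect_left(Q, k):
    # Prepend k samples reflected about q_0:
    # q_{-1} = q_0 exp(-w_0), q_{-2} = q_0 exp(-w_0) exp(-w_1), ...
    w = [qlog(qmul(qinv(Q[j]), Q[j + 1])) for j in range(k)]
    ext, cur = [], np.asarray(Q[0], dtype=float)
    for j in range(k):
        cur = qmul(cur, qexp(-w[j]))
        ext.append(cur)
    return ext[::-1] + [np.asarray(q, dtype=float) for q in Q]
```

A geodesic signal is extended to the continued geodesic, so a symmetric filter sees no artificial discontinuity at the boundary.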
As shown in Figure 3.3, we first apply our orientation filter to synthetic motion
data to visualize the effect of the filtering. The initial orientation data (top left) are uniformly sampled from a unit quaternion spline curve and perturbed with noise by multiplying each sample qi with the exponential exp(δi) of a randomly generated 3D vector δi, ‖δi‖ < 0.2. This moves each sampled point slightly in a random direction, up to 0.2 radians. We apply the smoothing filter HS to the initial motion once
(top right), twice (bottom left), and ten times (bottom right) to illustrate a series
of incrementally refined motions. The smoothing effect is clearly observed along the
trajectories of the tips of bird wings.
We also apply the filter to captured motion data. Our motion capture system
(MotionStar, Ascension Technology) consists of a magnetic field transmitter and 14
trackers each of which is attached to a link of a puppeteer and detects both the
position and orientation of the link measured in the global coordinate system. As
shown in Figure 3.4, we capture a live athletic stretching motion. This motion is
sampled at the rate of 30 frames per second. In particular, we concentrate on the
orientation signal for the left shoulder. In Figure 3.5(a), the x-axis represents the frame numbers of the signal and the y-axis the magnitude of each component of unit
quaternions. The magnitude of angular acceleration is plotted to show the noise in
the captured signal (See Figure 3.5(b)).
In Figures 3.5(c) and (e), we use smoothing filters HS and HB , respectively, to
reduce the noise in the signal. Each filter is applied to the signal five times. The effect
of smoothing is clearly shown in the corresponding magnitude plots of angular accel-
eration (See Figures 3.5(d) and (f)). On the contrary, the high-frequency boosting
filter HU enhances the high-frequency components of the signal and thus the esti-
mated angular acceleration vectors are magnified as expected (See Figures 3.5(g)
and (h)).
3.6 Discussion
As mentioned earlier, there are many variations in drawing a quaternion analogue
of spatial masking. In this section, we compare our filter design scheme with others.
Global vs. Local Parameterization: The naive use of the exponential and
logarithmic maps could lead to a troublesome orientation filter
each pair of consecutive points in the signal. The local parameterization enables us
to design the coordinate-invariant orientation filters which are singularity-free for
sufficiently dense samples.
Symmetric vs. Asymmetric Transform: Fang et al. [23] also employed the idea of local parameterization to design an orientation filter H̃, in which a corresponding spatial filter F is directly applied to a sequence of angular displacements as follows:
H̃(qi) = q0 ∏_{j=0}^{i−1} exp(F(ωj)),    (3.22)

where ωj = log(qj⁻¹ qj+1). However, this filter has some drawbacks. To illustrate
this, consider the support of the filter. Since the responses of F are explicitly
cumulated to produce the response of H̃, the value of H̃(qi ) is influenced by the
non-local, asymmetric neighbors qj for 0 ≤ j ≤ i + k, and thus the filter H̃ may
yield a larger deviation at the end of the signal. Instead, in our orientation filter

H(qi) = qi exp(F(pi) − pi) = qi exp( Σ_{m=−k}^{k} am ∆p^i_m ),    (3.23)
we consider the displacement, (F(pi )−pi ), gained by a given filter. Unlike the direct
filter response F(pi ), this displacement is computed from a local neighborhood of
qi without explicitly evaluating pi and F(pi ). Therefore, the deviation at a frame
is not propagated to other frames and thus we do not have any exaggeration at the
end of the signal.
Figure 3.3: Bird flying
[Plots: unit quaternion components (w, x, y, z) and corresponding magnitude curves over frames 0–200 (cf. Figure 3.5)]
Chapter 4
Much of the recent research in character animation has been devoted to developing
various kinds of editing tools to produce a convincing motion from prerecorded
motion clips. To reuse motion-captured data, animators often adapt them to a
different character, i.e., retargetting a motion from one character to another [33], or
to a different environment to compensate for geometric variations [7, 98]. Animators
also combine two motion clips in such a way that the end of one motion is seamlessly
connected to the start of the other [82].
This chapter presents a technique for adapting an existing motion of a human-
like character to have the desired features specified by a set of constraints. This tech-
nique can be used for retargetting a motion to compensate for geometric variations
caused by both characters and environments, as well as for directly manipulating
motion clips through a graphical interface.
orientation of an end-effector, such as a foot or a hand, of an articulated figure at a
specific time. The important features of the target motion are specified interactively
as constraints, and the captured motion is deformed to satisfy those constraints.
Motion data consist of a bundle of motion signals. Each signal represents a
sequence of sampled values for each degree of freedom. Those signals are sampled
at a sequence of discrete time instances with a uniform interval to form a motion
clip that consists of a sequence of frames. In each frame, the sampled values from
the signals determine the configuration of an articulated figure at that frame, and
thus they are related to each other by kinematic constraints. This structure yields
two relationships among sampled values: inter-frame and intra-frame relationships.
Through the use of an inverse kinematics solver, the intra-frame relationship, that
is, the configuration of an articulated figure within each frame can be adjusted to
meet the kinematic constraints. However, if each frame is considered independently,
then there could be an undesirable jerkiness between consecutive frames. Therefore,
we have to take account of the inter-frame relationship as well. For this purpose,
we employ the multilevel B-spline fitting technique. We also present an efficient
inverse kinematics algorithm which is used in conjunction with the fitting technique.
Our approach is distinct from the work of Gleicher [33] who addressed the same
problem. He provided a unified approach to fuse both relationships into a very
large non-linear optimization problem, which is cumbersome to handle. Instead, we
decouple the problem into manageable subproblems each of which can be solved very
efficiently.
Multilevel B-spline fitting techniques have been investigated to design smooth
surfaces which interpolate scattered features within a specified tolerance [27, 65, 66,
95]. Among them, we extend the technique presented by Lee et al. [66] for adapting a
motion to satisfy the constraints which are scattered over the frames. The multilevel
B-splines make use of a coarse-to-fine hierarchy of knot sequences to generate a series
of uniform cubic B-spline curves whose sum approaches the desired function. At each
level in the hierarchy, the control points of the B-spline curve are computed locally
with a least-squares method which provides an interactive performance. With this
fitting technique, we can not only manipulate a curve adaptively to satisfy a large set of constraints within a specified error tolerance, but also edit a curve at any level of detail to allow an arbitrary portion of the motion to be affected through direct manipulation. Exploiting these favorable properties of the multilevel B-spline
curves, we conveniently derive a hierarchy of displacement maps which are applied
to the original motion data to obtain a new, smoothly modified motion. Because of
this displacement mapping, the detail characteristics of the original motion can be
preserved [7, 98].
The performance of our approach is further enhanced by our new inverse kine-
matics solver. It is commonplace to formulate the inverse kinematics with multiple
targets as a constrained non-linear optimization for which the computational cost is
expensive [82, 99]. As noticed by Korein and Badler [62], we can find a closed-form
solution to the inverse kinematics problem for a limb linkage which consists of three
joints, for example, shoulder-elbow-wrist for the arm and hip-knee-ankle for the
leg. We combine this analytical method with a numerical optimization technique to
compute the solutions for full degrees of freedom of a human-like articulated figure.
Our hybrid algorithm enables us to edit the motions of a 37 DOF articulated figure
interactively.
The remainder of this chapter is organized as follows. After a review of previous
works, we give an introduction to the displacement mapping and the multilevel B-
spline fitting technique in Section 4.3. In Section 4.4, we present our motion editing
technique. In Section 4.5, we describe two inverse kinematics algorithms: One is
designed to manipulate a general tree-structured articulated figure and the other is
specialized to a human-like figure with limb linkages. In Section 4.6, we describe
how to specify joint limits in a unit quaternion representation. In Section 4.7, we
demonstrate how our technique can be used for motion capture-based animation
which includes adapting a motion from one character to another, fitting a recorded
walk onto a rough terrain and performing seamless transitions among motion clips.
In Section 4.8, we compare our algorithm to the previous approaches from several
viewpoints.
4.2 Related Work
There has been an abundance of research on developing motion editing tools.
Bruderlin and Williams [7] showed that techniques from the signal processing do-
main can be applied to manipulating animated motions. They introduced the idea
of displacement mapping to alter a motion clip. Witkin and Popović [98] presented
a motion warping technique for the same purpose. Bruderlin and Williams also pre-
sented a multi-target interpolation with dynamic time warping to blend two motions.
Unuma et al. [94] used Fourier analysis techniques to interpolate and extrapolate mo-
tion data in the frequency domain. Wiley and Hahn [96] and Guo and Robergé [37]
investigated spatial domain techniques to linearly interpolate a set of example mo-
tions. Rose et al. [81] adopted a multidimensional interpolation technique to blend
multiple motions all together.
Witkin and Kass [97] proposed a spacetime constraint technique to produce
the optimal motion which satisfies a set of user-specified constraints. Brotman and
Netravali [6] achieved a similar result by employing optimal control techniques. The
spacetime formulation leads to a constrained non-linear optimization problem. Co-
hen [14] developed a spacetime control system which allows a user to interactively
guide a numerical optimization process to find an acceptable solution in a feasible
time. Liu et al. [68] used a hierarchical wavelet representation to automatically
add motion details. Rose et al. [82] adopted this approach to generate a smooth
transition between motion clips. Gleicher [32] simplified the spacetime problem by
removing the physics-related aspects from the objective function and constraints to
achieve an interactive performance for motion editing. He also applied this technique
for motion retargetting [33].
Forsey and Bartels introduced a hierarchical B-spline representation to enhance surface modeling ca-
pability. This representation allows details to be adaptively added to the surface
through local refinement. They also employed the hierarchical representation for
fitting a spline surface to the regular data sampled at grid points [27]. Welch and
Witkin [95] proposed a variational approach to directly manipulate a B-spline sur-
face with scattered features, such as points and curves. Lee et al. [65, 66] suggested
an efficient method for interpolating scattered data points. They also demonstrated
that image warping applications can be cast as a surface fitting problem by adopting
the idea of displacement mapping. Although the authors used different terms, such as
hierarchical and multilevel B-spline surfaces, to refer to their hierarchical structures,
their underlying ideas are the same, that is, a coarse-to-fine hierarchy of control lat-
tices. Another class of approaches is due to multiresolution analysis and wavelets.
Finkelstein and Salesin [25] used B-spline wavelets for multiresolution editing of
curves. Many authors have investigated multiresolution analysis for manipulating
spline surfaces and polygonal meshes [10, 22, 69, 85].
Traditionally, inverse kinematics solvers can be divided into two categories: analytic
and numerical solvers. Most industrial manipulators are designed to have analytic
solutions for efficient and robust control. Kahan [48] and Paden [74] independently
discussed methods to solve an inverse kinematics problem by reducing it into a
series of simpler subproblems whose closed-form solutions are known. Korein and
Badler [62] showed that the inverse kinematics problem of a human arm and leg
allows an analytic solution. Actual solutions are derived by Tolani and Badler [93].
A numerical method relies on an iterative process to obtain a solution. Girard
and Maciejewski [31] addressed the locomotion of a legged figure using the Jacobian
matrix and its pseudo inverse. Koga et al. [61] made use of results from neurophysiology
to achieve an “experimentally” good initial guess and then employed a numerical
procedure for fine tuning. Zhao and Badler [99] formulated the inverse kinematics
problem of a human figure as a constrained non-linear optimization problem. Rose
et al. [82] extended this formulation to handle variational constraints that hold over
an interval of motion frames.
4.3 Preliminary

4.3.1 Displacement Mapping

\[
\begin{pmatrix} \mathbf{p} \\ \mathbf{q}_1 \\ \vdots \\ \mathbf{q}_n \end{pmatrix}
=
\begin{pmatrix} \mathbf{p}_0 \\ \mathbf{q}_1^0 \\ \vdots \\ \mathbf{q}_n^0 \end{pmatrix}
\oplus
\begin{pmatrix} \mathbf{v}_0 \\ \mathbf{v}_1 \\ \vdots \\ \mathbf{v}_n \end{pmatrix}
=
\begin{pmatrix} \mathbf{p}_0 + \mathbf{v}_0 \\ \mathbf{q}_1^0 \exp(\mathbf{v}_1) \\ \vdots \\ \mathbf{q}_n^0 \exp(\mathbf{v}_n) \end{pmatrix}. \tag{4.1}
\]
Here, exp(v) denotes a 3-dimensional rotation about the axis v/|v| ∈ R³ by angle
|v| ∈ R. With the displacement mapping, we are able to deal with both position
and orientation data in a uniform way; the displacement map is a homogeneous
array of 3-dimensional vectors, while the configuration of an articulated figure is
represented as a heterogeneous array of a vector and unit quaternions.
4.3.2 Multilevel B-spline Approximation

Lee et al. [66] proposed a multilevel B-spline approximation technique for fitting a
spline surface to scattered data points. In this section, we give a brief summary to
introduce their fitting technique. Since we need to manipulate a curve rather than
a surface, our derivation focuses on curve fitting.
Let Ω = {t ∈ R|0 ≤ t < n} be a domain interval. Consider a set of scattered
data points P = {(ti , xi )} for ti ∈ Ω. To interpolate the data points, we formulate an
approximation function f as a B-spline function which is defined over a uniform knot
sequence overlaid on the domain Ω. The function

\[
f(t) = \sum_{k=0}^{3} B_k(t - \lfloor t \rfloor)\, b_{\lfloor t \rfloor + k - 1}
\]

can be described in terms of its control points and uniform cubic B-spline basis
functions B_k, 0 ≤ k ≤ 3. Here, b_i is the i-th control point on the knot sequence
for −1 ≤ i ≤ n + 1. With this formulation, the problem of deriving function f is
reduced to that of finding the control points that best approximate the data points
in P.
Since each control point b_j is influenced by the data points in its neighborhood,
we can define the proximity set P_j = {(t_i, x_i) ∈ P | j − 2 ≤ t_i < j + 2} of the
data points that affect the value of b_j. Simple linear algebra using the pseudo
inverse provides a least-squares solution

\[
b_j = \frac{\sum_{(t_i, x_i) \in P_j} \omega_{ij}^2\, \beta_{ij}}{\sum_{(t_i, x_i) \in P_j} \omega_{ij}^2} \tag{4.2}
\]

which minimizes a local approximation error \(\sum_{(t_i, x_i) \in P_j} \| f(t_i) - x_i \|^2\). Here,
\(\omega_{ij} = B_{j+1-\lfloor t_i \rfloor}(t_i - \lfloor t_i \rfloor)\) comes from a B-spline basis function, and
\(\beta_{ij} = \omega_{ij}\, x_i \,/\, \sum_{k=0}^{3} B_k(t_i - \lfloor t_i \rfloor)^2\).
Figure 4.1: Hierarchical curve fitting to scattered data through multilevel B-spline
approximation
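The fitting procedure above can be sketched for a 1-D scalar curve as follows. This is our own minimal sketch in Python (the function names are ours, and the motion-specific machinery is omitted), combining the local least-squares rule of Equation (4.2) with the coarse-to-fine refinement of Figure 4.1:

```python
import numpy as np

def bspline_basis(u):
    """Uniform cubic B-spline basis B_0..B_3 at local parameter u in [0, 1)."""
    return np.array([(1 - u) ** 3,
                     3 * u ** 3 - 6 * u ** 2 + 4,
                     -3 * u ** 3 + 3 * u ** 2 + 3 * u + 1,
                     u ** 3]) / 6.0

def fit_level(ts, xs, n):
    """Local least-squares control points (Eq. 4.2) on a uniform knot
    sequence over [0, n); returns b_{-1}..b_{n+1} stored with offset 1."""
    num = np.zeros(n + 3)
    den = np.zeros(n + 3)
    for t, x in zip(ts, xs):
        s = int(t)
        B = bspline_basis(t - s)
        beta = B * x / np.sum(B ** 2)      # beta_ij = w_ij x_i / sum_k B_k^2
        idx = np.arange(s, s + 4)          # array slots for b_{s-1}..b_{s+2}
        num[idx] += B ** 2 * beta          # accumulate w^2 * beta
        den[idx] += B ** 2                 # accumulate w^2
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)

def eval_curve(b, ts):
    """f(t) = sum_k B_k(t - floor(t)) b_{floor(t)+k-1}."""
    return np.array([bspline_basis(t - int(t)) @ b[int(t):int(t) + 4]
                     for t in ts])

def multilevel_fit(ts, xs, n0=2, levels=6):
    """Coarse-to-fine hierarchy: each level fits the residual left by the
    previous levels, doubling the knot density at each step."""
    approx = np.zeros_like(xs)
    n = n0
    for _ in range(levels):
        b = fit_level(ts * n, xs - approx, n)
        approx = approx + eval_curve(b, ts * n)
        n *= 2
    return approx
```

Because each finer level only fits the residual left over by the coarser levels, the approximation error shrinks level by level, mirroring the behavior described in the text.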
4.4 Hierarchical Motion Fitting
Given the original motion m0 and a set C of constraints, our problem is to derive
a smooth displacement map d such that a target motion m = m0 ⊕ d satisfies the
constraints in C. Current motion editing techniques represent the displacement map
as an array of spline curves defined over a common knot sequence [33, 98]. Each
spline curve gives the time-varying motion displacement of its corresponding joint.
With a finer knot sequence, we can possibly find a solution that accurately satisfies
all the constraints in C. However, we have to pay a higher computational cost for the
accuracy due to the finer knot sequence. Ideally, we wish to determine the density of
knots to yield just enough shape freedom for an exact solution. However, the target
motion is not known in advance and thus we require the displacement map which
allows details to be added by adaptively refining the knot sequence.
4.4.1 Hierarchical Displacement Mapping

We adopt the hierarchical structure [66] reviewed in Section 4.3.2 to perform
this adaptive refinement. The multilevel B-spline approximation technique was em-
ployed to derive a warp function for image morphing and geometry reconstruction.
In our context, we extend this technique to handle motion data. From the displace-
ment map d, we derive a series of successively finer submaps d1 , · · · , dh that give
the corresponding series of incrementally refined motions, m1 , · · · , mh .
mh = (· · · ((m0 ⊕ d1 ) ⊕ d2 ) ⊕ · · · ⊕ dh ). (4.3)
Note that summing the submaps into a single displacement map results from an
erroneous derivation, that is, one that substitutes exp(v_1) exp(v_2) · · · exp(v_h) for
exp(v_1 + v_2 + · · · + v_h) in Equations (4.1) and (4.3). This derivation is not correct,
since quaternion multiplication is not commutative.
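This non-commutativity is easy to check numerically. The sketch below is our own illustration (the helper functions quat_exp and quat_mul are ours, with exp(v) rotating about v/|v|): for two rotations about different axes, exp(v1) exp(v2) and exp(v1 + v2) disagree.

```python
import numpy as np

def quat_exp(v):
    """exp of the purely imaginary quaternion (0, v): a unit quaternion
    rotating about v/|v|."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(theta)], np.sin(theta) * v / theta))

def quat_mul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

v1 = np.array([0.3, 0.0, 0.0])          # rotation about the x-axis
v2 = np.array([0.0, 0.4, 0.0])          # rotation about the y-axis
lhs = quat_mul(quat_exp(v1), quat_exp(v2))
rhs = quat_exp(v1 + v2)
# lhs and rhs differ: quaternion multiplication is not commutative,
# and exp(v1)exp(v2) = exp(v1 + v2) holds only for parallel axes.
print(np.linalg.norm(lhs - rhs))
```

The difference vanishes only when v1 and v2 share an axis, which is why the hierarchy must apply the submaps one after another rather than summing them.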
4.4.2 Constraints
To specify the desired features of the target motion, two categories of constraints are
employed: The ones in the first category are used to describe an articulated figure
itself, such as a joint limit and an anatomical relationship among joints. Those in the
other category are for placing end-effectors of the figure at particular positions and
orientations which are interactively specified by the user or automatically derived
from the interaction between the figure and its environment. For example, we first
specify the contact point between the foot and the ground through a graphical
interface and automatically modify the point later in accordance with the geometric
variation of the ground. We assume that a constraint in either category is defined
at a particular instant of time. A variational constraint that holds over an interval
of motion frames can be realized by a sequence of constraints over the time interval.
An ordered pair (tj , Cj ) specifies the set Cj of constraints at a frame tj .
Figure 4.2: A live-captured walking motion was interactively modified at the middle
frame such that the character bent forward and lowered the pelvis. The character
is depicted at the modified frame. The range of deformation is determined by the
density of the knot sequences. The knots in τ1 are spaced every (top) 4, (middle) 6,
and (bottom) 12 frames.
The local least-squares solution in Equation (4.2) has several drawbacks: the resulting
curve may be less accurate and could have undulations because a displacement is not
propagated globally. Fortunately, our hierarchical structure can compensate for such
drawbacks by globally propagating the displacement at a coarse level and performing
further tuning at finer levels.
4.4.4 Knot Spacing

For simplicity, our implementation doubles the density of a knot sequence from one
level to the next. Therefore, if τk has (n + 3) control points on it, then the next
finer knot sequence τk+1 will have (2n + 3) control points. The density of a knot
sequence τk , 1 ≤ k ≤ h, determines the range of influence of a constraint on the
displacement map at a level k. This is of great importance for direct manipulation.
For example, consider the situation that we interactively adjust the configuration of
an articulated figure by dragging one of its segments at a certain frame through a
graphical interface. The user input is interpreted as constraints which are immedi-
ately added to the set of prescribed constraints. Then, our system smoothly deforms
a portion of the motion clip around this modified frame. Here, the range of influ-
ence on the motion clip is mainly dependent on the spacing of τk. Larger spacing
between knots yields a wider range of deformation (see Figure 4.2). Therefore, the
displacement map d1 , that is derived from the coarsest sequence τ1 , has non-zero
values over the widest range to smoothly propagate the change of the motion. The
subsequent finer displacement maps dk , 2 ≤ k ≤ h, perform successive tunings to
satisfy the constraints.
The density of the finest knot sequence τh controls the precision of the final
motion mh . If τh is sufficiently fine to accommodate the distribution of constraints
in the time domain, mh can exactly satisfy all constraints. However, our algorithm
may leave a small deviation for each constraint in C even with several levels in the
hierarchy. In our experiments, we need just four or five levels for visually pleasing
results, which can be further refined to an exact interpolation by enforcing each
constraint independently with the inverse kinematics solver.
4.4.5 Initial Guesses

For a spacetime problem, a good initial guess on the desired solution is very impor-
tant to improve both the convergence of numerical optimization and the quality of
the result [33]. We obtain an initial guess for motion fitting by shifting the position
of the root segment in the original motion. To motivate this, consider the walking
motion that is adapted to the rough terrain as shown in Figures 4.9 (a) and (b). The
position of a stance foot, that touches the surface of the terrain, is pulled upward
at a small hill on the terrain, and thus the character is unwantedly forced to squat.
Even though the inverse kinematics solver tries to minimize the deviation of joint
angles, it cannot completely prevent the knee from bending. To reduce this artifact,
we change the position of the root segment in accordance with the change of geometry. Specifi-
cally, we displace the root segment by the average shift of the contact positions at
each frame. The shift of the root segment position at a frame can also be smoothly
propagated to the neighboring frames using the multilevel B-spline fitting method.
4.5 Inverse Kinematics
The most time consuming component in the motion fitting algorithm is the inverse
kinematics solver which is invoked very frequently at each level of the fitting hierar-
chy. Therefore, the overall performance of a hierarchical fitting critically depends on
the performance of the inverse kinematics solver. We describe, in this section, two
inverse kinematics algorithms. In Section 4.5.1, we introduce an inverse kinematics al-
gorithm for a general tree-structured figure with spherical joints based on numerical
optimization techniques. In Section 4.5.2, we present a faster specialized algorithm for
a human-like figure with limb linkages. The latter algorithm combines the numerical
techniques with an analytical method illustrated in Section 4.5.3.
\[
\mathbf{p} = \mathbf{p}_0 + \mathbf{v}_0, \quad \text{and} \quad
\mathbf{q}_i = \mathbf{q}_i^0 \exp(\mathbf{v}_i), \ 1 \le i \le n, \tag{4.5}
\]
Accordingly, our constrained optimization problem is formulated as follows:

\[
\begin{aligned}
\text{minimize} \quad & f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^{T} M \mathbf{x}, \\
\text{subject to} \quad & c_i(\mathbf{x}) = 0, \quad i \in N_e, \\
& c_i(\mathbf{x}) > 0, \quad i \in N_i.
\end{aligned}
\]

With a penalty method, the constraints are folded into an unconstrained objective

\[
g(\mathbf{x}) = f(\mathbf{x}) + \sum_{i \in N_e} \omega_i\, c_i(\mathbf{x})^2 + \sum_{i \in N_i} \omega_i \left(\min(c_i(\mathbf{x}), 0)\right)^2. \tag{4.6}
\]
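To make the penalty formulation concrete, here is a small sketch of ours (not the thesis implementation): a toy two-variable problem with one equality and one inequality constraint, minimized by plain gradient descent on g.

```python
import numpy as np

def penalty_objective(x, w=100.0):
    """g(x) = f(x) + w c_e(x)^2 + w min(c_i(x), 0)^2 for a toy problem:
    f(x) = 0.5 x^T x, equality c_e: x0 + x1 - 1 = 0, inequality c_i: x0 - 0.2 > 0."""
    f = 0.5 * x @ x
    ce = x[0] + x[1] - 1.0
    ci = x[0] - 0.2
    return f + w * ce ** 2 + w * min(ci, 0.0) ** 2

def minimize(x0, steps=20000, lr=1e-3, h=1e-6):
    """Gradient descent with central-difference gradients."""
    x = x0.astype(float)
    for _ in range(steps):
        g = np.zeros_like(x)
        for j in range(len(x)):
            e = np.zeros_like(x)
            e[j] = h
            g[j] = (penalty_objective(x + e) - penalty_objective(x - e)) / (2 * h)
        x -= lr * g
    return x

x = minimize(np.zeros(2))
```

With a finite penalty weight the constraints are satisfied only approximately, which is exactly why the hierarchical fitting tolerates a small per-constraint deviation at each level.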
The major difficulty of solving an inverse kinematics problem stems from the exces-
sive DOFs of an articulated figure. A reasonable human model may have about 40
DOFs for computer animation, while we specify much fewer constraints for manip-
ulating the figure. For a figure of n DOFs, we can remove c of those DOFs with a
set of c independent constraints imposed on it. The remaining (n − c) DOFs span
the solution space of the problem.
A reduced-coordinate formulation parameterizes the redundant DOFs with a
reduced set of (n − c) variables. One explicit redundancy in the human body is
the “elbow circle” that was first mentioned in Korein and Badler [62]. Even though
the shoulder and the wrist are firmly planted, we can still afford to move the elbow
Figure 4.3: A human-like figure that has explicit redundancies at its limb linkages
along a circle with its axis through the shoulder and the wrist (See Figure 4.3). The
human figure has four limbs, two from arms and two from legs. The redundant DOF
for the i-th limb linkage can be parameterized with a rotation angle θi , 1 ≤ i ≤ 4,
about the axis.
Without loss of generality, we assume that the positions and orientations of
hands and feet are fixed by constraints. If there is a free hand or foot, the DOFs in
the corresponding limb are left unchanged. Let m = (p, q1 , · · · , qr , qr+1 , · · · , qn )
be the configuration of a human-like figure. Its rear part (qr+1 , · · · , qn ) denotes
the DOFs for the limbs and the fore part (p, q1 , · · · , qr ) does the remaining DOFs.
Since the constraints restrain the DOFs in the limb linkages, the reduced set of
parameters (p, q1 , · · · , qr , θ1 , · · · , θ4 ) span all possible configurations of the figure
under the constraints.
Incorporating the idea of reduced-coordinate formulation into the numerical
optimization framework, we can solve an inverse kinematics problem with a reduced
number of optimization parameters x̂ = (x_0, · · · , x_{3r+2}, θ_1, · · · , θ_4) ∈ R^{3r+7}. Note
that we have replaced the rear part of x with the elbow circle parameters θ1 , · · · , θ4
for limb linkages. Whenever we evaluate the objective function with new param-
eters x̂, the parameters (p, q1 , · · · , qr ) are computed first by Equation (4.5), and
then the others for (qr+1 , · · · , qn ) are uniquely determined by an analytical solver
4.5.3 Arm and Leg Postures

[Figure 4.4: the limb-linkage geometry, with segment lengths l1 and l2, radii r1 and r2, elbow angle φ, and elbow circle angle θi, relative to the goal position and orientation]
Consider a limb linkage, for example, an arm linkage. Starting from an initial
configuration, we sequentially adjust the joint angles for the elbow, the shoulder
and the wrist of the arm linkage to place the hand at the desired position and
orientation. We assume that the torso and the shoulder positions are given. Let l1,
l2, r1, r2 and L be defined as follows (see Figure 4.4(a)):

l1 = the distance from the shoulder to the elbow,
l2 = the distance from the elbow to the wrist,
r1 = the distance from the elbow rotation axis to the shoulder,
r2 = the distance from the elbow rotation axis to the wrist, and
L = the distance from the shoulder to the wrist.
To place the wrist at a position distant from the shoulder by L (see Figure 4.4(b)),
the angle φ between the upper and lower arms is given by

\[
\phi = \cos^{-1}\!\left( \frac{l_1^2 + l_2^2 + 2\sqrt{l_1^2 - r_1^2}\sqrt{l_2^2 - r_2^2} - L^2}{2\, r_1 r_2} \right), \tag{4.7}
\]
as illustrated in the next section. Then, we bring the wrist to the goal position by
adjusting the shoulder angles (See Figure 4.4(c)). In the subsequent step, we rotate
the wrist angles to coincide with the goal orientation. Once one feasible solution is
given, the other solutions can be obtained by rotating the elbow about the axis that
passes through the shoulder and the wrist positions. Given θi , we can determine the
arm posture uniquely (See Figure 4.4(d)). Similarly, we can determine a leg posture.
If L is longer than the arm length, l1 + l2 , the elbow stretches as far as possible.
On the other hand, if L is too small, then the elbow angle could violate its lower
limit and thus is pulled back into the allowable range. In both cases, we cannot
place the wrist at the exact position and thus the corresponding penalty function
yields a positive value for the given torso configuration.
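Equation (4.7) is straightforward to implement. The sketch below is ours (the function name and the clamping policy are our assumptions); it clamps the cosine so that an unreachable L stretches or folds the elbow to its limit, mirroring the behavior described above.

```python
import math

def elbow_angle(l1, l2, r1, r2, L):
    """Angle between the upper and lower limb segments, Eq. (4.7).
    l1, l2: segment lengths; r1, r2: distances from the elbow rotation axis
    to the shoulder and to the wrist; L: desired shoulder-wrist distance."""
    s1 = math.sqrt(max(l1 * l1 - r1 * r1, 0.0))
    s2 = math.sqrt(max(l2 * l2 - r2 * r2, 0.0))
    c = (l1 * l1 + l2 * l2 + 2.0 * s1 * s2 - L * L) / (2.0 * r1 * r2)
    return math.acos(max(-1.0, min(1.0, c)))   # clamp for unreachable L
```

In the special case r1 = l1 and r2 = l2 (joints lying in the plane perpendicular to the elbow axis), s1 = s2 = 0 and the formula reduces to the ordinary law of cosines for a two-link chain.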
4.5.4 Derivation for Equation (4.7)

To identify the angle between the upper and fore arms, we project the joint positions
onto a plane perpendicular to the elbow rotation axis. Then, the projections for the
shoulder and the wrist are placed on two concentric circles, of radii r_1 and r_2, whose
center coincides with the projection for the elbow (see Figure 4.5). The distance r
between the two projections satisfies r^2 = r_1^2 + r_2^2 - 2 r_1 r_2 \cos\phi by the
law of cosines.
Letting \(s_1 = \sqrt{l_1^2 - r_1^2}\) and \(s_2 = \sqrt{l_2^2 - r_2^2}\), respectively, the distance L between the
shoulder and the wrist positions is

\[
\begin{aligned}
L^2 &= (s_1 + s_2)^2 + r^2 \\
&= l_1^2 + l_2^2 + 2\sqrt{l_1^2 - r_1^2}\sqrt{l_2^2 - r_2^2} - r_1^2 - r_2^2 + r^2 \\
&= l_1^2 + l_2^2 + 2\sqrt{l_1^2 - r_1^2}\sqrt{l_2^2 - r_2^2} - 2 r_1 r_2 \cos\phi. \tag{4.9}
\end{aligned}
\]

Therefore, \(\cos\phi = \dfrac{l_1^2 + l_2^2 + 2\sqrt{l_1^2 - r_1^2}\sqrt{l_2^2 - r_2^2} - L^2}{2\, r_1 r_2}\).
4.6 Joint Limit Specification
Figure 4.7: The joint limit of the shoulder specified by a composite constraint
where v is an arbitrary unit vector. Under constraint HS (q0 , k), the joint is allowed
to rotate about an arbitrary axis, but the rotation angle must be smaller than or
equal to k.
For instance, the range of motion of the shoulder can be described as the
intersection of two constraints, as shown in Figure 4.7:

HA(q_0, w, k_1, k_2) ∩ HC(q_0, w, k_0).

Here, q_0 denotes the orientation of the shoulder at rest, and w the direction from
the shoulder to the elbow. The axial constraint then describes the range of the
twist angle, and the conic constraint limits the direction of the upper arm.
Let q ∈ S³ be a joint configuration and H ⊂ S³ be a composite constraint
described as a boolean combination of primitive constraints. To check whether q
is included in H, we require point-inclusion tests, answering true or false, for the
primitives and their logical combinations. Note that any quaternion q and its
antipode −q represent the same orientation; therefore, the constraints are satisfied
if H includes either q or −q. The following propositions explain how to test point
inclusion for each primitive constraint.
48
[Figure 4.8: the decomposition e^{ψv/2} e^{φw/2} q_0 = q for (a) the conic and (b) the axial constraints]

Any orientation q can be decomposed as

e^{ψv/2} e^{φw/2} q_0 = q,

where cos ψ = w · ŵ, v = (w × ŵ)/|w × ŵ|, and ŵ = (q q_0^{−1}) w (q q_0^{−1})^{−1}.
The image of w rotated by e^{ψv/2} e^{φw/2} makes a cone whose direction is w and
whose cone angle is k, as shown in Figure 4.8(a). The quaternion q q_0^{−1} is contained in
{e^{ψv/2} e^{φw/2} | 0 ≤ ψ ≤ k} if and only if the image ŵ of w is
inside the conic region. Hence, q ∈ HC(q_0, w, k) if the angle between w and ŵ is
less than or equal to k.
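The conic test translates directly into code. The following is a minimal sketch with our own quaternion helpers (a complete implementation would also test the antipode −q, as noted above):

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def qconj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def qrot(q, v):
    """Rotate vector v by unit quaternion q."""
    return qmul(qmul(q, np.concatenate(([0.0], v))), qconj(q))[1:]

def in_conic(q, q0, w, k):
    """q in HC(q0, w, k): the image w_hat of w under q q0^{-1} must stay
    within angle k of w."""
    d = qmul(q, qconj(q0))             # q q0^{-1} for unit quaternions
    w_hat = qrot(d, w)
    ang = np.arccos(np.clip(np.dot(w, w_hat), -1.0, 1.0))
    return ang <= k
```

For example, with q0 the rest orientation and q a rotation tilting w by 0.5 radians, the test accepts cone angles k ≥ 0.5 and rejects smaller ones.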
For the axial constraint HA, we use the same decomposition

e^{ψv/2} e^{φw/2} q_0 = q.

Let ŵ be the image of w rotated by q q_0^{−1}. Then, ψ and v are directly computed as
(see Figure 4.8(b)): cos ψ = w · ŵ and v = (w × ŵ)/|w × ŵ|. Hence, the twist angle φ is

φ = 2 w · log(e^{−ψv/2} q q_0^{−1}).

For the spherical constraint HS, we write

e^{φv/2} q_0 = q,

so that e^{φv/2} = q q_0^{−1} and

φv/2 = log(q q_0^{−1}).
of heel-strikes and toe-offs for the motion clips. This information is used for es-
tablishing the kinematic constraints that enforce the foot contacts for the entire
motion. The terrain of Figure 4.9(b) is represented as a NURBS surface whose
control points are placed on a regular grid with a spacing of 80% of the height of
the character, and whose y-coordinates (heights) are randomly perturbed within 120%
of the height. To adapt the motion onto the rough terrain with doorways, we first
adjust the constraints such that the contact positions are shifted along the y-axis to
be placed on the terrain, and add new constraints to bend the character under the
doorways. Then, we use our motion fitting algorithm to warp the motion to satisfy
the constraints. The original and the adapted motions are depicted in Figures 4.9(a)
and 4.9(b), respectively.
The “climbing a rope” example in Figure 4.10(a) gives constraints on both
hands and feet. A physically simulated rope is used to explicitly illustrate the
moments of grasping and releasing the rope by a hand, which correspond to the
initiation and termination, respectively, of a variational constraint for that hand.
We adapt this motion to a different character with longer legs and a shorter body
and arms. For the character morphing example shown in Figure 4.10(b), the size
of a character smoothly changes to have extremely long legs and a short body, and
then to have extremely short legs and a long body. The original walking motion is
warped to preserve its uniform stride against the change of character size.
Our motion fitting method is also useful for generating a smooth transition
between motion clips. Figure 4.10(c) shows the transitions from walking to sneaking
and from sneaking to walking. The basic approach is very similar to the one presented
by Rose et al. [81]: we seamlessly connect the motion data by fading one out
while fading the other in. Over the fading duration, Hermite interpolation and time
warping techniques are used to smoothly blend the joint parameters of the motion
data. Since joint parameter blending may cause foot sliding, we enforce foot contact
constraints with the motion fitting method.
Table 4.1 gives a performance summary of the examples. Timing information
is obtained on a SGI Indigo2 workstation with an R10000 195 MHz processor. The
execution time for each example is not only influenced by quantitative factors such
as the number of frames, constraints and parameters, but also by qualitative factors
such as the difficulty of achieving desired features specified by constraints and the
quality of initial estimates. In particular, well-chosen initial estimates provide great
speedups for most of the examples. One promising observation is that both execution
times and maximum errors rapidly decrease level by level. This implies that the
performance of our algorithm is not critically dependent on the error tolerance. In
all examples, every constraint is satisfied within or slightly over 1 % of the height of
the character by the hierarchical fitting of four levels. A few more levels may result
in a more accurate solution. As the experimental data show, we can anticipate that
the computational cost of an additional level is much lower than that of the prior
level.
4.8 Discussion
In this section, we compare our motion fitting algorithm to the previous approaches
from several viewpoints.
Figure 4.9: Adaptation to environment change
(a) The original walking motion on the flat ground (b) The adapted motion for the rough terrain
Figure 4.10: Adaptation to character change and motion transition
(a) Climbing a rope for different characters (b) Character morphing
The global least-squares method often suffers from over-shooting that may give an
undesirable curve shape. Ironically, the less accurate local method in Equation (4.2)
is preferred in a
hierarchical framework. Since approximation errors at one level are incrementally
canceled out in the later levels, the accuracy at each level is not critical. This
local method is much more efficient and less prone to over-shooting than the global
method.
Chapter 5

Multiresolution Motion Analysis and Synthesis
5.1 Motivation
Although it is relatively easy to obtain high quality motion clips by virtue of motion
capture techniques, crafting various animations of arbitrary length with available
motion clips is still difficult and involves significant manual efforts. Our work is
motivated by the following issues related to motion editing:
• Motion modification: It is desirable that motion editing tools have the capabil-
ity to change the global pattern of a motion while maintaining its fine details,
and conversely to change fine details while maintaining the global pattern. It
is also desirable that they have the capability to enhance or attenuate a motion
to generate variations of different moods or emotions. A good example is
cartoon-like exaggeration, which may have more expressive power. To facilitate
these features, motion editing tools should be able to manipulate each level of
motion detail separately.
Our approach is closely related to that of Bruderlin and Williams [7], who used a digital
filterbank technique to store motion data as a hierarchy of detail levels, where each
level represents a different band of frequencies. With the hierarchy of detail levels,
they can not only edit motion data interactively by amplifying or attenuating
particular frequency bands but also generate a new motion by blending two existing
motions band by band. While they addressed motion modification and blending
extensively, they hardly mentioned motion transition. Moreover, their approach often
suffers from singularities due to the parameterization of orientation data with
Euler angles.
Given two sufficiently smooth motions, spline interpolation or spacetime
control would be a good choice for producing a seamless transition between
them. However, most live-captured motion data oscillate with the fine details
that distinguish the motion of a live creature from the unnatural motion of
a robot. To connect a pair of such motions seamlessly, we need to generate a
natural-looking in-between transition motion. Our multiresolution motion
representation scheme facilitates this through level-by-level manipulation of
motion signals.
To illustrate the problem of singularity, it would be instructive to consider a
simple 2D example where orientations can be parameterized by a single scalar value
changing from zero to 2π. Motion signals have singularity at zero (or 2π) to cause
serious artifacts in signal processing. For 3D motions of a human-like articulated
figure, the problem gets worse so that many familiar motions such as simple turning
and arm swing may suffer from singularity. To avoid such a problem, it is desirable
to employ non-singular orientation representations such as rotation matrices or unit
quaternions which form a Lie group. Due to the inherent non-linearity of these
representations, however, it is challenging to generalize multiresolution analysis and
synthesis techniques for orientation data.
In this chapter, we make two major extensions to the work of Bruderlin and
Williams to uniformly address all three motion editing issues. We first con-
struct the multiresolution representation for motions with proper consideration of
orientation components. Then, we present a general scheme for synthesizing
a seamless motion of arbitrary length by combining canned motion clips.
The remainder of the chapter is organized as follows. After reviewing previous
work, a brief overview of our approach is described in Section 5.3. In Section 5.4,
we present a multiresolution structure for representing motion and explain how to
construct it. Section 5.5 describes how to synthesize a new motion from available motion
clips using various motion editing techniques. In Section 5.6, we provide a gallery
of examples.
5.3 Overview
In this section, we briefly give an overall picture of this chapter. Our approach
consists of two parts: motion analysis and synthesis. To decompose a motion signal
into a hierarchy of detail characteristics, our approach relies on a spatial filtering
scheme presented in Chapter 3. With that scheme, we are able to construct a
multiresolution motion representation that is inspired by Gauss-Laplacian image
pyramids [8]. Level-wise manipulation of motion signals along the hierarchy enables
multiresolution editing that deals with fine details of motions.
In our representation, a motion signal is decomposed into a coarse base sig-
nal and a hierarchy of detail coefficients. Each level of the hierarchy consists of a
sequence of coefficients (a pair of 3D vectors). The coefficients at the base level
determine the overall shape of the motion signal, and its details are added succes-
sively with those at fine levels. The construction of the multiresolution representa-
tion is based on two basic operations: reduction and expansion. The expansion is
achieved by a subdivision operation that can be considered as up-sampling followed
by smoothing. The reduction is a reverse operation, that is, smoothing followed
by down-sampling. Motion filtering provides smoothing operators to avoid aliasing
caused by down-sampling and to interpolate missing information for up-sampling.
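For intuition, the reduction and expansion operations can be sketched on a scalar signal. The code below is our own simplified stand-in (simple linear filters, positions only; the thesis applies the analogous scheme to quaternion data): perfect reconstruction holds because each level stores the residual exactly.

```python
import numpy as np

def reduce_(s):
    """Smoothing followed by down-sampling (keep the even-numbered samples)."""
    k = np.array([0.25, 0.5, 0.25])
    smoothed = np.convolve(np.pad(s, 1, mode='edge'), k, mode='valid')
    return smoothed[::2]

def expand(s, n):
    """Up-sampling (insert linear midpoints), cropped back to length n."""
    up = np.empty(2 * len(s))
    up[0::2] = s
    up[1::2] = np.append(0.5 * (s[:-1] + s[1:]), s[-1])
    return up[:n]

def analyze(m, levels):
    """Decompose m into a coarse base signal and per-level detail coefficients."""
    details = []
    for _ in range(levels):
        base = reduce_(m)
        details.append(m - expand(base, len(m)))
        m = base
    return m, details

def synthesize(base, details):
    """Rebuild the signal: expand the base and add back the details level by level."""
    for d in reversed(details):
        base = expand(base, len(d)) + d
    return base
```

Editing the base signal changes the overall shape of the motion while the stored details, added back during synthesis, preserve its fine characteristics.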
By exploiting the capability of the multiresolution representation, we can syn-
thesize an intended motion from canned motion clips. Through direct manipulation
of the detail coefficients at each level in the hierarchy, we can solve the motion mod-
ification and blending issues. For motion stitching, a base level motion signal is first
obtained by interpolating the coefficients at the coarsest level, and then fine details
are added successively with those at lower levels. To obtain the fine-level coefficients
in the transition motion connecting a pair of given motions, we employ a multires-
olution sampling scheme [3]. Since the detail coefficients of the transition motion are
sampled from the given motions, the transition preserves the characteristics of the
original motions.
[Figure: reduction decomposes m(n) into a coarser signal m(n−1) and detail coefficients d(n−1); expansion reverses the process]
Here, u = (x, y, z) ∈ R^3 is considered as a purely imaginary quaternion (0, x, y, z).
Given two motion signals m = {(p_i, q_i) ∈ R^3 × S^3} and m′ = {(p′_i, q′_i) ∈
R^3 × S^3}, we define their motion displacement d = {(u_i, v_i) ∈ R^3 × R^3}, measured
in a local (body-fixed) coordinate system, such that m′ = m ⊕ d or d = m′ ⊖ m.
From the composite transformation T_(p′_i, q′_i) = T_(p_i, q_i) ◦ T_(u_i, exp(v_i)), we can derive
each element of a displacement map d as follows:

u_i = q_i^{−1} (p′_i − p_i) q_i,    v_i = log(q_i^{−1} q′_i),

where exp(v) denotes a 3-dimensional rotation about the axis v ∈ R^3 by angle
2‖v‖ ∈ R.
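As a concrete sketch of these formulas (the helper names below are illustrative, not the thesis' code), the displacement between a pair of corresponding frames can be computed with a few quaternion utilities:

```python
import numpy as np

def qmul(a, b):
    # Hamilton product of quaternions stored as (w, x, y, z)
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def qinv(q):
    # inverse of a unit quaternion is its conjugate
    return np.array([q[0], -q[1], -q[2], -q[3]])

def qlog(q):
    # log map S^3 -> R^3: the rotation angle of q is 2*|qlog(q)|
    s = np.linalg.norm(q[1:])
    if s < 1e-12:
        return np.zeros(3)
    return np.arctan2(s, q[0]) * q[1:] / s

def qexp(v):
    # exp map R^3 -> S^3, inverse of qlog
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(theta)], np.sin(theta) * v / theta))

def displacement(p, q, p2, q2):
    # d = m' (-) m: u = q^{-1}(p' - p)q, v = log(q^{-1} q')
    dp = np.concatenate(([0.0], p2 - p))   # purely imaginary quaternion
    u = qmul(qmul(qinv(q), dp), q)[1:]     # position offset in the body frame
    v = qlog(qmul(qinv(q), q2))
    return u, v
```

Applying `displacement` and then composing back with ⊕ round-trips exactly, which is what makes the representation lossless.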
A smoothing operation on the frames can be described by the diffusion-like update rule

p_i ← p_i − λ L_j p_i,    (5.4)

where λ is a diffusion coefficient and L_j is a Laplacian operator [17, 38, 59, 91].
Filtering with this rule rapidly disperses small perturbations, while the main shape
is only slightly degraded. Here, Laplacian operators can be estimated for discrete signals
by replacing differential operators with forward divided difference operators such
that L_j = Δ^{2j}, where

Δ^1 p_i = (p_{i+1} − p_i) / (t_{i+1} − t_i),
Δ^j p_i = (Δ^{j−1} p_{i+1} − Δ^{j−1} p_i) / (t_{i+j} − t_i),  for j > 1.    (5.5)
Then, the update rule yields an affine-invariant spatial mask that can be generalized
for orientation data using Equation (3.12). For example, by adopting the second
Laplacian operator L_2, we have a filter mask (1/24)(−λ, 4λ, 24 − 6λ, 4λ, −λ) and its
corresponding filter,

p_i ← (1/24)(−λ p_{i−2} + 4λ p_{i−1} + (24 − 6λ) p_i + 4λ p_{i+1} − λ p_{i+2}),
q_i ← q_i exp((λ/24)(ω_{i−2} − 3ω_{i−1} + 3ω_i − ω_{i+1})).
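For positional data, one pass of this mask takes a few lines; the sketch below (an illustrative stand-in, positions only, leaving the quaternion analogue with ω_i = log(q_i^{−1} q_{i+1}) aside) applies the filter to interior frames:

```python
import numpy as np

def smooth_positions(p, lam):
    """One pass of the mask (1/24)(-lam, 4*lam, 24 - 6*lam, 4*lam, -lam).

    p is an (n, 3) float array of positions; the two frames at each end
    are left untouched, a simplifying boundary assumption.
    """
    out = p.astype(float).copy()
    for i in range(2, len(p) - 2):
        out[i] = (-lam * p[i - 2] + 4.0 * lam * p[i - 1]
                  + (24.0 - 6.0 * lam) * p[i]
                  + 4.0 * lam * p[i + 1] - lam * p[i + 2]) / 24.0
    return out
```

Since the mask coefficients sum to one and the λ-terms cancel on any straight line, linear trajectories pass through unchanged while high-frequency wiggles are damped.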
The expansion is achieved by up-sampling followed by smoothing. The even-numbered
frames (p^{n+1}_{2i}, q^{n+1}_{2i}) at level n + 1 are the frames (p^n_i, q^n_i) at level n, and the odd-numbered
frames (p^{n+1}_{2i+1}, q^{n+1}_{2i+1}) are newly inserted between old frames. The new frames
are inserted halfway between two successive old frames using (spherical)
linear interpolation. Assuming that the motion frames are sampled uniformly,
p^{n+1}_{2i+1} = (1/2) p^n_i + (1/2) p^n_{i+1} and q^{n+1}_{2i+1} = slerp_{1/2}(q^n_i, q^n_{i+1}), which will be refined in the
following step to give a smoother motion. Here, slerp_t(q_1, q_2) denotes a spherical
linear interpolation between two unit quaternion points q_1 and q_2 with interpolation
parameter t, that is, slerp_t(q_1, q_2) = q_1 exp(t · log(q_1^{−1} q_2)) [88]. Applying
the smoothing operator H_E to the up-sampled data with a subdivision mask
(−1/16, 0, 9/16, 0, 9/16, 0, −1/16) gives the refined data as follows:
p^{n+1}_{2i} = p^n_i,
p^{n+1}_{2i+1} = (1/16)(−p^{n+1}_{2i−2} + 9 p^{n+1}_{2i} + 9 p^{n+1}_{2i+2} − p^{n+1}_{2i+4})    (5.8)
            = (1/16)(−p^n_{i−1} + 9 p^n_i + 9 p^n_{i+1} − p^n_{i+2}).
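This four-point refinement is easy to sketch for positional data (out-of-range neighbor indices are simply clamped here, a simplifying assumption about boundary handling):

```python
import numpy as np

def expand_positions(p):
    """Insert midframes with the mask (-1, 9, 9, -1)/16 of Equation (5.8).

    p is an (n, d) float array. Out-of-range neighbors are clamped to
    the ends, a simplifying assumption about boundary handling.
    """
    n = len(p)
    out = [p[0]]
    for i in range(n - 1):
        pm1 = p[max(i - 1, 0)]        # p^n_{i-1}, clamped
        pp2 = p[min(i + 2, n - 1)]    # p^n_{i+2}, clamped
        out.append((-pm1 + 9.0 * p[i] + 9.0 * p[i + 1] - pp2) / 16.0)
        out.append(p[i + 1])
    return np.array(out)
```

Old frames are kept verbatim, so the expansion is interpolating: applying it never moves an existing frame.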
Thus, a newly inserted point p^{n+1}_{2i+1} lies
on the cubic polynomial curve interpolating four neighboring points p^n_{i−1}, p^n_i, p^n_{i+1},
and p^n_{i+2} [20, 21]. We can generalize this scheme for orientation data using Equation (3.12) to have

q^{n+1}_{2i} = q^n_i,
q^{n+1}_{2i+1} = slerp_{1/2}(q^n_i, q^n_{i+1}) exp((ω^n_{i−1} − ω^n_{i+1}) / 16).    (5.9)
Construction: Our construction algorithm starts with the original motion m(N )
to compute its simplified versions and their corresponding displacement maps suc-
cessively in a fine-to-coarse order. Suppose that we are now at the n-th level for
0 ≤ n ≤ N − 1. Given a signal m(n+1) , we can compute a coarser signal m(n) by
reduction. The expansion of m(n) interpolates the missing information to approxi-
mate the original signal m(n+1) . Thus, the difference between them is expressed as
a displacement map d(n) as follows:
m^(n) = R m^(n+1),
d^(n) = m^(n+1) ⊖ E m^(n).    (5.10)
Cascading these operations until there remain a sufficiently small number of frames
in the motion signal, we can construct the multiresolution representation which
includes the coarse base signal m(0) and a series of displacement maps as shown in
Figure 5.3(upper). Formally stating, the multiresolution representation of m(N ) is
given by
m^(0) = R^N m^(N),
d^(n) = R^{N−n−1} m^(N) ⊖ E R^{N−n} m^(N),    (5.11)
for 0 ≤ n ≤ N − 1. The original signal m^(N) can be reconstructed from the multiresolution
representation by recursively adding the displacement map at each level
to the expansion of the signal at the same level, that is, m^(n+1) = E m^(n) ⊕ d^(n) for
0 ≤ n ≤ N − 1.
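For scalar (component-wise) signals, where ⊕ and ⊖ reduce to plain + and −, construction and reconstruction can be sketched as follows; the concrete reduce/expand operators here are simple stand-ins (a [1, 2, 1]/4 smoother and linear interpolation), not the thesis' masks:

```python
import numpy as np

def reduce_signal(m):
    # smoothing followed by down-sampling
    s = m.astype(float).copy()
    s[1:-1] = (m[:-2] + 2.0 * m[1:-1] + m[2:]) / 4.0
    return s[::2]

def expand_signal(m, n):
    # up-sampling: interpolate back to n frames
    return np.interp(np.linspace(0.0, 1.0, n),
                     np.linspace(0.0, 1.0, len(m)), m)

def decompose(m, levels):
    """m(N) -> base m(0) and maps d(n) = m(n+1) - E m(n) (Equation 5.10)."""
    ds = []
    cur = m.astype(float)
    for _ in range(levels):
        coarse = reduce_signal(cur)
        ds.append(cur - expand_signal(coarse, len(cur)))
        cur = coarse
    return cur, ds[::-1]          # coarse-to-fine order

def reconstruct(base, ds):
    """Invert the pyramid: m(n+1) = E m(n) + d(n)."""
    cur = base
    for d in ds:
        cur = expand_signal(cur, len(d)) + d
    return cur
```

Reconstruction is exact for any choice of reduce/expand operators, because each d(n) stores exactly the residual that the expansion misses.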
5.4.3 Extension
Though most motion capture data are sampled at uniformly spaced time instances,
we often need to process non-uniform data to support tasks such
as time warping, which aligns motion clips with respect to time [7]. To construct a
multiresolution representation for non-uniformly sampled motion data, we further
generalize the reduction and expansion operators for a non-uniform setting. For
reduction, we can easily derive smoothing masks by estimating discrete Laplacian
operators for a non-uniform setting, since the divided difference operator is well-
defined. Given a knot sequence [ti−2 , ti−1 , ti , ti+1 , ti+2 ], a non-uniform smoothing
mask (c0 , c1 , c2 , c3 , c4 ) for a second Laplacian operator is as follows:
c_0 = 1 / ((t_{i−2} − t_{i−1})(t_{i−2} − t_i)(t_{i−2} − t_{i+1})(t_{i−2} − t_{i+2})),
c_1 = 1 / ((t_{i−2} − t_{i−1})(t_{i−1} − t_i)(t_{i−1} − t_{i+1})(t_{i−1} − t_{i+2})),
c_2 = 1 / ((t_{i−2} − t_i)(t_{i−1} − t_i)(t_i − t_{i+1})(t_i − t_{i+2})),    (5.13)
c_3 = 1 / ((t_{i−2} − t_{i+1})(t_{i−1} − t_{i+1})(t_i − t_{i+1})(t_{i+1} − t_{i+2})),
c_4 = 1 / ((t_{i−2} − t_{i+2})(t_{i−1} − t_{i+2})(t_i − t_{i+2})(t_{i+1} − t_{i+2})).
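These coefficients are straightforward to evaluate; for uniform knot spacing they reduce to (1/24, 1/6, 1/4, 1/6, 1/24), which makes a handy sanity check. A small sketch (function name illustrative):

```python
def nonuniform_mask(t):
    """Coefficients (c0..c4) of Equation (5.13) for knots t = [t_{i-2}, ..., t_{i+2}]."""
    t0, t1, t2, t3, t4 = t
    c0 = 1.0 / ((t0 - t1) * (t0 - t2) * (t0 - t3) * (t0 - t4))
    c1 = 1.0 / ((t0 - t1) * (t1 - t2) * (t1 - t3) * (t1 - t4))
    c2 = 1.0 / ((t0 - t2) * (t1 - t2) * (t2 - t3) * (t2 - t4))
    c3 = 1.0 / ((t0 - t3) * (t1 - t3) * (t2 - t3) * (t3 - t4))
    c4 = 1.0 / ((t0 - t4) * (t1 - t4) * (t2 - t4) * (t3 - t4))
    return (c0, c1, c2, c3, c4)
```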
For expansion, the coefficients of the subdivision mask are derived from the cubic
Lagrange polynomials [60]. The cubic polynomial that interpolates four points
(p^n_{i−1}, p^n_i, p^n_{i+1}, p^n_{i+2}) defined over the knot sequence [t^n_{i−1}, t^n_i, t^n_{i+1}, t^n_{i+2}] can be written
as follows:

p(t) = l_{1000}(t) p^n_{i−1} + l_{0100}(t) p^n_i + l_{0010}(t) p^n_{i+1} + l_{0001}(t) p^n_{i+2},    (5.14)

where the cardinal function l_{u0 u1 u2 u3}(t) is the unique cubic polynomial that interpolates
u_j at t^n_{i+j−1} for 0 ≤ j ≤ 3 [57]. Note that Equation (5.14) is a simple
generalization of Equation (5.8). Therefore, we can obtain a subdivision mask
(l_{1000}(t^{n+1}_{2i+1}), 0, l_{0100}(t^{n+1}_{2i+1}), 0, l_{0010}(t^{n+1}_{2i+1}), 0, l_{0001}(t^{n+1}_{2i+1})) to compute p^{n+1}_{2i+1} and q^{n+1}_{2i+1}.
Figure 5.4: Level-of-detail generation for a live-captured signal. The four curves rep-
resent the change of w-, x-, y-, and z-components, respectively, of a unit quaternion
with respect to time. (from left to right) Original signal and its approximations at
successively coarser resolutions
At the boundary of a signal, for instance, we compute p^{n+1}_1 from the cubic polynomial that interpolates the four left-most points p^n_0, p^n_1,
p^n_2, and p^n_3 of the original sequence m^(n). q^{n+1}_1 can also be computed with the
spatial mask induced from the interpolating polynomial.
Our multiresolution representation allows for modifying its fine details at each level
independently of those at the other levels through level-wise manipulation of detail
coefficients. Note that each detail coefficient is represented with a pair of 3D vectors
that correspond to the displacements for position and orientation, respectively.
A natural application is to construct an LOD (level-of-detail) representation
of a motion that consists of its several versions at various levels of detail (see Fig-
ure 5.4). Given a detailed signal, we can construct a series of successively simpler
versions by removing the detail coefficients level by level starting from the finest
level. For continuous transition between levels, we also consider the fractional level
n + α of a motion signal with blending parameter 0 < α < 1, which defines a linear
interpolation between levels n and n + 1. To obtain a fractional-level motion, we
scale the coefficients at level n by a factor of α, and set all coefficients at higher
levels to zero.
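A fractional-level extraction can be sketched as a small operation on the coefficient hierarchy (scalar stand-in; `ds` holds d(0), d(1), ... in coarse-to-fine order, and the function name is illustrative):

```python
import numpy as np

def fractional_level(ds, n, alpha):
    """Keep levels below n, scale level n by alpha, zero out finer levels."""
    out = []
    for k, d in enumerate(ds):
        d = np.asarray(d, float)
        if k < n:
            out.append(d)
        elif k == n:
            out.append(alpha * d)
        else:
            out.append(np.zeros_like(d))
    return out
```

Sweeping alpha from 0 to 1 then morphs the motion continuously from level n to level n + 1.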
Another promising application is enhancing/attenuating the detailed features
of a motion signal to convey different moods or emotions. This application can be
achieved through the level-wise scaling of detail coefficients with different scaling
factors. For the motion “jump and kick” in Figure 5.6, we multiply the detail coef-
ficients by constant factors to produce the enhanced (top) and attenuated (bottom)
versions, respectively. The enhancement results in a higher jump and kick, while
the attenuation conveys a milder emotional mood and softer action. The effects
are clearly observed along the trajectories of the feet. Figure 5.7 shows a motion in
which the face is hit by an object. The enhanced and attenuated versions successfully
simulate the effects of hard and soft hitting, respectively.
Our representation scheme is also useful for blending motion clips together. A particular
example in Figure 5.8 blends three motions of the same size, that is, straight
walking m_ws, turning with a walk m_wt, and straight walking with a limp m_ls. From
these motions, we produce a new motion m_lt that describes turning with a limp.
The basic observation is that the global shape of the target motion is similar to m_wt
and its fine details are similar to m_ls. Therefore, we obtain the base signal m^(0)_lt by
applying the displacement map Φ_st = m^(0)_wt ⊖ m^(0)_ws to m^(0)_ls. Similarly, the detail
coefficients in d^(n)_lt are computed by applying the displacement map Φ^(n)_wl = d^(n)_ls ⊖ d^(n)_ws
to d^(n)_wt:

m^(0)_lt = m^(0)_ls ⊕ Φ_st,
d^(n)_lt = d^(n)_wt ⊕ Φ^(n)_wl.    (5.15)

Here, Φ_st describes how a straight movement is transformed into a turn, and Φ^(n)_wl
describes how normal walking is transformed into limping.
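With scalar stand-ins for ⊕ and ⊖ (plain + and −), the whole blend is one line per level; each motion below is given as a pair (base, [d(0), d(1), ...]), and the function name is illustrative:

```python
import numpy as np

def blend_limp_turn(m_ws, m_wt, m_ls):
    """Compose 'turning with a limp' from three example decompositions.

    The base level comes from the straight->turning displacement applied
    to the limping base; the details come from the walking->limping
    displacement applied to the turning details.
    """
    base_lt = m_ls[0] + (m_wt[0] - m_ws[0])
    details_lt = [d_wt + (d_ls - d_ws)
                  for d_wt, d_ws, d_ls in zip(m_wt[1], m_ws[1], m_ls[1])]
    return base_lt, details_lt
```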
Time warping is an ingredient of blending schemes that gives a correspondence
among example motions with respect to time. Bruderlin and Williams [7] provided
a good explanation of how time warping can be used to achieve a better blend. In
general, time warping yields a non-uniform correspondence that introduces a com-
plication to blending schemes. To circumvent this complication, we often resample
example motions non-uniformly to yield a one-to-one frame correspondence between
each pair of example motions that are supposed to be blended. The non-uniform
subdivision and smoothing operators introduced in the previous section facilitate
the construction of multiresolution representations for the resampled signals.
Given motion signals A and B, there are three cases for stitching them, depending on
how they overlap in time: (1) A and B overlap over an interval; (2) A and B abut
without overlapping; and (3) A and B are separated by a gap in time.
In the context of image mosaics, the first two cases are well discussed by Burt and
Adelson [9]. It is straightforward to adopt their idea for motion stitching. For case 1,
we can achieve motion stitching through the level-wise blending of coefficients along
the overlapping interval. Time warping can also be used to find a better correspon-
dence between motions A and B over the interval. For case 2, the coefficients at
each level of A as well as B are first extrapolated across its boundary to form an
overlapped transition interval and then blended along the interval. Since one motion
abuts on the other, the extrapolation does not yield serious artifacts. Therefore,
we focus on case 3 in which we need to generate a seamless in-between transition
motion T that connects the end of A and the start of B. Since there is no overlap-
ping between A and B, we may not use any blending technique. A simple solution
would be to estimate the linear and angular velocities at the boundaries of A and B,
and then to perform a C 1 interpolation. However, this solution has two difficulties:
First, it is difficult to robustly estimate the velocities from live-captured signals, since
they usually oscillate due to their fine details. Second, the resulting transition motion
exhibits visual artifacts due to the lack of fine details.
Our sampling scheme matches the features of motion signals
at different resolutions. Let f(m^(n)_i) be a feature function that maps a motion signal
m^(n) at frame i to a vector value for a feature response, such as a linear or angular velocity
change measured in a local coordinate system. To consider the features of different
scales simultaneously, we define the vector of feature responses such that

F(m^(n)_i) = (f(m^(n)_i), f(m^(n−1)_{i/2}), · · · , f(m^(0)_{i/2^n})),    (5.16)

where the feature response with a fractional index i + α, for 0 < α < 1, is defined
as a linear combination of the feature responses at frames i and i + 1 with blending
parameter α.
Our multiresolution sampling scheme generates d^(n)(T), 0 ≤ n < N, level by
level upward from the coarsest level. At each level n, we sample the coefficients
of d^(n)(T) from d^(n)(C), where C is A, B, or even a third motion signal provided
by a user. To determine a value for (û^n_i, v̂^n_i) ∈ d^(n)(T), we first select a small
set of candidates {(u^n_j, v^n_j)} from the corresponding level d^(n)(C). By matching
the features of m^(n−1)_{i/2}(T) and those of m^(n−1)_j(C) while varying the index j, we
find the best match at some frame j*. That is, we minimize the feature difference
‖F(m^(n−1)_{i/2}(T)) − F(m^(n−1)_j(C))‖ over all j to determine j*. Its corresponding displacement
(u^n_{2j*}, v^n_{2j*}) ∈ d^(n)(C) is taken as a candidate for (û^n_i, v̂^n_i). There are
alternatives in selecting candidates depending on the characteristics of the given motion
data. If the two motions A and B have a similar appearance, we select a single candidate
from either A or B for (û^n_i, v̂^n_i). Otherwise, we select one candidate from A and
the other from B and blend them along the transition interval. When a user-provided
motion is used, we sample a constant number of candidates to blend. The weight for
each candidate is proportional to the reciprocal of the magnitude of the corresponding
feature difference.
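The selection step at each level amounts to a nearest-neighbor search in feature space; a minimal sketch (illustrative names, features packed into plain vectors):

```python
import numpy as np

def best_candidate(target_feature, cand_features, cand_displacements):
    """Return the displacement at the frame whose feature vector F(.)
    is closest to the target's, i.e. the minimizer of ||F_T - F_C||."""
    diffs = [np.linalg.norm(np.asarray(target_feature) - np.asarray(f))
             for f in cand_features]
    j_star = int(np.argmin(diffs))
    return cand_displacements[j_star]
```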
5.5.4 Discussion
Other Applications: The multiresolution sampling scheme can be used for other
applications. A practical application is noise removal. Given a motion signal cor-
rupted by impulse noise, we would like to restore the corrupted frames while maintaining
the characteristics of the signal. This problem can be solved easily by tearing
off the corrupted frames and filling the missing portion through motion transition.
Another application is for seamlessly duplicating and shuffling the frames of a given
motion (see Figure 5.11). Usually, these can be done by splitting a given motion into
several pieces of motion segments and combining them again in a given order. Our
multiresolution sampling scheme can perform this task without explicit splitting and
recombining. Given an example motion m, we first decompose it into a base signal
m(0) and a series of displacement maps. The base signal m̂(0) of a new motion can
be specified interactively by duplicating and shuffling the frames of m(0) . Then, our
scheme samples the detail coefficients of a new motion m̂ from the given motion m
through feature matching.
Our scheme has been implemented and tested on a workstation (R10000 processor, 195 MHz) with various motion data (both 30 Hz and 24 Hz) that
were captured at a commercial studio. The execution time for decomposition and
reconstruction is almost negligible. The most time-consuming component of our
approach is hierarchical motion fitting, which takes about 0.01 to 0.03 seconds per
frame.
Figures 5.6 and 5.7 show two captured motions, “jump and kick” and “face hit”.
Those motions have 88 and 169 frames sampled at uniform intervals, respectively. For each
of them, extra frames are added at the end of the signal to form a multiresolution
representation of four levels. We multiply the detail coefficients at d^(0) and d^(1) by
constant factors of 1.5 and 0.5 to enhance (top) and attenuate (bottom) the motions,
respectively.
The blending example in Figure 5.8 combines three motion clips that have 186,
193, and 210 frames, respectively. For time normalization, we resample each of them
such that it has 25 frames between each pair of consecutive heel-strikes of the right
foot to establish a frame correspondence among the motions. With these resampled
motions, we generate a new motion (lower right) using Equation (5.15).
Figure 5.9 shows a transition motion between walking and running that are
not overlapped in time. The walking motion ends with the right foot of a character
moving forward and the running motion starts with its right foot forward as well.
We insert an additional keyframe with its left foot forward to enable its legs to
swing over the transition interval (top). The interpolation at the base level offers a
smooth connection between the given motions but yields serious artifacts due to the lack
of fine details. Those artifacts are clearly observed along the trajectory of the head,
which weaves for the original motions but moves straight for the synthesized transition
motion (middle). Our multiresolution sampling scheme circumvents such artifacts by
incorporating the original visual characteristics into a transition motion (bottom).
Our multiresolution sampling scheme is also useful for noise removal. The mo-
tion signals in Figure 5.10 have 161 frames, and 15 successive frames at the middle of
them are corrupted by impulse noise (upper row). We remove the corrupted frames
and divide the remaining portion into two separate segments to obtain their indi-
vidual multiresolution representations. The representations thus obtained are used
later to combine them again by sampling detail coefficients for the missing portion
(lower row). Figure 5.10 shows two examples with different characteristics. While
a motion signal (left column) for the left thigh oscillates rapidly up and down to
include relatively large features, a signal (right column) for the left shoulder contains
small features that resemble noise. In either case, our approach successfully reconstructs
the missing portion to have the same visual characteristics as its neighboring
portion.
Figure 5.11 illustrates how we can create a longer sequence of frames from a
short example motion. We start with a given example motion (upper left) which is
resampled such that it has 24 frames for each cycle (from one heel-strike to the next
of the same foot). Recursive reduction of the motion gives a base signal of 6 frames
(lower left). We duplicate and shuffle its frames to obtain a new base signal of 10
frames (lower right). Finally, a desired motion (upper right) is achieved using our
multiresolution sampling scheme that adds fine details to the new base signal.
Figure 5.6: Jump and kick. (top) Attenuated; (middle) Original; (bottom) Enhanced
Figure 5.7: Face hit. (top) Attenuated; (middle) Original; (bottom) Enhanced
Figure 5.8: Frequency-based motion blending. (upper left) Straight walking; (upper
right) Turning with a normal walk; (lower left) Walking with a limp; (lower right)
Turning with a limp
Figure 5.9: Motion transition between walking and running that are not overlapped
in time. Motions are depicted by superimposing their stick figures. (top) Original
motions and a user-specified keyframe between them; (middle) Smooth interpolation;
(bottom) Adding fine details
Figure 5.10: Noise removal for live-captured motion data. (left column) Left thigh;
(right column) Left shoulder; (upper row) Corrupted by impulse noise; (lower row)
Corrupted frames recovered
Figure 5.11: Duplication and shuffling. (upper left) An example motion; (lower left)
Its base signal; (lower right) A modified base signal; (upper right) A synthesized
motion
Chapter 6
Conclusion
6.1 Contributions
Crafting animation involves a variety of signal processing tasks such as smoothing,
attenuation, enhancement, resampling, interactive editing, blending, stitching, and so
on. This thesis elaborates fundamental techniques that facilitate such tasks, namely,
spatial filtering for orientation data, motion editing with spacetime constraints,
and multiresolution analysis/synthesis.
Spatial masking is a simple, powerful technique for digital signal processing.
We present a novel scheme to design an orientation filter that corresponds to a given
spatial mask. We show that our orientation filters have some desirable properties
such as coordinate-invariance, shift-invariance, and symmetry. We also provide some
examples that perform smoothing and sharpening on orientation signals. Experi-
mental results show that our orientation filters perform well for live-captured data.
We investigate a new approach to adapting an existing motion of a human-like
character so that it has desired features specified by a set of constraints. The key idea
of our approach is to introduce hierarchical displacement mapping, by which we
can not only manipulate a motion adaptively to satisfy a large set of constraints
within a specified error tolerance, but also edit an arbitrary portion of the motion
through direct manipulation. The performance of our method is greatly improved
by employing a curve fitting technique that minimizes a local approximation er-
ror. The hierarchical structure compensates for the possible drawbacks of the local
approximation method by globally propagating displacements at coarse levels and
later tuning at fine levels. Further performance gain is achieved by the new inverse
kinematics solver. Our hybrid algorithm performs much faster than pure numerical
algorithms.
Motion analysis and synthesis can benefit from hierarchical representations and
procedures. We have presented a new multiresolution approach to motion analysis
and synthesis. Our motion representation allows us to modify the coefficients at each
level in the hierarchy independently of those at the other levels through the level-wise
manipulation of detail coefficients. Exploiting this capability, we have developed a
variety of motion editing tools that can be used for modifying, blending, and stitching
highly detailed motion data. The success of our approach is mainly due to motion
filtering and displacement mapping. Our filtering scheme can handle orientations
as well as positions in a coherent manner. The notion of displacement mapping
provides an elegant formulation for multiresolution representations in which each
individual detail coefficient is represented as a pair of 3D vectors measured at a
local coordinate system. This formulation leads to multiresolution motion synthesis
through coordinate-independent operations such as scaling, blending, interpolation,
and sampling.
the need for copyright protection. Watermarking embeds authenticity or
ownership information into the data. The embedded information is an invisible
identification code that remains in the data permanently unless the data is severely degraded.
It is well-known that a watermarking scheme can be more robust and reliable with
multiresolution transformation [15, 56, 78].
clips. Thus, a sequence of tasks successively performed by a synthetic character can
be instantiated as a seamless motion using the various motion editing tools, such as
stitching, blending, and shuffling, explained in this thesis.
A typical task used frequently in a script is to move a character from the start
position to the goal position. In our context, achieving such a task may involve two
problems: one is to align given motion clips such as "straight walk", "turn left", and
"turn right" in a desired sequence to form a seamless motion that connects the start
and goal positions approximately; the other is to refine the motion thus obtained
to enforce exact interpolation at boundary frames and to take valid footholds at
intermediate frames. In my opinion, a randomized planning approach is well suited
to this paradigm [12, 50, 51, 52, 53]. The basic idea of randomized planning is to
construct a roadmap (a directed graph) whose nodes correspond to valid (collision-free)
postures of a character, and in which two nodes are connected by an edge if
the character can move from one posture to the other using a motion clip chosen
from a given candidate set within a specified tolerance. The motion planning
problem is then reduced to a shortest-path problem on a directed graph, which can be
solved efficiently. Our hierarchical motion fitting technique may be used at the
refinement step.
and environments to avoid geometric inconsistency such as inter-penetration. In
terms of efficiency, there have been efforts to speed up the inverse kinematics routine
by trading some generality for efficiency [87].
Bibliography
[8] P. J. Burt and E. H. Adelson. The Laplacian pyramid as a compact image
code. IEEE Transactions on Communications, 31:532–540, 1983.
[11] K.-J. Choi and H.-S. Ko. On-line motion retargetting. In Proceedings of Pacific
Graphics ’99, pages 32–42, 1999.
[19] R. A. DeVore, B. Jawerth, and B. J. Lucier. Surface compression. Computer
Aided Geometric Design, 9(3):219–239, 1992.
[27] D. R. Forsey and R. H. Bartels. Surface fitting with hierarchical splines. ACM
Transactions on Graphics, 14(2):134–161, April 1995.
[31] M. Girard and A. A. Maciejewski. Computational modeling for the computer
animation of legged figures. Computer Graphics (Proceedings of SIGGRAPH
85), pages 263–270, July 1985.
[37] S. Guo and J. Robergé. A high-level control mechanism for human locomotion
based on parametric frame space interpolation. In Proceedings of Computer
Animation and Simulation ’96, Eurographics Animation Workshop, pages 95–
107. Springer-Verlag, 1996.
[41] D. J. Heeger and J. R. Bergen. Pyramid based texture analysis/synthesis.
Computer Graphics (Proceedings of SIGGRAPH 95), pages 229–238, August
1995.
[42] K. Hirata and T. Kato. Query by visual example: content-based image retrieval.
In A. Pirotte, C. Delobel, and G. Gottlob, editors, Advances in
Database Technology (EDBT '92), pages 56–71. Springer-Verlag, Berlin, 1992.
[46] B. Jähne. Digital Image Processing: Concepts, Algorithms and Scientific Ap-
plications. Springer-Verlag, 1992.
[49] T. Kato, T. Kurita, N. Otsu, and K. Hirata. A sketch retrieval method for
full color image database–query by visual example. In Proceedings of the 11th
IAPR International Conference on Pattern Recognition, pages 530–533. IEEE
Computer Society Press, Los Alamitos, CA, 1992.
[51] L. Kavraki and J.-C. Latombe. Randomized preprocessing of configuration
space for fast path planning. In Proceedings of IEEE International Conference
on Robotics and Automation, pages 2138–2145, 1994.
[61] Y. Koga, K. Kondo, J. Kuffner, and J. Latombe. Planning motions with in-
tentions. Computer Graphics (Proceedings of SIGGRAPH 94), pages 395–408,
July 1994.
[65] S. Lee, K.-Y. Chwa, S. Y. Shin, and G. Wolberg. Image metamorphosis us-
ing snakes and free-form deformations. Computer Graphics (Proceedings of
SIGGRAPH 95), pages 439–448, August 1995.
[66] S. Lee, G. Wolberg, and S. Y. Shin. Scattered data interpolation with multi-
level B-splines. IEEE Transactions on Visualization and Computer Graphics,
3(3):228–244, 1997.
[67] S. J. Leffler, T. Reeves, and E. F. Ostby. The Menv modelling and animation
environment. The Journal of Visualization and Computer Animation, 1(1):33–
40, August 1990.
[72] W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic,
P. Yanker, C. Faloutsos, and G. Taubin. The QBIC project: Querying images
by content using color, texture, and shape. In Proceedings of the SPIE on
Storage and Retrieval for Image and Video Databases, volume 1908, pages
173–187, 1993.
[74] B. Paden. Kinematics and Control of Robot Manipulators. PhD thesis, University
of California, Berkeley, 1986.
[75] K. Perlin and A. Goldberg. Improv: A system for scripting interactive actors
in virtual worlds. Computer Graphics (Proceedings of SIGGRAPH 96), pages
205–216, August 1996.
[76] J. C. Platt and A. H. Barr. Constraint methods for flexible models. Computer
Graphics (Proceedings of SIGGRAPH 88), pages 279–288, August 1988.
[82] C. Rose, B. Guenter, B. Bodenheimer, and M. F. Cohen. Efficient genera-
tion of motion transitions using spacetime constraints. Computer Graphics
(Proceedings of SIGGRAPH 96), pages 147–154, August 1996.
[87] H. J. Shin, J. Lee, and S. Y. Shin. On-line motion retargetting for performance-
based animation. In preparation, 2000.
[90] Gilbert Strang. The discrete cosine transform. SIAM Review, 41(1):135–147,
1999.
[92] N. M. Thalmann and D. Thalmann. The use of high-level 3-D graphical types
in the MIRA animation system. IEEE CG&A, 3(9):9–16, December 1983.
[93] D. Tolani and N. I. Badler. Real-time inverse kinematics of the human arm.
Presence, 5(4):393–401, 1996.
[96] D. J. Wiley and J. K. Hahn. Interpolation synthesis for articulated figure mo-
tion. In Proceedings of IEEE Virtual Reality Annual International Symposium
’97, pages 157–160. IEEE Computer Society Press, 1997.