A Hierarchical Approach to
Motion Analysis and Synthesis
for Articulated Figures
February 18, 2000
Jehee Lee
1 Introduction 1
2 Preliminary 5
4.3 Preliminary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3.1 Displacement Mapping . . . . . . . . . . . . . . . . . . . . . . 33
4.3.2 Multilevel B-spline Approximation . . . . . . . . . . . . . . . 34
4.4 Hierarchical Motion Fitting . . . . . . . . . . . . . . . . . . . . . . . 36
4.4.1 Hierarchical Displacement Mapping . . . . . . . . . . . . . . 36
4.4.2 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.4.3 Motion Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.4.4 Knot Spacing . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4.5 Initial Guesses . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.5 Inverse Kinematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.5.1 A Numerical Approach . . . . . . . . . . . . . . . . . . . . . 41
4.5.2 A Hybrid Approach . . . . . . . . . . . . . . . . . . . . . . . 42
4.5.3 Arm and Leg Postures . . . . . . . . . . . . . . . . . . . . . . 44
4.5.4 Derivation for Equation (4.7) . . . . . . . . . . . . . . . . . . 45
4.6 Joint Limit Specification . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.7 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6 Conclusion 80
6.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Bibliography 85
List of Figures
4.9 Adaptation to environment change . . . . . . . . . . . . . . . . . . . 53
4.10 Adaptation to character change and motion transition . . . . . . . . 54
List of Tables
Chapter 1
Introduction
Crafting animations with a set of available motion clips requires a rich set of spe-
cialized tools such as interactive editing, retargetting, blending, stitching, smooth-
ing, enhancing, up-sampling, down-sampling, compression and so on. Motion clips
are typically short and specific to particular characters, environments, and the context of animation. With those tools, animators combine motion clips into animations of arbitrary length with great variety in character size, environment geometry, and scenario.
Motion data consist of a bundle of motion signals. Each signal represents a
sequence of sampled values that correspond to either the position or orientation of
a body segment. We will denote the position by a 3-dimensional vector and the
orientation by a unit quaternion. It is well-known that unit quaternions can repre-
sent 3-dimensional orientations smoothly and compactly without singularity. Those
signals are sampled at a sequence of discrete time instances at uniform intervals to
form a motion clip that consists of a sequence of frames. In each frame, the sampled
values from the signals determine the configuration of an articulated figure at that
frame.
A great deal of research has been devoted to processing regularly sampled
vector-valued data, for example, images which consist of RGB values sampled on
regular grids. An abundance of signal processing tools in both spatial and frequency domains has been developed for image processing. Unfortunately, it is hard to employ those tools for processing motion data without significant modification, for two reasons. The first reason is due to the orientation components of motion data, such as joint angles and the orientation of the root segment. An orientation in 3-dimensional space cannot be parameterized by a vector in the 3-dimensional space without yielding singularities. The non-singular representations, such as rotation matrices and unit quaternions, form a Lie group, which introduces a complication into signal processing techniques. The second reason is the lack of consideration of articulated structures, which yield kinematic relationships among the motion signals that comprise a motion clip. For example, if the signal that corresponds to each individual joint, such as the shoulder and the elbow, is considered independently, we may obtain undesirable trajectories of end-effectors, that is, the hands of a synthetic character.
A typical process of producing animation from live-captured data includes three
major steps: The first step is to filter the raw input signal received from a motion
capture system to obtain smoother, pleasing motion data. The next is to adapt
the motion data to fit a specific character or a virtual environment that may differ from the live puppeteer or the environment, respectively, in which the motion takes place. Finally, the motion segments thus obtained are combined into a seamless animation. In this thesis, we elaborate fundamental techniques that facilitate such tasks with proper consideration of orientation components and articulated structures. The specific issues addressed in the thesis can be summarized as follows:
closed-form solution to compute the joint angles of a limb linkage. This analytical
method greatly reduces the burden of a numerical optimization to find the solutions
for full degrees of freedom of a human-like articulated figure.
The remainder of the thesis is organized as follows. Chapter 2 gives a brief review of unit quaternions and their relation to orientation representation. Chapter 3 describes a general scheme to design spatial filters applicable to motion data that comprise both translation and rotation components. Chapter 4 describes a hierarchical motion representation by which we can not only manipulate a motion adaptively to satisfy a large set of constraints within a specified error tolerance, but also edit an arbitrary portion of the motion through direct manipulation. Chapter 5 presents an analysis scheme to decompose motion into a hierarchy of detail levels, followed by a synthesis scheme which produces a new motion by hierarchically combining detail coefficients of prescribed example motions. Finally, Chapter 6 concludes the thesis and describes directions for future work.
Chapter 2
Preliminary
Quaternion Basics
The four-dimensional space of quaternions is spanned by a real axis and three orthogonal imaginary axes, denoted by î, ĵ, and k̂, which obey Hamilton's rules: î² = ĵ² = k̂² = î ĵ k̂ = −1.
The product of two quaternions q1 = (w1, v1) and q2 = (w2, v2) can be written in several forms:

q1 q2 = w1 w2 − x1 x2 − y1 y2 − z1 z2 +
        (x1 w2 + w1 x2 + y1 z2 − z1 y2)î +
        (y1 w2 + z1 x2 + w1 y2 − x1 z2)ĵ +
        (z1 w2 − y1 x2 + x1 y2 + w1 z2)k̂,

and equivalently

q1 q2 = (w1, v1)(w2, v2)
      = (w1 w2 − v1 · v2, w1 v2 + w2 v1 + v1 × v2).    (2.3)
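The product of Equation (2.3) translates directly into code. A minimal sketch in Python (the helper name `qmul` and the (w, x, y, z) storage order are our own conventions, not the thesis's):

```python
import numpy as np

def qmul(q1, q2):
    # Quaternion product, Equation (2.3):
    # q1 q2 = (w1 w2 - v1 . v2, w1 v2 + w2 v1 + v1 x v2)
    w1, v1 = q1[0], np.asarray(q1[1:], dtype=float)
    w2, v2 = q2[0], np.asarray(q2[1:], dtype=float)
    return np.concatenate(([w1 * w2 - v1 @ v2],
                           w1 * v2 + w2 * v1 + np.cross(v1, v2)))

# Hamilton's rule i j = k as a sanity check:
print(qmul([0, 1, 0, 0], [0, 0, 1, 0]))  # -> [0. 0. 0. 1.]
```

The vector form avoids expanding the sixteen scalar products by hand and makes the non-commutativity (the cross-product term) explicit.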
[Figure: the exponential map exp(v) and the logarithm log(q) relate S3 to its tangent space T1 S3 ≡ R3]
real axis and thus it must be a purely imaginary quaternion, which corresponds to
a vector in R3 . In physics terminology, the purely imaginary tangent is identical to
the angular velocity ω(t) ∈ R3 of q(t):
ω(t) = 2 q(t)⁻¹ q̇(t),    (2.5)
which is measured in the local coordinate frame specified by q(t). Here, the direction of ω(t) gives the instantaneous axis of rotation, and its magnitude ‖ω(t)‖ is the rate of change of the rotation angle about that axis. Since the unit quaternion space is folded by the
antipodal equivalence, the angular velocity measured in S3 is twice as fast as the
angular velocity measured in SO(3). The constant factor 2 in Equation (2.5) keeps
consistency between the unit quaternion space and the rotation space.
One of the main connections between vectors and unit quaternions is the exponential
mapping. Quaternion exponentiation is defined in the standard way as:
exp(q) = 1 + q/1! + q²/2! + · · · + qⁿ/n! + · · ·    (2.6)
If the real part of q is zero, then exponential mapping gives a unit quaternion which
can be expressed in a closed-form.
exp(q) = exp(0, v) = (cos ‖v‖, (sin ‖v‖/‖v‖) v) ∈ S3.    (2.7)
For simplicity, we often denote exp(0, v) as exp(v). This map is onto but not one-to-one. To define its inverse function, we limit the domain such that ‖v‖ < π. Then, the exponential map becomes one-to-one and thus its inverse map log : S3\(−1, 0, 0, 0) → R3 is well-defined.
log(q) = log(w, v) =
    (π/2) (v/‖v‖),            if w = 0,
    tan⁻¹(‖v‖/w) (v/‖v‖),     if 0 < |w| < 1,    (2.8)
    0,                         if w = 1.
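Equations (2.7) and (2.8) admit a direct implementation. The sketch below (our own helper names, quaternions stored as (w, x, y, z) arrays) uses arctan2 to cover both branches of Equation (2.8):

```python
import numpy as np

def quat_exp(v):
    # Exponential map R^3 -> S^3, Equation (2.7):
    # exp(v) = (cos||v||, (sin||v||/||v||) v)
    v = np.asarray(v, dtype=float)
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])  # exp(0) is the identity
    return np.concatenate(([np.cos(theta)], (np.sin(theta) / theta) * v))

def quat_log(q):
    # Logarithmic map S^3 \ {(-1,0,0,0)} -> R^3, Equation (2.8);
    # arctan2(||v||, w) reduces to tan^-1(||v||/w) for w > 0 and to pi/2 for w = 0.
    w, v = q[0], np.asarray(q[1:], dtype=float)
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)  # log(1) = 0
    return (np.arctan2(n, w) / n) * v

# Round trip: log(exp(v)) = v for ||v|| < pi.
print(quat_log(quat_exp([0.3, -0.1, 0.2])))  # approximately [0.3, -0.1, 0.2]
```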
Geodesics
The path traversed by the object gives the shortest path on S3 between q1 and q2, and is called a geodesic of S3. In that sense, the geodesic norm
dist(q1, q2) = ‖log(q1⁻¹ q2)‖    (2.9)
[Figure 2.2: the geodesic curve from q1 to q2 on S3, with v = log(q1⁻¹ q2) and slerpt(q1, q2) = q1 exp(tv)]
is a natural distance metric in the unit quaternion space. This metric is bi-invariant, that is,

dist(a q1 b, a q2 b) = dist(q1, q2)

for any a, b ∈ S3.
The slerp (spherical linear interpolation) introduced by Shoemake [88] param-
eterizes the points on a geodesic curve to compute the in-betweens of two given
orientations q1 and q2 (see Figure 2.2). That is,

slerpt(q1, q2) = q1 exp(t log(q1⁻¹ q2)),    0 ≤ t ≤ 1.
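The parameterization shown in Figure 2.2 can be sketched as a small routine; the block repeats the product/exp/log helpers so it stands alone (all names are our own):

```python
import numpy as np

def qmul(a, b):
    # Quaternion product (w, x, y, z order), Equation (2.3).
    w1, v1 = a[0], np.asarray(a[1:], dtype=float)
    w2, v2 = b[0], np.asarray(b[1:], dtype=float)
    return np.concatenate(([w1 * w2 - v1 @ v2],
                           w1 * v2 + w2 * v1 + np.cross(v1, v2)))

def qinv(q):
    # The inverse of a unit quaternion is its conjugate.
    return np.array([q[0], -q[1], -q[2], -q[3]], dtype=float)

def qexp(v):
    v = np.asarray(v, dtype=float)
    t = np.linalg.norm(v)
    if t < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(t)], (np.sin(t) / t) * v))

def qlog(q):
    v = np.asarray(q[1:], dtype=float)
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)
    return (np.arctan2(n, q[0]) / n) * v

def slerp(q1, q2, t):
    # slerp_t(q1, q2) = q1 exp(t log(q1^-1 q2)):
    # move a fraction t along the geodesic from q1 toward q2.
    return qmul(q1, qexp(t * qlog(qmul(qinv(q1), q2))))
```

At t = 0.5 this yields the geodesic midpoint of the two orientations; t = 0 and t = 1 recover q1 and q2 exactly.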
Chapter 3
General Construction of
Spatial Filters for
Orientation Data
Efforts have been increasing to develop signal processing tools for filtering digitized
motion data, which comprise both translation and rotation components. A great deal
of research has been devoted to processing translation data, whereas the research
on orientation data is now emerging. In this chapter, we present a general scheme
of constructing spatial filters that perform smoothing and sharpening on orientation
data.
3.1 Motivation
Spatial filtering (as opposed to frequency domain filtering) has a variety of utilities
in digital signal processing including smoothing, sharpening, predicting, warping,
and so on [34, 46, 47]. Given a vector-valued signal (· · · , pi−1 , pi , pi+1 , · · · ) and a
spatial mask (a−k , · · · , a0 , · · · , ak ), the basic approach of spatial filtering is to sum
the products between the mask coefficients and the sample values under the mask
at a specific position on the signal. The i-th filter response is

F(pi) = Σ_{m=−k}^{k} am pi+m.    (3.1)
This type of filtering is very popular for vector signals. However, if such a mask
is applied to a unit quaternion signal, then the response of the mask will not, in
general, be a quaternion of unit length, because the unit quaternion space is not
closed under addition. Azuma and Bishop had a similar problem in applying a
Kalman filter to unit quaternion signals [1]. They simply normalize filter responses,
which causes undesirable side effects such as singularity and unexpected distortion.
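For vector signals, the masking of Equation (3.1) is just a windowed weighted sum; a minimal sketch (the function name and example mask are ours):

```python
import numpy as np

def apply_mask(P, mask):
    # F(p_i) = sum_{m=-k}^{k} a_m p_{i+m}: weighted sum of the samples
    # under the mask. Only interior positions are produced here; boundary
    # handling is a separate concern (cf. Section 3.5).
    k = len(mask) // 2
    P = np.asarray(P, dtype=float)
    return np.array([sum(a * P[i + m] for m, a in zip(range(-k, k + 1), mask))
                     for i in range(k, len(P) - k)])

# An affine-invariant smoothing mask: the coefficients sum to one.
signal = np.array([[0.0], [1.0], [4.0], [9.0], [16.0]])
print(apply_mask(signal, [0.25, 0.5, 0.25]))
```

Applying the same code to unit quaternion samples would break exactly as the text describes: the weighted sum leaves S3, so the responses would need to be renormalized.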
There have been efforts to develop digital signal processing techniques for ori-
entation signals. Lee and Shin [64] suggested smoothing operators derived from a
series of fairness functionals defined on orientation data. Such an operator can be
applied to orientation data for incrementally constructing a smooth angular motion.
Fang et al. [23] applied a low-pass filter to the estimated angular velocity of an input
signal to reconstruct a smooth angular motion by integrating the filter responses.
Hsieh et al. [43, 54] formulated the problem of smoothing orientation data as a non-linear optimization whose objective function is defined in terms of the squared magnitude of angular acceleration. They modified the traditional gradient-descent method to enforce the unitariness of a quaternion signal during optimization.
All those techniques successfully exclude brute-force normalization. Their com-
mon idea is to employ exponential and logarithmic maps that have been used for
handling orientation data in recent literature [16, 40, 55, 88]. The exponential and logarithmic maps provide a natural, non-singular parameterization for "small" angular
displacements. This parameterization allows us to draw analogies between quater-
nion algebra and its vector counterpart. For example, the slerp (spherical linear interpolation), which gives an intermediate point between two unit quaternion points, is an analogue of the linear interpolation between two vector points [88].
There are a variety of possible schemes to draw a quaternion analogue of Equa-
tion (3.1). Many of the variations suffer from lack of crucial properties of spatial
filters such as shift-invariance and symmetry, or from limited applicability. Smooth-
ing operators presented by Lee and Shin [64] are shift-invariant but not symmetric.
The scheme of Fang et al. [23] is neither shift-invariant nor symmetric. The numerical optimization technique of Hsieh et al. [43, 54] provides a special filter for smoothing orientation data. However, this idea is not based on spatial masking and thus can hardly be generalized to other types of spatial masking.
Our goal is to find a better analogue of spatial masking given in Equation (3.1),
which satisfies some desirable properties, i.e., coordinate-invariance, shift-invariance,
and symmetry. The basic idea is to convert the orientation data into their analogies
in a vector space to apply a spatial mask, and then to bring the result back to
the orientation space. We do not focus on designing a specific filter but propose
a general scheme applicable to a large class of spatial filters. Our interest lies in
affine-invariant spatial masks, whose coefficients sum to one, due to their wide applicability.
In the following section, we give a brief introduction to spatial masking and ori-
entation representation. In Section 3.3, we present an orientation filtering scheme in
detail. Some examples of orientation filters are derived in Section 3.4. In Section 3.5,
we discuss detailed implementation issues and illustrate relevant experimental results. In Section 3.6, we compare our filter design scheme with others. Finally, we conclude this chapter in Section 3.7.
holds for any given scalar values a and b, and vector-valued signals pi, p′i ∈ Rd. The filter is shift-invariant if its response does not depend on the position in the signal.
We can formulate this property more elegantly by introducing the shift operator S^l that translates the signal in the time domain by l steps:

F ◦ S^l = S^l ◦ F.    (3.3)

The filter F is symmetric if it commutes with the time-reversal operator R:

F ◦ R = R ◦ F,    (3.4)

where R(pi) = p−i. Since asymmetric filters could shift the moments of a signal, symmetric filters are preferred in many cases [34, 46, 47].
The difficulty of filtering unit quaternion data stems from the non-linear nature
of the unit quaternion space. Since the unit quaternion space is not closed under
addition and scalar multiplication, the weighted sum of unit quaternion points is
generally not a quaternion of unit length. This implies that the masking operation
given in Equation (3.1) may not be effective for unit quaternion data.
The exponential and logarithmic maps provide a clue to address this problem.
Let {qi ∈ S3 |i ≥ 0} be a unit quaternion signal that forms a piecewise slerp curve on
S3 . Through a simple derivation, each point qi can be represented as a cumulation
Figure 3.1: The transform between an angular signal in S3 and a linear signal in R3
qi = (q0 q0⁻¹)(q1 q1⁻¹) · · · (qi−1 qi−1⁻¹) qi
   = q0 (q0⁻¹ q1)(q1⁻¹ q2) · · · (qi−1⁻¹ qi)
   = q0 ∏_{j=0}^{i−1} exp(ωj),    (3.5)

where ωj = log(qj⁻¹ qj+1) ∈ R3. Note that the angular displacement between two
Then, P and Q can be transformed to each other as follows (see Figure 3.1):

pi = p0 + Σ_{j=0}^{i−1} log(qj⁻¹ qj+1), and    (3.6)

qi = q0 ∏_{j=0}^{i−1} exp(pj+1 − pj).    (3.7)
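Equations (3.6) and (3.7) are cumulative sums and products of the per-step displacements; a self-contained sketch (helper names ours, (w, x, y, z) storage):

```python
import numpy as np

def qmul(a, b):
    w1, v1 = a[0], np.asarray(a[1:], dtype=float)
    w2, v2 = b[0], np.asarray(b[1:], dtype=float)
    return np.concatenate(([w1 * w2 - v1 @ v2],
                           w1 * v2 + w2 * v1 + np.cross(v1, v2)))

def qinv(q):
    return np.array([q[0], -q[1], -q[2], -q[3]], dtype=float)

def qexp(v):
    v = np.asarray(v, dtype=float)
    t = np.linalg.norm(v)
    if t < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(t)], (np.sin(t) / t) * v))

def qlog(q):
    v = np.asarray(q[1:], dtype=float)
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)
    return (np.arctan2(n, q[0]) / n) * v

def to_vector_signal(Q, p0=np.zeros(3)):
    # Equation (3.6): p_i = p_0 + sum_{j<i} log(q_j^-1 q_{j+1})
    omega = [qlog(qmul(qinv(Q[j]), Q[j + 1])) for j in range(len(Q) - 1)]
    return np.vstack([p0] + list(p0 + np.cumsum(omega, axis=0)))

def to_quaternion_signal(P, q0):
    # Equation (3.7): q_i = q_0 * prod_{j<i} exp(p_{j+1} - p_j)
    Q = [np.asarray(q0, dtype=float)]
    for j in range(len(P) - 1):
        Q.append(qmul(Q[-1], qexp(np.asarray(P[j + 1]) - np.asarray(P[j]))))
    return Q
```

The two functions are mutual inverses (given p0 and q0), which is exactly the correspondence Figure 3.1 depicts.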
Now consider a spatial filter F given by a mask (a−k, · · · , a0, · · · , ak) as in Equation (3.1), where Σ_{m=−k}^{k} am = 1. If we apply F to the vector signal P, then each point is displaced by F(pi) − pi. The key idea of our approach is to exploit the one-to-one correspondence between linear displacements (or linear velocity) and angular displacements (or angular velocity) to construct a meaningful analogue of F in the unit quaternion space. We define the corresponding orientation filter H in such a way that it yields the angular displacement log(qi⁻¹ H(qi)) which equals the linear displacement F(pi) − pi, that is,

H(qi) = qi exp(F(pi) − pi).
The unitariness of filter responses is guaranteed, since the unit quaternion space is
closed under the quaternion multiplication. Conceptually, our filtering scheme is
to transform the input orientation signal Q to its analogue P in a vector space, to
apply a mask to P in a normal way, and finally to generate a filter response through
inverse transformation (See Figure 3.2).
The notion of local support is important for designing a filter that corresponds
to a mask of finite size. Since we are dealing with a sequence of displacements, the
filter with an infinite support may cause a large discrepancy at the end of the signal
[Figure 3.2: the orientation filter response H(qi) = qi exp(F(pi) − pi) corresponds to the vector-space response F(pi) through the displacement F(pi) − pi]
even for a small deviation at each time instance. Letting ∆p^i_m = pi+m − pi,

H(qi) = qi exp((Σ_{m=−k}^{k} am pi+m) − pi)
      = qi exp(Σ_{m=−k}^{k} am (pi+m − pi))
      = qi exp(Σ_{m=−k}^{k} am ∆p^i_m).    (3.10)
Clearly, H(qi ) is locally supported by the neighboring points (qi−k , · · · , qi , · · · , qi+k ).
The size of support of H is identical to that of F. One interesting observation is
that the explicit evaluation of pi and F(pi ) is not actually needed for computing
H(qi ) although we originally define H in terms of them.
Letting ωi = log(qi⁻¹ qi+1), we can further simplify Equation (3.10):

H(qi) = qi exp( a1 ωi + a2 (ωi + ωi+1) + · · · + ak (ωi + · · · + ωi+k−1)
        − a−1 ωi−1 − a−2 (ωi−1 + ωi−2) − · · · − a−k (ωi−1 + · · · + ωi−k) )
      = qi exp( (a1 + a2 + · · · + ak) ωi + (a2 + a3 + · · · + ak) ωi+1 + · · · + ak ωi+k−1
        − (a−1 + · · · + a−k) ωi−1 − (a−2 + · · · + a−k) ωi−2 − · · · − a−k ωi−k )
      = qi exp( Σ_{m=−k}^{k−1} bm ωi+m ) = qi exp( Σ_{m=−k}^{k−1} bm log(qi+m⁻¹ qi+m+1) ),    (3.12)
where

bm = Σ_{j=m+1}^{k} aj,      if 0 ≤ m ≤ k − 1,
bm = Σ_{j=−k}^{m} (−aj),    if −k ≤ m < 0.    (3.13)
Equation (3.12) will be used in the next section for proving crucial properties of the
filter.
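Equations (3.12) and (3.13) lead to a compact implementation: derive the bm once from the mask, then apply them to the local angular displacements ωi. A self-contained sketch (names ours; interior frames only):

```python
import numpy as np

def qmul(a, b):
    w1, v1 = a[0], np.asarray(a[1:], dtype=float)
    w2, v2 = b[0], np.asarray(b[1:], dtype=float)
    return np.concatenate(([w1 * w2 - v1 @ v2],
                           w1 * v2 + w2 * v1 + np.cross(v1, v2)))

def qinv(q):
    return np.array([q[0], -q[1], -q[2], -q[3]], dtype=float)

def qexp(v):
    v = np.asarray(v, dtype=float)
    t = np.linalg.norm(v)
    if t < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(t)], (np.sin(t) / t) * v))

def qlog(q):
    v = np.asarray(q[1:], dtype=float)
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)
    return (np.arctan2(n, q[0]) / n) * v

def b_coeffs(mask):
    # Equation (3.13): b_m = sum_{j=m+1}^{k} a_j     for 0 <= m <= k-1,
    #                  b_m = sum_{j=-k}^{m} (-a_j)   for -k <= m < 0.
    k = len(mask) // 2
    a = dict(zip(range(-k, k + 1), mask))
    return {m: (sum(a[j] for j in range(m + 1, k + 1)) if m >= 0
                else -sum(a[j] for j in range(-k, m + 1)))
            for m in range(-k, k)}

def orientation_filter(Q, mask):
    # Equation (3.12): H(q_i) = q_i exp(sum_m b_m log(q_{i+m}^-1 q_{i+m+1}))
    k = len(mask) // 2
    b = b_coeffs(mask)
    omega = [qlog(qmul(qinv(Q[j]), Q[j + 1])) for j in range(len(Q) - 1)]
    return [qmul(Q[i], qexp(sum(b[m] * omega[i + m] for m in range(-k, k))))
            for i in range(k, len(Q) - k)]
```

For a symmetric mask, Σ bm = 0, so a geodesic signal (constant angular velocity) is a fixed point of the filter.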
As mentioned earlier, the exponential map exp(v) is defined for all v ∈ R3 but its inverse, that is, the logarithm is not well-defined at −I = (−1, 0, 0, 0). To evaluate ωi reliably, we need an assumption that the angle between any pair of consecutive points is smaller than π/2, that is, ‖log(qi⁻¹ qi+1)‖ < π/2 for all i. An angle of π/2 in the unit quaternion space is equivalent to π in the orientation space, and thus our assumption is reasonable in practice.
The orientation filter H inherits important properties from spatial masking, since the angular displacement Σ_{m=−k}^{k} am ∆p^i_m caused by H is represented as a masking operation on a vector signal. The first property we will prove in this section is
coordinate-invariance. Due to this property, our filter gives the same results inde-
pendent of the coordinate system in which the orientation data are represented.
Proposition 1 H is coordinate-invariant, that is, H(a qi b) = a H(qi) b for any a, b ∈ S3.
Proof: The first step of the proof is to show that exp(b⁻¹ v b) = b⁻¹ exp(v) b for any v ∈ R3 and b ∈ S3. Since ‖v‖ = ‖b⁻¹ v b‖,

exp(b⁻¹ v b) = (cos ‖v‖, (sin ‖v‖/‖v‖) b⁻¹ v b)
             = (cos ‖v‖, 0) + (sin ‖v‖/‖v‖)(0, b⁻¹ v b)
             = b⁻¹ (cos ‖v‖, 0) b + (sin ‖v‖/‖v‖) b⁻¹ (0, v) b
             = b⁻¹ (cos ‖v‖, (sin ‖v‖/‖v‖) v) b
             = b⁻¹ exp(v) b.
Similarly, we can show that log(b⁻¹ q b) = b⁻¹ log(q) b for any q and b ∈ S3. Then, we have

H(a qi b) = a qi b exp( Σ_{m=−k}^{k−1} bm log(b⁻¹ qi+m⁻¹ a⁻¹ a qi+m+1 b) )
          = a qi b exp( b⁻¹ ( Σ_{m=−k}^{k−1} bm log(qi+m⁻¹ qi+m+1) ) b )
          = a qi exp( Σ_{m=−k}^{k−1} bm log(qi+m⁻¹ qi+m+1) ) b
          = a H(qi) b.
Since the support of H is finite, we can show that H is shift-invariant.
Proposition 2 H is shift-invariant.
Proof: Using Equation (3.12), we show that H commutes with S^l for any l:

S^l ◦ H(qi) = qi−l exp( Σ_{m=−k}^{k−1} bm log(q(i+m)−l⁻¹ q(i+m+1)−l) )
            = qi−l exp( Σ_{m=−k}^{k−1} bm log(q(i−l)+m⁻¹ q(i−l)+m+1) )
            = H(qi−l) = H ◦ S^l(qi).
Now, we show that H is symmetric for any given symmetric coefficients.
Proof: We will be done if we show that H commutes with R. We first expand R ◦ H using Equation (3.12):

R ◦ H(qi) = q−i exp( Σ_{m=−k}^{k−1} bm log(q−i−m⁻¹ q−i−m−1) )
          = q−i exp( Σ_{m=−k}^{k−1} bm log((q−i−m−1⁻¹ q−i−m)⁻¹) )
          = q−i exp( Σ_{m=−k}^{k−1} −bm log(q−i−m−1⁻¹ q−i−m) ).

Substituting n = −m − 1 and using bn = −b−n−1 for symmetric coefficients,

R ◦ H(qi) = q−i exp( Σ_{n=−k}^{k−1} −b−n−1 log(q−i+n⁻¹ q−i+n+1) )
          = q−i exp( Σ_{n=−k}^{k−1} bn log(q−i+n⁻¹ q−i+n+1) )
          = H(q−i) = H ◦ R(qi).
3.4 Examples
In this section, we provide some examples of orientation filters that correspond to
popular spatial filters such as smoothing, blurring and sharpening filter masks.
Smoothing: Our first example is a smoothing filter mask that is of practical use in signal processing [59, 60]. The smoothness measure ∫ ‖p″(t)‖² dt is minimized if the corresponding Euler-Lagrange equation p⁗(t) = 0 holds [28]. The discrete version of the Euler-Lagrange equation, ∆⁴pi = 0, is obtained by replacing differential operators with forward divided difference operators. It is well-known that an iterative scheme using a local update rule

pi ← pi − λ ∆⁴pi    (3.14)

gradually adjusts the data points to approach the optimal solution [59, 60]. Here,
λ is a damping factor that controls the rate of convergence. This update rule yields an affine-invariant spatial mask (−λ/16, 4λ/16, (16−6λ)/16, 4λ/16, −λ/16) and the corresponding orientation filter

HS(qi) = qi exp( (−λ/16) ∆p^i_{−2} + (4λ/16) ∆p^i_{−1} + ((16−6λ)/16) ∆p^i_0 + (4λ/16) ∆p^i_1 + (−λ/16) ∆p^i_2 )
       = qi exp( (λ/16)(−(−ωi−2 − ωi−1) + 4(−ωi−1) + 4ωi − (ωi + ωi+1)) )
       = qi exp( (λ/16)(ωi−2 − 3ωi−1 + 3ωi − ωi+1) ).    (3.15)
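The closed form in Equation (3.15) needs only the neighboring angular displacements; one smoothing pass over the interior frames can be sketched as follows (helper names ours):

```python
import numpy as np

def qmul(a, b):
    w1, v1 = a[0], np.asarray(a[1:], dtype=float)
    w2, v2 = b[0], np.asarray(b[1:], dtype=float)
    return np.concatenate(([w1 * w2 - v1 @ v2],
                           w1 * v2 + w2 * v1 + np.cross(v1, v2)))

def qinv(q):
    return np.array([q[0], -q[1], -q[2], -q[3]], dtype=float)

def qexp(v):
    v = np.asarray(v, dtype=float)
    t = np.linalg.norm(v)
    if t < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(t)], (np.sin(t) / t) * v))

def qlog(q):
    v = np.asarray(q[1:], dtype=float)
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)
    return (np.arctan2(n, q[0]) / n) * v

def smooth_pass(Q, lam=1.0):
    # One pass of H_S, Equation (3.15):
    # H_S(q_i) = q_i exp((lam/16)(w_{i-2} - 3 w_{i-1} + 3 w_i - w_{i+1}))
    w = [qlog(qmul(qinv(Q[j]), Q[j + 1])) for j in range(len(Q) - 1)]
    out = [np.asarray(q, dtype=float) for q in Q]
    for i in range(2, len(Q) - 2):
        d = (lam / 16.0) * (w[i - 2] - 3 * w[i - 1] + 3 * w[i] - w[i + 1])
        out[i] = qmul(Q[i], qexp(d))
    return out
```

Repeated passes progressively remove high-frequency noise, as in the incremental refinement of Figure 3.3; boundary frames are left untouched in this sketch.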
There is a strong analogy between an orientation signal Q and its vector counterpart P in the sense that the estimated velocity and acceleration of the linear motion represented by P are identical to the estimated angular velocity and acceleration, respectively, of the angular motion represented by Q. Accordingly, if a given filter mask is able to minimize Σi ‖p″(ti)‖², then its orientation counterpart HS can be expected to minimize the corresponding measure Σi ‖ω′(ti)‖² to give a fair angular motion. Here, the angular acceleration at a discrete time instance is estimated as
follows [43, 64]:

ω′(ti) = (log(qi⁻¹ qi+1) − log(qi−1⁻¹ qi)) / h²,    (3.16)

where h is the sampling interval.
Blurring: A more popular class of filters are derived from the binomial distribu-
tion [46]. Binomial coefficients give a low-pass filter that suppresses Gaussian noise
and blurs the details of the signal. The coefficients of an odd-sized (2k + 1) binomial
mask can be written:

B^{2k+1}_i = (1/2^{2k}) · (2k)! / ((k−i)! (k+i)!),    −k ≤ i ≤ k.    (3.17)

For example, substituting the coefficients of the mask B⁵ = (1/16, 4/16, 6/16, 4/16, 1/16) into Equation (3.12) gives the orientation filter

HB(qi) = qi exp( (1/16)(−ωi−2 − 5ωi−1 + 5ωi + ωi+1) ).    (3.18)
If the binomial mask B⁵ is used for blurring the original signal, then the coefficients (−λ/16, −4λ/16, (16+10λ)/16, −4λ/16, −λ/16) for sharpening are obtained. By substituting the coefficients in Equation (3.10), we have the orientation filter

HU(qi) = qi exp( (λ/16)(ωi−2 + 5ωi−1 − 5ωi − ωi+1) ).    (3.19)
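The mask coefficients in this section are easy to generate programmatically; a sketch (function names ours) that reproduces B⁵ and the sharpening coefficients above:

```python
from math import comb

def binomial_mask(k):
    # Equation (3.17): B_i^{2k+1} = (1/2^{2k}) (2k)! / ((k-i)! (k+i)!)
    return [comb(2 * k, k + i) / 4**k for i in range(-k, k + 1)]

def sharpening_mask(k, lam):
    # p + lam (p - B p): boost what the binomial blur removes.
    b = binomial_mask(k)
    return [(1 + lam) * (i == k) - lam * c for i, c in enumerate(b)]

print(binomial_mask(2))        # B^5 -> [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sharpening_mask(2, 1.0)) # (-lam/16, -4lam/16, (16+10lam)/16, -4lam/16, -lam/16)
```

Both masks sum to one, so they remain affine-invariant and can be fed to the orientation filter of Equation (3.12) unchanged.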
3.5 Experiments
Sampling Rate: Yet another ambiguity would stem from the slow sampling rate of
a motion capture system. Consider an object which spins at the rate of 2π times the
sampling rate. Then, motion data obtained from the object may be indistinguishable
from the one captured from a stationary object. In practice, the sampling rate is
fast enough not to cause such a problem and thus the angle between two consecutive
orientations in a signal is sufficiently small. In our experiments, we assume that each
subsequence (qi−k , · · · , qi , · · · , qi+k ) of an input signal is inside an open half-sphere
whose center is at qi for a given spatial mask of size (2k + 1). With this assumption,
the geodesic distance from qi to any of its neighboring points under the mask is less than π/2.
Boundary Conditions: In general, the input signal is neither infinite nor peri-
odic. The signal has boundary points, and the left boundary seldom has anything
to do with the right boundary. A periodic extension can be expected to have a dis-
continuity. The natural way to avoid this discontinuity is to reflect the signal at its
endpoints to seamlessly extend the signal [90]. Let (q0 , · · · , qn ) be a unit quaternion
signal and ωi = log(qi⁻¹ qi+1), 0 ≤ i < n, be the angular displacements of the signal.
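In the vector picture the reflection is p−i = 2p0 − pi, which in displacement terms means ω−i−1 = ωi; the left-end extension can then be built by walking backwards with negated displacements. A sketch of this reading (our own formulation; the thesis's precise construction may differ):

```python
import numpy as np

def qmul(a, b):
    w1, v1 = a[0], np.asarray(a[1:], dtype=float)
    w2, v2 = b[0], np.asarray(b[1:], dtype=float)
    return np.concatenate(([w1 * w2 - v1 @ v2],
                           w1 * v2 + w2 * v1 + np.cross(v1, v2)))

def qinv(q):
    return np.array([q[0], -q[1], -q[2], -q[3]], dtype=float)

def qexp(v):
    v = np.asarray(v, dtype=float)
    t = np.linalg.norm(v)
    if t < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(t)], (np.sin(t) / t) * v))

def qlog(q):
    v = np.asarray(q[1:], dtype=float)
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)
    return (np.arctan2(n, q[0]) / n) * v

def reflect_left(Q, k):
    # Prepend k samples reflected about q_0:
    # q_{-1} = q_0 exp(-w_0), q_{-2} = q_0 exp(-w_0) exp(-w_1), ...
    w = [qlog(qmul(qinv(Q[j]), Q[j + 1])) for j in range(k)]
    ext, cur = [], np.asarray(Q[0], dtype=float)
    for j in range(k):
        cur = qmul(cur, qexp(-w[j]))
        ext.append(cur)
    return ext[::-1] + [np.asarray(q, dtype=float) for q in Q]
```

A geodesic signal is extended to the continued geodesic, so a symmetric filter sees no artificial discontinuity at the boundary.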
As shown in Figure 3.3, we first apply our orientation filter to synthetic motion
data to visualize the effect of the filtering. The initial orientation data (top left) are uniformly sampled from a unit quaternion spline curve and perturbed with noise by multiplying each sample qi with the exponential exp(δi) of a randomly generated 3D vector δi, ‖δi‖ < 0.2. This moves each sampled point slightly in a random direction, up to 0.2 radians. We apply the smoothing filter HS to the initial motion once
(top right), twice (bottom left), and ten times (bottom right) to illustrate a series
of incrementally refined motions. The smoothing effect is clearly observed along the
trajectories of the tips of bird wings.
We also apply the filter to captured motion data. Our motion capture system
(MotionStar, Ascension Technology) consists of a magnetic field transmitter and 14
trackers each of which is attached to a link of a puppeteer and detects both the
position and orientation of the link measured in the global coordinate system. As
shown in Figure 3.4, we capture a live athletic stretching motion. This motion is
sampled at the rate of 30 frames per second. In particular, we concentrate on the
orientation signal for the left shoulder. In Figure 3.5(a), the x-axis represents the frame numbers of the signal and the y-axis the magnitude of each component of unit
quaternions. The magnitude of angular acceleration is plotted to show the noise in
the captured signal (See Figure 3.5(b)).
In Figures 3.5(c) and (e), we use smoothing filters HS and HB , respectively, to
reduce the noise in the signal. Each filter is applied to the signal five times. The effect
of smoothing is clearly shown in the corresponding magnitude plots of angular accel-
eration (See Figures 3.5(d) and (f)). On the contrary, the high-frequency boosting
filter HU enhances the high-frequency components of the signal and thus the esti-
mated angular acceleration vectors are magnified as expected (See Figures 3.5(g)
and (h)).
3.6 Discussion
As mentioned earlier, there are many variations in drawing a quaternion analogue
of spatial masking. In this section, we compare our filter design scheme with others.
Global vs. Local Parameterization: The naive use of the exponential and
logarithmic maps could lead to a troublesome orientation filter
each pair of consecutive points in the signal. The local parameterization enables us
to design the coordinate-invariant orientation filters which are singularity-free for
sufficiently dense samples.
Symmetric vs. Asymmetric Transform: Fang et al. [23] also employed the idea of local parameterization to design an orientation filter H̃, in which a corresponding spatial filter F is directly applied to a sequence of angular displacements as follows:
H̃(qi) = q0 ∏_{j=0}^{i−1} exp(F(ωj)),    (3.22)

where ωj = log(qj⁻¹ qj+1). However, this filter has some drawbacks. To illustrate
this, consider the support of the filter. Since the responses of F are explicitly
cumulated to produce the response of H̃, the value of H̃(qi ) is influenced by the
non-local, asymmetric neighbors qj for 0 ≤ j ≤ i + k, and thus the filter H̃ may
yield a larger deviation at the end of the signal. Instead, in our orientation filter

H(qi) = qi exp(F(pi) − pi) = qi exp( Σ_{m=−k}^{k} am ∆p^i_m ),    (3.23)
we consider the displacement, (F(pi )−pi ), gained by a given filter. Unlike the direct
filter response F(pi ), this displacement is computed from a local neighborhood of
qi without explicitly evaluating pi and F(pi ). Therefore, the deviation at a frame
is not propagated to other frames and thus we do not have any exaggeration at the
end of the signal.
Figure 3.3: Bird flying
[Plots: unit quaternion components (w, x, y, z) and corresponding magnitude curves over frames 0–200 (cf. Figure 3.5)]
Chapter 4
Much of the recent research in character animation has been devoted to developing
various kinds of editing tools to produce a convincing motion from prerecorded
motion clips. To reuse motion-captured data, animators often adapt them to a
different character, i.e., retargetting a motion from one character to another [33], or
to a different environment to compensate for geometric variations [7, 98]. Animators
also combine two motion clips in such a way that the end of one motion is seamlessly
connected to the start of the other [82].
This chapter presents a technique for adapting an existing motion of a human-
like character to have the desired features specified by a set of constraints. This tech-
nique can be used for retargetting a motion to compensate for geometric variations
caused by both characters and environments, as well as for directly manipulating
motion clips through a graphical interface.
orientation of an end-effector, such as a foot or a hand, of an articulated figure at a
specific time. The important features of the target motion are specified interactively
as constraints, and the captured motion is deformed to satisfy those constraints.
Motion data consist of a bundle of motion signals. Each signal represents a
sequence of sampled values for each degree of freedom. Those signals are sampled
at a sequence of discrete time instances with a uniform interval to form a motion
clip that consists of a sequence of frames. In each frame, the sampled values from
the signals determine the configuration of an articulated figure at that frame, and
thus they are related to each other by kinematic constraints. This structure yields
two relationships among sampled values: inter-frame and intra-frame relationships.
Through the use of an inverse kinematics solver, the intra-frame relationship, that
is, the configuration of an articulated figure within each frame can be adjusted to
meet the kinematic constraints. However, if each frame is considered independently,
then there could be an undesirable jerkiness between consecutive frames. Therefore,
we have to take account of the inter-frame relationship as well. For this purpose,
we employ the multilevel B-spline fitting technique. We also present an efficient
inverse kinematics algorithm which is used in conjunction with the fitting technique.
Our approach is distinct from the work of Gleicher [33] who addressed the same
problem. He provided a unified approach to fuse both relationships into a very
large non-linear optimization problem, which is cumbersome to handle. Instead, we
decouple the problem into manageable subproblems each of which can be solved very
efficiently.
Multilevel B-spline fitting techniques have been investigated to design smooth
surfaces which interpolate scattered features within a specified tolerance [27, 65, 66,
95]. Among them, we extend the technique presented by Lee et al. [66] for adapting a
motion to satisfy the constraints which are scattered over the frames. The multilevel
B-splines make use of a coarse-to-fine hierarchy of knot sequences to generate a series
of uniform cubic B-spline curves whose sum approaches the desired function. At each
level in the hierarchy, the control points of the B-spline curve are computed locally
with a least-squares method which provides an interactive performance. With this
fitting technique, we can not only manipulate a curve adaptively to satisfy a large set of constraints within a specified error tolerance, but also edit a curve at any level of detail to allow an arbitrary portion of the motion to be affected through direct manipulation. Exploiting these favorable properties of the multilevel B-spline
curves, we conveniently derive a hierarchy of displacement maps which are applied
to the original motion data to obtain a new, smoothly modified motion. Because of
this displacement mapping, the detail characteristics of the original motion can be
preserved [7, 98].
The performance of our approach is further enhanced by our new inverse kine-
matics solver. It is commonplace to formulate the inverse kinematics with multiple
targets as a constrained non-linear optimization for which the computational cost is
expensive [82, 99]. As noticed by Korein and Badler [62], we can find a closed-form
solution to the inverse kinematics problem for a limb linkage which consists of three
joints, for example, shoulder-elbow-wrist for the arm and hip-knee-ankle for the
leg. We combine this analytical method with a numerical optimization technique to
compute the solutions for full degrees of freedom of a human-like articulated figure.
Our hybrid algorithm enables us to edit the motions of a 37 DOF articulated figure
interactively.
The remainder of this chapter is organized as follows. After a review of previous
works, we give an introduction to the displacement mapping and the multilevel B-
spline fitting technique in Section 4.3. In Section 4.4, we present our motion editing
technique. In Section 4.5, we describe two inverse kinematics algorithms: One is
designed to manipulate a general tree-structured articulated figure and the other is
specialized to a human-like figure with limb linkages. In Section 4.6, we describe
how to specify joint limits in a unit quaternion representation. In Section 4.7, we
demonstrate how our technique can be used for motion capture-based animation
which includes adapting a motion from one character to another, fitting a recorded
walk onto a rough terrain and performing seamless transitions among motion clips.
In Section 4.8, we compare our algorithm to the previous approaches from several
viewpoints.
4.2 Related Work
There has been an abundance of research on developing motion editing tools.
Bruderlin and Williams [7] showed that techniques from the signal processing do-
main can be applied to manipulating animated motions. They introduced the idea
of displacement mapping to alter a motion clip. Witkin and Popović [98] presented
a motion warping technique for the same purpose. Bruderlin and Williams also pre-
sented a multi-target interpolation with dynamic time warping to blend two motions.
Unuma et al. [94] used Fourier analysis techniques to interpolate and extrapolate mo-
tion data in the frequency domain. Wiley and Hahn [96] and Guo and Robergé [37]
investigated spatial domain techniques to linearly interpolate a set of example mo-
tions. Rose et al. [81] adopted a multidimensional interpolation technique to blend
multiple motions all together.
Witkin and Kass [97] proposed a spacetime constraint technique to produce
the optimal motion which satisfies a set of user-specified constraints. Brotman and
Netravali [6] achieved a similar result by employing optimal control techniques. The
spacetime formulation leads to a constrained non-linear optimization problem. Co-
hen [14] developed a spacetime control system which allows a user to interactively
guide a numerical optimization process to find an acceptable solution in a feasible
time. Liu et al. [68] used a hierarchical wavelet representation to automatically
add motion details. Rose et al. [82] adopted this approach to generate a smooth
transition between motion clips. Gleicher [32] simplified the spacetime problem by
removing the physics-related aspects from the objective function and constraints to
achieve an interactive performance for motion editing. He also applied this technique
for motion retargetting [33].
Forsey and Bartels introduced a hierarchical B-spline representation to enhance surface modeling ca-
pability. This representation allows details to be adaptively added to the surface
through local refinement. They also employed the hierarchical representation for
fitting a spline surface to the regular data sampled at grid points [27]. Welch and
Witkin [95] proposed a variational approach to directly manipulate a B-spline sur-
face with scattered features, such as points and curves. Lee et al. [65, 66] suggested
an efficient method for interpolating scattered data points. They also demonstrated
that image warping applications can be cast as a surface fitting problem by adopting
the idea of displacement mapping. Although the authors used different terms, such as
hierarchical and multilevel B-spline surfaces, to refer to their hierarchical structures,
their underlying ideas are the same, that is, a coarse-to-fine hierarchy of control lat-
tices. Another class of approaches is due to multiresolution analysis and wavelets.
Finkelstein and Salesin [25] used B-spline wavelets for multiresolution editing of
curves. Many authors have investigated multiresolution analysis for manipulating
spline surfaces and polygonal meshes [10, 22, 69, 85].
Traditionally, inverse kinematics solvers can be divided into two categories: analytic
and numerical solvers. Most industrial manipulators are designed to have analytic
solutions for efficient and robust control. Kahan [48] and Paden [74] independently
discussed methods to solve an inverse kinematics problem by reducing it into a
series of simpler subproblems whose closed-form solutions are known. Korein and
Badler [62] showed that the inverse kinematics problem of a human arm and leg
allows an analytic solution. Actual solutions are derived by Tolani and Badler [93].
A numerical method relies on an iterative process to obtain a solution. Girard
and Maciejewski [31] addressed the locomotion of a legged figure using the Jacobian
matrix and its pseudo inverse. Koga et al. [61] made use of results from neurophysiology
to achieve an “experimentally” good initial guess and then employed a numerical
procedure for fine tuning. Zhao and Badler [99] formulated the inverse kinematics
problem of a human figure as a constrained non-linear optimization problem. Rose
et al. [82] extended this formulation to handle variational constraints that hold over
an interval of motion frames.
4.3 Preliminary

4.3.1 Displacement Mapping

\[
\begin{pmatrix} \mathbf{p} \\ \mathbf{q}_1 \\ \vdots \\ \mathbf{q}_n \end{pmatrix}
=
\begin{pmatrix} \mathbf{p}_0 \\ \mathbf{q}_1^0 \\ \vdots \\ \mathbf{q}_n^0 \end{pmatrix}
\oplus
\begin{pmatrix} \mathbf{v}_0 \\ \mathbf{v}_1 \\ \vdots \\ \mathbf{v}_n \end{pmatrix}
=
\begin{pmatrix} \mathbf{p}_0 + \mathbf{v}_0 \\ \mathbf{q}_1^0 \exp(\mathbf{v}_1) \\ \vdots \\ \mathbf{q}_n^0 \exp(\mathbf{v}_n) \end{pmatrix}. \tag{4.1}
\]
Here, exp(v) denotes a 3-dimensional rotation about the axis v/|v| ∈ R³ by angle
|v| ∈ R. With the displacement mapping, we are able to deal with both position
and orientation data in a uniform way; the displacement map is a homogeneous
array of 3-dimensional vectors, while the configuration of an articulated figure is
represented as a heterogeneous array of a vector and unit quaternions.
4.3.2 Multilevel B-spline Approximation

Lee et al. [66] proposed a multilevel B-spline approximation technique for fitting a
spline surface to scattered data points. In this section, we give a brief summary to
introduce their fitting technique. Since we need to manipulate a curve rather than
a surface, our derivation focuses on curve fitting.
Let Ω = {t ∈ R|0 ≤ t < n} be a domain interval. Consider a set of scattered
data points P = {(ti , xi )} for ti ∈ Ω. To interpolate the data points, we formulate an
approximation function f as a B-spline function which is defined over a uniform knot
sequence overlaid on the domain Ω. The function

\[
f(t) = \sum_{k=0}^{3} B_k(t - \lfloor t \rfloor)\, b_{\lfloor t \rfloor + k - 1}
\]

can be described in terms of its control points and uniform cubic B-spline basis
functions B_k, 0 ≤ k ≤ 3. Here, b_i is the i-th control point on the knot sequence
for −1 ≤ i ≤ n + 1. With this formulation, the problem of deriving function f is
reduced to that of finding the control points that best approximate the data points
in P.
Since each control point b_j is influenced by the data points in its neighborhood,
we can define the proximity set P_j = {(t_i, x_i) ∈ P | j − 2 ≤ t_i < j + 2} of the
data points that affect the value of b_j. Simple linear algebra using the pseudo
inverse provides a least-squares solution

\[
b_j = \frac{\sum_{(t_i, x_i) \in P_j} \omega_{ij}^2\, \beta_{ij}}{\sum_{(t_i, x_i) \in P_j} \omega_{ij}^2} \tag{4.2}
\]

which minimizes a local approximation error \(\sum_{(t_i, x_i) \in P_j} \| f(t_i) - x_i \|^2\). Here,
\(\omega_{ij} = B_{j+1-\lfloor t_i \rfloor}(t_i - \lfloor t_i \rfloor)\) comes from a B-spline basis function, and
\(\beta_{ij} = \omega_{ij}\, x_i \,/\, \sum_{k=0}^{3} B_k(t_i - \lfloor t_i \rfloor)^2\).
Figure 4.1: Hierarchical curve fitting to scattered data through multilevel B-spline
approximation
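The fitting procedure above can be sketched for a 1-D scalar curve as follows. This is our own minimal sketch in Python (the function names are ours, and the motion-specific machinery is omitted), combining the local least-squares rule of Equation (4.2) with the coarse-to-fine refinement of Figure 4.1:

```python
import numpy as np

def bspline_basis(u):
    """Uniform cubic B-spline basis B_0..B_3 at local parameter u in [0, 1)."""
    return np.array([(1 - u) ** 3,
                     3 * u ** 3 - 6 * u ** 2 + 4,
                     -3 * u ** 3 + 3 * u ** 2 + 3 * u + 1,
                     u ** 3]) / 6.0

def fit_level(ts, xs, n):
    """Local least-squares control points (Eq. 4.2) on a uniform knot
    sequence over [0, n); returns b_{-1}..b_{n+1} stored with offset 1."""
    num = np.zeros(n + 3)
    den = np.zeros(n + 3)
    for t, x in zip(ts, xs):
        s = int(t)
        B = bspline_basis(t - s)
        beta = B * x / np.sum(B ** 2)      # beta_ij = w_ij x_i / sum_k B_k^2
        idx = np.arange(s, s + 4)          # array slots for b_{s-1}..b_{s+2}
        num[idx] += B ** 2 * beta          # accumulate w^2 * beta
        den[idx] += B ** 2                 # accumulate w^2
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)

def eval_curve(b, ts):
    """f(t) = sum_k B_k(t - floor(t)) b_{floor(t)+k-1}."""
    return np.array([bspline_basis(t - int(t)) @ b[int(t):int(t) + 4]
                     for t in ts])

def multilevel_fit(ts, xs, n0=2, levels=6):
    """Coarse-to-fine hierarchy: each level fits the residual left by the
    previous levels, doubling the knot density at each step."""
    approx = np.zeros_like(xs)
    n = n0
    for _ in range(levels):
        b = fit_level(ts * n, xs - approx, n)
        approx = approx + eval_curve(b, ts * n)
        n *= 2
    return approx
```

Because each finer level only fits the residual left over by the coarser levels, the approximation error shrinks level by level, mirroring the behavior described in the text.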
4.4 Hierarchical Motion Fitting
Given the original motion m0 and a set C of constraints, our problem is to derive
a smooth displacement map d such that a target motion m = m0 ⊕ d satisfies the
constraints in C. Current motion editing techniques represent the displacement map
as an array of spline curves defined over a common knot sequence [33, 98]. Each
spline curve gives the time-varying motion displacement of its corresponding joint.
With a finer knot sequence, we can possibly find a solution that accurately satisfies
all the constraints in C. However, we have to pay a higher computational cost for the
accuracy due to the finer knot sequence. Ideally, we wish to determine the density of
knots to yield just enough shape freedom for an exact solution. However, the target
motion is not known in advance and thus we require the displacement map which
allows details to be added by adaptively refining the knot sequence.
4.4.1 Hierarchical Displacement Mapping

We adopt the hierarchical structure [66] reviewed in Section 4.3.2 to perform
this adaptive refinement. The multilevel B-spline approximation technique was em-
ployed to derive a warp function for image morphing and geometry reconstruction.
In our context, we extend this technique to handle motion data. From the displace-
ment map d, we derive a series of successively finer submaps d1 , · · · , dh that give
the corresponding series of incrementally refined motions, m1 , · · · , mh .
mh = (· · · ((m0 ⊕ d1 ) ⊕ d2 ) ⊕ · · · ⊕ dh ). (4.3)
Note that summing the submaps into a single displacement map results from an
erroneous derivation, that is, one that substitutes exp(v_1) exp(v_2) · · · exp(v_h) for
exp(v_1 + v_2 + · · · + v_h) in Equations (4.1) and (4.3). This derivation is not correct,
since quaternion multiplication is not commutative.
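This non-commutativity is easy to check numerically. The sketch below is our own illustration (the helper functions quat_exp and quat_mul are ours, with exp(v) rotating about v/|v|): for two rotations about different axes, exp(v1) exp(v2) and exp(v1 + v2) disagree.

```python
import numpy as np

def quat_exp(v):
    """exp of the purely imaginary quaternion (0, v): a unit quaternion
    rotating about v/|v|."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(theta)], np.sin(theta) * v / theta))

def quat_mul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

v1 = np.array([0.3, 0.0, 0.0])          # rotation about the x-axis
v2 = np.array([0.0, 0.4, 0.0])          # rotation about the y-axis
lhs = quat_mul(quat_exp(v1), quat_exp(v2))
rhs = quat_exp(v1 + v2)
# lhs and rhs differ: quaternion multiplication is not commutative,
# and exp(v1)exp(v2) = exp(v1 + v2) holds only for parallel axes.
print(np.linalg.norm(lhs - rhs))
```

The difference vanishes only when v1 and v2 share an axis, which is why the hierarchy must apply the submaps one after another rather than summing them.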
4.4.2 Constraints
To specify the desired features of the target motion, two categories of constraints are
employed: The ones in the first category are used to describe an articulated figure
itself, such as a joint limit and an anatomical relationship among joints. Those in the
other category are for placing end-effectors of the figure at particular positions and
orientations which are interactively specified by the user or automatically derived
from the interaction between the figure and its environment. For example, we first
specify the contact point between the foot and the ground through a graphical
interface and automatically modify the point later in accordance with the geometric
variation of the ground. We assume that a constraint in either category is defined
at a particular instant of time. A variational constraint that holds over an interval
of motion frames can be realized by a sequence of constraints over the time interval.
An ordered pair (tj , Cj ) specifies the set Cj of constraints at a frame tj .
Figure 4.2: A live-captured walking motion was interactively modified at the middle
frame such that the character bent forward and lowered the pelvis. The character
is depicted at the modified frame. The range of deformation is determined by the
density of the knot sequences. The knots in τ1 are spaced every (top) 4, (middle) 6,
and (bottom) 12 frames.
The local least-squares solution in Equation (4.2) has several drawbacks: the resulting
curve may be less accurate and could have undulations because a displacement is not
propagated globally. Fortunately, our hierarchical structure can compensate for such
drawbacks by globally propagating the displacement at a coarse level and performing
further tuning at finer levels.
4.4.4 Knot Spacing

For simplicity, our implementation doubles the density of a knot sequence from one
level to the next. Therefore, if τk has (n + 3) control points on it, then the next
finer knot sequence τk+1 will have (2n + 3) control points. The density of a knot
sequence τk , 1 ≤ k ≤ h, determines the range of influence of a constraint on the
displacement map at a level k. This is of great importance for direct manipulation.
For example, consider the situation that we interactively adjust the configuration of
an articulated figure by dragging one of its segments at a certain frame through a
graphical interface. The user input is interpreted as constraints which are immedi-
ately added to the set of prescribed constraints. Then, our system smoothly deforms
a portion of the motion clip around this modified frame. Here, the range of influ-
ence on the motion clip is mainly dependent on the spacing of τk. Larger spacing
between knots yields a wider range of deformation (see Figure 4.2). Therefore, the
displacement map d1 , that is derived from the coarsest sequence τ1 , has non-zero
values over the widest range to smoothly propagate the change of the motion. The
subsequent finer displacement maps dk , 2 ≤ k ≤ h, perform successive tunings to
satisfy the constraints.
The density of the finest knot sequence τh controls the precision of the final
motion mh . If τh is sufficiently fine to accommodate the distribution of constraints
in the time domain, mh can exactly satisfy all constraints. However, our algorithm
may leave a small deviation for each constraint in C even with several levels in the
hierarchy. In our experiments, we need just four or five levels for visually pleasing
results, which can be further refined to an exact interpolation by enforcing each
constraint independently with the inverse kinematics solver.
4.4.5 Initial Guesses

For a spacetime problem, a good initial guess on the desired solution is very impor-
tant to improve both the convergence of numerical optimization and the quality of
the result [33]. We obtain an initial guess for motion fitting by shifting the position
of the root segment in the original motion. To motivate this, consider the walking
motion that is adapted to the rough terrain as shown in Figures 4.9 (a) and (b). The
position of a stance foot, that touches the surface of the terrain, is pulled upward
at a small hill on the terrain, and thus the character is unwantedly forced to squat.
Even though the inverse kinematics solver tries to minimize the deviation of joint
angles, it cannot completely prevent the knee from bending. To reduce this artifact,
we change the position of the root segment in accordance with the change of geometry. Specifi-
cally, we displace the root segment by the average shift of the contact positions at
each frame. The shift of the root segment position at a frame can also be smoothly
propagated to the neighboring frames using the multilevel B-spline fitting method.
4.5 Inverse Kinematics
The most time consuming component in the motion fitting algorithm is the inverse
kinematics solver which is invoked very frequently at each level of the fitting hierar-
chy. Therefore, the overall performance of a hierarchical fitting critically depends on
the performance of the inverse kinematics solver. We describe, in this section, two
inverse kinematics algorithms. In Section 4.5.1, we introduce an inverse kinematics al-
gorithm for a general tree-structured figure with spherical joints based on numerical
optimization techniques. In Section 4.5.2, we present a faster specialized algorithm for
a human-like figure with limb linkages. The latter algorithm combines the numerical
techniques with an analytical method illustrated in Section 4.5.3.
\[
\mathbf{p} = \mathbf{p}_0 + \mathbf{v}_0, \quad \text{and} \quad
\mathbf{q}_i = \mathbf{q}_i^0 \exp(\mathbf{v}_i), \ 1 \le i \le n, \tag{4.5}
\]
Accordingly, our constrained optimization problem is formulated as follows:

\[
\begin{aligned}
\text{minimize} \quad & f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^{T} M \mathbf{x}, \\
\text{subject to} \quad & c_i(\mathbf{x}) = 0, \quad i \in N_e, \\
& c_i(\mathbf{x}) > 0, \quad i \in N_i.
\end{aligned}
\]

With a penalty method, the constraints are folded into an unconstrained objective

\[
g(\mathbf{x}) = f(\mathbf{x}) + \sum_{i \in N_e} \omega_i\, c_i(\mathbf{x})^2 + \sum_{i \in N_i} \omega_i \left(\min(c_i(\mathbf{x}), 0)\right)^2. \tag{4.6}
\]
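To make the penalty formulation concrete, here is a small sketch of ours (not the thesis implementation): a toy two-variable problem with one equality and one inequality constraint, minimized by plain gradient descent on g.

```python
import numpy as np

def penalty_objective(x, w=100.0):
    """g(x) = f(x) + w c_e(x)^2 + w min(c_i(x), 0)^2 for a toy problem:
    f(x) = 0.5 x^T x, equality c_e: x0 + x1 - 1 = 0, inequality c_i: x0 - 0.2 > 0."""
    f = 0.5 * x @ x
    ce = x[0] + x[1] - 1.0
    ci = x[0] - 0.2
    return f + w * ce ** 2 + w * min(ci, 0.0) ** 2

def minimize(x0, steps=20000, lr=1e-3, h=1e-6):
    """Gradient descent with central-difference gradients."""
    x = x0.astype(float)
    for _ in range(steps):
        g = np.zeros_like(x)
        for j in range(len(x)):
            e = np.zeros_like(x)
            e[j] = h
            g[j] = (penalty_objective(x + e) - penalty_objective(x - e)) / (2 * h)
        x -= lr * g
    return x

x = minimize(np.zeros(2))
```

With a finite penalty weight the constraints are satisfied only approximately, which is exactly why the hierarchical fitting tolerates a small per-constraint deviation at each level.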
The major difficulty of solving an inverse kinematics problem stems from the exces-
sive DOFs of an articulated figure. A reasonable human model may have about 40
DOFs for computer animation, while we specify much fewer constraints for manip-
ulating the figure. For a figure of n DOFs, we can remove c of those DOFs with a
set of c independent constraints imposed on it. The remaining (n − c) DOFs span
the solution space of the problem.
A reduced-coordinate formulation parameterizes the redundant DOFs with a
reduced set of (n − c) variables. One explicit redundancy in the human body is
the “elbow circle” that was first mentioned in Korein and Badler [62]. Even though
the shoulder and the wrist are firmly planted, we can still afford to move the elbow
Figure 4.3: A human-like figure that has explicit redundancies at its limb linkages
along a circle with its axis through the shoulder and the wrist (See Figure 4.3). The
human figure has four limbs, two from arms and two from legs. The redundant DOF
for the i-th limb linkage can be parameterized with a rotation angle θi , 1 ≤ i ≤ 4,
about the axis.
Without loss of generality, we assume that the positions and orientations of
hands and feet are fixed by constraints. If there is a free hand or foot, the DOFs in
the corresponding limb are left unchanged. Let m = (p, q1 , · · · , qr , qr+1 , · · · , qn )
be the configuration of a human-like figure. Its rear part (qr+1 , · · · , qn ) denotes
the DOFs for the limbs and the fore part (p, q1 , · · · , qr ) does the remaining DOFs.
Since the constraints restrain the DOFs in the limb linkages, the reduced set of
parameters (p, q1 , · · · , qr , θ1 , · · · , θ4 ) span all possible configurations of the figure
under the constraints.
Incorporating the idea of reduced-coordinate formulation into the numerical
optimization framework, we can solve an inverse kinematics problem with a reduced
number of optimization parameters x̂ = (x_0, · · · , x_{3r+2}, θ_1, · · · , θ_4) ∈ R^{3r+7}. Note
that we have replaced the rear part of x with the elbow circle parameters θ1 , · · · , θ4
for limb linkages. Whenever we evaluate the objective function with new param-
eters x̂, the parameters (p, q1 , · · · , qr ) are computed first by Equation (4.5), and
then the others for (qr+1 , · · · , qn ) are uniquely determined by an analytical solver
4.5.3 Arm and Leg Postures

[Figure 4.4: the limb-linkage geometry, with segment lengths l1 and l2, radii r1 and r2, elbow angle φ, and elbow circle angle θi, relative to the goal position and orientation]
Consider a limb linkage, for example, an arm linkage. Starting from an initial
configuration, we sequentially adjust the joint angles for the elbow, the shoulder
and the wrist of the arm linkage to place the hand at the desired position and
orientation. We assume that the torso and the shoulder positions are given. Let l1,
l2, r1, r2 and L be defined as follows (see Figure 4.4(a)):

l1 = the distance from the shoulder to the elbow,
l2 = the distance from the elbow to the wrist,
r1 = the distance from the elbow rotation axis to the shoulder,
r2 = the distance from the elbow rotation axis to the wrist, and
L = the distance from the shoulder to the wrist.
To place the wrist at a position distant from the shoulder by L (see Figure 4.4(b)),
the angle φ between the upper and lower arms is given by

\[
\phi = \cos^{-1}\!\left( \frac{l_1^2 + l_2^2 + 2\sqrt{l_1^2 - r_1^2}\sqrt{l_2^2 - r_2^2} - L^2}{2\, r_1 r_2} \right), \tag{4.7}
\]
as illustrated in the next section. Then, we bring the wrist to the goal position by
adjusting the shoulder angles (See Figure 4.4(c)). In the subsequent step, we rotate
the wrist angles to coincide with the goal orientation. Once one feasible solution is
given, the other solutions can be obtained by rotating the elbow about the axis that
passes through the shoulder and the wrist positions. Given θi , we can determine the
arm posture uniquely (See Figure 4.4(d)). Similarly, we can determine a leg posture.
If L is longer than the arm length, l1 + l2 , the elbow stretches as far as possible.
On the other hand, if L is too small, then the elbow angle could violate its lower
limit and thus is pulled back into the allowable range. In both cases, we cannot
place the wrist at the exact position and thus the corresponding penalty function
yields a positive value for the given torso configuration.
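Equation (4.7) is straightforward to implement. The sketch below is ours (the function name and the clamping policy are our assumptions); it clamps the cosine so that an unreachable L stretches or folds the elbow to its limit, mirroring the behavior described above.

```python
import math

def elbow_angle(l1, l2, r1, r2, L):
    """Angle between the upper and lower limb segments, Eq. (4.7).
    l1, l2: segment lengths; r1, r2: distances from the elbow rotation axis
    to the shoulder and to the wrist; L: desired shoulder-wrist distance."""
    s1 = math.sqrt(max(l1 * l1 - r1 * r1, 0.0))
    s2 = math.sqrt(max(l2 * l2 - r2 * r2, 0.0))
    c = (l1 * l1 + l2 * l2 + 2.0 * s1 * s2 - L * L) / (2.0 * r1 * r2)
    return math.acos(max(-1.0, min(1.0, c)))   # clamp for unreachable L
```

In the special case r1 = l1 and r2 = l2 (joints lying in the plane perpendicular to the elbow axis), s1 = s2 = 0 and the formula reduces to the ordinary law of cosines for a two-link chain.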
4.5.4 Derivation for Equation (4.7)

To identify the angle between the upper and fore arms, we project the joint positions
onto a plane perpendicular to the elbow rotation axis. Then, the projections for the
shoulder and the wrist are placed on two concentric circles, of radii r_1 and r_2, whose
center coincides with the projection for the elbow (see Figure 4.5). The distance r
between the two projections satisfies r^2 = r_1^2 + r_2^2 - 2 r_1 r_2 \cos\phi by the
law of cosines.
Letting \(s_1 = \sqrt{l_1^2 - r_1^2}\) and \(s_2 = \sqrt{l_2^2 - r_2^2}\), respectively, the distance L between the
shoulder and the wrist positions is

\[
\begin{aligned}
L^2 &= (s_1 + s_2)^2 + r^2 \\
&= l_1^2 + l_2^2 + 2\sqrt{l_1^2 - r_1^2}\sqrt{l_2^2 - r_2^2} - r_1^2 - r_2^2 + r^2 \\
&= l_1^2 + l_2^2 + 2\sqrt{l_1^2 - r_1^2}\sqrt{l_2^2 - r_2^2} - 2 r_1 r_2 \cos\phi. \tag{4.9}
\end{aligned}
\]

Therefore, \(\cos\phi = \dfrac{l_1^2 + l_2^2 + 2\sqrt{l_1^2 - r_1^2}\sqrt{l_2^2 - r_2^2} - L^2}{2\, r_1 r_2}\).
4.6 Joint Limit Specification
Figure 4.7: The joint limit of the shoulder specified by a composite constraint
where v is an arbitrary unit vector. Under constraint HS (q0 , k), the joint is allowed
to rotate about an arbitrary axis, but the rotation angle must be smaller than or
equal to k.
For instance, the range of motion of the shoulder can be described as the
intersection of two constraints, as shown in Figure 4.7:

HA(q_0, w, k_1, k_2) ∩ HC(q_0, w, k_0).

Here, q_0 denotes the orientation of the shoulder at rest, and w the direction from
the shoulder to the elbow. The axial constraint then describes the range of the
twist angle, and the conic constraint limits the direction of the upper arm.
Let q ∈ S³ be a joint configuration and H ⊂ S³ be a composite constraint
described as a boolean combination of primitive constraints. To check whether q
is included in H, we require point-inclusion tests, answering true or false, for the
primitives and their logical combinations. Note that any quaternion q and its
antipode −q represent the same orientation; therefore, the constraints are satisfied
if H includes either q or −q. The following propositions explain how to test point
inclusion for each primitive constraint.
48
[Figure 4.8: the decomposition e^{ψv/2} e^{φw/2} q_0 = q for (a) the conic and (b) the axial constraints]

Any orientation q can be decomposed as

e^{ψv/2} e^{φw/2} q_0 = q,

where cos ψ = w · ŵ, v = (w × ŵ)/|w × ŵ|, and ŵ = (q q_0^{−1}) w (q q_0^{−1})^{−1}.
The image of w rotated by e^{ψv/2} e^{φw/2} makes a cone whose direction is w and
whose cone angle is k, as shown in Figure 4.8(a). The quaternion q q_0^{−1} is contained in
{e^{ψv/2} e^{φw/2} | 0 ≤ ψ ≤ k} if and only if the image ŵ of w is
inside the conic region. Hence, q ∈ HC(q_0, w, k) if the angle between w and ŵ is
less than or equal to k.
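The conic test translates directly into code. The following is a minimal sketch with our own quaternion helpers (a complete implementation would also test the antipode −q, as noted above):

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def qconj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def qrot(q, v):
    """Rotate vector v by unit quaternion q."""
    return qmul(qmul(q, np.concatenate(([0.0], v))), qconj(q))[1:]

def in_conic(q, q0, w, k):
    """q in HC(q0, w, k): the image w_hat of w under q q0^{-1} must stay
    within angle k of w."""
    d = qmul(q, qconj(q0))             # q q0^{-1} for unit quaternions
    w_hat = qrot(d, w)
    ang = np.arccos(np.clip(np.dot(w, w_hat), -1.0, 1.0))
    return ang <= k
```

For example, with q0 the rest orientation and q a rotation tilting w by 0.5 radians, the test accepts cone angles k ≥ 0.5 and rejects smaller ones.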
For the axial constraint HA, we use the same decomposition

e^{ψv/2} e^{φw/2} q_0 = q.

Let ŵ be the image of w rotated by q q_0^{−1}. Then, ψ and v are directly computed as
(see Figure 4.8(b)): cos ψ = w · ŵ and v = (w × ŵ)/|w × ŵ|. Hence, the twist angle φ is

φ = 2 w · log(e^{−ψv/2} q q_0^{−1}).

For the spherical constraint HS, we write

e^{φv/2} q_0 = q,

so that e^{φv/2} = q q_0^{−1} and

φv/2 = log(q q_0^{−1}).
of heel-strikes and toe-offs for the motion clips. This information is used for es-
tablishing the kinematic constraints that enforce the foot contacts for the entire
motion. The terrain of Figure 4.9(b) is represented as a NURBS surface whose
control points are placed on a regular grid with a spacing of 80% of the height of
the character, and whose y-coordinates (heights) are randomly perturbed within 120%
of the height. To adapt the motion onto the rough terrain with doorways, we first
adjust the constraints such that the contact positions are shifted along the y-axis to
be placed on the terrain, and add new constraints to bend the character under the
doorways. Then, we use our motion fitting algorithm to warp the motion to satisfy
the constraints. The original and the adapted motions are depicted in Figures 4.9(a)
and 4.9(b), respectively.
The “climbing a rope” example in Figure 4.10(a) gives constraints on both
hands and feet. A physically simulated rope is used to explicitly illustrate the
moments of grasping and releasing the rope by a hand, which correspond to the
initiation and termination, respectively, of a variational constraint for that hand.
We adapt this motion to a different character with longer legs and a shorter body
and arms. For the character morphing example shown in Figure 4.10(b), the size
of a character smoothly changes to have extremely long legs and a short body, and
then to have extremely short legs and a long body. The original walking motion is
warped to preserve its uniform stride against the change of character size.
Our motion fitting method is also useful for generating a smooth transition
between motion clips. Figure 4.10(c) shows the transitions from walking to sneaking
and from sneaking to walking. The basic approach is very similar to the one presented
by Rose et al. [81]: we seamlessly connect the motion data by fading one out
while fading the other in. Over the fading duration, Hermite interpolation and time
warping techniques are used to smoothly blend the joint parameters of the motion
data. Since joint parameter blending may cause foot sliding, we enforce foot contact
constraints with the motion fitting method.
Table 4.1 gives a performance summary of the examples. Timing information
is obtained on a SGI Indigo2 workstation with an R10000 195 MHz processor. The
execution time for each example is not only influenced by quantitative factors such
as the number of frames, constraints and parameters, but also by qualitative factors
such as the difficulty of achieving desired features specified by constraints and the
quality of initial estimates. In particular, well-chosen initial estimates provide great
speedups for most of the examples. One promising observation is that both execution
times and maximum errors rapidly decrease level by level. This implies that the
performance of our algorithm is not critically dependent on the error tolerance. In
all examples, every constraint is satisfied within or slightly over 1 % of the height of
the character by the hierarchical fitting of four levels. A few more levels may result
in a more accurate solution. As the experimental data show, we can anticipate that
the computational cost of an additional level is much lower than that of the prior
level.
4.8 Discussion
In this section, we compare our motion fitting algorithm to the previous approaches
from several viewpoints.
Figure 4.9: Adaptation to environment change
(a) The original walking motion on the flat ground (b) The adapted motion for the rough terrain
Figure 4.10: Adaptation to character change and motion transition
(a) Climbing a rope for different characters (b) Character morphing
The global least-squares method often suffers from over-shooting that may give an
undesirable curve shape. Ironically, the less accurate local method in Equation (4.2)
is preferred in a
hierarchical framework. Since approximation errors at one level are incrementally
canceled out in the later levels, the accuracy at each level is not critical. This
local method is much more efficient and less prone to over-shooting than the global
method.
Chapter 5

Multiresolution Motion Analysis and Synthesis
5.1 Motivation
Although it is relatively easy to obtain high quality motion clips by virtue of motion
capture techniques, crafting various animations of arbitrary length with available
motion clips is still difficult and involves significant manual efforts. Our work is
motivated by the following issues related to motion editing:
• Motion modification: It is desirable that motion editing tools have the capabil-
ity to change the global pattern of a motion while maintaining its fine details,
and conversely to change fine details while maintaining the global pattern. It
is also desirable that they have the capability to enhance or attenuate a motion
to generate variations of different moods or emotions. A good example is
cartoon-like exaggeration, which may have more expressive power. To facilitate
these features, motion editing tools should be able to manipulate each level of
motion detail separately.
Our approach is closely related to that of Bruderlin and Williams [7], who used a digital
filterbank technique to store motion data as a hierarchy of detail levels, where each
level represents a different band of frequencies. With the hierarchy of detail levels,
they can not only edit motion data interactively by amplifying or attenuating
particular frequency bands but also generate a new motion by blending two existing
motions band by band. While they addressed motion modification and blending
extensively, they hardly mentioned motion transition. Moreover, their approach often
suffers from singularities due to the parameterization of orientation data with
Euler angles.
Given two sufficiently smooth motions, spline interpolation or spacetime
control would be a good choice for producing a seamless transition between
them. However, most live-captured motion data oscillate with the fine details
that distinguish the motion of a live creature from the unnatural motion of
a robot. To connect a pair of such motions seamlessly, we need to generate a
natural-looking in-between transition motion. Our multiresolution motion
representation scheme facilitates this through level-by-level manipulation of
motion signals.
To illustrate the problem of singularity, it would be instructive to consider a
simple 2D example where orientations can be parameterized by a single scalar value
changing from zero to 2π. Motion signals have singularity at zero (or 2π) to cause
serious artifacts in signal processing. For 3D motions of a human-like articulated
figure, the problem gets worse so that many familiar motions such as simple turning
and arm swing may suffer from singularity. To avoid such a problem, it is desirable
to employ non-singular orientation representations such as rotation matrices or unit
quaternions which form a Lie group. Due to the inherent non-linearity of these
representations, however, it is challenging to generalize multiresolution analysis and
synthesis techniques for orientation data.
In this chapter, we make two major extensions to the work of Bruderlin and
Williams to uniformly address all three motion editing issues. We first con-
struct the multiresolution representation for motions with proper consideration of
orientation components. Then, we present a general scheme for synthesizing
a seamless motion of arbitrary length by combining canned motion clips.
The remainder of the chapter is organized as follows. After reviewing previous
work, a brief overview of our approach is described in Section 5.3. In Section 5.4,
we present a multiresolution structure for representing motion and explain how to
construct it. Section 5.5 describes how to synthesize a new motion from available motion
clips using various motion editing techniques. In Section 5.6, we provide a gallery
of examples.
5.3 Overview
In this section, we briefly give an overall picture of this chapter. Our approach
consists of two parts: motion analysis and synthesis. To decompose a motion signal
into a hierarchy of detail characteristics, our approach relies on a spatial filtering
scheme presented in Chapter 3. With that scheme, we are able to construct a
multiresolution motion representation that is inspired by Gauss-Laplacian image
pyramids [8]. Level-wise manipulation of motion signals along the hierarchy enables
multiresolution editing that deals with fine details of motions.
In our representation, a motion signal is decomposed into a coarse base sig-
nal and a hierarchy of detail coefficients. Each level of the hierarchy consists of a
sequence of coefficients (a pair of 3D vectors). The coefficients at the base level
determine the overall shape of the motion signal, and its details are added succes-
sively with those at fine levels. The construction of the multiresolution representa-
tion is based on two basic operations: reduction and expansion. The expansion is
achieved by a subdivision operation that can be considered as up-sampling followed
by smoothing. The reduction is a reverse operation, that is, smoothing followed
by down-sampling. Motion filtering provides smoothing operators to avoid aliasing
caused by down-sampling and to interpolate missing information for up-sampling.
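For intuition, the reduction and expansion operations can be sketched on a scalar signal. The code below is our own simplified stand-in (simple linear filters, positions only; the thesis applies the analogous scheme to quaternion data): perfect reconstruction holds because each level stores the residual exactly.

```python
import numpy as np

def reduce_(s):
    """Smoothing followed by down-sampling (keep the even-numbered samples)."""
    k = np.array([0.25, 0.5, 0.25])
    smoothed = np.convolve(np.pad(s, 1, mode='edge'), k, mode='valid')
    return smoothed[::2]

def expand(s, n):
    """Up-sampling (insert linear midpoints), cropped back to length n."""
    up = np.empty(2 * len(s))
    up[0::2] = s
    up[1::2] = np.append(0.5 * (s[:-1] + s[1:]), s[-1])
    return up[:n]

def analyze(m, levels):
    """Decompose m into a coarse base signal and per-level detail coefficients."""
    details = []
    for _ in range(levels):
        base = reduce_(m)
        details.append(m - expand(base, len(m)))
        m = base
    return m, details

def synthesize(base, details):
    """Rebuild the signal: expand the base and add back the details level by level."""
    for d in reversed(details):
        base = expand(base, len(d)) + d
    return base
```

Editing the base signal changes the overall shape of the motion while the stored details, added back during synthesis, preserve its fine characteristics.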
By exploiting the capability of the multiresolution representation, we can syn-
thesize an intended motion from canned motion clips. Through direct manipulation
of the detail coefficients at each level in the hierarchy, we can solve the motion mod-
ification and blending issues. For motion stitching, a base level motion signal is first
obtained by interpolating the coefficients at the coarsest level, and then fine details
are added successively with those at lower levels. To obtain the fine-level coefficients
in the transition motion connecting a pair of given motions, we employ a multires-
olution sampling scheme [3]. Since the detail coefficients of the transition motion are
sampled from the given motions, the transition preserves the characteristics of the
original motions.
[Figure: reduction decomposes m(n) into a coarser signal m(n−1) and detail coefficients d(n−1); expansion reverses the process]
Here, u = (x, y, z) ∈ R^3 is considered as a purely imaginary quaternion (0, x, y, z).
Given two motion signals m = {(p_i, q_i) ∈ R^3 × S^3} and m′ = {(p′_i, q′_i) ∈
R^3 × S^3}, we define their motion displacement d = {(u_i, v_i) ∈ R^3 × R^3}, measured
in a local (body-fixed) coordinate system, such that m′ = m ⊕ d or d = m′ ⊖ m.
From the composite transformation T_(p′_i, q′_i) = T_(p_i, q_i) ◦ T_(u_i, exp(v_i)), we can derive
each element of a displacement map d as follows:

u_i = q_i^{−1} (p′_i − p_i) q_i,    v_i = log(q_i^{−1} q′_i),

where exp(v) denotes a 3-dimensional rotation about the axis v ∈ R^3 by angle
2‖v‖ ∈ R.
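As a concrete sketch of these formulas (the helper names below are illustrative, not the thesis' code), the displacement between a pair of corresponding frames can be computed with a few quaternion utilities:

```python
import numpy as np

def qmul(a, b):
    # Hamilton product of quaternions stored as (w, x, y, z)
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def qinv(q):
    # inverse of a unit quaternion is its conjugate
    return np.array([q[0], -q[1], -q[2], -q[3]])

def qlog(q):
    # log map S^3 -> R^3: the rotation angle of q is 2*|qlog(q)|
    s = np.linalg.norm(q[1:])
    if s < 1e-12:
        return np.zeros(3)
    return np.arctan2(s, q[0]) * q[1:] / s

def qexp(v):
    # exp map R^3 -> S^3, inverse of qlog
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(theta)], np.sin(theta) * v / theta))

def displacement(p, q, p2, q2):
    # d = m' (-) m: u = q^{-1}(p' - p)q, v = log(q^{-1} q')
    dp = np.concatenate(([0.0], p2 - p))   # purely imaginary quaternion
    u = qmul(qmul(qinv(q), dp), q)[1:]     # position offset in the body frame
    v = qlog(qmul(qinv(q), q2))
    return u, v
```

Applying `displacement` and then composing back with ⊕ round-trips exactly, which is what makes the representation lossless.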
A smoothing operation on the frames can be described by the diffusion-like update rule

p_i ← p_i − λ L_j p_i,    (5.4)

where λ is a diffusion coefficient and L_j is a Laplacian operator [17, 38, 59, 91].
Filtering with this rule rapidly disperses small perturbations, while the main shape
is only slightly degraded. Here, Laplacian operators can be estimated for discrete signals
by replacing differential operators with forward divided difference operators such
that L_j = Δ^{2j}, where

Δ^1 p_i = (p_{i+1} − p_i) / (t_{i+1} − t_i),
Δ^j p_i = (Δ^{j−1} p_{i+1} − Δ^{j−1} p_i) / (t_{i+j} − t_i),  for j > 1.    (5.5)
Then, the update rule yields an affine-invariant spatial mask that can be generalized
for orientation data using Equation (3.12). For example, by adopting the second
Laplacian operator L_2, we have a filter mask (1/24)(−λ, 4λ, 24 − 6λ, 4λ, −λ) and its
corresponding filter,

p_i ← (1/24)(−λ p_{i−2} + 4λ p_{i−1} + (24 − 6λ) p_i + 4λ p_{i+1} − λ p_{i+2}),
q_i ← q_i exp((λ/24)(ω_{i−2} − 3ω_{i−1} + 3ω_i − ω_{i+1})).
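For positional data, one pass of this mask takes a few lines; the sketch below (an illustrative stand-in, positions only, leaving the quaternion analogue with ω_i = log(q_i^{−1} q_{i+1}) aside) applies the filter to interior frames:

```python
import numpy as np

def smooth_positions(p, lam):
    """One pass of the mask (1/24)(-lam, 4*lam, 24 - 6*lam, 4*lam, -lam).

    p is an (n, 3) float array of positions; the two frames at each end
    are left untouched, a simplifying boundary assumption.
    """
    out = p.astype(float).copy()
    for i in range(2, len(p) - 2):
        out[i] = (-lam * p[i - 2] + 4.0 * lam * p[i - 1]
                  + (24.0 - 6.0 * lam) * p[i]
                  + 4.0 * lam * p[i + 1] - lam * p[i + 2]) / 24.0
    return out
```

Since the mask coefficients sum to one and the λ-terms cancel on any straight line, linear trajectories pass through unchanged while high-frequency wiggles are damped.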
The expansion is achieved by up-sampling followed by smoothing. The even-numbered
frames (p^{n+1}_{2i}, q^{n+1}_{2i}) at level n + 1 are the frames (p^n_i, q^n_i) at level n, and the odd-numbered
frames (p^{n+1}_{2i+1}, q^{n+1}_{2i+1}) are newly inserted between old frames. The new frames
are inserted halfway between two successive old frames using (spherical)
linear interpolation. Assuming that the motion frames are sampled uniformly,
p^{n+1}_{2i+1} = (1/2) p^n_i + (1/2) p^n_{i+1} and q^{n+1}_{2i+1} = slerp_{1/2}(q^n_i, q^n_{i+1}), which will be refined in the
following step to give a smoother motion. Here, slerp_t(q_1, q_2) denotes a spherical
linear interpolation between two unit quaternion points q_1 and q_2 with interpolation
parameter t, that is, slerp_t(q_1, q_2) = q_1 exp(t · log(q_1^{−1} q_2)) [88]. Applying
the smoothing operator H_E to the up-sampled data with a subdivision mask
(−1/16, 0, 9/16, 0, 9/16, 0, −1/16) gives the refined data as follows:
p^{n+1}_{2i} = p^n_i,
p^{n+1}_{2i+1} = (1/16)(−p^{n+1}_{2i−2} + 9 p^{n+1}_{2i} + 9 p^{n+1}_{2i+2} − p^{n+1}_{2i+4})    (5.8)
            = (1/16)(−p^n_{i−1} + 9 p^n_i + 9 p^n_{i+1} − p^n_{i+2}).
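This four-point refinement is easy to sketch for positional data (out-of-range neighbor indices are simply clamped here, a simplifying assumption about boundary handling):

```python
import numpy as np

def expand_positions(p):
    """Insert midframes with the mask (-1, 9, 9, -1)/16 of Equation (5.8).

    p is an (n, d) float array. Out-of-range neighbors are clamped to
    the ends, a simplifying assumption about boundary handling.
    """
    n = len(p)
    out = [p[0]]
    for i in range(n - 1):
        pm1 = p[max(i - 1, 0)]        # p^n_{i-1}, clamped
        pp2 = p[min(i + 2, n - 1)]    # p^n_{i+2}, clamped
        out.append((-pm1 + 9.0 * p[i] + 9.0 * p[i + 1] - pp2) / 16.0)
        out.append(p[i + 1])
    return np.array(out)
```

Old frames are kept verbatim, so the expansion is interpolating: applying it never moves an existing frame.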
Thus, a newly inserted point p^{n+1}_{2i+1} lies
on the cubic polynomial curve interpolating four neighboring points p^n_{i−1}, p^n_i, p^n_{i+1},
and p^n_{i+2} [20, 21]. We can generalize this scheme for orientation data using Equation (3.12) to have

q^{n+1}_{2i} = q^n_i,
q^{n+1}_{2i+1} = slerp_{1/2}(q^n_i, q^n_{i+1}) exp((ω^n_{i−1} − ω^n_{i+1}) / 16).    (5.9)
Construction: Our construction algorithm starts with the original motion m(N )
to compute its simplified versions and their corresponding displacement maps suc-
cessively in a fine-to-coarse order. Suppose that we are now at the n-th level for
0 ≤ n ≤ N − 1. Given a signal m(n+1) , we can compute a coarser signal m(n) by
reduction. The expansion of m(n) interpolates the missing information to approxi-
mate the original signal m(n+1) . Thus, the difference between them is expressed as
a displacement map d(n) as follows:
m^(n) = R m^(n+1),
d^(n) = m^(n+1) ⊖ E m^(n).    (5.10)
Cascading these operations until there remain a sufficiently small number of frames
in the motion signal, we can construct the multiresolution representation which
includes the coarse base signal m(0) and a series of displacement maps as shown in
Figure 5.3(upper). Formally stating, the multiresolution representation of m(N ) is
given by
m^(0) = R^N m^(N),
d^(n) = R^{N−n−1} m^(N) ⊖ E R^{N−n} m^(N),    (5.11)
for 0 ≤ n ≤ N − 1. The original signal m^(N) can be reconstructed from the multiresolution
representation by recursively adding the displacement map at each level
to the expansion of the signal at the same level, that is, m^(n+1) = E m^(n) ⊕ d^(n) for
0 ≤ n ≤ N − 1.
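For scalar (component-wise) signals, where ⊕ and ⊖ reduce to plain + and −, construction and reconstruction can be sketched as follows; the concrete reduce/expand operators here are simple stand-ins (a [1, 2, 1]/4 smoother and linear interpolation), not the thesis' masks:

```python
import numpy as np

def reduce_signal(m):
    # smoothing followed by down-sampling
    s = m.astype(float).copy()
    s[1:-1] = (m[:-2] + 2.0 * m[1:-1] + m[2:]) / 4.0
    return s[::2]

def expand_signal(m, n):
    # up-sampling: interpolate back to n frames
    return np.interp(np.linspace(0.0, 1.0, n),
                     np.linspace(0.0, 1.0, len(m)), m)

def decompose(m, levels):
    """m(N) -> base m(0) and maps d(n) = m(n+1) - E m(n) (Equation 5.10)."""
    ds = []
    cur = m.astype(float)
    for _ in range(levels):
        coarse = reduce_signal(cur)
        ds.append(cur - expand_signal(coarse, len(cur)))
        cur = coarse
    return cur, ds[::-1]          # coarse-to-fine order

def reconstruct(base, ds):
    """Invert the pyramid: m(n+1) = E m(n) + d(n)."""
    cur = base
    for d in ds:
        cur = expand_signal(cur, len(d)) + d
    return cur
```

Reconstruction is exact for any choice of reduce/expand operators, because each d(n) stores exactly the residual that the expansion misses.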
5.4.3 Extension
Though most motion capture data are sampled at uniformly spaced time instances,
we often need to process non-uniform data to support tasks such
as time warping, which aligns motion clips with respect to time [7]. To construct a
multiresolution representation for non-uniformly sampled motion data, we further
generalize the reduction and expansion operators for a non-uniform setting. For
reduction, we can easily derive smoothing masks by estimating discrete Laplacian
operators for a non-uniform setting, since the divided difference operator is well-
defined. Given a knot sequence [ti−2 , ti−1 , ti , ti+1 , ti+2 ], a non-uniform smoothing
mask (c0 , c1 , c2 , c3 , c4 ) for a second Laplacian operator is as follows:
c_0 = 1 / ((t_{i−2} − t_{i−1})(t_{i−2} − t_i)(t_{i−2} − t_{i+1})(t_{i−2} − t_{i+2})),
c_1 = 1 / ((t_{i−2} − t_{i−1})(t_{i−1} − t_i)(t_{i−1} − t_{i+1})(t_{i−1} − t_{i+2})),
c_2 = 1 / ((t_{i−2} − t_i)(t_{i−1} − t_i)(t_i − t_{i+1})(t_i − t_{i+2})),    (5.13)
c_3 = 1 / ((t_{i−2} − t_{i+1})(t_{i−1} − t_{i+1})(t_i − t_{i+1})(t_{i+1} − t_{i+2})),
c_4 = 1 / ((t_{i−2} − t_{i+2})(t_{i−1} − t_{i+2})(t_i − t_{i+2})(t_{i+1} − t_{i+2})).
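These coefficients are straightforward to evaluate; for uniform knot spacing they reduce to (1/24, 1/6, 1/4, 1/6, 1/24), which makes a handy sanity check. A small sketch (function name illustrative):

```python
def nonuniform_mask(t):
    """Coefficients (c0..c4) of Equation (5.13) for knots t = [t_{i-2}, ..., t_{i+2}]."""
    t0, t1, t2, t3, t4 = t
    c0 = 1.0 / ((t0 - t1) * (t0 - t2) * (t0 - t3) * (t0 - t4))
    c1 = 1.0 / ((t0 - t1) * (t1 - t2) * (t1 - t3) * (t1 - t4))
    c2 = 1.0 / ((t0 - t2) * (t1 - t2) * (t2 - t3) * (t2 - t4))
    c3 = 1.0 / ((t0 - t3) * (t1 - t3) * (t2 - t3) * (t3 - t4))
    c4 = 1.0 / ((t0 - t4) * (t1 - t4) * (t2 - t4) * (t3 - t4))
    return (c0, c1, c2, c3, c4)
```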
For expansion, the coefficients of the subdivision mask are derived from the cubic
Lagrange polynomials [60]. The cubic polynomial that interpolates four points
(p^n_{i−1}, p^n_i, p^n_{i+1}, p^n_{i+2}) defined over the knot sequence [t^n_{i−1}, t^n_i, t^n_{i+1}, t^n_{i+2}] can be written
as follows:

p(t) = l_{1000}(t) p^n_{i−1} + l_{0100}(t) p^n_i + l_{0010}(t) p^n_{i+1} + l_{0001}(t) p^n_{i+2},    (5.14)

where the cardinal function l_{u0 u1 u2 u3}(t) is the unique cubic polynomial that interpolates
u_j at t^n_{i+j−1} for 0 ≤ j ≤ 3 [57]. Note that Equation (5.14) is a simple
generalization of Equation (5.8). Therefore, we can obtain a subdivision mask
(l_{1000}(t^{n+1}_{2i+1}), 0, l_{0100}(t^{n+1}_{2i+1}), 0, l_{0010}(t^{n+1}_{2i+1}), 0, l_{0001}(t^{n+1}_{2i+1})) to compute p^{n+1}_{2i+1} and q^{n+1}_{2i+1}.
Figure 5.4: Level-of-detail generation for a live-captured signal. The four curves rep-
resent the change of w-, x-, y-, and z-components, respectively, of a unit quaternion
with respect to time. (from left to right) Original signal and its approximations at
successively coarser resolutions
At the boundary of a signal, for instance, we compute p^{n+1}_1 from the cubic polynomial that interpolates the four left-most points p^n_0, p^n_1,
p^n_2, and p^n_3 of the original sequence m^(n). q^{n+1}_1 can also be computed with the
spatial mask induced from the interpolating polynomial.
Our multiresolution representation allows for modifying its fine details at each level
independently of those at the other levels through level-wise manipulation of detail
coefficients. Note that each detail coefficient is represented with a pair of 3D vectors
that correspond to the displacements for position and orientation, respectively.
A natural application is to construct an LOD (level-of-detail) representation
of a motion that consists of its several versions at various levels of detail (see Fig-
ure 5.4). Given a detailed signal, we can construct a series of successively simpler
versions by removing the detail coefficients level by level starting from the finest
level. For continuous transition between levels, we also consider the fractional level
n + α of a motion signal with blending parameter 0 < α < 1, which defines a linear
interpolation between levels n and n + 1. To obtain a fractional-level motion, we
scale the coefficients at level n by a factor of α, and set all coefficients at higher
levels to zero.
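A fractional-level extraction can be sketched as a small operation on the coefficient hierarchy (scalar stand-in; `ds` holds d(0), d(1), ... in coarse-to-fine order, and the function name is illustrative):

```python
import numpy as np

def fractional_level(ds, n, alpha):
    """Keep levels below n, scale level n by alpha, zero out finer levels."""
    out = []
    for k, d in enumerate(ds):
        d = np.asarray(d, float)
        if k < n:
            out.append(d)
        elif k == n:
            out.append(alpha * d)
        else:
            out.append(np.zeros_like(d))
    return out
```

Sweeping alpha from 0 to 1 then morphs the motion continuously from level n to level n + 1.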
Another promising application is enhancing/attenuating the detailed features
of a motion signal to convey different moods or emotions. This application can be
achieved through the level-wise scaling of detail coefficients with different scaling
factors. For the motion “jump and kick” in Figure 5.6, we multiply the detail coef-
ficients by constant factors to produce the enhanced (top) and attenuated (bottom)
versions, respectively. The enhancement results in a higher jump and kick, while
the attenuation conveys a milder emotional mood and softer action. The effects
are clearly observed along the trajectories of the feet. Figure 5.7 shows a motion in
which the face is hit by an object. The enhanced and attenuated versions successfully
simulate the effects of hard and soft hitting, respectively.
Our representation scheme is also useful for blending motion clips together. A particular
example in Figure 5.8 blends three motions of the same size, that is, straight
walking m_ws, turning with a walk m_wt, and straight walking with a limp m_ls. From
these motions, we produce a new motion m_lt that describes turning with a limp.
The basic observation is that the global shape of the target motion is similar to m_wt
and its fine details are similar to m_ls. Therefore, we obtain the base signal m^(0)_lt by
applying the displacement map Φ_st = m^(0)_wt ⊖ m^(0)_ws to m^(0)_ls. Similarly, the detail
coefficients in d^(n)_lt are computed by applying the displacement map Φ^(n)_wl = d^(n)_ls ⊖ d^(n)_ws
to d^(n)_wt:

m^(0)_lt = m^(0)_ls ⊕ Φ_st,
d^(n)_lt = d^(n)_wt ⊕ Φ^(n)_wl.    (5.15)

Here, Φ_st describes how a straight movement is transformed into a turn, and Φ^(n)_wl
describes how normal walking is transformed into limping.
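With scalar stand-ins for ⊕ and ⊖ (plain + and −), the whole blend is one line per level; each motion below is given as a pair (base, [d(0), d(1), ...]), and the function name is illustrative:

```python
import numpy as np

def blend_limp_turn(m_ws, m_wt, m_ls):
    """Compose 'turning with a limp' from three example decompositions.

    The base level comes from the straight->turning displacement applied
    to the limping base; the details come from the walking->limping
    displacement applied to the turning details.
    """
    base_lt = m_ls[0] + (m_wt[0] - m_ws[0])
    details_lt = [d_wt + (d_ls - d_ws)
                  for d_wt, d_ws, d_ls in zip(m_wt[1], m_ws[1], m_ls[1])]
    return base_lt, details_lt
```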
Time warping is an ingredient of blending schemes that gives a correspondence
among example motions with respect to time. Bruderlin and Williams [7] provided
a good explanation of how time warping can be used to achieve a better blend. In
general, time warping yields a non-uniform correspondence that introduces a com-
plication to blending schemes. To circumvent this complication, we often resample
example motions non-uniformly to yield a one-to-one frame correspondence between
each pair of example motions that are supposed to be blended. The non-uniform
subdivision and smoothing operators introduced in the previous section facilitate
the construction of multiresolution representations for the resampled signals.
Given motion signals A and B, there are three cases for stitching them, depending on
how they overlap in time: (1) A and B overlap over an interval; (2) A and B abut
without overlapping; and (3) A and B are separated by a gap in time.
In the context of image mosaics, the first two cases are well discussed by Burt and
Adelson [9]. It is straightforward to adopt their idea for motion stitching. For case 1,
we can achieve motion stitching through the level-wise blending of coefficients along
the overlapping interval. Time warping can also be used to find a better correspon-
dence between motions A and B over the interval. For case 2, the coefficients at
each level of A as well as B are first extrapolated across its boundary to form an
overlapped transition interval and then blended along the interval. Since one motion
abuts on the other, the extrapolation does not yield serious artifacts. Therefore,
we focus on case 3 in which we need to generate a seamless in-between transition
motion T that connects the end of A and the start of B. Since there is no overlap-
ping between A and B, we may not use any blending technique. A simple solution
would be to estimate the linear and angular velocities at the boundaries of A and B,
and then to perform a C 1 interpolation. However, this solution has two difficulties:
First, it is difficult to robustly estimate the velocities from live-captured signals, since
they usually oscillate due to their fine details. Second, the resulting transition motion
exhibits visual artifacts due to the lack of fine details.
Our sampling scheme matches the features of motion signals
at different resolutions. Let f(m^(n)_i) be a feature function that maps a motion signal
m^(n) at frame i to a vector value for a feature response, such as a linear or angular velocity
change measured in a local coordinate system. To consider the features of different
scales simultaneously, we define the vector of feature responses such that

F(m^(n)_i) = (f(m^(n)_i), f(m^(n−1)_{i/2}), · · · , f(m^(0)_{i/2^n})),    (5.16)

where the feature response with a fractional index i + α, for 0 < α < 1, is defined
as a linear combination of the feature responses at frames i and i + 1 with blending
parameter α.
Our multiresolution sampling scheme generates d^(n)(T), 0 ≤ n < N, level by
level upward from the coarsest level. At each level n, we sample the coefficients
of d^(n)(T) from d^(n)(C), where C is A, B, or even a third motion signal provided
by a user. To determine a value for (û^n_i, v̂^n_i) ∈ d^(n)(T), we first select a small
set of candidates {(u^n_j, v^n_j)} from the corresponding level d^(n)(C). By matching
the features of m^(n−1)_{i/2}(T) and those of m^(n−1)_j(C) while varying the index j, we
find the best match at some frame j*. That is, we minimize the feature difference
‖F(m^(n−1)_{i/2}(T)) − F(m^(n−1)_j(C))‖ over all j to determine j*. Its corresponding displacement
(u^n_{2j*}, v^n_{2j*}) ∈ d^(n)(C) is taken as a candidate for (û^n_i, v̂^n_i). There are
alternatives in selecting candidates depending on the characteristics of the given motion
data. If the two motions A and B have a similar appearance, we select a single candidate
from either A or B for (û^n_i, v̂^n_i). Otherwise, we select one candidate from A and
the other from B and blend them along the transition interval. When a user-provided
motion is used, we sample a constant number of candidates to blend. The weight for
each candidate is proportional to the reciprocal of the magnitude of the corresponding
feature difference.
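The selection step at each level amounts to a nearest-neighbor search in feature space; a minimal sketch (illustrative names, features packed into plain vectors):

```python
import numpy as np

def best_candidate(target_feature, cand_features, cand_displacements):
    """Return the displacement at the frame whose feature vector F(.)
    is closest to the target's, i.e. the minimizer of ||F_T - F_C||."""
    diffs = [np.linalg.norm(np.asarray(target_feature) - np.asarray(f))
             for f in cand_features]
    j_star = int(np.argmin(diffs))
    return cand_displacements[j_star]
```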
5.5.4 Discussion
Other Applications: The multiresolution sampling scheme can be used for other
applications. A practical application is noise removal. Given a motion signal cor-
rupted by impulse noise, we would like to restore the corrupted frames while maintaining
the characteristics of the signal. This problem can be solved easily by tearing
off the corrupted frames and filling the missing portion through motion transition.
Another application is for seamlessly duplicating and shuffling the frames of a given
motion (see Figure 5.11). Usually, these can be done by splitting a given motion into
several pieces of motion segments and combining them again in a given order. Our
multiresolution sampling scheme can perform this task without explicit splitting and
recombining. Given an example motion m, we first decompose it into a base signal
m(0) and a series of displacement maps. The base signal m̂(0) of a new motion can
be specified interactively by duplicating and shuffling the frames of m(0) . Then, our
scheme samples the detail coefficients of a new motion m̂ from the given motion m
through feature matching.
Our scheme has been implemented and tested on a workstation (R10000 processor, 195 MHz) with various motion data (both 30 Hz and 24 Hz) that
were captured at a commercial studio. The execution time for decomposition and
reconstruction is almost negligible. The most time-consuming component of our
approach is hierarchical motion fitting, which takes about 0.01 to 0.03 seconds per
frame.
Figures 5.6 and 5.7 show two captured motions, “jump and kick” and “face hit”.
Those motions have 88 and 169 frames sampled at uniform intervals, respectively. For each
of them, extra frames are added at the end of the signal to form a multiresolution
representation of four levels. We multiply the detail coefficients at d^(0) and d^(1) by
constant factors of 1.5 and 0.5 to enhance (top) and attenuate (bottom) the motions,
respectively.
The blending example in Figure 5.8 combines three motion clips that have 186,
193, and 210 frames, respectively. For time normalization, we resample each of them
such that it has 25 frames between each pair of consecutive heel-strikes of the right
foot to establish a frame correspondence among the motions. With these resampled
motions, we generate a new motion (lower right) using Equation (5.15).
Figure 5.9 shows a transition motion between walking and running that are
not overlapped in time. The walking motion ends with the right foot of a character
moving forward and the running motion starts with its right foot forward as well.
We insert an additional keyframe with its left foot forward to enable its legs to
swing over the transition interval (top). The interpolation at the base level offers a
smooth connection between the given motions but yields serious artifacts due to the lack
of fine details. Those artifacts are clearly observed along the trajectory of the head,
which weaves for the original motions but moves straight for the synthesized transition
motion (middle). Our multiresolution sampling scheme circumvents such artifacts by
incorporating the original visual characteristics into a transition motion (bottom).
Our multiresolution sampling scheme is also useful for noise removal. The mo-
tion signals in Figure 5.10 have 161 frames, and 15 successive frames at the middle of
them are corrupted by impulse noise (upper row). We remove the corrupted frames
and divide the remaining portion into two separate segments to obtain their indi-
vidual multiresolution representations. The representations thus obtained are used
later to combine them again by sampling detail coefficients for the missing portion
(lower row). Figure 5.10 shows two examples with different characteristics. While
a motion signal (left column) for the left thigh oscillates rapidly up and down to
include relatively large features, a signal (right column) for the left shoulder contains
small features that resemble noise. In either case, our approach successfully reconstructs
the missing portion to have the same visual characteristics as its neighboring
portion.
Figure 5.11 illustrates how we can create a longer sequence of frames from a
short example motion. We start with a given example motion (upper left) which is
resampled such that it has 24 frames for each cycle (from one heel-strike to the next
of the same foot). Recursive reduction of the motion gives a base signal of 6 frames
(lower left). We duplicate and shuffle its frames to obtain a new base signal of 10
frames (lower right). Finally, a desired motion (upper right) is achieved using our
multiresolution sampling scheme that adds fine details to the new base signal.
Figure 5.6: Jump and kick. (top) Attenuated; (middle) Original; (bottom) Enhanced
Figure 5.7: Face hit. (top) Attenuated; (middle) Original; (bottom) Enhanced
Figure 5.8: Frequency-based motion blending. (upper left) Straight walking; (upper
right) Turning with a normal walk; (lower left) Walking with a limp; (lower right)
Turning with a limp
Figure 5.9: Motion transition between walking and running that are not overlapped
in time. Motions are depicted by superimposing their stick figures. (top) Original
motions and a user-specified keyframe between them; (middle) Smooth interpolation;
(bottom) Adding fine details
Figure 5.10: Noise removal for live-captured motion data. (left column) Left thigh;
(right column) Left shoulder; (upper row) Corrupted by impulse noise; (lower row)
Corrupted frames recovered
Figure 5.11: Duplication and shuffling. (upper left) An example motion; (lower left)
Its base signal; (lower right) A modified base signal; (upper right) A synthesized
motion
Chapter 6
Conclusion
6.1 Contributions
Crafting animation involves a variety of signal processing tasks such as smoothing,
attenuation, enhancement, resampling, interactive editing, blending, stitching, and so
on. This thesis elaborates fundamental techniques that facilitate such tasks, namely,
spatial filtering for orientation data, motion editing with spacetime constraints,
and multiresolution analysis/synthesis.
Spatial masking is a simple, powerful technique for digital signal processing.
We present a novel scheme to design an orientation filter that corresponds to a given
spatial mask. We show that our orientation filters have some desirable properties
such as coordinate-invariance, shift-invariance, and symmetry. We also provide some
examples that perform smoothing and sharpening on orientation signals. Experi-
mental results show that our orientation filters perform well for live-captured data.
We investigate a new approach to adapting an existing motion of a human-like
character so that it has desired features specified by a set of constraints. The key idea
of our approach is to introduce hierarchical displacement mapping, by which we
can not only manipulate a motion adaptively to satisfy a large set of constraints
within a specified error tolerance, but also edit an arbitrary portion of the motion
through direct manipulation. The performance of our method is greatly improved
by employing a curve fitting technique that minimizes a local approximation er-
ror. The hierarchical structure compensates for the possible drawbacks of the local
approximation method by globally propagating displacements at coarse levels and
later tuning at fine levels. Further performance gain is achieved by the new inverse
kinematics solver. Our hybrid algorithm performs much faster than pure numerical
algorithms.
Motion analysis and synthesis can benefit from hierarchical representations and
procedures. We have presented a new multiresolution approach to motion analysis
and synthesis. Our motion representation allows us to modify the coefficients at each
level in the hierarchy independently of those at the other levels through the level-wise
manipulation of detail coefficients. Exploiting this capability, we have developed a
variety of motion editing tools that can be used for modifying, blending, and stitching
highly detailed motion data. The success of our approach is mainly due to motion
filtering and displacement mapping. Our filtering scheme can handle orientations
as well as positions in a coherent manner. The notion of displacement mapping
provides an elegant formulation for multiresolution representations in which each
individual detail coefficient is represented as a pair of 3D vectors measured at a
local coordinate system. This formulation leads to multiresolution motion synthesis
through coordinate-independent operations such as scaling, blending, interpolation,
and sampling.
the need for copyright protection. Watermarking embeds authenticity or
ownership information into the data. The embedded information is an invisible
identification code that remains in the data permanently unless the data is severely degraded.
It is well-known that a watermarking scheme can be more robust and reliable with
multiresolution transformation [15, 56, 78].
clips. Thus, a sequence of tasks successively performed by a synthetic character can
be instantiated as a seamless motion using the various motion editing tools, such as
stitching, blending, and shuffling, explained in this thesis.
A typical task used frequently in a script is to move a character from the start
position to the goal position. In our context, achieving such a task may involve two
problems: one is to align given motion clips such as "straight walk", "turn left", and
"turn right" in a desired sequence to form a seamless motion that connects the start
and goal positions approximately; the other is to refine the motion thus obtained
to enforce exact interpolation at boundary frames and to take valid footholds at
intermediate frames. In my opinion, a randomized planning approach is well suited
to this paradigm [12, 50, 51, 52, 53]. The basic idea of randomized planning is to
construct a roadmap (a directed graph) whose nodes correspond to valid (collision-free)
postures of a character, and in which two nodes are connected by an edge if
the character can move from one posture to the other using a motion clip chosen
from a given candidate set within a specified tolerance. The motion planning
problem is then reduced to a shortest-path problem on a directed graph, which can be
solved efficiently. Our hierarchical motion fitting technique may be used at the
refinement step.
and environments to avoid geometric inconsistency such as inter-penetration. In
terms of efficiency, there have been efforts to speed up the inverse kinematics routine
by trading some generality for efficiency [87].
Bibliography
[8] P. J. Burt and E. H. Adelson. The Laplacian pyramid as a compact image
code. IEEE Transactions on Communications, 31:532–540, 1983.
[11] K.-J. Choi and H.-S. Ko. On-line motion retargetting. In Proceedings of Pacific
Graphics ’99, pages 32–42, 1999.
[19] R. A. DeVore, B. Jawerth, and B. J. Lucier. Surface compression. Computer
Aided Geometric Design, 9(3):219–239, 1992.
[27] D. R. Forsey and R. H. Bartels. Surface fitting with hierarchical splines. ACM
Transactions on Graphics, 14(2):134–161, April 1995.
[31] M. Girard and A. A. Maciejewski. Computational modeling for the computer
animation of legged figures. Computer Graphics (Proceedings of SIGGRAPH
85), pages 263–270, July 1985.
[37] S. Guo and J. Robergé. A high-level control mechanism for human locomotion
based on parametric frame space interpolation. In Proceedings of Computer
Animation and Simulation ’96, Eurographics Animation Workshop, pages 95–
107. Springer-Verlag, 1996.
[41] D. J. Heeger and J. R. Bergen. Pyramid based texture analysis/synthesis.
Computer Graphics (Proceedings of SIGGRAPH 95), pages 229–238, August
1995.
[42] K. Hirata and T. Kato. Query by visual example: content-based image retrieval.
In A. Pirotte, C. Delobel, and G. Gottlob, editors, Advances in
Database Technology (EDBT '92), pages 56–71. Springer-Verlag, Berlin, 1992.
[46] B. Jähne. Digital Image Processing: Concepts, Algorithms and Scientific Ap-
plications. Springer-Verlag, 1992.
[49] T. Kato, T. Kurita, N. Otsu, and K. Hirata. A sketch retrieval method for
full color image database–query by visual example. In Proceedings of the 11th
IAPR International Conference on Pattern Recognition, pages 530–533. IEEE
Computer Society Press, Los Alamitos, CA, 1992.
[51] L. Kavraki and J.-C. Latombe. Randomized preprocessing of configuration
space for fast path planning. In Proceedings of IEEE International Conference
on Robotics and Automation, pages 2138–2145, 1994.
[61] Y. Koga, K. Kondo, J. Kuffner, and J. Latombe. Planning motions with in-
tentions. Computer Graphics (Proceedings of SIGGRAPH 94), pages 395–408,
July 1994.
[65] S. Lee, K.-Y. Chwa, S. Y. Shin, and G. Wolberg. Image metamorphosis us-
ing snakes and free-form deformations. Computer Graphics (Proceedings of
SIGGRAPH 95), pages 439–448, August 1995.
[66] S. Lee, G. Wolberg, and S. Y. Shin. Scattered data interpolation with multi-
level B-splines. IEEE Transactions on Visualization and Computer Graphics,
3(3):228–244, 1997.
[67] S. J. Leffler, T. Reeves, and E. F. Ostby. The Menv modelling and animation
environment. The Journal of Visualization and Computer Animation, 1(1):33–
40, August 1990.
[72] W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic,
P. Yanker, C. Faloutsos, and G. Taubin. The QBIC project: Querying images
by content using color, texture, and shape. In Proceedings of the SPIE on
Storage and Retrieval for Image and Video Databases, volume 1908, pages
173–187, 1993.
[74] B. Paden. Kinematics and Control of Robot Manipulators. PhD thesis, University
of California, Berkeley, 1986.
[75] K. Perlin and A. Goldberg. Improv: A system for scripting interactive actors
in virtual worlds. Computer Graphics (Proceedings of SIGGRAPH 96), pages
205–216, August 1996.
[76] J. C. Platt and A. H. Barr. Constraint methods for flexible models. Computer
Graphics (Proceedings of SIGGRAPH 88), pages 279–288, August 1988.
[82] C. Rose, B. Guenter, B. Bodenheimer, and M. F. Cohen. Efficient genera-
tion of motion transitions using spacetime constraints. Computer Graphics
(Proceedings of SIGGRAPH 96), pages 147–154, August 1996.
[87] H. J. Shin, J. Lee, and S. Y. Shin. On-line motion retargetting for performance-
based animation. In preparation, 2000.
[90] Gilbert Strang. The discrete cosine transform. SIAM Review, 41(1):135–147,
1999.
[92] N. M. Thalmann and D. Thalmann. The use of high-level 3-D graphical types
in the MIRA animation system. IEEE CG&A, 3(9):9–16, December 1983.
[93] D. Tolani and N. I. Badler. Real-time inverse kinematics of the human arm.
Presence, 5(4):393–401, 1996.
[96] D. J. Wiley and J. K. Hahn. Interpolation synthesis for articulated figure mo-
tion. In Proceedings of IEEE Virtual Reality Annual International Symposium
’97, pages 157–160. IEEE Computer Society Press, 1997.