
First-Hand: The Unscented Transform

Contributed by Jeffrey Uhlmann (https://engineering.missouri.edu/faculty/jeffrey-uhlmann/), University of Missouri, by invitation

An invited written interview, prompted by the finding that Uhlmann and Simon Julier's 2004 Proceedings of the IEEE article,
"Unscented Filtering and Nonlinear Estimation," was one of the most cited Proceedings articles of the decade. For more information
about those articles, see IEEE History Center Staff [Alexander B. Magoun], "Proceedings of the IEEE through 100 Years: 2000-2009"
(http://ieeexplore.ieee.org/document/6331611/), Proceedings of the IEEE 100, no. 11 (November 2012), p. 3131-45.

Contents
1 What led you to invent the Unscented Transform (UT)?
1.1 A. Tracking research at Naval Research Laboratory (NRL)
1.2 B. Tracking research at Oxford University
2 What are the challenges of nonlinear modeling?
3 What makes Kalman Filters so popular yet problematic for solving nonlinear problems?
4 How does the Unscented Transform (UT) work?
5 How did you make the switch from linear to nonlinear solutions for nonlinear problems?
6 How long did it take to develop the Unscented Transform?
7 Did you develop this on your own or collaborate with others?
8 How did your long-term collaboration with Simon come about?
9 What’s the difference between the Unscented Transform (UT) and Unscented Kalman Filter (UKF)?
10 What’s the appeal of the Unscented Transform?
11 Where do people use the UT and UKF?
12 What’s with the Name “Unscented”?
13 References

What led you to invent the Unscented Transform (UT)?


A. Tracking research at Naval Research Laboratory (NRL)
I began work at the Naval Research Laboratory (http://www.nrl.navy.mil/about-nrl/history/) in 1987, doing research that was relevant to
tracking multiple objects ranging from particles in molecular dynamics simulations (http://www.ch.embnet.org/MD_tutorial/pages/MD.P
art1.html) to missiles in SDI (http://en.wikipedia.org/wiki/Strategic_Defense_Initiative#References) (aka Star Wars) applications. These
involved high-dimensional, highly nonlinear motion models that took into account factors such as atmospheric drag at different
altitudes, temperature changes, and even local gravitational effects. My original research focus was on spatial data structures (http://ope
nlife.cc/blogs/2012/august/spatial-data-structures), which are needed for quickly finding objects that are near other objects. For
example, if you get a line-of-sight sensor measurement of some object, how do you determine which of the possibly thousands of
objects in your database produced that measurement? The naive approach would be to check every object in the database to see which
one is most likely to be found along that line of sight, but a better data structure can allow the appropriate object to be found much
faster. Once found, its state—i.e., position, velocity, etc.—can be updated using a filtering algorithm, typically some variant of the
Kalman filter (http://www.cs.unc.edu/~welch/kalman/).

The great thing about NRL is that it usually has the latest cutting-edge technologies, which for me at that time included a Connection
Machine (http://en.wikipedia.org/wiki/Connection_Machine) with up to 65 thousand simple processors and a Butterfly machine (http://e
n.wikipedia.org/wiki/BBN_Butterfly) with 128 powerful processors. At that time there was a sense, especially at the Pentagon, that any
computational bottlenecks that might arise in the tracking of large numbers of objects could be addressed simply with money for more
processors. My colleagues and I showed that this was seriously mistaken because the algorithms scaled much worse than linearly; e.g., a
doubling of the size of the problem might increase the computation time by a factor of a thousand or more.

What was really needed were better algorithms rather than more processors. This was demonstrated when Mike Zuniga (http://www.ne
wgs.com/MZunigaResume.pdf) and I used a desktop workstation to perform the first-ever simulation of the tracking problem for a
million objects in 1990—something that was well beyond what anyone was doing on any machine at that time. We were able to do it by
using better algorithms and data structures, and this helped push the SDI program to focus its attention more on computer science than
on computers. I told some of this story in an article in American Scientist in 1992.

Around this time I received an email from Hugh Durrant-Whyte (http://en.wikipedia.org/wiki/Hugh_F._Durrant-Whyte), who had recently
established a data fusion group at Oxford. He was spearheading the application of the Kalman filter and related tracking techniques to
problems in robotics. He asked if I'd like to join his group, and what he described sounded very interesting. So I applied for a 2-year
sabbatical from NRL to pursue a doctorate in robotics at Oxford.

B. Tracking research at Oxford University


I arrived at Oxford University on 29 September 1993. When I met with Hugh for the first time, he introduced me to a completely new
problem: how to apply the Kalman filter to track objects in an environment so that an autonomous vehicle can self-localize. In other
words, the goal was to have a vehicle identify natural features to use as landmarks so that it always knows where it is. That problem
fascinated me because it seemed to require a completely new theoretical generalization of the Kalman filter. In the process of
developing the mathematical framework for that I began to re-think other aspects of the way the Kalman filter was used in practice. One
of those was the way it was applied to nonlinear systems.

What are the challenges of nonlinear modeling?


Almost any real-world tracking or control problem involves the use of nonlinear equations defining how the state of a system changes in
time (e.g., the motion of a ballistic object) and how sensor measurements taken in one coordinate system are converted to another
coordinate system. The Kalman filter is defined for linear models, so the standard approach for accommodating a nonlinear model was
to linearize it, i.e., find the linear transformation that best approximates the nonlinear one. I had never really thought deeply about
linearization even though I was aware of its limitations. For example, the SDI program funded a group of physicists to develop an ultra-
high-fidelity model of a warhead entering the atmosphere, and the result was a high-dimensional set of nonlinear equations taking into
account atmospheric drag, heat dissipation, and local gravitational variations along the trajectory. It was very impressive, but after
linearization it produced terrible results. Traditionally people would evaluate nonlinear functions/transformations by linearizing them, i.e.,
deriving the linearized approximation (Jacobian matrix (http://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant)). The result can
be very tedious to obtain or not even well-defined in some cases, and even when it’s done it’s still just a linear approximation. In that
particular case I had attributed the poor performance to a failure to tune the filter properly to exploit the higher fidelity model. After a
couple of months working on my map-building system at Oxford, however, I came to understand the problem more deeply.

What makes Kalman Filters so popular yet problematic for solving nonlinear
problems?

Stepping back a bit, the Kalman filter is widely used because it provides an elegant mathematical framework that captures—at least
for linear systems—all aspects of the tracking problem. First and foremost, it provides a way to represent the uncertainty associated with
the state of a system, e.g., the position and velocity of a moving object, in the form of a mean vector and an associated error covariance
matrix. This is significant because it isn't clear a priori how best to represent uncertainty. For example, it could be represented in the
form of an error radius, a confidence factor of some sort, or something else entirely.

Kalman showed that if the covariance matrix somehow relates to the error associated with the mean position vector, then his filtering
algorithm will reduce that error in a particular fashion as observations are made. He further showed that his algorithm is the best
possible if the error in the mean is Gaussian-distributed (http://hyperphysics.phy-astr.gsu.edu/hbase/math/gaufcn.html) and the
covariance matrix is equal to the second central moment (i.e., the covariance) of that Gaussian distribution. In fact, that's why that matrix
is almost universally referred to as a "covariance matrix," even though the Kalman filter doesn't require an assumption of Gaussianity or
even that the covariance matrix exactly represents the second central moment of some other underlying probability distribution.
Unfortunately, many later authors chose to impose a strict Bayesian (http://en.wikipedia.org/wiki/Thomas_Bayes) interpretation on the
Kalman filter which does make these additional assumptions, and this has resulted in a continuing misconception that the Kalman filter
requires Gaussianity assumptions for theoretical rigor. What the Kalman filter requires is a set of linear transformations, typically referred
to as models, describing how the system evolves through time and how to transform between the system coordinates and the
coordinates of the sensor observations. (It also requires that error processes be independent or that their correlations are known
exactly.)

The Kalman filter’s importance stems mostly from the way it formulates the overall structure of a filtering system, but there's another
reason why it became so widely used: Apollo. The Kalman filter was studied and used in at least two systems onboard the Apollo
spacecraft. That really sanctified it to such an extent that afterward nobody could propose anything in the area of tracking or control
without saying that it was based on Kalman filtering. The problem is that even on Apollo there were practical issues with its performance
due to nonlinear effects. One Apollo engineer told me that it was used for attitude control for docking, but with a human
(astronaut) in the loop. The filter would be turned on and it was understood that it would only give reliable outputs for around eight
seconds, after which it would be recycled or re-started. That was okay because it gave good enough snapshots of information to help
with steering the spacecraft.

In other post-Apollo applications the use of the so-called Extended Kalman Filter (EKF) (http://en.wikipedia.org/wiki/Extended_Kalman_fil
ter#Unscented_Kalman_filters), which is basically the Kalman filter plus linearization, proved to be more problematic because it often
required an enormous amount of tuning and tweaking to produce decent, stable results. The Kalman filter had to be used, but many
implementers regarded it as a kind of stone soup (http://c2.com/cgi/wiki?StoneSoup) in which any good performance was likely due as
much to the tuning efforts of the implementer as it was to the filter architecture itself. Once completed, the system was typically so
tuned to a particular scenario that any significant deviation would cause the filter to diverge (fail). One old-timer described the EKF as a
"ball of chalk" (http://www.phrases.org.uk/bulletin_board/59/messages/864.html) because of its tendency to fall apart if stressed.

Unfortunately, filtering technology today is not really that much better. What has changed is the speed of processing which allows many
more observations to be incorporated to keep the filter on track. A cynic might say that enough sensor observations can make anything
easy to track, but good engineering still plays as big a role or even a bigger one in maximizing accuracy and stability. So, going back to the
merits of the Kalman filter, whether it can be applied rigorously in a given application is possibly not as important as the fact that it
provides a framework for structuring the tuning process to make it work. Someone might ask whether there is a better technique, yet to
be discovered, that can be applied rigorously to practical problems, but the answer is no. The Kalman filter belongs to a family of
approaches that maintains a fixed amount of information about the state of a system while updating that state with observations of it.
When it comes to nonlinear systems this isn't possible because each observation adds new information that can't simply be rolled into a
fixed-sized estimate. The only possible general solution is to have the filter maintain information that is roughly proportional in size to
the entire observation history.

In other words, the most general solution is one in which the state of the system is recalculated at each timestep based on the newest
observation combined with all previous ones. Such an approach would have been unthinkable twenty years ago, but it will likely be
commonplace in the near future as computers become more powerful. The Kalman filter will always have its place because there will
always be problems where the fastest computers aren't fast enough, but its role may diminish.

How does the Unscented Transform (UT) work?
What the Unscented Transform does is to replace the mean vector and its associated error covariance matrix with a special set of points
with the same mean and covariance. In the case of the mean and covariance representing the current position estimate for a target, the
UT is applied to obtain a set of points, referred to as sigma points, to which the full nonlinear equations of motion can be applied
directly. In other words, instead of having to derive a linearized approximation, the equations could simply be applied to each of the
points as if it were the true state of the target. The result is a transformed set of points, and the mean and covariance of that set
represents the estimate of the predicted state of the target.
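To make that concrete, the following is a minimal sketch in Python (with NumPy) of one common way to realize the transform. It assumes the symmetric set of 2N sigma points with equal weights; the function name and the use of a Cholesky factor as the matrix square root are illustrative choices, not the specific formulation from the original papers.

```python
import numpy as np

def unscented_transform(mean, cov, f):
    """Propagate a mean and covariance through a nonlinear function f.

    Uses the symmetric set of 2N sigma points, each weighted 1/(2N).
    Other sigma-point sets and weightings are possible.
    """
    n = mean.size
    # Matrix square root of N*cov (here via Cholesky factorization).
    root = np.linalg.cholesky(n * cov)
    # Sigma points: the mean plus/minus each column of the square root.
    # By construction their sample mean and covariance equal mean and cov.
    points = np.vstack([mean + root[:, i] for i in range(n)] +
                       [mean - root[:, i] for i in range(n)])
    # Apply the full nonlinear transformation to each point directly --
    # no Jacobian or other linearized approximation is ever derived.
    transformed = np.array([f(p) for p in points])
    # The sample mean and covariance of the transformed points are the
    # estimates of the nonlinearly transformed mean and covariance.
    new_mean = transformed.mean(axis=0)
    centered = transformed - new_mean
    new_cov = centered.T @ centered / (2 * n)
    return new_mean, new_cov
```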

Put another way, the UT provides a mechanism for estimating what happens when a nonlinear operation is performed on a partially-
characterized probability distribution. This is something that has to be done in almost any nontrivial tracking or control problem. If you
know the mean position and associated error covariance defining the state of a satellite, for example, then to predict its future state
requires a transformation of the current mean and covariance using nonlinear equations of motion. Technically that’s an ill-posed
problem because it requires knowledge of the complete–though unknown–probability distribution defining the current state of the
satellite. What I recognized was that in the absence of that knowledge, any assumed distribution with the appropriate mean and
covariance is in theory as good as any other, so a choice can be made that makes the computations not just tractable, but easy.

The Kalman filter works because the second central moment of a probability distribution transforms linearly. That is, the covariance of
the transformed distribution is just the transformed covariance of the original distribution. The covariance matrix used in the filter can be
larger than the actual covariance of the unknown underlying error distribution because each operation that reduces the filter covariance
reduces the true underlying error covariance as well. When trying to apply a nonlinear transformation, however, there is an enormous
insurmountable technical problem: the nonlinearly-transformed covariance depends fundamentally on the characteristics of the
unknown underlying probability distribution. Without knowing that distribution exactly the problem is ill-posed, but that's precisely the
situation in virtually all real-world applications of the Kalman filter. Linearization hides the problem but not the effects because, while
each transformation is linear, it's the wrong transformation!
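To spell that out in standard notation (not part of the original interview): for a linear transformation $y = Ax$ the propagation is exact, $\bar{y} = A\bar{x}$ and $P_y = A P_x A^{\top}$, regardless of the underlying distribution. For a nonlinear $y = f(x)$, the transformed covariance $E\big[(f(x) - E[f(x)])(f(x) - E[f(x)])^{\top}\big]$ depends on the full distribution of $x$, not just on its mean $\bar{x}$ and covariance $P_x$.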

How did you make the switch from linear to nonlinear solutions for nonlinear
problems?
I came to realize that we were compounding the inherent technical problem of insufficient knowledge of the true underlying error
distribution with the use of a crude approximation to the true transformation equations. That’s when it hit me: I can encode the given
mean and covariance information exactly in the form of a set of points—called sigma points—which has the same mean and covariance.
This set of points can then serve as a proxy for the unknown error distribution in the sense that, if we don't know which distribution
corresponds to the given mean and covariance, then any particular choice of distribution is as good as any other for estimating the
transformed state. What makes this point set attractive is that we can transform it exactly using our nonlinear transformations with no
need for linearization. More specifically, we can just apply our nonlinear transformation to each of the sigma points with the resulting
set of points representing the transformed distribution. We compute the mean and covariance of the transformed sigma points and this
will give us our estimate of the nonlinearly transformed mean and covariance that we need for the Kalman filter. It's simple, elegant, and
no other approach can provably do better without making other assumptions. That's the Unscented Transform (UT).
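As a hypothetical usage of the sketch above (the numbers and the conversion function are made up for illustration), converting a range-bearing estimate to Cartesian coordinates requires nothing beyond the nonlinear conversion itself:

```python
import numpy as np

# Hypothetical numbers: a range-bearing estimate and its error covariance.
mean_polar = np.array([10.0, np.pi / 4])        # range (m), bearing (rad)
cov_polar = np.diag([0.1**2, 0.05**2])          # variances of range and bearing

def polar_to_cartesian(p):
    r, theta = p
    return np.array([r * np.cos(theta), r * np.sin(theta)])

# No Jacobian of polar_to_cartesian is derived or coded anywhere.
mean_xy, cov_xy = unscented_transform(mean_polar, cov_polar, polar_to_cartesian)
```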

Some people mistakenly think that the sigma points used by the UT are somehow random samples analogous to Monte Carlo methods
(http://jackman.stanford.edu/mcmc/metropolis1.pdf), when in fact they are deterministically computed to exactly encode the given
mean and covariance information. In fact, the UKF produces exactly the same results as the Kalman filter in the linear case—they are
provably equivalent—but the UKF is defined and operates the same way with nonlinear transformations, though of course the results
are necessarily suboptimal.
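The deterministic construction also makes the claimed linear-case equivalence easy to check numerically with the earlier sketch: under a purely linear map the transformed sigma points reproduce the standard Kalman propagation exactly.

```python
import numpy as np

A = np.array([[1.0, 0.5],
              [0.0, 1.0]])                      # an arbitrary linear "model"
m = np.array([2.0, -1.0])
P = np.array([[0.3, 0.1],
              [0.1, 0.2]])

m_ut, P_ut = unscented_transform(m, P, lambda x: A @ x)
assert np.allclose(m_ut, A @ m)                 # same mean as the linear rule
assert np.allclose(P_ut, A @ P @ A.T)           # same covariance as A P A^T
```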

How long did it take to develop the Unscented Transform?


There was nothing too hard about developing the transform itself other than coming up with algorithms for generating the required
sets of sigma points. At first I worked out how to construct the general algorithms in N dimensions for finding the minimal set of N+1
sigma points and the minimal symmetric set of 2N points to encode a given N-dimensional mean and covariance. For experimental
verification, I tested it out on some simple examples involving nonlinear transformations of sensor observations. These verified that it
produced better results than linearization and in some cases much better results. My feeling was that in more stressing applications the
difference would be even more pronounced.
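For reference, the symmetric 2N-point set can be written, in one common convention (not necessarily the exact parameterization of the original technical report), as $\mathcal{X}_i = \bar{x} + (\sqrt{NP})_i$ and $\mathcal{X}_{N+i} = \bar{x} - (\sqrt{NP})_i$ for $i = 1, \dots, N$, each with weight $1/(2N)$, where $(\sqrt{NP})_i$ denotes the $i$-th column of any matrix square root of $NP$. By construction these $2N$ points have sample mean $\bar{x}$ and sample covariance $P$.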

Did you develop this on your own or collaborate with others?


I described the approach to people in my lab, and a couple of them, Simon Julier (http://www0.cs.ucl.ac.uk/people/S.Julier.html) and Ben
Quine (http://www.yorku.ca/bquine/pages/Brendan_Quine_2011.htm), were particularly interested. They became really convinced when
they ran it on their own highly nonlinear problems. Later Simon developed algorithms for generating sets of sigma points for exploiting
information about higher moments of a distribution when that information is available.

How did your long-term collaboration with Simon come about?


Simon was just beginning his doctoral studies in the Robotics Research Group (http://www.robots.ox.ac.uk/) when we became active
collaborators in 1994. We’re extremely compatible in many ways, not least of which is that we both really enjoy what we do and can talk
for hours and hours about it. Simon was working on a problem that had the potential to really demonstrate the value of the UT, so that
formed the basis of our initial collaboration. In addition to becoming an equal partner on the theory side, Simon is responsible for most
of the empirical results in our various papers. A year or so after I returned to NRL in 1995, Simon joined me in the same branch and we
continued our collaborations with funding from the Office of Naval Research (http://www.history.navy.mil/wars/nrl.pdf).

What’s the difference between the Unscented Transform (UT) and Unscented
Kalman Filter (UKF)?
The Unscented Transform (UT) is applicable whenever a state estimate needs to be transformed from one coordinate system to another.
The UKF is just the use of the UT within a Kalman filter to deal with those kinds of transformations.
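As a rough illustration of how the UT slots into the filter, here is a minimal Python sketch of one predict/update cycle of a UKF. All names, the choice of the symmetric sigma set, and the additive noise handling are simplifying assumptions, not the authors' implementation.

```python
import numpy as np

def sigma_points(mean, cov):
    """Symmetric 2N-point sigma set (each point weighted 1/(2N))."""
    n = mean.size
    root = np.linalg.cholesky(n * cov)
    return np.vstack([mean + root[:, i] for i in range(n)] +
                     [mean - root[:, i] for i in range(n)])

def ukf_step(x, P, z, f, h, Q, R):
    """One predict/update cycle with additive process and measurement noise.

    x, P : prior state mean and covariance
    z    : new measurement
    f, h : nonlinear process and measurement models
    Q, R : process and measurement noise covariances
    """
    # Predict: push sigma points through the process model f.
    pts = sigma_points(x, P)
    w = 1.0 / len(pts)
    fx = np.array([f(p) for p in pts])
    x_pred = w * fx.sum(axis=0)
    P_pred = w * (fx - x_pred).T @ (fx - x_pred) + Q

    # Update: push predicted sigma points through the measurement model h.
    pts = sigma_points(x_pred, P_pred)
    hx = np.array([h(p) for p in pts])
    z_pred = w * hx.sum(axis=0)
    S = w * (hx - z_pred).T @ (hx - z_pred) + R      # innovation covariance
    Pxz = w * (pts - x_pred).T @ (hx - z_pred)       # state/measurement cross-covariance
    K = Pxz @ np.linalg.inv(S)                       # Kalman gain
    x_new = x_pred + K @ (z - z_pred)
    P_new = P_pred - K @ S @ K.T
    return x_new, P_new
```

The only application-specific pieces are f and h themselves; there is no model-specific Jacobian code to derive, which is the practical appeal discussed below.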

What’s the appeal of the Unscented Transform?


What was most striking to people about the UT was not the accuracy so much as the ease with which it could be implemented. There
was no longer a need to derive a linearized approximation which would then have to be coded up for use in the filter.

Where do people use the UT and UKF?

Over the ten years following 1994 the Unscented Transform (UT) and Unscented Kalman Filter (UKF) were widely adopted in preference
to linearization-based techniques, e.g., the Extended Kalman Filter (EKF) (http://en.wikipedia.org/wiki/Extended_Kalman_filter), because
they are often more accurate and easier to implement. As a consequence Simon and I were invited to submit a paper for a special issue
of Proceedings of the IEEE in March 2004 (https://ieeexplore.ieee.org/document/1271397). Both separately and jointly we applied the
Unscented approach to a variety of challenging tracking, control, and estimation problems. Today there are a plethora of methodologies
built on top of the UT that are tailored to deal with the effects of nonlinearities in applications ranging from financial (http://dare.ubvu.v
u.nl/bitstream/handle/1871/15817/dissertation.pdf?sequence=1) and weather forecasting (http://www.math.univ-toulouse.fr/~baehr/pa
pers/UEVF.pdf) to the Mars rover program (http://www.nasa.gov/mission_pages/mer/index.html), where I was an evaluator for testing of
filter modules for the Mars Science Laboratory (http://www.nasa.gov/mission_pages/msl/index.html).

Closer to home I mentored a local high school student, Anand Palaniappan, for his project to computerize asteroid tracking. He was a
semifinalist in both the Siemens Westinghouse and Intel competitions. The use of the UKF wasn't what was most important to the
astronomy audience, but orbital motion is probably the case in which the advantage of the UKF over the EKF is most pronounced. The
group we were working with on the asteroid tracking problem works with some of the most unusual coordinate systems that have ever
been defined, and conversions between them can't be done accurately with linearization.

What’s with the Name “Unscented”?


Initially I only referred to it as the “new filter.” Since it needed a more specific name, people in my lab began referring to it as the “Uhlmann
filter,” which obviously isn’t a name that I could use, so I had to come up with an official term. One evening everyone else in the lab was
at the Royal Opera House (http://www.roh.org.uk/), and as I was working I noticed someone’s deodorant on a desk. The word
“unscented” caught my eye as the perfect technical term. At first people in the lab thought it was absurd—which is okay because
absurdity is my guiding principle—and that it wouldn’t catch on. My claim was that people simply accept technical terms as technical
terms: for example, does anyone think about why a tree is called a tree?

Within a few months we had a speaker visit from another university who talked about his work with the “unscented filter.” Clearly he had
never thought about the origin of the term. The cover of the March 2004 issue of the Proceedings (http://ieeexplore.ieee.org/stamp/sta
mp.jsp?tp=&arnumber=1271392) we’re discussing right now has “Unscented” in large letters, which shows that it has been
accepted as the technical term for that approach.

References
Simon J. Julier and Jeffrey K. Uhlmann, “Unscented Filtering and Nonlinear Estimation,” (http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arn
umber=1271397&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1271397) Proc. IEEE Vol. 92, No. 3
(March 2004), p. 401-22.

Simon J. Julier and Jeffrey K. Uhlmann, “Corrections to ‘Unscented Filtering and Nonlinear Estimation,’” (http://ieeexplore.ieee.org/xpl/lo
gin.jsp?tp=&arnumber=1360170&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1360170) Proc. IEEE
Vol. 92, No. 12 (December 2004), p. 1958.

---

S. J. Julier, J. K. Uhlmann, and Hugh F. Durrant-Whyte, “A New Approach for Filtering Nonlinear Systems,” (http://www0.cs.ucl.ac.uk/staff/
S.Julier/publications/1995/new_approach_for_filtering_nonlinear_systems.pdf) 1995 American Control Conference, Seattle, WA, p. 1628-
32.

S. J. Julier and J. K. Uhlmann, “A General Method for Approximating Nonlinear Transformations of Probability Distributions,” (http://www.
smpp.northwestern.edu/savedLiterature/JulierUhlmannUnscented.pdf) Robotics Research Group Technical Report, Department of
Engineering Science, University of Oxford: November 1996, p. 1-27.

S. J. Julier and J. K. Uhlmann, “A New Extension of the Kalman Filter to Nonlinear Systems,” (http://www.cs.unc.edu/~welch/kalman/medi
a/pdf/Julier1997_SPIE_KF.pdf) Proc. AeroSense: 11th Int. Symp. Aerospace/Defense Sensing, Simulation and Controls (1997), p. 182-93.

Jeffrey K. Uhlmann, “Algorithms for Multiple-Target Tracking,” (http://www.jstor.org/stable/29774599) American Scientist 80, no. 2
(March-April 1992), p. 128-41.

Jeffrey K. Uhlmann, Dynamic Map Building and Localization: New Theoretical Foundations (http://people.cs.missouri.edu/~uhlmannj/Disser
tation-pref.pdf), University of Oxford Doctor of Philosophy thesis, Trinity Term 1995.

Jeffrey K. Uhlmann and Miguel R. Zuniga, “Results of an Efficient Gating Algorithm for Large-Scale Tracking Scenarios,” Naval Research
Reviews 1 (Spring 1991), p. 24-9.

Jeffrey K. Uhlmann, Miguel R. Zuniga, and J. Michael Picone, “Efficient Approaches for Report/Cluster Correlation in Multitarget Tracking
Systems,” (http://www.dtic.mil/dtic/tr/fulltext/u2/a231589.pdf) NRL Report 9281, 31 December 1990, p. 1-14.

Retrieved from "https://ethw.org/w/index.php?title=First-Hand:The_Unscented_Transform&oldid=184560"

This page was last edited on 1 April 2021, at 15:16.
