Legendre transformation
From Wikipedia, the free encyclopedia
This article is about an involution transform commonly used in classical mechanics and
thermodynamics. For the integral transform using Legendre polynomials as kernels, see
Legendre transform (integral transform).
The function f(x) is defined on the interval [a, b]. For a given p, the difference px − f(x) takes its maximum at x′. Thus, the Legendre transformation of f(x) is f*(p) = px′ − f(x′).
For sufficiently smooth functions on the real line, the Legendre transform f* of a function f can be specified, up to an additive constant, by the condition that the functions' first derivatives are inverse functions of each other. This can be expressed in Euler's derivative notation as
Df(·) = (Df*)⁻¹(·),
where D is an operator of differentiation and (φ)⁻¹(·) denotes the inverse function satisfying (φ)⁻¹(φ(x)) = x, or equivalently, as
f′(f*′(x*)) = x* and f*′(f′(x)) = x
in Lagrange's notation.
Definition

Definition in ℝ

Let I ⊂ ℝ be an interval, and f : I → ℝ a function. Then the Legendre transform of f is the function f* : I* → ℝ defined by
f*(x*) = sup_{x∈I} (x*x − f(x)),   I* = {x* ∈ ℝ : f*(x*) < ∞},
where sup denotes the supremum over x ∈ I; e.g., x in the expression x*x − f(x) is taken such that x*x − f(x) is maximized at each x*, or x* is restricted to values for which a finite supremum of x*x − f(x) over x exists (e.g., when f(x) is a linear function).
Definition in ℝⁿ

The generalization to convex functions f : X → ℝ on a convex set X ⊂ ℝⁿ is straightforward: f* : X* → ℝ has domain
X* = {x* ∈ ℝⁿ : sup_{x∈X} (⟨x*, x⟩ − f(x)) < ∞}
and is defined by
f*(x*) = sup_{x∈X} (⟨x*, x⟩ − f(x)),   x* ∈ X*,
where ⟨x*, x⟩ denotes the dot product of x* and x.
The function f* is called the convex conjugate function of f. For historical reasons (rooted in analytic mechanics), the conjugate variable is often denoted p, instead of x*. If the convex function f is defined on the whole line and is everywhere differentiable, then
f*(p) = sup_{x∈I} (px − f(x)) = (px − f(x))|_{x=(f′)⁻¹(p)}
can be interpreted as the negative of the y-intercept of the tangent line to the graph of f that has slope p.

The Legendre transformation is an application of the duality relationship between points and lines. The functional relationship specified by f can be represented equally well as a set of (x, y) points, or as a set of tangent lines specified by their slope and intercept values.

For a differentiable convex function f on the real line with first derivative f′ and its inverse (f′)⁻¹, the Legendre transform of f, f*, can be specified, up to an additive constant, by the condition that the functions' first derivatives are inverse functions of each other, i.e., f′ = ((f*)′)⁻¹ and (f*)′ = (f′)⁻¹.
To see this, first note that if a convex function f on the real line is differentiable and x̄ is a critical point of the function x ↦ p·x − f(x), then the supremum is achieved at x̄ (by convexity, see the first figure on this page). Therefore, the Legendre transform of f is
f*(p) = p·x̄ − f(x̄).
Then, suppose that the first derivative f′ is invertible, and let the inverse be g = (f′)⁻¹. Then for each p, the point g(p) is the unique critical point x̄ of the function x ↦ px − f(x) (i.e., x̄ = g(p)), because f′(g(p)) = p and the first derivative of px − f(x) with respect to x at g(p) is p − f′(g(p)) = 0. Hence we have f*(p) = p·g(p) − f(g(p)) for each p. By differentiating with respect to p, we find
(f*)′(p) = g(p) + p·g′(p) − f′(g(p))·g′(p).
Since f′(g(p)) = p, this simplifies to (f*)′(p) = g(p) = (f′)⁻¹(p). In other words, (f*)′ and f′ are inverses of each other. Thus, if we define h′ = (f′)⁻¹ as the inverse of f′, then h′ = (f*)′, so integration gives f* = h + c with a constant c.
In practical terms, given f(x), the parametric plot of xf′(x) − f(x) versus f′(x) amounts to the graph of f*(p) versus p.

In some cases (e.g. thermodynamic potentials, below), a non-standard requirement is used, amounting to an alternative definition of f* with a minus sign: f(x) − f*(p) = xp.
If f is a differentiable function of x, then we have df = (df/dx) dx. Letting p = df/dx gives df = p dx. Using the product rule d(uv) = u dv + v du, we have d(xp − f) = x dp, and taking f* = xp − f, we have df* = x dp, which means
df*/dp = x.
When f is a function of n variables x₁, x₂, ⋯, xₙ, then we can perform the Legendre transformation on each one or several of the variables: we have
df = p₁ dx₁ + p₂ dx₂ + ⋯ + pₙ dxₙ,
where pᵢ = ∂f/∂xᵢ. If we want to perform the Legendre transformation on, e.g., x₁, then we take p₁ together with x₂, ⋯, xₙ as independent variables, and with the product rule we have
d(f − x₁p₁) = −x₁ dp₁ + p₂ dx₂ + ⋯ + pₙ dxₙ,
so for the function φ(p₁, x₂, ⋯, xₙ) = f(x₁, x₂, ⋯, xₙ) − x₁p₁, we have
∂φ/∂p₁ = −x₁,  ∂φ/∂x₂ = p₂,  ⋯,  ∂φ/∂xₙ = pₙ.
We can also do this transformation for the variables x₂, ⋯, xₙ. If we do it to all the variables, then we have
dφ = −x₁ dp₁ − x₂ dp₂ − ⋯ − xₙ dpₙ,
where φ = f − x₁p₁ − x₂p₂ − ⋯ − xₙpₙ.

In analytical mechanics, people perform this transformation on the variables q̇₁, q̇₂, ⋯, q̇ₙ of the Lagrangian L(q₁, ⋯, qₙ, q̇₁, ⋯, q̇ₙ) to get the Hamiltonian:
H(q₁, ⋯, qₙ, p₁, ⋯, pₙ) = Σᵢ₌₁ⁿ pᵢq̇ᵢ − L(q₁, ⋯, qₙ, q̇₁, ⋯, q̇ₙ).
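The Lagrangian-to-Hamiltonian step can be checked numerically. The sketch below (not part of the article; the mass m, spring constant k, and sample point are arbitrary choices) brute-forces sup_v (pv − L(v, q)) for a one-dimensional oscillator Lagrangian and compares it with the closed-form Hamiltonian p²/(2m) + kq²/2:

```python
import math

# Toy check: for the 1-D Lagrangian L(v, q) = m*v**2/2 - k*q**2/2,
# the Legendre transform in v should give H(p, q) = p**2/(2*m) + k*q**2/2.
m, k = 2.0, 3.0

def lagrangian(v, q):
    return 0.5 * m * v**2 - 0.5 * k * q**2

def hamiltonian_numeric(p, q, vmax=50.0, steps=200001):
    # Brute-force sup over v of p*v - L(v, q) on a fine grid.
    best = -math.inf
    for i in range(steps):
        v = -vmax + 2 * vmax * i / (steps - 1)
        best = max(best, p * v - lagrangian(v, q))
    return best

p, q = 1.5, 0.7
exact = p**2 / (2 * m) + 0.5 * k * q**2
approx = hamiltonian_numeric(p, q)
print(approx, exact)  # the two values agree to several decimal places
```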
In thermodynamics, people perform this transformation on variables according to the type of thermodynamic system they want; e.g., starting from the internal energy U(S, V), we have
dU = T dS − p dV,
so we can perform the Legendre transformation on either or both of S, V, yielding
dH = d(U + pV) = T dS + V dp,
dF = d(U − TS) = −S dT − p dV,
dG = d(U − TS + pV) = −S dT + V dp.

Here f, x₁, ⋯, xₙ, p₁, ⋯, pₙ are scalar functions defined on ℝⁿ, and df, dxᵢ, dpᵢ are their differentials (which are treated as cotangent vector fields in the context of differentiable manifolds). This definition is equivalent to the modern mathematicians' definition as long as f is differentiable and convex for the variables x₁, x₂, ⋯, xₙ.
Properties

● The Legendre transform of a convex function, whose second-derivative values are all positive, is also a convex function whose second-derivative values are all positive.

Proof. Let us show this with a twice-differentiable function f(x) with all positive second-derivative values and with a bijective (invertible) derivative.

For a fixed p, let x̄ maximize, or make bounded, the function px − f(x) over x. Then the Legendre transformation of f is f*(p) = px̄ − f(x̄); thus,
f′(x̄) = p
by the maximizing or bounding condition d/dx (px − f(x)) = p − f′(x) = 0. Note that x̄ depends on p. (This can be visually shown in the first figure of this page, above.)

Thus x̄ = g(p), where g ≡ (f′)⁻¹, meaning that g is the inverse of the derivative f′ of f (so f′(g(p)) = p).

Note that g is also differentiable, with the following derivative (inverse function rule):
dg(p)/dp = 1/f″(g(p)).

Thus the Legendre transformation f*(p) = pg(p) − f(g(p)) is the composition of differentiable functions, hence it is differentiable.

Applying the product rule and the chain rule with the equality x̄ = g(p) yields
d(f*)/dp = g(p) + (p − f′(g(p))) · dg(p)/dp = g(p),
giving
d²(f*)/dp² = dg(p)/dp = 1/f″(g(p)) > 0,
so f* is convex with all positive second-derivative values.

● The Legendre transformation is an involution, i.e., f** = f.

Proof. By using the above identities f′(x̄) = p, x̄ = g(p), f*(p) = px̄ − f(x̄) and the derivative (f*)′(p) = g(p),
f**(y) = (y·p̄ − f*(p̄))|_{(f*)′(p̄)=y} = g(p̄)·p̄ − f*(p̄) = g(p̄)·p̄ − (p̄g(p̄) − f(g(p̄))) = f(g(p̄)) = f(y).

Note that this derivation does not require all the second-derivative values of the original function f to be positive.
Identities

As shown above, for a convex function f(x), with x = x̄ maximizing or making px − f(x) bounded at each p, the Legendre transform is f*(p) = px̄ − f(x̄), and with g ≡ (f′)⁻¹, the following identities hold:
● f′(x̄) = p,
● x̄ = g(p),
● (f*)′(p) = g(p).
Examples

Example 1

Consider the exponential function f(x) = eˣ over the domain I = ℝ, which has the Legendre transform f*(x*) = x*(ln(x*) − 1) with domain I* = (0, ∞). To see this, start from the definition
f*(x*) = sup_{x∈ℝ} (x*x − eˣ),   x* ∈ I*,
where I* is yet to be determined. To find the supremum, differentiate x*x − eˣ with respect to x and set the result equal to zero:
d/dx (x*x − eˣ) = x* − eˣ = 0.
Since the second derivative −eˣ is negative everywhere, the maximal value is achieved at x = ln(x*). Thus the Legendre transform is
f*(x*) = x* ln(x*) − e^{ln(x*)} = x*(ln(x*) − 1),
with domain I* = (0, ∞). This illustrates that the domains of a function and its Legendre transform can be different.

To find the Legendre transformation of the Legendre transformation of f,
f**(x) = sup_{x*∈ℝ} (xx* − x*(ln(x*) − 1)),   x ∈ I,
where the variable x is intentionally used as the argument of the function f** to show the involution property f** = f. We compute
0 = d/dx* (xx* − x*(ln(x*) − 1)) = x − ln(x*),
so the maximum occurs at x* = eˣ because the second derivative
d²/dx*² (xx* − x*(ln(x*) − 1)) = −1/x* < 0
over the domain I* = (0, ∞). As a result, f** is found as
f**(x) = xeˣ − eˣ(ln(eˣ) − 1) = eˣ,
so f = f**, as expected.
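The computation above can be confirmed numerically. The following sketch (an illustration, not from the article) approximates the supremum in the definition on a fine grid and compares it with the closed form x*(ln x* − 1):

```python
import math

# Numerical sanity check of Example 1: for f(x) = exp(x),
# the Legendre transform should be f*(x*) = x*(ln x* - 1).
def conjugate(xstar, lo=-20.0, hi=20.0, steps=200001):
    # Brute-force sup of xstar*x - exp(x) over a grid;
    # the maximizer x = ln(xstar) lies well inside [lo, hi] for the sampled xstar.
    best = -math.inf
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        best = max(best, xstar * x - math.exp(x))
    return best

for xstar in (0.5, 1.0, 3.0):
    exact = xstar * (math.log(xstar) - 1.0)
    assert abs(conjugate(xstar) - exact) < 1e-4
print("f*(x*) matches x*(ln x* - 1) at the sampled points")
```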
Example 2

Let f(x) = cx² defined on ℝ, where c > 0 is a fixed constant.

For x* fixed, the function of x, x*x − f(x) = x*x − cx², has first derivative x* − 2cx and second derivative −2c; there is one stationary point at x = x*/(2c), which is always a maximum.

Thus, I* = ℝ and
f*(x*) = x*²/(4c).
The first derivatives of f, 2cx, and of f*, x*/(2c), are inverse functions of each other. Clearly, furthermore,
f**(x) = x²/(4(1/(4c))) = cx²,
namely f** = f.
Example 3

Let f(x) = x² for x ∈ I = [2, 3].

For x* fixed, x*x − f(x) is continuous on the compact set I, hence it always attains a finite maximum on it; it follows that the domain of the Legendre transform of f is I* = ℝ.

The stationary point at x = x*/2 (found by setting the first derivative of x*x − f(x) with respect to x equal to zero) is in the domain [2, 3] if and only if 4 ≤ x* ≤ 6. Otherwise the maximum is attained either at x = 2 or x = 3, because the second derivative of x*x − f(x) with respect to x is negative (it equals −2); for x* < 4 the maximum over x ∈ [2, 3] is obtained at x = 2, while for x* > 6 it is obtained at x = 3. Therefore,
f*(x*) =
  2x* − 4,   x* < 4,
  x*²/4,     4 ≤ x* ≤ 6,
  3x* − 9,   x* > 6.
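A quick brute-force check of this piecewise result (an illustrative sketch; the sampled x* values are arbitrary):

```python
# Check of Example 3: f(x) = x**2 on the compact interval [2, 3].
def conjugate(xstar, steps=300001):
    # sup over x in [2, 3] of xstar*x - x**2, evaluated on a fine grid
    # (both endpoints are on the grid exactly)
    best = float("-inf")
    for i in range(steps):
        x = 2.0 + i / (steps - 1)
        best = max(best, xstar * x - x * x)
    return best

def piecewise(xstar):
    if xstar < 4:
        return 2 * xstar - 4
    if xstar <= 6:
        return xstar**2 / 4
    return 3 * xstar - 9

for xs in (-1.0, 0.0, 4.5, 5.0, 7.0):
    assert abs(conjugate(xs) - piecewise(xs)) < 1e-6
print("piecewise formula confirmed at the sampled points")
```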
Example 4

The function f(x) = cx is convex, for every x (strict convexity is not required for the Legendre transformation to be well defined). Clearly x*x − f(x) = (x* − c)x is never bounded from above as a function of x, unless x* − c = 0. Hence f* is defined on I* = {c} and f*(c) = 0. (The definition of the Legendre transform requires the existence of the supremum, which requires upper bounds.)

One may check involutivity: of course, xx* − f*(x*) is always bounded as a function of x* ∈ {c}, hence I** = ℝ. Then, for all x one has
sup_{x*∈{c}} (xx* − f*(x*)) = xc,
and hence f**(x) = cx = f(x).
Example 5

Let f(x) = |x|. This gives
f*(x*) = sup_x (xx* − |x|) = max( sup_{x≥0} x(x* − 1), sup_{x≤0} x(x* + 1) ),
and thus f*(x*) = 0 on its domain I* = [−1, 1].
Example 6: several variables

Let
f(x) = ⟨x, Ax⟩ + c
Orthogonal matrix

For matrices with orthogonality over the complex number field, see unitary matrix.
A real square matrix Q is orthogonal if its columns and rows are orthonormal vectors; equivalently,
QᵀQ = QQᵀ = I,
where Qᵀ is the transpose of Q and I is the identity matrix. This leads to the equivalent characterization: a matrix Q is orthogonal if its transpose is equal to its inverse:
Qᵀ = Q⁻¹,
where Q⁻¹ is the inverse of Q.
An orthogonal matrix Q is necessarily invertible (with inverse Q⁻¹ = Qᵀ), unitary (Q⁻¹ = Q∗), where Q∗ is the Hermitian adjoint (conjugate transpose) of Q, and therefore normal (Q∗Q = QQ∗) over the real numbers. The determinant of any orthogonal matrix is
either +1 or −1. As a linear transformation, an orthogonal matrix preserves the inner
product of vectors, and therefore acts as an isometry of Euclidean space, such as a
rotation, reflection or rotoreflection. In other words, it is a unitary transformation.
The set of n × n orthogonal matrices, under multiplication, forms the group O(n), known
as the orthogonal group. The subgroup SO(n) consisting of orthogonal matrices with
determinant +1 is called the special orthogonal group, and each of its elements is a
special orthogonal matrix. As a linear transformation, every special orthogonal matrix
acts as a rotation.
Overview

Visual understanding of multiplication by the transpose of a matrix: if A is an orthogonal matrix and B is its transpose, the ij-th element of the product AAᵀ will vanish if i ≠ j, because the i-th row of A is orthogonal to the j-th row of A.
An orthogonal matrix is the real specialization of a unitary matrix, and thus always a
normal matrix. Although we consider only real matrices here, the definition can be used
for matrices with entries from any field. However, orthogonal matrices arise naturally
from dot products, and for matrices of complex numbers that leads instead to the unitary
requirement. Orthogonal matrices preserve the dot product,[1] so, for vectors u and v in
an n-dimensional real Euclidean space
𝑢⋅𝑣=(𝑄𝑢)⋅(𝑄𝑣)
where Q is an orthogonal matrix. To see the inner product connection, consider a vector v in an n-dimensional real Euclidean space. Written with respect to an orthonormal basis, the squared length of v is vᵀv. If a linear transformation, in matrix form Qv, preserves vector lengths, then
vᵀv = (Qv)ᵀ(Qv) = vᵀQᵀQv.
Thus finite-dimensional linear isometries—rotations, reflections, and their combinations
—produce orthogonal matrices. The converse is also true: orthogonal matrices imply
orthogonal transformations. However, linear algebra includes orthogonal
transformations between spaces which may be neither finite-dimensional nor of the
same dimension, and these have no orthogonal matrix equivalent.
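These preservation claims are easy to verify concretely. A small pure-Python sketch (the angle and vectors are arbitrary choices) checks both u · v = (Qu) · (Qv) and QᵀQ = I for a 2 × 2 rotation:

```python
import math

# Check that a rotation matrix Q preserves dot products and satisfies Q^T Q = I.
theta = 0.7
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

u, v = [1.0, 2.0], [-3.0, 0.5]
assert abs(dot(u, v) - dot(matvec(Q, u), matvec(Q, v))) < 1e-12

# Q^T Q = I, entry by entry
for i in range(2):
    for j in range(2):
        entry = sum(Q[k][i] * Q[k][j] for k in range(2))
        assert abs(entry - (1.0 if i == j else 0.0)) < 1e-12
print("dot products preserved; Q^T Q = I")
```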
Orthogonal matrices are important for a number of reasons, both theoretical and
practical. The n × n orthogonal matrices form a group under matrix multiplication, the
orthogonal group denoted by O(n), which—with its subgroups—is widely used in
mathematics and the physical sciences. For example, the point group of a molecule is a
subgroup of O(3). Because floating point versions of orthogonal matrices have
advantageous properties, they are key to many algorithms in numerical linear algebra,
such as QR decomposition. As another example, with appropriate normalization the
discrete cosine transform (used in MP3 compression) is represented by an orthogonal
matrix.
Examples

Below are a few examples of small orthogonal matrices and possible interpretations.

● [ 1  0 ]
  [ 0  1 ]   (identity transformation)

● [ cos θ  −sin θ ]
  [ sin θ   cos θ ]   (rotation about the origin)
Elementary constructions

Lower dimensions

The simplest orthogonal matrices are the 1 × 1 matrices [1] and [−1], which we can interpret as the identity and a reflection of the real line across the origin.

The 2 × 2 matrices have the form
[ p  q ]
[ t  u ],
which orthogonality demands satisfy the three equations
1 = p² + t²,   1 = q² + u²,   0 = pq + tu.
In consideration of the first equation, without loss of generality let p = cos θ, q = sin θ; then either t = −q, u = p or t = q, u = −p. We can interpret the first case as a rotation by θ (where θ = 0 is the identity), and the second as a reflection across a line at an angle of θ/2. The reflection with θ = 90° exchanges x and y; it is a permutation matrix, with a single 1 in each column and row (and otherwise 0):
[ 0  1 ]
[ 1  0 ].
A reflection is its own inverse, which implies that a reflection matrix is symmetric (equal
to its transpose) as well as orthogonal. The product of two rotation matrices is a rotation
matrix, and the product of two reflection matrices is also a rotation matrix.
Higher dimensions

Regardless of the dimension, it is always possible to classify orthogonal matrices as purely rotational or not, but for 3 × 3 matrices and larger the non-rotational matrices can be more complicated than reflections. For example,
[ −1  0  0 ]        [ 0  −1  0 ]
[  0 −1  0 ]   and  [ 1   0  0 ]
[  0  0 −1 ]        [ 0   0 −1 ]
represent an inversion through the origin and a rotoinversion, respectively, about the z-axis.
The most elementary permutation is a transposition, obtained from the identity matrix by
exchanging two rows. Any n × n permutation matrix can be constructed as a product of
no more than n − 1 transpositions.
A Householder reflection is constructed from a non-null vector v as
Q = I − 2 (vvᵀ)/(vᵀv).
Here the numerator is a symmetric matrix while the denominator is a number, the squared magnitude of v. This is a reflection in the hyperplane perpendicular to v (negating any vector component parallel to v). If v is a unit vector, then Q = I − 2vvᵀ suffices. A Householder reflection is typically used to simultaneously zero the lower part of a column. Any orthogonal matrix of size n × n can be constructed as a product of at most n such reflections.
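The Householder construction above can be sketched directly. The following pure-Python example (the vector v is an arbitrary choice) builds Q = I − 2vvᵀ/(vᵀv) and checks that it is orthogonal and negates v:

```python
# Build a Householder reflection and verify it is orthogonal, and that Qv = -v.
v = [3.0, 1.0, 2.0]
n = len(v)
vv = sum(x * x for x in v)  # squared magnitude of v
Q = [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv for j in range(n)]
     for i in range(n)]

# Q^T Q = I, entry by entry
for i in range(n):
    for j in range(n):
        entry = sum(Q[k][i] * Q[k][j] for k in range(n))
        assert abs(entry - (1.0 if i == j else 0.0)) < 1e-12

# The component parallel to v is negated: Qv = -v
Qv = [sum(Q[i][j] * v[j] for j in range(n)) for i in range(n)]
assert all(abs(Qv[i] + v[i]) < 1e-12 for i in range(n))
print("Householder reflection is orthogonal and negates v")
```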
A Givens rotation acts on a two-dimensional (planar) subspace spanned by two coordinate axes, rotating by a chosen angle. It is typically used to zero a single subdiagonal entry. Any rotation matrix of size n × n can be constructed as a product of at most n(n − 1)/2 such rotations. In the case of 3 × 3 matrices, three such rotations suffice; and by fixing the sequence we can thus describe all 3 × 3 rotation matrices (though not uniquely) in terms of the three angles used, often called Euler angles.
A Jacobi rotation has the same form as a Givens rotation, but is used to zero both off-
diagonal entries of a 2 × 2 symmetric submatrix.
Properties

Matrix properties

A real square matrix is orthogonal if and only if its columns form an orthonormal basis of the Euclidean space ℝⁿ with the ordinary Euclidean dot product, which is the case if and only if its rows form an orthonormal basis of ℝⁿ. It might be tempting to suppose a matrix with orthogonal (not orthonormal) columns would be called an orthogonal matrix, but such matrices have no special interest and no special name; they only satisfy MᵀM = D, with D a diagonal matrix.

The determinant of any orthogonal matrix is +1 or −1. This follows from basic facts about determinants, as follows:
1 = det(I) = det(QᵀQ) = det(Qᵀ) det(Q) = (det(Q))².
The converse is not true; having a determinant of ±1 is no guarantee of orthogonality, even with orthogonal columns, as shown by the counterexample
[ 2   0  ]
[ 0  1/2 ].
With permutation matrices the determinant matches the signature, being +1 or −1 as the
parity of the permutation is even or odd, for the determinant is an alternating function of the
rows.
Stronger than the determinant restriction is the fact that an orthogonal matrix can
always be diagonalized over the complex numbers to exhibit a full set of eigenvalues, all
of which must have (complex) modulus 1.
Group properties

The inverse of every orthogonal matrix is again orthogonal, as is the matrix product of two orthogonal matrices. In fact, the set of all n × n orthogonal matrices satisfies all the axioms of a group. It is a compact Lie group of dimension n(n − 1)/2, called the orthogonal group and denoted by O(n).

Now consider (n + 1) × (n + 1) orthogonal matrices with bottom right entry equal to 1. The remainder of the last column (and last row) must be zeros, and the product of any two such matrices has the same form. The rest of the matrix is an n × n orthogonal matrix; thus O(n) is a subgroup of O(n + 1):
[ O(n)  0 ]
[ 0⋯0   1 ].
Since an elementary reflection in the form of a Householder matrix can reduce any
orthogonal matrix to this constrained form, a series of such reflections can bring any
orthogonal matrix to the identity; thus an orthogonal group is a reflection group. The last
column can be fixed to any unit vector, and each choice gives a different copy of O(n) in O(n + 1); in this way O(n + 1) is a bundle over the unit sphere Sⁿ with fiber O(n).
Similarly, SO(n) is a subgroup of SO(n + 1); and any special orthogonal matrix can be
generated by Givens plane rotations using an analogous procedure. The bundle structure persists: SO(n) ↪ SO(n + 1) → Sⁿ. A single rotation can produce a zero in the
first row of the last column, and series of n − 1 rotations will zero all but the last row of
the last column of an n × n rotation matrix. Since the planes are fixed, each rotation has
only one degree of freedom, its angle. By induction, SO(n) therefore has
(n − 1) + (n − 2) + ⋯ + 1 = n(n − 1)/2
degrees of freedom, and so does O(n).
Permutation matrices are simpler still; they form, not a Lie group, but only a finite group, the order n! symmetric group Sₙ. By the same kind of argument, Sₙ is a subgroup of Sₙ₊₁. The even permutations produce the subgroup of permutation matrices of determinant +1, the order n!/2 alternating group.
Canonical form

More broadly, the effect of any orthogonal matrix separates into independent actions on orthogonal two-dimensional subspaces. That is, if Q is special orthogonal then one can always find an orthogonal matrix P, a (rotational) change of basis, that brings Q into block diagonal form:
PᵀQP = diag(R₁, …, Rₖ)   (n even),   PᵀQP = diag(R₁, …, Rₖ, 1)   (n odd),
where the matrices R₁, ..., Rₖ are 2 × 2 rotation matrices, and with the remaining entries zero. Exceptionally, a rotation block may be diagonal, ±I. Thus, negating one column if necessary, and noting that a 2 × 2 reflection diagonalizes to a +1 and −1, any orthogonal matrix can be brought to the form
PᵀQP = diag(R₁, …, Rₖ, ±1, …, ±1),
The matrices R1, ..., Rk give conjugate pairs of eigenvalues lying on the unit circle in the
complex plane; so this decomposition confirms that all eigenvalues have absolute value
1. If n is odd, there is at least one real eigenvalue, +1 or −1; for a 3 × 3 rotation, the eigenvector
associated with +1 is the rotation axis.
Lie algebra

Suppose the entries of Q are differentiable functions of t, and that t = 0 gives Q = I. Differentiating the orthogonality condition
QᵀQ = I
yields
Q̇ᵀQ + QᵀQ̇ = 0.
Evaluation at t = 0 (where Q = I) then implies
Q̇ᵀ = −Q̇.
In Lie group terms, this means that the Lie algebra of an orthogonal matrix group
consists of skew-symmetric matrices. Going the other direction, the matrix exponential
of any skew-symmetric matrix is an orthogonal matrix (in fact, special orthogonal).
For example, the three-dimensional object physics calls angular velocity is a differential
rotation, thus a vector in the Lie algebra
so(3),
tangent to SO(3). Given ω = (xθ, yθ, zθ), with v = (x, y, z) being a unit vector, the
correct skew-symmetric matrix form of ω is
Ω = [  0    −zθ    yθ ]
    [  zθ    0    −xθ ]
    [ −yθ    xθ    0  ].
The exponential of this is the orthogonal matrix for rotation around axis v by angle θ; setting c = cos(θ/2), s = sin(θ/2),
exp(Ω) = [ 1 − 2s² + 2x²s²    2xys² − 2zsc       2xzs² + 2ysc     ]
         [ 2xys² + 2zsc       1 − 2s² + 2y²s²    2yzs² − 2xsc     ]
         [ 2xzs² − 2ysc       2yzs² + 2xsc       1 − 2s² + 2z²s²  ].
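The closed form above can be evaluated and sanity-checked: the resulting matrix should be orthogonal and should leave the axis v fixed. A sketch (the unit axis and angle are arbitrary choices):

```python
import math

# Evaluate the closed form for exp(Omega) and check it is orthogonal and fixes the axis.
x, y, z = 2/3, 2/3, 1/3   # unit axis: (2/3)^2 + (2/3)^2 + (1/3)^2 = 1
theta = 1.2
c, s = math.cos(theta / 2), math.sin(theta / 2)

R = [[1 - 2*s*s + 2*x*x*s*s, 2*x*y*s*s - 2*z*s*c,   2*x*z*s*s + 2*y*s*c],
     [2*x*y*s*s + 2*z*s*c,   1 - 2*s*s + 2*y*y*s*s, 2*y*z*s*s - 2*x*s*c],
     [2*x*z*s*s - 2*y*s*c,   2*y*z*s*s + 2*x*s*c,   1 - 2*s*s + 2*z*z*s*s]]

# R^T R = I, entry by entry
for i in range(3):
    for j in range(3):
        entry = sum(R[k][i] * R[k][j] for k in range(3))
        assert abs(entry - (1.0 if i == j else 0.0)) < 1e-12

# The axis is an eigenvector with eigenvalue 1: R v = v
v = (x, y, z)
Rv = [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]
assert all(abs(Rv[i] - v[i]) < 1e-12 for i in range(3))
print("exp(Omega) is orthogonal and fixes the axis v")
```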
Benefits
Numerical analysis takes advantage of many of the properties of orthogonal matrices for
numerical linear algebra, and they arise naturally. For example, it is often desirable to
compute an orthonormal basis for a space, or an orthogonal change of bases; both take
the form of orthogonal matrices. Having determinant ±1 and all eigenvalues of
magnitude 1 is of great benefit for numeric stability. One implication is that the condition
number is 1 (which is the minimum), so errors are not magnified when multiplying with
an orthogonal matrix. Many algorithms use orthogonal matrices like Householder
reflections and Givens rotations for this reason. It is also helpful that, not only is an
orthogonal matrix invertible, but its inverse is available essentially free, by exchanging
indices.
Permutations are essential to the success of many algorithms, including the workhorse
Gaussian elimination with partial pivoting (where permutations do the pivoting).
However, they rarely appear explicitly as matrices; their special form allows more
efficient representation, such as a list of n indices.
Likewise, algorithms using Householder and Givens matrices typically use specialized
methods of multiplication and storage. For example, a Givens rotation affects only two
rows of a matrix it multiplies, changing a full multiplication of order n³ to a much more
efficient order n. When uses of these reflections and rotations introduce zeros in a
matrix, the space vacated is enough to store sufficient data to reproduce the transform,
and to do so robustly. (Following Stewart (1976), we do not store a rotation angle, which
is both expensive and badly behaved.)
Decompositions

A number of important matrix decompositions (Golub & Van Loan 1996) involve orthogonal matrices, including especially:

● QR decomposition: M = QR, Q orthogonal, R upper triangular
● Singular value decomposition: M = UΣVᵀ, U and V orthogonal, Σ nonnegative diagonal matrix
● Eigendecomposition of a symmetric matrix: S = QΛQᵀ, S symmetric, Q orthogonal, Λ diagonal
● Polar decomposition: M = QS, Q orthogonal, S symmetric positive-semidefinite
Examples

Consider an overdetermined system of linear equations, as might occur with repeated measurements of a physical phenomenon to compensate for experimental errors. Write Ax = b, where A is m × n, m > n. A QR decomposition reduces A to upper triangular R; for example, if A is 5 × 3 then R has the form
R = [ ⋅ ⋅ ⋅ ]
    [ 0 ⋅ ⋅ ]
    [ 0 0 ⋅ ]
    [ 0 0 0 ]
    [ 0 0 0 ].
The linear least squares problem is to find the x that minimizes ‖Ax − b‖, which is equivalent to projecting b to the subspace spanned by the columns of A. Assuming the columns of A (and hence R) are independent, the projection solution is found from AᵀAx = Aᵀb. Now AᵀA is square (n × n) and invertible, and also equal to RᵀR. But the lower rows of zeros in R are superfluous in the product, which is thus already in lower-triangular upper-triangular factored form, as in Gaussian elimination (Cholesky decomposition). Here orthogonality is important not only for reducing AᵀA = (RᵀQᵀ)QR to RᵀR, but also for allowing solution without magnifying numerical problems.
The case of a square invertible matrix also holds interest. Suppose, for example, that A
is a 3 × 3 rotation matrix which has been computed as the composition of numerous
twists and turns. Floating point does not match the mathematical ideal of real numbers,
so A has gradually lost its true orthogonality. A Gram–Schmidt process could
orthogonalize the columns, but it is not the most reliable, nor the most efficient, nor the
most invariant method. The polar decomposition factors a matrix into a pair, one of
which is the unique closest orthogonal matrix to the given matrix, or one of the closest if
the given matrix is singular. (Closeness can be measured by any matrix norm invariant
under an orthogonal change of basis, such as the spectral norm or the Frobenius norm.)
For a near-orthogonal matrix, rapid convergence to the orthogonal factor can be
achieved by a "Newton's method" approach due to Higham (1986) (1990), repeatedly
averaging the matrix with its inverse transpose. Dubrulle (1999) has published an
accelerated method with a convenient convergence test.
For example, consider a non-orthogonal matrix for which the simple averaging algorithm takes seven steps
[ 3 1 ]    [ 1.8125  0.0625 ]          [ 0.8 −0.6 ]
[ 7 5 ] →  [ 3.4375  2.6875 ] → ⋯ →   [ 0.6  0.8 ]
and which acceleration trims to two steps (with γ = 0.353553, 0.565685):
[ 3 1 ]    [ 1.41421  −1.06066 ]    [ 0.8 −0.6 ]
[ 7 5 ] →  [ 1.06066   1.41421 ] →  [ 0.6  0.8 ]
Gram–Schmidt yields an inferior solution:
[ 3 1 ]    [ 0.393919  −0.919145 ]
[ 7 5 ] →  [ 0.919145   0.393919 ]
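The simple averaging iteration Q ← (Q + (Q⁻¹)ᵀ)/2 from the example above can be reproduced in a few lines. A pure-Python 2 × 2 sketch, starting from the same matrix [[3, 1], [7, 5]]:

```python
# Averaging iteration Q <- (Q + inverse-transpose of Q)/2 for a 2x2 matrix;
# it converges to the orthogonal polar factor [[0.8, -0.6], [0.6, 0.8]].
def inv2(M):
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return [[ M[1][1]/det, -M[0][1]/det],
            [-M[1][0]/det,  M[0][0]/det]]

Q = [[3.0, 1.0], [7.0, 5.0]]
for _ in range(20):
    it = inv2(Q)
    itT = [[it[0][0], it[1][0]],   # inverse transpose
           [it[0][1], it[1][1]]]
    Q = [[(Q[i][j] + itT[i][j]) / 2 for j in range(2)] for i in range(2)]

target = [[0.8, -0.6], [0.6, 0.8]]
assert all(abs(Q[i][j] - target[i][j]) < 1e-9 for i in range(2) for j in range(2))
print(Q)
```

The first iterate already matches the displayed intermediate matrix; twenty iterations are far more than needed, since convergence is eventually quadratic.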
Randomization
Some numerical applications, such as Monte Carlo methods and exploration of high-
dimensional data spaces, require generation of uniformly distributed random orthogonal
matrices. In this context, "uniform" is defined in terms of Haar measure, which
essentially requires that the distribution not change if multiplied by any freely chosen
orthogonal matrix. Orthogonalizing matrices with independent uniformly distributed
random entries does not result in uniformly distributed orthogonal matrices[citation needed],
but the QR decomposition of independent normally distributed random entries does, as
long as the diagonal of R contains only positive entries (Mezzadri 2006). Stewart (1980)
replaced this with a more efficient idea that Diaconis & Shahshahani (1987) later
generalized as the "subgroup algorithm" (in which form it works just as well for
permutations and rotations). To generate an (n + 1) × (n + 1) orthogonal matrix, take an
n × n one and a uniformly distributed unit vector of dimension n + 1. Construct a
Householder reflection from the vector, then apply it to the smaller matrix (embedded in
the larger size with a 1 at the bottom right corner).
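The QR recipe described above can be sketched in pure Python with classical Gram–Schmidt (an illustration, not production code; for a robust version use a library QR). Note that normalizing each column by its positive norm already makes the diagonal of R positive, which is the sign convention the prescription requires:

```python
import random

# Haar-style random orthogonal matrix: orthonormalize the columns of a matrix
# of independent standard normal entries (QR with positive R diagonal).
def random_orthogonal(n, seed=0):
    rng = random.Random(seed)
    cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    q = []  # orthonormal columns, built one at a time
    for a in cols:
        # subtract projections onto the already-built orthonormal columns
        for e in q:
            coeff = sum(x * y for x, y in zip(a, e))
            a = [x - coeff * y for x, y in zip(a, e)]
        norm = sum(x * x for x in a) ** 0.5
        # dividing by the positive norm fixes r_ii > 0, so no extra sign flip is needed
        q.append([x / norm for x in a])
    # q holds the columns of Q; transpose to get the matrix itself
    return [[q[j][i] for j in range(n)] for i in range(n)]

Q = random_orthogonal(4)
# verify Q^T Q = I
for i in range(4):
    for j in range(4):
        entry = sum(Q[k][i] * Q[k][j] for k in range(4))
        assert abs(entry - (1.0 if i == j else 0.0)) < 1e-10
print("random orthogonal matrix generated")
```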
The problem of finding the orthogonal matrix Q nearest a given matrix M is related to
the Orthogonal Procrustes problem. There are several different ways to get the unique
solution, the simplest of which is taking the singular value decomposition of M and
replacing the singular values with ones. Another method expresses the R explicitly but
requires the use of a matrix square root:[2]
Q = M (MᵀM)^(−1/2).
This may be combined with the Babylonian method for extracting the square root of a
matrix to give a recurrence which converges to an orthogonal matrix quadratically:
Qₙ₊₁ = 2M (Qₙ⁻¹M + MᵀQₙ)⁻¹,
where Q0 = M.
These iterations are stable provided the condition number of M is less than three.[3]
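A small sketch of this recurrence (the test matrix, a scaled 2 × 2 rotation, is an arbitrary choice with condition number 1, well within the stability bound):

```python
import math

# Run Q_{n+1} = 2 M (Q_n^{-1} M + M^T Q_n)^{-1} with Q_0 = M, for
# M = 1.1 x (rotation by 0.3); the nearest orthogonal matrix is the rotation itself.
def mul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inv2(M):
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return [[ M[1][1]/det, -M[0][1]/det],
            [-M[1][0]/det,  M[0][0]/det]]

def transpose(M):
    return [[M[j][i] for j in range(2)] for i in range(2)]

t = 0.3
R = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
M = [[1.1 * e for e in row] for row in R]

Q = M  # Q_0 = M
for _ in range(8):
    S = [[a + b for a, b in zip(r1, r2)]
         for r1, r2 in zip(mul(inv2(Q), M), mul(transpose(M), Q))]
    Q = [[2 * e for e in row] for row in mul(M, inv2(S))]

assert all(abs(Q[i][j] - R[i][j]) < 1e-12 for i in range(2) for j in range(2))
print("converged to the underlying rotation")
```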
Using a first-order approximation of the inverse and the same initialization results in the
modified iteration:
Nₙ = QₙᵀQₙ,
Pₙ = ½ QₙNₙ,
Qₙ₊₁ = 2Qₙ + PₙNₙ − 3Pₙ.
The Pin and Spin groups are found within Clifford algebras, which themselves can be
built from orthogonal matrices.
Rectangular matrices

If Q is not a square matrix, then the conditions QᵀQ = I and QQᵀ = I are not equivalent. The condition QᵀQ = I says that the columns of Q are orthonormal. This can only happen if Q is an m × n matrix with n ≤ m (due to linear dependence). Similarly, QQᵀ = I says that the rows of Q are orthonormal, which requires n ≥ m.
There is no standard terminology for these matrices. They are variously called "semi-
orthogonal matrices", "orthonormal matrices", "orthogonal matrices", and sometimes
simply "matrices with orthonormal rows/columns".
See also
● Biorthogonal system
Notes

1. ^ "Paul's Online Math Notes", Paul Dawkins, Lamar University, 2008. Theorem 3(c).
2. ^ "Finding the Nearest Orthonormal Matrix", Berthold K. P. Horn, MIT.
3. ^ "Newton's Method for the Matrix Square Root", Nicholas J. Higham, Mathematics of Computation, Volume 46, Number 174, 1986.
This page was last edited on 22 May 2024, at 16:45 (UTC).
Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply.
Recall Example 6 above: let f(x) = ⟨x, Ax⟩ + c be defined on X = ℝⁿ, where A is a real, positive definite matrix. Then f is convex, and
⟨p, x⟩ − f(x) = ⟨p, x⟩ − ⟨x, Ax⟩ − c
has gradient p − 2Ax and Hessian −2A, which is negative definite; hence the stationary point x = A⁻¹p/2 is a maximum.

We have X* = ℝⁿ, and
f*(p) = ¼⟨p, A⁻¹p⟩ − c.
Behavior of differentials under Legendre transforms
Let f(x,y) be a function of two independent variables x and y, with the differential
df = (∂f/∂x) dx + (∂f/∂y) dy = p dx + v dy.
Assume that the function f is convex in x for all y, so that one may perform the Legendre
transform on f in x, with p the variable conjugate to x (for information, there is a relation ∂f/∂x|_{x=x̄} = p, where x̄ is the point x maximizing or making px − f(x, y) bounded for given p and y). Since the new independent variable of the
transform with respect to f is p, the differentials dx and dy in df devolve to dp and dy in
the differential of the transform, i.e., we build another function with its differential
expressed in terms of the new basis dp and dy.
We thus build a function g(p, y) = f − px, with
dg = df − p dx − x dp = −x dp + v dy,
so that
x = −∂g/∂p,   v = ∂g/∂y.
The function −g(p, y) is the Legendre transform of f(x, y), where only the independent
variable x has been supplanted by p. This is widely used in thermodynamics, as
illustrated below.
Applications

Analytical mechanics

A Legendre transform is used in classical mechanics to derive the Hamiltonian formulation from the Lagrangian formulation, and conversely. A typical Lagrangian has the form
L(v, q) = ½⟨v, Mv⟩ − V(q),
where (v, q) are coordinates on ℝⁿ × ℝⁿ, M is a positive definite real matrix, and
⟨x, y⟩ = Σⱼ xⱼyⱼ.
For every q fixed, L(v, q) is a convex function of v, while V(q) plays the role of a constant. Hence the Legendre transform of L(v, q) as a function of v is the Hamiltonian function
H(p, q) = ½⟨p, M⁻¹p⟩ + V(q).
In a more abstract setting, (v, q) are coordinates on the tangent bundle TM of a manifold M. For each q, L(v, q) is a convex function of the tangent space Vq. The Legendre transform gives the Hamiltonian H(p, q) as a function of the coordinates (p, q) of the cotangent bundle T*M; the inner product used to define the Legendre transform is inherited from the pertinent canonical symplectic structure. In this abstract setting, the Legendre transformation corresponds to the tautological one-form.
Thermodynamics
The strategy behind the use of Legendre transforms in thermodynamics is to shift from a
function that depends on a variable to a new (conjugate) function that depends on a new
variable, the conjugate of the original one. The new variable is the partial derivative of the
original function with respect to the original variable. The new function is the difference between
the original function and the product of the old and new variables. Typically, this transformation
is useful because it shifts the dependence of, e.g., the energy from an extensive variable to its
conjugate intensive variable, which can often be controlled more easily in a physical
experiment.
For example, the internal energy U is an explicit function of the extensive variables entropy S, volume V, and chemical composition Nᵢ (e.g., i = 1, 2, 3, …),
U = U(S, V, {Nᵢ}),
which has a total differential
dU = T dS − P dV + Σᵢ μᵢ dNᵢ,
where T = ∂U/∂S|_{V,{Nᵢ}}, P = −∂U/∂V|_{S,{Nᵢ}}, and μᵢ = ∂U/∂Nᵢ|_{S,V,{Nⱼ≠ᵢ}}. (Subscripts are not necessary by the definition of partial derivatives but are left here for clarifying variables.) Stipulating some common reference state, by using the (non-standard) Legendre transform of the internal energy U with respect to volume V, the enthalpy H may be obtained as follows.
To get the (standard) Legendre transform U* of the internal energy U with respect to volume V, the function u(p, S, V, {Nᵢ}) = pV − U is defined first, then it shall be maximized (or bounded) with respect to V. To do this, the condition
∂u/∂V = p − ∂U/∂V = 0  →  p = ∂U/∂V
needs to be satisfied, so U* = (∂U/∂V)V − U is obtained. The enthalpy is the negative of this non-standard transform,
−U* = H = U − (∂U/∂V)V = U + PV,
since P = −∂U/∂V.
H is definitely a state function, as it is obtained by adding PV (P and V as state variables) to the state function U = U(S, V, {Nᵢ}). Its differential
dH = T dS + V dP + Σᵢ μᵢ dNᵢ
is an exact differential, and H = H(S, P, {Nᵢ}).
The enthalpy is suitable for description of processes in which the pressure is controlled
from the surroundings.
It is likewise possible to shift the dependence of the energy from the extensive variable
of entropy, S, to the (often more convenient) intensive variable T, resulting in the
Helmholtz and Gibbs free energies. The Helmholtz free energy A, and Gibbs energy G,
are obtained by performing Legendre transforms of the internal energy and enthalpy,
respectively,
𝐴=𝑈−𝑇𝑆 ,
𝐺=𝐻−𝑇𝑆=𝑈+𝑃𝑉−𝑇𝑆 .
The Helmholtz free energy is often the most useful thermodynamic potential when
temperature and volume are controlled from the surroundings, while the Gibbs energy is
often the most useful when temperature and pressure are controlled from the
surroundings.
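The sign relations above can be illustrated with a toy model (not from the article): take a convex "internal energy" U(S) = S² at fixed volume, so T = dU/dS, and check numerically that the transformed potential F = U − TS satisfies dF/dT = −S:

```python
# Toy illustration: with U(S) = S**2 at fixed volume, T = dU/dS = 2S,
# and the Helmholtz-like potential F(T) = U - T*S equals -T**2/4,
# which should satisfy dF/dT = -S.
def U(S):
    return S * S

S = 1.3
T = 2 * S              # T = dU/dS at this state
F = U(S) - T * S       # Legendre transform evaluated at this state
assert abs(F - (-T * T / 4)) < 1e-12

# check dF/dT = -S by a central finite difference on F(T) = -T**2/4
h = 1e-6
dFdT = ((-(T + h)**2 / 4) - (-(T - h)**2 / 4)) / (2 * h)
assert abs(dFdT - (-S)) < 1e-6
print("dF/dT = -S confirmed for the toy model")
```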
Variable capacitor
As another example from physics, consider a parallel conductive plate capacitor, in which
the plates can move relative to one another. Such a capacitor would allow transfer of the
electric energy which is stored in the capacitor into external mechanical work, done by
the force acting on the plates. One may think of the electric charge as analogous to the
"charge" of a gas in a cylinder, with the resulting mechanical force exerted on a piston.
Compute the force on the plates as a function of x, the distance which separates them.
To find the force, compute the potential energy, and then apply the definition of force as
the gradient of the potential energy function.
The electrostatic potential energy stored in a capacitor of capacitance C(x), with a positive electric charge +Q on one conductive plate and −Q on the other, is (using the definition of capacitance, C = Q/V)
U(Q, x) = (1/2) Q V(Q, x) = (1/2) Q²/C(x),
where the dependence on the area of the plates, the dielectric constant of the insulation
material between the plates, and the separation x are abstracted away as the capacitance
C(x). (For a parallel plate capacitor, this is proportional to the area of the plates and
inversely proportional to the separation.)
The force F between the plates due to the electric field created by the charge separation
is then
𝐹(𝑥)=−𝑑𝑈𝑑𝑥 .
If the capacitor is not connected to any electric circuit, then the electric charges on the plates remain constant while the voltage varies as the plates move with respect to each other, and the force is the negative gradient of the electrostatic potential energy:
F(x) = (1/2) (dC(x)/dx) Q²/C(x)² = (1/2) (dC(x)/dx) V(x)²,
where V(Q, x) = V(x) = Q/C(x) is the voltage, which here depends on x alone since Q is fixed.
However, instead, suppose that the voltage between the plates V is maintained constant
as the plate moves by connection to a battery, which is a reservoir for electric charges at
a constant potential difference. Then the charge Q, rather than the voltage, becomes the variable; Q and V are the Legendre conjugates of each other. To find the force, first compute the non-standard Legendre transform U* of U with respect to Q (again using C = Q/V):
U* = U − (∂U/∂Q)|_x · Q = U − (1/(2C(x))) (∂Q²/∂Q)|_x · Q = U − QV = (1/2)QV − QV = −(1/2)QV = −(1/2)V²C(x).
(This transform is well defined because U is convex in Q.) The force now becomes the negative gradient of this Legendre transform, and it agrees with the force obtained from the original function:
F(x) = −dU*/dx = (1/2) (dC(x)/dx) V².
The two energies U and U* happen to stand opposite to each other (their signs are opposite) only because of the linearity of the capacitance, except that now Q is no longer a constant. They reflect the two different pathways of storing energy into the capacitor, which result in, for instance, the same "pull" between a capacitor's plates.
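A quick numeric sketch of this agreement, assuming the hypothetical parallel-plate form C(x) = c0/x (capacitance inversely proportional to separation, as noted above; c0 is a made-up constant):

```python
c0 = 8.0e-12   # assumed plate constant (permittivity times area), farad*metre

def C(x):
    # hypothetical parallel-plate capacitance, inversely proportional to x
    return c0 / x

def F_const_Q(Q, x, h=1e-8):
    # constant charge: F = -dU/dx with U(Q, x) = Q^2 / (2 C(x))
    U = lambda x: Q**2 / (2 * C(x))
    return -(U(x + h) - U(x - h)) / (2 * h)

def F_const_V(V, x, h=1e-8):
    # constant voltage: F = -dU*/dx with U*(V, x) = -V^2 C(x) / 2
    Ustar = lambda x: -V**2 * C(x) / 2
    return -(Ustar(x + h) - Ustar(x - h)) / (2 * h)

x = 1e-3            # plate separation (m)
Q = 1e-9            # charge (C)
V = Q / C(x)        # voltage at this separation
# both routes give the same force at the same physical state
assert abs(F_const_Q(Q, x) - F_const_V(V, x)) < 1e-6 * abs(F_const_Q(Q, x))
```

The central-difference derivative stands in for the analytic gradient; the two energy functions differ in sign, yet their negative gradients coincide at the same (Q, V, x).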
Probability theory[edit]
In large deviations theory, the rate function is defined as the Legendre transformation of
the logarithm of the moment generating function of a random variable. An important
application of the rate function is in the calculation of tail probabilities of sums of i.i.d.
random variables, in particular in Cramér's theorem.
If X_n are i.i.d. random variables, let S_n = X_1 + ⋯ + X_n be their sum and let M(ξ) be the moment generating function of X_1. For ξ ∈ R,
E[e^{ξS_n}] = M(ξ)^n.
Hence, by Markov's inequality, one has for ξ ≥ 0 and a ∈ R,
P(S_n/n > a) ≤ e^{−nξa} M(ξ)^n = exp[−n(ξa − Λ(ξ))],
where Λ(ξ) = log M(ξ). Since the left-hand side is independent of ξ, we may take the infimum of the right-hand side, which leads one to consider the supremum of ξa − Λ(ξ), that is, the Legendre transform of Λ, evaluated at x = a.
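As a sketch of this construction: for a standard normal variable, Λ(ξ) = ξ²/2 is the known closed form of the cumulant generating function, and a brute-force maximization recovers its Legendre transform, the Cramér rate function Λ*(a) = a²/2:

```python
def Lambda(xi):
    # log-moment generating function of a standard normal variable
    return xi**2 / 2

def rate(a, xi_grid=None):
    """Rate function as the Legendre transform sup_xi (xi*a - Lambda(xi))."""
    xi_grid = xi_grid or [i * 0.001 for i in range(-10000, 10001)]
    return max(xi * a - Lambda(xi) for xi in xi_grid)

# the supremum is attained at xi = a, giving a^2 / 2
for a in (0.5, 1.0, 2.0):
    assert abs(rate(a) - a**2 / 2) < 1e-4
```

Plugging Λ*(a) into the bound above reproduces the familiar Gaussian tail estimate P(S_n/n > a) ≤ exp(−n a²/2).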
Microeconomics[edit]
A simple theory explains the shape of the supply curve based solely on the cost function. Suppose the market price for one unit of our product is P. For a company selling this good, the best strategy is to adjust the production Q so that its profit is maximized. We can maximize the profit
profit = revenue − costs = PQ − C(Q)
by differentiating with respect to Q and solving P − C′(Q_opt) = 0. The optimal quantity Q_opt represents the quantity Q of goods that the producer is willing to supply, which is indeed the supply itself:
S(P) = Q_opt(P) = (C′)⁻¹(P).
The maximal profit, profit_max(P) = P·Q_opt(P) − C(Q_opt(P)) = sup_Q (PQ − C(Q)), is the Legendre transform of the cost function C(Q).
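With a hypothetical quadratic cost C(Q) = Q²/2 (chosen only for illustration), the supply curve is S(P) = (C′)⁻¹(P) = P, and the maximal profit P²/2 is exactly the Legendre transform of the cost function:

```python
def cost(Q):
    # assumed toy cost function C(Q) = Q^2 / 2
    return Q**2 / 2

def max_profit(P, Q_grid=None):
    """Maximal profit sup_Q (P*Q - C(Q)), i.e. the Legendre transform C*(P)."""
    Q_grid = Q_grid or [i * 0.001 for i in range(0, 100001)]
    return max(P * Q - cost(Q) for Q in Q_grid)

for P in (1.0, 2.5, 7.0):
    # optimum at Q_opt = P (the supply), with profit P^2 / 2
    assert abs(max_profit(P) - P**2 / 2) < 1e-4
```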
Geometric interpretation[edit]
For a strictly convex function, the Legendre transformation can be interpreted as a mapping between the graph of the function and the family of tangent lines of the graph. The equation of a line with slope p and y-intercept b is given by
y = px + b.
For this line to be tangent to the graph of a function f at the point (x₀, f(x₀)) requires
f(x₀) = px₀ + b
and
p = f′(x₀).
Being the derivative of a strictly convex function, f′ is strictly monotone and thus injective. The second equation can be solved for
x₀ = f′⁻¹(p),
allowing elimination of x₀ from the first, and giving the y-intercept b of the tangent as a function of its slope p:
b = f(x₀) − px₀ = f(f′⁻¹(p)) − p · f′⁻¹(p) = −f⋆(p),
where f⋆ denotes the Legendre transform of f. The family of tangent lines of the graph of f, parameterized by the slope p, is therefore given by
y = px − f⋆(p),
or, written implicitly, by the solutions of the equation
F(x, y, p) = y + f⋆(p) − px = 0.
The graph of the original function can be reconstructed from this family of lines as the envelope of this family by demanding
∂F(x, y, p)/∂p = f⋆′(p) − x = 0.
Eliminating p from these two equations gives
y = x · f⋆′⁻¹(x) − f⋆(f⋆′⁻¹(x)).
Identifying y with f(x) and recognizing the right side of the preceding equation as the Legendre transform of f⋆, yields
f(x) = f⋆⋆(x).
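The involution f = f⋆⋆ for a convex function can be checked numerically with a brute-force grid sketch (toy example f(x) = x²; the grid bounds are assumed wide enough to contain the maximizers):

```python
def legendre(f, grid):
    # brute-force Legendre transform: p -> max_x (p*x - f(x))
    return lambda p: max(p * x - f(x) for x in grid)

grid = [i * 0.05 for i in range(-200, 201)]   # x, p in [-10, 10]
f = lambda x: x**2
f_star = legendre(f, grid)            # approximates p^2 / 4
f_star_star = legendre(f_star, grid)  # should recover f

assert abs(f_star(2.0) - 1.0) < 1e-3          # f*(2) = 2^2 / 4 = 1
for x in (-1.5, 0.0, 2.0):
    assert abs(f_star_star(x) - f(x)) < 1e-3  # f** = f for convex f
```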
Legendre transformation in more than one dimension[edit]
For a differentiable real-valued function f on an open convex subset U of R^n, the Legendre conjugate of the pair (U, f) is defined to be the pair (V, g), where V is the image of U under the gradient mapping Df, and g is the function on V given by the formula
g(y) = ⟨y, x⟩ − f(x),  x = (Df)⁻¹(y),
where
⟨u, v⟩ = ∑_{k=1}^{n} u_k v_k
is the scalar product on R^n. The multidimensional transform can be interpreted as an encoding of the convex hull of the function's epigraph in terms of its supporting hyperplanes.[2] This can be seen as a consequence of the following two observations. On the one hand, the hyperplane tangent to the epigraph of f at some point (x, f(x)) ∈ U × R has normal vector (∇f(x), −1) ∈ R^{n+1}. On the other hand, any closed convex set C ⊆ R^m can be characterized via the set of its supporting hyperplanes by the equations x · n = h_C(n), where h_C(n) is the support function of C. But the definition of the Legendre transform via the maximization matches precisely that of the support function, that is,
f∗(x) = h_{epi(f)}(x, −1).
Hence, the tangent hyperplane to the epigraph at (x, f(x)) is given explicitly by
{z ∈ R^{n+1} : z · (x, −1) = f∗(x)}.
Alternatively, if X is a vector space and Y is its dual vector space, then for each point x of X and y of Y, there is a natural identification of the cotangent space of X at x with Y, and of the cotangent space of Y at y with X. If f is a real differentiable function over X, then its exterior derivative, df, is a
section of the cotangent bundle T*X and as such, we can construct a map from X to Y.
Similarly, if g is a real differentiable function over Y, then dg defines a map from Y to X. If
both maps happen to be inverses of each other, we say we have a Legendre transform.
The notion of the tautological one-form is commonly used in this setting.
When the function is not differentiable, the Legendre transform can still be extended, and
is known as the Legendre–Fenchel transformation. In this more general setting, a few
properties are lost: for example, the Legendre transform is no longer its own inverse
(unless there are extra assumptions, like convexity).
Legendre transformation on manifolds[edit]
Let M be a smooth manifold, and let E and π : E → M be a vector bundle on M and its associated bundle projection, respectively. Let L : E → R be a smooth function. We think of L as a Lagrangian by analogy with the classical case where M = R, E = TM = R × R and
L(x, v) = (1/2) m v² − V(x)
for some positive number m ∈ R and function V : M → R.
As usual, the dual of E is denoted by E∗. The fiber of π over x ∈ M is denoted E_x, and the restriction of L to E_x is denoted by L|_{E_x} : E_x → R. The Legendre transformation of L is the smooth morphism
FL : E → E∗
defined by FL(v) = d(L|_{E_x})(v) ∈ E_x∗, where x = π(v). In other words, FL(v) ∈ E_x∗ is the covector that sends w ∈ E_x to the directional derivative
(d/dt)|_{t=0} L(v + tw) ∈ R.
To describe the Legendre transformation locally, let U ⊆ M be a coordinate chart over which E is trivial. Picking a trivialization of E over U, we obtain charts E_U ≅ U × R^r and E∗_U ≅ U × R^r. In terms of these charts, we have
FL(x; v_1, …, v_r) = (x; p_1, …, p_r),
where
p_i = ∂L/∂v_i (x; v_1, …, v_r)
for all i = 1, …, r.
If the restriction of L : E → R to each fiber E_x is strictly convex and bounded below by a positive definite quadratic form minus a constant, then the Legendre transform FL : E → E∗ is a diffeomorphism.[3] Suppose that FL is a diffeomorphism and let H : E∗ → R be the "Hamiltonian" function defined by
H(p) = p · v − L(v),
where v = (FL)⁻¹(p). Using the natural isomorphism E ≅ E∗∗, we may interpret the Legendre transformation of H as a map FH : E∗ → E.[3] Then we have
(FL)⁻¹ = FH.
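As a concrete instance of the classical case above (with toy choices of m and V), the fiberwise Legendre transform of L(x, v) = mv²/2 − V(x) can be computed numerically; it matches the familiar Hamiltonian H = p²/(2m) + V(x):

```python
m = 2.0                 # assumed mass
V = lambda x: x**2      # hypothetical potential, illustration only

def L(x, v):
    # classical Lagrangian: kinetic minus potential energy
    return m * v**2 / 2 - V(x)

def H(x, p, v_grid=None):
    """Fiberwise Legendre transform: H(x, p) = sup_v (p*v - L(x, v))."""
    v_grid = v_grid or [i * 0.001 for i in range(-10000, 10001)]
    return max(p * v - L(x, v) for v in v_grid)

# supremum at v = p/m, giving H = p^2 / (2m) + V(x)
for x, p in ((0.0, 1.0), (1.5, -3.0)):
    assert abs(H(x, p) - (p**2 / (2 * m) + V(x))) < 1e-4
```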
Further properties[edit]
Scaling properties[edit]
The Legendre transformation has the following scaling properties: For a > 0,
𝑓(𝑥)=𝑎⋅𝑔(𝑥)⇒𝑓⋆(𝑝)=𝑎⋅𝑔⋆(𝑝𝑎)
𝑓(𝑥)=𝑔(𝑎⋅𝑥)⇒𝑓⋆(𝑝)=𝑔⋆(𝑝𝑎).
It follows that if a function is homogeneous of degree r, then its image under the Legendre transformation is a homogeneous function of degree s, where 1/r + 1/s = 1. (Since f(x) = x^r/r, with r > 1, implies f∗(p) = p^s/s.) Thus, the only monomial whose degree is invariant under the Legendre transform is the quadratic.
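This homogeneity rule can be checked numerically. For the toy choice f(x) = x^r/r with r = 3 (restricting to x, p ≥ 0, where f is convex), the conjugate exponent is s = 3/2 and f⋆(p) = p^s/s:

```python
r = 3.0
s = r / (r - 1)     # conjugate exponent from 1/r + 1/s = 1, here 1.5

def f(x):
    return x**r / r

def f_star(p, grid=None):
    # brute-force Legendre transform restricted to x >= 0
    grid = grid or [i * 0.0001 for i in range(0, 100001)]
    return max(p * x - f(x) for x in grid)

# maximizer at x = p^{1/(r-1)}, giving f*(p) = p^s / s
for p in (0.25, 1.0, 4.0):
    assert abs(f_star(p) - p**s / s) < 1e-3
```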
Behavior under translation[edit]
f(x) = g(x) + b ⇒ f⋆(p) = g⋆(p) − b,
f(x) = g(x + y) ⇒ f⋆(p) = g⋆(p) − p · y.
Behavior under inversion[edit]
f(x) = g⁻¹(x) ⇒ f⋆(p) = −p · g⋆(1/p).
Behavior under linear transformations[edit]
Let A : R^n → R^m be a linear transformation. For any convex function f on R^n, one has
(Af)⋆ = f⋆A⋆,
where A⋆ is the adjoint operator of A defined by
⟨Ax, y⋆⟩ = ⟨x, A⋆y⋆⟩,
and Af is the push-forward of f along A:
(Af)(y) = inf{f(x) : x ∈ X, Ax = y}.
A closed convex function f is symmetric with respect to a given set G of orthogonal linear transformations,
f(Ax) = f(x), ∀x, ∀A ∈ G,
if and only if f⋆ is symmetric with respect to G.
Infimal convolution[edit]
The infimal convolution of two functions f and g is defined as
(f ⋆inf g)(x) = inf{f(x − y) + g(y) | y ∈ R^n}.
Let f_1, …, f_m be proper convex functions on R^n. Then
(f_1 ⋆inf ⋯ ⋆inf f_m)⋆ = f_1⋆ + ⋯ + f_m⋆.
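A one-dimensional brute-force sketch of this identity, using the toy pair f(x) = g(x) = x² (whose infimal convolution is x²/2):

```python
def legendre(h, grid):
    # Legendre transform by brute force: p -> max_x (p*x - h(x))
    return lambda p: max(p * x - h(x) for x in grid)

def inf_conv(f, g, grid):
    # infimal convolution: x -> min_y (f(x - y) + g(y))
    return lambda x: min(f(x - y) + g(y) for y in grid)

grid = [i * 0.05 for i in range(-200, 201)]   # [-10, 10]
f = lambda x: x**2
g = lambda x: x**2

h = inf_conv(f, g, grid)          # h(x) = x^2 / 2
lhs = legendre(h, grid)           # (f *inf g)*
f_star, g_star = legendre(f, grid), legendre(g, grid)

# (f *inf g)* = f* + g*  (here: p^2/2 = p^2/4 + p^2/4)
for p in (-2.0, 0.0, 3.0):
    assert abs(lhs(p) - (f_star(p) + g_star(p))) < 1e-3
```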
Fenchel's inequality[edit]
For any function f and its convex conjugate f⋆, Fenchel's inequality (also known as the Fenchel–Young inequality) holds for every x ∈ X and p ∈ X⋆, i.e., for independent x, p pairs:
⟨𝑝,𝑥⟩≤𝑓(𝑥)+𝑓⋆(𝑝).
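A quick numerical spot-check of the inequality, using the toy choice f(x) = e^x, whose conjugate f⋆(p) = p ln p − p (for p > 0) is a standard closed form:

```python
import math
import random

f = lambda x: math.exp(x)
f_star = lambda p: p * math.log(p) - p   # conjugate of exp, valid for p > 0

random.seed(0)
for _ in range(1000):
    x = random.uniform(-5.0, 5.0)
    p = random.uniform(1e-6, 10.0)
    # Fenchel-Young: p*x <= f(x) + f*(p), equality exactly when p = e^x
    assert p * x <= f(x) + f_star(p) + 1e-12
```

Equality holds when p = f′(x) = e^x, which is precisely the pairing used throughout the article.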
See also[edit]
● Dual curve
● Projective duality
● Young's inequality for products
● Convex conjugate
● Moreau's theorem
● Integration by parts
● Fenchel's duality theorem
References