
Legendre transformation
From Wikipedia, the free encyclopedia

This article is about an involution transform commonly used in classical mechanics and
thermodynamics. For the integral transform using Legendre polynomials as kernels, see
Legendre transform (integral transform).

Figure: the function f(x) is defined on the interval [a, b]. For a given p, the difference px − f(x) takes its maximum at x′. Thus, the Legendre transformation of f(x) is f*(p) = px′ − f(x′).

In mathematics, the Legendre transformation (or Legendre transform), first introduced by Adrien-Marie Legendre in 1787 when studying the minimal surface problem,[1] is an involutive transformation on real-valued functions that are convex on a real variable. Specifically, if a real-valued multivariable function is convex in one of its independent real variables, then the Legendre transform with respect to this variable is applicable to the function.

In physical problems, the Legendre transform is used to convert functions of one quantity (such as position, pressure, or temperature) into functions of the conjugate quantity (momentum, volume, and entropy, respectively). In this way, it is commonly used in classical mechanics to derive the Hamiltonian formalism out of the Lagrangian formalism (or vice versa), in thermodynamics to derive the thermodynamic potentials, and in the solution of differential equations of several variables.

For sufficiently smooth functions on the real line, the Legendre transform f* of a function f can be specified, up to an additive constant, by the condition that the functions' first derivatives are inverse functions of each other. This can be expressed in Euler's derivative notation as

Df(⋅) = (Df*)⁻¹(⋅),

where D is the differentiation operator, ⋅ represents an argument or input to the associated function, and (φ)⁻¹(⋅) is the inverse function satisfying (φ)⁻¹(φ(x)) = x; or equivalently, in Lagrange's notation,

f′(f*′(x*)) = x*   and   f*′(f′(x)) = x.

The generalization of the Legendre transformation to affine spaces and non-convex functions is known as the convex conjugate (also called the Legendre–Fenchel transformation), which can be used to construct a function's convex hull.

Definition

Definition in ℝ

Let I ⊂ ℝ be an interval, and f : I → ℝ a convex function; then the Legendre transform of f is the function f* : I* → ℝ defined by

f*(x*) = sup_{x∈I} (x*x − f(x)),   I* = { x* ∈ ℝ : f*(x*) < ∞ },

where sup denotes the supremum over x ∈ I, e.g., x in I is chosen such that x*x − f(x) is maximized at each x*, or x* is such that x*x − f(x) is bounded above over all x (e.g., when f(x) is a linear function).

The transform is always well-defined when f(x) is convex. This definition requires x*x − f(x) to be bounded from above in I in order for the supremum to exist.
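The supremum in this definition is straightforward to approximate numerically. Below is a minimal sketch (assuming NumPy is available; the test function f(x) = x², whose exact transform is x*²/4, and the grid are illustrative choices, not part of the article):

```python
import numpy as np

# Approximate f*(x*) = sup_x (x* x - f(x)) on a grid for f(x) = x**2.
x = np.linspace(-50.0, 50.0, 1_000_001)
f = x**2

def legendre_numeric(x_star):
    return np.max(x_star * x - f)          # grid approximation of the supremum

for x_star in (-3.0, 0.0, 2.0, 10.0):
    print(x_star, legendre_numeric(x_star), x_star**2 / 4)  # the two values agree closely
```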


Definition in ℝⁿ

The generalization to convex functions f : X → ℝ on a convex set X ⊂ ℝⁿ is straightforward: f* : X* → ℝ has domain

X* = { x* ∈ ℝⁿ : sup_{x∈X} (⟨x*, x⟩ − f(x)) < ∞ }

and is defined by

f*(x*) = sup_{x∈X} (⟨x*, x⟩ − f(x)),   x* ∈ X*,

where ⟨x*, x⟩ denotes the dot product of x* and x.

The function f* is called the convex conjugate function of f. For historical reasons (rooted in analytic mechanics), the conjugate variable is often denoted p, instead of x*. If the convex function f is defined on the whole line and is everywhere differentiable, then

f*(p) = sup_{x∈I} (px − f(x)) = (px − f(x))|_{x=(f′)⁻¹(p)}

can be interpreted as the negative of the y-intercept of the tangent line to the graph of f that has slope p.

The Legendre transformation is an application of the duality relationship between points and lines. The functional relationship specified by f can be represented equally well as a set of (x, y) points, or as a set of tangent lines specified by their slope and intercept values.

Understanding the Legendre transform in terms of derivatives

For a differentiable convex function f on the real line with first derivative f′ and its inverse (f′)⁻¹, the Legendre transform of f, denoted f*, can be specified, up to an additive constant, by the condition that the functions' first derivatives are inverse functions of each other, i.e.,

f′ = ((f*)′)⁻¹   and   (f*)′ = (f′)⁻¹.

To see this, first note that if f, as a convex function on the real line, is differentiable and x̄ is a critical point of the function x ↦ p⋅x − f(x), then the supremum is achieved at x̄ (by convexity; see the first figure on this page). Therefore, the Legendre transform of f is

f*(p) = p⋅x̄ − f(x̄).
Then, suppose that the first derivative f′ is invertible and let the inverse be g = (f′)⁻¹. Then for each p, the point g(p) is the unique critical point x̄ of the function x ↦ px − f(x) (i.e., x̄ = g(p)), because f′(g(p)) = p and the function's first derivative with respect to x at g(p) is p − f′(g(p)) = 0. Hence we have f*(p) = p⋅g(p) − f(g(p)) for each p. By differentiating with respect to p, we find

(f*)′(p) = g(p) + p⋅g′(p) − f′(g(p))⋅g′(p).

Since f′(g(p)) = p, this simplifies to (f*)′(p) = g(p) = (f′)⁻¹(p). In other words, (f*)′ and f′ are inverses of each other.


In general, if h′ = (f′)⁻¹ is the inverse of f′, then h′ = (f*)′, so integration gives f* = h + c with some constant c.

In practical terms, given f(x), the parametric plot of xf′(x) − f(x) versus f′(x) amounts to the graph of f*(p) versus p.
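This parametric recipe can be exercised directly. A brief sketch (assuming NumPy; f(x) = eˣ is used only as an example, with known transform f*(p) = p(ln p − 1)):

```python
import numpy as np

# Parametric construction: the pairs (f'(x), x f'(x) - f(x)) trace the graph of f*.
x = np.linspace(-5.0, 5.0, 10001)
p = np.exp(x)                          # f'(x) for f(x) = exp(x)
f_star = x * np.exp(x) - np.exp(x)     # x f'(x) - f(x)

# Compare against the closed form at the same slopes p.
assert np.allclose(f_star, p * (np.log(p) - 1.0))
```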

In some cases (e.g. thermodynamic potentials, below), a non-standard requirement is used, amounting to an alternative definition of f* with a minus sign,

f(x) − f*(p) = xp.

Formal Definition in Physics Context

In analytical mechanics and thermodynamics, the Legendre transformation is usually defined as follows: suppose f is a function of x; then we have

df = (df/dx) dx.

Performing the Legendre transformation on this function means that we take p = df/dx as the independent variable, so that the above expression can be written as df = p dx, and, according to Leibniz's rule d(uv) = u dv + v du, we then have

d(xp − f) = x dp,

and, taking f* = xp − f, we have df* = x dp, which means

df*/dp = x.

When f is a function of n variables x₁, x₂, ⋯, xₙ, then we can perform the Legendre transformation on each one or several variables: we have

df = p₁ dx₁ + p₂ dx₂ + ⋯ + pₙ dxₙ

where pᵢ = ∂f/∂xᵢ. Then if we want to perform the Legendre transformation on, e.g., x₁, we take p₁ together with x₂, ⋯, xₙ as independent variables, and with Leibniz's rule we have

d(f − x₁p₁) = −x₁ dp₁ + p₂ dx₂ + ⋯ + pₙ dxₙ,

so for the function φ(p₁, x₂, ⋯, xₙ) = f(x₁, x₂, ⋯, xₙ) − x₁p₁, we have

∂φ/∂p₁ = −x₁,   ∂φ/∂x₂ = p₂,   ⋯,   ∂φ/∂xₙ = pₙ.

We can also do this transformation for the variables x₂, ⋯, xₙ. If we do it for all the variables, then we have

dφ = −x₁ dp₁ − x₂ dp₂ − ⋯ − xₙ dpₙ

where φ = f − x₁p₁ − x₂p₂ − ⋯ − xₙpₙ.

In analytical mechanics, this transformation is performed on the variables q̇₁, q̇₂, ⋯, q̇ₙ of the Lagrangian L(q₁, ⋯, qₙ, q̇₁, ⋯, q̇ₙ) to get the Hamiltonian:

H(q₁, ⋯, qₙ, p₁, ⋯, pₙ) = Σᵢ₌₁ⁿ pᵢq̇ᵢ − L(q₁, ⋯, qₙ, q̇₁, ⋯, q̇ₙ)

and in thermodynamics, this transformation is performed on variables according to the type of thermodynamic system of interest. For example, starting from the cardinal function of state, the internal energy U(S, V), we have

dU = T dS − p dV,

so we can perform the Legendre transformation on either or both of S and V, yielding

dH = d(U + pV) = T dS + V dp
dF = d(U − TS) = −S dT − p dV
dG = d(U − TS + pV) = −S dT + V dp

and each of these three expressions has a physical meaning.
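These differential identities can be checked symbolically. A small sketch (assuming SymPy; U is left as an unspecified function of S and V, and the sign conventions follow the text):

```python
import sympy as sp

S, V = sp.symbols('S V', positive=True)
U = sp.Function('U')(S, V)        # internal energy U(S, V), unspecified
T = sp.diff(U, S)                 # T = dU/dS
p = -sp.diff(U, V)                # p = -dU/dV

H = U + p * V                     # enthalpy obtained by transforming in V

# dH = T dS + V dp  means  dH/dS = T + V dp/dS  and  dH/dV = V dp/dV.
assert sp.simplify(sp.diff(H, S) - (T + V * sp.diff(p, S))) == 0
assert sp.simplify(sp.diff(H, V) - V * sp.diff(p, V)) == 0
```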

This definition of the Legendre transformation is the one originally introduced by Legendre in his work in 1787,[1] and it is still applied by physicists nowadays. Indeed, this definition can be made mathematically rigorous if we treat all the variables and functions defined above, e.g. f, x₁, ⋯, xₙ, p₁, ⋯, pₙ, as differentiable functions defined on an open set of ℝⁿ or on a differentiable manifold, and df, dxᵢ, dpᵢ as their differentials (which are treated as cotangent vector fields in the context of differentiable manifolds). This definition is equivalent to the modern mathematicians' definition as long as f is differentiable and convex in the variables x₁, x₂, ⋯, xₙ.

Properties

● The Legendre transform of a convex function, whose second-derivative values are all positive, is also a convex function whose second-derivative values are all positive.

Proof. Let us show this with a twice-differentiable function f(x) with all positive second-derivative values and with a bijective (invertible) derivative.

For a fixed p, let x̄ maximize, or make bounded, the function px − f(x) over x. Then the Legendre transformation of f is f*(p) = px̄ − f(x̄); thus, f′(x̄) = p by the maximizing or bounding condition d/dx (px − f(x)) = p − f′(x) = 0. Note that x̄ depends on p. (This can be seen visually in the first figure of this page, above.)

Thus x̄ = g(p), where g ≡ (f′)⁻¹, meaning that g is the inverse of f′, the derivative of f (so f′(g(p)) = p).

Note that g is also differentiable, with the following derivative (inverse function rule):

dg(p)/dp = 1 / f″(g(p)).

Thus, the Legendre transformation f*(p) = p g(p) − f(g(p)) is the composition of differentiable functions, hence it is differentiable. Applying the product rule and the chain rule with the equality x̄ = g(p) yields

d(f*)/dp = g(p) + (p − f′(g(p)))⋅dg(p)/dp = g(p),

giving

d²(f*)/dp² = dg(p)/dp = 1/f″(g(p)) > 0,

so f* is convex with all its second-derivative values positive.

● The Legendre transformation is an involution, i.e., f** = f.

Proof. Using the above identities f′(x̄) = p, x̄ = g(p), f*(p) = px̄ − f(x̄) and its derivative (f*)′(p) = g(p),

f**(y) = (y⋅p̄ − f*(p̄))|_{(f*)′(p̄)=y} = g(p̄)⋅p̄ − f*(p̄) = g(p̄)⋅p̄ − (p̄ g(p̄) − f(g(p̄))) = f(g(p̄)) = f(y).

Note that this derivation does not require the second derivative of the original function f to have all positive values.

Identities

As shown above, for a convex function f(x), with x = x̄ maximizing or making bounded px − f(x) at each p, to define the Legendre transform f*(p) = px̄ − f(x̄), and with g ≡ (f′)⁻¹, the following identities hold:

● f′(x̄) = p,
● x̄ = g(p),
● (f*)′(p) = g(p).

Examples

Example 1

Figure: f(x) = eˣ over the domain I = ℝ is plotted in red, and its Legendre transform f*(x*) = x*(ln(x*) − 1) over the domain I* = (0, ∞) in dashed blue. Note that the Legendre transform appears convex.

Consider the exponential function f(x) = eˣ, which has the domain I = ℝ. From the definition, the Legendre transform is

f*(x*) = sup_{x∈ℝ} (x*x − eˣ),   x* ∈ I*,

where I* remains to be determined. To evaluate the supremum, compute the derivative of x*x − eˣ with respect to x and set it equal to zero:

d/dx (x*x − eˣ) = x* − eˣ = 0.

The second derivative −eˣ is negative everywhere, so the maximal value is achieved at x = ln(x*). Thus, the Legendre transform is

f*(x*) = x* ln(x*) − e^{ln(x*)} = x*(ln(x*) − 1)

and has domain I* = (0, ∞). This illustrates that the domains of a function and its Legendre transform can be different.

To find the Legendre transformation of the Legendre transformation of f,

f**(x) = sup_{x*∈ℝ} (xx* − x*(ln(x*) − 1)),   x ∈ I,

where the variable x is intentionally used as the argument of the function f** to show the involution property of the Legendre transform, f** = f, we compute

0 = d/dx* (xx* − x*(ln(x*) − 1)) = x − ln(x*),

so the maximum occurs at x* = eˣ, because the second derivative of the objective,

d²/dx*² (xx* − x*(ln(x*) − 1)) = −1/x* < 0,

is negative over the domain I* = (0, ∞). As a result, f** is found as

f**(x) = xeˣ − eˣ(ln(eˣ) − 1) = eˣ,

thereby confirming that f = f**, as expected.
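A quick numerical cross-check of this example (a sketch assuming NumPy; the grid and test points are arbitrary):

```python
import numpy as np

# f(x) = exp(x) has Legendre transform f*(p) = p*(ln p - 1) on p > 0.
x = np.linspace(-10.0, 10.0, 200001)
f = np.exp(x)

for p in (0.5, 1.0, 2.0, 5.0):
    numeric = np.max(p * x - f)            # sup_x (p x - f(x)) on the grid
    exact = p * (np.log(p) - 1.0)
    print(p, numeric, exact)               # the values agree to high accuracy
```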
Example 2

Let f(x) = cx², defined on ℝ, where c > 0 is a fixed constant.

For x* fixed, the function of x, x*x − f(x) = x*x − cx², has first derivative x* − 2cx and second derivative −2c; there is one stationary point at x = x*/(2c), which is always a maximum.

Thus, I* = ℝ and

f*(x*) = x*²/(4c).

The first derivatives of f, 2cx, and of f*, x*/(2c), are inverse functions of each other. Clearly, furthermore,

f**(x) = (1/(4(1/(4c)))) x² = cx²,

namely f** = f.

Example 3

Let f(x) = x² for x ∈ I = [2, 3].

For x* fixed, x*x − f(x) is continuous on the compact interval I, hence it always attains a finite maximum on it; it follows that the domain of the Legendre transform of f is I* = ℝ.

The stationary point at x = x*/2 (found by setting the first derivative of x*x − f(x) with respect to x equal to zero) is in the domain [2, 3] if and only if 4 ≤ x* ≤ 6. Otherwise the maximum is attained either at x = 2 or x = 3, because the second derivative of x*x − f(x) with respect to x is negative, being −2; for the part of the domain with x* < 4 the maximum of x*x − f(x) over x ∈ [2, 3] is attained at x = 2, while for x* > 6 it is attained at x = 3. Thus, it follows that

f*(x*) = 2x* − 4 for x* < 4,   x*²/4 for 4 ≤ x* ≤ 6,   3x* − 9 for x* > 6.
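The piecewise formula can be confirmed numerically (a sketch assuming NumPy; the test points are arbitrary):

```python
import numpy as np

# f(x) = x**2 restricted to [2, 3]; compare the grid supremum with the piecewise formula.
x = np.linspace(2.0, 3.0, 100001)

def f_star_exact(p):
    if p < 4:
        return 2*p - 4
    if p > 6:
        return 3*p - 9
    return p*p / 4

for p in (0.0, 3.0, 5.0, 7.0, 10.0):
    numeric = np.max(p * x - x**2)
    print(p, numeric, f_star_exact(p))     # the two columns should match closely
```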
Example 4

The function f(x) = cx is convex, for every x (strict convexity is not required for the Legendre transformation to be well defined). Clearly x*x − f(x) = (x* − c)x is never bounded from above as a function of x, unless x* − c = 0. Hence f* is defined on I* = {c} and f*(c) = 0. (The definition of the Legendre transform requires the existence of the supremum, which requires upper bounds.)

One may check involutivity: of course, xx* − f*(x*) is always bounded as a function of x* ∈ {c}, hence I** = ℝ. Then, for all x one has

sup_{x*∈{c}} (xx* − f*(x*)) = xc,

and hence f**(x) = cx = f(x).

Example 5

As an example of a convex continuous function that is not everywhere differentiable, consider f(x) = |x|. This gives

f*(x*) = sup_x (xx* − |x|) = max( sup_{x≥0} x(x* − 1), sup_{x≤0} x(x* + 1) ),

and thus f*(x*) = 0 on its domain I* = [−1, 1].
Example 6: several variables

Let

f(x) = ⟨x, Ax⟩ + c

be defined on X = ℝⁿ, where A is a real, positive definite matrix.

Then f is convex, and

⟨p, x⟩ − f(x) = ⟨p, x⟩ − ⟨x, Ax⟩ − c

has gradient p − 2Ax and Hessian −2A, which is negative definite; hence the stationary point x = A⁻¹p/2 is a maximum.

We have X* = ℝⁿ, and

f*(p) = ¼⟨p, A⁻¹p⟩ − c.
Orthogonal matrix
From Wikipedia, the free encyclopedia

For matrices with orthogonality over the complex number field, see unitary matrix.


In linear algebra, an orthogonal matrix, or orthonormal matrix, is a real square matrix whose columns and rows are orthonormal vectors.

One way to express this is

QᵀQ = QQᵀ = I,

where Qᵀ is the transpose of Q and I is the identity matrix.

This leads to the equivalent characterization: a matrix Q is orthogonal if its transpose is equal to its inverse:

Qᵀ = Q⁻¹,

where Q⁻¹ is the inverse of Q.

An orthogonal matrix Q is necessarily invertible (with inverse Q⁻¹ = Qᵀ), unitary (Q⁻¹ = Q∗), where Q∗ is the Hermitian adjoint (conjugate transpose) of Q, and therefore normal (Q∗Q = QQ∗) over the real numbers. The determinant of any orthogonal matrix is either +1 or −1. As a linear transformation, an orthogonal matrix preserves the inner product of vectors, and therefore acts as an isometry of Euclidean space, such as a rotation, reflection or rotoreflection. In other words, it is a unitary transformation.

The set of n × n orthogonal matrices, under multiplication, forms the group O(n), known
as the orthogonal group. The subgroup SO(n) consisting of orthogonal matrices with
determinant +1 is called the special orthogonal group, and each of its elements is a
special orthogonal matrix. As a linear transformation, every special orthogonal matrix
acts as a rotation.

Overview

Figure: visual understanding of multiplication by the transpose of a matrix. If A is an orthogonal matrix and B is its transpose, the ij-th element of the product AAᵀ will vanish if i ≠ j, because the i-th row of A is orthogonal to the j-th row of A.

An orthogonal matrix is the real specialization of a unitary matrix, and thus always a normal matrix. Although we consider only real matrices here, the definition can be used for matrices with entries from any field. However, orthogonal matrices arise naturally from dot products, and for matrices of complex numbers that leads instead to the unitary requirement. Orthogonal matrices preserve the dot product,[1] so, for vectors u and v in an n-dimensional real Euclidean space,

u ⋅ v = (Qu) ⋅ (Qv)

where Q is an orthogonal matrix. To see the inner product connection, consider a vector v in an n-dimensional real Euclidean space. Written with respect to an orthonormal basis, the squared length of v is vᵀv. If a linear transformation, in matrix form Qv, preserves vector lengths, then

vᵀv = (Qv)ᵀ(Qv) = vᵀQᵀQv.
Thus finite-dimensional linear isometries—rotations, reflections, and their combinations
—produce orthogonal matrices. The converse is also true: orthogonal matrices imply
orthogonal transformations. However, linear algebra includes orthogonal
transformations between spaces which may be neither finite-dimensional nor of the
same dimension, and these have no orthogonal matrix equivalent.

Orthogonal matrices are important for a number of reasons, both theoretical and
practical. The n × n orthogonal matrices form a group under matrix multiplication, the
orthogonal group denoted by O(n), which—with its subgroups—is widely used in
mathematics and the physical sciences. For example, the point group of a molecule is a
subgroup of O(3). Because floating point versions of orthogonal matrices have
advantageous properties, they are key to many algorithms in numerical linear algebra,
such as QR decomposition. As another example, with appropriate normalization the
discrete cosine transform (used in MP3 compression) is represented by an orthogonal
matrix.

Examples

Below are a few examples of small orthogonal matrices and possible interpretations (matrices are written row by row, with rows separated by semicolons).

● [1 0; 0 1] (identity transformation)
● [cos θ −sin θ; sin θ cos θ] (rotation about the origin)
● [1 0; 0 −1] (reflection across the x-axis)
● [0 0 0 1; 0 0 1 0; 1 0 0 0; 0 1 0 0] (permutation of coordinate axes)

Elementary constructions

Lower dimensions

The simplest orthogonal matrices are the 1 × 1 matrices [1] and [−1], which we can interpret as
the identity and a reflection of the real line across the origin.

The 2 × 2 matrices have the form

[p t; q u],

which orthogonality demands satisfy the three equations

1 = p² + t²,   1 = q² + u²,   0 = pq + tu.

In consideration of the first equation, without loss of generality let p = cos θ, q = sin θ; then either t = −q, u = p or t = q, u = −p. We can interpret the first case as a rotation by θ (where θ = 0 is the identity), and the second as a reflection across a line at an angle of θ/2:

[cos θ −sin θ; sin θ cos θ] (rotation),   [cos θ sin θ; sin θ −cos θ] (reflection)

The special case of the reflection matrix with θ = 90° generates a reflection about the line at 45° given by y = x and therefore exchanges x and y; it is a permutation matrix, with a single 1 in each column and row (and otherwise 0):

[0 1; 1 0].

The identity is also a permutation matrix.

A reflection is its own inverse, which implies that a reflection matrix is symmetric (equal
to its transpose) as well as orthogonal. The product of two rotation matrices is a rotation
matrix, and the product of two reflection matrices is also a rotation matrix.

Higher dimensions

Regardless of the dimension, it is always possible to classify orthogonal matrices as purely rotational or not, but for 3 × 3 matrices and larger the non-rotational matrices can be more complicated than reflections. For example,

[−1 0 0; 0 −1 0; 0 0 −1] and [0 −1 0; 1 0 0; 0 0 −1]

represent an inversion through the origin and a rotoinversion, respectively, about the z-axis.

Rotations become more complicated in higher dimensions; they can no longer be completely characterized by one angle, and may affect more than one planar subspace. It is common to describe a 3 × 3 rotation matrix in terms of an axis and angle, but this only works in three dimensions. Above three dimensions two or more angles are needed, each associated with a plane of rotation.

However, we have elementary building blocks for permutations, reflections, and rotations that apply in general.
Primitives

The most elementary permutation is a transposition, obtained from the identity matrix by
exchanging two rows. Any n × n permutation matrix can be constructed as a product of
no more than n − 1 transpositions.

A Householder reflection is constructed from a non-null vector v as

Q = I − 2vvᵀ/(vᵀv).

Here the numerator is a symmetric matrix while the denominator is a number, the squared magnitude of v. This is a reflection in the hyperplane perpendicular to v (negating any vector component parallel to v). If v is a unit vector, then Q = I − 2vvᵀ suffices. A Householder reflection is typically used to simultaneously zero the lower part of a column. Any orthogonal matrix of size n × n can be constructed as a product of at most n such reflections.
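A minimal sketch of such a reflection (assuming NumPy; the vector x is an arbitrary example). Choosing v = x − ‖x‖e₁ sends x to a multiple of the first coordinate axis, i.e. zeroes the lower part of a column:

```python
import numpy as np

def householder(v):
    """Householder reflection Q = I - 2 v v^T / (v^T v) for a non-null vector v."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    return np.eye(len(v)) - 2.0 * (v @ v.T) / (v.T @ v)

x = np.array([3.0, 1.0, 2.0])
v = x - np.linalg.norm(x) * np.array([1.0, 0.0, 0.0])

Q = householder(v)
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: Q is orthogonal
print(np.allclose(Q, Q.T))               # True: a reflection is symmetric
print(Q @ x)                             # approximately [|x|, 0, 0]
```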

A Givens rotation acts on a two-dimensional (planar) subspace spanned by two coordinate axes, rotating by a chosen angle. It is typically used to zero a single subdiagonal entry. Any rotation matrix of size n × n can be constructed as a product of at most n(n − 1)/2 such rotations. In the case of 3 × 3 matrices, three such rotations suffice; and by fixing the sequence we can thus describe all 3 × 3 rotation matrices (though not uniquely) in terms of the three angles used, often called Euler angles.
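A sketch of a single Givens rotation used this way (assuming NumPy; the 2 × 2 target matrix is an arbitrary example):

```python
import numpy as np

def givens(a, b):
    """2x2 Givens rotation G with G @ [a, b] = [r, 0], where r = hypot(a, b)."""
    r = np.hypot(a, b)
    c, s = a / r, b / r
    return np.array([[c, s], [-s, c]])

A = np.array([[6.0, 5.0],
              [4.0, 3.0]])
G = givens(A[0, 0], A[1, 0])             # rotation chosen from the first column

print(G @ A)                             # entry (1, 0) is now (numerically) zero
print(np.allclose(G.T @ G, np.eye(2)))   # True: G is orthogonal
```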

A Jacobi rotation has the same form as a Givens rotation, but is used to zero both off-
diagonal entries of a 2 × 2 symmetric submatrix.
Properties

Matrix properties

A real square matrix is orthogonal if and only if its columns form an orthonormal basis of the Euclidean space ℝⁿ with the ordinary Euclidean dot product, which is the case if and only if its rows form an orthonormal basis of ℝⁿ. It might be tempting to suppose a matrix with orthogonal (not orthonormal) columns would be called an orthogonal matrix, but such matrices have no special interest and no special name; they only satisfy MᵀM = D, with D a diagonal matrix.

The determinant of any orthogonal matrix is +1 or −1. This follows from basic facts about
determinants, as follows:

1=det(𝐼)=det(𝑄T𝑄)=det(𝑄T)det(𝑄)=(det(𝑄))2.

The converse is not true; having a determinant of ±1 is no guarantee of orthogonality, even with orthogonal columns, as shown by the following counterexample:

[2 0; 0 1/2].

With permutation matrices the determinant matches the signature, being +1 or −1 as the
parity of the permutation is even or odd, for the determinant is an alternating function of the
rows.

Stronger than the determinant restriction is the fact that an orthogonal matrix can
always be diagonalized over the complex numbers to exhibit a full set of eigenvalues, all
of which must have (complex) modulus 1.

Group properties

The inverse of every orthogonal matrix is again orthogonal, as is the matrix product of two orthogonal matrices. In fact, the set of all n × n orthogonal matrices satisfies all the axioms of a group. It is a compact Lie group of dimension n(n − 1)/2, called the orthogonal group and denoted by O(n).

The orthogonal matrices whose determinant is +1 form a path-connected normal subgroup of O(n) of index 2, the special orthogonal group SO(n) of rotations. The quotient group O(n)/SO(n) is isomorphic to O(1), with the projection map choosing [+1] or [−1] according to the determinant. Orthogonal matrices with determinant −1 do not include the identity, and so do not form a subgroup but only a coset; it is also (separately) connected.
Thus each orthogonal group falls into two pieces; and because the projection map
splits, O(n) is a semidirect product of SO(n) by O(1). In practical terms, a comparable
statement is that any orthogonal matrix can be produced by taking a rotation matrix and
possibly negating one of its columns, as we saw with 2 × 2 matrices. If n is odd, then the
semidirect product is in fact a direct product, and any orthogonal matrix can be
produced by taking a rotation matrix and possibly negating all of its columns. This
follows from the property of determinants that negating a column negates the
determinant, and thus negating an odd (but not even) number of columns negates the
determinant.

Now consider (n + 1) × (n + 1) orthogonal matrices with bottom right entry equal to 1. The remainder of the last column (and last row) must be zeros, and the product of any two such matrices has the same form. The rest of the matrix is an n × n orthogonal matrix; thus O(n) is a subgroup of O(n + 1) (and of all higher groups):

[ O(n)  0 ]
[  0    1 ]
Since an elementary reflection in the form of a Householder matrix can reduce any orthogonal matrix to this constrained form, a series of such reflections can bring any orthogonal matrix to the identity; thus an orthogonal group is a reflection group. The last column can be fixed to any unit vector, and each choice gives a different copy of O(n) in O(n + 1); in this way O(n + 1) is a bundle over the unit sphere Sⁿ with fiber O(n).

Similarly, SO(n) is a subgroup of SO(n + 1); and any special orthogonal matrix can be generated by Givens plane rotations using an analogous procedure. The bundle structure persists: SO(n) ↪ SO(n + 1) → Sⁿ. A single rotation can produce a zero in the first row of the last column, and a series of n − 1 rotations will zero all but the last row of the last column of an n × n rotation matrix. Since the planes are fixed, each rotation has only one degree of freedom, its angle. By induction, SO(n) therefore has

(n − 1) + (n − 2) + ⋯ + 1 = n(n − 1)/2

degrees of freedom, and so does O(n).

Permutation matrices are simpler still; they form, not a Lie group, but only a finite group, the order n! symmetric group Sₙ. By the same kind of argument, Sₙ is a subgroup of Sₙ₊₁. The even permutations produce the subgroup of permutation matrices of determinant +1, the order n!/2 alternating group.

Canonical form

More broadly, the effect of any orthogonal matrix separates into independent actions on orthogonal two-dimensional subspaces. That is, if Q is special orthogonal then one can always find an orthogonal matrix P, a (rotational) change of basis, that brings Q into block diagonal form:

PᵀQP = diag(R₁, …, R_k)   (n even),   PᵀQP = diag(R₁, …, R_k, 1)   (n odd),

where the matrices R₁, ..., R_k are 2 × 2 rotation matrices, and with the remaining entries zero. Exceptionally, a rotation block may be diagonal, ±I. Thus, negating one column if necessary, and noting that a 2 × 2 reflection diagonalizes to a +1 and −1, any orthogonal matrix can be brought to the form

PᵀQP = diag(R₁, …, R_k, ±1, …, ±1).

The matrices R₁, ..., R_k give conjugate pairs of eigenvalues lying on the unit circle in the complex plane; so this decomposition confirms that all eigenvalues have absolute value 1. If n is odd, there is at least one real eigenvalue, +1 or −1; for a 3 × 3 rotation, the eigenvector associated with +1 is the rotation axis.

Lie algebra

Suppose the entries of Q are differentiable functions of t, and that t = 0 gives Q = I. Differentiating the orthogonality condition

QᵀQ = I

yields

Q̇ᵀQ + QᵀQ̇ = 0.

Evaluation at t = 0 (Q = I) then implies

Q̇ᵀ = −Q̇.

In Lie group terms, this means that the Lie algebra of an orthogonal matrix group
consists of skew-symmetric matrices. Going the other direction, the matrix exponential
of any skew-symmetric matrix is an orthogonal matrix (in fact, special orthogonal).

For example, the three-dimensional object physics calls angular velocity is a differential rotation, thus a vector in the Lie algebra 𝔰𝔬(3) tangent to SO(3). Given ω = (xθ, yθ, zθ), with v = (x, y, z) being a unit vector, the correct skew-symmetric matrix form of ω is

Ω = [0 −zθ yθ; zθ 0 −xθ; −yθ xθ 0].

The exponential of this is the orthogonal matrix for rotation around axis v by angle θ; setting c = cos θ/2, s = sin θ/2,

exp(Ω) = [1 − 2s² + 2x²s²,  2xys² − 2zsc,  2xzs² + 2ysc;
          2xys² + 2zsc,  1 − 2s² + 2y²s²,  2yzs² − 2xsc;
          2xzs² − 2ysc,  2yzs² + 2xsc,  1 − 2s² + 2z²s²].
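This correspondence is easy to verify numerically (a sketch assuming NumPy and SciPy; the axis and angle are arbitrary choices): the matrix exponential of a skew-symmetric matrix is a rotation.

```python
import numpy as np
from scipy.linalg import expm

theta = 0.7
x, y, z = np.array([1.0, 2.0, 2.0]) / 3.0     # unit axis v

Omega = theta * np.array([[0.0,  -z,   y],
                          [  z, 0.0,  -x],
                          [ -y,   x, 0.0]])   # skew-symmetric: Omega.T == -Omega

R = expm(Omega)
print(np.allclose(R.T @ R, np.eye(3)))        # True: R is orthogonal
print(np.isclose(np.linalg.det(R), 1.0))      # True: determinant +1, a rotation
```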

Numerical linear algebra

Benefits

Numerical analysis takes advantage of many of the properties of orthogonal matrices for
numerical linear algebra, and they arise naturally. For example, it is often desirable to
compute an orthonormal basis for a space, or an orthogonal change of bases; both take
the form of orthogonal matrices. Having determinant ±1 and all eigenvalues of
magnitude 1 is of great benefit for numeric stability. One implication is that the condition
number is 1 (which is the minimum), so errors are not magnified when multiplying with
an orthogonal matrix. Many algorithms use orthogonal matrices like Householder
reflections and Givens rotations for this reason. It is also helpful that, not only is an
orthogonal matrix invertible, but its inverse is available essentially free, by exchanging
indices.

Permutations are essential to the success of many algorithms, including the workhorse
Gaussian elimination with partial pivoting (where permutations do the pivoting).
However, they rarely appear explicitly as matrices; their special form allows more
efficient representation, such as a list of n indices.

Likewise, algorithms using Householder and Givens matrices typically use specialized methods of multiplication and storage. For example, a Givens rotation affects only two rows of a matrix it multiplies, changing a full multiplication of order n³ to a much more efficient order n. When uses of these reflections and rotations introduce zeros in a matrix, the space vacated is enough to store sufficient data to reproduce the transform, and to do so robustly. (Following Stewart (1976), we do not store a rotation angle, which is both expensive and badly behaved.)

Decompositions

A number of important matrix decompositions (Golub & Van Loan 1996) involve orthogonal matrices, including especially:

QR decomposition
M = QR, Q orthogonal, R upper triangular

Singular value decomposition
M = UΣVᵀ, U and V orthogonal, Σ diagonal matrix

Eigendecomposition of a symmetric matrix (decomposition according to the spectral theorem)
S = QΛQᵀ, S symmetric, Q orthogonal, Λ diagonal

Polar decomposition
M = QS, Q orthogonal, S symmetric positive-semidefinite

Examples

Consider an overdetermined system of linear equations, as might occur with repeated measurements of a physical phenomenon to compensate for experimental errors. Write Ax = b, where A is m × n, m > n. A QR decomposition reduces A to upper triangular R. For example, if A is 5 × 3 then R has the form

R = [⋅ ⋅ ⋅; 0 ⋅ ⋅; 0 0 ⋅; 0 0 0; 0 0 0].

The linear least squares problem is to find the x that minimizes ‖Ax − b‖, which is equivalent to projecting b to the subspace spanned by the columns of A. Assuming the columns of A (and hence R) are independent, the projection solution is found from AᵀAx = Aᵀb. Now AᵀA is square (n × n) and invertible, and also equal to RᵀR. But the lower rows of zeros in R are superfluous in the product, which is thus already in lower-triangular upper-triangular factored form, as in Gaussian elimination (Cholesky decomposition). Here orthogonality is important not only for reducing AᵀA = (RᵀQᵀ)QR to RᵀR, but also for allowing solution without magnifying numerical problems.
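A compact sketch of this use of QR for least squares (assuming NumPy; the 5 × 3 system is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))            # overdetermined system, m > n
b = rng.standard_normal(5)

Q, R = np.linalg.qr(A)                     # reduced QR: Q has orthonormal columns
x_qr = np.linalg.solve(R, Q.T @ b)         # solve R x = Q^T b

x_ne = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations, for comparison
print(np.allclose(x_qr, x_ne))             # True: both give the least-squares solution
```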

In the case of a linear system which is underdetermined, or an otherwise non-invertible matrix, singular value decomposition (SVD) is equally useful. With A factored as UΣVᵀ, a satisfactory solution uses the Moore–Penrose pseudoinverse, VΣ⁺Uᵀ, where Σ⁺ merely replaces each non-zero diagonal entry with its reciprocal. Set x to VΣ⁺Uᵀb.

The case of a square invertible matrix also holds interest. Suppose, for example, that A
is a 3 × 3 rotation matrix which has been computed as the composition of numerous
twists and turns. Floating point does not match the mathematical ideal of real numbers,
so A has gradually lost its true orthogonality. A Gram–Schmidt process could
orthogonalize the columns, but it is not the most reliable, nor the most efficient, nor the
most invariant method. The polar decomposition factors a matrix into a pair, one of
which is the unique closest orthogonal matrix to the given matrix, or one of the closest if
the given matrix is singular. (Closeness can be measured by any matrix norm invariant
under an orthogonal change of basis, such as the spectral norm or the Frobenius norm.)
For a near-orthogonal matrix, rapid convergence to the orthogonal factor can be
achieved by a "Newton's method" approach due to Higham (1986) (1990), repeatedly
averaging the matrix with its inverse transpose. Dubrulle (1999) has published an
accelerated method with a convenient convergence test.

For example, consider a non-orthogonal matrix for which the simple averaging algorithm takes seven steps

[3 1; 7 5] → [1.8125 0.0625; 3.4375 2.6875] → ⋯ → [0.8 −0.6; 0.6 0.8]

and which acceleration trims to two steps (with γ = 0.353553, 0.565685):

[3 1; 7 5] → [1.41421 −1.06066; 1.06066 1.41421] → [0.8 −0.6; 0.6 0.8]

Gram–Schmidt yields an inferior solution, shown by a Frobenius distance of 8.28659 instead of the minimum 8.12404:

[3 1; 7 5] → [0.393919 −0.919145; 0.919145 0.393919]
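The simple averaging iteration can be written in a few lines (a sketch assuming NumPy; the starting matrix is the one from the example above, and the iteration count is generous):

```python
import numpy as np

# Repeatedly average a matrix with its inverse transpose; for a nonsingular starting
# matrix this converges to the orthogonal factor of its polar decomposition.
Q = np.array([[3.0, 1.0],
              [7.0, 5.0]])
for _ in range(30):
    Q = 0.5 * (Q + np.linalg.inv(Q).T)

print(np.round(Q, 6))                    # approximately [[0.8, -0.6], [0.6, 0.8]]
print(np.allclose(Q.T @ Q, np.eye(2)))   # True
```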

Randomization

Some numerical applications, such as Monte Carlo methods and exploration of high-
dimensional data spaces, require generation of uniformly distributed random orthogonal
matrices. In this context, "uniform" is defined in terms of Haar measure, which
essentially requires that the distribution not change if multiplied by any freely chosen
orthogonal matrix. Orthogonalizing matrices with independent uniformly distributed
random entries does not result in uniformly distributed orthogonal matrices[citation needed],
but the QR decomposition of independent normally distributed random entries does, as
long as the diagonal of R contains only positive entries (Mezzadri 2006). Stewart (1980)
replaced this with a more efficient idea that Diaconis & Shahshahani (1987) later
generalized as the "subgroup algorithm" (in which form it works just as well for
permutations and rotations). To generate an (n + 1) × (n + 1) orthogonal matrix, take an
n × n one and a uniformly distributed unit vector of dimension n + 1. Construct a
Householder reflection from the vector, then apply it to the smaller matrix (embedded in
the larger size with a 1 at the bottom right corner).
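A sketch of the QR-based recipe (assuming NumPy; the sign fix on the diagonal of R is what makes the resulting distribution Haar-uniform, following Mezzadri 2006):

```python
import numpy as np

def random_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix via QR of a matrix of independent normals."""
    Z = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    return Q * np.sign(np.diag(R))       # force the diagonal of R to be positive

rng = np.random.default_rng(0)
Q = random_orthogonal(4, rng)
print(np.allclose(Q.T @ Q, np.eye(4)))   # True
```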

Nearest orthogonal matrix

The problem of finding the orthogonal matrix Q nearest a given matrix M is related to the Orthogonal Procrustes problem. There are several different ways to get the unique solution, the simplest of which is taking the singular value decomposition of M and replacing the singular values with ones. Another method expresses the R explicitly but requires the use of a matrix square root:[2]

Q = M (MᵀM)^(−1/2).

This may be combined with the Babylonian method for extracting the square root of a matrix to give a recurrence which converges to an orthogonal matrix quadratically:

Qₙ₊₁ = 2M (Qₙ⁻¹M + MᵀQₙ)⁻¹

where Q₀ = M.

These iterations are stable provided the condition number of M is less than three.[3]

Using a first-order approximation of the inverse and the same initialization results in the modified iteration:

Nₙ = QₙᵀQₙ
Pₙ = ½ QₙNₙ
Qₙ₊₁ = 2Qₙ + PₙNₙ − 3Pₙ

Spin and pin
A subtle technical problem afflicts some uses of orthogonal matrices. Not only are the group
components with determinant +1 and −1 not connected to each other, even the +1
component, SO(n), is not simply connected (except for SO(1), which is trivial). Thus it is
sometimes advantageous, or even necessary, to work with a covering group of SO(n),
the spin group, Spin(n). Likewise, O(n) has covering groups, the pin groups, Pin(n). For
n > 2, Spin(n) is simply connected and thus the universal covering group for SO(n). By
far the most famous example of a spin group is Spin(3), which is nothing but SU(2), or
the group of unit quaternions.

The Pin and Spin groups are found within Clifford algebras, which themselves can be
built from orthogonal matrices.

Rectangular matrices

Main article: Semi-orthogonal matrix

If Q is not a square matrix, then the conditions QᵀQ = I and QQᵀ = I are not equivalent. The condition QᵀQ = I says that the columns of Q are orthonormal. This can only happen if Q is an m × n matrix with n ≤ m (due to linear dependence). Similarly, QQᵀ = I says that the rows of Q are orthonormal, which requires n ≥ m.

There is no standard terminology for these matrices. They are variously called "semi-
orthogonal matrices", "orthonormal matrices", "orthogonal matrices", and sometimes
simply "matrices with orthonormal rows/columns".

For the case n ≤ m, matrices with orthonormal columns may be referred to as orthogonal k-frames, and they are elements of the Stiefel manifold.

See also

● Biorthogonal system

Notes

● ^ "Paul's online math notes"[full citation needed], Paul Dawkins, Lamar University, 2008. Theorem 3(c)
● ^ "Finding the Nearest Orthonormal Matrix", Berthold K.P. Horn, MIT.
● ^ "Newton's Method for the Matrix Square Root" Archived 2011-09-29 at the Wayback Machine, Nicholas J. Higham, Mathematics of Computation, Volume 46, Number 174, 1986.

References

● Diaconis, Persi; Shahshahani, Mehrdad (1987), "The subgroup algorithm for


generating uniform random variables", Probability in the Engineering and
Informational Sciences, 1: 15–32, doi:10.1017/S0269964800000255, ISSN
0269-9648, S2CID 122752374
● Dubrulle, Augustin A. (1999), "An Optimum Iteration for the Matrix Polar
Decomposition", Electronic Transactions on Numerical Analysis, 8: 21–25
● Golub, Gene H.; Van Loan, Charles F. (1996), Matrix Computations (3/e ed.),
Baltimore: Johns Hopkins University Press, ISBN 978-0-8018-5414-9
● Higham, Nicholas (1986), "Computing the Polar Decomposition—with
Applications" (PDF), SIAM Journal on Scientific and Statistical Computing, 7
(4): 1160–1174, doi:10.1137/0907079, ISSN 0196-5204
● Higham, Nicholas; Schreiber, Robert (July 1990), "Fast polar decomposition
of an arbitrary matrix", SIAM Journal on Scientific and Statistical Computing,
11 (4): 648–655, CiteSeerX 10.1.1.230.4322, doi:10.1137/0911038, ISSN
0196-5204, S2CID 14268409 [1]
● Stewart, G. W. (1976), "The Economical Storage of Plane Rotations",
Numerische Mathematik, 25 (2): 137–138, doi:10.1007/BF01462266, ISSN
0029-599X, S2CID 120372682
● Stewart, G. W. (1980), "The Efficient Generation of Random Orthogonal
Matrices with an Application to Condition Estimators", SIAM Journal on
Numerical Analysis, 17 (3): 403–409, Bibcode:1980SJNA...17..403S,
doi:10.1137/0717034, ISSN 0036-1429
● Mezzadri, Francesco (2006), "How to generate random matrices from the
classical compact groups", Notices of the American Mathematical Society,
54, arXiv:math-ph/0609050, Bibcode:2006math.ph...9050M

External links

Wikiversity introduces the orthogonal matrix.

● "Orthogonal matrix", Encyclopedia of Mathematics, EMS Press, 2001 [1994]


● Tutorial and Interactive Program on Orthogonal Matrix

Behavior of differentials under Legendre transforms

The Legendre transform is linked to integration by parts, p dx = d(px) − x dp.

Let f(x, y) be a function of two independent variables x and y, with the differential

df = (∂f/∂x) dx + (∂f/∂y) dy = p dx + v dy.

Assume that the function f is convex in x for all y, so that one may perform the Legendre transform on f in x, with p the variable conjugate to x (for information, there is a relation ∂f/∂x|_{x̄} = p, where x̄ is the point in x maximizing or making px − f(x, y) bounded for given p and y). Since the new independent variable of the transform with respect to f is p, the differentials dx and dy in df devolve to dp and dy in the differential of the transform, i.e., we build another function with its differential expressed in terms of the new basis dp and dy.

We thus consider the function g(p, y) = f − px, so that

dg = df − p dx − x dp = −x dp + v dy,
x = −∂g/∂p,
v = ∂g/∂y.

The function −g(p, y) is the Legendre transform of f(x, y), where only the independent
variable x has been supplanted by p. This is widely used in thermodynamics, as
illustrated below.

Applications

Analytical mechanics

A Legendre transform is used in classical mechanics to derive the Hamiltonian formulation from the Lagrangian formulation, and conversely. A typical Lagrangian has the form

L(v, q) = ½⟨v, Mv⟩ − V(q),

where (v, q) are coordinates on ℝⁿ × ℝⁿ, M is a positive definite real matrix, and

⟨x, y⟩ = Σⱼ xⱼyⱼ.

For every q fixed, L(v, q) is a convex function of v, while V(q) plays the role of a constant.

Hence the Legendre transform of L(v, q) as a function of v is the Hamiltonian function,

H(p, q) = ½⟨p, M⁻¹p⟩ + V(q).
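For a one-dimensional instance of this computation, here is a minimal symbolic sketch (assuming SymPy; the potential V is left unspecified and the mass m plays the role of M):

```python
import sympy as sp

m, v, q, p = sp.symbols('m v q p', positive=True)
V = sp.Function('V')
L = m*v**2/2 - V(q)                          # Lagrangian L(v, q)

p_of_v = sp.diff(L, v)                       # conjugate momentum p = dL/dv = m*v
v_of_p = sp.solve(sp.Eq(p, p_of_v), v)[0]    # invert: v = p/m
H = sp.simplify((p*v - L).subs(v, v_of_p))   # Legendre transform in v

print(H)                                     # p**2/(2*m) + V(q)
```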

In a more general setting, (v, q) are local coordinates on the tangent bundle TM of a manifold M. For each q, L(v, q) is a convex function of the tangent space Vq. The Legendre transform gives the Hamiltonian H(p, q) as a function of the coordinates (p, q) of the cotangent bundle T*M; the inner product used to define the Legendre transform is inherited from the pertinent canonical symplectic structure. In this abstract setting, the Legendre transformation corresponds to the tautological one-form.[further explanation needed]

Thermodynamics

The strategy behind the use of Legendre transforms in thermodynamics is to shift from a
function that depends on a variable to a new (conjugate) function that depends on a new
variable, the conjugate of the original one. The new variable is the partial derivative of the
original function with respect to the original variable. The new function is the difference between
the original function and the product of the old and new variables. Typically, this transformation
is useful because it shifts the dependence of, e.g., the energy from an extensive variable to its
conjugate intensive variable, which can often be controlled more easily in a physical
experiment.

For example, the internal energy U is an explicit function of the extensive variables entropy S, volume V, and chemical composition Nᵢ (e.g., i = 1, 2, 3, …),

U = U(S, V, {Nᵢ}),

which has a total differential

dU = T dS − P dV + Σ μᵢ dNᵢ,

where

T = ∂U/∂S|_{V, Nᵢ for all i},   P = −∂U/∂V|_{S, Nᵢ for all i},   μᵢ = ∂U/∂Nᵢ|_{S, V, Nⱼ for all j ≠ i}.

(Subscripts are not necessary by the definition of partial derivatives but are left here to clarify the variables.) Stipulating some common reference state, by using the (non-standard) Legendre transform of the internal energy U with respect to volume V, the enthalpy H may be obtained as follows.

To get the (standard) Legendre transform U* of the internal energy U with respect to volume V, the function u(p, S, V, {Nᵢ}) = pV − U is defined first; it is then maximized (or bounded) over V. To do this, the condition

∂u/∂V = p − ∂U/∂V = 0   →   p = ∂U/∂V

needs to be satisfied, so

U* = (∂U/∂V) V − U

is obtained. This approach is justified because U is a linear function with respect to V (so a convex function on V) by the definition of extensive variables. The non-standard Legendre transform here is obtained by negating the standard version, so

−U* = H = U − (∂U/∂V) V = U + PV.
H is definitely a state function, as it is obtained by adding PV (P and V as state variables) to a state function U = U(S, V, {Nᵢ}), so its differential is an exact differential. Because of

dH = T dS + V dP + Σ μᵢ dNᵢ

and the fact that it must be an exact differential,

H = H(S, P, {Nᵢ}).

The enthalpy is suitable for description of processes in which the pressure is controlled from the surroundings.

It is likewise possible to shift the dependence of the energy from the extensive variable
of entropy, S, to the (often more convenient) intensive variable T, resulting in the
Helmholtz and Gibbs free energies. The Helmholtz free energy A, and Gibbs energy G,
are obtained by performing Legendre transforms of the internal energy and enthalpy,
respectively,

𝐴=𝑈−𝑇𝑆 ,

𝐺=𝐻−𝑇𝑆=𝑈+𝑃𝑉−𝑇𝑆 .

The Helmholtz free energy is often the most useful thermodynamic potential when
temperature and volume are controlled from the surroundings, while the Gibbs energy is
often the most useful when temperature and pressure are controlled from the
surroundings.

Variable capacitor

As another example from physics, consider a parallel conductive plate capacitor, in which the plates can move relative to one another. Such a capacitor would allow transfer of the electric energy which is stored in the capacitor into external mechanical work, done by the force acting on the plates. One may think of the electric charge as analogous to the "charge" of a gas in a cylinder, with the resulting mechanical force exerted on a piston.

Compute the force on the plates as a function of x, the distance which separates them. To find the force, compute the potential energy, and then apply the definition of force as the gradient of the potential energy function.

The electrostatic potential energy stored in a capacitor of capacitance C(x), with a positive electric charge +Q on one conductive plate and a negative charge −Q on the other, is (using the definition of the capacitance as C = Q/V)

U(Q, x) = ½ Q V(Q, x) = ½ Q²/C(x),

where the dependence on the area of the plates, the dielectric constant of the insulating material between the plates, and the separation x are abstracted away as the capacitance C(x). (For a parallel plate capacitor, this is proportional to the area of the plates and inversely proportional to the separation.)

The force F between the plates due to the electric field created by the charge separation is then

F(x) = −dU/dx.

If the capacitor is not connected to any electric circuit, then the electric charges on the plates remain constant and the voltage varies when the plates move with respect to each other, and the force is the negative gradient of the electrostatic potential energy:

F(x) = ½ (dC(x)/dx) Q²/C(x)² = ½ (dC(x)/dx) V(x)²,

where V(Q, x) = V(x), as the charge is fixed in this configuration.

However, suppose instead that the voltage between the plates V is maintained constant as the plates move, by connection to a battery, which is a reservoir of electric charge at a constant potential difference. Then the amount of charge Q is a variable instead of the voltage; Q and V are the Legendre conjugates of each other. To find the force, first compute the non-standard Legendre transform U* with respect to Q (also using C = Q/V),

U* = U − (∂U/∂Q)|ₓ ⋅ Q = U − (1/(2C(x))) (∂Q²/∂Q)|ₓ ⋅ Q = U − QV = ½QV − QV = −½QV = −½V²C(x).

This transformation is possible because U is now a linear function of Q² (and so convex in Q). The force now becomes the negative gradient of this Legendre transform, resulting in the same force obtained from the original function,

F(x) = −dU*/dx = ½ (dC(x)/dx) V².

The two conjugate energies U and U* happen to stand opposite to each other (their signs are opposite) only because of the linearity of the capacitance, except that now Q is no longer a constant. They reflect the two different pathways of storing energy into the capacitor, resulting in, for instance, the same "pull" between a capacitor's plates.

Probability theory
In large deviations theory, the rate function is defined as the Legendre transformation of
the logarithm of the moment generating function of a random variable. An important
application of the rate function is in the calculation of tail probabilities of sums of i.i.d.
random variables, in particular in Cramér's theorem.

If Xₙ are i.i.d. random variables, let Sₙ = X₁ + ⋯ + Xₙ be the associated random walk and M(ξ) the moment generating function of X₁. For ξ ∈ ℝ, E[e^{ξSₙ}] = M(ξ)ⁿ. Hence, by Markov's inequality, one has for ξ ≥ 0 and a ∈ ℝ

P(Sₙ/n > a) ≤ e^{−nξa} M(ξ)ⁿ = exp[−n(ξa − Λ(ξ))]

where Λ(ξ) = log M(ξ). Since the left-hand side is independent of ξ, we may take the infimum of the right-hand side, which leads one to consider the supremum of ξa − Λ(ξ), i.e., the Legendre transform of Λ, evaluated at x = a.
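As a concrete sketch (assuming NumPy; the standard normal is chosen because its log-MGF Λ(ξ) = ξ²/2 has the closed-form Legendre transform a²/2):

```python
import numpy as np

xi = np.linspace(0.0, 20.0, 200001)
log_mgf = xi**2 / 2                    # Lambda(xi) for a standard normal variable

for a in (0.5, 1.0, 2.0, 3.0):
    rate = np.max(a * xi - log_mgf)    # sup_xi (xi*a - Lambda(xi)), the rate function
    print(a, rate, a**2 / 2)           # matches the closed form a^2/2
```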

Microeconomics

Legendre transformation arises naturally in microeconomics in the process of finding the supply S(P) of some product given a fixed price P on the market, knowing the cost function C(Q), i.e. the cost for the producer to make/mine/etc. Q units of the given product.

A simple theory explains the shape of the supply curve based solely on the cost function. Let us suppose the market price for one unit of our product is P. For a company selling this good, the best strategy is to adjust the production Q so that its profit is maximized. We can maximize the profit

profit = revenue − costs = PQ − C(Q)

by differentiating with respect to Q and solving

P − C′(Q_opt) = 0.

Q_opt represents the optimal quantity Q of goods that the producer is willing to supply, which is indeed the supply itself:

S(P) = Q_opt(P) = (C′)⁻¹(P).

If we consider the maximal profit as a function of price, profit_max(P), we see that it is the Legendre transform of the cost function C(Q).

Geometric interpretation

For a strictly convex function, the Legendre transformation can be interpreted as a mapping between the graph of the function and the family of tangents of the graph. (For a function of one variable, the tangents are well-defined at all but at most countably many points, since a convex function is differentiable at all but at most countably many points.)

The equation of a line with slope p and y-intercept b is given by y = px + b. For this line to be tangent to the graph of a function f at the point (x₀, f(x₀)) requires

f(x₀) = px₀ + b

and

p = f′(x₀).

Being the derivative of a strictly convex function, the function f′ is strictly monotone and thus injective. The second equation can be solved for

x₀ = f′⁻¹(p),

allowing elimination of x₀ from the first, and solving for the y-intercept b of the tangent as a function of its slope p,

b = f(x₀) − px₀ = f(f′⁻¹(p)) − p⋅f′⁻¹(p) = −f⋆(p),

where f⋆ denotes the Legendre transform of f.

The family of tangent lines of the graph of f parameterized by the slope p is therefore given by

y = px − f⋆(p),

or, written implicitly, by the solutions of the equation

F(x, y, p) = y + f⋆(p) − px = 0.

The graph of the original function can be reconstructed from this family of lines as the envelope of this family by demanding

∂F(x, y, p)/∂p = f⋆′(p) − x = 0.

Eliminating p from these two equations gives

y = x⋅f⋆′⁻¹(x) − f⋆(f⋆′⁻¹(x)).

Identifying y with f(x) and recognizing the right side of the preceding equation as the Legendre transform of f⋆ yields

f(x) = f⋆⋆(x).
Legendre transformation in more than one dimension

For a differentiable real-valued function on an open convex subset U of ℝⁿ, the Legendre conjugate of the pair (U, f) is defined to be the pair (V, g), where V is the image of U under the gradient mapping Df, and g is the function on V given by the formula

g(y) = ⟨y, x⟩ − f(x),   x = (Df)⁻¹(y),

where

⟨u, v⟩ = Σ_{k=1}^{n} u_k v_k

is the scalar product on ℝⁿ. The multidimensional transform can be interpreted as an encoding of the convex hull of the function's epigraph in terms of its supporting hyperplanes.[2] This can be seen as a consequence of the following two observations. On the one hand, the hyperplane tangent to the epigraph of f at some point (x, f(x)) ∈ U × ℝ has normal vector (∇f(x), −1) ∈ ℝⁿ⁺¹. On the other hand, any closed convex set C ⊂ ℝᵐ can be characterized via the set of its supporting hyperplanes by the equations x⋅n = h_C(n), where h_C(n) is the support function of C. But the definition of the Legendre transform via the maximization matches precisely that of the support function, that is, f*(x) = h_{epi(f)}(x, −1). We thus conclude that the Legendre transform characterizes the epigraph in the sense that the tangent plane to the epigraph at any point (x, f(x)) is given explicitly by

{ z ∈ ℝⁿ⁺¹ : z⋅x = f*(x) }.

Alternatively, if X is a vector space and Y is its dual vector space, then for each point x of
X and y of Y, there is a natural identification of the cotangent spaces T*Xx with Y and
T*Yy with X. If f is a real differentiable function over X, then its exterior derivative, df, is a
section of the cotangent bundle T*X and as such, we can construct a map from X to Y.
Similarly, if g is a real differentiable function over Y, then dg defines a map from Y to X. If
both maps happen to be inverses of each other, we say we have a Legendre transform.
The notion of the tautological one-form is commonly used in this setting.

When the function is not differentiable, the Legendre transform can still be extended, and
is known as the Legendre-Fenchel transformation. In this more general setting, a few
properties are lost: for example, the Legendre transform is no longer its own inverse
(unless there are extra assumptions, like convexity).
Legendre transformation on manifolds

Let M be a smooth manifold, and let E and π : E → M be a vector bundle on M and its associated bundle projection, respectively. Let L : E → ℝ be a smooth function. We think of L as a Lagrangian by analogy with the classical case where M = ℝ, E = TM = ℝ × ℝ and L(x, v) = ½mv² − V(x) for some positive number m ∈ ℝ and function V : M → ℝ.

As usual, the dual of E is denoted by E*. The fiber of E over x ∈ M is denoted E_x, and the restriction of L to E_x is denoted by L|_{E_x} : E_x → ℝ. The Legendre transformation of L is the smooth morphism

FL : E → E*

defined by FL(v) = d(L|_{E_x})(v) ∈ E_x*, where x = π(v). In other words, FL(v) ∈ E_x* is the covector that sends w ∈ E_x to the directional derivative (d/dt)|_{t=0} L(v + tw) ∈ ℝ.

To describe the Legendre transformation locally, let U ⊆ M be a coordinate chart over which E is trivial. Picking a trivialization of E over U, we obtain charts E_U ≅ U × ℝʳ and E*_U ≅ U × ℝʳ. In terms of these charts, we have FL(x; v₁, …, v_r) = (x; p₁, …, p_r), where

pᵢ = ∂L/∂vᵢ (x; v₁, …, v_r)

for all i = 1, …, r. If, as in the classical case, the restriction of L : E → ℝ to each fiber E_x is strictly convex and bounded below by a positive definite quadratic form minus a constant, then the Legendre transform FL : E → E* is a diffeomorphism.[3] Suppose that FL is a diffeomorphism and let H : E* → ℝ be the "Hamiltonian" function defined by

H(p) = p⋅v − L(v),

where v = (FL)⁻¹(p). Using the natural isomorphism E ≅ E**, we may view the Legendre transformation of H as a map FH : E* → E. Then we have[3]

(FL)⁻¹ = FH.

Further properties

Scaling properties

The Legendre transformation has the following scaling properties: for a > 0,

f(x) = a⋅g(x) ⇒ f⋆(p) = a⋅g⋆(p/a)

f(x) = g(a⋅x) ⇒ f⋆(p) = g⋆(p/a).

It follows that if a function is homogeneous of degree r then its image under the Legendre transformation is a homogeneous function of degree s, where 1/r + 1/s = 1. (Since f(x) = xʳ/r, with r > 1, implies f*(p) = pˢ/s.) Thus, the only monomial whose degree is invariant under the Legendre transform is the quadratic.

Behavior under translation

f(x) = g(x) + b ⇒ f⋆(p) = g⋆(p) − b

f(x) = g(x + y) ⇒ f⋆(p) = g⋆(p) − p⋅y

Behavior under inversion

f(x) = g⁻¹(x) ⇒ f⋆(p) = −p⋅g⋆(1/p)

Behavior under linear transformations

Let A : ℝⁿ → ℝᵐ be a linear transformation. For any convex function f on ℝⁿ, one has

(Af)⋆ = f⋆A⋆,

where A⋆ is the adjoint operator of A defined by

⟨Ax, y⋆⟩ = ⟨x, A⋆y⋆⟩,

and Af is the push-forward of f along A:

(Af)(y) = inf{ f(x) : x ∈ X, Ax = y }.

A closed convex function f is symmetric with respect to a given set G of orthogonal linear transformations,

f(Ax) = f(x), ∀x, ∀A ∈ G,

if and only if f⋆ is symmetric with respect to G.

Infimal convolution

The infimal convolution of two functions f and g is defined as

(f ⋆_inf g)(x) = inf{ f(x − y) + g(y) | y ∈ ℝⁿ }.

Let f₁, ..., f_m be proper convex functions on ℝⁿ. Then

(f₁ ⋆_inf ⋯ ⋆_inf f_m)⋆ = f₁⋆ + ⋯ + f_m⋆.

Fenchel's inequality
For any function f and its convex conjugate f * Fenchel's inequality (also known as the
Fenchel–Young inequality) holds for every x ∈ X and p ∈ X*, i.e., independent x, p
pairs,

⟨𝑝,𝑥⟩≤𝑓(𝑥)+𝑓⋆(𝑝).

See also

● Dual curve
● Projective duality
● Young's inequality for products
● Convex conjugate
● Moreau's theorem
● Integration by parts
● Fenchel's duality theorem

References
