
Notes on Analytical Mechanics

By Peter Diehr, May 2006


0. Overview of Analytical Mechanics

“… to reduce the theory of mechanics, and the art of solving the associated problems, to
general formulae, whose simple development provides all the equations necessary for the
solution of each problem… . to unite, and present from one point of view, the different
principles which have, so far, been found to assist in the solution of problems in
mechanics; by showing their mutual dependence and making a judgment of their validity
and scope possible. … No diagrams will be found in this work. The methods that I
explain in it require neither constructions nor geometrical or mechanical arguments, but
only the algebraic operations inherent to a regular and uniform process. Those who love
Analysis will, with joy, see mechanics become a new branch of it and will be grateful to
me for having extended its field.”

(Joseph Louis de Lagrange, Avertissement for Mechanique Analytique, 1788)

Outline:

0-Overview of Analytical Mechanics


1-Forces, Work and Potential Energy
2-Partial Derivatives and the Chain Rule
3-Derivatives in Higher Dimensions
4-Motivating the Euler-Lagrange Equations of Motion
5-An Introduction to the Calculus of Variations
6-Constraints and Lagrange's Method of Undetermined Multipliers
7-Some Worked Examples of Lagrangian Methods
8-Hamilton's Principle and Hamilton's Equations of Motion
9-Hamilton-Jacobi Equation
10-Connections to Optics
11-Connections to Quantum Mechanics

References:

The Variational Principles of Mechanics, Lanczos


Analytical Mechanics, Fowles and Cassiday
Classical Mechanics, Goldstein
Classical Dynamics, Greenwood
Principles of Optics, Born and Wolf
Mathematical Methods for Physicists, Arfken
Div, Grad, Curl, and All That, Schey
Schaum’s Outline of Theory and Practice of Theoretical Mechanics, Spiegel
Wikipedia, The Online Free Encyclopedia
PlanetMath, Online Math for the People, by the People



1. Forces, Work and Potential Energy

Define the position of a particle with a position vector, $\mathbf{r} = (x, y, z)$.

The velocity is the rate of change of position, $\mathbf{v} = \frac{d\mathbf{r}}{dt} = \dot{\mathbf{r}}$, and acceleration is the rate of change of velocity, $\mathbf{a} = \frac{d\mathbf{v}}{dt} = \ddot{\mathbf{r}}$.

Mechanical momentum is defined as $\mathbf{p} = m\mathbf{v}$, where $m$ is the mass.

Kinetic energy is defined as $T = \tfrac{1}{2} m v^2 = \mathbf{p}\cdot\mathbf{p}/2m = p^2/2m$.

We will also have occasion to use the concepts of moment of inertia, torque, angular
velocity, and angular momentum. These will be introduced as needed.

Newton’s 1st law of motion, the law of inertia, tells us that motion or lack of motion of a
particle is unchanged if there are no forces present; consequently, this is also a test for the
presence of a force.

Newton’s 2nd law of motion relates an applied force to the rate of change of momentum of a body: $\mathbf{F} = \frac{d\mathbf{p}}{dt} = \dot{\mathbf{p}} = m\mathbf{a}$, where the latter is true whenever the mass is unchanged. Thus if the forces are known a priori, the motions are determined, and vice versa.

By Newton’s 3rd law of motion, the reaction force is equal and opposite to the action force. Thus

$$\mathbf{F}_1 + \mathbf{F}_2 = \frac{d\mathbf{p}_1}{dt} + \frac{d\mathbf{p}_2}{dt} = \frac{d(\mathbf{p}_1 + \mathbf{p}_2)}{dt} = 0,$$

and the total momentum of the system is unchanged. This is easily extended to many particles, and hence momentum is conserved in the absence of external forces.

The common notion of work is the application of a force through a distance, made precise by a path integral: $W = \int_C \mathbf{F}\cdot d\mathbf{r}$. If the amount of work done is independent of the actual path, depending only upon the end-points, then we have a conservative force. The mathematical test is if the curl is zero: $\nabla \times \mathbf{F} = 0$. Gravity and static electric forces are conservative; friction is non-conservative. For conservative forces, we have the work-energy theorem:

$$W = \int_A^B \mathbf{F}\cdot d\mathbf{r} = \int_A^B \frac{d\mathbf{p}}{dt}\cdot d\mathbf{r} = \frac{1}{m}\int_{p_A}^{p_B} m\frac{d\mathbf{r}}{dt}\cdot d\mathbf{p} = \frac{1}{m}\int_{p_A}^{p_B} \mathbf{p}\cdot d\mathbf{p} = \frac{p_B^2 - p_A^2}{2m}.$$

So work is the change in kinetic energy. If the process is reversible, we can let the work
be performed on us, such as with a compressed spring, or a weight on a lever. Thus we
can store the work, and later convert it back to kinetic energy. This stored work is called
potential energy: $U(P) = -\int_{R_0}^{P} \mathbf{F}\cdot d\mathbf{r}$, where $P$ is any point, and $R_0$ is the point of reference
for this potential. This reference point is essentially arbitrary, but care must be taken if
you shift the reference point, because it is a shift in the zero level. The everyday formula
for gravitational potential, U=mgh , assumes a reference point at the surface of the earth,
with a potential of zero at the surface; however, for problems dealing with escape
velocity, it is convenient to put the reference point at infinity, with the potential at infinity
being zero. This gives a negative potential at the surface of the earth. Only the
difference in two potentials gives a definite energy value; without taking a difference
there is always an arbitrary additive constant, due to the reference point.

If the potential is known, the force can be obtained by means of the gradient; this is
simply an application of the fundamental theorem of (multivariable) calculus:
F ( P ) = −∇U ( P). The gradient operator kills off the reference level associated with the
potential, so we get the true force.
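
As a rough numerical illustration of path independence, the sketch below integrates $\mathbf{F} = -\nabla U$ along two different paths between the same end points and compares the result with the drop in potential, $U(A) - U(B)$. The potential, the end points, the two paths, and the helper names are all assumptions chosen for the example; they do not come from these notes.

```python
import math

def U(x, y):
    """Illustrative potential (an assumption, not from the notes)."""
    return x * x * y + y

def force(x, y):
    """F = -grad U, worked out by hand for the potential above."""
    return (-2.0 * x * y, -(x * x + 1.0))

def work_along(path, n=20000):
    """Midpoint-rule estimate of W = integral of F . dr along path(s), s in [0, 1]."""
    total = 0.0
    for i in range(n):
        x0, y0 = path(i / n)
        x1, y1 = path((i + 1) / n)
        fx, fy = force(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        total += fx * (x1 - x0) + fy * (y1 - y0)
    return total

A, B = (0.0, 0.0), (1.0, 2.0)

def straight(s):
    return (A[0] + s * (B[0] - A[0]), A[1] + s * (B[1] - A[1]))

def curved(s):
    # Same end points, but bulging away from the straight line.
    return (s ** 3, 2.0 * s + math.sin(math.pi * s))

w_straight = work_along(straight)
w_curved = work_along(curved)
drop_in_potential = U(*A) - U(*B)
print(w_straight, w_curved, drop_in_potential)
```

Both estimates should agree with the drop in potential to within the integration error, whichever path is taken.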

If we restrict ourselves to a particle moving in a force field describable by a potential, we have a total energy at each point on the trajectory, $\mathbf{r}(t)$, such that $E(t) = T(t) + U(\mathbf{r}(t))$. Then the rate of change of energy with time is

$$\frac{dE}{dt} = \frac{d}{dt}\left(\frac{p^2}{2m}\right) + \frac{d}{dt}U(\mathbf{r}(t)) = \frac{\mathbf{p}}{m}\cdot\dot{\mathbf{p}} + \sum_i \frac{\partial U}{\partial x_i}\frac{dx_i}{dt} = \mathbf{v}\cdot\dot{\mathbf{p}} + \nabla U\cdot\mathbf{v}.$$

The first term is instantaneous power, $\mathbf{v}\cdot\mathbf{F}$, and the second term contains the gradient of the potential, which is the negative of the force, followed by the rate of change of position with time, which is the velocity. The two terms are equal and opposite, leaving $\frac{dE}{dt} = 0$, so that the energy is conserved; we say that this is a conservative force. This would not be the case if the potential depended upon time or velocity, as the additional terms from the time derivative would not cancel.
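
A quick way to see this conservation at work is to integrate the motion numerically and watch the total energy. The sketch below does so for a mass on a spring, $U = \tfrac{1}{2}kx^2$; the choice of potential, the parameter values, and the velocity-Verlet integrator are assumptions made for illustration only.

```python
def simulate(m=1.0, k=4.0, x=1.0, v=0.0, dt=1e-3, steps=10000):
    """Velocity-Verlet integration of m*x'' = -dU/dx with U = k*x**2/2.

    Returns the total energy T + U recorded after every step.
    """
    def energy(pos, vel):
        return 0.5 * m * vel * vel + 0.5 * k * pos * pos

    def accel(pos):
        return -k * pos / m          # F = -dU/dx, divided by the mass

    energies = [energy(x, v)]
    a = accel(x)
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt * dt
        a_new = accel(x)
        v += 0.5 * (a + a_new) * dt
        a = a_new
        energies.append(energy(x, v))
    return energies

energies = simulate()
drift = max(energies) - min(energies)
print(energies[0], drift)
```

The recorded energy should stay constant up to a small integration error that shrinks with the time step.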

The mechanics of Newton are carried out with forces, which are represented as vectors.
But the kinetic and potential energies are scalar quantities, and we have seen that forces
and the magnitude of the momentum can be recovered from them. In order to further
explore the expressions of mechanics in terms of energies, we must first review
derivatives, and the construction of generalized coordinate systems.



2. Partial Derivatives and the Chain Rule

Let $z = f(x_1, x_2, \ldots, x_n)$ and $x_i = g_i(u_1, u_2, \ldots, u_m)$ for $i = 1$ to $n$. Then if there is a domain $D$ containing points $P = (u_1, u_2, \ldots, u_m)$ such that $f(g_1, g_2, \ldots, g_n)$ is defined, then the chain rule gives the explicit variations:

$$\frac{\partial z}{\partial u_j} = \sum_{i=1}^{n} \frac{\partial z}{\partial x_i}\frac{\partial x_i}{\partial u_j} \quad \text{for each } j = 1 \text{ to } m.$$

The proof can be found in advanced calculus texts; it is mostly keeping track of details,
which makes it lengthy. The fundamental idea is that you must compound the rates of
change through each extant parametric dependence; each of these is just the simple chain
rule.

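Example:

$z = f(r, \theta)$, $r^2 = x^2 + y^2$, $\tan(\theta) = \frac{y}{x}$, so in the first quadrant $r = \sqrt{x^2 + y^2}$, $\theta = \tan^{-1}\left(\frac{y}{x}\right)$.

Then $\frac{\partial z}{\partial x} = \frac{\partial z}{\partial r}\frac{\partial r}{\partial x} + \frac{\partial z}{\partial \theta}\frac{\partial \theta}{\partial x}$.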
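
The polar example can be checked numerically. The sketch below picks an arbitrary smooth $f(r, \theta) = r^2 \sin\theta$ (an assumption, as are the sample point and the helper names) and compares a finite-difference estimate of $\partial z/\partial x$ with the chain-rule sum.

```python
import math

def f(r, theta):
    """Illustrative z = f(r, theta); any smooth function would do."""
    return r ** 2 * math.sin(theta)

def polar(x, y):
    """First-quadrant polar coordinates, as in the example."""
    return math.hypot(x, y), math.atan(y / x)

x, y, h = 1.2, 0.7, 1e-6
r, theta = polar(x, y)

# Left side: dz/dx by a central difference, holding y fixed.
lhs = (f(*polar(x + h, y)) - f(*polar(x - h, y))) / (2 * h)

# Right side: (dz/dr)(dr/dx) + (dz/dtheta)(dtheta/dx), each piece by hand.
dz_dr = 2 * r * math.sin(theta)
dz_dtheta = r ** 2 * math.cos(theta)
dr_dx = x / r
dtheta_dx = -y / r ** 2
rhs = dz_dr * dr_dx + dz_dtheta * dtheta_dx
print(lhs, rhs)
```

For this particular $f$ both sides reduce to $xy/r$, which gives a third, closed-form value to compare against.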

If a function depends on only one variable, then the partial and ordinary derivatives are
the same.

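Example with parametric form:

$z = f(r, \theta)$, $r = r(t)$, $\theta = \theta(t)$.

Then $\frac{\partial z}{\partial t} = \frac{\partial z}{\partial r}\frac{\partial r}{\partial t} + \frac{\partial z}{\partial \theta}\frac{\partial \theta}{\partial t}$ with $\frac{\partial r}{\partial t} = \frac{dr}{dt}$ and $\frac{\partial \theta}{\partial t} = \frac{d\theta}{dt}$.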

In the presence of constraints one or more of the independent variables may be held
constant. In that case the non-varying terms vanish. This is especially common in
thermodynamics, where the chemist is able to control the volume or the pressure. The
notation must be adapted to indicate the held variables to avoid ambiguities.



Total Derivatives and Some Special Results
The chain rule is similar when some or all of the coordinates are connected through a
single parameter, such as time; then it is called a total derivative. This name reminds you
that the result of a total derivative can be integrated with respect to that parameter to
recover the original function. We will see later that the total derivative gives the rate of
change along the parameterized path; this type of parameterization results in implicit
variations with respect to time for moving objects.

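Consider the time variation of the position vector $\mathbf{r}(x_1, x_2, \ldots, x_n, t)$, where we assume independence of the spatial coordinates; this explicit time dependence allows for moving or rotating coordinate systems:

$$\dot{\mathbf{r}} = \frac{d\mathbf{r}}{dt} = \sum_{i=1}^{n} \frac{\partial \mathbf{r}}{\partial x_i}\frac{dx_i}{dt} + \frac{\partial \mathbf{r}}{\partial t} = \sum_{i=1}^{n} \frac{\partial \mathbf{r}}{\partial x_i}\dot{x}_i + \frac{\partial \mathbf{r}}{\partial t},$$

and also its partial with respect to $\dot{x}_k$:

$$\frac{\partial \dot{\mathbf{r}}}{\partial \dot{x}_k} = \frac{\partial}{\partial \dot{x}_k}\left(\sum_{i=1}^{n} \frac{\partial \mathbf{r}}{\partial x_i}\dot{x}_i + \frac{\partial \mathbf{r}}{\partial t}\right) = \sum_{i=1}^{n} \frac{\partial \mathbf{r}}{\partial x_i}\frac{\partial \dot{x}_i}{\partial \dot{x}_k},$$

where we have recognized that there is no $\dot{x}_k$ dependence in the $\frac{\partial \mathbf{r}}{\partial x_i}$. Independence of the coordinates also means $\frac{\partial \dot{x}_i}{\partial \dot{x}_k} = 0$ except when $i = k$, so the summation collapses to

$$\frac{\partial \dot{\mathbf{r}}}{\partial \dot{x}_k} = \frac{\partial \mathbf{r}}{\partial x_k}; \tag{2.1}$$

this is called cancellation of the dots. We will use this later, along with this interchange of operators:

$$\frac{d}{dt}\left(\frac{\partial \mathbf{r}}{\partial x_k}\right) = \frac{\partial \dot{\mathbf{r}}}{\partial x_k}. \tag{2.2}$$

This is established by starting from $\dot{\mathbf{r}}$ above, then find

$$\frac{\partial \dot{\mathbf{r}}}{\partial x_k} = \sum_{i=1}^{n} \frac{\partial^2 \mathbf{r}}{\partial x_k \partial x_i}\dot{x}_i + \frac{\partial^2 \mathbf{r}}{\partial x_k \partial t}. \quad \text{But} \quad \frac{d}{dt}\left(\frac{\partial \mathbf{r}}{\partial x_k}\right) = \sum_{i=1}^{n} \frac{\partial^2 \mathbf{r}}{\partial x_i \partial x_k}\dot{x}_i + \frac{\partial^2 \mathbf{r}}{\partial t \partial x_k},$$

and the two sides differ only in the order of the partial derivatives; Clairaut’s theorem tells us that these can be interchanged as long as both of the resulting partial derivatives are continuous, which provides the condition for the identity.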



Simple Chain Rule and Some Applications

$$\frac{d}{dx} f(g(x)) = \left.\frac{df(y)}{dy}\right|_{y = g(x)} \frac{dg(x)}{dx}, \quad \text{or briefly,} \quad \frac{df(g(x))}{dx} = \frac{df}{dy}\frac{dy}{dx}, \quad \text{where } y = g(x).$$

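Implicit Definitions

Let $F(x, y) = 0$, and assume $y = y(x)$. Then

$$0 = \frac{d}{dx}F(x, y) = \frac{\partial F}{\partial x}\frac{dx}{dx} + \frac{\partial F}{\partial y}\frac{dy}{dx} = F_x + F_y\frac{dy}{dx}, \quad \text{so} \quad \frac{dy}{dx} = -\frac{F_x}{F_y} \quad \text{for } F_y = \frac{\partial F}{\partial y} \neq 0.$$

Thus $y(x) = \int \frac{dy}{dx}\,dx = -\int \left(\frac{F_x}{F_y}\right) dx + C$.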

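Inverse Functions

$f^{-1}(f(\theta)) = \theta$ by definition. Let $y = f(\theta)$; then

$$1 = \frac{d\theta}{d\theta} = \left.\frac{d}{dy}f^{-1}(y)\right|_{y = f(\theta)} \frac{df(\theta)}{d\theta} \;\Rightarrow\; \frac{d}{dy}f^{-1}(y) = \frac{1}{\frac{df(\theta)}{d\theta}}, \quad \text{so} \quad \frac{d}{dy}f^{-1}(y) = \frac{1}{\frac{dy}{d\theta}}.$$

Just remember to convert all of the variables from $\theta$ to $y$ when you are done.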

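Example:

$\sqrt[n]{x^n} = x$, $y = x^n \Rightarrow x = \sqrt[n]{y} = y^{1/n}$. So

$$\frac{d}{dy}\left(\sqrt[n]{y}\right) = \frac{1}{\frac{d}{dx}x^n} = \frac{1}{n x^{n-1}} = \frac{1}{n}\frac{x}{x^n} = \frac{1}{n}\frac{y^{1/n}}{y} = \frac{1}{n}y^{\frac{1}{n} - 1} \;\Rightarrow\; \frac{d}{dy}\left(y^{\frac{1}{n}}\right) = \frac{1}{n}y^{\frac{1}{n} - 1}.$$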

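Example:

$y = \tan(\theta)$, $\theta = \tan^{-1}(y)$. Then $\frac{d}{dy}\tan^{-1}(y) = \frac{1}{\frac{d}{d\theta}\tan(\theta)}$. But

$$\frac{d}{d\theta}\tan(\theta) = \frac{d}{d\theta}\left(\sin(\theta)\frac{1}{\cos(\theta)}\right) = \frac{\cos(\theta)}{\cos(\theta)} - \sin(\theta)\frac{(-\sin(\theta))}{\cos^2(\theta)} = 1 + \tan^2(\theta) = 1 + y^2.$$

So $\frac{d}{dy}\tan^{-1}(y) = \frac{1}{1 + y^2}$.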
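
Both inverse-function results are easy to sanity-check with a finite difference. In the sketch below the sample point `y`, the exponent `n`, and the helper `central_diff` are illustrative assumptions.

```python
import math

def central_diff(func, y, h=1e-6):
    """Central-difference estimate of d(func)/dy at y."""
    return (func(y + h) - func(y - h)) / (2 * h)

y = 0.8

# d/dy arctan(y) should equal 1 / (1 + y**2).
atan_numeric = central_diff(math.atan, y)
atan_formula = 1.0 / (1.0 + y * y)

# d/dy y**(1/n) should equal (1/n) * y**(1/n - 1).
n = 5
root_numeric = central_diff(lambda t: t ** (1.0 / n), y)
root_formula = (1.0 / n) * y ** (1.0 / n - 1.0)

print(atan_numeric, atan_formula)
print(root_numeric, root_formula)
```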



Practice Problems

1. Use the chain rule to find the derivatives of the inverses of the following functions:

sin (θ )
exp ( x )

2. Use the chain rule to find the total derivatives of the following functions, with respect
to the parameter t :

$\cos\left(x^2 + y^2\right)$
$x^2 + y^2 + z^2$

3. Repeat problem (2), but take the partial derivatives with respect to z.

4. Recall equation (2.1); use the same methods to find $\frac{\partial \mathbf{r}}{\partial x_k}$.



3. Derivatives in Higher Dimensions

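Let $X \subseteq \mathbb{R}^N$ and $Y \subseteq \mathbb{R}^M$ and $F : X \to Y$. Let $\{\hat{u}_i\}$ be a basis set for $X$ so that $\mathbf{x} = \sum_i \varepsilon_i \hat{u}_i$. If $N, M = 1$, then the traditional derivative can be defined as:

$$\frac{dF}{dx}(x_0) = \lim_{h \to 0} \frac{F(x_0 + h) - F(x_0)}{h}.$$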

Generalize first with $N \geq 1$ so that the domain can support a surface (e.g., $z = F(x, y)$). We obtain simple curves by passing a cutting plane through the point of interest $\mathbf{x}_0 = \sum_i \alpha_i \hat{u}_i$ in the direction of a unit vector $\hat{n} = \sum_i \beta_i \hat{u}_i$ by means of a parametric line $\mathbf{x}_0 + h\hat{n}$; this defines the directional derivative:

$$\frac{dF}{d\hat{n}}(\mathbf{x}_0) = \lim_{h \to 0} \frac{F(\mathbf{x}_0 + h\hat{n}) - F(\mathbf{x}_0)}{h} = \lim_{h \to 0} \frac{d}{dh}F(\mathbf{x}_0 + h\hat{n}).$$

If $\hat{n}$ is one of the basis vectors, say $\hat{u}_k$, only the $u_k$ variable is then subject to change. Thus follows the definition and rule for partial derivatives: hold all of the $u_i$ constant except for $u_k$, and you get $\frac{dF}{d\hat{u}_k} = \frac{\partial F}{\partial u_k}$.

If we find $\frac{\partial F}{\partial u_k}$ continuous for each of the basis vectors at the point $\mathbf{x}_0$ we have the slopes of a hyper-plane tangent to that point. This plane represents a new type of derivative, the gradient: $\nabla F = \sum_i \frac{\partial F}{\partial u_i}\hat{u}_i$, which is a vector pointing “up hill” on the tangent plane. The gradient is identical to the directional derivative having the greatest magnitude, and is independent of the coordinates used. Note then that $\nabla F \cdot \hat{u}_k = \frac{\partial F}{\partial u_k}$ for any basis.

Applying the chain rule to the definition of the directional derivative gives

$$\frac{dF}{d\hat{n}}(\mathbf{x}_0) = \lim_{h \to 0} \frac{dF}{dh}(\mathbf{x}_0 + h\hat{n}) = \lim_{h \to 0} \nabla F(\mathbf{x}_0 + h\hat{n}) \cdot \hat{n} = \nabla F(\mathbf{x}_0) \cdot \hat{n}.$$

We get a linear combination of the partial derivatives making up the gradient, with weightings from the direction vector, $\hat{n}$. If we change to the basis where $\hat{n} = \hat{v}_k$ then all of the above remains true since vector relationships are independent of the basis chosen; this provides an alternative proof of the gradient/unit vector method of evaluation.
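
The gradient rule for directional derivatives can be checked numerically. The sketch below uses an arbitrary surface $F(x, y) = x^2 y + \sin y$, a sample point, and a 30-degree direction, all of which are assumptions for illustration; the limit in the definition is approximated by a small central difference along the line $\mathbf{x}_0 + h\hat{n}$.

```python
import math

def F(x, y):
    """Illustrative surface z = F(x, y)."""
    return x * x * y + math.sin(y)

def grad_F(x, y):
    """Gradient of F, worked out by hand."""
    return (2 * x * y, x * x + math.cos(y))

x0, y0 = 0.5, 1.5
angle = math.radians(30)
n_hat = (math.cos(angle), math.sin(angle))      # unit direction vector

h = 1e-6
# Definition: rate of change of F along the line x0 + h * n_hat.
by_definition = (F(x0 + h * n_hat[0], y0 + h * n_hat[1])
                 - F(x0 - h * n_hat[0], y0 - h * n_hat[1])) / (2 * h)

# Gradient rule: dF/dn = grad F . n_hat.
gx, gy = grad_F(x0, y0)
by_gradient = gx * n_hat[0] + gy * n_hat[1]

# No direction can beat the magnitude of the gradient itself.
steepest = math.hypot(gx, gy)
print(by_definition, by_gradient, steepest)
```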



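Choosing a second basis set $\hat{u}_k = \sum_i \alpha_{ki}\hat{v}_i$ we use the generalized chain rule to find:

$$\frac{dF}{d\hat{v}_k} = \frac{\partial F}{\partial v_k} = \sum_i \frac{\partial F}{\partial u_i}\frac{\partial u_i}{\partial v_k} = \nabla F \cdot \frac{\partial \mathbf{U}}{\partial v_k}, \quad \text{where } \mathbf{U} = \sum_i u_i \hat{u}_i, \text{ a generalized vector in the } \hat{u}_i.$$

Thus for any change of basis, $\frac{\partial \mathbf{U}}{\partial v_k} = \hat{v}_k$. If we create a path connecting the $u_i$ we can parameterize via the arc length along the path, or by its parallel, time. The derivative with respect to this parameterization is called the total derivative and is given by

$$\frac{dF}{ds} = \sum_i \frac{\partial F}{\partial u_i}\frac{du_i}{ds} = \nabla F \cdot \frac{d\mathbf{U}}{ds}.$$

The total derivative includes both explicit and implicit variations due to the parameterization. If this is time, it gives you the speed along the path. We also see that

$$\frac{\partial F}{\partial v_k} = \sum_i \frac{\partial F}{\partial u_i}\frac{\partial u_i}{\partial v_k} = \nabla F \cdot \frac{\partial \mathbf{U}}{\partial v_k}.$$

So $\frac{\partial F}{\partial v_k}$ is a directional derivative in the direction of unit vector $\frac{\partial \mathbf{U}}{\partial v_k} = \hat{v}_k$. Similarly, $\frac{dF}{ds}$ is a directional derivative in the direction of $\frac{d\mathbf{U}}{ds}$, but scaled by this magnitude; it is the direction of the “motion”.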

Total Derivative, an Example

Given a spatial temperature map, $T = T(x, y, z)$, we find the maximum rate of temperature change and its direction at any point $\mathbf{r} = (x, y, z)$ via the gradient operator: $\nabla T(\mathbf{r}) = \left(\frac{\partial T}{\partial x}, \frac{\partial T}{\partial y}, \frac{\partial T}{\partial z}\right)$. If the temperature map includes time, $T = T(x, y, z, t)$, then we can also find the temporal variation, $\frac{\partial T}{\partial t}$. Changing the spatial coordinate system does not change the gradient, though it does change the vector components. Now consider the case of a bicycle rider passing through. We can map his trip as $\mathbf{R}(t) = (x(t), y(t), z(t))$. There are two aspects to this map: a geometric path, and temporal position along the path. The bicyclist’s velocity is $\mathbf{v} = \mathbf{v}(t) = \frac{d}{dt}\mathbf{R}(t) = (\dot{x}, \dot{y}, \dot{z})$.

The rate of change of temperature of the bicyclist is found by the total derivative of the temperature map as parameterized by the bicyclist’s path:

$$\frac{dT}{dt} = \frac{\partial T}{\partial t} + \sum_k \frac{\partial T}{\partial x_k}\frac{dx_k}{dt} = \frac{\partial T}{\partial t} + \mathbf{v}\cdot\nabla T.$$
This is also known as the convective derivative in fluid flow. Note that the time is what
ties the bicyclist to a specific temperature, and that the second term is a directional
derivative, in the direction of travel, scaled by speed.
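
The bicyclist formula can be tried out on a made-up temperature map and route. Everything specific in the sketch below (the form of `temperature`, the path `rider`, the sample time) is an assumption for illustration; the point is only that differentiating the temperature the rider feels gives the same number as $\partial T/\partial t + \mathbf{v}\cdot\nabla T$.

```python
import math

def temperature(x, y, z, t):
    """Illustrative temperature map T(x, y, z, t); not from the notes."""
    return 20.0 + 3.0 * x - 2.0 * y * y + math.sin(z) + 0.5 * t

def rider(t):
    """Illustrative path R(t) = (x(t), y(t), z(t))."""
    return (math.cos(t), t * t, 0.1 * t)

def rider_velocity(t):
    """Time derivative of rider(t), worked out by hand."""
    return (-math.sin(t), 2.0 * t, 0.1)

t, h = 1.3, 1e-6
x, y, z = rider(t)
v = rider_velocity(t)

# Right side of the total derivative: partial_t T + v . grad T.
grad_T = (3.0, -4.0 * y, math.cos(z))
partial_t = 0.5
rhs = partial_t + sum(vi * gi for vi, gi in zip(v, grad_T))

# Left side: differentiate the temperature the rider actually experiences.
def felt(time):
    return temperature(*rider(time), time)

lhs = (felt(t + h) - felt(t - h)) / (2 * h)
print(lhs, rhs)
```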



Derivatives in Higher Dimensions and the Jacobian

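If we have $N, M \geq 1$, then $F$ is vector valued, and can be structured as:

$$F = \begin{Bmatrix} f_1(\mathbf{x}_0) \\ f_2(\mathbf{x}_0) \\ \vdots \\ f_M(\mathbf{x}_0) \end{Bmatrix}$$

Then the derivative of $F$ is the matrix where row $k$ is the gradient of $f_k(\mathbf{x}_0)$. This is called the Jacobian of $F$, and can be written as:

$$JF(\mathbf{x}_0) = \left[\frac{\partial f_k(\mathbf{x}_0)}{\partial u_j}\right] = \begin{bmatrix} \nabla f_1 \\ \nabla f_2 \\ \vdots \\ \nabla f_M \end{bmatrix} = \frac{\partial(f_1, f_2, \ldots, f_M)}{\partial(u_1, u_2, \ldots, u_N)}.$$

Then it is easy to show that, to first order in $\Delta\mathbf{x}_0$, the change of $F$ at $\mathbf{x}_0$ is: $\Delta F(\mathbf{x}_0) = JF(\mathbf{x}_0)\cdot\Delta\mathbf{x}_0$.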
As the best linear estimator for F, this is what we want for a generalized derivative.
These properties make the determinant of the Jacobian matrix the best estimator for
volume changes, hence its use as the volume adjustment for change of variables in an
integral.
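
As a small illustration of both uses of the Jacobian, the sketch below builds it by finite differences for the polar-to-Cartesian map $(r, \theta) \mapsto (r\cos\theta, r\sin\theta)$. The choice of map, the sample point, and the helper names are assumptions; for this map the determinant should come out close to $r$, the familiar area factor in $r\,dr\,d\theta$.

```python
import math

def F(u):
    """Polar-to-Cartesian map, used here only as an illustration."""
    r, theta = u
    return [r * math.cos(theta), r * math.sin(theta)]

def jacobian(func, u, h=1e-6):
    """Central-difference Jacobian: row k holds the gradient of f_k."""
    m = len(func(u))
    J = [[0.0] * len(u) for _ in range(m)]
    for j in range(len(u)):
        up = list(u)
        up[j] += h
        dn = list(u)
        dn[j] -= h
        fp, fm = func(up), func(dn)
        for k in range(m):
            J[k][j] = (fp[k] - fm[k]) / (2 * h)
    return J

u0 = [2.0, 0.5]
J = jacobian(F, u0)
det_J = J[0][0] * J[1][1] - J[0][1] * J[1][0]   # close to r = 2 here

# Linear estimate of the change in F for a small step du.
du = [1e-3, -2e-3]
estimate = [sum(J[k][j] * du[j] for j in range(2)) for k in range(2)]
f0 = F(u0)
f1 = F([u0[0] + du[0], u0[1] + du[1]])
actual = [f1[k] - f0[k] for k in range(2)]
print(det_J, estimate, actual)
```

The estimate and the actual change differ only at second order in the step size.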

Divergence

The divergence of a vector field is equivalent to the net flow through the boundary of a closed infinitesimal region: $\nabla\cdot\mathbf{E} = \lim_{V \to 0}\left(\frac{1}{V}\iint_{\partial V}\mathbf{E}\cdot d\mathbf{S}\right)$. This is a scalar, independent of the coordinate system. By Gauss’s theorem, $\iiint_V \nabla\cdot\mathbf{E}\,dV = \iint_{\partial V}\mathbf{E}\cdot d\mathbf{S}$, or the volume integral of the divergence of a field is equal to the net flow of the field through the boundary of the volume. A fluid is incompressible if the divergence of its flow is zero.

Curl

The curl of a vector field is equivalent to the net flow around a circuit of an infinitesimal patch (an open volume): $\nabla\times\mathbf{E} = \lim_{V \to 0}\left(\frac{1}{V}\iint_{\partial V} d\mathbf{S}\times\mathbf{E}\right)$. This is a vector, independent of the coordinate system. By Stokes’ theorem, $\iint_R \nabla\times\mathbf{E}\cdot d\mathbf{S} = \oint_{\partial R}\mathbf{E}\cdot d\boldsymbol{\ell}$, where $R$ is an open surface, and $\partial R$ is a closed loop about its opening; $d\boldsymbol{\ell}$ is tangent to the curve. The curl applies to rotations and circuital flows.



Constraints and Level Curves

Let $X \subseteq \mathbb{R}^N$ so that $F : X \to \mathbb{R}$ defines a surface in $\mathbb{R}^{N+1}$. Let $\{\hat{u}_i\}$ be a basis set for $X$ so that $\mathbf{x} = \sum_i \varepsilon_i \hat{u}_i$. The stationary points of $F$ occur where $\nabla F = \sum_i \frac{\partial F}{\partial u_i}\hat{u}_i = 0$. Each of the stationary points must be examined further to determine which are the local minima and maxima of $F$; if there are boundaries, these must be examined as well.

Note that since the gradient vanishes at a stationary point, so must the directional derivative, because $\frac{dF}{d\hat{n}} = \nabla F\cdot\hat{n}$. If we apply a side condition, or constraint, to the space in which we are working, then we must search for stationary points in the constrained space.

Let $g_i : X \to \mathbb{R}$ for $i = 1$ to $m$, $m < N$. Let the equations $g_i(\mathbf{x}) = 0$ serve as constraints on the surface defined by $F$. Let $S$ be the space: $S = \{\mathbf{x} \in X : g_i(\mathbf{x}) = 0, \text{ for all } i = 1 \text{ to } m\}$. Assume that $\mathbf{x}_0 \in S$ is such a stationary point. Let $\mathbf{u}(s)$ be a parameterized curve in $S$ which passes through $\mathbf{x}_0$ at $s = s_0$. Then $\frac{d\mathbf{u}}{ds}(s_0)$ is tangent to the curve there, and each $g_i(\mathbf{u}(s)) = 0$. If we looked at all of the possible curves passing through $\mathbf{x}_0$, and their derivatives there, we would have found the tangent plane at $\mathbf{x}_0$, for $S$. By the chain rule

$$\frac{d}{ds}g_i(\mathbf{u}(s_0)) = \left.\nabla g_i\right|_{\mathbf{u}(s_0)} \cdot \left.\frac{d\mathbf{u}}{ds}\right|_{s_0} = 0.$$

From this we can conclude that each of the gradients, $\nabla g_i$, is orthogonal to the tangent plane to $S$. This does not mean that they are parallel to each other, for we would expect the constraints to be independent of each other.

Lagrange’s theorem states that $\nabla F = \sum_i \lambda_i \nabla g_i$ for a unique set of multipliers, the $\lambda_i$, for each stationary point in the constrained space, $S$. There are two things going on: the $\nabla g_i$ are orthogonal to the level curves defined by their corresponding equations of constraint, which results in $m$ independent directions orthogonal to the space $S$; and the gradient of $F$ is a linear combination of those directions at each of the stationary points. This allows us to rewrite the constrained function as

$$F' = F - \sum_i \lambda_i g_i, \qquad \nabla F' = 0.$$

Lagrange’s method of undetermined multipliers enables us to determine the stationary points subject to $m$ constraints by adding $m$ additional equations, and then only later solving for the constraints.

This is reviewed in a more geometric setting in the following supplement.



Supplement on Gradients, Isocontours, and Lagrange’s Theorem

The directional derivative gives the rate of change in the specified direction: $\frac{dF}{d\hat{u}} = \nabla F\cdot\hat{u}$. The maximum rate of change is when the direction vector points in the same direction as the gradient vector; the gradient gives the maximum rate of change.

An isocontour, or constant contour plot, is what you get when you set F ( x, y, z ) = Ck for
even increments ΔCk . Several examples are shown here. The plot above is shown with
simple contour lines, and with color shading. If you “walk” the cyan, you never have to
climb over the hills, though the land is not quite flat!

When the terrain is very steep, you see many contour lines very close together. Their
“density” is proportional to the directional derivative in the direction viewed.

Note that the directional derivative along a contour line is zero, because the contour line
is “instantaneously” level in that direction; this is true in higher dimensional spaces also.
But the directional derivative is the inner product of the gradient with the direction
vector, so they must be orthogonal.

This means that the gradient is normal to every level curve and level surface. Look
closely at the figures above, and find the steepest gradients, where the level curves are
closest together. At these points it will be most obvious that the gradient must be running
perpendicular to the level curves, but it is true everywhere.



Stationary Points

If you look for minima or maxima of a function, a plot of the level curves helps you to
find the local extrema quickly. Now consider any directional derivative at these extreme
points: it must be zero in every direction, because you are at a stationary point of the
surface in every direction.

This means that the gradient must be zero at each stationary point: $\nabla F = 0$.

Gradient: Alternative Definition

An alternative, coordinate-free, or geometric definition of the gradient is as follows:

$$\nabla F = \lim_{\Delta V \to 0}\left[\frac{1}{\Delta V}\iint_{\partial V} F\,d\mathbf{S}\right],$$

where V is a volume containing the point of interest, ∂V is the boundary of that volume,
and we take the limit as the volume shrinks to zero. The integral is a surface integral,
which uses the function F to weight each of the surface normal vectors, which are then
summed. The result is the net weight of the changes in F for every direction at the point
enclosed by the shrinking volume.



Stationary Points with Constraints

Now consider the process required to find stationary points subject to constraints on the function, such as $g(\mathbf{x}) = 0$. These are generalized level curves, because the “level” is buried on the left-hand side of the equation. So we know that the gradient of $g$ is normal to this surface.

The plots here show a series of level curves of $f(x, y)$, and the constraint $g(x, y) - k = 0$. The level curve of $f$ which intersects the level curve of $g$ at the highest point is a local maximum. But where two curves just touch (“osculate”) they are by necessity tangent to each other, and so their normals are parallel. We already know that $\nabla f$ is normal to the level curve of $f$, and $\nabla g$ is normal to the level curve of $g$, so they satisfy Lagrange’s equation:

$$\nabla f = \lambda \nabla g,$$

where λ is called a Lagrange multiplier. This is a necessary condition for the location of
a stationary point subject to a level-curve constraint.

If we have a set of independent constraints, $g_i(\mathbf{x}) = 0$, then each one removes a degree of freedom from the problem, and each one represents an independent direction for its gradient, $\nabla g_i$, which must be removed from the solution space. The Lagrange equation for multiple constraints is then a linear combination of these gradients, each with its own multiplier:

$$\nabla F = \sum_i \lambda_i \nabla g_i.$$

It is often convenient to define a new function with all of the degrees of freedom restored:

$$F' = F - \sum_i \lambda_i g_i,$$

and then the solution is $\nabla F' = 0$, and the Lagrange multipliers are found by means of the side conditions. Often the values of the multipliers are not required, and for this reason the method is known as the Lagrange method of undetermined multipliers.

See also http://www.slimy.com/~steuard/tutorials/Lagrange.html for a nice tutorial.



Lagrange Multipliers

We want to optimize a function $f(x, y, z)$ subject to the constraint $g(x, y, z) = 0$.

Method of Lagrange Multipliers

1. Solve the following system of equations:

$$\nabla f(x, y, z) = \lambda \nabla g(x, y, z), \qquad g(x, y, z) = 0.$$

2. Plug in all solutions ( x, y, z ) from the first step into f ( x, y, z ) and identify the
extreme values.

The constant λ is called a Lagrange Multiplier.

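Note that we actually have four equations in four unknowns:

$$\frac{\partial f}{\partial x} = \lambda\frac{\partial g}{\partial x}, \qquad \frac{\partial f}{\partial y} = \lambda\frac{\partial g}{\partial y}, \qquad \frac{\partial f}{\partial z} = \lambda\frac{\partial g}{\partial z},$$

and the constraint, $g(x, y, z) = 0$.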



Example 1 Find the dimensions of the box with largest volume if the total surface area is
fixed. Perhaps we only have so much wrapping paper available!

First we must formulate the objective function, $f(x, y, z)$, which is the volume of the box. So $f(x, y, z) = xyz$. The constraint is that the surface area of the box is fixed, so since opposite sides are the same we have $g(x, y, z) = 2(xy + yz + zx) - k = 0$, where $k$ is the fixed value for the surface area. We will also require each side to have a positive length, because this is a real box.

$$\nabla f(x, y, z) = \nabla xyz = \left(\frac{\partial xyz}{\partial x}, \frac{\partial xyz}{\partial y}, \frac{\partial xyz}{\partial z}\right) = (yz, xz, xy),$$

$$\nabla g(x, y, z) = \nabla\left[2(xy + yz + zx) - k\right] = 2(y + z, x + z, y + x).$$

Applying Lagrange’s theorem results in the following three equations:

$$yz = \lambda\left[2(y + z)\right], \qquad xz = \lambda\left[2(x + z)\right], \qquad xy = \lambda\left[2(y + x)\right],$$

along with the constraint, $2(xy + yz + zx) - k = 0$.

If you multiply the first by $x$, the second by $y$, and the third by $z$, you end up with the same expression on the left hand side of all three, so the right hand sides are all equal. Noting that $\lambda = 0$ must be rejected if these are to be useful boxes, we can divide out the common factor $2\lambda$, leaving:

$$x(y + z) = y(x + z) = z(y + x),$$

or expanding each of these, we have $xy + xz = yx + yz = zy + zx$, and working through these pair-wise gives $x = y = z$.

So the largest volume rectangular box with a given surface area is (surprise!) a cube.
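
The cube result can be spot-checked numerically. The sketch below fixes an arbitrary surface area `k = 6.0` (an assumption, as are the sampling ranges and variable names), confirms that the cube satisfies the three Lagrange equations with $\lambda = x/4$, and then compares its volume against many random boxes that have the same surface area.

```python
import math
import random

k = 6.0                       # fixed surface area (illustrative value)
side = math.sqrt(k / 6.0)     # cube with 6 * side**2 == k
lam = side / 4.0              # from yz = 2*lam*(y + z) with x = y = z

def volume(x, y, z):
    return x * y * z

def grad_f(x, y, z):
    return (y * z, x * z, x * y)

def grad_g(x, y, z):
    return (2 * (y + z), 2 * (x + z), 2 * (y + x))

# Largest mismatch in grad f = lam * grad g, evaluated at the cube.
residual = max(abs(a - lam * b)
               for a, b in zip(grad_f(side, side, side),
                               grad_g(side, side, side)))

# Random boxes with the same surface area:
# pick x and y, then solve 2*(x*y + y*z + z*x) = k for z.
random.seed(0)
best_other = 0.0
for _ in range(100000):
    x = random.uniform(0.05, 3.0)
    y = random.uniform(0.05, 3.0)
    z = (k / 2.0 - x * y) / (x + y)
    if z > 0:
        best_other = max(best_other, volume(x, y, z))

cube_volume = volume(side, side, side)
print(residual, best_other, cube_volume)
```

None of the sampled boxes should exceed the volume of the cube, though random sampling is only a plausibility check and not a proof.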



Example 2 Find the maximum and minimum of $f(x, y) = 5x - 3y$ subject to the constraint $x^2 + y^2 = 136$.

Answer: the minimum is $f(-10, 6) = -68$, and the maximum is $f(10, -6) = 68$.

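Example 3 Find the maximum and minimum values of $f(x, y, z) = xyz$ subject to the constraint $x + y + z = 1$, with $x, y, z \geq 0$ (without this restriction $xyz$ is unbounded on the plane).

Answer: the minimum value is $f(0, 0, 1) = f(0, 1, 0) = f(1, 0, 0) = 0$, and the maximum value is $f\left(\frac{1}{3}, \frac{1}{3}, \frac{1}{3}\right) = \frac{1}{27}$.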

Example 4 Find the maximum and minimum values of $f(x, y) = 4x^2 + 10y^2$ on the disk $x^2 + y^2 \leq 4$.

Note that the constraint here is the inequality for the disk.

The first step is to find all the critical points of $f$ that are in the disk (i.e. satisfy the constraint). The only critical point is $(0, 0)$, and it satisfies the constraint.

Now proceed with Lagrange Multipliers and treat the constraint as an equality instead of an inequality. We deal with the inequality when finding the critical points. The three equations are:

$$8x = 2\lambda x, \qquad 20y = 2\lambda y, \qquad x^2 + y^2 = 4.$$

You will find four points: $(0, 2)$, $(0, -2)$, $(2, 0)$, $(-2, 0)$. These are all on the boundary.

To find the maximum and minimum we need to simply plug these four points along with the critical point into the function $f$:

$$f(0, 0) = 0, \qquad f(2, 0) = f(-2, 0) = 16, \qquad f(0, 2) = f(0, -2) = 40.$$

The minimum is in the interior, and the maximum is on the boundary of the disk.
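
The two-stage procedure of Example 4 (interior critical points first, then Lagrange multipliers on the boundary) can be cross-checked by brute force. The sketch below evaluates $f$ at the five candidate points from the text and then scans a polar grid over the disk; the grid resolution and variable names are assumptions for illustration.

```python
import math

def f(x, y):
    return 4 * x ** 2 + 10 * y ** 2

# Candidates from the text: the interior critical point and the four
# boundary points produced by the Lagrange equations.
candidates = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2)]
values = {p: f(*p) for p in candidates}
f_min, f_max = min(values.values()), max(values.values())

# Independent check: scan a polar grid covering the whole disk.
grid_min, grid_max = float("inf"), float("-inf")
for i in range(201):                 # radius from 0 to 2
    r = 2.0 * i / 200
    for j in range(720):             # angle in half-degree steps
        a = 2 * math.pi * j / 720
        val = f(r * math.cos(a), r * math.sin(a))
        grid_min, grid_max = min(grid_min, val), max(grid_max, val)

print(f_min, f_max, grid_min, grid_max)
```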



Example 5: A drug company manufactures drug MXY from two ingredients which are
known simply as ingredient X and ingredient Y. The number of doses of MXY produced,
D, is given by the Cobb-Douglas function $D = 6x^{2/3}y^{1/2}$ where x and y are the number of
grams of ingredient X and Y respectively.

Suppose ingredient X costs $4 per gram and ingredient Y costs $3 per gram. Find the
maximum number of doses that can be made if no more than $7,000 can be spent on raw
materials.

Answer:

The constraint equation is 4 x + 3 y ≤ 7000.


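Since $D$ increases with both $x$ and $y$, the maximum lies on the boundary $4x + 3y = 7000$. The Lagrange equations $4x^{-1/3}y^{1/2} = 4\lambda$ and $3x^{2/3}y^{-1/2} = 3\lambda$ give $x = y$, so the result is $x = y = 1000$, and $D = 6(1000)^{2/3}(1000)^{1/2} \approx 19{,}000$ doses of drug MXY.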



4. Motivating the Euler-Lagrange Equations of Motion

Let a particle be subject to a force $\mathbf{f} = (f_1, f_2, f_3)$, so that the work done over a distance $d\mathbf{r} = (dx_1, dx_2, dx_3)$ is

$$dW = \mathbf{f}\cdot d\mathbf{r} = f_1 dx_1 + f_2 dx_2 + f_3 dx_3 = \sum_i f_i\,dx_i. \tag{4.1}$$

If there is a constraint $F(x_1, x_2, x_3) = 0$ which leaves only 2 degrees of freedom (DOF) for the motion then we can find generalized coordinates $u_i = u_i(x_1, x_2, x_3)$ for each degree of freedom with implicitly defined inverses: $x_i = x_i(u_1, u_2)$. For example, we could be constrained to the surface of a sphere.

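Use the chain rule to get $d\mathbf{r} = \sum_j \frac{\partial \mathbf{r}}{\partial u_j}du_j$, and put it into (4.1):

$$dW = \mathbf{f}\cdot d\mathbf{r} = \sum_j \mathbf{f}\cdot\frac{\partial \mathbf{r}}{\partial u_j}du_j = \sum_j G_j\,du_j, \quad \text{with } G_j = \mathbf{f}\cdot\frac{\partial \mathbf{r}}{\partial u_j}. \tag{4.2}$$

We call the $G_j$ components of the generalized force.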

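By Newton’s 2nd law of motion $\mathbf{f} = m\frac{d\dot{\mathbf{r}}}{dt}$, so for the generalized forces we get

$$G_j = m\frac{d\dot{\mathbf{r}}}{dt}\cdot\frac{\partial \mathbf{r}}{\partial u_j} = m\sum_i \frac{d\dot{x}_i}{dt}\frac{\partial x_i}{\partial u_j}. \tag{4.3}$$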

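We can rearrange each term as the difference of two derivatives:

$$\frac{d\dot{x}_i}{dt}\frac{\partial x_i}{\partial u_j} = \frac{d}{dt}\left(\dot{x}_i\frac{\partial x_i}{\partial u_j}\right) - \dot{x}_i\frac{d}{dt}\left(\frac{\partial x_i}{\partial u_j}\right) = \frac{d}{dt}\left(\dot{x}_i\frac{\partial x_i}{\partial u_j}\right) - \dot{x}_i\frac{\partial \dot{x}_i}{\partial u_j}, \tag{4.4}$$

where we have applied our special result shown earlier (2.2) to interchange the partial and total derivatives of $x_i$ in the final term. Also recalling the theorem for cancellation of the dots from the section on total derivatives (2.1), we now replace $\left(\frac{\partial x_i}{\partial u_j}\right)$ in the leading term by $\left(\frac{\partial \dot{x}_i}{\partial \dot{u}_j}\right)$, and substitute (4.4) back into (4.3) where we recognize quadratic forms:

$$G_j = m\sum_i\left[\frac{d}{dt}\left(\dot{x}_i\frac{\partial \dot{x}_i}{\partial \dot{u}_j}\right) - \dot{x}_i\frac{\partial \dot{x}_i}{\partial u_j}\right] = \frac{1}{2}m\sum_i\left[\frac{d}{dt}\left(\frac{\partial \dot{x}_i^2}{\partial \dot{u}_j}\right) - \frac{\partial \dot{x}_i^2}{\partial u_j}\right]. \tag{4.5}$$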

If we carry out the summations of the $\dot{x}_i^2$ we have derivatives of the velocity squared, so we can go further by applying the definition of kinetic energy: $T = \frac{1}{2}m\sum_i \dot{x}_i^2$ to get

$$G_j = \frac{d}{dt}\left(\frac{\partial T}{\partial \dot{u}_j}\right) - \frac{\partial T}{\partial u_j}. \tag{4.6}$$

Test this by replacing $u_j$ with $x_i$ and you will find that $G_j = \frac{d}{dt}(m\dot{x}_i)$, which is Newton’s 2nd law of motion for the $x_i$ direction.
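
A further test of (4.6) is to use coordinates that are not Cartesian. The sketch below takes a particle in the plane with polar coordinates $u = (r, \theta)$, so that $T = \tfrac{1}{2}m(\dot{r}^2 + r^2\dot{\theta}^2)$, and compares the generalized forces from (4.2), built from $\mathbf{f} = m\mathbf{a}$ in Cartesian components, with those from (4.6) differentiated by hand. The mass, the trajectory $r(t)$, $\theta(t)$, and the sample time are illustrative assumptions.

```python
import math

m = 2.0                                   # illustrative mass

def r_of(t):
    return 1.0 + 0.5 * t * t              # illustrative trajectory r(t)

def th_of(t):
    return 0.3 * t + 0.1 * t * t          # illustrative trajectory theta(t)

def cartesian(t):
    r, th = r_of(t), th_of(t)
    return r * math.cos(th), r * math.sin(th)

t, h = 0.7, 1e-4
r, th = r_of(t), th_of(t)
r_dot, r_ddot = t, 1.0                    # derivatives of r_of by hand
th_dot, th_ddot = 0.3 + 0.2 * t, 0.2      # derivatives of th_of by hand

# Newtonian side: f = m * acceleration, acceleration from a second difference.
(xm, ym), (x0, y0), (xp, yp) = cartesian(t - h), cartesian(t), cartesian(t + h)
fx = m * (xp - 2 * x0 + xm) / h ** 2
fy = m * (yp - 2 * y0 + ym) / h ** 2

# Generalized forces from (4.2): G_j = f . (dr/du_j) with u = (r, theta).
G_r_newton = fx * math.cos(th) + fy * math.sin(th)
G_th_newton = fx * (-r * math.sin(th)) + fy * (r * math.cos(th))

# Same quantities from (4.6) with T = m*(r_dot**2 + r**2 * th_dot**2)/2.
G_r_lagrange = m * r_ddot - m * r * th_dot ** 2
G_th_lagrange = m * (r ** 2 * th_ddot + 2 * r * r_dot * th_dot)

print(G_r_newton, G_r_lagrange)
print(G_th_newton, G_th_lagrange)
```

The radial component reproduces the familiar centripetal term $-mr\dot{\theta}^2$, and the angular component is the rate of change of $mr^2\dot{\theta}$.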

Now consider a many-particle system with $N$ particles or rigid bodies, described with a system of generalized coordinates $q_i$, $i = 1$ to $n$, there being $n$ degrees of freedom. This gives the Euler-Lagrange equations of motion in terms of generalized forces:

$$G_j = \frac{d}{dt}\left(\frac{\partial T}{\partial \dot{q}_j}\right) - \frac{\partial T}{\partial q_j}, \quad \text{where } G_j = \sum_{i=1}^{N} \mathbf{F}_i\cdot\frac{\partial \mathbf{r}_i}{\partial q_j}. \tag{4.7}$$

Note that $\mathbf{r}_i = \mathbf{r}_i(q_1, q_2, \ldots, q_n, t)$, as the particle positions may depend upon any or all of the generalized coordinates, including explicit dependence upon time.

Conservative Forces

Conservative forces are independent of time, and can be derived from a work function $U = U(q_1, q_2, \ldots, q_n)$ such that $dW = -dU$. Note that we have also assumed that there is no dependence upon the coordinate velocities, the $\dot{q}_i$. Then the generalized forces are $G_j = -\frac{\partial U}{\partial q_j}$, with $\frac{\partial U}{\partial \dot{q}_j} = 0$. This gives

$$G_j = -\frac{\partial U}{\partial q_j} = \frac{d}{dt}\left(\frac{\partial (T - U)}{\partial \dot{q}_j}\right) - \frac{\partial T}{\partial q_j}.$$

We can simplify if we define the Lagrangian $L = T - U$, and rearrange to get:

$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_j}\right) = \frac{\partial L}{\partial q_j}, \quad \text{for } j = 1 \text{ to } n, \text{ the DOF.} \tag{4.8}$$

It is customary to define $p_j = \frac{\partial L}{\partial \dot{q}_j}$, the momentum conjugate to $q_j$ or more simply, the conjugate momentum. It is not necessarily the same as mechanical momentum.

Illustration: Let $L = \frac{1}{2}m\dot{q}^2 - U$, where $U = U(q)$. Then $p = \frac{\partial L}{\partial \dot{q}} = m\dot{q}$, the mechanical momentum. We can thus identify a conjugate momentum operator for the Lagrangian: $\hat{p}_j = \frac{\partial}{\partial \dot{q}_j}$.



Constraints and Generalized Coordinates
We have introduced generalized coordinates in response to the existence of constraints on
the motion. The use of generalized coordinates often simplifies the description of the
dynamical problem in a very natural way, but it makes the direct application of Newton’s
2nd law of motion difficult, so instead you can use the Euler-Lagrange equations of
motion.

In this chapter we have looked at a minimal set of generalized coordinates, having used
the equations of constraint to remove coordinates beyond those required for the number
of degrees of freedom natural to the problem. However, it is not always easy or
convenient to find generalized coordinates which match the DOF; in those cases we will
use a clever method due to Lagrange for working with a surplus of coordinates, the
method of undetermined multipliers.



5. An Introduction to the Calculus of Variations
The calculus of variations is a mathematical technique, but it was motivated by physical
considerations. We will begin with an important physical law which is an alternative to
Newton’s Laws of Motion.

The Principle of Virtual Work

Simon Stevin (1548-1620), Flemish Engineer and Mathematician, made a detailed study
of mechanical systems. He had the figure of a continuous chain of balls which rests on
an asymmetric ramp engraved on his tomb. Will the unbalanced weights cause the chain
to move? Consider a virtual displacement, where we move each ball clockwise by one
position. Since the configuration is unchanged by this action, there was no net work
performed … thus the chain is in equilibrium, and nothing moves. This reasoning is an
application of an early version of the Principle of Virtual Work.

From The Museum of Unworkable Devices:


http://www.lhup.edu/~dsimanek/museum/unwork.htm

This principle was later given precise mathematical form by Lagrange in essentially the following form: Let $\delta\mathbf{R}$ be a virtual displacement which is arbitrary, but consistent with any constraints. Then the virtual work is $\delta W = \mathbf{F} \cdot \delta\mathbf{R} = \sum_i G_i\,\delta q_i$. For systems in equilibrium the Principle of Virtual Work states that $\delta W = 0$.



Consider a system with forces due to a potential: $\mathbf{F} = -\nabla U$. If it is in equilibrium the Principle of Virtual Work gives $\delta W = \mathbf{F} \cdot \delta\mathbf{R} = -\nabla U \cdot \delta\mathbf{R} = -\sum_i \frac{\partial U}{\partial q_i}\,\delta q_i = 0$. Since the generalized coordinates are independent, each of the $\frac{\partial U}{\partial q_i} = 0$, and so $\nabla U = 0$. Thus at equilibrium we are at a stationary point of the potential. If it is a minimum, we have a stable equilibrium; if a maximum, it is unstable, otherwise it is neutral. Lanczos shows how to derive all of the laws of statics from the Principle of Virtual Work.
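The stationary-point test can be sketched numerically. The following is a minimal sketch, assuming a one-coordinate double-well potential $U(q) = q^4/4 - q^2/2$ chosen here purely for illustration; the function names and tolerances are likewise assumptions. It checks $\partial U/\partial q = 0$ by finite differences and then uses the sign of the second derivative to classify the equilibrium.

```python
def potential(q):
    """Assumed double-well potential U(q) = q^4/4 - q^2/2 (illustrative only)."""
    return 0.25 * q**4 - 0.5 * q**2

def d1(f, q, h=1e-5):
    """Central-difference first derivative."""
    return (f(q + h) - f(q - h)) / (2.0 * h)

def d2(f, q, h=1e-4):
    """Central-difference second derivative."""
    return (f(q + h) - 2.0 * f(q) + f(q - h)) / (h * h)

def classify(q, tol=1e-6):
    """Three-way classification used in the text, applied at coordinate q."""
    if abs(d1(potential, q)) > tol:
        return "not an equilibrium"
    curvature = d2(potential, q)
    if curvature > tol:
        return "stable"      # local minimum of U
    if curvature < -tol:
        return "unstable"    # local maximum of U
    return "neutral"         # curvature vanishes to within tol

results = {q: classify(q) for q in (-1.0, 0.0, 0.5, 1.0)}
```

For this assumed potential the two wells at $q = \pm 1$ are stable and the central hump at $q = 0$ is unstable, while $q = 0.5$ is not an equilibrium at all.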

In 1743 D’Alembert (1717-1783) extended this to include dynamic systems by noting that if we have both impressed forces and forces of constraint, then Newton’s 2nd law of motion can be written: $\mathbf{F} = \mathbf{F}_I + \mathbf{F}_C = m\ddot{\mathbf{r}} \Rightarrow (\mathbf{F}_I - m\ddot{\mathbf{r}}) = -\mathbf{F}_C$, where the so-called inertial forces have been moved over with the impressed forces, isolating the unknown forces of constraint. Extending the Principle of Virtual Work to the forces of constraint, we get $\delta W = \mathbf{F}_C \cdot \delta\mathbf{R} = 0$, which means that the constraints do no work. Substituting D’Alembert’s expression for the forces of constraint in a moving system gives

$\sum_i (\mathbf{F}_i - m_i\ddot{\mathbf{r}}_i) \cdot \delta\mathbf{r}_i = 0.$

This is known as D’Alembert’s Principle, and is used in the derivation of Lagrange’s equations of motion by means of the Calculus of Variations. The impressed forces and the unknown trajectories together ensure that no work is done by the constraints.



The Calculus of Variations

In order to minimize a known function we start by examining its stationary points; these
are the places where the derivatives vanish. The calculus of variations provides tools for
a related task: identification of functions which make an integral stationary over a path.
Instead of operating in a space of points, we now operate in a space of functions.

Suppose we have a trial function, y = f ( x ) , then we will need to vary it in order to see if
the integral is stationary; if it is, then the variation of the integral will be zero.

We will only consider weak variations which are defined as arbitrary functions constrained only by continuity conditions and a requirement that they vanish at the end-points of the integration; that is, there are fixed boundary conditions on the integrals. We can do this by positing a parametric family of curves $y = f(x, \varepsilon) = f(x, 0) + \varepsilon \cdot \varphi(x)$, where $y = f(x, 0)$ is the unknown function being sought, and $\varphi(x)$ is a suitably smooth function which vanishes at the boundaries, but is otherwise arbitrary in order to provide the variations; $\varepsilon$ is the parameter which connects the family together, and will be of interest primarily as we take the limit as it passes towards zero. The end result will be a differential equation in $y$ and its derivatives (an ordinary differential equation, since $x$ is the only independent variable); the solutions of this equation are the functions which make the integral stationary under weak variations.

Then $dy = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial \varepsilon}d\varepsilon$, but our immediate interest is not in $dy$, which is a small change in the function due to a small change in position, $dx$, but rather in the latter term, which contains the arbitrary variations in $y$ due to the parameter $\varepsilon$ and the arbitrary function $\varphi(x)$ via $\frac{\partial f}{\partial \varepsilon}d\varepsilon = \frac{\partial}{\partial \varepsilon}\left[f(x, 0) + \varepsilon \cdot \varphi(x)\right]d\varepsilon = \varphi(x)\,d\varepsilon$. We thus introduce a new type of differential, the δ-process, for which only the dependent variables are varied:

(5.1)  $\delta x = 0$
(5.2)  $\delta y = f(x, \varepsilon) - f(x, 0) = \varepsilon \cdot \varphi(x).$

These properties result in variations which are very similar to differentials in that $\delta y$ represents an infinitesimal change as $\varepsilon \to 0$, but it is called a virtual displacement since $\varphi(x)$ is completely arbitrary except for continuity conditions, and vanishing at the boundaries.



Working with the δ-Process

$\frac{d}{dx}(\delta y) = \frac{d}{dx}\left(f(x, \varepsilon) - f(x, 0)\right) = \frac{d}{dx}\left(\varepsilon \cdot \varphi(x)\right) = \varepsilon \cdot \frac{d}{dx}\varphi(x) = \varepsilon \cdot \varphi'(x)$, and

$\delta\frac{d}{dx}f(x, \varepsilon) = \delta f'(x, \varepsilon) = f'(x, \varepsilon) - f'(x, 0) = \varepsilon \cdot \varphi'(x)$,

so the derivative of the variation is equal to the variation of the derivative for the independent (unvaried) variable, or

(5.3)  $\delta\frac{dy}{dx} = \frac{d}{dx}\delta y.$

Now consider working with definite integrals with fixed end-points. Then if the integrand has a known form, say $F(x, y, y')$, we can use our δ-process tools on the following integral where the function $y = f(x)$ is to be determined:

(5.4)  $I(\varepsilon) = \int_A^B F\left(x, y(x, \varepsilon), y'(x, \varepsilon)\right)dx.$

Since the unknown is a function, this type of integral is called a functional.

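The stationary points are determined in the usual way, by setting the derivative of the functional to zero:

(5.5)  $0 = \frac{d}{d\varepsilon}I(\varepsilon \to 0) = \int_A^B \frac{\partial F}{\partial \varepsilon}dx = \int_A^B \left[\frac{\partial F}{\partial x}\frac{\partial x}{\partial \varepsilon} + \frac{\partial F}{\partial y}\frac{\partial y}{\partial \varepsilon} + \frac{\partial F}{\partial y'}\frac{\partial y'}{\partial \varepsilon}\right]dx.$

But $\frac{\partial x}{\partial \varepsilon} = 0$, and the other partial derivatives with respect to the parameter $\varepsilon$ can be reduced to the forms arrived at above, $\frac{\partial y}{\partial \varepsilon} = \varphi(x)$, $\frac{\partial y'}{\partial \varepsilon} = \varphi'(x)$, so these can be put back into (5.5), and the final term can be integrated by parts to get:

(5.6)  $\int_A^B \frac{\partial F}{\partial y'}\frac{\partial y'}{\partial \varepsilon}dx = \int_A^B \frac{\partial F}{\partial y'}\varphi'(x)\,dx = \left.\frac{\partial F}{\partial y'}\varphi(x)\right|_A^B - \int_A^B \frac{d}{dx}\frac{\partial F}{\partial y'}\varphi(x)\,dx = -\int_A^B \frac{d}{dx}\frac{\partial F}{\partial y'}\varphi(x)\,dx,$

where the integrated term vanishes due to the boundary conditions imposed on $\varphi(x)$. Putting this back into (5.5) and rearranging slightly gives:

(5.7)  $0 = \int_A^B \left[\frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'}\right]\varphi(x)\,dx.$

Since $\varphi(x)$ is arbitrary we conclude that the bracketed expression must be zero in order for this to be a stationary point. Thus the solutions to the problem must satisfy the resulting Euler-Lagrange differential equation:

(5.8)  $\frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'} = 0.$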
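The one-parameter family $y = f(x, 0) + \varepsilon \cdot \varphi(x)$ can be tried out numerically. The sketch below is an illustration under stated assumptions, not part of the original derivation: it takes the arc-length integrand $F = \sqrt{1 + y'^2}$ on $[0, 1]$, the straight line $y = x$ as the candidate solution, and $\varphi(x) = \sin(\pi x)$ as one admissible variation that vanishes at both end-points. All names and step sizes are hypothetical choices.

```python
import math

def arc_length(eps, n=2000):
    """I(eps) for y = x + eps*sin(pi*x) on [0, 1], using the midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        y_prime = 1.0 + eps * math.pi * math.cos(math.pi * x)
        total += math.sqrt(1.0 + y_prime * y_prime) * h
    return total

def dI_deps(eps, step=1e-4):
    """Central-difference estimate of dI/d(eps)."""
    return (arc_length(eps + step) - arc_length(eps - step)) / (2.0 * step)

I0 = arc_length(0.0)        # length of the straight line from (0, 0) to (1, 1)
slope_at_zero = dI_deps(0.0)
I_perturbed = arc_length(0.1)
```

For the straight line, $\partial F/\partial y = 0$ and $\partial F/\partial y'$ is constant, so (5.8) is satisfied; numerically the derivative of $I(\varepsilon)$ at $\varepsilon = 0$ is zero to within rounding, and the perturbed curve is longer.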



These steps may be more easily remembered if we use the variational notation of the δ-process to carry out the procedure:

(5.9)  $0 = \delta I = \delta\int_A^B F(x, y, y')\,dx = \int_A^B \delta F(x, y, y')\,dx = \int_A^B \left[\frac{\partial F}{\partial y}\delta y + \frac{\partial F}{\partial y'}\delta y'\right]dx,$

where we have automatically taken into account that $\delta x = 0$. Recognizing that $\delta y' = \frac{d}{dx}\delta y$ we can integrate the final term by parts to get

(5.10)  $\int_A^B \frac{\partial F}{\partial y'}\delta y'\,dx = \int_A^B \frac{\partial F}{\partial y'}\frac{d}{dx}(\delta y)\,dx = \left.\frac{\partial F}{\partial y'}\delta y\right|_A^B - \int_A^B \frac{d}{dx}\frac{\partial F}{\partial y'}\delta y\,dx = -\int_A^B \frac{d}{dx}\frac{\partial F}{\partial y'}\delta y\,dx,$

where the integrated term vanishes because the variation vanishes at the boundaries. Putting this back into (5.9) gives the equivalent of (5.7), but with $\delta y$ in place of $\varphi(x)$.

We again arrive at the Euler-Lagrange differential equation.

Hamilton’s Principle

This variational condition will lead us to Hamilton’s Principle; we only need to transform from $x \to t$, $y \to q$, and $y' \to \dot{q}$ to see the connection between the Euler-Lagrange differential equation and the Euler-Lagrange equations of motion. Hamilton’s Principle can be succinctly stated:

(5.11)  $0 = \delta\int_A^B L\,dt,$

where the Lagrangian is a function of generalized coordinates, $q_k(t)$, their time derivatives, $\dot{q}_k(t)$, and the time, $t$. The beginning and ending positions are fixed via the definite times of the integration. We follow the weak variations method so that $\delta t = 0$ and the variations of the other parameters vanish at the end points. The goal is to determine the $q_k(t)$, the trajectories, subject to (5.11).

Clearly, if we vary each $q_k(t)$ individually we obtain our previous solution,

(5.12)  $\frac{\partial L}{\partial q_k} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_k}\right) = 0.$

Designating this individual variation of $q_k(t)$ by $\delta_k I$, we get $\delta I = \sum_k \delta_k I = 0$, as each of the individual variations is zero. Thus the Euler-Lagrange equations of motion hold for all of the generalized coordinates.



Worked Example of Hamilton’s Principle

A rigid body of mass m is falling in a uniform gravitational field with acceleration g.

Let height = $y$.
Let speed = $\dot{y}$.
Then kinetic energy is $T = \frac{1}{2}m\dot{y}^2$ and potential energy is $U = mgy$.
So the Lagrangian is $L = T - U = \frac{1}{2}m\dot{y}^2 - mgy$.

By Hamilton’s Principle we have

$0 = \delta\int L\,dt = \delta\int\left(\frac{1}{2}m\dot{y}^2 - mgy\right)dt.$

Carrying out the variation gives

$\int\left(\frac{1}{2}m\,\delta(\dot{y}^2) - mg\,\delta y\right)dt = \int\left(m\dot{y}\,\delta\dot{y} - mg\,\delta y\right)dt.$

The first term can be rewritten as a differential, then integrated by parts:

$\int m\dot{y}\frac{d}{dt}(\delta y)\,dt = \left.m\dot{y}\,\delta y\right|_A^B - \int\frac{d}{dt}(m\dot{y})\,\delta y\,dt = -\int m\ddot{y}\,\delta y\,dt,$

where the integrated term vanishes due to the lack of variation at the boundaries.

Putting this result back together with the remaining term leaves

$0 = \int\left(-m\ddot{y}\,\delta y - mg\,\delta y\right)dt = \int\left(-m\ddot{y} - mg\right)\delta y\,dt.$

Since $\delta y$ is arbitrary, and could be held positive, its factor must be zero everywhere.

Thus for the freely falling particle $\ddot{y} = -g$.

We would get the same result by direct application of Lagrange’s equations of motion:

$\frac{\partial L}{\partial \dot{y}} = \frac{\partial}{\partial \dot{y}}\left(\frac{1}{2}m\dot{y}^2 - mgy\right) = m\dot{y}.$
$\frac{\partial L}{\partial y} = \frac{\partial}{\partial y}\left(\frac{1}{2}m\dot{y}^2 - mgy\right) = -mg.$
$0 = \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{y}}\right) - \frac{\partial L}{\partial y} = \frac{d}{dt}(m\dot{y}) - (-mg) = m\ddot{y} + mg.$

Which is the expected result, $\ddot{y} = -g$.
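The stationarity of the action for this example can also be checked numerically. The sketch below is a rough illustration under assumptions made here: unit mass, a one-second fall starting from rest at $y = 0$, and a single trial variation $\delta y = \varepsilon\sin(\pi t / t_B)$ that vanishes at both end times. The names `action`, `M`, `G`, and `T_END` are hypothetical.

```python
import math

M, G, T_END = 1.0, 9.81, 1.0   # assumed mass, gravity, and duration

def action(eps, n=4000):
    """Action for y(t) = -g t^2/2 + eps*sin(pi t/T_END), by the midpoint rule."""
    h = T_END / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        y = -0.5 * G * t * t + eps * math.sin(math.pi * t / T_END)
        y_dot = -G * t + eps * (math.pi / T_END) * math.cos(math.pi * t / T_END)
        total += (0.5 * M * y_dot * y_dot - M * G * y) * h   # L = T - U
    return total

step = 1e-3
first_variation = (action(step) - action(-step)) / (2.0 * step)
second_order_change = action(0.1) - action(0.0)
```

The first-order change vanishes (to within the quadrature error) on the free-fall path $\ddot{y} = -g$, and because $U$ is linear in $y$ the remaining change is the purely kinetic term $\varepsilon^2 m\pi^2/(4 t_B)$, which is positive, so for this variation the true path is a minimum of the action.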



6. Constraints and Lagrange’s Method of Undetermined Multipliers

When kinematical or other constraints apply we can no longer freely vary all of the generalized coordinates, $q_k$, for they are no longer independent. Suppose we have multiple constraints $f_j(q_1, q_2, \ldots, q_n, t) = 0$ so that there are $m$ equations of constraint each with $n$ variables, leaving $(n - m)$ DOF. It may be possible and convenient to algebraically eliminate some or all of the constraints, thereby reducing the number of generalized coordinates, but we may prefer to keep some of the surplus coordinates due to their natural interpretation or other reasons, such as symmetry. Lagrange’s method of undetermined multipliers allows this.

Taking the variation of the constraint $f_j$ gives $\delta f_j = \sum_{k=1}^{n}\frac{\partial f_j}{\partial q_k}\delta q_k = 0$, which holds at all times $t$; note that the time is not varied, as it will be our independent variable. Note also that it is the sum that is zero, so that each individual term may vary, and that this is equivalent to $\nabla f_j \cdot \delta\mathbf{q} = 0$, where $\delta\mathbf{q}$ is the variation of the generalized position vector. This is a reminder that Lagrange’s theorem is relevant. Because the sum is zero, then regardless of how we pick the undetermined Lagrange multiplier, $\lambda_j$, we get $\lambda_j\,\delta f_j = 0$.

Consider then an augmented Lagrangian, $L' = L + \lambda_1 f_1 + \cdots + \lambda_m f_m$; the variation is then

(6.1)  $\delta L' = \delta L + \sum_{j=1}^{m}\lambda_j\,\delta f_j = \delta L + \sum_{j=1}^{m}\lambda_j\sum_{k=1}^{n}\frac{\partial f_j}{\partial q_k}\delta q_k.$

Carrying out the variations in $L$ (with the usual integration by parts under the time integral), and regrouping in terms around the $\delta q_k$ gives

(6.2)  $\delta L' = \sum_{k=1}^{n}\left(\frac{\partial L}{\partial q_k} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_k}\right) + \sum_{j=1}^{m}\lambda_j\frac{\partial f_j}{\partial q_k}\right)\delta q_k.$

Note that each of the $\delta q_k$ includes the entire set of $m$ Lagrange multipliers; the original problem had $(n - m)$ DOF, and we have added in $m$ DOF via the Lagrange multipliers. This is enough to treat all $n$ of the $\delta q_k$ as independent, and we thus can set each of the terms independently to zero. The Lagrange multipliers can be determined with the aid of the equations of constraint, if required. Note that the $\lambda_j$ are functions of time; this is because they must satisfy the conditions at each point in time. The Lagrange equation of motion for the generalized coordinate $q_k$ and subject to $m$ constraints is often written:

(6.3)  $\frac{\partial L}{\partial q_k} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_k}\right) = \sum_{j=1}^{m}\lambda_j A_{jk}$ where $A_{jk} = \frac{\partial f_j}{\partial q_k}.$

Since the $\lambda_j$ are undetermined, their overall sign is a matter of convention and can be absorbed into the multipliers.

Note that each of the constraints appears with $q_k$ in the form of $\lambda_j$, but only with the constraint coefficients due to $\delta q_k$, these being the $A_{jk}$.



The Lagrange multipliers correspond to the generalized forces required to maintain the
specified constraints. In the present case the constraints have been converted from
equations to total differentials, which are called holonomic constraints. If the constraints
are given as a non-integrable differential form, they are called nonholonomic,
and the same method applies. The principal difference is that the generalized forces from
holonomic constraints are derivable from a potential, which is conservative if it does not
depend upon explicit time; Lanczos calls these monogenic forces. The generalized forces
corresponding to nonholonomic constraints are polygenic, and cannot be derived from a
potential. Gravity is a monogenic force; friction is polygenic. A typical nonholonomic
constraint is “rolling without slipping”, which depends upon friction.

In the case of holonomic constraints, we can use Hamilton’s principle directly. With nonholonomic constraints we go back to the Euler-Lagrange equations from chapter four, where we had $G_k = \frac{d}{dt}\left(\frac{\partial T}{\partial \dot{q}_k}\right) - \frac{\partial T}{\partial q_k}$, with $G_k = \sum_{i=1}^{N}\mathbf{F}_i \cdot \frac{\partial \mathbf{r}_i}{\partial q_k}$, the work functions. If the work

functions are derivable from a conservative potential, Gk = −∇U k , then we can define
L = T − U for as many of the generalized forces as this holds. For polygenic forces and
nonholonomic constraints we instead end up with:

(6.4)  $\frac{d}{dt}\left(\frac{\partial T}{\partial \dot{q}_k}\right) - \frac{\partial T}{\partial q_k} = G_k + \sum_{j=1}^{m}\lambda_j A_{jk},$

where $A_{jk}$ is the coefficient of $dq_k$ from the $j$th nonholonomic constraint equation; it would be the coefficient of $\delta q_k$, $\frac{\partial f_j}{\partial q_k}$, for a holonomic constraint. This is the most general form of the Lagrange equations of motion.

In the next lesson we will solve some problems involving constraints.



7. Some Worked Examples of Lagrangian Methods

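Recall from the previous lesson: $\frac{\partial L}{\partial q_k} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_k}\right) = \sum_{j=1}^{m}\lambda_j A_{jk}$ where $A_{jk}$ is the coefficient of $dq_k$ from the $j$th nonholonomic constraint. For holonomic constraints, $A_{jk} = \frac{\partial f_j}{\partial q_k}$.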

A. Ball rolling off of a ball (Newtonian force analysis)

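Rolling without slipping conditions:

$\psi(t = 0) = \varphi(t = 0) = 0$
$s = R_1\varphi = R_2\psi \Rightarrow \psi = (R_1/R_2)\,\varphi$
$\varphi = \pi/2 - \theta \Rightarrow \psi = (R_1/R_2)(\pi/2 - \theta)$
$\dot{\varphi} = -\dot{\theta}$ and $\dot{\psi} = -(R_1/R_2)\,\dot{\theta}$

Vector relationships:

$\hat{r} = \cos(\theta)\,\hat{x} + \sin(\theta)\,\hat{y}$,  $\hat{\theta} = -\sin(\theta)\,\hat{x} + \cos(\theta)\,\hat{y}$
$\hat{x} = \cos(\theta)\,\hat{r} - \sin(\theta)\,\hat{\theta}$,  $\hat{y} = \sin(\theta)\,\hat{r} + \cos(\theta)\,\hat{\theta}$
$\hat{r} \cdot \hat{y} = \sin(\theta)$,  $\hat{\theta} \cdot \hat{y} = \cos(\theta)$
$\dot{\hat{r}} = \dot{\theta}\,\hat{\theta}$,  $\dot{\hat{\theta}} = -\dot{\theta}\,\hat{r}$ $\Rightarrow \mathbf{v} = \dot{r}\,\hat{r} + r\dot{\theta}\,\hat{\theta}$

and $\mathbf{a} = \dot{\mathbf{v}} = (\ddot{r} - r\dot{\theta}^2)\,\hat{r} + (r\ddot{\theta} + 2\dot{r}\dot{\theta})\,\hat{\theta}.$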



Forces: $\mathbf{F} = \mathbf{w} + \mathbf{N} + \mathbf{f} = m\mathbf{a}$ where weight is $\mathbf{w} = -mg\,\hat{y} = -mg\left(\sin(\theta)\,\hat{r} + \cos(\theta)\,\hat{\theta}\right)$. The normal force, supporting the rolling ball on the stationary ball, is $\mathbf{N} = N\hat{r}$. The frictional force, which keeps the rolling ball from slipping, opposes the motion, and is $\mathbf{f} = f\hat{\theta}$. Since the rolling ball is accelerating, these latter cannot be found from the weight alone; we will also have to consider the torque.

Having stated all of the forces in polar coordinates, do the same for the acceleration of the center of the rolling ball, which gives:

$-mg\left(\sin(\theta)\,\hat{r} + \cos(\theta)\,\hat{\theta}\right) + N\hat{r} + f\hat{\theta} = m\left[(\ddot{r} - r\dot{\theta}^2)\,\hat{r} + (r\ddot{\theta} + 2\dot{r}\dot{\theta})\,\hat{\theta}\right].$

Matching coefficients of the component vectors allows isolation of the unknown forces:

$N = m(\ddot{r} - r\dot{\theta}^2) + mg\sin(\theta),$
$f = m(r\ddot{\theta} + 2\dot{r}\dot{\theta}) + mg\cos(\theta).$

But the radius vector is constrained to move along a circle of radius $r = R_1 + R_2$, a constant, so $\dot{r} = \ddot{r} = 0$, and the unknown force expressions simplify to:

$N = -m(R_1 + R_2)\,\dot{\theta}^2 + mg\sin(\theta),$
$f = m(R_1 + R_2)\,\ddot{\theta} + mg\cos(\theta).$

The additional concept required is that of the torque of the rolling ball, lumped at the center of mass. This can be expressed in two ways: in terms of the force and the moment arm, $\mathbf{\Lambda} = (-R_2\hat{r}) \times \mathbf{f} = -R_2 f\,(\hat{r} \times \hat{\theta}) = -R_2 f\,\hat{z}$, and also in the form of the 2nd law of motion for rotating bodies, derived from the rate of change of angular momentum, $\Lambda = \alpha I$, where $\alpha = -(\ddot{\varphi} + \ddot{\psi}) = +\left(\frac{R_1 + R_2}{R_2}\right)\ddot{\theta}$ is the angular acceleration with respect to the $-\hat{z}$ axis, and we have compounded the motions with respect to both spheres; and $I = \frac{2}{5}m(R_2)^2$ is the moment of inertia of the rolling ball. This gives us

$-R_2 f = +\left(\frac{R_1 + R_2}{R_2}\right)\ddot{\theta} \cdot \frac{2}{5}m(R_2)^2 \Rightarrow f = -\frac{2}{5}m(R_1 + R_2)\,\ddot{\theta}.$

Equating this with the previous expression for the frictional force gives

$m(R_1 + R_2)\,\ddot{\theta} + mg\cos(\theta) = -\frac{2}{5}m(R_1 + R_2)\,\ddot{\theta} \Rightarrow$

$\ddot{\theta} = -\frac{5g\cos(\theta)}{7(R_1 + R_2)} = -k\cos(\theta)$,  $k = \frac{5g}{7(R_1 + R_2)}.$



To find the normal force, first find $\dot{\theta}(t) = \int_0^t \ddot{\theta}(t)\,dt$, where we assume $\dot{\theta}(t = 0) = 0$ for the rolling ball initially at rest. However $\int_0^t -k\cos(\theta)\,dt$ is not directly integrable because we do not know the form of $\theta(t)$ yet. So multiply both sides by the integrating factor $\dot{\theta}$, then integrate $\dot{\theta}\ddot{\theta} = -k\,\dot{\theta}\cos(\theta)$.

For the LHS we get:

$\int_{t=0}^{t}\dot{\theta}\ddot{\theta}\,dt = \int_{\dot{\theta}=0}^{\dot{\theta}}\dot{\theta}\,d\dot{\theta} = \frac{1}{2}\dot{\theta}^2,$

and recalling that $\theta = \frac{\pi}{2}$ when $t = 0$, for the RHS we get:

$\int_{t=0}^{t} -k\,\dot{\theta}\cos(\theta)\,dt = \int_{\theta=\pi/2}^{\theta} -k\cos(\theta)\,d\theta = -k\sin(\theta) + k\sin\left(\frac{\pi}{2}\right) = k\left(1 - \sin(\theta)\right).$

Combining these expressions gives $\dot{\theta}^2 = 2k\left(1 - \sin(\theta)\right)$, which is non-negative as it must be, and plugging this into the equation for the normal force gives:

$N = -m(R_1 + R_2) \cdot 2k\left(1 - \sin(\theta)\right) + mg\sin(\theta).$

Recalling the definition of the constant $k$, we get:

$N = -m(R_1 + R_2) \cdot 2\,\frac{5g}{7(R_1 + R_2)}\left(1 - \sin(\theta)\right) + mg\sin(\theta) = -mg \cdot \frac{10}{7}\left(1 - \sin(\theta)\right) + mg\sin(\theta) = \frac{1}{7}mg\left(17\sin(\theta) - 10\right).$

The rolling ball leaves the surface of the stationary ball when the normal force is zero. This occurs when $\sin(\theta) = \frac{10}{17}$, or a little bit more than 36°. This position is marked on the diagram with a dot.

We could integrate again to get $\theta(t)$, which would allow us to determine the time of departure of the rolling ball, and its speed and direction, but that is left as an exercise for the reader.
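The departure angle can be cross-checked by integrating $\ddot{\theta} = -k\cos(\theta)$ numerically. The sketch below is only an illustration: the radii, step size, and the small starting offset from the top are assumptions made here (a ball exactly at rest at the top would never move), and the starting angular speed is the one the energy relation $\dot{\theta}^2 = 2k(1 - \sin\theta)$ assigns to that offset. It uses a hand-written fourth-order Runge-Kutta step and stops when $N = mg\sin(\theta) - m(R_1 + R_2)\dot{\theta}^2$ reaches zero.

```python
import math

G = 9.81            # gravitational acceleration (m/s^2)
R1, R2 = 1.0, 0.1   # assumed radii of the fixed and rolling balls (m)
K = 5.0 * G / (7.0 * (R1 + R2))

def accel(theta):
    """Angular acceleration from the rolling analysis: theta'' = -K cos(theta)."""
    return -K * math.cos(theta)

def normal_over_mg(theta, omega):
    """N/(mg) = sin(theta) - (R1 + R2) * omega^2 / g."""
    return math.sin(theta) - (R1 + R2) * omega**2 / G

def departure_angle(offset=1e-3, dt=1e-4):
    """Integrate with RK4 from just below the top until N drops to zero."""
    theta = math.pi / 2 - offset
    # Angular speed of a ball released from rest at the top, evaluated at this angle.
    omega = -math.sqrt(2.0 * K * (1.0 - math.sin(theta)))
    while normal_over_mg(theta, omega) > 0.0:
        k1t, k1w = omega, accel(theta)
        k2t, k2w = omega + 0.5 * dt * k1w, accel(theta + 0.5 * dt * k1t)
        k3t, k3w = omega + 0.5 * dt * k2w, accel(theta + 0.5 * dt * k2t)
        k4t, k4w = omega + dt * k3w, accel(theta + dt * k3t)
        theta += dt * (k1t + 2.0 * k2t + 2.0 * k3t + k4t) / 6.0
        omega += dt * (k1w + 2.0 * k2w + 2.0 * k3w + k4w) / 6.0
    return theta

theta_leave = departure_angle()
theta_expected = math.asin(10.0 / 17.0)
```

The numerically found angle agrees with $\sin(\theta) = 10/17$ to within the step size, and the result does not depend on the particular radii chosen.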



B. Ball rolling off of a ball (Lagrangian analysis)

Rolling without slipping conditions (non-holonomic):

$\psi(t = 0) = \varphi(t = 0) = 0$
$s = R_1\varphi = R_2\psi \Rightarrow R_1\dot{\varphi} = R_2\dot{\psi}$
$\Rightarrow \dot{\psi} = (R_1/R_2)\,\dot{\varphi}$
$\Rightarrow R_1\,d\varphi - R_2\,d\psi = 0$

Kinetic Energy:

$T = \frac{1}{2}m(R_1 + R_2)^2\dot{\varphi}^2 + \frac{1}{2}I\omega^2$,  $\omega = \dot{\varphi} + \dot{\psi}$,  $I = \frac{2}{5}m(R_2)^2$

Potential Energy:

$U = mg(R_1 + R_2)\cos(\varphi)$

The center of mass motion is along an arc of radius $(R_1 + R_2)$, and at the rate $\dot{\varphi}$. The rolling motion is about the radius $R_2$, but compounded of the rotations with respect to both spheres, so the angular speed is $\omega = \dot{\varphi} + \dot{\psi}$. Putting this together gives the Lagrangian:

$L = T - U = \frac{1}{2}m(R_1 + R_2)^2\dot{\varphi}^2 + \frac{1}{5}m(R_2)^2(\dot{\varphi} + \dot{\psi})^2 - mg(R_1 + R_2)\cos(\varphi)$

The rolling without slipping condition is a non-holonomic constraint, $R_1\,d\varphi - R_2\,d\psi = 0$, which can be handled by the method of undetermined multipliers. There is only one constraint, so there is just one undetermined multiplier. There is one Lagrange equation for each generalized variable, and the undetermined multiplier appears with the corresponding coefficient from the constraint:

$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\varphi}}\right) - \frac{\partial L}{\partial \varphi} = \lambda R_1$
$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\psi}}\right) - \frac{\partial L}{\partial \psi} = -\lambda R_2$



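It is natural to evaluate the expressions piece-wise, and then assemble the results.

$\varphi$:  $\frac{\partial L}{\partial \dot{\varphi}} = m(R_1 + R_2)^2\dot{\varphi} + \frac{2}{5}m(R_2)^2(\dot{\varphi} + \dot{\psi})$,  $\frac{\partial L}{\partial \varphi} = mg(R_1 + R_2)\sin(\varphi) \Rightarrow$

$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\varphi}}\right) - \frac{\partial L}{\partial \varphi} = m(R_1 + R_2)^2\ddot{\varphi} + \frac{2}{5}m(R_2)^2(\ddot{\varphi} + \ddot{\psi}) - mg(R_1 + R_2)\sin(\varphi) = \lambda R_1.$

$\psi$:  $\frac{\partial L}{\partial \dot{\psi}} = \frac{2}{5}m(R_2)^2(\dot{\varphi} + \dot{\psi})$,  $\frac{\partial L}{\partial \psi} = 0 \Rightarrow$

$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\psi}}\right) - \frac{\partial L}{\partial \psi} = \frac{2}{5}m(R_2)^2(\ddot{\varphi} + \ddot{\psi}) = -\lambda R_2$

From the equation of constraint we get $\ddot{\psi} = (R_1/R_2)\,\ddot{\varphi}$, which can be used to eliminate $\ddot{\psi}$. This leaves us with two equations:

(7.1)  $m(R_1 + R_2)^2\ddot{\varphi} + \frac{2}{5}m(R_2)^2\left(1 + (R_1/R_2)\right)\ddot{\varphi} - mg(R_1 + R_2)\sin(\varphi) = \lambda R_1.$

(7.2)  $\frac{2}{5}m(R_2)^2\left(1 + (R_1/R_2)\right)\ddot{\varphi} = -\lambda R_2.$

We can eliminate the unknown multiplier, $\lambda$, by substitution of (7.2) into (7.1):

$m(R_1 + R_2)^2\ddot{\varphi} + \frac{2}{5}m(R_2)^2\left(1 + (R_1/R_2)\right)\ddot{\varphi} - mg(R_1 + R_2)\sin(\varphi) = -\frac{R_1}{R_2}\cdot\frac{2}{5}m(R_2)^2\left(1 + (R_1/R_2)\right)\ddot{\varphi}.$

This simplifies when we divide out $m(R_1 + R_2)$ and rearrange: $\ddot{\varphi} = \frac{5g\sin(\varphi)}{7(R_1 + R_2)}.$

This is the same as our previous solution if we substitute $\varphi = \frac{\pi}{2} - \theta$.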

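The condition for losing contact is an inequality, so we cannot use the undetermined multipliers to find it; it is actually a boundary condition. Instead we note that at the instant of leaving the normal force has dropped to zero, so the radial component of the weight alone must supply the centripetal force, and we need $a = \frac{v^2}{(R_1 + R_2)} = (R_1 + R_2)\,\dot{\varphi}^2$. We find $\dot{\varphi}$ by integration of $\ddot{\varphi}$ as follows:

$\int_0^t \dot{\varphi}\ddot{\varphi}\,dt = \frac{1}{2}\dot{\varphi}^2 = \int_0^{\varphi}\frac{5g\sin(\varphi)}{7(R_1 + R_2)}\,d\varphi = -\frac{5g}{7(R_1 + R_2)}\left(\cos(\varphi) - 1\right);$

So $a = (R_1 + R_2)\,\dot{\varphi}^2 = -\frac{10}{7}g\left(\cos(\varphi) - 1\right)$ and the condition becomes:

$ma = mg\cos(\varphi) \Rightarrow \frac{10}{7}\left(1 - \cos(\varphi)\right) = \cos(\varphi) \Rightarrow \frac{10}{17} = \cos(\varphi)$, the same point as found previously.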
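The elimination of the multiplier can be checked by treating (7.1) and (7.2) as a pair of linear equations in the two unknowns $\ddot{\varphi}$ and $\lambda$. The sketch below solves that 2x2 system by Cramer's rule; the function name and the default values for the mass and radii are assumptions made for illustration.

```python
import math

def solve_phi_ddot_and_lambda(phi, m=0.5, R1=1.0, R2=0.1, g=9.81):
    """Solve equations (7.1) and (7.2) as a 2x2 linear system in (phi_ddot, lam)."""
    B = 0.4 * m * R2**2 * (1.0 + R1 / R2)   # coefficient of phi_ddot in (7.2)
    A = m * (R1 + R2)**2 + B                # coefficient of phi_ddot in (7.1)
    rhs = m * g * (R1 + R2) * math.sin(phi)
    # (7.1): A*phi_ddot - R1*lam = rhs
    # (7.2): B*phi_ddot + R2*lam = 0
    det = A * R2 + R1 * B
    phi_ddot = rhs * R2 / det
    lam = -B * rhs / det
    return phi_ddot, lam

phi = 0.4
phi_ddot, lam = solve_phi_ddot_and_lambda(phi)
```

With these sign conventions the multiplier works out to $\lambda = -\frac{2}{7}mg\sin(\varphi)$, whose magnitude matches the friction force $f = -\frac{2}{5}m(R_1 + R_2)\ddot{\theta}$ of the Newtonian analysis once $\varphi = \frac{\pi}{2} - \theta$ is substituted. This is consistent with the multipliers representing the forces that maintain the constraint.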



Additional Lagrangian Problems

For each problem, write the Lagrangian, and identify the conserved quantity. The goal is
to properly describe the system. For extra credit, write the equations of motion for each
system. Then illustrate the solutions by establishing typical initial conditions.

The key steps are:


a. Select generalized coordinates that are natural to the problem; sketch the problem
b. Express the kinetic and potential energy in terms of the generalized coordinates
c. Write the Lagrangian in terms of (a) and (b)
d. Write the Lagrange equations of motion for each of the generalized coordinates

1. A mass M is free to slide along a horizontal rod; its distance from the origin is X. A mass m hangs from a rod of length R, attached to a pivot on mass M such that it is free to swing in the plane formed by the horizontal rod and the freely hanging rod. Ignore the inertia of the hanging rod, and friction.

2. Same as problem (1), but with mass M fixed, not free to move.



3. A mass M, similar to problem (1), but now attached to a horizontal spring
(constant k), which is attached to a wall. Let position X=0 be the equilibrium point for
the mass-spring system.

4. Same as problem (3), but now include the pendulum from problem (1).


5. A particle of mass m is in a central force field $F(r) = -\frac{\partial U(r)}{\partial r}$; that is, the only dependence is on distance, not direction. Justify a Lagrangian $L(r, \theta)$ based upon initial conditions.

6. A mass m is free to slide down a wedge (base angle θ ) of mass M; the wedge is
free to move in the X direction. Consider only motions along a single horizontal line.
Describe the initial conditions that would violate this constraint.



7. A small pellet of soap is free to move along the inner surface of a hemispherical
bowl. Treat this as sliding without friction, and use spherical coordinates.

8. A mass m is hung from a rod of length R which is freely pivoted to the ceiling,
allowing it to swing freely in all directions. Ignore the inertia of the rod, and use
spherical coordinates.

9. Same as problem (8), but now the rod is suspended from a free-moving block of
mass M, perhaps magnetically suspended from the ceiling. Ignore friction.



8. Hamilton’s Principle and Hamilton’s Equations of Motion

Consider a Lagrangian of the form $L = L(q, \dot{q}, t)$ which satisfies the Euler-Lagrange equations of motion: $\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right) = \frac{\partial L}{\partial q}$. Defining the “momentum conjugate to q” as $p = \frac{\partial L}{\partial \dot{q}}$ reduces the equation of motion to $\dot{p} = \frac{\partial L}{\partial q}$. The product of each pair of conjugate coordinates, $q_j p_j$, always has units of action, the same as angular momentum.

Consider the time variation of the Lagrangian:

$\frac{dL}{dt} = \frac{\partial L}{\partial q}\frac{dq}{dt} + \frac{\partial L}{\partial \dot{q}}\frac{d\dot{q}}{dt} + \frac{\partial L}{\partial t} = \dot{p}\dot{q} + p\ddot{q} + \frac{\partial L}{\partial t} = \frac{d}{dt}(p\dot{q}) + \frac{\partial L}{\partial t}$, so that $-\frac{\partial L}{\partial t} = \frac{d}{dt}(p\dot{q} - L).$

Thus there are both implicit and explicit variations with time, and a transformation is apparent which would remove the implicit time variations: we thus define the Hamiltonian $H = H(q, p, t) = p\dot{q} - L$.

The Legendre (or “dual”) transformation is a method used to change the variables of a function. Suppose you have $F(x, y)$ and need $G(x, y')$ where $y' = \frac{\partial F}{\partial y}$; $y$ is called the active variable in the transformation from $F$ to $G$, while $x$ is said to be inactive. The active variables arise with the differential $dF = \frac{\partial F}{\partial x}dx + \frac{\partial F}{\partial y}dy = \frac{\partial F}{\partial x}dx + y'\,dy$. These are very common and useful in thermodynamics. The transformation is carried out in two steps, the first being to define the dual function:

$G = yy' - F,$

followed by the algebraic removal of the variable $y$. This second step is possible only if the Hessian determinant is non-zero. The Hessian is formed by taking the Jacobian of the partial derivatives of the function $F$ with respect to its arguments:

$H_F(x, y) = \begin{pmatrix} \frac{\partial^2 F}{\partial x^2} & \frac{\partial^2 F}{\partial x\,\partial y} \\ \frac{\partial^2 F}{\partial y\,\partial x} & \frac{\partial^2 F}{\partial y^2} \end{pmatrix}.$

(When $y$ is the only active variable, the requirement reduces to $\frac{\partial^2 F}{\partial y^2} \neq 0$, so that $y' = \frac{\partial F}{\partial y}$ can be solved for $y$.)

Taking the differential of the dual function gives:

$dG = \frac{\partial G}{\partial x}dx + \frac{\partial G}{\partial y'}dy' = d(yy' - F) = y\,dy' + y'\,dy - dF$, which expands and simplifies to:

$dG = y\,dy' + y'\,dy - \left(\frac{\partial F}{\partial x}dx + y'\,dy\right) = y\,dy' - \frac{\partial F}{\partial x}dx.$

Comparing the formal and the final expressions for $dG$ indicates that $\frac{\partial G}{\partial y'} = y$, so that we can recover the original active variable by means of differentiation. Note also that $\frac{\partial G}{\partial x} = -\frac{\partial F}{\partial x}$ for the passive variable.
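The two derivative relations of the Legendre transformation can be checked numerically. The sketch below is a minimal illustration assuming the function $F(x, y) = \frac{1}{2}my^2 - \frac{1}{2}x^2$, chosen here because $y' = my$ is easily inverted; the helper `partial` and all values are hypothetical.

```python
def F(x, y, m=2.0):
    """Assumed example: F(x, y) = m*y^2/2 - x^2/2 (a Lagrangian-like function)."""
    return 0.5 * m * y**2 - 0.5 * x**2

def G(x, y_prime, m=2.0):
    """Dual function G = y*y' - F, with y eliminated using y' = dF/dy = m*y."""
    y = y_prime / m
    return y * y_prime - F(x, y, m)

def partial(f, args, index, h=1e-6):
    """Central-difference partial derivative of f with respect to args[index]."""
    up = list(args)
    up[index] += h
    down = list(args)
    down[index] -= h
    return (f(*up) - f(*down)) / (2.0 * h)

x, y = 0.7, 1.3
y_prime = partial(F, (x, y), 1)             # y' = dF/dy
recovered_y = partial(G, (x, y_prime), 1)   # dG/dy' should give back y
dG_dx = partial(G, (x, y_prime), 0)
dF_dx = partial(F, (x, y), 0)
```

Differentiating $G$ with respect to the new variable returns the old active variable, while the derivative with respect to the passive variable only changes sign.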

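Now consider again the Lagrangian $L(q, \dot{q}, t)$, with momentum conjugate to $q$ being $p = \frac{\partial L}{\partial \dot{q}}$. Then the Legendre transform $H = \dot{q}p - L$ gives the function $H(q, p, t)$ such that:

(8.1)  $\frac{\partial H}{\partial p} = \dot{q}$, this being the result of the active transform on $\dot{q}$

(8.2)  $\frac{\partial H}{\partial q} = -\frac{\partial L}{\partial q}$, this being the result of the passive transform on $q$

(8.3)  $\frac{\partial H}{\partial t} = -\frac{\partial L}{\partial t}$, this being the result of the passive transform on $t$.

So far this is all mathematics. The physics comes from Lagrange’s equations of motion: $\frac{\partial L}{\partial q} = \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right) = \frac{d}{dt}(p) = \dot{p}$. Thus equation (8.2) becomes $\frac{\partial H}{\partial q} = -\dot{p}$.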

The results of this transformation are called Hamilton’s equations of motion, and they
bring a great simplification to theoretical mechanics. This is due in part to the symmetric
relation between p and q, and the reduction from 2nd order to 1st order differential
equations. But the great value is that the theatre of physics has been relocated from the
busy confines of the n-dimensional configuration space to the more open 2n-dimensional
phase space.

The Lagrangian exists in the tangent space of the configuration manifold, while the
Hamiltonian is its dual, and exists in the corresponding cotangent space (phase space).

Following the Hamiltonian further would lead to a study of its symplectic structure, of
Poisson brackets, Liouville’s theorem, and of canonical transformations. Instead of the
configuration space of the Lagrangian, we work in the phase space of the Hamiltonian;
this rich environment is of great importance in classical, statistical, and quantum
mechanics. We will take a brief look at a number of these important topics, and try to
show their inter-connections.



Conservation of Energy and the Hamiltonian

We began by looking at the time variation of the Lagrangian, and transformed to the Hamiltonian. The temporal variation of the Hamiltonian is:

$\frac{d}{dt}(H) = \frac{\partial H}{\partial q}\dot{q} + \frac{\partial H}{\partial p}\dot{p} + \frac{\partial H}{\partial t} = -\dot{p}\dot{q} + \dot{q}\dot{p} - \frac{\partial L}{\partial t} = -\frac{\partial L}{\partial t},$

where we have substituted the results from Hamilton’s equations of motion for the partial derivatives of $H$. Thus if the Lagrangian has no explicit dependence upon time, the Hamiltonian is time independent.

More simply, and under very common conditions, we can show that the Hamiltonian represents total energy, which is conserved: $H = T + U$.

We thus restrict ourselves to systems where $\dot{H} = 0$, and show that this must be a conservative system. We need the result of Euler’s theorem on homogeneous functions; in our case this depends upon the kinetic energy being quadratic in the velocities, and beginning with $T = \frac{1}{2}\sum_{i,j} a_{ij}\dot{q}_j\dot{q}_i$ this gives $\frac{\partial T}{\partial \dot{q}_i} = \sum_j a_{ij}\dot{q}_j$. If the potential energy is independent of the velocities, $\dot{q}_i$, then $p_i = \frac{\partial L}{\partial \dot{q}_i} = \frac{\partial T}{\partial \dot{q}_i}$, so we get $\sum_i p_i\dot{q}_i = \sum_{i,j} a_{ij}\dot{q}_i\dot{q}_j = 2T$.

The conclusion follows immediately, since $H = \sum_j \dot{q}_j p_j - L = 2T - (T - U) = T + U = E.$
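The step $\sum_i p_i\dot{q}_i = 2T$ can be checked with a small numerical example. The sketch below assumes a two-coordinate system with a symmetric coefficient matrix $a_{ij}$, a set of velocities, and a fixed value of a velocity-independent potential; all of these numbers are made up for illustration.

```python
def kinetic_energy(a, qdot):
    """T = (1/2) * sum_ij a[i][j] * qdot[i] * qdot[j], with a symmetric."""
    n = len(qdot)
    return 0.5 * sum(a[i][j] * qdot[i] * qdot[j] for i in range(n) for j in range(n))

def momenta(a, qdot):
    """p_i = dT/d(qdot_i) = sum_j a[i][j] * qdot[j] for symmetric a."""
    n = len(qdot)
    return [sum(a[i][j] * qdot[j] for j in range(n)) for i in range(n)]

a = [[2.0, 0.5], [0.5, 1.0]]   # assumed symmetric inertia coefficients
qdot = [0.3, -1.2]             # assumed generalized velocities
U = 4.0                        # assumed value of a velocity-independent potential

T = kinetic_energy(a, qdot)
p = momenta(a, qdot)
p_dot_qdot = sum(pi * qi for pi, qi in zip(p, qdot))
H = p_dot_qdot - (T - U)       # H = sum p*qdot - L
```

For these values $T = 0.63$, the sum $\sum_i p_i\dot{q}_i$ equals $2T = 1.26$, and $H$ equals $T + U$, as the argument requires.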
i i, j

Hamilton’s Principle and Hamilton’s Equations of Motion

Recalling that the Euler-Lagrange equations of motion can be derived from a variational principle, we will now apply Hamilton’s Principle to $L = \dot{q}p - H$. Note that the path variable (time) is not varied, and that there are no variations at the endpoints for any variable:

$0 = \delta\int L\,dt = \int\delta L\,dt = \int\delta(\dot{q}p - H)\,dt = \int\left[p\,\delta\dot{q} + \dot{q}\,\delta p - \delta H\right]dt$

Noting that $\delta\dot{q} = \delta\left(\frac{dq}{dt}\right) = \frac{d}{dt}(\delta q)$, we can perform integration by parts on the first term, transforming it from $p\,\delta\dot{q}$ to $-\dot{p}\,\delta q$, where the integrated portion is zero due to the lack of variation at the endpoints. We make this substitution, and expand $\delta H$ to get:

$0 = \int\left[-\dot{p}\,\delta q + \dot{q}\,\delta p - \frac{\partial H}{\partial q}\delta q - \frac{\partial H}{\partial p}\delta p\right]dt = \int\left[\left(-\dot{p} - \frac{\partial H}{\partial q}\right)\delta q + \left(\dot{q} - \frac{\partial H}{\partial p}\right)\delta p\right]dt.$

The coefficients of the variations $\delta q$ and $\delta p$ provide Hamilton’s equations of motion: $\frac{\partial H}{\partial q} = -\dot{p}$ and $\frac{\partial H}{\partial p} = \dot{q}$, along with $\frac{\partial H}{\partial t} = \dot{H} = -\frac{\partial L}{\partial t}$, which we derived above. Since we have already found them from the Legendre transform, this is a proof that the variations $\delta q$ and $\delta p$ are independent of each other.
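Hamilton's equations are first order in time, which makes them convenient to step forward numerically. The following is a minimal sketch, assuming a one-dimensional harmonic oscillator $H = \frac{p^2}{2m} + \frac{1}{2}kq^2$ with made-up values for $m$ and $k$, and using a leapfrog (kick-drift-kick) scheme; that scheme is a choice made here and relies on $H$ splitting into a $p$-only part and a $q$-only part.

```python
import math

M, K_SPRING = 1.0, 4.0   # assumed mass and spring constant

def hamiltonian(q, p):
    return p * p / (2.0 * M) + 0.5 * K_SPRING * q * q

def dH_dq(q, p):
    return K_SPRING * q

def dH_dp(q, p):
    return p / M

def evolve(q, p, dt, steps):
    """Leapfrog integration of q' = dH/dp and p' = -dH/dq."""
    for _ in range(steps):
        p -= 0.5 * dt * dH_dq(q, p)   # half kick
        q += dt * dH_dp(q, p)         # drift
        p -= 0.5 * dt * dH_dq(q, p)   # half kick
    return q, p

q0, p0 = 1.0, 0.0
period = 2.0 * math.pi * math.sqrt(M / K_SPRING)
q1, p1 = evolve(q0, p0, dt=period / 10000, steps=10000)
energy_drift = hamiltonian(q1, p1) - hamiltonian(q0, p0)
```

After one period the phase-space point returns very nearly to its starting values and the Hamiltonian is unchanged to within the integration error, as expected when $L$ has no explicit time dependence.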



Some Worked Examples of Hamiltonians

1. The harmonic oscillator


2. The chain of harmonic oscillators



9. Hamilton-Jacobi Equation

We now leave the N-dimensional configuration space of the Lagrangian $L(q, \dot{q}, t)$, and enter the 2N-dimensional phase space of the Hamiltonian $H = H(q, p, t) = \sum_k p_k\dot{q}_k - L$.

In configuration space we found the momentum conjugate to $q_k$ as $p_k = \frac{\partial L}{\partial \dot{q}_k}$, and there are N equations of motion: $\dot{p}_k = \frac{\partial L}{\partial q_k}$. In phase space the conjugate momentum becomes an independent variable with 2N equations of motion: $\frac{\partial H}{\partial p_k} = \dot{q}_k$ and $\frac{\partial H}{\partial q_k} = -\dot{p}_k$.

We will explore some of the properties of phase space, in order to appreciate its
advantages over the more familiar configuration space, and seek methods for solving
Hamilton’s equations of motion. We will consider only conservative systems.

Canonical Transformations

The Lagrange equations of motion retain the same form even as the generalized
coordinates are transformed. A lucky choice may result in $\dot p_k = \frac{\partial L}{\partial q_k} = 0$, which implies
that $p_k$ is a constant of the motion. Hamilton’s equations of motion retain their form only
under the restricted group of canonical transformations $\{p_k, q_k\} \leftrightarrow \{P_k, Q_k\}$ such that
Hamilton’s Principle is satisfied:

$$\delta \int \Big( \sum_k p_k \dot q_k - H \Big)\,dt = 0 = \delta \int \Big( \sum_k P_k \dot Q_k - H' \Big)\,dt, \tag{9.1}$$

which means that the integrands may differ by a total differential or boundary term:

$$dS = \sum_k \left( p_k\,dq_k - P_k\,dQ_k \right) + (H' - H)\,dt. \tag{9.2}$$
During the transformation process we pick one old and one new variable to be
independent, and find the other two implicitly. Thus the generating function can depend
on $(q, Q, t)$, $(q, P, t)$, $(p, P, t)$, or $(p, Q, t)$, depending on the problem at hand. From (9.2)
it is natural to take $S = S(q, Q, t)$, and the variation is:

$$\delta \int \frac{dS}{dt}\,dt = \delta \int dS = \delta S(q, Q, t) = \frac{\partial S}{\partial q}\delta q + \frac{\partial S}{\partial Q}\delta Q = 0, \tag{9.3}$$

the result being zero because the coordinates are fixed at the boundaries. But from
matching coefficients of the variation (9.3) with the natural differential (9.2) we get
$\delta S = \sum_k \left( p_k\,\delta q_k - P_k\,\delta Q_k \right)$, and it follows that $p_k = \frac{\partial S}{\partial q_k}$, $P_k = -\frac{\partial S}{\partial Q_k}$.

Chapter 9-Hamilton-Jacobi Equation Page 1


We also note that this function $S(q, Q, t)$ represents the integrated action for the specified
conditions, where the parameters are for the boundary conditions:

$$S(q, Q, t) = \int L\,dt. \tag{9.4}$$

We take advantage of this and define a generating function $S(q_1, \ldots, q_N, Q_1, \ldots, Q_N, t)$ such
that the previous conjugate momenta satisfy $p_k = \frac{\partial S}{\partial q_k}$, and define the new $P_k = -\frac{\partial S}{\partial Q_k}$.

Then the total derivative is

$$\frac{dS}{dt} = \sum_k \left( \frac{\partial S}{\partial q_k}\dot q_k + \frac{\partial S}{\partial Q_k}\dot Q_k \right) + \frac{\partial S}{\partial t} = \sum_k \left( p_k \dot q_k - P_k \dot Q_k \right) + \frac{\partial S}{\partial t}.$$

Putting these results of the total derivative into (9.2) gives the desired transformation:

$$H'(Q, P, t) = H(q, p, t) + \frac{\partial S}{\partial t}, \tag{9.5}$$

where we also need to substitute $p_k = \frac{\partial S}{\partial q_k}$ into the old Hamiltonian. The most direct
approach is to assume that $H' = 0$, which results in the Hamilton-Jacobi equation:

$$H\left( q_1, \ldots, q_N, \frac{\partial S}{\partial q_1}, \ldots, \frac{\partial S}{\partial q_N}, t \right) + \frac{\partial S}{\partial t} = 0. \tag{9.6}$$

The solution is the generating function for $H'(Q, P, t) = 0$, and automatically gives us all
of the constants of the motion of the system. This follows directly from the condition
that $H' = 0$, so that $\dot Q_k = \frac{\partial H'}{\partial P_k} = 0$, and $\dot P_k = -\frac{\partial H'}{\partial Q_k} = 0$, so $Q_k$ and $P_k$ are both constants in
this coordinate system, and thus describe a bundle of parallel lines in the state space, with
axes for Q, P, and t. The effect of the transformation has been to straighten out the world
lines for each of the N particles. The system is stationary in phase space; the generating
function is the annihilator of the old Hamiltonian.
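A quick numerical check may make the Hamilton-Jacobi equation more concrete. The sketch below assumes a free particle of mass `m`, for which the integrated action from $(Q, 0)$ to $(q, t)$ is $S = m(q - Q)^2 / 2t$; this example, the sample point, and the finite-difference helper `partial` are illustrative choices, not part of the notes.

```python
def S(q, Q, t, m=1.0):
    """Free-particle action from (Q, 0) to (q, t): S = m (q - Q)^2 / (2 t)."""
    return m * (q - Q) ** 2 / (2.0 * t)

def partial(f, args, i, h=1e-5):
    """Central finite-difference partial derivative of f in argument i."""
    up, dn = list(args), list(args)
    up[i] += h
    dn[i] -= h
    return (f(*up) - f(*dn)) / (2.0 * h)

m = 1.0
q, Q, t = 3.0, 1.0, 2.0                 # arbitrary sample point (assumed)
p = partial(S, (q, Q, t), 0)            # old momentum  p =  dS/dq
P = -partial(S, (q, Q, t), 1)           # new momentum  P = -dS/dQ
dS_dt = partial(S, (q, Q, t), 2)
hj_residual = p * p / (2.0 * m) + dS_dt  # H(q, dS/dq) + dS/dt
```

Here `hj_residual` should vanish to within finite-difference error, and `P` should equal `p`: for a free particle the new momentum is just the conserved initial momentum, one of the constants of the motion.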

Please note that while finding a solution to this single partial differential equation reduces
the Hamiltonian problem to a process of differentiation, it is still a very difficult problem
… its value is mostly in the theory, which gives us additional means to look at very
general problems. Moreover if the Hamiltonian is time-independent the generating
function may be separable, and can be written as $S = \sum_k S_k(q_k, Q_k)$, where the $Q_k$ are
constants of the motion. Such systems can be completely solved by these methods.

Homework: prove that $q_k = -\frac{\partial S}{\partial p_k}$. This shows that the old coordinates can be recovered
from the generating function. Also find the value of $\frac{\partial S}{\partial P_k}$. Can you also find the total
energy, E?

Homework: find the corresponding equations for the alternative formulations of S.



Poisson Brackets

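For functions defined on phase space we define the Poisson bracket as follows:

$$\{F, G\} = \sum_k \left( \frac{\partial F}{\partial q_k}\frac{\partial G}{\partial p_k} - \frac{\partial F}{\partial p_k}\frac{\partial G}{\partial q_k} \right). \tag{9.7}$$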

Many of the properties are easy to show:

Anticommutation: $\{F, G\} = -\{G, F\}$
Linearity: $\{F_1 + F_2, G\} = \{F_1, G\} + \{F_2, G\}$
Multiplication: $\{F_1 F_2, G\} = F_1 \{F_2, G\} + F_2 \{F_1, G\}$
Jacobi’s Identity (cyclic permutations): $\{F, \{G, H\}\} + \{H, \{F, G\}\} + \{G, \{H, F\}\} = 0$
Derivative: $\frac{\partial}{\partial t}\{F, G\} = \left\{ \frac{\partial F}{\partial t}, G \right\} + \left\{ F, \frac{\partial G}{\partial t} \right\}$
Phase space variables: $\{q_j, q_k\} = 0$, $\{p_j, p_k\} = 0$, $\{q_j, p_k\} = \delta_{jk}$
Selection property: $\{F, q_k\} = -\frac{\partial F}{\partial p_k}$; $\{F, p_k\} = \frac{\partial F}{\partial q_k}$

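The total derivative has a special relationship with the Hamiltonian:

$$\frac{dF}{dt} = \sum_k \left( \frac{\partial F}{\partial q_k}\frac{dq_k}{dt} + \frac{\partial F}{\partial p_k}\frac{dp_k}{dt} \right) + \frac{\partial F}{\partial t} = \sum_k \left( \frac{\partial F}{\partial q_k}\frac{\partial H}{\partial p_k} - \frac{\partial F}{\partial p_k}\frac{\partial H}{\partial q_k} \right) + \frac{\partial F}{\partial t} = \{F, H\} + \frac{\partial F}{\partial t},$$

where we have used Hamilton’s equations of motion in the second step, and recognized the
resulting sum as the Poisson bracket in the third step.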

Using this relationship, we can rewrite Hamilton’s equations of motion in Poisson
bracket notation: $\dot q_k = \{q_k, H\}$; $\dot p_k = \{p_k, H\}$. A further application is that a quantity $F$
with no explicit time dependence is conserved if and only if $\{F, H\} = 0$.
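The bracket relations can be spot-checked numerically. The following is a minimal sketch that evaluates the Poisson bracket with central finite differences; the two-dimensional quartic central potential, the sample point, and all function names are assumptions made for the illustration.

```python
def poisson_bracket(F, G, q, p, h=1e-5):
    """{F, G} = sum_k (dF/dq_k dG/dp_k - dF/dp_k dG/dq_k), by central differences."""
    def d(f, k, wrt_q):
        qp, qm, pp, pm = list(q), list(q), list(p), list(p)
        if wrt_q:
            qp[k] += h
            qm[k] -= h
        else:
            pp[k] += h
            pm[k] -= h
        return (f(qp, pp) - f(qm, pm)) / (2.0 * h)
    return sum(d(F, k, True) * d(G, k, False) - d(F, k, False) * d(G, k, True)
               for k in range(len(q)))

# Illustrative 2-D central-force system (the quartic potential is an assumption).
def H(q, p):
    r2 = q[0] ** 2 + q[1] ** 2
    return 0.5 * (p[0] ** 2 + p[1] ** 2) + 0.25 * r2 ** 2

def Lz(q, p):          # angular momentum about the z axis
    return q[0] * p[1] - q[1] * p[0]

def x(q, p):           # coordinate q_1 as a phase-space function
    return q[0]

def px(q, p):          # momentum p_1 as a phase-space function
    return p[0]

q0, p0 = [0.6, -0.8], [0.3, 0.5]
bracket_x_px = poisson_bracket(x, px, q0, p0)   # canonical pair
bracket_Lz_H = poisson_bracket(Lz, H, q0, p0)   # conserved quantity
bracket_px_H = poisson_bracket(px, H, q0, p0)   # equals dp_x/dt = -dH/dx
```

With this choice `bracket_x_px` should be close to 1, `bracket_Lz_H` close to 0 (angular momentum is conserved in a central potential), and `bracket_px_H` close to $-\partial H/\partial x$ at the sample point.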

The Poisson brackets form a Lie algebra. It is important to note that the expressions are
invariant under canonical transformations; the proof is a straightforward, if lengthy,
exercise of the chain rule for partial derivatives. We will use these results briefly, but
will revisit them when we get to quantum mechanics.

Homework (extra credit): prove that the Poisson bracket $\{F, G\}_{Q,P} = \{F, G\}_{q,p}$; start with
the dependence upon $Q_k, P_k$ and recall that each of these may depend upon all of the
$q_k, p_k$. Then use the Poisson bracket phase space variable identities to consolidate the
expansion.



Liouville’s Theorem

For a Hamiltonian with constant energy the system is clearly confined to move on a
hypersurface of constant energy through phase space. Liouville’s theorem states that the
phase fluid for any collection of Hamiltonian systems is incompressible. We show this by
proving that the divergence of the velocity field in phase space is zero. The velocity is
simply $\mathbf v = (\dot q_1, \ldots, \dot q_N; \dot p_1, \ldots, \dot p_N)$, and the divergence is

$$\nabla \cdot \mathbf v = \sum_k \left( \frac{\partial \dot q_k}{\partial q_k} + \frac{\partial \dot p_k}{\partial p_k} \right) = \sum_k \left( \frac{\partial^2 H}{\partial q_k\,\partial p_k} - \frac{\partial^2 H}{\partial p_k\,\partial q_k} \right) = 0, \tag{9.8}$$

thanks to Hamilton’s equations of motion.

This is an important result for statistical mechanics, where it is often written as

$$\sigma = \int d^N q\, d^N p, \tag{9.9}$$

where the integration is over all of the 2N phase space coordinates for some initial
volume; we then follow that patch of phase fluid over time. The enclosed region may
become distorted, but its volume remains constant for all time. If we divide by the fixed volume,
we get the unchanging phase space density as that collection of systems evolves.
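The incompressibility of the phase fluid can also be observed numerically. The sketch below assumes a pendulum with unit mass and length, $H = p^2/2 - \cos q$, advanced by a leapfrog scheme (itself a symplectic map), and estimates the Jacobian determinant of the map from initial to final phase-space point; a value of 1 means that small phase-space areas are preserved. The system, step count, and function names are illustrative assumptions.

```python
import math

def flow(q, p, t_end=1.0, n=2000):
    """Leapfrog flow of the pendulum H = p^2/2 - cos(q) (unit mass and length)."""
    dt = t_end / n
    for _ in range(n):
        p -= 0.5 * dt * math.sin(q)   # half kick:  p' = -dH/dq = -sin(q)
        q += dt * p                   # drift:      q' = +dH/dp = p
        p -= 0.5 * dt * math.sin(q)   # half kick
    return q, p

def jacobian_det(q, p, h=1e-5):
    """det d(q_t, p_t)/d(q_0, p_0), estimated by central differences."""
    qa, pa = flow(q + h, p)
    qb, pb = flow(q - h, p)
    qc, pc = flow(q, p + h)
    qd, pd = flow(q, p - h)
    dq_dq, dp_dq = (qa - qb) / (2 * h), (pa - pb) / (2 * h)
    dq_dp, dp_dp = (qc - qd) / (2 * h), (pc - pd) / (2 * h)
    return dq_dq * dp_dp - dq_dp * dp_dq

det = jacobian_det(1.0, 0.5)   # should be close to 1
```

The individual partial derivatives of the flow are not small or simple for this nonlinear system, so the determinant staying at 1 reflects the structure of Hamilton’s equations rather than a trivial map.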

Homework: An alternative expression for Liouville’s theorem is $\frac{\partial \rho}{\partial t} = -\{\rho, H\}$, where
$\rho$ is a probability distribution on phase space. To prove this, show that $\frac{d\rho}{dt} = 0$.



10-Connections to Optics

We have come a long way from Newton’s F = ma. We will finish this lecture series
with some connections to optics and quantum mechanics. At a later date we may return
and show how analytic mechanics lies at the foundations of thermodynamics, that is, of
statistical mechanics.

By Hamilton’s Principle, $\delta \int L\,dt = 0$, where the integration takes place between fixed
limits in order for the boundary terms to cancel. In our search for a solution by means of
canonical transformations we found an expression for such boundary terms in the
generating function, $S(q, Q, t) = \int L\,dt$. The boundary terms would then be the action
constants $S_2 - S_1 = S(q(t_2), Q(t_2), t_2) - S(q(t_1), Q(t_1), t_1)$. From this expression it is
clear that we have a parameterized family of surfaces of constant action in phase space,
the result of the Hamilton-Jacobi equation. Though you only see $(q, Q)$ in the functional
dependency, the remaining two variables $(p, P)$ are implicitly included, and usually one
or the other appears in the definition of Q.

Recalling that a complete set of partial derivatives represents a tangent plane, it is natural
that the solutions to partial differential equations result in families of surfaces; the
corresponding objects for ordinary differential equations are tangents to curves, whose
solutions are trajectories or parameterized paths. We have now seen that the calculus of
variations has provided solutions and insights into both types of problems … as our focus
has shifted from a specific path due to specified boundary conditions, to a family of
solutions which are related by the boundary conditions falling on the specified
hypersurfaces.

This is exactly the connection between ray optics and the optics of wave fronts.

Fermat’s Principle and Derivation of the Eikonal Equation

Optical path length (OPL) is the ideal way to measure the path followed by a beam of
light. It takes into account both the distance traveled, and the local speed of light all
along the path. We use the common notation for the index of refraction, $n = c/v$, where
c is the speed of light in vacuo, and v is the local speed of light. The index of refraction
often varies spatially, but may also have temporal variations, such as small thermal or
pressure shifts in air.

Fermat’s prescription for light, that it travels by the path of least time, can be stated as a
variational principle:

$$\delta \int \frac{n}{c}\,ds = 0, \tag{10.1}$$

where the integral covers a path between fixed points in space. The integrated optical path
length is divided by the speed of light in vacuo in order to convert back to the true transit time.
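Fermat’s principle can be tried out on the simplest case, a flat interface between two uniform media. The sketch below assumes indices `n1 = 1.0` and `n2 = 1.5`, fixed endpoints `a` and `b`, and units with `c = 1`; it minimizes the transit time over the crossing point with a golden-section search and then compares the two sides of Snell’s law. All names and values are illustrative assumptions.

```python
import math

def travel_time(x, a=(0.0, 1.0), b=(1.0, -1.0), n1=1.0, n2=1.5, c=1.0):
    """Transit time from a (medium n1, y > 0) to b (medium n2, y < 0),
    crossing the flat interface y = 0 at the point (x, 0)."""
    d1 = math.hypot(x - a[0], a[1])
    d2 = math.hypot(b[0] - x, b[1])
    return (n1 * d1 + n2 * d2) / c

def minimize(f, lo, hi, tol=1e-12):
    """Golden-section search for the minimum of a unimodal function."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(x1) < f(x2):
            hi, x2 = x2, x1               # minimum lies in [lo, old x2]
            x1 = hi - g * (hi - lo)
        else:
            lo, x1 = x1, x2               # minimum lies in [old x1, hi]
            x2 = lo + g * (hi - lo)
    return 0.5 * (lo + hi)

x_star = minimize(travel_time, 0.0, 1.0)
# Sines of the angles measured from the interface normal, on each side.
sin1 = x_star / math.hypot(x_star, 1.0)
sin2 = (1.0 - x_star) / math.hypot(1.0 - x_star, 1.0)
```

At the minimizing crossing point, `1.0 * sin1` and `1.5 * sin2` should agree, which is Snell’s law of refraction recovered from the path of least time.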

Chapter 10-Connections to Optics Page 1


Figure 1- Where should point N be placed on the curve in order to get the shortest possible distance
between N and M? The figure makes it clear that you want the path to be normal to the curve; so the
least distance departure from a surface would follow the gradient.

The Huygens construction is clearly related to Figure 1, but we will see that it is also
the key to the eikonal equation, which is the fundamental equation of ray optics. We will
derive it from Fermat’s Principle; it can also be derived as an approximation to
Maxwell’s equations for electromagnetic waves.

Suppose that our light source is a point, and that the curve is a surface of constant OPL
from the light source. Then if we flashed the light once, we would see the pulse travel
through a series of these surfaces $dS = n\,ds$ … and each surface would be lit up in turn.
The fastest way to get to the next surface is for each ray to follow the gradient. Its
magnitude will be the local index of refraction, in the direction of the path, $\hat s$:

$$\nabla S = n\hat s = \frac{c}{\omega}\mathbf k. \tag{10.2}$$

This is the eikonal equation for isotropic media, where $\omega = 2\pi\nu$ is the circular frequency
(cycle rate), and $k = \frac{2\pi}{\lambda}$ is the wave number such that $\lambda\nu = \frac{\omega}{k} = v = \frac{c}{n}$. The ray direction
defines the wave vector, $\mathbf k$.



Variational Derivation of the Eikonal Equation

Let us now derive the eikonal equation directly from Fermat’s Principle by the calculus
of variations. First note that our path is $ds^2 = dx^2 + dy^2 + dz^2$, so
$ds = \sqrt{\left(\frac{dx}{du}\right)^2 + \left(\frac{dy}{du}\right)^2 + \left(\frac{dz}{du}\right)^2}\,du$, and for an isotropic substance $n = n(x, y, z)$, along
with $v = \frac{c}{n} = \frac{ds}{dt}$. So we can write Fermat’s Principle as:

$$\delta \int dt' = \delta \int \frac{ds}{v} = \delta \int \frac{n}{c}\,ds = \delta \int \frac{n}{c}\sqrt{x'^2 + y'^2 + z'^2}\,du = 0. \tag{10.3}$$

Recognizing that the parameterization is arbitrary, let it be the path length, which means
that $\sqrt{x'^2 + y'^2 + z'^2} = 1$, and $u = s$. So $L(x, y, z, x', y', z', s) = \frac{n(x, y, z)}{c}\sqrt{x'^2 + y'^2 + z'^2}$,
with a unit tangent vector, $\hat s = \hat T = \frac{d\mathbf r}{ds} = (x', y', z')$. Note that
$0 = \frac{d}{ds}\left(\hat T \cdot \hat T\right) = 2\hat T \cdot \frac{d\hat T}{ds}$, which implies that the rate of change of the unit tangent
with arc length is perpendicular to the tangent itself; this is the origin of
centripetal acceleration. Let $\frac{d\hat T}{ds} = \kappa \hat N$; $\kappa$ is called the curvature of the path. For a
circular path of radius R, $\kappa = \frac{1}{R}$. The Euler-Lagrange equations are:

$$\frac{d}{ds}\left( \frac{\partial L}{\partial x'} \right) - \frac{\partial L}{\partial x} = 0, \text{ and similar for } y \text{ and } z. \tag{10.4}$$

Carrying out these operations (the common factor $1/c$ cancels) we get:

$$\frac{d}{ds}\left( \frac{n\,x'}{\sqrt{x'^2 + y'^2 + z'^2}} \right) = \frac{\partial n}{\partial x}\sqrt{x'^2 + y'^2 + z'^2},$$

but the radicals satisfy the constraint, and so are equal to 1, reducing to
$\frac{d}{ds}\left( n\frac{dx}{ds} \right) = \frac{\partial n}{\partial x}$; taking all three together gives the
eikonal equation,

$$\frac{d}{ds}\left( n\hat T \right) = \nabla n. \tag{10.5}$$

Using the Frenet-Serret curvature apparatus this becomes $\nabla n = \frac{dn}{ds}\hat T + n\kappa\hat N$.

Integrating this form with respect to arc length recovers our earlier equation (10.2). Either
form of the eikonal equation is logically equivalent to Fermat’s Principle, and any of
these can be taken as the foundation of ray optics. The eikonal equation is also the
mathematical expression of the Huygens construction of advancing wave fronts.
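The ray form of the eikonal equation, $\frac{d}{ds}(n\hat T) = \nabla n$, lends itself to direct numerical ray tracing. The following is a minimal sketch in two dimensions, assuming a stratified medium whose index falls off linearly with height; the index profile, launch angle, step size, and the variable $\mathbf p = n\hat T$ are illustrative assumptions. It integrates $d\mathbf r/ds = \mathbf p/n$, $d\mathbf p/ds = \nabla n$ with a classical fourth-order Runge-Kutta step.

```python
import math

def n(x, y):
    """Assumed stratified medium: index decreasing linearly with height y."""
    return 1.5 - 0.1 * y

def grad_n(x, y):
    """Gradient of the assumed index profile."""
    return (0.0, -0.1)

def deriv(state):
    """Ray equations with p = n T:  dr/ds = p / n,  dp/ds = grad n."""
    x, y, px, py = state
    nn = n(x, y)
    gx, gy = grad_n(x, y)
    return (px / nn, py / nn, gx, gy)

def rk4_step(state, ds):
    """One classical Runge-Kutta step of length ds along the ray."""
    def add(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))
    k1 = deriv(state)
    k2 = deriv(add(state, k1, ds / 2))
    k3 = deriv(add(state, k2, ds / 2))
    k4 = deriv(add(state, k3, ds))
    return tuple(s + ds / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Launch a ray from the origin at 30 degrees above the horizontal.
theta = math.radians(30.0)
n0 = n(0.0, 0.0)
state = (0.0, 0.0, n0 * math.cos(theta), n0 * math.sin(theta))
for _ in range(2000):
    state = rk4_step(state, 0.005)
x, y, px, py = state
```

Two checks follow from the equations: `px` stays equal to `n0 * cos(theta)` because the index does not depend on x (the stratified-medium form of Snell’s law), and the magnitude of `(px, py)` should track the local index `n(x, y)`, consistent with $|\nabla S| = n$.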



Figure 2- Caustics are the envelopes of the light rays, causing a bright (or burning) edge. The bright
line and the cusp in the reflection from the metal ring are caustics due to reflection.

Figure 3- The computer-rendered image on the right has enabled a photon-mapping algorithm,
which makes the image appear more life-like. These caustics are due to reflection and refraction.
These phenomena were first studied systematically by Christiaan Huygens.

Caustic curves are the envelopes of reflected or refracted rays. They can be found via the
singularities of the eikonal equation. They are responsible for the bright lines that ripple
across a clear pool of water with the wind, and for the effects seen (and unseen) in Figure
2 and Figure 3.



Hamilton’s Opto-Mechanical Analogy

An alternative to the generating function transformation is Hamilton’s characteristic
function, $W$, which is derived in a similar fashion, but for time-independent
Hamiltonians. In this case the constitutive equation for the characteristic function makes use
of the constant energy:

$$H\left( q_1, \ldots, q_N, \frac{\partial W}{\partial q_1}, \ldots, \frac{\partial W}{\partial q_N} \right) = E. \tag{10.6}$$

Otherwise $W$ is defined in a way similar to $S$ above, with $p_k = \frac{\partial W}{\partial q_k}$. Then the vector of
all of the conjugate momenta is equal to the gradient of $W$. If we consider a single-
particle system with a time-independent potential, then the Hamiltonian is
$H = (p_x^2 + p_y^2 + p_z^2)/2m + U(x, y, z) = E$. But then we immediately get

$$\left( \frac{\partial W}{\partial x} \right)^2 + \left( \frac{\partial W}{\partial y} \right)^2 + \left( \frac{\partial W}{\partial z} \right)^2 = (\nabla W)^2 = 2m(E - U),$$

which is analogous to the optical eikonal equation, $(\nabla \varphi)^2 = n^2$, where $\varphi$ is the wave front,
or surface of constant phase, normal to the light rays, and which is equivalent to the Huygens
construction. Then the opto-mechanical analogy is that the surfaces of constant action from
Hamilton’s characteristic function are normal to the particle trajectories, and the effective index of
refraction is proportional to $\sqrt{2m(E - U)}$. This result essentially says that all particles of
the same energy which are on rays at the initial surface will remain on rays for all time.

The opto-mechanical analogy goes further because the eikonal equation can be derived
from Fermat’s variational principle: $\delta \int \frac{n}{c}\,ds = 0$, which says that light rays follow the
“shortest” optical path. It follows that there is a corresponding variational principle of
mechanics, known as Jacobi’s principle:

$$\delta \int \sqrt{E - U}\,ds = 0, \tag{10.7}$$

where $ds$ is related to the kinetic energy (and hence has absorbed the masses of all of the
particles), and is independent of the time. In order to evaluate this integral, we must reexpress the
differential in terms of another parameter, such as one of the $q_k$. The value of $ds$ is
$ds^2 = 2T\,dt^2 = \sum_{i,k} a_{ik}\,dq_i\,dq_k.$

The surfaces are now seen to be surfaces of constant action, and we have an alternative
condition for the principle of least action. However, the orthogonality is not Euclidean,
but Riemannian. For example, in the case of electrons in a magnetic field, the surfaces of
constant action are not perpendicular to the trajectories in the Euclidean sense, and thus
not all optical theorems for light hold for electron optics. Lanczos (p. 268) adds some
further cautions: light always obeys the ray property because the possible light rays of a
given optical field form a two-dimensional manifold of curves, while the possible particle
trajectories of mechanics form a five-dimensional manifold.



We only get ray-trajectories for particles which are especially selected to meet initial
conditions: we take those particles with the same total energy, and which have paths
perpendicular to the starting surface. Then these particles will continue to have ray
trajectories. Returning to the example of the electron gun, the generation process creates
a very small spread in energies, and the high fields in the cathode-anode region cause the
initial trajectories to be close to perpendicular; deviations from these conditions lead to
blurring of the image, a form of chromatic aberration.

Furthermore, we point out that the hypersurfaces of constant S (or W) are determined by
an ensemble of systems satisfying the same Hamiltonian, but with different boundary
conditions. Now consider the case where they are determined by a single system … these
surfaces would represent true wavefronts. This is the program required to find matter
waves; the wave fronts advance with a phase proportional to $S = (W - Et)$.



11-Connections to Quantum Mechanics

The Principles of Quantum Physics

The accepted principles of classical thermodynamics lead to “an ultraviolet catastrophe”
for black-body radiators. Inspired by recent experimental results, Max Planck (1900)
managed to derive a formula which fit the experimental curve and thus avoided a
theoretical catastrophe … along the way he introduced the idea of quantization of
oscillator energy. Albert Einstein (1905) took Planck’s suggestion seriously, and used
$E = h\nu = \hbar\omega$ as the basis for explaining the photo-electric effect. It seems that
electromagnetic waves have some particle-like properties.

In his spare time, between completing his dissertation on the statistical mechanics of
Brownian motion as a means to determine Avogadro’s number and the size of atoms and
molecules, and earning a future Nobel prize for his explanation of the photo-electric
effect, Einstein developed the special theory of relativity as a way to reconcile mechanics
with electrodynamics. This solution imposes additional constraints upon classical
mechanics. These can be viewed as an interconnection of mass, energy, and momentum:
$E^2 = (pc)^2 + (mc^2)^2$, or as a change in metrical structure from a Euclidean $\mathbb{R} \times \mathbb{R}^3$ for
space and time to a Lorentzian $\mathbb{R}^4$ with the invariant $ds^2 = dx^2 + dy^2 + dz^2 - c^2 dt^2$,
where the spatio-temporal coordinates are observer-dependent.

With the introduction of the nuclear atom by Ernest Rutherford (1911), and the orbital
quantization of this atom by Niels Bohr (1913), the era of “the old quantum theory” was
well under way. This period, ending with a crescendo in 1926, produced the still-useful
semi-classical quantum theory based upon the Hamilton-Jacobi equation. We will look at
only one element, the application by Louis de Broglie to obtain the “quantum wave”,
which can be summarized by the de Broglie relation of classical momentum to quantum
wavelength: $p = h/\lambda = \hbar k$.
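To get a feel for the sizes involved, the de Broglie relation can be evaluated directly. The sketch below assumes the non-relativistic momentum $p = \sqrt{2mE}$ and uses a 100 eV electron as an arbitrary example; the function name and the chosen energy are illustrative, while the constants are the standard SI values.

```python
import math

H_PLANCK = 6.62607015e-34       # Planck constant, J s (exact SI value)
M_ELECTRON = 9.1093837015e-31   # electron rest mass, kg (CODATA 2018)
EV = 1.602176634e-19            # joules per electron-volt (exact SI value)

def de_broglie_wavelength(mass_kg, kinetic_energy_j):
    """lambda = h / p with the non-relativistic momentum p = sqrt(2 m E)."""
    p = math.sqrt(2.0 * mass_kg * kinetic_energy_j)
    return H_PLANCK / p

lam = de_broglie_wavelength(M_ELECTRON, 100.0 * EV)   # a 100 eV electron
```

The result is roughly 0.12 nm, comparable to the spacing of atoms in a crystal, which is why electrons of such energies diffract from crystal lattices. The non-relativistic momentum is only a reasonable approximation while the kinetic energy is small compared with the electron rest energy of about 511 keV.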

This result was motivated by the Planck relation, and de Broglie applied the notions of
relativity to the Hamilton-Jacobi equation to show that if waves have particle attributes,
then so must particles have wave attributes. Einstein thought highly of de Broglie’s thesis
(1924), and Schrödinger (1925) took the idea up with a vengeance as an alternative to the
matrix mechanics of Heisenberg (1925). Heisenberg’s thesis was that the ills of the “old
quantum mechanics” were bound up with the reism of non-observable trajectories
inherited from classical mechanics … so he banished them! Matrix mechanics only dealt
with experimentally observable quantities. From the Wikipedia article on basic quantum
mechanics, a voice recording of Heisenberg in an early lecture on the uncertainty principle,
pointing to a Bohr model of the atom: "You can say, well, this orbit is really not a
complete orbit. Actually at every moment the electron has only an inactual position and
an inactual velocity and between these two inaccuracies there is an inverse correlation."

Schrödinger was an established professor, with a liking for music. He was familiar with
the analysis of vibrating strings by means of Hamiltonian perturbations, and
immediately saw that de Broglie’s matter waves could be given a Hamiltonian treatment.

It all looked very realistic, but in 1926 Schrödinger proved that the Heisenberg matrix
mechanics and the Schrödinger wave mechanics were mathematically equivalent. So the
quantum wave is an unobservable property of the quantum world. Max Born gave the
explanation (1927) that the quantum wave is a probability amplitude, and that if you
repeat the experiment many times, you will simply fill in the probability distribution.

As for quantum mechanics, the mathematical structure is linear algebra on a complex
state space of functions, known as a Hilbert space. The Heisenberg Uncertainty Principle
(1927) is a result of the so-called wave-particle duality; all wave phenomena have such a
relationship. It is easy to determine the wavelength of a long train of pulses, which of
course is not localized at all, while it is easy to localize a pulse, but then there is no
well-determined wavelength.

Homework: Find the de Broglie wavelength of an electron at room temperature (1/40
eV); of an electron with an energy of 21 keV; of a 16 pound bowling ball at the bowling alley.
What do you think these values mean? Do you expect to find quantum effects among the
electrons? Which ones?

Homework: Consider a very thin wall. Describe quantum tunneling.

Chapter 11-Connections to Quantum Mechanics Page 2


Quantum Waves and Quantum Operators

Assume the existence of de Broglie’s quantum wave:

$$\psi(\mathbf r, t) = A \exp\left[ i\left( \mathbf k \cdot \mathbf r - \omega t \right) \right] = A \exp\left[ \frac{i}{\hbar}\left( \mathbf p \cdot \mathbf r - Et \right) \right]. \tag{11.1}$$

We can extract the momentum for the x-direction with $\hat p_x = \frac{\hbar}{i}\frac{\partial}{\partial x}$; this generalizes to
total momentum for $\hat{\mathbf p} = \frac{\hbar}{i}\nabla$, $\Rightarrow \hat{\mathbf p}\psi = \mathbf p\psi$. Extraction of the energy is similar, with
$\hat E = -\frac{\hbar}{i}\frac{\partial}{\partial t} = i\hbar\frac{\partial}{\partial t} \Rightarrow \hat E\psi = E\psi$. These operators give us eigenvalues: the result is a scalar
multiple of the original quantum function. The classical Hamiltonian for a particle in a
conservative potential is $H = \frac{p^2}{2m} + U = E$, so we can apply the momentum operator to
the quantum wave twice, $\hat p^2 = \frac{\hbar^2}{i^2}\nabla^2 = -\hbar^2\nabla^2$, to get the squared momentum, and the energy
operator once on the right hand side; we get the potential by multiplication:

$$\hat H = -\frac{\hbar^2}{2m}\nabla^2 + \hat U = \hat E \;\Rightarrow\; -\frac{\hbar^2}{2m}\nabla^2\psi + U\psi = i\hbar\frac{\partial \psi}{\partial t}. \tag{11.2}$$

This is the famous Schrödinger wave equation. For time-independent solutions:

$$\hat H\psi = E_n\psi, \tag{11.3}$$

an eigenvalue equation. The time evolution follows $\hat E\psi = i\hbar\frac{\partial \psi}{\partial t} = E\psi$, which is easily
solved as $\psi(t) = \exp[-iEt/\hbar]\,\psi(0)$. A first course in quantum mechanics is a detailed
study of quantum operators, and their application to directly solvable problems, such as
the spectrum of the excited hydrogen atom, which involves spherical harmonics.
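The eigenvalue property of the momentum operator can be checked on a one-dimensional plane wave. The sketch below assumes natural units with $\hbar = 1$, a wave number `k = 2.0`, and a central finite difference in place of the derivative; these choices and the function names are illustrative.

```python
import cmath

HBAR = 1.0   # natural units, an assumption made for the illustration

def psi(x, k=2.0):
    """Plane wave exp(i k x) at t = 0 with amplitude A = 1."""
    return cmath.exp(1j * k * x)

def momentum_op(f, x, h=1e-5):
    """(hbar / i) d/dx applied to f at x, using a central difference."""
    return (HBAR / 1j) * (f(x + h) - f(x - h)) / (2.0 * h)

x0 = 0.7
eigenvalue = momentum_op(psi, x0) / psi(x0)   # should be close to hbar * k
```

The ratio is the same at every `x0`, which is what it means for the plane wave to be an eigenfunction of the momentum operator with eigenvalue $p = \hbar k$.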
We have seemingly pulled several quantum operators “out of the hat”; more can be
obtained by looking at canonically conjugate pairs of variables, and then “quantizing” the
Poisson brackets to get quantum commutators, $[\hat q, \hat p] = (\hat q\hat p - \hat p\hat q) = i\hbar$. That is, if you
apply a conjugate pair of operators to a quantum wave function, the order of application
makes a difference … this is another way to state the Heisenberg Uncertainty Principle.

Homework: Verify the quantum commutator $[\hat q, \hat p] = (\hat q\hat p - \hat p\hat q) = i\hbar$ using the function
$f(x)$. The quantum operator for position in this basis is simply $\hat x = x$.

Homework: Consider a particle in a 1D box; the sides of the box consist of a very steep
potential, while inside the box the potential is zero. Find the solutions to the time-
independent Schrödinger equation.



Notation

Dirac introduced a notation that still dominates quantum mechanics. It is brief, and
conveys the ideas of a vector space quite well. Since the solutions are vectors in a
function space, these functions are represented as vectors. But there is also a dual space,
and the product of a dual vector and a vector is an inner product, with a complex number
as the result. Dirac slyly called it the bracket or bra-ket notation, where the bra, $\langle\psi|$, is
in the dual space, and the ket, $|\varphi\rangle$, is in the vector space. $\langle\psi|^\dagger = |\psi\rangle$; they are
transposed conjugates. Their inner product is $\langle\psi|\varphi\rangle = \langle\varphi|\psi\rangle^*$, which would represent
the probability amplitude for scattering from the first state to the second. Once you find
the corresponding functions (usually in terms of the eigenfunctions), you can evaluate
this as an integral, $\int \psi^*\varphi\,d^n r$. The formula for finding the expectation value, or mean
value to be expected from an experiment is $\langle\psi|H|\psi\rangle = \int \psi^* H\psi\,d^n r$, where $H$ is the
Hamiltonian operator of the experiment, and $\psi$ is the wave function found as the
solution to the Schrödinger equation. The quantity $\langle\psi|\psi\rangle = \int \psi^*\psi\,d^n r = 1$, because it is
the total probability. Always remember to normalize your solutions, or your probabilities
(and homework) won’t get 100%. Most Hamiltonians are Hermitian operators, and the
rank of the matrix is a countable infinity.
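The bra-ket integrals are easy to evaluate numerically for a known wave function. The sketch below assumes the Gaussian ground state of a harmonic oscillator in units $\hbar = m = \omega = 1$, where $H = -\frac{1}{2}\frac{d^2}{dx^2} + \frac{1}{2}x^2$; the choice of state, the integration range, and the helper names are assumptions made for the illustration.

```python
import math

def psi(x):
    """Oscillator ground state in units hbar = m = omega = 1 (assumed)."""
    return math.pi ** -0.25 * math.exp(-0.5 * x * x)

def h_psi(x, h=1e-3):
    """H psi = -(1/2) psi'' + (1/2) x^2 psi, with psi'' by central difference."""
    d2 = (psi(x + h) - 2.0 * psi(x) + psi(x - h)) / (h * h)
    return -0.5 * d2 + 0.5 * x * x * psi(x)

def integrate(f, a=-10.0, b=10.0, n=4000):
    """Composite trapezoid rule on [a, b]."""
    dx = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * dx)
    return total * dx

# psi is real here, so the complex conjugate psi* is just psi.
norm = integrate(lambda x: psi(x) * psi(x))       # <psi|psi>
energy = integrate(lambda x: psi(x) * h_psi(x))   # <psi|H|psi>
```

For this normalized state `norm` should come out close to 1 and `energy` close to 1/2, the ground-state energy $\hbar\omega/2$ in these units.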

Postscript

Note that we have used a classical, not a relativistic, Hamiltonian. Paul Dirac (1928)
gained fame by finding the relativistic equation. It has the inherent feature of requiring the
existence of quantum spin, which has no classical parallel, and the existence of
anti-particles, such as the positron.

Quantum mechanics was not complete though, because the classical potential was still
retained. This step requires quantization of the fields (“second quantization”), and was
essentially completed independently by Tomonaga and Schwinger using Hamiltonian
techniques, and Feynman using relativistically-invariant Lagrangian techniques, in the
late 1940’s. The result was Quantum Electro-Dynamics, or QED. In particular, the
Hamilton-Jacobi Equation again plays a role in the Feynman approach, because the
classical action is the temporal propagator. This is also known as the quantum theory of
light, because it covers the interactions between light and matter.

Gravity has yet to be quantized. The difficulty is due to the non-linear nature of
Einstein’s General Theory of Relativity. Current proposals include loop quantum gravity and
string theories.

Homework: Find a fully relativistic quantum theory for gravitation. What currently
accepted theories need to be modified? What role does the Hamilton-Jacobi equation
play in your development?



Relativistic Force
P. Diehr – July 2002

For relativistic momentum we have $\mathbf p = \gamma m \mathbf v$, where $\gamma = \left[ 1 - v^2/c^2 \right]^{-1/2}$ is
the Lorentz contraction factor. This applies to a particle moving with
respect to the lab frame.

Newton’s second law of motion can be correctly written as

$$\mathbf F = \frac{d}{dt}\mathbf p = \frac{d}{dt}\left[ \gamma m \mathbf v \right] = \dot\gamma m \mathbf v + \gamma m \dot{\mathbf v} \tag{1.1}$$

Thus we see that in addition to the direction of acceleration, the force also
acts in the direction of motion. Writing $\mathbf a$ for $\dot{\mathbf v}$, and applying the chain
rule we easily get

$$\dot\gamma = \gamma^3 \left( \frac{\mathbf v}{c} \cdot \frac{\mathbf a}{c} \right) \tag{1.2}$$

which gives

$$\mathbf F = \gamma^3 \frac{m}{c^2}\left( \mathbf v \cdot \mathbf a \right)\mathbf v + \gamma m \mathbf a \tag{1.3}$$

The first term vanishes for velocities where $v/c \ll 1$, and we also have $\gamma \approx 1$,
leaving the traditional form of $\mathbf F = m\mathbf a$. But for relativistic velocities we see
that we cannot find the acceleration directly from the force, for it also
depends upon the velocity.
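Let $\hat u$ be the instantaneous direction of travel and $\hat u_\perp$ be any transverse
direction so that $\hat u \cdot \hat u_\perp = 0$. Using this notation we can resolve the
transverse and longitudinal components of the force:

$$F_\perp = \mathbf F \cdot \hat u_\perp = \gamma m\left( \mathbf a \cdot \hat u_\perp \right) = \gamma m a_\perp \tag{1.4}$$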

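where the first term has vanished since $\mathbf v \cdot \hat u_\perp = 0$. The longitudinal
component starts with two terms:

$$F_\parallel = \mathbf F \cdot \hat u = \gamma^3 \frac{m}{c^2}\left( \mathbf v \cdot \mathbf a \right)\left( \mathbf v \cdot \hat u \right) + \gamma m\left( \mathbf a \cdot \hat u \right) \tag{1.5}$$

We can simplify the first term by using $\mathbf v = v\hat u$, giving:

$$\begin{aligned} F_\parallel &= \gamma^3 m \frac{1}{c^2}\left( v\hat u \cdot \mathbf a \right)\left( v\hat u \cdot \hat u \right) + \gamma m\left( \mathbf a \cdot \hat u \right) \\ &= \gamma^3 m \frac{v^2}{c^2}\left( a_\parallel \right)(1) + \gamma m\left( a_\parallel \right) \\ &= \gamma m\left[ \gamma^2 \frac{v^2}{c^2} + 1 \right] a_\parallel \end{aligned} \tag{1.6}$$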

It is easy to verify that $\gamma^2 \frac{v^2}{c^2} + 1 = \gamma^2$, giving a simplified form for the
longitudinal force:

$$F_\parallel = \gamma^3 m a_\parallel \tag{1.7}$$

Historically, an attempt to retain $F = ma$ for relativistic mechanics led to
the obvious definitions:

$m_\perp = \gamma m$
$m_\parallel = \gamma^3 m$

These are called the transverse and longitudinal masses, and unlike the rest
mass, are obviously not relativistic invariants. This is because forces are
frame-dependent.
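The closed form for the relativistic force can be compared against a direct numerical derivative of the momentum. The sketch below works in two dimensions with units $c = 1$ and $m = 1$; the sample velocity, the test accelerations, and the function names are assumptions made for the illustration.

```python
import math

C = 1.0   # units with c = 1 (an assumption for the illustration)
M = 1.0   # rest mass

def gamma(v):
    """Lorentz factor for a 2-D velocity v = (vx, vy)."""
    return 1.0 / math.sqrt(1.0 - (v[0] ** 2 + v[1] ** 2) / C ** 2)

def momentum(v):
    """Relativistic momentum p = gamma m v."""
    g = gamma(v)
    return (g * M * v[0], g * M * v[1])

def force_formula(v, a):
    """Closed form: F = gamma^3 m (v.a) v / c^2 + gamma m a."""
    g = gamma(v)
    va = v[0] * a[0] + v[1] * a[1]
    return tuple(g ** 3 * M * va * vi / C ** 2 + g * M * ai
                 for vi, ai in zip(v, a))

def force_numeric(v, a, dt=1e-6):
    """dp/dt by a central difference along v(t) = v + a t."""
    vp = (v[0] + a[0] * dt, v[1] + a[1] * dt)
    vm = (v[0] - a[0] * dt, v[1] - a[1] * dt)
    pp, pm = momentum(vp), momentum(vm)
    return tuple((x - y) / (2.0 * dt) for x, y in zip(pp, pm))

v = (0.6, 0.0)                           # motion along x, gamma = 1.25
F_long = force_formula(v, (1.0, 0.0))    # acceleration parallel to v
F_tran = force_formula(v, (0.0, 1.0))    # acceleration perpendicular to v
F_check = force_numeric(v, (0.3, 0.4))   # generic acceleration, numeric dp/dt
F_exact = force_formula(v, (0.3, 0.4))
```

For unit acceleration, `F_long[0]` should equal $\gamma^3 m$ and `F_tran[1]` should equal $\gamma m$, the longitudinal and transverse masses, while `F_check` and `F_exact` should agree to within finite-difference error.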
