Properties of the Trace and Matrix Derivatives

John Duchi

Contents
1 Notation
2 Matrix multiplication
3 Gradient of linear function
4 Derivative in a trace
5 Derivative of product in trace
6 Derivative of function of a matrix
7 Derivative of linear transformed input to function
8 Funky trace derivative
9 Symmetric Matrices and Eigenvectors

Notation

A few things on notation (which may not be very consistent, actually): The columns of a matrix $A \in \mathbb{R}^{m \times n}$ are $a_1$ through $a_n$, while the rows are given (as vectors) by $\bar{a}_1^T$ through $\bar{a}_m^T$.

Matrix multiplication
First, consider a matrix $A \in \mathbb{R}^{n \times n}$. We have that

$$AA^T = \sum_{i=1}^n a_i a_i^T,$$

that is, the product $AA^T$ is the sum of the outer products of the columns of $A$. To see this, consider that

$$(AA^T)_{ij} = \sum_{p=1}^n a_{pi} a_{pj}$$

because the $i,j$ element is the $i$th row of $A$, which is the vector $(a_{1i}, a_{2i}, \ldots, a_{ni})$, dotted with the $j$th column of $A^T$, which is $(a_{1j}, \ldots, a_{nj})$.

If we look at the matrix $AA^T$, we see that

$$AA^T = \begin{bmatrix} \sum_{p=1}^n a_{p1} a_{p1} & \cdots & \sum_{p=1}^n a_{p1} a_{pn} \\ \vdots & \ddots & \vdots \\ \sum_{p=1}^n a_{pn} a_{p1} & \cdots & \sum_{p=1}^n a_{pn} a_{pn} \end{bmatrix} = \sum_{i=1}^n \begin{bmatrix} a_{i1} a_{i1} & \cdots & a_{i1} a_{in} \\ \vdots & \ddots & \vdots \\ a_{in} a_{i1} & \cdots & a_{in} a_{in} \end{bmatrix} = \sum_{i=1}^n a_i a_i^T.$$
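
As a quick sanity check (my addition, not part of the original notes), the following numpy snippet verifies the identity numerically; the matrix size and random seed are arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    A = rng.standard_normal((n, n))

    # Sum of the outer products a_i a_i^T of the columns of A.
    outer_sum = sum(np.outer(A[:, i], A[:, i]) for i in range(n))

    # Should agree with the ordinary matrix product A A^T.
    assert np.allclose(A @ A.T, outer_sum)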

Gradient of linear function

Consider $Ax$, where $A \in \mathbb{R}^{m \times n}$ and $x \in \mathbb{R}^n$. We have

$$\nabla_x Ax = \nabla_x \begin{bmatrix} \bar{a}_1^T x \\ \bar{a}_2^T x \\ \vdots \\ \bar{a}_m^T x \end{bmatrix} = \begin{bmatrix} \bar{a}_1 & \bar{a}_2 & \cdots & \bar{a}_m \end{bmatrix} = A^T.$$

Now let us consider $x^T A x$ for $A \in \mathbb{R}^{n \times n}$, $x \in \mathbb{R}^n$. We have that

$$x^T A x = x^T \begin{bmatrix} \bar{a}_1^T x & \bar{a}_2^T x & \cdots & \bar{a}_n^T x \end{bmatrix}^T = x_1 \bar{a}_1^T x + \cdots + x_n \bar{a}_n^T x.$$

If we take the derivative with respect to one of the $x_l$'s, we have the $l$th component of each $\bar{a}_i$, which is to say $a_{il}$, plus the term from $x_l \bar{a}_l^T x$, which gives us

$$\frac{\partial}{\partial x_l} x^T A x = \sum_{i=1}^n x_i a_{il} + \bar{a}_l^T x = a_l^T x + \bar{a}_l^T x.$$

In the end, we see that

$$\nabla_x x^T A x = Ax + A^T x.$$
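
As an aside (again mine, not in the original), a central-difference check of $\nabla_x x^T A x = Ax + A^T x$; the step size $h$ is an arbitrary choice:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    A = rng.standard_normal((n, n))
    x = rng.standard_normal(n)
    h = 1e-6

    def f(v):
        return v @ A @ v  # the quadratic form x^T A x

    # Central differences along each coordinate direction e_i.
    grad_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])

    assert np.allclose(grad_fd, A @ x + A.T @ x, atol=1e-5)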

Derivative in a trace

Recall (as in Old and New Matrix Algebra Useful for Statistics) that we can define the differential of a function $f(x)$ to be the part of $f(x + dx) - f(x)$ that is linear in $dx$, i.e., a constant times $dx$. Then, for example, for a vector-valued function $f$, we can have

$$f(x + dx) = f(x) + f'(x)\,dx + (\text{higher order terms}).$$

In the above, $f'$ is the derivative (or Jacobian). Note that the gradient is the transpose of the Jacobian.

Consider an arbitrary matrix $A$. With $dx_i$ the $i$th column of $dX$, the diagonal entries of $A\,dX$ are $\bar{a}_i^T dx_i$, so we see that

$$\frac{\operatorname{tr}(A\,dX)}{dX} = \frac{\sum_{i=1}^n \bar{a}_i^T dx_i}{dX}.$$

Thus, we have

$$\left[\frac{\operatorname{tr}(A\,dX)}{dX}\right]_{ij} = \frac{\partial \sum_{i=1}^n \bar{a}_i^T dx_i}{\partial\,dx_{ji}} = a_{ij}$$

so that

$$\frac{\operatorname{tr}(A\,dX)}{dX} = A.$$

Note that this is the Jacobian formulation.
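
A small numerical aside (mine): entrywise finite differences of $\operatorname{tr}(AX)$ recover the gradient, which is the transpose $A^T$ of the Jacobian $A$ above:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3
    A = rng.standard_normal((n, n))
    X = rng.standard_normal((n, n))
    h = 1e-6

    def f(X):
        return np.trace(A @ X)

    # The (i, j) finite difference is d tr(AX) / d X_ij = A_ji,
    # so the gradient is A^T (the transpose of the Jacobian A).
    grad_fd = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            dX = np.zeros((n, n))
            dX[i, j] = h
            grad_fd[i, j] = (f(X + dX) - f(X - dX)) / (2 * h)

    assert np.allclose(grad_fd, A.T, atol=1e-6)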

Derivative of product in trace


In this section, we prove that

$$\nabla_A \operatorname{tr}(AB) = B^T.$$

Writing $\bar{a}_i^T$ for the rows of $A \in \mathbb{R}^{n \times m}$ and $b_i$ for the columns of $B \in \mathbb{R}^{m \times n}$, we have

$$\operatorname{tr}(AB) = \operatorname{tr}\left( \begin{bmatrix} \bar{a}_1^T \\ \bar{a}_2^T \\ \vdots \\ \bar{a}_n^T \end{bmatrix} \begin{bmatrix} b_1 & b_2 & \cdots & b_n \end{bmatrix} \right) = \operatorname{tr} \begin{bmatrix} \bar{a}_1^T b_1 & \bar{a}_1^T b_2 & \cdots & \bar{a}_1^T b_n \\ \bar{a}_2^T b_1 & \bar{a}_2^T b_2 & \cdots & \bar{a}_2^T b_n \\ \vdots & \vdots & \ddots & \vdots \\ \bar{a}_n^T b_1 & \bar{a}_n^T b_2 & \cdots & \bar{a}_n^T b_n \end{bmatrix}$$

$$= \sum_{i=1}^m a_{1i} b_{i1} + \sum_{i=1}^m a_{2i} b_{i2} + \cdots + \sum_{i=1}^m a_{ni} b_{in},$$

so that

$$\frac{\partial \operatorname{tr}(AB)}{\partial a_{ij}} = b_{ji} \qquad\text{and hence}\qquad \nabla_A \operatorname{tr}(AB) = B^T.$$
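
One can spot-check this too (my addition): perturbing a single, arbitrarily chosen entry $a_{ij}$ changes $\operatorname{tr}(AB)$ at rate $b_{ji}$:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 3, 4
    A = rng.standard_normal((n, m))
    B = rng.standard_normal((m, n))
    h, i, j = 1e-6, 1, 2  # step size and one probe entry, both arbitrary

    dA = np.zeros((n, m))
    dA[i, j] = h
    # tr(AB) changes at rate b_ji as a_ij varies, i.e. grad_A tr(AB) = B^T.
    rate = (np.trace((A + dA) @ B) - np.trace((A - dA) @ B)) / (2 * h)
    assert np.isclose(rate, B[j, i], atol=1e-6)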

Derivative of function of a matrix


Here we prove that

$$\nabla_{A^T} f(A) = \left( \nabla_A f(A) \right)^T.$$

Since the $i,j$ entry of $\nabla_{A^T} f(A)$ is $\partial f(A) / \partial A_{ji}$, writing out the entries gives

$$\nabla_{A^T} f(A) = \begin{bmatrix} \frac{\partial f(A)}{\partial A_{11}} & \frac{\partial f(A)}{\partial A_{21}} & \cdots & \frac{\partial f(A)}{\partial A_{n1}} \\ \frac{\partial f(A)}{\partial A_{12}} & \frac{\partial f(A)}{\partial A_{22}} & \cdots & \frac{\partial f(A)}{\partial A_{n2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f(A)}{\partial A_{1n}} & \frac{\partial f(A)}{\partial A_{2n}} & \cdots & \frac{\partial f(A)}{\partial A_{nn}} \end{bmatrix} = \left( \nabla_A f(A) \right)^T.$$

Derivative of linear transformed input to function

Consider a function $f : \mathbb{R}^n \to \mathbb{R}$. Suppose we have a matrix $A \in \mathbb{R}^{n \times m}$ and a vector $x \in \mathbb{R}^m$. We wish to compute $\nabla_x f(Ax)$. By the chain rule, we have

$$\frac{\partial f(Ax)}{\partial x_i} = \sum_{k=1}^n \frac{\partial f(Ax)}{\partial (Ax)_k} \frac{\partial (Ax)_k}{\partial x_i} = \sum_{k=1}^n \frac{\partial f(Ax)}{\partial (Ax)_k} \frac{\partial (\bar{a}_k^T x)}{\partial x_i} = \sum_{k=1}^n \frac{\partial f(Ax)}{\partial (Ax)_k} a_{ki} = a_i^T \nabla f(Ax).$$

As such,

$$\nabla_x f(Ax) = A^T \nabla f(Ax).$$

Now, if we would like to get the second derivative of this function (third derivatives would be a little nice, but I do not like tensors), we have

$$\frac{\partial^2 f(Ax)}{\partial x_i \partial x_j} = \frac{\partial}{\partial x_j} \sum_{k=1}^n a_{ki} \frac{\partial f(Ax)}{\partial (Ax)_k} = \sum_{l=1}^n \sum_{k=1}^n a_{ki} a_{lj} \frac{\partial^2 f(Ax)}{\partial (Ax)_k \partial (Ax)_l} = a_i^T \nabla^2 f(Ax)\, a_j.$$

From this, it is easy to see that

$$\nabla_x^2 f(Ax) = A^T \nabla^2 f(Ax) A.$$
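
To make this concrete (my addition, with the arbitrary choice $f(y) = \sum_k e^{y_k}$, so that $\nabla f(y) = e^y$ and $\nabla^2 f(y) = \operatorname{diag}(e^y)$), finite differences confirm both formulas:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 4, 3
    A = rng.standard_normal((n, m))
    x = rng.standard_normal(m)
    h = 1e-4

    def f(v):
        return np.sum(np.exp(A @ v))

    # Gradient: central differences vs. A^T grad f(Ax).
    grad_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(m)])
    assert np.allclose(grad_fd, A.T @ np.exp(A @ x), atol=1e-5)

    # Hessian: second differences vs. A^T hess f(Ax) A.
    hess = A.T @ np.diag(np.exp(A @ x)) @ A
    hess_fd = np.array([
        [(f(x + h * (ei + ej)) - f(x + h * (ei - ej))
          - f(x - h * (ei - ej)) + f(x - h * (ei + ej))) / (4 * h * h)
         for ej in np.eye(m)]
        for ei in np.eye(m)])
    assert np.allclose(hess_fd, hess, atol=1e-4)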

Funky trace derivative


In this section, we prove that

$$\nabla_A \operatorname{tr}(ABA^T C) = CAB + C^T A B^T.$$

In this bit, let us have $AB = f(A)$, where $f$ is matrix-valued. Using the product rule, with $\circ$ marking the copy of $A$ being differentiated (and set equal to $A$ afterwards), we have

$$\begin{aligned}
\nabla_A \operatorname{tr}(ABA^T C) &= \nabla_A \operatorname{tr}\left(f(A)\, A^T C\right) \\
&= \nabla_{\circ} \operatorname{tr}\left(f(\circ)\, A^T C\right) + \nabla_{\circ} \operatorname{tr}\left(f(A)\, \circ^T C\right) \\
&= (A^T C)^T f'(\circ) + \left( \nabla_{\circ^T} \operatorname{tr}\left(f(A)\, \circ^T C\right) \right)^T \\
&= C^T A B^T + \left( \nabla_{\circ^T} \operatorname{tr}\left(\circ^T C f(A)\right) \right)^T \\
&= C^T A B^T + \left( (C f(A))^T \right)^T \\
&= C^T A B^T + CAB.
\end{aligned}$$
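
Since the bookkeeping here is easy to get wrong, a finite-difference check (my addition) of the final identity:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))
    C = rng.standard_normal((n, n))
    h = 1e-6

    def f(A):
        return np.trace(A @ B @ A.T @ C)

    grad_fd = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            dA = np.zeros((n, n))
            dA[i, j] = h
            grad_fd[i, j] = (f(A + dA) - f(A - dA)) / (2 * h)

    # Matches grad_A tr(A B A^T C) = C A B + C^T A B^T.
    assert np.allclose(grad_fd, C @ A @ B + C.T @ A @ B.T, atol=1e-5)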

Symmetric Matrices and Eigenvectors

In this section we prove that for a symmetric matrix $A \in \mathbb{R}^{n \times n}$, all the eigenvalues are real, and the eigenvectors of $A$ form an orthonormal basis of $\mathbb{R}^n$.

First, we prove that the eigenvalues are real. Suppose $Ax = \lambda x$ for some $\lambda$ and $x \neq 0$, possibly complex. Then

$$\bar{\lambda}\, \bar{x}^T x = (A\bar{x})^T x = \bar{x}^T A^T x = \bar{x}^T A x = \lambda\, \bar{x}^T x,$$

and since $\bar{x}^T x \neq 0$, we must have $\bar{\lambda} = \lambda$. Thus, all the eigenvalues are real.

Now, suppose we have at least one eigenvector $v \neq 0$ of $A$, with $Av = \lambda v$. Consider the space $W$ of vectors orthogonal to $v$. We then have that, for $w \in W$,

$$(Aw)^T v = w^T A^T v = w^T A v = \lambda\, w^T v = 0.$$

Thus, the vectors in $W$, when transformed by $A$, are still orthogonal to $v$, so if we have an original eigenvector $v$ of $A$, a simple inductive argument (applying the same reasoning to $A$ restricted to $W$) shows that there is an orthonormal set of eigenvectors. To see that there is at least one eigenvector, consider the characteristic polynomial of $A$: $\chi_A(\lambda) = \det(A - \lambda I)$. The field $\mathbb{C}$ is algebraically closed, so there is at least one complex root $r$; then $A - rI$ is singular and there is a vector $v \neq 0$ that is an eigenvector of $A$. By the argument above, $r$ is real, so we have the base case for our induction, and the proof is complete.
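
Numerically (my addition), numpy's symmetric eigensolver exhibits exactly these properties:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    M = rng.standard_normal((n, n))
    A = (M + M.T) / 2  # an arbitrary symmetric matrix

    # eigh is numpy's routine for symmetric matrices: real eigenvalues,
    # orthonormal eigenvectors (the columns of V).
    vals, V = np.linalg.eigh(A)

    assert np.all(np.isreal(vals))                # real eigenvalues
    assert np.allclose(V.T @ V, np.eye(n))        # orthonormal basis
    assert np.allclose(A @ V, V @ np.diag(vals))  # A v_i = lambda_i v_i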
