This article uses a different convention for vector and matrix calculus from the one often encountered in estimation theory and pattern recognition: here, for example, the derivative of a scalar with respect to a column vector is a row vector. As a result, the equations appear transposed compared with those in textbooks from these fields.

2. Notation
Let M(n,m) denote the space of real n×m matrices with n rows and m columns; such matrices are denoted by bold capital letters A, X, Y, etc. An element of M(n,1), that is, a column vector, is denoted by a boldface lowercase letter a, x, y, etc. An element of M(1,1) is a scalar, denoted in lowercase italic type a, t, x, etc. X^T denotes the matrix transpose, tr(X) the trace, and det(X) the determinant. All functions are assumed to be of differentiability class C^1 unless otherwise noted. Generally, letters from the first half of the alphabet (a, b, c, …) denote constants, and letters from the second half (t, x, y, …) denote variables.

3. Vector calculus
Because the space M(n,1) is identified with the Euclidean space R^n and M(1,1) is identified with R, the notation developed here accommodates the usual operations of vector calculus.

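The tangent vector to a curve x : R → R^n is the column vector
$$\frac{d\mathbf{x}}{dt} = \begin{bmatrix} \frac{dx_1}{dt} \\ \vdots \\ \frac{dx_n}{dt} \end{bmatrix}.$$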

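The gradient of a scalar function f : R^n → R is the row vector
$$\nabla f = \frac{\partial f}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix}.$$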

The directional derivative of f in the direction of v is then
$$\nabla_{\mathbf{v}} f = \frac{\partial f}{\partial \mathbf{x}}\,\mathbf{v}.$$

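The pushforward or differential of a function f : R^m → R^n is described by the Jacobian matrix
$$\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_m} \end{bmatrix}.$$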

The pushforward along f of a vector v in R^m is
$$d\mathbf{f}(\mathbf{v}) = \frac{\partial \mathbf{f}}{\partial \mathbf{x}}\,\mathbf{v}.$$
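These definitions can be checked numerically. The following is a minimal sketch, assuming NumPy is available (the article itself prescribes no software); the helper `jacobian` and the example functions `curve`, `f` and `g` are hypothetical. It approximates each partial derivative by a central difference and arranges the results in this article's layout, so the tangent vector comes out as a column, the gradient as a row, and the Jacobian of a map R^3 → R^2 as a 2×3 matrix.

```python
import numpy as np

def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian in this article's layout:
    entry (i, j) is d f_i / d x_j, so f: R^m -> R^n gives an n x m matrix.
    A scalar-valued f is treated as n = 1, giving a 1 x m row (the gradient)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        diff = np.atleast_1d(f(x + e)) - np.atleast_1d(f(x - e))
        cols.append(diff / (2 * h))
    return np.column_stack(cols)

# Tangent vector of a hypothetical curve x: R -> R^3 (t passed as a length-1 array).
def curve(t):
    return np.array([np.cos(t[0]), np.sin(t[0]), t[0]])

tangent = jacobian(curve, [0.0])        # 3 x 1 column: [0, 1, 1]^T

# Jacobian and pushforward of a hypothetical map f: R^3 -> R^2.
def f(x):
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0] ** 2])

x = np.array([1.0, 2.0, 0.5])
v = np.array([0.3, -1.0, 2.0])
J = jacobian(f, x)                      # 2 x 3: n = 2 rows, m = 3 columns
pushforward = J @ v                     # df(v) = (df/dx) v

# Gradient (1 x 3 row) and directional derivative of a scalar g: R^3 -> R.
def g(x):
    return x @ x + x[0] * x[2]

grad_g = jacobian(g, x)
directional = (grad_g @ v)[0]           # (dg/dx) v, approximately 0.75
```

Central differences keep the sketch dependency-light; they are an approximation with error on the order of 1e-9 here, not an exact derivative.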

4. Matrix calculus
For the purposes of defining derivatives of simple functions, not much changes with matrix spaces: the space of n×m matrices is isomorphic to the vector space R^{nm}. The three derivatives familiar from vector calculus have close analogues here, though beware the complications that arise in the identities below.

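The tangent vector of a curve F : R → M(n,m) is
$$\frac{d\mathbf{F}}{dt} = \begin{bmatrix} \frac{dF_{1,1}}{dt} & \cdots & \frac{dF_{1,m}}{dt} \\ \vdots & \ddots & \vdots \\ \frac{dF_{n,1}}{dt} & \cdots & \frac{dF_{n,m}}{dt} \end{bmatrix}.$$
The gradient of a scalar function f : M(n,m) → R is
$$\frac{\partial f}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial f}{\partial X_{1,1}} & \cdots & \frac{\partial f}{\partial X_{n,1}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial X_{1,m}} & \cdots & \frac{\partial f}{\partial X_{n,m}} \end{bmatrix}.$$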

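Notice that the indexing of the gradient with respect to X is transposed compared with the indexing of X: the gradient is an m×n matrix whose (j,i) entry is ∂f/∂X_{i,j}. The directional derivative of f in the direction of the matrix Y is given by
$$\nabla_{\mathbf{Y}} f = \operatorname{tr}\left(\frac{\partial f}{\partial \mathbf{X}}\,\mathbf{Y}\right).$$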
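A small numerical sketch of the transposed indexing, assuming NumPy; `matrix_gradient` is a hypothetical helper. The test function f(X) = tr(CX), with C a fixed m×n matrix, is chosen because its gradient in this convention is exactly C, so the result should have shape m×n rather than the n×m shape of X, and tr(GY) should reproduce the directional derivative tr(CY).

```python
import numpy as np

def matrix_gradient(f, X, h=1e-6):
    """Gradient of a scalar f: M(n,m) -> R in this article's convention:
    an m x n array whose (j, i) entry is d f / d X[i, j]."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    G = np.zeros((m, n))
    for i in range(n):
        for j in range(m):
            E = np.zeros_like(X)
            E[i, j] = h
            G[j, i] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(0)
n, m = 3, 2
C = rng.standard_normal((m, n))    # hypothetical fixed coefficients
f = lambda X: np.trace(C @ X)      # a linear scalar function of X

X = rng.standard_normal((n, m))
Y = rng.standard_normal((n, m))
G = matrix_gradient(f, X)          # shape (m, n), transposed w.r.t. X
directional = np.trace(G @ Y)      # derivative of f in the direction Y
```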

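The differential or matrix derivative of a function F : M(n,m) → M(p,q) is an element of M(p,q) ⊗ M(m,n), a fourth-rank tensor (the reversal of m and n here indicates the dual space of M(n,m)). In short, it is an m×n matrix each of whose entries is a p×q matrix:
$$\frac{\partial \mathbf{F}}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial \mathbf{F}}{\partial X_{1,1}} & \cdots & \frac{\partial \mathbf{F}}{\partial X_{n,1}} \\ \vdots & \ddots & \vdots \\ \frac{\partial \mathbf{F}}{\partial X_{1,m}} & \cdots & \frac{\partial \mathbf{F}}{\partial X_{n,m}} \end{bmatrix},$$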

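and note that each ∂F/∂X_{i,j} is a p×q matrix defined as above. Note also that this matrix has its indexing transposed: it has m rows and n columns. The pushforward along F of an n×m matrix Y in M(n,m) is then
$$d\mathbf{F}(\mathbf{Y}) = \operatorname{tr}\left(\frac{\partial \mathbf{F}}{\partial \mathbf{X}}\,\mathbf{Y}\right),$$

where the product and trace are taken as formal block matrices, so that the result is the p×q matrix $\sum_{i,j} Y_{i,j}\,\partial \mathbf{F}/\partial X_{i,j}$. Note that this definition encompasses all of the preceding definitions as special cases.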
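The fourth-rank derivative can be stored as a four-dimensional array. The sketch below assumes NumPy; the helper names are hypothetical, and the (m, n, p, q) axis order is one possible choice that mirrors the transposed block indexing (block D[j, i] holds ∂F/∂X_{i,j}). For the linear map F(X) = AXB, the block-matrix pushforward should reproduce AYB.

```python
import numpy as np

def matrix_derivative(F, X, h=1e-6):
    """Derivative of F: M(n,m) -> M(p,q), stored as an array D of shape
    (m, n, p, q): D[j, i] is the p x q block dF/dX[i, j], so the outer
    block index is transposed relative to X."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    p, q = F(X).shape
    D = np.zeros((m, n, p, q))
    for i in range(n):
        for j in range(m):
            E = np.zeros_like(X)
            E[i, j] = h
            D[j, i] = (F(X + E) - F(X - E)) / (2 * h)
    return D

def pushforward(D, Y):
    """tr((dF/dX) Y) as formal block matrices: sum over i, j of D[j, i] * Y[i, j]."""
    return np.einsum('jipq,ij->pq', D, Y)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))    # hypothetical example: F(X) = A X B
B = rng.standard_normal((2, 5))
F = lambda X: A @ X @ B            # maps M(3,2) to M(4,5)

X = rng.standard_normal((3, 2))
Y = rng.standard_normal((3, 2))
D = matrix_derivative(F, X)        # shape (2, 3, 4, 5)
dF_Y = pushforward(D, Y)           # equals A Y B because F is linear
```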

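According to Jan R. Magnus and Heinz Neudecker, the following two notations are both unsuitable, as the determinants of the resulting matrices would have "no interpretation" and "a useful chain rule does not exist" if these notations are used:[1]
1. the block matrix with one block ∂F_{s,t}/∂X for each entry F_{s,t} of F;
2. the block matrix with one block ∂F/∂X_{i,j} for each entry X_{i,j} of X.
The Jacobian matrix, according to Magnus and Neudecker,[1] is
$$D\mathbf{F}(\mathbf{X}) = \frac{\partial \operatorname{vec} \mathbf{F}(\mathbf{X})}{\partial (\operatorname{vec} \mathbf{X})^{\mathsf{T}}},$$
where vec stacks the columns of a matrix into a single column vector, so that DF(X) is a pq×nm matrix.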
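Magnus and Neudecker's Jacobian rests on the vec operator, which stacks columns. A minimal sketch, assuming NumPy and using column-major (Fortran-order) reshaping to implement vec; `vec` and `mn_jacobian` are hypothetical helpers. For F(X) = AXB the standard identity vec(AXB) = (B^T ⊗ A) vec(X) predicts DF(X) = B^T ⊗ A, which the finite-difference result is compared against.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single 1-D vector (column-major order)."""
    return np.asarray(M).reshape(-1, order="F")

def mn_jacobian(F, X, h=1e-6):
    """Central-difference approximation of DF(X) = d vec F(X) / d (vec X)^T.
    For F: M(n,m) -> M(p,q) the result has shape (p*q, n*m)."""
    X = np.asarray(X, dtype=float)
    x = vec(X)
    cols = []
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        F_plus = F((x + e).reshape(X.shape, order="F"))
        F_minus = F((x - e).reshape(X.shape, order="F"))
        cols.append((vec(F_plus) - vec(F_minus)) / (2 * h))
    return np.column_stack(cols)

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 3))    # hypothetical example: F(X) = A X B
B = rng.standard_normal((2, 5))
X = rng.standard_normal((3, 2))

DF = mn_jacobian(lambda X: A @ X @ B, X)   # shape (20, 6)
predicted = np.kron(B.T, A)                 # kron(B^T, A)
```

Unlike the two block arrangements above, DF(X) is an ordinary matrix, so the usual matrix chain rule applies to it directly.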


5. Identities
Matrix multiplication is not commutative, so the order of the factors in these identities must not be changed.

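Chain rule: If z is a function of y, which in turn is a function of x, and these are all column vectors, then
$$\frac{\partial \mathbf{z}}{\partial \mathbf{x}} = \frac{\partial \mathbf{z}}{\partial \mathbf{y}}\,\frac{\partial \mathbf{y}}{\partial \mathbf{x}}.$$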
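A numerical check of the chain rule, assuming NumPy; the maps y(x) = tanh(Ax) and z(y) = [yᵀy, sum of y] are hypothetical examples. The Jacobian of the composition, approximated by central differences, is compared with the product ∂z/∂y · ∂y/∂x of the analytic factors, taken in the same order as in the identity (a 2×4 matrix times a 4×3 matrix).

```python
import numpy as np

def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian; entry (i, j) is d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        cols.append((f(x + e) - f(x - e)) / (2 * h))
    return np.column_stack(cols)

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))                   # hypothetical parameters
y_of_x = lambda x: np.tanh(A @ x)                 # y: R^3 -> R^4
z_of_y = lambda y: np.array([y @ y, y.sum()])     # z: R^4 -> R^2

x = rng.standard_normal(3)
y = y_of_x(x)

# Analytic factors in this article's layout (rows index outputs, columns inputs).
dz_dy = np.vstack([2 * y, np.ones(4)])            # 2 x 4
dy_dx = (1 - y ** 2)[:, None] * A                 # 4 x 3, i.e. diag(1 - tanh^2) A

dz_dx_numeric = jacobian(lambda x: z_of_y(y_of_x(x)), x)   # 2 x 3
dz_dx_chain = dz_dy @ dy_dx                                 # chain rule product
```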

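Product rule: In all cases where the derivatives do not involve tensor products (tensor products arise, for example, when Y has more than one row and X has more than one column),
$$\frac{\partial (\mathbf{Y}\mathbf{Z})}{\partial \mathbf{X}} = \mathbf{Y}\,\frac{\partial \mathbf{Z}}{\partial \mathbf{X}} + \frac{\partial \mathbf{Y}}{\partial \mathbf{X}}\,\mathbf{Z}.$$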
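The simplest case free of tensor products is differentiation with respect to a scalar t, where the product rule reads d(YZ)/dt = Y dZ/dt + (dY/dt) Z. A minimal sketch, assuming NumPy; the curves `Y` (3×2) and `Z` (2×3) are hypothetical. The factors are kept in their original order, since reversing them would not even produce a 3×3 result.

```python
import numpy as np

def derivative(F, t, h=1e-6):
    """Central-difference derivative of a matrix-valued function of a scalar t."""
    return (F(t + h) - F(t - h)) / (2 * h)

# Hypothetical matrix-valued curves: Y(t) is 3 x 2, Z(t) is 2 x 3.
def Y(t):
    return np.array([[np.cos(t), t], [t ** 2, 1.0], [0.0, np.exp(t)]])

def Z(t):
    return np.array([[t, 1.0, np.sin(t)], [2.0, t ** 3, 0.0]])

t0 = 0.7
lhs = derivative(lambda t: Y(t) @ Z(t), t0)                  # d(YZ)/dt, 3 x 3
rhs = Y(t0) @ derivative(Z, t0) + derivative(Y, t0) @ Z(t0)   # product rule
```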

6. Examples
6.1. Derivative of linear functions

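This section lists some commonly used vector derivative formulas for linear functions evaluating to a vector, where A is a constant matrix and b a constant vector:
$$\frac{\partial \mathbf{x}}{\partial \mathbf{x}} = \mathbf{I}, \qquad \frac{\partial (\mathbf{A}\mathbf{x} + \mathbf{b})}{\partial \mathbf{x}} = \mathbf{A}.$$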
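In this article's layout the affine map x ↦ Ax + b has Jacobian A and the identity map has Jacobian I. A minimal numerical check, assuming NumPy; `jacobian` is a hypothetical central-difference helper and A, b are random placeholders.

```python
import numpy as np

def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian; entry (i, j) is d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        cols.append((f(x + e) - f(x - e)) / (2 * h))
    return np.column_stack(cols)

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))   # arbitrary constant matrix
b = rng.standard_normal(4)        # arbitrary constant vector
x = rng.standard_normal(3)

J_affine = jacobian(lambda x: A @ x + b, x)   # expected: A (4 x 3)
J_identity = jacobian(lambda x: x, x)         # expected: I (3 x 3)
```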

6.2. Derivative of quadratic functions

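This section lists some commonly used vector derivative formulas for quadratic expressions evaluating to a scalar:
$$\frac{\partial (\mathbf{x}^{\mathsf{T}}\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = \mathbf{x}^{\mathsf{T}}(\mathbf{A} + \mathbf{A}^{\mathsf{T}}),$$
$$\frac{\partial \left[(\mathbf{A}\mathbf{x} + \mathbf{b})^{\mathsf{T}}\mathbf{C}(\mathbf{D}\mathbf{x} + \mathbf{e})\right]}{\partial \mathbf{x}} = (\mathbf{D}\mathbf{x} + \mathbf{e})^{\mathsf{T}}\mathbf{C}^{\mathsf{T}}\mathbf{A} + (\mathbf{A}\mathbf{x} + \mathbf{b})^{\mathsf{T}}\mathbf{C}\mathbf{D}.$$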

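Related to this is the derivative of the Euclidean norm, valid for x ≠ a:
$$\frac{\partial \lVert \mathbf{x} - \mathbf{a} \rVert}{\partial \mathbf{x}} = \frac{(\mathbf{x} - \mathbf{a})^{\mathsf{T}}}{\lVert \mathbf{x} - \mathbf{a} \rVert}.$$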
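The quadratic-form and norm identities ∂(xᵀAx)/∂x = xᵀ(A + Aᵀ), ∂[(Ax + b)ᵀC(Dx + e)]/∂x = (Dx + e)ᵀCᵀA + (Ax + b)ᵀCD and ∂‖x − a‖/∂x = (x − a)ᵀ/‖x − a‖ can be spot-checked numerically. A minimal sketch, assuming NumPy; `gradient` is a hypothetical helper, the matrices are random placeholders, and NumPy's 1-D arrays stand in for the row vectors of this article's layout (they do not distinguish rows from columns).

```python
import numpy as np

def gradient(f, x, h=1e-6):
    """Gradient of a scalar f by central differences, returned as a 1-D array
    that represents the row vector df/dx of this article's layout."""
    x = np.asarray(x, dtype=float)
    g = np.zeros(x.size)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        g[j] = (f(x + step) - f(x - step)) / (2 * h)
    return g

rng = np.random.default_rng(5)
x = rng.standard_normal(3)

# Quadratic form x^T A x with a square, not necessarily symmetric, A.
A_sq = rng.standard_normal((3, 3))
quad_num = gradient(lambda x: x @ A_sq @ x, x)
quad_formula = x @ (A_sq + A_sq.T)                   # x^T (A + A^T)

# Bilinear form (Ax + b)^T C (Dx + e) with compatible but arbitrary shapes.
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
C = rng.standard_normal((4, 2))
D = rng.standard_normal((2, 3))
e = rng.standard_normal(2)
bil_num = gradient(lambda x: (A @ x + b) @ C @ (D @ x + e), x)
bil_formula = (D @ x + e) @ C.T @ A + (A @ x + b) @ C @ D

# Euclidean norm ||x - a||, valid for x != a.
a = rng.standard_normal(3)
norm_num = gradient(lambda x: np.linalg.norm(x - a), x)
norm_formula = (x - a) / np.linalg.norm(x - a)
```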

6.3. Derivative of matrix traces

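This section shows examples of matrix differentiation of common trace expressions:
$$\frac{\partial \operatorname{tr}(\mathbf{A}\mathbf{X}\mathbf{B})}{\partial \mathbf{X}} = \mathbf{B}\mathbf{A}, \qquad \frac{\partial \operatorname{tr}(\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{B})}{\partial \mathbf{X}} = \mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}}.$$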
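With the transposed gradient convention, ∂tr(AXB)/∂X = BA and ∂tr(AXᵀB)/∂X = AᵀBᵀ, both m×n for an n×m matrix X. A minimal numerical check, assuming NumPy; `matrix_gradient` is a hypothetical helper and all matrices are random placeholders with shapes chosen so the products are defined.

```python
import numpy as np

def matrix_gradient(f, X, h=1e-6):
    """Gradient of scalar f w.r.t. X in this article's layout:
    shape (m, n) for X of shape (n, m), with G[j, i] = df/dX[i, j]."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    G = np.zeros((m, n))
    for i in range(n):
        for j in range(m):
            E = np.zeros_like(X)
            E[i, j] = h
            G[j, i] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(6)
n, m, k = 3, 2, 4
X = rng.standard_normal((n, m))

# tr(A X B) with A: k x n and B: m x k.
A = rng.standard_normal((k, n))
B = rng.standard_normal((m, k))
g1 = matrix_gradient(lambda X: np.trace(A @ X @ B), X)        # expected: B A

# tr(A X^T B) with A2: k x m and B2: n x k (A2, B2 play the roles of A, B).
A2 = rng.standard_normal((k, m))
B2 = rng.standard_normal((n, k))
g2 = matrix_gradient(lambda X: np.trace(A2 @ X.T @ B2), X)    # expected: A2^T B2^T
```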

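6.4. Derivative of matrix determinant
For an invertible square matrix X:
$$\frac{\partial \det(\mathbf{X})}{\partial \mathbf{X}} = \det(\mathbf{X})\,\mathbf{X}^{-1}, \qquad \frac{\partial \ln\lvert\det(\mathbf{X})\rvert}{\partial \mathbf{X}} = \mathbf{X}^{-1}.$$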
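In this article's layout the gradient of det(X) is det(X) X⁻¹ (no transpose), and the gradient of ln|det(X)| is X⁻¹. A minimal numerical check, assuming NumPy; `matrix_gradient` is a hypothetical helper, and the test matrix is shifted by 3I only to keep it comfortably invertible.

```python
import numpy as np

def matrix_gradient(f, X, h=1e-6):
    """Gradient of scalar f w.r.t. X in this article's layout:
    shape (m, n) for X of shape (n, m), with G[j, i] = df/dX[i, j]."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    G = np.zeros((m, n))
    for i in range(n):
        for j in range(m):
            E = np.zeros_like(X)
            E[i, j] = h
            G[j, i] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(9)
X = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # invertible test matrix

g_det = matrix_gradient(np.linalg.det, X)                          # det(X) X^{-1}
g_logdet = matrix_gradient(lambda X: np.linalg.slogdet(X)[1], X)   # X^{-1}
X_inv = np.linalg.inv(X)
```

`np.linalg.slogdet` returns the sign and the logarithm of the absolute value of the determinant, so index 1 is ln|det(X)|.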

7. Relation to other derivatives

The matrix derivative is a convenient notation for keeping track of partial derivatives during calculations. The Fréchet derivative is the standard way, in functional analysis, to take derivatives with respect to vectors. When a matrix function of a matrix is Fréchet differentiable, the two derivatives agree up to a change of notation. As is generally the case for partial derivatives, some formulae may remain valid under weaker analytic conditions than the existence of the derivative as an approximating linear map.

8. Usages
Matrix calculus is used for deriving optimal stochastic estimators, often involving the use of Lagrange multipliers. This includes the derivation of the Kalman filter, the Wiener filter, and the expectation-maximization algorithm for Gaussian mixtures.

9. Alternatives
The tensor index notation with its Einstein summation convention is very similar to matrix calculus, except that one writes only a single component at a time. It has the advantage that one can easily manipulate tensors of arbitrarily high rank, whereas tensors of rank higher than two are quite unwieldy in matrix notation. Note that a matrix can be considered simply a tensor of rank two.
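Index notation maps directly onto NumPy's `einsum`, whose subscript strings follow the Einstein summation convention. A minimal sketch, assuming NumPy; the matrices are random placeholders. Writing F(X) = AXB componentwise as F_st = A_si X_ij B_jt gives the rank-4 derivative ∂F_st/∂X_ij = A_si B_jt without any block-matrix bookkeeping, and contracting it with a direction Y recovers AYB. The scalar case f = tr(CXD) = C_ki X_ij D_jk gives ∂f/∂X_ij = (DC)_ji, whose transpose is the DC of this article's convention.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((2, 5))
Y = rng.standard_normal((3, 2))

# F_st = A_si X_ij B_jt (summed over i, j), so dF_st / dX_ij = A_si B_jt.
dF = np.einsum('si,jt->stij', A, B)        # rank-4 tensor, shape (4, 5, 3, 2)

# Directional derivative: contract the X indices i, j with Y.
dF_Y = np.einsum('stij,ij->st', dF, Y)     # equals A Y B

# Scalar case f = tr(C X D) = C_ki X_ij D_jk, so df/dX_ij = D_jk C_ki.
C = rng.standard_normal((4, 3))            # shape (k, n)
D = rng.standard_normal((2, 4))            # shape (m, k)
df_dX = np.einsum('ki,jk->ij', C, D)       # indexed like X, shape (3, 2)
```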

10. See also

Derivative (generalizations)

