Vector - Matrix Calculus
In neural networks we often encounter problems involving functions of several variables. Vector/matrix calculus extends the calculus of a single variable to functions of a vector or a matrix of variables.

Vector Gradient: Let $g(\mathbf{w})$ be a differentiable scalar function of $m$ variables, where
\[
\mathbf{w} = [w_1, \ldots, w_m]^T.
\]
Then the vector gradient of $g(\mathbf{w})$ w.r.t. $\mathbf{w}$ is the $m$-dimensional vector of partial derivatives of $g$:
\[
\nabla g = \nabla_{\mathbf{w}} g = \frac{\partial g}{\partial \mathbf{w}} =
\begin{bmatrix}
\dfrac{\partial g}{\partial w_1} \\ \vdots \\ \dfrac{\partial g}{\partial w_m}
\end{bmatrix}.
\]
The Hessian of $g(\mathbf{w})$ is the $m \times m$ matrix of second partial derivatives:
\[
\nabla^2 g =
\begin{bmatrix}
\dfrac{\partial^2 g}{\partial w_1^2} & \cdots & \dfrac{\partial^2 g}{\partial w_1 \partial w_m} \\
\vdots & & \vdots \\
\dfrac{\partial^2 g}{\partial w_m \partial w_1} & \cdots & \dfrac{\partial^2 g}{\partial w_m^2}
\end{bmatrix}.
\]
For a vector-valued function $\mathbf{g}(\mathbf{w}) = [g_1(\mathbf{w}), \ldots, g_n(\mathbf{w})]^T$ the Jacobian matrix is
\[
\frac{\partial \mathbf{g}}{\partial \mathbf{w}} =
\begin{bmatrix}
\dfrac{\partial g_1}{\partial w_1} & \cdots & \dfrac{\partial g_n}{\partial w_1} \\
\vdots & & \vdots \\
\dfrac{\partial g_1}{\partial w_m} & \cdots & \dfrac{\partial g_n}{\partial w_m}
\end{bmatrix}.
\]
In this vector convention the columns of the Jacobian matrix are the gradients of the corresponding component functions $g_i(\mathbf{w})$ w.r.t. the vector $\mathbf{w}$.
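As a quick numerical sanity check of these definitions, the sketch below (assuming NumPy; the test function $g$, the evaluation point, and the step size are arbitrary illustrative choices, not part of the original notes) approximates the vector gradient and the Hessian by central differences.

```python
import numpy as np

def num_grad(g, w, h=1e-5):
    """Central-difference approximation of the vector gradient of a scalar g(w)."""
    grad = np.zeros_like(w, dtype=float)
    for i in range(w.size):
        e = np.zeros_like(w, dtype=float)
        e[i] = h
        grad[i] = (g(w + e) - g(w - e)) / (2 * h)
    return grad

def num_hessian(g, w, h=1e-5):
    """Hessian approximated by differentiating the numerical gradient column by column."""
    m = w.size
    H = np.zeros((m, m))
    for j in range(m):
        e = np.zeros_like(w, dtype=float)
        e[j] = h
        H[:, j] = (num_grad(g, w + e, h) - num_grad(g, w - e, h)) / (2 * h)
    return H

# Illustrative scalar function of three variables.
g = lambda w: w[0] ** 2 + 3 * w[0] * w[1] + np.sin(w[2])
w0 = np.array([1.0, -2.0, 0.5])
print(num_grad(g, w0))     # approx. [2*w1 + 3*w2, 3*w1, cos(w3)]
print(num_hessian(g, w0))  # approx. [[2, 3, 0], [3, 0, 0], [0, 0, -sin(w3)]]
```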
Differentiation Rules: The differentiation rules are analogous to those for ordinary functions of one variable.
Example: Consider
\[
g(\mathbf{w}) = \sum_{i=1}^{m} a_i w_i = \mathbf{a}^T \mathbf{w},
\]
where $\mathbf{a}$ is a constant vector. Then
\[
\nabla g = \begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix} = \mathbf{a},
\qquad \text{i.e.} \qquad
\frac{\partial\, \mathbf{a}^T \mathbf{w}}{\partial \mathbf{w}} = \mathbf{a}.
\]

Example: Let $\mathbf{w} = (w_1, w_2, w_3)^T \in \mathbb{R}^3$ and
\[
g(\mathbf{w}) = 2 w_1 + 5 w_2 + 12 w_3 = \begin{bmatrix} 2 & 5 & 12 \end{bmatrix} \mathbf{w},
\]
so $\nabla g = (2, 5, 12)^T$.
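A minimal numerical check of this identity, assuming NumPy (the particular evaluation point is an arbitrary illustrative value):

```python
import numpy as np

a = np.array([2.0, 5.0, 12.0])           # the constant vector from the example above
w = np.random.default_rng(0).normal(size=3)

g = lambda w: a @ w                      # g(w) = a^T w

# Finite-difference gradient: component i should be approximately a_i.
h = 1e-6
grad = np.array([(g(w + h * np.eye(3)[i]) - g(w - h * np.eye(3)[i])) / (2 * h)
                 for i in range(3)])
print(grad)   # -> approximately [2., 5., 12.] = a
```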
Example: Consider
\[
g(\mathbf{w}) = \sum_{i=1}^{m} \sum_{j=1}^{m} a_{ij} w_i w_j = \mathbf{w}^T A \mathbf{w}.
\]
Taking partial derivatives w.r.t. each component of $\mathbf{w}$ gives
\[
\nabla g =
\begin{bmatrix}
\sum_{j=1}^{m} w_j a_{1j} + \sum_{i=1}^{m} w_i a_{i1} \\
\vdots \\
\sum_{j=1}^{m} w_j a_{mj} + \sum_{i=1}^{m} w_i a_{im}
\end{bmatrix},
\]
which equals $(A + A^T)\,\mathbf{w}$.
Example: Let $\mathbf{w} = (w_1, w_2)^T \in \mathbb{R}^2$ and
\[
g(\mathbf{w}) = 3 w_1 w_1 + 2 w_1 w_2 + 6 w_2 w_1 + 5 w_2 w_2
= \begin{bmatrix} w_1 & w_2 \end{bmatrix}
\begin{bmatrix} 3 & 2 \\ 6 & 5 \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}.
\]
Then
\[
\frac{\partial g}{\partial w_1} = \frac{\partial}{\partial w_1}\bigl(3 w_1 w_1 + 2 w_1 w_2 + 6 w_2 w_1 + 5 w_2 w_2\bigr)
= 3 \cdot 2 w_1 + 2 w_2 + 6 w_2 + 0 = 6 w_1 + 8 w_2,
\]
\[
\frac{\partial g}{\partial w_2} = 0 + 2 w_1 + 6 w_1 + 5 \cdot 2 w_2 = 8 w_1 + 10 w_2,
\]
and so in the vector notation
\[
\frac{\partial\, \mathbf{w}^T A \mathbf{w}}{\partial \mathbf{w}}
= \begin{bmatrix} 3 & 2 \\ 6 & 5 \end{bmatrix}\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
+ \begin{bmatrix} 3 & 6 \\ 2 & 5 \end{bmatrix}\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
= \begin{bmatrix} 6 & 8 \\ 8 & 10 \end{bmatrix}\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}.
\]
The Hessian of $g(\mathbf{w})$ is
\[
\nabla^2\, \mathbf{w}^T A \mathbf{w}
= \frac{\partial}{\partial \mathbf{w}}\left(\frac{\partial\, \mathbf{w}^T A \mathbf{w}}{\partial \mathbf{w}}\right)
= \begin{bmatrix} 2 \cdot 3 & 2 + 6 \\ 6 + 2 & 2 \cdot 5 \end{bmatrix}
= \begin{bmatrix} 6 & 8 \\ 8 & 10 \end{bmatrix}.
\]
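The same example can be verified numerically. The sketch below, assuming NumPy (the test point $\mathbf{w}$ is an arbitrary choice), compares a finite-difference gradient of $\mathbf{w}^T A \mathbf{w}$ with $(A + A^T)\mathbf{w}$ and prints the Hessian $A + A^T$.

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [6.0, 5.0]])
w = np.array([1.0, -2.0])                 # arbitrary test point

g = lambda w: w @ A @ w                   # g(w) = w^T A w

h = 1e-6
grad_fd = np.array([(g(w + h * np.eye(2)[i]) - g(w - h * np.eye(2)[i])) / (2 * h)
                    for i in range(2)])

print(grad_fd)            # approx. [6*w1 + 8*w2, 8*w1 + 10*w2]
print((A + A.T) @ w)      # exact gradient (A + A^T) w
print(A + A.T)            # Hessian: [[6, 8], [8, 10]]
```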
Matrix Gradient: Consider a scalar-valued function $g(W)$ of the $m \times n$ matrix $W = \{w_{ij}\}$ (e.g. the determinant of a matrix). The matrix gradient w.r.t. $W$ is a matrix of the same dimension as $W$, consisting of the partial derivatives of $g(W)$ w.r.t. the components of $W$:
\[
\frac{\partial g}{\partial W} =
\begin{bmatrix}
\dfrac{\partial g}{\partial w_{11}} & \cdots & \dfrac{\partial g}{\partial w_{1n}} \\
\vdots & & \vdots \\
\dfrac{\partial g}{\partial w_{m1}} & \cdots & \dfrac{\partial g}{\partial w_{mn}}
\end{bmatrix}.
\]
Example: Consider
\[
g(W) = \sum_{i=1}^{m} \sum_{j=1}^{m} w_{ij} a_i a_j = \mathbf{a}^T W \mathbf{a},
\]
where $\mathbf{a}$ is a constant vector, whereas $W$ is a (square) matrix of variables. Taking the gradient w.r.t. $W$ yields
\[
\frac{\partial\, \mathbf{a}^T W \mathbf{a}}{\partial w_{ij}} = a_i a_j .
\]
Thus, in the matrix form,
\[
\frac{\partial\, \mathbf{a}^T W \mathbf{a}}{\partial W} = \mathbf{a}\mathbf{a}^T.
\]
Example: Let
\[
g(W) = 9 w_{11} + 6 w_{21} + 6 w_{12} + 4 w_{22}
= \begin{bmatrix} 3 & 2 \end{bmatrix}
\begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix}
\begin{bmatrix} 3 \\ 2 \end{bmatrix}.
\]
Thus we have
\[
\frac{\partial g}{\partial w_{11}} = \frac{\partial}{\partial w_{11}}\bigl(9 w_{11} + 6 w_{21} + 6 w_{12} + 4 w_{22}\bigr) = 9,
\qquad
\frac{\partial g}{\partial w_{12}} = 6,
\]
\[
\frac{\partial g}{\partial w_{21}} = 6,
\qquad
\frac{\partial g}{\partial w_{22}} = 4,
\]
hence
\[
\frac{\partial g}{\partial W} =
\begin{bmatrix} 9 & 6 \\ 6 & 4 \end{bmatrix}
= \begin{bmatrix} 3 \\ 2 \end{bmatrix} \begin{bmatrix} 3 & 2 \end{bmatrix}
= \mathbf{a}\mathbf{a}^T.
\]
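This matrix gradient can be verified entry by entry with finite differences, as in the rough sketch below (NumPy assumed; the matrix $W$ at which the gradient is evaluated is an arbitrary choice).

```python
import numpy as np

a = np.array([3.0, 2.0])
W = np.array([[1.0, 4.0],
              [-2.0, 0.5]])              # arbitrary point at which to evaluate the gradient

g = lambda W: a @ W @ a                   # g(W) = a^T W a

# Perturb each entry w_ij separately and collect the matrix of partial derivatives.
h = 1e-6
grad = np.zeros_like(W)
for i in range(2):
    for j in range(2):
        E = np.zeros_like(W)
        E[i, j] = h
        grad[i, j] = (g(W + E) - g(W - E)) / (2 * h)

print(grad)               # approx. [[9, 6], [6, 4]]
print(np.outer(a, a))     # a a^T, the analytic matrix gradient
```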
Example: Let $W$ be an invertible square matrix of dimension $m$ with determinant $\det(W)$. Then
\[
\frac{\partial \det W}{\partial W} = (W^T)^{-1} \det W .
\]
Recall that
\[
W^{-1} = \frac{1}{\det W}\,\operatorname{adj}(W).
\]
In the above, $\operatorname{adj}(W)$ is the adjoint of $W$:
\[
\operatorname{adj}(W) =
\begin{bmatrix}
W_{11} & \cdots & W_{m1} \\
\vdots & & \vdots \\
W_{1m} & \cdots & W_{mm}
\end{bmatrix},
\]
where $W_{ij}$ is a cofactor, obtained by multiplying the term $(-1)^{i+j}$ by the determinant of the matrix obtained from $W$ by removing the $i$-th row and the $j$-th column. Recall that the determinant of $W$ can also be obtained using cofactors:
\[
\det W = \sum_{k=1}^{m} w_{ik} W_{ik} .
\]
In the above formula $i$ denotes an arbitrary row. Now taking the derivative of $\det(W)$ w.r.t. $w_{ij}$ gives
\[
\frac{\partial \det(W)}{\partial w_{ij}} = W_{ij},
\]
but from the definition of the matrix gradient it then follows that
\[
\frac{\partial \det(W)}{\partial W} = \operatorname{adj}(W)^T .
\]
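As a numerical illustration (not part of the original derivation), the sketch below, assuming NumPy, builds the adjoint from cofactors exactly as defined above and compares its transpose with a finite-difference gradient of the determinant; the matrix $W$ is an arbitrary invertible example.

```python
import numpy as np

def adjoint(W):
    """Classical adjoint: adj(W)[i, j] is the cofactor W_ji = (-1)^(j+i) * minor(j, i)."""
    m = W.shape[0]
    adj = np.zeros_like(W)
    for i in range(m):
        for j in range(m):
            minor = np.delete(np.delete(W, j, axis=0), i, axis=1)
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

W = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])           # arbitrary invertible matrix

# Finite-difference gradient of det(W) w.r.t. each entry w_ij.
h = 1e-6
grad = np.zeros_like(W)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(W)
        E[i, j] = h
        grad[i, j] = (np.linalg.det(W + E) - np.linalg.det(W - E)) / (2 * h)

print(grad)               # approx. the matrix of cofactors W_ij
print(adjoint(W).T)       # adj(W)^T, which matches entry by entry
```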
Using the formula for the inverse of $W$,
\[
W^{-1} = \frac{1}{\det W}\,\operatorname{adj}(W),
\]
we obtain
\[
\frac{\partial \det(W)}{\partial W} = (W^T)^{-1} \det W .
\]

Homework: Prove that
\[
\frac{\partial \log|\det(W)|}{\partial W}
= \frac{1}{|\det W|}\,\frac{\partial |\det W|}{\partial W}
= (W^T)^{-1}.
\]
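The homework identity can at least be checked numerically (a sanity check, not a proof); the sketch below assumes NumPy and reuses an arbitrary invertible $W$.

```python
import numpy as np

W = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])           # arbitrary invertible matrix

logabsdet = lambda W: np.linalg.slogdet(W)[1]   # log|det(W)|, computed stably

# Finite-difference gradient of log|det(W)| w.r.t. each entry w_ij.
h = 1e-6
grad = np.zeros_like(W)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(W)
        E[i, j] = h
        grad[i, j] = (logabsdet(W + E) - logabsdet(W - E)) / (2 * h)

print(grad)                  # approx. (W^T)^{-1}
print(np.linalg.inv(W.T))    # analytic result (W^T)^{-1}
```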