
OLS ESTIMATION OF SINGLE EQUATION MODELS

Structural Model:

y = β_1 x_1 + β_2 x_2 + ... + β_K x_K + u = xβ + u,

where x_{1×K} = (x_1  x_2  ...  x_K) with x_1 = 1 (intercept), and β_{K×1} = (β_1, β_2, ..., β_K)'.

Assumptions
1. We can obtain a random sample from the population, where the sample observations {(x_i, y_i) : i = 1, 2, ..., N} are iid.
2. The population error has zero mean and is uncorrelated with the regressors:

E(u) = 0,  cov(x_j, u) = 0,  j = 1, 2, ..., K.    (1)

Sufficient for (1) is the assumption [HW: show it]

E(u | x_1, x_2, ..., x_K) = E(u | x) = 0.    (2)

[Note:

i. An explanatory variable is called endogenous if it is correlated with the population error. An econometric model with endogenous explanatory variables is said to suffer from endogeneity.

ii. Usually endogeneity arises in one of three ways:

(a) Omitted Variables: Suppose we cannot control for some explanatory variables or regressors in the structural model because we do not have data on them (they may not be enumerable at all). Let E(y | x, q) be the true population regression function that is linear in all the x's and q. If we do not have data on q, we may estimate E(y | x) (an estimable equation), where q becomes part of u. Now if q and x_j are correlated for any j = 1, 2, ..., K, this leads to endogeneity in the estimable model (a small simulation illustrating this case appears after this note).
(b) Measurement Error: Suppose we want to include a regressor x*_K in the true structural model, but the data allow us to observe only an imperfect measure of x*_K, namely x_K [e.g. true income versus reported income], where x_K = x*_K + e_K, e_K being the measurement error. Depending on how the measurement error e_K is correlated with x*_K (and hence with x_K), x_K and u may be correlated if we use x_K in place of x*_K in the estimable model, leading to endogeneity.
(c) Simultaneity: Simultaneity arises when at least one explanatory variable is determined simultaneously with the dependent variable of the equation. Let x_K be determined partly as a function of y. Then x_K and u are generally correlated. E.g., if quantity supplied is the dependent variable and price is an explanatory variable, then the market-clearing (or equilibrium) mechanism codetermines the values of quantity supplied and price as observed in the data. In this case x_K (price) is likely to be endogenous.

]
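To make case (a) concrete, here is a minimal simulation sketch (not part of the original notes; the coefficients, sample size and variable names are illustrative assumptions) in which an omitted variable q is correlated with an included regressor x, so that OLS on the estimable equation misses the structural coefficient:

import numpy as np

# Hypothetical structural model: y = 1 + 0.5*x + 0.8*q + e, with corr(x, q) > 0.
# If q is omitted, it becomes part of u = 0.8*q + e, so cov(x, u) != 0 (endogeneity).
rng = np.random.default_rng(0)
N = 100_000
q = rng.normal(size=N)
x = 0.6 * q + rng.normal(size=N)              # x correlated with the omitted q
e = rng.normal(size=N)
y = 1.0 + 0.5 * x + 0.8 * q + e

X_short = np.column_stack([np.ones(N), x])    # estimable model omits q
b_short = np.linalg.solve(X_short.T @ X_short, X_short.T @ y)

X_long = np.column_stack([np.ones(N), x, q])  # infeasible "true" regression
b_long = np.linalg.solve(X_long.T @ X_long, X_long.T @ y)

print("coefficient on x omitting q :", b_short[1])   # biased away from 0.5
print("coefficient on x including q:", b_long[1])    # close to 0.5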
Assumption (2) is stronger than what is required for deriving the asymptotic properties of OLS. So we stick to (1):

E(x'u) = 0    [OLS 1]

i.e.,

E(x'u) = E[(x_1 u, x_2 u, ..., x_K u)'] = (0, 0, ..., 0)',

so that E(x_j u) = 0 for all j = 1, ..., K.
Also, as x_1 = 1, E(x_1 u) = 0 implies E(u) = 0. Thus E[x_j − E(x_j)][u − E(u)] = E(x_j u) − E(x_j)E(u) = 0 (remember E(x_j) is a population moment, hence a constant), i.e., cov(x_j, u) = 0, j = 1, 2, ..., K.
3.

rank E(x'x) = K    [OLS 2]

E(x'x) = E[(x_1, x_2, ..., x_K)'(x_1  x_2  ...  x_K)]

       [ E(x_1²)     E(x_1x_2)   ...  E(x_1x_K) ]
     = [ E(x_2x_1)   E(x_2²)     ...  E(x_2x_K) ]
       [    ...         ...      ...     ...    ]
       [ E(x_Kx_1)   E(x_Kx_2)   ...  E(x_K²)   ]   (K×K)

Since E(x'x) is symmetric and K×K, full rank is equivalent to E(x'x) being positive definite.
This condition implies that we are not replicating regressors. If, for instance, x_1 ≡ x_K, then the first and last columns of E(x'x) are identical, so rank(E(x'x)) < K. Also, if the regressors are linearly dependent [for instance, if you include dummy variables for all categories], then rank(E(x'x)) < K. Assumption OLS 2 precludes that possibility.

Identification of β in the Structural Model

In the context of linear models, identification of the parameters of a model means that the parameters can be expressed in terms of population moments of observable variables.
Our model is y = xβ + u. Premultiplying by x' and taking expectations,
x'y = x'xβ + x'u
E(x'y) = E(x'x)β + E(x'u).
By OLS 1, E(x'u) = 0, so
β = [E(x'x)]⁻¹ E(x'y), where OLS 2 ensures that [E(x'x)]⁻¹ exists.

Method of Moments
Replace the population moments E(x'x) and E(x'y) with the corresponding sample moments to obtain the sample counterpart (or estimator) of the population parameter.
So,

β̂_OLS = (N⁻¹ Σ_{i=1}^N x_i'x_i)⁻¹ (N⁻¹ Σ_{i=1}^N x_i'y_i)

        [ Σ x_1i²      Σ x_1i x_2i  ...  Σ x_1i x_Ki ]⁻¹ [ Σ x_1i y_i ]
      = [ Σ x_2i x_1i  Σ x_2i²      ...  Σ x_2i x_Ki ]   [ Σ x_2i y_i ]
        [    ...          ...       ...      ...     ]   [    ...     ]
        [ Σ x_Ki x_1i  Σ x_Ki x_2i  ...  Σ x_Ki²     ]   [ Σ x_Ki y_i ]

(all sums running over i = 1, ..., N).

The full data matrix analysis yields the same result.


Suppose the full data matrices are as follows. Let

X_{N×K} = [ x_11  x_21  ...  x_K1 ]
          [ x_12  x_22  ...  x_K2 ]
          [  ...   ...  ...   ... ]
          [ x_1N  x_2N  ...  x_KN ]

be the N-observation data matrix on the regressors x_1, x_2, ..., x_K (so that row i of X is x_i), and

y_{N×1} = (y_1, y_2, ..., y_N)'

be the N-observation data vector on the dependent variable y.

Then β̂_OLS = (N⁻¹ Σ_{i=1}^N x_i'x_i)⁻¹ (N⁻¹ Σ_{i=1}^N x_i'y_i) = (X'X)⁻¹X'y.

Under assumption OLS 2, X'X is non-singular with probability approaching 1. This is because, as N → ∞, (1/N) Σ_{i=1}^N x_i'x_i →p E(x'x) (the sample of size N approaches the entire population, causing the sample moment to converge in probability to the corresponding population moment). But E(x'x) is non-singular.
Hence P[Σ_{i=1}^N x_i'x_i = X'X is non-singular] → 1 as N → ∞.
Hence, by Corollary 1 of Asymptotic Theory, plim [(N⁻¹ Σ_{i=1}^N x_i'x_i)⁻¹] = A⁻¹, where A = E(x'x).
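The sample-moment formula translates directly into a few lines of linear algebra. The following minimal numpy sketch (not from the notes; the data-generating process and variable names are illustrative assumptions) shows that the moment form (N⁻¹ Σ x_i'x_i)⁻¹(N⁻¹ Σ x_i'y_i) and the data-matrix form (X'X)⁻¹X'y give the same β̂:

import numpy as np

rng = np.random.default_rng(1)
N, K = 500, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # x_1 = 1 (intercept)
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=N)

# Data-matrix form: (X'X)^{-1} X'y
b_matrix = np.linalg.solve(X.T @ X, X.T @ y)

# Method-of-moments form: average the outer products x_i'x_i and x_i'y_i
Sxx = sum(np.outer(X[i], X[i]) for i in range(N)) / N
Sxy = sum(X[i] * y[i] for i in range(N)) / N
b_moments = np.linalg.solve(Sxx, Sxy)

print(np.allclose(b_matrix, b_moments))  # True: the two computations coincide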

We have used the method of moments to derive this estimator. Why are we calling it Ordinary Least Squares then?
The answer may be found by looking at property 8 of the Conditional Expectation Operator: if μ(x) ≡ E(y|x), then μ is a solution to

min_{m ∈ M} E[(y − m(x))²].

The least-squares exercise, as we commonly understand it, is the sample counterpart of this population problem:

min_b (1/N) Σ_{i=1}^N (y_i − x_i b)².

In the method of moments we did exactly the same thing: first we expressed β in terms of population moments of x and y, and then we replaced the population moments with the corresponding sample moments. Thus in effect we found the sample counterpart of E(y|x), or β_1x_1 + β_2x_2 + ... + β_Kx_K, as β̂_1x_1 + β̂_2x_2 + ... + β̂_Kx_K.
Hence β̂_{K×1} is called the least-squares estimator of β.

Consistency of OLS
β̂_OLS = (N⁻¹ Σ_{i=1}^N x_i'x_i)⁻¹ (N⁻¹ Σ_{i=1}^N x_i'y_i)
       = β + (N⁻¹ Σ_{i=1}^N x_i'x_i)⁻¹ (N⁻¹ Σ_{i=1}^N x_i'u_i).

By the weak law of large numbers (Theorem 1),

N⁻¹ Σ_{i=1}^N x_i'u_i →p E(x'u) = 0, by OLS 1.

Hence, by Slutsky's Theorem (Lemma 4),

plim β̂_OLS = β + A⁻¹ · 0 = β

(remember, plim [(N⁻¹ Σ_{i=1}^N x_i'x_i)⁻¹] = A⁻¹ and plim [N⁻¹ Σ_{i=1}^N x_i'u_i] = 0).

Note that if OLS 1 or OLS 2 fails, β is not identified.
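A quick way to see consistency at work is to compute β̂_OLS on samples of increasing size and watch it settle near β. The sketch below is illustrative only (the data-generating process and seed are arbitrary assumptions, not from the notes):

import numpy as np

rng = np.random.default_rng(2)
beta_true = np.array([1.0, 0.7])

def ols(N):
    # x_1 = 1, x_2 standard normal, u independent of x (so OLS 1 holds)
    X = np.column_stack([np.ones(N), rng.normal(size=N)])
    y = X @ beta_true + rng.normal(size=N)
    return np.linalg.solve(X.T @ X, X.T @ y)

for N in (50, 500, 5_000, 50_000):
    print(N, ols(N))   # the estimates approach (1.0, 0.7) as N grows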


β̂_OLS is not necessarily unbiased under OLS 1 and OLS 2.

β̂_OLS = β + (N⁻¹ Σ_{i=1}^N x_i'x_i)⁻¹ (N⁻¹ Σ_{i=1}^N x_i'u_i) = β + (X'X)⁻¹X'u,

where u_{N×1} = (u_1, u_2, ..., u_N)'.
But E(β̂_OLS) = β + E[(X'X)⁻¹X'u] ≠ β in general.
However, if we use the stronger assumption (2) instead of OLS 1, i.e. E(u|x) = 0, then unbiasedness of β̂_OLS may be retrieved:
β̂_OLS = β + (X'X)⁻¹X'u.
Now E(β̂_OLS | X) = β + (X'X)⁻¹X'E(u|X) = β + 0 = β.
But then E(β̂_OLS) = E[E(β̂_OLS | X)] = E(β) = β.

Asymptotic Inference Using OLS


Note that

√N (β̂_OLS − β) = (N⁻¹ Σ_{i=1}^N x_i'x_i)⁻¹ (N^{−1/2} Σ_{i=1}^N x_i'u_i).

We know that (N⁻¹ Σ_{i=1}^N x_i'x_i)⁻¹ →p A⁻¹, so that

(N⁻¹ Σ_{i=1}^N x_i'x_i)⁻¹ − A⁻¹ = o_p(1).

Again, E(x_i'u_i) = 0, i = 1, 2, ..., by OLS 1.
Also, {x_i'u_i : i = 1, 2, ...} is an iid sequence with zero mean, and each term is assumed to have a finite variance. Then, by the Central Limit Theorem,

N^{−1/2} Σ_{i=1}^N x_i'u_i →d N(0, B), where B_{K×K} = var(x_i'u_i) = E(x_i'u_i u_i x_i) = E(u_i² x_i'x_i) = E(u² x'x) for any i.

This means N^{−1/2} Σ_{i=1}^N x_i'u_i = O_p(1), by Lemma 5.
Then

√N (β̂_OLS − β) = [A⁻¹ + o_p(1)] (N^{−1/2} Σ_{i=1}^N x_i'u_i)
                = A⁻¹ (N^{−1/2} Σ_{i=1}^N x_i'u_i) + o_p(1)O_p(1)
                = A⁻¹ (N^{−1/2} Σ_{i=1}^N x_i'u_i) + o_p(1), by Lemma 2.

Assumption 4
4.

E(u²x'x) = σ²E(x'x), where σ² ≡ E(u²)    [OLS 3]

Since E(u) = 0, σ² ≡ E(u²) = E(u²) − [E(u)]² = var(u). In other words, OLS 3 states that the variance of u, viz. E(u²) = σ², is constant and hence independent of x, and thus can be taken out of E(u²x'x).
OLS 3 is the weak homoskedasticity assumption. It means that u² is uncorrelated with x_j, x_j² and x_jx_k, j, k = 1, ..., K.
Sufficient for OLS 3 is the assumption E(u²|x) = σ², which is equivalent to var(u|x) = σ² when E(u|x) = 0.
Asymptotic Normality of OLS

From OLS 1 - OLS 3, and the fact that √N (β̂_OLS − β) = A⁻¹ (N^{−1/2} Σ_{i=1}^N x_i'u_i) + o_p(1), it follows that

√N (β̂_OLS − β) ~a N(0, σ²A⁻¹).

Proof: N^{−1/2} Σ_{i=1}^N x_i'u_i →d N(0, B).
Hence, by Corollary 2, A⁻¹ (N^{−1/2} Σ_{i=1}^N x_i'u_i) →d N(0, A⁻¹BA⁻¹).
Now, √N (β̂_OLS − β) − A⁻¹ (N^{−1/2} Σ_{i=1}^N x_i'u_i) = o_p(1), i.e.,

√N (β̂_OLS − β) − A⁻¹ (N^{−1/2} Σ_{i=1}^N x_i'u_i) →p 0.

Hence, by Lemma 7 (Asymptotic Equivalence),

√N (β̂_OLS − β) →d N(0, A⁻¹BA⁻¹).

But under OLS 3, B = σ²E(x'x) = σ²A. Thus,

√N (β̂_OLS − β) ~a N(0, σ²A⁻¹).

The above result allows us to treat β̂_OLS as approximately normal with mean β and variance-covariance matrix σ²A⁻¹/N, i.e., β̂_OLS ~a N(β, σ²[E(x'x)]⁻¹/N).

The usual estimator of σ² is σ̂² = RSS/(N − K), where RSS = Σ_{i=1}^N û_i² (the sum of squared OLS residuals, with û_i = y_i − x_iβ̂).

It can be shown that σ̂² is consistent. [H.W.: show it]
Replace σ² with σ̂² and E(x'x) with the sample average N⁻¹ Σ_{i=1}^N x_i'x_i = N⁻¹(X'X).
Thus Avar̂(β̂_OLS) = σ̂² [N⁻¹(X'X)]⁻¹ N⁻¹ = σ̂² N (X'X)⁻¹ N⁻¹ = σ̂²(X'X)⁻¹.

Hence, under OLS 1 - OLS 3, the usual OLS standard errors, t-statistics and F-statistics are also asymptotically valid (the F-statistic being a degrees-of-freedom-adjusted Wald statistic for testing linear restrictions of the form Rβ = r).
[See undergraduate notes for derivation of the t and F statistics by distributions of quadratic forms.]
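As an illustration (not part of the notes; the data-generating process is an assumption), σ̂²(X'X)⁻¹ and the corresponding standard errors and t-statistics can be computed as follows:

import numpy as np

rng = np.random.default_rng(3)
N, K = 400, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta_true = np.array([1.0, 0.5, -0.3])
y = X @ beta_true + rng.normal(size=N)     # homoskedastic errors, so OLS 3 holds

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                      # beta-hat
resid = y - X @ b                          # u-hat
sigma2_hat = resid @ resid / (N - K)       # RSS / (N - K)

avar_hat = sigma2_hat * XtX_inv            # sigma^2-hat * (X'X)^{-1}
se = np.sqrt(np.diag(avar_hat))            # usual OLS standard errors
t_stats = b / se                           # t-statistics for H0: beta_j = 0

print(b, se, t_stats, sep="\n")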

Violation of CLRM Assumptions


Suppose OLS 3 does not hold (Heteroskedasticity)
We have already shown that

√N (β̂_OLS − β) →d N(0, A⁻¹BA⁻¹).

In other words, the asymptotic variance of β̂_OLS is

Avar(β̂_OLS) = A⁻¹BA⁻¹/N,

where A_{K×K} = E(x'x) and B_{K×K} = var(x'u) = E(u²x'x).

A consistent estimator of A is N⁻¹ Σ_{i=1}^N x_i'x_i.

What is a consistent estimator of B?

By the Law of Large Numbers (Theorem 1), N⁻¹ Σ_{i=1}^N u_i² x_i'x_i →p E(u²x'x) = B.
As u_i cannot be observed, replace u_i by the OLS residual û_i = y_i − x_iβ̂_OLS.
White (1980, Econometrica) proves the following.

White (1980): A consistent estimator of B is B̂ = N⁻¹ Σ_{i=1}^N û_i² x_i'x_i.

Proof: The proof consists of several parts.

Part I:

û_i = u_i − x_i(β̂ − β)

x_i'û_i = x_i'u_i − x_i'x_i(β̂ − β)    (1)

Transposing,

û_i x_i = u_i x_i − (β̂ − β)'x_i'x_i    (2)

Multiplying (1) by (2),

û_i² x_i'x_i = u_i² x_i'x_i − u_i x_i'(β̂ − β)'x_i'x_i − u_i x_i'x_i(β̂ − β)x_i + x_i'x_i(β̂ − β)(β̂ − β)'x_i'x_i.

Averaging over i,

N⁻¹ Σ_{i=1}^N û_i² x_i'x_i = N⁻¹ Σ_{i=1}^N u_i² x_i'x_i − N⁻¹ Σ_{i=1}^N u_i x_i'(β̂ − β)'x_i'x_i
                             − N⁻¹ Σ_{i=1}^N u_i x_i'x_i(β̂ − β)x_i + N⁻¹ Σ_{i=1}^N x_i'x_i(β̂ − β)(β̂ − β)'x_i'x_i.    (3)

Part II: A digression on matrix algebra

The vec operator: stacking the columns of a matrix to form a vector. Thus,

vec(A) = vec [ a_11  a_12 ] = (a_11, a_21, a_12, a_22)'.
             [ a_21  a_22 ]

vec(ABC) = (C' ⊗ A) vec(B), where ⊗ is the Kronecker or direct product:

A_{K×L} ⊗ B_{M×N} = C_{KM×LN} = [ a_11 B  a_12 B  ...  a_1L B ]
                                [ a_21 B  a_22 B  ...  a_2L B ]
                                [  ...     ...    ...   ...   ]
                                [ a_K1 B  a_K2 B  ...  a_KL B ]

To prove: vec(ABC) = (C' ⊗ A) vec(B). We prove it using an example. Let

A_{2×3} = [ a_11  a_12  a_13 ],  B_{3×1} = (b_1, b_2, b_3)',  C_{1×2} = (c_1  c_2).
          [ a_21  a_22  a_23 ]

Then

ABC = [ c_1(a_11b_1 + a_12b_2 + a_13b_3)   c_2(a_11b_1 + a_12b_2 + a_13b_3) ]
      [ c_1(a_21b_1 + a_22b_2 + a_23b_3)   c_2(a_21b_1 + a_22b_2 + a_23b_3) ]

Therefore,

LHS = vec(ABC) = ( c_1(a_11b_1 + a_12b_2 + a_13b_3),
                   c_1(a_21b_1 + a_22b_2 + a_23b_3),
                   c_2(a_11b_1 + a_12b_2 + a_13b_3),
                   c_2(a_21b_1 + a_22b_2 + a_23b_3) )'.

Now

RHS = (C' ⊗ A) vec(B) = [ c_1a_11  c_1a_12  c_1a_13 ] [ b_1 ]
                        [ c_1a_21  c_1a_22  c_1a_23 ] [ b_2 ]
                        [ c_2a_11  c_2a_12  c_2a_13 ] [ b_3 ]
                        [ c_2a_21  c_2a_22  c_2a_23 ]

    = ( c_1(a_11b_1 + a_12b_2 + a_13b_3),
        c_1(a_21b_1 + a_22b_2 + a_23b_3),
        c_2(a_11b_1 + a_12b_2 + a_13b_3),
        c_2(a_21b_1 + a_22b_2 + a_23b_3) )'.

Hence LHS = RHS.
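The identity vec(ABC) = (C' ⊗ A) vec(B) is also easy to check numerically. The short sketch below (illustrative, using random matrices) verifies it with numpy, where vec means column-major stacking (order="F"):

import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 1))
C = rng.normal(size=(1, 2))

vec = lambda M: M.reshape(-1, 1, order="F")   # stack columns into a vector

lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)
print(np.allclose(lhs, rhs))   # True: vec(ABC) = (C' kron A) vec(B)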

Part III:
Consider the 3rd term on the RHS of eqn (3):

N⁻¹ Σ_{i=1}^N vec(u_i x_i'x_i (β̂ − β) x_i) = N⁻¹ Σ_{i=1}^N (u_i x_i' ⊗ x_i'x_i) vec(β̂ − β).

[u_i is a scalar. In vec(ABC) = (C' ⊗ A)vec(B), treat (x_i'x_i)_{K×K} = A, (β̂ − β)_{K×1} = B and (x_i)_{1×K} = C.]

Now, clearly (β̂ − β) →p 0, so vec(β̂ − β) →p 0.

Again, u_i x_i' ⊗ x_i'x_i is the K²×K matrix obtained by stacking the blocks u_i x_1i (x_i'x_i), u_i x_2i (x_i'x_i), ..., u_i x_Ki (x_i'x_i), where x_i'x_i is the K×K matrix with (j, k) element x_ji x_ki.

The terms of the final N⁻¹ Σ_{i=1}^N (u_i x_i' ⊗ x_i'x_i) matrix are therefore of the form N⁻¹ Σ_i u_i x_ki³, or N⁻¹ Σ_i u_i x_ki² x_ji, or N⁻¹ Σ_i u_i x_ki x_ji x_li, j, k, l = 1, ..., K.
Assume the corresponding population moments, i.e., E(x_j³ u), E(x_k² x_j u), E(x_j x_k x_l u), exist and are finite.
Then, by the WLLN, N⁻¹ Σ_{i=1}^N (u_i x_i' ⊗ x_i'x_i) →p the corresponding population moment matrix, and is hence O_p(1).

Consequently, by Lemma 1 of Asymptotic Theory, the 3rd term on the RHS of (3) is O_p(1) o_p(1) = o_p(1). The 2nd term can be treated similarly, as it is only a transpose of the 3rd term.

Now consider the 4th term on the RHS of equation (3):

N⁻¹ Σ_{i=1}^N vec(x_i'x_i (β̂ − β)(β̂ − β)' x_i'x_i) = N⁻¹ Σ_{i=1}^N (x_i'x_i ⊗ x_i'x_i) vec((β̂ − β)(β̂ − β)').

Again, (β̂ − β) →p 0, thus vec((β̂ − β)(β̂ − β)') →p 0, by Lemma 2.

Also, x_i'x_i ⊗ x_i'x_i is the K²×K² matrix whose (j, k) block is x_ji x_ki (x_i'x_i), so its elements are products of four regressors. The terms of the final N⁻¹ Σ_{i=1}^N (x_i'x_i ⊗ x_i'x_i) matrix are of the form N⁻¹ Σ_i x_ji⁴, N⁻¹ Σ_i x_ji³ x_ki, N⁻¹ Σ_i x_ji² x_ki², N⁻¹ Σ_i x_ji² x_ki x_li, or N⁻¹ Σ_i x_ji x_ki x_li x_mi, j, k, l, m = 1, ..., K.
Assume the corresponding population moments, i.e., E(x_j⁴), E(x_j³ x_k), E(x_j² x_k²), E(x_j² x_k x_l), E(x_j x_k x_l x_m), exist and are finite.
Hence, by the WLLN, N⁻¹ Σ_{i=1}^N (x_i'x_i ⊗ x_i'x_i) →p the corresponding population moment matrix, and is hence O_p(1).

Consequently, by Lemma 1, the 4th term is also O_p(1) o_p(1) = o_p(1).

Note also: if N⁻¹ Σ_{i=1}^N vec(u_i x_i'x_i (β̂ − β) x_i) or N⁻¹ Σ_{i=1}^N vec(x_i'x_i (β̂ − β)(β̂ − β)' x_i'x_i) are o_p(1), then so also are the original expressions on the RHS of equation (3), viz. N⁻¹ Σ_{i=1}^N u_i x_i'x_i (β̂ − β) x_i and N⁻¹ Σ_{i=1}^N x_i'x_i (β̂ − β)(β̂ − β)' x_i'x_i, since the vec operator does nothing but stack the columns of the original matrices into vectors.
PART IV: Finally, from equation (3),

N⁻¹ Σ_{i=1}^N û_i² x_i'x_i = N⁻¹ Σ_{i=1}^N u_i² x_i'x_i + o_p(1).

So

N⁻¹ Σ_{i=1}^N û_i² x_i'x_i − N⁻¹ Σ_{i=1}^N u_i² x_i'x_i →p 0.

We already know that

N⁻¹ Σ_{i=1}^N u_i² x_i'x_i →p E(u²x'x) = B, by the WLLN.

Thus,

N⁻¹ Σ_{i=1}^N û_i² x_i'x_i →p B.

Hence the heteroskedasticity-robust variance-covariance matrix of β̂_OLS is:

Avar̂(β̂_OLS)_robust = Â⁻¹ B̂ Â⁻¹ / N
  = (1/N) (N⁻¹ Σ_{i=1}^N x_i'x_i)⁻¹ (N⁻¹ Σ_{i=1}^N û_i² x_i'x_i) (N⁻¹ Σ_{i=1}^N x_i'x_i)⁻¹
  = (X'X)⁻¹ (Σ_{i=1}^N û_i² x_i'x_i) (X'X)⁻¹, where Σ_{i=1}^N x_i'x_i = X'X.

This matrix is also often called the sandwich matrix because of its form.
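A minimal numpy sketch of the sandwich computation (illustrative; the heteroskedastic data-generating process is an assumption, not from the notes) is given below; it produces the form (X'X)⁻¹(Σ û_i² x_i'x_i)(X'X)⁻¹ and compares the resulting standard errors with the usual ones:

import numpy as np

rng = np.random.default_rng(5)
N = 1_000
x2 = rng.normal(size=N)
X = np.column_stack([np.ones(N), x2])
u = rng.normal(size=N) * np.sqrt(0.5 + x2**2)     # error variance depends on x: OLS 3 fails
y = X @ np.array([1.0, 0.5]) + u

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
uhat = y - X @ b

meat = (X * uhat[:, None]**2).T @ X               # sum of uhat_i^2 * x_i'x_i
avar_robust = XtX_inv @ meat @ XtX_inv            # sandwich: (X'X)^{-1} meat (X'X)^{-1}

se_robust = np.sqrt(np.diag(avar_robust))                         # robust standard errors
se_usual = np.sqrt(np.diag((uhat @ uhat / (N - 2)) * XtX_inv))    # usual (non-robust) standard errors
print(se_robust, se_usual)                        # the two sets differ under heteroskedasticity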
Once the heteroskedasticity-consistent varcov(β̂_OLS) is obtained, we can also get the heteroskedasticity-robust standard errors of β̂_OLS by taking square roots of the diagonal terms of Avar̂(β̂_OLS)_robust.
Once robust standard errors are obtained, t and F statistics can be computed in the usual way (robust t or F stats).
However, under heteroskedasticity, F-stats are usually not valid even asymptotically. So instead Wald stats should be employed.

H_0: R_{Q×K} β_{K×1} = r_{Q×1}, where rank(R) = Q ≤ K.

The heteroskedasticity-robust Wald stat is W = (Rβ̂ − r)'(RV̂R')⁻¹(Rβ̂ − r), where V̂ = Avar̂(β̂)_robust. W ~a χ²_Q.
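Continuing the previous sketch in a self-contained form (again illustrative; the restriction tested is a hypothetical example, and scipy is used only for the chi-square p-value), the robust Wald statistic for H_0: Rβ = r can be formed as:

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
N = 1_000
x2 = rng.normal(size=N)
X = np.column_stack([np.ones(N), x2])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=N) * np.sqrt(0.5 + x2**2)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
uhat = y - X @ b
V_hat = XtX_inv @ ((X * uhat[:, None]**2).T @ X) @ XtX_inv   # robust Avar of beta-hat

# Hypothetical restriction H0: beta_2 = 0.5, i.e. R = [0 1], r = (0.5)
R = np.array([[0.0, 1.0]])
r = np.array([0.5])
diff = R @ b - r
W = float(diff @ np.linalg.inv(R @ V_hat @ R.T) @ diff)      # robust Wald statistic
p_value = stats.chi2.sf(W, df=R.shape[0])                    # asymptotic chi^2_Q p-value
print(W, p_value)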

So far we have been proceeding with the weak exogeneity assumption, OLS 1. If we had assumed strong exogeneity, i.e., E(u|x) = 0, then there exists another solution to the violation of OLS 3 (heteroskedasticity).
In this case, i.e., when E(u|x) = 0, if OLS 3 fails we can specify a model for var(y|x), estimate that model, and apply generalized least squares (GLS).
For observation i, y_i and each element of x_i are divided by an estimate of the conditional standard deviation [var(y_i|x_i)]^{1/2}. OLS applied to this transformed (weighted) data gives β̂_GLS. GLS is a special form of weighted least squares (WLS). A minimal sketch of this weighting step is given after this discussion.
In modern econometrics, however, the more popular approach is to stick to the OLS estimate of β and apply a heteroskedasticity correction to the estimated variance-covariance matrix of β̂, viz., Avar̂(β̂_OLS)_robust. This latter matrix and the consequent standard errors are then used for testing.
Note that robust standard errors are valid even when OLS 3 holds (only then does Avar̂(β̂_OLS) simplify to σ̂²(X'X)⁻¹). So this is an easier approach.
GLS may be avoided for other reasons:
1. GLS leads to an efficiency gain only when the model for var(y|x) is correct. So it requires a lot of information.
2. Finite sample properties of feasible GLS are usually not known except for the simplest cases.
3. Under the weak exogeneity assumption [OLS 1], GLS is generally inconsistent if E(u|x) ≠ 0.
[See undergraduate notes for treatment of GLS.]
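Below is a minimal weighted least squares sketch (illustrative only; it assumes the skedastic function var(y|x) is known up to scale, which is rarely true in practice): each observation is divided by the conditional standard deviation and OLS is run on the transformed data.

import numpy as np

rng = np.random.default_rng(6)
N = 1_000
x2 = rng.normal(size=N)
X = np.column_stack([np.ones(N), x2])
h = np.exp(x2)                                   # assumed (known) skedastic function: var(y|x) proportional to h(x)
y = X @ np.array([1.0, 0.5]) + np.sqrt(h) * rng.normal(size=N)

w = 1.0 / np.sqrt(h)                             # divide each observation by the conditional sd
Xw, yw = X * w[:, None], y * w
b_wls = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)    # OLS on the weighted data = WLS/GLS estimate

b_ols = np.linalg.solve(X.T @ X, X.T @ y)        # plain OLS for comparison
print(b_wls, b_ols)                              # both consistent here; WLS is more efficient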
