
STAT 513: Lecture 12

Mostly linear algebra


There are more things in heaven and earth
> -0.035+0.025+0.01
[1] -1.734723e-18
> -0.035+0.01+0.025
[1] 0
> sum(-0.035,0.025,0.01)
[1] -1.734723e-18
> sum(-0.035,0.01,0.025)
[1] 0
> sum(c(-0.035,0.025,0.01))
[1] -1.734723e-18
> sum(c(-0.035,0.01,0.025))
[1] -1.734723e-18
> mean(c(-0.035,0.025,0.01))
[1] -5.779588e-19
> mean(c(-0.035,0.01,0.025))
[1] -5.776765e-19
> sum(c(-0.035,0.01,0.025))/3
[1] -5.782412e-19
> sum(c(-0.035,0.025,0.01))/3
[1] -5.782412e-19
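Why does the order matter at all? None of these decimal constants has an exact binary representation; a quick Python sketch (using the standard decimal module, which can display the value a float actually stores) makes this visible:

```python
from decimal import Decimal

# Decimal(float) shows the exact binary value the double actually stores;
# none of these decimal literals is exactly representable.
for x in (-0.035, 0.025, 0.01):
    print(x, "->", Decimal(x))

# Each intermediate sum is rounded, so floating-point addition is not
# associative: the two orders give different results (as in the R session).
print((-0.035 + 0.025) + 0.01 == (-0.035 + 0.01) + 0.025)  # False
```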

And not just in R
Python 3.7.2 (default, Feb 12 2019, 08:15:36)
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information
>>> -0.035+0.025+0.01
-1.734723475976807e-18
>>> -0.035+0.01+0.025
0.0
>>> sum(-0.035,0.025, 0.01)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sum expected at most 2 arguments, got 3
>>> sum([-0.035, 0.025, 0.01])
-1.734723475976807e-18
>>> sum([-0.035, 0.01, 0.025])
0.0
>>>
>>> mean([-0.035,0.01, 0.025])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'mean' is not defined

Numerics can be treacherous
Time series: let Wt be “white noise” with standard normal
distribution: the Wt are uncorrelated (and, being jointly normal, thus
independent) random variables with mean 0 and variance 1. An AR(1)
process Yt = ϕYt−1 + Wt is “stationary” (let us say: stable) if |ϕ| < 1.
> tser=rep(0,100)
> for (k in 2:100) tser[k] = (1/2)*tser[k-1]+rnorm(1) ## phi=1/2
> plot.ts(tser)
[Plot: plot.ts(tser), the stationary series fluctuating roughly between -3 and 2 over times 0-100]
On the other hand
On the other hand, the AR(1) process with ϕ = 2 is “explosive”
> tser=rep(0,100)
> for (k in 2:100) tser[k] = 2*tser[k-1]+rnorm(1) ## phi=2
> plot.ts(tser)
[Plot: plot.ts(tser), the explosive series reaching order 4e+29 by index 100]

But it is still stationary, as it is distributionally equivalent to an
AR(1) process with ϕ = 1/2 in “reverse time”
Yt = ϕYt−1 + Wt   is equivalent to   Yt−1 = (1/ϕ)Yt − (1/ϕ)Wt
Really?
> set.seed(007)
> inno=rnorm(1000)
> tser=rep(0,1000)
> for (k in 1000:2) tser[k-1] = (1/2)*tser[k]-(1/2)*inno[k]
> tss=tser[1:100]
> (tss[1:99]+(1/2)*inno[2:100])/tss[2:100]
[1] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
[16] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
[31] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
[46] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
[61] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
[76] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
[91] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
> (tss[2:100]-inno[2:100])/tss[1:99]
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[32] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[63] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[94] 2 2 2 2 2 2
> plot.ts(tss)

Really?

[Plot: plot.ts(tss), the backward-constructed series fluctuating roughly between -2 and 1 over indices 0-100]

So: when I start with Y1 as above and set Yt = 2Yt−1 + Wt, with the
same Wt I used above, I should get the same thing, right?
> tser=rep(0,100)
> tser[1]=tss[1]
> for (k in 2:100) tser[k] = 2*tser[k-1]+inno[k]
> plot.ts(tser)

How come???

[Plot: plot.ts(tser), the forward recursion exploding to order 2e+13 by index 100]

Let us investigate... Plotting only half of them:

Hm... [1:50]
[Two plots: tser[1:50] (forward) and tss[1:50] (backward), both fluctuating between about -2 and 1 and visually identical]

A bit more... [1:57]
[Two plots: tser[1:57] (forward), now ranging from -2 to 2, and tss[1:57] (backward), still between -2 and 1: the series begin to differ]

And yet a bit more... [1:60]
[Two plots: tser[1:60] (forward), exploding to about 20, and tss[1:60] (backward), still between -2 and 1]
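What is happening is not mysterious: with ϕ = 2, the recursion doubles whatever rounding error Yt−1 carries at every step, while the true (backward-constructed) solution stays of order 1. A back-of-the-envelope sketch in Python (the starting error of 1e-16 is an assumption, roughly one ulp at magnitude 1):

```python
# With phi = 2, y[k] = 2*y[k-1] + w[k] doubles any error in y[k-1] each step.
err = 1e-16   # assumed initial rounding error: about one ulp at magnitude 1
steps = 0
while err < 1.0:
    err *= 2  # the factor phi = 2 amplifies the error every iteration
    steps += 1
print(steps)  # 54: the error reaches order 1 after about 54 doublings
```

That matches the plots above, where the forward and backward series visibly part ways around index 50-60.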

A tale of expert code I: floating-point arithmetic
Floating-point arithmetic: numbers are represented as
significand ∗ base^exponent (base 2 in the usual double precision) - which has inevitable consequences
> 0.000001*1000000
[1] 1
> x=0; for (k in (1:1000000)) x=x+0.000001
> x
[1] 1
> x-1
[1] 7.918111e-12

> x=1000000; for (k in 1:1000000) x=x+0.000001


> x
[1] 1000001
> x-1000000
[1] 1.000008
> x-1000001
[1] 7.614493e-06
The moral here is: with floating-point arithmetic, addition works well
when the added numbers are of roughly the same magnitude
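One classical remedy, not used in the lecture but worth knowing, is Kahan's compensated summation, which carries the rounding error of each addition along in a separate term. A minimal Python sketch:

```python
def kahan_sum(xs):
    """Kahan's compensated summation: tracks the rounding error of each
    addition in a running compensation term."""
    s = 0.0
    c = 0.0  # compensation for lost low-order bits
    for x in xs:
        y = x - c
        t = s + y
        c = (t - s) - y  # algebraically zero; captures the rounding error
        s = t
    return s

# the large term first, then a million tiny ones: a naive loop loses bits
xs = [1000000.0] + [0.000001] * 1000000
print(kahan_sum(xs) - 1000001)  # essentially 0 (the naive loop was off by 7.6e-06)
```

The compensation line `(t - s) - y` is zero in exact arithmetic; in floating point it recovers the low-order bits that the addition `s + y` just discarded.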

A better algorithm thus does it
> x=0; for (k in (1:1000000)) x=x+0.000001; x=x+1000000
> x
[1] 1000001
> x-1000000
[1] 1
> x-1000001
[1] 0
Yeah, but what to do in general? The solution seems to be: use
addition programmed by experts
> sum
function (..., na.rm = FALSE) .Primitive("sum")

> x=sum(c(1000000,rep(0.000001,1000000)))
> x
[1] 1000001
> x-1000000
[1] 1
> x-1000001
[1] -2.561137e-09

Vectorization alone does not do it

> x=rep(1,1000001) %*% c(1000000,rep(0.000001,1000000))


> x-1000000
[,1]
[1,] 1.000008
> x-1000001
[,1]
[1,] 7.614493e-06
> x=crossprod(rep(1,1000001),c(1000000,rep(0.000001,1000000)))
> x-1000000
[,1]
[1,] 1.000008
> x-1000001
[,1]
[1,] 7.614493e-06

A tale of expert code II: never invert a matrix...

The theory for a linear model y ∼ Xβ suggests that you obtain the
least squares estimates via the formula
b = (XᵀX)⁻¹ Xᵀy

However, in computing you are never ever (well, every rule has an
exception, but still) supposed to do
b <- solve(t(X) %*% X) %*% t(X) %*% y

Doing alternatively
b <- solve(crossprod(X)) %*% crossprod(X, y)
does not really save it

... but rather solve (a system of) equations

It is much better to get b via solving the system of normal equations


(XᵀX) b = Xᵀy

To this end,
b <- solve(crossprod(X), crossprod(X, y))
may work pretty well; but experts know that the best way is via a
so-called QR decomposition (MATLAB's “backslash” operator), which
in R amounts to
b <- qr.solve(X, y)

This is correct - but many people do not need to know that much;
unless they are in certain special situations, they may just do
b <- coef(lm(y ~ X-1))
and it amounts to the same thing!
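The same hierarchy exists outside R; here is a NumPy sketch of the three approaches (an illustration on assumed toy data, not part of the lecture; NumPy's lstsq uses the SVD rather than QR, but both are orthogonal-factorization methods):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((50, 3))       # a well-conditioned toy design matrix
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta

# 1) textbook formula with an explicit inverse (discouraged)
b1 = np.linalg.inv(X.T @ X) @ (X.T @ y)
# 2) solve the normal equations instead
b2 = np.linalg.solve(X.T @ X, X.T @ y)
# 3) orthogonal factorization (lstsq uses the SVD; R's qr.solve uses QR)
b3, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(b1, beta), np.allclose(b2, beta), np.allclose(b3, beta))
```

On this well-conditioned example all three agree; it is the ill-conditioned Hilbert-matrix experiment that follows which separates them.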

Showing the difference is, however, a bit intricate...

...because the numerics of R is very good...


The first attempt didn’t work
The second one will?
> library(Matrix)
> set.seed(007)
> A=as.matrix(Hilbert(7))
> AA=A
> for (k in 1:5) AA=rbind(AA,A+rnorm(49,0,0.000001))
> x=rnorm(7)
> bb=AA %*% x
> x1 = solve(crossprod(AA)) %*% crossprod(AA,bb)
> x2 = solve(crossprod(AA),crossprod(AA,bb))
> x3 = qr.solve(AA,bb)

So...
First, let us try this:
> sum((x1-x)^2)
[1] 9.795661e-10
> sum((x2-x)^2)
[1] 8.119665e-10
> sum((x3-x)^2)
[1] 7.313153e-22
This is only mildly convincing (and in fact it may even be the other
way round in some versions)
But this one seems to stay:
> sum((bb - AA %*% x1)^2)
[1] 2.482263e-13
> sum((bb - AA %*% x2)^2)
[1] 3.111039e-20
> sum((bb - AA %*% x3)^2)
[1] 1.84273e-29
> sum((bb - AA %*% x)^2)
[1] 0

Vector and matrix algebra

*                componentwise multiplication (dimensions had better match!)
%*%              vector/matrix multiplication
crossprod(A,B)   AᵀB (uses a dedicated algorithm)
crossprod(A)     in particular, AᵀA
rep()            a repetition function, very flexible
solve(A, y)      finds b such that Ab = y
solve(A)         finds A⁻¹ (if need be)
c()              concatenation of vectors, flexible too
matrix()         setting up matrices
rbind(A,B)       matrices merged by rows (dimensions must match)
cbind(A,B)       matrices merged by columns (dimensions must match)
length()         returns the length of a vector
dim()            returns the dimension of a matrix
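For readers more at home in Python, rough NumPy counterparts of the table above (an illustrative mapping, not from the lecture; note that NumPy fills matrices row-major by default, unlike R's column-major matrix()):

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)  # 2x3 matrix (filled row-major, unlike R)
B = np.ones((2, 3))

print(A * B)                     # componentwise product, like R's *
print(A.T @ B)                   # like crossprod(A, B), i.e. t(A) %*% B
print(np.vstack([A, B]).shape)   # like rbind(A, B): (4, 3)
print(np.hstack([A, B]).shape)   # like cbind(A, B): (2, 6)
print(A.shape)                   # like dim(A): (2, 3)
```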

Type conversions
General format: as.type
> qr.solve(X, y)
x
20733.83 -20728.85
> as.vector(qr.solve(X, y))
[1] 20733.83 -20728.85
> as.vector(coef(lm(y~X-1)))
[1] 20733.83 -20728.85
> as.vector(solve(crossprod(X), crossprod(X, y)))
[1] 20737.19 -20732.21
> as.vector(solve(t(X) %*% X) %*% t(X) %*% y)
[1] 20737.20 -20732.22
Note: in R, vectors are interpreted neither rowwise nor columnwise but
in an “ambiguous manner”: whichever suits the multiplication at hand.
In other words, the same square matrix can be multiplied by the same
vector from both sides: X %*% a or a %*% X. This usually creates no
problem, until we have an expression a %*% a, which is always a
number, aᵀa for column vectors. If we want to obtain aaᵀ, a matrix,
we need to write a %*% t(a)
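The same pitfall is easy to see in NumPy, where 1-D arrays behave much like R vectors (a small illustration, not from the lecture):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])

print(a @ a)                  # inner product a'a: a single number, 14.0
print(np.outer(a, a))         # outer product aa': a 3x3 matrix
print(np.outer(a, a).shape)   # (3, 3)
```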

Potpourri
> numeric(4)
[1] 0 0 0 0
> rep(0,4)
[1] 0 0 0 0
> rep(c(0,1),4)
[1] 0 1 0 1 0 1 0 1
> rep(c(0,1),c(3,2))
[1] 0 0 0 1 1
> X=matrix(0,nrow=2,ncol=2)
> X=matrix(1:4,nrow=2,ncol=2)
> X
[,1] [,2]
[1,] 1 3
[2,] 2 4
> as.vector(X)
[1] 1 2 3 4
> as.matrix(1:4)
[,1]
[1,] 1
[2,] 2
[3,] 3
[4,] 4

Finally, reminder

Inverse of a matrix should never be computed, unless:

- it is absolutely necessary to compute standard errors
- the number of right-hand sides is so much larger than n that the
extra cost is insignificant
(this one is based on the following: solving two systems, Ax = b1
and Ax = b2, costs exactly as much as solving one system Ax = b
by first calculating A⁻¹ and then A⁻¹b)
- the size of n is so small that the costs are irrelevant
(yeah, in the toy setting we don’t care)

(John F. Monahan, Numerical Methods of Statistics)
(remarks by I.M.)
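The multiple right-hand-side point is easy to demonstrate: one factorization serves all the columns at once, and no inverse is ever formed. A NumPy sketch with illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 2))   # two right-hand sides, stacked as columns

# one LU factorization of A solves both systems Ax = b1, Ax = b2 at once
X = np.linalg.solve(A, B)
print(np.allclose(A @ X, B))      # True
```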

Some reminders from linear algebra
Useful formulae: (AB)ᵀ = BᵀAᵀ,
det(AB) = det(A) det(B), det(Aᵀ) = det(A)
Useful definitions: we say that a matrix A is
nonnegative definite (or positive semidefinite): xᵀAx ≥ 0 for every x
positive definite: xᵀAx > 0 for every x ≠ 0
The definitions imply that A is a square matrix; some authors
automatically require that it also be symmetric, so better check (in
statistics, it is almost always symmetric matrices that the definitions
are applied to)
Useful habit in theory (albeit not observed by R in practice): consider
vectors as n × 1 columns (in statistics, it is always like this)
Useful caution: if a is an n × 1 vector, then aᵀa is a number (which
we denoted by ‖a‖₂²), but aaᵀ is an n × n matrix. In general,
matrix multiplication is not commutative: AB is in general different
from BA
Useful principle: block matrices are multiplied in the same way as
usual matrices, only the blocks are themselves matrices, thus
multiplied as such, and hence the dimensions must match
Useful practice: check dimensions
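These identities are easy to spot-check numerically (a sketch with random matrices; it does not, of course, prove anything):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

print(np.allclose((A @ B).T, B.T @ A.T))                 # (AB)' = B'A': True
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))   # True
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))  # True
print(np.allclose(A @ B, B @ A))   # commutativity fails: typically False
```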

Appendix: some Python again
Adding again
Python 3.7.2 (default, Feb 12 2019, 08:15:36)
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information
>>> 0.000001*1000000
1.0
>>> x=0
>>> for k in range(1000000): x=x+0.000001
>>> x
1.000000000007918
>>> x-1
7.918110611626616e-12
>>> x=1000000
>>> for k in range(1000000): x=x+0.000001
>>> x
1000001.0000076145
>>> x-1000000
1.00000761449337
>>> x-1000001
7.614493370056152e-06

Elementary arithmetic is also no problem

Python 3.7.2 (default, Feb 12 2019, 08:15:36)


[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information
>>> x=0
>>> for k in range(1000000): x=x+0.000001
>>> x=x+1000000
>>> x
1000001.0
>>> x-1000000
1.0
>>> x-1000001
0.0

Now, the code of the experts

Python 3.7.2 (default, Feb 12 2019, 08:15:36)


[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information
>>> import numpy as np
>>> x=sum(np.concatenate(([1000000],np.repeat(0.000001,1000000))))
>>> x
1000001.0000076145
>>> x-1000000
1.00000761449337
>>> x-1000001
7.614493370056152e-06
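Python's expert code does exist, though: math.fsum in the standard library tracks the partial sums exactly and rounds only once, at the very end:

```python
import math

xs = [1000000.0] + [0.000001] * 1000000
x = math.fsum(xs)   # exactly rounded sum of the whole list
print(x)            # 1000001.0
print(x - 1000001)  # 0.0
```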

