
STAT 513: Lecture 12

Mostly linear algebra


There are more things in heaven and earth
> -0.035+0.025+0.01
[1] -1.734723e-18
> -0.035+0.01+0.025
[1] 0
> sum(-0.035,0.025,0.01)
[1] -1.734723e-18
> sum(-0.035,0.01,0.025)
[1] 0
> sum(c(-0.035,0.025,0.01))
[1] -1.734723e-18
> sum(c(-0.035,0.01,0.025))
[1] -1.734723e-18
> mean(c(-0.035,0.025,0.01))
[1] -5.779588e-19
> mean(c(-0.035,0.01,0.025))
[1] -5.776765e-19
> sum(c(-0.035,0.01,0.025))/3
[1] -5.782412e-19
> sum(c(-0.035,0.025,0.01))/3
[1] -5.782412e-19
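Why does the order matter at all? None of these decimal constants has an exact binary representation; a quick Python sketch (using the standard decimal module, which can display the value a float actually stores) makes this visible:

```python
from decimal import Decimal

# Decimal(float) shows the exact binary value the double actually stores;
# none of these decimal literals is exactly representable.
for x in (-0.035, 0.025, 0.01):
    print(x, "->", Decimal(x))

# Each intermediate sum is rounded, so floating-point addition is not
# associative: the two orders give different results (as in the R session).
print((-0.035 + 0.025) + 0.01 == (-0.035 + 0.01) + 0.025)  # False
```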

And not just in R
Python 3.7.2 (default, Feb 12 2019, 08:15:36)
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information
>>> -0.035+0.025+0.01
-1.734723475976807e-18
>>> -0.035+0.01+0.025
0.0
>>> sum(-0.035,0.025, 0.01)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sum expected at most 2 arguments, got 3
>>> sum([-0.035, 0.025, 0.01])
-1.734723475976807e-18
>>> sum([-0.035, 0.01, 0.025])
0.0
>>>
>>> mean([-0.035,0.01, 0.025])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'mean' is not defined

Numerics can be treacherous
Time series: let Wt be “white noise” with standard normal
distribution: the Wt are uncorrelated (and, being jointly normal, thus
independent) random variables with mean 0 and variance 1. An AR(1)
process Yt = ϕYt−1 + Wt is “stationary” (let us say: stable) if |ϕ| < 1.
> tser=rep(0,100)
> for (k in 2:100) tser[k] = (1/2)*tser[k-1]+rnorm(1) ## phi=1/2
> plot.ts(tser)
[Plot: plot.ts(tser), the stationary series fluctuating roughly between -3 and 2 over times 0-100]
On the other hand
On the other hand, the AR(1) process with ϕ = 2 is “explosive”
> tser=rep(0,100)
> for (k in 2:100) tser[k] = 2*tser[k-1]+rnorm(1) ## phi=2
> plot.ts(tser)
[Plot: plot.ts(tser), the explosive series reaching order 4e+29 by index 100]

But it is still stationary, as it is distributionally equivalent to an
AR(1) process with ϕ = 1/2 in “reverse time”
Yt = ϕYt−1 + Wt   is equivalent to   Yt−1 = (1/ϕ)Yt − (1/ϕ)Wt
Really?
> set.seed(007)
> inno=rnorm(1000)
> tser=rep(0,1000)
> for (k in 1000:2) tser[k-1] = (1/2)*tser[k]-(1/2)*inno[k]
> tss=tser[1:100]
> (tss[1:99]+(1/2)*inno[2:100])/tss[2:100]
[1] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
[16] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
[31] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
[46] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
[61] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
[76] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
[91] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
> (tss[2:100]-inno[2:100])/tss[1:99]
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[32] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[63] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[94] 2 2 2 2 2 2
> plot.ts(tss)

Really?

[Plot: plot.ts(tss), the backward-constructed series fluctuating roughly between -2 and 1 over indices 0-100]

So: when I start with Y1 as above and set Yt = 2Yt−1 + Wt, with the
same Wt I used above, I should get the same thing, right?
> tser=rep(0,100)
> tser[1]=tss[1]
> for (k in 2:100) tser[k] = 2*tser[k-1]+inno[k]
> plot.ts(tser)

How come???

[Plot: plot.ts(tser), the forward recursion exploding to order 2e+13 by index 100]

Let us investigate... Plotting only half of them:

Hm... [1:50]
[Two plots: tser[1:50] (forward) and tss[1:50] (backward), both fluctuating between about -2 and 1 and visually identical]

A bit more... [1:57]
[Two plots: tser[1:57] (forward), now ranging from -2 to 2, and tss[1:57] (backward), still between -2 and 1: the series begin to differ]

And yet a bit more... [1:60]
[Two plots: tser[1:60] (forward), exploding to about 20, and tss[1:60] (backward), still between -2 and 1]
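What is happening is not mysterious: with ϕ = 2, the recursion doubles whatever rounding error Yt−1 carries at every step, while the true (backward-constructed) solution stays of order 1. A back-of-the-envelope sketch in Python (the starting error of 1e-16 is an assumption, roughly one ulp at magnitude 1):

```python
# With phi = 2, y[k] = 2*y[k-1] + w[k] doubles any error in y[k-1] each step.
err = 1e-16   # assumed initial rounding error: about one ulp at magnitude 1
steps = 0
while err < 1.0:
    err *= 2  # the factor phi = 2 amplifies the error every iteration
    steps += 1
print(steps)  # 54: the error reaches order 1 after about 54 doublings
```

That matches the plots above, where the forward and backward series visibly part ways around index 50-60.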

A tale of expert code I: floating-point arithmetic
Floating-point arithmetic: numbers are represented as
significand ∗ base^exponent (base 2 in the usual double precision) - which has inevitable consequences
> 0.000001*1000000
[1] 1
> x=0; for (k in (1:1000000)) x=x+0.000001
> x
[1] 1
> x-1
[1] 7.918111e-12

> x=1000000; for (k in 1:1000000) x=x+0.000001


> x
[1] 1000001
> x-1000000
[1] 1.000008
> x-1000001
[1] 7.614493e-06
The moral here is: with floating-point arithmetic, addition works well
when the added numbers are of roughly the same magnitude
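One classical remedy, not used in the lecture but worth knowing, is Kahan's compensated summation, which carries the rounding error of each addition along in a separate term. A minimal Python sketch:

```python
def kahan_sum(xs):
    """Kahan's compensated summation: tracks the rounding error of each
    addition in a running compensation term."""
    s = 0.0
    c = 0.0  # compensation for lost low-order bits
    for x in xs:
        y = x - c
        t = s + y
        c = (t - s) - y  # algebraically zero; captures the rounding error
        s = t
    return s

# the large term first, then a million tiny ones: a naive loop loses bits
xs = [1000000.0] + [0.000001] * 1000000
print(kahan_sum(xs) - 1000001)  # essentially 0 (the naive loop was off by 7.6e-06)
```

The compensation line `(t - s) - y` is zero in exact arithmetic; in floating point it recovers the low-order bits that the addition `s + y` just discarded.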

A better algorithm thus does it
> x=0; for (k in (1:1000000)) x=x+0.000001; x=x+1000000
> x
[1] 1000001
> x-1000000
[1] 1
> x-1000001
[1] 0
Yeah, but what to do in general? The solution seems to be: use
addition programmed by experts
> sum
function (..., na.rm = FALSE) .Primitive("sum")

> x=sum(c(1000000,rep(0.000001,1000000)))
> x
[1] 1000001
> x-1000000
[1] 1
> x-1000001
[1] -2.561137e-09

Vectorization alone does not do it

> x=rep(1,1000001) %*% c(1000000,rep(0.000001,1000000))


> x-1000000
[,1]
[1,] 1.000008
> x-1000001
[,1]
[1,] 7.614493e-06
> x=crossprod(rep(1,1000001),c(1000000,rep(0.000001,1000000)))
> x-1000000
[,1]
[1,] 1.000008
> x-1000001
[,1]
[1,] 7.614493e-06

A tale of expert code II: never invert a matrix...

The theory for a linear model y ∼ Xβ suggests that you obtain the
least squares estimates via the formula
b = (XᵀX)⁻¹ Xᵀy

However, in computing you are never ever (well, every rule has an
exception, but still) supposed to do
b <- solve(t(X) %*% X) %*% t(X) %*% y

Doing alternatively
b <- solve(crossprod(X)) %*% crossprod(X, y)
does not really save it

... but rather solve (a system of) equations

It is much better to get b via solving the system of normal equations


(XᵀX) b = Xᵀy

To this end,
b <- solve(crossprod(X), crossprod(X, y))
may work pretty well; but experts know that the best way is via a
so-called QR decomposition (MATLAB's “backslash” operator), which
in R amounts to
b <- qr.solve(X, y)

This is correct - but many people do not need to know that much;
unless they are in certain special situations, they may just do
b <- coef(lm(y ~ X-1))
and it amounts to the same thing!
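The same hierarchy exists outside R; here is a NumPy sketch of the three approaches (an illustration on assumed toy data, not part of the lecture; NumPy's lstsq uses the SVD rather than QR, but both are orthogonal-factorization methods):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((50, 3))       # a well-conditioned toy design matrix
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta

# 1) textbook formula with an explicit inverse (discouraged)
b1 = np.linalg.inv(X.T @ X) @ (X.T @ y)
# 2) solve the normal equations instead
b2 = np.linalg.solve(X.T @ X, X.T @ y)
# 3) orthogonal factorization (lstsq uses the SVD; R's qr.solve uses QR)
b3, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(b1, beta), np.allclose(b2, beta), np.allclose(b3, beta))
```

On this well-conditioned example all three agree; it is the ill-conditioned Hilbert-matrix experiment that follows which separates them.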

Showing the difference is, however, a bit intricate...

...because the numerics of R is very good...


The first attempt didn’t work
The second one will?
> library(Matrix)
> set.seed(007)
> A=as.matrix(Hilbert(7))
> AA=A
> for (k in 1:5) AA=rbind(AA,A+rnorm(49,0,0.000001))
> x=rnorm(7)
> bb=AA %*% x
> x1 = solve(crossprod(AA)) %*% crossprod(AA,bb)
> x2 = solve(crossprod(AA),crossprod(AA,bb))
> x3 = qr.solve(AA,bb)

So...
First, let us try this:
> sum((x1-x)^2)
[1] 9.795661e-10
> sum((x2-x)^2)
[1] 8.119665e-10
> sum((x3-x)^2)
[1] 7.313153e-22
This is only mildly convincing (and in fact it may even be the other
way round in some versions)
But this one seems to stay:
> sum((bb - AA %*% x1)^2)
[1] 2.482263e-13
> sum((bb - AA %*% x2)^2)
[1] 3.111039e-20
> sum((bb - AA %*% x3)^2)
[1] 1.84273e-29
> sum((bb - AA %*% x)^2)
[1] 0

Vector and matrix algebra

*                componentwise multiplication (dimensions had better match!)
%*%              vector/matrix multiplication
crossprod(A,B)   AᵀB (uses a dedicated algorithm)
crossprod(A)     in particular, AᵀA
rep()            a repetition function, very flexible
solve(A, y)      finds b such that Ab = y
solve(A)         finds A⁻¹ (if need be)
c()              concatenation of vectors, flexible too
matrix()         setting up matrices
rbind(A,B)       matrices merged by rows (dimensions must match)
cbind(A,B)       matrices merged by columns (dimensions must match)
length()         returns the length of a vector
dim()            returns the dimension of a matrix
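For readers more at home in Python, rough NumPy counterparts of the table above (an illustrative mapping, not from the lecture; note that NumPy fills matrices row-major by default, unlike R's column-major matrix()):

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)  # 2x3 matrix (filled row-major, unlike R)
B = np.ones((2, 3))

print(A * B)                     # componentwise product, like R's *
print(A.T @ B)                   # like crossprod(A, B), i.e. t(A) %*% B
print(np.vstack([A, B]).shape)   # like rbind(A, B): (4, 3)
print(np.hstack([A, B]).shape)   # like cbind(A, B): (2, 6)
print(A.shape)                   # like dim(A): (2, 3)
```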

Type conversions
General format: as.type
> qr.solve(X, y)
x
20733.83 -20728.85
> as.vector(qr.solve(X, y))
[1] 20733.83 -20728.85
> as.vector(coef(lm(y~X-1)))
[1] 20733.83 -20728.85
> as.vector(solve(crossprod(X), crossprod(X, y)))
[1] 20737.19 -20732.21
> as.vector(solve(t(X) %*% X) %*% t(X) %*% y)
[1] 20737.20 -20732.22
Note: in R, vectors are interpreted neither rowwise nor columnwise but
in an “ambiguous manner”: whichever suits the multiplication at hand.
In other words, the same square matrix can be multiplied by the same
vector from both sides: X %*% a or a %*% X. This usually creates no
problem, until we have an expression a %*% a, which is always a
number, aᵀa for column vectors. If we want to obtain aaᵀ, a matrix,
we need to write a %*% t(a)
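The same pitfall is easy to see in NumPy, where 1-D arrays behave much like R vectors (a small illustration, not from the lecture):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])

print(a @ a)                  # inner product a'a: a single number, 14.0
print(np.outer(a, a))         # outer product aa': a 3x3 matrix
print(np.outer(a, a).shape)   # (3, 3)
```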

Potpourri
> numeric(4)
[1] 0 0 0 0
> rep(0,4)
[1] 0 0 0 0
> rep(c(0,1),4)
[1] 0 1 0 1 0 1 0 1
> rep(c(0,1),c(3,2))
[1] 0 0 0 1 1
> X=matrix(0,nrow=2,ncol=2)
> X=matrix(1:4,nrow=2,ncol=2)
> X
[,1] [,2]
[1,] 1 3
[2,] 2 4
> as.vector(X)
[1] 1 2 3 4
> as.matrix(1:4)
[,1]
[1,] 1
[2,] 2
[3,] 3
[4,] 4

Finally, reminder

Inverse of a matrix should never be computed, unless:

- it is absolutely necessary to compute standard errors
- the number of right-hand sides is so much larger than n that the
extra cost is insignificant
(this one is based on the following: solving two systems, Ax = b1
and Ax = b2, costs exactly as much as solving one system Ax = b
by first calculating A⁻¹ and then A⁻¹b)
- the size of n is so small that the costs are irrelevant
(yeah, in the toy setting we don’t care)

(John F. Monahan, Numerical Methods of Statistics)
(remarks by I.M.)
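The multiple right-hand-side point is easy to demonstrate: one factorization serves all the columns at once, and no inverse is ever formed. A NumPy sketch with illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 2))   # two right-hand sides, stacked as columns

# one LU factorization of A solves both systems Ax = b1, Ax = b2 at once
X = np.linalg.solve(A, B)
print(np.allclose(A @ X, B))      # True
```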

Some reminders from linear algebra
Useful formulae: (AB)ᵀ = BᵀAᵀ,
det(AB) = det(A) det(B), det(Aᵀ) = det(A)
Useful definitions: we say that a matrix A is
nonnegative definite (or positive semidefinite): xᵀAx ≥ 0 for every x
positive definite: xᵀAx > 0 for every x ≠ 0
The definitions imply that A is a square matrix; some authors
automatically require that it also be symmetric, so better check (in
statistics, it is almost always symmetric matrices that the definitions
are applied to)
Useful habit in theory (albeit not observed by R in practice): consider
vectors as n × 1 columns (in statistics, it is always like this)
Useful caution: if a is an n × 1 vector, then aᵀa is a number (which
we denoted by ‖a‖₂²), but aaᵀ is an n × n matrix. In general,
matrix multiplication is not commutative: AB is in general different
from BA
Useful principle: block matrices are multiplied in the same way as
usual matrices, only the blocks are themselves matrices, thus
multiplied as such, and hence the dimensions must match
Useful practice: check dimensions
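These identities are easy to spot-check numerically (a sketch with random matrices; it does not, of course, prove anything):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

print(np.allclose((A @ B).T, B.T @ A.T))                 # (AB)' = B'A': True
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))   # True
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))  # True
print(np.allclose(A @ B, B @ A))   # commutativity fails: typically False
```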

Appendix: some Python again
Adding again
Python 3.7.2 (default, Feb 12 2019, 08:15:36)
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information
>>> 0.000001*1000000
1.0
>>> x=0
>>> for k in range(1000000): x=x+0.000001
>>> x
1.000000000007918
>>> x-1
7.918110611626616e-12
>>> x=1000000
>>> for k in range(1000000): x=x+0.000001
>>> x
1000001.0000076145
>>> x-1000000
1.00000761449337
>>> x-1000001
7.614493370056152e-06

Elementary arithmetic is also no problem

Python 3.7.2 (default, Feb 12 2019, 08:15:36)


[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information
>>> x=0
>>> for k in range(1000000): x=x+0.000001
>>> x=x+1000000
>>> x
1000001.0
>>> x-1000000
1.0
>>> x-1000001
0.0

Now, the code of the experts

Python 3.7.2 (default, Feb 12 2019, 08:15:36)


[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information
>>> import numpy as np
>>> x=sum(np.concatenate(([1000000],np.repeat(0.000001,1000000))))
>>> x
1000001.0000076145
>>> x-1000000
1.00000761449337
>>> x-1000001
7.614493370056152e-06
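Python's expert code does exist, though: math.fsum in the standard library tracks the partial sums exactly and rounds only once, at the very end:

```python
import math

xs = [1000000.0] + [0.000001] * 1000000
x = math.fsum(xs)   # exactly rounded sum of the whole list
print(x)            # 1000001.0
print(x - 1000001)  # 0.0
```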

