Lect01 Handouts


Time Series Analysis

Lasse Engbo Christiansen

DTU Applied Mathematics and Computer Science


Technical University of Denmark

September 8, 2017

1 / 45
Outline of the lecture

- Practical information
- Introductory examples (see also Chapter 1)
- A brief outline of the course
- Chapter 2:
  - Multivariate random variables
  - The multivariate normal distribution
  - Linear projections
- Example

2 / 45
Contact info

- Lasse Engbo Christiansen

  Technical University of Denmark
  Applied Mathematics and Computer Science
  Section for Dynamical Systems
  From October 1st: Building 303B, room 010
  Email: laec@dtu.dk
- Teaching assistants: Jesper, Sebastian & me.
  For consultation beyond that, please drop me an email.

3 / 45
What you should be able to do
[Figure: Mink and muskrat skins traded in Canada 1850–1911; # of skins traded vs. year]

4 / 45
Introductory example – shares (COLO B 1 month)

From finance.yahoo.com

7 / 45
Introductory example – shares (COLO B 1 year)

From finance.yahoo.com

8 / 45
Introductory example – shares (COLO B all)

From finance.yahoo.com

9 / 45
Introductory example – shares (COLO B log(all) )

From finance.yahoo.com

10 / 45
Number of Monthly Airline Passengers in the US
[Figure: monthly number of airline passengers in the US, 1995–2002]

11 / 45
Consumption of District Heating (VEKS) – data
[Figure, two panels: Heat Consumption (GJ/h) and Air Temperature (°C), Nov 1995 – Apr 1996]

12 / 45
Consumption of DH – simple model

[Figure: Heat Consumption (GJ/h) plotted against Air Temperature (°C)]


13 / 45
Discussion: What is a dynamical system?

14 / 45
Consumption of DH – model error

[Figure, two panels of Model Error (GJ/h), Nov 1995 – Apr 1996: the actual model error, and the model error as it should be if the model were OK]

15 / 45
A brief outline of the course

- General aspects of multivariate random variables
- Prediction using the general linear model
- Time series models
- Some theory on linear systems
- Time series models with external input

Some goals:
- Characterization of time series / signals: correlation functions,
  covariance functions, stationarity, linearity, . . .
- Signal processing: filtering and smoothing
- Modelling, with or without external input
- Prediction with uncertainty

16 / 45
Today: Multivariate random variables

- Distribution functions
- Density functions
- The multivariate normal distribution
- Marginal densities
- Conditional distributions and independence
- Expectations and moments
- Moments of multivariate random variables
- Conditional expectation
- Distributions derived from the normal distribution
- Linear projections and relations to conditional means

17 / 45
Multivariate random variables – distr. functions

- Definition (n-dimensional random variable; random vector):

  $$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}$$

- Joint distribution function:

  $$F(x_1, \dots, x_n) = P\{X_1 \le x_1, \dots, X_n \le x_n\}$$

- Notice the notation (lowercase vs. capital letters, bold font)

18 / 45
Multivariate random variables - joint densities

- Joint distribution function (repeated from last slide):

  $$F(x_1, \dots, x_n) = P\{X_1 \le x_1, \dots, X_n \le x_n\}$$

- Joint density function, continuous case:

  $$f(x_1, \dots, x_n) = \frac{\partial^n F(x_1, \dots, x_n)}{\partial x_1 \cdots \partial x_n}$$

- and back to the joint distribution function:

  $$F(x_1, \dots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(t_1, \dots, t_n) \, dt_1 \cdots dt_n$$

- Joint density function, discrete case:

  $$f(x_1, \dots, x_n) = P\{X_1 = x_1, \dots, X_n = x_n\}$$

19 / 45
The Multivariate Normal Distribution

- The joint p.d.f.:

  $$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} \sqrt{\det \boldsymbol{\Sigma}}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$$

- Σ is symmetric and positive semi-definite
- Notation: X ∼ N(µ, Σ)
- Standard multivariate normal: Z ∼ N(0, I)
- If X = µ + T Z, where Σ = T T^T, then X ∼ N(µ, Σ) (see the sketch below)
- If X ∼ N(µ, Σ) and Y = a + BX, then Y ∼ N(a + Bµ, B Σ B^T)
- More relations between distributions in Sec. 2.7
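
As a small illustration of the X = µ + T Z construction, here is a sketch in R that draws from N(µ, Σ) using the Cholesky factor as T; the numbers are made up for illustration:

## Draw from N(mu, Sigma) via X = mu + T Z, where Sigma = T T^T
set.seed(1)
mu <- c(1, 2)
Sigma <- matrix(c(4, 1, 1, 2), nrow = 2)  # illustrative covariance
Tmat <- t(chol(Sigma))                    # chol() gives upper triangular; T = lower
Z <- matrix(rnorm(2 * 10000), nrow = 2)   # columns are draws of Z ~ N(0, I)
X <- mu + Tmat %*% Z                      # mu recycles down each column
rowMeans(X)                               # approximately mu
cov(t(X))                                 # approximately Sigma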

20 / 45
Stochastic variables and distributions

- If X ∼ N(0, 1), then −X ∼ N(0, 1)

[Figure, two panels: the density for X and the density for −X; both are the standard normal density]

- X and −X are different variables that have the same distribution

21 / 45
Marginal density function
- Sub-vector: (X_1, \dots, X_k)^T, (k < n)
- Marginal density function:

  $$f_S(x_1, \dots, x_k) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \dots, x_n) \, dx_{k+1} \cdots dx_n$$

[Figure, two panels: the joint density f(x_1, x_2) and the marginal density of x_1]

22 / 45
Blackboard - conditional probability

23 / 45
Conditional distributions
- The conditional density of X_1 given X_2 = x_2 is defined as the joint
  density of (X_1, X_2) divided by the marginal density of X_2 evaluated
  at x_2 (for f_{X_2}(x_2) > 0):

  $$f_{X_1 | X_2 = x_2}(x_1) = \frac{f_{X_1, X_2}(x_1, x_2)}{f_{X_2}(x_2)}$$

[Figure, two panels: the joint density f(x_1, x_2) and conditional densities of x_1 for selected values of x_2]
24 / 45
Independence

- If knowledge of X does not give information about Y, we get that

  $$f_{Y|X=x}(y) = f_Y(y)$$

- This leads to the following definition of independence:

  $$X, Y \text{ stochastically independent} \overset{\text{def}}{\iff} f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$$

25 / 45
Expectation

- Let X be a univariate random variable with density f_X(x). The
  expectation of X is then defined as:

  $$E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\, dx \quad \text{(continuous case)}$$
  $$E[X] = \sum_{\text{all } x} x\, P(X = x) \quad \text{(discrete case)}$$

- Expectation is a linear operator
- Calculation rule:

  $$E[a + bX_1 + cX_2] = a + b\,E[X_1] + c\,E[X_2]$$

26 / 45
Moments and Variance

- n'th moment:

  $$E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x) \, dx$$

- n'th central moment:

  $$E[(X - E[X])^n] = \int_{-\infty}^{\infty} (x - E[X])^n f_X(x) \, dx$$

- The 2nd central moment is called the variance:

  $$V[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2$$
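
A quick numerical check of the variance identity in R; note that var() uses the n − 1 denominator, so the two lines agree only approximately (and very closely for large n):

## V[X] = E[X^2] - (E[X])^2
set.seed(6)
x <- rnorm(1e5, mean = 3)
mean(x^2) - mean(x)^2   # population-style estimate, approximately 1
var(x)                  # nearly identical for large n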

27 / 45
Covariance
- Covariance:

  $$\mathrm{Cov}[X_1, X_2] = E[(X_1 - E[X_1])(X_2 - E[X_2])] = E[X_1 X_2] - E[X_1] E[X_2]$$

- Variance and covariance:

  $$V[X] = \mathrm{Cov}[X, X]$$

- Calculation rules (checked numerically below):

  $$\mathrm{Cov}[aX_1 + bX_2, cX_3 + dX_4] = ac\,\mathrm{Cov}[X_1, X_3] + ad\,\mathrm{Cov}[X_1, X_4] + bc\,\mathrm{Cov}[X_2, X_3] + bd\,\mathrm{Cov}[X_2, X_4]$$

- The calculation rule can be used for the variance as well. For instance:

  $$V[a + bX_2] = b^2 V[X_2]$$
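
A quick numerical illustration of the rule in R, a sketch with arbitrary coefficients and some constructed dependence; sample covariance is itself bilinear, so the two sides agree up to rounding:

## Check the covariance calculation rule on simulated data
set.seed(2)
n <- 10000
Z <- rnorm(n)
X1 <- Z + rnorm(n); X2 <- rnorm(n)       # X1 and X3 share the component Z
X3 <- Z + rnorm(n); X4 <- X2 + rnorm(n)  # X4 depends on X2
a <- 2; b <- -1; cc <- 0.5; d <- 3       # cc avoids shadowing c()
lhs <- cov(a*X1 + b*X2, cc*X3 + d*X4)
rhs <- a*cc*cov(X1, X3) + a*d*cov(X1, X4) + b*cc*cov(X2, X3) + b*d*cov(X2, X4)
lhs - rhs                                # zero up to rounding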

28 / 45
Moment representation

- All moments up to a given order.
- Second order moment representation:
  - Mean
  - Variance
  - Covariance (if relevant)

29 / 45
Expectation and Variance for Random Vectors

- Expectation: E[X] = [E[X_1], E[X_2], \dots, E[X_n]]^T
- Variance-covariance matrix:

  $$\boldsymbol{\Sigma}_X = V[\mathbf{X}] = E[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T] = \begin{pmatrix} V[X_1] & \mathrm{Cov}[X_1, X_2] & \cdots & \mathrm{Cov}[X_1, X_n] \\ \mathrm{Cov}[X_2, X_1] & V[X_2] & \cdots & \mathrm{Cov}[X_2, X_n] \\ \vdots & & & \vdots \\ \mathrm{Cov}[X_n, X_1] & \mathrm{Cov}[X_n, X_2] & \cdots & V[X_n] \end{pmatrix}$$

- Correlation (see the R snippet below):

  $$\rho_{ij} = \frac{\mathrm{Cov}[X_i, X_j]}{\sqrt{V[X_i] V[X_j]}} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}$$
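
In R this transformation is available as cov2cor(); a sketch with an illustrative Σ:

## rho_ij = sigma_ij / (sigma_i * sigma_j)
Sigma <- matrix(c(4, 1, 1, 2), nrow = 2)  # illustrative covariance matrix
D <- diag(1 / sqrt(diag(Sigma)))          # diag(1/sigma_i)
D %*% Sigma %*% D                         # correlation matrix by hand
cov2cor(Sigma)                            # built-in equivalent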

30 / 45
Correlation and Independence

- If X and Y are independent stochastic variables, then Cov(X, Y) = 0
  and thus Corr(X, Y) = 0.
- However, if X ∼ N(0, 1), then

  $$\mathrm{Cov}(X, X^2) = E[X \cdot X^2] - E[X] \cdot E[X^2] = E[X^3] = \int x^3 f_X(x) \, dx = 0$$

- Thus X and X² are uncorrelated, but E[X² | X = x] = x², so they are
  clearly dependent.
- Independence implies no correlation, but not the other way around
  (see the simulation below).
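
A simulation of this point, a minimal sketch assuming X ∼ N(0, 1):

## X and X^2: uncorrelated but clearly dependent
set.seed(3)
x <- rnorm(1e5)
cor(x, x^2)       # close to 0: uncorrelated
cor(abs(x), x^2)  # far from 0: knowing |X| says a lot about X^2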

31 / 45
Expectation and Variance for Random Vectors

- The correlation matrix R = [ρ_ij] is the arrangement of the ρ_ij in a matrix
- Covariance matrix between X (dim. p) and Y (dim. q):

  $$\boldsymbol{\Sigma}_{XY} = C[\mathbf{X}, \mathbf{Y}] = E\left[ (\mathbf{X} - \boldsymbol{\mu})(\mathbf{Y} - \boldsymbol{\nu})^T \right] = \begin{pmatrix} \mathrm{Cov}[X_1, Y_1] & \cdots & \mathrm{Cov}[X_1, Y_q] \\ \vdots & & \vdots \\ \mathrm{Cov}[X_p, Y_1] & \cdots & \mathrm{Cov}[X_p, Y_q] \end{pmatrix}$$

- Calculation rules – see the book.
- The special case of the variance, C[X, X] = V[X], gives

  $$V[A\mathbf{X}] = A\, V[\mathbf{X}]\, A^T$$
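
A small simulation sketch of the last rule, with X drawn so that V[X] ≈ I and an arbitrary matrix A:

## V[AX] = A V[X] A^T
set.seed(7)
X <- matrix(rnorm(2 * 10000), nrow = 2)  # V[X] approximately I
A <- matrix(c(1, 2, 0, 1), nrow = 2)     # arbitrary A
cov(t(A %*% X))                          # approximately A %*% t(A)
A %*% t(A)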

32 / 45
Conditional expectation

$$E[Y \mid X = x] = \int_{-\infty}^{\infty} y \, f_{Y|X=x}(y) \, dy$$

Calculation rules:

$$E[Y \mid X] = E[Y] \quad \text{if } X \text{ and } Y \text{ are independent}$$
$$E[Y] = E\left[ E[Y \mid X] \right]$$
$$E[g(X)\, Y \mid X] = g(X)\, E[Y \mid X]$$
$$E[g(X)\, Y] = E\left[ g(X)\, E[Y \mid X] \right]$$
$$E[a \mid X] = a$$
$$E[g(X) \mid X] = g(X)$$
$$E[cX + dZ \mid Y] = c\,E[X \mid Y] + d\,E[Z \mid Y]$$
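
A Monte Carlo sanity check of the tower property E[Y] = E[E[Y|X]], a sketch where the conditional mean is constructed to be E[Y|X] = X:

## Tower property: E[Y] = E[ E[Y|X] ]
set.seed(4)
x <- rnorm(1e5, mean = 2)
y <- rnorm(1e5, mean = x)  # Y | X = x ~ N(x, 1), so E[Y|X] = X
mean(y)                    # approximately 2
mean(x)                    # E[E[Y|X]] = E[X], also approximately 2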

33 / 45
Variance separation

- Definition of conditional variance and covariance:

  $$V[Y \mid X] = E\left[ \left( Y - E[Y|X] \right)\left( Y - E[Y|X] \right)^T \mid X \right]$$
  $$C[Y, Z \mid X] = E\left[ \left( Y - E[Y|X] \right)\left( Z - E[Z|X] \right)^T \mid X \right]$$

- The variance separation theorem (a numerical check follows below):

  $$V[Y] = E[V[Y|X]] + V[E[Y|X]]$$
  $$C[Y, Z] = E[C[Y, Z|X]] + C[E[Y|X], E[Z|X]]$$
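
A corresponding sketch of the variance separation theorem in R, with V[Y|X] = 4 and E[Y|X] = X by construction:

## V[Y] = E[V[Y|X]] + V[E[Y|X]] = 4 + V[X] = 5 here
set.seed(5)
x <- rnorm(1e5)                    # V[X] = 1
y <- rnorm(1e5, mean = x, sd = 2)  # V[Y|X] = 4, E[Y|X] = X
var(y)                             # approximately 5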

34 / 45
Linear Projections

- Consider two random vectors Y and X; then

  $$E \begin{pmatrix} \mathbf{Y} \\ \mathbf{X} \end{pmatrix} = \begin{pmatrix} \boldsymbol{\mu}_Y \\ \boldsymbol{\mu}_X \end{pmatrix} \quad \text{and} \quad V \begin{pmatrix} \mathbf{Y} \\ \mathbf{X} \end{pmatrix} = \begin{pmatrix} \boldsymbol{\Sigma}_{YY} & \boldsymbol{\Sigma}_{YX} \\ \boldsymbol{\Sigma}_{XY} & \boldsymbol{\Sigma}_{XX} \end{pmatrix}$$

- Define the linear projection

  $$\rho_X(\mathbf{Y}) \overset{\text{def}}{=} \boldsymbol{\mu}_Y + \boldsymbol{\Sigma}_{YX} \boldsymbol{\Sigma}_{XX}^{-1} (\mathbf{X} - \boldsymbol{\mu}_X)$$

- Then (see the R sketch below):
  - ρ_X(Y) is of the form a + BX;
  - V[Y − ρ_X(Y)] = Σ_YY − Σ_YX Σ_XX^{-1} Σ_YX^T;
  - Cov(Y − ρ_X(Y), X) = 0.

35 / 45
Linear projections and conditional means

- ρ_X(Y) minimizes the variance among functions a + BX that give
  projection errors uncorrelated with X.
- If (X, Y) is multivariate normal, this is a property of E[Y|X].
- In general they differ, however: if X ∼ N(0, 1), then
  ρ_X(X²) = 1 while E[X²|X] = X²
  (since Cov(X, X²) = 0, the projection reduces to the constant E[X²] = 1).
- We shall write E[Y|X] for ρ_X(Y) anyway, BECAUSE:
  1. The book and other time series literature do so
  2. It is true for normal stochastic variables
  3. ρ_X(Y) satisfies the same calculation rules as E[Y|X] for normal
     distributions
- The differences should be kept in mind.

36 / 45
Air pollution in cities

- Carstensen (1990) used time series analysis to set up models for
  NO and NO2 at Jagtvej in Copenhagen
- Measurements of NO and NO2 are available every third hour (00,
  03, 06, 09, 12, . . . )
- We have µ_NO2 = 48 µg/m³ and µ_NO = 79 µg/m³
- In the model, X_{1,t} = NO_{2,t} − µ_NO2 and X_{2,t} = NO_t − µ_NO are used

37 / 45
Air pollution in cities – model and forecast

      
$$\begin{pmatrix} X_{1,t} \\ X_{2,t} \end{pmatrix} = \begin{pmatrix} 0.9 & -0.1 \\ 0.4 & 0.8 \end{pmatrix} \begin{pmatrix} X_{1,t-1} \\ X_{2,t-1} \end{pmatrix} + \begin{pmatrix} \xi_{1,t} \\ \xi_{2,t} \end{pmatrix}, \qquad \mathbf{X}_t = \boldsymbol{\Phi} \mathbf{X}_{t-1} + \boldsymbol{\xi}_t$$

$$V[\boldsymbol{\xi}_t] = \boldsymbol{\Sigma} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix} = \begin{pmatrix} 20 & 21 \\ 21 & 23 \end{pmatrix} (\mu g/m^3)^2$$

- Assume that t corresponds to 09:00 today and that we have the
  measurements 64 µg/m³ NO2 and 93 µg/m³ NO
- Forecast the concentrations at 12:00 (t + 1)
- What is the variance-covariance of the forecast error?
- The best predictor is the conditional expectation

38 / 45
Air pollution in cities – model and forecast

The forecast:

$$E(\mathbf{X}_{t+1} \mid \mathbf{X}_t) = E(\boldsymbol{\Phi} \mathbf{X}_t + \boldsymbol{\xi}_{t+1} \mid \mathbf{X}_t) = \boldsymbol{\Phi} \mathbf{X}_t$$

Variance-covariance of the forecast error:

$$V(\mathbf{X}_{t+1} - E(\mathbf{X}_{t+1} \mid \mathbf{X}_t) \mid \mathbf{X}_t) = V(\boldsymbol{\Phi} \mathbf{X}_t + \boldsymbol{\xi}_{t+1} - \boldsymbol{\Phi} \mathbf{X}_t \mid \mathbf{X}_t) = V(\boldsymbol{\xi}_{t+1} \mid \mathbf{X}_t) = \boldsymbol{\Sigma}$$

39 / 45
Air pollution in cities – forecast with R

## The system
mu <- matrix(c(48,79),nrow=2)
Phi <- matrix(c(.9,.4,-.1,.8),nrow=2)
Sigma <- matrix(c(20,21,21,23),nrow=2)
## The forecast of the concentrations
Xt <- matrix(c(64,93),nrow=2)-mu
Xtp1.hat <- Phi%*%Xt
Xtp1.hat + mu

## [,1]
## [1,] 61.0
## [2,] 96.6

## The variance-covariance of the forecast error is simply Sigma
Sigma

40 / 45
Air pollution in cities – linear projection

- At 12:00 (t + 1) we now assume that NO2 is measured, with
  67 µg/m³ as the result, but NO cannot be measured due to some
  trouble with the equipment.
- Estimate the missing NO measurement.
- What is the variance of the estimation error?

41 / 45
Air pollution in cities – linear projection

Using the projection formula (compare with (2.65)):

$$E[X_{2,t+1} \mid X_{1,t+1}, \mathbf{X}_t] = E[(X_{2,t+1} \mid X_{1,t+1}) \mid \mathbf{X}_t]$$
$$= E[X_{2,t+1} \mid \mathbf{X}_t] + \mathrm{Cov}(X_{1,t+1}, X_{2,t+1} \mid \mathbf{X}_t)\, V[X_{1,t+1} \mid \mathbf{X}_t]^{-1} \left( X_{1,t+1} - E(X_{1,t+1} \mid \mathbf{X}_t) \right)$$
$$= (\Phi_{21} X_{1,t} + \Phi_{22} X_{2,t}) + \Sigma_{12} \Sigma_{11}^{-1} \left( X_{1,t+1} - (\Phi_{11} X_{1,t} + \Phi_{12} X_{2,t}) \right)$$

The variance of the projection error is (2.66):

$$E(V(X_{2,t+1} \mid X_{1,t+1}, \mathbf{X}_t)) = V(X_{2,t+1} \mid \mathbf{X}_t) - \mathrm{Cov}(X_{2,t+1}, X_{1,t+1} \mid \mathbf{X}_t)^2\, V(X_{1,t+1} \mid \mathbf{X}_t)^{-1} = \Sigma_{22} - \Sigma_{12}^2 / \Sigma_{11}$$

42 / 45
Air pollution in cities – linear projection with R

## The new observation of X_{1,t+1}
X1tp1 <- 67 - mu[1]
## The projection (estimate of the missing NO measurement)
Xtp1.hat[2] + mu[2] +
  Sigma[1,2]/Sigma[1,1] * (X1tp1 - Xtp1.hat[1])

## [1] 102.9

## The variance of the projection error
Sigma[2,2] - Sigma[1,2]^2/Sigma[1,1]

## [1] 0.95

43 / 45
Highlights
- Covariance calculation rule:

  $$\mathrm{Cov}[aX_1 + bX_2, cX_3 + dX_4] = ac\,\mathrm{Cov}[X_1, X_3] + ad\,\mathrm{Cov}[X_1, X_4] + bc\,\mathrm{Cov}[X_2, X_3] + bd\,\mathrm{Cov}[X_2, X_4]$$

- The variance separation theorem:

  $$V[Y] = E[V[Y|X]] + V[E[Y|X]]$$
  $$C[Y, Z] = E[C[Y, Z|X]] + C[E[Y|X], E[Z|X]]$$

- Linear projection:

  $$\left( E[Y|X] = \right) \rho_X(Y) \overset{\text{def}}{=} \mu_Y + \Sigma_{YX} \Sigma_{XX}^{-1} (X - \mu_X)$$

- Second order moment representation:
  all moments up to second order (mean, variance and covariance).

44 / 45
Exercises

Exercises 2.1, 2.2, 2.3


Correction in exercise 2.2: X and ε are mutually independent.

45 / 45
