Lect01 Handouts


Time Series Analysis

Lasse Engbo Christiansen

DTU Applied Mathematics and Computer Science


Technical University of Denmark

September 8, 2017

1 / 45
Outline of the lecture

- Practical information
- Introductory examples (see also Chapter 1)
- A brief outline of the course
- Chapter 2:
  - Multivariate random variables
  - The multivariate normal distribution
  - Linear projections
- Example

2 / 45
Contact info

- Lasse Engbo Christiansen

  Technical University of Denmark
  Applied Mathematics and Computer Science
  Section for Dynamical Systems
  From October 1st: Building 303B, room 010
  Email: laec@dtu.dk
- Teaching assistants: Jesper, Sebastian & me.
  For consultation beyond that, please drop me an email.

3 / 45
What you should be able to do
[Figure: Mink and muskrat skins traded in Canada 1850–1911; # of skins traded vs. year]

4 / 45
Introductory example – shares (COLO B 1 month)

From finance.yahoo.com

7 / 45
Introductory example – shares (COLO B 1 year)

From finance.yahoo.com

8 / 45
Introductory example – shares (COLO B all)

From finance.yahoo.com

9 / 45
Introductory example – shares (COLO B log(all) )

From finance.yahoo.com

10 / 45
Number of Monthly Airline Passengers in the US
[Figure: monthly number of airline passengers in the US, 1995–2002]

11 / 45
Consumption of District Heating (VEKS) – data
[Figure, two panels: Heat Consumption (GJ/h) and Air Temperature (°C), Nov 1995 – Apr 1996]

12 / 45
Consumption of DH – simple model

[Figure: Heat Consumption (GJ/h) plotted against Air Temperature (°C)]


13 / 45
Discussion: What is a dynamical system?

14 / 45
Consumption of DH – model error

[Figure, two panels of Model Error (GJ/h), Nov 1995 – Apr 1996: the actual model error, and the model error as it should be if the model were OK]

15 / 45
A brief outline of the course

- General aspects of multivariate random variables
- Prediction using the general linear model
- Time series models
- Some theory on linear systems
- Time series models with external input

Some goals:
- Characterization of time series / signals: correlation functions,
  covariance functions, stationarity, linearity, . . .
- Signal processing: filtering and smoothing
- Modelling, with or without external input
- Prediction with uncertainty

16 / 45
Today: Multivariate random variables

- Distribution functions
- Density functions
- The multivariate normal distribution
- Marginal densities
- Conditional distributions and independence
- Expectations and moments
- Moments of multivariate random variables
- Conditional expectation
- Distributions derived from the normal distribution
- Linear projections and relations to conditional means

17 / 45
Multivariate random variables – distr. functions

- Definition (n-dimensional random variable; random vector):

  $$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}$$

- Joint distribution function:

  $$F(x_1, \dots, x_n) = P\{X_1 \le x_1, \dots, X_n \le x_n\}$$

- Notice the notation (lowercase vs. capital letters, bold font)

18 / 45
Multivariate random variables - joint densities

- Joint distribution function (repeated from last slide):

  $$F(x_1, \dots, x_n) = P\{X_1 \le x_1, \dots, X_n \le x_n\}$$

- Joint density function, continuous case:

  $$f(x_1, \dots, x_n) = \frac{\partial^n F(x_1, \dots, x_n)}{\partial x_1 \cdots \partial x_n}$$

- and back to the joint distribution function:

  $$F(x_1, \dots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(t_1, \dots, t_n) \, dt_1 \cdots dt_n$$

- Joint density function, discrete case:

  $$f(x_1, \dots, x_n) = P\{X_1 = x_1, \dots, X_n = x_n\}$$

19 / 45
The Multivariate Normal Distribution

- The joint p.d.f.:

  $$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} \sqrt{\det \boldsymbol{\Sigma}}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$$

- Σ is symmetric and positive semi-definite
- Notation: X ∼ N(µ, Σ)
- Standard multivariate normal: Z ∼ N(0, I)
- If X = µ + T Z, where Σ = T T^T, then X ∼ N(µ, Σ) (see the sketch below)
- If X ∼ N(µ, Σ) and Y = a + BX, then Y ∼ N(a + Bµ, B Σ B^T)
- More relations between distributions in Sec. 2.7
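
As a small illustration of the X = µ + T Z construction, here is a sketch in R that draws from N(µ, Σ) using the Cholesky factor as T; the numbers are made up for illustration:

## Draw from N(mu, Sigma) via X = mu + T Z, where Sigma = T T^T
set.seed(1)
mu <- c(1, 2)
Sigma <- matrix(c(4, 1, 1, 2), nrow = 2)  # illustrative covariance
Tmat <- t(chol(Sigma))                    # chol() gives upper triangular; T = lower
Z <- matrix(rnorm(2 * 10000), nrow = 2)   # columns are draws of Z ~ N(0, I)
X <- mu + Tmat %*% Z                      # mu recycles down each column
rowMeans(X)                               # approximately mu
cov(t(X))                                 # approximately Sigma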

20 / 45
Stochastic variables and distributions

- If X ∼ N(0, 1), then −X ∼ N(0, 1)

[Figure, two panels: the density for X and the density for −X; both are the standard normal density]

- X and −X are different variables that have the same distribution

21 / 45
Marginal density function
- Sub-vector: (X_1, \dots, X_k)^T, (k < n)
- Marginal density function:

  $$f_S(x_1, \dots, x_k) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \dots, x_n) \, dx_{k+1} \cdots dx_n$$

[Figure, two panels: the joint density f(x_1, x_2) and the marginal density of x_1]

22 / 45
Blackboard - conditional probability

23 / 45
Conditional distributions
- The conditional density of X_1 given X_2 = x_2 is defined as the joint
  density of (X_1, X_2) divided by the marginal density of X_2 evaluated
  at x_2 (for f_{X_2}(x_2) > 0):

  $$f_{X_1 | X_2 = x_2}(x_1) = \frac{f_{X_1, X_2}(x_1, x_2)}{f_{X_2}(x_2)}$$

[Figure, two panels: the joint density f(x_1, x_2) and conditional densities of x_1 for selected values of x_2]
24 / 45
Independence

- If knowledge of X does not give information about Y, we get that

  $$f_{Y|X=x}(y) = f_Y(y)$$

- This leads to the following definition of independence:

  $$X, Y \text{ stochastically independent} \overset{\text{def}}{\iff} f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$$

25 / 45
Expectation

- Let X be a univariate random variable with density f_X(x). The
  expectation of X is then defined as:

  $$E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\, dx \quad \text{(continuous case)}$$
  $$E[X] = \sum_{\text{all } x} x\, P(X = x) \quad \text{(discrete case)}$$

- Expectation is a linear operator
- Calculation rule:

  $$E[a + bX_1 + cX_2] = a + b\,E[X_1] + c\,E[X_2]$$

26 / 45
Moments and Variance

- n'th moment:

  $$E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x) \, dx$$

- n'th central moment:

  $$E[(X - E[X])^n] = \int_{-\infty}^{\infty} (x - E[X])^n f_X(x) \, dx$$

- The 2nd central moment is called the variance:

  $$V[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2$$
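
A quick numerical check of the variance identity in R; note that var() uses the n − 1 denominator, so the two lines agree only approximately (and very closely for large n):

## V[X] = E[X^2] - (E[X])^2
set.seed(6)
x <- rnorm(1e5, mean = 3)
mean(x^2) - mean(x)^2   # population-style estimate, approximately 1
var(x)                  # nearly identical for large n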

27 / 45
Covariance
- Covariance:

  $$\mathrm{Cov}[X_1, X_2] = E[(X_1 - E[X_1])(X_2 - E[X_2])] = E[X_1 X_2] - E[X_1] E[X_2]$$

- Variance and covariance:

  $$V[X] = \mathrm{Cov}[X, X]$$

- Calculation rules (checked numerically below):

  $$\mathrm{Cov}[aX_1 + bX_2, cX_3 + dX_4] = ac\,\mathrm{Cov}[X_1, X_3] + ad\,\mathrm{Cov}[X_1, X_4] + bc\,\mathrm{Cov}[X_2, X_3] + bd\,\mathrm{Cov}[X_2, X_4]$$

- The calculation rule can be used for the variance as well. For instance:

  $$V[a + bX_2] = b^2 V[X_2]$$
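
A quick numerical illustration of the rule in R, a sketch with arbitrary coefficients and some constructed dependence; sample covariance is itself bilinear, so the two sides agree up to rounding:

## Check the covariance calculation rule on simulated data
set.seed(2)
n <- 10000
Z <- rnorm(n)
X1 <- Z + rnorm(n); X2 <- rnorm(n)       # X1 and X3 share the component Z
X3 <- Z + rnorm(n); X4 <- X2 + rnorm(n)  # X4 depends on X2
a <- 2; b <- -1; cc <- 0.5; d <- 3       # cc avoids shadowing c()
lhs <- cov(a*X1 + b*X2, cc*X3 + d*X4)
rhs <- a*cc*cov(X1, X3) + a*d*cov(X1, X4) + b*cc*cov(X2, X3) + b*d*cov(X2, X4)
lhs - rhs                                # zero up to rounding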

28 / 45
Moment representation

- All moments up to a given order.
- Second order moment representation:
  - Mean
  - Variance
  - Covariance (if relevant)

29 / 45
Expectation and Variance for Random Vectors

- Expectation: E[X] = [E[X_1], E[X_2], \dots, E[X_n]]^T
- Variance-covariance matrix:

  $$\boldsymbol{\Sigma}_X = V[\mathbf{X}] = E[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T] = \begin{pmatrix} V[X_1] & \mathrm{Cov}[X_1, X_2] & \cdots & \mathrm{Cov}[X_1, X_n] \\ \mathrm{Cov}[X_2, X_1] & V[X_2] & \cdots & \mathrm{Cov}[X_2, X_n] \\ \vdots & & & \vdots \\ \mathrm{Cov}[X_n, X_1] & \mathrm{Cov}[X_n, X_2] & \cdots & V[X_n] \end{pmatrix}$$

- Correlation (see the R snippet below):

  $$\rho_{ij} = \frac{\mathrm{Cov}[X_i, X_j]}{\sqrt{V[X_i] V[X_j]}} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}$$
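
In R this transformation is available as cov2cor(); a sketch with an illustrative Σ:

## rho_ij = sigma_ij / (sigma_i * sigma_j)
Sigma <- matrix(c(4, 1, 1, 2), nrow = 2)  # illustrative covariance matrix
D <- diag(1 / sqrt(diag(Sigma)))          # diag(1/sigma_i)
D %*% Sigma %*% D                         # correlation matrix by hand
cov2cor(Sigma)                            # built-in equivalent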

30 / 45
Correlation and Independence

- If X and Y are independent stochastic variables, then Cov(X, Y) = 0
  and thus Corr(X, Y) = 0.
- However, if X ∼ N(0, 1), then

  $$\mathrm{Cov}(X, X^2) = E[X \cdot X^2] - E[X] \cdot E[X^2] = E[X^3] = \int x^3 f_X(x) \, dx = 0$$

- Thus X and X² are uncorrelated, but E[X² | X = x] = x², so they are
  clearly dependent.
- Independence implies no correlation, but not the other way around
  (see the simulation below).
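
A simulation of this point, a minimal sketch assuming X ∼ N(0, 1):

## X and X^2: uncorrelated but clearly dependent
set.seed(3)
x <- rnorm(1e5)
cor(x, x^2)       # close to 0: uncorrelated
cor(abs(x), x^2)  # far from 0: knowing |X| says a lot about X^2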

31 / 45
Expectation and Variance for Random Vectors

- The correlation matrix R = [ρ_ij] is the arrangement of the ρ_ij in a matrix
- Covariance matrix between X (dim. p) and Y (dim. q):

  $$\boldsymbol{\Sigma}_{XY} = C[\mathbf{X}, \mathbf{Y}] = E\left[ (\mathbf{X} - \boldsymbol{\mu})(\mathbf{Y} - \boldsymbol{\nu})^T \right] = \begin{pmatrix} \mathrm{Cov}[X_1, Y_1] & \cdots & \mathrm{Cov}[X_1, Y_q] \\ \vdots & & \vdots \\ \mathrm{Cov}[X_p, Y_1] & \cdots & \mathrm{Cov}[X_p, Y_q] \end{pmatrix}$$

- Calculation rules – see the book.
- The special case of the variance, C[X, X] = V[X], gives

  $$V[A\mathbf{X}] = A\, V[\mathbf{X}]\, A^T$$
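
A small simulation sketch of the last rule, with X drawn so that V[X] ≈ I and an arbitrary matrix A:

## V[AX] = A V[X] A^T
set.seed(7)
X <- matrix(rnorm(2 * 10000), nrow = 2)  # V[X] approximately I
A <- matrix(c(1, 2, 0, 1), nrow = 2)     # arbitrary A
cov(t(A %*% X))                          # approximately A %*% t(A)
A %*% t(A)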

32 / 45
Conditional expectation

$$E[Y \mid X = x] = \int_{-\infty}^{\infty} y \, f_{Y|X=x}(y) \, dy$$

Calculation rules:

$$E[Y \mid X] = E[Y] \quad \text{if } X \text{ and } Y \text{ are independent}$$
$$E[Y] = E\left[ E[Y \mid X] \right]$$
$$E[g(X)\, Y \mid X] = g(X)\, E[Y \mid X]$$
$$E[g(X)\, Y] = E\left[ g(X)\, E[Y \mid X] \right]$$
$$E[a \mid X] = a$$
$$E[g(X) \mid X] = g(X)$$
$$E[cX + dZ \mid Y] = c\,E[X \mid Y] + d\,E[Z \mid Y]$$
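
A Monte Carlo sanity check of the tower property E[Y] = E[E[Y|X]], a sketch where the conditional mean is constructed to be E[Y|X] = X:

## Tower property: E[Y] = E[ E[Y|X] ]
set.seed(4)
x <- rnorm(1e5, mean = 2)
y <- rnorm(1e5, mean = x)  # Y | X = x ~ N(x, 1), so E[Y|X] = X
mean(y)                    # approximately 2
mean(x)                    # E[E[Y|X]] = E[X], also approximately 2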

33 / 45
Variance separation

- Definition of conditional variance and covariance:

  $$V[Y \mid X] = E\left[ \left( Y - E[Y|X] \right)\left( Y - E[Y|X] \right)^T \mid X \right]$$
  $$C[Y, Z \mid X] = E\left[ \left( Y - E[Y|X] \right)\left( Z - E[Z|X] \right)^T \mid X \right]$$

- The variance separation theorem (a numerical check follows below):

  $$V[Y] = E[V[Y|X]] + V[E[Y|X]]$$
  $$C[Y, Z] = E[C[Y, Z|X]] + C[E[Y|X], E[Z|X]]$$
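
A corresponding sketch of the variance separation theorem in R, with V[Y|X] = 4 and E[Y|X] = X by construction:

## V[Y] = E[V[Y|X]] + V[E[Y|X]] = 4 + V[X] = 5 here
set.seed(5)
x <- rnorm(1e5)                    # V[X] = 1
y <- rnorm(1e5, mean = x, sd = 2)  # V[Y|X] = 4, E[Y|X] = X
var(y)                             # approximately 5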

34 / 45
Linear Projections

- Consider two random vectors Y and X; then

  $$E \begin{pmatrix} \mathbf{Y} \\ \mathbf{X} \end{pmatrix} = \begin{pmatrix} \boldsymbol{\mu}_Y \\ \boldsymbol{\mu}_X \end{pmatrix} \quad \text{and} \quad V \begin{pmatrix} \mathbf{Y} \\ \mathbf{X} \end{pmatrix} = \begin{pmatrix} \boldsymbol{\Sigma}_{YY} & \boldsymbol{\Sigma}_{YX} \\ \boldsymbol{\Sigma}_{XY} & \boldsymbol{\Sigma}_{XX} \end{pmatrix}$$

- Define the linear projection

  $$\rho_X(\mathbf{Y}) \overset{\text{def}}{=} \boldsymbol{\mu}_Y + \boldsymbol{\Sigma}_{YX} \boldsymbol{\Sigma}_{XX}^{-1} (\mathbf{X} - \boldsymbol{\mu}_X)$$

- Then (see the R sketch below):
  - ρ_X(Y) is of the form a + BX;
  - V[Y − ρ_X(Y)] = Σ_YY − Σ_YX Σ_XX^{-1} Σ_YX^T;
  - Cov(Y − ρ_X(Y), X) = 0.

35 / 45
Linear projections and conditional means

- ρ_X(Y) minimizes the variance among functions a + BX that give
  projection errors uncorrelated with X.
- If (X, Y) is multivariate normal, this is a property of E[Y|X].
- In general they differ, however: if X ∼ N(0, 1), then
  ρ_X(X²) = 1 while E[X²|X] = X²
  (since Cov(X, X²) = 0, the projection reduces to the constant E[X²] = 1).
- We shall write E[Y|X] for ρ_X(Y) anyway, BECAUSE:
  1. The book and other time series literature do so
  2. It is true for normal stochastic variables
  3. ρ_X(Y) satisfies the same calculation rules as E[Y|X] for normal
     distributions
- The differences should be kept in mind.

36 / 45
Air pollution in cities

- Carstensen (1990) used time series analysis to set up models for
  NO and NO2 at Jagtvej in Copenhagen
- Measurements of NO and NO2 are available every third hour (00,
  03, 06, 09, 12, . . . )
- We have µ_NO2 = 48 µg/m³ and µ_NO = 79 µg/m³
- In the model, X_{1,t} = NO_{2,t} − µ_NO2 and X_{2,t} = NO_t − µ_NO are used

37 / 45
Air pollution in cities – model and forecast

      
$$\begin{pmatrix} X_{1,t} \\ X_{2,t} \end{pmatrix} = \begin{pmatrix} 0.9 & -0.1 \\ 0.4 & 0.8 \end{pmatrix} \begin{pmatrix} X_{1,t-1} \\ X_{2,t-1} \end{pmatrix} + \begin{pmatrix} \xi_{1,t} \\ \xi_{2,t} \end{pmatrix}, \qquad \mathbf{X}_t = \boldsymbol{\Phi} \mathbf{X}_{t-1} + \boldsymbol{\xi}_t$$

$$V[\boldsymbol{\xi}_t] = \boldsymbol{\Sigma} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix} = \begin{pmatrix} 20 & 21 \\ 21 & 23 \end{pmatrix} (\mu g/m^3)^2$$

- Assume that t corresponds to 09:00 today and that we have the
  measurements 64 µg/m³ NO2 and 93 µg/m³ NO
- Forecast the concentrations at 12:00 (t + 1)
- What is the variance-covariance of the forecast error?
- The best predictor is the conditional expectation

38 / 45
Air pollution in cities – model and forecast

The forecast:

$$E(\mathbf{X}_{t+1} \mid \mathbf{X}_t) = E(\boldsymbol{\Phi} \mathbf{X}_t + \boldsymbol{\xi}_{t+1} \mid \mathbf{X}_t) = \boldsymbol{\Phi} \mathbf{X}_t$$

Variance-covariance of the forecast error:

$$V(\mathbf{X}_{t+1} - E(\mathbf{X}_{t+1} \mid \mathbf{X}_t) \mid \mathbf{X}_t) = V(\boldsymbol{\Phi} \mathbf{X}_t + \boldsymbol{\xi}_{t+1} - \boldsymbol{\Phi} \mathbf{X}_t \mid \mathbf{X}_t) = V(\boldsymbol{\xi}_{t+1} \mid \mathbf{X}_t) = \boldsymbol{\Sigma}$$

39 / 45
Air pollution in cities – forecast with R

## The system
mu <- matrix(c(48,79),nrow=2)
Phi <- matrix(c(.9,.4,-.1,.8),nrow=2)
Sigma <- matrix(c(20,21,21,23),nrow=2)
## The forecast of the concentrations
Xt <- matrix(c(64,93),nrow=2)-mu
Xtp1.hat <- Phi%*%Xt
Xtp1.hat + mu

## [,1]
## [1,] 61.0
## [2,] 96.6

## The variance-covariance of the forecast error is simply Sigma
Sigma

40 / 45
Air pollution in cities – linear projection

- At 12:00 (t + 1) we now assume that NO2 is measured, with
  67 µg/m³ as the result, but NO cannot be measured due to some
  trouble with the equipment.
- Estimate the missing NO measurement.
- What is the variance of the estimation error?

41 / 45
Air pollution in cities – linear projection

Using the projection formula (compare with (2.65)):

$$E[X_{2,t+1} \mid X_{1,t+1}, \mathbf{X}_t] = E[(X_{2,t+1} \mid X_{1,t+1}) \mid \mathbf{X}_t]$$
$$= E[X_{2,t+1} \mid \mathbf{X}_t] + \mathrm{Cov}(X_{1,t+1}, X_{2,t+1} \mid \mathbf{X}_t)\, V[X_{1,t+1} \mid \mathbf{X}_t]^{-1} \left( X_{1,t+1} - E(X_{1,t+1} \mid \mathbf{X}_t) \right)$$
$$= (\Phi_{21} X_{1,t} + \Phi_{22} X_{2,t}) + \Sigma_{12} \Sigma_{11}^{-1} \left( X_{1,t+1} - (\Phi_{11} X_{1,t} + \Phi_{12} X_{2,t}) \right)$$

The variance of the projection error is (2.66):

$$E(V(X_{2,t+1} \mid X_{1,t+1}, \mathbf{X}_t)) = V(X_{2,t+1} \mid \mathbf{X}_t) - \mathrm{Cov}(X_{2,t+1}, X_{1,t+1} \mid \mathbf{X}_t)^2\, V(X_{1,t+1} \mid \mathbf{X}_t)^{-1} = \Sigma_{22} - \Sigma_{12}^2 / \Sigma_{11}$$

42 / 45
Air pollution in cities – linear projection with R

## The new observation of X_{1,t+1}
X1tp1 <- 67 - mu[1]
## The projection (estimate of the missing NO measurement)
Xtp1.hat[2] + mu[2] +
  Sigma[1,2]/Sigma[1,1] * (X1tp1 - Xtp1.hat[1])

## [1] 102.9

## The variance of the projection error
Sigma[2,2] - Sigma[1,2]^2/Sigma[1,1]

## [1] 0.95

43 / 45
Highlights
- Covariance calculation rule:

  $$\mathrm{Cov}[aX_1 + bX_2, cX_3 + dX_4] = ac\,\mathrm{Cov}[X_1, X_3] + ad\,\mathrm{Cov}[X_1, X_4] + bc\,\mathrm{Cov}[X_2, X_3] + bd\,\mathrm{Cov}[X_2, X_4]$$

- The variance separation theorem:

  $$V[Y] = E[V[Y|X]] + V[E[Y|X]]$$
  $$C[Y, Z] = E[C[Y, Z|X]] + C[E[Y|X], E[Z|X]]$$

- Linear projection:

  $$\left( E[Y|X] = \right) \rho_X(Y) \overset{\text{def}}{=} \mu_Y + \Sigma_{YX} \Sigma_{XX}^{-1} (X - \mu_X)$$

- Second order moment representation:
  all moments up to second order (mean, variance and covariance).

44 / 45
Exercises

Exercises 2.1, 2.2, 2.3


Correction in exercise 2.2: X and ε are mutually independent.

45 / 45
