
Topological Data Analysis (ECCE 794)

Name: Oscar Enrique Llerena Castro


Student Code: 100065298
Slide 0
Course Plan

1. Introduction to Financial Machine Learning

2. Algorithmic Trading

3. Automated Portfolio Management

4. Financial Data Structures

5. Labeling in Supervised Learning

6. Non-IID (Independent and Identically Distributed) Observations

7. Stationarity, Memory Retention and Differencing in Time Series

8. Fractional Differencing

9. Ensemble Methods, Cross Validation and Feature Importance in Finance

10. Backtesting

11. Deep Learning with Financial Time Series: Recurrent Neural Networks, LSTM, Attention

12. Machine Learning Asset Allocation

13. Stock APIs and Market data acquisition

14. Project - Daily Energy price prediction in Europe

15. Review

Slide 1

Financial Data
• Types of Financial Data

• Trade Data: Trade Bars

• Information-Driven Bars

• Tick/Volume/Dollar Bars: Advantages and Disadvantages

• Introduction to Imbalance Bars

Slide 2

Types of Financial Data
• Fundamental Data

– Assets
– Liabilities
– Sales
– Costs/earnings
– Macro variables

• Market Data

– Price/yield/implied volatility
– Volume
– Dividends/coupons
– Open interest
– Quotes/cancellations
– Aggressor side

• Analytics

– Analyst recommendations
– Credit ratings
– Earnings expectations
– News sentiment

• Alternative Data

– Satellite/CCTV images
– Google searches
– Twitter/chats
– Metadata

Slide 3

Trade Data
• Tick data is also called Trade or Time-and-Sales data and contains information regarding the
orders which have been executed on the exchange

• The basic pieces of information of interest that a transaction tick contains are:

– Order timestamp
– Price
– Volume
– Aggressor flag (buy or sell)

Slide 4

Disadvantages of Time Bars
• They are loved by engineers as they are aligned with the uniform sampling of signals in communication, data acquisition, etc.

• Signal Processing: Shannon-Nyquist Theorem about the Sampling Rate

• They are not well adapted to finance because of several reasons, including:

1. Trading activity is not equally distributed over time: Financial markets don’t operate uniformly; trading volumes fluctuate significantly throughout a day or over long periods, and this variation can’t be captured accurately by fixed-interval bars
2. Does not take into account volume activity: Time bars only consider price changes over
fixed intervals, ignoring how much was traded, which can be crucial for understanding
market momentum
3. Suffer from volatility-clustering heteroskedasticity: Financial markets often experience
periods of high and low volatility that cluster, which time bars might not represent well
due to their fixed intervals
4. Far from normal returns distribution: Returns in financial markets often don’t follow a
normal distribution, presenting challenges for models based on time bars

Slide 6

Heteroskedasticity
• Standard deviation of y vs the independent variable x is non-constant

• Example: Predicted stock price as function of trade volume

• In data, Heteroskedasticity means the spread or “noise” of your information changes depending
on what you’re looking at.

Slide 7

Forming Better Bars

• Information does not arrive at the market at a constant entropy rate

• Sampling data in chronological intervals means that the informational content of the individual observations is far from constant

• A better approach is to sample observations as a subordinated process of the amount of information exchanged:

– Trade bars
– Volume bars
– Dollar bars
– Volatility or runs bars
– Order imbalance bars
– Entropy bars

Slide 8

Forming Better Bars
• Trade (or Tick) Bars: OHLC is formed only when the number of trades reaches a predefined
threshold

• Volume Bars: OHLC is formed only when the aggregate volume of trades reaches a predefined threshold

• Dollar Bars: OHLC is formed only when the aggregate dollar value of trades reaches a predefined threshold
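The three threshold rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the tick format (a list of `(price, volume)` pairs) and the `threshold` value are assumptions made for the example.

```python
def dollar_bars(ticks, threshold):
    """Aggregate a tick stream into dollar bars: a bar closes once the
    accumulated dollar value (price * volume) reaches `threshold`.

    `ticks` is a list of (price, volume) pairs, one per executed trade.
    Returns a list of OHLCV dicts.  Swapping the accumulator for the
    volume alone (volume bars) or a trade counter (tick bars) yields
    the other two bar types.
    """
    bars, accum, chunk = [], 0.0, []
    for price, volume in ticks:
        chunk.append((price, volume))
        accum += price * volume
        if accum >= threshold:
            prices = [p for p, _ in chunk]
            bars.append({
                "open": prices[0],
                "high": max(prices),
                "low": min(prices),
                "close": prices[-1],
                "volume": sum(v for _, v in chunk),
            })
            accum, chunk = 0.0, []
    return bars
```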

Slide 9

Dollar Imbalance Bars: More Mathematical


The term “imbalance” refers to the discrepancy between the buying and selling pressures in the
market over a certain period or within a bar. This is a more sophisticated method of SAMPLING
financial time series data compared to traditional time bars.

• Let’s define the imbalance at time T as θT = Σ_{t=1}^{T} bt·vt, where bt ∈ {−1, 1} is the aggressor flag and vt may represent either the number of securities traded or the dollar amount exchanged

• The imbalance at time T, denoted θT, is defined as the sum of the products of the aggressor flag bt (indicating whether a trade is a buy {1} or a sell {−1}) and the volume vt (which could represent the number of securities traded or the dollar amount exchanged). This gives a measure of the net directional volume or dollar flow within a bar

• We compute the expected value of θT at the beginning of the bar:

E0[θT] = E0[ Σ_{t|bt=1} vt ] − E0[ Σ_{t|bt=−1} vt ]
       = E0[T] · ( P[bt = 1] E0[vt|bt = 1] − P[bt = −1] E0[vt|bt = −1] )

• E0 denotes expected value. The subscript 0 is often used in financial mathematics and time
series analysis to indicate that the expectation is taken at the beginning of the period, or “time
zero”, before observing any future values. This notation emphasizes that the expectation is
based on information available up to that point, without any foresight of future events. It’s a
way to clarify that the calculation is based on historical or currently known data, adhering to
the principle of no look-ahead bias in financial modeling and forecasting.

• The expected imbalance E0[θT] is calculated by taking the difference between the expected volumes of buy trades and sell trades. This is further refined by weighting these volumes by their probabilities, P[bt = 1] for buys and P[bt = −1] for sells, to get v⁺ and v⁻, respectively.

• Let’s denote v⁺ = P[bt = 1] E0[vt|bt = 1] and v⁻ = P[bt = −1] E0[vt|bt = −1], so that E0[T]⁻¹ E0[Σt vt] = E0[vt] = v⁺ + v⁻. You can think of v⁺ and v⁻ as decomposing the initial expectation of vt into the component contributed by buys and the component contributed by sells.


• Σt vt represents the total volume (or total dollar value) of trades over a period T. The expectation E0[Σt vt] is the average total volume expected over this period, considering all possible outcomes of trades. In short, E0[vt] is the expected value of the volume traded per trade (or per bar/tick), or the average amount of dollar volume for a single transaction.

• The term E0[T]⁻¹ is essentially the reciprocal of the expected number of trades (or bars) over the period T. This normalizes the total expected volume E0[Σt vt] to obtain an average volume per trade (or per bar), which is E0[vt]. In short, E0[Σt vt] is the expected total volume traded over a certain period, summed across all trades done in that period.

• The expectation E0 [vt ] can be thought of as the average volume of a single trade, considering
both buys and sells. This is where v + + v − comes into play, representing the combined con-
tributions of both buy and sell orders to this average volume. In short, E0 [T ] is the expected
number of trades (or ticks) that occur within the same period
• The equation E0[vt] = E0[Σt vt] / E0[T] implies that the average traded volume per trade is equal to the total expected traded volume divided by the expected number of trades. Later, the expected average traded volume E0[vt] is equalized to the sum of the contributions of buy orders v⁺ and sell orders v⁻.

• Specifically, v⁺ = P[bt = 1] E0[vt|bt = 1] calculates the expected volume from buy orders by multiplying the probability of a trade being a buy (P[bt = 1]) with the conditional expected volume of a buy trade (E0[vt|bt = 1]). Similarly, v⁻ = P[bt = −1] E0[vt|bt = −1] does the same for sell orders.

• Then, E0[θT] = E0[T](v⁺ − v⁻) = E0[T](2v⁺ − E0[vt]).

• Here, we use v⁻ = E0[vt] − v⁺

• In practice, we can estimate E0[T] as an exponentially weighted moving average (EWMA) of T values from prior bars, and (2v⁺ − E0[vt]) as an exponentially weighted moving average (EWMA) of bt·vt values from prior bars.

• Video to understand Exponential Weighted Moving Average

• We define a bar as a T*-contiguous subset of ticks such that the following condition is met:

T* = arg min_T { |θT| ≥ E0[T] · |2v⁺ − E0[vt]| }

where the size of the expected imbalance is implied by |2v⁺ − E0[vt]|

• The function arg min_T{·} is used to determine the points in time when a new Dollar Imbalance Bar should be started. It ensures that each bar captures meaningful information about market imbalances, enhancing the analysis and modeling of financial time series for trading algorithms.

• The term E0[T]·|2v⁺ − E0[vt]| calculates the expected magnitude of imbalance. E0[T] is the expected number of trades (or bars). The expression 2v⁺ − E0[vt] computes the net expected volume from buy orders, adjusted by the overall expected volume E0[vt], essentially capturing the expected net directional pressure in trading.

• The inequality |θT| ≥ E0[T]·|2v⁺ − E0[vt]| checks if the absolute value of the observed net imbalance (|θT|) reaches or exceeds the expected magnitude of imbalance E0[T]·|2v⁺ − E0[vt]|. It is a condition for determining significant deviations from expected trading patterns, indicating a period of imbalance in buying versus selling pressures.

• The arg min_T operator seeks to find the smallest time T (or the minimum number of trades) for which the above inequality is satisfied. Essentially, it identifies the earliest point in time where the observed imbalance is significant enough to warrant the closure of the current bar and the start of a new Dollar Imbalance Bar.

• The purpose of this function is to dynamically adjust the size of each bar based on market conditions rather than fixed time intervals. By doing so, each Dollar Imbalance Bar encapsulates a segment of trading data where there’s a significant net imbalance in trading volumes, making these bars particularly useful for analysing market microstructure and for algorithmic trading strategies that capitalize on imbalances in market orders.

• When θT is more imbalanced than expected, a low T will satisfy this condition

• Dollar Imbalance Bars are indeed a method for sampling financial time series data that focuses
on capturing significant disruptions or imbalances in the market, specifically related to the
dollar volume of trades. Unlike traditional time-based bars that aggregate data over fixed time
intervals, Dollar Imbalance Bars are constructed based on the accumulation of trading activity
that leads to a substantial imbalance between buy and sell orders in terms of dollar value.

• This method is particularly useful for highlighting periods of significant market activity, where
there might be an influx of buy or sell orders that can indicate shifts in market sentiment,
potential price movements, or reactions to news and events. By focusing on the dollar volume
and the direction of trades, this technique provides a more nuanced view of market dynamics
and can be especially valuable for algorithmic trading strategies that aim to exploit short-term
inefficiencies or trends driven by order flow imbalances.
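The bar-closing rule |θT| ≥ E0[T]·|2v⁺ − E0[vt]| with EWMA estimates can be sketched as follows. This is an illustrative sketch only: the tick format, the smoothing factor `alpha`, and the warm-up seeds `expected_T` and `expected_bv` are hypothetical choices, not values prescribed by the slides.

```python
def ewma(prev, value, alpha=0.1):
    """One exponentially weighted moving-average update."""
    return alpha * value + (1 - alpha) * prev


def imbalance_bars(ticks, alpha=0.1, expected_T=100, expected_bv=1.0):
    """Sketch of dollar imbalance bars.

    `ticks` is a list of (b, v) pairs with b in {-1, +1} the aggressor
    flag and v the dollar value traded.  A bar closes when the running
    imbalance theta exceeds the expected magnitude E0[T] * |E0[b*v]|,
    where E0[b*v] stands in for (2v+ - E0[v]); both expectations are
    tracked as EWMAs of values from prior bars.
    """
    bars, theta, count = [], 0.0, 0
    est_T, est_bv = float(expected_T), float(expected_bv)  # warm-up seeds
    for b, v in ticks:
        theta += b * v
        count += 1
        if abs(theta) >= est_T * abs(est_bv):      # expected imbalance exceeded
            bars.append({"ticks": count, "imbalance": theta})
            est_T = ewma(est_T, count, alpha)                # update E0[T]
            est_bv = ewma(est_bv, theta / count, alpha)      # update E0[b*v]
            theta, count = 0.0, 0
    return bars
```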

Slide 10

Principal Component Analysis for Financial Data


• Asset Pricing Model

• Multi Factor Portfolio

• Data-driven Factors

• Finding Factors with Principal Component Analysis

• Eigen Portfolios

• Textbook: Chapter 2, section 2.4.2

Slide 11

Stock Returns
Stock returns measure the change in the price of a stock over a specific period. They represent the
gain or loss made on a stock and are a fundamental metric used by investors to assess performance.

 
ri(t) = log( Si(t) / Si(t − Δt) ),   i = 1, ..., Nstocks
ri (t) = αi + βi rM (t) + εit

Here:

• ri (t): the log-return for stock i

• In the first expression, the log-return for stock i at time t is defined in terms of Si(t), the stock price at time t, and Si(t − Δt), the stock price at a previous time.

• In the second expression, the stock-return is a function of the market returns, individual stock
characteristics, and a noise term: ri (t) = αi + βi rM (t) + εit .

• rM (t): the log-return for the “market” M (the DJI index)

• The return of a market index represents the overall market performance.

• βi : the “beta” - measure of correlation with the market.

• βi measures the stock’s volatility or sensitivity relative to the market. A βi greater than 1
indicates that the stock is more volatile than the market, while a βi less than 1 indicates less
volatility.

• εit : the “noise” term (residual) ⇔ E[εit ] = 0

• The noise or error term represents unexplained variance in the stock’s returns not accounted for by market movements. εit includes i for the stock index and t for the time period, indicating that the noise is specific to each stock at each time.

• αi represents the stock’s expected return independent of the market return, essentially the
stock’s intrinsic value.

• ∆t: the time step (it can be 1m, 1h, 1d, 1w, 1y, etc.)
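As a quick illustration of the first expression, a minimal helper that turns a price series into log-returns (the fixed step Δt is implicit in the spacing of the input series; the function name is made up for the example):

```python
import math

def log_returns(prices):
    """Log-returns r(t) = log(S(t) / S(t - dt)) for a price series
    sampled at a fixed step dt.  Log-returns are additive over time:
    their sum telescopes to log(S(T) / S(0))."""
    return [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]
```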

Please keep in mind what we said last lecture about information-driven bars! This phrase likely
refers to a previous discussion on alternative data sampling methods, like information-driven bars,
which are designed to capture more relevant market information than traditional time bars. By
mentioning this, the lecturer suggests that understanding stock and market returns is enhanced by
considering how data is sampled and structured, emphasizing that the choice of data representation
(like Dollar Imbalance Bars) can significantly impact the analysis and modeling of stock returns.
Slide 12

Estimating Greek Symbols and Returns

ri (t) = αi + βi rM (t) + εit


r̂i (t) = α̂i + β̂i rM (t)
ε̂i (t) = ri (t) − r̂i (t) = ri (t) − α̂i − β̂i rM (t)

• This slide delves into the estimation of parameters in the model that describes stock returns
and their relationship to the market, and introduces the concept of diversification in portfolio
construction.

• Remember, the stock return ri (t) model includes a stock-specific component (αi ), a market-
related component (βi rM (t)), and a noise term (εit ).

• r̂i (t) represents the estimated or predicted return for stock i at time t, based on the
estimated parameters α̂i and β̂i . These estimates are derived from historical data, typically
using regression analysis or other statistical techniques.

• ε̂i (t) is defined as the difference between the actual return ri (t) and the estimated return
r̂i (t). It represents the residual or error in the estimation, capturing the part of the actual
return not explained by the model.
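The estimates α̂i and β̂i are typically obtained by ordinary least squares on historical data. A minimal sketch with plain lists (the function name and interface are assumptions made for the example):

```python
def fit_single_factor(r_i, r_m):
    """Estimate alpha_i and beta_i in r_i(t) = alpha_i + beta_i*r_M(t) + eps
    by ordinary least squares, and return the fitted residuals eps_hat.

    beta_hat = Cov(r_i, r_M) / Var(r_M);  alpha_hat = mean(r_i) - beta_hat*mean(r_M).
    """
    n = len(r_i)
    mean_i = sum(r_i) / n
    mean_m = sum(r_m) / n
    cov = sum((a - mean_i) * (b - mean_m) for a, b in zip(r_i, r_m)) / n
    var_m = sum((b - mean_m) ** 2 for b in r_m) / n
    beta = cov / var_m
    alpha = mean_i - beta * mean_m
    residuals = [a - alpha - beta * b for a, b in zip(r_i, r_m)]
    return alpha, beta, residuals
```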

Purpose of r̂i (t) and ε̂i (t)
• The introduction of r̂i (t) serves to apply the theoretical model to actual market data. By estimating α̂i and β̂i , analysts can predict stock returns based on observed market returns, helping decision-making and investment strategy formulation.

• Defining ε̂i (t) is essential for assessing the model’s accuracy and the effectiveness of the es-
timates. A smaller residual implies that the model and the estimates α̂i and β̂i are closely
capturing the stock’s return dynamics.

• Analyzing ε̂i (t) across different stocks can also provide insights into idiosyncratic risks and the
potential for diversification

Use diversification to produce low volatility portfolios uncorrelated with the market

• The phrase “Use diversification to produce low volatility portfolios uncorrelated with the market” highlights the importance of diversification in portfolio management. By combining stocks with varying β values and uncorrelated residuals ε̂i (t), investors can construct a portfolio that has lower overall volatility and is less sensitive to market movements.

• Diversification is a fundamental principle in finance that helps in reducing unsystematic risk (specific to individual stocks) in a portfolio. By holding a diversified portfolio, the impact of any single stock’s performance is minimized, leading to more stable returns.

Slide 13

Multi-Factor Model
The slide transitions from a single-factor model, where stock returns ri(t) are explained by the market return rM(t) and a unique stock-specific term εit, to a multi-factor model. In the multi-factor model, individual stock returns Ri are influenced by multiple factors Fj, each with its own sensitivity βij, and an idiosyncratic term R̃i.

ri(t) = αi + βi rM(t) + εit

Multi-factor model: individual returns Ri (i = 1, ..., N) are driven by m factors Fj (j = 1, ..., m) and idiosyncratic components R̃i with E[R̃i] = 0

Ri = αi + Σ_{j=1}^{m} βij Fj + R̃i

• Ri is analogous to ri(t), representing the return of stock i in both single-factor and multi-factor models

• βi rM(t) in the single-factor model is analogous to Σ_{j=1}^{m} βij Fj in the multi-factor model, where the market return rM(t) is replaced by multiple factors Fj affecting the stock return. This expression sums the contributions of m different factors Fj to the stock return, each weighted by βij, which measures the sensitivity of stock i to factor j. These factors could represent various systematic risks or influences, such as economic indicators, industry performance, or geopolitical events.

• εit is analogous to R̃i, where both terms represent the residual or idiosyncratic component of the stock return not explained by the market or other factors.

• N represents the number of stocks in the model, while m represents the number of factors
considered to influence stock returns.

• There is no fixed relationship between N and m; however, m is typically much smaller than N to avoid overfitting and ensure the model’s manageability and interpretability.

• “Idiosyncratic” refers to the stock-specific or unique factors affecting an individual stock’s return that are not captured by the broader systematic factors Fj in the model.

• E[R̃i] = 0 implies that these idiosyncratic components are expected to average out to zero over time, indicating they are random, unsystematic influences that do not have a predictable impact on stock returns.

Factors Fj can be thought of as returns of “benchmark” portfolios representing systematic factors (e.g. industry and geography factors)

• This phrase means that each Fj captures the return of a hypothetical portfolio that is constructed to represent a specific systematic risk or influence on stock returns. These could include portfolios constructed to isolate the effects of certain industries, economic cycles, interest rates, or geographical regions, allowing the model to capture how these broader factors influence individual stock returns.

Slide 14

What Factors?
• Growth vs Value

• Market Capitalization

• Credit Rating

• Stock Price Volatility

• Are they independent? Uncorrelated?

• How do we estimate the betas?

Slide 15

Can we build intrinsic factors using stock data?


Building intrinsic factors using stock data helps identify underlying market forces and asset-specific
characteristics influencing stock prices. The approximation from price difference to log returns is valid
for small price changes, making returns additive over time. Standardized returns normalize the data,
making it easier to compare across different stocks. The empirical correlation matrix measures how
returns of different stocks move together, used for portfolio optimization and risk management. For a
detailed explanation, refer to financial textbooks or articles on portfolio theory and risk management.
Data: history of N + 1 daily prices of M stocks, Sn^(i) ≡ S_{tn}^(i) for i = 1, ..., M, measured at times t = [t0, t1, ..., tN]

• Sn^(i) denotes the price of stock i at time n, where n is an index for time periods (days in this case).

• S_{tn}^(i) is another way to express the price, explicitly showing tn as the specific time point (like a date) for the n-th observation.

• The terms Sn^(i) and S_{tn}^(i) are used interchangeably, and the approximation sign might simply denote this equivalence.

• N + 1 represents the number of daily price observations for each stock. If N is the number of days, N + 1 price points include the starting price and the closing price on the N-th day.

1. Compute daily returns

Rn^i = ( Sn^(i) − S_{n−1}^(i) ) / S_{n−1}^(i) ≈ log( Sn^(i) / S_{n−1}^(i) ),   n = 1, ..., N,   i = 1, ..., M

• Rn^i is the daily return for stock i on day n, calculated as the relative change in price from the previous day: ( Sn^(i) − S_{n−1}^(i) ) / S_{n−1}^(i)

• The approximation to log( Sn^(i) / S_{n−1}^(i) ) is made because for small relative changes in price, the logarithm of the ratio of successive prices closely approximates the relative change, and log returns are additive over time, making them useful for time series analysis.

• This follows from the Taylor series expansion of the logarithm function: for small x, log(1 + x) ≈ x, with x in this case being x = ( Sn^(i) − S_{n−1}^(i) ) / S_{n−1}^(i)

2. Standardized returns (data normalization):

Xn^i = ( Rn^i − R̄^i ) / σi,   R̄^i = (1/N) Σ_{n=1}^{N} Rn^i,   σi² = (1/(N−1)) Σ_{n=1}^{N} ( Rn^i − R̄^i )²

• Xn^i is the standardized return for stock i on day n, making returns comparable across stocks by removing the mean and scaling by the standard deviation.

• The term “standardized return” Xn^i refers to the process of subtracting the mean and dividing by the standard deviation, which normalizes the returns. It’s called “standardized” because it transforms the returns to have a mean of 0 and a standard deviation of 1, aligning with the standard normal distribution.

• R̄^i is the average daily return for stock i over N days.

• Standardization helps in normalizing returns, facilitating comparison and correlation analysis.

• σi² is the variance of daily returns for stock i, measuring the dispersion of returns around their mean.

3. The empirical correlation matrix is the covariance matrix of standardized returns:

Cij = (1/(N−1)) Σ_{n=1}^{N} Xn^i Xn^j = (1/(N−1)) (XᵀX)ij

• Cij is the correlation matrix derived from standardized returns, indicating the strength and direction of the linear relationship between pairs of stocks.

• Cij is calculated as (1/(N−1)) Σ_{n=1}^{N} Xn^i Xn^j, which simplifies to (1/(N−1)) (XᵀX)ij when considering matrix operations, where X is the matrix of standardized returns.

This matrix is not diagonal!

• The conclusion that Cij is not diagonal implies that there are non-zero off-diagonal elements, indicating correlations between different stocks. A diagonal matrix would suggest no correlation between different stocks, which is unlikely in real markets due to various systemic factors affecting multiple stocks simultaneously.
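Steps 2 and 3 can be reproduced with a small sketch (pure Python for self-containment; real work would use numpy/pandas, and the function names here are made up for the example):

```python
import math

def standardize(returns):
    """X_n = (R_n - mean) / sample std, so each series ends up with
    mean 0 and (sample) variance 1."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return [(r - mean) / math.sqrt(var) for r in returns]

def correlation_matrix(all_returns):
    """Empirical correlation matrix C_ij = (1/(N-1)) sum_n X_n^i X_n^j,
    i.e. (X^T X)/(N-1) on standardized columns.  `all_returns` is a list
    of per-stock return series of equal length N."""
    X = [standardize(r) for r in all_returns]
    n = len(X[0])
    return [[sum(a * b for a, b in zip(xi, xj)) / (n - 1) for xj in X]
            for xi in X]
```

Because each series is standardized with the sample standard deviation, the diagonal of C comes out exactly 1.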

Slide 16

How do Stock Returns Correlate with Each Other?


250 pairs of stocks have about 0.3 normalized correlation
Slide 17

To Get Uncorrelated Factors, C must be Diagonal!


• How to make correlation Matrix C diagonal?

• Introduce a linear transform (linear encoder) of the data, Z = XV, by a p × k orthogonal matrix V with VVᵀ = I

• X is typically the matrix containing all standardized Xni returns of each stock i in the model.
Each row in X would represent a different time point (e.g., each day), and each column would
represent the standardized returns of a specific stock i.

• The matrix V is created or used to transform X into Z in such a way that the features (stocks) in Z become uncorrelated (meaning that the off-diagonal elements of the matrix are zero or statistically indistinguishable from zero in practical applications). V is determined through the eigenvalue decomposition of the correlation matrix C of X, not created arbitrarily.

• Z is the transformed data matrix obtained by applying the linear transformation Z = XV, where V contains the eigenvectors of C. The transformation aims to decorrelate the features (stocks), meaning the covariance matrix of Z will be diagonal if V is chosen correctly (as U, the matrix of eigenvectors of C).

• The importance of Z’s covariance matrix being diagonal (represented by Λ) is that it indicates the transformed features are uncorrelated, simplifying the analysis and interpretation of the data. Each diagonal element (an eigenvalue) in Λ represents the variance explained by the corresponding principal component in Z, facilitating dimensionality reduction and more efficient data representation.

• A decoded signal (inverse transform) is obtained as X̂ = ZVᵀ

• X is the original standardized return data matrix before any transformation.

• X̂ is the approximation or reconstruction of X after it has been transformed to Z and then inversely transformed back. The ˆ symbol typically denotes an estimated or derived value in statistics.

• In an ideal scenario, where no information is lost during the transformation and inverse transformation, X̂ would be identical to X. However, in practical applications, especially in dimensionality reduction techniques like Principal Component Analysis (PCA), some information may be discarded to focus on the most significant components (those associated with the largest eigenvalues). In such cases, X̂ would be a close approximation of X, but not an exact replica, as it represents the data reconstructed from a subset of all the original dimensions.

• Eigenvalue decomposition of the correlation matrix (note that C is non-negative definite, meaning that all of C’s eigenvalues are non-negative):

C = UΛUᵀ

• Remember that C is defined as (1/(N−1)) (XᵀX)ij, representing the correlations between each pair of stocks based on their standardized returns X.

• This process decomposes the correlation matrix C into a product of its eigenvectors (U), a diagonal matrix of its eigenvalues (Λ), and the transpose of its eigenvectors (Uᵀ).

• Λ = diag(λ1, ..., λN) with λ1 ≥ ... ≥ λN is a diagonal matrix of ordered eigenvalues

• The eigenvalue λi represents the magnitude of an eigenvector i’s contribution to the data
variance. Higher eigenvalues correspond to directions (eigenvectors) that explain more variance.

• U is an N × N orthogonal matrix (UUᵀ = I) that stores eigenvectors column-wise

• Matrix U contains the eigenvectors of the correlation matrix C. Each column in U is an eigenvector corresponding to a specific eigenvalue. These eigenvectors represent the directions in the data space along which variance (or information) is maximized.

• Use this to compute the covariance matrix of Z = XV:

Cov[Z] = (1/(N−1)) ZᵀZ = (1/(N−1)) VᵀXᵀXV = VᵀCV = VᵀUΛ(VᵀU)ᵀ → Λ   (when V = U)

• This series of equalities explains how transforming the standardized return matrix X with an
orthogonal matrix V to get Z and then computing the covariance of Z leads to a diagonal
matrix Λ when V is chosen to be U , the eigenvectors of C.

• This transformation decorrelates the data, as evidenced by the covariance matrix of Z being
diagonal, meaning the transformed factors in Z are uncorrelated.

• Covariance measures how two variables change together. If they tend to increase and decrease
together, the covariance is positive; if one increases when the other decreases, the covariance is
negative.

• The Correlation Matrix standardizes the Covariance Matrix by dividing each element by the product of the standard deviations of the corresponding variables, normalizing the scale and making the matrix dimensionless. Correlation values range from −1 to 1.

• This video is a very easy-to-understand explanation of correlation and covariance matrices.
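For two stocks the eigenvectors of C = [[1, ρ], [ρ, 1]] are always (1, 1)/√2 and (1, −1)/√2, which makes the decorrelation Z = XU easy to verify by hand. A toy sketch (illustrative only; the function names are made up for the example):

```python
import math

def standardize(r):
    """(R_n - mean) / sample std -- mean 0, sample variance exactly 1."""
    n = len(r)
    mean = sum(r) / n
    var = sum((x - mean) ** 2 for x in r) / (n - 1)
    return [(x - mean) / math.sqrt(var) for x in r]

def decorrelate_pair(r1, r2):
    """Toy PCA sketch for two stocks: project the standardized returns
    onto the fixed eigen-directions (1,1)/sqrt(2) and (1,-1)/sqrt(2)
    of the 2x2 correlation matrix [[1, rho], [rho, 1]]."""
    x1, x2 = standardize(r1), standardize(r2)
    s = 1 / math.sqrt(2)
    z1 = [s * (a + b) for a, b in zip(x1, x2)]   # first eigen-direction
    z2 = [s * (a - b) for a, b in zip(x1, x2)]   # second eigen-direction
    return z1, z2
```

The resulting columns have zero sample covariance, and their variances are the eigenvalues 1 + ρ and 1 − ρ, so the total variation 2 is preserved.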

Slide 18

Preserving Total Variation of Stock Returns
• U is an N × N orthogonal matrix (UUᵀ = I) that stores eigenvectors column-wise

• Don’t forget that U is the matrix of eigenvectors from the eigenvalue decomposition of the
correlation matrix C

• Use this to compute the covariance matrix of Z = XV:

Cov[Z] = (1/(N−1)) ZᵀZ = (1/(N−1)) VᵀXᵀXV = VᵀCV = VᵀUΛ(VᵀU)ᵀ → Λ   (when V = U)

• Remember the properties of matrix transpose: Zᵀ = (XV)ᵀ = VᵀXᵀ. Also, the correlation matrix C = (1/(N−1)) XᵀX, where X contains standardized data.

• Therefore, when V = U, encoding Z = XV preserves the total variation of the data:

TotVar[X] = (1/(N−1)) Tr[XᵀX] = Tr[C] = Tr[UΛUᵀ] = Tr[ΛUᵀU] = Tr[Λ]

TotVar[Z] = (1/(N−1)) Tr[ZᵀZ] = Tr[VᵀUΛ(VᵀU)ᵀ] → Tr[Λ]   (when V = U)

• Tr denotes the “trace” of a matrix, which is the sum of its diagonal elements. For a diagonal matrix like Λ, which contains the eigenvalues λ1, λ2, ..., λN on its diagonal, Tr[Λ] is simply the sum of these eigenvalues. The trace is a significant property because, in the context of covariance or correlation matrices, it represents the total variance (or total information) contained in the data.

• The slide shows that the total variation of the original data X and the transformed data Z are both equal to the sum of the eigenvalues, Tr[Λ]. This demonstrates that the transformation Z = XV, when V is chosen to be U (the eigenvectors of C), preserves the total variation (or total information) contained in the original data.

• The equality TotVar[X] = TotVar[Z] = Tr[Λ] means that the transformation to Z does not lose any information in terms of variance. This is a crucial property for techniques like PCA, where Z represents the data projected onto the principal components. It ensures that although the data dimensions might be reduced (if not all eigenvalues/eigenvectors are used), the variance (information) preserved is maximized according to the largest eigenvalues.

Slide 19

The Principal Components are the Top Eigenvectors in U


• Top eigenvectors are those that correspond to the largest eigenvalues

• Every column of Z is a portfolio made of linear combinations of Dow Jones stocks

Slide 20

Dimension Reduction with PCA


• We had a linear transform (linear encoder) of the data, Z = XV, parametrized by a p × k orthogonal matrix V with VVᵀ = I

• Making the projection matrix V consist of the first k ≤ p eigenvectors of C means selecting the eigenvectors associated with the k largest eigenvalues from the eigenvalue decomposition of the correlation matrix C. These k eigenvectors are used because they represent the directions in the data space that capture the most variance.

• A decoded signal (inverse transform) is obtained as X̂ = ZVᵀ

• Dimension reduction: make the projection matrix V out of the first k ≤ p eigenvectors of C

V = U[1:k]

• V = U[1:k] indicates that the matrix V is formed by taking the first k columns of U, where U is the matrix containing all the eigenvectors of C as its columns. The notation U[1:k] signifies selecting columns 1 through k from U.

• Use this to compute the covariance matrix of Z = XV:

Cov[Z] = (1/(N−1)) ZᵀZ = (1/(N−1)) VᵀXᵀXV = VᵀCV = VᵀUΛ(VᵀU)ᵀ → Λ[1:k]   (when V = U[1:k])

• When V is chosen as U[1:k], the transformed data Z captures the variance along the directions defined by the first k eigenvectors. The covariance matrix of Z, Cov[Z], becomes diagonal, with the first k diagonal elements being the corresponding k largest eigenvalues, represented by Λ[1:k].

• The fact that Cov[Z] = Λ[1:k] does not imply that the remaining eigenvalues λ_{k+1} to λ_p are zero; rather, it means that the transformation and subsequent dimensionality reduction process have effectively disregarded the variance captured by these smaller eigenvalues. The total variance preserved in Z is the sum of the variances along the principal components retained, corresponding to the k largest eigenvalues.

• Only a part of the total variation of X is preserved:

TotVar[Z] = Tr[Λ[1:k]] = Σ_{i=1}^{k} λi
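The preserved fraction Tr[Λ[1:k]] / Tr[Λ] is often reported as the “explained variance ratio”. A one-function sketch (the name mirrors scikit-learn’s terminology, but the implementation here is only illustrative):

```python
def explained_variance_ratio(eigenvalues, k):
    """Fraction of total variation kept when projecting onto the first
    k principal components: Tr[Lambda_{1:k}] / Tr[Lambda].  Eigenvalues
    are sorted in descending order first, matching lambda_1 >= ... >= lambda_N."""
    lams = sorted(eigenvalues, reverse=True)
    return sum(lams[:k]) / sum(lams)
```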

Slide 22

Project the Data so that Variance is Maximized


• Goal: The objective is to find the directions (in the feature space) along which the data varies
the most. This is done by projecting the data onto a new axis (defined by a weight vector w)
and maximizing the variance of the projected data.

• Pick directions along which data varies the most

Variance = (1/n) Σ_{t=1}^{n} ( yt − (1/n) Σ_{s=1}^{n} ys )²
         = (1/n) Σ_{t=1}^{n} ( wᵀxt − (1/n) Σ_{s=1}^{n} wᵀxs )²
         = (1/n) Σ_{t=1}^{n} ( wᵀxt − wᵀ (1/n) Σ_{s=1}^{n} xs )²
         = (1/n) Σ_{t=1}^{n} ( wᵀ(xt − μ) )²
         = average squared inner product
         = (1/n) Σ_{t=1}^{n} ||xt − μ||² cos²(w, xt − μ)
n t=1

• The equation yt = wT xt represents the projection of a data point xt onto the direction defined
by w. Here, yt is the projected value of xt , and w is a weight vector that defines the direction
of projection. This equivalence is used to transform the original data points into their projected
values, allowing the analysis to focus on the variance of these projections.

• xt represents an individual data point at time t, where t indexes the observations in the dataset.
xs is used similarly but for a different observation indexed by s. In the context of the slide, xs
appears in the summation to compute the mean of the projected points, which is subtracted
to center the data around zero before calculating the variance.

• First principal component


w1 = arg max_{w:∥w∥₂=1} (1/n) Σ_{t=1}^{n} ( wᵀxt − (1/n) Σ_{s=1}^{n} wᵀxs )²
   = arg max_{w:∥w∥₂=1} (1/n) Σ_{t=1}^{n} ( wᵀ(xt − µ) )²
   = arg max_{w:∥w∥₂=1} wᵀ [ (1/n) Σ_{t=1}^{n} (xt − µ)(xt − µ)ᵀ ] w
   = arg max_{w:∥w∥₂=1} wᵀ Σ w

• The arg max function is used to find the value that maximizes a given expression. In this
context, it seeks to find the weight vector w that maximizes the variance of the projected data.

• The condition ||w||2 = 1 ensures that w is a unit vector, meaning its length or norm is
constrained to 1. This normalization is necessary to make the optimization problem well-
defined and to prevent the trivial solution where the magnitude of w goes to infinity.

• The expression can be read as: “Find the vector w with a norm of 1 that maximizes the given
expression.”


where Σ is the covariance matrix. Solution: w1 = the eigenvector of Σ with the largest eigenvalue.

• The final result shows that the weight vector w1 , which maximizes the variance of the projected
data and thus defines the first principal component, is the eigenvector of the covariance matrix
Σ corresponding to the largest eigenvalue. This is because the expression being maximized is
the variance of the data when projected onto w, and the optimization shows that the direction
maximizing this variance is the largest eigenvector of Σ.
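One way to see the result concretely: power iteration (repeatedly applying Σ and renormalizing so that ∥w∥₂ = 1 is maintained) converges to this largest eigenvector. A minimal sketch; the anisotropic toy data, seed, and iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
# Columns have very different scales, so one direction dominates the variance
X = rng.standard_normal((500, 4)) * np.array([3.0, 1.0, 0.5, 0.1])
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (len(Xc) - 1)        # covariance matrix Sigma

w = rng.standard_normal(4)
for _ in range(200):                 # power iteration
    w = S @ w
    w /= np.linalg.norm(w)           # keep ||w||_2 = 1 (the unit-norm constraint)

lam, U = np.linalg.eigh(S)
w1 = U[:, -1]                        # eigenvector of the largest eigenvalue
```

Up to sign, the iterate w agrees with the largest eigenvector w1, which is exactly the maximizer of wᵀΣw over unit vectors.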

Slide 25

Stationary Time Series


• A time series is said to be stationary if its statistical properties do not change over time.
Specifically, a time series {yt } is covariance stationary if it meets the following criteria:

– Constant Mean: the expected value E[yt ] = µ is constant for all time periods t. This
implies that the time series has a stable long-term average level.
– Constant Variance: The variance of yt does not depend on t. Although not explicitly
mentioned in the slide, this is implied by the constant autocovariance property for different
lags j.
– Constant Autocovariance: The autocovariance between yt and yt−j depends only on
the lag j and not on the specific time t. Mathematically, cov(yt , yt−j ) = γj for any t and
lag j. This means the way values of the series relate to past values remains consistent
over time.

• The autocovariance (γj ) measures how the series at one time point is related to its past
values at different lags (j). It measures the linear dependency between two points in the
same series separated by a lag j, i.e., how much past values of the series influence its future
values.

• The autocorrelation (ρj ) normalizes/standardizes the autocovariance by the variance of the
series, producing a scaled measure (−1 ≤ ρj ≤ 1) that indicates the strength and direction of
the linear relationship between yt and yt−j .

• These concepts help in analyzing the time series’ predictability and cyclic patterns. For further
details, you might refer to statistical textbooks on time series analysis.

• The time series {yt } is covariance stationary if

E[yt ] = µ for all t


cov(yt , yt−j ) = E[(yt − µ)(yt−j − µ)] = γj for all t and any j

• The parameter γj is called the j-th order or lag j autocovariance of {yt } and a plot of γj against
j is called the autocovariance function

• The autocorrelations of {yt } are defined by

ρj = cov(yt , yt−j ) / √( var(yt ) var(yt−j ) ) = γj / γ0

• and a plot of ρj against j is called the autocorrelation function (ACF)

Slide 26

Sample Auto-covariance
• Sample autocovariance at lag j is an estimate of the theoretical autocovariance γj for a time
series. It measures the linear dependency between two points in the same series separated by
lag j, based on the observed data.

• The lag j sample autocovariance γ̂j and lag j sample autocorrelation ρ̂j are defined as:
γ̂j = (1/T) Σ_{t=j+1}^{T} (yt − ȳ)(yt−j − ȳ),    ρ̂j = γ̂j / γ̂0

• where ȳ = (1/T) Σ_{t=1}^{T} yt is the sample mean of the time series and T is the total
number of observations. This formula calculates the covariance between values j periods apart,
adjusted for the mean of the series.

• Sample autocorrelation at lag j (ρ̂j ) is the correlation between two points in the same series
separated by lag j, based on the observed data. It standardizes the sample autocovariance by
the variance of the series to produce a dimensionless measure.
• ρ̂j = γ̂j /γ̂0 , where γ̂0 is the sample autocovariance at lag 0 (which is equivalent to the sample
variance of the series). This formula scales the autocovariance at lag j (γ̂j ) by the variance of
the series to obtain a measure that ranges from −1 to 1, indicating the strength and direction
of the linear relationship between observations j periods apart.

• The sample ACF (SACF) is a plot of ρ̂j against j
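The estimators above translate directly to code. A minimal sketch using the 1/T normalization from this slide; `sample_acf` is a helper name introduced here, not a library function:

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_hat_j for j = 1..max_lag (1/T normalization)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = y - y.mean()                                # center by the sample mean
    gamma0 = (d @ d) / T                            # lag-0 autocovariance = sample variance
    gammas = [(d[j:] @ d[:-j]) / T for j in range(1, max_lag + 1)]
    return np.array(gammas) / gamma0                # scale by gamma_hat_0

rng = np.random.default_rng(0)
acf_wn = sample_acf(rng.standard_normal(2000), 5)   # white noise: all values near 0
```

For iid data the sample autocorrelations should hover near zero, which is the benchmark the SACF plot is compared against.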

Slide 27

Example: White Noise


• Perhaps the most simple stationary time series is the independent Gaussian White Noise
(GWN) process yt ∼ iid N (0, σ 2 ) ≡ GW N (0, σ 2 ). This process has µ = γj = ρj = 0 (j ̸= 0)

• A GWN process yt is a sequence of random variables where each variable yt is independently
and identically distributed (iid) as a normal distribution with mean 0 and constant variance
σ 2 , denoted N (0, σ 2 ).

• For j ̸= 0, the mean µ = 0, autocovariance γj = 0, and autocorrelation ρj = 0. This implies
that there is no linear dependence between different time points in the series, making it a purely
random process with no predictable structure.

• Two slightly more general processes are the independent white noise (IWN) process, yt ∼
IW N (0, σ 2 ), and the white noise (WN) process, yt ∼ W N (0, σ 2 )

• IWN is a more general form where yt has independent increments, meaning the change between
any two time points is independent of changes at other time points. This process has mean
zero and variance σ 2 , similar to GWN.

• WN shares the zero mean and constant variance properties with GWN and IWN but is charac-
terized by uncorrelated (rather than strictly independent) increments. The lack of correlation
between increments allows some forms of dependence that do not affect linear correlation.

• Both processes have mean zero and variance σ 2 , but the IWN process has independent incre-
ments, whereas the WN process has uncorrelated increments

Slide 28

Example: White Noise Statistics


• The SACF is typically shown with 95% confidence limits about zero. These limits are based
on the result that if {yt } ∼ iid(0, σ 2 ) then:
 
ρ̂j ∼ᴬ N (0, 1/T ),  j > 0

• The SACF is a tool used to measure and visualize the autocorrelation


• The notation ρ̂j ∼ᴬ N (0, 1/T ) means that the distribution of ρ̂j is approximated by a normal
distribution with mean 0 and variance 1/T , and is based on the central limit theorem result
√T ρ̂j →ᵈ N (0, 1). The 95% limits about zero are then ±1.96/√T

• For a white noise process, where {yt } is a series of iid random variables with mean 0 and
variance σ 2 , the autocorrelation at lag j > 0 is expected to be 0.

• However, due to random variation, the estimated autocorrelation ρ̂j for a finite sample can
deviate from 0. The Central Limit Theorem allows us to approximate the distribution of ρ̂j as
a normal distribution with mean 0 and variance T1 where T is the sample size.

• This approximation leads to the establishment of 95% confidence limits around 0, typically
set at ±1.96/√T . These limits serve as a benchmark to assess the significance of the estimated
autocorrelation at each lag. Autocorrelation values that fall outside these limits suggest a
departure from the white noise behavior, indicating potential patterns or structures in the time
series that could be exploited for predictive modeling.
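A quick numeric check of the band, assuming iid data; the sample size, seed, and the choice of 20 lags are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 500
y = rng.standard_normal(T)               # iid(0, 1) sample

band = 1.96 / np.sqrt(T)                 # 95% limits about zero
d = y - y.mean()
gamma0 = d @ d / T
rho = np.array([(d[j:] @ d[:-j] / T) / gamma0 for j in range(1, 21)])

# For true white noise, roughly 5% of lags (about 1 in 20) fall outside the band
n_outside = int(np.sum(np.abs(rho) > band))
```

Seeing substantially more exceedances than the nominal 5% is the signal of structure the slide describes.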

Slide 29

Ergodic Time Series


• A stationary time series {yt } is ergodic if sample moments converge in probability to population
moments; i.e. if ȳ →ᵖ µ, γ̂j →ᵖ γj and ρ̂j →ᵖ ρj

• A time series {yt } is called ergodic when its long-term, time-averaged properties can replicate
the ensemble-averaged properties. It means that given a single, sufficiently long time series,
you can obtain reliable estimates of the overall statistical properties (moments) of the entire
population of such series.

• Stationarity is a prerequisite for ergodicity in this context. While stationarity ensures that
the statistical properties (like mean and variance) of the time series do not change over time, er-
godicity takes this a step further by implying that these properties can be accurately estimated
from a single time series trajectory.

• The slide specifies that for a stationary time series to be ergodic, the sample moments (the
sample mean ȳ, sample autocovariance γ̂j , and sample autocorrelation ρ̂j ) must converge in
probability to their true population counterparts (µ, γj , ρj ) as the sample size increases. This
convergence is denoted by →ᵖ, which stands for “converges in probability to”.

• Ergodicity is crucial for practical time series analysis because it justifies the use of time-averaged
estimates from a single series to infer the properties of the broader population of such series.
This is especially relevant in financial markets, climatology, and other fields where analysts often
have access to only one realization of a process (e.g., one historical price series, one sequence
of temperature measurements) and need to make inferences about its overall behavior.
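A sketch of the idea with a single long path of a stationary AR(1) (an assumed example process with mean µ = 2; parameters and seed are illustrative): its time average approaches the ensemble mean.

```python
import numpy as np

mu, phi, T = 2.0, 0.7, 100_000
rng = np.random.default_rng(4)
eps = rng.standard_normal(T)

y = np.empty(T)
y[0] = mu
for t in range(1, T):                        # stationary AR(1) fluctuating around mu
    y[t] = mu + phi * (y[t - 1] - mu) + eps[t]

time_avg = y.mean()                          # time average from ONE realization
```

The time average computed from one trajectory lands close to the population mean µ, which is exactly the property ergodicity guarantees.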

Slide 30

Linear Time Series


• Wold’s decomposition theorem (c.f. Fuller (1996) pg. 96) states that any covariance stationary
time series {yt } has a linear process or infinite order moving average representation of the form

yt = µ + Σ_{k=0}^{∞} ψk εt−k ,   ψ0 = 1,   Σ_{k=0}^{∞} ψk² < ∞
εt ∼ W N (0, σ 2 )

• Wold’s theorem posits that any covariance stationary time series {yt } can be represented as the
sum of two components: a deterministic part (if it exists, typically a linear combination of past
values) and a stochastic part, which has a linear process or infinite order moving average (MA)
representation.

• The slide focuses on the stochastic component, stating that any covariance stationary time
series can be expressed as an infinite MA process:

yt = µ + Σ_{k=0}^{∞} ψk εt−k

Where:

– µ is the mean of the time series


– ψk are the weights of the MA process, with ψ0 = 1 ensuring the process starts at the current
time. The condition Σ_{k=0}^{∞} ψk² < ∞ ensures the series of weights is square-summable, which
is necessary for the convergence and stationarity of the series.
– εt represents the innovation or “shock” at time t, which is assumed to be white noise
W N (0, σ 2 ) with mean 0 and constant variance σ 2 . White noise signifies that these shocks
are uncorrelated and identically distributed random variables, providing the series’ ran-
domness.

• Linear Representation: The theorem provides a linear representation of stationary time
series, which is crucial for modeling and forecasting. It implies that the current value of a
stationary series can be seen as a weighted sum of past random shocks.

• Forecasting: In practical terms, Wold’s decomposition allows for the prediction of future
values of the time series based on past shocks, assuming the process’s structure (the weights
ψk ) is known or can be estimated.

• Infinite Order MA: The infinite sum implies that the effect of a shock εt−k on the current
value yt diminishes as k increases, assuming the weights ψk decrease appropriately. This
reflects the idea that more recent events have a more significant impact on the current state of
the series than distant past events.

Slide 31

Properties of Linear Time Series

E[yt ] = µ ,   γ0 = var(yt ) = σ² Σ_{k=0}^{∞} ψk²

γj = cov(yt , yt−j ) = σ² Σ_{k=0}^{∞} ψk ψk+j

ρj = ( Σ_{k=0}^{∞} ψk ψk+j ) / ( Σ_{k=0}^{∞} ψk² )

• γ0 defines the variance of the time series yt , which is also the autocovariance at lag 0.

• Meaning of ψk :

• Influence of Past Shocks: Each weight ψk quantifies the impact of the random shock εt−k
(from k periods ago) on the current value yt of the time series. A larger absolute value of ψk
indicates a stronger influence of that past shock on the current time series value.

• Decay of Influence: In many practical and theoretical time series models, the weights ψk
often decrease as k increases. This decay reflects the intuition that more recent events (lower
k) tend to have a more substantial impact on the current state than events further in the past
(higher k).

• Construction of the Series: The series Σ_{k=0}^{∞} ψk εt−k represents the cumulative effect of all past
shocks on the current time series value, adjusted by their respective weights. This infinite sum
captures the idea that the current state of the series is the result of a long history of random
influences, each diminishing in importance with time.

Slide 32

ARMA Time Series


• ARMA (p, q) models take the form of a pth order stochastic difference equation

yt − µ = ϕ1 (yt−1 − µ) + ... + ϕp (yt−p − µ) + εt + θ1 εt−1 + ... + θq εt−q


εt ∼ W N (0, σ 2 )

• An ARMA model combines two components: an autoregressive part of order p and a moving
average of order q. It provides a versatile framework for modeling time series that exhibit both
autoregressive and moving average characteristics.

• yt is the value of the time series at time t

• µ is the mean of the time series

• ϕ1 , ..., ϕp are the coefficients of the AR part, indicating the influence of the p previous values
on the current value.

• εt is white noise with mean 0 and variance σ 2 , representing the random shock at time t.

• θ1 , ..., θq are the coefficients of the MA part, indicating the influence of the q previous random
shocks on the current value.

• The autoregressive component models the current value of the time series as a linear combi-
nation of its previous values. The inclusion of (yt−i ) for i = 1, ..., p in the AR part represents
the idea that the deviation of past values from the mean (µ) influences the deviation of the
current value from the mean. This captures the “memory” of the series, where past values
have a direct impact on future values.
• Subtracting the mean µ from each term yt−i centers the series, making the model focus on how
deviations from the long-term average (mean) predict future deviations. This is particularly
useful for ensuring the model works well with stationary time series, where the mean is constant
over time. It emphasizes the fluctuations or deviations from the mean rather than absolute
values, which is more meaningful for understanding the dynamics of the series.
• The moving average component models the current value of the series as a linear combination
of past and current random shocks (εt , εt−1 , ..., εt−q ). These shocks represent unforeseen events
or errors from past predictions that affect the current state of the series.
• The MA part is essentially an accumulation of white noise because it captures the effect of
past random disturbances on the current value. Each εt−i term represents a random shock
from i periods ago, and the θi coefficients determine how much these past shocks contribute to
the current value. This setup models the idea that the series is influenced not just by its past
values but also by random, unpredictable events. The “white noise” term W N (0, σ 2 ) ensures
these shocks are uncorrelated with each other and have a constant variance, embodying the
random “noise” that perturbs the series over time.
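A minimal simulation of the ARMA(1,1) special case of the difference equation above; the parameter values, burn-in length, and seed are illustrative assumptions:

```python
import numpy as np

def simulate_arma11(phi, theta, mu, sigma, T, burn=500, seed=0):
    """Simulate y_t - mu = phi (y_{t-1} - mu) + eps_t + theta eps_{t-1}."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, T + burn)
    y = np.empty(T + burn)
    y[0] = mu + eps[0]
    for t in range(1, T + burn):
        y[t] = mu + phi * (y[t - 1] - mu) + eps[t] + theta * eps[t - 1]
    return y[burn:]                     # drop burn-in so start-up effects vanish

y = simulate_arma11(phi=0.6, theta=0.3, mu=1.0, sigma=1.0, T=20_000)
```

The burn-in discards the arbitrary initial condition so the retained sample behaves like a draw from the stationary process with mean µ.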

Slide 33

MA(1) Time Series


• The MA(1) model has the form

yt = µ + εt + θεt−1 , εt ∼ W N (0, σ 2 )

• For any finite θ the MA(1) is stationary and ergodic. The moments are E[yt ] = µ, γ0 =
σ 2 (1 + θ2 ), γ1 = σ 2 θ, γj = 0 for j > 1 and ρ1 = θ/(1 + θ2 ).
• Hence, the ACF of an MA(1) process cuts off at lag one, and the maximum value of this
correlation is ±0.5
• The main purpose of modeling a time series with only one lag of white noise is to highlight
simplicity while still capturing essential dynamics in the time series data.
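These moment formulas can be checked by Monte Carlo; θ = 0.5, σ = 1, the sample size, and the seed are illustrative assumptions:

```python
import numpy as np

theta, sigma, T = 0.5, 1.0, 200_000
rng = np.random.default_rng(7)
eps = rng.normal(0.0, sigma, T + 1)
y = eps[1:] + theta * eps[:-1]               # MA(1) with mu = 0

d = y - y.mean()
gamma0_hat = d @ d / T                       # theory: sigma^2 (1 + theta^2) = 1.25
gamma1_hat = d[1:] @ d[:-1] / T              # theory: sigma^2 theta = 0.5
rho1_hat = gamma1_hat / gamma0_hat           # theory: theta / (1 + theta^2) = 0.4
```

Note ρ1 = θ/(1 + θ²) is maximized in absolute value at θ = ±1, where it equals ±0.5, matching the cutoff statement above.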

Slide 34

MA(1) Time Series: Invertibility

ρ1 = θ/(1 + θ2 )

• There is an identification problem with the MA(1) model since θ and 1/θ produce the
same value of ρ1 . The MA(1) is called invertible if |θ| < 1 and is called non-invertible if |θ| ≥ 1.
In the invertible MA(1), the error term εt has an infinite order AR representation of the form


εt = Σ_{j=0}^{∞} θ∗ʲ (yt−j − µ)

• where θ∗ = −θ so that εt may be thought of as a prediction error based on past values of yt
Slide 35

MA(1) Time Series: Example


• MA(1) models often arise through data transformations like aggregation and differencing. For
example, consider the signal plus noise model

yt = zt + εt , εt ∼ W N (0, σε2 )
zt = zt−1 + ηt , ηt ∼ W N (0, ση2 )

• where εt and ηt are independent. For example, zt could represent the fundamental value of an
asset price and εt could represent an iid deviation about the fundamental price. A stationary
representation requires differencing yt

∆yt = ηt + εt − εt−1

• It can be shown that ∆yt is an MA(1) process with

θ = [ −(q + 2) + √(q² + 4q) ] / 2

• where q = ση²/σε² is the signal-to-noise ratio and ρ1 = −1/(q + 2) < 0
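A simulation check of the claim ρ1 = −1/(q + 2): with ση = σε (so q = 1) the differenced series should show ρ1 ≈ −1/3. The sample size and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
T, sig_eta, sig_eps = 400_000, 1.0, 1.0     # q = 1  =>  rho_1 = -1/3

z = np.cumsum(rng.normal(0.0, sig_eta, T))  # random-walk fundamental value z_t
y = z + rng.normal(0.0, sig_eps, T)         # observed price = signal + noise
dy = np.diff(y)                             # stationary after differencing once

d = dy - dy.mean()
rho1 = (d[1:] @ d[:-1]) / (d @ d)           # sample lag-1 autocorrelation
```

The negative first-order autocorrelation comes from εt−1 entering ∆yt with a minus sign, so consecutive differences partially cancel.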
Slide 36

MA(q) Time Series


• The MA(q) model has the form

yt = µ + εt + θ1 εt−1 + ... + θq εt−q , where εt ∼ W N (0, σ 2 )

• The MA(q) model is stationary and ergodic provided θ1 , ..., θq are finite. It is invertible if all
of the roots of the MA characteristic polynomial

θ(z) = 1 + θ1 z + ... + θq z q = 0

• lie outside the complex unit circle. The moments of the MA(q) are

E[yt ] = µ
γ0 = σ 2 (1 + θ12 + ... + θq2 )
γj = (θj + θj+1 θ1 + θj+2 θ2 + ... + θq θq−j ) σ²  for j = 1, 2, ..., q
γj = 0  for j > q

• Hence, the ACF of an MA(q) is non-zero up to lag q and is zero afterwards


Slide 37

Overlapping Returns
• Example: Overlapping returns and MA(q) models

• MA(q) models often arise in finance through data aggregation transformations. For example,
let Rt = ln(Pt /Pt−1 ) denote the monthly continuously compounded return on an asset with
price Pt . Define the annual return at time t using monthly returns as Rt (12) = ln(Pt /Pt−12 ) =
Σ_{j=0}^{11} Rt−j . Suppose Rt ∼ W N (µ, σ 2 ) and consider a sample of monthly returns of size T ,
{R1 , R2 , ..., RT }.

• A sample of annual returns may be created using overlapping or non-overlapping returns. Let
{R12 (12), R13 (12), ..., RT (12)} denote a sample of T ∗ = T − 11 monthly overlapping annual
returns and {R12 (12), R24 (12), ..., RT (12)} denote a sample of T /12 non-overlapping annual
returns

• Researchers often use overlapping returns in analysis due to the apparent larger sample size.
One must be careful using overlapping returns because the monthly annual return sequence
{Rt (12)} is not a white noise process even if the monthly return sequence {Rt } is. To see this,
straightforward calculations give:

E[Rt (12)] = 12µ


γ0 = var(Rt (12)) = 12σ 2
γj = cov(Rt (12), Rt−j (12)) = (12 − j)σ 2 for j < 12
γj = 0 for j ≥ 12

• Since γj = 0 for j ≥ 12 notice that {Rt (12)} behaves like an MA(11) process

Rt (12) = 12µ + εt + θ1 εt−1 + ... + θ11 εt−11


εt ∼ W N (0, σ 2 )
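The autocovariance pattern γj = (12 − j)σ² for overlapping annual returns can be verified numerically; the monthly mean, volatility, sample size, and seed are illustrative assumptions:

```python
import numpy as np

T, mu, sigma = 300_000, 0.01, 0.05
rng = np.random.default_rng(5)
R = rng.normal(mu, sigma, T)                      # white-noise monthly returns
R12 = np.convolve(R, np.ones(12), mode="valid")   # overlapping 12-month sums

d = R12 - R12.mean()
n = len(d)
gamma = [d @ d / n] + [d[j:] @ d[:-j] / n for j in range(1, 13)]
gamma_scaled = np.array(gamma) / sigma**2         # theory: 12, 11, ..., 1, 0
```

The overlap of shared months mechanically induces the MA(11) correlation structure even though the monthly returns themselves are white noise.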

Slide 38

ARMA Time Series


• ARMA(p,q) models take the form of a p-th order stochastic difference equation

yt − µ = ϕ1 (yt−1 − µ) + ... + ϕp (yt−p − µ) + εt + θ1 εt−1 + ... + θq εt−q


εt ∼ W N (0, σ 2 )

Slide 39

AR(1) Time Series


• A commonly used stationary and ergodic time series in financial modeling is the AR(1) process

yt − µ = ϕ(yt−1 − µ) + εt , t = 1, ..., T

• where εt ∼ W N (0, σ 2 ) and |ϕ| < 1. The above representation is called the mean-adjusted form.
The characteristic equation for the AR(1) is

ϕ(z) = 1 − ϕz = 0

• so that the root is z = 1/ϕ

yt = µ + Σ_{j=0}^{∞} ϕʲ εt−j

E[yt ] = µ
γ0 = σ² / (1 − ϕ²)

Slide 40

AR(1) Half Life


• In a stationary AR(1) model, {yt } exhibits mean-reverting behavior. That is, {yt } fluctuates
about the mean value µ. The ACF and IRF decay at a geometric rate

• The decay rate of the IRF is sometimes reported as a half-life: the lag j half at which the IRF
reaches 1/2. For the AR(1) with positive ϕ, it can be shown that

j half = ln(0.5)/ ln(ϕ)

yt = µ + Σ_{j=0}^{∞} ϕʲ εt−j
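The half-life formula follows from the geometric IRF ϕʲ: solving ϕʲ = 1/2 gives j = ln(0.5)/ln(ϕ). A minimal sketch; `ar1_half_life` is a helper name introduced here:

```python
import numpy as np

def ar1_half_life(phi):
    """Lag at which the AR(1) impulse response phi**j decays to one half."""
    assert 0 < phi < 1          # formula applies to stationary, positive phi
    return np.log(0.5) / np.log(phi)

h = ar1_half_life(0.9)          # about 6.58 lags for phi = 0.9
```

By construction ϕ raised to this (generally non-integer) lag equals exactly 1/2.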

Slide 41

AR(p) Time Series


• The AR(p) model in mean-adjusted form is

yt − µ = ϕ1 (yt−1 − µ) + ... + ϕp (yt−p − µ) + εt

• It can be shown that the AR(p) is stationary and ergodic provided the roots of the characteristic
equation

ϕ(z) = 1 − ϕ1 z − ϕ2 z 2 − ... − ϕp z p = 0

• lie outside the complex unit circle (have modulus greater than one)

Slide 42

Financial Time Series (Practice)
Last Lecture
• Financial Time Series
• Stationarity
• White Noise
• Autocovariance
• Autocorrelation Function (ACF) and Sample ACF (SACF)
• Ergodicity
Slide 43

Today’s Lecture
• Moving Averages: Property Derivations
• Series of Overlapping Returns
• Random Walk Properties
• Observing Asset Price Fluctuation
• Autoregressive Time Series: Property Derivations
Slide 44

Warm-up Questions
• If we add two time series, what is the mean of the resulting series?
• If we multiply a time series with a constant number, what is the mean of the resulting series?
• If we multiply a time series with a constant number, what is the variance of the resulting series?
• If we add two independent time series of variances σ1² and σ2², respectively, what is the variance
of the resulting series?
• If we add 30 independent time series, having the same variance, what is the variance of the
resulting series?
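The answers (means add; a constant c scales the mean by c and the variance by c²; independent variances add) can be sanity-checked numerically. The distributions, sample sizes, and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, 200_000)       # mean 2, variance 1
y = rng.normal(-1.0, 2.0, 200_000)      # mean -1, variance 4, independent of x

mean_sum = (x + y).mean()               # means add: about 2 + (-1) = 1
var_scaled = (3 * x).var()              # variance scales by 3^2: about 9
var_sum = (x + y).var()                 # independent variances add: about 1 + 4 = 5
var_30 = rng.standard_normal((30, 200_000)).sum(axis=0).var()  # 30 iid series: about 30
```

The last line answers the final warm-up question: summing 30 independent unit-variance series gives a series with variance about 30.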
Slide 45

MA(1) Time Series


• The MA(1) model has the form
yt = µ + εt + θεt−1 , εt ∼ W N (0, σ 2 )

• For any finite θ the MA(1) is stationary and ergodic. The moments are E[yt ] = µ, γ0 =
σ 2 (1 + θ2 ), γ1 = σ 2 θ, γj = 0 for j > 1 and ρ1 = θ/(1 + θ2 ). Hence, the ACF of an MA(1)
process cuts off at lag one, and the maximum value of this correlation is ±0.5
Slide 46

Exercises: MA(1) Property Derivation
• Derive the mean of the MA(1) series from its expression

• Derive the auto-covariance of the MA(1) series from its expression

• Derive the covariance between yt and yt−1

• Derive the correlation coefficient between yt and yt−1

• Find the covariance between yt and yt−2 , yt−3 , yt−4 , yt−5 , ...

Slide 47

Random Walk: Property Derivation

zt = zt−1 + ηt , ηt ∼ W N (0, σ 2 ) ⇐ Random Walk

• Express zt as function of ηt and its past values. Assume z0 = 0

• Derive the mean of the random walk

• Derive the autocovariance of the random walk

• Is the random walk stationary?
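A Monte Carlo illustration of the last question: across many simulated paths, var(zt) grows like t·σ², so the random walk is not stationary. The path count and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(11)
n_paths, T, sigma = 20_000, 100, 1.0
eta = rng.normal(0.0, sigma, (n_paths, T))
z = np.cumsum(eta, axis=1)               # z_t = eta_1 + ... + eta_t, with z_0 = 0

var_t10 = z[:, 9].var()                  # theory: 10 * sigma^2
var_t100 = z[:, 99].var()                # theory: 100 * sigma^2
```

Since the variance depends on t, the constant-variance requirement of covariance stationarity fails, even though the mean stays at zero.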

Slide 48

Observing Asset Price Fluctuation


• MA(1) models often arise through data transformations like aggregation and differencing. For
example, consider the signal plus noise model

yt = zt + εt , εt ∼ W N (0, σε2 )
zt = zt−1 + ηt , ηt ∼ W N (0, ση2 )

• where εt and ηt are independent. For example, zt could represent the fundamental value of an
asset price and εt could represent an iid deviation about the fundamental price. A stationary
representation requires differencing yt :

∆yt = ηt + εt − εt−1

• It can be shown that ∆yt is an MA(1) process with

θ = [ −(q + 2) + √(q² + 4q) ] / 2

where q = ση²/σε² is the signal-to-noise ratio and ρ1 = −1/(q + 2) < 0

yt = µ + εt + θεt−1 , εt ∼ W N (0, σ 2 )

Slide 49

MA(q) Time Series
• The MA(q) model has the form:
yt = µ + εt + θ1 εt−1 + ... + θq εt−q , where εt ∼ W N (0, σ 2 )

• The MA(q) model is stationary and ergodic provided θ1 , ..., θq are finite. It is invertible if all
of the roots of the MA characteristic polynomial

θ(z) = 1 + θ1 z + ... + θq z q = 0

• lie outside the complex unit circle. The moments of the MA(q) are

E[yt ] = µ
γ0 = σ 2 (1 + θ12 + ... + θq2 )
γj = (θj + θj+1 θ1 + θj+2 θ2 + ... + θq θq−j ) σ²  for j = 1, 2, ..., q
γj = 0  for j > q

• Hence, the ACF of an MA(q) is non-zero up to lag q and is zero afterwards


Slide 50

Overlapping Returns
Example: Overlapping returns and MA(q) models
• MA(q) models often arise in finance through data aggregation transformations. For example,
let Rt = ln(Pt /Pt−1 ) denote the monthly continuously compounded return on an asset with
price Pt . Define the annual return at time t using monthly returns as Rt (12) = ln(Pt /Pt−12 ) =
Σ_{j=0}^{11} Rt−j . Suppose Rt ∼ W N (µ, σ 2 ) and consider a sample of monthly returns of size T ,
{R1 , R2 , ..., RT }
• A sample of annual returns may be created using overlapping or non-overlapping returns. Let
{R12 (12), R13 (12), ..., RT (12)} denote a sample of T ∗ = T − 11 monthly overlapping annual
returns and {R12 (12), R24 (12), ..., RT (12)} denote a sample of T /12 non-overlapping annual
returns
• Researchers often use overlapping returns in analysis due to the apparent larger sample size.
One must be careful using overlapping returns because the monthly annual return sequence
{Rt (12)} is not a white noise process even if the monthly return sequence {Rt } is. To see this,
straightforward calculations give
E[Rt (12)] = 12µ
γ0 = var(Rt (12)) = 12σ 2
γj = cov(Rt (12), Rt−j (12)) = (12 − j)σ 2 for j < 12
γj = 0 for j ≥ 12

• Since γj = 0 for j ≥ 12 notice that {Rt (12)} behaves like an MA(11) process
Rt (12) = 12µ + εt + θ1 εt−1 + ... + θ11 εt−11
εt ∼ W N (0, σ 2 )

Slide 51

ARMA Time Series
• ARMA(p, q) models take the form of a p-th order stochastic difference equation:
yt − µ = ϕ1 (yt−1 − µ) + ... + ϕp (yt−p − µ) + εt + θ1 εt−1 + ... + θq εt−q
εt ∼ W N (0, σ 2 )

Slide 52

AR(1) Time Series


• A commonly used stationary and ergodic time series in financial modeling is the AR(1) process
yt − µ = ϕ(yt−1 − µ) + εt , t = 1, ..., T

• where εt ∼ W N (0, σ 2 ) and |ϕ| < 1. The above representation is called the mean-adjusted form.
The characteristic equation for the AR(1) is
ϕ(z) = 1 − ϕz = 0

• so that the root is z = 1/ϕ

Slide 53

Exercises: AR(1) Property Derivation


• Derive the mean of the AR(1) series from its expression
• Derive the auto-covariance of the AR(1) series from its expression
• Derive the covariance between yt and yt−1
• Derive the correlation coefficient between yt and yt−1
• Find the covariance between yt and yt−2 , yt−3 , yt−4 , yt−5 , ...
Slide 53

AR(1) Time Series


• A commonly used stationary and ergodic time series in financial modeling is the AR(1) process
yt − µ = ϕ(yt−1 − µ) + εt , t = 1, ..., T

• where εt ∼ W N (0, σ 2 ) and |ϕ| < 1. The above representation is called the mean-adjusted form.
The characteristic equation for the AR(1) is
ϕ(z) = 1 − ϕz = 0

• so that the root is z = 1/ϕ

yt = µ + Σ_{j=0}^{∞} ϕʲ εt−j

E[yt ] = µ
γ0 = σ² / (1 − ϕ²)
Slide 54

AR(1) Half Life
• In a stationary AR(1) model, {yt } exhibits mean-reverting behavior. That is, {yt } fluctuates
about the mean value µ. The ACF and IRF decay at a geometric rate

• The decay rate of the IRF is sometimes reported as a half-life: the lag j half at which the IRF
reaches 1/2. For the AR(1) with positive ϕ, it can be shown that

j half = ln(0.5)/ ln(ϕ)

yt = µ + Σ_{j=0}^{∞} ϕʲ εt−j

Slide 55

Autoregressive Models
Last Lecture
• Moving Averages: Property Derivations

• Random Walk Properties

• Observing Asset Price Fluctuation

Slide 56

Today’s Lecture
• Series of Overlapping Returns

• Autoregressive Time Series: Property Derivations

• AR(1)

• ARMA(p,q)

Slide 57

MA(1) Time Series


• The MA(1) model has the form

yt = µ + εt + θεt−1 , εt ∼ W N (0, σ 2 )

• For any finite θ the MA(1) is stationary and ergodic. The moments are E[yt ] = µ, γ0 =
σ 2 (1 + θ2 ), γ1 = σ 2 θ, γj = 0 for j > 1 and ρ1 = θ/(1 + θ2 ). Hence, the ACF of an MA(1) process
cuts off at lag one, and the maximum value of this correlation is ±0.5

Slide 57

MA(q) Time Series
• The MA(q) model has the form
yt = µ + εt + θ1 εt−1 + ... + θq εt−q , where εt ∼ W N (0, σ 2 )

• The MA(q) model is stationary and ergodic provided θ1 , ..., θq are finite. It is invertible if all
of the roots of the MA characteristic polynomial
θ(z) = 1 + θ1 z + ... + θq z q = 0

• lie outside the complex unit circle. The moments of the MA(q) are
E[yt ] = µ
γ0 = σ 2 (1 + θ12 + ... + θq2 )
γj = (θj + θj+1 θ1 + θj+2 θ2 + ... + θq θq−j ) σ²  for j = 1, 2, ..., q
γj = 0  for j > q

• Hence, the ACF of an MA(q) is non-zero up to lag q and is zero afterwards


Slide 58

Overlapping Returns
• Example: Overlapping returns and MA(q) models
• MA(q) models often arise in finance through data aggregation transformations. For example,
let Rt = ln(Pt /Pt−1 ) denote the monthly continuously compounded return on an asset with
price Pt . Define the annual return at time t using monthly returns as Rt (12) = ln(Pt /Pt−12 ) =
Σ_{j=0}^{11} Rt−j . Suppose Rt ∼ W N (µ, σ 2 ) and consider a sample of monthly returns of size T ,
{R1 , R2 , ..., RT }.
• A sample of annual returns may be created using overlapping or non-overlapping returns. Let
{R12 (12), R13 (12), ..., RT (12)} denote a sample of T ∗ = T − 11 monthly overlapping annual
returns and {R12 (12), R24 (12), ..., RT (12)} denote a sample of T /12 non-overlapping annual
returns.
Slide 59

Overlapping Returns
• Researchers often use overlapping returns in analysis due to the apparent larger sample size.
One must be careful using overlapping returns because the monthly annual return sequence
{Rt (12)} is not a white noise process even if the monthly return sequence {Rt } is. To see this,
straightforward calculations give
E[Rt (12)] = 12µ
γ0 = var(Rt (12)) = 12σ 2
γj = cov(Rt (12), Rt−j (12)) = (12 − j)σ 2 for j < 12
γj = 0 for j ≥ 12

• Since γj = 0 for j ≥ 12 notice that {Rt (12)} behaves like an MA(11) process
Rt (12) = 12µ + εt + θ1 εt−1 + ... + θ11 εt−11
εt ∼ W N (0, σ 2 )

Slide 60

AR(1) Time Series
• A commonly used stationary and ergodic time series in financial modeling is the AR(1) process

yt − µ = ϕ(yt−1 − µ) + εt , t = 1, ..., T

• where εt ∼ W N (0, σ 2 ) and |ϕ| < 1. The above representation is called the mean-adjusted form.
The characteristic equation for the AR(1) is

ϕ(z) = 1 − ϕz = 0

• so that the root is z = 1/ϕ

Slide 61

Exercises: AR(1) Property Derivation


• Derive the mean of the AR(1) series from its expression

• Derive the auto-covariance of the AR(1) series from its expression

• Derive the covariance between yt and yt−1

• Derive the correlation coefficient between yt and yt−1

• Find the covariance between yt and yt−2 , yt−3 , yt−4 , yt−5 , ...

Slide 62

AR(1) Time Series


• A commonly used stationary and ergodic time series in financial modeling is the AR(1) process

yt − µ = ϕ(yt−1 − µ) + εt , t = 1, ..., T

• where εt ∼ W N (0, σ 2 ) and |ϕ| < 1. The above representation is called the mean-adjusted form.
The characteristic equation for the AR(1) is

ϕ(z) = 1 − ϕz = 0

• so that the root is z = 1


ϕ

E[yt ] = µ
σ2
γ0 =
1 − ϕ2

Slide 63

Solving the AR(1) Time Series
• To see this, consider the general case of an AR(1) with no drift:

yt = ϕyt−1 + εt

• Let ϕ take any value for now

• We can write:

yt−1 = ϕyt−2 + εt−1


yt−2 = ϕyt−3 + εt−2

• Substituting into (1) yields:

yt = ϕ(ϕyt−2 + εt−1 ) + εt
= ϕ2 yt−2 + ϕεt−1 + εt

• Substituting again for yt−2 :

yt = ϕ2 (ϕyt−3 + εt−2 ) + ϕεt−1 + εt


= ϕ3 yt−3 + ϕ2 εt−2 + ϕεt−1 + εt

• Successive substitutions of this type (n = t steps, back to y0 ) lead to:

yt = ϕᵗ y0 + εt + ϕεt−1 + ϕ²εt−2 + ... + ϕᵗ⁻¹ε1

Slide 64

AR(1) Time Series: Stationarity Cases


We have 3 cases:

1. |ϕ| < 1 =⇒ ϕ^T → 0 as T → ∞
So, the shocks driving the system gradually die away.

2. ϕ = 1 =⇒ ϕ^T = 1, ∀T
So, the shocks persist in the system and never die away. We obtain

yt = y0 + Σ_{t=0}^{∞} εt as T → ∞

i.e., just an infinite sum of past shocks plus some starting value y0 .

3. ϕ > 1. Now, given shocks become more influential as time goes on, since if ϕ > 1, ϕ³ > ϕ² > ϕ,
etc.

Slide 65

Detrending a Stochastically Non-stationary series
• Going back to our 2 characterisations of non-stationarity, the random walk with drift:

yt = µ + yt−1 + ut

and the trend-stationary process

yt = α + βt + ut

• The two will require different treatments to induce stationarity. The second case is known as deterministic non-stationarity and what is required is detrending

• The first case is known as stochastic non-stationarity. If we let

∆yt = yt − yt−1

and Lyt = yt−1

So, (1 − L)yt = yt − Lyt = yt − yt−1

• If we take the random walk with drift and subtract yt−1 from both sides:

yt − yt−1 = µ + ut
∆yt = µ + ut

• We say that we have induced stationarity by “differencing once”

Slide 66

AR(1) Half Life


• In a stationary AR(1) model, {yt } exhibits mean-reverting behavior. That is, {yt } fluctuates about the mean value µ. The ACF and IRF decay at a geometric rate.

• The decay rate of the IRF is sometimes reported as a half-life - the lag j_half at which the IRF reaches 1/2. For the AR(1) with positive ϕ, it can be shown that

j_half = ln(0.5) / ln(ϕ)
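The half-life formula can be wrapped in a small helper; the example values of ϕ used below are illustrative only:

```python
import math

def ar1_half_life(phi: float) -> float:
    # Lag at which the impulse response phi**j decays to one half
    return math.log(0.5) / math.log(phi)
```

For ϕ = 0.5 the half-life is exactly 1 lag (the impulse halves every step), while ϕ = 0.9 gives roughly 6.6 lags, illustrating how persistence grows as ϕ approaches 1.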

Slide 67

ARMA Time Series


• ARMA(p,q) models take the form of a p-th order stochastic difference equation

yt − µ = ϕ1 (yt−1 − µ) + ... + ϕp (yt−p − µ) + εt + θ1 εt−1 + ... + θq εt−q


εt ∼ W N (0, σ 2 )

Slide 68

Summary
• Moving Averages: Property Derivations

• Series of Overlapping Returns

• Random Walk Properties

• Observing Asset Price Fluctuation

• Autoregressive Time Series: Property Derivations

Slide 69

ARMA
Today’s Lecture
• Autoregressive Time Series: Property Derivations

• AR(1)

• ARMA(p,q)

• Auto Regressive Integrated Moving Average (ARIMA)

• Volatility Modeling

Slide 70

Is the Market Index a Random Walk?

zt = µ + zt−1 + ηt , ηt ∼ W N (0, ση²) ⇐ Random Walk with Drift

• Express zt as function of ηt and its past values. Assume z0 = 0

• Derive the mean of the random walk with drift

• Derive the autocovariance of the random walk with drift

• Is the random walk with drift stationary?

Slide 71

The Lag (or Delay) Operator


The lag operator L is defined such that for any time series {yt }, Lyt = yt−1 . It has the following
properties: L2 yt = L · Lyt = yt−2 , L0 = 1 and L−1 yt = yt+1 . The operator ∆ = 1 − L creates the first
difference of a time series: ∆yt = (1 − L)yt = yt − yt−1 . The ARMA(p, q) model may be compactly
expressed using lag polynomials. Define ϕ(L) = 1 − ϕ1 L − ... − ϕp Lp and θ(L) = 1 + θ1 L + ... + θq Lq .
Then, the ARMA model may be expressed as

ϕ(L)(yt − µ) = θ(L)εt
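The lag and difference operators map directly onto `shift` in pandas; a minimal sketch (the toy series is made up for illustration):

```python
import pandas as pd

y = pd.Series([1.0, 3.0, 6.0, 10.0])

lag_y = y.shift(1)        # L y_t = y_{t-1}
diff_y = y - y.shift(1)   # Delta y_t = (1 - L) y_t = y_t - y_{t-1}
```

The result of `y - y.shift(1)` matches pandas' built-in `y.diff()`, which implements the same ∆ = 1 − L operator.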

Slide 72

AR(1) Time Series
• A commonly used stationary and ergodic time series in financial modeling is the AR(1) process

yt − µ = ϕ(yt−1 − µ) + εt , t = 1, ..., T

• where εt ∼ W N (0, σ 2 ) and |ϕ| < 1. The above representation is called the mean-adjusted form.
The characteristic equation for the AR(1) is

ϕ(z) = 1 − ϕz = 0

• so that the root is z = 1/ϕ

Slide 73

Exercise: AR(1) Property Derivation


• Derive the mean of the AR(1) series from its expression

• Derive the auto-covariance of the AR(1) series from its expression

• Derive the covariance between yt and yt−1

• Derive the correlation coefficient between yt and yt−1

• Show that the autocorrelation function is geometrically decreasing

Slide 74

AR(1) Time Series


• A commonly used stationary and ergodic time series in financial modeling is the AR(1) process

yt − µ = ϕ(yt−1 − µ) + εt , t = 1, ..., T

• where εt ∼ W N (0, σ 2 ) and |ϕ| < 1. The above representation is called the mean-adjusted form.
The characteristic equation for the AR(1) is

ϕ(z) = 1 − ϕz = 0

• so that the root is z = 1/ϕ

yt = µ + Σ_{j=0}^{t} ϕ^j εt−j

E[yt] = µ

γ0 = σ² / (1 − ϕ²)

Slide 75

AR(1) Half Life
• In a stationary AR(1) model, {yt } exhibits mean-reverting behavior. That is, {yt } fluctuates
about the mean value µ. The ACF and IRF decay at a geometric rate.

• The decay rate of the IRF is sometimes reported as a half-life - the lag j_half at which the IRF reaches 1/2. For the AR(1) with positive ϕ, it can be shown that:

j_half = ln(0.5) / ln(ϕ)


yt = µ + Σ_{j=0}^{t} ϕ^j εt−j

Slide 76

ARMA Time Series


• ARMA(p, q) models take the form of a p-th order stochastic difference equation

yt − µ = ϕ1 (yt−1 − µ) + ... + ϕp (yt−p − µ) ⇐ Auto Regressive (AR)


+ εt + θ1 εt−1 + ... + θq εt−q ⇐ Moving Average (MA)
εt ∼ W N (0, σ 2 )

Slide 77

Random Walk: Property Derivation

zt = zt−1 + ηt , ηt ∼ W N (0, ση2 ) ⇐ Random Walk

• Express zt as function of ηt and its past values. Assume z0 = 0

• Derive the mean of the random walk. The mean is zero

• Derive the autocovariance of the random walk. The autocovariance is tση2

• Is the random walk stationary? No, because the autocovariance is dependent on time.

• Financial examples of random walk: Stock prices, log stock prices

Slide 78

Is the Market Index a Random Walk?

pt = µ + pt−1 + at ⇐ Random Walk With Drift

• Express pt as a function of at and its past values. Assume p0 = 0

• Derive the mean of the random walk with drift

• Derive the autocovariance of the random walk with drift

• Is the random walk with drift stationary?

Slide 79

Observing Asset Price Fluctuation
• MA(1) models often arise through data transformations like aggregation and differencing. For
example, consider the signal plus noise model

yt = zt + εt , εt ∼ W N (0, σε2 )
zt = zt−1 + ηt , ηt ∼ W N (0, ση2 ) ⇐ Random Walk

where εt and ηt are independent. For example, zt could represent the fundamental value of an
asset price and εt could represent an iid deviation about the fundamental price. A stationary
representation requires differencing yt

∆yt = ηt + εt − εt−1

• It can be shown that ∆yt is an MA(1) process with θ = [−(q + 2) + √(q² + 4q)] / 2, where q = ση²/σε² is the signal-to-noise ratio and ρ1 = −1/(q + 2) < 0

yt = µ + εt + θεt−1 , εt ∼ W N (0, σ 2 )
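The θ and ρ1 expressions can be sanity-checked numerically: for any MA(1), the lag-1 autocorrelation is ρ1 = θ/(1 + θ²), and for this θ it should coincide with −1/(q + 2). A sketch assuming an illustrative signal-to-noise ratio q = 1:

```python
import math

def ma1_theta(q: float) -> float:
    # MA(1) coefficient of the differenced signal-plus-noise model,
    # with q = sigma_eta^2 / sigma_eps^2 the signal-to-noise ratio
    return (-(q + 2) + math.sqrt(q * q + 4 * q)) / 2

q = 1.0
theta = ma1_theta(q)
rho1 = theta / (1 + theta ** 2)   # lag-1 autocorrelation of an MA(1)
# rho1 agrees with -1 / (q + 2), the closed-form expression on the slide
```

With q = 1 this gives θ = (−3 + √5)/2 < 0 and ρ1 = −1/3, so the differenced series is negatively autocorrelated at lag 1, as the slide states.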

Slide 80

Differencing
• Is the difference process stationary?

• Can we write the difference process as a moving average?

∆yt = c + ut + τ ut−1 , ut ∼ W N (0, σu2 )

• What are the parameters of the new moving average?

∆yt = (1 − L)yt = yt − yt−1

Slide 81

Differencing and Stationarity


• The random walk was not stationary

• In general, prices and log prices are not stationary

• Log returns are difference processes and are stationary

• In finance, the preference is to work with log returns

• More will be said about differencing when we address fractional differencing of financial time
series

Slide 82

ARIMA(p, d, q) Time Series
• The specification of the ARMA(p, q) model assumes that yt is stationary and ergodic. If yt is a
trending variable like an asset price or a macroeconomic aggregate like real GDP, then yt must
be transformed to stationary form by eliminating the trend. Box and Jenkins (1976) advocate
removal of trends by differencing.

• Let ∆ = 1 − L denote the difference operator. If there is a linear trend in yt then the first difference ∆yt = yt − yt−1 will not have a trend. If there is a quadratic trend in yt , then ∆yt will contain a linear trend but the second difference ∆²yt = (1 − 2L + L²)yt = yt − 2yt−1 + yt−2 will not have a trend.

• The class of ARMA(p, q) models where the trends have been transformed by differencing d
times is denoted ARIMA(p, d, q)
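A short sketch of the differencing argument, using an assumed quadratic-trend series: the first difference still carries a linear trend, while the second difference is constant:

```python
import numpy as np

t = np.arange(100, dtype=float)
y = 5.0 + 2.0 * t + 0.1 * t ** 2   # series with a quadratic trend (coefficients made up)

d1 = np.diff(y)         # first difference: a linear trend remains
d2 = np.diff(y, n=2)    # second difference: (1 - L)^2 y_t, trend removed
```

Here every entry of `d2` equals 0.2 (twice the quadratic coefficient), which is why d = 2 in ARIMA(p, d, q) suffices for a quadratic trend.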

Slide 83

Volatility Measure: Conditional Variance


• Volatility here refers to the conditional variance of a time series.

• For a return series {rt }, we are now interested in

σt2 = Var(rt |Ft−1 )

where Ft−1 is the information set at time t − 1.

Slide 84

Volatility Features
• Clustering: There exist volatility clusters. That is, volatility may be high for certain time
periods and low for other periods.

• Continuity: Volatility evolves over time in a continuous manner. That is, volatility jumps are
rare.

Slide 85

Volatility Model: ARCH(1)


• A well known stylized fact about high frequency financial asset returns is that volatility appears to be autocorrelated. A simple model to capture such volatility autocorrelation is the ARCH process due to Engle (1982). To illustrate, let rt denote the daily return on an asset and assume that E[rt ] = 0. An ARCH(1) model for rt is

rt = σt zt
zt ∼ iid N (0, 1)
σt² = w + α r²t−1
σt² = Var(rt |Ft−1 ), where w > 0 and 0 < α < 1
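A minimal simulation sketch of this ARCH(1) recursion (the parameter values w = 0.1, α = 0.5 and the sample size are assumptions for illustration); for these parameters the unconditional variance is w/(1 − α) = 0.2:

```python
import numpy as np

rng = np.random.default_rng(42)
w, alpha, n = 0.1, 0.5, 200_000

r = np.zeros(n)
sigma2 = np.zeros(n)
sigma2[0] = w / (1 - alpha)                       # start at the unconditional variance
r[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
for t in range(1, n):
    sigma2[t] = w + alpha * r[t - 1] ** 2          # conditional variance recursion
    r[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
```

A long sample's variance lands close to 0.2, and the simulated path visibly exhibits the volatility clustering described on the previous slide.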

Slide 86

Summary
• Observing Asset Price Fluctuation: Differencing

• Autoregressive Time Series: Property Derivations

• Auto Regressive Integrated Moving Average (ARIMA)

• Volatility Modeling

Slide 87

PCA Code
• SP500 = pd.read_csv('data/SP500.csv', index_col='Date').dropna(axis=1)

• SP500 is Sn^(i)

• SP500moves = SP500.pct_change().dropna()

• SP500moves is Rn^i

• SP500norm = (SP500moves - SP500moves.mean()) / SP500moves.std()

• SP500norm is Xn^i

• cov = SP500norm.cov()

• eig_vals, eig_vecs = np.linalg.eig(cov)

• eig_vals = np.real(eig_vals)

• eig_vecs = np.real(eig_vecs)

• cov corresponds to the correlation matrix C of the standardized returns X

• eig_vals holds the eigenvalues forming the diagonal matrix Λ

• eig_vecs is the eigenvector matrix U

Where:

– Sn^(i) is the price for stock i on day n

– Rn^i is the daily return such that Rn^i = (Sn^(i) − Sn−1^(i)) / Sn−1^(i), n = 1, ..., N, i = 1, ..., M

– Xn^i is the standardized return such that Xn^i = (Rn^i − R̄^i) / σ^i

– The correlation matrix C is defined as Cij = (1/(N−1)) Σ_{n=1}^{N} Xn^i Xn^j = (1/(N−1)) (Xᵀ X)ij

– The eigenvalue decomposition of the correlation matrix C is defined by C = U Λ Uᵀ, where U is an M × M orthogonal matrix that stores the eigenvectors and Λ is the diagonal matrix of eigenvalues λi, and all these terms are related through the following equation:

Cov[Z] = (1/(N−1)) Zᵀ Z → Λ when V = U

where Z is defined as Z = XV, V is an orthogonal matrix, and X is the standardized return matrix
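The pipeline above can be reproduced end to end. Since `data/SP500.csv` is not included here, the sketch below substitutes a small synthetic random-walk price panel, so only the mechanics (not the actual SP500 eigenvalues) carry over:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the SP500 price panel used on the slide
rng = np.random.default_rng(0)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, (500, 5)), axis=0)),
    columns=[f"stock_{i}" for i in range(5)],
)

moves = prices.pct_change().dropna()              # daily returns R_n^i
norm = (moves - moves.mean()) / moves.std()       # standardized returns X_n^i
C = norm.cov()                                    # correlation matrix C
eig_vals, eig_vecs = np.linalg.eig(C)
eig_vals, eig_vecs = np.real(eig_vals), np.real(eig_vecs)
```

One useful sanity check: the diagonal of a correlation matrix is all ones, so the eigenvalues must be non-negative and sum to the number of assets (here 5).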

QUIZ MOCK

Question 1
One aspect of financial data that facilitates its use in financial decision making is its homogeneity and compatibility with existing standards

• False

• Financial data is notably heterogeneous and often incompatible with existing standards with-
out substantial preprocessing and transformation. This diversity arises from various sources,
formats, definitions, and granularities of financial data. For instance, different financial markets
may have different reporting standards and formats for financial statements, trading data may
come in different frequencies (tick data, minute data, daily data, etc.), and economic indicators
may be reported in various formats and units.

Question 2
Time bars are recorded over regular time intervals but are not well adapted to finance
because they don’t include closing prices.

• False

• Time bars in financial data are constructed based on regular time intervals, such as 1-minute,
5-minute, hourly, daily, etc., and typically include four main components: the opening price
(the first price at the beginning of the interval), the highest price, the lowest price, and the
closing price (the last price at the end of the interval) within that period. These are often
referred to as OHLC data (Open, High, Low, Close).

Question 3
One disadvantage of tick bars is that bar generation is independent of trade volumes.

• True

• Tick bars are formed based on a specified number of transactions (ticks), regardless of the
volume of trades involved in each transaction. This means that each tick bar represents a fixed
number of trades, but the volume of these trades can vary significantly from one bar to the
next. Unlike volume bars, which are constructed after a predetermined volume of shares has
been traded, tick bars do not directly account for the trading volume in their construction.

Question 4
The closest bar to having normally distributed returns is the traditional time bar.

• False

• Financial returns, regardless of the type of bar used (time bars, tick bars, volume bars, etc.),
are generally not normally distributed. They tend to exhibit characteristics such as skewness,
kurtosis (fat tails), and volatility clustering, which deviate from the normal distribution. This
is a well-documented phenomenon in financial markets known as stylized facts.

Question 5
When the stock price movement is small, percentage return is a good approximation of
log return.

• True

• The equation
Rn^i = (Sn^(i) − Sn−1^(i)) / Sn−1^(i) ≈ log(Sn^(i) / Sn−1^(i)), n = 1, ..., N, i = 1, ..., M

is indeed related to the approximation between log returns and percentage returns for small
stock price movements. The equation shows the calculation of daily returns for a stock as the
percentage change from one day to the next. This is then approximated by the log return.
• For small changes in the stock price (Sn^(i) − Sn−1^(i) is small), the percentage change is very close to the log of the price ratio. This is because the Taylor series expansion of log(1 + x) around x = 0 (which corresponds to no change in price) starts with x, which is the percentage change for small values of x. Hence, for small price changes, the log return is a good approximation of the percentage return, as reflected in the equation above.
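A two-line numerical illustration of the approximation (the prices are made-up examples):

```python
import math

p_prev, p_now = 100.0, 100.5          # a small 0.5% move
pct = (p_now - p_prev) / p_prev       # percentage return
log_ret = math.log(p_now / p_prev)    # log return: nearly identical to pct

big_pct = (150.0 - 100.0) / 100.0     # a large 50% move
big_log = math.log(150.0 / 100.0)     # noticeably smaller than big_pct
```

For the 0.5% move the two returns differ by about 10⁻⁵, while for the 50% move the gap is close to 0.095, showing that the approximation only holds for small price changes.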

Question 6
The covariance matrix of two perfectly correlated equities is diagonal.

• False

• A covariance matrix, Σ, for a set of variables (or equities), is defined as:

Σ = [ σ1²  σ12 ]
    [ σ21  σ2² ]

Where:

– σ12 is the variance of the first equity


– σ22 is the variance of the second equity
– σ12 (or σ21 , since Σ is symmetric) is the covariance between the two equities

• Perfect Correlation means that two equities are perfectly correlated (correlation coefficient,
ρ, is either 1 or -1), it implies that their returns move exactly in tandem either in the same
direction (ρ = 1) or in opposite direction (ρ = −1). The covariance between two perfectly
positively correlated equities (ρ = 1) is given by:

σ12 = ρ · σ1 · σ2 = σ1 · σ2

• Diagonal Matrix is one where all off-diagonal elements are zero. If we were to assert that the covariance matrix of two perfectly correlated equities is diagonal, it would imply:

Σ = [ σ1²  0   ]
    [ 0    σ2² ]

However, for perfectly correlated equities, the off-diagonal elements (σ12 and σ21 ) represent the
covariance between the equities and are not zero (except in the trivial case where either of

the variances or both are zero). Therefore, the covariance matrix of two perfectly correlated
equities is not diagonal but instead looks like:
Σ = [ σ1²      σ1 · σ2 ]
    [ σ1 · σ2  σ2²     ]

for ρ = 1, and similarly, with the negative values for σ12 and σ21 if ρ = −1, reflecting perfect
negative correlation

Question 7
The value of any diagonal coefficient in a correlation matrix is 1.

• True

• A correlation matrix is a special case of a covariance matrix where the elements are standardized
to show how variables are related to each other. The diagonal elements of a correlation matrix
represent the correlation of each variable with itself.

• Mathematically, the correlation coefficient between two variables, X and Y , is defined as:
ρX,Y = cov(X, Y) / (σX σY)
where:

– cov(X, Y ) is the covariance between X and Y


– σX and σY are the standard deviations of X and Y , respectively.
When considering the correlation of a variable with itself, the formula becomes:

ρX,X = cov(X, X) / (σX σX) = σX² / σX² = 1

Hence, every diagonal element in a correlation matrix, which represents the correlation
of a variable with itself, is always 1. This is because the correlation of any variable with
itself is perfect, i.e., every variable moves perfectly in tandem with itself.

Question 8
In a moving-average time series of order 1, the autocovariance of order 2 is zero.

• True

• A moving-average (MA) time series of order 1, denoted as MA(1), is defined as:

Yt = µ + ϵt + θϵt−1

where:

– Yt is the value of the time series at time t


– µ is the mean of the time series
– ϵt is the white noise error term at time t with mean 0 and variance σ 2
– θ is the parameter of the model which determines the influence of the lag-1 error term

• The autocovariance function of an MA(1) process for lag k is given by:

γk = σ²(1 + θ²)  if k = 0
γk = σ²θ         if k = 1
γk = 0           if k > 1

• For an MA(1) series, the autocovariance at lag k > 1 is zero. This is because the influence
of the error term does not extend beyond one lag. In other words, there’s no correlation
between Yt and Yt−2 that’s driven by a common error term, as the errors (ϵt ) are assumed
to be uncorrelated with each other. Therefore, for a moving-average process of order 1, the
autocovariance of order 2 (γ2 ) is indeed zero.
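The piecewise autocovariance function can be checked by simulating a long MA(1) path (θ = 0.6, σ = 1 and the sample size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n = 0.6, 1.0, 500_000
eps = rng.normal(0.0, sigma, n + 1)
y = eps[1:] + theta * eps[:-1]        # MA(1) with mu = 0

def sample_acov(x, k):
    # Sample autocovariance at lag k
    xc = x - x.mean()
    if k == 0:
        return xc @ xc / len(xc)
    return xc[:-k] @ xc[k:] / len(xc)
```

The estimates come out near σ²(1 + θ²) = 1.36 at lag 0, σ²θ = 0.6 at lag 1, and approximately zero at lag 2, matching the formula above.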

Question 9
The number of eigen portfolios generated from the top k principal components is k.
• True
• Eigen portfolios are constructed from eigenvectors corresponding to the largest eigenvalues (principal components) of the covariance matrix of asset returns. These principal components capture the most significant variance in the data set.
• When you perform PCA on the covariance matrix of asset returns, you decompose this matrix into eigenvectors and eigenvalues. The eigenvectors represent the directions of maximum variance (principal components), and the eigenvalues represent the magnitude of these variances.
• If you select the top k principal components (based on the largest k eigenvalues), you can construct k eigen portfolios, where each eigen portfolio corresponds to one of these principal components. Each eigen portfolio is a linear combination of the assets, weighted according to the corresponding eigenvector (principal component).
• Therefore, if you choose the top k principal components, you will indeed generate k eigen portfolios, one for each of the selected components. This approach is used to construct portfolios that capture significant variance in the dataset with a reduced number of components, often for the purposes of diversification, risk management, or identifying factors in market movements.

Question 10
For a given equity, the time series of the tick bars and dollar bars have the same
autocorrelation function.
• False
• Tick bars and dollar bars are two different methods of sampling financial data, and they are
constructed based on different criteria:
– Tick Bars: These are formed after a fixed number of transactions have occurred, regardless of the volume or the time taken for these transactions. Tick bars are used to capture information based on market activity or the number of trades.
– Dollar Bars: These are formed after a predetermined dollar amount has been transacted. For example, a dollar bar could be formed every time $1M worth of a particular equity has been traded. This method is volume-driven and aims to normalize the bars based on the traded value, rather than the number of trades.

• Because tick bars and dollar bars aggregate data based on different criteria, their time series
properties, including autocorrelations, can differ significantly. The autocorrelation function of
a time series measures the correlation between values of the series at different times, and is
influenced by the underlying sampling method:

– Tick bars might capture more information during high-frequency trading periods, leading
to a potential over-representation of periods with many small trades in terms of number,
but not necessarily in dollar value.
– Dollar bars, on the other hand, could provide a more consistent representation of market activity in terms of value traded, which might be more relevant during periods of large transactions.

• Given these differences, the autocorrelation functions of tick bars and dollar bars for the same
equity can vary, reflecting the distinct information and market dynamics captured by each
sampling method.

Question 11
Assume the correlation matrix of the log returns of two assets has the form
 
C = [ 1  ρ ]
    [ ρ  1 ]

Then the coefficient ρ must satisfy which inequality?

• |ρ| ≤ 1

• In the context of a correlation matrix, the elements of the matrix represent the Pearson corre-
lation coefficients between pairs of variables (or assets, in this case). The Pearson correlation
coefficient measures the linear correlation between two variables and ranges from -1 to 1, where:

– 1 indicates a perfect positive linear relationship


– -1 indicates a perfect negative linear relationship, and
– 0 indicates no linear relationship

Given the correlation matrix of the log returns of two assets:


 
C = [ 1  ρ ]
    [ ρ  1 ]

• The diagonal elements (1s) represent the correlation of each asset with itself, which is always
1, indicating a perfect positive linear relationship with itself.

• The off-diagonal element, ρ, represents the correlation between the two assets. Since the
Pearson correlation coefficient must be within the range [-1,1], the coefficient ρ must satisfy
the inequality |ρ| ≤ 1. This ensures that ρ represents a valid correlation value, from perfect
negative correlation to perfect positive correlation, including no correlation at all when ρ = 0

Question 12
Assume the covariance matrix of the log returns of two assets has the form
 
C = [ 1  1 ]
    [ 1  2 ]

and that the capital is allocated 40% to Asset #1 and 60% to Asset #2. What is the
variance of the resulting portfolio?

• 1.36

• The variance of a portfolio with two assets can be calculated using the formula:

σp2 = w12 σ12 + w22 σ22 + 2w1 w2 σ12

where:

– σp2 is the portfolio variance


– w1 and w2 are the weights allocated to Asset #1 and Asset #2, respectively
– σ1² and σ2² are the variances of the two assets and σ12 is their covariance, all read off the covariance matrix C

• Given the covariance matrix C and the weights, we can substitute the values into the formula:

σp² = (0.4)² × 1 + (0.6)² × 2 + 2 × 0.4 × 0.6 × 1

Expanding and simplifying:

σp² = 0.16 × 1 + 0.36 × 2 + 0.48 × 1
σp² = 0.16 + 0.72 + 0.48
σp² = 1.36
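The same arithmetic can be written in matrix form as wᵀCw, which is how the two-asset formula generalizes to any number of assets:

```python
import numpy as np

C = np.array([[1.0, 1.0],
              [1.0, 2.0]])   # covariance matrix from the question
w = np.array([0.4, 0.6])     # capital weights: 40% / 60%

var_p = float(w @ C @ w)     # portfolio variance w' C w
```

`var_p` evaluates to 1.36, agreeing with the expansion above.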

Question 13
Assume the covariance matrix of the log returns of two assets has the form
 
C = [ 2  1 ]
    [ 1  2 ]

then, which of the following is the largest eigenvalue of the covariance matrix?

• To find the largest eigenvalue of the covariance matrix, we need to solve the characteristic equation of the matrix. The characteristic equation is obtained by subtracting λ (the eigenvalue) times the identity matrix from the covariance matrix and setting the determinant of the resulting matrix to zero.

• Given the covariance matrix:


 
C = [ 2  1 ]
    [ 1  2 ]

The characteristic equation is derived from:

det(C − λI) = 0

Where I is the identity matrix, and det denotes the determinant. Substituting C and I, we
get:
     
det([2 1; 1 2] − λ[1 0; 0 1]) = det([2−λ 1; 1 2−λ]) = 0

Expanding the determinant:

(2 − λ)(2 − λ) − (1)(1) = 0
λ² − 4λ + 3 = 0
(λ − 3)(λ − 1) = 0
λ1 = 3, λ2 = 1

• Hence, the largest eigenvalue of the covariance matrix is 3.
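The same eigenvalues can be obtained numerically with NumPy; `eigvalsh` is appropriate here because C is symmetric, and it returns eigenvalues in ascending order:

```python
import numpy as np

C = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eig_vals = np.linalg.eigvalsh(C)   # ascending eigenvalues of a symmetric matrix
largest = eig_vals[-1]             # the top eigenvalue
```

`eig_vals` comes back as [1, 3], matching the hand calculation.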

Question 14
Assume the covariance matrix of the log returns of two assets has the form
 
2 1
C=
1 2

then, which of the following matrices can be used for the eigen decomposition of C?

1. (1/√2) [ 1  1  ]
          [ 1  −1 ]

2. [ 1  1  ]
   [ 1  −1 ]

3. [ 1  1 ]
   [ 1  1 ]

4. (1/√2) [ −1  1  ]
          [ 1   −1 ]

• To determine which of the matrices can be used for the eigen decomposition of C, we need to
find the eigenvectors of C and see which option corresponds to the normalized eigenvectors

• Given the covariance matrix:


 
C = [ 2  1 ]
    [ 1  2 ]

The eigen decomposition of C is given by: C = P DP −1 Where P is a matrix whose columns


are the eigenvectors of C, and D is the diagonal matrix containing the eigenvalues of C

• From a previous calculation, we know the characteristic equation for the eigenvalues of C is λ² − 4λ + 3 = 0, leading to eigenvalues λ1 = 1 and λ2 = 3.

• For λ1 = 3:

[ 2  1 ] [ x ] = 3 [ x ]
[ 1  2 ] [ y ]     [ y ]

This gives us:

2x + y = 3x
x + 2y = 3y

Solving, we find that x = y. Choosing x = 1 for simplicity, we get the eigenvector [1, 1]ᵀ.
• For λ2 = 1:

[ 2  1 ] [ x ] = 1 [ x ]
[ 1  2 ] [ y ]     [ y ]

This gives us the system:

2x + y = x
x + 2y = y

Solving, we find that x = −y. Choosing x = 1 for simplicity, we get the eigenvector [1, −1]ᵀ.
Normalizing these eigenvectors, we construct the matrix P :

P = (1/√2) [ 1  1  ]
           [ 1  −1 ]

Among the given options, the closest one is option 1:

(1/√2) [ 1  1  ]
       [ 1  −1 ]
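Option 1 can be verified numerically: with P orthogonal, P⁻¹ = Pᵀ, so P D Pᵀ should reconstruct C when D holds the eigenvalues in the order matching P's columns:

```python
import numpy as np

C = np.array([[2.0, 1.0],
              [1.0, 2.0]])
P = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)   # normalized eigenvectors as columns
D = np.diag([3.0, 1.0])                    # eigenvalues ordered to match P's columns

reconstructed = P @ D @ P.T                # P D P^{-1} with P^{-1} = P^T
```

`reconstructed` equals C, confirming that this matrix of normalized eigenvectors diagonalizes the covariance matrix.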

Question 15
Assume you are given an n-dimensional vector and are told that it is the top principal component of returns on a portfolio of n assets. What data do you need to evaluate the performance of its eigen portfolio?

1. The task is hopeless, and no amount of financial data can help in evaluating the eigen portfolio.
2. I need the series of the asset returns over the time period corresponding to the principal
component calculation
3. I don’t need any data. The performance can be evaluated from the principal component vector
itself
4. I need the time series of the asset returns.
5. Solution
• The performance of an eigen portfolio, derived from the top principal component, is di-
rectly tied to the specific time period over which the principal component was calculated.
This is because the principal component captures the variance (and hence the investment
opportunity) present in that specific time period.

• Using the asset returns series from the exact time period of the principal component
calculation is crucial. The principal component reflects the structure and relationships
between the assets during that period. Applying it to a different time period might
yield misleading results, as the market conditions, correlations between assets, and their
volatilities could have changed.
• Therefore, to evaluate the performance of the eigen portfolio accurately, one must use the
asset returns series from the same time period used for calculating the principal compo-
nent. This ensures that the performance evaluation is consistent with the market condi-
tions and investment opportunities the principal component was intended to capture.

Question 16
The daily returns of a given equity on the stock market are modeled with the time series:

zt = zt−1 + at

where at is a Gaussian random variable of mean 0 and standard deviation 5. What is the standard
deviation on the equity after 100 business days of trading, assuming at t = 0, z0 = 0?

• The given time series model for the daily returns of an equity is:

zt = zt−1 + at

where at is a Gaussian random variable with mean 0 and standard deviation 5. This model
represents a Random Walk, where each step zt is the sum of the previous step zt−1 and a new
random step at

• Since at is Gaussian with a standard deviation of 5, and assuming the steps are independent,
the variance of the equity’s return after n days, σn2 , is the sum of the variance of each individual
step at .

• Given the variance of a single step σa2 = 52 = 25, the variance after n steps is n · σa2 . The
standard deviation is the square root of the variance.

• So, after 100 business days, the variance σ²100 would be 100 · 25, and the standard deviation σ100 would be:

σ100 = √(100 · 25) = √2500 = 50

Therefore, the standard deviation on the equity after 100 business days of trading, assuming z0 = 0, is 50.
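A Monte Carlo sketch of the same result (the number of simulated paths is an arbitrary choice): each path sums 100 independent daily shocks with standard deviation 5, so the cross-sectional standard deviation of the endpoints should be near √100 · 5 = 50:

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, n_days, sigma = 50_000, 100, 5.0

steps = rng.normal(0.0, sigma, (n_paths, n_days))
z_final = steps.sum(axis=1)    # z_100 for each simulated path, with z_0 = 0
```

The sample standard deviation of `z_final` lands within a fraction of a unit of the analytic value 50.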

Question 17
The daily price of an asset is modeled as the following time series:

zt = 100 + 0.5at−1 + 0.5at

where at is a series of independent, identically distributed Gaussian random variables of mean 0 and standard
deviation 5. What are the mean and the variance of zt ?

• Given the time series model for the daily price of an asset:

zt = 100 + 0.5at−1 + 0.5at

where at is an independent, identically distributed (i.i.d) Gaussian random series with a mean
of 0 and a standard deviation of 5.

• The mean of zt : Since at and at−1 are i.i.d. Gaussian random variables with a mean of 0, the
expected value (mean) of each term involving at and at−1 is also 0. Therefore, the mean of zt
is simply:

Mean of zt = 100 + 0.5 × Mean of at−1 + 0.5 × Mean of at = 100 + 0.5 × 0 + 0.5 × 0 = 100

• Variance of zt : The variance of a sum of independent random variables is the sum of their
variances. since at and at−1 are i.i.d with a variance of σa2 = 52 = 25, the variance of zt can be
computed as:

Variance of zt = 0.5² × Variance of at−1 + 0.5² × Variance of at = 0.25 × 25 + 0.25 × 25 = 12.5

Therefore, the mean of zt is 100, and the variance of zt is 12.5.

Question 18
The daily price of an asset is modeled as the following time series:

zt = 100 + 0.5at−1 + 0.5at

where at is a series of independent, identically distributed Gaussian random variables of mean 0 and
standard deviation 5. What is the lag-1 autocovariance function of zt ? Recall that the autocovariance
function of lag 1 is expressed as E[(zt − m)(zt−1 − m)], where m is the mean of zt
Solution

• Given the time series model for the daily price of an asset:

zt = 100 + 0.5at−1 + 0.5at

where at is a series of independent, identically distributed Gaussian random variables with a


mean of 0 and a standard deviation of 5, and given that the mean of zt (m) is 100, we are to
find the lag-1 autocovariance function:

Cov(zt , zt−1 ) = E[(zt − m)(zt−1 − m)]

• First, express zt−1 :

zt−1 = 100 + 0.5at−2 + 0.5at−1

Now, considering zt − m and zt−1 − m:

zt − m = 0.5at−1 + 0.5at
zt−1 − m = 0.5at−2 + 0.5at−1

Since at are independent and have a mean of 0, E[at ] = 0 and E[a2t ] = σ 2 where σ = 5.

• The lag-1 autocovariance function is:

E[(0.5at−1 + 0.5at )(0.5at−2 + 0.5at−1 )]

Expanding this and using the fact that E[at ] = 0 and the variables are independent (hence
terms involving products of different at ’s will be zero), we get:

E[0.25a2t−1 ] + 0

(since terms involving at at−1 , at at−2 , and at−1 at−2 will be zero)
• The expected value of a2t−1 is the variance of at , which is σ 2 = 25. So, we have:

0.25 × 25 = 6.25

Therefore, the lag-1 autocovariance function of zt is 6.25
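The value 6.25 can be confirmed by simulating the process and estimating the lag-1 sample autocovariance (the sample size is arbitrary and the estimate is approximate):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
a = rng.normal(0.0, 5.0, n + 1)
z = 100 + 0.5 * a[:-1] + 0.5 * a[1:]   # z_t = 100 + 0.5 a_{t-1} + 0.5 a_t

zc = z - z.mean()
gamma1 = zc[:-1] @ zc[1:] / n          # sample lag-1 autocovariance
```

The estimate comes out near 0.25 × 25 = 6.25, matching the derivation.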

Question 19
The daily price of an asset is modeled as the following time series:

zt = 100 + 0.5zt−1 + at

where at is a series of independent, identically distributed Gaussian random variables of mean 0 and
standard deviation 5. What is the mean of zt ? Assume the process is stationary.
Solution

• For stationary process, the mean of zt is constant over time. Given the time series model:

zt = 100 + 0.5zt−1 + at

where at is a series of independent, identically distributed Gaussian random variables with


mean 0 and standard deviation 5.
• To find the mean of zt , denoted as µ, we take expectations on both sides of the equation, keeping in mind that the process is stationary (which means E[zt ] = E[zt−1 ] = µ and E[at ] = 0):

E[zt ] = E[100 + 0.5zt−1 + at ]

Since the expectation of a sum is the sum of the expectations, and constants come outside the expectation, we have:

µ = 100 + 0.5E[zt−1 ] + E[at ]

Substituting E[zt−1 ] = µ and E[at ] = 0, we get:

µ = 100 + 0.5µ

Rearranging to solve for µ:

0.5µ = 100
µ = 200

Therefore, under the assumption of stationarity, the mean µ of zt is 200.

Question 20
The daily price of an asset is modeled as the following time series:

zt = 100 + 0.5zt−1 + at

where at is a series of independent, identically distributed Gaussian random variables of mean 0 and
standard deviation 5. What is the variance of zt ? Assume the process is stationary
Solution

• For a stationary process, the variance of zt is constant over time. Given the time series model:

zt = 100 + 0.5zt−1 + at

where at is a series of independent, identically distributed Gaussian random variables with


mean 0 and standard deviation 5.
The variance of zt , denoted as σz2 , can be found by analyzing the equation in terms of variances.
Taking the variance on both sides of the equation, and noting that the variance of a constant
is 0, we have:

V ar(zt ) = V ar(100) + V ar(0.5zt−1 ) + V ar(at )

Since V ar(100) = 0 and knowing that V ar(at ) = σa2 = 52 = 25, and V ar(cX) = c2 V ar(X) for
any constant c and random variable X, we can rewrite the equation as:

σz² = 0.5² σz² + 25
σz² = 0.25 σz² + 25
0.75 σz² = 25
σz² = 100/3
Therefore, under the assumption of stationarity, the variance σz² of zt is 100/3.
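Both the mean from Question 19 (µ = 200) and this variance (100/3 ≈ 33.33) can be checked by simulating the AR(1) recursion (sample size and burn-in are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, burn = 300_000, 1_000
a = rng.normal(0.0, 5.0, n + burn)

z = np.empty(n + burn)
z[0] = 200.0                            # start at the stationary mean
for t in range(1, n + burn):
    z[t] = 100 + 0.5 * z[t - 1] + a[t]  # z_t = 100 + 0.5 z_{t-1} + a_t
z = z[burn:]                            # discard burn-in
```

The sample mean and variance of `z` land close to 200 and 100/3, consistent with the stationarity derivations of Questions 19 and 20.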

QUIZ

Question 1
After a sudden, large price movement on an equity, a trader should look at its volume
bar to assess the reaction of the market

• True

• Pending explanation

Question 2
One advantage of time bars is that trading activity is uniformly distributed over time

• False

• Time bars, which aggregate data points based on uniform time intervals (e.g., hourly, daily),
do not ensure that trading activity is uniformly distributed over time. While time bars provide
a consistent temporal framework for analyzing price movements, the volume and frequency
of trading activity can vary significantly within each interval. Factors such as market news,
opening and closing sessions, economic data releases, and other events can lead to periods of
high volatility and trading volume interspersed with quieter periods. Therefore, while time
bars offer a regular time-based structure for data analysis, they do not guarantee uniform
distribution of trading activity within those intervals.

Question 3
One advantage of time bars is that their distribution is close to normal.

• False

• The distribution of time bars, in terms of price changes or returns within those bars, is generally
not close to normal (Gaussian). Financial market returns, even when aggregated into time bars
like daily or hourly returns, often exhibit characteristics that deviate significantly from a normal
distribution.

Question 4
Trade bars are susceptible to outliers, especially during market auctions.

• True

• Trade bars, which aggregate trading data over specific intervals (time, volume, tick, etc.), are
indeed susceptible to outliers, particularly during market auctions such as the opening and
closing auctions. These auction periods often experience heightened trading activity, leading
to increased volatility and the possibility of extreme price movements. During market auctions,
a large number of buy and sell orders are executed simultaneously when the market opens or
closes, which can lead to significant price jumps or drops. These sudden movements can appear
as outliers in the aggregated data of trade bars, distorting the overall picture of market behavior
during more stable periods.

Question 5
The log return of an equity is a non-negative number.

• False

• The log returns of an equity can be both positive and negative. The log return is computed
as rt = ln(Pt /Pt−1 ), the natural logarithm of the ratio of subsequent prices, which is
negative whenever the price falls (Pt < Pt−1 ).
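A minimal example (the prices are hypothetical) showing that the log return is positive when the price rises and negative when it falls:

```python
import math

prices = [100.0, 105.0, 98.0]  # hypothetical closing prices
# Log return between consecutive prices: ln(P_t / P_{t-1}).
log_returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]
print([round(r, 4) for r in log_returns])  # [0.0488, -0.069]
```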

Question 6
The covariance matrix of three independent market sectors is likely to be diagonal.

• True

• The covariance matrix of three independent market sectors is likely to be diagonal. In a
covariance matrix, the diagonal elements represent the variances of each individual variable (or
market sector, in this case), and the off-diagonal elements represent the covariances between
pairs of variables; for independent sectors those covariances are zero.
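A sketch (simulated independent Gaussian "sector" returns; the volatilities are illustrative) showing that the sample covariance matrix is diagonal up to sampling noise:

```python
import numpy as np

rng = np.random.default_rng(0)
# Three independent "sector" return series with different volatilities.
returns = rng.normal(size=(100_000, 3)) * np.array([0.01, 0.02, 0.015])
cov = np.cov(returns, rowvar=False)

# Diagonal entries hold each sector's variance; off-diagonal entries
# (cross-sector covariances) are ~0 because the draws are independent.
print(np.round(cov, 5))
```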

Question 7
In a correlation matrix, the off-diagonal coefficients cannot be negative.

• False

• In a correlation matrix, the off-diagonal coefficients can indeed be negative. The elements of
a correlation matrix represent the Pearson correlation coefficients between pairs of variables,
and these coefficients can range from -1 to 1.
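A minimal check (toy data) with numpy: two series that move in opposite directions have a negative Pearson coefficient:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 10.0 - 2.0 * x  # y moves in the opposite direction of x
corr = np.corrcoef(x, y)[0, 1]
print(round(corr, 6))  # -1.0 (perfect negative correlation)
```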

Question 8
In an autoregressive time series of order 1, the partial autocorrelation function of order
2 or more is zero.

• True

• In an autoregressive (AR) time series model of order 1, denoted as AR(1), the current value
of the series is explained by its immediately preceding value plus a stochastic error term:
zt = ϕzt−1 + at . All linear dependence on earlier lags is mediated through zt−1 , so once
zt−1 is controlled for, zt−2 and earlier lags carry no additional information; hence the
partial autocorrelation function is zero at lags 2 and above.
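A numpy-only sketch (simulated AR(1) with ϕ = 0.5; regressing on the first two lags is one standard way to estimate a partial autocorrelation): the coefficient on zt−2 , after controlling for zt−1 , comes out near zero, while the coefficient on zt−1 recovers ϕ:

```python
import numpy as np

rng = np.random.default_rng(0)
phi, n = 0.5, 200_000
z = np.zeros(n)
for t in range(1, n):
    z[t] = phi * z[t - 1] + rng.normal()

# Regress z_t on (z_{t-1}, z_{t-2}); the partial autocorrelation at lag 2
# is the coefficient on z_{t-2} after controlling for z_{t-1}.
X = np.column_stack([z[1:-1], z[:-2]])
y = z[2:]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(beta[0], 2), round(beta[1], 2))  # ~0.5 on lag 1, ~0.0 on lag 2
```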

Question 9
The explainable variance decreases monotonically as we add more components to the
eigenvalue portfolios.

• False

• In PCA, principal components are ordered by the amount of variance they explain, with the first
principal component explaining the most. The marginal variance explained by each additional
component decreases, but the cumulative explainable variance increases monotonically as more
components are added, which is why the statement is false.

Question 10
For a given equity, the time series of the volume bars and dollar bars have the same
partial autocorrelation function.

• False

• The time series of volume bars and dollar bars for a given equity are unlikely to have the
same partial autocorrelation function (PACF). Volume bars and dollar bars are two different
methods of aggregating financial time series data, and they emphasize different aspects of the
trading activity.

Question 11
The major component of a covariance matrix of n assets is the direction along which
the variance of the log returns is maximized.

• True

• The maximum variance (or major component) is indeed represented by the largest eigenvalue
of the covariance matrix, and the direction in which this variance is maximized is given by
the associated eigenvector. This principal component (eigenvector) can be interpreted as a
portfolio of assets where the weights are determined by the components of the eigenvector,
capturing the most significant collective movement of the assets.
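A sketch (synthetic log returns with one strong common factor; numpy assumed) showing that the leading eigenvector of the covariance matrix defines the maximum-variance portfolio, whose variance equals the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
common = rng.normal(size=n)  # a factor shared by all three assets
returns = np.column_stack(
    [common + 0.1 * rng.normal(size=n) for _ in range(3)]
)

cov = np.cov(returns, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
w = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue

# The portfolio with weights w attains the maximum variance over all
# unit-norm weight vectors, and that variance equals the top eigenvalue.
port_var = w @ cov @ w
print(round(port_var, 3), round(eigvals[-1], 3))  # the two agree
```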

Question 12
The autocorrelation function of an AR(1) time series decreases geometrically as function
of the lag.

• True

• For an AR(1) process zt = ϕzt−1 + at with |ϕ| < 1, the autocorrelation function at lag k
is ρ(k) = ϕ^k, which decays geometrically in magnitude as the lag increases, consistent
with the statement.
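A quick simulation sketch (numpy assumed; ϕ = 0.6 and the sample size are arbitrary choices) comparing the sample autocorrelations of a simulated AR(1) with the theoretical ϕ^k:

```python
import numpy as np

rng = np.random.default_rng(1)
phi, n = 0.6, 200_000
z = np.zeros(n)
for t in range(1, n):
    z[t] = phi * z[t - 1] + rng.normal()

# Sample autocorrelation at lags 1-3 vs. the theoretical value phi**k.
rhos = {}
for k in (1, 2, 3):
    rhos[k] = np.corrcoef(z[:-k], z[k:])[0, 1]
    print(k, round(rhos[k], 2), round(phi**k, 2))
```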

Question 13
The partial autocorrelation function of an MA(1) time series decreases geometrically as
function of the lag.

• True

• The MA(1) is defined as

Yt = µ + εt + θεt−1

where:

– Yt is the value of the series at time t
– µ is the mean of the series
– εt is white noise with mean 0 and variance σ²
– θ is the coefficient of the lag-1 moving-average term

• For an MA(1), it is the autocorrelation function that cuts off after lag 1. The partial
autocorrelation function does not cut off: its magnitude decays geometrically as a function
of the lag (alternating in sign when θ > 0). This mirrors the AR(1) case, where the ACF
decays geometrically and the PACF cuts off after lag 1.

Question 14
In an AR(1) time series z(t) = ϕz(t − 1) + a(t), the coefficient ϕ must have an absolute
value < 1 for the series to be stationary.
• True
• A stationary time series has statistical properties such as mean, variance, and autocorrelation
that are constant over time. For an AR(1) model to be stationary, its future values must
depend on its past values in a way that does not amplify fluctuations over time; if |ϕ| ≥ 1,
shocks never decay and the variance grows without bound.

Question 15
Assume the correlation matrix of the log returns of three assets has the form
 
    | 1  0  0 |
C = | 0  1  ρ |
    | 0  ρ  1 |
Then the coefficient ρ must satisfy which inequality?
1. ρ ≤ 1
2. |ρ| > 1
3. |ρ| ≤ 1
4. |ρ| ≥ 1
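The eigenvalues of C are 1, 1 − ρ and 1 + ρ, and a valid correlation matrix must be positive semi-definite, so the answer is option 3: |ρ| ≤ 1. A quick numpy check (the ρ values are illustrative):

```python
import numpy as np

for rho in (0.5, 1.0, 1.5):
    C = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, rho],
                  [0.0, rho, 1.0]])
    eig = np.linalg.eigvalsh(C)  # eigenvalues: 1, 1 - rho, 1 + rho
    print(rho, np.round(eig, 2), "PSD" if eig.min() >= -1e-12 else "not PSD")
```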

Question 16
Assume the covariance matrix of the log returns of three assets has the form
 
    | 1  1  0 |
C = | 1  2  0 |
    | 0  0  1 |
and that the capital is allocated 20% to asset 1, 30% to asset 2, and 50% to asset 3.
What is the variance of the resulting portfolio?
• To find the variance of the portfolio, we use the formula for portfolio variance
σp² = wᵀCw
where
– wT = [0.2, 0.3, 0.5]
• The answer is 0.59.
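Verifying with numpy (the covariance matrix and weight vector are taken from the question):

```python
import numpy as np

C = np.array([[1.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
w = np.array([0.2, 0.3, 0.5])
var_p = w @ C @ w  # portfolio variance w^T C w
print(round(var_p, 2))  # 0.59
```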

Question 17
Assume the covariance matrix of the log returns of three assets has the form
 
    | 2  1  0 |
C = | 1  2  0 |
    | 0  0  1 |

then which is the largest eigenvalue of the covariance matrix?
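The matrix is block-diagonal: the 2×2 block [[2, 1], [1, 2]] has eigenvalues 3 and 1, and the remaining diagonal entry contributes another 1, so the largest eigenvalue is 3. A numpy check:

```python
import numpy as np

C = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
eig = np.linalg.eigvalsh(C)  # ascending order
print(np.round(eig, 6))      # largest eigenvalue is 3
```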

Question 18
The daily returns of a given equity on the stock market are modeled with the time
series:

zt = zt−1 + at

where at is a Gaussian random variable of mean 0 and standard deviation 10. What is
the standard deviation of the equity after 100 business days of trading, assuming
z0 = 0 at t = 0?
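Since zt = zt−1 + at is a random walk, z100 is the sum of 100 independent shocks, so Var(z100) = 100 · 10² and the standard deviation is 10 · √100 = 100. A Monte Carlo sketch (the number of simulated paths is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# 50,000 independent 100-step random walks with N(0, 10) daily shocks.
paths = rng.normal(0.0, 10.0, size=(50_000, 100)).cumsum(axis=1)
std_100 = paths[:, -1].std()
print(round(std_100))  # close to 10 * sqrt(100) = 100
```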

Question 19
The daily price of an asset is modeled as the following time series:
zt = 3 + (1/3)(at−2 + at−1 + at )

where at is a series of independent, identically distributed Gaussian random variables of mean 0 and
standard deviation 3. What are the mean and the variance of zt ?
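Here E[zt] = 3, and since the three shocks are independent, Var(zt) = (1/3)² · 3 · 3² = 3. A simulation sketch (sample size arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(0.0, 3.0, size=300_000)
# z_t = 3 + (a_{t-2} + a_{t-1} + a_t) / 3, built from shifted views of a.
z = 3.0 + (a[:-2] + a[1:-1] + a[2:]) / 3.0
print(round(z.mean(), 1), round(z.var(), 1))  # close to 3.0 and 3.0
```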

Question 20
The daily price of an asset is modeled as the following time series:
zt = 2 + (1/3)zt−1 + at

where at is a series of independent, identically distributed Gaussian random variables of mean 0 and
variance 8. What is the standard deviation of zt ? Assume the process is stationary.
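For a stationary AR(1), Var(zt) = Var(at)/(1 − ϕ²); here ϕ = 1/3 and Var(at) = 8, giving σz² = 8/(8/9) = 9 and a standard deviation of 3. A quick check:

```python
phi, var_a = 1.0 / 3.0, 8.0
var_z = var_a / (1.0 - phi**2)  # stationary AR(1) variance
std_z = var_z ** 0.5
print(round(var_z, 6), round(std_z, 6))  # 9.0 3.0
```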

