
White Noise:
A white noise process, in simple terms, is a time series with no recognizable structure. Think of it as the error or shock term in the equation of any time series model: white noise is the unexplained change, or shock, in a particular time series.

For example, the time series plot referred to here consists of random, independent, identically distributed observations: no trend, no seasonality, no cyclicity, and not even any autocorrelation between different time stamps. Just randomness.

In time series analysis such a series is called “White Noise” and is a purely random time series. The name comes from physics, where white light has similar mathematical characteristics: white light is a noisy mix of all wavelengths, making it impossible to distinguish the individual colours.

Why is it important?

Yt = Signal (Predictable with modelling) + Noise (unpredictable)


Whenever we analyse a time series (Yt), we assume that it is a combination of a signal (the patterns that we can predict and build a model for) and noise (white noise, i.e. something that is totally random and unpredictable).

So, if you truly capture all the components of a time series perfectly, i.e. if you figure out all the components that make up the signal, whether it is an Autoregressive Moving Average (ARMA) model or something more elaborate, then the resulting residual (noise) would be:

Residual = time series – signal

That residual would be white noise, and unpredictable. So, when you can show that the residuals are white noise, or close to it, you can say that the model fits that particular data set correctly.

Characteristics / properties of White Noise-

1. Constant Mean
2. Constant Variance – ( Standard Deviation is constant)
3. No Auto-correlation (ACF) – (correlation between lags = 0). The value of the time series
at a particular period has no relation to its values in past time periods.
4. White Noise is not predictable

How to identify White Noise-

1. Visual Inspection
2. Global & Local checks – If the tests for mean & variance hold true for whole population
of data (global) and random subsets (local) or slices of that time series either at different
plots or may be in a rolling window manner(1-5, 2-6, 3-7...). When you calculate the
Mean & Variance for these windows or local subsets, they should match up to each other
and to the global one to fulfil the criteria.

3. Auto-Correlation plots (check the ACF) - We will study this in the coming sessions.
4. Statistical tests – Null Hypothesis & Alternative Hypothesis.
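
For illustration, here is a minimal Python sketch of the global & local check and a lag-1 autocorrelation check on a simulated white-noise series. The variable names and the simulated data are only assumptions for this example, not part of the course exercise.

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
y = pd.Series(rng.normal(loc=0.0, scale=1.0, size=500))  # simulated white noise

# Global statistics
print("global mean:", y.mean(), "global variance:", y.var())

# Local (rolling-window) statistics -- these should stay close to the global values
window = 50
rolling_mean = y.rolling(window).mean()
rolling_var = y.rolling(window).var()
print("rolling means range:", rolling_mean.min(), "to", rolling_mean.max())
print("rolling variances range:", rolling_var.min(), "to", rolling_var.max())

# Lag-1 autocorrelation -- should be close to zero for white noise
print("lag-1 autocorrelation:", y.autocorr(lag=1))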

Visual test of the three series/graphs referred to above-

1. The first plot has a visibly constant mean and more or less similar variation, but no
visible autocorrelation between any two time slots. Even if you know the value at some
particular time period, it is difficult to estimate or predict future values, as there is no
evident pattern. Therefore, this is a white noise plot.
2. The second plot does not appear to have a constant mean, so we can disqualify it
immediately. Looking further, there is also a clear autocorrelation: you can see a rising
pattern with higher highs (each peak higher than the previous one). This pattern can help
you predict future values; the repeating peaks suggest the next rise could reach a new
peak. The variance is also not constant. So, this is not white noise.
3. The third series seems to have a constant mean, but there is autocorrelation: a visible
pattern of repeating highs and lows at roughly equal time distances. That pattern may help
you predict future values, so it is not white noise despite the constant mean.

Why is it important to understand if some series is white noise? Because it answers a crucial
question: 'when should I stop fitting my model?'. If you realise that a particular time series is
white noise, you will have to abandon the project, as it is impossible to predict the future values of
white noise.

What are Moments [of a statistical distribution]:

Moments are statistical measures that describe certain characteristics of the distribution of data.
Generally, in a frequency distribution, four moments are obtained; the shape of any distribution can
be described by its various ‘moments’. The first four moments used to describe the distribution of a
data set are listed below (a short sketch computing all four follows the list):

1) The mean, which indicates the central tendency of a distribution.


2) The second moment is the variance, which indicates the width or deviation.
3) The third moment is the skewness, which indicates any asymmetric ‘leaning’ to either left or right.

4) The fourth moment is the Kurtosis, which indicates the degree of central ‘peakedness’ or,
equivalently, the ‘fatness’ of the outer tails.
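
As a quick illustration, the following Python sketch computes all four moments for a simulated sample using NumPy/SciPy. The sample itself is purely illustrative.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=10, scale=2, size=1000)

print("1st moment (mean):    ", np.mean(data))
print("2nd moment (variance):", np.var(data, ddof=1))
print("3rd moment (skewness):", stats.skew(data))
print("4th moment (kurtosis):", stats.kurtosis(data, fisher=False))  # plain kurtosis, ~3 for normal data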

Skewness & Kurtosis - In statistics these are sometimes also called the measures of shape.

Simply put, skewness is a measure of symmetry, or rather the lack of it. The word 'skewness' in
English means lopsidedness or tilt. So with skewness we try to find out whether, and by how much,
the distribution of a real-valued random variable is tilted with respect to its mean.

Kurtosis – Kurtosis, on the other hand, was until recently defined as the measure of “peakedness” of
the bell curve, i.e. the normal distribution curve. The word originates from the Greek ‘kurtos’, which
means ‘curved’. Why do we need it at all? It helps us assess the normality of the data before analysis.
“Skewness essentially measures the symmetry of the distribution, while kurtosis
determines the heaviness of the distribution tails.”

SKEWNESS:
If the values of a specific independent variable (feature) are skewed then, depending on the model,
the skewness may violate model assumptions or reduce the interpretability of feature importance.
In statistics, skewness is the degree of asymmetry observed in a probability distribution relative to
the symmetrical normal distribution (bell curve) for a given set of data.
The normal distribution is the reference point for judging skewness: normally distributed data are
symmetric, and a symmetrical distribution has zero skewness because all measures of central
tendency lie in the middle.

When data are symmetrically distributed, the left-hand side and the right-hand side contain the same
number of observations (if the dataset has 90 values, the left-hand side has 45 observations and the
right-hand side has 45 observations). But what if the distribution is not symmetrical? Such data are
called asymmetrical, and that is when skewness comes into the picture.

Types of skewness 

1. Positive skewed or right-skewed  

In statistics, a positively skewed distribution is one where, unlike symmetrically distributed data in
which all measures of central tendency (mean, median, and mode) equal each other, the measures
disperse: the long tail extends towards the higher (right-hand) values, so the mean is typically
greater than the median, which in turn is greater than the mode.

In a positively skewed distribution, the mean of the data is greater than the median, because the
long tail of large values on the right-hand side pulls the mean upwards while most observations are
bunched towards the lower side. The median is simply the middle value, and the mode is the most
frequently occurring value (the peak of the distribution).
Extreme positive skewness is not desirable, as a high level of skewness can produce misleading
results. Data transformation tools help make skewed data closer to a normal distribution. For
positively skewed distributions, the best-known transformation is the log transformation, which
replaces each value in the dataset with its natural logarithm.
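
A minimal Python sketch of this idea, using a simulated log-normal sample (chosen purely for illustration), shows how the log transformation reduces positive skewness:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=2000)   # strongly right-skewed sample

print("skewness before log transform:", stats.skew(sample))
print("skewness after  log transform:", stats.skew(np.log(sample)))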

 2. Negative skewed or left-skewed

A negatively skewed distribution is the exact reverse of a positively skewed distribution. In
statistics, a negatively skewed distribution is one where more values are plotted on the right side
of the graph while the tail of the distribution spreads out on the left side.
In a negatively skewed distribution, the mean of the data is less than the median, because the long
tail of small values on the left-hand side pulls the mean downwards.

The median is the middle value and the mode is the most frequently occurring value; because of the
unbalanced distribution, the median is higher than the mean.

Measure of Skewness:

If the skewness is between -0.5 & 0.5, the data are nearly symmetrical.
If the skewness is between -1 & -0.5 (negative skewed) or between 0.5 & 1(positive skewed), the
data are slightly skewed.
If the skewness is lower than -1 (negative skewed) or greater than 1 (positive skewed), the data are
extremely skewed.

Although skewness is affected by extreme values, the measure may not always be able to detect
outliers. Outliers or extreme values influence skewness strongly because, when we calculate
skewness, we effectively cube (x³) the distances of the data points from the mean. This leads us to
conclude that skewness should only be used to identify symmetry in the data, i.e. to ascertain
whether the data are symmetric or approximately normally distributed.

Kurtosis:
Kurtosis was historically considered to be a measure of peakedness. That is, of course, until
recently. In 1945 the mathematician Irving Kaplansky proposed a revision of this view, arguing
against kurtosis being a measure of 'peakedness'. More recently, in 2014, Peter H. Westfall laid to
rest the remaining speculation about the definition of kurtosis as a measure of peakedness. He
argued forcefully that kurtosis has little to do with peakedness and is instead a measure of the
combined weight of the data in the tails relative to the overall distribution.

Peter H. Westfall holds a doctorate in statistics and is a professor of quantitative sciences at Texas
Tech University. Today it is accepted that kurtosis is indeed a measure of the weight of the data in
the tails relative to the overall shape, although certain textbooks continue to maintain the historical
definition. The reason kurtosis depends on the tails is clear from its formula: the mean and standard
deviation are both affected by extreme values, which usually reside in the tails of a distribution.

So, kurtosis refers to the degree of presence of outliers in the distribution: it is a statistical
measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.

In finance, kurtosis is used as a measure of financial risk. A large kurtosis is associated with a high
level of risk for an investment because it indicates that there are high probabilities of extremely
large and extremely small returns. On the other hand, a small kurtosis signals a moderate level of
risk because the probabilities of extreme returns are relatively low.

Excess Kurtosis

Excess kurtosis is used in statistics and probability theory to compare the kurtosis coefficient
with that of the normal distribution. Excess kurtosis can be positive (leptokurtic distribution),
negative (platykurtic distribution), or near zero (mesokurtic distribution). Since normal distributions
have a kurtosis of 3, excess kurtosis is calculated by subtracting 3 from the kurtosis (a short
numerical sketch follows the list of types below).

               Excess kurtosis = Kurtosis – 3


Types of excess kurtosis

1) Leptokurtic or heavy-tailed distribution (kurtosis more than normal distribution).


2) Mesokurtic (kurtosis same as the normal distribution).
3) Platykurtic or short-tailed distribution (kurtosis less than normal distribution).
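
The following Python sketch illustrates the three cases numerically with simulated samples (the particular distributions are assumptions chosen only for illustration). Note that SciPy's kurtosis function returns excess kurtosis by default.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 100_000

mesokurtic = rng.normal(size=n)              # normal: excess kurtosis ~ 0
leptokurtic = rng.standard_t(df=5, size=n)   # heavy tails: excess kurtosis > 0
platykurtic = rng.uniform(-1, 1, size=n)     # short tails: excess kurtosis < 0

for name, x in [("mesokurtic", mesokurtic),
                ("leptokurtic", leptokurtic),
                ("platykurtic", platykurtic)]:
    # scipy's default (fisher=True) already returns kurtosis - 3, i.e. excess kurtosis
    print(name, round(stats.kurtosis(x), 2))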

In financial modelling and investment analysis there is sometimes an over-reliance on the
assumption of a normal distribution. In such applications, neglecting kurtosis can result in what is
called "kurtosis risk": the failure to identify assets or investments whose returns are occasionally
wildly high or wildly low. Often, these will not be identified using three standard deviations from
the mean.

So when modelling investment returns, this suggests that assets with a leptokurtic distribution are
riskier than those with a normal distribution. These assets tend to perform inconsistently and, from
a stable-returns point of view, are better avoided. Such risky financial securities or assets can be
identified using the concept of kurtosis. Hence the study of the kurtosis of a distribution is important
in risk management applications.

Summary

Skewness is a measure of the symmetry or asymmetry of a data distribution, and kurtosis measures
whether the data are heavy-tailed or light-tailed relative to a normal distribution. Data can be
positively skewed (tail pushed towards the right side) or negatively skewed (tail pushed towards the
left side).

When data are skewed, the tail region may behave as outliers for the statistical model, and outliers
adversely affect a model's performance, especially for regression-based models. Some statistical
models, such as tree-based models, are robust to outliers, but relying on them limits the possibility
of trying other models. So there is a need to transform skewed data to be close enough to a normal
distribution.

Excess kurtosis can be positive (leptokurtic distribution, kurtosis higher than the normal
distribution), negative (platykurtic distribution, kurtosis lower than the normal distribution), or near
zero (mesokurtic distribution, kurtosis the same as the normal distribution).

The understanding of skewness and kurtosis is critical in risk management and financial analysis.
Research has shown that the returns of most securities tend to exhibit skewness and kurtosis.

In general, greater positive kurtosis (leptokurtic) and more negative skewness indicate increased
risk, because a wider left tail indicates a higher probability of extremely large negative outcomes.

STATIONARITY, AUTOCORRELATION, NORMALITY

The major goal of time series analysis and forecasting is to fit the data to an appropriate model so
that we can then predict future values of the series. Data points in these series are often
non-stationary, i.e. their means, variances, and covariances change over time. Non-stationary
behaviours can take the form of trends, cycles, seasonality, etc.

For us to work on any time series for estimating a financial model, there are certain assumptions
made in time series analysis. Some of the assumptions are

• That the time series should be stationary. In other words, in time series analysis the series
is assumed to be normally distributed, with a mean and variance that are constant over the
period of time.
• That the errors are randomly distributed. The errors in time series analysis are assumed to
be uncorrelated to each other. No Autocorrelation.
• That no outlier is present in the series. The presence of outliers affects the inference of the
data in time series analysis and thus leads to inaccurate results.

What is Stationarity?

A stationary process is a stochastic process whose unconditional joint probability distribution does
not change when shifted in time. Consequently, parameters such as the mean, variance, and
autocorrelation structure also do not change over time. In simple terms, stationarity means that the
statistical properties of the time series are more or less the same over time, with no evidence of
trends, seasonality, or changing variance.

Properties of stationarity:

- Constant Mean (fluctuating around the same value over time)


- Constant variance (the fluctuations should not be varying too much with time)
- No seasonality

Non-stationary data, as a rule, are unpredictable and cannot be modelled or forecasted. In order to
receive consistent, reliable results, the non-stationary data needs to be transformed into stationary
data.

Transformations to Achieve Stationarity:

In common practice, instead of using the series of stock prices directly, we may use the series of
daily returns of those prices, which often gives us a stationary series. If you identify that a
particular time series is not stationary, you need to convert it into a stationary one before fitting a
time series model. There are different ways to do that (a short code sketch follows the list below).

• Differencing:
Let’s say we have some time series – which is modeled by: yt = α + βt + et .

Meaning that over time the series is a function of the current time stamp, where α is the intercept
(the starting point on the Y axis), β is the slope of the trend, and et is some error or noise. To apply
the differencing method, we create a new series zt:
zt = yt - yt-1

Now the mean of this new series is constant over time, and so is its variance. Also, if you plot it,
you will see that it is no longer seasonal.
• Higher order differencing:
If first-order differencing is not enough to make our series stationary, we can apply second-order
differencing, which means differencing the already differenced series: zt = (yt - yt-1) - (yt-1 - yt-2).
In general, nth-order differencing repeats the differencing operation n times. This is distinct from
lag-n differencing, where we subtract the value n periods ago from the current value:
zt = yt - yt-n

• Seasonal differencing:
If you observe clear seasonality, seasonal differencing may be applied to achieve a stationary
series. Here we subtract the value one season (or cycle) ago from the current value, rather than the
value one period ago. This is used to fix data that are non-stationary only because of seasonality.

• For non-constant variance, taking the logarithm or square root of the series may stabilize the
variance.
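
The following pandas sketch illustrates these transformations on a simulated price series; the series itself and the season length of 12 periods are assumptions made only for this example.

import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
t = np.arange(120)
prices = pd.Series(100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
                   + rng.normal(scale=2, size=120))  # trend + seasonality + noise

first_diff = prices.diff(1)            # z_t = y_t - y_{t-1}
second_diff = prices.diff(1).diff(1)   # difference of the already differenced series
seasonal_diff = prices.diff(12)        # subtract the value one season (12 periods) ago
log_series = np.log(prices)            # stabilizes non-constant variance (positive data only)
returns = prices.pct_change()          # returns, often closer to stationary than the prices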

Testing for Non-Stationarity:

Two of the most commonly applied statistical tests for stationarity in time series analysis are the
Dickey-Fuller test and the Augmented Dickey-Fuller test. The Dickey-Fuller test helps us identify
whether a time series has a 'unit root'; the presence of a unit root indicates that the time series is
non-stationary.

When we test the return series of our data for non-stationarity at its first lag, the Dickey-Fuller
test may be used. But for testing non-stationarity at two or more lags, the Dickey-Fuller test cannot
give us results and we need to apply the Augmented Dickey-Fuller test. In the illustration used in
the Excel file, we tested the returns of our time series for non-stationarity with 1 and 2 lags against
three types of non-stationary processes: pure random walk, random walk with drift, and random walk
with drift and deterministic trend.

The t-stat values obtained from these tests are then compared to the Dickey-Fuller critical
(reference) values to determine whether our returns series is stationary or not.
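
A minimal sketch of the Augmented Dickey-Fuller test using the statsmodels library is shown below; the simulated return series is only an illustration, and in practice you would pass in your own returns.

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(4)
returns = rng.normal(scale=0.01, size=500)   # simulated stationary return series

adf_stat, p_value, used_lag, n_obs, crit_values, _ = adfuller(returns)
print("ADF statistic:  ", adf_stat)
print("p-value:        ", p_value)
print("critical values:", crit_values)   # reject the unit root if adf_stat is below the critical value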

What is Autocorrelation?

Autocorrelation (also known as 'serial correlation') is a mathematical representation of the degree
of similarity between a given time series and a lagged version of itself over successive time
intervals. It is conceptually similar to the correlation between two different time series, but
autocorrelation uses the same time series twice: once in its original form and once lagged by one or
more time periods.

For example, if it's rainy today, the data suggests that it's more likely to rain tomorrow than if it's
clear today. When it comes to investing, a stock might have a strong positive autocorrelation of
returns, suggesting that if it's "up" today, it's more likely to be up tomorrow, too.

Types of Autocorrelation:

Positive Autocorrelation: An increase in one value of the time series leads to a proportionate
increase in the following values of the series; the effect of the previous day's values on the current
day is positive.

Negative Autocorrelation: An increase in one value of the time series results in a proportionate
decrease in the lagged (following) values of the series; the effect of the previous day's values on
the current day is negative.

Testing for Autocorrelation:

Although the study of the ACF (Autocorrelation Function) and the PACF (Partial Autocorrelation
Function) will make testing autocorrelation much easier, for now, in the interest of conceptual
understanding, the Durbin-Watson test can be studied to detect autocorrelation at one lag.

In this test we examine the autocorrelation in the residuals, i.e. the differences between the actual
time series and its predicted values (which are obtained with the help of the Alpha (α) and Beta (β)
coefficients).

With the help of the formula below we calculate the Durbin-Watson statistic:

DW = Σ (et - et-1)² / Σ et²

where the numerator sums the squared differences of consecutive residuals and the denominator
sums the squared residuals. The resulting value of the Durbin-Watson statistic varies between 0 and
4, where 0 represents total positive autocorrelation and 4 represents total negative autocorrelation.

Getting a value in the middle of these extremes, somewhere close to 2, means that there is no
autocorrelation present in the residuals.

(*Correction here – In the class session I may have said autocorrelation in the returns data – but in
that excel example we tested autocorrelation in the residuals because we do not want any
remaining autocorrelation in the residuals)

Hint – The numerator of this formula can be obtained simply by adding up all the squared
differences of consecutive residuals, and the denominator is the sum of the squares of the residuals
(also known as the residual sum of squares, or RSS, in statistics).
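
A minimal sketch of this calculation, assuming the residuals are already available as a NumPy array (simulated here), compares the manual formula with the statsmodels helper:

import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
residuals = rng.normal(size=250)   # residuals of a fitted model (simulated for illustration)

# Direct calculation: sum of squared differences / residual sum of squares
dw_manual = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print("manual DW:     ", dw_manual)
print("statsmodels DW:", durbin_watson(residuals))   # a value near 2 means no autocorrelation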

Terminology used –

• Residuals- Residuals are the error values of the predictions. They tell us how far the
predicted value is from the actual value.
o Residual (R) = Y(Actual Value) – Y*(Predicted Value)

• Expected Returns – These are the predicted values of the returns. They can be
obtained using the regression formula –
y= α + βx
Expected returns of y = Alpha + Beta*x (X is independent variable)

• Alpha (α) and Beta (β) - Alpha (α) is the constant, or intercept, in the above formula. The
Beta (β) coefficient is the change in the outcome variable for every 1-unit change in the
predictor variable. (These can be calculated using the simple 'LINEST' function in Excel; a
short code sketch follows.)
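
A minimal sketch of this estimation in Python (a NumPy equivalent of LINEST); the x and y return series here are simulated purely for illustration:

import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(scale=0.01, size=250)                       # independent variable (e.g. market returns)
y = 0.001 + 1.2 * x + rng.normal(scale=0.005, size=250)    # dependent variable (simulated)

beta, alpha = np.polyfit(x, y, deg=1)   # polyfit returns the slope first, then the intercept
y_pred = alpha + beta * x               # expected (predicted) returns
residuals = y - y_pred                  # actual minus predicted
print("alpha:", alpha, "beta:", beta)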

What is Normality?

We expect the data of our time series to be normally distributed, i.e. to form the familiar bell-shaped
distribution curve. This also means that we do not have extreme values in our time series, i.e. there
are very few values in the tails – the outliers.

Testing for Normality

In common practice, many theories and models in finance assume that asset returns are normally
distributed, i.e. that the returns follow a normal distribution. We used the Kolmogorov-Smirnov test
to test the claim that the returns really are normally distributed.

In the Kolmogorov-Smirnov test, by comparing the empirical distribution with the theoretical
distribution, we calculate the absolute differences between the actual distribution of the series and
an ideal normal distribution. The resulting K-stat values are then compared to the critical values to
check whether the data in our stock-returns series are normally distributed. Alternatively, we can
calculate the p-value, which gives us the probability of observing differences at least this large if
the stock-returns series really were normal.
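
A minimal sketch of this test with SciPy, assuming the returns are available as a NumPy array (simulated here), compares the returns against a normal distribution fitted to their own mean and standard deviation:

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
returns = rng.normal(loc=0.0005, scale=0.01, size=500)   # simulated daily returns

# Compare the empirical distribution with a normal distribution fitted to the data
k_stat, p_value = stats.kstest(returns, "norm", args=(returns.mean(), returns.std(ddof=1)))
print("K-stat:", k_stat, "p-value:", p_value)
# A large p-value means we cannot reject the hypothesis that the returns are normally distributed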

(* All the excel sheets for these tests are shared along with this study material)
