High Frequency Trading Analysis Using R
PGP ID PGP31143
1. Data Information
High frequency 5 minute data from July 1, 2015 to June 30, 2016 (1 year data).
No. of observations in sample is 18525.
Data heads and interpretation
o Time interval - The 5-minute time interval of a particular day for which price movements are observed
o Close - Closing price of the stock at the end of the time interval
o Open - Opening price of the stock at the start of the time interval
o Net chg - Algebraic change in price over the interval, measured as Net chg = Close - Open
o High - Highest price of the stock during the 5-minute time interval
o Low - Lowest price of the stock during the 5-minute time interval
o Volume - Number of shares traded during the period
2. Logarithmic returns
Logarithmic returns can be calculated with either of the 2 formulas below, because for 5-minute
high-frequency data the opening price of any period is very close (and often equal) to the closing
price of the immediately preceding period, i.e.
Open(t) ≈ Close(t-1)
Logarithmic return(t) = ln(Close(t) / Close(t-1)) (1)
Logarithmic return(t) = ln(Close(t) / Open(t)) (2)
In the CSV file, logarithmic returns are calculated using the 2nd formula, under the header
Logarithmic returns.
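As a minimal sketch of the calculation described above, the following R snippet computes 5-minute logarithmic returns from open and close prices. The data frame bars and its prices are made up for illustration; only the column names Open and Close follow the data description.

```r
# Hypothetical 5-minute bars (illustrative prices, not the TCS data)
bars <- data.frame(
  Open  = c(100.0, 100.5, 100.2, 100.9),
  Close = c(100.5, 100.2, 100.9, 100.7)
)

# Intra-interval log return: ln(Close(t) / Open(t))
bars$logret_open <- log(bars$Close / bars$Open)

# Close-to-close log return: ln(Close(t) / Close(t-1)); undefined for the first bar
bars$logret_close <- c(NA, diff(log(bars$Close)))

print(bars)
```

In this made-up data each Open equals the previous Close, so the two return columns agree from the second row onward; with real data they would differ slightly whenever the price gaps between intervals.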
3. Normality test
Step 1 - Import the CSV file into tcs and attach tcs for easy column handling in R
# Read the CSV file chosen interactively
tcs <- read.csv(file.choose(), sep = ",")
# Attach the data frame so that its columns can be referenced by name
attach(tcs)
Interpretation
The logarithmic returns of TCS have a positive skewness of 0.238, which means that large
positive returns are more likely than large negative returns of the same size.
From the D'Agostino test we are able to reject H0 : Skewness = 0 and conclude that TCS
logarithmic returns are positively skewed.
From the Anscombe-Glynn test we are able to reject H0 : Kurtosis = 3 and conclude that TCS
logarithmic returns have excess kurtosis, i.e. kurtosis greater than 3.
The kurtosis is very high compared to that of a normal distribution, which implies that TCS
logarithmic returns have more values close to the mean, and heavier tails, than a normal
distribution.
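The skewness and kurtosis figures discussed above can be reproduced from central moments. The sketch below uses base R only; the helper names sample_skewness and sample_kurtosis and the simulated heavy-tailed series are assumptions made for illustration, not the code or data used in the report (whose test names suggest a package such as moments, although the calls are not shown).

```r
# Sample skewness and kurtosis from central moments (base R only)
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
sample_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2   # equals 3 for a normal distribution
}

set.seed(1)
# Simulated heavy-tailed "returns" (Student t, 3 d.f.), standing in for real data
x <- rt(10000, df = 3) * 0.001

cat("skewness:", sample_skewness(x), "\n")
cat("kurtosis:", sample_kurtosis(x), "\n")
```

A kurtosis well above 3 in the output corresponds to the excess kurtosis described in the interpretation.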
#testing normality
jarque.bera.test(Logarithmic.returns)
Jarque Bera Test
data: Logarithmic.returns
X-squared = 772460, df = 2, p-value < 2.2e-16
Interpretation
The Jarque-Bera test has a p-value far below 0.05, so H0 : skewness = 0 and excess kurtosis = 0
is rejected, and we conclude that the logarithmic returns of TCS do not follow a
normal distribution.
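To make the test statistic concrete, the Jarque-Bera statistic can be computed by hand as JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is the sample skewness and K the sample kurtosis. The function jarque_bera below is a hypothetical helper written for this sketch, and the two samples are simulated; it is not the function called in the report.

```r
# Jarque-Bera statistic computed by hand: JB = n/6 * (S^2 + (K - 3)^2 / 4)
jarque_bera <- function(x) {
  n <- length(x)
  d <- x - mean(x)
  s <- mean(d^3) / mean(d^2)^1.5   # skewness
  k <- mean(d^4) / mean(d^2)^2     # kurtosis
  stat <- n / 6 * (s^2 + (k - 3)^2 / 4)
  # Under H0 (normality) the statistic is asymptotically chi-squared with 2 d.f.
  p <- pchisq(stat, df = 2, lower.tail = FALSE)
  list(statistic = stat, p.value = p)
}

set.seed(42)
normal_sample <- rnorm(5000)
heavy_sample  <- rt(5000, df = 3)   # heavy tails, simulated for illustration

print(jarque_bera(normal_sample))
print(jarque_bera(heavy_sample))
```

The heavy-tailed sample produces a very large statistic and a p-value near zero, which is the same pattern as the test output shown for the TCS returns.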
Interpretation
The black curve shows the actual density of TCS logarithmic returns, while the red curve
shows the density of a normal distribution with the same mean and standard deviation
as the TCS logarithmic returns. It is visible that the returns have much higher kurtosis
than the normal distribution, and positive skewness.
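A plot of this kind can be sketched in base R as follows. The series r is simulated and the plot title and axis label are assumptions; the report's own plotting code is not shown in the source.

```r
set.seed(7)
# Simulated heavy-tailed returns standing in for the real series
r <- rt(5000, df = 3) * 0.001

# Kernel density estimate of the returns (black curve)
dens <- density(r)
plot(dens, main = "Density of returns vs. normal", xlab = "Logarithmic return")

# Normal density with the same mean and standard deviation (red curve)
curve(dnorm(x, mean = mean(r), sd = sd(r)), add = TRUE, col = "red")
```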
[Formula for the volume-cum-returns indicator (volret); garbled in the source. It uses volume(t-1).]
Assumption - Because volume(t-1) is unavailable for the 1st observation (July 1, 2015, 9:15-9:20
A.M.), its value is assumed to be the same as that of the 2nd observation.
5. Testing stationarity
Step 1 - Plot the close price and the logarithmic returns to check stationarity from their
graphical properties.
# Change the plotting symbol for better visibility
par(pch = ".")
Interpretation
It is clearly visible from the above plots that the share price (close price here) is not
a stationary time series, as expected, while the logarithmic returns are
stationary.
Interpretation
The ACF plot of the share close price shows slowly declining spikes: autocorrelation
decreases only slightly per lag and remains significant at high lags, while the PACF
has a spike close to 1 at lag 1, which points to a unit root and hence a non-stationary
share price series.
The ACF plot of logarithmic returns shows a sinusoidal decline and the PACF plot shows no
sign of a unit root, which indicates that the series is stationary, as expected.
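The contrast described above can be reproduced on simulated data. In the sketch below a random walk stands in for the price series and its first differences stand in for returns; both are assumptions for illustration and not the TCS data.

```r
set.seed(123)
n <- 2000
# A random walk stands in for a non-stationary price series
price <- 100 + cumsum(rnorm(n))
# Its first differences stand in for a stationary return series
ret <- diff(price)

# plot = FALSE returns the estimates instead of drawing them
acf_price  <- acf(price, lag.max = 20, plot = FALSE)
pacf_price <- pacf(price, lag.max = 20, plot = FALSE)
acf_ret    <- acf(ret, lag.max = 20, plot = FALSE)

# Price: the ACF decays very slowly and the lag-1 PACF is close to 1
cat("price ACF at lags 1 and 20:", acf_price$acf[2], acf_price$acf[21], "\n")
cat("price PACF at lag 1:", pacf_price$acf[1], "\n")
# Returns: autocorrelations are small at every lag
cat("largest |ACF| of returns (lags 1-20):", max(abs(acf_ret$acf[-1])), "\n")
```

Dropping plot = FALSE draws the usual ACF and PACF charts with their significance bands.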
Interpretation
As expected, for the share price H0 : time series is non-stationary is not rejected
because of the very high p-value = 0.5715, so there is no evidence that the price series is stationary.
For the logarithmic returns, H0 : time series is non-stationary is rejected because
p-value = 0.01 is less than 0.05, which implies that the series is stationary.
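The logic of a unit-root test can be illustrated with a simplified Dickey-Fuller regression in base R. This is a swapped-in, non-augmented version (constant, no trend, no lagged differences) on simulated data; the source does not show which function produced the p-values above, and a packaged test such as adf.test from tseries would normally be used instead. The helper df_tstat and the critical value of about -2.86 (large-sample 5% value for the constant-only case) are assumptions of this sketch.

```r
# Simplified Dickey-Fuller regression with a constant and no lagged differences:
#   diff(y)[t] = a + g * y[t-1] + e[t];  H0: g = 0 (unit root)
df_tstat <- function(y) {
  dy   <- diff(y)
  ylag <- y[-length(y)]
  fit  <- lm(dy ~ ylag)
  summary(fit)$coefficients["ylag", "t value"]
}

set.seed(2016)
price <- 100 + cumsum(rnorm(2000))  # simulated random walk (non-stationary)
ret   <- diff(price)                # simulated returns (stationary)

# Approximate 5% Dickey-Fuller critical value for the constant-only case
crit_5pct <- -2.86

cat("t-stat, price:  ", df_tstat(price), "\n")
cat("t-stat, returns:", df_tstat(ret), "\n")
cat("reject unit root for returns:", df_tstat(ret) < crit_5pct, "\n")
```

The statistic must be compared with Dickey-Fuller critical values rather than the usual t-table, because under the unit-root null it does not follow a t distribution.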
        ma2     ma3
coef    0.0298  0.0107
s.e.    0.0073  0.0074
log likelihood=95008.98
AIC=-190010
Interpretation
As expected from the ACF graph, auto.arima also selects an MA(3) process, but the test
statistic for ma3 is low (0.0107/0.0074 ≈ 1.45, compared to the critical value 1.96 at the
5% level of significance), so the ma3 term is not statistically significant.
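The significance check above amounts to dividing each coefficient by its standard error. A short sketch using the ma2 and ma3 values reported above (the variable names are chosen for this example):

```r
# Coefficients and standard errors as reported above
coefs <- c(ma2 = 0.0298, ma3 = 0.0107)
ses   <- c(ma2 = 0.0073, ma3 = 0.0074)

t_stats <- coefs / ses
print(round(t_stats, 2))

# Two-sided 5% critical value from the normal approximation (about 1.96)
crit <- qnorm(0.975)
print(abs(t_stats) > crit)
```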
r(t) = ε(t) + 0.0445 ε(t-1) + 0.0298 ε(t-2) + 0.0107 ε(t-3)
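For readers who want to experiment with a process of this form, the sketch below simulates an MA(3) series with the coefficients reported in the text and refits it with base R's arima(). The unit innovation variance and the sample size are arbitrary assumptions, and base arima() is used here in place of auto.arima, so the output will not match the report's numbers.

```r
set.seed(99)
# Simulate an MA(3) process using the coefficients reported in the text;
# innovations are standard normal, which is an arbitrary assumption
sim <- arima.sim(model = list(ma = c(0.0445, 0.0298, 0.0107)), n = 5000)

# Fit a zero-mean MA(3), i.e. ARIMA(0,0,3), with base R's arima()
fit <- arima(sim, order = c(0, 0, 3), include.mean = FALSE)
print(fit)
```

With coefficients this small, the estimates are of the same order as their standard errors, which is consistent with the weak significance of the higher-order terms noted in the interpretation.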
        ma2     ma3     intercept  volret
coef    0.0183  0.0083  0          0.4127
s.e.    0.0073  0.0074  0          0.0073
Interpretation
The AIC value decreases from -190010 to -192947.2, which implies that the returns of TCS
are better explained when the volume-cum-returns indicator (volret) is included, and its
coefficient is significant.
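The two AIC values can be cross-checked from the log-likelihoods reported in this section using AIC = -2 * logLik + 2 * k. The parameter counts k below are assumptions of this sketch (3 MA terms plus the innovation variance for the first model; 3 MA terms, intercept, volret and the variance for the second); they are not stated in the source but reproduce its figures.

```r
# AIC = -2 * logLik + 2 * k, with k the number of estimated parameters
aic <- function(loglik, k) -2 * loglik + 2 * k

# Log-likelihoods reported in the text
ll_ma3        <- 95008.98   # MA(3) without volret
ll_ma3_volret <- 96479.62   # MA(3) with volret

# Assumed parameter counts: 3 MA terms + variance = 4;
# 3 MA terms + intercept + volret + variance = 6
aic_ma3        <- aic(ll_ma3, 4)
aic_ma3_volret <- aic(ll_ma3_volret, 6)

print(c(aic_ma3, aic_ma3_volret))
print(aic_ma3_volret - aic_ma3)   # negative: the volret model is preferred
```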
r(t) = ε(t) + 0.0612 ε(t-1) + 0.0183 ε(t-2) + 0.0083 ε(t-3) + 0.4127 volret(t)
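Performing LR test - Taking the model with volret as the unconstrained model and the model without volret as the constrained model
LR = -2 ln(L(constrained) / L(unconstrained)) = 2 (lnL(unconstrained) - lnL(constrained)) ~ χ²(1)
LR = [2*(96479.62-95008.98) = 2941] >> [χ²(1) = 3.841 at the 5% level of significance]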
Thus H0: volret is not significant is rejected. We can conclude that volret significantly
explains variations in returns.
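The likelihood-ratio arithmetic can be checked in a few lines of R. The log-likelihoods are those given in the text; the variable names are chosen for this sketch.

```r
ll_unconstrained <- 96479.62  # model with volret (from the text)
ll_constrained   <- 95008.98  # model without volret (from the text)

lr_stat <- 2 * (ll_unconstrained - ll_constrained)
crit    <- qchisq(0.95, df = 1)   # 5% critical value for 1 restriction
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

cat("LR statistic:", lr_stat, "\n")
cat("critical value:", round(crit, 3), "\n")
cat("p-value:", p_value, "\n")
```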
        ma2     ma3     intercept  lag(volret[1:18524], 1)
coef    0.0325  0.0120  0          -0.0419
s.e.    0.0073  0.0074  0          0.0086
Interpretation
The AIC value increases from -192947.2 to -190021.2 when volret is replaced by its first lag,
which implies that the model in which the volume-cum-returns-based technical indicator explains
TCS returns contemporaneously is better than the model in which it enters with a lag.
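The comparison between a contemporaneous and a lagged regressor can be sketched with base R's arima() and its xreg argument. Everything below is simulated: the series, the 0.4 effect size and the variable names are assumptions for illustration. The lag is built by explicit indexing, because base R's lag() applied to a plain vector changes only the time index and leaves the values in place.

```r
set.seed(31143)
n <- 2000
volret <- rnorm(n)                    # simulated indicator (illustrative)
ret    <- 0.4 * volret + rnorm(n)     # returns driven by the same-period indicator

# Contemporaneous regressor: ret(t) on volret(t)
fit_now <- arima(ret, order = c(0, 0, 3), xreg = cbind(volret = volret))

# Lagged regressor: ret(t) on volret(t-1), aligned by explicit indexing
fit_lag <- arima(ret[2:n], order = c(0, 0, 3),
                 xreg = cbind(volret_lag1 = volret[1:(n - 1)]))

cat("AIC, contemporaneous:", AIC(fit_now), "\n")
cat("AIC, lagged:         ", AIC(fit_lag), "\n")
```

The lagged model is fitted on one observation fewer, so the two AIC values are not strictly comparable; in this simulation the gap is far larger than one observation could explain.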