
Predictive intraday correlations in stable and volatile market environments:

Evidence from deep learning

Ben Moewsa,∗, Gbenga Ibikunleb


a Institute for Astronomy, University of Edinburgh, Royal Observatory, Edinburgh, EH9 3HJ, UK
b Business School, University of Edinburgh, 29 Buccleuch Pl, Edinburgh, EH8 9JS, UK
∗ Corresponding author. Email addresses: b.moews@ed.ac.uk (Ben Moews), gbenga.ibikunle@ed.ac.uk (Gbenga Ibikunle)
arXiv:2002.10385v1 [q-fin.CP] 24 Feb 2020

Abstract
Standard methods and theories in finance can be ill-equipped to capture highly non-linear interactions in
financial prediction problems based on large-scale datasets, with deep learning offering a way to gain insights
into correlations in markets as complex systems. In this paper, we apply deep learning to econometrically
constructed gradients to learn and exploit lagged correlations among S&P 500 stocks to compare model
behaviour in stable and volatile market environments, and under the exclusion of target stock information
for predictions. In order to measure the effect of time horizons, we predict intraday and daily stock price
movements in varying interval lengths and gauge the complexity of the problem at hand with a modification
of our model architecture. Our findings show that accuracies, while remaining significant and demonstrating
the exploitability of lagged correlations in stock markets, decrease with shorter prediction horizons. We
discuss implications for modern finance theory and our work’s applicability as an investigative tool for
portfolio managers. Lastly, we show that our model’s performance is consistent in volatile markets by
exposing it to the environment of the recent financial crisis of 2007/2008.
Keywords: Econophysics, Deep learning, Financial crisis, Market efficiency, Trend forecasting
2010 MSC: 68T05, 62H15, 62P20

1. Introduction

Changes in stock markets are fuelled by human decisions based on beliefs about a stock’s future per-
formance. Investors anticipating the actions of other investors, especially in a period of high uncertainty,
also influence such beliefs [1, 2]. This makes markets inherently noisy and prone to fluctuations. Such
inconsistencies in the valuation of stock prices have been the subject of a long-standing academic debate
centred on the efficient market [see 3, 4] and random walk hypotheses [see 5, 6, 7], that is, the question of
whether lagged correlations in stock markets exist at all. Ferreira and Dionísio [8] deliver evidence against the
efficient market hypothesis in the U.S. stock market, identifying market memory in the form of correlations
over seven months, with related research finding evidence for long-term correlation in market indices [9].
Previous research establishes the existence of exploitable lagged correlations in non-volatile market environments
for day-to-day forecasts when infinite impulse response filtering is applied during data preprocessing.
These inputs can be used to realise above-average accuracies in predicting price trend changes without the
inclusion of data from the target stock as an input, delivering evidence against the random walk hypothesis
and most forms of the efficient market hypothesis in stable market environments [10]. The growing interest
in research dealing with the use of artificial neural networks for stock market prediction is facilitated by
the availability of large historical stock market trading data [11, 12, 13, 14]. Such data usually takes the
form of time series, and thus classical approaches to time series analysis are currently widespread within the
investment industry [15]. This configuration, together with the existence of related hypotheses, makes the
prediction of price changes based on historical data an attractive use case for trend forecasting involving
inter-correlated time series in stock markets as an example of real-world complex systems [16].
We investigate intraday predictability with differing time intervals, and test our model’s complexity and
performance in high-volatility crisis scenarios. We find evidence of the presence of time-delayed correlations
in S&P 500 stocks in both stable and volatile markets, and of the viability of using deep learning for trend
predictions in large numbers of inter-correlated time series. Our experiments outperform predefined baselines
for strict statistical key performance indices, which include accuracies for different prediction horizons.
Predictions of one stock’s trend changes based on other stocks’ price trend gradients in the preceding time
step show an improved accuracy for larger time intervals, with average and maximum accuracies of 56.02%
and 63.95%, respectively, for one-day predictions. In this regard, we outperform related research by Huck
[11] and Krauss et al. [14], and slightly underperform for day-to-day predictions when compared to the
results by Moews et al. [10], demonstrating the negative impact of a lack of input smoothing.
Our findings are novel in that we exploit correlations of a target stock with other stocks in the same
economy under different market scenarios, without recourse to industry classifications and input smoothing,
to make successful predictions about the target stock’s price evolution. Furthermore, we address intraday
intervals, which are more relevant in today’s high-tech high frequency trading market environments. The
results are consistent in financial crisis situations and demonstrate that our framework is able to exploit
existing and inter-temporal correlations in a highly volatile market environment. In addition, we gauge the
complexity of the targeted prediction problem by forcing information compression through bottleneck layers
of different sizes. With this work, we further demonstrate the utility of linear regression derivatives as inputs
for the investigation of economic hypotheses and risk diversification with deep learning.
Our results deliver rigorously tested empirical evidence against the random walk hypothesis, as the
assumption of stock prices over time as random walks effectively excludes the possibility of exploitable
information in historical stock market data. The findings thus contradict Sitte and Sitte [17], who argue
in favour of the existence of a random walk due to the perceived inability of artificial neural networks to
extract actionable information. Our results agree with previous results for one-day predictability in stable
market environments, adding to the evidence against most forms of the efficient market hypothesis [18, 10].
Previous research also shows that market efficiency is highly dependent on how developed a stock market
is, with higher-developed markets exhibiting increased efficiency [19].
The remainder of the paper is set out as follows: Section 2 discusses the progress made on predicting
stock returns and prices, and the functionality of deep learning models in finance. Section 3 presents the
data and methodology used in this paper, Section 4 discusses the results and findings of our experiments,
and Section 5 concludes our work.

2. Predicting stock returns

Technical analysis involves making stock trading decisions on the basis of historical stock market data.
The assumption behind its utilisation in the investment industry is that above-average risk-adjusted returns
are possible when using past time series of stock information, and Clarke et al. [15] show that this practice
is widespread in the investment industry. A meta-analysis by Park and Irwin [20] also demonstrates that the majority of
studies on the topic of technical analysis report a profitability that undercuts the arguments of the efficient
market hypothesis [21]. Apart from moving average methods, perceived patterns are among the other,
less investigated methods that are used by technical analysts, for example the so-called “head-and-shoulders
pattern”. The latter is used as an indicator for a trend shift, using the negative value of the height denoted in
Figure 1 as the target price for a trade initiated at the breakout point, which marks the pattern’s completion.
The lack of statistical research on such patterns has been criticised by Neftci [22], who notes that there is a disparity
between the rigour of academic time series analysis and the decision-making of traders. Subsequent studies
by Chang and Osler [23] and Lo et al. [24] show indications of applicability for select currencies in foreign exchange markets and conclude
that such patterns may hold some practical value. The above explanations are, however, only provided for
the sake of a holistic overview, and this paper does not delve further into, or apply, technical analysis.

Figure 1: The head-and-shoulders pattern in stock market data. The pattern is not a mathematically developed decision aid;
however, it is widely used by technical analysts as a guideline for investments, as it is perceived to follow a predictable,
rule-of-thumb scheme within the stock market.

White [25] hypothesised early that artificial neural networks could be successfully used to deliver empirical
evidence against all three forms of the efficient market hypothesis, reporting an R² value of 0.175 when
using a simple feed-forward network with five previous days of IBM stock prices as inputs for a regression
task [see also 26]. More recently, Dixon et al. [27] also implement an artificial neural network with five
hidden layers for trinary classification, differing from the previously cited studies in an output that represents
little or no change. Using data of CME-listed commodities and foreign exchange futures in five-minute
intervals to generate a variety of engineered features like moving correlations, a single model is trained
instead of a separate model for each target instrument, resulting in an average accuracy of 42.0% for the
investigated three-class prediction task.
No cross-validation is carried out in the above work, which would further validate the results for economic
conclusions. The latter process refers to the splitting of the dataset into multiple sets of approximately the
same size. A model is then trained on all but one of the sets and tested on the remaining one, alternating
which sets are used for training and testing. This way, multiple tests are carried out over all instances of
the data, and the results for all models are then averaged. This is the “gold standard” in machine learning,
as it addresses the possibility of testing models on an unrepresentative subset. Cross-validation makes for
more reliable and stable results; therefore, we apply this approach to our analysis [28].
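As a minimal illustration of this procedure, the following sketch (assuming scikit-learn is available; the estimator and names are placeholders rather than the model used in this paper) computes an accuracy averaged over five folds.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold


def cross_validated_accuracy(features, targets, n_splits=5, seed=42):
    """Average test accuracy over k folds, training on the remaining folds each time."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accuracies = []
    for train_idx, test_idx in folds.split(features):
        model = LogisticRegression(max_iter=1000)  # placeholder estimator
        model.fit(features[train_idx], targets[train_idx])
        accuracies.append(model.score(features[test_idx], targets[test_idx]))
    return np.mean(accuracies)
```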
Krauss et al. [14] investigate the effectiveness of deep learning, gradient-boosted-trees and random forests
in the detection of statistical arbitrage. Their results indicate that, when combining one each of deep-layer
neural networks, gradient-boosted trees and random forests, one may obtain returns exceeding 0.45% per
day before taking transaction costs into account. Research on artificial neural networks for stock market
prediction has, however, remained sparse over the last two decades. In contrast, momentum trading strategies
have received increased attention in financial research in recent times. The apparent ability of momentum-
based strategies to outperform the market is viewed as a premier anomaly within the framework of the
efficient market hypothesis [see 29].
Another area of research that has gathered attention is the text-based prediction of stock markets using
machine learning models. The notion of using news articles containing new information, as opposed to historical
market data, to predict stock prices was introduced by Lavrenko et al. [30] and is a common baseline for
subsequent research. A system devised by Schumaker and Chen [31], named AZFinText, which employs
widespread news coverage, results in a directional accuracy of 57.1% for the best-performing model. The
approach involves a support vector machine with a proper-nouns scheme instead of a simple bag-of-words
approach, in combination with a stock’s current price as inputs, over a five-week period. The authors of the
above research show their system to be successful within a twenty-minute time frame, which is in line with
earlier research concluding that the majority of market responses to new information experience a time lag
of approximately ten minutes [see 32]. Subsequent research demonstrates that AZFinText can
outperform established quantitative funds [33]. In related research, Ding et al. [34] propose the use of a
neural tensor network to learn event embedding from financial news articles in order to feed that information
into a deep convolutional neural network for a two-class prediction of a stock price’s future movement. For
this system, an accuracy of 65.9% is reported for 15 different S&P 500 stocks and daily trend predictions.
No clear indication, however, is given as to how the reported stocks are selected, which undermines the
reported results.
3. Data and methodology

3.1. Obtaining suitable datasets


We obtain two sets of high-frequency transaction data from the Thomson Reuters Tick History 1
database. Each dataset contains a stock identification code, the transaction date and time to the near-
est millisecond, the high/best bid price, the low/best ask price, the volume, and the average transaction
price. The first dataset is sampled from April 2011 to April 2016, covering five years of transaction price
information for S&P 500 stocks. The second dataset spans April 1996 to April 2016, which results in a larger
dataset covering 20 years, including the financial crisis period of 2007/2008. The datasets show instances
of missing values for the price variables in some stocks, especially earlier in the largest of the datasets,
thus suggesting that there are no transactions recorded for those intervals. We follow Chordia et al. [35]
by replacing the missing values with the same-column entries of the preceding index or the index with the
next non-missing value, depending on whether the former belongs to the same stock as indicated by the
stock identification code. In order to generate feature vectors that can be used as inputs, the time stamps
have to be aligned perfectly. We implement a data cleansing approach to secure a time-wise alignment of
observations for different stocks by substituting missing rows.
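A minimal sketch of this replacement rule is given below, assuming the raw data sits in a pandas DataFrame with hypothetical columns 'ric', 'timestamp' and 'price'; the actual implementation may differ in its details.

```python
import pandas as pd


def fill_missing_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Replace missing prices with the preceding value of the same stock, falling
    back to the next non-missing value otherwise (following Chordia et al.)."""
    df = df.sort_values(["ric", "timestamp"]).copy()
    # carry the last observed price forward, but only within each stock's own rows
    df["price"] = df.groupby("ric")["price"].ffill()
    # remaining gaps, e.g. at the start of a stock's series, take the next value
    df["price"] = df["price"].bfill()
    return df
```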
As naïve procedures that loop over complete datasets and copy the full matrix for every missing
observation result in infeasible runtimes, our approach acts solely on time vector comparisons for a
specific stock’s matrix, applies matrix shifts in the case of local time vector incompatibility, and operates
on a pre-declared matrix of sufficient dimensionality to allow for insertions instead of appending values. All
observations with inexplicable and incomplete timestamps are also eliminated. Following the data cleansing
process, the datasets are checked for a subset of stocks satisfying the requirement of being consistently
present over a sufficiently large portion of the dataset’s time frame divisible by the chosen number of time
steps for the subsequent gradient calculation. The datasets’ price information is extracted and transformed
into a feature matrix with row-wise time steps and column-wise stocks.
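Continuing the sketch above under the same assumptions, the aligned observations can be arranged as such a matrix, for example:

```python
import pandas as pd


def build_price_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Arrange prices with row-wise time steps and column-wise stocks, keeping only
    timestamps at which every retained stock has an observation."""
    matrix = df.pivot_table(index="timestamp", columns="ric", values="price")
    return matrix.dropna(axis=0, how="any")  # enforce time-wise alignment
```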

3.2. Statistical feature engineering


Feature engineering describes the manual selection and, if necessary, transformation of given datasets
into new data that better represents the features needed for a chosen task. For this paper, and following
Moews et al. [10], simple linear regressions are used as a way to approximate the trends over given time
intervals. These regressions are least-squares estimators of models with one explanatory variable, fitting
a line that minimises the sum of the squared residuals. They take the form of a minimisation problem to
find the intercept β0 and the slope β1 ,
$$\min_{\beta_0, \beta_1} Q(\beta_0, \beta_1) = \min_{\beta_0, \beta_1} \left( \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \right). \tag{1}$$

By running a linear regression over each time series and time interval separately, and by taking the first
derivative of the resulting equation, the trend gradients for single stocks and time intervals are obtained.
Given aligned stock prices for N points in time and a chosen step size s, the resulting feature vector generated
from a stock’s price has the length N/s. Depending on the time frame covered by the dataset, limits apply
to the size of intervals that can be chosen to still obtain a viable size for the training set. For the five-year
dataset, three sets of gradients are computed: The first set covers 1,242 one-day gradients per stock, the
second set covers 7,361 one-hour gradients, and the third set covers 14,725 half-hour gradients for 449 stocks.
For the 20-year dataset with hourly values, daily gradients are computed for the years from 2003, preceding
the global financial crisis, to and including 2008, resulting in 2,131 gradients.
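As an illustration of this feature construction, the sketch below uses numpy's least-squares polynomial fit to compute the slope of a first-degree fit over each non-overlapping interval of s observations; the function and variable names are illustrative rather than taken from our implementation.

```python
import numpy as np


def trend_gradients(prices: np.ndarray, step: int) -> np.ndarray:
    """Slope of a least-squares line fitted to each interval of `step` prices.

    prices has shape (N, n_stocks); the result has shape (N // step, n_stocks)."""
    n_intervals = prices.shape[0] // step
    x = np.arange(step)
    gradients = np.empty((n_intervals, prices.shape[1]))
    for i in range(n_intervals):
        window = prices[i * step:(i + 1) * step]
        # polyfit with deg=1 returns the slope and intercept for every column
        gradients[i] = np.polyfit(x, window, deg=1)[0]
    return gradients
```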

1 https://developers.refinitiv.com/thomson-reuters-tick-history-trth

3.3. Training deep learning models
Figure 2 depicts a schematic overview of the experimental setup used in this paper. For the number n
of stocks that are made usable during the data cleansing and preprocessing, gradients of the price trends
for each separate stock are computed in the feature engineering step, as described in Section 3.2. The n − 1
gradients for one time step t − 1 are then used as inputs to a feedforward artificial neural network that is
fully connected for adjacent layers to predict whether the gradient of the left-out nth stock changes upwards
or downwards with regard to its gradient in the preceding time step t − 1. This setup resembles Moews et al.
[10] and allows the comparison of one-day results with the latter. The reason for omitting the target stock
gradient for the preceding period in the input vector is that the described framework is designed to test
for lagged correlations, which prohibits using the target stock’s own information. Predictions are meant to
explore the predictability based solely on correlations with other members of the S&P 500 stocks. This is a
major distinguishing aspect of this paper from related research on time series-based stock market prediction.

Figure 2: Model setup for the experiments. A simple linear regression is performed on each time step for each stock’s price
series, after which the first derivative of the resulting regression is computed as a trend strength indicator. For each target
stock, all stocks’ gradients for the preceding time step except for information about the target stock are then used as input
features for a neural network with five hidden layers and two output nodes for a binary classification task. The model is
then, with separate experiments for each stock, trained to predict the upwards or downwards change of the target stock’s trend
gradient in the next time step.

In order to identify general correlations, we reduce result variability via five-fold cross-validation. After
each five-way split and sorting into a training set of 80% of the dataset for the respective fold, another 25%
of the training set is partitioned off as the validation set for an early stopping procedure, resulting in a
60-20-20 split for each fold and stock. The experiments are run for all stocks, with the output vector encoded
as a binary one-hot representation that states whether a gradient in a given time interval is increased
or reduced for the next interval. Normalising the inputs is necessary in order to address geometrical biases,
to distribute the importance of values equally, and to ensure that all values are situated in the same range
to make them comparable for an efficient learning process. The training examples split from the dataset are
normalised element-wise via
$$X_{\mathrm{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}. \tag{2}$$
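A small sketch of Equation (2), applied column-wise so that every feature is rescaled to the [0, 1] range; the guard against constant columns is our own addition.

```python
import numpy as np


def min_max_normalise(x: np.ndarray) -> np.ndarray:
    """Element-wise rescaling of Eq. (2) so that all features share the [0, 1] range."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # avoid division by zero
    return (x - x_min) / span
```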
Regularisation is a valuable way to address issues with overfitting, meaning poor generalisation. Com-
menting on the inherent vagueness of parameter tuning with regard to real-world applications, Snoek et
al. [36] summarise the problems associated with the terrain by stating that machine learning algorithms
“[...] frequently require careful tuning of model hyperparameters, regularization terms, and optimization
parameters. Unfortunately, this tuning is often a ‘black art’ that requires expert experience, unwritten rules
of thumb, or sometimes brute-force search” [36]. While such a statement sounds bleak, the parameters and
hyperparameters that have to be set can be determined, or at least approximated. The choices we make in
this study follow a combination of established scientific reasoning and preliminary experimentation, as well
as experience in the application of deep learning architectures and simple heuristics. These are employed to
slowly increase the model’s complexity, and further details can be found in Nielsen [37]. Preliminary tests
on a subset of stocks result in a model architecture with five hidden layers, with 400 artificial neurons each.
Addressing potential memory issues, a mini-batch size of 100 is chosen, and each model is trained for 50
epochs, with early stopping as a regularisation measure. In order to optimise the gradient descent trajectory,
our setup also utilises dynamic learning rate decay and momentum. We make use of hyperbolic tangent
activations, sigmoid outputs, and Gaussian-initialised model weights [see 38]. In training artificial neural
networks, the term sigmoid function refers to the special case of the logistic function with a steepness
value of k = 1 and a midpoint of x0 = 0. The sigmoid function is calculated as
$$\mathrm{sigm}(x)_j = \frac{1}{1 + e^{-k \cdot (x_j - x_0)}}. \tag{3}$$
It is important to note that values for the sigmoid function level out at zero on the lower end, which
can lead to a fast saturation of weights at the top layers of multi-layered artificial neural networks [39].
For this reason, we restrict its application to the output layer and use the hyperbolic tangent function for
activations, which is similar but is centred on zero instead of 0.5, with a lower limit of −1 and the same
upper limit of one, meaning

$$\tanh(x)_j = \frac{\sinh(x_j)}{\cosh(x_j)} = \frac{e^{x_j} - e^{-x_j}}{e^{x_j} + e^{-x_j}} = \frac{1 - e^{-2x_j}}{1 + e^{-2x_j}}. \tag{4}$$

In addition, ℓ2 regularisation is chosen over ℓ1 regularisation. This decision is driven by the aim of
encouraging the use of all inputs to a certain degree, as a complex interdependence of the studied time series is
assumed due to the failures of past approaches to identify simpler correlations. Through the introduction of
decaying weights, and with the parameter λ defining the trade-off between the loss function and the penalty
for larger weights, the gradient descent is given as
∂E
wj,i = wj,i − η − ηλwj,i , (5)
∂wj,i

where wj,i is a weight, η is the chosen learning rate, and E is the total error valued on the basis of loss
functions under a certain set of weights between layers.P Using
P the example of the quadratic cost function,
the total error for this case can be calculated as E = 0.5 i j (ŷj,i − yj,i ), where j indexes the output units
and i the pairs of training examples and corresponding outputs, and ŷ and y denote the calculated outputs
and actual labels, respectively.

3.4. High volatility and bottlenecks


In the context of the given problem, an interesting question is that of its general complexity with regard
to the number of variables the relevant data can be reduced to for an acceptable accuracy. We investigate
this by inserting a bottleneck layer consisting of a small number of neurons into the models for the daily
predictions based on the five-year dataset with hourly values. This process is implemented for numbers
of bottleneck nodes from the set {1, 3, 5, 10} to see how the bottleneck size influences the accuracy. This
approach is favoured over using autoencoders [see 13] as a precedent to using their compression layer as
inputs for a full model, as autoencoders learn a goalless reduction of their inputs.
While similar to using a bottleneck, autoencoders recreate their input as their output through a reduction
layer, without a target beyond the input recreation; our chosen method, in contrast, reduces the complexity
via the bottleneck layer with the explicit goal of making that reduction suitable for the computational task.
With a bottleneck layer in the same model, the latter is forced to learn a compressed representation that is
suited to the target predictions at hand, funnelling the model’s learning process through the nodes of the
respective bottleneck. Figure 3 shows the resulting model, featuring an exemplary additional bottleneck
layer for complexity tests with one node marked b.

Figure 3: Model with a one-neuron bottleneck layer. The depicted model inserts a one-node bottleneck in a deep feed-forward
neural network model for binary classification. The reason for this insertion is to force the model to learn a goal-oriented
reduction of the model’s information by “squeezing” the necessary information through a narrow representation mid-way.
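Reusing the hypothetical build_model sketch from Section 3.3, the bottleneck variant only changes the stack of hidden layers, for example as follows; the exact position of the narrow layer is illustrative.

```python
from tensorflow.keras import layers


def hidden_layers_with_bottleneck(model, bottleneck_nodes: int):
    """Five 400-unit tanh layers with a narrow layer of `bottleneck_nodes` units
    inserted mid-way, forcing a goal-oriented compression of the representation."""
    for i in range(5):
        model.add(layers.Dense(400, activation="tanh"))
        if i == 1:  # insert the bottleneck after the second hidden layer
            model.add(layers.Dense(bottleneck_nodes, activation="tanh"))
    return model


# bottleneck sizes explored in the complexity experiments
bottleneck_sizes = (1, 3, 5, 10)
```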

With the exception of Krauss et al. [14], virtually no previous studies applying machine learning approaches
to stock market prediction are tested for above-average results in volatile market situations.
However, Schumaker and Chen [31], in their development of the AZFinText system, note the importance
of such a test. For time series-based prediction, high-volatility environments are more problematic contexts
than for text analysis-based systems like AZFinText, because time series-based models rely solely on stock market
information that is effectively in a state of turmoil. In order to test the gradient-based approach and the
model implementation for such environments, data from 2003 to, and including, 2008 is extracted from the
20-year dataset with hourly values and used to compute daily gradients. No cross-validation is performed
for these experiments, as the test set has to represent a phase of enhanced volatility in the market.
In order to achieve this, a test set from July 2007 to the end of 2008 is split from the set of gradients,
with the previous 4.5 years serving as training data. This setup also more closely resembles an applicable
prediction system instead of just an economic hypothesis testing framework, as only past data is used instead
of identifying general correlations across different time periods through cross-validation. Due to the large
fluctuations in the dataset, and given that some stocks remain more stable than others during the financial
crisis of 2007/2008, a higher variance of accuracies is expected. An accuracy above the baseline, which is
expected to be higher than random chance due to the general negative trend in that time period, delivers
evidence for the persistence of correlations in high-volatility scenarios such as global crises.
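A sketch of this chronological split, assuming the daily gradients are held in a pandas DataFrame with a DatetimeIndex (the names are hypothetical):

```python
import pandas as pd


def crisis_split(gradients: pd.DataFrame):
    """Train on 2003 to mid-2007 and test on the subsequent high-volatility period."""
    train = gradients.loc["2003-01-01":"2007-06-30"]
    test = gradients.loc["2007-07-01":"2008-12-31"]
    return train, test
```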
3.5. Reliability and robustness analysis
In order to validate our results, baselines addressing the market behaviour and the distributions of target
vectors are necessary to find statistically significant evidence. The focus of this paper on lagged correlations
between stocks in relation to economic theory, manifested in excluding the target stock data in the models’
inputs, sets this work apart from research that aims to find the best-possible stock market predictions
instead of correlations in stock market prices. For research focussed purely on prediction accuracy, baselines
representing naïve regression (or classification) or basic machine learning methods are more suitable. For
related research in Moews et al. [10], however, the gradient-based deep learning approach is shown to
considerably outperform a simple support vector machine, a comparison that carries over to our work. In this paper, the
threat of coincidences is partly approached through cross-validation, but a model’s accuracy must also lie
significantly above the accuracies of one-value and random predictions.
Another area of concern relates to the accuracy of random mock predictions, which we address by
randomly shuffling the predictions of our model. If the accuracies of these mock predictions are similar to
the accuracies achieved through unshuffled predictions, this indicates that only distributional characteristics
are learned. Importantly, this shuffled set of mock predictions also provides a test against the alternative
explanation of market shocks, as correlations resulting from the latter would move more uniformly and, as
such, would lift this comparison baseline closer to the model accuracy. The third problem that needs to
be contended with is that models could identify the majority class of the binary targets, simply outputting
the latter. This potential issue can be explored by creating two additional sets of mock predictions, each
one covering one of the binary targets, which also allows a comparison regardless of target imbalances. The
model’s predictions have to outperform both of these mock sets in order to demonstrate that the results are
not simply due to learning the majority class.
The last baseline is a combination of the three described baselines, for which we take the highest accuracy
for each stock and time step. In doing so, the resulting set of predictive accuracies is guaranteed to always
be at least 50% and provides an easy way to check model performance against a single baseline. We
calculate the average accuracy across S&P 500 stocks for a given experiment, and visually compare it to
the baseline results via notched box-and-whiskers plots. In addition, we provide the lower bounds for
confidence intervals and p-values for an upper-tail Welch’s t-test with α = 0.001.
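The baselines and the significance test can be sketched as follows, assuming binary targets encoded as 0 (downwards) and 1 (upwards) and per-stock accuracy vectors; this illustrates the procedure rather than the exact evaluation code.

```python
import numpy as np
from scipy import stats


def baseline_accuracies(predictions: np.ndarray, targets: np.ndarray, seed: int = 42):
    """Accuracies of the shuffled, one-class and best-of mock predictions."""
    rng = np.random.default_rng(seed)

    def accuracy(guess):
        return float(np.mean(guess == targets))

    randomised = accuracy(rng.permutation(predictions))  # keeps the prediction distribution
    class_1 = accuracy(np.zeros_like(targets))           # always predict downwards changes
    class_2 = accuracy(np.ones_like(targets))            # always predict upwards changes
    best_of = max(randomised, class_1, class_2)          # at least 50% by construction
    return randomised, class_1, class_2, best_of


def upper_tail_welch(model_accs, baseline_accs):
    """One-sided Welch's t-test for the model mean exceeding the baseline mean."""
    t, p_two_sided = stats.ttest_ind(model_accs, baseline_accs, equal_var=False)
    return t, p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
```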

4. Results and discussion

4.1. Primary experiments


In panels A, B and C of Figure 4, we present box-and-whiskers plots to visualise the results of the
one-day, one-hour and half-hour gradient interval experiments, respectively. For every experiment, the accuracies
for the model and the baselines are given, as well as the p-values with regard to the means and the minimal
difference for a 99.9% confidence interval. The whiskers show the lowest and highest data points that lie
within 1.5 times the interquartile range of the first and third quartile, and are computed via
$$\begin{aligned}
\mathrm{whisker}_{\uparrow} &= \min\left(\max(\mathrm{data}),\; Q_3 + 1.5 \cdot (Q_3 - Q_1)\right),\\
\mathrm{whisker}_{\downarrow} &= \max\left(\min(\mathrm{data}),\; Q_1 - 1.5 \cdot (Q_3 - Q_1)\right).
\end{aligned} \tag{6}$$
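Equation (6) can be computed directly from the per-stock accuracies, for instance with a small numpy sketch:

```python
import numpy as np


def whiskers(data: np.ndarray):
    """Lower and upper whisker positions of Eq. (6) for a notched box plot."""
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    upper = min(data.max(), q3 + 1.5 * iqr)
    lower = max(data.min(), q1 - 1.5 * iqr)
    return lower, upper
```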
The reason for using Welch’s t-test instead of Student’s t-test is that the former is more robust when
confronted with unequal variances. As shown in Figure 4, this choice proves to be relevant due to larger
variances of model predictions.
The accuracy for one-day predictions, as shown in panel A of Table 1, is 56.02% and shows statistically
significant deviation in means from the baselines, with associated p-values supporting the rejection of the
null hypothesis. This result is further reinforced by panel A of Figure 4 for medians, with no overlap of
either the notches or the boxes in the box-and-whiskers plots. Panel B of Figure 4 shows the plot for the one-hour
gradient intervals. With an accuracy of 53.95%, the model’s accuracies exhibit the same increased variability
as for one-day gradients, albeit at a smaller rate. The baselines’ accuracies are centred more closely on 50%
(see also panel B of Table 1), which is consistent with the overall smaller spread of the accuracies for both
the model and the baselines, trading accuracy for narrowness.
Figure 4: Notched box-and-whiskers plots for accuracies of various time intervals. Panels A, B and C present the plots for
one-day, one-hour and half-hour intervals, respectively. On the horizontal axis, ‘class 1’ and ‘class 2’ concern downwards-only
and upwards-only changes, ‘randomised’ represents randomly shuffled model predictions in order to test for simple distribution
learning, and ‘best-of’ takes the best result out of all three baselines at each step. A statistically significant difference in means
at 95% confidence is indicated if notches are non-overlapping. The lower and upper ends of a box indicate the first and third
quartile, while the median is depicted as a horizontal bar. Outliers are shown as circles above or below the whiskers.

Figure 4’s panel C, showing the plot for the half-hour interval gradients, suggests a model accuracy of
51.70%. First, the distribution of the model’s accuracies is also skewed towards lower values, meaning more
variability above the median. Secondly, there appears to be an emerging pattern when all three plots are
considered together, as the model accuracies are lower for smaller gradient time intervals. The narrower
boxes for the baselines also show, however, that the quartile ranges of the accuracy values shrink,
evidencing decreasing variability as the considered time intervals become shorter.
In combination, these metrics deliver strong evidence in support of the economic arguments implied in
our hypothesis, meaning that price series in historical stock market data contain lagged correlations that can
be successfully exploited with deep-layered feed-forward neural networks, and that the time horizon has a
notable effect on predictability. This exploitation can result in above-baseline price trend predictions without
using any data from the target stock. Larger time intervals for gradient calculations and predictions based
on the latter result in higher average accuracies, but with a trade-off in the form of an increased spread of
the accuracies, meaning a larger variance. It therefore seems to be easier for the models to learn correlations
between gradients and make corresponding predictions for larger time steps, although more volatility with
regard to the variance is incorporated.
A natural explanation for these differences is the presence of more noise in short-time stock observations,
indicating that noise is smoothed out for regressions over larger intervals.

accuracies of predictions
panel   model    rand.    class 1   class 2   best-of
A       0.5602   0.5002   0.4955    0.5045    0.5092
B       0.5395   0.5008   0.4973    0.5027    0.5043
C       0.5170   0.5011   0.4979    0.5021    0.5030

tests against baselines
panel   measure      rand.     class 1   class 2   best-of
A       p-value      < 0.001   < 0.001   < 0.001   < 0.001
B       p-value      < 0.001   < 0.001   < 0.001   < 0.001
C       p-value      < 0.001   < 0.001   < 0.001   < 0.001
A       min. diff.   0.0599    0.0607    0.0518    0.0471
B       min. diff.   0.0356    0.0392    0.0338    0.0322
C       min. diff.   0.0137    0.0169    0.0127    0.0118

Table 1: 449 S&P 500 stocks are used to predict directional trend changes for each stock, with only the respective other 448
stocks’ trends from the preceding time interval as input features. In the table, accuracies, p-values and the minimal difference
with regard to the significant difference in means are given. Here, ‘class 1’ is the prediction that stock trends will change
downwards, and ‘class 2’ is the prediction that stock trends will change upwards, whereas ‘randomised’ represents the same
prediction distribution as the model’s predictions sampled randomly to test for simple distribution learning. The ‘best-of’
features the respective highest accuracy of the three baselines, meaning ‘class 1’, ‘class 2’, and ‘randomised’, to result in a
fourth baseline that is, by default, at least 50%. Panels A, B and C present the estimates for one day, one hour and half-hour
intervals, respectively.

Such noise events, for example bid-ask bounce, are microstructural in nature and are thus more prevalent in higher-frequency data. Although our
results are in direct contradiction to two of the well-established hypotheses in modern finance, the findings
presented are not exactly far-fetched. While meta-analyses should always be interpreted cautiously due
to the possibility of publication biases, Park and Irwin [20] find that a majority of the published research
dealing with quantitative stock market prediction reports findings that indicate a critical view of the efficient
market hypothesis in its strong form.
Gradient-based deep learning for stock market trend prediction is explored in Moews et al. [10], but
employs infinite impulse response filtering to smooth input features and only covers daily predictions in
non-volatile markets. A further distinction of our study therefore lies in the periods on which the stock price
predictions are based, namely one-day, one-hour and half-hour predictions instead of the months or single
days used in existing research. Focussing on higher-frequency intervals is critical given the
evolution of global financial markets in recent times, as trading now occurs in large volumes at sub-second
frequencies and at much lower trade sizes [see 35]. Thus, investment windows of even significantly large
funds have shortened over time. In terms of an investigative tool for portfolio managers, previous studies
have explored the applicability of machine learning to intraday risk prediction [40]. Our approach, in
contrast, offers an easily quantifiable way to measure the impact of complex non-linear market correlations
of individual stocks. In order to shield portfolios against market movements, the identification of stocks with
very low correlations is facilitated via stocks with low to non-existent trend predictability in our framework,
thus offering an additional tool for risk diversification.
Our results for daily predictions outperform related research on neural networks for one-day trend pre-
dictions [11, 14]. In contrast, our result of 56.02% underperforms in comparison to 58.10% reported by
Moews et al. [10], the difference being that no aid in the form of input smoothing is provided to the models.
The latter research also answers the question about the impact of target stock information being addition-
ally provided to the model; the result is a minimal increase in accuracy from 58.10% to 58.54%, meaning
that almost all relevant information is already reflected in, and can be extracted from, other stocks’ trend
approximations. As our work is, to our knowledge, the first study to compare prediction horizons and model
viability in financial crisis environments, we hope that future research will enable comparisons.

4.2. Complexity and volatile environments
In order to show the effect of bottleneck sizes, the box plots for four cases are depicted in Figure 5.
The models’ accuracies increase significantly with larger bottleneck sizes, with the maximum
(ten bottleneck nodes) resulting in an accuracy slightly below that of the full model without a bottleneck. The
quartile ranges of the box plots, remaining approximately symmetrical, also increase with a higher number
of bottleneck nodes. The inclusion of a bottleneck layer in the neural network model should hinder its
performance, with the number of nodes forming the bottleneck being the deciding factor. However, if
judged via the notches of the box plots, the step-wise increases from one to three, then five, and finally ten
nodes, each time lead to a statistically significant rise in performance. While a one-node bottleneck results
in an average accuracy of 51.07%, the result for a ten-node bottleneck, with 55.03%, differs only by 0.99%
from the accuracy of the same model and dataset without a bottleneck layer.

Figure 5: Notched box-and-whiskers plots for accuracies for different bottleneck sizes within a feed-forward neural network
model. A statistically significant difference in medians at 95% confidence is indicated if notches are non-overlapping. The lower
and upper ends of a box indicate the first and third quartile, while the median is depicted as a horizontal bar. Outliers are
shown as circles above or below the whiskers.

The possibility of the model not learning anything new after the introduction of a bottleneck, meaning that
the bottlenecked model’s performance would be identical to that of a model with fewer hidden layers, has to be taken into
account. This can, however, be safely dismissed due to the accuracies for 3, 5 and 10 nodes being notably
different from each other. The results suggest that a large portion of the information can be compressed in
ten weighted variables halfway through the model, which gives a rough indication of the overall complexity
of the prediction problem itself. Notably, Park et al. [41] find that stock market complexity has decreased
over the course of the last 20 years, so future research may find smaller bottlenecks to be similarly viable.
Figure 6 presents plots for accuracies computed in a high-volatility (crisis) environment. Table 2 also
shows the key performance estimates for the same scenario. Specifically, the results are for the trading
environment during the global financial crisis of 2007/2008. The results show a large spread of the accuracies
for different stocks. Notably, the medians and mean accuracies for one-class predictions show that the price
trends more often change downwards during this modelled volatile scenario. In addition, the distribution
of the model’s accuracies is skewed, with the accuracies spread further upwards from the median, and the
interquartile ranges are wider than for non-crisis scenarios.
While the average accuracies for predicting exclusively down- or upwards trend changes do not differ
by more than 0.9% for the primary experiments, this difference grows to 11.90% for the crisis data.
Figure 6: Notched box-and-whiskers plots for accuracies in a high-volatility scenario. A statistically significant difference in
medians at 95% confidence is indicated if notches are non-overlapping. The lower and upper ends of a box indicate the first
and third quartile, while the median is depicted as a horizontal bar. Outliers are shown as circles above or below the whiskers.

accuracies of predictions
model    rand.    class 1   class 2   best-of
0.6113   0.5301   0.4405    0.5595    0.5595

tests against baselines
measure      rand.     class 1   class 2   best-of
p-value      < 0.001   < 0.001   < 0.001   < 0.001
min. diff.   0.0677    0.1589    0.0399    0.0388

Table 2: Statistical KPIs for high-volatility environments. In the table, accuracies, p-values and the minimal difference with
regard to the significant difference in means are given. ‘class 1’ is the prediction that stock trends will change downwards,
and ‘class 2’ is the prediction that stock trends will change upwards, whereas ‘randomised’ represents the same prediction
distribution as the model’s predictions sampled randomly to test for simple distribution learning. The ‘best-of’ features the
respective highest accuracy of the three baselines, meaning ‘class 1’, ‘class 2’ and ‘randomised’, to result in a fourth baseline
that is, by default, at least 50%.

The model’s high accuracy of 61.13% can be partly explained by this difference, as the mean accuracy for
predicting exclusively negative gradient changes is 55.95% for the model’s predictions, as shown in Table 2.
The latter results are consistent with the general downwards-oriented trend of the market during a financial
crisis, when confidence in the economy is expected to have fallen. Nevertheless, the additional accuracy of the
model, along with the accuracy of the randomised mock predictions being below the exclusive predictions
for negative trend changes, demonstrates that the model is able to exploit correlations in high-volatility
environments such as the 2007/2008 financial crisis period. Thus, our findings are robust in the presence of
extreme market stress.

5. Conclusion

In this paper, we add further evidence to the hypothesis that stock price data contains lagged correlations
that can be successfully exploited through the application of deep learning models, including the exposure to
financial crisis environments. Specifically, by using trend approximations as features, we can make above-baseline
price trend predictions without using the data of the target stock in the inputs. This finding
contradicts the essence of the random walk hypothesis and the semi-strong and strong forms of the efficient
market hypothesis, while underscoring the viability of applying deep learning for trend prediction in cross-
correlated financial time series. Several tests of robustness are implemented, including testing the model
in an environment of extreme price volatility, all of which firmly support the primary findings that stock
price data contains actionable information through lagged correlations. In addition, the levels of prediction
accuracies we obtain for the S&P 500 stocks in our sample outperform the predefined baselines for strict
statistical key performance indices to ensure significance.
Furthermore, we find that predictions of one stock’s trend changes based on other stocks’ price trend
gradients in the preceding time interval improve in accuracy for larger time intervals, with average and
maximum accuracies of 56.02% and 63.95%, respectively, for one-day predictions, all in excess of stated
baseline predictions. The estimates retain large parts of their accuracy for mid-model bottleneck layers with
a minimum of ten nodes, and show equally above-baseline predictions in high-volatility market scenarios,
albeit with the trade-off of a higher variance for different stocks. The large difference in accuracies for the
model’s predictions and the baselines cannot be explained by the shift to a skewed distribution in favour
of negative trend changes, given that the training data does not contain such an imbalance. An alternative
explanation that assumes market shocks steering large-scale trend movements is ruled out by testing the
success of randomised model predictions.
As discussed previously by Moews et al. [10], weak-form market efficiency can be upheld under the
additional assumption that only a select number of market agents have access to, or choose to implement,
techniques such as the one described in this work. In other words, if a small-enough amount of capital is
involved in exploiting lagged correlations in stock markets, these correlations are still present in price
data and the market remains efficient for the majority of market agents. Given the observed reluctance of
financial firms to deploy artificial intelligence approaches on a large scale, as well as the inherent difficulties
of black-box models when having to produce research for market oversight agencies, this scenario does not
seem unlikely and allows for our results to be compliant with much of the financial literature.
This interpretation resembles the adaptive market hypothesis put forward by Lo [42], which extends the
efficient market hypothesis through behavioural economics. One of the central tenets of this extension is
that reduced market efficiency results from a lack of competition for abundant resources, whereas scarce
resources in highly competitive environments result in more efficient markets. For the U.S. market, this
hypothesis is, for example, supported by research by Ito and Sugiyama [43] and Kim et al. [44]. Further,
more compelling evidence for the adaptive market hypothesis is subsequently presented by Urquhart and
McGroarty [45, 46], using long-term datasets for the Dow Jones Industrial Average and various global stock
exchanges, respectively, and through investigations of bond markets [47].
Finally, the results in this paper demonstrate the value of deep learning approaches to both risk diversifi-
cation and econometrics, and show the utility of linear regression derivatives as features for time series-based
investigations, thus offering a simple trend indicator with a high predictive value in order to further the
understanding of large-scale time series correlations in complex systems.

Acknowledgements

We wish to express our gratitude to Peter Zimmerman and other discussants of this paper at the Third
Workshop of the European Capital Markets Cooperative Research Centre (ECMCRC) on July 5, 2019 in
Dublin, Ireland, whose additional insights led to the improvement of this work.

References

[1] K. Drakos, Terrorism-induced structural shifts in financial risk: airline stocks in the aftermath of the September 11th
terror attacks, Eur. J. Political Econ. 20 (2) (2004) 435–446. doi:10.1016/j.ejpoleco.2003.12.010.
[2] A. A. Perryman, F. C. Butler, J. A. Martin, G. R. Ferris, When the CEO is ill: Keeping quiet or going public?, Bus.
Horiz. 53 (1) (2010) 21–29. doi:10.1016/j.bushor.2009.08.006.
[3] E. F. Fama, The behaviour of stock market prices, J. Bus. 38 (1) (1965) 34–105. doi:10.1086/294632.
[4] E. F. Fama, Efficient capital markets: A review of theory and empirical work, J. Finance 25 (2) (1970) 383–417.
doi:10.2307/2325486.
[5] M. G. Kendall, A. B. Hill, The analysis of economic time series part I: Prices, J. R. Stat. Soc. Ser. A Stat. Soc. 116 (1)
(1953) 11–34. doi:10.2307/2980947.
[6] P. Cootner, The random character of stock market prices, Cambridge: MIT Press, 1964.
[7] B. G. Malkiel, A random walk down Wall Street, New York: W. W. Norton & Company, Inc., 1973.
[8] P. Ferreira, A. Dionísio, How long is the memory of the US stock market?, Physica A 451 (2016) 502–506.
doi:10.1016/j.physa.2016.01.080 .
[9] D. Li, Z. Kou, Q. Sun, The scale-dependent market trend: Empirical evidences using the lagged DFA method, Physica A
433 (2015) 26–35. doi:10.1016/j.physa.2015.03.034.
[10] B. Moews, J. M. Herrmann, G. Ibikunle, Lagged correlation-based deep learning for directional trend change prediction
in financial time series, Expert Syst. Appl. 120 (2019) 197–206. doi:10.1016/j.eswa.2018.11.027.
[11] N. Huck, Pairs selection and outranking: An application to the S&P 100 index, Eur. J. Op. Res. 196 (2) (2009) 819–825.
doi:10.1016/j.ejor.2008.03.025.
[12] E. Chong, C. Han, F. C. Park, Deep learning networks for stock market analysis and prediction: Methodology, data
representations, and case studies, Expert Syst. Appl. 83 (2017) 187–205. doi:10.1016/j.eswa.2017.04.030.
[13] J. B. Heaton, N. G. Polson, J. H. Witte, Deep learning for finance: Deep portfolios, Appl. Stoch. Models Bus. Ind. 33 (1)
(2017) 3–12. doi:10.1002/asmb.2209.
[14] C. Krauss, X. A. Do, N. Huck, Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the
S&P 500, Eur. J. Oper. Res. 259 (2) (2017) 689–702. doi:10.1016/j.ejor.2016.10.031.
[15] J. P. Clarke, T. Jandik, G. Mandelker, The efficient markets hypothesis, in: R. C. Arffa (Ed.), Expert financial planning:
Investment strategies from industry leaders, New York: Wiley, 2000, Ch. 9, pp. 126–141.
[16] N. F. Johnson, P. Jeffries, P. M. Hui, Financial market complexity, Oxford: Oxford University Press, 2003.
[17] R. Sitte, J. Sitte, Neural networks approach to the random walk dilemma of financial time series, Appl. Intell. 16 (3)
(2002) 163–171. doi:10.1023/A:1014380315182.
[18] A. Skabar, Direction-of-change financial time series forecasting using a similarity-based classification model, J. Forecast.
32 (5) (2013) 409–422. doi:10.1002/for.2247.
[19] S. A. R. Rizvi, G. Dewandaru, O. I. Bacha, M. Masih, An analysis of stock market efficiency: Developed vs Islamic stock
markets using MF-DFA, Physica A 407 (2014) 86–99. doi:10.1016/j.physa.2014.03.091.
[20] C.-H. Park, S. Irwin, The profitability of technical analysis: A review, Working Paper 37487, University of Illinois at
Urbana-Champaign, Department of Agricultural and Consumer Economics (2004). doi:10.2139/ssrn.603481 .
[21] M. Tanaka-Yamawaki, S. Tokuoka, Adaptive use of technical indicators for the prediction of intra-day stock prices, Physica
A 383 (1) (2007) 125–133. doi:10.1016/j.physa.2007.04.126 .
[22] S. N. Neftci, Naive trading rules in financial markets and Wiener-Kolmogorov prediction theory: A study of “technical
analysis”, J. Bus. 64 (4) (1991) 549–571. doi:10.1016/0167-6687(93)91042-S.
[23] P. H. K. Chang, C. L. Osler, Methodical madness: Technical analysis and the irrationality of exchange-rate forecasts, Econ.
J. 109 (458) (1999) 636–661. doi:10.1111/1468-0297.00466.
[24] A. W. Lo, H. Mamaysky, J. Wang, Foundations of technical analysis: Computational algorithms, statistical inference, and
empirical implementation, J. Finance 55 (4) (2000) 1705–1765. doi:10.1111/0022-1082.00265 .
[25] H. White, Economic prediction using neural networks: The case of IBM daily stock returns, Proceedings of the IEEE 1988
International Conference on Neural Networks 2 (1988) 451–458. doi:10.1109/ICNN.1988.23959.
[26] E. W. Saad, D. V. Prokhorov, D. C. Wunsch, Comparative study of stock trend prediction using time delay, recurrent
and probabilistic neural networks, IEEE Trans. Neural Networks 9 (6) (1998) 1456–1470. doi:10.1109/72.728395.
[27] M. Dixon, D. Klabjan, J. H. Bang, Classification-based financial markets prediction using deep neural networks, Alg. Fin.
6 (3-4) (2017) 67–77. doi:10.3233/af-170176.
[28] S. Arlot, A. Celisse, A survey of cross-validation procedures for model selection, Statist. Surv. 4 (2010) 40–79.
doi:10.1214/09-SS054.
[29] E. F. Fama, K. R. French, Dissecting anomalies, J. Finance 63 (4) (2008) 1653–1678.
doi:10.1111/j.1540-6261.2008.01371.x.
[30] V. Lavrenko, M. Schmill, D. Lawrie, P. Ogilvie, D. Jensen, J. Allan, Language models for financial news recommendation,
in: Proceedings of the 9th International Conference on Information and Knowledge Management, 2000, pp. 389–396.
doi:10.1145/354756.354845.
[31] R. P. Schumaker, H. Chen, Textual analysis of stock market prediction using breaking financial news: The AZFin Text
system, ACM Trans. Inf. Syst. 27 (2) (2009) 12:1–12:19. doi:10.1145/1462198.1462204.
[32] J. M. Patell, M. A. Wolfson, The intraday speed of adjustment of stock prices to earnings and dividend announcements,
J. Financial Econ. 13 (2) (1984) 223–252. doi:10.1016/0304-405X(84)90024-2 .

[33] R. P. Schumaker, H. Chen, A quantitative stock prediction system based on financial news, Inf. Process. Manag. 45 (5)
(2009) 571–583. doi:10.1016/j.ipm.2009.05.001 .
[34] X. Ding, Y. Zhang, T. Liu, J. Duan, Deep Learning for event-driven stock prediction, in: Proceedings of the 24th
International Conference on Artificial Intelligence (IJCAI’15), 2015, pp. 2327–2333.
[35] T. Chordia, R. Roll, A. Subrahmanyam, Market liquidity and trading activity, J. Finance 56 (2) (2001) 501–530.
doi:10.1111/0022-1082.00335 .
[36] J. Snoek, H. Larochelle, R. P. Adams, Practical Bayesian optimization of machine learning algorithms, in: Advances in
Neural Information Processing Systems 25, Vol. 4, 2012, pp. 2951–2959.
[37] M. A. Nielsen, Neural networks and deep learning, San Francisco: Determination Press, 2015.
[38] K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classifi-
cation, in: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026–1034.
[39] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in: Y. W. Teh, M. Tit-
terington (Eds.), Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS10),
Vol. 9, 2010, pp. 249–256.
[40] S. S. Groth, J. Muntermann, An intraday market risk management approach based on textual analysis, Decis. Support
Syst. 50 (4) (2011) 680 – 691. doi:10.1016/j.dss.2010.08.019.
[41] J. B. Park, J. W. Lee, J.-S. Yang, H.-H. Jo, H.-T. Moon, Complexity analysis of the stock market, Physica A 379 (1)
(2007) 179–187. doi:10.1016/j.physa.2006.12.042.
[42] A. W. Lo, The adaptive markets hypothesis: Market efficiency from an evolutionary perspective, J. Portfolio Manag.
30 (1) (2004) 15–29.
[43] M. Ito, S. Sugiyama, Measuring the degree of time varying market inefficiency, Econ. Lett. 103 (1) (2009) 62–64.
doi:10.1016/j.econlet.2009.01.028.
[44] J. H. Kim, A. Shamsuddin, K.-P. Lim, Stock return predictability and the adaptive markets hypothesis: Evidence from
century-long U.S. data, J. Empir. Finance 18 (5) (2011) 868–879. doi:10.1016/j.jempfin.2011.08.002 .
[45] A. Urquhart, F. McGroarty, Calendar effects, market conditions and the Adaptive Market Hypothesis: Evidence from
long-run U.S. data, Int. Rev. Financ. Anal. 35 (2014) 154–166. doi:10.1016/j.irfa.2014.08.003.
[46] A. Urquhart, F. McGroarty, Are stock markets really efficient? Evidence of the adaptive market hypothesis, Int. Rev.
Financ. Anal. 47 (2016) 39–49. doi:10.1016/j.irfa.2016.06.011.
[47] L. Charfeddine, K. B. Khediri, G. C. Aye, R. Gupta, Time-varying efficiency of developed and emerging bond markets:
Evidence from long-spans of historical data, Physica A 505 (2018) 632–647. doi:10.1016/j.physa.2018.04.004.
