

Predicting Short-Term Stock Prices using Ensemble Methods and Online Data Sources

Bin Weng^a, Lin Lu^b, Xing Wang^c, Fadel M. Megahed^d, Waldyn Martinez^e

^a Department of Industrial & Systems Engineering, Auburn University, AL, 36849, USA | Email: bzw0018@auburn.edu
^b Department of Industrial & Systems Engineering, Auburn University, AL, 36849, USA | Email: lzl0032@auburn.edu
^c Department of Industrial & Systems Engineering, Auburn University, AL, 36849, USA | Email: xzw0005@auburn.edu
^d Department of Information Systems & Analytics, Miami University, Oxford, OH 45056, USA | Email: fmegahed@miamioh.edu
^e Department of Information Systems & Analytics, Miami University, Oxford, OH 45056, USA | Email: martinwg@miamioh.edu | Phone: +1(513)529-2154

Abstract

With the ubiquity of the Internet, platforms such as Google, Wikipedia and the like can provide insights pertaining to firms’ financial performance as well as capture the collective interest of traders through search trends, numbers of web page visitors and/or financial news sentiment. Information emanating from these platforms can significantly affect, or be affected by, changes in the stock market. The overarching goal of this paper is to develop a financial expert system that incorporates these features to predict short-term stock prices. Our expert system is comprised of two main modules: a knowledge base and an artificial intelligence (AI) platform. The “knowledge base” for our expert system captures: (a) historical stock prices; (b) several well-known technical indicators; (c) counts and sentiment scores of published news articles for a given stock; (d) trends in Google searches for the given stock ticker; and (e) the number of unique visitors for pertinent Wikipedia pages. Once the data is collected, we use a structured approach for data preparation. Then, the AI platform trains four machine learning ensemble methods: (a) a neural network regression ensemble; (b) a support vector regression ensemble; (c) a boosted regression tree; and (d) a random forest regression. In the cross-validation phase, the AI platform picks the “best” ensemble for a given stock. To evaluate the efficacy of our expert system, we first present a case study based on the Citi Group stock ($C) with data collected from 01/01/2013 to 12/31/2016. We show the expert system can predict the 1-day ahead $C stock price with a mean absolute percent error (MAPE) ≤ 1.50% and the 1-10 day ahead price with a MAPE ≤ 1.89%, which is better than the reported results in the literature. We show that the use of features extracted from online sources does not substitute for the traditional financial metrics, but rather supplements them to improve upon the prediction performance of machine learning based methods. To highlight the utility and generalizability of our expert system, we predict the 1-day ahead price of 19 additional stocks from different industries, volatilities and growth patterns. We report an overall mean for the MAPE statistic of 1.07% across our five different machine learning models, including a MAPE of under 0.75% for 18 of the 19 stocks for the best ensemble (boosted regression tree).
Keywords: Big Data, Ensembles, Google Trends, R Programming, Sentiment Analysis, Wikipedia

1. INTRODUCTION

Stock market prediction has continued to be an attractive topic in academia and business. Historically, the topic of predicting stocks revolved around the following question: “To what extent can the past history of a common stock’s price be used to make meaningful predictions concerning the future price of the stock?” (Fama, 1965, p. 34). Important financial theories, specifically the Efficient Market Hypothesis (Fama, 1965) and the random walk model (Cootner, 1964; Fama et al., 1969), have suggested that stock prices cannot be predicted since they are driven by new information which cannot be captured based on an analysis of stock prices (Geva & Zahavi, 2014). Proponents of these hypotheses believe that stock prices will follow a random walk and that any prediction of stock movement will be around 50% accurate (Bollen et al., 2011). However, many studies have rejected the premise of these two hypotheses and shown that the market can be predicted to some extent (Abdullah & Ganapathy, 2000; Malkiel, 2003; Smith, 2003; Mok et al., 2004; Nofsinger, 2005; Prechter Jr & Parker, 2007; Bollen et al., 2011; Ballings et al., 2015; Patel et al., 2015a; Nassirtoussi et al., 2015; Nguyen et al., 2015; Oliveira et al., 2017; Chong et al., 2017; Weng et al., 2017a).
In our estimation, the literature on stock prediction can be categorized according to four different metrics. These metrics are: (1) the type of outcome variable used for prediction, i.e., a dichotomous outcome for movement or a continuous outcome for price/returns; (2) the predictors included in the model, which include traditional predictors (features extracted from market, economic and technical indicators) and/or crowd-sourced predictors (e.g., features extracted from web searches, financial news sentiment, etc.); (3) the type of prediction models used, which are typically based on the assumptions made in metrics (1)-(2); and (4) the length of the prediction period (i.e., short-term versus long-term investment windows). These metrics (and the corresponding grouping of the literature) are discussed in more detail in the paragraphs below.
There are two main types of prediction outcomes in the stock market prediction literature: (a) stock market movement (see, e.g., Schumaker & Chen 2009; Bollen et al. 2011; Ballings et al. 2015; Patel et al. 2015a; Nguyen et al. 2015; Weng et al. 2017a), where the prediction goal is whether the stock is going up or down at a predefined time point; and (b) a continuous target, where the goal is to predict either the price (e.g., Ticknor 2013; Patel et al. 2015b; Göçken et al. 2016) or the returns on investment (e.g., Rather et al. 2015; Chong et al. 2017; Oliveira et al. 2017). From a financial market point of view, the underlying motivation behind the movement and continuous prediction models is somewhat different. Specifically, the literature on movement prediction implicitly assumes that the task is to “generate profitable action signals (buy and sell) [rather] than to accurately predict future values of a time series” (Gidofalvi, 2001, p. 1). On the other hand, the prediction of the specific stock price, index or return can provide decision makers with more accurate information pertaining to the risk adjusted trading profits (Kara et al., 2011). In this paper, our objective is to predict the stock price, since it provides more complete information when compared to just predicting movement. It should also be clear to the reader that the movement information can be generated from the price, but not the other way around.
From a predictors’ (explanatory variables) perspective, the literature has traditionally relied on the time series data of the stock market, technical analysis/indicators and economic indicators in predicting the future performance of stocks and indices. The trading of a given stock can be characterized using: (a) the stock’s opening and/or closing prices; (b) statistics capturing the variation/volatility of the stock; and (c) the trading volume of the stock. Some of these features are included in most (if not all) stock market prediction models. Technical analysis considers historical financial market data, such as past prices and volume of a stock, and uses charts as primary tools to predict price trends and make investment decisions (Murphy, 1999). Commonly used technical indicators include the moving average, moving average convergence and divergence, the relative strength index, and the commodity channel index (Tsai et al., 2011). For an introduction to how the market data and technical indicators are used in the literature, we refer the reader to Weng et al. (2017a).

Economists have noted that stock prices can be correlated to: (a) macroeconomic indices, (b) seasonal effects, and (c) political events (Mok et al., 2004; Kao et al., 2013). For instance, the observed daily stock returns reflect the stock market reaction to factors such as the release of economic indicators, government intervention or political issues, among others (Mok et al., 2004). Our previous work (Weng et al., 2017b, Under Review) shows that using ensemble methods with only macroeconomic indicators for predicting the one-month ahead prices of several U.S. indices and sector indices can result in predictions with a mean absolute percent error (MAPE) < 1.87%. These results build on the observations of Tsai & Hsiao (2010), who noted that economic performance has a clear impact on the prospects of growth and earnings of companies. Generally speaking, economic indicators can be divided into coincident, leading and lagging indicators; these can be obtained concurrently with, prior to, or after the related economic activity occurs (Tsai et al., 2011).
With the increased popularity of web technologies and their continued evolution, various sources of online data and analysis have become more accessible to the public. These sources contain financial information either explicitly (e.g., a Google News article discussing/predicting future stock performance) or implicitly (e.g., measures of public interest in a stock/index through Google Search trends). Utilizing such insights, stock market prediction models have started to capitalize on these online data sources (Zhai et al., 2007; Tetlock, 2007; Moat et al., 2013; Preis et al., 2013; Geva & Zahavi, 2014; Nassirtoussi et al., 2015; Nguyen et al., 2015; Weng et al., 2017a). The research literature conjectures that combining extensive crowd-sourcing and/or financial news data with the aforementioned traditional data sources facilitates more accurate predictions. Consequently, in this paper, we examine the following sets of predictors to form the “knowledge base” of our financial expert system: (a) market data (e.g., the opening, closing, low and high prices of an index); (b) technical indicators (e.g., the Relative Strength Index and the Chande Momentum Oscillator); (c) counts and sentiment scores of financial news (which were shown to have prediction significance in Tetlock 2007); (d) trends in Google query volumes (relevance shown in Preis et al. 2013); and (e) Wikipedia page visit trends (relevance shown in Moat et al. 2013; Weng et al. 2017a). To the best of our knowledge, these five sets of predictors have never been examined in combination in the literature. Note that we do not consider macroeconomic indicators in this paper since they update monthly and are thus invariant over shorter prediction intervals.
Numerous models have been proposed/implemented to predict stock/index performance. The literature shows that machine learning models typically outperform statistical and econometric models (Zhang & Wu, 2009; Meesad & Rasel, 2013; Patel et al., 2015b; Hsu et al., 2016; Weng et al., 2017a). Perhaps more importantly, the use of machine learning models provides more flexibility when compared to the more traditional models, since they: (a) do not require distributional assumptions (Zhang & Wu, 2009); (b) more easily recognize patterns hidden in time series data (Meesad & Rasel, 2013); and (c) can combine individual classifiers to reduce the variance and obtain better prediction accuracy (Patel et al., 2015b). The literature pertaining to stock price prediction using machine learning models can be categorized into: (a) methods utilizing single/individual classifiers (see, e.g., Zhang & Wu 2009; Schumaker & Chen 2009; Tsai & Hsiao 2010; Guresen et al. 2011; Khansa & Liginlal 2011; Wang et al. 2011; Alkhatib et al. 2013; Meesad & Rasel 2013; Geva & Zahavi 2014; Chen & Hao 2017; Chong et al. 2017); and (b) methods utilizing ensemble classifiers (see, e.g., Chen et al. 2007; Hassan et al. 2007; Qian & Rasheed 2007; Tsai et al. 2011; Wang et al. 2012; Booth et al. 2014; Kristjanpoller et al. 2014; Araújo et al. 2015; Barak & Modarres 2015; Patel et al. 2015b; Rather et al. 2015; Wang et al. 2015; Göçken et al. 2016). From a machine learning perspective, it is well documented that “ensembles can often perform better than single classifiers” (Dietterich, 2000a, p. 1). The superiority of ensembles has also been shown in the context of financial expert systems (Chen et al., 2007; Qian & Rasheed, 2007; Tsai et al., 2011). Thus, in this paper, we examine ensemble methods in an effort to predict stock prices using multiple data streams. Specifically, we evaluate the effectiveness of the following ensemble methodologies: (a) a neural network regression bagged ensemble; (b) a support vector regression bagged ensemble; (c) a boosted regression tree; and (d) a random forest regression.
In terms of the time point for prediction, the majority of the papers discussed above focus on one particular time point. From an investor/practitioner’s perspective, a single time period model implicitly assumes the following: (a) buy and sell decisions are made periodically, where the trading cost is minimal compared to the investment; and (b) it is reasonable to sell the stock and then buy it again in the next time period. From our experience, these assumptions are somewhat restrictive. An investor would like to have more information (with the understanding that there is uncertainty in the predictions) pertaining to how the stock price will perform over multiple time periods. Ideally, this information can allow the investor to make more informed decisions. In Table 1, we categorize the literature on stock price prediction according to the machine learning approach and the time intervals used. We use the “*” symbol to denote papers that incorporated multiple data sources (i.e., traditional sources with online sources). Note that none of the reviewed ensemble methods incorporate features from both traditional and online sources as potential predictors. Based on the insights from Nassirtoussi et al. (2015), Nguyen et al. (2015) and Weng et al. (2017b), we hypothesize that the prediction performance can be improved by filling this research gap.

Table 1: A tabular view of the stock price prediction literature using machine learning methods (papers marked “*” incorporate multiple data sources).

Single classifiers: Geva & Zahavi (2014)*; Chen & Hao (2017); Alkhatib et al. (2013); Chong et al. (2017); Guresen et al. (2011); Meesad & Rasel (2013); Wang et al. (2011); Khansa & Liginlal (2011); Tsai & Hsiao (2010); Zhang & Wu (2009); Schumaker & Chen (2009)*

Ensembles: Araújo et al. (2015); Barak & Modarres (2015); Rather et al. (2015); Wang et al. (2015); Göçken et al. (2016); Booth et al. (2014); Patel et al. (2015b); Wang et al. (2012); Kristjanpoller et al. (2014); Tsai et al. (2011); Chen et al. (2007); Hassan et al. (2007); Qian & Rasheed (2007)

The overarching goal of this paper is to develop a financial expert system, based on ensemble methods, that utilizes multiple data sources and is able to more accurately predict stock prices over multiple short-term time periods. To encourage the adoption of our financial expert system and/or similar approaches, we make all our code freely available at: https://github.com/martinwg/stockprediction. Note that our code and documentation provide practitioners and researchers the tools and software packages to scrape data pertaining to any stock (and not just the stocks analyzed in our case study), with the purpose of broadening the utility of our financial expert system. The remainder of this paper is organized as follows. In Section 2, we provide the details for both the “knowledge base” construction and the “artificial intelligence platform”. We discuss our experimental results in Section 3. Finally, we present a summary of the main contributions and limitations of this work in Section 4, as well as some ideas for future research.

2. METHODS

We propose a data-driven approach that consists of two main phases, as shown in Figure 1. In Phase 1, the data is collected through four web APIs: the Yahoo YQL API, the Wikimedia RESTful API, the Quandl Database API, and the Google Trends API. Four sets of data are generated that include: (a) publicly available market information on stocks, including opening/closing prices, trade volume, and the NASDAQ and DJIA indices, among others; (b) the number of unique visitors for pertinent Wikipedia pages per day; (c) daily counts of financial news on the stocks of interest and sentiment scores, a measure of the bullishness and bearishness of equity prices calculated as a statistical index of the positivity and negativity of the news corpora; and (d) daily trends of stock-related topics searched on Google. We obtain commonly used technical indicators that reflect price variation over time (Stochastic Oscillator, MACD, Chande Momentum Oscillator, etc.) from the R package TTR (Ulrich, 2016) to comprise our fifth set of data. The data then enter two sequential preprocessing steps: (a) data cleaning, which deals with missing and erroneous values; and (b) data transformation, which is required by some machine learning models, such as neural networks. A dimension reduction technique is applied to reduce the complexity of the data and keep the most important and relevant information. In Phase 2, we make the stock price prediction over different periods (lags) using four machine learning ensemble techniques. A modified leave-one-out cross validation (LOOCV) is employed to minimize the bias associated with the sampling. The models are compared and evaluated based on the modified LOOCV using three evaluation criteria. The details for each of these phases are presented in the subsections below.

Figure 1: An overview of the proposed method. Phase I (Knowledge Base) covers data acquisition (stock market data, technical indicators, financial news counts and sentiment, Google Trends, Wikipedia hits), data preprocessing (cleaning, scaling, centering) and feature generation (correlation analysis, principal component analysis). Phase II (AI Platform) covers the machine learning ensemble models, time-slicing cross-validation, model evaluation (RMSE, MAE, MAPE) and the user interface.

2.1. Knowledge base: Data acquisition

In the data acquisition phase, five sets of data are obtained from three open source APIs and the TTR R package (Ulrich, 2016). These include traditional time series stock market data, Wikipedia hits, financial news, Google trends and technical indicators. The data sets are preprocessed and merged in Phase I. First, we obtain publicly available market data on the chosen stock through the Yahoo YQL Finance API. The following variables are obtained as inputs: the daily opening and closing prices, the daily highest and lowest prices, the volume of trades, and the stock-related indices (e.g., NASDAQ, DJIA).
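As a minimal sketch of this step, the snippet below pulls the same daily fields for the Citi Group stock; the quantmod package is used here as an assumed stand-in for the Yahoo YQL API, and the ticker and date range mirror our case study.

```r
library(quantmod)

# Daily open/high/low/close/volume for the Citi Group stock and a related
# index; the date range mirrors the case study in Section 3.
prices <- getSymbols("C", src = "yahoo", auto.assign = FALSE,
                     from = "2013-01-01", to = "2016-12-31")
index  <- getSymbols("^DJI", src = "yahoo", auto.assign = FALSE,
                     from = "2013-01-01", to = "2016-12-31")
head(prices)  # columns: C.Open, C.High, C.Low, C.Close, C.Volume, C.Adjusted
```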
The second set of data is queried through the Wikimedia RESTful API for pageview data, which allows us to retrieve the daily visits for the selected stock-related pages while also filtering by the visitor's class and platform. The reader is referred to https://en.wikipedia.org/api/rest_v1/ for more details. The names of the stock/company Wikipedia pages need to be input by users to process the queries.
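A minimal sketch of such a pageview query follows; it assumes the public Wikimedia Pageviews REST endpoint, and the helper name, article title and date range are illustrative placeholders.

```r
library(httr)
library(jsonlite)

# Daily per-article pageviews from the public Wikimedia Pageviews endpoint
wiki_views <- function(article, start, end) {
  url <- sprintf(paste0("https://wikimedia.org/api/rest_v1/metrics/pageviews/",
                        "per-article/en.wikipedia/all-access/user/%s/daily/%s/%s"),
                 article, start, end)
  res <- fromJSON(content(GET(url), as = "text", encoding = "UTF-8"))
  data.frame(date  = as.Date(res$items$timestamp, format = "%Y%m%d00"),
             views = res$items$views)
}

citi_wiki <- wiki_views("Citigroup", "20160101", "20161231")
```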
The third set of data is acquired using the Quandl Database API, the largest public API integrating millions of financial and economic datasets. The database “FinSentS Web News Sentiment”, a subscription-based resource, is used in this study. The R package Quandl (Raymond McTaggart et al., 2016) is used to access the database through its API. The queried dataset includes daily news counts and daily average sentiment scores since 2013, derived from publicly available Internet sources. The fourth data set is the daily trends (number of hits) for stock-related topics on Google Search. Our study uses the recently released Google Trends API (2017) to capture information on stock trends. The default setting of our methodology is to search the trends on the stock tickers and company names; users are highly encouraged to select more precise stock or company related terms to improve the performance of the prediction model.

Researchers list several technical indicators that could potentially have an impact on stock price/return prediction, including the stochastic oscillator, the moving average and its convergence/divergence (MACD), the relative strength index (RSI), etc. (see, e.g., Kim & Han 2000; Tsai & Hsiao 2010; Göçken et al. 2016). In our study, eight commonly used technical indicators are selected; they are shown in Table 2. Furthermore, six of the selected indicators are also computed on the Wikipedia traffic, financial news and Google Trends series to generate additional features for these three datasets; the six indicators are marked with an asterisk in Table 2. Please refer to http://stockcharts.com/ for a detailed calculation of the indicators. Thereafter, ten targets (one per prediction lag) are calculated using the closing price acquired from the Yahoo YQL API. The five sets of data and the ten targets are combined to form the original input data set for preprocessing purposes.

Table 2: Description of technical indicators used in this study (* = also computed on the news count, Google Trends and Wikipedia traffic series)

Stochastic Oscillator: shows the location of the close relative to the high-low range.
Relative Strength Index (RSI)*: measures the speed and change of price movements.
Chande Momentum Oscillator (CMO)*: captures the recent gains and losses relative to the price movement over the period.
Commodity Channel Index (CCI): used to identify a new trend or warn of extreme conditions.
MACD*: moving average convergence/divergence oscillator for trend following.
Moving Average*: smooths the time series to form a trend-following indicator.
Rate Of Change (ROC)*: measures the percent change from one period to the next.
Percentage Price Oscillator*: measures the difference between two moving averages as a percentage.
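A short sketch of generating these indicators with TTR follows; prices denotes the market data object from above, and the window lengths and the oscillator definition are illustrative defaults rather than our exact settings.

```r
library(quantmod)  # loads TTR; Cl() and HLC() extract price columns

cl <- Cl(prices)
ind <- data.frame(
  stoch(HLC(prices)),            # fastK, fastD, slowD
  RSI  = RSI(cl, n = 14),
  CMO  = CMO(cl, n = 14),
  CCI  = CCI(HLC(prices), n = 20),
  MACD(cl),                      # macd and signal columns
  MA5  = SMA(cl, n = 5),
  MA10 = SMA(cl, n = 10),
  ROC  = ROC(cl, n = 1)
)
# One common definition of the price oscillator (OSCP) from the two MAs
ind$OSCP <- (ind$MA5 - ind$MA10) / ind$MA5
```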

2.2. Knowledge base: Data preprocessing

Given that the data is automatically collected through the APIs in this study, some features have missing values or no meaning for a given sample. The preprocessing approach here includes two main steps: dealing with the missing data and removing potential outliers. First and foremost, we scan through all features queried from the APIs and determine whether any pattern of missing data exists. For missing data, the statistical average is imputed for the appropriate observation when applicable; otherwise, the corresponding date with missing values is removed from the data sets. The spatial sign (Serneels et al., 2006) process is used to check for outliers and remove the corresponding data points.

Feature scaling is performed to bring each predictor to a common scale. Scaling is required by the models used in this study, especially the support vector regression and the neural networks, in order to avoid attributes with greater numeric ranges dominating those with smaller ranges. This study deploys a straightforward and common data transformation approach to center and scale the predictor variables: we use a simple standardization, taking the deviation of each observation from the average of each predictor, divided by the standard deviation.
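A minimal sketch of these two steps, assuming X_raw holds the numeric predictors and using the caret package's preProcess() (here the spatial sign is applied as a transform rather than as the outlier screen described above), is:

```r
library(caret)

# Mean-impute missing values column by column (dates that cannot be
# imputed are dropped in our pipeline)
X_imp <- as.data.frame(lapply(X_raw, function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}))

# Center and scale each predictor, then apply the spatial sign transform
pp <- preProcess(X_imp, method = c("center", "scale", "spatialSign"))
X_clean <- predict(pp, X_imp)
```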

2.3. Knowledge base: Feature extraction

For each of the five sets of data, around ten features are collected for each given period, leading to more than fifty variables. The final dataset contains 42 predictors including date, along with lagged stock prices from 1 up to 10 days, for a total of 52 variables. All the variables are numeric except for date. Due to the curse of dimensionality, the accuracy and speed of many common predictive techniques degrade on high dimensional and high velocity data. Therefore, the process of dimension reduction is necessary and might improve the performance of at least some of the prediction models considered. On the other hand, capturing most of the information provided by the original variables is of utmost importance. We apply principal component analysis (PCA) to our training set for the prediction models. Researchers have shown that PCA improves, in some instances, the accuracy and stability of stock prediction models (Lin et al., 2009; Tsai & Hsiao, 2010).
PCA is probably the most commonly used multivariate technique. Its origin can be traced back to Pearson (1901), who described the geometric view of this analysis as looking for lines and planes of closest fit to systems of points in space. Hotelling (1933) further developed the technique and was the first to use the term “principal component”. The goal of PCA is to extract and keep only the most important and relevant information from a given set of data. To achieve this, PCA projects the original data onto principal components (PCs), which are linear combinations of the original variables, so that the (second-order) reconstruction error is minimized. For normal variables (with mean zero), the (second-order) covariance matrix contains all the information about the data. Thus the PCs provide the best linear approximation to the original data: the first PC is computed as the linear combination capturing the largest possible variance; the second PC is then constrained to be orthogonal to the first PC while capturing the largest possible variance unaccounted for; and so the process goes on. The PCs that capture the most variance are obtained through singular value decomposition (SVD). Since the variance depends on the scale of the variables, standardization (i.e., centering and scaling) is needed beforehand, so that each variable has a zero mean and unit standard deviation. To further understand the properties of PCA, let X be the standardized data matrix; the covariance matrix can be obtained as \Sigma = \frac{1}{n} X X^\top, which is symmetric and positive definite. By spectral decomposition, we can write \Sigma = Q \Lambda Q^\top, where \Lambda is a diagonal matrix consisting of the ordered eigenvalues of \Sigma, and the column vectors of Q are the corresponding eigenvectors, which are orthonormal. The PC scores can be obtained as the columns of XQ. It can be shown (Fodor, 2002) that the total variation is equal to the sum of the eigenvalues of the covariance matrix,

\sum_{i=1}^{p} \mathrm{Var}(\mathrm{PC}_i) = \sum_{i=1}^{p} \lambda_i = \mathrm{trace}(\Sigma),

and the fraction \sum_{i=1}^{k} \lambda_i / \mathrm{trace}(\Sigma) gives the cumulative proportion of the variance explained by the first k PCs. In many cases, the first few PCs capture most of the variation, so the remaining components can be disregarded with only minor information loss.
PCA derives orthogonal components, meaning they are uncorrelated with each other, and since our stock market data seem to contain many highly correlated variables, applying PCA helps us alleviate the effect of strong correlations between features while reducing the dimensionality of the feature space. However, as an unsupervised learning algorithm, PCA does not consider the target while summarizing the data variation. The relationship between the target and the derived components might be more complex, or the surrogate predictors could provide no suitable relationship to the target, so we provide results using the PCs as predictors and also using the original features. Moreover, since PCA utilizes the first and second moments, it relies heavily on the assumption that the original data have an approximately Gaussian distribution.

We use the PCs that retain the majority of the variance (information), setting the threshold to 95%. The results of the PCA are discussed in Section 3, where the prediction performance of the proposed models with and without dimension reduction is analyzed.
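A minimal sketch of this selection with base R's prcomp(), assuming X_train and X_test hold the prepared numeric predictors, is shown below.

```r
# X_train / X_test: prepared numeric predictors (date column excluded)
pca <- prcomp(X_train, center = TRUE, scale. = TRUE)

# Keep the smallest number of components explaining at least 95% of variance
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
k <- which(cum_var >= 0.95)[1]   # 17 components for the $C data in Section 3

Z_train <- pca$x[, 1:k]                            # training scores
Z_test  <- predict(pca, newdata = X_test)[, 1:k]   # reuse training rotation
```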

2.4. The inference engine: AI model comparison and evaluation

In this phase, our models and their evaluation approach are introduced. We compare the effectiveness of four machine learning models: a neural network regression ensemble, a support vector regression ensemble, a boosted tree and a random forest. The four models are considered ensembles of individual learners, with the main differences stemming from the type of base learner used and the choice of ensemble approach: boosting, bagging or random forest. From a machine learning perspective, the following two components should be taken into consideration for a successful stock price prediction model: (a) capturing the dimensionality of the input space; and (b) handling the trade-off between bias and variance. A more detailed discussion of our feature extraction approach using PCA was presented in Section 2.3; therefore, this section focuses on describing the proposed models based on the bias/variance trade-off. The reader should note that a cross-validation approach has been applied to the four models during training.

In the following subsections, we first provide a short overview of our proposed models and of the cross validation. We then introduce the performance evaluation metrics used in this study to identify the most suitable approach.

2.4.1. Neural networks regression ensemble (NNRE)

Inspired by the complex biological neuron systems in our brains, artificial neurons were proposed by McCulloch & Pitts (1943) using threshold logic. Werbos (1974) and Rumelhart et al. (1985) independently discovered the backpropagation algorithm, which could train complex multi-layer perceptrons effectively by computing the gradient of the objective function with respect to the weights. Neural networks have been widely used since then, especially since the revival of the deep learning field in 2006 as parallel computing emerged. Neural networks have been used successfully in stock market prediction, due to their ability to handle the complex nonlinear systems of stock market data.

In neural networks, we describe the features as an input x and the corresponding weighted sum (z = w^\top x). The information is then transformed by the activation functions within each neuron and propagated through layers, finally resulting in a given output. If there are hidden layers between the input and output layers, the network is called “deep”, giving rise to the term deep learning. The hidden layers can distort the linearity of the weighted sum of inputs, so that the outputs become linearly separable. Theoretically, we can approximate any function that maps the inputs to the output if the number of neurons is not limited. This flexibility gives neural networks the ability to obtain higher accuracy in stock market prediction, where the true data generating mechanism is extremely complicated. The functions in each neuron are called “activations”, and can be of many different types. The most commonly used activation is the sigmoid function, which is smooth and has an easy-to-express first order derivative (in terms of the sigmoid function itself), thus it is appropriate for training with backpropagation. Furthermore, its S-shaped curve is good for classification, but for regression this property might be a disadvantage. It is worth noting that the rectified linear unit (ReLU), which takes the simple form f(z) = max(z, 0), is less likely to have a vanishing gradient; instead the gradient is rather constant (when z > 0). This might result in faster learning for networks with many layers. Also, sparsity of the weights arises as z < 0, reducing the complexity of the representation on a large architecture. Both properties have allowed the ReLU to become one of the most dominant non-linear activation functions in the last few years, especially in the field of deep learning (LeCun et al., 2015). One of the main concerns with using ensembles of neural networks is that, because of their complexity, neural networks are not weak learners (classifiers with accuracy slightly higher than 50%). Ensembles rely on the use of unstable and weak classifiers to reduce the variance of the predictions. To alleviate this problem, we do not fit a deep network and instead make use of a two-layer neural network structure (MacKay, 1992; Foresee & Hagan, 1997) with the number of neurons chosen using cross-validation at each iteration of the ensemble. We then construct the ensemble of neural networks by using bagging; that is, we take bootstrap samples of the training data set and iterate the process multiple times to reduce the variance in the bias-variance decomposition framework. The final prediction is computed as the average across iterations. In our experiments, the bagging approach results in an average improvement of 30% in test performance metrics compared to a single two-layer neural network with the same characteristics and features, including the number of neurons. We use 100 iterations in our bagging ensemble.
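A minimal sketch of the bagging procedure follows; nnet's single-hidden-layer network stands in for the two-layer structure, and the size and decay values are placeholders for the cross-validated choices described above.

```r
library(nnet)

set.seed(1)
B <- 100                       # bagging iterations, as in the text
fits <- vector("list", B)
for (b in seq_len(B)) {
  idx <- sample(nrow(X_train), replace = TRUE)   # bootstrap sample
  fits[[b]] <- nnet(X_train[idx, ], y_train[idx],
                    size = 10, decay = 1e-3,     # placeholder values; chosen
                    linout = TRUE, maxit = 500,  # by cross-validation above
                    trace = FALSE)
}
# Bagged prediction: average over the individual networks
pred_nnre <- rowMeans(sapply(fits, predict, newdata = X_test))
```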

2.4.2. Support vector regression ensemble (SVRE)

To explain the learning process from a statistical point of view, Vapnik & Chervonenkis (1974) proposed the VC learning theory, one of whose major components characterizes the construction of learning machines that enable them to generalize well. Based on that, Vapnik and his colleagues developed the support vector machine (SVM) (Boser et al., 1992; Cortes & Vapnik, 1995), which has proven to be one of the most influential supervised learning algorithms. The key insight of the SVM is that those points closest to the separating hyperplane, called the support vectors, are more important than the others. Assigning non-zero weights only to those support vectors while constructing the learning machine leads to better generalization. The separating hyperplane is called the maximum margin separator. Drucker et al. (1997) then extended the idea to regression problems by omitting, while calculating the cost, the training points that deviate from the actual targets by less than a threshold ε. These points with small errors are also called support vectors, and the corresponding learning machine for regression is called support vector regression (SVR). The goal of training the SVM/SVR is to find a hyperplane that maximizes the margin, which is equivalent to minimizing the norm of the weight vector for every support vector, subject to the constraints that make each training sample valid. For SVR, the optimization problem can be written as

\min_{w, b} \; \frac{1}{2} \|w\|^2
\quad \text{s.t.} \quad y_i - w^\top x_i - b \le \varepsilon, \qquad w^\top x_i + b - y_i \le \varepsilon,

where x_i is a training sample with target y_i. We will not show the details here, but maximizing its Lagrangian dual is a much simpler quadratic programming problem. This optimization problem is convex, thus it cannot get stuck in local optima. Convex optimization is solved by well-studied techniques, such as the sequential minimal optimization (SMO) algorithm.

Theoretically, SVR can be deployed in our regression model to capture the important factors that significantly affect the stock price while avoiding the problem of overfitting. The reason is not limited to picking the support vectors, but also includes the introduction of the idea of soft margins (Cortes & Vapnik, 1995). The allowance of softness in the margins dramatically reduces the computational work during training. More importantly, it captures the noisiness of real world data (such as stock market data) and can yield more generalizable results. Another key technique that makes SVM/SVR so successful is the use of the well-known kernel trick, which maps the non-linearly-separable original input into a higher dimensional space so that the data become linearly separable, thus greatly expanding the hypothesis space (Russell et al., 1995). SVM/SVR has its own disadvantages: its performance is extremely sensitive to the selection of the kernel function, as well as the parameters. For that reason, we picked the Radial Basis Function (RBF) as the kernel in our SVR, since the stock market data contain high noise. Another major drawback of kernel machines is that the computational cost of training is high when the dataset is large (Goodfellow et al., 2016). SVM/SVR also suffers from the curse of dimensionality and struggles to generalize well under certain conditions. We again use bagging with 100 iterations to form an ensemble of SVRs; that is, we take bootstrap samples of the training data set and iterate the process multiple times. The final prediction (SVRE) is also computed as the average predicted value across iterations.
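A sketch of the analogous bagged SVR, using e1071's RBF-kernel implementation with placeholder cost and epsilon values, is:

```r
library(e1071)

set.seed(1)
B <- 100
fits <- vector("list", B)
for (b in seq_len(B)) {
  idx <- sample(nrow(X_train), replace = TRUE)   # bootstrap sample
  fits[[b]] <- svm(X_train[idx, ], y_train[idx], type = "eps-regression",
                   kernel = "radial",            # RBF kernel, as in the text
                   cost = 1, epsilon = 0.1)      # placeholder parameters
}
pred_svre <- rowMeans(sapply(fits, predict, newdata = X_test))
```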

2.4.3. Boosted regression tree (BRT)

Rooted in probably approximately correct (PAC) learning theory (Valiant, 1984), Kearns & Valiant (1988) posed the question of whether a set of “weak” learners (i.e., learners that perform slightly better than random guessing) can be combined to produce a learner with arbitrarily high accuracy. Schapire (1990) and Freund (1990) then answered this question affirmatively with the first provable boosting algorithm. AdaBoost, the most popular boosting algorithm, was developed by Freund & Schapire (1995). AdaBoost addresses two fundamental questions in the idea of boosting: how to choose the distribution in each round, and how to combine the weak rules into a single strong learner (Schapire, 2003). AdaBoost uses “importance weights” to force the learner to pay more attention to those examples with larger errors; that is, it iteratively fits a learner using weighted data and updates the weights with the errors from the fitted learner. Lastly, AdaBoost combines these weak learners through weighted majority voting. Boosting is computationally efficient with very few parameters to set, while (theoretically) guaranteeing a desired accuracy given sufficient data. In practice, however, the performance of boosting depends significantly on the sufficiency of the data as well as the choice of the base learner. Applying base learners that are too weak can fail to work, while overly complex base learners can result in overfitting. Boosting also seems susceptible to uniform noise (Dietterich, 2000b; Martinez & Gray, 2016), since it may over-emphasize the highly noisy examples.

As “off-the-shelf” supervised learning methods, decision trees are the most common choice of base learner in AdaBoost. Decision trees are simple to train, yet powerful predictive tools. Decision trees partition the space of all joint predictor variables into disjoint regions using greedy search, based either on the error or on the information gain. However, due to their greedy strategy, the results obtained by decision trees can be unstable and have high variance, so they often achieve lower generalization accuracy. Boosting improves upon the performance of decision trees by reducing the bias as well as the variance (Friedman et al., 2001). We use 100 iterations of unpruned regression trees as the base learner for our boosting (AdaBoost) approach.
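As an illustrative sketch, the gbm package below fits a boosted regression tree with squared-error loss; note that gradient boosting stands in here for the AdaBoost regression variant described above, and the depth and shrinkage values are placeholders.

```r
library(gbm)

# df_train holds the predictors plus the 1-day-ahead target `target_1d`
# (both names are assumptions for illustration)
fit_brt <- gbm(target_1d ~ ., data = df_train,
               distribution = "gaussian",  # squared-error boosting
               n.trees = 100,              # 100 boosting iterations
               interaction.depth = 5,      # placeholder tree depth
               shrinkage = 0.1)
pred_brt <- predict(fit_brt, newdata = df_test, n.trees = 100)
```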

2.4.4. Random forest regression (RFR)

Breiman (2001) defines a random forest (RF) as an algorithm consisting of a collection of tree-structured classifiers built from independently and identically distributed random vectors. Each tree casts a unit vote for the most popular class for each input when the response is binary. For regression problems, the RF prediction is the average prediction from the regression trees.

RFs inject randomness by growing each tree on a random subsample of the training data, and also by using a small random subset of the predictors at each decision node split. The RF method is similar to boosting in that it combines classifiers that have been trained on a subset sample or a weighted subset, but they differ in that boosting gives different weights to the base learners based on their accuracy, while random forests weight them uniformly. There has been ample research on these ensemble methods and how they perform under different settings. For a more complete review of their performance, the reader is referred to Quinlan (1996); Maclin & Opitz (1997); Dietterich (2000a), and Maclin & Opitz (2011). We use 100 trees for the RFR implementation here.
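A minimal sketch with the randomForest package, using 100 trees as above (mtry is left at the package default of one third of the predictors for regression), is:

```r
library(randomForest)

set.seed(1)
fit_rfr <- randomForest(x = X_train, y = y_train,
                        ntree = 100)   # mtry defaults to p/3 for regression
pred_rfr <- predict(fit_rfr, newdata = X_test)
```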

2.4.5. Time series cross validation

In this study, the modified LOOCV is applied throughout the prediction model comparison and evaluation. The objective is to minimize the bias associated with the random sampling of the training and test data samples (Arlot et al., 2010). Traditional random cross validation (e.g., k-fold) is not suitable for this study because of the time series nature of the stock price prediction. Thus, the modified LOOCV approach is used, which performs a time window slicing cross validation strategy: the methodology moves the training and test sets in time by creating time slice windows. There are three parameters to be set in the training process: (a) Initial Window, which dictates the initial number of consecutive values in each training set sample; (b) Horizon, which determines the size of the test set samples; and (c) Fixed Window, a logical parameter that determines whether the size of the training set is allowed to vary. The R package caret (R Core Team, 2016) is used to perform this approach. We set the Initial Window parameter to 80% of the observations, the Horizon parameter to 5%, and the Fixed Window to TRUE for a static moving window of 80% of the data.
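A sketch of this scheme with caret's "timeslice" resampling method, using the random forest as an example learner, is:

```r
library(caret)

n <- nrow(df_train)
ctrl <- trainControl(method = "timeslice",
                     initialWindow = floor(0.80 * n),  # 80% of observations
                     horizon       = floor(0.05 * n),  # 5% test slices
                     fixedWindow   = TRUE)             # static moving window

fit <- train(target_1d ~ ., data = df_train,
             method = "rf", ntree = 100, trControl = ctrl)
```

If the slices are needed outside of train(), caret's createTimeSlices() exposes the same windowing directly.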

2.4.6. Model evaluation

To evaluate the performance of the four modeling methods, three commonly used evaluation criteria are used in this study: (a) the root mean square error (RMSE), (b) the mean absolute error (MAE), and (c) the mean absolute percentage error (MAPE), where

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (A_t - F_t)^2},

\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| A_t - F_t \right|,

\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right| \times 100,

and A_t is the actual target value for the t-th observation, F_t is the predicted value for the corresponding target, and n is the sample size.
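These three measures are straightforward to compute directly; a small sketch, assuming y_test holds the actual values and pred_rfr a vector of predictions from one of the ensembles, is:

```r
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
mae  <- function(actual, pred) mean(abs(actual - pred))
mape <- function(actual, pred) 100 * mean(abs((actual - pred) / actual))

c(RMSE = rmse(y_test, pred_rfr),
  MAE  = mae(y_test, pred_rfr),
  MAPE = mape(y_test, pred_rfr))
```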
The RMSE is the most popular measure of the error rate of regression models; as n → ∞, it converges to the standard deviation of the theoretical prediction error. However, the quadratic error may not be an appropriate evaluation criterion for all prediction problems, especially in the presence of large outliers. In addition, the RMSE depends on scales, and is also sensitive to outliers. The MAE considers the absolute deviation as the loss and is a more “robust” measure for prediction, since the absolute error is more sensitive to small deviations and much less sensitive to large ones than the squared error. However, since the training process for many learning models is based on a squared loss function, the MAE can be (logically) inconsistent (Woschnagg & Cipan, 2004) with the model optimization selection criteria. The MAE is also scale-dependent, and thus not suitable for comparing prediction accuracy across different variables or time ranges. In order to achieve scale independence, the MAPE measures the error proportional to the target value. The MAPE, however, is extremely unstable when the actual value is small (consider the case when the denominator A_t = 0 or is close to 0). We consider all three measures to obtain a more complete view of the performance of the models, given the limitations of each performance measure. The fourth evaluation criterion is the training runtime. We measure the time in seconds to complete the ensembles (for 100 iterations) using an Intel Xeon E5-2695 24-core workstation clocked at 2.30 GHz per core. We do not make use of parallel multicore processing. The reader should note that the runtime is not intended to measure the theoretical computational complexity of the algorithms presented here, but merely to compare the time it takes to run each algorithm under the same circumstances with fixed physical computational power.

3. EXPERIMENTAL RESULTS AND DISCUSSION

In this section, we go through the techniques and methodologies used to complement and build our final ensemble models. The first step in our approach uses visualization techniques to recognize highly correlated features. The extracted features, with and without the PCA transformation, are then used to build the predictive models. Finally, we compare the proposed ensemble models using the performance measures described in Section 2.4.6.

3.1. Exploratory analysis

We explain here our exploratory analysis of the original data and our approach to capturing the characteristics containing the most information from the available features. As discussed in Section 2.2, the features collected through the APIs have high variability and contain missing/meaningless samples. After exploring each feature, we perform the necessary data cleaning, feature centering and feature scaling. Furthermore, we pay close attention to the correlation structure of the features.

To illustrate our approach, a case study based on the Citi Group stock ($C) is presented here. The data are collected from January 2013 to December 2016 on a daily basis. Figure 2 shows a visualization of the correlation matrix of the five sets of input features, in which the features are grouped using a hierarchical clustering algorithm (so that features with high correlations are close to each other), and the colors indicate the magnitude of the pairwise correlations among features: dark blue implies strong positive correlation, dark red stands for strong negative correlation, and white implies that the two features are uncorrelated. The dark blue blocks along the diagonal indicate that the features fall into several large clusters, and within each cluster the features show strong collinearity. For example, the different prices (open, close, high, or low) on the same day are clearly close to each other in most cases and thus tend to fall into the same cluster. There are also features negatively correlated with each other; for instance, the volume and the index have opposite trends, which might be due to the low volatility of the Citi Group ($C) stock. This suggests investors tend to buy other stocks when the corresponding market index is increasing.
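A sketch of this visualization with the corrplot package, assuming X_clean holds the preprocessed features, is:

```r
library(corrplot)

# Pairwise correlations, grouped by hierarchical clustering as in Figure 2
M <- cor(X_clean, use = "pairwise.complete.obs")
corrplot(M, method = "color", order = "hclust", tl.cex = 0.6)
```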

Figure 2: Correlation matrix for the five sets of input features, grouped by hierarchical clustering.

3.2. Feature extraction

For our Citi Group ($C) stock analysis, the first three principal components, extracted from all the features considered, accounted for 21.13%, 16.86%, and 10.95% of the total variance of the data, respectively.

Figure (3a) shows the cumulative percentages of the total variation in the data explained by each component, from which we can observe that the first 13 principal components describe 90.78% of the information in the features, and the first 17 components capture 95.29%. The first 26 components explain > 99.26% of the total variance, i.e., the remaining 15 components capture < 0.74% of the variation in the data. Deploying the predetermined threshold of 95%, we use 17 components out of the total of 41 for training. Figure (3b) characterizes the loadings (i.e., the coefficients in the linear combination of features that derives a component) for each feature associated with the first two principal components. It is quite clear that the loadings of the prices, as well as those of the technical indicators, have the largest effect on the first component; e.g., the coefficient of the close price is 0.2668, and that of the RSI is 0.2645. As for the second component, the external Internet features contribute the most in the positive direction. For instance, the coefficients for the Google Trends, Wikipedia traffic and news count features are 0.2547, 0.1957 and 0.2137, respectively. Also note that the news sentiment plays a role that is negatively associated with the second component, with coefficient −0.1018. Note that Figure (3b) shows that the relationship between the first two principal components is “scattered”, i.e., they are uncorrelated, which is expected since they are orthogonal. We highlight this observation here, however, to note the utility of using PCA to generate uncorrelated features that capture different information.

Figure 3: Illustration of the variation explained by principal components: (a) the cumulative percentage of total variance explained; (b) the loadings of the features on the first two components.

Based on the observations above, PCA provides two benefits: (a) reducing the dimensionality of the feature space; and (b) ensuring that the selected features are not correlated. However, as an unsupervised learning algorithm, PCA does not consider the target while projecting the data. The implications of the unsupervised nature of PCA include: (a) the surrogate predictors may provide no suitable relationship to the dependent variable (stock price); and/or (b) the connection between the target and the predictors may be weakened by the PCA transformation. Thus, in this paper, we examine the performance of the aforementioned ensemble models both with and without PCA.

3.3. Set of predictor stages

To further evaluate how much additional predictive value the proposed ensemble models obtain from the use of financial news sentiment, trends and online data sources, we divide our set of predictors into four stages. Table 3 shows the variables selected at each stage (a short sketch of constructing these stage subsets follows the table). For instance, in Stage 1 we consider only the market data and technical indicators as predictors for the proposed ensemble methods. We expect the ensembles created using only these predictors to be highly predictive, as these variables provide the most information about how the stock market behaves. Stage 2 adds the news sentiment and count variables to the variables considered in Stage 1. We hypothesize that adding these variables should provide additional improvement to the predictive power, albeit a significantly smaller improvement than the overall contribution of the variables in Stage 1. Stage 3 adds the Google Trends data, and Stage 4 adds the Wikipedia traffic, forming the most complete set of predictors. The ensemble models are trained using both PCA-transformed and untransformed sets of predictors for each stage.

Table 3: Model variables selected at each stage

Stage 1 Stage 2 Stage 3 Stage 4


Open X X X X
High X X X X
Low X X X X
Close X X X X
Volume X X X X
Index X X X X
Market fastK X X X X
Market fastD X X X X
Market slowD X X X X
Market RSI X X X X
Market CMO X X X X
Market CCI X X X X
Market MACD X X X X
Market MA5 X X X X
Market MA10 X X X X
Market ROC X X X X
newsSentiment X X X
newsCount X X X
newsCount RSI X X X
newsCount CMO X X X
newsCount MACD X X X
newsCount MA5 X X X
newsCount MA10 X X X
newsCount ROC X X X
newsCount OSCP X X X
gTrend X X
gTrend RSI X X
gTrend CMO X X
gTrend MACD X X
gTrend MA5 X X
gTrend MA10 X X
gTrend ROC X X
gTrend OSCP X X
Wikitraffic X
Wikitraffic RSI X
Wikitraffic CMO X
Wikitraffic MACD X
Wikitraffic MA5 X
Wikitraffic MA10 X
Wikitraffic ROC X
Wikitraffic OSCP X
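
The derived features in Table 3 apply the same transformations to each raw series. As a hedged illustration, the R sketch below computes the Google Trends-based features using the TTR package (Ulrich, 2016); here gtrend is a placeholder numeric vector of daily trend values, the window lengths are illustrative rather than our exact settings, and OSCP is computed under one common definition of the price oscillator:

    library(TTR)

    gTrend_RSI  <- RSI(gtrend, n = 14)      # relative strength index
    gTrend_CMO  <- CMO(gtrend, n = 14)      # Chande momentum oscillator
    gTrend_MACD <- MACD(gtrend)[, "macd"]   # MACD line
    gTrend_MA5  <- SMA(gtrend, n = 5)       # 5-day simple moving average
    gTrend_MA10 <- SMA(gtrend, n = 10)      # 10-day simple moving average
    gTrend_ROC  <- ROC(gtrend, n = 1)       # rate of change
    gTrend_OSCP <- (gTrend_MA5 - gTrend_MA10) / gTrend_MA5  # oscillator of the two MAs

The same calls can be repeated for the newsCount and Wikitraffic series to produce the remaining rows of Table 3.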

3.4. Model comparison and evaluation

As previously mentioned, four commonly used machine learning models have been applied to our
($C) stock case study: a neural network regression ensemble (NNRE), a support vector regression
ensemble (SVRE), AdaBoost with unpruned regression trees as base learners (BRT) and a Random
Forest with unpruned regression trees as base learners (RFR). We use three evaluation metrics
(MAE, MAPE, RMSE), in addition to the training runtime in seconds, to gauge the performance of
the four models in this study. The data is split into two sets, training and test. As explained in
Sections 2.4.5 and 2.4.6, the approach of modified LOOCV using time-slicing windows is applied
throughout the model development process. Since the stock market is essentially a time series
formulation, 80% of the data is used for training, and during the time-slicing process the training set
only contains the data points that occur prior to the data points in the validation set. Thus, no
future samples are used to predict past samples. Specifically, the size of each training sample
is 80% of the data across each time slice, and the test set contains 5% of the data. The process is
repeated with different training sets, where the training size is not varied through the time slicing.
Therefore, a series of training and test sets is generated and used for training and evaluating the
models. Afterwards, the prediction performance is computed by averaging the metrics over the
validation sets, with the exception of the training runtime. The performance of the four models in
predicting the one-day ahead stock price, using features with and without the PCA transformation,
is shown in Figure 4. The number of iterations is set at 100 for each of the ensembles considered for
a more even comparison, but we should note that the number of iterations is a parameter that can
be further optimized through cross-validation to achieve better results.
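As a sketch of this validation scheme (assumed names: train_pca for the predictor matrix and target for the 1-day ahead price; the window sizes mirror the 80%/5% split described above, and caret's createTimeSlices is one way, though not necessarily ours, to build the slices):

    library(caret)
    library(randomForest)

    n <- nrow(train_pca)
    slices <- createTimeSlices(1:n,
                               initialWindow = floor(0.80 * n),  # training window
                               horizon       = floor(0.05 * n),  # validation window
                               fixedWindow   = TRUE)             # training size held constant

    mape <- function(y, yhat) 100 * mean(abs((y - yhat) / y))

    # Fit an ensemble (RFR shown) on each slice and average the validation MAPE
    mape_by_slice <- mapply(function(tr, te) {
      fit <- randomForest(x = train_pca[tr, ], y = target[tr], ntree = 100)
      mape(target[te], predict(fit, train_pca[te, ]))
    }, slices$train, slices$test)

    mean(mape_by_slice)

Because every validation index follows its training window, no future samples leak into the fit.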
Several conclusions/observations can be made from Figure 4. First, the test error shows
improvement for most ensembles as information on news sentiment, trends and other online sources
is appended to the technical indicators as predictors (stages). The SVRE model is the exception,
showing a consistently worsening performance as more variables are added, which is indicative of
overfitting. The use of PCA has a positive impact on predictive performance in most criteria analyzed.
Overall, the Boosting (BRT) and the (RFR) have the best average performance in most of the
metrics analyzed, which also includes consistency from training to testing performance. For
instance, if we use the MAPE for illustration, ∆MAPE = 100 × |MAPE_Test − MAPE_Train| / MAPE_Train
for both models is an impressive 2.16% and 6.3%, respectively. The SVRE and NNRE ensembles
present a drop in consistency performance in line with results typically published in the machine
learning literature, ∆MAPE < 20%. From a practical perspective, all 8 models (4 models × 2
[i.e., PCA/no PCA]) can predict the 1-day ahead price of the stock with a MAPE ≤ 1.5% (with 6
models under 1%) using all available predictors (Stage 4).

Figure 4: Performance of the NNRE, SVRE, BRT and RFR ensembles at each stage.

Secondly, we can see that the use of PCA improves the predictive performance of the ensembles
in some instances (irrespective of which metric is used for evaluation), but in other cases it does not.
From a practitioner's perspective, the decision to use PCA or not hinges on two factors: (a)
what is an acceptable MAPE for the testing data (e.g., do they pick the best model or any model
under a pre-specified acceptable MAPE?); and (b) how much time they are willing to dedicate to
training the model. For the first factor, three of the six models with a MAPE under 1% involved
the use of PCA, so there is not a significant drop in performance by choosing either option. For the
second factor, as can be seen in Figure 4, the use of PCA can reduce model training time significantly.
For the BRT, NNRE, RFR and SVRE, the corresponding reductions in runtime for the Stage 4
models are 58%, 268%, 10%, and 41%, respectively. We note that there exists a significant difference
in runtime between the different ensemble methods (irrespective of whether PCA is used or not). We
attribute this to both the complexity of the method and also to the availability of optimized R
packages. Thus, our results (for a given ensemble model) may not be typical if Python or some
other software/programming language is used. Hereafter, we focus on the ensembles' performance
using PCA, since the performance is similar and the training time is shorter than for the no-PCA
models. Note that a model predicting next-day stock prices can be feasibly implemented as long as
the time to train does not exceed the time between the close of trading and the next day's opening.
Figure 4 does not provide insights into the effectiveness of each ensemble in capturing the
turning points in the stock price. To overcome this limitation, we depict the Stage 4 prediction
performance of the competing ensembles over time in Figures 5 and 6. Figure 5 illustrates the
prediction pattern of the competing ensemble models predicting the $C stock price compared to
the actual price. Figure 6 shows the prediction bias, defined as b = |y − ŷ|. It is interesting to
point out that some patterns can be observed in terms of the predictive errors, but the main finding
is that the BRT and RFR methods have the smallest prediction errors and bias. An interesting
observation is that when the stock price is stable (i.e., it only changes within a small range), the SVRE
method overestimates the volatility by exaggerating the amplitude, as well as the frequency, of the
oscillations, resulting in the highest test error rate; however, the SVRE ensemble does a decent job
at predicting price turns. A finely tuned SVRE might be better suited to predicting price turns and
an opportunity for arbitrage.

[Time series of the actual $C price and the NNRE, SVRE, BRT and RFR predictions; y-axis: Price; x-axis: Date, 2015-12-29 to 2016-11-02]

Figure 5: Ensemble predictions and actual price of the $C stock over time

From the above discussion, we have found that the performance of the BRT and RFR ensembles
is better than that of the SVRE and NNRE ensembles for one-day ahead stock price prediction, not
only in terms of predictive performance but also in consistency and faster runtimes. To formally
understand the usefulness of our approach as the prediction window increases, we consider the
forecasting performance for up to 10 lags. We use the notation Lag X for the target that predicts the
price X days ahead of the market. As an example, we consider the BRT ensemble using PCA. The
results are presented in Table 4. From the results, it is clear that the performance has a decreasing
trend as the prediction period increases.

[Four panels showing the prediction bias over time for the NNRE, SVRE, BRT and RFR ensembles; x-axis: Date, 2015-12-29 to 2016-12-01]
Figure 6: Prediction bias over time

Based on the natural volatility of the market, the rate of change in prices is commonly larger in
long-term predictions than in short-term ones. Moreover, the results validate that the features
obtained from Internet sources, such as Google Trends, significantly shock the stock market only
for a relatively short period (one or two days); therefore, the impact of these variables on the
predictive performance gradually diminishes. An analysis of the importance of each variable for
the predictive performance across lags shows that the technical indicators remain considerably
important in terms of prediction power, but the importance of the variables obtained from online
sources varies significantly after Lag 3. The reader is referred to our online supplemental material
at https://github.com/martinwg/stockprediction for more information on variable importance.
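One way such an importance analysis could be carried out, as a sketch for the RFR ensemble (the fit must be trained with importance = TRUE; train_features and target are placeholder names):

    library(randomForest)

    rf_fit <- randomForest(x = train_features, y = target,
                           ntree = 100, importance = TRUE)

    # Permutation importance (%IncMSE for regression); top-10 predictors
    imp <- importance(rf_fit, type = 1)
    head(imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE], 10)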
Table 4: The performance of the BRT ensemble on different targets

MAE MAPE RMSE MAE MAPE RMSE


Lag 1 0.349 0.787 0.482 Lag 6 0.748 1.690 1.020
Lag 2 0.668 1.371 0.849 Lag 7 0.778 1.760 1.080
Lag 3 0.754 1.555 0.952 Lag 8 0.788 1.790 1.110
Lag 4 0.662 1.490 0.868 Lag 9 0.837 1.890 1.150
Lag 5 0.759 1.720 1.010 Lag 10 0.800 1.790 1.120
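
To illustrate how the Lag X targets could be constructed, the short base-R sketch below shifts the closing price so that each row's target is the price X days ahead (close and make_lag_target are illustrative names; trailing rows without a future price are dropped before training):

    # close: a numeric vector of daily closing prices (placeholder)
    make_lag_target <- function(close, x) {
      n <- length(close)
      c(close[(x + 1):n], rep(NA, x))  # price x trading days ahead
    }

    lag_1  <- make_lag_target(close, 1)   # next-day price (Lag 1)
    lag_10 <- make_lag_target(close, 10)  # two trading weeks ahead (Lag 10)

    keep <- !is.na(lag_10)  # drop rows with no future price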

3.5. Evaluating the generalizability of our expert system

In this subsection, we analyze 19 additional stocks to gauge the prediction performance of our
expert system under a wider range of industries, volatilities, growth patterns and general conditions.
Table 5 shows the test MAE, MAPE and RMSE under the same methodological conditions as the $C
case study, for both the PCA and no-PCA formulations. The stocks have been chosen to evaluate how
the proposed methodologies would perform under different circumstances. For instance, Amazon's
($AMZN) stock was consistently increasing in price across the analysis period, while: (a) Duke
Energy's stock ($DUK) had both periods of growth and decline; and (b) McDonald's ($MCD)
stock price was very stable. In addition, these 19 stocks capture several industries: (a) retail (e.g.,
Amazon and Walmart), (b) restaurants (e.g., McDonald's), (c) medical industries (e.g., Pfizer), (d)
energy and oil & gas (e.g., Chevron and Duke Energy), (e) technology stocks (e.g., Facebook,
IBM and Twitter), (f) communications (e.g., Time Warner and Verizon), etc. In this analysis, we
have also included a HYBRID method that averages the predictions of the proposed ensembles in
an effort to obtain a more stable predictive model; a minimal sketch follows.
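A minimal sketch of the HYBRID combination, assuming pred_nnre, pred_svre, pred_brt and pred_rfr hold each ensemble's test-set predictions:

    # Equally weighted average of the four ensemble predictions
    pred_hybrid <- rowMeans(cbind(pred_nnre, pred_svre, pred_brt, pred_rfr))

Equal weights trade peak accuracy for stability, which is consistent with the behavior of the HYBRID rows in Table 5.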

Table 5: The performance of the ensemble models for the 1-day ahead price prediction on different stocks

No PCA w/PCA
MAE MAPE RMSE MAE MAPE RMSE
Amazon.com, Inc. ($AMZN)
NNRE 5.9176 0.8308 7.6973 6.6202 0.9296 8.6108
SVRE 12.2509 1.7032 15.5321 17.9721 2.5373 22.6149
BRT 4.2573 0.6009 5.6486 7.2265 1.0279 9.6373
RFR 3.2064 0.4538 4.5678 9.5250 1.3088 12.7378
HYBRID 5.0736 0.7115 6.3352 7.4839 1.0472 9.0729
Apple Inc. ($AAPL)
NNRE 0.8276 0.8104 1.0929 0.8969 0.8768 1.1781
SVRE 1.9908 1.9623 2.5911 2.8127 2.7419 3.7080
BRT 0.5194 0.5064 0.7030 0.5318 0.5165 0.6927
RFR 0.3905 0.3831 0.5481 0.7609 0.7507 0.9985
HYBRID 0.7322 0.7204 0.9918 0.9896 0.9636 1.2188
Chevron Corporation ($CVX)
NNRE 0.6260 0.7951 0.8192 0.6645 0.8458 0.8795
SVRE 0.5521 0.6952 0.6535 0.3812 0.4799 0.3821
BRT 0.4755 0.6037 0.6532 0.2626 0.3312 0.3449
RFR 0.3118 0.3962 0.4335 0.6537 0.8187 0.8888
HYBRID 0.4530 0.5755 0.6088 0.4451 0.5625 0.5855
The Coca-Cola Company ($KO)
NNRE 0.2372 0.5663 0.3277 0.2477 0.5904 0.3405
SVRE 0.5100 1.2150 0.6748 0.6927 1.6560 0.9037
BRT 0.1650 0.3941 0.2184 0.1161 0.2768 0.1488
RFR 0.1105 0.2643 0.1595 0.2098 0.4964 0.2921
HYBRID 0.1651 0.3946 0.2267 0.1653 0.3929 0.2226

The Walt Disney Company ($DIS)
NNRE 0.5341 0.5325 0.7111 0.6318 0.6298 0.8699
SVRE 0.3974 0.4064 0.3985 1.1131 1.1384 1.1161
BRT 0.2845 0.2866 0.3661 0.4624 0.4611 0.6346
RFR 0.3410 0.3366 0.4664 0.8705 0.8694 1.1494
HYBRID 0.3408 0.3398 0.4689 0.5379 0.5384 0.7096
Duke Energy ($DUK)
NNRE 0.5242 0.7041 0.6843 0.5735 0.7717 0.7334
SVRE 1.4937 1.9983 1.8603 1.6727 2.2360 2.2100
BRT 0.4723 0.6331 0.6187 0.5222 0.6997 0.7092
RFR 0.2625 0.3525 0.3485 0.5978 0.7923 0.7445
HYBRID 0.4015 0.5387 0.5290 0.4938 0.6592 0.6208
Facebook, Inc. ($FB)
NNRE 0.9674 0.8149 1.2636 0.9939 0.8374 1.3558
SVRE 2.1867 1.8412 2.7752 2.8258 2.3646 3.4875
BRT 0.6541 0.5504 0.8697 0.5956 0.5047 0.7643
RFR 0.4401 0.3709 0.6570 1.1996 0.9883 1.5523
HYBRID 0.6497 0.5476 0.8877 0.7859 0.6566 1.0118
IBM ($IBM)
NNRE 0.9646 0.6590 1.2282 0.9740 0.6637 1.2621
SVRE 2.4874 1.6923 3.1237 4.2481 2.8866 5.6314
BRT 0.8689 0.5941 1.1909 0.9619 0.6543 1.2109
RFR 0.4842 0.3311 0.6830 1.0816 0.7280 1.4377
HYBRID 0.7357 0.5030 0.9808 0.8432 0.5724 1.0923
Marriott International, Inc. ($MAR)
NNRE 0.5972 0.8775 0.8443 0.6549 0.9603 0.9148
SVRE 1.3910 2.0313 1.6977 1.4811 2.1612 1.9069
BRT 0.5193 0.7628 0.7581 0.4342 0.6389 0.6029
RFR 0.2969 0.4348 0.4517 0.4891 0.6965 0.6822
HYBRID 0.4507 0.6621 0.6584 0.4612 0.6716 0.6376
McDonald's ($MCD)
NNRE 0.6565 0.5663 0.8464 0.7149 0.6171 0.9419
SVRE 0.6815 0.5879 0.6820 0.4747 0.4095 0.4750
BRT 0.4819 0.4162 0.6410 0.3914 0.3370 0.4984
RFR 0.3328 0.2867 0.4627 0.7935 0.6752 1.0200
HYBRID 0.4653 0.4014 0.6216 0.5319 0.4557 0.6697
Newmont Mining Corporation ($NEM)
NNRE 0.6236 1.9175 0.8588 0.6880 2.1332 0.9638
SVRE 0.9635 3.0024 1.2410 1.0608 3.3064 1.4018
BRT 0.2370 0.7175 0.3316 0.7613 2.3448 1.0122
RFR 0.3408 1.0342 0.4702 1.2043 3.5748 1.5026
HYBRID 0.4578 1.4080 0.6214 0.6787 2.0560 0.8547

Nike, Inc. ($NKE)
NNRE 0.4540 0.8352 0.5807 0.4887 0.8989 0.6383
SVRE 0.4024 0.7449 0.4032 0.2348 0.4345 0.2352
BRT 0.3357 0.6173 0.4438 0.3911 0.7180 0.4943
RFR 0.2200 0.4051 0.2975 0.3916 0.7359 0.5101
HYBRID 0.3174 0.5840 0.4185 0.3704 0.6847 0.4819
Pfizer, Inc. ($PFE)
NNRE 0.2033 0.6447 0.2791 0.2120 0.6704 0.2901
SVRE 0.5420 1.7113 0.5429 0.2598 0.8201 0.2602
BRT 0.2088 0.6616 0.2917 0.1088 0.3449 0.1469
RFR 0.1054 0.3349 0.1524 0.2092 0.6536 0.2708
HYBRID 0.1614 0.5118 0.2303 0.1372 0.4280 0.1733
Prudential Financial ($PRU)
NNRE 0.8024 1.0720 1.0596 0.8693 1.1631 1.1447
SVRE 1.3340 1.7916 1.6525 1.6573 2.2272 2.1813
BRT 0.6568 0.8891 0.9006 0.6677 0.8886 0.8571
RFR 0.4090 0.5432 0.5942 0.9161 1.1956 1.1942
HYBRID 0.6335 0.8552 0.8145 0.7036 0.9334 0.9082
Southwest Airlines Co. ($LUV)
NNRE 0.4383 1.0822 0.6212 0.4717 1.1578 0.6608
SVRE 0.9012 2.1956 1.1364 0.8364 2.0293 1.0690
BRT 0.3839 0.9493 0.5803 0.1907 0.4672 0.2590
RFR 0.2196 0.5402 0.3363 0.3961 0.9549 0.5414
HYBRID 0.3645 0.8986 0.5076 0.3473 0.8466 0.4546
Time Warner, Inc. ($TWX)
NNRE 0.4767 0.7035 0.7142 0.5091 0.7518 0.7671
SVRE 1.0651 1.5650 1.3819 1.3136 1.9340 1.7119
BRT 0.3221 0.4739 0.4669 0.2887 0.4253 0.3952
RFR 0.2140 0.3159 0.3286 0.5184 0.7564 0.7210
HYBRID 0.4179 0.6153 0.5503 0.4966 0.7305 0.6500
Twitter Inc. ($TWTR)
NNRE 0.5100 2.8188 0.8240 0.4730 2.6803 0.6979
SVRE 0.6580 3.5255 1.1659 0.4998 2.9238 0.6554
BRT 0.4852 2.6492 0.8032 0.4553 2.6671 0.6045
RFR 0.4967 2.6253 0.8559 0.3922 2.8556 0.5181
HYBRID 0.4634 2.5165 0.7890 0.3895 2.2544 0.5345
Verizon Communications ($VZ)
NNRE 0.2775 0.5714 0.3540 0.3219 0.6642 0.4092
SVRE 1.0309 2.1205 1.0319 0.8128 1.6718 0.8135
BRT 0.2080 0.4290 0.2669 0.1621 0.3343 0.2123
RFR 0.1515 0.3124 0.1982 0.3246 0.6576 0.4442
HYBRID 0.2033 0.4194 0.2600 0.2307 0.4730 0.2986

Wal-mart Stores, Inc. ($WMT)
NNRE 0.4487 0.6627 0.6749 0.4990 0.7373 0.7301
SVRE 1.0222 1.5010 1.3245 1.1679 1.7063 1.5083
BRT 0.3307 0.4871 0.4853 0.2381 0.3521 0.3322
RFR 0.2131 0.3148 0.3396 0.4643 0.6791 0.6444
HYBRID 0.3199 0.4721 0.4873 0.3360 0.4957 0.4916

There are five main observations from Table 5. First, the mean absolute percentage error
(MAPE) was, on average, lower for the simulations with no PCA than for those with PCA. Second,
the BRT and RFR methodologies are the most predictive for the investigated stocks, with the BRT
method being, on average, the top performer. The third observation pertains to the performance
of the SVRE, which typically had the lowest performance among the different ensemble methods.
Fourth, the HYBRID method does not result in an improved performance over the best-performing
ensembles considered; however, the HYBRID method does perform slightly better than the average
of the four ensemble methods. The fifth, and perhaps the most important, observation is that the
expert system performs strongly across the different scenarios considered; for example, the BRT
no-PCA method has a MAPE under 0.75% for 18 of the 19 stocks considered.

4. CONCLUDING REMARKS AND FUTURE WORK

4.1. An overview of the impacts and contributions of our proposed expert system

In this paper, we propose a financial expert system that can be used to predict the 1-day ahead
stock price. While there are several proposed financial expert systems in the literature, our
approach has three main unique characteristics. First, our "knowledge base" combines five different
data sources: (a) traditional predictors extracted from stock market data, (b) features/insights
extracted from financial news, (c) features capturing public interest based on Google Trends, (d)
features capturing the public's interest in a stock's related Wikipedia pages, and (e) technical
indicators applied to the aforementioned four data sources. To the best of our knowledge, this is the
first financial expert system for predicting stock prices that combines these data sources together.
Typically, the methodologies in the literature focus on either traditional or online sources (with
limited/no methods that combine stock data with both technical indicators and different sources of
online data). The underlying hypothesis behind the construction of our "knowledge base" is that the
prediction performance of the AI platform will improve by integrating disparate data sources
in the knowledge base. This hypothesis is founded on evidence from both the data mining
and stock movement (up/down) prediction literatures. Second, our AI platform trains ensemble
models to predict stock prices over multiple time periods. As shown in Table 1, multiple time
period prediction using ensembles has received limited attention in the literature. This is somewhat
surprising since: (a) ensemble methods generally outperform single classifiers; and (b) predicting the
price over multiple time periods provides investors with more information and thus can potentially
lead to better decision-making. Third, by making our code publicly available, both investors and
researchers can utilize our expert system in predicting the price of any stock. The models will be
retrained based on each stock picked by the user.
To demonstrate the utility of our system, we presented a case study based on the Citi Group
stock ($C) utilizing data from 01/01/2013 - 12/31/2016. From our case study, the AI platform
identified the boosted regression tree (BRT) and the random forest regression (RFR) as the best
models for predicting the 1-day ahead stock price. Based on our analysis, the predictions from the
BRT model for any of the time periods have a test mean absolute percent error (MAPE) ≤ 1.5%.
Based on the $C case study, 6 out of the 10 ensemble models we applied using both the online and
traditional data sources achieved a MAPE < 1% on the 1-day ahead test dataset. Compared
to the literature, which uses either single data sources or individual/ensemble learning models as
shown in Table 1, our results are very competitive with the reported MAPEs, which are above 1%.
In addition, using the BRT ensemble to predict the price for a time window of 10 days, the mean
absolute percent error for all periods was ≤ 1.89%. A closer examination of the features extracted
for these time periods indicates that online data contributes significantly to the prediction accuracy.
However, the importance of those online features reduces or varies significantly over time. While
this makes sense, since online data seem to capture the crowd's interest at a given moment, this
observation has not been reported previously in the literature. We believe that this is due to the
limited research performed on combining different data sources for multiple time period prediction.
To assess the generalizability of our expert system, we investigated its 1-day ahead prediction
performance for 19 additional stocks. The stocks were chosen to capture different industries,
volatilities and growth patterns. The results summarized in Table 5 indicate that the observations
obtained from the main case study (involving the $C stock) can be extended to a wide variety of
stocks. In addition, our results indicate that the BRT approach typically outperforms the four other
predictive models investigated.

4.2. Using our expert system in practice: Some advice to investors

Accurately predicting stock prices and estimated returns is the "dream" of every investor. In
this paper, we present an ensemble-based approach to predict the 1-day ahead stock price using
various data sources. Based on our results, we believe that a 1-day ahead MAPE of ≤ 0.75% has
the potential to be informative for investors. We make our code publicly available for further
evaluation and application to different stocks. Since we do not require a potential investor to have
a detailed (or any) knowledge of R, we provide a tutorial on how they can modify/tweak our code
to predict the price of any U.S. stock in the future. The tutorial is hosted at
https://github.com/martinwg/stockprediction. The tutorial covers all the details, from setting up
and installing R to running our code. Note that our code shows the fundamental steps the investor
should take to scrape the online data, provided that he/she presents R with keywords for the
financial news query and the titles of the pertinent Wikipedia pages for the stock.
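As a minimal sketch of this data-collection step (the package choices and call signatures below are ours for illustration and may differ from the repository's exact code), prices can be pulled through the Quandl API (Raymond McTaggart et al., 2016), and the public-interest series through packages such as gtrendsR and wikipediatrend:

    library(Quandl)          # assumes a Quandl API key has been set
    library(gtrendsR)
    library(wikipediatrend)

    # Historical prices for a chosen ticker
    prices <- Quandl("WIKI/C", start_date = "2013-01-01", end_date = "2016-12-31")

    # Google search interest for the ticker's keyword
    trend <- gtrends(keyword = "citigroup", time = "2013-01-01 2016-12-31")

    # Daily visits to the pertinent Wikipedia page
    wiki <- wp_trend(page = "Citigroup", from = "2013-01-01", to = "2016-12-31")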
Over the past couple of years, there has been an increasing number of articles on the use of
artificial intelligence for automating trading decisions (see, e.g., the investigation in Wired by Metz
(2016)). Thus, it is important to highlight two major differences in motivation and scope between
our endeavor and the efforts highlighted in Metz (2016). First, we have released all the details
behind our approach. While transparency is important in the context of academic research, it
does not carry the same connotation in the context of arbitraging, as any competitive advantage is
lost once methodologies are publicly available. However, we believe the insights from our research
can be generalized. Specifically, it is important to: (a) consider how the stock will perform over
multiple time horizons; and (b) incorporate non-traditional data sources, which can improve
prediction performance. Second, our expert system does not include an optimization or a
decision-making engine. This is primarily because the overarching (practical) goal of this research
is to provide an investor with: (a) a novel data-driven forecast which has predictive potential; or
(b) insights into some predictors that should be considered prior to making an investment decision.
These forecasts or insights can also be incorporated as part of a larger model.

4.3. Limitations and future research

Despite the predictive performance of our method, there are some limitations in our study that
need to be highlighted. In the previous paragraph, we highlighted some limitations from a practical
perspective. Here, we highlight some of the limitations from a research viewpoint. First, we have
only examined the utility of our model for predicting the price from 1 up to 10 days ahead. While
this represents up to two trading weeks, there is no standard definition for what constitutes short-
term stock predictions. The range can be in minutes/hours, as in Geva & Zahavi (2014); Schumaker
& Chen (2009), and can go up to a month (see Wang et al. (2011); Khansa & Liginlal (2011); Wang
et al. (2012)). We have not investigated how our models would work at these extreme ends of the
short-term prediction time frame, especially since some of our predictors cannot be obtained at a
finer granularity (e.g., Wikipedia releases its traffic information per hour and Google releases its
trends by day). Second, our analysis was limited to 20 U.S.-based stocks (the Citi stock and the 19
additional stocks presented in Table 5) during the time period from 2013-2016. We did not attempt
to monitor any indices or stocks from non-US markets. It is not clear whether the performance in
our case study would translate to future time frames and/or other stocks. The reader should note
that this is a limitation of any machine learning model. We attempted to mitigate the effect of this
limitation by making our code freely available, to encourage other researchers to apply our method
to future time periods and/or other datasets. Third, our financial expert system currently has no
mechanism for detecting its obsolescence, i.e., when it needs to be retrained. While this is a common
limitation in the stock market prediction literature, there exist some statistical surveillance tools
that can be used to detect a change in the model's performance. The reader is referred to
Megahed & Jones-Farmer (2015) for an introductory discussion.
In our estimation, there are three major opportunities for future research. First, with the
exception of using technical indicators to generate features, we did not capitalize on the time-series
nature of the stock market. Other researchers can investigate whether using: (a) additional features
that capture the time-series nature of the price; or (b) ensemble approaches that can capitalize
on this inherent property of the data (e.g., a recurrent neural network, which considers the time
effect while connecting neuron layers) can improve the prediction performance. Second, researchers
can examine how a firm's location affects the importance of the different predictors. For example,
Alibaba ($BABA) and Amazon ($AMZN) are direct competitors in the global market. $BABA
trades on the NYSE and $AMZN trades on NASDAQ. However, their operational footholds differ
significantly, with Alibaba predominantly in China and Amazon in the U.S. Thus, it would be
interesting to see how these differences affect the predictors' importance and the AI's accuracy.
Third, it is logical to extend our system into a trading engine, which uses our predictions to
maximize returns while minimizing investment risk.
In summary, this paper proposed a novel financial expert system for predicting short-term stock
prices. Our expert system is comprised of: (a) a detailed knowledge base that captures data from
both traditional and online sources; and (b) an AI platform that utilizes ensembles and a hybrid
model to predict the price over multiple time periods. We have shown that our expert system tackles
a gap in the literature, and we hypothesized that our proposed system would perform better than its
predecessors in the literature since it captures more information and utilizes superior artificial
intelligence methodologies. From our analysis, we have shown that our system has excellent
predictive performance. To the best of our knowledge, the error rates achieved by our proposed
method are lower than those reported in the literature. In this paper, we have also presented
some advice to investors and outlined three major future research streams that can build on the
limitations of our work. Our code and data are made available at
https://github.com/martinwg/stockprediction to encourage researchers to reproduce and/or extend
our work.

REFERENCES

Abdullah, M., & Ganapathy, V. (2000). Neural network ensemble for financial trend prediction. In TENCON 2000 Proceedings (pp. 157–161). IEEE, volume 3.

Alkhatib, K., Najadat, H., Hmeidi, I., & Shatnawi, M. K. A. (2013). Stock price prediction using k-nearest neighbor (kNN) algorithm. International Journal of Business, Humanities and Technology, 3, 32–44.

Araújo, R. d. A., Oliveira, A. L., & Meira, S. (2015). A hybrid model for high-frequency stock market forecasting. Expert Systems with Applications, 42, 4081–4096.

Arlot, S., Celisse, A. et al. (2010). A survey of cross-validation procedures for model selection. Statistics Surveys, 4, 40–79.

Ballings, M., Van den Poel, D., Hespeels, N., & Gryp, R. (2015). Evaluating multiple classifiers for stock price direction prediction. Expert Systems with Applications, 42, 7046–7056.

Barak, S., & Modarres, M. (2015). Developing an approach to evaluate stocks by forecasting effective features with data mining methods. Expert Systems with Applications, 42, 1325–1339.

Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2, 1–8.

Booth, A., Gerding, E., & Mcgroarty, F. (2014). Automated trading with performance weighted random forests and seasonality. Expert Systems with Applications, 41, 3651–3661.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory (pp. 144–152). ACM.

Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.

Chen, Y., & Hao, Y. (2017). A feature weighted support vector machine and k-nearest neighbor algorithm for stock market indices prediction. Expert Systems with Applications, 80, 340–355.

Chen, Y., Yang, B., & Abraham, A. (2007). Flexible neural trees ensemble for stock index modeling. Neurocomputing, 70, 697–703.

Chong, E., Han, C., & Park, F. C. (2017). Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies. Expert Systems with Applications, 83, 187–205.

Cootner, P. (1964). The random character of stock market prices. M.I.T. Press. URL: https://books.google.com/books?id=jW9gT8U6dqQC.

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.

Dietterich, T. G. (2000a). Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems (pp. 1–15). Springer.

Dietterich, T. G. (2000b). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40, 139–157.

Drucker, H., Burges, C. J., Kaufman, L., Smola, A., Vapnik, V. et al. (1997). Support vector regression machines. Advances in Neural Information Processing Systems, 9, 155–161.

Fama, E. F. (1965). The behavior of stock-market prices. The Journal of Business, 38, 34–105. URL: http://www.jstor.org/stable/2350752.

Fama, E. F., Fisher, L., Jensen, M. C., & Roll, R. (1969). The adjustment of stock prices to new information. International Economic Review, 10, 1–21.

Fodor, I. K. (2002). A survey of dimension reduction techniques. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, 9, 1–18.

Foresee, F. D., & Hagan, M. T. (1997). Gauss-Newton approximation to Bayesian learning. In Neural Networks, 1997, International Conference on (pp. 1930–1935). IEEE, volume 3.

Freund, Y. (1990). Boosting a weak learning algorithm by majority. In COLT (pp. 202–216), volume 90.

Freund, Y., & Schapire, R. E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. In European Conference on Computational Learning Theory (pp. 23–37). Springer.

Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning, volume 1. Springer Series in Statistics. Springer, Berlin.

Geva, T., & Zahavi, J. (2014). Empirical evaluation of an automated intraday stock recommendation system incorporating both market data and textual news. Decision Support Systems, 57, 212–223.

Gidofalvi, G. (2001). Using news articles to predict stock price movements. Technical Report, Department of Computer Science and Engineering, University of California, San Diego. URL: http://people.kth.se/~gyozo/docs/financial-prediction.pdf.

Göçken, M., Özçalıcı, M., Boru, A., & Dosdoğru, A. T. (2016). Integrating metaheuristics and artificial neural networks for improved stock price prediction. Expert Systems with Applications, 44, 320–331.

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org.

Guresen, E., Kayakutlu, G., & Daim, T. U. (2011). Using artificial neural network models in stock market index prediction. Expert Systems with Applications, 38, 10389–10397.

Hassan, M. R., Nath, B., & Kirley, M. (2007). A fusion model of HMM, ANN and GA for stock market forecasting. Expert Systems with Applications, 33, 171–180.

Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417.

Hsu, M.-W., Lessmann, S., Sung, M.-C., Ma, T., & Johnson, J. E. (2016). Bridging the divide in financial market forecasting: machine learners vs. financial economists. Expert Systems with Applications, 61, 215–234.

Kao, L.-J., Chiu, C.-C., Lu, C.-J., & Yang, J.-L. (2013). Integration of nonlinear independent component analysis and support vector regression for stock price forecasting. Neurocomputing, 99, 534–542.

Kara, Y., Boyacioglu, M. A., & Baykan, Ö. K. (2011). Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul Stock Exchange. Expert Systems with Applications, 38, 5311–5319.

Kearns, M. J., & Valiant, L. G. (1988). Learning Boolean formulae or finite automata is as hard as factoring. Harvard University, Center for Research in Computing Technology, Aiken Computation Laboratory.

Khansa, L., & Liginlal, D. (2011). Predicting stock market returns from malicious attacks: A comparative analysis of vector autoregression and time-delayed neural networks. Decision Support Systems, 51, 745–759.

Kim, K.-j., & Han, I. (2000). Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index. Expert Systems with Applications, 19, 125–132.

Kristjanpoller, W., Fadic, A., & Minutolo, M. C. (2014). Volatility forecast using hybrid neural network models. Expert Systems with Applications, 41, 2437–2442.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436–444.

Lin, X., Yang, Z., & Song, Y. (2009). Short-term stock price prediction based on echo state networks. Expert Systems with Applications, 36, 7313–7317.

MacKay, D. J. (1992). Bayesian interpolation. Neural Computation, 4, 415–447.

Maclin, R., & Opitz, D. (1997). An empirical evaluation of bagging and boosting. AAAI/IAAI, 1997, 546–551.

Maclin, R., & Opitz, D. (2011). Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, 11, 169–198.

Malkiel, B. G. (2003). The efficient market hypothesis and its critics. The Journal of Economic Perspectives, 17, 59–82.

Martinez, W., & Gray, J. B. (2016). Noise peeling methods to improve boosting algorithms. Computational Statistics & Data Analysis, 93, 483–497.

McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5, 115–133.

Meesad, P., & Rasel, R. I. (2013). Predicting stock market price using support vector regression. In Informatics, Electronics & Vision (ICIEV), 2013 International Conference on (pp. 1–6). IEEE.

Megahed, F. M., & Jones-Farmer, L. A. (2015). Statistical perspectives on big data. In Frontiers in Statistical Quality Control 11 (pp. 29–47). Springer.

Metz, C. (2016). The rise of the artificially intelligent hedge fund. Wired Inc., http://fortune.com/2012/02/25/buffett-beats-the-sp-for-the-39th-year/. [Online, last accessed 08/08/2017].

Moat, H. S., Curme, C., Avakian, A., Kenett, D. Y., Stanley, H. E., & Preis, T. (2013). Quantifying Wikipedia usage patterns before stock market moves. Scientific Reports, 3.

Mok, P., Lam, K., & Ng, H. (2004). An ICA design of intraday stock prediction models with automatic variable selection. In Neural Networks, 2004, Proceedings, 2004 IEEE International Joint Conference on (pp. 2135–2140). IEEE, volume 3.

Murphy, J. J. (1999). Technical analysis of the financial markets: A comprehensive guide to trading methods and applications. Penguin.

Nassirtoussi, A. K., Aghabozorgi, S., Wah, T. Y., & Ngo, D. C. L. (2015). Text mining of news-headlines for forex market prediction: A multi-layer dimension reduction algorithm with semantics and sentiment. Expert Systems with Applications, 42, 306–324.

Nguyen, T. H., Shirai, K., & Velcin, J. (2015). Sentiment analysis on social media for stock movement prediction. Expert Systems with Applications, 42, 9603–9611.

Nofsinger, J. R. (2005). Social mood and financial economics. The Journal of Behavioral Finance, 6, 144–160.

Oliveira, N., Cortez, P., & Areal, N. (2017). The impact of microblogging data for stock market prediction: Using Twitter to predict returns, volatility, trading volume and survey sentiment indices. Expert Systems with Applications, 73, 125–144.

Patel, J., Shah, S., Thakkar, P., & Kotecha, K. (2015a). Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Systems with Applications, 42, 259–268.

Patel, J., Shah, S., Thakkar, P., & Kotecha, K. (2015b). Predicting stock market index using fusion of machine learning techniques. Expert Systems with Applications, 42, 2162–2172.

Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2, 559–572.

Prechter Jr, R. R., & Parker, W. D. (2007). The financial/economic dichotomy in social behavioral dynamics: the socionomic perspective. The Journal of Behavioral Finance, 8, 84–108.

Preis, T., Moat, H. S., & Stanley, H. E. (2013). Quantifying trading behavior in financial markets using Google Trends. Scientific Reports, 3, 1684.

Qian, B., & Rasheed, K. (2007). Stock market prediction with multiple classifiers. Applied Intelligence, 26, 25–33.

Quinlan, J. R. (1996). Bagging, boosting, and C4.5. AAAI/IAAI, Vol. 1, (pp. 725–730).

R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.R-project.org/.

Rather, A. M., Agarwal, A., & Sastry, V. (2015). Recurrent neural network and a hybrid model for prediction of stock returns. Expert Systems with Applications, 42, 3234–3241.

Raymond McTaggart, Gergely Daroczi, & Clement Leung (2016). Quandl: API Wrapper for Quandl.com. URL: https://CRAN.R-project.org/package=Quandl. R package version 2.8.0.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representations by error propagation. Technical Report, DTIC Document.

Russell, S., & Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, 25, 27.

Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5, 197–227.

Schapire, R. E. (2003). The boosting approach to machine learning: An overview. In Nonlinear Estimation and Classification (pp. 149–171). Springer.

Schumaker, R. P., & Chen, H. (2009). Textual analysis of stock market prediction using breaking financial news: The AZFin Text system. ACM Transactions on Information Systems (TOIS), 27, 12.

Serneels, S., De Nolf, E., & Van Espen, P. J. (2006). Spatial sign preprocessing: a simple way to impart moderate robustness to multivariate estimators. Journal of Chemical Information and Modeling, 46, 1402–1409.

Smith, V. L. (2003). Constructivist and ecological rationality in economics. The American Economic Review, 93, 465–508.

Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62, 1139–1168.

Ticknor, J. L. (2013). A Bayesian regularized artificial neural network for stock market forecasting. Expert Systems with Applications, 40, 5501–5506.

Tsai, C.-F., & Hsiao, Y.-C. (2010). Combining multiple feature selection methods for stock prediction: Union, intersection, and multi-intersection approaches. Decision Support Systems, 50, 258–269.

Tsai, C.-F., Lin, Y.-C., Yen, D. C., & Chen, Y.-M. (2011). Predicting stock returns by classifier ensembles. Applied Soft Computing, 11, 2452–2459.

Ulrich, J. (2016). TTR: Technical Trading Rules. URL: https://CRAN.R-project.org/package=TTR. R package version 0.23-1.

Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27, 1134–1142.

Vapnik, V. N., & Chervonenkis, A. J. (1974). Theory of pattern recognition. Nauka.

Wang, J.-J., Wang, J.-Z., Zhang, Z.-G., & Guo, S.-P. (2012). Stock index forecasting based on a hybrid model. Omega, 40, 758–766.

Wang, J.-Z., Wang, J.-J., Zhang, Z.-G., & Guo, S.-P. (2011). Forecasting stock indices with back propagation neural network. Expert Systems with Applications, 38, 14346–14355.

Wang, L., Zeng, Y., & Chen, T. (2015). Back propagation neural network with adaptive differential evolution algorithm for time series forecasting. Expert Systems with Applications, 42, 855–863.

Weng, B., Ahmed, M. A., & Megahed, F. M. (2017a). Stock market one-day ahead movement prediction using disparate data sources. Expert Systems with Applications, 79, 153–163.

Weng, B., Tsai, Y.-T., Li, C., Barth, J. R., Martinez, W., & Megahed, F. M. (2017b). An ensemble based approach for major U.S. stock and sector indices prediction. Applied Soft Computing, Under Review.

Werbos, P. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvard University.

Woschnagg, E., & Cipan, J. (2004). Evaluating forecast accuracy. University of Vienna, Department of Economics.

Zhai, Y., Hsu, A., & Halgamuge, S. K. (2007). Combining news and technical indicators in daily stock price trends prediction. In International Symposium on Neural Networks (pp. 1087–1096). Springer.

Zhang, Y., & Wu, L. (2009). Stock market prediction of S&P 500 via combination of improved BCO approach and BP neural network. Expert Systems with Applications, 36, 8849–8854.
