How To Use Machine Learning To Possibly Become A Millionaire: Predicting The Stock Market?
Our confidence interval is somewhere between 50 and 70%

Jerry Xu
Aug 30 · 8 min read

When you’re so bored with your stacks

Working on Wall Street is just as intense and rewarding as you would imagine. Lots
of suits and lots of sullen faces and lots of cigarette smoke. Amidst all of the
craziness you’d expect from the literal financial center of the world, the actual
underlying goal of everyone there is pretty simple. At the risk of oversimplifying
things, I’ll tell you right now that finance is simply using money (either your own
or some you’ve borrowed) to get more money. The financial industry doesn’t actually
create any value; rather, it uses other factors to get returns on investments.

The stock market is one of the best-known venues through which anyone can
potentially make a fortune. If anyone could crack the code for predicting future
stock prices, they’d practically rule the world.

There’s just one problem. It’s pretty much impossible to accurately predict the
future of the stock market. So many analysts, so many researchers, so many super
smart people have tried to figure it all out. No one has been able to garner consistent
results. No one.

So what’s the point of this article? Why am I writing about using machine learning
to possibly predict the stock market? Mostly just for fun, I guess. More importantly,
however, it’s a great learning exercise for machine learning and finance.

Agenda
1. The Stocker Module

2. Moving Averages

3. Simple Linear Regression

4. K-Nearest Neighbors

5. Multilayer Perceptron

6. What You Should Do Instead

7. Areas of Improvement

8. Resources

If you want a more in-depth view of this project, or if you want to add to the code, check
out the GitHub repository.
. . .

Using the Stocker Module


The Stocker module is a simple Python library that contains a bunch of useful stock
market prediction functions. Out of the box, its predictions aren’t that accurate
(you’d do about as well flipping a coin). But with some tuning of parameters, the
results can be a lot better.

First we need to clone the GitHub repository.

!git clone https://github.com/WillKoehrsen/Data-Analysis.git

We also need to import some libraries. Now that the repo is cloned, we can import
the Stocker module as well.

!pip install quandl
!pip install pytrends

import stocker
from stocker import Stocker

Let’s create a Stocker object. I chose Google as my company, but you’re not obligated
to do the same. The Stocker module has a function called plot_stock() that does a
lot by itself.

Google’s stock is very nice

If you pay attention, you’ll notice that the dates for the Stocker object are not up
to date. The data stops at 2018-03-27. Taking a close look at the actual module code,
we’ll see that the data is taken from Quandl’s WIKI exchange. Perhaps the data is not
kept up to date?

We can use Stocker to conduct technical stock analysis, but for now we will focus on
being mediums. Stocker uses a package created by Facebook called prophet which is
good for additive modeling.

Now let’s test the Stocker predictions. We need to create a training set and a test
set. We’ll use 2014–2016 as the training set and 2017 as the test set. Let’s see
how accurate this model is.
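
For time series the split has to be chronological, so the model never sees prices from the period it is tested on. Here is a minimal sketch of that idea with pandas. The synthetic prices and the `split_by_year` helper are made up for illustration and are not part of Stocker:

```python
import numpy as np
import pandas as pd

# Synthetic daily prices standing in for real stock data (illustrative only).
dates = pd.date_range("2014-01-01", "2017-12-31", freq="B")  # business days
prices = pd.DataFrame({
    "Date": dates,
    "Adj. Close": np.linspace(500.0, 1000.0, len(dates)),
})

def split_by_year(df, train_end_year, test_year):
    """Hypothetical helper: train on rows up to train_end_year, test on test_year."""
    years = df["Date"].dt.year
    train = df[years <= train_end_year]
    test = df[years == test_year]
    return train, test

train, test = split_by_year(prices, train_end_year=2016, test_year=2017)
print(train["Date"].max(), test["Date"].min())
```

Every training date comes before every test date, which is the property a random shuffle would destroy.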

Look how terrible this prediction is!

The results are quite horrendous, with the predictions being almost as bad as a coin
flip. Let’s adjust some hyperparameters.
Here we can see the results of using different changepoints

Validating over several changepoint settings is an effective way to tune the
hyperparameters of the stock prediction algorithm.

Now we can evaluate the refined model to see if there are any improvements in the
prediction estimates.

This is only SLIGHTLY better than the previous model

Now it’s time to do the ultimate test: try our luck in the stock market (simulated, of
course).

Looks like it’s just better to buy and hold.

Even after all of that tweaking, it’s clear that simply buying and holding would
produce better returns.
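
The buy-and-hold baseline is simple arithmetic: buy a fixed number of shares on the first day and value them on the last. A small sketch with made-up prices (the `buy_and_hold_profit` helper is hypothetical, not Stocker's own method):

```python
def buy_and_hold_profit(prices, shares=1):
    """Profit from buying `shares` at the first price and selling at the last."""
    return shares * (prices[-1] - prices[0])

# Made-up closing prices, for illustration only.
closing = [100.0, 98.0, 105.0, 110.0, 120.0]
profit = buy_and_hold_profit(closing, shares=10)
print(profit)  # 10 * (120 - 100) = 200.0
```

Any trading strategy driven by a model has to beat this number over the same period to be worth the trouble.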
. . .

Preparing Data for Machine Learning


Now let’s move on to attempting to predict stock prices with machine learning
instead of depending on a module. For this example, I’ll be using Google stock data
using the make_df function Stocker provides.

Narrowing down the dataframe to get the stuff we care about

. . .

Moving Averages
In summary, a moving average is a commonly used indicator in technical analysis.
It’s a lagging indicator, which means that it uses past prices to predict future prices.
It’s effective in smoothing out any short-term fluctuations and finding the overall
trend. We’ll use moving averages to see if we can do a better job of predicting stock
prices.
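
To make the idea concrete, here is a minimal sketch of a simple moving average using pandas' `rolling`. The prices are made up and the 3-day window is an arbitrary choice for illustration:

```python
import pandas as pd

# Made-up closing prices, for illustration only.
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

# 3-day simple moving average: each value is the mean of the current
# price and the two before it, so the first two entries are NaN.
sma = close.rolling(window=3).mean()
print(sma.tolist())

# Naive forecast for the next day: the mean of the last 3 observed prices.
next_day = close.iloc[-3:].mean()
print(next_day)
```

On a steadily rising series the forecast (14) trails the latest price (15), which is what "lagging indicator" means in practice.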

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

import matplotlib.style
import matplotlib as mpl
mpl.style.use('ggplot')

from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 20, 10

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))

# Work on a copy of goog_data so the original dataframe is left untouched
df = goog_data.copy()

# Parse the dates and use them as the index
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
df.index = df['Date']


Here are the closing prices for Google stock

# Creating dataframe with date and the target variable
data = df.sort_index(ascending=True, axis=0)
new_data = data[['Date', 'Adj. Close']].reset_index(drop=True)

# Train-test split: first 2600 rows for training, the rest for testing
train = new_data[:2600]
test = new_data[2600:]

num = test.shape[0]  # number of test rows, also used as the averaging window

train['Date'].min(), train['Date'].max(), test['Date'].min(), test['Date'].max()

# Making predictions: each forecast is the mean of the last `num` values,
# where earlier forecasts stand in for prices that are not yet known
preds = []
for i in range(0, num):
    a = train['Adj. Close'][len(train)-num+i:].sum() + sum(preds)
    b = a/num
    preds.append(b)


Let’s measure the accuracy of our model with RMSE (Root Mean Squared Error).
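
RMSE is the square root of the average squared difference between actual and predicted values, so it is in the same units as the price. A minimal NumPy sketch, where the `rmse` helper name and the sample numbers are just for illustration:

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up numbers: the errors are 0, 0 and 2, so RMSE = sqrt(4 / 3).
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Because the errors are squared before averaging, a few large misses weigh more heavily than many small ones.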

Now let’s see our prediction plotted next to the actual prices.

Yikes

In terms of figuring out the general trend of the stock data, the moving average
method did okay, but it failed to see the full extent of the increase in the price, and
that is not good. We definitely wouldn’t want to use this method for actual
algorithmic trading.

. . .

Simple Linear Regression


Let’s try another method for predicting future stock prices: linear regression.

First let’s create a new dataset based off of the original.

# We'll create a separate dataset so that new features don't mess up the original data.
# Assumption: lr_data starts as a copy of goog_data (the listing does not show this line).
lr_data = goog_data.copy()

lr_data['Date'] = pd.to_datetime(lr_data.Date, format='%Y-%m-%d')
lr_data.index = lr_data['Date']

lr_data = lr_data.sort_index(ascending=True, axis=0)

# Keep only the date and the target column, with a fresh 0..n-1 index
new_data = lr_data[['Date', 'Adj. Close']].reset_index(drop=True)


Now let’s add some more features to the dataset for the linear regression algorithm.
We’ll be using some functions from the fastai module.

!pip install fastai==0.7.0

from fastai.structured import add_datepart

add_datepart(new_data, 'Date')
new_data.drop('Elapsed', axis=1, inplace=True)
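
The point of this step is to turn a single date column into several numeric calendar features that a regression model can use. Here is a rough sketch of the same idea in plain pandas. The `add_date_features` helper is a hypothetical stand-in that builds only a handful of columns; it is not fastai's implementation, and the prices are made up:

```python
import pandas as pd

def add_date_features(df, column):
    """Hypothetical stand-in for add_datepart: expand a date column into
    a few numeric calendar features and drop the original column."""
    out = df.copy()
    dates = pd.to_datetime(out[column])
    out["Year"] = dates.dt.year
    out["Month"] = dates.dt.month
    out["Day"] = dates.dt.day
    out["Dayofweek"] = dates.dt.dayofweek  # Monday=0 ... Sunday=6
    out["Is_month_end"] = dates.dt.is_month_end
    return out.drop(columns=[column])

# Made-up prices on three real calendar dates.
sample = pd.DataFrame({
    "Date": ["2018-03-26", "2018-03-27", "2018-04-30"],
    "Adj. Close": [100.0, 101.0, 102.0],
})
features = add_date_features(sample, "Date")
print(features)
```

Note that none of these features carry any information about the company or the market, only about the calendar.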

Now let’s do a train-test split.

# Train-test split
train = new_data[:2600]
test = new_data[2600:]

x_train = train.drop('Adj. Close', axis=1)
y_train = train['Adj. Close']
x_test = test.drop('Adj. Close', axis=1)
y_test = test['Adj. Close']

Now we can implement the algorithm and get some results.

# Implementing linear regression
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(x_train, y_train)
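
The listing above fits the model but does not show the prediction and scoring step. Here is a minimal sketch of the full fit, predict and score cycle on synthetic data. The variable names mirror the ones used in this article, but the data is a made-up straight line, not Google's prices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic upward trend standing in for prices (illustrative only).
x = np.arange(100, dtype=float).reshape(-1, 1)  # e.g. a day counter
y = 2.0 * x.ravel() + 1.0

# Chronological split: first 80 points for training, last 20 for testing.
x_train, x_test = x[:80], x[80:]
y_train, y_test = y[:80], y[80:]

model = LinearRegression()
model.fit(x_train, y_train)

# Predict the held-out period and score it with RMSE.
preds = model.predict(x_test)
rmse = float(np.sqrt(np.mean((y_test - preds) ** 2)))
print(rmse)
```

On a perfectly linear series the error is essentially zero. Real prices are not a straight line, so the best a linear model can do is extend the average trend it saw in training.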


Once again, the prediction algorithm somewhat figures out the general trend, yet it
fails to capture what we need the most.

. . .

k-Nearest Neighbors
Let’s move on to yet another machine learning algorithm, KNN.

Let’s go through the same process with the same data as the linear regression stuff.
The only difference is that we’ll be applying a different algorithm to the data.
Let’s see which predictions are better.

from sklearn import neighbors
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))

# scaling the data: fit the scaler on the training set only,
# then apply the same transformation to the test set
x_train_scaled = scaler.fit_transform(x_train)
x_train = pd.DataFrame(x_train_scaled)
x_test_scaled = scaler.transform(x_test)
x_test = pd.DataFrame(x_test_scaled)

# using gridsearch to find the best value of k
params = {'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}
knn = neighbors.KNeighborsRegressor()
model = GridSearchCV(knn, params, cv=5)

# fitting the model and predicting
model.fit(x_train, y_train)
preds = model.predict(x_test)

What are our results?


What a horror story

Yikes! This is the worst prediction we’ve got so far! There’s a reason k-nearest
neighbors is more useful for classification problems and small-scale regression. This
looks like a case of overfitting, and there is a more basic limitation too: KNN
predicts by averaging the targets of the closest training points, so it can never
output a price outside the range it saw in training. That left it completely unable
to follow the trend of where the prices are going. What’s next?
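
A tiny experiment shows why a nearest-neighbor average cannot follow a trend. This is a from-scratch sketch in NumPy on made-up data, not scikit-learn's `KNeighborsRegressor`, and the `knn_predict` helper is hypothetical:

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=3):
    """Plain k-nearest-neighbors regression on 1-D inputs:
    average the targets of the k closest training points."""
    preds = []
    for q in x_query:
        distances = np.abs(x_train - q)
        nearest = np.argsort(distances)[:k]
        preds.append(y_train[nearest].mean())
    return np.array(preds)

# Synthetic rising "prices": train on days 0-79, query days 80-99.
x_train = np.arange(80, dtype=float)
y_train = 2.0 * x_train + 1.0
x_query = np.arange(80, 100, dtype=float)

preds = knn_predict(x_train, y_train, x_query, k=3)
print(preds.max(), y_train.max())
```

Every query day lies beyond the training range, so its three nearest neighbors are always the last three training days and the prediction stays flat at their average, no matter how far the true series keeps climbing.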

Multilayer Perceptron
Let’s move into some deep learning, more specifically, neural networks. A multilayer
perceptron is one of the simplest types of neural networks, at least simpler than
convolutional neural networks and long short-term memory (LSTM) networks. We don’t
need to get into the details on how the algorithm actually works. If you’re
interested, check out the resources at the end of the article.

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.models.Sequential()

# Two hidden layers of 100 ReLU units each
model.add(tf.keras.layers.Dense(100, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(100, activation=tf.nn.relu))

# Single output unit for the predicted price
model.add(tf.keras.layers.Dense(1, activation=tf.nn.relu))

model.compile(optimizer='adam', loss='mean_squared_error')

# Convert the training data to float arrays for Keras
X_train = np.array(x_train, dtype=float)
Y_train = np.array(y_train, dtype=float)

model.fit(X_train, Y_train, epochs=500)


Let’s get our results.

This is even worse than KNN! There are several reasons why the neural network is so
bad at predicting the stock prices, and one of them is definitely the lack of
meaningful features and data. There are also many hyperparameters that could be
tweaked.

Conclusion
What did we learn today? What did all of this technical analysis show us? The
answer is quite simple: if you’re not someone like Ray Dalio or Warren Buffett or any
of the great investors, it’s very risky and ultimately not as profitable to try to
beat the stock market. According to some sources, a majority of hedge funds can’t
even do better than the S&P 500! Therefore, if you want to make the best returns on
your investments, stick to a buy-and-hold strategy. For the most part, simply
investing in an index fund that tracks the S&P 500 has yielded pretty good returns,
even when there were several big drops in the economy. In the end, it’s up to you to
decide.

Areas of Improvement
Thank you for taking the time to read through this article! Feel free to check out my
portfolio site or my GitHub.

1. Use different stock data


I only used Google stock data and for a relatively small range of time. Feel free to use
different data that can be pulled with Stocker or Yahoo Finance or Quandl.

2. Try out different machine learning algorithms


There are MANY machine learning algorithms out there that are very good. I only
used a small subset of them and only one of them was even a deep learning
algorithm.

3. Tweak more hyperparameters


This is pretty self-explanatory. More often than not, the default settings for any
algorithm are not optimal, thus it’s useful for you to try out some validation to
figure out which hyperparameters are most effective.

Resources
1. Understanding the Stock Market

2. Technical Analysis

3. What is Machine Learning?

4. Moving Averages

5. Linear Regression

6. K-Nearest Neighbors

7. Neural Networks

8. Tensorflow

9. Keras

Towards Data Science