How To Use Machine Learning To Possibly Become A Millionaire - Predicting The Stock Market
Jerry Xu
Aug 30 · 8 min read
Working on Wall Street is just as intense and rewarding as you would imagine. Lots
of suits and lots of sullen faces and lots of cigarette smoke. Amidst all of the
craziness you’d expect from the literal financial center of the world, the actual
underlying goal of everyone there is pretty simple. At risk of oversimplifying things,
I’ll tell you right now that finance is simply using money (either your own or some
you’ve borrowed) to get more money. The financial industry doesn’t actually create
any value, rather it uses other factors to get returns on investments.
The stock market is one of the best-known vehicles through which anyone can
potentially make a fortune. If anyone could crack the code for predicting future
stock prices, they’d practically rule the world.
There’s just one problem. It’s pretty much impossible to accurately predict the
future of the stock market. So many analysts, so many researchers, so many super
smart people have tried to figure it all out. No one has been able to garner consistent
results. No one.
So what’s the point of this article? Why am I writing about using machine learning
to possibly predict the stock market? Mostly just for fun, I guess. More importantly,
however, it’s a great learning exercise for machine learning and finance.
Agenda
1. The Stocker Module
2. Moving Averages
3. Linear Regression
4. K-Nearest Neighbors
5. Multilayer Perceptron
6. Conclusion
7. Areas of Improvement
8. Resources
If you want a more in-depth view of this project, or if you want to add to the code, check
out the GitHub repository.
. . .
The Stocker Module
We also need to import some libraries. Now that the repo is cloned, we can import
the Stocker module as well.
Let’s create a Stocker object. I chose Google as my company, but you’re not obligated
to do the same. The Stocker module has a function called plot_stock() that does a
lot by itself.
Google’s stock is very nice
If you pay attention, you’ll notice that the dates for the Stocker object are not
up-to-date. The data stops at 2018-03-27. Taking a closer look at the actual module
code, we’ll see that the data is taken from Quandl’s WIKI exchange. Perhaps the data
is not kept up to date?
We can use Stocker to conduct technical stock analysis, but for now we will focus on
being mediums. Stocker uses a package created by Facebook called prophet which is
good for additive modeling.
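To make "additive modeling" a bit more concrete, here is a minimal sketch of the idea: a series is treated as the sum of a trend and a repeating seasonal effect. This is not how prophet actually fits its models (it uses a more flexible trend and its own fitting procedure). The synthetic data and all variable names below are assumptions made up for illustration.

```python
import numpy as np

# Synthetic daily series: linear trend + weekly pattern + a little noise.
rng = np.random.default_rng(1)
t = np.arange(7 * 52)  # 52 full weeks of daily observations
true_trend = 50.0 + 0.1 * t
true_weekly = np.array([2.0, 1.0, 0.0, -1.0, -2.0, 0.5, -0.5])  # sums to zero
y = true_trend + true_weekly[t % 7] + rng.normal(0, 0.1, t.size)

# Component 1: a straight-line trend fitted by least squares.
slope, intercept = np.polyfit(t, y, deg=1)
trend = intercept + slope * t

# Component 2: the weekly effect, estimated as the average detrended
# value for each day of the week.
detrended = y - trend
weekly = np.array([detrended[t % 7 == d].mean() for d in range(7)])

# The additive model's fitted values are simply the sum of the components.
fitted = trend + weekly[t % 7]
print(round(slope, 3), np.round(weekly, 2))
```

Because the components are added together, each one can be inspected on its own, which is a large part of the appeal of this kind of model.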
Now let’s test the Stocker predictions. We need to create a training set and a test
set. We’ll use 2014–2016 as the training set and 2017 as the test set. Let’s see
how accurate this model is.
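The split described above is chronological: the model only sees the past and is tested on the future. Here is a rough sketch of that kind of split with pandas. The price data is synthetic and the variable names are assumptions for illustration; Stocker handles its own splitting internally.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for daily adjusted closing prices (not real Google data).
dates = pd.bdate_range("2014-01-01", "2017-12-31")
rng = np.random.default_rng(0)
prices = pd.DataFrame(
    {"Adj. Close": 500 + np.cumsum(rng.normal(0.3, 5.0, len(dates)))},
    index=dates,
)

# Split by date rather than at random, so no future data leaks into training.
train = prices.loc["2014-01-01":"2016-12-31"]
test = prices.loc["2017-01-01":"2017-12-31"]

print(len(train), len(test))
```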
The results are quite horrendous, with the predictions being almost as bad as a coin
flip. Let’s adjust some hyperparameters.
Here we can see the results of using different changepoints
Validating the model across several changepoint settings is an effective way to tune
this hyperparameter and refine the stock prediction algorithm.
Now we can evaluate the refined model to see if there are any improvements in the
prediction estimates.
This is only SLIGHTLY better than the previous model
Now it’s time to do the ultimate test: try our luck in the stock market (simulated, of
course).
Even after all of that tweaking, it’s clear that simply buying and holding would
produce better returns.
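To see what that comparison means in practice, here is a toy sketch of the two approaches. The "follow the model" rule below (hold shares only on days the model predicts a rise) is a simplified, hypothetical strategy, not Stocker’s exact evaluation logic, and the prices and predictions are made-up numbers.

```python
import numpy as np

def buy_and_hold_profit(prices, shares):
    """Profit from buying on the first day and selling on the last day."""
    return float(shares * (prices[-1] - prices[0]))

def follow_model_profit(prices, predicted_up, shares):
    """Profit from holding shares only on days the model predicted a rise.

    predicted_up[i] is the model's call for the move from day i to day i + 1.
    """
    daily_change = np.diff(prices)
    return float(shares * daily_change[predicted_up].sum())

# Made-up prices and model calls, purely for illustration.
prices = np.array([100.0, 102.0, 101.0, 105.0, 107.0])
predicted_up = np.array([True, True, False, True])

hold = buy_and_hold_profit(prices, shares=10)
model = follow_model_profit(prices, predicted_up, shares=10)
print(f"buy and hold: {hold:.2f}, following the model: {model:.2f}")
```

In this toy case the model sits out the biggest up day and stays in for a down day, so it trails buy and hold even though it gets most of the calls right.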
. . .
Moving Averages
In summary, a moving average is a commonly used indicator in technical analysis.
It’s a lagging indicator, which means that it uses past prices to predict future prices.
It’s effective in smoothing out any short-term fluctuations and finding the overall
trend. We’ll use moving averages to see if we can do a better job of predicting stock
prices.
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

# Use the ggplot style for all plots
import matplotlib.style
import matplotlib as mpl
mpl.style.use('ggplot')

# Make figures large by default
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 20, 10

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))

# Create a copy of the goog_data dataframe for moving averages,
# so that the original data is left untouched
df = goog_data.copy()

# Parse the dates and use them as the index
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
df.index = df['Date']
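The prediction step itself can be sketched roughly as follows: each forecast is the mean of the most recent values, and earlier forecasts are fed back in as the window slides forward. This is a minimal sketch on synthetic prices, not necessarily the exact code behind the plot. The function names, the window size, and the data are assumptions for illustration.

```python
import numpy as np

def moving_average_forecast(train, horizon, window):
    """Forecast `horizon` steps ahead using a sliding mean.

    Each forecast is the mean of the last `window` values, where earlier
    forecasts are appended to the history and reused as inputs.
    """
    history = list(train[-window:])
    preds = []
    for _ in range(horizon):
        next_value = float(np.mean(history[-window:]))
        preds.append(next_value)
        history.append(next_value)
    return np.array(preds)

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    diff = np.asarray(actual) - np.asarray(predicted)
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic, steadily rising prices standing in for the real closing prices.
prices = np.linspace(100.0, 200.0, 300)
train, test = prices[:250], prices[250:]

preds = moving_average_forecast(train, horizon=len(test), window=50)
print(rmse(test, preds))
```

Since every forecast is an average of earlier values, it can never rise above the highest price in the window, which is why this approach lags behind a strong upward trend.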
Now let’s see our prediction plotted next to the actual prices.
Yikes
In terms of figuring out the general trend of the stock data, the moving average
method did okay, but it failed to see the full extent of the increase in the price, and
that is not good. We definitely wouldn’t want to use this method for actual
algorithmic trading.
. . .
Linear Regression
# We'll create a separate dataset so that new features don't mess up the original data.
# Assumption: lr_data starts as a copy of the goog_data dataframe used earlier.
lr_data = goog_data.copy()

lr_data['Date'] = pd.to_datetime(lr_data.Date, format='%Y-%m-%d')
lr_data.index = lr_data['Date']

lr_data = lr_data.sort_index(ascending=True, axis=0)

# Keep only the date and the adjusted close, with a fresh integer index
new_data = lr_data[['Date', 'Adj. Close']].reset_index(drop=True)

# Train-test split

# The first 2600 rows are for training, the rest for testing
train = new_data[:2600]
test = new_data[2600:]

# Features are everything except the target column
x_train = train.drop('Adj. Close', axis=1)
y_train = train['Adj. Close']
x_test = test.drop('Adj. Close', axis=1)
y_test = test['Adj. Close']
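With a split like the one above, fitting a linear regression might look roughly like this. A raw date column can’t be fed to the model directly, so this sketch turns it into a hypothetical numeric feature, `Days` (days elapsed since the first date). That feature, the synthetic prices, and the smaller split point are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the price data: an upward trend plus noise.
dates = pd.bdate_range("2015-01-01", periods=300)
rng = np.random.default_rng(42)
close = 100 + 0.5 * np.arange(300) + rng.normal(0, 2.0, 300)
data = pd.DataFrame({"Date": dates, "Adj. Close": close})

# Hypothetical numeric feature: calendar days elapsed since the first date.
data["Days"] = (data["Date"] - data["Date"].iloc[0]).dt.days

# Chronological split, mirroring the train/test split above.
train, test = data[:250], data[250:]

model = LinearRegression()
model.fit(train[["Days"]], train["Adj. Close"])
preds = model.predict(test[["Days"]])

rmse = float(np.sqrt(np.mean((test["Adj. Close"].to_numpy() - preds) ** 2)))
print(model.coef_[0], rmse)
```

A single straight line can follow a long-run trend, but it has no way to capture the ups and downs around it.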
. . .
k-Nearest Neighbors
Let’s move on to yet another machine learning algorithm, KNN.
Let’s go through the same process with the same data as in the linear regression
section. The only difference is that we’ll be applying a different algorithm to the
data. Let’s see which predictions are better.
Yikes! This is the worst prediction we’ve got so far! There’s a reason k-nearest
neighbors is more useful for classification problems and small-scale regression. At
first glance this looks like overfitting, but the root cause is that KNN cannot
extrapolate. Each prediction is an average of the nearest training points, so it can
never exceed the prices seen during training, and it was completely unable to figure
out the trend of where the prices are going. What’s next?
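Here is a small sketch of why KNN struggles with a trending series. On synthetic, steadily rising prices (an assumption for illustration, not the real data), every future day has the same nearest neighbors, namely the last few training days, so the forecast is a flat line.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic, steadily rising "prices" indexed by day number.
days = np.arange(300).reshape(-1, 1)
prices = 100.0 + 0.5 * days.ravel()

# Chronological split: train on the first 250 days, test on the rest.
x_train, y_train = days[:250], prices[:250]
x_test, y_test = days[250:], prices[250:]

knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(x_train, y_train)
preds = knn.predict(x_test)

# Every test day lies beyond the training range, so its five nearest
# neighbors are always the last five training days.
print(preds.min(), preds.max(), y_train.max())
```

The actual prices keep climbing while the predictions stay stuck just below the last training price.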
Multilayer Perceptron
Let’s move into some deep learning, more specifically, neural networks. A multilayer
perceptron is one of the simplest types of neural networks, at least simpler than
convolutional neural networks and long short-term memory. We don’t need to get
into the details on how the algorithm actually works. If you’re interested, check out
the resources at the end of the article.
import numpy as np
import tensorflow as tf

model = tf.keras.models.Sequential()

# Two hidden layers with 100 ReLU units each
model.add(tf.keras.layers.Dense(100, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(100, activation=tf.nn.relu))

# Single output unit for the predicted price
model.add(tf.keras.layers.Dense(1, activation=tf.nn.relu))

model.compile(optimizer='adam', loss='mean_squared_error')

# Convert the training data to NumPy arrays
X_train = np.array(x_train)
Y_train = np.array(y_train)

model.fit(X_train, Y_train, epochs=500)
Conclusion
What did we learn today? What did all of this technical analysis show us? The
answer is quite simple: if you’re not someone like Ray Dalio or Warren Buffett or any
of the great investors, it’s very risky and ultimately not as profitable to try to beat
the stock market. According to some sources, a majority of hedge funds can’t even
do better than the S&P 500! Therefore, if you want to make the best returns on your
investments, stick with a buy-and-hold strategy. For the most part, simply investing
in an index fund that tracks the S&P 500 has yielded pretty good returns, even when
there were several big drops in the economy. In the end, it’s up to you to decide.
Areas of Improvement
Thank you for taking the time to read through this article! Feel free to check out my
portfolio site or my GitHub.
Resources
2. Technical Analysis
4. Moving Averages
5. Linear Regression
6. K-Nearest Neighbors
7. Neural Networks
8. TensorFlow
9. Keras