
Machine Learning for Stock Selection

Robert J. Yan
Charles X. Ling
University of Western Ontario, Canada
{jyan, cling}@csd.uwo.ca

1
Outline
 Introduction
 The stock selection task
 The Prototype Ranking method
 Experimental results
 Conclusions

2
Introduction
 Objective:
– Use machine learning to select a small number
of “good” stocks to form a portfolio
 Research questions:
– How to learn from noisy data
– How to learn from imbalanced data
 Our solution: Prototype Ranking (PR)
– A specially designed machine learning method

3
Outline
 Introduction
 The stock selection task
 The Prototype Ranking method
 Experimental results
 Conclusions

4
Stock Selection Task
Given information available before week t, predict
the performance of stocks in week t
– Training set:

  Stock ID | Predictor 1 | Predictor 2 | Predictor 3  | Goal
           | Return of   | Return of   | Volume ratio | Return of
           | week t-1    | week t-2    | of t-2/t-1   | week t

Learn a ranking function to rank the test data
– Select the n highest-ranked stocks to buy, the n lowest to short-sell

5
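The task setup above can be sketched in code. This is an illustrative reconstruction with synthetic data, not the paper's dataset: the array names, distributions, and the use of the true goal as a stand-in for the learned ranking score are all assumptions.

```python
import numpy as np

# Synthetic weekly return and volume data (illustrative, not from the paper).
rng = np.random.default_rng(0)
n_stocks, n_weeks = 300, 52
returns = rng.normal(0.002, 0.04, size=(n_stocks, n_weeks))  # weekly returns
volume = rng.lognormal(10.0, 1.0, size=(n_stocks, n_weeks))  # weekly volume

def make_examples(t):
    """One example per stock for target week t: the predictors are the
    returns of weeks t-1 and t-2 and the volume ratio of t-2 over t-1;
    the goal attribute is the return of week t."""
    X = np.column_stack([
        returns[:, t - 1],                    # Predictor 1
        returns[:, t - 2],                    # Predictor 2
        volume[:, t - 2] / volume[:, t - 1],  # Predictor 3
    ])
    y = returns[:, t]                         # Goal
    return X, y

X, y = make_examples(10)

# Rank by a score (here simply the goal itself, standing in for a learned
# ranking function): buy the n highest-ranked stocks, short-sell the n lowest.
n = 5
order = np.argsort(y)
shorts, longs = order[:n], order[-n:]
```

With a real learner, the score used for `order` would come from the ranking function fitted on past weeks rather than from `y` itself.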
Outline
 Introduction
 The stock selection task
 The Prototype Ranking method
 Experimental results
 Conclusions

6
Prototype Ranking
 Prototype Ranking (PR): a machine learning method
specially designed for noisy and imbalanced stock data
 The PR system:
Step 1. Find good “prototypes” in the training data
Step 2. Use k-NN on the prototypes to rank the test data

7
Step 1: Finding Prototypes
Prototypes: representative points
– Goal: discover the underlying density/clusters of the
training samples by distributing prototypes in the
sample space
– Benefit: reduced data size

[Figure: prototypes distributed among the training samples,
each covering a local neighborhood]

8
Finding Prototypes Using Competitive Learning

General competitive learning:
 Step 1: Randomly initialize a set of prototypes
 Step 2: For each sample, find the nearest prototype
 Step 3: Adjust that prototype toward the sample
 Step 4: Output the prototypes

The hidden density of the training data is reflected in the prototypes

10
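The four steps above can be sketched as plain winner-take-all competitive learning. This is the general scheme only, not PR's modified tree-based variant; the learning rate, epoch count, and initialization-from-samples choice are assumptions.

```python
import numpy as np

def competitive_learning(X, n_prototypes=8, lr=0.1, epochs=20, seed=0):
    """Plain winner-take-all competitive learning: each sample pulls its
    nearest prototype toward it, so prototypes drift into the dense
    regions of the data."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly initialize the prototypes (here: random samples).
    protos = X[rng.choice(len(X), n_prototypes, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            # Step 2: find the nearest prototype.
            j = np.argmin(np.linalg.norm(protos - x, axis=1))
            # Step 3: adjust it toward the current sample.
            protos[j] += lr * (x - protos[j])
    # Step 4: output the prototypes.
    return protos

# Two well-separated clusters: the prototypes should settle near them.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
protos = competitive_learning(data, n_prototypes=4)
```

Because each prototype only moves toward the samples it wins, prototypes initialized near one cluster stay with that cluster, which is how the hidden density ends up reflected in the prototype set.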
Modifications for Stock Data

 In step 1: organize the initial prototypes in a tree structure
– Enables fast nearest-prototype search
 In step 2: search for prototypes in the predictor space
– Better learning effect for the prediction task
 In step 3: adjust prototypes in the goal-attribute space
– Better learning effect on the imbalanced stock data
 In step 4: prune the prototype tree
– Prune child prototypes that are similar to their parent
– Combine the leaf prototypes to form the final prototypes

11
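The pruning in step 4 can be sketched on an assumed dict-based prototype tree. The tree representation, the distance-based similarity test, and the tolerance value are all illustrative assumptions; the paper's exact criterion is not reproduced here.

```python
import numpy as np

def prune(node, tol=0.05):
    """Bottom-up pruning of a prototype tree (assumed shape:
    {'proto': vector, 'children': [...]}). A child whose prototype lies
    within tol of its parent is dropped; the surviving leaf prototypes
    are returned as the final prototype set."""
    kept = []
    for child in node.get('children', []):
        if np.linalg.norm(child['proto'] - node['proto']) <= tol:
            continue  # child too similar to its parent: prune it
        kept.extend(prune(child, tol))
    if not kept:  # this node became a leaf
        return [node['proto']]
    return kept

tree = {'proto': np.array([0.0, 0.0]),
        'children': [
            {'proto': np.array([0.01, 0.0]), 'children': []},  # pruned
            {'proto': np.array([1.0, 1.0]), 'children': []},   # kept
        ]}
finals = prune(tree)
```

Here the near-duplicate child is removed and the single distinct leaf survives as a final prototype.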
Step 2: Predicting Test Data
 Predict each test point as the weighted average of its k nearest prototypes
 Update the model online as new data arrives

12
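The weighted-average prediction can be sketched as follows. The inverse-distance weighting is an assumption for illustration; the paper's exact weighting scheme is not reproduced here.

```python
import numpy as np

def knn_predict(x, proto_X, proto_y, k=3):
    """Predict the goal value of a test point as the distance-weighted
    average of the goal values of its k nearest prototypes."""
    d = np.linalg.norm(proto_X - x, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-12)  # closer prototypes get more weight
    return float(np.sum(w * proto_y[idx]) / np.sum(w))

# Toy 1-D prototypes with their learned goal values.
proto_X = np.array([[0.0], [1.0], [2.0], [10.0]])
proto_y = np.array([0.0, 1.0, 2.0, 10.0])

pred_exact = knn_predict(np.array([2.0]), proto_X, proto_y)  # on a prototype
pred_mid = knn_predict(np.array([0.5]), proto_X, proto_y)    # between prototypes
```

A query that lands exactly on a prototype essentially returns that prototype's goal value, while a query between prototypes gets a smooth interpolation of its neighbors.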
Outline
 Introduction
 The stock selection task
 The Prototype Ranking method
 Experimental results
 Conclusions

13
Data
CRSP daily stock database
– The 300 NYSE and AMEX stocks with the largest market capitalization
– From 1962 to 2004

14
Testing PR
 Experiment 1: do larger portfolios give lower average
return but also lower risk (diversification)?
 Experiment 2: is PR better than Cooper’s method?

15
Results of Experiment 1

[Figure: Weekly average return (%) vs. number of stocks in the
portfolio (0–110), 1978–2004]

[Figure: Weekly risk (standard deviation, %) vs. number of stocks
in the portfolio (0–110), 1978–2004]

16
Experiment 2: Comparison to Cooper’s Method
 Cooper’s method (CP): a traditional non-ML method for
stock selection…
 Compare PR and CP on 10-stock portfolios

17
Results of Experiment 2
Measures:
 Average return (Ret.)
 Sharpe ratio (SR), a risk-adjusted return: SR = Ret. / Std.

[Figure: Bar chart comparing Ret. (%) and SR for the PR and CP
10-stock portfolios]

18
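The Sharpe ratio on the slide can be illustrated with a quick computation. The weekly returns below are made-up numbers, not the paper's results.

```python
import numpy as np

# SR = average return / standard deviation of returns,
# computed on illustrative weekly portfolio returns (percent per week).
weekly_ret = np.array([1.2, -0.4, 0.9, 1.5, 0.3])
avg_ret = weekly_ret.mean()   # Ret. = 0.7
std = weekly_ret.std(ddof=1)  # sample standard deviation
sharpe = avg_ret / std
```

Dividing by the standard deviation penalizes strategies whose average return comes with high week-to-week volatility, which is why SR is a fairer comparison than raw return alone.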
Outline
 Introduction
 The stock selection task
 The Prototype Ranking method
 Experimental results
 Conclusions

20
Conclusions
 PR: modified competitive learning and k-NN
for noisy and imbalanced stock data
 PR performs well in stock selection
– Larger portfolios give lower return but also lower risk
– PR outperforms the non-ML method CP
 Future work: use it to invest and make money!

21
