Financial Analytics Study to Build Robust Trading Strategies
Capstone Project - 2022-23
Problem Definition
 The main objective of this study is to find the best model to predict
BUY / SELL signals for stocks in the stock market.

 In the world of finance, stock trading is one of the most important activities.
Stock market prediction is the act of trying to determine the future value
of a stock or any other financial instrument traded on an exchange.

 Successful prediction of a stock would be a great asset for stock
market institutions and would provide real-life solutions to the problems
that stock investors face. This study draws a detailed analysis of the
techniques employed in predicting BUY / SELL signals and explores the
challenges entailed along with the future scope of work in the domain.
Problem Definition

 In this project we build trading strategies and present and review a
more feasible method to predict BUY / SELL signals with higher accuracy.

 This study explains the prediction of a BUY / SELL signal for a stock using
Machine Learning and Artificial Neural Networks (ANN). Most stockbrokers
rely on technical, fundamental or time series analysis while making stock
predictions.

 During the process we use techniques such as the 101 Alphas along with
various machine learning algorithms.
Proposed Solution (including EDA)
 Collected historical daily OHLC data of the Nifty 50 stocks along with the
index from 1st Jan 2015 till 5th May 2022.
 Leveraged a paid API from Zerodha to fetch the data using Python (a sketch
follows at the end of this list).
 Our approach towards the solution includes both conventional ML models
and an ANN. The conventional approach is based on machine learning
algorithms, namely the XGBoost Classifier, Random Forest Classifier,
Gradient Boosting Classifier and Support Vector Machine Classifier (SVC).

 The second approach is based on an Artificial Neural Network (ANN).
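A minimal sketch of the data pull, assuming Zerodha's kiteconnect Python library; the API key, access token and the NIFTY 50 instrument token below are placeholders, and the daily candles are fetched in yearly chunks simply to keep each request small.

# Hedged sketch only: credentials and instrument token are placeholders.
import datetime as dt
import pandas as pd
from kiteconnect import KiteConnect

kite = KiteConnect(api_key="YOUR_API_KEY")
kite.set_access_token("YOUR_ACCESS_TOKEN")

NIFTY50_TOKEN = 256265          # assumed instrument token for the NIFTY 50 index

frames = []
for year in range(2015, 2023):
    start = dt.date(year, 1, 1)
    end = min(dt.date(year, 12, 31), dt.date(2022, 5, 5))
    # daily OHLC candles for the given window
    frames.append(pd.DataFrame(
        kite.historical_data(NIFTY50_TOKEN, start, end, interval="day")
    ))

ohlc = pd.concat(frames, ignore_index=True)   # columns: date, open, high, low, close, volume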


Proposed Solution (including EDA)
 Imported the formulas of the 101 Alphas from WorldQuant and wrote a
Python program implementing the mathematical operations used by the
101 Alphas (an illustrative sketch follows at the end of this slide).

 Created our target variable based on the Simple Moving Average (SMA)
200, generating recommendations for BUY, SELL, BUY HOLD and SELL HOLD,
encoded as BUY = 0, BUY_HOLD = 1, SELL = 2, SELL_HOLD = 3. Successful
prediction of these signals would be a great asset for stock market
institutions and would provide real-life solutions to the problems that
stock investors face.
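As an illustration of how the 101 Alphas translate into Python, below is a sketch of two of the simpler formulas from the WorldQuant "101 Formulaic Alphas" paper (Alpha#101 and Alpha#12), written as pandas operations on the OHLCV dataframe fetched above (lowercase column names as returned by the API); the full program also implements the paper's helper operators (rank, ts_rank, correlation, etc.) and the remaining alphas.

import numpy as np
import pandas as pd

def delta(series: pd.Series, period: int = 1) -> pd.Series:
    # delta(x, d): today's value of x minus the value d days ago
    return series.diff(period)

def alpha_101(df: pd.DataFrame) -> pd.Series:
    # Alpha#101: (close - open) / ((high - low) + 0.001)
    return (df["close"] - df["open"]) / ((df["high"] - df["low"]) + 0.001)

def alpha_012(df: pd.DataFrame) -> pd.Series:
    # Alpha#12: sign(delta(volume, 1)) * (-1 * delta(close, 1))
    return np.sign(delta(df["volume"], 1)) * (-1.0 * delta(df["close"], 1))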
Proposed Solution (including EDA)
 For our model building, we used historical data of Maruti and the Nifty 50
index.
 Created our target variable based on the SMA 200 to generate
recommendations for BUY, SELL, BUY HOLD and SELL HOLD (see the sketch
after this list).
 The Maruti dataset had the columns Open, High, Low, Close,
Recommendation, and a value for each of the 101 Alphas for a particular
date.
 Initially the Maruti dataset had 1815 rows and 117 columns.
 Dropped the initial 200 rows, as the Recommendation column (our target
variable) was blank there: SMA 200 needs the previous 200 data points
before it can populate values.
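A sketch of the target construction; the slides only state that the labels come from the 200-day SMA and that the first 200 rows are dropped, so the specific crossover rule below (label the crossing day BUY / SELL, the days in between BUY HOLD / SELL HOLD) is an assumption for illustration.

import numpy as np
import pandas as pd

def add_recommendation(df: pd.DataFrame) -> pd.DataFrame:
    sma200 = df["Close"].rolling(window=200).mean()
    above = df["Close"] > sma200
    crossed = above != above.shift(1)

    df["Recommendation"] = np.select(
        [above & crossed, above & ~crossed, ~above & crossed],
        ["BUY", "BUY HOLD", "SELL"],
        default="SELL HOLD",
    )
    # the first ~200 rows have no SMA value, so their labels are blanked and dropped
    df.loc[sma200.isna(), "Recommendation"] = np.nan
    return df.dropna(subset=["Recommendation"]).reset_index(drop=True)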
Proposed Solution (including EDA)
 We had to do data cleansing, as many cells contained infinite values as
well as null values.
 Using LabelEncoder, we converted our target variable from categorical to
numerical (sketches of both steps follow at the end of this list).
 Performed data analysis and data visualisation to find outliers.
 Plotted a few graphs for visual inference of:
• the time series data
• the seasonality (monthly and yearly) of the stock price
• the covariance between Maruti and the Nifty 50 Index
• the returns of Maruti compared with the Nifty 50 Index
• the price trend, leveraging the Mann-Kendall trend test
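A sketch of the cleansing and encoding steps, assuming pandas and scikit-learn's LabelEncoder; note that alphabetical encoding of the four labels reproduces the BUY = 0, BUY_HOLD = 1, SELL = 2, SELL_HOLD = 3 mapping given earlier.

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def clean_and_encode(df: pd.DataFrame) -> pd.DataFrame:
    # some alpha formulas divide by zero, leaving +/- infinity; treat those as nulls and drop
    df = df.replace([np.inf, -np.inf], np.nan).dropna()
    # categorical target -> numerical (BUY=0, BUY HOLD=1, SELL=2, SELL HOLD=3, alphabetical order)
    df["Recommendation"] = LabelEncoder().fit_transform(df["Recommendation"])
    return df

For the trend check, a short sketch assuming the pymannkendall package (the slides only name the Mann-Kendall trend test, not the library used):

import pymannkendall as mk

result = mk.original_test(df["Close"])   # Mann-Kendall test on the price series
print(result.trend, result.p)            # e.g. 'increasing' with a small p-value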
Algorithms, Solution and Conclusions
 Before building our models, we addressed the class imbalance of our target
variable (Recommendation) in the Maruti dataset by using SMOTE.
 Used PCA for feature extraction, reducing the initial feature set of
95 features to 40 (a sketch of both steps follows this list).
 Built ML models using the XGBoost Classifier, RandomForest Classifier,
GradientBoosting Classifier and SVC classifier.
 Comparing the performance metrics of the ML models used, the
accuracy and F1-score were highest for the GradientBoosting Classifier.
 In addition, we built a model using an Artificial Neural Network (ANN), and its
performance was much better in comparison to our best performing ML
model, i.e. the GradientBoosting Classifier.
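A sketch of the resampling and feature-extraction steps, assuming imbalanced-learn's SMOTE and scikit-learn's PCA with 40 components; X is the numeric feature matrix (the 95 OHLC/alpha columns after cleansing) and y the encoded Recommendation column.

from imblearn.over_sampling import SMOTE
from sklearn.decomposition import PCA

X = df.drop(columns=["Recommendation"]).select_dtypes("number")   # numeric features only
y = df["Recommendation"]

X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)   # balance the 4 classes
X_pca = PCA(n_components=40).fit_transform(X_res)          # 95 -> 40 features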
Algorithms, Solution and Conclusions
 Model Creation
• XG Boost Classifier
• From the XGBoost library we imported XGBClassifier. The model is created and trained with a 70-30 train-test split (a generic sketch of this training pattern follows the metrics below).

• XGBoost model summary


• Accuracy Score of Training Data: 0.9759547383309759
• Accuracy Score of Test Data: 0.9042904290429042
• Mean Absolute Error: 0.1806930693069307
• Precision Score: 0.90
• Recall Score: 0.91
• F1-Score: 0.90
• Accuracy Score: 0.90
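A sketch of the training and evaluation pattern behind these summaries (70-30 split plus accuracy, MAE and per-class scores); the same shape applies to the RandomForest, GradientBoosting and SVC slides that follow, with the classifier swapped in. Default hyperparameters are assumed, so the exact numbers above will not be reproduced.

from sklearn.metrics import accuracy_score, classification_report, mean_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(
    X_pca, y_res, test_size=0.30, random_state=42, stratify=y_res
)

model = XGBClassifier()        # swap in RandomForestClassifier(), GradientBoostingClassifier() or SVC()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Train accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("Test accuracy :", accuracy_score(y_test, y_pred))
print("MAE           :", mean_absolute_error(y_test, y_pred))
print(classification_report(y_test, y_pred))   # per-class precision / recall / F1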
Algorithms, Solution and Conclusions
 Model Creation
• Random Forest Classifier
• The model is created and trained with a 70-30 train-test split.

• Random Forest Classifier model summary


• Accuracy Score of Training Data: 0.9653465346534653
• Accuracy Score of Test Data: 0.9051155115511551
• Mean Absolute Error: 0.18564356435643564
• Precision Score: 0.90
• Recall Score: 0.91
• F1-Score: 0.91
• Accuracy Score: 0.91
Algorithms, Solution and Conclusions
 Model Creation
• Gradient Boosting Classifier
• The model is created and trained with a 70-30 train-test split.

• Gradient Boosting Classifier model summary


• Accuracy Score of Training Data: 1.0
• Accuracy Score of Test Data: 0.9298679867986799
• Mean Absolute Error: 0.14026402640264027
• Precision Score: 0.93
• Recall Score: 0.93
• F1-Score: 0.93
• Accuracy Score: 0.93
Algorithms, Solution and Conclusions
 Model Creation
• Support Vector Machine Classifier
• The model is created and trained with a 70-30 train-test split.

• Support Vector Machine Classifier model summary


• Accuracy Score of Training Data: 0.9105374823196606
• Accuracy Score of Test Data: 0.9034653465346535
• Mean Absolute Error: 0.17491749174917492
• Precision Score: 0.90
• Recall Score: 0.91
• F1-Score: 0.90
• Accuracy Score: 0.90
Algorithms, Solution and Conclusions
 Model Creation
• Artificial Neural Networks (ANN)
• The model is created and trained with a 70-30 train-test split (an illustrative architecture sketch follows the metrics below).

• Artificial Neural Networks (ANN) model summary


• Training loss: 0.0263
• Training accuracy: 0.9914
• Validation loss: 0.1016
• Validation accuracy: 0.9823
• Test Loss: 0.173
• Test Accuracy: 0.967
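The slides report the ANN's metrics but not its architecture, so the Keras sketch below is illustrative only: the layer sizes, activations, epochs and validation split are assumptions, not the project's actual network.

from tensorflow import keras
from tensorflow.keras import layers

ann = keras.Sequential([
    layers.Input(shape=(40,)),                  # 40 PCA features
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(4, activation="softmax"),      # BUY / BUY HOLD / SELL / SELL HOLD
])
ann.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])

ann.fit(X_train, y_train, validation_split=0.2, epochs=100, batch_size=32)
test_loss, test_acc = ann.evaluate(X_test, y_test)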
Algorithms, Solution and Conclusions
 Comparison to benchmark
• Below is the model accuracy based on the 4 target classes (BUY / BUY HOLD / SELL / SELL HOLD)

Sl#   Model                          Training Accuracy %   Test Accuracy %
1     XGBoost Classifier             97.59                 90.42
2     RandomForest Classifier        96.53                 90.51
3     GradientBoosting Classifier    100.00                92.98
4     SVC Classifier                 91.05                 90.34
5     Artificial Neural Network      99.14                 96.70

• Below are the F1-scores of the models for each target class

Sl#   Model                          Class 0 (BUY) %   Class 1 (BUY HOLD) %   Class 2 (SELL) %   Class 3 (SELL HOLD) %
1     XGBoost Classifier             97                84                     98                 82
2     RandomForest Classifier        97                84                     99                 83
3     GradientBoosting Classifier    99                88                     99                 86
4     SVC Classifier                 96                85                     97                 83
5     Artificial Neural Network      99                95                     99                 94
Algorithms, Solution and Conclusions
 In real time it is very difficult to decide whether to BUY / SELL a stock, since
there are a lot of parameters to be considered.
 A machine learning model (traditional ML or ANN) can decipher the
data patterns and take into consideration the cyclicality, volatility and
seasonality of the data, thereby helping us take an informed decision on
whether to BUY / SELL a stock.
 Clean input data plays a vital role in model creation and deriving the output
from the model to get the expected business benefit.
 With 4 target classes and roughly 92% of the data concentrated in the
majority classes, model accuracy was low, the F1-score for each class was
very low, and the model was over-fitting.
Algorithms, Solution and Conclusions
 We had to synthesize data using SMOTE to overcome the class imbalance,
which led to better models with good accuracy scores and class-level
F1-scores.
 Used PCA for feature extraction by reducing the initial features set of 95 to
40 features.
 Built ML models by using XGboost Classifier, RandomForest Classifier,
GradientBoosting Classifier, SVC classifier.
 While comparing the performance metrics of the ML models used, the
accuracy and F1-score were highest for the GradientBoosting Classifier.
 With the Artificial Neural Network (ANN), the performance was much
better in comparison to our best performing ML model, i.e. the
GradientBoosting Classifier.
Thank You … Questions, please?
