BTP MSR

Stock Price Trends Forecasting using Machine
Learning Models
A project report submitted in partial fulfillment of the requirements for

B.Tech. Project
B.Tech.
by
Chiranjeev Agrawal (2018IMG-021)
ABV INDIAN INSTITUTE OF INFORMATION

TECHNOLOGY AND MANAGEMENT
GWALIOR-474 010
2021
CANDIDATES DECLARATION
We hereby certify that the work, which is being presented in the report, entitled Stock
Price Trends Forecasting using Machine Learning Models, in partial fulfillment of
the requirement for the award of the Degree of Bachelor of Technology and submitted
to the institution is an authentic record of our own work carried out during the period
June 2021 to October 2021 under the supervision of Dr. Rajesh Rajagopal. We have also
cited the references for the text(s)/figure(s)/table(s) taken from other sources.
Date: Signatures of the Candidates
This is to certify that the above statement made by the candidates is correct to the best
of my knowledge.
Date: Signatures of the Research Supervisors

ABSTRACT
Stock market movement has always appeared anomalous to investors because of the many
factors that influence it. Stock market prediction refers to forecasting the future
behaviour of the stock market. This project examines a range of forecasting techniques
for anticipating future stock returns from past returns and numerical news indicators.
The study aims to reduce the risk of inaccurate trend prediction by applying machine
learning methods to stock price forecasting, interpreting chaotic market information and
predicting the future value of a company's stock. The project uses machine learning
algorithms that make predictions from current stock market indices, and it also analyses
the impact of the novel coronavirus outbreak on a particular company and on the National
Stock Exchange index Nifty 50. Although the stock market can never be predicted with
complete accuracy because of its vast domain, this project aims to establish a relation
between chosen factors and stock prices using statistical analysis, and thereby to
mitigate risk.
Keywords: stock price, risk mitigation, machine learning, future scope, indicators.
ACKNOWLEDGEMENTS
We are highly indebted to Dr. Rajesh Rajagopal for giving us the autonomy to work and
experiment with ideas. We would like to take this opportunity to express our profound
gratitude to them, not only for their academic guidance but also for their personal
interest in our project and their constant support, coupled with confidence-boosting and
motivating sessions that proved very fruitful and were instrumental in instilling
self-assurance and trust within us. The nurturing and blossoming of the present work is
mainly due to their valuable guidance, suggestions, astute judgment, constructive
criticism and eye for perfection. Our mentor always answered our myriad doubts with
smiling graciousness and prodigious patience, never letting us feel that we are novices,
always lending an ear to our views, appreciating and improving them, and giving us a free
hand in our project. It is only because of their overwhelming interest and helpful
attitude that the present work has attained the stage it has.
Finally, we are grateful to our Institution and colleagues whose constant encouragement
served to renew our spirit, refocus our attention and energy and helped us in carrying
out this work.
(Chiranjeev Agrawal)
TABLE OF CONTENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION AND LITERATURE SURVEY
1.1 INTRODUCTION
1.1.1 Utility of Machine Learning in Stock trend forecasting
1.1.1.1 Related Definitions
1.1.2 Data collection
1.1.3 Supervised Learning
1.1.3.1 Classification Methods
1.1.3.2 Regression strategies
1.1.4 Support Vector Machines
1.1.4.1 Linear SVM
1.1.4.2 Non Linear SVM
1.2 MOTIVATION
1.3 LITERATURE REVIEW
1.4 THESIS OBJECTIVES AND DELIVERABLES
2 SYSTEM ARCHITECTURE AND METHODOLOGY
2.1 SYSTEM OVERVIEW AND ARCHITECTURE
2.1.1 Data Extraction and Preprocessing
2.1.2 Feature Selection
2.1.3 Analysing various supervised learning methods
2.1.4 Analysing the effect of covid 19 pandemic on Stock market
2.1.5 Credit Assignment Problem
2.1.6 System Architecture
3 PROGRESS MADE SO FAR
3.1 DATA COLLECTION AND NORMALISATION
3.2 SYSTEM EFFECTIVENESS
3.2.0.1 Root Mean Square Error
3.2.1 Solution Roadmap
3.2.2 Nifty stock predictions considering Covid-19 effects
3.2.3 R-Squared value
4 TASKS TO BE COMPLETED
5 GANTT CHART
6 REFERENCES
LIST OF TABLES
3.1 RMSE and R-squared values for different regressors
LIST OF FIGURES
2.1 System Overview
3.1 First 15 rows of our Dataset
3.2 RMSE value
3.3 RMSE value: 1.43254343-07, R-squared value: 0.956669
3.4 RMSE value: 1.329966e-07, R-squared value: 0.959771
3.5 RMSE value: 2.988297e-07, R-squared value: 0.909611
3.6 RMSE value: 0.00039015, R-squared value: -117.01176
3.7 RMSE value: 1.274547e-07, R-squared value: 0.961448
5.1 Gantt Chart
ABBREVIATIONS
Linear SVM Linear Support Vector Machine
KNN K Nearest Neighbour
MTL Multiple Task Learning
RMSE Root Mean Square Error
LW Low
MD Medium
HG High
SL Slow
FS Fast
LS Less
AG Average
CHAPTER 1
INTRODUCTION AND
LITERATURE SURVEY
The stock market is an integral part of any country's economy and plays an important
role in the growth of businesses, which eventually affects the economy of that country.
Every investor in the stock market needs to know whether stock prices are likely to rise
or fall over a particular period of time. Although the share market can never be
predicted with complete accuracy because of its vast domain, this project aims to apply
machine learning techniques to stock indicators in order to forecast stock prices.
Using applied statistical analysis, we can establish a relation between the chosen
factors and the share price, which helps in producing accurate forecasts. This study aims
to reduce the risk of inaccurate trend prediction by applying machine learning methods to
stock price forecasting, interpreting chaotic market information and predicting the
future value of a company's stock.
1.1 INTRODUCTION
The higher the demand for a company's stock, the higher its share price; if the demand
for the stock is low, its price decreases. The current trend in stock market prediction
technologies is the use of various machine learning models that make predictions based on
stock indices by training on previous data.
Investors are well aware of the overwhelming nature of the stock market, and because of
its volatility and unpredictability, prediction techniques have always been valued by
financial analysts, business tycoons, brokers and researchers. Therefore, it is necessary
to build a system that maximizes accuracy by considering all the important factors that
may influence the result.
1.1.1 Utility of Machine Learning in Stock trend forecasting

Machine learning techniques can be used to relate previous data to current data, so that
a model learns from past data and makes appropriate predictions about the future. We will
incorporate supervised learning methods for stock market trend forecasting and build a
system that maximizes accuracy by considering all the important factors that may
influence the result.
1.1.1.1 Related Definitions
(i) Data pre-processing: A series of steps that the acquired data has to pass through
in order to convert it into a readable format and generate meaningful information.
It involves removing unnecessary data, corrupted data and missing values.
(ii) Training and testing dataset: After the data set has been transformed into a clean
data set, it is divided into a training set and a testing set. The most recent values
are included in the training set, and the testing set consists of approximately
10 percent of the total dataset.
(iii) Supervised Learning: A subcategory of machine learning and artificial
intelligence that uses labelled data sets to train algorithms to classify data
and predict outcomes accurately.
(iv) Data Normalization: For better accuracy, the extracted data needs to be
normalised, thereby ensuring that no factor is given an exceptionally high or
exceptionally low weightage.
1.1.2 Data collection

The data collected for our research comprise the following six fields:
(i) Open: Opening price of the stock on a particular day.
(ii) Close: Closing price of the stock on a particular day.
(iii) High: Highest price recorded for the stock on that particular day.
(iv) Low: Lowest price recorded for the stock on that particular day.
(v) Shares Traded: Number of shares bought or sold on a particular date.
(vi) Turnover: Total value of currency exchanged for the stock on that day.
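As an illustration of the fields listed above, the following minimal sketch parses a few rows of daily data into Python records using only the standard library. The column names follow the list above; the sample values, the `SAMPLE_CSV` string and the `load_rows` helper are hypothetical and are not taken from the project's dataset or code.

```python
import csv
import io

# Hypothetical sample in the layout described above; the numbers are invented
# for illustration and are not taken from the project's dataset.
SAMPLE_CSV = """Date,Open,High,Low,Close,Shares Traded,Turnover
01-Jan-2020,100.0,105.0,99.0,104.0,1000,104000.0
02-Jan-2020,104.0,106.0,101.0,102.0,1200,122400.0
03-Jan-2020,102.0,103.0,97.0,98.0,1500,147000.0
"""

NUMERIC_FIELDS = ["Open", "High", "Low", "Close", "Shares Traded", "Turnover"]


def load_rows(text):
    """Parse CSV text into a list of dicts with the numeric fields as floats."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"Date": record["Date"]}
        for field in NUMERIC_FIELDS:
            row[field] = float(record[field])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE_CSV)
closes = [row["Close"] for row in rows]
print(closes)
```

A simple sanity check on such data is that each day's closing price lies between that day's low and high.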
1.1.3 Supervised Learning

Supervised learning (SL) is the machine learning task of learning a function that maps
an input to an output based on example input-output pairs. It infers a function from
labelled training data consisting of a set of training examples. In supervised learning,
each example is a pair consisting of an input object (typically a vector) and a desired
output value (also called the supervisory signal). A supervised learning algorithm
analyzes the training data and produces an inferred function, which can be used for
mapping new examples. In an optimal scenario, the algorithm correctly determines the
class labels for unseen instances. This requires the learning algorithm to generalize
from the training data to unseen situations in a "reasonable" way (see inductive bias).
This statistical quality of an algorithm is measured through the so-called
generalization error.
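The idea of learning from input-output pairs and then measuring error on unseen examples can be sketched as follows. This is a minimal illustration using an ordinary least squares line in plain Python; the data values and the helper names `fit_line` and `mean_squared_error` are assumptions made for the example, not part of the project.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared difference between predictions and desired outputs."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Invented labelled examples: inputs x with desired outputs y (roughly y = 2x + 1).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.1, 2.9, 5.2, 6.8, 9.0]
# Examples the model never sees during fitting.
test_x = [5.0, 6.0]
test_y = [11.1, 12.8]

slope, intercept = fit_line(train_x, train_y)
train_error = mean_squared_error(train_x, train_y, slope, intercept)
test_error = mean_squared_error(test_x, test_y, slope, intercept)
print(slope, intercept, train_error, test_error)
```

The error on the held-out pairs is an estimate of the generalization error; a model that only memorised the training pairs would show a small `train_error` but a large `test_error`.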
1.1.3.1 Classification Methods
Supervised classification methods include:

1. Support Vector Machines
2. Neural Networks
3. Naive Bayes
4. Decision Trees
5. Random Forest etc.
1.1.3.2 Regression strategies
Supervised regression methods include:

1. Linear Regression
2. Support Vector Regression
3. Logistic Regression (despite its name, typically used for classification)
4. Decision Trees
1.1.4 Support Vector Machines

Support vector machines (SVMs) are powerful and flexible supervised machine learning
algorithms that can be used for both classification and regression, although they are
mainly used for classification problems. Their implementation differs from that of most
other machine learning algorithms, and they are popular because of their capability of
handling multiple continuous variables. Predicting stock price movements is difficult.
Moreover, according to further investigations, movements in market prices do not appear
to be random; rather, they behave in a highly non-linear, dynamic manner. An SVM is a
specific type of learning algorithm characterized by capacity control of the decision
function and the use of kernel functions. SVMs can be of two types:
1.1.4.1 Linear SVM
It is used for linearly separable data. If a data set can be separated into two classes
by a single straight line (or hyperplane), it is called linearly separable data, and the
classifier used is known as a linear SVM classifier.
1.1.4.2 Non Linear SVM
If the data cannot be partitioned using a straight line, it is called non-linearly
separable data, and the classifier used is called a non-linear SVM classifier.
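The difference between the two cases can be illustrated with a toy data set. The sketch below is not an SVM solver; it only shows, under invented data, that XOR-style labels admit no separating straight line, while an explicit non-linear feature map (the idea that kernel functions exploit implicitly) makes them separable by a linear rule. All names and values are hypothetical.

```python
# XOR-style labels: no straight line in the (x1, x2) plane separates the classes.
points = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
labels = [1, -1, -1, 1]


def linear_rule(point, w1, w2, b):
    """Classify with the straight line w1*x1 + w2*x2 + b = 0."""
    x1, x2 = point
    return 1 if w1 * x1 + w2 * x2 + b > 0 else -1


def feature_map(point):
    """Explicit non-linear map that adds the product x1*x2 as a third coordinate."""
    x1, x2 = point
    return (x1, x2, x1 * x2)


def mapped_rule(point):
    """A linear rule in the mapped space: the sign of the third coordinate."""
    return 1 if feature_map(point)[2] > 0 else -1


# A coarse grid search over line parameters finds no perfect linear separator.
grid = [i / 2 for i in range(-4, 5)]
linear_solutions = [
    (w1, w2, b)
    for w1 in grid
    for w2 in grid
    for b in grid
    if all(linear_rule(p, w1, w2, b) == y for p, y in zip(points, labels))
]
mapped_predictions = [mapped_rule(p) for p in points]
print(len(linear_solutions), mapped_predictions)
```

In the mapped space a single plane classifies every point correctly, which is the situation a non-linear SVM creates through its kernel.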
1.2 MOTIVATION
The variation of stock prices plays a very important role in business, and in many ways
it directly affects the economy of the country, which may indirectly affect the ease of
living and the ease of doing business of common people. This makes the research highly
important. A correct prediction of stock prices can lead to large profits for the seller
and the broker. Machine learning can predict a market value close to the actual value,
thereby increasing accuracy. Stocks are affected by many social factors; among these,
our main interest is the study and analysis of the impact of the COVID-19 pandemic on
stock indices and on companies' stock prices.
1.3 LITERATURE REVIEW

(i) Machine learning algorithms and stock predictions: Stock market prediction has
long been a topic of interest and has been studied by researchers from many
different fields. As cited in "Prediction of Stock Prices using Machine Learning
(Regression, Classification) Algorithms" (https://ieeexplore.ieee.org/document/9154061),
machine learning, which is well suited to a wide range of applications, has been
studied extensively for its ability to predict financial markets. Popular
algorithms such as support vector machines (SVM) and reinforcement learning are
reported to be very helpful in tracking the stock market and in helping to increase
the profit of a potential purchase while keeping the risk low.
(ii) Use of Support Vector Machines: SVMs are powerful and flexible supervised machine
learning algorithms that can be used for both classification and regression,
although they are mainly used for classification problems. Their implementation
differs from that of most other machine learning algorithms, and they are popular
because of their capability of handling multiple continuous variables. Predicting
stock price movements is difficult. Moreover, according to further investigations,
movements in market prices do not appear to be random; rather, they behave in a
highly non-linear, dynamic manner. An SVM is a specific type of learning algorithm
characterized by capacity control of the decision function and the use of kernel
functions. Some previous works on the same are:
https://ieeexplore.ieee.org/abstract/document/9392366.
(iii) Non linear multi factor models: We incorporate multilayer feed-forward neural
networks for predicting a stock's excess return based on its exposure to various
technical and fundamental factors, as previously demonstrated in
https://ieeexplore.ieee.org/document/8697278/. To assess the effectiveness of the
approach, a hedged portfolio consisting of equally capitalized long and short
positions is constructed, and its historical returns are benchmarked against
Treasury bill returns and the Nifty 50 index.
(iv) Multiple Task Learning: It is a machine learning setting in which multiple
learning tasks are solved simultaneously, while exploiting similarities and
differences across the tasks. Compared with training the models separately, this
can improve learning efficiency and the predictive accuracy of the task-specific
models, and the shared structure can act as a regularizer. A similar work,
https://ieeexplore.ieee.org/abstract/document/9392366, has demonstrated that this
approach can incorporate various types of explanatory variables: so-called
technical variables, micro-economic stock-specific variables (such as measures of
company profitability), and macro-economic variables.
(v) Random Forest: For very large data sets it is better to use multiple trees than a
single decision tree, which may lead to an overfitted model. In a random forest we
therefore build multiple decision trees and merge them together for a more accurate
and stable prediction. The generalization error of a forest depends directly on the
strength of the individual trees. The work https://ieeexplore.ieee.org/document/737
has demonstrated this.
(vi) Forecasting stock indices: Notwithstanding extensive research that focuses on
estimating the level of return on a stock exchange index, there is a scarcity of
research examining the predictability of the direction of stock market index
movement. Given the belief that a prediction with little forecast error does not
necessarily translate into economic benefit, we will in particular conduct
statistical comparisons between the two types of prediction (level and direction).
(vii) COVID-19 Pandemic and Corporate world: We find that the pandemic-induced drop in
stock prices was milder among firms with stronger pre-2021 finances, less exposure
to COVID-19 through international supply chains and customer locations, more CSR
activities, and less entrenched executives. Furthermore, the stock prices of firms
with larger hedge fund ownership performed worse, and those of firms with larger
non-financial corporate ownership performed better. Some related works include:
"Stock Price Prediction Under Anomalous Circumstances"
(https://ieeexplore.ieee.org/document/9378030).
1.4 THESIS OBJECTIVES AND DELIVERABLES

Stock market movement has always appeared anomalous to investors because of the many
factors that influence it. The main objective of the project is therefore to examine a
variety of factors for predicting future stock returns from previous trends and
numerical indicators, in order to minimize the risk incurred through inaccuracy. Our
main objective is to analyse stock market trends using machine learning models, thereby
decoding the apparently deep and chaotic market information, and to investigate the
impact of the COVID-19 pandemic on a particular company.
1. To analyse state-of-the-art machine learning models used for predicting the stock
market.
2. To propose a more accurate machine learning model for predicting the stock market.
3. To analyse the impact of the COVID-19 outbreak on the stock market.
4. To reduce the uncertainty associated with investment decision making, in order to
differentiate between traditional and risky stocks.
5. To reduce the dilemma of investors, especially those with very little experience of
the stock market.
The aim of the project is to examine a wide range of forecasting techniques for
anticipating future stock returns from past returns and numerical news indicators.
CHAPTER 2
SYSTEM ARCHITECTURE AND

METHODOLOGY
2.1 SYSTEM OVERVIEW AND ARCHITECTURE

Figure 2.1: System Overview
2.1.1 Data Extraction and Preprocessing

Data preprocessing is the process of converting raw data into a format that is fit for
a machine learning model. It is the first and most important step in creating a machine
learning model, as shown in Figure 2.1. The raw data picked up from various authentic
sources may contain noise and missing values, or may be in a format that cannot be used
directly by machine learning models; it is therefore necessary to clean the data and
make it suitable for use. Data preprocessing also increases the efficiency and accuracy
of the machine learning model.
Steps involved in data preprocessing are as follows:
(i) Data Cleaning:

The data set may contain irrelevant and missing parts. To handle this:
1. Ignore the tuples, especially when multiple values are missing within a
tuple.
2. Fill the missing values manually, using the mean value or the most
probable value.
(ii) Data transformation:

This includes four major steps:
1. Data Normalisation: Scaling the data values into a specified range.
2. Attribute Selection: New attributes are constructed.
3. Discretisation: Replacing the raw values of a numeric attribute by
interval levels.
4. Data Integration: Combining data files.
(iii) Training set and Testing set:

Splitting the dataset into training and testing sets is a crucial step for
improving the performance of a machine learning model. A data set for which the
output is already known and which is used to train a machine learning model is
called the training set.
A data set on which our trained machine learning model is used to predict the
outcome is called the test set.
(iv) Feature scaling:

To ensure that no variable dominates the others, we scale our variables into a
standard range. This is the final step of data preprocessing; it standardizes
the independent variables of the data set into a range that is uniform across
the whole data set.
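The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: missing values are filled with the mean, the final 10 percent of the time-ordered rows are held out as the test set (one possible choice), and min-max scaling uses the minimum and maximum of the training rows only. The data and all function names are hypothetical.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def chronological_split(values, test_fraction=0.1):
    """Keep time order: the last test_fraction of the rows form the test set."""
    test_size = max(1, round(len(values) * test_fraction))
    return values[:-test_size], values[-test_size:]


def min_max_scale(values, low, high):
    """Scale values linearly so that low maps to 0 and high maps to 1."""
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Invented closing prices with one missing value (None).
closes = [100.0, 102.0, None, 98.0, 104.0, 110.0, 108.0, 106.0, 112.0, 120.0]

cleaned = fill_missing_with_mean(closes)
train_raw, test_raw = chronological_split(cleaned, test_fraction=0.1)

# Assumption: scaling parameters come from the training rows only, so test
# values may fall outside the range [0, 1].
low, high = min(train_raw), max(train_raw)
train = min_max_scale(train_raw, low, high)
test = min_max_scale(test_raw, low, high)
print(train, test)
```

Taking the scaling range from the training rows keeps information about the held-out rows from leaking into the model inputs.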
2.1.2 Feature Selection

The data set used for building a machine learning model may contain several features,
not all of which are important all the time. Training the model with unnecessary
features may result in reduced overall accuracy, increased complexity and decreased
generalization capability, which in turn may result in a biased model. As shown in
Figure 2.1, the ultimate goal of this step is to find the best possible feature set for
building our machine learning model. We aim to use the SelectKBest algorithm with
f_regression for the analysis.
This is executed in three steps:
1. We start with a constant machine learning model.
2. We then try all models M1 consisting of just a single feature and choose the best
according to the F statistic.
3. After that, we try all models M2 consisting of M1 plus one other feature and pick
the best. An F-test is a way of comparing the significance of the improvement of a
model with respect to the addition of new variables.
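A minimal plain-Python sketch of F-statistic scoring is given below. It scores each feature separately with the F statistic of a one-feature linear regression and keeps the k highest, which is the kind of univariate ranking performed by scikit-learn's SelectKBest with f_regression; it does not implement the stepwise addition of features described in steps 1 to 3. The data values and the helper names `f_score` and `select_k_best` are invented for illustration.

```python
def f_score(feature, target):
    """F statistic of a one-feature linear regression of target on feature."""
    n = len(feature)
    mean_x = sum(feature) / n
    mean_y = sum(target) / n
    sxx = sum((x - mean_x) ** 2 for x in feature)
    syy = sum((y - mean_y) ** 2 for y in target)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(feature, target))
    r_squared = (sxy * sxy) / (sxx * syy)
    if r_squared >= 1.0:
        return float("inf")
    # With one feature, F = r^2 / (1 - r^2) * (n - 2).
    return r_squared / (1.0 - r_squared) * (n - 2)


def select_k_best(features, target, k):
    """Return the names of the k highest-scoring features and all scores."""
    scores = {name: f_score(column, target) for name, column in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Invented example: the target is a price series; two features track it closely
# and one does not.
target = [10.0, 12.0, 11.0, 14.0, 15.0, 17.0]
features = {
    "Open": [9.8, 11.9, 11.2, 13.7, 15.1, 16.8],
    "High": [10.5, 12.4, 11.9, 14.6, 15.2, 17.9],
    "Shares Traded": [5.0, 3.0, 6.0, 4.0, 5.0, 3.0],
}

selected, scores = select_k_best(features, target, k=2)
print(selected)
```

Features with a weak linear relation to the target receive a low F score and are dropped first.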
2.1.3 Analysing various supervised learning methods

Supervised learning, a subcategory of machine learning and artificial intelligence,
uses labelled data sets to train algorithms to classify data and predict outcomes
accurately, as shown in Figure 2.1.
(i) Classification methods: include Support Vector Machines, Neural Networks,

Naive Bayes, Decision Trees, Random Forest etc.
(ii) Regression methods: include Linear regression, Support vector regression,

Logistic regression, Decision trees etc.
2.1.4 Analysing the effect of covid 19 pandemic on Stock market

First, we consider the Nifty 50 index data for the months from January to June for
analysis.
Second, we import the necessary Python libraries and use them to normalise the required
features.
Finally, we analyse the cumulative returns on the Nifty 50 index over these months and
calculate the drawdown, that is, the maximum downside risk. Thereafter, we analyse the
Google search trend results in India during COVID-19, which leads to the important
conclusion that public sentiment had the greatest effect on market prices.
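The return and drawdown calculations described above can be sketched as follows. This is a plain-Python illustration on invented closing prices; the function `pct_change` here is a stand-in for the pandas method of the same name, and the other helper names are hypothetical.

```python
def pct_change(prices):
    """Daily proportional change: (p_t - p_{t-1}) / p_{t-1}."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]


def cumulative_returns(returns):
    """Compound the daily returns into a running cumulative return."""
    growth = 1.0
    out = []
    for r in returns:
        growth *= 1.0 + r
        out.append(growth - 1.0)
    return out


def max_drawdown(prices):
    """Largest peak-to-trough decline, as a negative fraction of the peak."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)
        worst = min(worst, (price - peak) / peak)
    return worst


# Invented closing prices, not Nifty 50 data.
closes = [100.0, 110.0, 99.0, 88.0, 96.8]

daily = pct_change(closes)
cumulative = cumulative_returns(daily)
drawdown = max_drawdown(closes)
print(daily, cumulative[-1], drawdown)
```

For these prices the peak is 110.0 and the lowest subsequent value is 88.0, so the maximum drawdown is a 20 percent decline.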
2.1.5 Credit Assignment Problem

In this step, the various methods of data collection are assigned suitable credits
(weights). Computing the values to be assigned as weights is part of the training
algorithm.
2.1.6 System Architecture

(i) Data Collection:
The whole Nifty data set can be collected date-wise from the official website of
the National Stock Exchange. Other useful data on Nestle India are gathered from a
wide range of different sources.
(ii) Result Computation:

Results are computed using various supervised machine learning methods in order
to achieve the goal of stock market prediction. Ranking the models with respect to
their accuracy is necessary for drawing final conclusions.
(iii) System Updation:

With the incorporation of new data, additional methods and new features, the
results, accuracy and performance of the system may change. Hence, the system must
be updated.
(iv) Company stock:

The study includes the fluctuations and trends of Nestle India Ltd. (a Nifty-listed
company) over the same time period.
CHAPTER 3
PROGRESS MADE SO FAR
3.1 DATA COLLECTION AND NORMALISATION

Data normalisation scales the data values into a specified range. During attribute
selection, new attributes are constructed, and discretisation replaces the raw values of
numeric attributes by interval levels. For better accuracy, the extracted data needs to
be normalised, thereby ensuring that no factor is given an exceptionally high or
exceptionally low weightage. We have taken the historical index data of Nifty 50 from
the nseindia website. The six fields of our dataset are as follows:
(i) Open: Opening price of the stock on a particular day.
(ii) Close: Closing price of the stock on a particular day.
(iii) High: Highest price recorded for the stock on that particular day.
(iv) Low: Lowest price recorded for the stock on that particular day.
(v) Shares Traded: Number of shares bought or sold on a particular date.
(vi) Turnover: Total value of currency exchanged for the stock on that day.
NIFTY 50 is NSE's diversified index, which consists of stocks from the top 50 Indian
companies across 14 sectors. Its main function is to track the market performance of
the largest-cap companies; hence, it broadly reflects the Indian economy.
The top rows of our data set are shown in Figure 3.1, which is a direct screenshot
of the Nifty index data:
Figure 3.1: First 15 rows of our Dataset
3.2 SYSTEM EFFECTIVENESS

We mainly consider two measures to analyse the efficiency and effectiveness of the
system:
1. Root Mean Square Error
2. R-Squared
3.2.0.1 Root Mean Square Error

Figure 3.2: RMSE value
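Figure 3.2 is not reproduced in this text. For reference, the standard definition of the root mean square error is given below; the symbols (n for the number of observations, y_i for the actual value and \hat{y}_i for the predicted value) are notation assumed here rather than taken from the figure.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```

A lower RMSE means that the predictions lie closer to the actual values, and the measure is expressed in the same units as the predicted quantity.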
3.2.1 Solution Roadmap

(i) Data Preprocessing and Cleaning:
Data preprocessing is the process of converting raw data into a format that is fit
for a machine learning model. The data set may contain irrelevant noise and
missing values, or may be in a format that cannot be used directly by machine
learning models; it is therefore necessary to clean the data and make it suitable
for use.
(ii) Feature Extraction:

The data set used for building a machine learning model may contain several
features, not all of which are important all the time. Training the model with
unnecessary features may result in reduced overall accuracy, increased complexity
and decreased generalization capability, which in turn may result in a biased
model.
(iii) Data Normalization:

Data normalisation scales the data values into a specified range. During attribute
selection, new attributes are constructed, and discretisation replaces the raw
values of numeric attributes by interval levels. For better accuracy, the
extracted data needs to be normalised, thereby ensuring that no factor is given an
exceptionally high or exceptionally low weightage.
(iv) Analysing supervised learning methods:

Supervised classification methods include:
1. Support Vector Machines
2. Neural Networks
3. Naive Bayes
4. Decision Trees
5. Random Forest etc.
Supervised regression methods include:
1. Linear Regression
2. Support Vector Regression
3. Logistic Regression
4. Decision Trees etc.
(v) Credit Assignment: It includes allocating a proper weightage to the various

methods used for data collection.
(vi) Analysis of Various Models: The different techniques and models executed over
the dataset are compared in this stage.
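The comparison stage can be illustrated with a short sketch that ranks the regressors by their R-squared values. The numbers are copied from Table 3.1; ranking by R-squared alone and the variable names are assumptions made for the example.

```python
# R-squared values copied from Table 3.1.
r_squared = {
    "Random Forest Regressor": 0.956669,
    "Bagging Regressor": 0.959771,
    "Adaboost Regressor": 0.909611,
    "K-Neighbors Regressor": -117.01176,
    "Gradient Boosting": 0.961448,
}

# Higher R-squared is better, so sort in descending order.
ranking = sorted(r_squared, key=r_squared.get, reverse=True)

for position, name in enumerate(ranking, start=1):
    print(position, name, r_squared[name])
```

In practice the ranking could also take the RMSE into account, since the two measures need not always agree.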
3.2.2 Nifty stock predictions considering Covid-19 effects:

We have analyzed the cumulative returns of the Nifty index over the past six months
(from 1 Jan 2020). We have used the 'Close' column of the dataset data frame to
calculate the returns. The operation pct_change() gives the daily proportional change
in the [...] 50 index. We can clearly see that the market was most volatile throughout
the month of March and most lo[...] 0.263, that is, approximately a twenty-six percent
loss in one month.
3.2.3 R-Squared value

The value of R-squared usually lies between zero and one, and the higher the value, the
more accurate the regression model, because more of the variability is explained by the
model. The R-squared value indicates the proportion of variation in the response
variable that is explained by the independent variables. R-squared is a statistical
measure of how close the data are to the fitted curve. It is also known as the
coefficient of determination, or the coefficient of multiple determination for multiple
regression. On held-out data the value can be negative when a model fits worse than
simply predicting the mean, as the K-Neighbors Regressor in Table 3.1 shows.
Table 3.1: RMSE and R-squared values for different regressors

Algorithm                  RMSE value       R-squared value
Random Forest Regressor    1.43254343-07    0.956669
Bagging Regressor          1.329966e-07     0.959771
Adaboost Regressor         2.988297e-07     0.909611
K-Neighbors Regressor      0.00039015       -117.01176
Gradient Boosting          1.274547e-07     0.961448
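The two measures reported in Table 3.1 can be computed as in the following minimal sketch. It uses the standard definitions of RMSE and R-squared in plain Python; the sample values are invented, and the second set of predictions is deliberately poor to show how R-squared becomes negative.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Invented values for illustration.
actual = [1.0, 2.0, 3.0, 4.0]
good_predictions = [1.1, 1.9, 3.2, 3.8]
bad_predictions = [4.0, 1.0, 4.0, 1.0]

good_r2 = r_squared(actual, good_predictions)
bad_r2 = r_squared(actual, bad_predictions)
good_rmse = rmse(actual, good_predictions)
bad_rmse = rmse(actual, bad_predictions)
print(good_rmse, good_r2, bad_rmse, bad_r2)
```

Predictions that are worse than always guessing the mean of the actual values give a residual sum of squares larger than the total sum of squares, which is why R-squared can fall below zero.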
Random Forest Regressor:
Figure 3.3: RMSE value: 1.43254343-07, R-squared value: 0.956669
Bagging Regressor:
Figure 3.4: RMSE value: 1.329966e-07, R-squared value: 0.959771
Adaboost Regressor:
Figure 3.5: RMSE value: 2.988297e-07, R-squared value: 0.909611
K-Neighbors Regressor:
Figure 3.6: RMSE value: 0.00039015, R-squared value: -117.01176
Gradient Boosting:
Figure 3.7: RMSE value: 1.274547e-07, R-squared value: 0.961448
CHAPTER 4
TASKS TO BE COMPLETED
(a) We will focus on gathering additional stock index data, as until now we have
focused solely on the first six months of the year. In line with the same
objective, we will try to incorporate the stock data sets of the next few months in
order to apply different machine learning models to the data set for prediction.
(b) We will analyse various supervised learning models for prediction and comparison,
including the newly incorporated methods. They will then be ranked according to
their performance and accuracy of prediction.
(c) We will incorporate social media sentiment as a new factor that influences stock
trends.
(d) We will analyse the effect of the coronavirus pandemic, followed by the lockdown,
on the Nifty index and will make stock price predictions.
(e) We will draw conclusions based on the trends followed by Nestle India Ltd.
(f) We will also try to realise the utility and future scope of our research in the
practical world.
CHAPTER 5
GANTT CHART
Figure 5.1: Gantt Chart
CHAPTER 6
REFERENCES
[1] Prediction of Stock Prices using Machine Learning (Regression, Classification)
Algorithms. https://ieeexplore.ieee.org/document/9154061
[2] A Comparative Study of Supervised Machine Learning Algorithms for Stock Market
Trend Prediction. https://ieeexplore.ieee.org/document/8473214
[3] Predicting Stock Closing Price After COVID-19 Based on Sentiment Analysis,
published in 5th IAEAC. https://ieeexplore.ieee.org/document/9390845
[4] Stock Market Prediction Using Machine Learning, published in ICSCCC.
https://ieeexplore.ieee.org/document/8703332
[5] Stock Price Prediction Under Anomalous Circumstances.
https://ieeexplore.ieee.org/document/9378030
[6] Machine Learning for Predicting Stock Market Movements.
https://ieeexplore.ieee.org/document/9285163
[7] https://ieeexplore.ieee.org/document/9154061Prediction of Stock Prices using Ma-

chine Learning (Regression, Classification) Algorithms
[8] https://ieeexplore.ieee.org/document/8473214A Comparative Study of Supervised

Machine Learning Algorithms for Stock Market Trend Prediction