Stock Market Prediction: Hrithik D B181070PE
Hrithik D B181070PE
Abstract
This study examines several indicators that give an in-depth view of the stock market and
runs these indicators through different models to predict stock prices. We compare the
models to find which is the most effective in real-life scenarios, using data from the
National Stock Exchange (NSE).
Introduction:
A stock market is a public market where shares of publicly listed companies can be bought
and sold. Stocks, also known as equities, represent ownership in a company, and the stock
exchange is the mediator that enables the buying and selling of shares. This project gives
insight into the future value of company stock and other financial assets traded on an
exchange. The overall aim of predicting stock prices is to gain significant profits. Many
other factors influence the prediction, such as physical and psychological factors and
rational and irrational behaviour; together these make share prices dynamic and volatile.
The data for the analysis is taken from the official NSE website, which gives it maximum
reliability. For prediction we use the random forest model, the KNN model, and the gradient
boosting classifier, and from these we identify the model with the maximum efficiency and
precision relative to the true values.
Methods:
Since the main objective of this study is to find the maximum precision and accuracy among
different models, we used the following:
1. KNN Model – The K-Nearest Neighbour algorithm is a supervised learning technique and
   one of the most basic machine learning algorithms. K-NN assumes that a new case is
   similar to existing cases and places it in the category most similar to the existing
   categories. It stores all available data and classifies a new data point based on its
   similarity to the stored data, so new data can be quickly sorted into a well-defined
   category. K-NN can be used for both regression and classification, but it is more
   commonly used for classification tasks. It is a non-parametric algorithm, meaning it
   makes no assumptions about the underlying data. It is also called a lazy learner
   because it does not learn from the training set immediately; instead, it simply stores
   the dataset during the training phase and performs the computation only at
   classification time, when it assigns new data to the category it most resembles.
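As a minimal sketch of the approach described above, the following classifies next-day price direction (up = 1, down = 0) with K-NN on synthetic data; the feature set and all parameter values are illustrative assumptions, not the study's actual configuration. Because K-NN is distance-based, the features are standardised first.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-ins for indicators (e.g. daily return, RSI, volume change).
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)  # up/down label

# Scale features so no single indicator dominates the distance metric.
scaler = StandardScaler().fit(X)
knn = KNeighborsClassifier(n_neighbors=5)  # k = 5 is an assumed choice
knn.fit(scaler.transform(X), y)

# "Lazy learning": the work happens here, at prediction time.
pred = knn.predict(scaler.transform(X[:5]))
```

Note that `fit` for K-NN only stores the training data; the neighbour search runs when `predict` is called, which is exactly the lazy-learner behaviour described above.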
2. Random Forest Model – A supervised machine learning algorithm commonly used to solve
   classification and regression problems. It builds decision trees from different
   samples, taking the majority vote for classification and the average for regression.
   One of its most useful characteristics is that it can handle datasets containing both
   continuous and categorical variables, for regression and classification alike, and it
   generally produces superior results on classification problems. Two related ensemble
   methods are bagging and boosting: bagging creates different training subsets by
   sampling the training data with replacement, with the final output decided by majority
   voting, while boosting combines weak learners into a strong learner by training
   sequential models so that the final model has the highest accuracy.
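The bagging-and-voting behaviour described above can be sketched as follows on synthetic data; the dataset and all hyperparameter values are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] - X[:, 1] > 0).astype(int)  # synthetic up/down label

# Each tree is trained on a bootstrap sample of the data (bagging);
# the forest's prediction is the majority vote across all trees.
rf = RandomForestClassifier(n_estimators=100, random_state=1)
rf.fit(X, y)

train_acc = rf.score(X, y)
pred = rf.predict(X[:5])
```

Because each tree sees a different bootstrap sample, individual trees overfit differently, and the majority vote averages out much of that variance.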
3. Gradient Boosting Classifier – Gradient boosting is one of the most powerful
   algorithms in machine learning. Errors in machine learning models are broadly
   classified into two categories, bias error and variance error; as a boosting
   algorithm, gradient boosting is used to minimise the bias error of the model. Unlike
   AdaBoost, the base estimator in gradient boosting cannot be chosen by the user: it is
   fixed as a shallow decision tree. As with AdaBoost, we can tune n_estimators; if we do
   not set it, the default value is 100. Gradient boosting can predict not only a
   continuous target variable (as a regressor) but also a categorical target variable
   (as a classifier). When used as a regressor the cost function is mean squared error
   (MSE), and when used as a classifier it is log loss.
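A minimal sketch of the classifier described above, again on synthetic data with assumed hyperparameters. Each successive tree fits the gradient of the log-loss, correcting the residual errors of the ensemble built so far.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic up/down label

# n_estimators defaults to 100, as noted in the text; learning_rate scales
# each tree's contribution to the additive model.
gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
gbc.fit(X, y)

# As a classifier, the model optimises log loss and outputs class probabilities.
proba = gbc.predict_proba(X[:3])
```

A smaller `learning_rate` typically needs more estimators but tends to generalise better, which is the usual trade-off when tuning these two parameters together.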
To check precision and accuracy, we train on 80% of the total data and run each of the
above models to predict the remaining 20%. To achieve maximum precision and accuracy we
also considered various financial indicators used by stock market analysts around the
globe.
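The 80/20 evaluation split described above can be sketched as follows; the data here is synthetic. Note that for time-series data such as stock prices, a chronological split (no shuffling) is the usual way to avoid look-ahead bias, so that assumption is made here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)

# Hold out the last 20% of samples for evaluation; shuffle=False keeps
# the chronological order so the model never trains on future data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False)
```

Each model is then fitted on `X_train` and scored on `X_test` to compare precision and accuracy on unseen data.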
Results:
• Random Forest Models
• KNN Model
• Gradient Boosting Classifier
• Ensembled
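The source does not specify how the ensembled prediction is combined, so the following is only one plausible sketch: a hard-voting ensemble over the three models, where each model predicts a class and the majority wins. All hyperparameters are assumptions.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)  # synthetic up/down label

# Hard voting: each base model casts one vote; ties break by class order.
ensemble = VotingClassifier(estimators=[
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=4)),
    ("gbc", GradientBoostingClassifier(n_estimators=50)),
], voting="hard")
ensemble.fit(X, y)
pred = ensemble.predict(X[:5])
```

Combining models with different error profiles in this way is a common reason an ensemble can outperform each individual model, consistent with the conclusion below.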
Conclusion:
From the results shown above, it is concluded that the combined prediction gives the
maximum precision of 88% and accuracy of 86%. Even though this gives a good prediction,
the stock market remains volatile, and a financial advisor's support is still recommended.
References:
[1] https://www.nseindia.com/
[2] https://finance.yahoo.com/
[3] https://www.kaggle.com