
Stock Market Prediction

Hrithik D B181070PE

Abstract
This study examines several technical indicators that give insight into stock market behaviour
and feeds these indicators into different models to predict price movement. We compare
several models to assess how well each performs in real-life scenarios. To measure this we use
data from the National Stock Exchange (NSE).

Introduction:
A stock market is a public market where shares of publicly listed companies are bought and
sold. Stocks, also known as equities, represent ownership in a company. The stock exchange
is the mediator that allows the buying and selling of shares. This project aims to give insight
into the future value of company stock and other financial assets traded on an exchange. The
main motivation for predicting stock prices is to make a profit. Many factors influence prices,
including physical and psychological factors and rational and irrational behaviour. Together
these factors make share prices dynamic and volatile. The data for the analysis is taken from
the official NSE website, which makes it a reliable source. For prediction we use a random
forest model, a KNN model and a gradient boosting classifier. From these we identify the
model whose predictions are the most accurate and precise.

Methods:
Since the main objective of this study is to compare the precision and accuracy of different
models, we used the following models:
1. KNN Model – The K-Nearest Neighbour (K-NN) algorithm is a supervised learning
technique and one of the most basic machine learning algorithms. The K-NN
method assumes that similar cases lie close together, so it places a new case in the
category of the existing cases it most resembles. The K-NN approach can be used for
both regression and classification, but it is more commonly used for classification
tasks. K-NN is a non-parametric algorithm, which means it makes no assumptions
about the distribution of the underlying data. It is also called a lazy learner
algorithm because it does not learn from the training set immediately. During the
training phase it simply stores the dataset, and when it receives new data it
classifies that data into the category of the stored examples most similar to it.
2. Random Forest Model – It is a supervised machine learning algorithm that is
commonly used to solve classification and regression problems. It builds decision
trees from different samples of the data, using the majority vote for classification
and the average for regression. One of the most important characteristics of the
random forest algorithm is that it can handle data sets with both continuous and
categorical variables. It generally performs well on classification problems.
Ensemble methods fall into two families, bagging and boosting, and random forest
belongs to the first. Bagging draws a different training subset from the training
data with replacement for each tree, and the final output is based on majority
voting. Boosting, by contrast, combines weak learners into a strong learner by
building models sequentially, each one correcting the errors of the previous ones.
3. Gradient Boosting Classifier - Gradient boosting is one of the most powerful
algorithms in the field of machine learning. Errors in machine learning models are
broadly classified into two categories, bias error and variance error. As a boosting
algorithm, gradient boosting is mainly used to reduce the bias error of the model.
Unlike AdaBoost, the base estimator of sklearn's gradient boosting cannot be chosen
by the user. It is fixed to a decision tree (by default a regression tree of depth 3,
rather than the decision stump that AdaBoost uses by default). As with AdaBoost, we
can tune the n_estimators parameter of the gradient boosting algorithm. If we do not
set n_estimators, its default value is 100. Gradient boosting can be used to predict
not only a continuous target variable (as a regressor) but also a categorical target
variable (as a classifier). When it is used as a regressor, the default cost function
is the mean squared error (MSE), and when it is used as a classifier the cost
function is the log loss.
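The lazy-learner behaviour described for the KNN model can be illustrated with a small sketch in plain Python. This is not the code used in the study: the class name SimpleKNN, the two-feature samples and the "up"/"down" labels are all hypothetical, chosen only to show that fitting stores the data and that the distance computation and majority vote happen at prediction time.

```python
from collections import Counter
import math


class SimpleKNN:
    """Toy K-NN classifier (hypothetical helper): 'training' only stores the data."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Lazy learner: no model is built here, the samples are just kept.
        self.X = [tuple(row) for row in X]
        self.y = list(y)
        return self

    def predict_one(self, point):
        # All the work happens at prediction time: sort the stored samples
        # by Euclidean distance to the new point and keep the k closest.
        nearest = sorted(
            zip(self.X, self.y),
            key=lambda pair: math.dist(pair[0], point),
        )[: self.k]
        # Majority vote among the labels of the k nearest samples.
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Hypothetical two-feature samples labelled "up" or "down".
X_train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y_train = ["down", "down", "down", "up", "up", "up"]

model = SimpleKNN(k=3).fit(X_train, y_train)
print(model.predict_one((1.1, 1.0)))  # the three closest samples are all "down"
print(model.predict_one((5.1, 5.0)))  # the three closest samples are all "up"
```

Because every prediction scans the whole stored dataset, this approach is cheap to train but becomes slower to query as the dataset grows.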
To check precision and accuracy, we train each of the above-mentioned models on 80% of
the data and use it to predict the remaining 20%. To achieve maximum precision and
accuracy we also include several financial indicators that are widely used by stock
market analysts around the globe.
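The 80/20 evaluation can be sketched with sklearn as follows. This is a minimal sketch under stated assumptions, not the study's notebook: the features and labels are synthetic stand-ins for the indicator columns and the up/down target, the hyperparameters are illustrative, and shuffle=False is an assumption made here so that the test rows come after the training rows in time.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for indicator features and up/down labels
# (assumption: the real NSE data is not reproduced here).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# 80% of the rows for training, the remaining 20% for testing.
# shuffle=False keeps the rows in their original (time) order.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Precision: share of predicted "up" rows that really were "up".
    # Accuracy: share of all test rows that were predicted correctly.
    scores[name] = {
        "precision": precision_score(y_test, y_pred),
        "accuracy": accuracy_score(y_test, y_pred),
    }
    print(f"{name}: precision={scores[name]['precision']:.2f}, "
          f"accuracy={scores[name]['accuracy']:.2f}")
```

The numbers printed for this synthetic data say nothing about real market performance. They only show how the two metrics are obtained for each model.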

Results and Discussion:


1. Model Development - For the study we considered the NIFTY 50, an index that tracks
50 of the largest companies listed on the NSE. The code was written in a Jupyter
notebook for convenience and readability, and it uses several packages: pandas,
numpy, finta, matplotlib and sklearn. Pandas is used to read and manipulate the
dataframe built from the imported CSV file, numpy provides comprehensive
mathematical functions, finta computes technical financial indicators such as
MACD, EMA and SMA, matplotlib is used to plot the graphs and sklearn provides the
models. First, we compute all the financial indicators on our dataset and then pass
the result to the three models. After training, each model predicts the remaining
20 percent of the data, and we record its precision and accuracy. Finally, we
combine the three models with a voting classifier using hard voting, so that their
predictions are merged into a single one.
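The hard-voting step can be sketched with sklearn's VotingClassifier. The data below is synthetic and the estimator settings are illustrative assumptions, not the study's configuration. The sketch also recomputes the vote by hand to show what hard voting means: each model casts one vote per sample and the majority label wins.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary-label data (assumption: stands in for the indicator dataset).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
X_train, X_test = X[:240], X[240:]
y_train, y_test = y[:240], y[240:]

ensemble = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("rf", RandomForestClassifier(random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="hard",  # majority vote on predicted labels, not on probabilities
)
ensemble.fit(X_train, y_train)
ensemble_pred = ensemble.predict(X_test)

# Reproduce the vote by hand: with three voters and labels 0/1,
# the majority label is 1 exactly when at least two models predict 1.
individual = np.array([est.predict(X_test) for est in ensemble.estimators_])
manual_vote = (individual.sum(axis=0) >= 2).astype(int)

print("ensemble matches manual majority vote:",
      np.array_equal(ensemble_pred, manual_vote))
```

With an odd number of models and two classes there can be no ties, which keeps the majority vote unambiguous.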
2. Financial Parameters
• 14-period RSI - The Relative Strength Index (RSI) was designed to indicate whether a
security is overbought or oversold in relation to recent price levels. The RSI is
calculated using average price gains and losses over a given period of time. The
14-period version uses the last 14 candles (bars) on the price chart.
• MACD - Moving average convergence divergence (MACD) is a trend-following momentum
indicator that shows the relationship between two moving averages of a security’s
price. The MACD is calculated by subtracting the 26-period exponential moving
average (EMA) from the 12-period EMA.
• MFI- The Money Flow Index is a momentum indicator that measures the flow of money
into and out of a security over a specified period of time.
• EMA - The exponential moving average is a technical chart indicator that tracks the
price of an investment (like a stock or commodity) over time. The EMA is a type
of weighted moving average (WMA) that gives more weighting or importance to
recent price data. Like the simple moving average (SMA), the EMA is used to see price
trends over time, and watching several EMAs at the same time is easy to do with EM
• EMV - Ease of Movement is a volume-based oscillator that fluctuates above and below
the zero line. As its name implies, it is designed to measure the “ease” of price
movement. Its developer, Richard Arms, created EquiVolume charts to visually display
price ranges and volume. Ease of Movement takes EquiVolume to the next level by
quantifying the price/volume relationship and showing the results as an oscillator.
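The EMA, MACD and RSI definitions above can be written out in a few lines of pandas. This is a sketch under stated assumptions rather than finta's implementation: the function names are hypothetical, the price series is a synthetic random walk, the EMA uses the common smoothing factor 2 / (period + 1), and the RSI uses Wilder-style smoothing of gains and losses, which is one of several conventions. MFI and EMV are omitted because they also need high, low and volume columns.

```python
import numpy as np
import pandas as pd


def ema(close: pd.Series, period: int) -> pd.Series:
    # Exponential moving average: recent prices receive more weight.
    return close.ewm(span=period, adjust=False).mean()


def macd(close: pd.Series, fast: int = 12, slow: int = 26) -> pd.Series:
    # MACD line: 12-period EMA minus 26-period EMA.
    return ema(close, fast) - ema(close, slow)


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    # Split price changes into gains and losses, smooth each one,
    # then map the gain/loss ratio onto a 0-100 scale.
    delta = close.diff()
    gains = delta.clip(lower=0)
    losses = -delta.clip(upper=0)
    avg_gain = gains.ewm(alpha=1 / period, adjust=False).mean()
    avg_loss = losses.ewm(alpha=1 / period, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


# Synthetic closing prices (assumption: a random walk, not NSE data).
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(size=200).cumsum(), name="close")

features = pd.DataFrame(
    {"ema_12": ema(close, 12), "macd": macd(close), "rsi_14": rsi(close)}
)
print(features.tail())
```

A steadily rising series pushes the RSI towards 100 and a steadily falling one towards 0, which matches the overbought/oversold reading described above.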

3. Results
• Random Forest Model
• KNN Model
• Gradient Boosting Classifier
• Ensemble (voting classifier)

Conclusion:
From the results shown above, we conclude that the combined (ensemble) prediction gives the
highest precision (88%) and accuracy (86%). Although this is a good result, the stock market
remains volatile, and the support of a financial advisor is still needed.

References:
[1] https://www.nseindia.com/
[2] https://finance.yahoo.com/
[3] https://www.kaggle.com
