Stock Market Prediction Using Sentiment Analysis: Prof. Artika Singh


Stock Market Prediction using Sentiment Analysis
A Project Report
Submitted by
Anubhav Sharma
Aatmik Sharma
Shubham Kumar
Yagyansh Pareek
Pranit Malviya

Under the Guidance of

Prof. Artika Singh

in partial fulfillment for the award of the degree of

MBA (Tech.)
COMPUTER ENGINEERING

At

MUKESH PATEL SCHOOL OF TECHNOLOGY MANAGEMENT & ENGINEERING

October, 2021
DECLARATION

We, Anubhav Sharma, Shubham Kumar, Pranit Malviya, Aatmik Sharma and Yagyansh Pareek, Roll Nos. N264, N239, N245, N263 and N254, MBA (Tech.) (Computer Engineering), VII semester, understand that plagiarism is defined as any one or a combination of the following:

1. Un-credited verbatim copying of individual sentences, paragraphs or illustrations (such as graphs, diagrams, etc.) from any source, published or unpublished, including the internet.
2. Un-credited improper paraphrasing of pages or paragraphs (changing a few words or phrases, or rearranging the original sentence order).
3. Credited verbatim copying of a major portion of a paper (or thesis chapter) without clear delineation of who wrote what. (Source: IEEE, The Institute, Dec. 2004)
4. I have made sure that all the ideas, expressions, graphs, diagrams, etc., that are not a
result of my work, are properly credited. Long phrases or sentences that had to be
used verbatim from published literature have been clearly identified using quotation
marks.
5. I affirm that no portion of my work can be considered as plagiarism and I take full
responsibility if such a complaint occurs. I understand fully well that the guide of the
seminar/ project report may not be in a position to check for the possibility of such
incidents of plagiarism in this body of work.

Aatmik Sharma (N263)

Anubhav Sharma (N264)

Yagyansh Pareek (N254)

Pranit Malviya (N245)

Shubham Kumar (N239)

Place: Mumbai

Date: 14th Oct, 2021



CERTIFICATE

This is to certify that the project entitled “Stock Market Prediction with Sentiment Analysis”
is the bonafide work carried out by Anubhav Sharma, Shubham Kumar, Pranit Malviya,
Aatmik Sharma, Yagyansh Pareek of MBA (Tech.) (Computer Engineering), MPSTME
(NMIMS), Mumbai, during the VIIth semester of the academic year 2021, in partial
fulfillment of the requirements for the award of the Degree of Bachelors of Engineering, as
per the norms prescribed by NMIMS. The project work has been assessed and found to be
satisfactory.

______________________

Artika Singh

Internal Mentor

_______________________ ________________________

Examiner 1 Examiner 2

__________________

Director

Table of contents

Declaration
Certificate
Abbreviations
Abstract
1. INTRODUCTION
   1.1 Project Overview
   1.2 Hardware Specification
   1.3 Software Specification
2. REVIEW OF LITERATURE
3. ANALYSIS & DESIGN
4. METHODS IMPLEMENTED
5. RESULTS & DISCUSSION
6. PROJECT TIMELINE
7. CONCLUSION & FUTURE SCOPE
8. LOGBOOK

ABSTRACT

Predicting stock market prices has long been a topic of interest among both analysts and researchers. Stock prices are hard to predict because of their highly volatile nature, which depends on diverse political and economic factors, changes of leadership, investor sentiment, and many other factors. Predicting stock prices based on either historical data or textual information alone has proven to be insufficient.
Existing studies in sentiment analysis have found that there is a strong correlation between the movement of stock prices and the publication of news articles. Several sentiment analysis studies have been attempted at various levels using algorithms such as support vector machines, naive Bayes regression, and deep learning. The accuracy of deep learning algorithms depends upon the amount of training data provided. However, the amount of textual data collected and analyzed in past studies has been insufficient, resulting in predictions with low accuracy.
We aim to improve the accuracy of stock price predictions by gathering a large amount of time series data and analyzing it in relation to related news articles, using deep learning models. We will use a dataset for S&P 500 companies covering five years, along with more than 265,000 financial news articles related to these companies. Given the large size of the dataset, we use cloud computing as an invaluable resource for training prediction models and performing inference for a given stock in real time.

Chapter 1

Introduction

1.1 Project Overview

The stock market fluctuates sharply, and it involves many complicated financial indicators.

● Advances in technology, however, provide an opportunity to earn steady returns from the stock market and can help experts identify the most informative indicators for making better predictions.

● Predicting market value is of paramount importance for maximizing the profit from stock option purchases while keeping the risk low.

● We will analyse the sentiment of news from Twitter and NDTV data and find the correlation between news and price movement, in order to predict the price movement of a stock in the coming days.
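The correlation step described in the overview can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes one sentiment score and one return per trading day, and the function names and sample values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(sentiment, returns, lag=1):
    """Correlate sentiment on day t with the return on day t + lag."""
    if lag < 0 or lag >= len(sentiment):
        raise ValueError("lag must be between 0 and len(sentiment) - 1")
    if lag == 0:
        return pearson(sentiment, returns)
    return pearson(sentiment[:-lag], returns[lag:])


# Hypothetical daily sentiment scores in [-1, 1] and daily returns in percent.
daily_sentiment = [0.2, -0.5, 0.7, 0.1, -0.3, 0.4]
daily_returns = [0.1, 0.3, -0.6, 0.9, 0.2, -0.4]

print(round(lagged_correlation(daily_sentiment, daily_returns, lag=1), 3))
```

A lag of one day is used here because the stated goal is to predict movement in the coming days; other lags can be compared in the same way.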

1.2 Hardware Specification


A PC with good processing power

1.3 Software Specification

Google Colab notebook: Colab notebooks are notebooks that run in the cloud and are highly integrated with Google Drive, making them easy to set up, access, and share.

Visual Studio Code: Visual Studio Code is a source-code editor made by Microsoft for Windows, Linux and macOS. Features include support for debugging, syntax highlighting, intelligent code completion, snippets, code refactoring, and embedded Git.

Cloud GPU: A cloud graphics processing unit (GPU) provides hardware acceleration for an application without requiring that a GPU be deployed on the user's local device. Common use cases for cloud GPUs include visualization workloads, since powerful server and desktop applications often employ graphically demanding content.

Chapter 2

Research Survey
To understand people's opinions about the stock market and their levels of financial literacy, a survey was circulated for research. The survey received 73 responses; the questions and the inferences drawn from them are given below.

Fig. 1 and Fig. 2: Gender and age of the people who filled in the form

Literature Review

For the literature review, approximately 20 papers were reviewed in the domain of Stock Prediction with Sentiment Analysis. The table below gives a brief overview of each of these papers.

Name of Paper: Stock market prediction using machine learning classifiers and social media, news
Authors: Wasiat Khan, Mustansar Ali Ghazanfar, Muhammad Awais Azam, Amin Karami
Overview: They used sentiment analysis of news and combined it with deep learning to get an accuracy of 83.22% while using random forest as classifier.
Method: Sentiment analysis, deep learning, hybrid algorithm, natural language processing, random forest classifier
Conclusion: Different markets are influenced in different ways by news; some markets are more volatile while some are less.

Name of Paper: Predicting Stock Market Behavior using Data Mining Technique and News Sentiment Analysis
Authors: Ayman E. Khedr, S. E. Salama, Nagwa Yaseen
Overview: They used sentiment analysis of news and combined it with data mining to get an accuracy of 89.80% while using Naive Bayes and a KNN classifier.
Method: Sentiment analysis, Naive Bayes, KNN classifier
Conclusion: The model is divided into two stages, which increases the prediction accuracy.

Name of Paper: Stock Market Prediction based on Social Sentiments using Machine Learning
Authors: Tejas Mankar, Tushar Hotchandani, Manish Madhwani, Akshay Chidrawar
Overview: It predicted the stock price movement based on the sentiments of the tweets.
Method: Support Vector Machine, Natural Language Toolkit (NLTK) library
Conclusion: Cloud services will enable us to collect large amounts of data.

Name of Paper: Stock Price Prediction Using News Sentiment Analysis
Authors: Saloni Mohan, Sahitya Mullapudi, Sudheer Sammeta, Parag Vijayvergia and David C. Anastasiu
Overview: It improves the accuracy of stock price predictions by gathering a large amount of time series data and analyzing it in relation to related news articles, using deep learning models. The dataset gathered includes daily stock prices for S&P 500 companies for five years, along with more than 265,000 financial news articles related to these companies.
Method: MAPE is useful while evaluating prediction models where only the magnitude of the difference between predicted values and observed values is important to consider, while the direction of the difference can be ignored. Evaluation using MAPE overcomes the large deviation bias present in Root Mean Square Error (RMSE) and shows robustness for datasets containing long tails.
Conclusion: The model predicted stock prices using time series models, neural networks, and a combination of neural networks and financial news articles. The results suggest that there is a strong relationship between stock prices and financial news articles.

Name of Paper: Forecasting directional movements of stock prices for intraday trading using LSTM
Authors: Pushpendu Ghosh, Ariel Neufeld, Jajati Keshari Sahoo
Overview: In this paper, they used an LSTM model in a multi-feature setting to get daily returns of 0.64% and used the random forest method to get daily returns of 0.54%.
Method: Random forest, LSTM (CuDNNLSTM)
Conclusion: Both of their models outperformed the market in intraday trading.

Name of Paper: Visualization and Forecasting of stocks using LSTM technique.
Authors: Shivang B, Tirtha Roy.
Overview: Predicts stock prices using the LSTM neural network and uses the Plotly Dash framework for building dashboards.
Method: Long Short-Term Memory (LSTM)
Conclusion: Helps in accurate prediction of the stock market.

Name of Paper: Study on the prediction of stock price based on the associated network model of LSTM
Authors: G. Ding, L. Qin
Overview: They used an LSTM network model and the LSTM deep recurrent neural network model to get an accuracy of over 95%.
Method: LSTM-based deep recurrent neural network
Conclusion: The model can predict multiple stock prices simultaneously.

Name of Paper: ML Algorithms in stock market prediction.
Authors: Jayesh P, Rejo Mathew.
Overview: Comparison of different machine learning algorithms, namely SVR, LSTM and random forest techniques.
Method: SVR, random forest, LSTM
Conclusion: Different methods give different benefits according to the requirements of the user.

Name of Paper: Stock Market Behaviour Prediction using Stacked LSTM Networks
Authors: Pius Adewale Owolawi, Maredi Mphahlele
Overview: Examines the stacked LSTM model's ability to closely predict stock market behavior based on the historical data it was trained on.
Method: ARIMA, deep recurrent neural network
Conclusion: The model was able to predict stock market behavior with some accuracy.

Name of Paper: Stock Price prediction using LSTM and SVR
Authors: Gourav Bathla
Overview: LSTM is applied to different stock indexes and compared with linear regression and ARIMA.
Method: Deep learning, LSTM, RNN, SVR
Conclusion: Deep learning is applied to improve prediction accuracy.

Name of Paper: Stock market price trend prediction using comprehensive deep learning system
Authors: Jingyi Shen & M. Omair Shafiq
Overview: The LSTM model achieves a binary accuracy of 93.25%. The RAF algorithm achieved a relatively high true-positive rate.
Method: MLP, SVM, LSTM, deep learning
Conclusion: They applied the feature expansion approaches with RFE.

Name of Paper: Sentiment Analysis for Indian stock market using Sensex Nifty.
Authors: Aditya Bharadwaj, Yogendra Narayan, Vanshika Singh, Maitree
Overview: There are various ups and downs in the Indian stock market. In order to invest money in the stock market for purchasing shares, it is essential for investors to predict the stock market condition. In the Indian scenario, Sensex and Nifty are two major indicators for prediction of stock market conditions.
Method: Opinion mining, deep learning, LSTM, SVR
Conclusion: This model was able to gather information about markets and improved efficiency and accuracy.

Name of Paper: Stock Market Analysis: A Review and Taxonomy of Prediction Techniques
Authors: Dev Shah, Hasurana Isah
Overview: This paper explains different techniques that can be used for stock market prediction and analysis and conveniently they help in easing the trends.
Method: LSTM, ARIMA, random forest, RNN, SVM
Conclusion: Different methods give different benefits according to the requirements of the user.

Name of Paper: Stock Market Prediction Analysis by Incorporating Social and News Opinion and Sentiment
Authors: Ziping Lin, Anuj Thakkar, Zaphia Li
Overview: This paper aims to predict stock price by analyzing the relationship between the stock price and news sentiments. A novel enhanced learning-based method for stock price prediction is proposed that considers the effect of news sentiments.
Method: LSTM, RNN, support vector, deep learning
Conclusion: This paper indicated how these sentiments affect the market and lead to downturns or uptrends in the market.

Name of Paper: Literature on Stock Returns: A Content Analysis
Authors: YV Reddy, Pranav Narayan
Overview: The objective of any investment is to earn a return. Return on the amount invested in stocks includes dividend and capital appreciation. These returns are influenced by both systematic and unsystematic risks. Systematic risk includes the macroeconomic variables.
Method: Deep learning, LSTM, RNN, neural network
Conclusion: The different key issues or factors were analyzed and presented in counts and percentages. The study helps the stock exchanges and the regulators.

Name of Paper: Prediction Models for Indian Stock Market
Authors: Aparna Nayak, MM Manhora, Radhika M Pai
Overview: There are two common methods to predict stock market prices. One is chartist or technical theories and the second is fundamental or intrinsic value analysis. The proposed method is built on the principle of technical theories.
Method: Data collection, data analysis, data processing
Conclusion: There are many predictive models which tell whether the market trend is up or down, but they fail to give accurate results. An attempt has been made to build an efficient predictive model of the stock market where the trend for the next day is predicted.

Table 2.1 - Overview of Research Papers
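The MAPE and RMSE metrics mentioned in the table can be compared with a short sketch. It uses the standard textbook definitions of both metrics; the function names and sample prices are illustrative assumptions, not values from any of the reviewed papers.

```python
from math import sqrt


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same units as the data."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(squared / len(actual))


# Hypothetical closing prices and model predictions.
actual_prices = [100.0, 200.0, 400.0]
predicted_prices = [110.0, 180.0, 400.0]

print("MAPE (%):", round(mape(actual_prices, predicted_prices), 2))
print("RMSE:", round(rmse(actual_prices, predicted_prices), 2))
```

Because MAPE scales each error by the actual value, a 10% miss on a low-priced stock counts the same as a 10% miss on a high-priced one, whereas RMSE is dominated by the largest absolute errors.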



Chapter 3
Analysis & Design
From our research and user survey, we found that the stock market is a lucrative field but can be very confusing to newcomers. We will model the news dataset to analyse the sentiment of news and how it correlates with the market, in order to give users recommendations.

Chapter 4
Proposed Model

For this project, we consider the following four forecasting models:

1. ARIMA

An ARIMA model is a class of statistical models for analyzing and forecasting time series data.

ARIMA is an acronym that stands for AutoRegressive Integrated Moving Average. It is a generalization of the simpler AutoRegressive Moving Average and adds the notion of integration.

This acronym is descriptive, capturing the key aspects of the model itself. Briefly, they are:

● AR: Autoregression. A model that uses the dependent relationship between an observation and some number of lagged observations.

● I: Integrated. The use of differencing of raw observations (e.g. subtracting an observation from the observation at the previous time step) in order to make the time series stationary.

● MA: Moving Average. A model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.

Each of these components is explicitly specified in the model as a parameter. The parameters of the ARIMA model are defined as follows:

● p: The number of lag observations included in the model, also called the lag order.

● d: The number of times that the raw observations are differenced, also called the degree of differencing.

● q: The size of the moving average window, also called the order of moving average.
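The roles of p and d can be illustrated with a small sketch. It is a simplified stand-in rather than a full ARIMA estimator: it differences the series once (d = 1), fits a single autoregressive term by least squares (p = 1), and omits the moving-average part entirely (q = 0). All function names and the sample prices are hypothetical.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (the 'I' in ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def fit_ar1(series):
    """Least-squares estimate of (c, phi) in x[t] = c + phi * x[t-1] + e[t]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x)
    if var_x == 0:
        raise ValueError("cannot fit AR(1) to a constant series")
    phi = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / var_x
    c = mean_y - phi * mean_x
    return c, phi


def forecast_next(prices):
    """One-step forecast: AR(1) on first differences, i.e. ARIMA(1, 1, 0)-style."""
    diffs = difference(prices, d=1)
    c, phi = fit_ar1(diffs)
    next_diff = c + phi * diffs[-1]
    return prices[-1] + next_diff  # undo the differencing


# Hypothetical closing prices.
prices = [10.0, 11.0, 12.5, 14.25, 16.125]
print(forecast_next(prices))
```

A complete ARIMA fit would also estimate the moving-average coefficients and is normally done with a statistics library rather than by hand.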

2. SARIMAX

The ARIMA model captures trend information in the data but ignores seasonal variation. SARIMAX (Seasonal ARIMA with eXogenous regressors) is a variation of the ARIMA model that also accounts for seasonal variation in the data. Although our data do not show strong seasonality, we include SARIMAX for comparison.

3. Facebook Prophet

Prophet is an open-source library published by Facebook that is based on decomposable (trend + seasonality + holidays) models. It allows us to make time-series predictions with good accuracy using simple, intuitive parameters, and it supports custom seasonality and holiday effects.
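The decomposable structure mentioned above is usually written as an additive model. The symbol names below follow the common convention for this kind of model and are a labeling choice, not notation taken from this report:

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here g(t) is the trend, s(t) the periodic (seasonal) component, h(t) the effect of holidays, and \varepsilon_t the error term not explained by the other three.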

4. LSTM Model

Financial markets are highly nonlinear, and stock price data can sometimes seem completely random. Traditional time series methods such as ARIMA and SARIMAX are effective only when the series is stationary, a restrictive assumption that requires the series to be preprocessed by taking log returns (or applying other transforms). The main issue arises when implementing these models in a live trading system, as there is no guarantee of stationarity as new data is added.

This is addressed by using neural networks (sequential models such as LSTM, GRU, etc.), which do not require stationarity. Furthermore, neural networks are by nature effective at finding relationships in data and using them to predict (or classify) new data.
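Two preprocessing steps touched on above can be sketched briefly: taking log returns, and slicing a series into fixed-length input windows of the kind a sequential model such as an LSTM is typically trained on. This is an illustrative sketch only; the function names, window length and prices are assumptions.

```python
from math import log


def log_returns(prices):
    """Log return between consecutive prices: ln(p[t] / p[t-1])."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive")
    return [log(b / a) for a, b in zip(prices, prices[1:])]


def make_windows(series, window):
    """Split a series into (input window, next value) pairs for a sequence model."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    count = len(series) - window
    inputs = [series[i:i + window] for i in range(count)]
    targets = [series[i + window] for i in range(count)]
    return inputs, targets


# Hypothetical closing prices.
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]
returns = log_returns(prices)
inputs, targets = make_windows(returns, window=3)

for window_values, target in zip(inputs, targets):
    print([round(v, 4) for v in window_values], "->", round(target, 4))
```

Each input window holds the previous three returns and the target is the return that follows, which is the supervised form a sequence model consumes.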

Result & Discussion

In this project, we aim to predict stock prices using the ARIMA (AutoRegressive Integrated Moving Average) model, SARIMAX (a variation of ARIMA), Facebook Prophet and an LSTM model. Our work aims to predict the future stock market price of a company, and the risk profile of that particular stock will be analysed and visualized. The model will aim to capture the news sentiment that affects the stock market, with Twitter data as our main dataset. If the news sentiment is positive, our model will indicate that the stock price is more likely to go up; if the news sentiment is negative, the stock price may go down. This project is an attempt to build a model that predicts news polarity, which may affect changes in stock trends.
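The polarity-to-direction rule described above can be sketched with a toy word-list scorer. The lexicon, threshold and function names are hypothetical and chosen only for illustration; the project itself proposes a learned model, not a fixed word list.

```python
import re

# Hypothetical toy lexicon, for illustration only. A real system would use a
# trained sentiment model or an established lexicon.
POSITIVE = {"gain", "gains", "profit", "surge", "upgrade", "beat", "growth"}
NEGATIVE = {"loss", "losses", "fall", "drop", "downgrade", "miss", "fraud"}


def polarity(text):
    """Score in [-1, 1]: (positive hits - negative hits) / total hits, 0.0 if none."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def expected_direction(headlines, threshold=0.1):
    """Map the mean polarity of a day's headlines to 'up', 'down' or 'neutral'."""
    if not headlines:
        return "neutral"
    mean = sum(polarity(h) for h in headlines) / len(headlines)
    if mean > threshold:
        return "up"
    if mean < -threshold:
        return "down"
    return "neutral"


print(expected_direction(["Profit surge", "Growth beat estimates"]))
print(expected_direction(["Shares fall", "Analyst downgrade"]))
```

The threshold leaves a neutral band around zero so that days with mixed or no sentiment-bearing words are not forced into an up or down call.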



Project timeline

Conclusion & Future work



In this project, we will analyse stock market news data, process time-series data and build deep learning models with a production perspective. Stock price series are considered among the most challenging time series, and we will try to predict Nifty Index data with high accuracy. We will also optimize the model in the post-training phase to make it ready for deployment. Our work will surface the number of likes, comments and shares, and aims to understand the significance of social media interactions and what they tell us about the consumers behind the screens.

Logbook

Week No.   Planned Milestone                                              Discussion

1          Topic Approval                                                 Our topic approval presentation was presented and our topic was approved.

2          Market Survey                                                  Google form created and responses checked and studied.

3          User Journey Map, Stakeholders map, Persona and empathy maps   Based on user research, observations and mappings were done.

4          Design Planning                                                Design discussions and planning initiated.

5          Design making                                                  All the page designs were planned and made.

6          M2 report submissions                                          Final Report was created.

Log book reference: Group Logbook
