PBL Project
REPORT ON:
REPORT BY:
POORTI JAIN (214104018)
SNEH YADAV (214104019)
Through this matrix you can obtain the result along with its F1 score, precision, recall and overall accuracy percentage.
INTRODUCTION
This is a binary classification task, so there are only two labels:
"1" when the DJIA Adj Close value rose.
"0" when the DJIA Adj Close value decreased or stayed the same.
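The labelling rule above can be sketched in Python as follows. This is a minimal illustration that assumes the daily Adj Close values are available as a plain list in date order; the function name `make_labels` is hypothetical and is not taken from the project's code.

```python
def make_labels(adj_close):
    """Return one label per day after the first, following the rule above:
    1 if the Adj Close rose compared with the previous day, 0 if it fell or
    stayed the same. `make_labels` is a hypothetical helper for illustration.
    """
    # Pair each day's value with the previous day's value
    return [1 if today > yesterday else 0
            for yesterday, today in zip(adj_close, adj_close[1:])]


print(make_labels([100.0, 101.0, 101.0, 99.5]))  # [1, 0, 0]
```

The first day has no previous value to compare with, so it receives no label.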
PROCESS
1. Sentiment Analysis Dataset
The dataset we use to develop this machine learning sentiment analysis project is a combination of world news headlines and stock price movements, available on Kaggle.
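Because each day in such a dataset carries several headlines, a common first step is to join them into one text per day. The sketch below assumes (this is not stated in the report) that the Kaggle file has one row per trading day with a `Date` column, a `Label` column and headline columns named `Top1`, `Top2`, and so on; `combine_headlines` is a hypothetical name.

```python
import pandas as pd


def combine_headlines(df: pd.DataFrame) -> pd.Series:
    """Join each day's headline columns into one string per row.

    Assumes the headline columns are named Top1, Top2, ... (an assumed layout).
    """
    headline_cols = [col for col in df.columns if col.startswith("Top")]
    # Missing headlines become empty strings and are skipped in the join
    text = df[headline_cols].fillna("").astype(str)
    return text.apply(lambda row: " ".join(h for h in row if h), axis=1)


sample = pd.DataFrame({
    "Date": ["2008-08-08", "2008-08-11"],
    "Label": [0, 1],
    "Top1": ["First headline", "Another day"],
    "Top2": ["Second headline", None],
})
print(combine_headlines(sample).tolist())
# ['First headline Second headline', 'Another day']
```

With the real data, the input would come from `pd.read_csv(...)` on the downloaded file; the exact file name depends on the Kaggle download.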
2. Required Libraries:
You need to install certain libraries on your system to implement this Python sentiment analysis project. The required libraries are:
Numpy (pip install numpy)
Pandas (pip install pandas)
Nltk (pip install nltk)
Scikit-learn (pip install scikit-learn)
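Before running the project, one way to confirm that these libraries are available is to look up their import names. The sketch below uses only the standard library; note that scikit-learn is installed as `scikit-learn` but imported as `sklearn`. The helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names of the required libraries (scikit-learn is imported as "sklearn")
REQUIRED_MODULES = ["numpy", "pandas", "nltk", "sklearn"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

Any name it reports can then be installed with the corresponding pip command listed above.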
1.) IMPORT LIBRARIES:
We import each library only at the point where we need it, so in this first step we import just two libraries: pandas and nltk.
Next, we take only the text data and convert it to lowercase using the lower() method:
The dataset then looks like this:
Here you can see that all the punctuation has been removed and all the letters are lowercase.
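The report does not reproduce its cleaning code at this point, so the following is only a minimal sketch of the two effects described above (lowercasing and removing punctuation), assuming a regular expression is used. It also removes digits and accented letters, which may differ from the original project; `clean_text` is a hypothetical name.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase a headline and strip everything except the letters a-z."""
    text = text.lower()
    # Replace punctuation, digits and other non-letter characters with spaces
    text = re.sub(r"[^a-z]", " ", text)
    # Collapse the runs of spaces left behind by the substitution
    return " ".join(text.split())


print(clean_text("Georgia 'downs two Russian warplanes'!"))
# georgia downs two russian warplanes
```

On a pandas column this would be applied as `df["Text"].apply(clean_text)`, where `Text` is a placeholder column name.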
7.) PREDICTION:
The last step is to evaluate the model we have created on the test data.
As you can see, our model is able to classify the sentiment of the text; how accurately it does so is measured by the metrics in the result section below.
You can also see the predictions array, from which the confusion matrix is built and the result is derived.
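In scikit-learn the confusion matrix is normally produced with `sklearn.metrics.confusion_matrix(y_test, predictions)`. To show what that call computes, here is a dependency-free sketch for labels 0 and 1 that returns the counts in the same layout scikit-learn uses: rows are the true labels and columns are the predicted labels. `y_test` and `predictions` are placeholder names with made-up values.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 and 1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual not in (0, 1) or predicted not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if actual == 0 and predicted == 0:
            tn += 1  # correctly predicted "no rise"
        elif actual == 0 and predicted == 1:
            fp += 1  # predicted a rise that did not happen
        elif actual == 1 and predicted == 0:
            fn += 1  # missed a rise
        else:
            tp += 1  # correctly predicted a rise
    return [[tn, fp], [fn, tp]]


y_test = [0, 0, 1, 1, 1]       # illustrative true labels
predictions = [0, 1, 1, 0, 1]  # illustrative model output
print(confusion_matrix_2x2(y_test, predictions))  # [[1, 1], [1, 2]]
```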
RESULT:
Basic terms:-
F1 score is a metric used in machine learning to evaluate how accurately a binary classification model classifies new input, taking both precision and recall into account [1]. Precision measures how often the model is correct when it predicts a positive instance, while recall measures how well the model finds all the positive instances in a dataset [1].
F1 scores combine these two metrics into a single score (their harmonic mean) that summarizes the model's performance [1]. F1 scores range from 0 to 1 and are often used to …
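Written out, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 × precision × recall / (precision + recall). A minimal sketch computing all three from confusion-matrix counts follows; returning 0.0 when a denominator is zero is an assumption (scikit-learn exposes this choice through its `zero_division` parameter).

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 for the positive class from raw counts."""
    # Share of positive predictions that were correct
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Share of actual positives that the model found
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


print(precision_recall_f1(tp=3, fp=1, fn=3))  # (0.75, 0.5, 0.6)
```

The harmonic mean is pulled towards the smaller of the two values, so a model cannot reach a high F1 score by maximizing only precision or only recall.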
The macro average is the ordinary average: add up the per-class scores and divide by the number of classes. The weighted average takes into account how many samples of each class there are (the support), so a class with fewer samples has less impact on the weighted precision, recall and F1 score.
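The difference between the two averages can be shown with a small sketch. The per-class scores and supports below are made-up values, not the project's results; the same calculation applies to precision, recall or F1.

```python
def macro_and_weighted(scores, supports):
    """Return (macro average, weighted average) of per-class scores.

    scores[i] is one class's precision, recall or F1; supports[i] is how many
    true samples of that class there are.
    """
    # Macro: every class counts equally
    macro = sum(scores) / len(scores)
    # Weighted: each class counts in proportion to its support
    weighted = sum(s * n for s, n in zip(scores, supports)) / sum(supports)
    return macro, weighted


# Illustrative F1 scores for labels 0 and 1 with 1 and 3 samples respectively
print(macro_and_weighted([0.5, 1.0], [1, 3]))  # (0.75, 0.875)
```

Here the larger class has the higher score, so the weighted average comes out above the macro average.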
Result breakdown:-
Here, as the matrix shows, there are 139 true negatives (label 0) and 182 true positives (label 1), which means the model correctly predicted a next-day rise in the stock price more often than it correctly predicted a fall or no change. How likely such a prediction is to be correct is given by the F1 score and the other metrics, such as precision.
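The diagonal counts (true negatives and true positives) give the overall accuracy only once the off-diagonal counts are known: accuracy = (TN + TP) / (TN + FP + FN + TP). The false positive and false negative counts are not stated in the text, so the sketch below uses made-up numbers rather than the report's results.

```python
def accuracy_from_matrix(matrix):
    """Accuracy for a 2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = matrix
    # Correct predictions lie on the diagonal
    return (tn + tp) / (tn + fp + fn + tp)


# Made-up counts for illustration only
print(accuracy_from_matrix([[3, 1], [2, 4]]))  # 0.7
```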
You can make the same prediction for any company by collecting a dataset of news headlines about it, labelling each day with its stock price movement, and applying the same process to that dataset.
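As a sketch of how a new company's data could be put into the same shape, the function below joins all headlines published on the same date into one text and pairs it with that date's label. It assumes two hypothetical pandas DataFrames: one with `Date` and `Headline` columns, and one with `Date` and `Label` columns (labels built with the rule from the introduction). Dates that lack either headlines or a label are dropped by the inner merge.

```python
import pandas as pd


def build_company_dataset(headlines: pd.DataFrame, labels: pd.DataFrame) -> pd.DataFrame:
    """Return one row per date with the joined headline text and its label.

    headlines: hypothetical columns "Date" and "Headline" (one row per headline)
    labels:    hypothetical columns "Date" and "Label" (one row per trading day)
    """
    # Join all headlines of a day into a single string
    daily_text = (
        headlines.groupby("Date")["Headline"]
        .agg(" ".join)
        .reset_index(name="Text")
    )
    # Keep only dates that have both headlines and a label
    return daily_text.merge(labels[["Date", "Label"]], on="Date", how="inner")


headlines = pd.DataFrame({
    "Date": ["2024-01-02", "2024-01-02", "2024-01-03"],
    "Headline": ["company beats forecast", "new product launched", "ceo resigns"],
})
labels = pd.DataFrame({"Date": ["2024-01-02", "2024-01-03"], "Label": [1, 0]})
print(build_company_dataset(headlines, labels))
```

The resulting `Text` column can then go through the same cleaning, training and evaluation steps as the DJIA data.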
References:
1. Yash Patel and Yogesh Patel, "Stock price prediction using machine learning" (2018).
5. Dataset sources:
Daily News for Stock Market Prediction (Kaggle)
Yahoo Finance: Stock Market Live, Quotes, Business & Finance News
News (reddit.com)
Dow Jones Index Today | DJIA Live Ticker | Dow Jones Quote & Chart, Markets Insider (businessinsider.com)