Welcome


A PROJECT REPORT ON

(Name of the project)

SUBMITTED
BY
(Name of the student)

2022 TO 2023
Problem Statement
The purpose of this project is to analyze the sentiment of text and to extract text features from the
provided dataset.
The input is textual content, which is cleaned before use, and the output is a set of labels.
The major challenges faced during the research were noisy text, text that lacked
emotional content, and a lack of labelled data.
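The cleaning step mentioned above could look roughly like the sketch below. The exact rules used in the project are not given in this report, so the choices here (lowercasing, removing URLs and @mentions, keeping hashtag words, dropping everything that is not a letter) are assumptions for illustration, and `clean_tweet` is a hypothetical helper name.

```python
import re

# Patterns for the kinds of noise commonly found in tweets (assumed rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, mentions, '#' signs, and non-letters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")        # keep the hashtag word, drop the symbol
    text = NON_LETTER_RE.sub(" ", text)  # digits, punctuation, emoji
    return " ".join(text.split())        # collapse repeated whitespace


print(clean_tweet("@friend I feel SO tired today... https://t.co/abc #sad"))
# prints: i feel so tired today sad
```

Note that this simple rule also splits contractions ("don't" becomes "don t"); a real pipeline would likely handle them separately.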

INTRODUCTION
We give the readers an overview of the project's background, motivation, problem statement,
difficulties encountered, project goals, and report structure in this chapter.
Background
People are publicly connecting with each other through social media platforms by
publishing images, messages, and other types of media.
On such platforms, most users express their opinions and sentiments; while doing so is a great way to
get emotional support from other users, it may also demoralize people. Understanding the feelings that users of
these social networks express is essential for preventing a depressed user from going on a bad mental health
path.
Because people express their feelings openly on these public platforms, and Twitter is the most
popular of them, the regular participation of users creates a great volume of data. The geographical location,
timestamp, and digital footprint of the content author are all included in this data.
Twitter sentiment analysis refers to the use of natural language processing (NLP) techniques to
analyze the sentiment or emotion expressed in tweets, which are short messages posted on the social media
platform Twitter. This type of analysis has gained widespread attention in recent years due to the increasing
popularity of Twitter and the potential for real-time analysis of public opinion on a wide range of topics.
Motivation
This project allowed me to study and confirm my interest in the topic of machine learning,
which has long been a source of great fascination for me. Machines that can learn, estimate situations,
and make predictions will be extremely powerful and have a wide range of applications. Machine
learning has several applications in practically every industry, including finance, health, automobile,
etc. The fact that machine learning is so widely used inspired me to adopt it as the foundation for my
idea.
This project was motivated by how information on the web, chiefly
tweets, can shape users' feelings, and by how such content can be analyzed to detect depression at an early stage.
Literature Review
Neural Networks
Deep neural network topologies are made to learn by connecting numerous layers, with each layer only
connected to the one before it in the hidden part of the network. Vectors are already integrated in the input layer.
There should be an equal number of neurons in the output layer for each class in a multi-class classification issue, and
just one neuron is required for a binary class.

Recurrent Neural Networks (RNN)
Recurrent neural networks (RNN) for text mining and classification are also of interest to researchers.
RNN gives the earlier data points in the sequence more weight. Thus, this method is efficient for
classifying text, sequence, and string data.

Gated Recurrent Unit (GRU)
The Gated Recurrent Unit (GRU) was created by J. Chung et al. and K. Cho et al. as an RNN gating
mechanism. The LSTM architecture is simplified in the GRU, but there are several key differences as
well: two gates make up the GRU, but it lacks internal memory.

LSTM
Long Short-Term Memory (LSTM) cells are used in most of our networks. They offer a means of retaining
a specific number of previous values. The input gate, the output gate, and the forget gate are typically the
three components that make them up.
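To make the three LSTM gates concrete, here is a minimal NumPy sketch of a single LSTM time step in its standard textbook form. It is not the network used in this project: the layer sizes, the random weights, and the function name `lstm_step` are assumptions chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W has shape (4 * hidden, input), U has shape (4 * hidden, hidden) and
    b has shape (4 * hidden,). Rows are stacked in the order:
    input gate, forget gate, output gate, candidate values.
    """
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:hidden])                 # input gate: how much new information to write
    f = sigmoid(z[hidden:2 * hidden])        # forget gate: how much old memory to keep
    o = sigmoid(z[2 * hidden:3 * hidden])    # output gate: how much memory to expose
    g = np.tanh(z[3 * hidden:4 * hidden])    # candidate values for the cell state
    c = f * c_prev + i * g                   # updated internal memory (cell state)
    h = o * np.tanh(c)                       # updated hidden state
    return h, c


# Hypothetical sizes: 8-dimensional token vectors, 4 hidden units.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 4
W = rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim))
U = rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim))
b = np.zeros(4 * hidden_dim)

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):    # a sequence of 5 token vectors
    h, c = lstm_step(x, h, c, W, U, b)

print(h.shape, c.shape)
```

The separate cell state `c` is the internal memory that the GRU does without; a GRU keeps only the hidden state and uses two gates instead of three.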
Classification of Sentiments
Through sentiment analysis, the writer's emotions or the priority of the context in the posts are identified.
Sentiment analysis, whose result can be positive, negative, or neutral, seeks to ascertain the writer's attitude toward
specific issues or toward the text as a whole.
The hierarchy of sentiment analysis can be seen below

Finding the polarity of the sentiment in the text and classifying it as positive, negative, or neutral is the main
objective of sentiment analysis. This categorization can take place at the phrase, document, entity, or aspect
levels. The polarity of the emotion can be extracted from user-generated text using a number of different
techniques.
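One of the simplest such techniques is a lexicon-based approach, sketched below at the document level. The word lists are tiny, hypothetical examples invented for this illustration, not the resources used in the project, and `polarity` is a hypothetical helper name.

```python
# Tiny, made-up lexicons for illustration only.
POSITIVE = {"good", "great", "happy", "love", "hopeful"}
NEGATIVE = {"bad", "sad", "tired", "hate", "hopeless"}


def polarity(text):
    """Classify text as positive, negative, or neutral by counting lexicon hits."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("I love this great day"))   # positive
print(polarity("so tired and sad"))        # negative
print(polarity("going to the store"))      # neutral
```

A lexicon needs no labelled data, which is helpful when labels are scarce, but it misses negation and context; the supervised models discussed later learn such patterns from examples instead.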
DESIGN AND METHODOLOGY
Our goal is to determine whether a tweet provided to the model as input is depressive or
non-depressive. First, let us look at the machine learning models used in the project.
Machine Learning Techniques
• Decision trees
• Random forests
• Support vector machines (SVMs)
• Naive Bayes
• K-means
• Neural networks
• Deep learning
Neural Networks
• Recurrent Neural Networks (RNN)
• Gated Recurrent Unit (GRU)
• LSTM
• CNN
Using Tweepy library to filter the fetched data

Tweepy is a Python library for accessing the Twitter API. It makes it easy to access and work with data
from Twitter in Python.
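A rough sketch of how tweets might be fetched and filtered with Tweepy is shown below. It assumes Tweepy v4's `Client` interface and Twitter API v2 access with a valid bearer token; the report does not say which Tweepy version, endpoint, or filtering rules were actually used, so the query operators and the helper names `build_query`, `fetch_recent_tweets`, and `filter_tweets` are illustrative assumptions.

```python
def build_query(keyword):
    """Build an API v2 search query that skips retweets and keeps English tweets."""
    return f"{keyword} -is:retweet lang:en"


def fetch_recent_tweets(bearer_token, keyword, max_results=10):
    """Return the text of recent tweets matching the keyword (needs network access)."""
    import tweepy  # imported here so the helpers below run without Tweepy installed

    client = tweepy.Client(bearer_token=bearer_token)
    response = client.search_recent_tweets(
        query=build_query(keyword),
        max_results=max_results,  # the recent-search endpoint accepts 10 to 100
    )
    return [tweet.text for tweet in (response.data or [])]


def filter_tweets(texts, min_words=3):
    """Drop duplicates and very short tweets, preserving the original order."""
    seen = set()
    kept = []
    for text in texts:
        key = text.strip().lower()
        if len(key.split()) >= min_words and key not in seen:
            seen.add(key)
            kept.append(text.strip())
    return kept


# Offline usage example with made-up tweets instead of a live API call.
print(filter_tweets(["I feel so low", "i feel so low ", "ok"]))
# prints: ['I feel so low']
```

With real credentials, the two steps would be chained as `filter_tweets(fetch_recent_tweets(token, "depressed"))`, where `token` is a placeholder for the bearer token.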
RESULTS
Model Building
Model Summary

From the table above, we can conclude that the SVM model gives the best fit and that logistic regression
also fits well compared with the other models.
Confusion Matrix
The table shows the classification accuracy
of each trained model.
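For reference, the sketch below shows how accuracy, precision, recall, and F1 follow from the four cells of a binary confusion matrix. The counts in the usage example are hypothetical and are not taken from the project's results.

```python
def binary_metrics(tn, fp, fn, tp):
    """Derive standard metrics from the cells of a binary confusion matrix."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for 100 test tweets (class 1 = depressive).
metrics = binary_metrics(tn=50, fp=10, fn=5, tp=35)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Here accuracy is 85/100, precision is 35/45, and recall is 35/40, which shows how a model can have the same accuracy as another while trading precision against recall.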
Compare All the Models
The graph above shows that the best model, with the highest accuracy, is the Support Vector Machine (SVM),
with 85.79% accuracy on the test dataset. Logistic regression also performed well, but the graphs for
CART, NN, and RF show an overfitting problem.
LSTM model
Model Summary
Epoch 1/3
309/309 - 35s 108ms/step - loss: 0.4131 - accuracy: 0.7974 - val_loss: 0.3325 - val_accuracy: 0.8545
Epoch 2/3
309/309 - 33s 108ms/step - loss: 0.1983 - accuracy: 0.9213 - val_loss: 0.3564 - val_accuracy: 0.8553
Epoch 3/3
309/309 - 33s 108ms/step - loss: 0.1071 - accuracy: 0.9593 - val_loss: 0.4389 - val_accuracy: 0.8468
Model Accuracy
In the training log above, we can see that the training accuracy keeps increasing through epoch 3, reaching
about 96%, while the validation accuracy stays near 85%.
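The three epochs of the LSTM training log can also be compared programmatically, as in the short sketch below. The numbers are transcribed from the log in this report, assuming the digits that wrapped onto the next line belong to the loss values (for example 0.413 followed by 1 is read as 0.4131); the selection rules themselves are a common convention, not something the report states was applied.

```python
# Values transcribed from the three-epoch LSTM training log in this report.
history = [
    {"epoch": 1, "loss": 0.4131, "accuracy": 0.7974, "val_loss": 0.3325, "val_accuracy": 0.8545},
    {"epoch": 2, "loss": 0.1983, "accuracy": 0.9213, "val_loss": 0.3564, "val_accuracy": 0.8553},
    {"epoch": 3, "loss": 0.1071, "accuracy": 0.9593, "val_loss": 0.4389, "val_accuracy": 0.8468},
]

# Two common ways to pick the epoch to keep.
best_by_loss = min(history, key=lambda row: row["val_loss"])
best_by_accuracy = max(history, key=lambda row: row["val_accuracy"])

# Gap between training and validation accuracy; a growing gap hints at overfitting.
gaps = {row["epoch"]: row["accuracy"] - row["val_accuracy"] for row in history}

print("lowest val_loss at epoch", best_by_loss["epoch"])              # epoch 1
print("highest val_accuracy at epoch", best_by_accuracy["epoch"])     # epoch 2
for epoch, gap in gaps.items():
    print(f"epoch {epoch}: train - val accuracy = {gap:+.4f}")
```

The validation loss is lowest after the first epoch and the gap between training and validation accuracy widens afterwards, so stopping early or adding regularization would be a reasonable thing to try.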
Support Vector Machine (SVM) with word embedding
Test results
===================================
Accuracy score is : 0.8500566893424036
===================================
Detail:
              precision  recall  f1-score  support
0                  0.85    0.87      0.86     3676
1                  0.85    0.83      0.84     3380
accuracy                             0.85     7056
macro_avg          0.85    0.85      0.85     7056
weighted_avg       0.85    0.85      0.85     7056
The test results show the accuracy of the model as well
as its precision and recall.
AUC score is : 0.9257555807380031
In the graph above, the ROC curve is on the left side and the
precision-recall curve is on the right side.
Based on this graph, we can judge how well the
model is performing. The curve is far
away from the dotted line, so we can say that the
model is performing well.
In the precision-recall graph, the curve covers roughly 80% to 90%
of the area.
The area under the curve (AUC) is 92.57%.
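As a reminder of what the AUC score measures, the sketch below computes the ROC AUC directly from its ranking definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The labels and scores are a small made-up example, not the project's predictions, and `roc_auc` is a hypothetical helper written for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = depressive) and classifier scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

A classifier that scores at random gets about 0.5, which corresponds to the dotted diagonal in a ROC plot, so a value near 0.93 indicates that positives are ranked above negatives most of the time.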
Discussion

In this dissertation, we explored the use of various architectures, including neural networks,
convolutional neural networks, and transformer architectures, for sentiment analysis. We analyzed
the training time, accuracy, precision, and F1 score of these models and found that the transformer
architecture, while powerful, can be inefficient due to its dependence on large amounts of memory
and stack size. We also identified the challenge of limited maximum RAM memory for training
transformer models and discussed the potential for transformer models to create a hyper-dimensional
kernel space to separate positive and negative sentiments.
However, it is worth noting that the transformer architecture also has some limitations,
including the issue of memory leakage.
Conclusion

We fitted different types of supervised machine learning models to our collected and
pre-processed data in order to classify tweets into two types: depressed tweets and non-depressed
tweets.
Of the machine learning techniques we used, the support vector machine with word embedding
achieved particularly impressive results, with an accuracy of around 85%. Logistic regression
and KNN also performed well on temporal data, with accuracies of around 84% and
78% on the test set, respectively. In both cases, we observed a consistent decrease in loss and
increase in accuracy as the models were trained, indicating that they were effectively learning to
differentiate between positive and negative sentiments.
Overall, our results demonstrate the effectiveness of machine learning techniques in
the field of sentiment analysis.
Thank You…
