Proposal Defense Sentiment Analysis

NEW SUMMIT COLLEGE

Tribhuvan University

Institute of Science and Technology

“Sentiment Analysis of Text

Using Multinomial Logistic Regression”

PROJECT PROPOSAL

Submitted to

Department of Computer Science and Information Technology

In partial fulfillment of the requirements for the Bachelor's Degree in

Computer Science and Information Technology

Submitted by:

Nirbhik Jung Bhattarai (20429/075)

Nishan Shrestha (20430/075)

Sisan Niraula (20447/075)
Table of Contents
1. Introduction
2. Problem Statement
3. Objectives
4. Methodology
   i. Requirement Identification
      a) Literature Review
      b) Requirement Analysis
   ii. Feasibility Study
      a) Technical
      b) Operational
      c) Economic
      d) Schedule
   iii. High Level Design of System
      Methodology of the proposed system
      Flowchart
      Working Mechanism of Proposed System
      Description of Algorithm
5. Expected Outcome
6. References

Table of Figures
Figure 1: Use-Case Diagram
Figure 2: Gantt Chart
Figure 3: Flowchart

1. Introduction
Sentiment Analysis is the technique of identifying and extracting subjective information from
source materials using natural language processing, text analysis, and computational
linguistics. In general, sentiment analysis seeks to determine a speaker's or writer's attitude
toward a topic or a document's overall contextual polarity. This attitude could be a
judgement or evaluation, an affective state, or the intended emotional communication.

Sentiment analysis is the technique of recognizing positive, negative, or neutral feelings
associated with a piece of writing. Humans have the intrinsic ability to identify sentiment; yet,
in a business setting, this process is time consuming, unreliable, and costly. It's simply not
feasible to read tens of thousands of customer reviews and grade them for sentiment on
an individual basis.

2. Problem Statement
The sentiment of people is not stated explicitly in their comments and status updates.
Efficient analysis of the sentiment in people's reviews and comments can help identify the
more useful feedback. The “Sentiment Analysis-Web Application” can be one of the solutions
to this problem to some extent, as it allows not only employees but also the general public
to take part in the process.

3. Objectives
The objectives of Sentiment Analysis are as follows:
a. To analyze the subjective information in the text.
b. To analyze the opinions of people.
c. To classify text as positive, negative and neutral using multinomial logistic
regression.

4. Methodology
A software development methodology is a framework for structuring, planning, and
controlling the process of developing an information system. We adopted the Waterfall
Model, which is one of the most widely used development processes. The Waterfall model is a
linear, sequential approach to project management. Before the project begins, the customer and
stakeholder requirements are gathered. The model contains several phases, each of which begins
only after the preceding phase has been completed. We gathered the requirements and adhere
to this development technique through to the project's completion. While working on the
project, the phases of the waterfall model were carefully studied, including requirement
analysis, system design, implementation, testing, deployment, and maintenance.

i. Requirement Identification
a) Literature Review
Multinomial Logistic Regression (LR) extends linear regression techniques to situations
where the outputs are categorical variables. It has been widely used in data mining and
machine learning problems, where LR describes the response variable in terms of one or more
predictor variables. Several works have applied the Multinomial Logistic Regression (LR)
algorithm to research problems, e.g. Cheng and Eyke (2009), Rus et al. (2009), Freyberger et
al. (2004), Feng and Beck (2009), Kotsiantis et al. (2003), Mittal (2009) and Felix (2014).
Cheng and Eyke (2009) combined Instance-Based Learning and Multinomial Logistic Regression
for multi-label classification. Freyberger et al. (2004) used the Multinomial Logistic
Regression (LR) algorithm to find the best-fitting transfer model for student learning data.
Rus et al. (2009) compared several machine learning methods for student mental model
detection, e.g. Naïve Bayes, Bayes Net, Support Vector Machines (SVM), Multinomial Logistic
Regression and Decision Trees. Feng and Beck (2009) used Multinomial Logistic Regression to
construct transfer models in order to predict whether students can represent their knowledge.
Kotsiantis et al. (2003) attempted to predict student dropout using Neural Networks, Decision
Trees, Naïve Bayes, Instance-Based Learning, Multinomial Logistic Regression, and Support
Vector Machines (SVM).

In 2019, Saad and Yang aimed to provide a complete tweet sentiment analysis based on
ordinal regression with machine learning algorithms. The proposed model first preprocessed
the tweets and then generated effective features with a feature extraction method. Methods
such as Support Vector Regression (SVR), Random Forest (RF), Multinomial Logistic
Regression (SoftMax), and Decision Trees (DTs) were employed to classify the sentiment. A
Twitter dataset was used to evaluate the proposed model. The test results showed that the
proposed model attained the best accuracy, and that DTs performed well compared with the
other methods. In 2018, Fang et al. proposed multi-strategy sentiment analysis models that
use semantic fuzziness to address such issues. The results demonstrated that the proposed
model attained high efficiency.

b) Requirement Analysis
A functional requirement (FR) is a description of a service that the software must offer. It
describes a software system or one of its components. A function consists of the inputs to
the software system, its behavior, and its outputs. The functional requirements are as follows:

Fig: Use Case Diagram for Sentiment Analysis

Non-functional requirements (NFRs) are a set of specifications that describe the system's
operational capabilities and constraints and attempt to improve its functionality. These can
include things like speed, security, and dependability. These requirements apply to
our system as well. Other non-functional requirements include the following.
a. Performance
b. Scalability
c. Interoperability

ii. Feasibility Study


a) Technical
All of the tools and software products needed to complete this project are widely
available on the internet. The system does not require a special environment to run;
development only requires an IDE. All of these tools are reasonably priced. The
program needs only simple user interfaces, but the methodology and real-time
calculations are difficult to implement.

i. Programming Language used: Python
ii. Development Tool used: Visual Studio Code
iii. Framework used: Flask

b) Operational
Operational feasibility is concerned with the system's operational capabilities. Our SA
(Sentiment Analysis) project's reports and classifications can help decision makers and
company owners make better judgments for a more effective operation. The system's
statistics and reports are easy to read and comprehend. As a result, the system is
operationally viable.

c) Economic
Economic feasibility is used to assess the economic costs and benefits of a project. The
system designed for the project is a web application, which requires the same hardware
and software support that other web applications require. Hardware, software, and labor,
as well as skill building, will be required to incorporate Sentiment Analysis.
Additionally, API costs might be included as system integration charges.

d) Schedule

Fig: Gantt chart

iii. High Level Design of System

Methodology of the proposed system


We will be using agile methodology for building this proposed system. The system will be
developed step by step in collaboration among all our team members. Continuous
improvement at every stage helps ensure the reliability of our system. Once the work begins,
the team cycles through a process of planning, executing, and evaluating. All our team
members will share their ideas and thoughts about the system, and we can also draw on
suggestions found on the internet. Step by step, we will work out the best ways to develop
the system efficiently. The work to be performed at each stage of the development process
will be described in drafted documents.

Flowchart
The working mechanism for sentiment analysis is shown in the flowchart below:

Fig: Flowchart

Working Mechanism of Proposed System
The proposed system will first collect the data that we want to analyze. This could be in the
form of social media posts, customer reviews, or any other type of text that contains opinions.
The next step is to preprocess the data. This involves removing stop words and any other
unnecessary data that could interfere with the sentiment analysis process. The text is then
tokenized, which involves breaking it down into individual words or phrases. Next, the text is
encoded into a numerical format that can be analyzed by a machine learning algorithm. The
final step is to use a machine learning algorithm to analyze the sentiment of the text. This
involves training a model on a dataset of labeled data, where each example is tagged with a
sentiment label (positive, negative, or neutral). The model then uses what it learned from
this training data to make predictions on new text data.
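
The preprocessing, tokenization, and encoding steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list, the function names, and the three example sentences are hypothetical, and bag-of-words counts are just one possible numerical encoding. For simplicity the sketch tokenizes first and then filters stop words.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "to", "of"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def preprocess(text):
    """Tokenize the text and drop stop words."""
    return [tok for tok in tokenize(text) if tok not in STOP_WORDS]

def build_vocabulary(documents):
    """Map each distinct remaining token in the corpus to a column index."""
    vocab = sorted({tok for doc in documents for tok in preprocess(doc)})
    return {tok: i for i, tok in enumerate(vocab)}

def encode(text, vocabulary):
    """Encode a text as a bag-of-words count vector over the vocabulary."""
    counts = Counter(preprocess(text))
    return [counts.get(tok, 0) for tok in vocabulary]

# Hypothetical example sentences standing in for collected reviews.
corpus = ["This phone is great", "The battery is terrible", "It is a phone"]
vocabulary = build_vocabulary(corpus)
vectors = [encode(doc, vocabulary) for doc in corpus]

print(list(vocabulary))
print(vectors)
```

Each text becomes a fixed-length list of word counts, which is the kind of numerical input a classifier such as multinomial logistic regression can be trained on.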

Description of Algorithm
We will be using the Multinomial Logistic Regression algorithm for this sentiment analysis project.

Multinomial Logistic Regression Algorithm

Multinomial logistic regression is an extension of logistic regression that adds native support
for multi-class classification problems. Logistic regression, by default, is limited to two-class
classification problems. Some extensions, like one-vs-rest, allow logistic regression to be
used for multi-class classification problems, although they require that the classification
problem first be transformed into multiple binary classification problems. The multinomial
logistic regression algorithm instead changes the loss function to cross-entropy loss and
changes the predicted probability distribution to a multinomial probability distribution, so
that it natively supports multi-class classification problems.
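
As a rough illustration of the idea, the sketch below implements a very small multinomial logistic regression in plain Python. It is not the project's implementation: the softmax function is the usual way to obtain a multinomial probability distribution from per-class scores, and the batch gradient descent trainer, the learning rate, the number of epochs, the toy word-count features, and the label order are all assumptions made for the example.

```python
import math

def softmax(scores):
    """Convert raw class scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, biases, x):
    """Return one probability per class for feature vector x."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(scores)

def train(X, y, n_classes, lr=0.5, epochs=200):
    """Fit weights by batch gradient descent on the mean cross-entropy loss."""
    n_features = len(X[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        grad_w = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, label in zip(X, y):
            probs = predict_proba(weights, biases, x)
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. the class score: p_c - y_c
                err = probs[c] - (1.0 if c == label else 0.0)
                grad_b[c] += err
                for j in range(n_features):
                    grad_w[c][j] += err * x[j]
        for c in range(n_classes):
            biases[c] -= lr * grad_b[c] / len(X)
            for j in range(n_features):
                weights[c][j] -= lr * grad_w[c][j] / len(X)
    return weights, biases

# Toy data: columns are hypothetical counts of the words ["bad", "good", "okay"].
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [2, 0, 0], [0, 2, 0], [0, 0, 2]]
y = [0, 1, 2, 0, 1, 2]  # 0 = negative, 1 = positive, 2 = neutral
LABELS = ["negative", "positive", "neutral"]

weights, biases = train(X, y, n_classes=3)
probs = predict_proba(weights, biases, [0, 1, 0])
prediction = LABELS[probs.index(max(probs))]
print(prediction, [round(p, 3) for p in probs])
```

Unlike one-vs-rest, a single model produces all three class probabilities at once, and they always sum to 1.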

Cross-Entropy Loss Function

Cross-entropy loss, or log loss, measures the performance of a classification model whose
output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted
probability diverges from the actual label. So predicting a probability of 0.012 when the actual
observation label is 1 would be bad and result in a high loss value. A perfect model would
have a log loss of 0.

Fig: Log Loss vs Predicted Probability

The graph above shows the range of possible loss values given a true observation (isDog = 1).
As the predicted probability approaches 1, log loss slowly decreases. As the predicted
probability decreases, however, the log loss increases rapidly. Log loss penalizes both types
of errors, but especially those predictions that are confident and wrong. Cross-entropy and log
loss differ slightly depending on context, but in machine learning, when calculating
error rates between 0 and 1, they resolve to the same thing.
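
The behaviour described above can be checked numerically with a short Python sketch. The function name and the small clipping constant `eps` are assumptions introduced for the example; the formula itself is the standard binary log loss with the natural log.

```python
import math

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Log loss for one observation; y_true is 0 or 1, p_pred is P(label = 1)."""
    p = min(max(p_pred, eps), 1 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# The true label is 1 in every case; only the predicted probability changes.
for p in (0.012, 0.5, 0.9, 0.99):
    print(f"predicted {p}: loss = {binary_log_loss(1, p):.3f}")
```

A predicted probability of 0.012 for a true label of 1 gives a loss of roughly 4.42, while a prediction of 0.99 gives a loss close to 0.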

Math
In binary classification, where the number of classes M equals 2, cross-entropy can be
calculated as:

$$-(y \log(p) + (1 - y) \log(1 - p))$$

If M > 2 (i.e. multiclass classification), we calculate a separate loss for each class label
per observation and sum the result:

$$-\sum_{c=1}^{M} y_{o,c} \log(p_{o,c})$$

Where,
M - number of classes
log - the natural log
y - binary indicator (0 or 1) if class label c is the correct classification for observation o
p - predicted probability observation o is of class c
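
With the symbols defined above, the multiclass loss for a single observation can be sketched in Python as follows. The class order, the example probabilities, and the clipping constant `eps` are assumptions made for illustration.

```python
import math

def cross_entropy(y_onehot, p_pred, eps=1e-15):
    """Multiclass cross-entropy for one observation: -sum over c of y_c * log(p_c)."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_onehot, p_pred))

# Classes in a hypothetical order: [negative, neutral, positive]
y_true = [0, 0, 1]                     # the observation is actually positive
confident_right = [0.05, 0.05, 0.90]   # most probability on the correct class
confident_wrong = [0.90, 0.05, 0.05]   # most probability on a wrong class

loss_right = cross_entropy(y_true, confident_right)
loss_wrong = cross_entropy(y_true, confident_wrong)
print(f"confident and right: {loss_right:.3f}")
print(f"confident and wrong: {loss_wrong:.3f}")
```

Because y is 1 only for the correct class, the sum reduces to the negative log of the probability assigned to that class, so a confident wrong prediction is penalized far more heavily than a confident correct one.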

5. Expected Outcome
With this sentiment analysis method, we can analyze people's opinions expressed in text. The
system can categorize the submitted comments as Negative, Neutral, or Positive. This
categorization can assist businesses in gathering feedback and developing better products and
services. To some extent, public opinion can also be derived from this categorization.

6. References

[1] D. W. Hosmer and S. Lemeshow, “Applied Multinomial Logistic Regression.” New
York: John Wiley & Sons, Inc, 2000.

[2] W. Cheng and H. Eyke, “Combining Instance-Based Learning and Multinomial
Logistic Regression for Multilabel Classification,” pp. 1–15, 2009.

[3] J. Freyberger, N. T. Heffernan, and C. Ruiz, “Using Association Rules to Guide a
Search for Best Fitting Transfer Models of Student Learning,” 2004.

[4] V. Rus, M. Lintean, and R. Azevedo, “Automatic Detection of Student Mental Models
During Prior Knowledge Activation in MetaTutor,” pp. 161–170, 2009.

[5] M. Feng and J. Beck, “Back to the future: a non-automated method of constructing
transfer models,” pp. 240–249, 2009.

[6] S. B. Kotsiantis, C. J. Pierrakeas, and P. E. Pintelas, “Preventing Student Dropout in
Distance Learning Using Machine Learning Techniques,” pp. 267–274, 2003.

[7] S. E. Saad and J. Yang, “Twitter Sentiment Analysis Based on Ordinal Regression,”
IEEE Access, vol. 7, pp. 163677–163685, 2019.

[8] Y. Fang, H. Tan and J. Zhang, “Multi-Strategy Sentiment Analysis of Consumer
Reviews Based on Semantic Fuzziness,” IEEE Access, vol. 6, pp. 20625–20631, 2018.
