A Review on Sentiment Analysis Techniques for Reshaping Business

1. Introduction
Language is a means of communication through which we transfer our ideas, thoughts, and
emotions to others, and linguistics is a field in which we study language scientifically. In
addition, linguistics extends its influence beyond its inherent importance by intersecting with
diverse disciplines, establishing connections that extend the boundaries of knowledge. An
example of such convergence occurs with computer science, a domain progressively
interwoven with the investigation of language. Thus the fusion of linguistics and computer
science leads to the development of computational linguistics, a field that utilizes
computational approaches to analyze, model, and gain insights into human language.
Computational linguistics involves applying computational methods and tools to investigate
linguistic phenomena (Luz, 2022). As a subfield of artificial intelligence, computational
linguistics plays a pivotal role in the advancement of Natural Language Processing (NLP).
Natural Language Processing (NLP) is a field of study and practical implementation that
investigates the ways in which computers can be employed to comprehend and handle text or
speech in natural language, with the aim of performing valuable tasks. Hence computational
linguistics is commonly seen as the exploration of linguistic capabilities through
computational processes, while natural language processing is viewed as an engineering
discipline focused on applying algorithmic methods to real-world challenges in the
processing of natural languages, encompassing tasks such as sentiment analysis, text
categorization, parsing, part-of-speech prediction, automatic translation, and text
summarization (Luz, 2022).
Natural language, integral to both direct and indirect communication, enables the
categorization of information into facts and opinions or sentiments, the latter being subjective
statements that reflect an individual's feelings towards an event or object (Rokade & Aruna, 2019).
Sentiments encompass the feelings, attitudes, emotions, and opinions an individual holds
toward entities, events, and their attributes. Unlike facts, which can be objectively proven
true or false, sentiments are subjective impressions that reflect personal beliefs or thoughts,
revealing one's opinions. The process of extracting subjective information from text
and evaluating the overall contextual polarity of opinions is called sentiment analysis
(Rokade & Aruna, 2019). Sentiment analysis, employing natural language processing and
computational linguistics, seeks to identify and extract subjective information from source
materials, aiming to determine the attitudes of speakers or consumers towards specific topics
or products. It is also known as opinion mining, which analyzes sentiments, opinions, evaluations,
attitudes, and emotions related to various entities and their attributes (Gundla & Otari, 2015).
In decision-making procedures, awareness of others' thoughts, opinions, and sentiments
significantly influences human activities, shaping our behaviors, choices, beliefs, and
perceptions, as our decisions are often guided by understanding how others perceive the
world and what opinions they hold about it (Gundla & Otari, 2015). An opinion is a
collective verdict or belief about a specific matter, usually lacking factual basis, reflecting an
individual's subjective viewpoint shaped by emotions or understanding of facts (Gundla &
Otari, 2015). Moreover, the growing interest in sentiment analysis is driven by the
availability of extensive sentiment datasets and the vast potential of applications, including
monitoring public political inclinations, evaluating customer satisfaction with products or
services, enhancing customer relationship management, and gauging overall well-being
(Aqlan, Manjula & Lakshman Naik, 2019). Sentiment analysis (SA), a textual study commonly
applied to internet reviews and social media, plays a crucial role in deciphering responses and
customer feedback on commercial platforms to inform product acceptance or rejection, thereby
aiding companies in boosting sales (Aqlan, Manjula & Lakshman Naik, 2019). In the contemporary
landscape, businesses, organizations, and individual consumers seek public opinions on
various matters, facilitated by online platforms, which have become prominent for e-
commerce, social media, and reviews (Gundla & Otari, 2015). Analyzing sentiment involves
assessing user-generated content on the internet, encompassing opinions, sentiments, and
views expressed in diverse forms like product reviews, forum posts, blogs, or tweets. These
viewpoints may concern products, individuals, topics, or entities, and the primary goal of
sentiment analysis is to identify subjective information within different outlets, discerning the
author's perspective on an issue, service, organization, or product.
1.1 Data Sources
1.1.1 Review Sites:
Review sites serve as a valuable resource for accessing feedback on various products and
services. These platforms play a crucial role for both providers and consumers alike.
Customers use these sites to express their opinions on specific products or services, detailing
whether they find them effective, beneficial, or a potential waste of time and money. Service
providers, in turn, take advantage of these insights to enhance the quality of their offerings,
using customer reviews as a valuable tool for refining and improving their products and
services.
1.1.2 Blogs:
A blog is a web page where individuals or groups post written content, often adopting an
informal and conversational tone. People regularly share their personal ideas or professional
views in blogs, whether on a daily, weekly, or monthly basis.
1.1.3 Microblogging:
Microblogging is a form of blogging where individuals share short and concise messages
known as micro-posts. There are limitations on the length of each micro-post on various
microblogging sites. Some of the most well-known microblogging platforms include Twitter,
Tumblr, Instagram, Threads, and MeetMe.
1.1.4 News Articles:
Many daily newspaper websites permit users to comment on current events or issues.
Moreover, Rich Site Summary (RSS) feeds can be useful for gauging the sentiments
expressed by readers (Rokade & Aruna, 2019).
1.1.5 Social Media:
Social media encompasses social networks and online platforms where individuals share,
create, and exchange knowledge, ideas, opinions, and beliefs. Common social networking
sites like Facebook and Twitter serve as spaces where people communicate through text,
images, videos, and links.
These sources provide a vast volume of data, allowing researchers to extract subjective
expressions that can be utilized for sentiment analysis.
1.2 Terminologies of Sentiment Analysis
1.2.1 Opinion:
Any expression, view, or statement about a person, an organization, or an entity that is based
on some knowledge and experience is called an opinion. Liu (2010) described an opinion
mathematically as a quintuple (o, f, so, h, t), where:
o represents object
f represents feature of object
so represents semantic orientation (polarity)
h represents holder
t represents time
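The quintuple can be pictured as a small record type. The following is only an illustrative sketch: the class name `Opinion`, its field names, and the sample values (including the date) are assumptions made for this example and do not come from Liu (2010).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Opinion:
    """One opinion stored as the quintuple (o, f, so, h, t)."""
    obj: str          # o: the object being evaluated
    feature: str      # f: the feature of the object
    orientation: str  # so: "positive", "negative", or "neutral"
    holder: str       # h: the person expressing the opinion
    time: date        # t: when the opinion was expressed


# Hypothetical instance; every value here is invented for illustration.
example = Opinion(
    obj="smartphone",
    feature="camera",
    orientation="positive",
    holder="a reviewer",
    time=date(2024, 1, 1),
)
print(example)
```

Keeping the five components as separate fields makes it straightforward to group opinions by object, feature, holder, or time later on.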
1.2.2 Object:
In sentiment analysis, an “object” refers to an entity, event, or person that is the subject of
assessment for sentiment. For instance, when analyzing customer behavior towards online
shopping, "online shopping" serves as the object of sentiment.
1.2.3 Feature of Object:
A “feature of an object” denotes the attribute, characteristic, or aspect of an object (whether
it's an event, entity, or person) that is assessed for sentiment. For instance, in the statement
“The camera of Samsung S24 is awesome,” the feature of the object (Samsung S24) is the
camera.
1.2.4 Semantic Orientation (polarity):
Semantic orientation denotes the direction of sentiment in text, indicating whether the
expressed opinion is positive, negative, or neutral. For example, ‘The Red dress is awesome’
is a positive opinion, while ‘Her teaching method is not good’ is a negative opinion.
1.2.5 Holder:
An opinion holder is a person who expresses views, opinions, feelings, or sentiments toward
an event, organization, or entity. For example, in “Ali says that GUCCI’s perfumes are
awesome,” “Ali” is the opinion holder.
1.2.6 Time:
The concept of time in sentiment analysis refers to the point in time at which an opinion is
expressed. For example, 'I loved this model of car last year' reflects a sentiment held at
a specific point in the past.
1.3 Levels of Sentiment Analysis

1.3.1 Document-Level Sentiment Analysis:

Document-level sentiment analysis involves evaluating the overall sentiment of an entire
document as the fundamental unit. However, the complexity arises from the potential
presence of mixed opinions within the document. Given the creative nature of language,
individuals may convey the same sentiment in diverse ways, sometimes without employing
explicit opinion words.
1.3.2 Sentence-Level Sentiment Analysis:
Sentence-level analysis involves segmenting the text into sentences and categorizing each as
positive, negative, or neutral in polarity. Because the text contains a mix of subjective and
objective statements, subjective sentences are identified first, and the opinions within them
are then evaluated (Rokade & Aruna, 2019).
1.3.3 Word-Level Sentiment Analysis:
In word-level sentiment analysis focused on product features, the examination occurs at the
word or phrase level, incorporating adjectives and adverbs as features (Rokade & Aruna,
2019). Achieving word-level sentiment involves employing either the 'Dictionary Based
Method' or the 'Corpus Based Method.'
a. Dictionary Based Method:
This method initiates with the creation of a compact seed list containing words with
established prior polarity, which is subsequently expanded through iterative extraction of
synonyms or antonyms from online dictionary sources such as WordNet.
b. Corpus Based Method:
The corpus-based method depends on syntactic or statistical techniques, such as analyzing the
co-occurrence of a word with another whose polarity is known. In this method, pairs of
conjoined adjectives exhibit the same orientation when connected by “and” and opposite
orientations when linked by “but.”
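The conjunction rule described above can be sketched in a few lines. This is a simplified illustration rather than a published algorithm: the function name `propagate_polarity`, the seed words, and the sample sentences are assumptions, and the sketch treats the words on either side of "and" or "but" as adjectives without any part-of-speech tagging.

```python
import re


def propagate_polarity(sentences, seeds):
    """Infer word polarity from 'X and Y' / 'X but Y' patterns.

    seeds maps words of known polarity to +1 (positive) or -1 (negative).
    """
    polarity = dict(seeds)
    pattern = re.compile(r"\b(\w+) (and|but) (\w+)\b")
    changed = True
    while changed:  # repeat until no further word can be labeled
        changed = False
        for sentence in sentences:
            for left, conj, right in pattern.findall(sentence.lower()):
                # "and" keeps the orientation, "but" reverses it
                sign = 1 if conj == "and" else -1
                if left in polarity and right not in polarity:
                    polarity[right] = sign * polarity[left]
                    changed = True
                elif right in polarity and left not in polarity:
                    polarity[left] = sign * polarity[right]
                    changed = True
    return polarity


corpus = [
    "The screen is bright and sharp",
    "The case is sturdy but ugly",
    "The menu is ugly and confusing",
]
learned = propagate_polarity(corpus, {"bright": 1, "sturdy": 1})
print(learned)
```

Starting from two positive seeds, the sketch labels "sharp" as positive and "ugly" and "confusing" as negative.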
1.4 Process of Sentiment Analysis
Sentiment analysis comprises a five-step data processing workflow, encompassing data
collection, text preparation, sentiment detection, sentiment classification, and the presentation
of output (Aqlan, Manjula & Lakshman Naik, 2019).
1.4.1 Data Collection:
The initial phase in sentiment analysis involves data collection, wherein information is
gathered from various social networking sites such as Twitter, Instagram, and Facebook,
along with select commercial websites. The collected data originates from user-generated
content on diverse web platforms.
1.4.2 Text Preparation:
In this process, the entire text is transformed into a proper normal form, and undesirable
elements are eliminated from the source text to enhance consistency and clarity. Multiple
steps are undertaken to clean unstructured extracted data, including tokenization, removal of
redundant words, stop word elimination, stemming, lemmatization, part-of-speech tagging,
emotion tagging, and scoring.
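A minimal sketch of three of these cleaning steps (tokenization, stop word elimination, and stemming) is given below. The stop word list is deliberately tiny, and the suffix-stripping function is a crude stand-in for a real stemmer; both, along with all function names, are assumptions made for illustration.

```python
import re

# Tiny illustrative stop word list; practical systems use far larger ones.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "this", "that",
              "it", "of", "to", "and", "in", "on", "for", "i"}

# Naive suffix list used as a stand-in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def prepare(text):
    """Run tokenization, stop word removal, and stemming in sequence."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]


print(prepare("The cameras are working amazingly!"))
# -> ['camera', 'work', 'amazing']
```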
1.4.3 Sentiment Detection:
Sentiment detection, also known as opinion mining (OM) and sentiment analysis, involves
utilizing machine learning or NLP techniques to dissect phrases and sentences extracted
from reviews to identify and retain sentences containing subjective expressions such as
beliefs, opinions, ideas and sentiments (Aqlan, Manjula & Lakshman Naik, 2019).
1.4.4 Sentiment Classification:
The task of Sentiment Classification involves extracting and categorizing text with the aim of
classifying it based on the polarity of the expressed opinion, such as positive or negative,
good or bad, like or dislike (Aqlan, Manjula & Lakshman Naik, 2019). Sentiment
classification can be done through different techniques such as machine learning approaches,
hybrid approaches and lexicon-based approaches.
1.4.5 Output Presentation:
The aim of analyzing extensive data is to transform unstructured text into valuable
information, which is subsequently visualized using various charts such as line graphs
and bar graphs.
1.5 Classification of Sentiments

1.5.1 Lexicon Based Approach:

The lexicon-based approach centers on identifying opinion lexicons, employing positive
words to characterize desired aspects and negative words for undesired aspects in order to
analyze text sentiment (Aqlan, Manjula & Lakshman Naik, 2019). The lexicon-based approach
has two variants: the corpus-based approach and the dictionary-based approach.
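As a rough illustration of lexicon-based scoring, the sketch below counts positive and negative words and maps the total to a polarity label. The word lists are invented for this example, and the one-word negation handling is an added assumption that the cited sources do not prescribe. The first two test sentences are the examples from Section 1.2.4.

```python
# Hypothetical miniature opinion lexicon.
POSITIVE = {"good", "great", "awesome", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "awful", "hate"}
NEGATIONS = {"not", "no", "never"}


def lexicon_polarity(text):
    """Sum word scores, flipping the score of a word that follows a negation."""
    score = 0
    negate = False
    for raw in text.lower().split():
        word = raw.strip(".,!?;:'\"")
        if word in NEGATIONS:
            negate = True
            continue
        if word in POSITIVE:
            value = 1
        elif word in NEGATIVE:
            value = -1
        else:
            value = 0
        score += -value if negate else value
        negate = False  # a negation only affects the very next word
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_polarity("The Red dress is awesome"))
print(lexicon_polarity("Her teaching method is not good"))
```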
a. Corpus-Based Approach:
The corpus-based approach, initiated with a seed list of opinion words, extracts additional
ideas from a vast corpus by examining words that co-occur with the seed list to discern
opinions from specific perspectives (Aqlan, Manjula & Lakshman Naik, 2019). Many
techniques depend on grammatical patterns associated with the seed list to identify additional
words in a large corpus (Aqlan, Manjula & Lakshman Naik, 2019), where the examination of
the corpus reveals similarities or distinctions in the impressions conveyed by words (Rokade
& Aruna, 2019). The interaction of similar and dissimilar opinions among adjectives is
represented graphically, and a clustering algorithm is subsequently employed to create two
clusters, designated as positive and negative (Rokade & Aruna, 2019).
b. Dictionary-Based Approach:
In the dictionary-based method, a small set of opinion words with known orientation is
manually chosen and subsequently expanded by searching recognized corpora,
thesauri, or WordNet for synonyms and antonyms (Aqlan, Manjula & Lakshman Naik,
2019). Newly discovered words are continuously incorporated into the seed list until the
iterative process concludes upon the absence of novel words (Aqlan, Manjula & Lakshman
Naik, 2019). The approach, while effective for identifying and storing opinion words and
their synonyms in the dictionary, is limited in its applicability and best suited for use with a
small set of test opinion words through online dictionaries like WordNet or thesaurus
(Rokade & Aruna, 2019).
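The iterative expansion of the seed list can be sketched as follows. To keep the example self-contained, a hand-written dictionary named `THESAURUS` stands in for WordNet; its entries, the function name `expand_lexicon`, and the rule that synonyms inherit a word's polarity while antonyms receive the opposite one are assumptions made for illustration.

```python
# Hypothetical miniature thesaurus standing in for WordNet.
THESAURUS = {
    "good": {"synonyms": ["fine", "nice"], "antonyms": ["bad"]},
    "nice": {"synonyms": ["pleasant"], "antonyms": ["nasty"]},
    "bad": {"synonyms": ["poor"], "antonyms": ["good"]},
}


def expand_lexicon(seeds, thesaurus):
    """Grow a polarity lexicon until an iteration finds no new word."""
    lexicon = dict(seeds)
    frontier = list(seeds)
    while frontier:  # stops when the last pass added nothing
        next_frontier = []
        for word in frontier:
            entry = thesaurus.get(word, {})
            for relation, sign in (("synonyms", 1), ("antonyms", -1)):
                for related in entry.get(relation, []):
                    if related not in lexicon:
                        lexicon[related] = sign * lexicon[word]
                        next_frontier.append(related)
        frontier = next_frontier
    return lexicon


expanded = expand_lexicon({"good": 1}, THESAURUS)
print(expanded)
```

From the single seed "good", the sketch reaches seven words in two passes and then stops because the third pass finds nothing new.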
1.5.2 Machine Learning Approach:
Machine learning (ML) involves the systematic exploration of algorithms and statistical
models employed by computer systems to carry out a particular task without the need for
explicit programming (Mahesh, 2020). A machine learning approach is utilized to address
issues pertaining to text classification, particularly those involving syntactic or linguistic
structures (Aqlan, Manjula & Lakshman Naik, 2019). In addition, machine learning tasks are
generally categorized into three main groups based on the type of learning "signal" or
"feedback" accessible to a learning system: supervised learning, unsupervised learning,
and semi-supervised learning (Sharma & Kumar, 2017).
a. Supervised Machine Learning:
Supervised learning involves creating a function to map input to output, inferring the function
through labeled training data, and is initiated in a task-driven manner when specific inputs
can achieve diverse goals, with common tasks including regression and classification
(Jhaveri, Revathi, Ramana, Raut & Dhanaraj, 2022). Supervised learning is further divided
into the following techniques:
1. Probabilistic Classifiers
Probabilistic classifiers employ multiple models for classification, encompassing various
types of mixture models, where each model serves as an integrated mixture component,
functioning generatively and accommodating specific terms for the given component,
defining this approach as a generative classifier (Aqlan, Manjula & Lakshman Naik, 2019).
i. Naïve Bayes: Recently, Naïve Bayes has gained prominence as the preferred
method for text classification (Aqlan, Manjula & Lakshman Naik, 2019).
This classifier model calculates the posterior probability of a class by
analyzing the distribution of words within the given document (Aqlan,
Manjula & Lakshman Naik, 2019), boasting simplicity in implementation
coupled with high accuracy (Rokade & Aruna, 2019). The algorithm assesses
each word in the training set, determining its probability for each class
(positive or negative), operating on the assumption that feature values are
independent of each other (Rokade & Aruna, 2019). In essence, a naive
Bayes classifier posits the absence of correlations between features in any
given entity (Rokade & Aruna, 2019).
ii. Bayesian Network classifier: Whereas Naïve Bayes assumes that features are
independent of one another, the Bayesian network classifier considers a
collection of variables, each featuring a restricted set of mutually exclusive
cases, and allows for dependencies among them, up to the extreme assumption
that all features are fully interdependent. The resulting model is represented
by a directed graph whose edges illustrate the relationships among random
variables (Aqlan, Manjula & Lakshman Naik, 2019).
iii. Maximum Entropy Classifier: The maximum entropy classifier is a
probability distribution estimation technique that is widely used across
natural language tasks such as language modeling, POS tagging, and text
segmentation, as well as in speech and data processing. Its underlying
principle is that, in the absence of external knowledge, the estimated
distribution should be as uniform as possible (Aqlan, Manjula &
Lakshman Naik, 2019).
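To make the Naïve Bayes description above concrete, the following sketch estimates a per-class probability for each word in the training set and combines them under the independence assumption. It is a minimal illustration, not the implementation used in the cited studies: the class name, the four training sentences, and the use of add-one (Laplace) smoothing are assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(documents, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def log_score(self, text, label):
        """Log of P(class) times the product of P(word | class).

        This is proportional to the posterior; words are treated as
        independent of each other given the class.
        """
        total_docs = sum(self.class_counts.values())
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.class_counts[label] / total_docs)
        for word in text.lower().split():
            count = self.word_counts[label][word] + 1  # add-one smoothing
            score += math.log(count / (total_words + len(self.vocabulary)))
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_score(text, c))


# Invented toy training set.
docs = [
    "awesome camera great battery",
    "great screen love it",
    "terrible battery bad screen",
    "bad camera hate it",
]
labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("great camera"), model.predict("bad battery"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.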
2. Rule-Based Approach
The foundation of the rule-based approach (RBA) lies in relational rules (Rokade & Aruna, 2019),
which operate on IF-THEN conditions (Aqlan, Manjula & Lakshman Naik, 2019), forming a classification
technique. Despite its limited generalization capabilities, this approach excels in performance
within specific domains (Rokade & Aruna, 2019). Learning classifier systems, association
rule learning, and artificial immune systems are encompassed within this classification
technique (Rokade & Aruna, 2019). Various methods can be employed to generate rules, with
support and confidence being two prevalent criteria (Rokade & Aruna, 2019).
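Support and confidence can be computed directly for a single IF-THEN rule. The sketch below uses the standard association-rule definitions: support is the fraction of all records that satisfy both the condition and the conclusion, and confidence is the fraction of records satisfying the condition that also satisfy the conclusion. The function name and the four sample records are assumptions made for illustration.

```python
def support_and_confidence(records, antecedent, label):
    """Evaluate the rule: IF all antecedent words occur THEN class == label.

    records is a list of (set_of_words, class_label) pairs.
    """
    matches_if = [cls for words, cls in records if antecedent <= words]
    matches_both = [cls for cls in matches_if if cls == label]
    support = len(matches_both) / len(records)
    confidence = len(matches_both) / len(matches_if) if matches_if else 0.0
    return support, confidence


# Invented toy records.
reviews = [
    ({"great", "camera"}, "positive"),
    ({"great", "price"}, "positive"),
    ({"great", "disappointment"}, "negative"),
    ({"poor", "battery"}, "negative"),
]
support, confidence = support_and_confidence(reviews, {"great"}, "positive")
print(support, confidence)
```

Here the rule "IF great THEN positive" holds in two of four records (support 0.5) and in two of the three records that contain "great" (confidence of about 0.67).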
3. Linear Classifier
A linear classifier makes decisions by evaluating the value of a linear combination of an
object's feature values, commonly presented to the machine as a vector known as a feature
vector (Aqlan, Manjula & Lakshman Naik, 2019). Linear classifiers are further divided into two types:
i. Neural Network: A neural network comprises algorithms that recognize inherent
relationships in multiple sets of data through a process resembling the functioning
of the human mind (Aqlan, Manjula & Lakshman Naik, 2019).
ii. Support Vector Machine: SVM, as a supervised learning method for classification,
employs hyperplanes to segregate multidimensional data, with data points on
these planes termed support vectors (Rokade & Aruna, 2019). Support vectors,
specifically those in proximity to hyperplane boundaries, play a crucial role
(Rokade & Aruna, 2019). SVM serves the purpose of analyzing datasets for both
classification and regression analysis, representing a machine learning algorithm
designed for automated data processing (Aqlan, Manjula & Lakshman Naik, 2019).
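The decision rule shared by linear classifiers, a linear combination of feature values compared against a threshold, can be sketched as below. The weights and bias are hand-picked for illustration rather than learned by an SVM or a neural network, and the two-element feature vector (counts of positive and negative words) is likewise an assumption.

```python
def linear_score(weights, bias, features):
    """Return the linear combination w . x + b for one feature vector."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify(weights, bias, features):
    """Pick the class from the side of the hyperplane w . x + b = 0."""
    if linear_score(weights, bias, features) >= 0:
        return "positive"
    return "negative"


# Hypothetical feature vector: [positive word count, negative word count].
WEIGHTS = [1.0, -1.5]
BIAS = 0.0

print(classify(WEIGHTS, BIAS, [2, 1]))  # score 0.5
print(classify(WEIGHTS, BIAS, [1, 1]))  # score -0.5
```

Training a linear classifier amounts to choosing the weights and bias; an SVM, for instance, selects the hyperplane with the widest margin between the classes.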

4. Decision Tree Classifiers (DTC)

DTC, utilized for classification, seeks to partition extensive data into manageable groups by
leveraging the multiple values of attributes and data features, providing discrete predictions
for class labels; it is a widely employed and straightforward technique, particularly prevalent
in the field of sentiment analysis (Aqlan, Manjula & Lakshman Naik, 2019).
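A single partitioning step of a decision tree can be sketched as follows: each attribute is tried in turn, and the one whose values split the data into the purest groups is chosen. The use of Gini impurity as the purity measure is an assumption (the cited source does not name a criterion), as are the attribute names and the four toy rows.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows, labels):
    """Return the attribute whose value groups have the lowest weighted Gini."""
    best_attr, best_impurity = None, float("inf")
    for attr in rows[0]:
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        impurity = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        if impurity < best_impurity:
            best_attr, best_impurity = attr, impurity
    return best_attr, best_impurity


# Invented toy data: one dictionary of attribute values per review.
rows = [
    {"long_review": True, "contains_not": False},
    {"long_review": True, "contains_not": True},
    {"long_review": False, "contains_not": False},
    {"long_review": False, "contains_not": True},
]
labels = ["positive", "negative", "positive", "negative"]
print(best_split(rows, labels))
```

A full tree would apply the same step recursively to each resulting group until the groups are pure or no attributes remain.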
b. Unsupervised Machine Learning:
Unsupervised learning, devoid of extensive human intervention, involves analyzing datasets
without labels, commonly employed for data exploration, grouping results, identifying
significant structures and trends, extracting general features, and performing tasks such as
anomaly detection, association rule finding, dimensionality reduction, feature learning,
density estimation, and clustering (Jhaveri, Revathi, Ramana, Raut & Dhanaraj, 2022).
c. Semi-supervised Machine Learning:
Semi-supervised learning, functioning with both labeled and unlabeled data, represents a
hybrid approach positioned between supervised and unsupervised learning, particularly
valuable in real-time applications where large volumes of unlabeled data and limited labeled
data are prevalent, demonstrating enhanced predictive performance compared to using
labeled data alone; typical applications include tasks like text classification, data labeling,
fraud detection, and machine translation (Jhaveri, Revathi, Ramana, Raut & Dhanaraj, 2022).

