Detection of Fake Reviews Through Sentiment Analys
Data Collection
Data Preprocessing
Attribute selection
Feature Selection
Sentiment Classification
Algorithm
SVM
Detection Process
Comparison of Result
Result Analysis
Step 1: Reviews dataset collection
To provide a complete study of machine learning algorithms, the
experiment is based on analyzing the sentiment value of a
standard dataset. We have used the original dataset of Shopify app
reviews to test our methods of review classification.
The dataset is available as the Shopify app dataset and is often
considered the standard dataset for researchers working in
the field of Sentiment Analysis.
The dataset contains 10,000+ reviews of the apps given by
customers, some of which are positive, some negative and some
neutral.
A. StringToWordVector
To prepare our data for learning, we transform it using the
StringToWordVector filter, which is the main tool for text
analysis in WEKA. This filter allows configuring the different
steps of the term extraction.
Indeed, we should be able to see something such as the
following:
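As a rough illustration of the kind of transformation a string-to-word-vector step performs, the sketch below builds word-count vectors in plain Python. It is not WEKA's implementation or output: the function names, the tokenization rule, the `min_count` parameter and the two sample reviews are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a review and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def string_to_word_vectors(reviews, min_count=1):
    """Turn raw review strings into word-count vectors over a shared vocabulary.

    min_count is a hypothetical knob: words seen fewer times in total are dropped.
    """
    tokenized = [tokenize(review) for review in reviews]
    totals = Counter(tok for toks in tokenized for tok in toks)
    vocabulary = sorted(w for w, c in totals.items() if c >= min_count)
    index = {w: i for i, w in enumerate(vocabulary)}

    vectors = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for tok in toks:
            if tok in index:
                row[index[tok]] += 1
        vectors.append(row)
    return vocabulary, vectors


# Hypothetical reviews, invented for illustration only
reviews = ["Great app, great support", "Bad app"]
vocabulary, vectors = string_to_word_vectors(reviews)
```

Each review becomes one row of counts, and each column corresponds to one word of the vocabulary, which is the attribute layout that a later attribute-selection step and the classifier would consume.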
B. Attribute Selection
Removing poorly describing attributes can be valuable for
improving classification accuracy, because not all attributes are
relevant to the classification work, and irrelevant attributes can
even decrease the performance of some algorithms. We should
therefore perform attribute selection before training the
classifier. Attribute selection with supervised learning differs
from that with unsupervised learning, where the data have no goal
attribute.
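One common way to rank attributes in the supervised setting is by information gain with respect to the goal attribute. The text does not say which evaluator was used, so the following is only a minimal sketch under that assumption; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in label entropy obtained by splitting on one attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(rows, labels, names):
    """Return (attribute name, information gain) pairs, best attribute first."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((name, information_gain(column, labels)))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical word-presence attributes and sentiment labels
names = ["great", "app"]
rows = [[1, 1], [1, 1], [0, 1], [0, 1]]
labels = ["positive", "positive", "negative", "negative"]
ranked = rank_attributes(rows, labels, names)
```

In this toy data the word "app" occurs in every review and so carries no information about the label, while "great" separates the two classes perfectly. An attribute such as "app" is the kind of candidate that would be removed before training.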
1) Naïve Bayes (NB)
The NB classifier is a basic probabilistic classifier based on
applying Bayes' theorem. NB calculates a set of probabilities
from the combinations of values in a given data set.
The NB classifier also has a fast decision-making process.
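To make the probability calculation concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier over word tokens. The add-one (Laplace) smoothing, the class name `NaiveBayes` and the training examples are assumptions for illustration; the text does not specify the variant or settings actually used.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over word tokens with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior: log P(class)
            score = math.log(count / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for tok in tokens:
                # smoothed log likelihood: log P(word | class)
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training reviews, already tokenized
documents = [["great", "app"], ["love", "it"],
             ["bad", "app"], ["terrible", "support"]]
labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(documents, labels)
prediction = model.predict(["great", "app"])
```

Training only counts words per class, and prediction only sums a few logarithms per token. That is why the decision process is fast.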
* False Positive:
These are the fake positive reviews in the training dataset, which
are classified as Positive (P).
* True Negative:
These are the real negative reviews given by the customers in the
given training dataset. These are classified by the model as
Negative (N).
* False Negative:
These are the fake negative reviews in the training dataset,
which are classified by the model as Negative (N).
          Real                           Fake
Real      True Negative Reviews (TN)     False Positive Reviews (FP)
Fake      False Negative Reviews (FN)    True Positive Reviews (TP)
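The four cells of the matrix can be tallied directly from pairs of labels. The sketch below assumes that the rows of the matrix are the actual class, the columns are the predicted class, and "fake" is the positive class. The function names and the sample labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive="fake"):
    """Count TP, FP, TN and FN, treating `positive` as the positive class."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


def accuracy(counts):
    """Fraction of reviews whose predicted class matches the actual class."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total if total else 0.0


# Hypothetical labels, invented for illustration only
actual = ["real", "real", "fake", "fake", "real"]
predicted = ["real", "fake", "fake", "real", "real"]
counts = confusion_counts(actual, predicted)
acc = accuracy(counts)
```

With these five sample reviews, one falls in each of the TP, FP and FN cells and two fall in TN, so three of the five predictions are correct.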