2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems

Fake Reviews Detection Based on Text Feature and Behavior Feature

Yin Shuqin and Feng Jing
College of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
641981005@qq.com, fengjing1805@163.com

Abstract—Considering the limitations of comment text in the study of fake reviews recognition, this paper proposes to build a classification model by integrating the features of comment text and user behavior. However, the comment data obtained in reality are mostly unlabeled. Therefore, this paper proposes an MPINPUL (Mixing Population and Individual Nature PU Learning) model based on multiple features to build a fake reviews classification model. The MPINPUL model is divided into four steps. Firstly, a constrained k-means algorithm is proposed to extract trusted negative examples; its advantage is that it can expand the set of positive examples while identifying the trusted negative examples. Then, LDA and k-means are used to calculate multiple representative samples for the positive and negative examples respectively. Next, the ideas of population and individuality are combined to determine the category labels of the samples. Finally, the classifier is established. Experimental results on real data sets show that the recognition rate of the MPINPUL model proposed in this paper under fusion features is higher than under any single feature.

Keywords—fake reviews, fusion feature, PU-Learning, constrained k-means, classification model

I. INTRODUCTION

Fake reviews identification was first proposed by Jindal and Liu [1] in 2008. The difficulty of this research is how to effectively extract or represent the features of the comment text and user behavior, so as to achieve the purpose of fake reviews recognition. Although the fabricator tries to simulate truthful content as much as possible, there are some verbal and behavioral details that may be flawed. Researchers have identified fake reviews based on different research scenarios [2].

Ott et al. [3,4] used a support vector machine classifier based on bag-of-words features to identify fake reviews on the gold data set built on the Amazon crowd-sourcing platform, with an accuracy rate of 84%. However, fake reviews deliberately imitate real comments in terms of language and vocabulary, so the ability to identify fake reviews by bag-of-words alone is not strong. Li et al. [5] also used part-of-speech characteristics on an Amazon data set and found that crowd-sourced comments presented part-of-speech characteristics different from those of real comments: crowd-sourced fake reviews contained more verbs, adverbs and pronouns, while real reviews contained more nouns, adjectives, prepositions, qualifiers and conjunctions. Lau et al. [6] believed that fake reviews copy from one another, and that they could be identified by detecting semantically repeated reviews. Mukherjee et al. [7] used an SVM classification model to achieve 67.8% accuracy based on plain text features on Yelp data sets; after the characteristics of the commenter were integrated, the recognition accuracy increased to 84.8%. Therefore, the behavior characteristics of reviewers have a significant impact on the identification of fake reviews.

That is the state of the art in fake reviews identification. In view of the limitations of text features, this paper adds multiple behavioral features from the perspective of the reviewer. At the same time, this paper proposes to use the MPINPUL (Mixing Population and Individual Nature PU Learning [8]) model to address the fact that fake comments in reality are widely distributed, huge in number, difficult to label manually with accuracy, and contain some misjudged samples.

II. FEATURE CONSTRUCTION

Feature engineering plays an important role in natural language processing research. Due to the concealment and diversity of fake reviews, feature selection is particularly important. In order to integrate multiple features for recognition, this paper studies three major categories of features: text, behavior and relationship.

A. Text Features

The features extracted in this paper include unigram lexical features, POS features and LDA semantic features. The comment text feature indicators and feature name descriptions are shown in Table I; a toy extraction sketch follows the table.

TABLE I. COMMENT TEXT CHARACTERISTICS

Feature Name    Feature Description
Unigram         N-gram lexical features
POS             Part-of-speech features
LDA             LDA thematic features
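As a rough illustration of how these three feature groups might be assembled, the sketch below uses scikit-learn and NLTK building blocks. The library choices, the toy reviews and the topic count are assumptions for illustration only; the paper does not specify an implementation.

```python
# Sketch: assembling the three text-feature groups of Table I.
# Library choices (scikit-learn, NLTK), toy data and the topic count
# are illustrative assumptions; the paper gives no implementation.
from collections import Counter

import nltk  # assumes 'punkt' and 'averaged_perceptron_tagger' are downloaded
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

reviews = ["The room was clean and the staff were friendly.",
           "Amazing amazing amazing! Best hotel ever, must visit!!!"]

# 1) Unigram lexical features: raw term counts.
unigram_vec = CountVectorizer(ngram_range=(1, 1))
x_unigram = unigram_vec.fit_transform(reviews)

# 2) POS features: normalized frequency of each part-of-speech tag.
def pos_profile(text):
    tags = Counter(tag for _, tag in nltk.pos_tag(nltk.word_tokenize(text)))
    total = sum(tags.values())
    return {tag: count / total for tag, count in tags.items()}

x_pos = [pos_profile(r) for r in reviews]

# 3) LDA semantic features: per-document topic distribution
#    (5 topics here only to fit the toy data).
lda = LatentDirichletAllocation(n_components=5, random_state=0)
x_lda = lda.fit_transform(x_unigram)

print(x_unigram.shape, x_lda.shape, x_pos[0])
```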

B. Behavioral Characteristics of Reviewers

Compared with real commenters, the behaviors of false commenters are often abnormal. Therefore, extracting the behavior characteristics of reviewers helps to accurately identify fake reviews. The behavioral characteristics of reviewers extracted in this paper are shown in Table II:

TABLE II. USER BEHAVIOR CHARACTERISTICS

Feature Name    Feature Description
MCS             Maximum text content similarity
RL              Comment text length
MDN             Maximum number of comments per day
PR              Proportion of positive comments
CSD             Commentator scoring deviation
Rating          Score

C. Relationship Characteristics

The relationship between comment text, commenters and merchants (commodities) is often not intuitive, so it needs to be explored further. In this paper, the FP-growth algorithm is used to extract commenter frequency, and a collaborative filtering algorithm is used to analyze target similarity; a sketch of both is given after Table III. Relationship feature names and descriptions are shown in Table III:

TABLE III. RELATIONSHIP CHARACTERISTICS

Feature Name    Feature Description
FP              Commenter frequency
IS              Target item similarity
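The paper names FP-growth and collaborative filtering but gives no implementation details. The sketch below approximates both features in plain Python: commenter frequency as how often a reviewer appears across businesses, and target item similarity as cosine similarity between the reviewer sets of two items. The data layout (a list of (reviewer, item) pairs) and the similarity measure are assumptions, not the authors' method.

```python
# Sketch: approximating the two relationship features of Table III.
# Real FP-growth mines frequent itemsets; here FP is reduced to a simple
# occurrence count, and IS to cosine similarity over shared-reviewer
# sets. The (reviewer, item) data layout is assumed.
from collections import Counter, defaultdict
from math import sqrt

reviews = [("u1", "restaurantA"), ("u1", "restaurantB"),
           ("u2", "restaurantA"), ("u2", "restaurantB"),
           ("u3", "restaurantC"), ("u1", "restaurantC")]

# FP: how many reviews each commenter has posted across items.
fp = Counter(user for user, _ in reviews)

# IS: cosine similarity between the reviewer sets of two items.
item_users = defaultdict(set)
for user, item in reviews:
    item_users[item].add(user)

def item_similarity(a, b):
    ua, ub = item_users[a], item_users[b]
    if not ua or not ub:
        return 0.0
    return len(ua & ub) / sqrt(len(ua) * len(ub))

print(fp["u1"], item_similarity("restaurantA", "restaurantB"))
```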

III. BUILDING THE MPINPUL CLASSIFICATION MODEL

Fake reviews recognition is regarded as a binary classification problem: comments can be divided into true reviews and fake reviews by a classification model. The construction process of the classification model is as follows:

Step 1: Prepare the data sets that can be fed into the model;
Step 2: Divide the data set into a training set and a test set;
Step 3: Use the training set to train the classification model and output it;
Step 4: Use the test set to evaluate the trained model.

Due to the large amount of unlabeled data obtained in real applications, it is difficult to annotate it all manually. Therefore, this paper uses the MPINPUL model to learn from a small number of labeled samples and a large number of unlabeled samples. The MPINPUL model in this paper takes the set of real comments as the positive example set P, and the large body of unlabeled data as the set U.

A. Extraction of Trusted Negative Examples Based on the Constrained K-Means Algorithm

Based on semi-supervised clustering, this paper proposes to use a constrained k-means algorithm to extract reliable negative examples. The clustering process is guided by prior knowledge: a must-link constraint steers the clustering. The advantage of the constrained k-means algorithm is that the positive example set is used to initialize the positive cluster center, and the positive example labels are used as must-link constraints during clustering. It not only marks the reliable negative examples, but also expands the positive examples.

The constrained k-means algorithm proposed in this paper is based on the following two points:

1) The positive example set serves as the seed set of one cluster;
2) The positive example labels serve as must-link constraints.

Algorithm 1 gives the procedure; a Python sketch follows the listing.

Algorithm 1: Reliable negative example extraction based on the constrained k-means algorithm
Input: positive example set P; unlabeled sample set U; cluster number k
Output: k disjoint clusters
Steps:
1. Initialize the cluster centers.
2. Set the positive examples as the seed set of cluster 0 and initialize its center as \mu_0 = \frac{1}{|P|}\sum_{x \in P} x;
3. Randomly select unlabeled samples to initialize the other cluster centers;
4. While the cluster assignments are still changing:
5.   The positive samples remain in cluster 0;
6.   Each text x in U is allocated to cluster i^* = \arg\min_{i} \|x - \mu_i\|, i = 0, 1, \ldots, k-1;
7.   Update the cluster centers: \mu_i = \frac{1}{|C_i|}\sum_{x \in C_i} x, i = 0, 1, \ldots, k-1;
8. End while
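Below is a compact sketch of Algorithm 1. The positive set seeds cluster 0 and stays pinned there (the must-link constraint), while unlabeled samples move freely. How the resulting k clusters are turned into the trusted negative set RN is only implied by the paper, so the last lines (unlabeled points landing in cluster 0 expand P, the rest become candidates for RN) are one reading, not the authors' exact rule.

```python
# Sketch of Algorithm 1: constrained k-means for trusted negatives.
# P seeds cluster 0 and is pinned there; U points are re-assigned freely.
import numpy as np

def constrained_kmeans(P, U, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    X = np.vstack([P, U])
    pinned = np.arange(len(P))                 # must-link: P stays in cluster 0
    centers = np.vstack([P.mean(axis=0),       # step 2: cluster-0 center
                         U[rng.choice(len(U), k - 1, replace=False)]])
    assign = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):                    # step 4: until assignments settle
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new = d.argmin(axis=1)                 # step 6: nearest center wins
        new[pinned] = 0                        # step 5: positives stay in 0
        if (new == assign).all():
            break
        assign = new
        for i in range(k):                     # step 7: recompute centers
            if (assign == i).any():
                centers[i] = X[assign == i].mean(axis=0)
    return assign

P = np.random.default_rng(1).normal(1, 0.3, (20, 2))
U = np.random.default_rng(2).normal(0, 1.0, (200, 2))
assign = constrained_kmeans(P, U, k=5)
u_assign = assign[len(P):]
expanded_P = U[u_assign == 0]    # unlabeled points joining the positive cluster
candidate_RN = U[u_assign != 0]  # reading: other clusters feed the trusted negatives
print(len(expanded_P), len(candidate_RN))
```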

B. Calculation of Representative Samples

A certain accuracy can be obtained by training a classifier with the trusted negative example set RN and the positive example set P obtained in the first stage. However, this ignores the large number of spy samples in the unlabeled data set, resulting in poor classifier performance, even though the spy samples play an important role in improving it. In order to determine the category labels of the spy samples, it is necessary to find samples that can represent the positive and negative examples respectively. Therefore, the LDA algorithm is first used to obtain the distribution of the trusted negative set RN over different topics; the k-means algorithm then clusters the trusted negative examples so that samples with similar topic distributions fall into the same category. Finally, a Rocchio classifier is used to compute 10 representative samples for the positive and negative examples respectively.
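As a rough sketch of this stage: a Rocchio-style representative is the mean (centroid) of the vectors in a group, so the "10 representative samples" can be read as the centroids of 10 clusters per class. The pipeline below (LDA topic vectors, then k-means, then centroids) follows that reading under stated assumptions; the topic and cluster counts and the toy data are not taken from the paper.

```python
# Sketch: computing representative samples for one class.
# Reading of the pipeline: LDA topic vectors -> k-means into 10 subgroups
# -> one Rocchio-style centroid per subgroup. Counts are assumed.
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def representative_samples(texts, n_topics=5, n_reps=10, seed=0):
    counts = CountVectorizer().fit_transform(texts)
    topics = LatentDirichletAllocation(
        n_components=n_topics, random_state=seed).fit_transform(counts)
    km = KMeans(n_clusters=n_reps, n_init=10, random_state=seed).fit(topics)
    # Each cluster center is the mean of its members: a Rocchio prototype.
    return km.cluster_centers_

rn_texts = [f"bad service table {i}" for i in range(30)] + \
           [f"overpriced dish {i}" for i in range(30)]
print(representative_samples(rn_texts).shape)   # (10, 5)
```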
C. Determine the Category Label for the Spy Sample

Determining the category labels of the spy samples is the most critical step in the MPINPUL framework. The purpose is to divide the spy set US into LP and LN. This step uses the 10 representative samples of the positive and negative examples to calculate the category label of each spy sample. A DPMM (Dirichlet Process Mixture Model) is adopted to cluster the spy samples. The idea is to mix the population nature and the individuality of the spy samples, while using a probability model to calculate, for each spy sample, the probability weights of belonging to the two categories. This reduces the labeling error of the spy samples to a certain extent, so that a more accurate classifier can be trained. The steps are as follows: first, calculate the probability that a single sample belongs to the positive example and to the negative example; when the two probabilities are close, the category label of the sample's subclass is used to determine the label of the sample.
The idea of population is that samples in the same subclass should have a high probability of belonging to the same category. Formulas (1) and (2) calculate the probability weight of each sample in a subclass belonging to the two categories:

w_{pop}^{p}(e) = \frac{|num_p|}{|US_i|} \quad (1)

w_{pop}^{n}(e) = \frac{|num_n|}{|US_i|} \quad (2)

Here |US_i| is the total number of samples in a subclass, |num_p| is the number of instances in subclass US_i that are temporarily labeled as positive examples, and |num_n| is the number of instances in subclass US_i that are temporarily labeled as negative.

The idea of individuality is to ignore the subclass of the sample and focus only on the relationship between a single sample and the representative samples of all positive and negative examples. Formulas (3) and (4) calculate the category probability of a single sample with respect to the positive and negative examples:

w_{ind}^{p}(e) = \frac{\sum_{p_i \in RS^{+}} sim(e, p_i)}{\sum_{p_i \in RS^{+}} sim(e, p_i) + \sum_{n_j \in RS^{-}} sim(e, n_j)} \quad (3)

w_{ind}^{n}(e) = \frac{\sum_{n_j \in RS^{-}} sim(e, n_j)}{\sum_{p_i \in RS^{+}} sim(e, p_i) + \sum_{n_j \in RS^{-}} sim(e, n_j)} \quad (4)

where RS^{+} and RS^{-} are the representative samples of the positive and negative examples computed in the previous stage, and sim(·,·) is the similarity between two samples.
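To make formulas (1) through (4) concrete, here is a small sketch that computes both kinds of weight for one spy sample. The cosine similarity and the toy vectors are illustrative assumptions.

```python
# Sketch: population weights (1)-(2) and individuality weights (3)-(4)
# for one spy sample. Cosine similarity and the toy data are assumed.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def population_weights(subclass_labels):
    """subclass_labels: temporary labels (+1/-1) of the sample's subclass US_i."""
    n = len(subclass_labels)
    num_p = sum(1 for y in subclass_labels if y > 0)
    return num_p / n, (n - num_p) / n                # (1), (2)

def individuality_weights(e, reps_pos, reps_neg):
    sp = sum(cosine(e, p) for p in reps_pos)         # similarity to RS+
    sn = sum(cosine(e, n) for n in reps_neg)         # similarity to RS-
    return sp / (sp + sn), sn / (sp + sn)            # (3), (4)

e = np.array([0.8, 0.1, 0.1])
reps_pos = [np.array([0.9, 0.05, 0.05]), np.array([0.7, 0.2, 0.1])]
reps_neg = [np.array([0.1, 0.8, 0.1]), np.array([0.2, 0.1, 0.7])]
print(population_weights([1, 1, -1, 1]))
print(individuality_weights(e, reps_pos, reps_neg))
```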
D. Calculate the Probability Weight of the Spy Sample

In order to improve the performance of the classifier, a probabilistic model is proposed to represent a spy sample. w_p(e) and w_n(e) respectively represent the probability weights of a single sample belonging to the positive and negative examples. The probability model is shown in formula (5):

\{e, (w_p(e), w_n(e))\}, \quad w_p(e) + w_n(e) = 1 \quad (5)

For a spy example, the probability weights of ultimately falling into the two categories are calculated by mixing population and individuality, as shown in formulas (6) and (7):

w_p(e) = a \cdot w_{pop}^{p}(e) + (1 - a) \cdot w_{ind}^{p}(e) \quad (6)

w_n(e) = a \cdot w_{pop}^{n}(e) + (1 - a) \cdot w_{ind}^{n}(e) \quad (7)

The parameter a balances the global and the local information; different values are tried in the experiments in order to obtain the overall optimum.
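A one-line sketch of the mixing step in (6) and (7), reusing the weight functions from the previous sketch; a = 0.3 matches the value used later in the experiments, and the input weights are made up.

```python
# Sketch: mixing population and individuality weights, formulas (6)-(7).
# pop_p/pop_n and ind_p/ind_n would come from the previous sketch.
def mix_weights(pop_p, pop_n, ind_p, ind_n, a=0.3):
    w_p = a * pop_p + (1 - a) * ind_p
    w_n = a * pop_n + (1 - a) * ind_n
    assert abs(w_p + w_n - 1.0) < 1e-9   # property required by formula (5)
    return w_p, w_n

print(mix_weights(0.75, 0.25, 0.82, 0.18))
```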
E. Establish a Classifier

In this section, the probability weights of the positive example set P, the trusted negative example set RN, the positive spy sample set LP and the negative spy sample set LN are fused to extend the optimization function of the SVM and learn a better classifier. The SVM optimization is shown in formula (8):

J(w, b, \xi) = \frac{1}{2}\|w\|^2 + C_1 \sum_{i=1}^{|P|} \xi_i + C_2 \sum_{j=1}^{|RN|} \xi_j + C_3 \sum_{u=1}^{|LP|} w_p(e_u)\,\xi_u + C_4 \sum_{v=1}^{|LN|} w_n(e_v)\,\xi_v \quad (8)

Here C_1, C_2, C_3 and C_4 are the parameters that control the classification errors and the hyperplane margin; \xi_i, \xi_j, \xi_u, \xi_v are slack (error tolerance) values, which control the tightness of the boundary. The terms w_p(e_u)\,\xi_u and w_n(e_v)\,\xi_v can be considered error tolerances with different weights. It should be pointed out that a larger w_p(e_u) increases the influence of the slack \xi_u, making the spy sample e_u more inclined to the positive example.

By solving the Lagrangian, the dual of the original optimization problem is shown in formula (9):

W(\alpha) = \sum_{i} \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y^{(i)} y^{(j)} \langle x^{(i)}, x^{(j)} \rangle \quad (9)

s.t. \quad C^{(i)} \ge \alpha_i \ge 0,\ i \in P^*; \quad C^{(j)} \ge \alpha_j \ge 0,\ j \in N^*; \quad \sum_{i} \alpha_i y^{(i)} = 0

where P^* = P \cup LP and N^* = RN \cup LN are the expanded positive and negative sets, and C^{(i)} denotes the coefficient of the corresponding slack term in (8).

Here \langle x^{(i)}, x^{(j)} \rangle is the inner product of x^{(i)} and x^{(j)}. In order to achieve better performance, a kernel function can be used to map the input features to a higher-dimensional feature space, so as to alleviate inconsistent data distributions and give the data a better representation in the new space. By solving the above optimization problem, the value of w is first calculated; the Karush-Kuhn-Tucker conditions are then used to obtain the value of b, after which predictions can be made. For a test sample x, if w^{T}x + b > 0 it is judged a positive example, i.e., a true comment; otherwise, it is a fake comment.
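The weighted objective in (8) can be approximated with per-sample weights in an off-the-shelf SVM: scikit-learn's SVC accepts a sample_weight argument in fit, which scales each sample's error term much like the C_3 w_p(e) and C_4 w_n(e) factors do. The sketch below is an approximation under that reading, not the authors' solver; the C values and the toy data are assumptions.

```python
# Sketch: approximating the weighted SVM of formula (8) with scikit-learn.
# P and RN get fixed weights (C1, C2 analogues); spy samples in LP/LN are
# scaled by their probability weights w_p/w_n. All values are assumed.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_p  = rng.normal(+1.0, 1.0, (40, 5))   # positive set P (label +1)
X_rn = rng.normal(-1.0, 1.0, (40, 5))   # trusted negatives RN (label -1)
X_lp = rng.normal(+0.8, 1.2, (10, 5))   # spy samples labeled positive
X_ln = rng.normal(-0.8, 1.2, (10, 5))   # spy samples labeled negative
w_p  = rng.uniform(0.5, 1.0, 10)        # w_p(e) from formula (6)
w_n  = rng.uniform(0.5, 1.0, 10)        # w_n(e) from formula (7)

X = np.vstack([X_p, X_rn, X_lp, X_ln])
y = np.concatenate([np.ones(40), -np.ones(40), np.ones(10), -np.ones(10)])
C1, C2, C3, C4 = 1.0, 1.0, 1.0, 1.0
weights = np.concatenate([C1 * np.ones(40), C2 * np.ones(40),
                          C3 * w_p, C4 * w_n])

clf = SVC(kernel="rbf").fit(X, y, sample_weight=weights)
print("train accuracy:", clf.score(X, y))  # sign(w.x + b) > 0 => true review
```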
IV. EXPERIMENT

In this paper, a total of three groups of comparative experiments were completed. To demonstrate the benefit of fusion features for fake reviews recognition, the accuracy of the MPINPUL model under different features was compared. To verify the validity of the MPINPUL model proposed in this paper, two mainstream PU learning algorithms, LELC [9] and SPUL [10], were implemented. In addition, two multiple kernel learning algorithms, SILP [11] and SimpleMKL [12], were implemented in the fourth stage of the MPINPUL model to compare against the improved SVM classifier of that stage. At the same time, the MPINPUL model was compared with recent work on the Yelp data set, namely SVM and XGBoost [13].

A. Experiment Data Set

The data comprise 64,445 reviews of 99 restaurants on Yelp. Among them, 8,035 comments were filtered as fake, while 56,410 were real. In this chapter, all 8,035 fake comments were selected, and 8,035 real comments were randomly sampled from the 56,410 real ones, so that a total of 16,070 comments constitute the experimental data set. After obtaining the experimental data set, the data were divided into 9 parts for training and 1 part for testing in the manner of ten-fold cross validation; a sketch of this split follows Fig. 1. The proportion of positive examples in the training set was fixed at 40%. Ten experiments were conducted in total, with a different test set selected for each experiment. Each review contains a review ID, user ID, review content, star rating, date, "useful" count, whether or not it was filtered, and more.

Fig. 1. Partial Original Data Sets
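A minimal sketch of the evaluation protocol as described: ten folds, each used once as the test set, with the positive share of the training portion forced to 40% by subsampling. The data are mocked and the label convention (+1 = real, -1 = fake) is an assumption.

```python
# Sketch: ten-fold split with the training positive ratio fixed at 40%.
# Labels: +1 = real (positive example), -1 = fake. Data are mocked.
import numpy as np

rng = np.random.default_rng(0)
y = np.concatenate([np.ones(8035), -np.ones(8035)])  # 16,070 reviews
idx = rng.permutation(len(y))
folds = np.array_split(idx, 10)

def fix_positive_ratio(train_idx, y, ratio=0.4, rng=rng):
    pos = train_idx[y[train_idx] > 0]
    neg = train_idx[y[train_idx] < 0]
    # keep all negatives; subsample positives to ratio/(1-ratio) of them
    n_pos = int(len(neg) * ratio / (1 - ratio))
    return np.concatenate([rng.choice(pos, n_pos, replace=False), neg])

for k in range(10):                      # each fold is the test set once
    test_idx = folds[k]
    train_idx = np.concatenate([folds[j] for j in range(10) if j != k])
    train_idx = fix_positive_ratio(train_idx, y)
    # ... train MPINPUL on train_idx, evaluate on test_idx ...
```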

B. Evaluation Index

In this experiment, accuracy, precision, recall and F1 score were used to evaluate the performance of the classification model.

TABLE IV. PERFORMANCE EVALUATION CONFUSION MATRIX

                                    Non-Fake   Fake
Predicted number of true reviews    TN         FN
Predicted number of fake reviews    FP         TP

1) Accuracy: the proportion of true comments and false comments judged correctly among the total number of samples, as shown in formula (10):

Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \quad (10)

2) Precision: the proportion of correct judgments among the samples judged to be true comments or to be false comments, as shown in formulas (11) and (12):

Precision(non\text{-}fake) = \frac{TN}{TN + FN} \quad (11)

Precision(fake) = \frac{TP}{TP + FP} \quad (12)

3) Recall: how many of the true or false comments in the data set are correctly predicted, as shown in formulas (13) and (14):

Recall(non\text{-}fake) = \frac{TN}{TN + FP} \quad (13)

Recall(fake) = \frac{TP}{TP + FN} \quad (14)

4) F1 Score: the harmonic mean of precision and recall, as shown in formula (15):

F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \quad (15)
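A small self-check of formulas (10) through (15) over a confusion matrix laid out as in Table IV, with "fake" as the positive class; the counts are invented for illustration.

```python
# Sketch: metrics (10)-(15) from a Table IV-style confusion matrix,
# with "fake" as the positive class. The counts are invented.
def metrics(tp, tn, fp, fn):
    acc = (tp + tn) / (tp + tn + fp + fn)                         # (10)
    prec_real = tn / (tn + fn)                                    # (11)
    prec_fake = tp / (tp + fp)                                    # (12)
    rec_real = tn / (tn + fp)                                     # (13)
    rec_fake = tp / (tp + fn)                                     # (14)
    f1_real = 2 * prec_real * rec_real / (prec_real + rec_real)   # (15)
    f1_fake = 2 * prec_fake * rec_fake / (prec_fake + rec_fake)
    return acc, (prec_real, rec_real, f1_real), (prec_fake, rec_fake, f1_fake)

print(metrics(tp=726, tn=681, fp=115, fn=85))
```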
C. Analysis of Experimental Results

1) Extraction Results of Trusted Negative Cases

After carrying out the constrained k-means clustering, the choice of the labeling proportion p when samples are marked must be considered. When p is too large, more noise is introduced; when p is too small, too few samples are labeled. Therefore, a simple search strategy is adopted in this paper, with the p value varied from 1 to 30. The experimental results are shown in Fig. 2. It is not difficult to see from the figure that when the proportion of positive examples is high, the difference between p values becomes very obvious.

[Line chart: F1 score (y-axis, 0.45 to 0.95) versus percent of positive examples (x-axis, 10% to 90%), one curve for each p in {1, 2, 4, 6, 8, 10, 20, 30}]

Fig. 2. Experimental Results with Different p Values

2) Results of the MPINPUL Model under Different Characteristics

In this experiment, ten-fold cross validation is adopted and ten runs are carried out. A different test set was selected each time, and the average of the 10 runs was taken as the final result. The value of parameter s was set to 0.15, the value of a was set to 0.3, and the proportion of positive examples was set to 40%. Table V shows the experimental results of the MPINPUL model on the Yelp data set. It can be seen that the classification model trained on fusion features is about 10% more accurate than the one trained on text features, which fully proves that the behavior characteristics of commenters help the identification of fake comments.

TABLE V. CLASSIFICATION EXPERIMENTAL RESULTS OF THE MPINPUL MODEL

                                                          Non-fake                        Fake
Classification   Features                Accuracy   Precision  Recall  F1       Precision  Recall  F1
model                                    (%)        (%)        (%)     (%)      (%)        (%)     (%)
MPINPUL          Unigram                 77.31      77.89      78.25   78.07    78.93      76.15   77.52
                 POS                     76.45      75.48      72.16   73.78    73.68      77.03   75.32
                 LDA                     76.82      75.13      78.74   76.89    77.17      74.36   75.74
                 Behavioral              83.84      89.21      79.39   84.01    82.67      92.36   87.25
                 Behavioral+Relational   82.65      89.68      75.68   82.09    78.14      92.07   84.53
                 Fusion feature          87.51      89.92      83.67   86.68    86.32      90.36   88.29

3) Comparison of Experimental Results of Several PU Learning Models

In order to verify the effectiveness of the algorithm proposed in this chapter, two mainstream PU learning models, LELC and SPUL, were implemented for comparison. In the fourth stage of the MPINPUL model, two multiple kernel learning algorithms, SILP and SimpleMKL, were implemented to train the classifier and compared with the improved SVM classifier. Firstly, the comparison between the MPINPUL model proposed in this chapter and the earlier PU algorithms is discussed. Table VI shows that the MPINPUL model designed in this chapter is superior to the earlier PU learning algorithms, with accuracy as high as 87.51%. Building on the first three stages of the MPINPUL model, the multiple kernel learning algorithms SILP and SimpleMKL were trained in the fourth stage, reaching accuracy rates of 85.16% and 86.57% respectively, which is also higher than the traditional PU learning algorithms LELC and SPUL, fully proving the effectiveness of the MPINPUL model designed in this chapter.

TABLE VI. COMPARISON OF EXPERIMENTAL RESULTS OF SEVERAL PU LEARNING MODELS

Algorithm    Accuracy (P=20%)   Accuracy (P=30%)   Accuracy (P=40%)
LELC         80.54              82.19              84.35
SPUL         81.69              83.93              84.47
SILP         82.67              84.23              85.16
SimpleMKL    83.21              84.83              86.57
MPINPUL      82.87              86.79              87.51

4) Comparison of Experimental Results of Classification Models

In order to verify the effectiveness of the MPINPUL model in fake reviews recognition, this paper implemented two additional classification models, SVM and XGBoost. Fig. 3 compares the experimental results of the SVM, XGBoost and MPINPUL models, with the proportion of positive examples fixed at 40%. Fig. 3 shows that, on the whole, fusion features are better than plain text features, and the ranking of the classification models is MPINPUL > XGBoost > SVM. This not only proves the importance of integrating text and behavior characteristics for fake comment recognition, but also fully reflects the effectiveness of the MPINPUL classification model mixing population and individuality.

[Bar chart: accuracy (%) of the SVM, XGBoost and MPINPUL models, y-axis 65 to 90]

Fig. 3. Comparison of Experimental Results of Various Classification Models

V. CONCLUSION

Aiming at the problem that comment text alone provides only a single kind of feature, this paper puts forward fake reviews recognition by integrating comment text and commenter behavior features. Considering the lack of a large amount of annotated data in fake reviews recognition, a PU learning algorithm was proposed to identify fake review texts. Based on research into traditional PU learning algorithms, this paper proposes a PU learning model (MPINPUL) based on mixed population and individuality. The MPINPUL model is divided into four steps: extracting credible negative samples, computing representative samples, determining the category labels of spy samples, and establishing the final classifier. Through three sets of comparative experiments, the significance of the reviewer's behavior in the identification of fake reviews and the feasibility and effectiveness of MPINPUL were confirmed from the two aspects of features and the classification model.

REFERENCES

[1] Jindal N, Liu B. Opinion spam and analysis. Proceedings of the 2008 International Conference on Web Search and Data Mining, 2008.
[2] Li Lu-Yang, Qin Bing, Liu Ting. Survey on fake review detection research. Chinese Journal of Computers, 2018(04) (in Chinese).
[3] Ott M, Cardie C, Hancock J T. Negative deceptive opinion spam. Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2013.
[4] Ott M, Choi Y, Cardie C, et al. Finding deceptive opinion spam by any stretch of the imagination. Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011.
[5] Li J, Ott M, Cardie C, et al. Towards a general rule for identifying deceptive opinion spam. Meeting of the Association for Computational Linguistics, 2014.
[6] Lau R Y K, Liao S Y, Kwok R C W, Xu K, Xia Y, Li Y. Text mining and probabilistic language modeling for online review spam detection. ACM Transactions on Management Information Systems (TMIS), 2012(4).
[7] Mukherjee A, Venkataraman V, Liu B, et al. What Yelp fake review filter might be doing? Proceedings of the 7th International AAAI Conference on Weblogs and Social Media, 2013.
[8] Ren Ya-Feng, Ji Dong-Hong, Zhang Hong-Bin, et al. Deceptive reviews detection based on positive and unlabeled learning. Journal of Computer Research and Development, 2015, 52(3): 639-648 (in Chinese).
[9] Li X L, Yu P S, Liu B, et al. Positive unlabeled learning for data stream classification. Proceedings of the 9th SIAM International Conference on Data Mining. Philadelphia, PA: SIAM, 2009.
[10] Xiao Yanshan, Liu Bing, Yin Jie, et al. Similarity-based approach for positive and unlabeled learning. Proceedings of the 22nd International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann, 2011.
[11] Lanckriet G, Cristianini N, Bartlett P, et al. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 2004.
[12] Rakotomamonjy A, Bach F R, Canu S, et al. SimpleMKL. Journal of Machine Learning Research, 2008.
[13] Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016: 785-794.