Fake Product Review Detection and Elimination Using Opinion Mining
Department of Computer Science and Engineering, R.M.K. Engineering College, Tamil Nadu, India
atv.cse@rmkec.ac.in, prt.cse@rmkec.ac.in, jje.cse@rmkec.ac.in, sneh19407.cs@rmkec.ac.in, shre19405.cs@rmkec.ac.in, yuva19441.cs@rmkec.ac.in
Abstract - This paper addresses the identification of fake reviews and their removal from a given dataset using supervised machine learning algorithms and natural language processing techniques based on a wide variety of aspects. We trained two independently developed machine learning models on a counterfeit review dataset to assess the extent to which the information provided in a review is real. Counterfeit product evaluations, which can be found on numerous online retailers, strongly influence customers to buy the reviewed products, and the profit from those products probably depends on their reviews. Hence these counterfeit reviews must be detected so that large e-commerce companies such as Meesho, Amazon, Flipkart, and Nykaa can address the issue by removing fraudsters and fraudulent critics, sustaining users' confidence in shopping sites. The approach may also be used by websites and apps with relatively few consumers to estimate the authenticity of reviews so that online businesses can respond to them suitably. The model is developed using Naïve Bayes, Support Vector Machine, and a TF-IDF (term frequency-inverse document frequency) vectorizer. These models can be used to detect spam reviews on a website or application instantly. However, effectively countering spammers requires a sophisticated model trained on a large dataset of millions of reviews. In this work, the limited "Reviews of 20 Hotels in Chicago" hotel dataset is used to train the models on a small scale, but it can be expanded to achieve greater accuracy and authenticity in the reviews.

Index Terms – Opinion Mining, Data Preprocessing, Supervised Machine Learning Algorithm.

I. INTRODUCTION

Reviewing products bought online has become an everyday activity, and consumers now buy products through various e-commerce websites based on this feedback. But when the reviews given by critics are counterfeit, consumers have no way of knowing the authenticity of those reviews, so they are manipulated into buying a product that is not trustworthy. Identifying such reviews by hand is straightforward but time-consuming, because each review must be read and marked as counterfeit or ambiguous to identify the true cause of the issue. The problem can be solved by training a machine learning model that works on the review section to flag a specific review as genuine or spam. The intriguing part is that this method can be used to catch spammers who did not use the product. To falsely boost a product's reviews and give it a high rating, spammers may post spam reviews or use several different customer IDs. Such reviews can be filtered by looking at how often words like "awesome," "so good," and "fantastic" are used. This encourages us to create a system that uses a review's text and rating information to identify fake customer reviews of a product.

The credibility score of a potentially fraudulent review is calculated using machine learning models. By deriving topics and opinions from online reviews, an automated system could monitor consumer reviews and could additionally filter out fraudulent ones. Fake review identification and removal therefore requires a large amount of training data to be effective, along with additional domain knowledge, such as the sarcastic sentences users employ to convey their displeasure with a product. In some cases, the product being reviewed may be good while its distribution or shipping is not, which affects the review classification. Rather than letting sentiment analysis misclassify such reviews as adverse ratings, an NLP method is used to detect them. Data preprocessing is used to remove irrelevant or outdated product reviews. Because the number of users on these websites and applications is growing daily, businesses such as Twitter, WhatsApp, and Facebook use sentiment analysis to recognize fake news and harmful or disparaging messages and to ban perpetrators.

The primary objective of this research project is to build a platform for online shopping in which users can be confident that the goods they purchase are genuine and that consumer reviews are accurate and often validated by the company itself. In addition, businesses in the e-commerce (Walmart, Amazon), logistics, travel, job search (LinkedIn, Glassdoor, Indeed), and food (Shopsy, Swiggy, Zomato) sectors use algorithms to combat fraudsters who trick customers into purchasing subpar goods and services by posting false reviews. Users need not worry about such fake accounts, because they are alerted to scammers through labels such as "not verified listings." Manual labeling of reviews consumes a lot of effort and is less efficient. As a result, the reviews are labeled using a supervised learning algorithm, since manual labeling is untenable. Naïve Bayes, SVM, and a TF-IDF vectorizer are used to identify and remove fake reviews. Addressing the fake review identification problem in this way helps consumers view authenticated reviews.
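The introduction names Naïve Bayes as one of the classifiers applied to review text. As a rough illustration only, the sketch below implements a small multinomial Naïve Bayes with Laplace smoothing over word counts. The class name `TinyNaiveBayes`, the tokenizer, and the four toy reviews with their labels are hypothetical and are not taken from the paper's dataset or implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of reviews
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for each label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data, for illustration only.
texts = [
    "awesome fantastic awesome best hotel ever",
    "so good fantastic awesome stay",
    "room was clean but the shower leaked",
    "check in was slow and the room was small",
]
labels = ["fake", "fake", "genuine", "genuine"]

model = TinyNaiveBayes().fit(texts, labels)
print(model.predict("awesome fantastic hotel"))
print(model.predict("the room was small and the shower leaked"))
```

With so few training examples the predictions only reflect word overlap with each class; a usable model would need a labeled dataset of realistic size, as the abstract notes.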
Authorized licensed use limited to: China University of Petroleum. Downloaded on April 10,2024 at 07:35:52 UTC from IEEE Xplore. Restrictions apply.
Calculated definition:
$$TF(w) = \frac{\text{doc.count}(w)}{\text{total\_words\_in\_the\_document}}$$
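The term-frequency definition above can be sketched in a few lines of Python. The function names and the three-document toy corpus are hypothetical. The inverse-document-frequency part uses the common textbook form log(N / df), which is an assumption, since the text here only defines TF.

```python
import math


def term_frequency(word, document_tokens):
    """TF(w): occurrences of w in the document / total words in the document."""
    if not document_tokens:
        return 0.0
    return document_tokens.count(word) / len(document_tokens)


def inverse_document_frequency(word, corpus_tokens):
    """Assumed textbook IDF: log(N / number of documents containing the word)."""
    containing = sum(1 for doc in corpus_tokens if word in doc)
    if containing == 0:
        return 0.0
    return math.log(len(corpus_tokens) / containing)


def tf_idf(word, document_tokens, corpus_tokens):
    """Weight a word by its in-document frequency and its rarity in the corpus."""
    return term_frequency(word, document_tokens) * inverse_document_frequency(
        word, corpus_tokens
    )


# Hypothetical toy corpus: each review is a list of tokens.
corpus = [
    "the room was clean and the staff was friendly".split(),
    "awesome awesome hotel".split(),
    "the hotel was noisy".split(),
]

print(term_frequency("awesome", corpus[1]))  # 2 of the 3 tokens are "awesome"
print(tf_idf("awesome", corpus[1], corpus))
print(tf_idf("hotel", corpus[1], corpus))
```

Library vectorizers typically add smoothing and normalization on top of this basic form, so their values will differ from the ones computed here.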
Support Vector Machine (SVM) is a commonly used supervised learning approach for classification and regression problems. Although it can be used for both, it is primarily used for classification. The SVM algorithm aims to identify the optimal line or decision boundary, known as a hyperplane, that divides n-dimensional space into classes so that new data points can be categorized quickly in the future. The algorithm chooses the extreme points, or vectors, that help define the hyperplane. These extreme cases are called support vectors, which give the support vector machine its name. Below is an example of how a decision boundary or hyperplane can be used to separate two different categories:

A linear SVM classifier is used for data that can be separated into two classes by a single straight line. The terms "non-linear data" and "non-linear SVM classifier" refer to data that cannot be classified using a straight line; for such non-linearly separable data, a non-linear SVM is used.

Fig. 3. Statistical analysis of review

IV. RESULTS & DISCUSSION

Fake reviews have become increasingly common on websites and social media networks. To address this issue, our team used text processing and Naive Bayes to develop a model that can detect fake reviews. By leveraging machine learning tools, we were able to classify reviews as fake or not fake in a shorter amount of time by drawing upon prior dataset values. This gives users a greater sense of trust in reviews that appear on social media and other sources.

TABLE I. SUMMARY OF THE DATASET

Total number of reviews        5853 reviews
Number of fake reviews         1144 reviews
Number of real reviews         4709 reviews
Number of distinct reviews     102739 words
Total number of tokens         103052 tokens
The maximum review length      875 words
The minimum review length      4 words
The average review length      439.5 words

The spread of erroneous information seriously harms users and the social environment. It is difficult to spot a false review in the first place because it is meant to deceive the user. False information is spread through many different avenues, which disturbs society and the lives of its members. Further improvements would include identifying the source of the erroneous information and stopping its spread on social media and online platforms. Such a system would also be able to pinpoint the sources of misleading information in order to stop those who are seeking to deceive the public, and to track down the social media profiles of those spreading rumors and false information so that they can be halted before the information spreads.

V. CONCLUSION

In this proposed work, independently working machine learning models were developed for assessing product reviews. The proposed model was developed using Naïve Bayes, Support Vector Machine, and a TF-IDF (term frequency-inverse document frequency) vectorizer. The proposed model efficiently detected spam reviews on a website or application instantly. The proposed work was tested using the "Reviews of 20 Hotels in Chicago" hotel dataset and achieved greater accuracy and authenticity in the reviews.