
Report On

Fake Product Review Monitoring

Submitted in partial fulfillment of the requirements of the Mini project in


Semester VI of Third Year Computer Science and Engineering [Data Science]
by
Mrigayan Ray (Roll No. 47)
Manaswi Vartak (Roll No. 61)
Sahil Kadam (Roll No. 66)

Mentor
Prof. Odilia Gonsalves

University of Mumbai
Vidyavardhini's College of Engineering & Technology
Department of Computer Science and Engineering [Data Science]

(A.Y.2023-24)

CERTIFICATE

This is to certify that the Mini Project entitled “Fake Product Review Monitoring” is a
bonafide work of Mrigayan Ray (Roll No. 47), Manaswi Vartak (Roll No. 61), and Sahil
Kadam (Roll No. 66), submitted to the University of Mumbai in partial fulfillment of the
requirement for the award of the degree of “Bachelor of Engineering” in Semester VI of
Third Year “Computer Science and Engineering [Data Science]”.

Prof. Odilia Gonsalves


Guide

Prof. Yogesh Pingle Dr. Vikas Gupta Dr. H.V. Vankudre


Head of Deputy Head of Department Principal
Department

Mini Project Approval

This Mini Project entitled “Fake Product Review Monitoring”, a bonafide work of
Mrigayan Ray (Roll No. 47), Manaswi Vartak (Roll No. 61), and Sahil Kadam
(Roll No. 66), is approved for the degree of Bachelor of Engineering in Semester
VI of Third Year Computer Science and Engineering [Data Science].

Examiners

1………………………………………
(Internal Examiner Name & Sign)

2…………………………………………

(External Examiner name & Sign)

Date:

Place:
CONTENTS

Abstract
Acknowledgements
List of Figures

1. Introduction
1.1 Problem Statement
1.2 Objective
1.3 Scope

2. Literature Survey
2.1 Survey of Existing Systems
2.2 Limitations of the Existing Systems

3. Methodology

4. Proposed System
4.1 System Architecture
4.2 Details of Hardware & Software
4.3 Machine Learning Techniques Used

5. Results
5.1 Output
5.2 Result Analysis

6. Conclusion

References
Abstract

In recent times, partly due to COVID-19, online shopping has grown rapidly. Reviews play an
integral part in online shopping and booking systems, and showing accurate reviews to users is
essential for gaining their trust and helping businesses grow.
Our review monitoring system makes six checks to detect fake reviews in large datasets.
The six kinds of checks are:
1. Reviews which have a dual view
2. Reviews in which the same user is promoting or demoting a brand
3. Reviews in which the same IP address is promoting or demoting a brand
4. Reviews posted as a flood by the same user
5. Similar reviews posted in the same time frame
6. Meaningless text in reviews, detected using LSA
The system makes use of the following techniques:
Sentiment analysis is a natural language processing (NLP) technique for determining whether a
piece of text is positive, negative, or neutral. Businesses regularly apply it to textual data such
as customer feedback.
Latent Semantic Analysis (LSA) is an NLP technique that uses a statistical approach to determine
the relationships between words and sentences in a document. It addresses problems such as
synonymy, where different words refer to the same thing.
Cosine similarity is a metric that measures the cosine of the angle between two vectors projected
in a multi-dimensional space. The smaller the angle between two vectors, the more similar they
are.

This monitoring system can be used in shopping sites, hotel booking systems, etc. to filter out
fake reviews to a reasonable degree of accuracy. Accuracy can be further increased by refining
the algorithms used and providing more training data.

Acknowledgement

We are highly indebted to our guide Prof. Odilia Gonsalves for giving us an opportunity to work
under her guidance. Like a true mentor, she motivated and inspired us through the entire duration
of our work. We express our gratitude towards Dr. Vikas Gupta, Head of the Department of Computer
Science and Engineering (Data Science), for his support throughout the Mini Project work.

We are also grateful to the Department of Computer Science and Engineering (Data Science) for
assisting and guiding us throughout the Mini Project tenure. We also extend our thanks to the
supportive staff of the department for providing us all the necessary facilities to accomplish
this Mini Project. Last but not the least, we express our profound gratitude to the almighty God
and our parents for their blessings and support, without which this task could never have been
accomplished.

List of Figures

Sr. no. Figure Name

Fig 4.1 Architecture of the process
Fig 5.1.1 Landing Page
Fig 5.1.2 Fraudulent review sample
Fig 5.1.3 Fraudulent review detected
Fig 5.1.4 Genuine review sample
Fig 5.1.5 Genuine review detected
Fig 5.2.1 Initial Phase Graph
Fig 5.2.2 Mid Phase Graph
Fig 5.2.3 Pre-final Phase Graph
Fig 5.2.4 Final Phase Graph

Chapter 1
Introduction

In recent times, product reviews on online shopping sites have come to play a significant role in
product sales. People and organizations try to learn the benefits and drawbacks of a product before
purchasing it, because there are usually numerous options for the same thing: several manufacturers
may make the same type of product, different sellers may offer it, and the purchase procedure may
differ. Reviews are therefore directly linked to a product's sales, and it is crucial for online
services to filter out fake reviews since their own reputation is at stake. Because it is
impractical to manually check every review linked to every product, a Fake Review Detection System
is needed to discover suspicious reviews by detecting patterns in customer reviews.
Many people nowadays buy and sell products on e-commerce sites and online marketplaces, which is
why demand is growing significantly. As a result, these popular sites also carry very informative
customer feedback that helps users judge whether the products they intend to buy are worth it.
On the one hand, reviews are a very useful and powerful tool; on the other hand, they can sometimes
lead users to wrong decisions, because the review section can be flooded with fake and extreme
opinions that affect a product's image for better or worse. This may be done by a seller to raise
the popularity of a product, or by competitors or detractors to damage its image, which is a
matter of great concern.
More commonly, reviews can be categorized as fake or authentic. A recurring pattern in fake
reviews is that the same, or a slightly altered, review is posted for various products.
This replication can be separated into four broad categories:

1. Reviews posted as a flood by the same user, in which all the reviews are either bad or good.
2. Reviews posted in very large numbers by a single person from the same IP address.
3. Reviews in which a single user is demoting or promoting a particular brand.
4. Reviews in which a person with a single IP address is demoting or promoting a particular
brand's image.
The suggested system will save time and effort by helping customers and business organizations
identify spam from different perspectives, and will help users purchase the right products, which
will eventually increase users' trust in the organization.
It is very important to deploy a robust and reliable detection algorithm to assure the genuineness
of reviews posted on a site. The amount and influence of online reviews is steadily expanding as
the Internet grows in size and importance.
Reviews affect people in a large number of industries, but the most significant is e-commerce,
where reviews and comments on products and services are often the easiest and most convenient way
for buyers to decide whether or not to buy a product.
The tasks of this project are performed in the following steps:
1. Email will be used to log in for better verification.
2. Product features that customers have commented on will be mined.
3. Each comment will be classified as good or bad from the opinion sentences in every review.
4. If a review is found to be fake, it is deleted.
5. The final result is a review section with a minimum of fake reviews.

1.1 Problem Statement

The proliferation of e-commerce and online marketplaces has led to an exponential increase in
product reviews, making it challenging to discern genuine feedback from fraudulent ones. Fake
reviews, whether intended to boost a product's image or tarnish it, pose a significant threat to
consumer decision-making and the credibility of online platforms. The problem lies in the difficulty
of manually verifying each review for authenticity, leading to a proliferation of fake entries that
undermine the trustworthiness of the review system.

1.2 Objective

The primary objective of this project is to develop a Fake Review Detection System capable of
efficiently identifying and filtering out fake reviews. This system aims to minimize the influence
of fraudulent entries on product ratings and provide users with reliable information for making
informed purchasing decisions. By implementing techniques such as sentiment analysis and pattern
recognition, the system seeks to distinguish between genuine and fake reviews, thereby enhancing
the integrity and transparency of online review platforms.

1.3 Scope

The Fake Review Detection System will focus on addressing the following key aspects:
1. Utilizing email login for enhanced user verification to prevent fake accounts.
2. Mining product features mentioned in customer reviews to extract valuable insights.
3. Employing sentiment analysis to determine the positivity or negativity of reviews.
4. Implementing algorithms to detect patterns indicative of fake reviews, such as flooding by the
same user, excessive reviews from a single IP address, or biased promotion or demotion of specific
brands.
5. Filtering out fake reviews to ensure the authenticity of the review section, thereby enhancing
user trust and facilitating informed purchasing decisions.

By systematically addressing these tasks, the proposed system aims to provide a reliable
mechanism for identifying and mitigating the impact of fake reviews on online platforms,
ultimately fostering a more trustworthy and transparent online shopping environment.

Chapter 2
Literature Survey

It is very difficult to identify spam or fictitious reviews. Unauthorized reviewers post
illegitimate reviews of products in order to boost or reduce sales in a short amount of time.

2.1 Survey of Existing Systems

Previously, some mechanisms or projects were designed to address the issue of bogus reviews. We
have studied them below:
1. Research paper 1 : Detecting Product Review Spammers using Rating Behaviors [1] :
Authors : Ee-Peng Lim, Viet-An Nguyen, Nitin Jindal, Bing Liu and Hady W. Lauw.
Year : 2010
When it comes to spotting spam or fake reviews, it is vital to look at rating behaviour. Spammers
may rate a product for reasons unrelated to its quality, even if the product itself is fine. In
such cases, comparing a review against the overall or average rating helps expose the deviation.

Advantages:

a) Innovative Approach: Introduces a novel method for detecting review spammers based on user
behavior.
b) Effective Detection: Shows effectiveness in identifying spammers compared to a baseline
method.
c) Regression Model: Develops a regression model to predict spam votes using spamming
behaviors.

Disadvantages:

a) Limited Scope: Focuses primarily on rating behaviors for spam detection, potentially missing
other spamming indicators.
b) Evaluation Challenges: Faces difficulties in evaluating a large number of reviewers due to time
constraints.
c) Future Research: Leaves out content analysis for opinion extraction, which could enhance
spam detection accuracy.

2. Research paper 2 : Spotting Fake Reviewers using Product Review Graph [2]:
Authors : Zhuo Wang, Tingting Hou, Zhun Li, Dawei Song.
Year : 2015
A large number of suspicious reviews posted on the same product are treated as spam, since they
may have been placed to inflate or degrade the product's sales.

Advantages:
a) Innovative Approach: Introduces a new amplifier to improve the quality of detecting
suspicious spammers in online product reviews.
b) Comprehensive Model: Considers various factors such as reviewer behavior, review content
similarity, and review date entropy to enhance fake reviewer detection.
c) Algorithm Efficiency: The ICE algorithm iteratively updates scores for reviewers, reviews,
and products, providing a systematic approach to identifying spammers.

Disadvantages:
a) Complexity: The scoring functions and algorithms introduced may be complex for users
without a strong background in machine learning or graph-based analysis.
b) Empirical Parameters: The reliance on empirical parameters for weighting features and
controlling importance may require fine-tuning for different datasets.
c) Limited Evaluation: The paper mentions human evaluation but does not provide extensive
details on the evaluation methodology or results, potentially limiting the validation of the
proposed approach.

3. Research paper 3 : Manipulation of online reviews: An analysis of ratings, readability, and
sentiments [3]:
Authors : Nan Hu, Indranil Bose, Noi Sian Koh, Ling Liu.
Year : 2012.
Because spammers can easily manipulate ratings in numeric or textual form, reviews that carry
graphics or pictures are also assessed to judge the good and bad features of the evaluations and
their legitimacy. However, since not every user posts a picture with a review, this alone cannot
determine whether a review is authentic.

Advantages:
a) Innovative Approach: Combines sentiment analysis and readability assessments for
manipulation detection in online reviews.
b) Practical Application: Provides a statistical method to identify manipulated reviews, offering
insights for consumers and marketers.
c) Relevance: Addresses a significant issue in online commerce, enhancing understanding of the
impact of manipulated reviews on consumer behaviour.

Disadvantages:
a) Limited Scope: Focuses primarily on textual analysis, potentially overlooking other forms of
manipulation.
b) Generalizability: Findings may not be universally applicable across all product categories or
platforms.
c) Ethical Considerations: Raises concerns about the ethical implications of manipulating online
reviews for marketing purposes.

4. Research paper 4 : A Review on fake product review detection and removal techniques [4]:
Authors : Rutuja B. Ardak, Prof. Girish S. Thakare.
Year: 2021
This paper explores the evolution of opinion mining, vital for understanding consumer sentiment
in the digital era. It underscores the need for reliable tools to detect fraudulent reviews in online
commerce, essential for authentic consumer feedback.

Beginning with the technological surge during 1942-1962, catalyzed by World War II, the paper
recalls pioneers like Norbert Wiener and the first mathematical model of the biological neuron
(1943), which laid the groundwork for computational neuroscience. The resurgence of interest in
opinion mining circa 2010 stemmed from the availability of vast data sets and efficient graphics
card processors, revolutionizing sentiment analysis.
Contrary to popular belief, opinion mining predates the internet and Web 2.0, with early studies
(1981-1996) focusing on text interpretation using simplistic algorithms. Progress in the late
1990s and early 2000s introduced annotation tools to enhance text processing.
Further phases in opinion mining's evolution emphasized interpretation and extraction, leading
to applications across industries by 2007. This marked a significant milestone, showcasing the
genuine spirit of opinion mining in politics, business, and beyond.
In summary, this paper provides a comprehensive overview of opinion mining's evolution,
tracing its historical roots, technological advancements, and practical applications in today's
digital landscape.

Advantages:

• Effective Spam Detection: The paper proposes a method using data mining to detect fake reviews,
providing valuable insights to users and manufacturers.
• User Verification: Implementing user verification helps in banning suspicious accounts, enhancing
the credibility of reviews.
• Future Research Directions: The paper suggests future research directions to improve spam
detection methods, contributing to ongoing advancements in the field.

Disadvantages:

• Complexity: Detecting fake reviews involves analyzing various aspects like content and reviewer
behavior, which can be challenging.
• Resource Intensive: Implementing the proposed system may require significant resources for data
mining and user verification processes.

• Ongoing Development: Continuous updates and improvements may be needed to adapt to
evolving spamming techniques and user behaviours.

A comparative study of the past research and this project is presented below:

1. Multiscale cascaded domain-based approach for Arabic fake reviews detection in e-commerce
platforms [7] (2024)
- Feature Extraction and Selection: The research investigates various feature extraction
techniques, including TF-IDF, Word2Vec, and GloVe, to represent textual data. Additionally, it
explores feature selection methods such as Chi-Square, Mutual Information, and Information Gain
to enhance the performance of fake review detection models. These techniques play a crucial role
in identifying relevant features and improving classification accuracy.
- Deep Learning Models Evaluation: The study evaluates the performance of various deep learning
models, including Bi-LSTM, Bi-GRU, CNN+Bi-LSTM, and CNN+Bi-GRU, in detecting fake reviews. It
compares their accuracy, precision, recall, and F1-score to determine the most effective model
for Arabic fake review detection. The findings provide insights into the suitability of different
architectures for this task.

2. A review of fake news detection approaches: A critical analysis of relevant studies and
highlighting key challenges associated with the dataset, feature representation, and data fusion
[6] (2023)
- Arabic Fake Reviews Detection: The study introduces a comprehensive dataset called the Arabic
Fake Reviews Detection (AFRD) for the purpose of detecting fake reviews in Arabic across various
domains such as hotels, restaurants, and products. This addresses the scarcity of research and
datasets for the Arabic language in this field.
- Multiscale Cascaded Domain-Based Approach: The research explores the effectiveness of different
deep learning models like Bi-LSTM, Bi-GRU, CNN+Bi-LSTM, and CNN+Bi-GRU in a cascading approach
termed Multiscale Cascaded domain-based (MCDB). This method aims to enhance the performance of
fake review detection by transferring knowledge across different domains, showing an improvement
in accuracy by 2.09% to 7.8%.

3. Fake Product Review Monitoring (2022) [5]
- Our project incorporates all six steps in a well-organized manner so that the reviews remaining
at the end will mostly be genuine.

4. A Review on fake product review detection and removal techniques [4] (2021)
- Mentions the six ways to treat fake reviews.

5. Online reviews manipulation [3] (2012)
- Graphics analysis was done to find the genuine reviews, but as not every user posts a picture
with a review, this was not very effective.

6. Spotting a group of fake reviewers [2] (2015)
- Spotted groups of spammers posting the same reviews multiple times using the same IP address.

7. Detection of spams using ratings behavior [1] (2010)
- Checked the rating behaviour of the product. Unnecessary bad or good reviews were eliminated.

2.2 Limitations of the existing systems

The prevalence of fake product reviews in today's online marketplace poses significant challenges
for consumers and businesses alike. As consumers increasingly rely on reviews to make informed
purchasing decisions, the proliferation of fraudulent reviews threatens to undermine trust and lead
to wasted time and money. While efforts have been made to develop tools for detecting fake
reviews, existing systems still face several limitations that hinder their effectiveness.
Limitations in Existing Systems:
1. User Vulnerability to Fake Reviews: Despite the availability of review detection systems,
consumers remain vulnerable to fake reviews. The inability to discern genuine feedback
from fraudulent ones can result in misguided purchasing decisions, ultimately leading to
wasted time and financial resources.
2. Limitation in IP Address Detection: Many review detection systems rely on IP address
tracking to identify suspicious activity, such as multiple reviews originating from the same
source. However, sophisticated optimization teams can circumvent this by posting reviews
from different IP addresses, thereby evading detection and further complicating the review
authenticity assessment process.
3. Brand Manipulation: Brands with vested interests may employ resources to artificially
inflate the ratings of their products through fake reviews. Despite the efforts of review
detection systems, these manipulative tactics can skew consumer perceptions and distort
the authenticity of product ratings, undermining the integrity of the review ecosystem.
4. Multiple Reviews from Single Users: Another challenge arises from users creating multiple
accounts to post multiple reviews for the same product. This deceptive practice not only
misrepresents the true sentiment towards a product but also exacerbates the difficulty in
accurately identifying and filtering fake reviews.
While existing systems for detecting fake product reviews represent a step towards enhancing
consumer trust and mitigating fraudulent activities, they are not without their limitations.
Addressing these challenges requires ongoing research and innovation to develop more robust and
resilient review detection mechanisms. By acknowledging and addressing these limitations,
researchers and practitioners can work towards improving the reliability and integrity of online
product reviews, ultimately empowering consumers to make more informed purchasing decisions
in the digital marketplace.

Chapter 3
Methodology

In recent years, online reviews have become increasingly important in purchase decisions. Customer
reviews can give a wealth of information about a product or service. They may not be accurate,
however: spammers or fraudsters can fabricate reviews to artificially raise or lower the perceived
quality of products or services. Buyers are misinformed as a consequence and may make poor
decisions. As a result, detecting spam opinions is a significant problem. Opinion spam is defined
as the use of excessive and illegitimate means, such as fabricating a large number of false
opinions, whether favourable or negative, in order to create positive or negative impressions. A
technique for detecting false product reviews is based mostly on opinion mining of the reviews,
making product purchases more reliable for consumers.

Customer reviews on a product are the data to which these procedures are applied. Mining methods
are used to break down the review monitoring problem and sift the false reviews from the genuine
ones, so that product transactions become more trustworthy for customers.

The overall idea is to take product information from the dataset and extract the reviews from it.
Once the reviews are available as text, they are passed through a fake review detection model,
which separates fake reviews from real ones, and the fake reviews are removed. The detection model
can use artificial intelligence techniques, which offer a variety of methods to analyse content
according to user requirements; the key to all of this is data.
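
The flow described above can be sketched as a small pipeline. This is only a minimal illustration: the `Review` fields and the two placeholder checks are hypothetical names invented for the example, standing in for the real detection steps, and are not the project's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Review:
    """Hypothetical review record; the field names are assumptions."""
    user_id: str
    product_id: str
    title: str
    body: str


# A check takes one review and returns True when the review looks fake.
Check = Callable[[Review], bool]


def is_empty_body(review: Review) -> bool:
    # Placeholder check: a review with no body text carries no opinion.
    return not review.body.strip()


def is_shouting(review: Review) -> bool:
    # Placeholder check: body written entirely in capital letters.
    letters = [c for c in review.body if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)


def split_reviews(reviews: Iterable[Review],
                  checks: List[Check]) -> Tuple[List[Review], List[Review]]:
    """Return (genuine, fake); a review is fake if any check fires."""
    genuine, fake = [], []
    for review in reviews:
        if any(check(review) for check in checks):
            fake.append(review)
        else:
            genuine.append(review)
    return genuine, fake


reviews = [
    Review("u1", "p1", "Good phone", "Battery lasts two days."),
    Review("u2", "p1", "Bad", "   "),
    Review("u3", "p1", "WOW", "BEST PRODUCT EVER"),
]
genuine, fake = split_reviews(reviews, [is_empty_body, is_shouting])
print([r.user_id for r in genuine])  # ['u1']
print([r.user_id for r in fake])     # ['u2', 'u3']
```

Keeping each check as an independent function means further checks can be added to the list without changing the filtering loop.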

Here, the reviews given by customers on a product act as the data on which the methods are applied.
To filter fake reviews from genuine ones, the system relies on data mining: the technique of
extracting patterns and other useful information from large data collections. Through
comprehensive data analytics, mining has improved corporate decision-making. There are many mining
methods, such as:

1. Text mining: the technique of reviewing enormous amounts of documents in order to find new
information. It has been widely used in knowledge-based organizations.
2. Opinion mining and sentiment analysis: a technique that employs computational linguistics and
natural language processing to automatically identify and extract sentiments and opinions
(positive, negative, neutral, etc.) from text.
3. Natural language processing (NLP): in the broadest sense, the automated processing of natural
language by computer.

To mine fake reviews out of all the reviews, we can use sentiment analysis, one of the most
commonly used NLP methods. Sentiment analysis is most useful for material such as customer
feedback, surveys, reviews, and comments on social networks, where people express their opinions.
The simplest approach is to assign an overall sentiment label to a text (positive, negative, or
neutral). In more complex cases, the result is a numerical score, which can be divided into as
many categories as required. In a review, however, the customer may express different feelings in
different parts of the text, so a single overall label is not very useful. Instead, we can
determine the sentiment of each sentence and separate the positive and negative parts of the
review. Sentence-level sentiment also helps in picking out the strongest positive and negative
portions of a review.
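
Sentence-level splitting can be illustrated with a tiny lexicon-based scorer. The word lists below are a made-up toy lexicon, and the scoring rule (positive word count minus negative word count) is an assumption for this sketch; a real system would use a trained sentiment model.

```python
import re

# Toy lexicon: assumed for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "fast"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}


def score(text: str) -> int:
    """Positive-word count minus negative-word count."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text: str) -> str:
    """Map the numerical score onto three categories."""
    s = score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


def split_by_sentiment(review: str) -> dict:
    """Group the sentences of a review by their sentiment label."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]
    parts = {"positive": [], "negative": [], "neutral": []}
    for sentence in sentences:
        parts[label(sentence)].append(sentence)
    return parts


review = "The camera is great. Delivery was slow. I bought it last week."
parts = split_by_sentiment(review)
print(label(review))      # neutral: the overall label hides both opinions
print(parts["positive"])  # ['The camera is great.']
print(parts["negative"])  # ['Delivery was slow.']
```

The overall label cancels out to neutral, while the per-sentence view recovers the positive and negative parts separately.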

Six steps for filtering out fake reviews:

1. Reviews which have a dual view:

Dual-view reviews are reviews whose heading and body carry different sentiments: the heading is
positive about the product and the body is negative, or vice versa. These reviews are filtered
out by comparing the sentiment of the heading with that of the body. If the two sentiments do not
match, the review is marked as fake.
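
A minimal sketch of this heading-versus-body comparison follows. The `polarity` helper and its toy word lists are hypothetical stand-ins for whatever sentiment model is actually used. The sketch also assumes that only opposite polarities count as a mismatch, so a neutral heading is not flagged.

```python
# Toy lexicon: assumed for illustration only.
POSITIVE = {"good", "great", "excellent", "amazing", "love"}
NEGATIVE = {"bad", "poor", "terrible", "worst", "broke"}


def polarity(text: str) -> int:
    """Return +1, -1 or 0 from a toy word-count lexicon."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    balance = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (balance > 0) - (balance < 0)


def has_dual_view(title: str, body: str) -> bool:
    """True when heading and body carry opposite sentiments."""
    return polarity(title) * polarity(body) < 0


print(has_dual_view("Amazing product!", "Worst purchase, it broke in a day."))  # True
print(has_dual_view("Great phone", "I love the camera."))                        # False
```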

2. Reviews in which the same user is promoting or demoting a brand:

In this step, reviews of a given brand's products are grouped by user id and checked for multiple
reviews from the same user.

3. Reviews in which the same IP address is promoting or demoting a brand:

There are cases where an optimization team tries to bombard the review section to defame other
brands. In this step we group reviews by IP address; multiple reviews from the same IP address on
a particular product are marked as fake.

4. Reviews posted as a flood by the same user:

This step is quite simple. Here we handle cases where a particular user posts multiple reviews on
the same product, which is never necessary. Such floods are usually posted intentionally to defame
a product, and the reviews are marked as fake by the system.
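
Steps 2 to 4 all reduce to grouping reviews by some key and flagging groups that are too large. The sketch below shows that shared logic; the dictionary field names (`user_id`, `ip`, `brand`, `product_id`) and the limit of one review per group are assumptions, not values taken from the project.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def flag_repeated(reviews: List[Dict], key_fields: Sequence[str],
                  max_allowed: int = 1) -> Set[int]:
    """Return ids of reviews whose key (e.g. user + brand) occurs too often."""
    groups = defaultdict(list)
    for review in reviews:
        key = tuple(review[field] for field in key_fields)
        groups[key].append(review["id"])
    return {rid for ids in groups.values() if len(ids) > max_allowed for rid in ids}


reviews = [
    {"id": 1, "user_id": "u1", "ip": "10.0.0.1", "brand": "Acme", "product_id": "p1"},
    {"id": 2, "user_id": "u1", "ip": "10.0.0.1", "brand": "Acme", "product_id": "p2"},
    {"id": 3, "user_id": "u2", "ip": "10.0.0.1", "brand": "Zeta", "product_id": "p1"},
    {"id": 4, "user_id": "u3", "ip": "10.0.0.9", "brand": "Zeta", "product_id": "p3"},
]

same_user_brand = flag_repeated(reviews, ["user_id", "brand"])         # step 2
same_ip_product = flag_repeated(reviews, ["ip", "product_id"])         # step 3
same_user_product = flag_repeated(reviews, ["user_id", "product_id"])  # step 4
print(sorted(same_user_brand))    # [1, 2]
print(sorted(same_ip_product))    # [1, 3]
print(sorted(same_user_product))  # []
```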

5. Similar reviews posted in the same time frame:

Sometimes people use bots that generate multiple reviews and flood the review section. Because the
text of these reviews can vary, we use a time-frame method: reviews posted within a short time
frame by a particular user, on the same or different products, are marked as fake.
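
The time-frame method can be sketched as a sliding window over each user's posting times. The window length (10 minutes) and the limit of three reviews per window are illustrative assumptions; the report does not state the values used.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import List, Set, Tuple


def flag_bursts(posts: List[Tuple[int, str, datetime]],
                window: timedelta = timedelta(minutes=10),
                max_in_window: int = 3) -> Set[int]:
    """posts holds (review_id, user_id, posted_at); returns flagged review ids."""
    by_user = defaultdict(list)
    for review_id, user_id, posted_at in posts:
        by_user[user_id].append((posted_at, review_id))

    flagged = set()
    for items in by_user.values():
        items.sort()
        start = 0
        for end in range(len(items)):
            # Advance the left edge until the window spans at most `window`.
            while items[end][0] - items[start][0] > window:
                start += 1
            if end - start + 1 > max_in_window:
                flagged.update(rid for _, rid in items[start:end + 1])
    return flagged


t0 = datetime(2024, 1, 1, 12, 0)
posts = [(i, "bot", t0 + timedelta(minutes=i)) for i in range(1, 5)]  # 4 posts in 3 minutes
posts += [(10, "alice", t0), (11, "alice", t0 + timedelta(days=2))]
print(sorted(flag_bursts(posts)))  # [1, 2, 3, 4]
```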

6. Meaningless text in reviews using LSA:

The last step handles meaningless reviews. Some users write a story of their own that has no
relation to the product. To filter out such reviews we use latent semantic analysis.

The following steps will be taken to complete the task:

1. The customer's e-mail address will be confirmed during login.
2. Feedback on product features mentioned by customers will be gathered.
3. Each comment will be classified as positive or negative by identifying the opinion sentences
in each review.
4. If a fraudulent review is discovered, a warning will be sent and the review will be marked as
fake.
5. The findings will be summarized in a review section with the fewest possible fake reviews.

Chapter 4
Proposed System

In today's era of online shopping and e-commerce, online reviews play an important role in
decision-making. Customers, for example, check product or store reviews before determining what
to buy, where to buy it, and whether to buy it. Because there are monetary incentives to produce
false or fraudulent reviews, there has been a major surge in hard-to-detect opinion spam on online
review websites. In essence, opinion spam is an untruthful review: a phoney or fraudulent one.
Positive reviews and ratings on a specific product can attract more customers and increase sales;
bad reviews can reduce demand and sales. In recent years, fake review detection has received
a lot of attention.

4.1 System Architecture

Fig 4.1 Architecture of the process

4.2 Details of Hardware & Software
1. Hardware:
• Processor: Intel Core i5
• RAM: 8.0 - 16.0 GB
• Hard disk drive: 1 TB (minimum 256 GB required)
• Solid state drive: 256 GB
• 64-bit operating system

2. Software:
• Programming language: Python
• Operating system: Windows 11
• Jupyter Notebook

The fake product review monitoring system collects reviews from various users and detects
fraudulent reviews using sentiment analysis (opinion mining) and content-similarity approaches,
assisting the user in purchasing the proper products based on genuine customer reviews. The
method will also assist in spotting customers who write repeated reviews with the intent of
harming the reputation of a brand or company. This model will help us to detect the fake reviews
and handle them.

The system consists of the following components:

Frontend – The frontend of the system is a website where users sign in with their Google account
and are authenticated through Google Auth. On successful login, the website shows the various
products available. After purchasing a product, the user can submit a review of that product.

Backend – The backend of the system is responsible for storing the reviews of each product along
with the details of the user who purchased it.

Review Monitoring System – The review monitoring system is responsible for detecting whether a
review is positive or negative through sentiment analysis (opinion mining), and for using content
similarity to detect multiple reviews given by the same user, in order to detect fraud.

4.3 Machine Learning Techniques Used

1. Cosine Similarity
Cosine similarity is a metric that computes the cosine of the angle formed by two vectors in a
multi-dimensional space. Two vectors are more similar to each other when the angle between them is
smaller. If the angle between two vectors is 90 degrees, the cosine similarity is 0; the two
vectors are then perpendicular to each other, which means they are not correlated. The angle
between vectors A and B decreases as the cosine similarity approaches 1; in that case, A and B
are more similar.
Mathematically, cosine similarity is the dot product of the two vectors divided by the product of
their Euclidean norms (magnitudes).
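Written out for the vectors A and B, the definition above takes the following form. The dimension n and the component notation A_i, B_i are introduced here only for illustration.

```latex
\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
             = \frac{\sum_{i=1}^{n} A_i B_i}
                    {\sqrt{\sum_{i=1}^{n} A_i^{2}} \; \sqrt{\sum_{i=1}^{n} B_i^{2}}}
```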
Cos(𝜃) lies in the range [−1, 1]:
−1 - indicates opposite vectors, i.e. maximum dissimilarity
0 - indicates perpendicular (independent) vectors
1 - indicates a high similarity between the vectors
Some applications of cosine similarity are:
• Data mining, information retrieval, and text matching
• Recommendation engines, which use it to recommend similar entities such as books, clothes, etc.
• Cosine-similarity-based locality-sensitive hashing, which speeds up the matching of DNA
sequence data
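As a minimal sketch of the measure, assuming plain Python lists as vectors, the function below divides the dot product by the product of the Euclidean norms. The function name and the sample vectors are illustrative and are not taken from the project code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Three illustrative cases: same direction, perpendicular, opposite direction.
same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])
perpendicular = cosine_similarity([1, 0], [0, 1])
opposite = cosine_similarity([1, 2], [-1, -2])
```

Vectors pointing the same way score close to 1 even though their lengths differ, perpendicular vectors score 0, and opposite vectors score close to −1.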

2. Latent Semantic Analysis


Latent Semantic Analysis (LSA) is a natural language processing method that uses statistical
techniques to find associations between words in a document. It deals with issues such as
synonymy. For example, mobile, phone, cell phone, and telephone are all comparable, but if we
search for "The cell phone has been ringing", only documents containing "cell phone" are returned,
while documents containing "mobile", "phone", or "telephone" are not.

LSA makes the following two assumptions:
1. Words used in the same context are comparable to one another.
2. The ambiguity of the terms used obscures the data's latent semantic structure.
Language is more than just the words on the page in front of you. When you read a text, your mind
creates images and ideas. Themes emerge after reading a large number of works, even if they are
never mentioned directly. Our ability to comprehend and absorb language defies mathematical
description (for the moment). LSA is one of the most widely used Natural Language Processing
(NLP) techniques for quantitatively determining the topics of a text.
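The idea can be sketched on a toy corpus. LSA is normally computed with a truncated singular value decomposition from a numerical library. To stay dependency-free, this sketch swaps in power iteration with deflation on the document-by-document Gram matrix, which recovers the same leading document coordinates. The corpus, the function names, and the choice of two latent dimensions are assumptions made for illustration, not the project's implementation.

```python
import math


def term_document_matrix(docs):
    """Rows are terms, columns are documents; entries are raw term counts."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for words in tokenized for word in words})
    return [[words.count(term) for words in tokenized] for term in vocab]


def top_eigenpairs(sym, k, iterations=300):
    """Top-k (eigenvalue, eigenvector) pairs of a symmetric positive
    semi-definite matrix, found by power iteration with deflation."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        vec = [1.0] * n  # fixed start vector keeps the sketch deterministic
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        value = sum(vec[i] * work[i][j] * vec[j] for i in range(n) for j in range(n))
        pairs.append((value, vec))
        # Deflation: remove the component just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def lsa_document_vectors(docs, k=2):
    """Coordinates of each document in a k-dimensional latent space."""
    matrix = term_document_matrix(docs)
    n_docs = len(docs)
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n_docs)]
            for i in range(n_docs)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(value, 0.0)) * vec[d] for value, vec in pairs]
            for d in range(n_docs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical mini-corpus: documents 0 and 2 share no word, but both are
# linked to document 1 through "phone" and "mobile".
DOCS = ["cell phone", "phone mobile", "mobile telephone",
        "battery charger", "charger cable"]

raw = [list(column) for column in zip(*term_document_matrix(DOCS))]
latent = lsa_document_vectors(DOCS, k=2)

raw_similarity = cosine(raw[0], raw[2])           # word overlap only
latent_similarity = cosine(latent[0], latent[2])  # after LSA
unrelated_similarity = cosine(latent[0], latent[3])
```

On this corpus, "cell phone" and "mobile telephone" have zero similarity on raw word counts, but they end up pointing in nearly the same direction in the latent space. The battery and charger documents stay unrelated to them.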

3. Sentiment Analysis:
For sentiment analysis, certain classification algorithms are used, with a focus on which
classifier achieves the highest accuracy. To identify similar material in our dataset and handle
redundant data, we used a content-similarity technique. Content similarity is a statistic that
can be used to evaluate how similar data objects are, regardless of their size. Cosine similarity
is the measure used here to compare two sentences. Each data item in the dataset is treated as a
vector, which has the following benefit: even if two similar data objects are far apart in
Euclidean distance because of their size, the angle between them can still be small. The smaller
the angle, the greater the similarity. Cosine similarity captures the orientation (angle) of the
data items when plotted in a multi-dimensional space, not their magnitude.
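The content-similarity check described above can be sketched as follows. The sketch assumes a simple bag-of-words vector per review, a similarity threshold of 0.8, and reviews stored as (user id, text) pairs. All three are illustrative assumptions, and the sample reviews are invented.

```python
import math
import re
from collections import Counter


def to_vector(text):
    """Bag-of-words term counts for one review (lower-cased, punctuation dropped)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # Counter gives 0 for missing words
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_near_duplicates(reviews, threshold=0.8):
    """Return index pairs of reviews by the same user whose similarity
    is at least `threshold` (an assumed cut-off)."""
    vectors = [to_vector(text) for _, text in reviews]
    flagged = []
    for i in range(len(reviews)):
        for j in range(i + 1, len(reviews)):
            same_user = reviews[i][0] == reviews[j][0]
            if same_user and cosine_similarity(vectors[i], vectors[j]) >= threshold:
                flagged.append((i, j))
    return flagged


# Invented sample data: user "u1" posts the same complaint twice.
SAMPLE_REVIEWS = [
    ("u1", "Worst product ever, do not buy this phone"),
    ("u1", "Worst product ever, do not buy this phone!!!"),
    ("u2", "Battery life is good and the camera is sharp"),
    ("u1", "Delivery was quick and packaging was fine"),
]

flagged_pairs = find_near_duplicates(SAMPLE_REVIEWS)
size_independent = cosine_similarity(to_vector("good phone"),
                                     to_vector("good phone good phone"))
```

Only the first two reviews are flagged, since the same user wrote both with identical wording. The last comparison shows the size independence mentioned above: repeating a text changes the vector's length but not its direction, so the similarity stays at about 1.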

4.4 Advantages of proposed system:


• Users get as many genuine reviews of the product as possible.
• Users can post their own reviews of the product.
• Users can save time and money by shopping on websites that have genuine customer
reviews.
• Fraudulent cases can be detected and taken care of.
Websites such as Amazon, Flipkart, Myntra, etc. sell numerous products. It is common to see that
a product has multiple reviews that appear to be the same. This creates a bad user experience
while shopping and destroys the usefulness of the reviews. The proposed system can be integrated
with such websites to create a better user experience.

Chapter 5
Result

5.1 Output :

Fig 5.1.1 Landing Page

Fig 5.1.2 Fraudulent review sample

Fig 5.1.3 Fraudulent review detected


Fig 5.1.4 Genuine review sample

Fig 5.1.5 Genuine review added

5.2 Result Analysis
Model accuracy at various stages for the data set:
• x-axis: probability of a positive outcome
• y-axis: accuracy

Fig 5.2.1 Initial Phase (Graph 1)

Fig 5.2.2 Mid Phase (Graph 2)

Fig 5.2.3 Pre-Final Phase (Graph 3)

Fig 5.2.4 Final Phase (Graph 4)

Chapter 6
Conclusion

As this study shows, detecting opinion spam in large amounts of unstructured data is a significant
research challenge. Although various algorithms have been utilised in opinion review analysis and
have yielded positive results, no single algorithm can address all of the obstacles and
difficulties that today's systems face. Our programme will assist the user in purchasing the
appropriate product without falling into the trap of scams. For genuine ratings, people can
obtain a report on Fake Product Review Monitoring & Removal. Our application will analyse the
data and then post only real product reviews. The consumer can also be certain, up to a point,
that the products are shown with genuine reviews.

Our main objective is to develop a system that can detect spam and duplicated reviews and filter
them out, providing users with reliable information regarding the product. Our project's goal is to
improve customer satisfaction while also making online buying more secure. By using opinion
mining techniques and constructing a word dictionary, the project will be able to detect false
reviews.

The algorithm used to calculate review sentiment scores can be enhanced further. Our sentiment
word dictionary can be updated by adding more terms to the lexicon and adjusting the weights
assigned to those words in order to obtain a more accurate review score.
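A dictionary-based scorer of the kind referred to above could look like the following minimal sketch. The words, weights, negation list, and function names are hypothetical placeholders and do not reproduce the project's actual lexicon.

```python
import re

# Hypothetical sentiment lexicon: word -> weight (positive > 0, negative < 0).
SENTIMENT_LEXICON = {
    "good": 1.0, "great": 2.0, "excellent": 3.0, "love": 2.0,
    "bad": -1.0, "poor": -2.0, "terrible": -3.0, "broken": -2.0,
}
# Assumed negation words; each one flips only the word that immediately follows it.
NEGATIONS = {"not", "never", "no"}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text, lexicon=SENTIMENT_LEXICON):
    """Sum the lexicon weights of the words in a review."""
    score = 0.0
    negate = False
    for token in tokenize(text):
        if token in NEGATIONS:
            negate = True
            continue
        weight = lexicon.get(token, 0.0)
        score += -weight if negate else weight
        negate = False
    return score


def classify(text, lexicon=SENTIMENT_LEXICON):
    score = sentiment_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def update_lexicon(lexicon, new_weights):
    """Return a copy of the lexicon with terms added or weights replaced."""
    updated = dict(lexicon)
    updated.update(new_weights)
    return updated


# Extending the dictionary: add a new term and re-weight an existing one.
extended = update_lexicon(SENTIMENT_LEXICON, {"awesome": 3.0, "good": 1.5})
```

Returning a copy from `update_lexicon` keeps the original dictionary intact, so scores computed with the old and new weights can be compared side by side when tuning.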

References

[1] Ee-Peng Lim, Viet-An Nguyen, Nitin Jindal, Bing Liu and Hady W. Lauw, “Detecting Product
Review Spammers using Rating Behaviors”, 2010 (School of Information Systems, Singapore
Management University), CIKM’10, October 26–30, 2010, Toronto, Ontario, Canada.

[2] Zhou Wang, “Spotting Fake Reviewers using Product Review Graph”, 2015 (School of Information
Science and Engineering, Shenyang Ligong University, Shenyang).

[3] Nan Hu, Indranil Bose, Noi Sian Koh and Ling Liu, “Manipulation of online reviews: An
analysis of ratings, readability, and sentiments”, 2012, Decision Support Systems 52 (2012)
674–684, doi:10.1016/j.dss.2011.11.002.

[4] Rutuja B. Ardak, Prof. Girish S. Thakre, “Fake Product Review Monitoring”, IJCRT, Volume 9,
Issue 8, August 2021, ISSN: 2320-2882.

[5] Suhaib Kh Hamed, Mohd Juzaiddin Ab Aziz, Mohd Ridzwan Yaakub, “A review of fake news
detection approaches: A critical analysis of relevant studies and highlighting key challenges
associated with the dataset, feature representation, and data fusion”, 2023, Heliyon 9 (2023)
e20382, https://doi.org/10.1016/j.heliyon.2023.e20382.

[6] Nour Qandos, Ghadir Hamad, Maitha Alharbi, Shatha Alturki, Waad Alharbi and Arwa A.
Albelaihi, “Multiscale cascaded domain-based approach for Arabic fake reviews detection in
e-commerce platforms”, 2024, Journal of King Saud University - Computer and Information
Sciences.

[7] https://towardsdatascience.com/sentiment-analysis-concept-analysis-and-applications-6c94d6f58c17

[8] https://www.aviso.com/blog/sentiment-analysis-using-nlp/

[9] https://kavita-ganesan.com/what-is-text-similarity/

[10] http://www.cs.wisc.edu/niagara/data/

[11] https://www.kernix.com/article/similarity-measure-of-textual-documents/
