
Information Sciences 570 (2021) 623–637
Contents lists available at ScienceDirect
Information Sciences
journal homepage: www.elsevier.com/locate/ins
An effective and efficient fuzzy approach for managing natural noise in recommender systems
Pengyu Wang (a), Yong Wang (a, b, *), Leo Yu Zhang (c), Hong Zhu (d)
(a) Key Laboratory of Electronic and Logistics of Chongqing, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
(b) Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
(c) School of Information Technology, Deakin University, Victoria 3216, Australia
(d) School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Article info
Article history: Received 9 May 2020; received in revised form 28 April 2021; accepted 1 May 2021; available online 11 May 2021.
Keywords: Noise detection; Recommender system; Natural noise; Fuzzy set; Collaborative filtering

Abstract
A high-quality recommender system (RS) can effectively alleviate information overload by producing recommendations. The quality of a recommender system depends not only on the recommendation algorithm but also on the quality of the collected data. Since users are often affected by environmental and accidental factors during the rating process, non-malicious users may introduce natural noise into RS data, which leads to deviations in prediction results. In this paper, we propose a scheme based on fuzzy theory to manage natural noise in RS. We first classify the ratings into three fuzzy categories with variable boundaries. Then, the fuzzy profiles of users and items are built to detect natural noise in ratings. Finally, once noisy ratings are detected, we replace them with the rating threshold values according to the Maximum membership principle. The proposed scheme is tested on two benchmark datasets, and the experimental results verify that it significantly improves recommendation quality and is more efficient than schemes based on re-prediction.
© 2021 Elsevier Inc. All rights reserved.
1. Introduction
The development of the Internet has led to the rapid growth of data, which increases users' burden in selecting desired products or services. This problem is known as information overload [1]. To alleviate information overload, recommender systems (RSs) have been proposed and widely used in different areas, such as data-driven services [2], green hotel selection [3], and clustering [4]. In recent years, many recommendation algorithms have been proposed to improve the quality of recommender systems. Nevertheless, few studies have focused on the impact of data quality on recommendation results. As reported by existing studies [5,6,7], due to deliberate or accidental factors, the basic data of RSs inevitably contain some noise, which degrades the performance of recommendation algorithms.
In RS, noise can generally be divided into malicious noise and natural noise. Malicious noise is produced by malicious users, who deliberately inject malicious data into the RS to mislead the recommendation results and gain commercial advantage [5]. Fortunately, several techniques have been presented to resist malicious noise attacks [6]. However, natural noise in RS is often neglected in the literature because it is difficult to detect [7]. Natural noise is produced by non-malicious users due to accidental factors, and it also biases the recommendation results.

* Corresponding author at: Key Laboratory of Electronic and Logistics of Chongqing, Chongqing University of Posts and Telecommunications, Chongqing 400065, China.
E-mail address: wangyong_cqupt@163.com (Y. Wang).
https://doi.org/10.1016/j.ins.2021.05.002
0020-0255/© 2021 Elsevier Inc. All rights reserved.
To alleviate the influence of natural noise, some approaches introduce additional actions to re-evaluate rating values. In [8], the RS requires users to rate the same item at least twice at different time slots. The RS then compares the ratings that the same user gave to the same item at different times to judge whether they are consistent; if they are not, these ratings are regarded as noisy. Requiring users to rate the same item multiple times effectively alleviates the influence of accidental factors in the rating process. In [9], an interactive system is built to provide users with timely guidance, and experts are even invited to examine the historical ratings and judge whether they contain natural noise. Although these methods can effectively detect natural noise, they incur higher labor costs, and their working process is too complicated to be widely adopted by RSs.
In addition, some researchers have proposed noise correction methods that depend only on the features of rating values. Since these methods deal directly with the noise in rating data and require no additional resources, they are easy to deploy in RS. Classification-based [10] and fuzzy-based methods [11] are the two main categories. Classification-based methods classify ratings into three categories (low, medium, and high ratings) according to the boundary between low and medium ratings and the boundary between medium and high ratings. However, this kind of method simply divides the rating scale into three equal parts, which cannot fully describe the differences between rating categories. Fuzzy-based methods use a fuzzy set to describe the degree to which each rating value belongs to each category (i.e., low, medium, or high rating), which is more reasonable than the classification-based methods. However, the membership functions in fuzzy-based schemes may assign the same fuzzy profile to different rating values [11]. After detecting noisy ratings, some methods remove them directly, while others correct them with average rating values. These strategies may introduce new problems, such as increased data sparsity and insufficient noise correction. The main ideas and drawbacks of existing schemes are summarized in Table 1.
According to Table 1, the main challenges of detecting and correcting natural noise can be summarized as follows:
• How to design a rating classification method that not only classifies ratings quickly based on their features but also effectively distinguishes between different rating values.
• How to design a method that corrects noisy ratings after detection, i.e., that removes the noise contained in ratings neither insufficiently nor excessively.
To deal with the above challenges, we devise a new membership function of rating categories with variable boundary values, which provides a good basis for describing the fuzzy characteristics of users and items. Then, we propose a fuzzy-based method to detect and correct natural noise. Since the degree of noise detection is controlled by a tunable threshold parameter, natural noise in ratings can be found effectively. Finally, noisy ratings are corrected according to the Maximum membership principle, which removes noise sufficiently and keeps the corrected values within a reasonable range. The main contributions of this paper are summarized as follows:
• We elaborately devise a membership function of rating categories to classify ratings, in which the boundary between low and medium ratings and the boundary between medium and high ratings are not fixed but are regarded as two variables drawn from normal distributions. Compared with schemes using fixed boundary values, the proposed membership function not only distinguishes the differences between rating values but also builds the fuzzy profiles of users and items more reasonably.
• In the noise detection process, a new fuzzy-based method is presented based on the fuzzy profiles of users and items. In our method, a threshold value δ, determined according to the characteristics of the whole dataset, controls the degree of detection. This is an effective way to ensure that noise detection is neither excessive nor insufficient.
• In the noise correction process, noisy ratings are corrected according to the Maximum membership principle, which gives more reasonable corrected values to the noisy ratings and guarantees that the corrected values pass the noise detection rule. Thus, it can achieve better results than commonly used methods, such as directly removing noise, correcting noise with an average value, and re-predicting ratings.

Table 1
The ideas, merits, and demerits of the main research schemes.
Process | Classification-based schemes | Fuzzy-based schemes
Rating classification | Idea: Divide the rating scale into three equal parts. Merits: Simple. Demerits: Different rating values are grouped into the same category. | Idea: Create membership functions of rating categories. Merits: More reasonable. Demerits: Different ratings may have the same fuzzy profile.
Noise correction | Idea: Re-predict a new value for the noisy rating. Merits: Corrects noise effectively. Demerits: Time-consuming. | Idea: (a) Re-predict a new value for the noisy rating; (b) remove the noisy rating; (c) correct the noise with an average value. Merits: (a) Corrects noise effectively; (b) simple; (c) easy to implement. Demerits: (a) Time-consuming; (b) causes data sparsity; (c) corrects noise insufficiently.
The remainder of this paper is organized as follows. Section 2 introduces related work on natural noise in RS. Section 3 proposes a new fuzzy-based method to detect and correct natural noise. In Section 4, the proposed scheme is tested on two benchmark datasets and compared with other schemes. Finally, conclusions are drawn in Section 5.
2. Related works
Collaborative filtering (CF), one of the most popular recommendation models, has been widely used in research and practice. The basic idea of CF is to find the N most similar neighbors of a given user with a similarity measure and then produce predictions for the user from the neighbors' opinions. Neighbor selection is therefore a critical step in CF that determines the quality of recommendation results. However, noisy data may bias neighbor selection and even mislead CF into choosing fake neighbors (i.e., neighbors that are not really similar to the given user or item), which ultimately yields biased recommendation results. The framework of CF under noisy and non-noisy data is shown in Fig. 1.
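To make the neighbor-selection step concrete, here is a minimal user-based CF sketch. It assumes Pearson correlation over co-rated items and a mean-centered weighted average of the N neighbors' ratings, which is a common textbook formulation; the function names and the `ratings` dictionary layout are hypothetical, and the CF variants evaluated in Section 4 may differ in detail.

```python
import math

def pearson(ru, rv):
    """Pearson correlation of two users over their co-rated items (dicts: item -> rating)."""
    common = ru.keys() & rv.keys()
    if len(common) < 2:
        return 0.0
    mu = sum(ru[i] for i in common) / len(common)
    mv = sum(rv[i] for i in common) / len(common)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = math.sqrt(sum((ru[i] - mu) ** 2 for i in common)) * math.sqrt(
        sum((rv[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

def mean(r):
    """Average rating in a user's rating dict."""
    return sum(r.values()) / len(r)

def predict(ratings, user, item, n_neighbors=30):
    """Predict ratings[user][item] from the N most similar users who rated `item`."""
    target = ratings[user]
    candidates = [(pearson(target, r), v) for v, r in ratings.items()
                  if v != user and item in r]
    # Keep the N most similar neighbors; noisy ratings can distort exactly this step.
    neighbors = sorted(candidates, key=lambda t: t[0], reverse=True)[:n_neighbors]
    # Mean-centered weighted average of the neighbors' deviations from their own means.
    num = sum(sim * (ratings[v][item] - mean(ratings[v])) for sim, v in neighbors)
    den = sum(abs(sim) for sim, _ in neighbors)
    return mean(target) + num / den if den else mean(target)
```

Mean-centering compensates for users who rate systematically high or low, so a neighbor's opinion enters the prediction as a deviation from that neighbor's usual rating level.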
Unlike malicious noise, natural noise is produced by non-malicious users due to different rating environments or accidental factors [7,12]. It is hard to detect but has considerable side effects on recommendation algorithms, especially CF. Thus, it is necessary to correct natural noise in RS to obtain more accurate prediction results.
Several approaches to detect and correct natural noise have been proposed in recent years. In [8], users are required to rate the same items again to check whether the previous ratings contain natural noise, which avoids the influence of accidental factors in the rating process. In [9], an interactive system is built to provide users with timely guidance, and experts are even invited to examine the historical ratings and judge, based on their professional experience, whether they contain natural noise. Although these methods can avoid some noise, they may be hard to implement in real applications due to extra time, labor, and resource costs.
To detect natural noise efficiently, some classification-based methods have been proposed [10,13]. This kind of method uses only the features of rating data to process natural noise and is therefore easy to deploy in RS. In these methods, the ratings in RS are first classified into three categories (i.e., low, medium, and high ratings). Then, users are divided into three categories (i.e., positive, average, and negative users) according to their historical ratings. Similarly, items are divided into preferred, av-preferred, and no-preferred items. Based on the user and item classification results, general rules for detecting natural noise have been proposed, as shown in Table 2 [10,13]. Positive users most likely provide high ratings to preferred items, average users tend to provide medium ratings to av-preferred items, and negative users tend to provide low ratings to no-preferred items. When a rating does not satisfy these three assumptions, it is regarded as natural noise. Similar classification-based noise detection methods are also used in group RS [14] and multi-criteria RS [15]. Although classification-based methods can detect and correct natural noise to some extent, they divide the rating scale into three equal parts, so different rating values may fall into the same category, and the features of different rating values cannot be fully described.
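The three assumptions above translate into a simple crisp detection rule. The sketch below is one possible reading of that rule, not the exact procedure of [10,13]: it assumes illustrative fixed thresholds k = 2 and v = 4 on a 1–5 scale (low: r < k, medium: k ≤ r < v, high: r ≥ v) and, as a hypothetical simplification, assigns each user or item the category that holds most of its ratings.

```python
from collections import Counter

LOW, MEDIUM, HIGH = 0, 1, 2

def rating_class(r, k=2, v=4):
    """Crisp rating category with assumed fixed thresholds k and v."""
    if r < k:
        return LOW
    if r < v:
        return MEDIUM
    return HIGH

def profile_class(values, k=2, v=4):
    """Hypothetical simplification: the category holding most of the ratings.

    For users, LOW/MEDIUM/HIGH stand for negative/average/positive;
    for items, for no-preferred/av-preferred/preferred.
    """
    counts = Counter(rating_class(r, k, v) for r in values)
    return counts.most_common(1)[0][0]

def is_natural_noise(r, user_ratings, item_ratings, k=2, v=4):
    """Flag r when user and item fall into the matching class but r does not."""
    u = profile_class(user_ratings, k, v)
    i = profile_class(item_ratings, k, v)
    return u == i and rating_class(r, k, v) != u
```

Because every rating in a band maps to the same crisp class, ratings such as 4 and 5 are indistinguishable to this rule, which is the limitation the fuzzy approaches below try to remove.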
Fig. 1. The working process of CF using noisy data and non-noisy data.

Table 2
Detection rules for natural noise in classification-based methods.
Rating | No-preferred item | Av-preferred item | Preferred item
Negative user | – | Natural noise | Natural noise
Average user | Natural noise | – | Natural noise
Positive user | Natural noise | Natural noise | –
Negative user refers to a strict user who tends to give low ratings to items; positive user refers to a generous user who tends to give high ratings to items; average user refers to a user who tends to give medium ratings to items. Preferred item refers to an item that receives high ratings from most users; av-preferred item refers to an item that receives medium ratings from most users; no-preferred item refers to an item that receives low ratings from most users.

Recently, fuzzy theory has been introduced to classify ratings and improve the ability to detect natural noise [11]. In [16,17], the researchers do not assign each rating to a single class but create membership functions to compute the degree to which the rating belongs to each category. The membership function provides a basic way to describe the fuzzy profiles of users and items, on which noise detection rules are then built. However, these methods still correct noise by re-predicting a new rating, which is time-consuming. Instead of re-predicting a new rating for natural noise, Yera et al. presented two strategies: directly removing noise and modifying noise with an average value [18]. However, removing noisy ratings makes the data sparser and brings side effects to CF, while modifying noise with the average value may correct it insufficiently, and ratings modified by this strategy may still fail the noise detection rules. Table 3 lists the main membership functions used by fuzzy-based natural noise management schemes. As Table 3 shows, in [16,17] the fuzzy profile of rating value "2" is the same as that of "3", and the same holds for rating values "4" and "5". Thus, although the schemes in [16,17] describe ratings in a fuzzy way, they still cannot effectively distinguish between different rating values. Compared with the membership functions in [16,17], the scheme in [18] better reflects the differences between rating categories. In [18], however, how to set the low–medium and medium–high thresholds is an important open issue that is closely related to the performance of rating classification. In this paper, we regard the boundary value between low and medium ratings and the boundary value between medium and high ratings as two variables drawn from normal distributions and design a new membership function. Based on this membership function, we further propose a new fuzzy-based method, which shows good potential for detecting and correcting natural noise in ratings.
3. A new fuzzy-based method to manage natural noise in RS
In this section, we propose a new fuzzy-based method to manage natural noise in RS. The framework of the proposed
method is depicted in Fig. 2 and the main steps are described as follows:
Table 3
Main membership functions and their rating classification results.
Scheme | Fuzzy profiles of rating values 1–5 (low, medium, high)
[16] | 1: (1,0,0); 2: (0,1,0); 3: (0,1,0); 4: (0,0,1); 5: (0,0,1)
[17] | 1: (1,0,0); 2: (0,1,0); 3: (0,1,0); 4: (0,0,1); 5: (0,0,1)
[18] | 1: (1,0,0); 2: (0.5,0.5,0); 3: (0,1,0); 4: (0,0.5,0.5); 5: (0,0,1)
Fig. 2. The overall framework of the proposed method.
Step 1: According to the method in Section 3.1, construct a membership function for low, medium, and high ratings,
respectively.
Step 2: Build the rating profiles according to the membership function constructed above. Based on the rating profiles,
build the fuzzy profiles of users and items, which will be described in detail in Section 3.2.
Step 3: Based on the fuzzy profiles of users and items, estimate the possible rating for each user-item pair. Detect natural noise by comparing the estimated rating with the original rating, as will be discussed in detail in Section 3.3.
Step 4: Once the natural noise is detected, it will be corrected according to the rules in Section 3.4.
3.1. Constructing membership function
As an important part of fuzzy set theory, the membership function is used to describe uncertain phenomena in practice [19]. It is normally applied to fuzzy classification scenarios [20]. The boundaries between rating categories in [16,17,18] are always fixed, which may leave the membership functions unable to distinguish between rating values in some cases. Here, we improve the membership function by treating each boundary as a normally distributed variable, which makes rating classification more flexible. The general membership function is defined as follows:
Definition 1 [19]. A fuzzy set F in X is a set of ordered pairs
$$ F = \{(x, A(x))\}, \quad x \in X, $$
where A(x) is the membership degree of x in A and $A: X \to [0, 1]$ is called the membership function.
Here, we classify the ratings of an RS into three categories, i.e., low, medium, and high ratings, whose membership functions are denoted by $A_1$, $A_2$, and $A_3$, respectively. We then define a mapping $e_{k,v}$ as follows:
$$ e_{k,v}(x): \mathcal{F}(X) \to \{A_1, A_2, A_3\} \tag{1} $$
that is,
$$ e_{k,v}(x) = \begin{cases} A_1(x), & x \le k \\ A_2(x), & k < x < v \\ A_3(x), & x > v \end{cases} \tag{2} $$
where k is the boundary point between low and medium ratings, v is the boundary point between medium and high ratings, x is a rating value, and $\mathcal{F}(X)$ denotes the set of all possible fuzzy subsets of domain X.
Clearly, k and v are important in determining the membership functions of the low, medium, and high rating categories. In previous studies [11,16,17,18], fixed values were always assigned to k and v. To improve the reasonability and flexibility of rating classification, we introduce probability distributions to determine k and v. From a statistical perspective, the ideal boundary point between low and medium ratings varies across users. For this reason, we regard k as a variable that follows a normal distribution, i.e., $k \sim N(a, \sigma_1^2)$. Similarly, $v \sim N(b, \sigma_2^2)$. Here, $\sigma_1^2$ and $\sigma_2^2$ denote the variances of the two normal distributions. Our idea of describing boundary points is illustrated in Fig. 3.
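According to Fig. 3, the membership functions $A_1$, $A_2$, and $A_3$ are computed from the cumulative distribution functions of k and v as
$$ A_1(x) = P(x \le k) = \int_{x}^{+\infty} p_a(t)\,dt = 1 - \Phi\left(\frac{x - a}{\sigma_1}\right) \tag{3} $$
$$ A_3(x) = P(x > v) = \int_{-\infty}^{x} p_b(t)\,dt = \Phi\left(\frac{x - b}{\sigma_2}\right) \tag{4} $$
$$ A_2(x) = 1 - A_1(x) - A_3(x) = \Phi\left(\frac{x - a}{\sigma_1}\right) - \Phi\left(\frac{x - b}{\sigma_2}\right) \tag{5} $$
where $P(x \le k)$ is the probability that the rating x does not exceed the random boundary k, $p_a$ and $p_b$ are the probability density functions of $N(a, \sigma_1^2)$ and $N(b, \sigma_2^2)$, and $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt$ is the standard normal cumulative distribution function. Here, we set $\sigma_1^2 = \sigma_2^2 = 1$ to simplify the calculation. Because the means a and b are derived from the rating scale (Section 3.4), the membership function is flexible and applicable to RSs with different rating scales.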
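Eqs. (3)–(5) can be evaluated directly with the standard normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8+); the function name `membership` is our own, and a = 2, b = 4 are the values that Eqs. (16)–(17) give for a 1–5 scale.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, Phi(x) in Eqs. (3)-(5)

def membership(x, a, b, sigma1=1.0, sigma2=1.0):
    """Return the fuzzy profile (A1(x), A2(x), A3(x)) of rating x, Eqs. (3)-(6).

    a, b: means of the random boundaries k ~ N(a, sigma1^2) and v ~ N(b, sigma2^2).
    """
    a1 = 1.0 - PHI((x - a) / sigma1)  # P(x <= k): x counts as a low rating
    a3 = PHI((x - b) / sigma2)        # P(x > v): x counts as a high rating
    return a1, 1.0 - a1 - a3, a3      # A2 takes the remaining mass, Eq. (5)

if __name__ == "__main__":
    # 1-5 scale: a = 2, b = 4; sigma1 = sigma2 = 1 as in the text.
    for r in range(1, 6):
        print(r, tuple(round(m, 4) for m in membership(r, a=2, b=4)))
```

Run on ratings 1 to 5, this agrees with Table 4 to within 0.0001 (the medium degree of ratings 1 and 5 comes out as 0.1573 rather than 0.1574, a last-digit rounding difference), and all five profiles are distinct.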
3.2. Building fuzzy profiles of ratings, users, and items
Different from classification-based methods [10,13,14,15], we do not classify each rating into a single category but build rating profiles using a fuzzy set. According to the membership functions in Section 3.1, the fuzzy profile of a rating r is built as follows:
$$ F_r = (A_1(r), A_2(r), A_3(r)) \tag{6} $$
where $A_1(r)$, $A_2(r)$, and $A_3(r)$ denote the degrees to which rating r belongs to the low, medium, and high rating categories, respectively.
To further illustrate the advantages of our rating fuzzy profiles, we take a 5-star RS as an example and calculate the fuzzy profile of each rating. The results are listed in Table 4, and the membership degrees of each rating value in the low, medium, and high rating categories are shown in Fig. 4. All rating values have different fuzzy profiles, which means that our classification method distinguishes rating values well. Compared with the schemes in Table 3, our membership function better describes the differences between rating values.
Based on the fuzzy profiles of ratings, we then build the fuzzy profiles of items as follows. For an item i, we first collect all the ratings of i. Then, we build a 3-tuple $F_i$ representing the degrees to which i is a no-preferred, av-preferred, and preferred item by averaging the membership degrees of its historical ratings in the low, medium, and high rating categories:
$$ F_i = (F_i^1, F_i^2, F_i^3) = \left( \frac{1}{|R_i|} \sum_{r \in R_i} A_1(r),\ \frac{1}{|R_i|} \sum_{r \in R_i} A_2(r),\ \frac{1}{|R_i|} \sum_{r \in R_i} A_3(r) \right) \tag{7} $$
where $R_i$ is the rating set of item i, r is a rating value, and $|X|$ returns the number of elements in set X.
Similarly, based on the same principle, we build the fuzzy profile of user u as follows:
$$ F_u = (F_u^1, F_u^2, F_u^3) = \left( \frac{1}{|R_u|} \sum_{r \in R_u} A_1(r),\ \frac{1}{|R_u|} \sum_{r \in R_u} A_2(r),\ \frac{1}{|R_u|} \sum_{r \in R_u} A_3(r) \right) \tag{8} $$
where $R_u$ is the set of ratings provided by user u. Here, $F_u$ quantifies user u's tendency to give low, medium, and high ratings to items. The resulting fuzzy profiles of ratings, users, and items provide a good basis for detecting and correcting natural noise.

Fig. 3. The relationship between boundary point and membership function.

Table 4
The fuzzy profiles of ratings in a five-star RS.
Rating value | Fuzzy profile (A_1, A_2, A_3)
1 | (0.8413, 0.1574, 0.0013)
2 | (0.5000, 0.4772, 0.0228)
3 | (0.1587, 0.6826, 0.1587)
4 | (0.0228, 0.4772, 0.5000)
5 | (0.0013, 0.1574, 0.8413)

Fig. 4. The membership degree of each rating category.
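Eqs. (7) and (8) are the same operation, a component-wise mean of rating profiles, applied once per item and once per user. The sketch below builds both from (user, item, rating) triples; the helper names are hypothetical, and `rating_profile` can be any function returning $F_r$, for example a lookup into Table 4 for a five-star RS or a function implementing Eqs. (3)–(5).

```python
from collections import defaultdict

def average_profile(rating_profiles):
    """Component-wise mean of rating fuzzy profiles F_r, as in Eqs. (7) and (8)."""
    n = len(rating_profiles)
    if n == 0:
        raise ValueError("a profile needs at least one rating")
    return tuple(sum(p[j] for p in rating_profiles) / n for j in range(3))

def build_profiles(triples, rating_profile):
    """Build F_u for every user and F_i for every item from (user, item, rating) triples.

    rating_profile maps a rating value r to its fuzzy profile F_r (Eq. (6)).
    """
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in triples:
        f_r = rating_profile(r)
        by_user[u].append(f_r)  # rating belongs to R_u
        by_item[i].append(f_r)  # rating belongs to R_i
    users = {u: average_profile(ps) for u, ps in by_user.items()}
    items = {i: average_profile(ps) for i, ps in by_item.items()}
    return users, items
```

Each rating is visited once and contributes to both its user's and its item's profile, so building all profiles is linear in the number of ratings.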
3.3. Detecting natural noise
Intuitively, negative users are most likely to give low ratings to no-preferred items, and positive users are most likely to give high ratings to preferred items. We therefore estimate the rating of user u on item i directly by combining the user's rating tendency with the membership degrees of the item in the different categories. Based on the fuzzy profiles of the user and the item, the rating is estimated by
$$ F_{ui} = (F_{ui}^1, F_{ui}^2, F_{ui}^3) = (F_u^1 F_i^1,\ F_u^2 F_i^2,\ F_u^3 F_i^3) \tag{9} $$
where $F_{ui}$ is the estimated rating of item i by user u. Moreover, we normalize $F_{ui}$ to $N_{ui} = (N_{ui}^1, N_{ui}^2, N_{ui}^3)$ as follows:
$$ N_{ui}^1 = \frac{F_u^1 F_i^1}{F_u^1 F_i^1 + F_u^2 F_i^2 + F_u^3 F_i^3} \tag{10} $$
$$ N_{ui}^2 = \frac{F_u^2 F_i^2}{F_u^1 F_i^1 + F_u^2 F_i^2 + F_u^3 F_i^3} \tag{11} $$
$$ N_{ui}^3 = \frac{F_u^3 F_i^3}{F_u^1 F_i^1 + F_u^2 F_i^2 + F_u^3 F_i^3} \tag{12} $$
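Eqs. (9)–(12) amount to a component-wise product followed by normalization to a unit sum. A minimal sketch, taking the user and item profiles as plain 3-tuples (the function name `estimate` is our own):

```python
def estimate(f_u, f_i):
    """Estimated rating profile N_ui of user u on item i.

    Eq. (9): component-wise product F_ui = (F_u^1 F_i^1, F_u^2 F_i^2, F_u^3 F_i^3);
    Eqs. (10)-(12): divide by the sum so the three degrees add up to 1.
    """
    f_ui = [fu * fi for fu, fi in zip(f_u, f_i)]
    total = sum(f_ui)
    if total == 0:
        raise ValueError("user and item profiles share no non-zero category")
    return tuple(x / total for x in f_ui)
```

For a user profile (0.2, 0.3, 0.5) that leans toward high ratings and an item profile (0.1, 0.4, 0.5) that mostly receives high ratings, the products are (0.02, 0.12, 0.25), and after normalization the high-rating component dominates (about 0.64).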
Finally, we detect natural noise by comparing $N_{ui}$ with the fuzzy profile of the original rating. Suppose the original rating of user u on item i is r; its fuzzy profile, calculated by Eq. (6), is denoted as $F_r$. The difference between $F_r$ and $N_{ui}$ is computed by
$$ d = \| F_r - N_{ui} \|_2 \tag{13} $$
where $\| \cdot \|_2$ returns the L2 norm of a vector. The similarity between $F_r$ and $N_{ui}$ is given by
$$ s = \frac{1}{1 + d} \tag{14} $$
Here, a threshold value δ is used to control the degree of natural noise detection, i.e., to determine whether a rating is noisy. When s < δ, the original rating is regarded as noisy; otherwise, it is not. Increasing δ causes more ratings to be treated as noisy, in which case some ratings may be misjudged. Conversely, if δ is small, some noisy ratings cannot be detected. Introducing δ is therefore important for controlling the quality of noise detection: by adjusting δ, we can keep the degree of noise detection within a reasonable range. How to determine δ is analyzed further in Section 4.2.
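Eqs. (13)–(14) and the threshold test fit in a few lines. In this sketch, $F_r$ and $N_{ui}$ are plain 3-tuples, `math.dist` (Python 3.8+) computes the Euclidean distance, and the default `delta=0.8` simply mirrors the value chosen in Section 4.2; δ should be treated as a dataset-dependent parameter.

```python
import math

def similarity(f_r, n_ui):
    """Eqs. (13)-(14): s = 1 / (1 + ||F_r - N_ui||_2)."""
    d = math.dist(f_r, n_ui)  # Euclidean (L2) distance between the two 3-tuples
    return 1.0 / (1.0 + d)

def is_noisy(f_r, n_ui, delta=0.8):
    """Flag the rating as natural noise when its similarity falls below delta."""
    return similarity(f_r, n_ui) < delta
```

For example, a rating of 1 (profile (0.8413, 0.1574, 0.0013) in Table 4) on a user-item pair whose estimate is (0.05, 0.30, 0.65) gives s ≈ 0.49, well below 0.8, so that rating would be flagged.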
3.4. Correcting natural noise
In our scheme, we correct noisy ratings according to the Maximum membership principle, whose general form is defined as follows.
Definition 2 [21]. Given membership functions $A_i \in \mathcal{F}(X)$ $(i = 1, 2, 3, \ldots, n)$, for each element x, if
$$ \exists j,\ A_j(x) = \max\{A_1(x), A_2(x), \ldots, A_n(x)\}, $$
then x relatively belongs to $A_j$, where $\mathcal{F}(X)$ denotes the set of all fuzzy subsets of reference set X.
According to the above Maximum membership principle, our correction rules are as follows:
$$ r_{correct} = \begin{cases} a, & \text{if } N_{ui}^1 = \max\{N_{ui}^1, N_{ui}^2, N_{ui}^3\} \\ \frac{a+b}{2}, & \text{if } N_{ui}^2 = \max\{N_{ui}^1, N_{ui}^2, N_{ui}^3\} \\ b, & \text{if } N_{ui}^3 = \max\{N_{ui}^1, N_{ui}^2, N_{ui}^3\} \end{cases} \tag{15} $$
where $r_{correct}$ is the corrected value of $r_{ui}$, and a and b are the means of the normal distributions of k and v, respectively. They are determined by
$$ a = R_{min} + \mathrm{round}\left(\frac{R_{max} - R_{min}}{3}\right) \tag{16} $$
$$ b = R_{max} - \mathrm{round}\left(\frac{R_{max} - R_{min}}{3}\right) \tag{17} $$
where $R_{max}$ and $R_{min}$ are the maximum and minimum possible rating values, respectively, and round(x) returns the integer nearest to x.
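Eqs. (15)–(17) can be sketched as follows. Two details are our own assumptions rather than statements from the text: round() is implemented as round-half-up (Python's built-in `round` rounds halves to even), and ties in $N_{ui}$ resolve to the lower category.

```python
import math

def boundary_means(r_min, r_max):
    """Eqs. (16)-(17); round() is taken as round-half-up (an assumption)."""
    step = math.floor((r_max - r_min) / 3 + 0.5)
    return r_min + step, r_max - step

def correct(n_ui, r_min, r_max):
    """Eq. (15): replace a noisy rating by the value attached to the category with
    the largest degree in N_ui (Maximum membership principle, Definition 2)."""
    a, b = boundary_means(r_min, r_max)
    winner = max(range(3), key=lambda j: n_ui[j])  # ties resolve to the lower category
    return (a, (a + b) / 2, b)[winner]
```

On the 1–5 scale of both datasets in Table 5, round(4/3) = 1 gives a = 2 and b = 4, so a corrected rating is always 2, 3, or 4.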
4. Experiments
In this section, we test the performance of our scheme on public datasets. Two benchmark datasets from different domains are selected; their details are shown in Table 5. For each dataset, 80% of the ratings are randomly selected as the training set, and the remaining 20% are used as the test set.
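A random 80/20 split of rating triples can be sketched as below. The text does not state how the split was seeded or whether it was repeated, so the fixed seed and single split here are assumptions made for reproducibility.

```python
import random

def split_ratings(triples, train_fraction=0.8, seed=0):
    """Randomly split (user, item, rating) triples into training and test sets."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)  # private RNG: does not disturb global state
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Using a dedicated `random.Random` instance keeps the split reproducible without affecting any other code that relies on the module-level random generator.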
4.1. Evaluation metrics
We first detect and correct the noisy ratings in each dataset and then run several popular CFs on the de-noised datasets. If the CFs achieve better results on the de-noised datasets, our method is effective. Generally, the performance of a recommendation algorithm is evaluated from two aspects: prediction error and recommendation quality. Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are two metrics of prediction error [22], which are defined as
$$ MAE = \frac{\sum_{(u,i) \in T} |r_{ui} - p_{ui}|}{|T|} \tag{18} $$
Table 5
The details of Movielens and Yahoo Music datasets.
Dataset | #Users | #Items | #Ratings | Rating scale
Movielens (a) | 6040 | 3900 | 1,000,209 | {1,2,3,4,5}
Yahoo Music (b) | 8089 | 1000 | 270,121 | {1,2,3,4,5}
(a) https://grouplens.org/datasets/movielens/.
(b) https://webscope.sandbox.yahoo.com/catalog.php?datatype=r.
$$ RMSE = \sqrt{\frac{\sum_{(u,i) \in T} (r_{ui} - p_{ui})^2}{|T|}} \tag{19} $$
where T is the set of predicted ratings, and $r_{ui}$ and $p_{ui}$ are the actual and predicted ratings of user u on item i. Lower MAE and RMSE values indicate lower prediction errors. Precision, Recall, and F1-value are three popular metrics used to evaluate recommendation quality, which are given by
$$ Precision = \frac{|I_p \cap I_a|}{|I_p|} \tag{20} $$
$$ Recall = \frac{|I_p \cap I_a|}{|I_a|} \tag{21} $$
$$ F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{22} $$
where $I_p$ is the set of items recommended to user u, and $I_a$ is the set of items that user u actually prefers. Here, we use the average rating of user u as the recommendation threshold: when the rating of user u on item i is larger than this threshold, user u is considered to like item i. Since the F1-value combines Precision and Recall, we use it to evaluate recommendation quality in our experiments.
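Eqs. (18)–(22) can be sketched as below. The helper names are hypothetical, and since the text does not say over which ratings the user's average is taken, `f1_for_user` receives that threshold from the caller; an item counts as liked or recommended when its actual or predicted rating exceeds it.

```python
import math

def mae(pairs):
    """Eq. (18): pairs is a list of (actual r_ui, predicted p_ui) over the test set T."""
    return sum(abs(r - p) for r, p in pairs) / len(pairs)

def rmse(pairs):
    """Eq. (19): square root of the mean squared prediction error over T."""
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))

def f1_for_user(actual, predicted, threshold):
    """Eqs. (20)-(22) for one user.

    actual, predicted: dicts item -> rating for the user's test items.
    threshold: the user's average rating (computed by the caller).
    """
    liked = {i for i, r in actual.items() if r > threshold}            # I_a
    recommended = {i for i, p in predicted.items() if p > threshold}   # I_p
    hits = len(liked & recommended)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(liked) if liked else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Returning 0.0 when no item is recommended or liked avoids division by zero for users with very few test ratings; how such users are handled in the reported averages is not specified in the text.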
4.2. The determination of δ
In our scheme, the parameter δ is very important and closely related to natural noise detection. If δ is too small, some potential natural noise cannot be found; if δ is too large, some non-noisy ratings may be misjudged as natural noise. Fig. 5 further illustrates the influence of δ. The ideal curve represents the prediction error when CF processes data with no natural noise. When δ is smaller than the ideal value, increasing δ detects and corrects more natural noise, so the actual error curve moves closer to the ideal curve, as shown in Fig. 5(a). Conversely, when δ is larger than the ideal value, some accurate ratings are misjudged as noise; as δ increases, more ratings are misjudged, and the prediction error of CF grows, as shown in Fig. 5(b). Therefore, when δ is gradually increased from 0, the best δ appears at the inflection point where the prediction error of CF changes from decreasing to increasing.
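The selection procedure described above, scanning δ upward and stopping where the error turns from decreasing to increasing, can be sketched as a small search. Here `evaluate` is a hypothetical callable that de-noises the data with a given δ, runs a CF, and returns its error; the search stops at the first strict rise, so flat stretches such as small thresholds that flag nothing do not end it prematurely.

```python
def pick_delta(deltas, evaluate):
    """Return the first delta at which the error stops decreasing.

    deltas: increasing candidate thresholds, e.g. 0.1, 0.2, ..., 1.0.
    evaluate: callable mapping a threshold to a prediction error (e.g. the MAE of a CF
    run on data de-noised with that threshold).
    """
    errors = [evaluate(d) for d in deltas]
    for k in range(len(deltas) - 1):
        if errors[k + 1] > errors[k]:  # the curve turns upward after deltas[k]
            return deltas[k]
    return deltas[-1]  # error never rose: the largest candidate is best
```

This returns the first local minimum on the grid; on noisy error curves, taking the overall minimum over the grid would be a more robust alternative.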
In our experiments, we set δ = 0.1, 0.2, 0.3, …, 1 and test the prediction errors of several popular CFs (i.e., PCC [23], COS [24], ACOS [25], and MSD [26]) on the Movielens and Yahoo Music datasets. The MAE curves of these CFs with different neighborhood sizes are shown in Figs. 6 and 7.
In Fig. 6(a), when δ ≤ 0.4, our method detects no noisy ratings, so the MAE curves are identical to the MAE curve obtained on the original dataset. As δ increases, more noisy ratings are detected and the error of PCC keeps decreasing. When δ = 0.8, PCC achieves the best MAE for all neighborhood sizes. In Fig. 6(b), ACOS achieves its best MAE (0.79) at δ = 1, and its MAE is 0.83 at δ = 0.8; the difference between the two is small. In Fig. 6(c) and Fig. 6(d), COS achieves its best MAE (0.80) at δ = 0.8, and MSD also achieves its best MAE (0.81) at δ = 0.8. The results in Fig. 6 show that, for most CFs, δ = 0.8 is the inflection point where the prediction error changes from decreasing to increasing. Thus, we set δ = 0.8 to detect natural noise in the Movielens dataset.
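The observation that no noisy ratings are detected for δ ≤ 0.4 follows from the construction itself. Both $F_r$ (by Eq. (5)) and $N_{ui}$ (by Eqs. (10)–(12)) are non-negative 3-tuples whose components sum to 1, which bounds their distance. The short derivation below is our own addition and assumes $\sigma_1 = \sigma_2$ and a < b, so that $A_2(x) \ge 0$:

```latex
\begin{aligned}
d^2 &= \|F_r - N_{ui}\|_2^2
     = \|F_r\|_2^2 + \|N_{ui}\|_2^2 - 2\,F_r \cdot N_{ui} \\
    &\le \Big(\textstyle\sum_{j} A_j(r)\Big)^2 + \Big(\textstyle\sum_{j} N_{ui}^j\Big)^2
     = 1 + 1 = 2
     && \text{(since } \|p\|_2^2 \le (\textstyle\sum_j p_j)^2 \text{ for } p \ge 0
        \text{, and } F_r \cdot N_{ui} \ge 0\text{)} \\
s &= \frac{1}{1+d} \;\ge\; \frac{1}{1+\sqrt{2}} \approx 0.414 .
\end{aligned}
```

Hence s < δ can never hold for δ ≤ 0.4, so every rating passes detection at those thresholds, consistent with the flat MAE curves in Fig. 6(a).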
In Fig. 7(a), PCC reduces the MAE by about 15% at δ = 0.8, where noise detection is the most effective. When δ > 0.8, the MAE starts to increase, which means that δ = 0.8 is the inflection point. In Fig. 7(b), ACOS improves the prediction results by about 11% at δ = 0.8. At δ = 1, the MAE of ACOS becomes even worse than the MAE on the original dataset; in this case, we believe that over-de-noising has seriously distorted the original data. COS and MSD have their best MAEs at δ = 0.8, as shown in Fig. 7(c) and Fig. 7(d). Based on these results, we also set δ = 0.8 for the Yahoo Music dataset.

Fig. 5. The effect of parameter δ on CF's error. (a) When δ is smaller than the best value, increasing δ decreases the prediction error of CF; (b) when δ is larger than the best value, increasing δ increases the prediction error of CF.

Fig. 6. The MAE curves of different CFs (PCC, ACOS, COS, MSD) in the Movielens dataset.
4.3. The effect of the proposed membership function
In our scheme, we devise a membership function that classifies ratings based on variable boundary points. To verify the effectiveness of variable boundary points, we compare our membership function with the membership function in [16], which is based on fixed boundary points. The same detection scheme (Section 3.3) and correction scheme (Section 3.4) are used for both, so the only difference in our tests is the membership function. Using each of the two membership functions, we process the ratings in the Movielens and Yahoo Music datasets to obtain the corresponding de-noised datasets. Finally, we use PCC as the CF scheme to calculate the RMSEs on the de-noised datasets. The RMSEs are shown in Fig. 8.
As Fig. 8 shows, our membership function achieves better RMSEs than the one in [16] on the Movielens dataset. On the Yahoo Music dataset, the RMSEs of our scheme are slightly larger than those of [16] when the neighbor size is small, but our scheme has a clear advantage once the neighbor size exceeds 30. Overall, our scheme outperforms the scheme in [16]. We may therefore conclude that using unfixed boundary points in the membership function yields more accurate fuzzy profiles of ratings, users, and items, which in turn improves the final results.
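Fig. 8 compares fixed and unfixed boundary points. As an illustration only, and not the membership function of Section 3.1 (whose exact form is not reproduced here), the sketch below builds a common three-set triangular partition of a 1 to 5 rating scale whose boundary points `b_low`, `b_mid`, and `b_high` (hypothetical names) are parameters. With the defaults it behaves like a fixed-boundary function, mapping rating 2 to (0.5, 0.5, 0); passing other boundary points shifts how ratings split between the low, medium, and high categories.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 1 at `peak`, falling linearly to 0 at `left` and `right`."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def rating_profile(rating, b_low=1.0, b_mid=3.0, b_high=5.0):
    """Membership degrees (low, medium, high) of a rating.

    b_low, b_mid, b_high are hypothetical boundary points: 'low' is fully true
    at or below b_low, 'medium' peaks at b_mid, 'high' is fully true at or
    above b_high. Keeping the defaults mimics fixed boundaries.
    """
    if not b_low < b_mid < b_high:
        raise ValueError("boundary points must satisfy b_low < b_mid < b_high")
    if rating <= b_low:
        low = 1.0
    elif rating >= b_mid:
        low = 0.0
    else:
        low = (b_mid - rating) / (b_mid - b_low)
    if rating >= b_high:
        high = 1.0
    elif rating <= b_mid:
        high = 0.0
    else:
        high = (rating - b_mid) / (b_high - b_mid)
    medium = triangular(rating, b_low, b_mid, b_high)
    return (low, medium, high)


print(rating_profile(2))             # (0.5, 0.5, 0.0)
print(rating_profile(3, b_mid=3.5))  # (0.2, 0.8, 0.0)
```

The three degrees always sum to 1, so moving a boundary point only redistributes membership between neighboring categories.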
4.4. Performance analysis
To verify the performance of our scheme, we apply it to detect and correct natural noise in the Movielens and Yahoo Music datasets, respectively. We then test several popular CFs on both the de-noised and the original datasets. In addition, several recently published schemes [10,11,13,18] are tested and compared with ours. The scheme in [18] offers two strategies for correcting natural noise: the remove strategy deletes the noisy ratings directly, and the average strategy replaces each noisy rating with the average value of ratings.
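The two correction strategies of [18] can be contrasted in a few lines. This is a minimal sketch, assuming ratings are stored in a dict keyed by (user, item) and reading "the average value of ratings" as the mean of the same user's non-noisy ratings; [18] may define the average differently, and its selection of the top $T$ noisiest ratings per user is omitted.

```python
from collections import defaultdict
from statistics import mean


def correct_remove(ratings, noisy):
    """Remove strategy: drop every rating flagged as noisy (data gets sparser)."""
    return {key: value for key, value in ratings.items() if key not in noisy}


def correct_average(ratings, noisy):
    """Average strategy: replace each noisy rating with an average value.

    Assumption: the average is the mean of the same user's non-noisy ratings;
    if a user has none, the original rating is kept.
    """
    clean_by_user = defaultdict(list)
    for (user, item), value in ratings.items():
        if (user, item) not in noisy:
            clean_by_user[user].append(value)
    corrected = {}
    for (user, item), value in ratings.items():
        if (user, item) in noisy and clean_by_user[user]:
            corrected[(user, item)] = mean(clean_by_user[user])
        else:
            corrected[(user, item)] = value
    return corrected


ratings = {("u1", "i1"): 5, ("u1", "i2"): 4, ("u1", "i3"): 1, ("u2", "i1"): 2}
noisy = {("u1", "i3")}
print(len(correct_remove(ratings, noisy)))            # 3
print(correct_average(ratings, noisy)[("u1", "i3")])  # 4.5
```

The remove strategy shrinks the rating set, which is the sparsity effect discussed for Table 6; the average strategy keeps every rating but pulls it toward the user's mean regardless of the rating's category.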
Fig. 7. The MAE curves of different CFs (PCC, ACOS, COS, MSD) in the Yahoo Music dataset.
Fig. 8. The comparison results of membership functions between our scheme and the scheme in [16].
4.4.1. The results on Movielens

Table 6 shows the test results of four classical CFs (i.e., PCC [23], ACOS [24], COS [25], and MSD [26]) on the original Movielens dataset and on the de-noised datasets produced separately by our scheme and by the schemes in [10,11,13,18]. The results on the original Movielens dataset serve as the baseline. The schemes in [10] and [11] use the same noise-correction strategy (i.e., re-prediction) but different noise-detection strategies: detection in [10] is based on a general classification method, whereas detection in [11] is based on a fuzzy method. Since the MAE and RMSE of [11] are better than those of [10] for every CF, the fuzzy method appears to describe the features of rating data better, and a noise-detection strategy built on it distinguishes natural noise better than the general classification strategy. Compared with the baseline, the scheme in [13] improves the MAE and RMSE by more than 7%. Because [13] uses the same noise-detection strategy as [10] yet obtains much better MAE and RMSE, the results in Table 6 also show that the noise-correction strategy in [13] is much better than that in [10]. The scheme in [18] generally improves the MAE and RMSE over the baseline as well, but the improvement is limited (at most 2.61%), and with the remove strategy the RMSEs of ACOS and MSD even become slightly worse. Although the fuzzy-based strategy in [18] provides a good basis for detecting noise, its noise-correction strategies do not enhance performance significantly. The remove strategy deletes the noisy ratings directly, which makes the data sparser, and sparse data hurts the performance of CFs. The average strategy replaces the noisy ratings with an average value, which may not accurately represent the category of the ratings. Compared with the baseline, our scheme improves the MAE and RMSE of the four CFs by at least 13% and 18%, respectively, and it has the best MAE and RMSE among all schemes in Table 6. Our noise-detection strategy is built on the fuzzy profiles of ratings, users, and items; because these profiles describe the features of ratings, users, and items well, they help our scheme find noisy data effectively. Our noise-correction strategy neither removes the noisy data nor simply replaces it with an average value. Instead, each noisy rating is revised according to the Maximum membership principle, which preserves the characteristics of the rating data. As a result, our noise-correction strategy further improves the performance of CFs.
Table 6
The test results of MAE, RMSE, and F1-value in the Movielens dataset.
Metric | CF | Original | Method in [10] | Method in [11] | Method in [13] | Method in [18] (remove strategy) | Method in [18] (average strategy) | Our scheme
MAE | PCC | 0.9679 | 0.9536 (+1.43%) | 0.9484 (+1.95%) | 0.8910 (+7.69%) | 0.9629 (+0.50%) | 0.9481 (+1.98%) | 0.8050 (+16.29%)
MAE | ACOS | 1.1272 | 1.1277 (-0.05%) | 1.1252 (+0.20%) | 1.0298 (+9.74%) | 1.1272 (0.00%) | 1.1269 (+0.03%) | 0.8311 (+29.61%)
MAE | COS | 0.9762 | 0.9583 (+1.79%) | 0.9566 (+1.96%) | 0.8753 (+10.09%) | 0.9657 (+1.05%) | 0.9603 (+1.59%) | 0.8033 (+17.29%)
MAE | MSD | 0.9416 | 0.9299 (+1.17%) | 0.9240 (+1.76%) | 0.8712 (+7.04%) | 0.9407 (+0.09%) | 0.9304 (+1.12%) | 0.8075 (+13.41%)
RMSE | PCC | 1.2402 | 1.2198 (+2.04%) | 1.2116 (+2.86%) | 1.1577 (+8.25%) | 1.2381 (+0.21%) | 1.2141 (+2.61%) | 1.0216 (+21.86%)
RMSE | ACOS | 1.4746 | 1.4747 (-0.01%) | 1.4723 (+0.23%) | 1.3763 (+9.83%) | 1.4755 (-0.09%) | 1.4731 (+0.15%) | 1.0884 (+38.62%)
RMSE | COS | 1.2488 | 1.2229 (+2.59%) | 1.2192 (+2.96%) | 1.1351 (+11.37%) | 1.2410 (+0.78%) | 1.2275 (+2.13%) | 1.0166 (+23.22%)
RMSE | MSD | 1.2040 | 1.1876 (+1.64%) | 1.1777 (+2.63%) | 1.1283 (+7.57%) | 1.2107 (-0.67%) | 1.1917 (+1.23%) | 1.0221 (+18.19%)
F1 | PCC | 0.6486 | 0.6495 (+0.09%) | 0.6447 (-0.39%) | 0.6044 (-4.42%) | 0.6553 (+0.67%) | 0.6543 (+0.57%) | 0.6783 (+2.97%)
F1 | ACOS | 0.1329 | 0.1293 (-0.36%) | 0.1288 (-0.41%) | 0.1244 (-0.85%) | 0.1312 (-0.17%) | 0.1303 (-0.26%) | 0.1590 (+2.61%)
F1 | COS | 0.6378 | 0.6381 (+0.03%) | 0.6362 (-0.16%) | 0.6041 (-3.37%) | 0.6469 (+0.91%) | 0.6419 (+0.41%) | 0.6769 (+3.91%)
F1 | MSD | 0.6516 | 0.6531 (+0.15%) | 0.6495 (-0.21%) | 0.5986 (-5.30%) | 0.6562 (+0.46%) | 0.6564 (+0.48%) | 0.6720 (+2.04%)
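Recomputing the entries of Table 6 shows that each parenthesized figure equals the difference between the baseline and the method's value multiplied by 100 (for PCC with our scheme, 0.9679 − 0.8050 = 0.1629, printed as +16.29%), not a change relative to the baseline. The same holds for Table 7, and the percentages quoted in the text follow this convention. The helpers below, whose names are ours, reproduce the printed figures and, for comparison, the relative change, which is about 16.83% for the same entry.

```python
def table_delta(baseline, value, lower_is_better=True):
    """Parenthesized figure as printed in Tables 6 and 7: the raw difference
    from the baseline times 100, signed so that a positive value means better."""
    diff = baseline - value if lower_is_better else value - baseline
    return round(diff * 100, 2)


def relative_change(baseline, value, lower_is_better=True):
    """Improvement relative to the baseline, in percent (for comparison)."""
    diff = baseline - value if lower_is_better else value - baseline
    return round(diff / baseline * 100, 2)


# Movielens, PCC, our scheme (values from Table 6).
print(table_delta(0.9679, 0.8050))                          # 16.29 (MAE)
print(relative_change(0.9679, 0.8050))                      # 16.83
print(table_delta(0.6486, 0.6783, lower_is_better=False))  # 2.97 (F1)
```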
For the F1-value, Table 6 shows that only our scheme achieves consistent improvements for all CFs: compared with the baseline, it improves the F1-value by about 3% (2.04% to 3.91%). The other schemes in Table 6 do not improve the F1-value significantly, and in some cases, such as ACOS with the schemes in [10,11,13,18], their F1-values even decrease. The reason is that detecting and correcting natural noise act mainly on rating values, whereas the F1-value is a typical recommendation metric that depends on rating values only indirectly. Thus, the influence of noise correction on the F1-value is less obvious. Nevertheless, our scheme still increases the F1-value for all CFs, which shows that it adapts well and can be used to improve the recommendation performance of CF.
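The claim that F1 responds only indirectly to rating accuracy can be illustrated with a toy example. The sketch assumes a threshold-based F1, where an item is relevant if its true rating is at least 4 and recommended if its predicted rating is at least 4; the evaluation protocol of Section 4.1 may differ. Two prediction vectors with very different MAEs yield the same F1, because F1 only sees which side of the threshold each prediction falls on.

```python
def mae(predicted, actual):
    """Mean absolute error between predicted and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def f1_at_threshold(predicted, actual, threshold=4.0):
    """F1 where an item is 'recommended' if its predicted rating >= threshold
    and 'relevant' if its true rating >= threshold (an assumed protocol)."""
    recommended = {i for i, p in enumerate(predicted) if p >= threshold}
    relevant = {i for i, a in enumerate(actual) if a >= threshold}
    hits = len(recommended & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(recommended)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


actual = [5, 4, 2, 1]
rough = [4.2, 4.1, 3.5, 2.8]  # large rating errors
sharp = [4.9, 4.0, 2.1, 1.2]  # small rating errors
print(mae(rough, actual), mae(sharp, actual))  # roughly 1.05 vs 0.1
print(f1_at_threshold(rough, actual), f1_at_threshold(sharp, actual))  # 1.0 1.0
```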
In addition to the neighbor-based CFs (i.e., PCC, ACOS, COS, and MSD), we also select matrix factorization (MF) as a test method. Unlike the neighbor-based CFs, MF does not depend on specific rating values but on the overall trend of all ratings. Since MF learns from the data to generate predicted values, it can be regarded as another kind of CF. Here, we use RMSE curves to compare the performance of the different de-noising schemes. Fig. 9 shows the RMSE curves of the different de-noising schemes as the number of iterations increases: the proposed method achieves a better RMSE than both the other methods and the baseline.
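Fig. 9 plots the RMSE of MF against the number of iterations. The MF variant and hyperparameters used in the experiments are not given here, so the block below is only a minimal sketch of one common choice: stochastic gradient descent on user and item factors plus a global-mean offset, with L2 regularization. It records the training RMSE after each epoch; `k`, `lr`, `reg`, and `epochs` are illustrative values, and the toy ratings are made up. A comparison like Fig. 9 would instead train on each de-noised copy of the data and evaluate RMSE on held-out ratings.

```python
import math
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Matrix factorization by SGD; returns the training RMSE after each epoch.

    ratings: list of (user, item, rating) triples with 0-based indices.
    Prediction = global mean + dot(P[user], Q[item]).
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]

    def predict(u, i):
        return mu + sum(P[u][f] * Q[i][f] for f in range(k))

    order = list(ratings)
    history = []
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            err = r - predict(u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
        sq = sum((r - predict(u, i)) ** 2 for u, i, r in ratings)
        history.append(math.sqrt(sq / len(ratings)))
    return history


toy = [(0, 0, 5), (0, 1, 4), (0, 2, 1), (1, 0, 4), (1, 1, 5), (1, 3, 2),
       (2, 1, 1), (2, 2, 5), (2, 3, 4), (3, 0, 2), (3, 2, 4), (3, 3, 5)]
curve = train_mf(toy, n_users=4, n_items=4)
print(round(curve[0], 3), round(curve[-1], 3))  # RMSE after first and last epoch
```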
4.4.2. The results on Yahoo Music

We carry out the same tests on the Yahoo Music dataset. The rating data are de-noised by our scheme and by the schemes in [10,11,13,18]. Table 7 shows the MAE, RMSE, and F1-value of the four CFs (PCC [23], ACOS [24], COS [25], and MSD [26]). On the original dataset, the MAE and RMSE are larger than 1.0 and 1.4, respectively, which is worse than on the Movielens dataset; one important reason is that the ratings in Yahoo Music are sparser. Compared with the baseline, our scheme improves the MAE by 11.99% to 17.21% and the RMSE by 14.14% to 22.96%, whereas the other schemes improve the MAE and RMSE by less than 2% and in some cases make them worse. Thus, our scheme has clear advantages in MAE and RMSE over the other schemes. For the F1-value, our scheme improves the recommendation quality by more than 5% for every CF, far more than the other schemes. This improvement has two main sources. First, we construct accurate fuzzy profiles of ratings, users, and items, which provide effective support for noise detection. Second, we modify noisy ratings according to the Maximum membership principle, which ensures high-quality correction of natural noise.
Moreover, we calculate the RMSEs of MF under the different de-noising schemes on the Yahoo Music dataset; the results are shown in Fig. 10. As on the Movielens dataset, our scheme greatly improves the RMSE on the Yahoo Music dataset and outperforms the other schemes.
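Both result sections credit part of the gain to correcting noisy ratings with the Maximum membership principle. The sketch below shows the shape of that step under stated assumptions: for each flagged rating, a hypothetical `profile_of` callable supplies the (low, medium, high) membership degrees the rating is expected to have (for example, derived from the user's and item's fuzzy profiles), and `thresholds` holds illustrative threshold values. The actual profiles and threshold values are those defined in Sections 3.2 to 3.4, which this sketch does not reproduce. The correction itself is one argmax and one lookup per rating, consistent with the $O(1)$-per-rating cost stated in Section 4.5.

```python
CATEGORIES = ("low", "medium", "high")


def correct_by_max_membership(profile, thresholds):
    """Return the replacement value for a noisy rating.

    profile: membership degrees (low, medium, high) of the category the rating
    is expected to belong to. thresholds: hypothetical threshold value per
    category. Ties go to the earlier category in CATEGORIES.
    """
    if len(profile) != len(CATEGORIES):
        raise ValueError("profile must have one degree per category")
    best = max(range(len(CATEGORIES)), key=lambda j: profile[j])
    return thresholds[CATEGORIES[best]]


def correct_ratings(ratings, noisy, profile_of, thresholds):
    """Replace every rating flagged as noisy; all other ratings are kept."""
    return {
        key: correct_by_max_membership(profile_of(key), thresholds) if key in noisy else value
        for key, value in ratings.items()
    }


thresholds = {"low": 2, "medium": 3, "high": 4}  # hypothetical values
print(correct_by_max_membership((0.1, 0.3, 0.6), thresholds))  # 4
```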
Fig. 9. The RMSEs of MF using different denoising management on Movielens dataset.
Table 7
The test results of MAE, RMSE, and F1-value in the Yahoo Music dataset.
Metric | CF | Original | Method in [10] | Method in [11] | Method in [13] | Method in [18] (remove strategy) | Method in [18] (average strategy) | Our scheme
MAE | PCC | 1.1385 | 1.1365 (+0.20%) | 1.1320 (+0.65%) | 1.1321 (+0.64%) | 1.1285 (+1.00%) | 1.1385 (0.00%) | 0.9959 (+14.26%)
MAE | ACOS | 1.0442 | 1.0405 (+0.37%) | 1.0358 (+0.84%) | 1.0299 (+1.43%) | 1.0389 (+0.53%) | 1.0424 (+0.18%) | 0.9243 (+11.99%)
MAE | COS | 1.1232 | 1.1226 (+0.06%) | 1.1244 (-0.12%) | 1.1148 (+0.84%) | 1.1213 (+0.19%) | 1.1405 (-1.73%) | 0.9958 (+12.74%)
MAE | MSD | 1.1556 | 1.1542 (+0.14%) | 1.1455 (+1.01%) | 1.1497 (+0.59%) | 1.1527 (+0.29%) | 1.1547 (+0.09%) | 0.9835 (+17.21%)
RMSE | PCC | 1.4570 | 1.4554 (+0.16%) | 1.4541 (+0.29%) | 1.4532 (+0.38%) | 1.4480 (+0.90%) | 1.4570 (0.00%) | 1.3089 (+14.81%)
RMSE | ACOS | 1.5320 | 1.5284 (+0.36%) | 1.5196 (+1.24%) | 1.5163 (+1.57%) | 1.5281 (+0.39%) | 1.5265 (+0.55%) | 1.3024 (+22.96%)
RMSE | COS | 1.4466 | 1.4467 (-0.01%) | 1.4489 (-0.23%) | 1.4405 (+0.61%) | 1.4491 (-0.25%) | 1.4658 (-1.92%) | 1.3052 (+14.14%)
RMSE | MSD | 1.4842 | 1.4831 (+0.11%) | 1.4740 (+1.02%) | 1.4804 (+0.38%) | 1.4810 (+0.32%) | 1.4851 (-0.09%) | 1.3010 (+18.32%)
F1 | PCC | 0.5417 | 0.5419 (+0.02%) | 0.5389 (-0.28%) | 0.5441 (+0.24%) | 0.5580 (+1.63%) | 0.5433 (+0.16%) | 0.6108 (+6.91%)
F1 | ACOS | 0.4371 | 0.4366 (-0.05%) | 0.4397 (+0.26%) | 0.4429 (+0.58%) | 0.4384 (+0.13%) | 0.4329 (-0.42%) | 0.5407 (+10.36%)
F1 | COS | 0.5458 | 0.5461 (+0.03%) | 0.5331 (-1.27%) | 0.5488 (+0.30%) | 0.5517 (+0.59%) | 0.5361 (-0.97%) | 0.6153 (+6.95%)
F1 | MSD | 0.5399 | 0.5400 (+0.01%) | 0.5356 (-0.43%) | 0.5411 (+0.12%) | 0.5474 (+0.75%) | 0.5347 (-0.52%) | 0.6129 (+7.30%)
Fig. 10. The RMSEs of MF using different denoising management on Yahoo Music dataset.
4.5. Efficiency analysis
Here, we analyze the time complexity of our scheme and compare it with the schemes in [10,11,13,18]. Suppose the numbers of users and items in an RS are $m$ and $n$, respectively. On average, each user rates $k$ items and each item is rated by $v$ users, with $k \ll n$ and $v \ll m$. The rating scale is $\{1, 2, \ldots, r\}$. In our scheme, building the fuzzy profiles of users and items takes $O(mk + nv)$. To detect natural noise, we judge whether each rating is noisy, which takes an extra $O(mk)$. Since our scheme performs only one comparison and one modification to correct a noisy rating, correction costs $O(1)$ per rating. Therefore, the total time complexity is $O(2mk + nv)$.
The method in [10] first classifies users and items into categories according to their historical ratings, which takes $O(mk + nv)$; detecting natural noise then takes $O(mk)$. Once the noisy ratings are detected, PCC is used to re-predict new values for them, which takes $O(m^2 + mk)$. Therefore, the total time complexity of [10] is $O(m^2 + 3mk + nv)$. The scheme in [13] is a classification-based method that corrects noisy ratings with low-medium and medium-high ratings; its time complexity is $O(2mk + nv)$. The method in [11] creates a membership function of rating categories and builds the fuzzy profiles of users and items, which takes $O(mk + nv)$; its noise detection takes $O(mk)$ and its noise correction $O(m^2 + mk)$, so the total time complexity of [11] is $O(m^2 + 3mk + nv)$. The scheme in [18] consists of four parts: rating-regularity exploration, regularity filtering, noise detection, and noise correction. Exploring rating regularities takes $O(r^n)$. In regularity filtering, each element of every regularity is checked and removed if it is a subset of another regularity, which takes $O(r^{2n})$. In noise detection, each rating in the RS is checked against all regularities, which takes $O(mkr^n)$. In the correction process, the top $T$ noisiest ratings of each user are selected and modified, which takes $O(mT)$ with the remove strategy and $O(m(k + T))$ with the average strategy. Therefore, the total complexity of [18] is $O(r^n + r^{2n}(mk + 1) + mT)$ with the remove strategy and $O(r^n + r^{2n}(mk + 1) + m(k + T))$ with the average strategy. The time complexities of all the above schemes are summarized in Table 8, from which we conclude that our scheme and the scheme in [13] have the lowest time complexity and hence the highest efficiency.
Table 8
The time complexity and run time of our scheme and some other schemes.
Scheme | Time complexity | Run time on Movielens | Run time on Yahoo Music
Method in [10] | $O(m^2 + 3mk + nv)$ | 1278.93 s | 123.71 s
Method in [11] | $O(m^2 + 3mk + nv)$ | 1444.77 s | 683.42 s
Method in [13] | $O(2mk + nv)$ | 7.30 s | 0.90 s
Method in [18] (remove strategy) | $O(r^n + r^{2n}(mk + 1) + mT)$ | 629.96 s | 82.30 s
Method in [18] (average strategy) | $O(r^n + r^{2n}(mk + 1) + m(k + T))$ | 627.97 s | 86.20 s
Our scheme | $O(2mk + nv)$ | 8.71 s | 1.40 s
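Because every rating is counted once from the user side and once from the item side, $mk$ and $nv$ both equal the total number of ratings $R$. Hence $O(2mk + nv) = O(3R)$ is linear in the number of ratings, while the re-prediction schemes [10,11] add a term quadratic in the number of users. The sketch below plugs hypothetical sizes (not the statistics of either benchmark dataset) into the two expressions, treating each as an operation count with unit constants, which is a simplification of big-O notation; the function names are ours.

```python
def ops_fuzzy_profiles(m, n, k, v):
    """Operation count of O(2mk + nv): our scheme and the scheme in [13]."""
    return 2 * m * k + n * v


def ops_reprediction(m, n, k, v):
    """Operation count of O(m^2 + 3mk + nv): the schemes in [10] and [11]."""
    return m * m + 3 * m * k + n * v


# Hypothetical sizes: 1,000 users x 2,000 items, 100,000 ratings in total,
# so k = 100 ratings per user and v = 50 ratings per item.
m, n, k, v = 1_000, 2_000, 100, 50
assert m * k == n * v  # both products count every rating once
print(ops_fuzzy_profiles(m, n, k, v))  # 300000
print(ops_reprediction(m, n, k, v))    # 1400000
```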
To further verify the efficiency of our method, we implemented all schemes on a PC with an Intel(R) Core(TM) i7-7700HQ CPU and 16.0 GB of RAM. The running times of all the above schemes on the Movielens and Yahoo Music datasets are also shown in Table 8, and they confirm that our scheme and the scheme in [13] are the most efficient. Compared with [13], our scheme takes slightly longer (8.71 s vs. 7.30 s on Movielens and 1.40 s vs. 0.90 s on Yahoo Music), because building fuzzy profiles requires more computation than directly classifying ratings. We regard this trade-off as inevitable and generally acceptable in real applications.
5. Conclusion
In this paper, we propose a fuzzy approach to managing natural noise in RS that detects natural noise effectively and corrects it efficiently. The ratings in an RS are divided into three categories using fuzzy techniques, which better captures the relationship between ratings and high, medium, and low evaluations. Based on this rating classification, we build fuzzy profiles of users and items, which not only describe the traits of users and items accurately but also lay a good foundation for detecting natural noise. Finally, the detected natural noise is corrected directly according to the Maximum membership principle, which keeps our scheme highly efficient. Experimental results on two benchmark datasets confirm that the proposed scheme significantly improves data quality and helps CFs obtain better recommendation results.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have
appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China [Nos. 61702221, 61772215], the MOE Layout Foundation of Humanities and Social Sciences [No. 20YJAZH102], and the Foundation of Guangxi Key Laboratory of Cryptography and Information Security [No. GCIS201908].
References
[1] J. Matthes, K. Karsay, D. Schmuck, A. Stevic, “Too much to handle”: Impact of mobile social networking sites on information overload, depressive symptoms, and well-being, Computers in Human Behavior 105 (2020) 106217.
[2] L. Qi, X. Zhang, S. Li, S. Wan, Y. Wen, W. Gong, Spatial-temporal data-driven service recommendation with privacy-preservation, Information Sciences
515 (2020) 91–102.
[3] S.-H. Cheng, Autocratic decision making using group recommendations based on hesitant fuzzy sets for green hotels selection and bidders selection,
Information Sciences 467 (2018) 604–617.
[4] Y. Wang, X.Y. Wan, Z.Y. Tao, P. Zhang, Collaborative filtering recommendation algorithm based on K-medoids item clustering, Journal of Chongqing University of Posts and Telecommunications 29 (4) (2017) 521–526 (In Chinese).
[5] A.M. Turk, A. Bilge, Robustness analysis of multi-criteria collaborative filtering algorithms against shilling attacks, Expert Systems with Applications
115 (2019) 386–402.
[6] H. Xia, B. Fang, M. Gao, H. Ma, Y. Tang, J. Wen, A novel item anomaly detection approach against shilling attacks in collaborative recommendation
systems using the dynamic time interval segmentation technique, Information Sciences 306 (2015) 150–165.
[7] M.P. O’Mahony, N.J. Hurley, G. Silvestre, Detecting noise in recommender system databases, in: Proceedings of the 11th International Conference on Intelligent User Interfaces, 2006, pp. 109–115.
[8] X. Amatriain, J.M. Pujol, N. Tintarev, N. Oliver, Rate it again: Increasing recommendation accuracy by user re-rating, in: Proceedings of the Third ACM
Conference on Recommender Systems, 2009, pp. 173–180.
[9] H.X. Pham, J.J. Jung, Preference-based user rating correction process for interactive recommendation systems, Multimedia Tools & Applications 65
(2013) 119–132.
[10] R.Y. Toledo, Y.C. Mota, L. Martínez, Correcting noisy ratings in collaborative recommender systems, Knowledge-Based Systems 76 (2015) 96–108.
[11] R. Yera, J. Castro, L. Martínez, Natural noise management in recommender systems using fuzzy tools, in: Computational Intelligence for Semantic Knowledge Management, Springer, Cham, 2020, pp. 1–24.
[12] B. Li, L. Chen, X. Zhu, C. Zhang, Noisy but non-malicious user detection in social recommender systems, World Wide Web-Internet & Web Information
Systems 16 (2013) 677–699.
[13] S. Bag, S. Kumar, A. Awasthi, M.K. Tiwari, A noise correction-based approach to support a recommender system in a highly sparse rating environment,
Decision Support Systems 118 (2019) 46–57.
[14] J. Castro, R. Yera, L. Martínez, An empirical study of natural noise management in group recommendation systems, Decision Support Systems 94 (2017)
1–11.
[15] P. Choudhary, V. Kant, P. Dwivedi, Handling natural noise in multi criteria recommender system utilizing effective similarity measure and particle swarm optimization, Procedia Computer Science 115 (2017) 853–862.
[16] R. Yera, J. Castro, L. Martínez, A fuzzy model for managing natural noise in recommender systems, Applied Soft Computing 40 (2016) 187–198.
[17] J. Castro, R. Yera, L. Martínez, A fuzzy approach for natural noise management in group recommender systems, Expert Systems with Applications 94
(2018) 237–249.
[18] R.Y. Toledo, M.J. Barranco, A.A. Alzahrani, L. Martinez, Exploring fuzzy rating regularities for managing natural noise in collaborative recommendation,
International Journal of Computational Intelligence Systems 12 (2) (2019) 1382–1392.
[19] L.A. Zadeh, Fuzzy Sets, Information and Control 8 (3) (1965) 338–353.
[20] J. Zhang, J.X. Huang, Q.V. Hu, Boosting evolutionary optimization via fuzzy-classification-assisted selection, Information Sciences 519 (2020) 423–438.
[21] P. Wang, W. Wang, N. Meng, D. Xu, Multi-objective energy management system for DC microgrids based on the maximum membership degree
principle, Journal of Modern Power Systems & Clean Energy 6 (4) (2018) 668–678.
[22] C. Feng, J. Liang, P. Song, Z. Wang, A fusion collaborative filtering method for sparse data in recommender systems, Information Sciences 521 (2020)
365–379.
[23] Z. Tan, L. He, An efficient similarity measure for user-based collaborative filtering recommender systems inspired by the physical resource principle,
IEEE Access 5 (2017) 27211–27228.
[24] B.M. Sarwar, G. Karypis, J.A. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, in: Proceedings of the 10th International
WWW Conference, 2001, pp. 285–295.
[25] G. Yang, X. Zhao, G. Fan, A modification on the vector cosine algorithm of similarity analysis for improved discriminative capacity and its application to
the quality control of Magnoliae Flos, Journal of Chromatography A 1518 (2017) 34–45.
[26] J.L. Sanchez, F. Serradilla, E. Martinez, J. Bobadilla, Choice of metrics used in collaborative filtering and their impact on recommender systems, in:
Proceedings of 2008 2nd IEEE International Conference on Digital Ecosystems and Technologies, 2008, pp. 432–436.
