Application Research of Collaborative Filtering Algorithm in Catering Recommendation System
2023 19th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) | 979-8-3503-0439-8/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICNC-FSKD59587.2023.10280828
School of Information Science and Engineering, Linyi University
552762462@qq.com

School of Information Science and Engineering, Linyi University
hujingyao@163.com
Authorized licensed use limited to: Universidad Simon Bolivar (Colombia). Downloaded on May 13,2024 at 03:54:12 UTC from IEEE Xplore. Restrictions apply.

Keywords—recommendation system; collaborative filtering; similarity

I. INTRODUCTION

A catering recommendation system not only meets the personalized catering needs of customers but also helps businesses establish long-term, stable customer relationships, reduce customer churn, and improve customer loyalty. The core of a catering recommendation system is the recommendation algorithm it uses, so studying and optimizing recommendation algorithms is very important. This paper mainly studies the application and improvement of collaborative filtering algorithms in catering recommendation systems.

A. User-based collaborative filtering recommendation algorithm

1) Find a set of users with interests similar to those of the target user

Given users $u$ and $v$, let $N(u)$ and $N(v)$ denote the sets of items on which users $u$ and $v$, respectively, have given positive feedback. The interest similarity between users $u$ and $v$ can be calculated simply with the Jaccard formula, as shown in formula (1):

$$w_{uv} = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|} \tag{1}$$

In practice, many pairs of users have never acted on the same item, in which case $|N(u) \cap N(v)| = 0$. Therefore, an inverted item-to-user list can be built that stores, for each item, the list of users who have acted on it, so that similarity only needs to be computed for pairs of users who share at least one item. In addition, to reduce the influence of popular items on the similarity between users, user similarity is calculated as shown in formula (2):
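$$w_{uv} = \frac{\sum_{i \in N(u) \cap N(v)} \frac{1}{\log\left(1 + |N(i)|\right)}}{\sqrt{|N(u)|\,|N(v)|}} \tag{2}$$

where $N(i)$ denotes the set of users who have acted on item $i$.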
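The inverted-list procedure behind formulas (1) and (2) can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: the function name `user_similarity`, the dict-of-sets input format, the use of the natural logarithm in formula (2), and the toy feedback data are all assumptions made for the example.

```python
import math
from collections import defaultdict
from itertools import combinations


def user_similarity(feedback, penalize_popular=True):
    """Compute user-user similarities from positive-feedback sets.

    feedback maps each user to the set of items N(u) the user gave positive
    feedback on. penalize_popular=False gives the Jaccard formula (1);
    penalize_popular=True gives formula (2), where each shared item i adds
    1 / log(1 + |N(i)|) and the sum is divided by sqrt(|N(u)| * |N(v)|).
    Returns a dict keyed by (u, v); pairs with no shared item are omitted,
    i.e. their similarity is 0.
    """
    # Inverted list: item -> set of users who acted on it.
    item_users = defaultdict(set)
    for user, items in feedback.items():
        for item in items:
            item_users[item].add(user)

    # Only user pairs that co-occur in some item's user list are visited.
    shared = defaultdict(float)
    for users in item_users.values():
        weight = 1.0 / math.log(1 + len(users)) if penalize_popular else 1.0
        for u, v in combinations(sorted(users), 2):
            shared[(u, v)] += weight

    sim = {}
    for (u, v), total in shared.items():
        if penalize_popular:
            denom = math.sqrt(len(feedback[u]) * len(feedback[v]))
        else:
            denom = len(feedback[u] | feedback[v])
        sim[(u, v)] = sim[(v, u)] = total / denom
    return sim


# Hypothetical positive-feedback data: user -> dishes they liked.
feedback = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}
print(round(user_similarity(feedback, penalize_popular=False)[("A", "B")], 2))  # 0.25
```

Because pairs are generated only from each item's user list, users with no item in common (here B and C) are never compared, which is the saving the inverted list is meant to provide.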
The interest of user $u$ in item $i$ is then estimated by formula (3):

$$p(u,i) = \sum_{v \in S(u,K) \cap N(i)} w_{uv}\, r_{vi} \tag{3}$$

where $S(u,K)$ is the set of the $K$ users most similar to user $u$, $N(i)$ is the set of users who have acted on item $i$, and $r_{vi}$ is the interest of user $v$ in item $i$.

B. Item-based collaborative filtering recommendation algorithm

Item-based collaborative filtering takes items as the core: it uses users' historical behavior data to calculate the similarity between items and then recommends items to target users based on that similarity. The implementation process is as follows:

1) Calculate the similarity between items

The similarity of items $i$ and $j$ is defined by formula (4):

$$w_{ij} = \frac{|N(i) \cap N(j)|}{|N(i)|} \tag{4}$$

The denominator $|N(i)|$ is the number of users who like item $i$, and the numerator $|N(i) \cap N(j)|$ is the number of users who like both item $i$ and item $j$. If item $j$ is a popular item, formula (4) makes every item very similar to it, which is undesirable for recommendation systems dedicated to discovering long-tail items. Therefore, formula (5) is used to avoid over-recommending popular items:

$$w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}} \tag{5}$$

After the item similarities are obtained, the interest of user $u$ in item $j$ can be calculated through formula (6):

$$p(u,j) = \sum_{i \in N(u) \cap S(j,K)} w_{ji}\, r_{ui} \tag{6}$$

where $S(j,K)$ is the set of the $K$ items most similar to item $j$ and $r_{ui}$ is the interest of user $u$ in item $i$.

The main idea of the algorithm optimization is to improve the formula for calculating similarity according to the needs of the business scenario. This paper optimizes the cosine similarity commonly used in collaborative filtering recommendation algorithms. The improvement ideas are as follows:

A. Cosine similarity

Cosine similarity measures the difference between two individuals as the cosine of the angle between their vectors in a vector space. When the cosine value approaches 1, the angle between the two vectors approaches 0 degrees, indicating that the two vectors are more similar. When the cosine value is negative, the two vectors are negatively correlated. For two-dimensional vectors $a = (x_1, y_1)$ and $b = (x_2, y_2)$, cosine similarity is calculated as shown in formula (7):

$$\mathrm{sim}(a,b) = \cos(a,b) = \frac{a \cdot b}{\|a\| \times \|b\|} = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2} \times \sqrt{x_2^2 + y_2^2}} \tag{7}$$

When a user's ratings are treated as an $n$-dimensional vector, the calculation is shown in formula (8):

$$\mathrm{sim}(u,v) = \frac{\sum_{i=1}^{n} r_{ui} \times r_{vi}}{\sqrt{\sum_{i=1}^{n} r_{ui}^2} \times \sqrt{\sum_{i=1}^{n} r_{vi}^2}} \tag{8}$$

where $r_{ui}$ and $r_{vi}$ are the ratings that users $u$ and $v$ give to item $i$.

Cosine similarity, however, distinguishes vectors only by their direction, so it is insensitive to differences in the rating values of different users, which can cause errors. For example, suppose users $u_1$ and $u_2$ have the rating vectors (5, 4, 3) and (2, 1, 3), respectively.
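Formula (8) translates directly into code; formula (7) is its special case $n = 2$. The sketch below is illustrative only: the function name `cosine`, the assumption that both vectors list ratings of the same items in the same order, and the choice to return 0.0 for an all-zero vector are not from the paper. Applied to the rating vectors of $u_1$ and $u_2$, it gives approximately 0.869.

```python
import math


def cosine(ru, rv):
    """Cosine similarity of two aligned rating vectors, as in formula (8).

    Assumes ru[i] and rv[i] are both users' ratings of the same item i.
    Returns 0.0 when either vector has zero norm (an assumed convention;
    the formula itself is undefined in that case).
    """
    if len(ru) != len(rv):
        raise ValueError("rating vectors must have the same length")
    dot = sum(a * b for a, b in zip(ru, rv))
    norm = math.sqrt(sum(a * a for a in ru)) * math.sqrt(sum(b * b for b in rv))
    return dot / norm if norm else 0.0


u1 = (5, 4, 3)
u2 = (2, 1, 3)
print(round(cosine(u1, u2), 3))  # 0.869
```

The value is $23 / (\sqrt{50} \times \sqrt{14}) \approx 0.869$: only the angle between the vectors matters, not how high or low each user rates overall.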
For these two vectors, the cosine similarity is 0.869, which suggests that the two users are very similar. Intuitively, however, the preferences of $u_1$ and $u_2$ differ considerably, so the result calculated by cosine similarity is inconsistent with the actual situation. Therefore, this paper uses an improved cosine similarity formula to address the insensitivity of cosine similarity to rating values.

B. Improved cosine similarity

The core of the improved cosine similarity is to subtract the mean of the user's ratings from each dimension of the rating vector. The formula is as follows:

$$\mathrm{sim}'(u,v) = \frac{\sum_{i=1}^{n} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i=1}^{n} (r_{ui} - \bar{r}_u)^2} \times \sqrt{\sum_{i=1}^{n} (r_{vi} - \bar{r}_v)^2}} \tag{9}$$

where $\bar{r}_u$ and $\bar{r}_v$ are the mean ratings of users $u$ and $v$.

In the above example, the mean ratings of users $u_1$ and $u_2$ are 4 and 2, respectively, so their mean-centred rating vectors are (1, 0, −1) and (0, −1, 1). The cosine value is then $(1 \times 0 + 0 \times (-1) + (-1) \times 1) / (\sqrt{2} \times \sqrt{2}) = -0.5$. The negative result indicates a significant difference in preferences between the two users, which is more realistic.

C. Weighted average with time parameters

… rating score of user $i$, and $r_{i1}$ is the most recent rating of user $i$; $t_{th}$ is a set time threshold parameter, indicating that when scores are calculated, only the average weighting is performed … the $(t_i - t_{th})$ value, the greater its weight; and the larger the impact factor in the calculation, the better the effect of the weighted average.

D. Weighting by the number of shared rated items

Because some users share only a few rated items, using the improved cosine similarity directly can produce biased results and lower recommendation quality. To correct this bias, this paper further weights the similarity value, with the weighting factor calculated from the number of items rated by both users, as shown in formula (11):

$$\mathrm{sim}''(u,v) = \begin{cases} \dfrac{|N_u \cap N_v|}{T} \times \mathrm{sim}'(u,v), & |N_u \cap N_v| \le T \\ \mathrm{sim}'(u,v), & |N_u \cap N_v| > T \end{cases} \tag{11}$$

Here, $|N_u \cap N_v|$ is the number of items rated by both users, $T$ is a set threshold on the number of items, and their ratio can be regarded as a penalty factor. When the number of co-rated items is less than $T$, the similarity of the two users' ratings is less credible, and the weighted similarity value is more realistic. When the number of co-rated items is greater than $T$, no weighting is applied, and the similarity value obtained from formula (9) is used unchanged. The similarity is then normalized from $[-1, 1]$ to $[0, 1]$ by formula (12):

$$\widehat{\mathrm{sim}}(u,v) = 0.5 + \frac{\mathrm{sim}''(u,v)}{2} \tag{12}$$

The dataset used in this article is the MovieLens dataset provided by the GroupLens research group, one of the most classic datasets in the field of recommendation systems. This article uses the Pandas library …
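The improved cosine similarity (mean-centred ratings), together with the co-rated-item weighting of formula (11) and the normalization of formula (12), can be sketched as follows. This is a hypothetical illustration rather than the authors' code: the function names, the dict input format, the choice to average each user's ratings over all of that user's ratings, and the threshold value T = 5 are assumptions, and the time-parameter weighting described above is not included.

```python
import math


def improved_cosine(ratings_u, ratings_v):
    """Mean-centred (improved) cosine similarity between two users.

    ratings_u, ratings_v map item -> rating. Assumption: each user's mean is
    taken over all of that user's ratings, and the sums run over the items
    both users rated. Returns 0.0 if there is nothing to compare.
    """
    common = [item for item in ratings_u if item in ratings_v]
    if not common:
        return 0.0
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    du = [ratings_u[i] - mean_u for i in common]
    dv = [ratings_v[i] - mean_v for i in common]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    return num / den if den else 0.0


def weighted_similarity(ratings_u, ratings_v, threshold):
    """Shrink the improved cosine by |N_u ∩ N_v| / T when at most T items are
    co-rated, as in formula (11), then map [-1, 1] to [0, 1] as in (12)."""
    if threshold <= 0:
        raise ValueError("threshold T must be positive")
    n_common = sum(1 for item in ratings_u if item in ratings_v)
    sim = improved_cosine(ratings_u, ratings_v)
    if n_common <= threshold:
        sim *= n_common / threshold
    return 0.5 + sim / 2


# The example from the text; item names and T = 5 are illustrative only.
u1 = {"d1": 5, "d2": 4, "d3": 3}
u2 = {"d1": 2, "d2": 1, "d3": 3}
print(round(improved_cosine(u1, u2), 3))  # -0.5
print(round(weighted_similarity(u1, u2, threshold=5), 3))  # 0.35
```

With T = 5, the three co-rated items give a penalty factor of 3/5, so the similarity −0.5 shrinks to −0.3 before being mapped to 0.35; with T = 2 no penalty applies and the result is 0.25.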
Fig. 1 Data source

B. Evaluation metric

In line with practical application requirements, research on collaborative filtering recommendation algorithms generally uses the mean absolute error (MAE) to evaluate recommendation quality. MAE measures the accuracy of a system's recommendations by the difference between the predicted ratings and the users' real ratings in the test data set: the absolute differences between the $N$ predictions and the corresponding real ratings are summed, and their average is taken. The calculation is shown in formula (13):

$$\mathrm{MAE} = \frac{\sum_{i=1}^{N} |p_i - q_i|}{N} \tag{13}$$

where $p_i$ is a predicted rating and $q_i$ the corresponding real rating. The lower the MAE, the more accurate the ratings predicted by the algorithm and the higher the quality of the recommendations.

C. Result analysis

This experiment compared three similarity algorithms: cosine, improved cosine, and improved weighted cosine. For each similarity algorithm, this article implements user-based similarity calculation, applies the weighting, conducts experiments on the training data, and finally calculates the MAE. Figure 2 shows the comparison of the final experimental results.

[Fig. 2: MAE of the Improved Weighted Cosine, Improved cosine, and Cosine methods; vertical axis from 0.50 to 0.82]

The experimental results show that the improved weighted cosine method proposed in this article has a clear advantage: its MAE is noticeably lower than that of the other two methods.

V. SUMMARY

This article demonstrates the shortcomings of traditional collaborative filtering algorithms and proposes an improved cosine similarity formula, adds time-threshold parameters for weighting, and re-weights the similarity results before normalizing them. There are still shortcomings, however:

A. Seasonal issues with dishes

Seasonal issues refer to the fact that users generally enjoy different dishes in different seasons: their dietary interests do not remain unchanged but change with the season.

B. Dish pairing problem

The dish pairing problem refers to the fact that, when ordering a meal, users do not always order one type of dish; rather, different types of dishes are paired with each other.

In response to the above issues, this article will conduct …