
Expert Systems With Applications 201 (2022) 116979
Contents lists available at ScienceDirect
Expert Systems With Applications

journal homepage: www.elsevier.com/locate/eswa
An attention-based convolutional neural network for recipe recommendation

Nan Jia a,d, Jie Chen b,∗, Rongzheng Wang c
a School of Information Engineering, Hebei GEO University, Shijiazhuang, China
b The First Affiliated Hospital, Guangzhou University of Chinese Medicine, Guangzhou, China
c School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
d Intelligent Sensor Network Engineering Research Center of Hebei Province, Shijiazhuang, China
ARTICLE INFO
Keywords: Recipe recommendation; Attention mechanism; Convolution neural network

ABSTRACT
The boom in cuisine websites has accumulated a wealth of recipe data, as well as interaction data between users and recipes. Based on these data, recommendation algorithms can provide users with recommendations that match their tastes. In this paper, we propose an attention-based convolutional neural network for recipe recommendation. Specifically, we use an attention mechanism to capture users' preferences for different ingredients. At the same time, we use a multi-perspectives convolutional neural network to extract higher-level user features and recipe features. Furthermore, a multi-layer neural network is used to model the interaction between users and recipes according to their features. The experimental results show that our method achieves better recommendation results than other traditional methods.
∗ Corresponding author.
E-mail addresses: jianan_0101@163.com (N. Jia), 2287550@qq.com (J. Chen), wangrzh@mail2.sysu.edu.cn (R. Wang).
https://doi.org/10.1016/j.eswa.2022.116979
Received 29 October 2021; Received in revised form 21 January 2022; Accepted 23 March 2022
Available online 4 April 2022
0957-4174/© 2022 Elsevier Ltd. All rights reserved.

1. Introduction

With the improvement of material living standards, diet has gradually shifted from subsistence consumption to development consumption, and people's demand for diversified recipes has been increasing (Ge et al., 2015; Teng et al., 2012). Traditional paper recipes can no longer meet people's needs, so people have started to look for satisfying recipes on websites (Kusmierczyk & Nørvåg, 2016; Lin et al., 2014). Cuisine websites, which have accumulated a huge number of recipes, attract a large number of users who find and share recipes (Kusmierczyk et al., 2015; Min et al., 2017a). On the one hand, the wealth of recipes on the Internet gives users more choice (Marin et al., 2019). On the other hand, it also brings users a serious information overload problem when choosing recipes, which makes it difficult or costly for them to find their favorite recipes.

There is a large amount of interaction data between users and recipes on cuisine websites (Min et al., 2017b; Wang et al., 2015). Based on these data, recommendation algorithms can be used to recommend suitable recipes to users (Lops et al., 2011; Okura et al., 2017; Park et al., 2007). However, recipe recommendation faces the following challenges: (1) It is difficult to collect explicit rating data from users, and the interaction data between users and recipes are not always representative of a user's interest in a recipe. (2) Ingredients are one of the most important factors affecting users' preferences for recipes in diet recommendation, so capturing and modeling the ingredient information in recipes is a challenge. (3) Other factors also affect users' preferences for recipes, such as cooking methods and cooking time, so how to effectively utilize this information to model users' preferences is another challenge that needs to be solved.

To address the first challenge, we collect feedback data on users' behavior of creating and sharing recipes, and ignore interaction data between users and recipes such as clicking and browsing recipes. To address the second challenge, we transfer users' interest in a recipe into interest in its ingredients, and employ the attention mechanism (Firat et al., 2016) to capture fine-grained user preferences from the food ingredients. To address the last challenge, we take advantage of a multi-perspectives convolutional neural network to extract more abstract high-level features from multiple factors (such as cooking methods and cooking time).

Different from existing recommendation methods (Freyne & Berkovsky, 2010; Teng et al., 2012), in this paper we propose a recipe recommendation algorithm based on a convolutional neural network (Ketkar, 2017) with an attention mechanism. Although existing methods (Freyne & Berkovsky, 2010; Teng et al., 2012) also take ingredients into account in recipe recommendation, we differ from them in how we model ingredients: we propose a tree structure to code each food based on the ingredients it contains. In addition, we give ingredients more weight in recipe recommendation, whereas existing methods use ingredients mostly as auxiliary information.

The experimental results show that our recipe recommendation model outperforms the other baseline models, with an HR@10 value of 0.219 and an NDCG@10 value of 0.118, and that the proposed attention mechanism and multi-perspectives convolutional neural network are effective. This paper makes the following contributions: (1) To our knowledge, our recipe recommendation method is the first to transfer user interest in a recipe into user interest in its ingredients or cooking methods. (2) We apply the attention mechanism to capture fine-grained user preferences for food. (3) The experimental results show that our method can effectively recommend recipes to users.

The rest of the paper is organized as follows. Section 2 introduces related work and discusses the major approaches to personalized recipe recommendation. Section 3 introduces the data set, including data collection and preprocessing, and ingredient and food coding. Section 4 introduces the convolutional neural network recommendation algorithm. Section 5 discusses the experimental results. Section 6 focuses on the threats that could affect the results of our study. Finally, Section 7 gives the conclusion and future work.

Table 1
The collected data set.
Datatype | Attribute
User information | author's unique identity, region, gender
Recipe information | names of dishes, ingredients, cooking steps, number of collectors, number of visitors, type of dish, difficulty, comments

Table 2
Data set statistics.
Number of users | Number of shared recipes
24,918 | 1
15,056 | 2
6,232 | >5
2,904 | >10
2. Related work

In this section, we discuss the major approaches to personalized recipe recommendation. A growing number of recipes have been created with the wide growth of food categories and the diversification of ingredient combinations. As recipes accumulate, choosing healthy and palatable recipes becomes more time-consuming. In addition, although people's choices have become richer, their judgment of the healthiness of food is often inaccurate (Brunner et al., 2001), which leads them to frequently choose unhealthy recipes. Therefore, it is of great significance to solve the information overload problem users face when choosing recipes and to recommend healthy recipes to them. With the increasing emphasis on health, research on dietary recommendation has become active.

The premise of healthy recipe recommendation is understanding the nutritional content of recipes, which has been investigated for a long time (Schneider et al., 2013; Trattner & Elsweiler, 2017). Schneider et al. (2013) collected 96 recipes from a diet blog and analyzed their health components; the results showed that the energy contained in the recipes met health standards, but the sodium and saturated fat content was excessive. A number of recent studies (Christoph et al., 2017; Trattner & Elsweiler, 2017) have collected large numbers of online recipes and analyzed the content of five nutrients according to the international health organization's standards on food composition, namely calories, saturated fat, unsaturated fat, sodium, and carbohydrate. The results showed that, with respect to these five nutrients, 5.7% of the recipes failed to meet the standard and 67.7% met the standard for only one or two of them.

Early recipe recommendation systems often relied on domain knowledge (Hammond, 1986; Hinrichs, 1989). With the development of recommender system technology, content-based and collaborative filtering recommendation algorithms have also been applied to recipe recommendation. Given the inherent characteristics of the recipe domain, it is often the fine-grained ingredients and cooking methods that determine a user's preference for a recipe. Some researchers try to identify user preferences based on the ingredients of a recipe. Freyne and Berkovsky (2010) proposed converting users' ratings of recipes into ratings of the corresponding ingredients, so as to predict users' ratings of target recipes from their ingredients. To find more useful information in recipes for recommendation, Teng et al. (2012) established a food supplement network and a food substitute network by analyzing the composition of ingredients in recipes, in order to capture the relationships between ingredients. In addition, users' online ratings and pictures of recipes can be somewhat indicative of their preferences or eating habits. Therefore, some studies add this auxiliary information into the recommendation algorithm to improve the recommendation quality. Elsweiler et al. (2017) improved recommendation quality by extracting image information as new features. Teng et al. (2012) processed user comments to establish the relationships between ingredients. With the increasing attention to health, more and more recipe recommendation studies consider dietary health. Some studies have modified the recommendation algorithm to take the healthy components of recipes into account, so as to obtain healthier recommendation results (Ge et al., 2015; Trattner & Elsweiler, 2017). Other studies recommend dietary plans to users based on the recommendation results, to ensure the overall dietary health of users (Elsweiler & Harvey, 2015).

3. Data collection and preprocessing

3.1. Data set

We use a web crawler to collect a total of 230,000 data instances from 57,193 users on the cuisine website Douguo.¹ Each user instance records the user's behavior of creating and sharing recipes. A data instance consists of user information and recipe information. User information includes the user's gender, location and other information, while recipe information includes ingredients, cooking method and other information. The detailed data records are shown in Table 1.

Data sets in the recommendation field, such as MovieLens (He et al., 2017) and medical diagnosis data (Chang et al., 2020), usually face the problem of data sparsity. To verify the sparsity of the recipe data set collected in this study, we counted the number of recipes created by each user, as shown in Table 2. The number of users who created only one recipe was 24,918, and the number of users who created more than 5 recipes was 6,232, which means that nearly 90% of users created 5 recipes or fewer. Therefore, the recipe data set we collected also suffers from serious data sparsity, which makes it difficult to capture user preferences. To address this, we follow common practice in the recommendation domain: we filter the data first and keep only users who share more than 5 recipes. Finally, our data set contains 6,232 users with a total of 49,752 records.

¹ https://www.douguo.com

3.2. Data preprocessing

The recipe data set we collected consists of recipes shared by users on a cuisine website. First, we need to identify identical recipes and distinguish different ones. However, it is tricky to identify and distinguish a recipe by its name, because users fill in recipe information irregularly and recipes have aliases. If the same dish cannot be identified, part of the information will be lost and the data will be sparse. To this end, we cluster recipes according to their ingredients and cooking methods, with the goal of aggregating identical or similar recipes into the same recipe cluster. Specifically, we first code food ingredients with tree-structured coding rules and extract the cooking methods of each recipe. Subsequently, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method is used to cluster recipes with this information.

3.2.1. Ingredients coding

First, we extracted the names and codes of all ingredients from the "Chinese Food Composition Table"² released by the Nutrition and Health Institute of the Chinese Center for Disease Control and Prevention. However, the standard food ingredient coding rules do not reflect the classification of food materials. For example, in the standard food code, category 07 is nuts and category 08 is pork. Their codes are adjacent, but there is no obvious relationship between the two categories. Hence, to better reflect the classification relationship between food materials, we use tree-structured food coding rules built on the standard food composition coding rules, as shown in Fig. 1. For example, brisket is coded as 000106: the first 0 is reserved, the second 0 means the food comes from an animal (1 means a plant), the third 0 means the food is bovine, and the last three digits, 106, identify the specific part of the cow, namely the brisket. Tree-structured coding better reflects the classification and hierarchical relationships between food materials.

Fig. 1. Tree structure food coding.

² http://www.chinanutri.cn/xxzy/

A recipe can be seen as a combination of ingredients and cooking methods, and the same ingredients can make very different dishes. For example, tomato egg-drop soup and tomato scrambled eggs use essentially the same ingredients, but they are two different dishes. Therefore, to cluster more accurately, we extracted the cooking methods of each recipe from its cooking-step descriptions. At the same time, we divide recipes into 7 types so that recipes of different types are never treated as the same recipe cluster. Note that we only distinguish bread, biscuits, soup, fried rice, cake, noodles and hot pot; recipes outside these 7 types are assigned by default to a single daily-recipe category. The recipe types and cooking methods are shown in Table 3. There are 7 recipe types and 22 cooking methods, and each dish has one or more cooking methods.

Table 3
The recipe types and cooking methods.
Recipe type | Cooking method
Bread, biscuits, soup, fried rice, hot pot, cake, noodles | fried, stir-fry, braised, stuffy, stewed, steamed, scald, boil, poach, pickle, cold mix, bake, halide, draw silk, honey sauce, sauce blasting, rinsing, baking, salt baking, boiling, dry salting, fumigation

3.2.2. DBSCAN-based food clustering

We first extract the ingredients of each recipe and encode them with the tree coding method above. Subsequently, we extract the cooking methods of each recipe by keyword matching and convert them into a 22-dimensional binary vector, where 1 means the recipe description contains the corresponding cooking method and 0 means it does not. At the same time, we extract the recipe type of each dish based on keywords and labels. Finally, we use DBSCAN to cluster the recipes based on this information.

$N_{\varepsilon}(p)$ is the set of recipes within radius $\varepsilon$ of recipe $p$ ($D$ is the recipe data set):

$$ N_{\varepsilon}(p) = \{\, q \mid q \in D,\ \mathrm{distance}(p, q) \le \varepsilon \,\} \tag{1} $$

where the distance is defined as:

$$ \mathrm{distance}(p, q) = \sum_{i=1}^{N} \left( \mathrm{ingredient}(p, i) - \mathrm{ingredient}(q, i) \right) \tag{2} $$

where $p$ and $q$ are recipes and $N$ is the number of ingredients in a recipe. To ensure that every recipe has the same number of ingredients, $N$ is fixed at 8 in this study. The function $\mathrm{ingredient}(p, i)$ returns the tree code of the $i$-th ingredient of recipe $p$. The DBSCAN-based recipe clustering is shown in Algorithm 1. Through clustering, we finally obtained 2,533 dishes.
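The clustering step defined by Eqs. (1) and (2) can be illustrated with a small pure-Python sketch of textbook DBSCAN over tree-coded recipes. This is a hedged sketch under our own assumptions, not the authors' Algorithm 1: the `Recipe` class, the example codes and the `eps` and `min_pts` values are hypothetical; each per-ingredient difference is taken in absolute value so that the distance cannot be negative; recipes of different types are given an infinite distance, which is one simple way to enforce the rule that they never share a cluster; and the 22-dimensional cooking-method vector is left out because Eq. (2) does not use it.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

NOISE = -1  # label for recipes that belong to no cluster


@dataclass(frozen=True)
class Recipe:
    recipe_type: str        # e.g. "soup", "noodles", or "daily" for the default category
    codes: Tuple[str, ...]  # tree codes of the recipe's N ingredients, e.g. "000106"


def recipe_distance(p: Recipe, q: Recipe) -> float:
    """Eq. (2) with absolute per-ingredient differences (an assumption).

    Recipes of different types are never neighbors, so they cannot share a cluster.
    """
    if p.recipe_type != q.recipe_type:
        return math.inf
    return sum(abs(int(a) - int(b)) for a, b in zip(p.codes, q.codes))


def neighborhood(data: List[Recipe], i: int, eps: float) -> List[int]:
    """Eq. (1): indices of all recipes within radius eps of data[i], itself included."""
    return [j for j, q in enumerate(data) if recipe_distance(data[i], q) <= eps]


def dbscan(data: List[Recipe], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN; returns a cluster id per recipe, or NOISE."""
    labels: List[Optional[int]] = [None] * len(data)  # None means not yet visited
    cluster = -1
    for i in range(len(data)):
        if labels[i] is not None:
            continue
        seeds = neighborhood(data, i, eps)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # data[i] is a core point: start a new cluster
        labels[i] = cluster
        k = 0
        while k < len(seeds):  # expand the cluster through density-reachable points
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighborhood(data, j, eps)
            if len(j_seeds) >= min_pts:
                seeds.extend(j_seeds)  # data[j] is also a core point
    return [NOISE if label is None else label for label in labels]


# Hypothetical recipes with N = 3 tree codes each (the paper fixes N = 8).
recipes = [
    Recipe("daily", ("000106", "010201", "010305")),
    Recipe("daily", ("000106", "010201", "010306")),  # one code differs by 1
    Recipe("soup", ("000106", "010201", "010305")),   # same codes, different type
    Recipe("daily", ("000901", "011000", "011500")),  # unrelated ingredients
]
labels = dbscan(recipes, eps=5, min_pts=2)
print(labels)  # [0, 0, -1, -1]
```

Because codes that share a long prefix in the coding tree are numerically close, a small `eps` roughly groups recipes whose ingredients differ only near the leaves of the tree, which is the property the tree coding is meant to provide.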
4. Recipe recommendation model

In this section, we first introduce the overall framework of the attention-based convolutional neural network recommendation algorithm proposed in this paper, and then elaborate on the attention layer and the multi-perspectives convolutional neural network layer of the model. Furthermore, we explain the probabilistic basis and the training process of the model.

4.1. Overview

The attention-based convolutional neural network recommendation model is a layered neural network model that can model the complex interactions between users and recipes. It mainly consists of an input layer, an embedding layer, an attention layer, a convolutional layer and a multi-layer perceptron layer. The general framework of the model is shown in Fig. 2.

Fig. 2. Overview of the recipe recommendation model.

The bottom layer of the model is the input layer, which includes the user vector, the recipe vector, user personal information and recipe information. Specifically, we use $v_u$ to represent the user vector, which is the one-hot encoding of the user's unique identifier. We use $v_i$ to represent the recipe vector, which is a binary vector over ingredients: a 1 or 0 means that the recipe has or does not have the corresponding ingredient. The user's personal information includes gender, age and location, while the recipe information includes cooking time and difficulty. To keep the model generic, user personal information and recipe information are optional inputs to the model.

The input layer is followed by the embedding layer, which maps the sparse input vectors into dense vectors in a low-dimensional feature space. Traditional recommendation models such as matrix factorization decompose the interaction matrix into two small matrices, which can be regarded as representations of users and items in a low-dimensional feature space. Analogously, the embedding layer can be regarded as learning the feature space in which users and recipes are represented. It can also be viewed as a fully connected layer whose input is the sparse user or item vector and whose output is the trained weights of each node. Specifically, for an interaction $\langle u, i \rangle$ between a user and a recipe, the user's one-hot vector is $v_u \in \mathbb{R}^{|U| \times 1}$ and the recipe vector is $v_i \in \mathbb{R}^{|I| \times 1}$. The matrix of the users' low-dimensional feature space is denoted $P \in \mathbb{R}^{|U| \times d}$, and the matrix of the food materials' low-dimensional feature space is denoted $Q \in \mathbb{R}^{|I| \times d}$, where $d$ is the dimension of the low-dimensional feature space of users and food materials, and $|U|$ and $|I|$ are the total numbers of users and food materials, respectively. Therefore, the user vector $p_u$ produced by the embedding layer can be expressed as:

$$ p_u = P^{T} v_u \tag{3} $$

Similarly, after the embedding layer, the recipe vector $q_i$ (which contains multiple ingredient vectors $q_{ij}$) can be expressed as:

$$ q_i = Q^{T} v_i \tag{4} $$

The embedding layer is followed by the attention layer, which captures the user's preferences for different ingredients. A recipe often contains a variety of ingredients, and users have different preferences for different ingredients, so we use the attention mechanism to capture these preferences. Specifically, the input of the attention layer is the set of ingredient vectors obtained from the embedding layer. The attention layer assigns each ingredient a weight, and its final output is the weighted sum of the ingredient vectors. We discuss this layer in detail in Section 4.2.

The attention layer is followed by the convolutional layer. To further extract higher-level user and recipe vectors, we propose a Multiple Perspectives Convolutional Neural Network (MP-CNN). This network models the user or recipe vector from multiple perspectives and uses a convolutional neural network to extract high-level features of the user or recipe. We introduce MP-CNN in detail in Section 4.3.

The final layer of the model is the multi-layer perceptron layer, whose inputs are the user vector and recipe vector output by the layers above, together with the user information and recipe information. This layer models the interaction between users and recipes through multi-layer perceptrons that combine the user and recipe information. Its output is the user's preference score for the recipe, $\hat{y}_{ui}$. In summary, the attention-based convolutional neural network can be expressed mathematically as:

$$ \hat{y}_{ui} = f\left(P^{T} v_u, Q^{T} v_i \mid P, Q, \Theta_f\right) \tag{5} $$

where $P \in \mathbb{R}^{M \times K}$ and $Q \in \mathbb{R}^{N \times K}$ are the latent matrices of users and items, $\Theta_f$ denotes the model parameters, and the function $f$ represents the entire neural network, defined as:

$$ f\left(P^{T} v_u, Q^{T} v_i\right) = \phi_{out}\left(\phi_n\left(\cdots \phi_1\left(P^{T} v_u, Q^{T} v_i\right) \cdots\right)\right) \tag{6} $$

where $\phi_{out}$ is the mapping function of the output layer and $\phi_n$ is the mapping function of the $n$-th neural network layer.

4.2. Attention layer

In general, a Chinese dish consists of ingredients and cooking methods. A vector of ingredients and cooking methods carries more information for identifying a recipe than a one-hot vector encoding the recipe's unique identifier. However, Chinese recipes often contain a variety of ingredients, including main ingredients, auxiliary ingredients and seasonings, and each ingredient plays a different role in the recipe. Take yu-fragrant shredded pork as an example: its ingredients include pork, black fungus, carrot, garlic, ginger, vinegar, sugar, etc. Among them, pork, black fungus and carrot may be more important than garlic and ginger. At the same time, users have different preferences for the ingredients in a recipe. To capture the importance of each ingredient in a recipe, we use the attention mechanism.

Specifically, the input of the attention layer is the set of ingredient vectors obtained from the embedding layer. The attention layer assigns each ingredient a weight, and its final output is the weighted sum of the ingredient vectors. The inputs and outputs of the attention layer are shown in Fig. 3. First, the embedding layer maps the user's one-hot code and the recipe to the user's latent vector $p_u$ and the latent vectors $q_{ij}$ of the ingredients in the recipe.
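The embedding lookup of Eq. (3) and the attention pooling described in this section can be illustrated with a small pure-Python sketch. It is a hypothetical illustration rather than the authors' implementation: it assumes each ingredient vector $q_{ij}$ is the corresponding row of $Q$, uses the two-layer ReLU scoring network given in Eq. (7), normalizes the scores with a softmax over the recipe's ingredients as in Eq. (8), and returns the weighted sum of Eq. (9). All function names, dimensions and weight values are made up for the example; in the model these weights would be learned.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: Matrix[row][col]


def one_hot(index: int, size: int) -> Vector:
    """One-hot vector v_u of length |U| with a 1 at the user's index."""
    return [1.0 if k == index else 0.0 for k in range(size)]


def transpose_times(P: Matrix, v: Vector) -> Vector:
    """Eq. (3): compute P^T v, where P has shape (|U|, d) and v has length |U|."""
    d = len(P[0])
    return [sum(P[r][col] * v[r] for r in range(len(P))) for col in range(d)]


def linear(W: Matrix, x: Vector) -> Vector:
    """Matrix-vector product W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attention_pool(p_u: Vector, ingredients: List[Vector], W_u: Matrix, W_i: Matrix,
                   b: Vector, w1: Vector, c: float) -> Tuple[Vector, Vector]:
    """Return (normalized attention weights, pooled recipe vector q_i)."""
    scores = []
    for q in ingredients:
        # Eq. (7): two-layer scoring network with a ReLU hidden layer
        pre = [a + e + bias for a, e, bias in zip(linear(W_u, p_u), linear(W_i, q), b)]
        hidden = [max(0.0, h) for h in pre]
        scores.append(sum(w * h for w, h in zip(w1, hidden)) + c)
    # Eq. (8): softmax over the recipe's ingredients (max subtracted for stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Eq. (9): attention-weighted sum of the ingredient vectors
    dim = len(ingredients[0])
    q_i = [sum(a * q[k] for a, q in zip(alphas, ingredients)) for k in range(dim)]
    return alphas, q_i


# Toy example: 3 users and 3 ingredients, embedding size d = 2 (all values made up).
P = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
p_u = transpose_times(P, one_hot(1, 3))  # selects row 1 of P: [0.3, 0.4]
recipe_ingredients = [Q[0], Q[2]]        # the recipe uses ingredients 0 and 2
identity = [[1.0, 0.0], [0.0, 1.0]]
alphas, q_i = attention_pool(p_u, recipe_ingredients, W_u=identity, W_i=identity,
                             b=[0.0, 0.0], w1=[1.0, 1.0], c=0.0)
print([round(a, 3) for a in alphas])     # [0.269, 0.731]
```

Multiplying a transposed embedding matrix by a one-hot vector simply selects one row, which is why embedding layers are usually implemented as table lookups rather than full matrix products.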
Fig. 3. Attention layer network diagram.

The vectors $p_u$ and $q_{ij}$ are then taken as inputs of the attention layer to obtain the weight of each ingredient, namely the attention score $\alpha(u, j)$. The attention layer can be regarded as a two-layer neural network, and the attention score is computed as:

$$ \alpha(u, j) = w_1^{T} \phi\left(w_u p_u + w_i q_{ij} + b\right) + c \tag{7} $$

where $w_u$, $w_i$ and $b$ are the parameters of the first network layer, $w_1$ and the bias $c$ are the parameters of the second network layer, and $\phi$ is the ReLU function. After the attention score of each ingredient is calculated, the scores are normalized with a softmax, where the right-hand side uses the raw scores from Eq. (7):

$$ \alpha(u, j) = \frac{\exp(\alpha(u, j))}{\sum_{n \in R(j)} \exp(\alpha(u, n))} \tag{8} $$

Finally, the recipe vector $q_i$ is obtained by combining the ingredient vectors weighted by their attention scores:

$$ q_i = \sum_{j=1}^{\left|\{q_{i*}\}\right|} \alpha(u, j) \cdot q_{ij} \tag{9} $$

4.3. Multi-perspectives convolutional neural network

The embedding layer maps sparse user and recipe vectors to the corresponding low-dimensional feature space, whose dimension equals the output dimension of the embedding layer. In general, the larger the dimension of the feature space, the more information it contains and the better the feature vectors can represent users or recipes. However, a high-dimensional feature space requires more complex models on the one hand, and may introduce more invalid features on the other. To solve this problem, we propose a multi-perspectives convolutional neural network to extract higher-level user and recipe vectors.

The multi-perspectives convolutional neural network proposed in this paper was inspired by the multi-head attention proposed by Vaswani et al. (2017). Multi-head attention first applies linear transformations to the input vectors and then performs dot-product attention h times to form multiple heads, one head per dot-product attention. Multi-head attention has the advantage of allowing the model to learn relevant information in different representation subspaces.

Fig. 4 shows the proposed multi-perspectives convolutional neural network. Its input is either the user vector $p_u$ from the embedding layer or the recipe vector $q_i$ from the attention layer. The layer above the input layer is a linear layer, which converts the input vector ($p_u$, for example) into multiple vectors $p_u^1, p_u^2, \ldots, p_u^n$:

$$ p_u^n = w_n p_u \tag{10} $$

where $p_u^n$ is the vector generated by the $n$-th linear transformation of the input vector $p_u$, and $w_n$ is the parameter of that linear transformation. On top of the linear transformation layer are several residual network layers, through which a deeper neural network can be built to enhance the learning ability of the model. The residual network is expressed as:

$$ p_L = p_l + \sum_{i=l}^{L-1} F(p_i, w_i) \tag{11} $$

where $p_l$ is the output of residual layer $l$, $p_L$ is the output of residual layer $L$, $w_i$ is the parameter of residual layer $i$, and $F$ is the residual function.

Fig. 4. Multi-perspectives convolutional neural network diagram.

After the input vector passes through the multiple linear transformations and residual networks, multiple feature vectors are obtained, denoted $\{x_u^1, x_u^2, \ldots, x_u^n\}$. These $n$ vectors can be regarded as abstract features extracted from different views of the input vector. Moreover, they better represent the high-level features of users or items, so we use a convolutional neural network to extract still higher-level features from $\{x_u^1, x_u^2, \ldots, x_u^n\}$.

4.4. Training algorithm

The parameters of the model are trained by minimizing an objective function. Many studies use the mean squared error as the objective function:

$$ L_{sqr} = \sum_{(u,i) \in y \cup y^{-}} w_{ui} \left( y_{ui} - \hat{y}_{ui} \right)^2 \tag{12} $$

where $y$ is the set of observed interactions, $y^{-}$ is the set of negative samples, $w_{ui}$ is the training weight of the instance of user $u$ and item $i$, $y_{ui}$ is the true value, and $\hat{y}_{ui}$ is the predicted value. Using the mean squared error amounts to assuming that the observations follow a normal distribution. Since $y_{ui}$ in our implicit data is only 0 or 1, the mean squared error is not an appropriate objective. Our training task can be regarded as a binary classification task, so we use the cross entropy as the objective function:

$$ L = -\sum_{(u,i) \in y \cup y^{-}} \left( y_{ui} \log \hat{y}_{ui} + (1 - y_{ui}) \log\left(1 - \hat{y}_{ui}\right) \right) \tag{13} $$

Furthermore, we use stochastic gradient descent to minimize the objective function.

5. Experiment and result discussion

In this section, we first introduce the evaluation metrics and parameter settings of the experiment, and then carry out validation experiments to answer the following questions.
RQ1: Does our recipe recommendation model perform better than other models?
RQ2: Does the attention mechanism we proposed for the ingredients work?
RQ3: Is the proposed multi-perspectives convolutional neural network effective?

We ask RQ1 to evaluate the performance of the proposed approach compared with existing recommendation algorithms, such as ItemPop, ItemKNN and BPR. We ask RQ2 to evaluate whether the attention mechanism applied to the ingredients works. We ask RQ3 to evaluate the impact of the proposed multi-perspectives convolutional neural network (i.e., MP-CNN); for this, we compare the model with and without MP-CNN. The three experiments complement each other to verify the validity of the proposed model.

5.1. Evaluation criteria and experiment setting

To assess the effectiveness of recipe recommendation, we use 10-fold cross-validation as the evaluation method. That is, for each user, we randomly select one interaction with a recipe for the test set and use the remaining nine folds of data for model training. Hit Ratio (HR) (A et al., 2012) and Normalized Discounted Cumulative Gain (NDCG) (Busa-Fekete et al., 2012) are used as the metrics to evaluate recommendation performance. HR directly measures whether the recipe in the test set appears in the recommendation list:

$$ HR@K = \frac{hits@K}{n} \tag{14} $$

where $n$ is the total number of test samples and $hits@K$ is the number of test samples that appear in the top-$K$ recommendation list. HR thus indicates how many test samples appear in the recommendation list, but not the quality of the ranking within the list. To address this, we use NDCG, which evaluates the quality of a recommendation list by giving higher-ranked positions a higher weight:

$$ NDCG@K = Z_K \sum_{i=1}^{K} \frac{2^{r_i} - 1}{\log_2(i + 1)} \tag{15} $$

where $Z_K$ is a normalization coefficient that keeps the result within [0, 1], and $r_i$ is 1 if the sample at the $i$-th position in the list is in the test set and 0 otherwise.

To verify the performance of the attention-based convolutional neural network model, we chose three types of mainstream recommendation algorithms for comparison, namely item-based recommendation algorithms (i.e., ItemPop (Linden et al., 2003) and ItemKNN (Hu et al., 2008)), matrix factorization-based recommenda-

• Eals: element-wise alternating least squares. Eals is an implicit feedback recommendation algorithm based on matrix factorization. Compared with plain matrix factorization, it optimizes the objective function and assigns different weights to negative samples.
• NeuCF: neural collaborative filtering, which combines matrix factorization with a multi-layer perceptron to model the interaction between users and items, and achieves good recommendation results.

5.2. RQ1

To answer RQ1, we compared the recommendation performance of our model with the other benchmark models under different conditions. Fig. 5 shows the performance of our model and the benchmark models with different dimensions [8, 16, 32, 64]. For our model and the NeuMF model, the dimension refers to the embedding layer dimension; for the BPR and Eals models, it refers to the hidden layer dimension; for the ItemKNN model, we try different K values and select the best result. We use HR@10 and NDCG@10 to evaluate model performance. The experimental results show that our method achieved the best results, with an HR@10 value of 0.219 and an NDCG@10 value of 0.118. Compared with the Eals and BPR models, our model's results improved significantly, by 2.5% and 2.7% respectively. This indicates that our model has better expressive ability and can model the interaction patterns between users and recipes well. As the dimension of our model grows, so does its performance. This may be because user features and recipe features are complex and require a larger feature space to represent users and recipes fully.

Fig. 6 shows the recommendation performance of our model and the benchmark models for top-K recommendation, with K increasing from 10 to 100. The results show that, compared with other methods, the attention-based convolutional neural network model proposed in this paper achieves the best results for all top-K recommendations. In addition, we carried out paired t-tests on the experimental results; p < 0.05 indicated that the performance of our model was statistically significantly different from that of the other benchmark models, that is, our model improved on the other models.

Our model uses a series of interactions between users and recipes to mine and identify user preferences. To investigate the impact of the number of recipes a user has interacted with on our model, we extracted users with different numbers of interactions and tested the model on them. It should be emphasized that we did not retrain the model for user data with different interaction quantities; we only extracted user sets with different numbers of recipe interactions and tested the trained model on each set. The experimental results are shown in Fig. 7. The
tion algorithm(i.e., BPR (Hu et al., 2020) and Eals (Juan et al., 1997)),
experimental results show that the recommendation performance of our
neural network based recommendation algorithm (i.e., NeuCF (Mai
model is superior to other models in all user sets with different number
et al., 2009)). To cover the three types of recommendation algorithm,
of recipe interactions. This proves that our model has good robustness.
we choose the following benchmark algorithms for comparison:
In addition, our model performs better than other models in data
• ItemPop: According to the popularity of the recipes, the most sets with less interaction with recipes, which also demonstrates that
popular recipes are recommended for users. The popularity of attention mechanism and multi-perspectives convolutional network can
recipes is measured by the number of people interacting with be used to improve recommendation performance when training data
recipes. is insufficient.
• ItemKNN: A collaborative filtering algorithm based on items. In
order to adapt the item-based collaborative filtering algorithm to 5.3. RQ2
the recommendation of implicit feedback data, we carried out
experiments by referring to the settings of Hu et al. (2008). In order to better understand the convolutional neural network
• BPR: Bayesian personalized sorting algorithm. BPR is a sorting model based on attention mechanism proposed in this study, we further
algorithm based on matrix decomposition. It optimizes the sorting verify the effectiveness of the attention mechanism in the model. It
through a series of interactions between users and items, so as to should be noted that when the attention mechanism is not used, each
recommend the optimal sorting results for users. food material vector has the same weight.
6
Fig. 5. Model performance in different dimensions.
Fig. 6. Model performance with different K values.
Fig. 7. Model performance in different interaction sparsity.
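The RQ2 ablation contrasts uniform averaging of ingredient vectors (AVG) with attention-based weighting. The sketch below illustrates that contrast in plain Python. It is not the paper's attention layer: the scoring function (a scaled dot product between a user vector and each ingredient vector, normalized with softmax), the function names `average_pool` and `attention_pool`, and the toy vectors are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_pool(ingredients):
    """AVG: every ingredient vector receives the same weight, 1/m."""
    m = len(ingredients)
    dim = len(ingredients[0])
    return [sum(vec[d] for vec in ingredients) / m for d in range(dim)]


def attention_pool(user, ingredients):
    """Hypothetical user-conditioned attention over ingredient vectors.

    Assumption: scores are scaled dot products between the user vector and
    each ingredient vector; the paper's own score function may differ.
    Returns the pooled recipe vector and the per-ingredient weights.
    """
    dim = len(user)
    scores = [sum(u * x for u, x in zip(user, vec)) / math.sqrt(dim)
              for vec in ingredients]
    weights = softmax(scores)
    pooled = [sum(w * vec[d] for w, vec in zip(weights, ingredients))
              for d in range(dim)]
    return pooled, weights


# Toy example (illustrative values only).
user = [1.0, 0.0]
ingredients = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(average_pool(ingredients))          # -> [0.5, 0.5]
pooled, weights = attention_pool(user, ingredients)
print([round(w, 3) for w in weights])     # largest weight on the first ingredient
```

When all scores are equal (for example, with a zero user vector), the softmax weights are all 1/m and `attention_pool` reduces to `average_pool`, which corresponds to the AVG setting of the ablation.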
With equal weights, the attention layer is equivalent to the plain average of all the ingredient vectors (AVG). To better verify the effectiveness of the attention mechanism, we evaluated both an ordinary attention mechanism (ATT) and a self-attention mechanism (Self-ATT). Table 4 shows the results.

When the self-attention mechanism is used, the model achieves the best recommendation performance, with an HR@10 of 0.219 and an NDCG@10 of 0.118. Recommendation performance is worst when the attention layer only averages the ingredient vectors: every ingredient then has the same weight, so averaging deliberately ignores the influence of individual ingredients on the user's preference for recipes. Both the self-attention mechanism and the ordinary attention mechanism outperform averaging. This indicates that the attention mechanism proposed in this study is effective and can capture a user's different preferences for the different ingredients in a recipe. In this study, we therefore use the self-attention mechanism as the attention layer.

Table 4
The performance of models corresponding to different attention layers.
Model      Attention layer   HR@10   NDCG@10
OurModel   AVG               0.198   0.096
OurModel   ATT               0.217   0.117
OurModel   Self-ATT          0.219   0.118

5.4. RQ3

To verify the effectiveness of the multi-perspectives convolutional neural network proposed in this paper, we first compare the model with and without this network; the results are shown in Table 5. Adding the multi-perspectives convolutional neural network (MP-CNN) increases the HR@10 of the model from 0.196 to 0.219 and its NDCG@10 from 0.100 to 0.118, which indicates that this network is effective. To verify its effectiveness further, we test the influence of different network parameters on the model; the results are shown in Tables 6 and 7. Performance generally increases with the number of linear transformations and the number of residual network layers. The model performs best with 8 linear transformations and 12 residual network layers, where HR@10 reaches 0.219 and NDCG@10 reaches 0.118. Even when the number of linear transformations is held fixed, stacking more residual network layers generally improves performance. This further demonstrates the effectiveness of the multi-perspectives convolutional neural network, which can, to some extent, extract higher-level features of users or ingredients.

Table 5
The performance of models corresponding to different convolutional layers.
Model      Convolution layer   HR@10   NDCG@10
OurModel   – (none)            0.196   0.100
OurModel   MP-CNN              0.219   0.118

Table 6
HR@10 value of the model with different parameters (rows: number of linear transformations; columns: number of residual network layers).
        1       4       8       12
1       0.189   0.198   0.203   0.205
4       0.195   0.204   0.211   0.214
8       0.191   0.208   0.213   0.219
12      0.192   0.209   0.215   0.213

Table 7
NDCG@10 value of the model with different parameters (rows: number of linear transformations; columns: number of residual network layers).
        1       4       8       12
1       0.096   0.105   0.108   0.112
4       0.098   0.106   0.110   0.113
8       0.099   0.108   0.115   0.118
12      0.099   0.109   0.113   0.115

6. Discussion

6.1. Threats to validity

In this section, we discuss the threats that could affect the results of our study. One threat to validity relates to the scale of the data set used for studying recipe recommendation. We collected 57,193 users with a total of 230,000 recipes from the website Douguo. Since we need to capture the interactions between users and recipes, we filtered out users who share fewer than 5 recipes. This mitigates data sparsity and lets us better capture user preferences. After filtering, the data set contains 6,232 users with a total of 49,752 recipes. In the future, we need to collect more users and recipes to extend our data set.

Another threat concerns the suitability of our evaluation measures. We use conventional measures to evaluate the effectiveness of the models when recommending recipes to users. Because the problem in this study can be modeled as a recommendation problem, we use Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) to evaluate the performance of the proposed model. HR indicates how many samples from the test set appear in the recommendation list, and NDCG evaluates the quality of a recommendation list by giving higher weight to the top positions. Together, the two metrics evaluate the effectiveness of a recommendation model. Thus, we believe there is little threat to the suitability of our evaluation measures.

6.2. The goal of this study

Just as an e-commerce website recommends goods we may want to buy according to our previous shopping information, a cuisine website can recommend recipes for us to choose from. A cuisine site that uses a recommendation algorithm benefits in two ways. First, the algorithm addresses information overload for users looking for their favorite recipes, since many cuisine websites have accumulated a huge number of recipes. Second, with the help of a recommendation algorithm, a cuisine site becomes more intelligent and better able to understand the needs of its users, which increases user stickiness and is conducive to the long-term operation of the site. Therefore, the goal of this research is for the proposed recommendation algorithm to help cuisine sites recommend more suitable recipes to users.

6.3. Cross validation discussion

To minimize the influence of sample randomness on the experimental results, we employ 10-fold cross-validation in the experiments. In 10-fold cross-validation (Xing et al., 2019), the data set is randomly divided into 10 parts, 9 of which are used for training and 1 for testing. This process is repeated 10 times, with a different part used as test data each time. For comparison, we also apply 5-fold cross-validation (4 parts for training and 1 for testing), which is also used in many studies (Yuan et al., 2022). The performance of the model under 10-fold and 5-fold cross-validation differs only in the third decimal place (see Table 8).
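The fold construction described in Section 6.3 can be sketched as follows. This is a minimal illustration, assuming that the unit being split is an individual user-recipe interaction record; the helper names `k_fold_splits` and `cross_validate` and the `evaluate` callback (standing in for training and scoring a recommender) are hypothetical.

```python
import random


def k_fold_splits(records, k, seed=0):
    """Yield (train, test) pairs for k-fold cross-validation.

    The records are shuffled once and dealt into k near-equal parts; each
    part is used as the test set exactly once, and the remaining k - 1
    parts form the training set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, folds[i]


def cross_validate(records, k, evaluate, seed=0):
    """Average a metric over the k folds.

    `evaluate(train, test)` is a hypothetical callback that would train the
    model on `train` and return a metric such as HR@10 on `test`.
    """
    scores = [evaluate(train, test)
              for train, test in k_fold_splits(records, k, seed)]
    return sum(scores) / len(scores)


# Usage with placeholder (user_id, recipe_id) pairs.
records = [(u, r) for u in range(4) for r in range(5)]   # 20 interactions
for k in (10, 5):
    sizes = [(len(tr), len(te)) for tr, te in k_fold_splits(records, k)]
    print(k, sizes[0])   # 10 -> (18, 2); 5 -> (16, 4)
```

Dealing the shuffled records round-robin (`shuffled[i::k]`) keeps the fold sizes within one record of each other even when the number of records is not divisible by k.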
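Table 8 reports HR@10 and NDCG@10 as defined in Eqs. (14) and (15). A minimal sketch of both metrics follows, assuming the per-user hold-out of Section 5.1, where each user has exactly one test recipe. Under that assumption the ideal list places the test recipe first, so the ideal DCG is $(2^1 - 1)/\log_2 2 = 1$. Taking $Z_K$ as the reciprocal of the ideal DCG (the usual convention, assumed here), NDCG@K for one user reduces to $1/\log_2(p + 1)$ when the test recipe is ranked at position $p \le K$, and to 0 otherwise. The function names and toy rankings are illustrative.

```python
import math


def hit_ratio_at_k(ranked_lists, held_out, k):
    """Eq. (14): hits@K / n, the share of test cases whose held-out recipe
    appears in the top-k of the corresponding ranked list."""
    hits = sum(1 for ranked, target in zip(ranked_lists, held_out)
               if target in ranked[:k])
    return hits / len(held_out)


def ndcg_at_k(ranked, relevant, k):
    """Eq. (15) for one ranked list.

    r_i = 1 if the item at position i is in the test set, else 0.
    Z_K is taken as 1 / IDCG, the DCG of an ideal list (assumed convention).
    """
    dcg = sum((2 ** (1 if item in relevant else 0) - 1) / math.log2(i + 1)
              for i, item in enumerate(ranked[:k], start=1))
    ideal_hits = min(len(relevant), k)
    idcg = sum(1 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_ndcg_at_k(ranked_lists, held_out, k):
    """Average NDCG@k over users, each with a single held-out recipe."""
    total = sum(ndcg_at_k(ranked, {target}, k)
                for ranked, target in zip(ranked_lists, held_out))
    return total / len(held_out)


# Toy rankings for three users (hypothetical recipe IDs).
ranked_lists = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
held_out = ["a", "f", "z"]   # hit at rank 1, hit at rank 3, miss
print(hit_ratio_at_k(ranked_lists, held_out, 3))   # 0.6666666666666666
print(mean_ndcg_at_k(ranked_lists, held_out, 3))   # 0.5
```

The example shows why both metrics are reported: the second user's hit counts fully toward HR@3 but contributes only 0.5 to NDCG@3 because it is ranked third.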
Table 8
The performance of the model using different cross-validation settings.
Cross validation   HR@10   NDCG@10
10-fold            0.219   0.118
5-fold             0.218   0.115

7. Conclusion and future work

In this paper, we propose a multi-perspectives convolutional neural network with an attention mechanism for recipe recommendation. By applying the attention mechanism to the vectorized recipes via their ingredients, we transfer the user's interest in recipes into interest in ingredients, so that the proposed model learns each user's preference weights for ingredients. In the experiments, we crawled 230,000 interaction records of users and recipes from the website Douguo and carried out experiments on these data. The results show that the proposed model outperforms the other baseline models, and that the proposed attention mechanism and multi-perspectives convolutional neural network are effective.

Further work is in progress. One direction focuses on improving the performance of the proposed model by expanding the interaction records of users and recipes. Another is to look for opportunities to apply the proposed model in a real-world scenario, such as the website Douguo. In addition, we will try to find an external validation dataset to validate the effectiveness of the proposed method.

CRediT authorship contribution statement

Nan Jia: Methodology, Validation, Writing – original draft. Jie Chen: Investigation, Data curation, Software, Resources. Rongzheng Wang: Methodology, Software, Resources.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (No. 61902105, No. 62172452), the fund of National Clinical Research Base of Traditional Chinese Medicine (No. [2018]131), Sanming Project of Medicine in Shenzhen, China (No. 202106006), and Software and Big Data Innovation of Shijiazhuang Key R&D Plan (No. 219790381G).

References

A, W. A., A, S. M. S., & B, A. S. I. (2012). Intelligent web proxy caching approaches based on machine learning techniques. Decision Support Systems, 53(3), 565–579.
Brunner, E., Stallone, D., Juneja, M., Bingham, S., & Marmot, M. (2001). Dietary assessment in Whitehall II: comparison of 7 d diet diary and food-frequency questionnaire and validity against biomarkers. British Journal of Nutrition, 86(3), 405–414.
Busa-Fekete, R., Szarvas, G., Élteto, T., & Kégl, B. (2012). An apple-to-apple comparison of learning-to-rank algorithms in terms of normalized discounted cumulative gain. In ECAI-12 Workshop.
Chang, W., Zhang, Q., Fu, C., Liu, W., & Lu, J. (2020). A cross-domain recommender system through information transfer for medical diagnosis. Decision Support Systems, 143(21), Article 113489.
Christoph, T., David, E., & Simon, H. (2017). Estimating the healthiness of internet recipes: A cross-sectional study. Frontiers in Public Health, 5(5).
Elsweiler, D., & Harvey, M. (2015). Towards automatic meal plan recommendations for balanced nutrition. In H. Werthner, M. Zanker, J. Golbeck, & G. Semeraro (Eds.), Proceedings of the 9th ACM conference on recommender systems (pp. 313–316). ACM, URL https://dl.acm.org/citation.cfm?id=2799665.
Elsweiler, D., Trattner, C., & Harvey, M. (2017). Exploiting food choice biases for healthier recipe recommendation. In N. Kando, T. Sakai, H. Joho, H. Li, A. P. de Vries, & R. W. White (Eds.), Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval (pp. 575–584). ACM, http://dx.doi.org/10.1145/3077136.3080826.
Firat, O., Cho, K., & Bengio, Y. (2016). Multi-way, multilingual neural machine translation with a shared attention mechanism.
Freyne, J., & Berkovsky, S. (2010). Recommending food: Reasoning on recipes and ingredients. In P. D. Bra, A. Kobsa, & D. N. Chin (Eds.), Lecture Notes in Computer Science: vol. 6075, User Modeling, Adaptation, and Personalization, 18th International Conference (pp. 381–386). Springer, http://dx.doi.org/10.1007/978-3-642-13470-8_36.
Ge, M., Ricci, F., & Massimo, D. (2015). Health-aware food recommender system. In H. Werthner, M. Zanker, J. Golbeck, & G. Semeraro (Eds.), Proceedings of the 9th ACM Conference on Recommender Systems (pp. 333–334). ACM, URL https://dl.acm.org/citation.cfm?id=2796554.
Hammond, K. J. (1986). CHEF: a model of case-based planning. In T. Kehler (Ed.), Proceedings of the 5th National Conference on Artificial Intelligence (pp. 267–271). Morgan Kaufmann, URL http://www.aaai.org/Library/AAAI/1986/aaai86-044.php.
He, X., Liao, L., Zhang, H., Nie, L., & Chua, T. S. (2017). Neural collaborative filtering. In The 26th international conference.
Hinrichs, T. R. (1989). Strategies for adaptation and recovery in a design problem solver. In Proceedings of the workshop on case-based reasoning.
Hu, Y., Koren, Y., & Volinsky, C. (2008). Collaborative filtering for implicit feedback datasets. In Proceedings of the 8th IEEE international conference on data mining (pp. 263–272). IEEE Computer Society, http://dx.doi.org/10.1109/ICDM.2008.22.
Hu, Y., Xiong, F., Pan, S., Xiong, X., & Chen, H. (2020). Bayesian personalized ranking based on multiple-layer neighborhoods. Information Sciences, 542.
Juan, A. D., Vander Heyden, Y., Tauler, R., & Massart, D. L. (1997). Assessment of new constraints applied to the alternating least squares method. Analytica Chimica Acta.
Ketkar, N. (2017). Convolutional neural networks. Springer International Publishing.
Kusmierczyk, T., & Nørvåg, K. (2016). Online food recipe title semantics: Combining nutrient facts and topics. In CIKM '16, Proceedings of the 25th ACM international on conference on information and knowledge management (pp. 2013–2016). New York, NY, USA: Association for Computing Machinery, http://dx.doi.org/10.1145/2983323.2983897.
Kusmierczyk, T., Trattner, C., & Nørvåg, K. (2015). Temporality in online food recipe consumption and production. In Proc. 24th int. conf. world wide web (pp. 55–56).
Lin, C. J., Kuo, T. T., & Lin, S. D. (2014). A content-based matrix factorization model for recipe recommendation. In Pacific-Asia conference on knowledge discovery and data mining.
Linden, G., Smith, B., & York, J. (2003). Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7(1), 76–80.
Lops, P., Gemmis, M., & Semeraro, G. (2011). Content-based recommender systems: State of the art and trends. In Recommender systems handbook.
Mai, J., Fan, Y., & Shen, Y. (2009). A neural networks-based clustering collaborative filtering algorithm in E-commerce recommendation system. In International conference on web information systems and mining.
Marin, J., Biswas, A., Ofli, F., Hynes, N., & Torralba, A. (2019). Recipe1M+: A dataset for learning cross-modal embeddings for cooking recipes and food images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PP(99), 1.
Min, W., Bao, B. K., Mei, S., Zhu, Y., Rui, Y., & Jiang, S. (2017). You are what you eat: Exploring rich recipe information for cross-region food analysis. IEEE Transactions on Multimedia, 1.
Min, W., Jiang, S., Wang, S., Sang, J., & Mei, S. (2017). A delicious recipe analysis framework for exploring multi-modal recipes with various attributes. In The 2017 ACM.
Okura, S., Tagami, Y., Ono, S., & Tajima, A. (2017). Embedding-based news recommendation for millions of users. In The 23rd ACM SIGKDD international conference.
Park, M. H., Hong, J. H., & Cho, S. B. (2007). Location-based recommendation system using Bayesian user's preference model in mobile devices. In International conference on ubiquitous intelligence and computing.
Schneider, E. P., Mcgovern, E. E., Lynch, C. L., & Brown, L. S. (2013). Do food blogs serve as a source of nutritionally balanced recipes? An analysis of 6 popular food blogs. Journal of Nutrition Education and Behavior, 45(6), 696–700.
Teng, C., Lin, Y., & Adamic, L. A. (2012). Recipe recommendation using ingredient networks. In N. S. Contractor, B. Uzzi, M. W. Macy, & W. Nejdl (Eds.), Web science 2012 (pp. 298–307). ACM, http://dx.doi.org/10.1145/2380718.2380757.
Trattner, C., & Elsweiler, D. (2017). Investigating the healthiness of internet-sourced recipes: Implications for meal planning and recommender systems. In R. Barrett, R. Cummings, E. Agichtein, & E. Gabrilovich (Eds.), Proceedings of the 26th international conference on world wide web (pp. 489–498). ACM, http://dx.doi.org/10.1145/3038912.3052573.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems 30: annual conference on neural information processing systems 2017 (pp. 5998–6008).
Wang, X., Kumar, D., Thome, N., Cord, M., & Precioso, F. (2015). Recipe recognition with large multimodal food dataset. IEEE Computer Society.
Xing, H., Ge, L., Xin, X., David, L., & Zhi, J. (2019). Deep code comment generation with hybrid lexical and syntactical information. Empirical Software Engineering, 25(3), 2179–2217.
Yuan, H., Xingjian, L., Zhihao, C., Nan, J., Xiapu, L., Xiangping, C., Zibin, Z., & Xiaocong, Z. (2022). Reviewing rounds prediction for code patches. Empirical Software Engineering, 27(1).