Sharma 2021
https://doi.org/10.1007/s10639-021-10643-8
Abstract
Web recommendation systems are widely used to overcome product overload on
e-commerce websites. Among the various filtering algorithms, Collaborative
Filtering and Content Based Filtering are the most popular recommendation
approaches. Despite their popularity, these approaches still suffer from
limitations such as the Cold Start Problem, Sparsity and Scalability, all of which
lead to poor recommendations. In this paper, we propose a hybrid book
recommendation system that anticipates users' needs. The proposed system combines
collaborative filtering and content based filtering and works in three phases. In
the first phase, it identifies the users who are similar to the active user by
matching users' profiles. In the second phase, it chooses the candidate items for
every similar user by obtaining the vectors $V_c$ and $V_m$ corresponding to the
user's profile and the item contents. In the final phase, after the prediction
value for each item has been calculated using the Resnick prediction equation,
items are suggested to the target user. We compared the proposed system with
existing state-of-the-art recommendation models, namely collaborative filtering
and content-based filtering. The experimental section shows that the proposed
hybrid filtering approach outperforms both conventional approaches.
* Sunny Sharma
sunny202658@gmail.com
Vijay Rana
vijay.rana93@gmail.com
Manisha Malhotra
mmanishamalhotra@gmail.com
1 University of Jammu, Kathua Campus, Jammu, India
2 GNA University, Punjab, India
3 Chandigarh University, Punjab, India
Education and Information Technologies
1 Introduction
Web Mining is the process of extracting useful information from the web
(Rettinger et al., 2012). It comprises three mining categories, depicted in
Fig. 1: Web Content Mining (WCM), Web Structure Mining (WSM) and Web Usage Mining
(WUM). WCM examines the contents of a web page. WSM extracts information about web
pages, such as their ranking and how they are connected to one another; algorithms
such as PageRank, Alexa Rank and HITS are used in structure mining. WUM analyses
information about a user while he or she is browsing the internet (Adeniyi et al.,
2016). This information is often used to forecast the user's future needs.
Nonetheless, the vast amount of data available on the internet poses a challenge to
both consumers and businesses. Multiple product options are offered to the customer
for a particular need, causing product overload (Sharma et al., 2019). As a result,
both researchers and businesses have emphasized the importance of computing-based
marketing strategies such as one-to-one marketing and Customer Relationship
Management (CRM). Providing customized web recommendations in areas where a user
is interested is an effective strategy for overcoming product overload.
Web Recommender Systems (Moreno et al., 2016) use filtering processes to predict
the needs that users are likely to express. For instance, Netflix recommends
content to users based on their past behavior, and Amazon selects the products it
shows in a similar way. A general view of such recommendation techniques is
depicted in Fig. 2. The two most popular approaches for building a recommender
system are Content Based Filtering (CBF) and Collaborative Filtering (CF). In order
to predict which product a customer would like to purchase, it is important to know
what other customers with the same background purchase. This is the main idea
behind CF. CBF is based on the contents the user has browsed or liked previously,
and it is feasible only if there is data that defines what an individual user
likes. In spite of the huge success of recommendation techniques, they have several
limitations. Some challenges associated with Collaborative Filtering are the Cold
Start Problem, Sparsity and Scalability (Polatidis & Georgiadis, 2016).
• Cold Start Problem: It occurs when a user has not rated any item yet, so no
item can be recommended to that user.
• Sparsity: This problem occurs when the available ratings are few relative to
the number of users. Most users do not rate most items, so the user-item rating
matrix is in general extremely sparse. This is typically the case when the ratio
of users to items is high.
• Scalability: As there are millions of users and products on the web, a large
amount of computation power is required to calculate the similarity between
users and to generate recommendations.
candidate’s item for each neighbor. In the last phase, the system recommends the
selected items to the active user on the basis of the prediction value computed for
each item using the Resnick prediction equation (Taghipour & Kardan, 2008; Wu et
al., 2015). The rest of the manuscript is organized as follows. Section 2 reviews
related work on recommender systems. Section 3 describes the proposed Hybrid
Filtering based recommendation system, together with the working of the
Collaborative Filtering and Content Based Filtering algorithms. The results of CF
and CBF are compared directly with those of the proposed Hybrid system in
Section 4. Finally, Section 5 concludes the paper.
2 Background
Web Personalization was introduced over two decades ago, and many researchers
have since presented techniques to make the process of personalization as
convenient as possible. According to Malik and Fyfe (2012), Web personalization is
divided into three phases: learning, matching, and recommendation. The learning
phase involves two kinds of learning, implicit and explicit. Collaborative
Filtering (CF), Content Based Filtering (CBF), Rule Based Filtering and Hybrid
Filtering are some of the filtering methods used in the matching phase. The final
phase, the recommendation phase, is in charge of providing customers with a set of
customized results. This paper also reviews research that has been carried out in
the field of web personalization.
Many merchants store data about the way customers buy products online.
Different customers buy different products, and more than one customer can buy
the same product. To know which customer would like to purchase which product, it
is important to know what other customers with the same background purchase. This
is the main idea behind collaborative filtering. Collaborative Filtering based
recommender systems have been used in a wide range of areas (Elkahky et al., 2015;
Isinkaye et al., 2015; Li & Karahanna, 2015; Logesh & Subramaniyaswamy, 2019).
GroupLens (Miller et al., 2003) is a news based system which exploits
Collaborative Filtering to provide news recommendations from massive news
databases to its users. Ringo (Shardanand & Maes, 1995) is an online social
application which uses Collaborative Filtering to build users' profiles from the
ratings that users give to music albums. The e-commerce platform Amazon employs
topic diversification algorithms for generating the recommendation list (Smith &
Linden, 2017). To overcome the scalability issue, Amazon uses collaborative
filtering methods that build a table of similar products offline by using the
item-to-item matrix.
Content Based Filtering is based on the items a user has previously clicked.
Systems which use Content Based Filtering to present users with customized
information include MyBestBets and Letizia (Pazzani & Billsus, 2007). Letizia
(Lieberman, 1995) uses its user interface to provide personalized information and
to collect the user's profile.
Schouten et al. (2010) presented an ontology based framework for
recommending news. This framework consists of a news
classification phase, a news updating phase and a querying phase. Due to the
uncertainty of users' patterns, exact forecasting remains a very challenging task,
and predicting a user's location is essential for a wide range of applications.
Later, Santra and Jayasudha (2012) presented a recommendation system in which
data mining techniques such as the Naïve Bayes method and Neural Networks are
used to show that the accuracy of prediction improves. Al-Hassan et al. (2015)
proposed a semantic enhanced hybrid recommendation approach that incorporates the
semantics of the items into the item based collaborative filtering approach to
improve recommendations in e-government domains. They further proposed a new
ontology based semantic similarity approach to find the similarity between
ontology instances. The authors show the efficiency of the framework through a
case study of Australian e-government tourism services; the evaluation shows that
the proposed approach outperforms the competing approaches in the quality of its
recommendations. Wu et al. (2015) aimed to predict the user's behavior in a
particular situation and subsequently offer personalized recommendations, for
example targeted ads, events and movement prediction. More generally, they
focused on mobile devices and modern positioning technology. Data shared on
social media, such as location based tweets, can provide information about events
and locations. Social media platforms such as Twitter, Facebook and Foursquare
generate abundant amounts of information; Twitter alone generates 1 million geo
based tweets every day. These tweets can be used to predict the nature of an event
and the locations users have visited; moreover, a set of tweets can reveal the
true purpose of a user's visit to a particular place. Ali et al. (2016) observed
that available book recommender systems face several issues because most of them
merely consider descriptions of the books along with metadata. In response, the
authors created a hybrid book recommender system that uses tables of contents and
association rule mining to make recommendations. The paper also argues that
problems like cold start and sparsity can be mitigated by using the proposed
approach.
3 Methodology
3.1 Content processing
where $t_k$ is a term that occurs at least once in the document $d_j$, $N$
represents the number of documents, and $n_k$ is the number of documents which
contain the term $t_k$.

$$\mathrm{TF}(t_k, d_j) = \frac{f_{k,j}}{\max_z f_{z,j}} \tag{2}$$

where the maximum is taken over the frequencies $f_{z,j}$ of all the terms $t_z$
which occur in the document $d_j$.
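
As an illustration of how these weights might be computed, the following minimal Python sketch combines the normalized term frequency of Eq. 2 with an inverse document frequency. The IDF form `log(N / n_k)` is the common textbook definition and is an assumption here; the function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    TF follows Eq. 2 (frequency divided by the maximum frequency in the
    document). IDF is assumed to be log(N / n_k), the common definition.
    """
    n_docs = len(documents)
    # n_k: number of documents that contain term t_k at least once
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        max_freq = max(counts.values()) if counts else 1
        weights.append({
            term: (freq / max_freq) * math.log(n_docs / doc_freq[term])
            for term, freq in counts.items()
        })
    return weights

# Toy corpus: each document is a list of already preprocessed terms.
docs = [
    ["python", "data", "python", "analysis"],
    ["data", "mining", "web"],
    ["web", "python", "mining", "mining"],
]
book_vectors = tf_idf(docs)
```

A term that occurs in every document receives weight 0 under this IDF, which is one reason some implementations add smoothing; none is applied in this sketch.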
1. Calculate the weight of every user with respect to semantic similarity.
• The cosine similarity method is used to find the similarity between rating
vectors. It is stated below:
$$s(\vec{u}, \vec{v}) = \frac{\sum_i r_{u,i}\, r_{v,i}}{\sqrt{\sum_i r_{u,i}^2}\,\sqrt{\sum_i r_{v,i}^2}} \tag{3}$$
2. After calculating the similarity, pick the n users having the highest similarity with the active user.
3. Calculate the prediction as a weighted combination of the selected neighbors'
ratings:
$$p(a, i) = \bar{r}_a + \frac{\sum_{u=1}^{n} \left(r_{u,i} - \bar{r}_u\right) \cdot s(a, u)}{\sum_{u=1}^{n} s(a, u)} \tag{4}$$
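
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: ratings are held in plain dictionaries, unrated items are treated as zeros in the cosine similarity, the r_a and r_u terms of Eq. 4 are read as each user's mean rating, and neighbors are restricted to users who rated the target item. The names `cosine_similarity`, `mean_rating`, `predict` and the toy data are hypothetical.

```python
import math

def cosine_similarity(ratings_u, ratings_v):
    """Eq. 3 on {item: rating} dicts; items a user has not rated count as 0."""
    common = set(ratings_u) & set(ratings_v)
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(r * r for r in ratings_u.values()))
    norm_v = math.sqrt(sum(r * r for r in ratings_v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def mean_rating(ratings):
    return sum(ratings.values()) / len(ratings)

def predict(active, item, all_ratings, n=2):
    """Eq. 4: mean rating of the active user plus the similarity-weighted
    deviation of the n most similar neighbors who rated `item`."""
    active_ratings = all_ratings[active]
    sims = [
        (cosine_similarity(active_ratings, ratings), user)
        for user, ratings in all_ratings.items()
        if user != active and item in ratings
    ]
    neighbors = sorted(sims, reverse=True)[:n]  # step 2: top-n neighbors
    denom = sum(s for s, _ in neighbors)
    if denom == 0:
        return mean_rating(active_ratings)  # no usable neighbor: fall back
    num = sum(
        (all_ratings[u][item] - mean_rating(all_ratings[u])) * s
        for s, u in neighbors
    )
    return mean_rating(active_ratings) + num / denom

# Toy data: three users, three books.
ratings = {
    "alice": {"b1": 5, "b2": 3},
    "bob": {"b1": 4, "b2": 2, "b3": 5},
    "carol": {"b1": 1, "b2": 5, "b3": 2},
}
prediction = predict("alice", "b3", ratings)
```

With non-negative ratings the cosine similarities are non-negative, so the denominator of Eq. 4 cannot cancel to zero by sign; a variant with mean-centered similarities would usually sum absolute values instead.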
User based CF generally performs better than item based CF. The only exception,
where item based CF outperforms user based CF, is when the number of rated items
exceeds the number of users (Thorat et al., 2015). Among the various similarity
measures (Mihalcea et al., 2006), we use cosine similarity as it is simple and
performs well. The cosine similarity is stated in Eqs. 3 and 5.
$$\mathrm{sim}(u, v)_{\cos} = \frac{\vec{r}_u \cdot \vec{r}_v}{\lVert \vec{r}_u \rVert \, \lVert \vec{r}_v \rVert} \tag{5}$$
Once the similarity between the items has been calculated as described above, the
ratings are predicted using a weighted sum.
CBF, as shown in Fig. 5, is based on the contents the user has previously browsed
or liked. It recommends items by finding the similarity between the user profile
and the items to be recommended. Cosine similarity is used to find similar items.
In CB Filtering, the model first extracts the required information about book
contents (Table of Contents) together with the associated metadata, including
title, author, category and short description, and stores it in an ontology based
database. CB Filtering uses all this information to filter out the best matching
books. The user profile, in turn, is represented by the meta features of these
items. These item features can be represented numerically, for example by TF
(Term Frequency), or by Boolean values which indicate the absence or presence of
terms in the document descriptions. TF-IDF is used to determine the weight of a
term in a particular document with respect to the corpus or document collection.
In this model, each item (book) is represented by n terms identified using the
TF-IDF technique. The preprocessing techniques described in the content processing
module are applied to the terms. During preprocessing, unnecessary terms known as
stop words are removed because they do not carry enough weight to represent an
item. Furthermore, IDF is used to weigh up terms that are relevant but appear
infrequently across the document collection; conversely, terms that are less
relevant but appear frequently are weighed down. After both TF and IDF have been
computed for each term, the resulting vectors are used to represent the books. The
user log file, which includes the user's browsing history, is used to suggest
similar types of books to the user. For this purpose, cosine similarity is used to
determine the similarity between the user profile and the books to be recommended.
A general formula for finding the similarity between two books is given in Eq. 6.
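$$\mathrm{similarity}(\mathrm{book}_a, \mathrm{book}_b) = \frac{\mathrm{book}_a \cdot \mathrm{book}_b}{\lVert \mathrm{book}_a \rVert \, \lVert \mathrm{book}_b \rVert} \tag{6}$$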
The model exploits cosine similarity to calculate the similarity between the
contextual user profile vector ($V_c$) and the item content vector ($V_m$).
However, CB filtering faces certain challenges such as over specialization and
content extraction problems. To cope with these issues, we combine the techniques
of CF and CBF, resulting in hybrid recommendations.
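
A minimal Python sketch of this content based ranking is given below. It assumes that books are already represented as sparse term-weight vectors and that the user profile is the average of the vectors of previously browsed books; the averaging rule, the function names `cosine`, `build_profile` and `recommend`, and the toy catalog are assumptions made for illustration.

```python
import math

def cosine(vec_a, vec_b):
    """Eq. 6 for sparse {term: weight} vectors."""
    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_profile(browsed_vectors):
    """Average the vectors of browsed books into a user profile (assumption)."""
    profile = {}
    for vec in browsed_vectors:
        for term, w in vec.items():
            profile[term] = profile.get(term, 0.0) + w / len(browsed_vectors)
    return profile

def recommend(profile, catalog, top_k=2):
    """Rank catalog titles by cosine similarity to the profile."""
    scored = sorted(
        ((cosine(profile, vec), title) for title, vec in catalog.items()),
        reverse=True,
    )
    return [title for _, title in scored[:top_k]]

# Toy catalog with hand-written term weights.
catalog = {
    "Intro to Python": {"python": 0.9, "programming": 0.5},
    "Web Mining Basics": {"web": 0.8, "mining": 0.7},
    "Python for Data Mining": {"python": 0.6, "mining": 0.6, "data": 0.4},
}
profile = build_profile([catalog["Intro to Python"]])
ranked = recommend(profile, catalog, top_k=3)
```

The sketch ranks every book, including ones the user has already browsed; a practical system would normally filter those out before recommending.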
3.5 Hybrid filtering
where $p_u$ and $p_n$ are the mean intensities of the active user and of the other
users who are similar to the active user.
Algorithm 3 Hybrid filtering
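
The three phases of the hybrid approach, as summarized in the abstract, might be sketched in Python as follows. This is a rough illustration under stated assumptions and should not be read as the published algorithm: user profiles and item contents are taken to be sparse term-weight vectors, candidate items are those rated by a neighbor whose content vector is at least `threshold`-similar to the active user's profile, and the final score uses a Resnick-style weighted deviation from mean ratings. The function `hybrid_recommend`, its parameters and the toy data are all hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mean(ratings):
    return sum(ratings.values()) / len(ratings)

def hybrid_recommend(active, profiles, ratings, item_vectors,
                     n=2, threshold=0.1, top_k=3):
    # Phase 1: find the n users whose profiles are most similar to the active user.
    sims = {u: cosine(profiles[active], p)
            for u, p in profiles.items() if u != active}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:n]

    # Phase 2: candidate items are unseen items rated by a neighbor whose
    # content vector (V_m) matches the active user's profile vector (V_c).
    seen = set(ratings.get(active, {}))
    candidates = {
        item
        for u in neighbors
        for item in ratings.get(u, {})
        if item not in seen
        and cosine(profiles[active], item_vectors[item]) >= threshold
    }

    # Phase 3: Resnick-style prediction for each candidate, then rank.
    active_mean = mean(ratings[active]) if ratings.get(active) else 0.0
    predictions = {}
    for item in candidates:
        raters = [u for u in neighbors if item in ratings.get(u, {})]
        denom = sum(sims[u] for u in raters)
        if denom == 0:
            continue  # no neighbor with positive similarity rated this item
        num = sum((ratings[u][item] - mean(ratings[u])) * sims[u] for u in raters)
        predictions[item] = active_mean + num / denom
    return sorted(predictions, key=predictions.get, reverse=True)[:top_k]

profiles = {
    "alice": {"python": 1.0, "data": 0.5},
    "bob": {"python": 0.8, "data": 0.6},
    "carol": {"cooking": 1.0},
}
ratings = {
    "alice": {"b1": 4},
    "bob": {"b1": 5, "b2": 5, "b3": 2},
    "carol": {"b4": 5},
}
item_vectors = {
    "b1": {"python": 1.0},
    "b2": {"python": 0.7, "data": 0.7},
    "b3": {"cooking": 1.0},
    "b4": {"cooking": 0.9},
}
recommended = hybrid_recommend("alice", profiles, ratings, item_vectors)
```

In the toy data only `b2` survives both filters: it was rated by a similar user and its content matches the active user's profile, whereas `b3` and `b4` fail the content check.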
Table 1 Dataset details
Users      270,324
Books      271,360
Ratings    1,149,780
4 Results evaluation
Our web personalization system has been designed and implemented on a high-
level and general purpose programming language, Python including Anaconda
distribution. For performing the experiments, standard book datasets have been
exploited. The detail of datasets is provided in the Table 1. Ziegler et al. (2005)
created the dataset with 2, 70,324 users who provided 1,149,780 ratings on 2,
71,360 books.
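
To give a sense of how sparse the resulting user-item matrix is, the short Python calculation below derives its density from the three counts reported for the dataset. It assumes that every user-book pair is a potential rating; the variable names are illustrative.

```python
# Counts reported for the dataset.
users, books, ratings = 270_324, 271_360, 1_149_780

possible = users * books       # cells in the full user-item matrix
density = ratings / possible   # fraction of cells that hold a rating
sparsity = 1.0 - density       # fraction of cells that are empty

print(f"possible ratings: {possible:,}")
print(f"density:  {density:.6%}")
print(f"sparsity: {sparsity:.4%}")
```

Under this assumption fewer than two cells in every hundred thousand are filled, which illustrates why sparsity is listed among the main challenges for collaborative filtering.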
We compared our proposed system with existing state-of-the-art recommendation
models, namely Collaborative filtering and Content Based filtering. We
implemented both approaches and computed their results. Collaborative filtering
is implemented using the users with the same profile as the target user. In
content based filtering, the cosine similarity is computed between the user
profile and the items to be recommended. Experimental results of our proposed
model are discussed and presented in the next section. It can be observed in
Table 2 that hybrid filtering performs better than collaborative filtering and
content based filtering.
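
The comparison is reported in terms of precision. The exact evaluation protocol is not spelled out here, so the following Python sketch only assumes the standard definition: the fraction of the top-k recommended items that are relevant to the user. The function name `precision_at_k` and the sample lists are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommended items that appear in `relevant`."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)

# Hypothetical example: two of the four recommended books are relevant.
score = precision_at_k(["b1", "b2", "b3", "b4"], {"b2", "b4", "b9"}, k=4)
```

Averaging this score over all evaluated users would give one value per method and user-count setting, which is the kind of quantity a precision-versus-users plot displays.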
[Figure: Precision against No. of Users (60 to 240) for Collaborative Filtering, Content Based Filtering and Hybrid Filtering]
Anticipating web users' needs and providing high-quality results is one of the
key factors of this framework. If these needs are recognized quickly, the
customer can be offered the best products instantly. We proposed a simple, novel
approach for recommending the intended items to users based on long-term
behavioral signals that matches or outperforms the state of the art for this
task. This study can help developers and other stakeholders in building
successful e-commerce businesses. E-commerce personalization is a developing
[Figure: comparison of Collaborative Filtering, Content Based Filtering and Hybrid Filtering against No. of Users (60 to 240)]
Data Availability The datasets analyzed during the current study are available in the IIF’s repository,
http://www2.informatik.uni-freiburg.de/~cziegler/BX/.
Declarations
Conflicts of interest The authors do not have any conflicts of interest in relation to the present work.
References
Adeniyi, D. A., Wei, Z., & Yongquan, Y. (2016). Automated web usage data mining and
recommendation system using K-Nearest Neighbor (KNN) classification method. Applied
Computing and Informatics, 12(1), 90–108. https://doi.org/10.1016/j.aci.2014.10.001.
Al-Hassan, M., Lu, H., & Lu, J. (2015). A semantic enhanced hybrid recommendation approach: a
case study of e-Government tourism service recommendation system. Decision Support
Systems, 72, 97–109. https://doi.org/10.1016/j.dss.2015.02.001.
Ali, Z., Khusro, S., & Ullah, I. (2016). A hybrid book recommender system based on table of
contents (toc) and association rule mining. In Proceedings of the 10th International
Conference on Informatics and Systems (pp. 68–74). ACM. https://doi.org/10.1145/2908446.2908481.
De Gemmis, M., Lops, P., Musto, C., Narducci, F., & Semeraro, G. (2015). Semantics-aware
content-based recommender systems. In Recommender Systems Handbook (pp. 119–159).
Springer. https://doi.org/10.1007/978-1-4899-7637-6_4.
Elkahky, A. M., Song, Y., & He, X. (2015). A multi-view deep learning approach for cross domain
user modeling in recommendation systems. In Proceedings of the 24th International
Conference on World Wide Web (pp. 278–288). International World Wide Web Conferences
Steering Committee. https://doi.org/10.1145/2736277.2741667.
Ganguly, D., Roy, D., Mitra, M., & Jones, G. J. (2015). Word embedding based generalized
language model for information retrieval. In Proceedings of the 38th international ACM SIGIR
conference on research and development in information retrieval (pp. 795–798). ACM.
https://doi.org/10.1145/2766462.2767780.
Isinkaye, F. O., Folajimi, Y. O., & Ojokoh, B. A. (2015). Recommendation systems: Principles,
methods and evaluation. Egyptian Informatics Journal, 16(3), 261–273.
https://doi.org/10.1016/j.eij.2015.06.005.
Li, S. S., & Karahanna, E. (2015). Online recommendation systems in a B2C E-commerce context:
A review and future directions. Journal of the Association for Information Systems, 16(2), 72.
https://doi.org/10.17705/1jais.00389.
Lieberman, H. (1995). Letizia: an agent that assists web browsing. IJCAI, 1(1995), 924–929.
Logesh, R., & Subramaniyaswamy, V. (2019). Exploring hybrid recommender systems for
personalized travel applications. In Cognitive informatics and soft computing (pp. 535–544).
Springer. https://doi.org/10.1007/978-981-13-0617-4_52.
Malik, Z. K., & Fyfe, C. (2012). Review of web personalization. Journal of Emerging Technologies
in Web Intelligence, 4(3), 285–296. https://doi.org/10.4304/jetwi.4.3.285-296.
Mihalcea, R., Corley, C., & Strapparava, C. (2006). Corpus-based and knowledge-based measures
of text semantic similarity. In Aaai (Vol. 6, No. 2006, pp. 775–780).
Miller, B. N., Riedl, J. T., & Konstan, J. A. (2003). GroupLens for Usenet: Experiences in applying
collaborative filtering to a social information system. In From Usenet to CoWebs (pp. 206–231).
Springer. https://doi.org/10.1007/978-1-4471-0057-7_10.
Moreno, M. N., Segrera, S., López, V. F., Muñoz, M. D., & Sánchez, Á. L. (2016). Web mining
based framework for solving usual problems in recommender systems. A case study for
movies' recommendation. Neurocomputing, 176, 72–80.
https://doi.org/10.1016/j.neucom.2014.10.097.
Pazzani, M. J., & Billsus, D. (2007). Content-based recommendation systems. In The adaptive web
(pp. 325–341). Springer. https://doi.org/10.1007/978-3-540-72079-9_10.
Polatidis, N., & Georgiadis, C. K. (2016). A multi-level collaborative filtering method that
improves recommendations. Expert Systems with Applications, 48, 100–110.
https://doi.org/10.1016/j.eswa.2015.11.023.
Rettinger, A., Lösch, U., Tresp, V., d’Amato, C., & Fanizzi, N. (2012). Mining the semantic
web. Data Mining and Knowledge Discovery, 24(3), 613–662.
https://doi.org/10.1007/s10618-012-0253-2.
Santra, A. K., & Jayasudha, S. (2012). Classification of web log data to identify interested users
using Naïve Bayesian classification. International Journal of Computer Science Issues
(IJCSI), 9(1), 381.
Schouten, K., Ruijgrok, P., Borsje, J., Frasincar, F., Levering, L., & Hogenboom, F. (2010). A
semantic web-based approach for personalizing news. In Proceedings of the 2010 ACM
Symposium on Applied Computing (pp. 854–861). ACM. https://doi.org/10.1145/1774088.1774264.
Shardanand, U., & Maes, P. (1995). Social information filtering: algorithms for automating "word
of mouth". In Chi (Vol. 95, pp. 210–217). https://doi.org/10.1145/223904.223931.
Sharma, S., Mahajan, S., & Rana, V. (2019). A semantic framework for ecommerce search engine
optimization. International Journal of Information Technology, 11(1), 31–36.
https://doi.org/10.1007/s41870-018-0232-y.
Singh, V. K., & Singh, V. K. (2015). Vector space model: an information retrieval system.
International Journal of Advance Engineering Research and Studies/IV/II/Jan.-March, 141, 143.
Smith, B., & Linden, G. (2017). Two decades of recommender systems at Amazon.com. IEEE
Internet Computing, 21(3), 12–18.
Taghipour, N., & Kardan, A. (2008). A hybrid web recommender system based on q-learning. In
Proceedings of the 2008 ACM symposium on Applied computing (pp. 1164–1168). ACM.
https://doi.org/10.1145/1363686.1363954.
Thorat, P. B., Goudar, R. M., & Barve, S. (2015). Survey on collaborative filtering, content-based
filtering and hybrid recommendation system. International Journal of Computer Applications,
110(4), 31–36. https://doi.org/10.5120/19308-0760.
Wu, F., Wang, H., Li, Z., Lee, W. C., & Huang, Z. (2015). SemMobi: a semantic annotation system
for mobility data. In Proceedings of the 24th International Conference on World Wide Web
(pp. 255–258). ACM. https://doi.org/10.1145/2740908.2742837.
Ziegler, C.-N., McNee, S. M., Konstan, J. A., & Lausen, G. (2005). Improving recommendation
lists through topic diversification. In Proceedings of the 14th ACM International Conference
on World Wide Web.
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.