Download as pdf or txt
Download as pdf or txt
You are on page 1of 6

Machine Learning based Efficient Recommendation

System for Book Selection using User based


Collaborative Filtering Algorithm
Madhuri Kommineni1 , P.Alekhya 2 ,
Department of Computer Science and Engineering, Department of Computer Science and Engineering,
KoneruLakshmaiah Education Foundation, Vaddeswaram, AP, KoneruLakshmaiah Education Foundation, Vaddeswaram,
India, E-mail ID: Madhuri.cbit@gmail.com AP, India

T. Mohana Vyshnavi3 , V.Aparna 4


Department of Computer Science and Engineering, Department of Computer Science and Engineering,
KoneruLakshmaiah Education Foundation, Vaddeswaram, KoneruLakshmaiah Education Foundation, Vaddeswaram,
AP, India AP, India

K Swetha 5 , V Mounika6 ,
Department of Computer Science and Department of Computer Science and
Engineering,KoneruLakshmaiah Education Foundation, Engineering,KoneruLakshmaiah Education Foundation,
Vaddeswaram,AP, India Vaddeswaram,AP, India

Abstract: Recommender system is a new generation of internet tool I. INTRODUCTION


that helps users to access the web and receive information about
their preferences. Using an online recommender is comparatively an
A recommender system is generally referred as a subclass of
easy and faster procedure to purchase items and this is done
information filtering system that helps to predict a user's
quickly. Recommendation systems plays an indispensable role in e-
preference of an object. We used mainly in commercial
commerce websites to help users in identifying the right goods. One
of the best methods to increase profits and attract customers is a applications. Supported systems are implemented a wide range of
recommendation process. The existing methodologies allow the areas and are commonly used for playlist creation for music and
systems to collect the irrelevant data and lead to a downfall in video services such as netflix, youtube. For specific topics like
attracting the users and completing their work in a quick and restaurants and online dating, there are also common
reliable way. This paper provides an overview of the recommender systems. To examine research articles and experts,
Recommendation S ystems that is currently employed in the partners, financial services and life insurance, recommendation
operations of the online book shopping domain. This paper proposes
systems have been developed.
a simple understandable system for book recommendations that
Recommenders typically use either collaborative filtering or
help readers to suggest the right book, which is to be studied next. In
content-based filtering or both. Collaborative sorting methods
recent years, information analysis challenge has been focused on for
the administration recommendation system. For clients, network create a model based on the past experience of a user (items that
assets are completely linked and quickly developed. The proposed are liked, rated or purchased previously) and also considering the
method works on training, feedback, management, reporting, similar decisions taken by others. Content based filtering method
configuration, and using it to offer useful information to the user in uses a list of discrete, pre-tagged item properties to recommend
order to aid in decision-making and data item recommendations. other items that have similar properties. Present recommendation
We have used a User Based Collaborative Filtering (UBCF) systems combine in a hybrid system one or more approaches.
approach and measured the performance of similarity measures in Content-based filtering, suggests items to the users on the basis
recommending books to a user. The proposed system's overall
of correlation between user profile and content of item. The
architecture is introduced and its implementation is represented
with a model design. content of an item is defined as a collection of descriptor terms,
usually the words in a text. The user profile is defined with the
Key words: Recommender system, Content based filtering, same words and created by evaluating the meaning of objects that
Collaborative filtering, Memory based approach, Cosine similarity. the user have seen. The user is recommended with the items that
are mostly positively rated. Recommendation Engines, in simple
are data-intensive systems involving complex patterns matching a
collection of predefined parameters and becoming efficient with
increasing the size of the content they obtain. To create concrete
suggestions, CBF uses different methods to find similarities

66

Authorized licensed use limited to: Cornell University Library. Downloaded on August 21,2020 at 09:59:39 UTC from IEEE Xplore. Restrictions apply.
among documents. Moreover, if the profile of the consumer different recommendation techniques. The recommendation
changes, the CBF methodology can change its suggestions easily. engine of the book should reduce the overhead associated with
Collaborative recommender systems try to predict the preference making the best book choices among the abundance.
[11-12] Uses svd algorithm on movie lens data set for the movie
of items on the basis of the items previously classified by other
recommendations and proves to be best.
users for a particular target user. Web search portals are
[15-16] Proposed a system using CCF using SVD model for
beginning to recognize the importance of the recommendation for
news recommendations on bing and proved to be efficient when
news topics, where there are two challenges mentioned earlier.
compared to other algorithms. The proposed CCF combines both
To make recommendations, they plan to incorporate the content -
the advantages of the Content-based (CBF) Filtering approach
based approach to filtering and the collaborative approach to
and the features of the Collaborative (CF) Filtering approach by
filtering. Collaborative filtering filters data using other people's
suggestions. Here the idea is that the, people who liked some using some of the rich contexts and focusing on long-tail users.
This CCF is designed for settings such as the recommendation of
items in the past will probably like again in future.
the bing news topic, where a piece of news could be interpreted
There are two approaches for implementing collaborative
by rich contexts such as the results of querying.
filtering:
[9-10] Uses nearest neighbors algorithms and matrix factorization
Model based approach uses either item-based approach or user-
for social voting and concluded that a affiliations factors play an
based approach. Users perform the main role in the user-based
important role in increasing the accuracy
approach. If some customers have similar tastes, then they join a
[20]
group. Items are recommended to users based on the assessment
[3-4] Discussed some problems and challenges in present
of items by others in the same group who have common tastes.
recommender systems and concludes to use new approaches for
While implementing a methodology, the similarity between
implementing recommender systems. Used cooperative filtering
various items is known by using a similarity measure that is most
suitable, to find the user ratings to the items that are not and content-based filtering to search the books list driven on their
ranking and material. The recommendation system primarily
previously rated by him/her.
depends on recommended book value and book rating provided
Rather than depending on the person who is most similar, the
by existing users.
predicted rating value is based on the weighted average of
[19-20] Proposed the recommendations that are provided based
multiple people's recommendations. Weight given to the ratings
on similarity of the selected artist, top artists in a genre, using a
of a person is known by the comparison between the person and
hybrid model and artists in a genre, using a hybrid model and
the target user for whom the prediction should be made. The
artists listened by user’s friends.
pearson correlation coefficient may be used as a correlation
measure. [7-8] Comparing Content Based (CBF) and Collaborative
Filtering (CF) in recommended overview of recommendation
In memory based approach we build collaborative filtering
systems and gives the differences between content based and CF
models by using machine learning algorithms. In this we find the
approaches. This paper is to suggest items, the collaborative
ratings of an item that user might give
algorithm uses “User Behavior”. They exploit other users and
items behaviors in terms of transaction history, ratings,
II. LITERATURE SURVEY
information on selection and purchase. The experience and
To provide specific recommendations, there are different types of expectations of other users over the items are used to recommend
approaches for recommendations systems. Content-based (CBF) items to new users. We need to know the content of both the user
and Collaborative Filtering (CF) are traditionally used. The CBF and the item in content-based filtering. Usually we build user-
approach knows about the content of element means the material profile and item-profile using the shared attribute space content.
to change it as suitably based on the user profile. In comparison, [5-6] Uses quick sort and cosine similarity by CF algorithm and
collaborative filtering does not depend on content and match to implements an online recommender system. Proposed a
the ideas based on particular and in past some of the users agreed methodology using OOADM (Object-Oriented Analysis and
these ideas, and there is a chance that these ideas may be agreed Design Methodology), improved collaborative filtering algorithm
by some of the users in future. You can collect the data about and an efficient quick sort algorithm and Django And NOSQL
your preferences based on the ratings you give on the products . and proved to be efficient and scalable. This paper explains that
recommendation systems that are based on collaborative filtering
[1-2] A hybrid recommendation system combining the features of methods, content and hybrid recommendation methods have been
collaborative, content based and demographic methods with high suggested but these have many issues that are the challenges of
accuracy implemented in java. This book recommendation research work. Working on this research area is needed to
engine introduced is a professional tool to recommend e-user
explore and provide new methods that can minimize obstacles
books. The recommender feature is certainly going to be a great
and provide guidance for collaborating in filtering a wide range
Java language web application. This type of web app lication
should prove beneficial for today's highly demanding websites of applications while considering issues of reliability and
for online purchases. This hybrid recommender system is more confidentiality.
accurate and efficient because it combines the characteristics of
67

Authorized licensed use limited to: Cornell University Library. Downloaded on August 21,2020 at 09:59:39 UTC from IEEE Xplore. Restrictions apply.
[17-18] Hybrid recommendation solution for online book portal IV. METHODOLOGY
applied associative rule mining and KNN in getting the book
recommendations for a user I.
[13-14]It describes about various similarity measuring techniques
and concludes item based methods are accurate when compared User ratings User item matrix
to user based methods.

III. DATA SET Construct a matrix of


knn Compute similarity
The dataset used for this is taken from Kaggle Goodreads- measure
books data where you can get the data about books, authors and
their titles along with ratings. Were commend the books to users
based on cosine similarity. The recommendations of a book are
being affected by many variables such as goodreads_book_id, Apply Deviation from Generate Top n
user_id, best_book_id, tag_id, tag_name, ratings, mean method recommendations
received_ratings, total_ratings, books_count and
similarity_measure. This system uses good reads data set. This
data set is having 7 tables.
Figure 2. System Architecture
Books.csv,Genres.csv,to_read.csv,tags.csv,book_tags.csv,max_ra
tings.csv,my_ratings .csv
1. Books.csv- consists of books with a total of !0K books having The phases of the proposed method are summarized as
its attributes book_isbn number, author, rating and etc follows:
2. Geners.csv- Consists of different categories of books. It has User rating: The system gathers feedback from goodreads-book
tag name with index as attributes. dataset
3. Book_tags.csv -Contains the columns tags name given by each
User-item matrix: User related data and device objects are
user and count of that tag name to book.
entered as a list of numerical ratings in a User-Item matrix.
4. Max_rating.csv- gives rating of a user have columns like
book_id and rating given that book. Compute similarity matrix: Using the similarity methods (e.g.,
5. Ratings.csv- Gives about ratings given to books by different PCC, CPCC, Cosine, Jaccard) we compute similarity matrix for
users has columns like user_id,book_id and rating and counts up each and every user. The similarity measure ranges from 0, 1.
to 9,00,000 records Construct a matrix of knn: Get the k nearest neighbours of a
6. to_read.csv- shows which user marked a book as to read target user and generate a matrix with the candidate books and
contains columns user_id and book_id users. The books that are rated by the neighbours and not rated by
7. Tags.csv -Different tags given by users are stored as tags it the target user are candidate books.
consists tag_name attribute. Apply Deviation from mean method: The predictive scores are
calculated from the above generated matrix by applying
aggregate method-deviation from mean method. The predictive
Top 10 scores might be binary (0/1) or numerical values which depict the
score that target user might give for all books based on the
Recommendations ratings given by neighbours and similarities. Sort the scores of all
books to get an order.

Generate Top n recommendations: According to the above sorted


Animal Farm books based on neares t neighbours and scores. Top n
recommendations are given to the target user.
Great…

Slaughterh…

1984

Macbeth

dtype:…
0 0.5 1
Similarity Measure

Figure 1. Recommendations based on similarity


68

Authorized licensed use limited to: Cornell University Library. Downloaded on August 21,2020 at 09:59:39 UTC from IEEE Xplore. Restrictions apply.
Cosine Similarity:It is the dot product of the two data points.
start It measures the cosine angle between the objects. If angle
between them is 0 degrees then the similarity is 1. If angle
is 90 degrees then similarity score is 0.It is a very popular
Merge Book Tags,Tags matrix in finding similarity because it is easy to evaluate
Dataset specially for sparse vectors.

sim(x,y) = …‘• ൫ሺ‫ݔ‬Ԧሻǡ ሺ ‫ݕ‬Ԧሻ ൯ ൌ ሺ‫ݔ‬ԦሻǤ ሺ‫ݕ‬ԦሻȀ หȁሺ‫ݔ‬Ԧሻ ȁหଶ ‫ כ‬ห ȁሺ‫ݕ‬Ԧሻ ȁหଶ  ൌ
Scrap some tags and we
get genre tags σ௦‫א‬௦ೣ೤ ‫ݎ‬௫ ǡ௦ ‫ݎ‬௬ ǡ௦ Ȁටσ௦‫א‬௦ೣ೤ ‫ݎ‬௫ଶǡ௦ ටσ௦‫א‬௦ೣ೤ ‫ݎ‬௬ଶǡ௦ Æ3

Jaccard Similarity: The Jaccard method was proposed by


Merge all genre tags, Koutrika and Bercovitz to calculate correlations between pairs of
authors for all books users. The Jaccard approach only takes into account the number
of co-ratings for each user pair to describe their partnership. Two
users will have a strong relationship if they have different rating
Split book genre tags matrix into trends, and vice versa. It does not consider absolute values of
training and testing dataset rating

‫ݏ‬ሺ‫ݔ‬ǡ ‫ݕ‬ሻ ௃௔௖௖௔௥ௗ ൌ ห‫ܫ‬௫௬ หȀห‫ܫ‬௫ ‫ܫ ׫‬௬ หÆ4


Extract strings of genres
and authors Pseudocode:
Start
Apply TFDIF Vectorizer function Top 10 Recommendations
and get tag matrix Initialize
Give counter for all similarity scores of books
in index row in similarity matrix
Sort all similarity score by sorted()
Apply cosine similarity, Pearson Correlation
Get indices of all top 10 recommendations
Coefficient, Constrained Pearson Correlation,
JACCARD similarity measures on TAG matrix Return book titles

Algorithm: User based collaborative filtering for Book


recommendations
Get top N recommendations based on
similarity matrix
Rated Unrated
Recommended TP FP
Stop Not Recommended FN TN
Table 1.Recommendati on Confusion Matrix
Figure 3. Flow of system design
Evaluation Metrics:
Similarity Measures:
Pearson coeffiecient correlation : TP: the number of books that are recommended to user and rated
Pearson coefficient correlation function calculates pairwise
TN: the number of books that are not recommended and not rated
correlation. Always diagonal in matrix is correlated by users.
perfectly. Here pearson correlation gives linear correlation
between 2 variables. FP:the number of books that are recommendeded and but are not
It ranges between -1 to +1. rated by users.

ഥ ሻଶ Æ1
r= ∑ሺ‫ ݔ‬െ ‫ݔ‬ҧሻ ሺ‫ ݕ‬െ ‫ݕ‬ሻȀඥσሺ‫ ݔ‬െ ‫ݔ‬ҧሻ ଶ σ ሺ ‫ ݕ‬െ ‫ݕ‬ FN:the number of books that are not recommended but rated by
users.
Constrained Pearson Coeffiecient:
Recall:It is the ratio of number of correctly predicted positive
In PCC the average of the values.But in constrained pearson
observations to the total number of positive observatio ns.
coefficient the median is considered.In a range [1,5] the median
value 3 is considered for calculation.It is almost same as PCC but Recall=TP/(TP+FN)
with more improved results.
r = σሺ‫ ݔ‬െ ‫ ݔ‬௠ ሻሺ‫ ݕ‬െ ‫ݕ‬௠ ሻȀඥσሺ‫ ݔ‬െ ‫ ݔ‬௠ ሻଶ σ ‫ݔ‬ሺ‫ ݕ‬െ ‫ݕ‬௠ ሻ ଶ Æ2

69

Authorized licensed use limited to: Cornell University Library. Downloaded on August 21,2020 at 09:59:39 UTC from IEEE Xplore. Restrictions apply.
Precision:
0.7
Precision is a random error description, a measure of statistical va
riability. Positive observations predicted for all actual class obser 0.6
vations
Precision=TP/(TP+FP)Æ5 0.5
F-score: F-score is the ratio of precision and recall more the F- 0.4
score is more the accuracy
0.3 MAP
F-Measure = (2*Precision*Recall)/(Precision+Recall)Æ6
0.2
MAP [Mean absolute Precision]
The standard single 0.1
number metric for evaluating search algorithms is Mean Average
Precision (MAP). Average accuracy (AP) is the average of accur 0
acy values (uninterpolated) at all ranges where relevant document
s are located.

V. PERFORMANCE AND RESULT ANALYSIS


Figure 6.MAP for different similarity measures
Here we are presenting the accuracy measures of the proposed
approach by constructing a confusion matrix obtained by testing From the graphs we can say that constarined pearson correlation
on the test data. The below graphs give a comparison of the similarity measure is the best measure for calculating similarities
recall, precision, Fscore and MAP (Mean Absolute Presidion) of among users .
PCC,CPCC,Cosine and Jaccard.) On Deviation of Mean method.
VI. CONCLUS ION
0.7
In this paper we have introduced new techniques to
0.6 recommend books using user based collaborative filtering for
0.5 assisting the users in an efficient way. We also compared some
0.4 similarity measures and found the best one. The user based
collaborative filtering method builds a user-user similarity matrix
0.3 by using some similar measures to present the similarity between
r…
0.2 users in order to utilize it for further processing. We can deduce
from the results and visualizations that rating the accuracy
0.1
followed by a normal distribution implies its consistency and
0 efficiency. Future work should instead target to protect the
system data against attacks and developing the different
algorithms.
REFERENCES
Figure 4.Recall measure for different similarity measures
[1]AbhilashaSase, KritikaVarun, SanyuktaRathod, Prof. DeepaliPatil(2015),“A
Proposed Book Recommender System”, International Journal of Advanced
Research in Computer and Communication Engineering .
0.6 [2]Murthy ,K.V.S.S.R., &Satyanarayana ,K.V.V (2018),”Intusion detection
mechanism with machine learning process “ International journel of Engineering
0.5 and Technology.
[3]ZhongqiLu, ZhichengDou, JianxunLian, XingXie and QiangYang(2015),”
Content-Based Collaborative Filtering for News T opic Recommendation”,
0.4
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence.
[4]Narsinga rao,M.R.,Venkatesh Prasad ,V.,Sai
0.3 T eja,P.,Zindavali,M.&phanindraReddy,O.(2018),”A survey on Preventio of
Fscore Overfitting in convolutional neura networks using machine learning
0.2 techninques”International journal of engineering and Technology.
[5]E. UkoOkon, B. O. Eke, P. O. Asagba(2018),” An Improved Online Book
0.1 Recommender System using Collaborative Filtering Algorithm”, International
Journal of Computer Applications.
0 [6]LavanyaReddy,L.Srisai Krishna,J,&vaishnavi,G.V.S.S(2018),”Survey on
software reliability prediction using machine learning techniques”, Journal of
Advaced research in dynamical and Control Systems”
[7]ManiMadhukar(2014),” Challenges & Limitation in Recommender Systems”,
International Journal of Latest Trends in Engineering and Technology.
Figure 5.Fscore for different similarity measures [8]Lakshmi ,C.r.,rao, D.T.,&Rao,G.V.S(2018),”Fog detection ad Visibility
enhancement under partial machine learning approach”International Conference
on Power,control,Signals and instrumentation Engineering

70

Authorized licensed use limited to: Cornell University Library. Downloaded on August 21,2020 at 09:59:39 UTC from IEEE Xplore. Restrictions apply.
[9]Dharna Patel, Dr. Harish Patidar(2018),” Hybrid Recommendation Solution
for Online Book Portal”, International Journal for Research in Applied Science &
Engineering Technology
[10]Krishna Mohan ,G.,Yoshitha ,N.,Lavanya ,M.L.N,&KrishnaPriya ,A.(2018)
,”Assesment and Analysisb of software reliability usinfmahine learning
techniques”Internationaljournel of Enginerring and technology
[11]ParulAggarwal, Vishal Tomar, AdityaKathuria(2017),” Comparing Content
Based and Collaborative Filtering in Recommender Systems”,International
Journal of New Technology and Research.
[12]Surarchita,V.,VenkateswaraRao ,P.,battacharyya ,D.,Kim,T.(2016)
,”Classification of penaeid prawn species using radial basis probabilistic neural
networks and supportvectormachines”InternationalJournel of Bio-Science and
Bio-technology
[13] Dr. Mohammed Ismail 1 ,Dr. K. BhanuPrakash ,Dr. M. NagabhushanaRao
(2018)”Collaborative filtering-based recommendation of online social voting”
International Journal of Engineering & Technology.
[14]Vidhyullatha ,p.,&RajeswaraRao,D.(2016),”Machine learning techniques on
multi dimensional curve fitting data based on r-square and chi-square
methods”InternationalJournel of Electrical and Computer Engineering
[15]Immidi Kali Pradeep, M.JayaBhaskar(2018)”Comparative analysis of
recommender systems and its Enhancements” International Journal of
Engineering & Technology.
[16]Anila,M.,&Pradeepini ,G.(2017),”Study of prediction algorithms for selecting
appropriate classifier in machine learning”Journel of Advanced Research in
Dynamical and Control Systems
[17]Item-Based Collaborative Filtering Recommendation Algorithms(2001)”
BadrulSarwar, George Karypis, Joseph Konstan, and John Riedl”
[18]Atmakur,V.K.,&SIvaKumar,P.(2018),”A prototype analysis of machine
learning methodologies for sentiment analysis of social
networks”,Internationaljournel of engineering and technology
[19] Dhruv, Ajay, AasthaKamath, AnujaPowar, and Karan Gaikwad. “ Artist
Recommendation System Using Hybrid Method: In Emerging research in
computing, Information, Communication and Applications, pp. 527 -542.
Springer, Singapore, 2019
[20] Raghuwanshi, Sandeep K., and R. K. Patreiya, “ Collaborative Filtering
T echniques in Recommendation Systems” In Data Enginnering and Applications,
pp, 11-21, Springer, Singapore, 2019.

71

Authorized licensed use limited to: Cornell University Library. Downloaded on August 21,2020 at 09:59:39 UTC from IEEE Xplore. Restrictions apply.

You might also like