Project Report

Table of Contents
CHAPTER 01
INTRODUCTION
Introduction of Recommender System
Approaches
Collaborative filtering
Content-based Filtering Technique
Hybrid Filtering
Problem Statement
CHAPTER 02
BACKGROUND
Background of the Recommender System
History
Advantages of recommender system
Disadvantages of recommender system
CHAPTER 03
METHODOLOGY
Recommender system
Content-based Filtering Technique
Jaccard Similarity
Euclidean Distance
Cosine Similarity
Vector Space Model
Methodology
CHAPTER 04
RECOMMENDATIONS & CONCLUSION
Advantages of content based system
Drawbacks of content based recommender system
Conclusion
CHAPTER 01
INTRODUCTION
Introduction of Recommender System
Electronic business is growing rapidly, and the volume of information available on the
internet has grown with it, making it difficult to find the required information in a timely
manner. Recommender systems address this problem. They are becoming increasingly popular in
the business world, and almost every consumer website now has its own recommender system.
Recommender systems can be used in many areas, such as research papers, movies, videos,
newspapers, books, songs, products and so on.
A recommender system can be defined as an information filtering system that predicts a user's
preferences from his or her own choices and from the behavior of other users. Put differently,
it is a discovery assistant that helps its users identify items they may like. These systems
suggest items to a user on the basis of the user's past purchases and searches and on the
behavior of other users. This makes it easier for customers to discover items that they might
not find by themselves.
A recommender system allows customers to give input about their likes and dislikes. Consider
Netflix as an example, where customers can give feedback on their choices by rating an item
with a single click. Users rate items with numerical values; the five-star rating system is a
prominent example.
Figure 1 Five star rating system

Other types of feedback are less explicit, but they are easy to collect in the Web-centric
paradigm. In addition to ratings, feedback can be obtained by recording the buying and
browsing behavior of customers. Online platforms such as Amazon.com, Facebook, YouTube,
Daraz.pk, and OLX.com use this type of feedback. The entity to which a recommendation is made
is referred to as the user (or customer), and the products being recommended are referred to
as items. Because past interests are often good indicators of future preferences, a
recommender system analyzes a user's past behavior to build a picture of his or her tastes.
Suppose a customer visits a website and orders shampoo, conditioner and body wash. This buying
pattern is recorded in the database, and when another customer orders, for example, shampoo,
the recommender system suggests conditioner and body wash as well. YouTube is another example:
the system suggests clips to its users on the basis of their previous browsing history.
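The shampoo example above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method of any particular website: the function names `build_co_purchase_counts` and `suggest` and the sample orders are hypothetical, and the ranking simply counts how often two items appear in the same order.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchase_counts(orders):
    """Count, for every item, how often each other item was bought with it."""
    counts = defaultdict(Counter)
    for order in orders:
        # set() ignores duplicate items inside a single order
        for item, other in permutations(set(order), 2):
            counts[item][other] += 1
    return counts


def suggest(counts, item, k=2):
    """Return up to k items most often bought together with `item`."""
    together = counts.get(item, Counter())
    # Sort by descending count, then by name so ties are deterministic
    ranked = sorted(together.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical order history recorded in the database
orders = [
    ["shampoo", "conditioner", "body wash"],
    ["shampoo", "conditioner"],
    ["toothpaste", "shampoo"],
]
counts = build_co_purchase_counts(orders)
print(suggest(counts, "shampoo"))
```

With this history, a customer who orders shampoo is first offered conditioner, because it was bought together with shampoo most often. An item that has never been ordered yields an empty list.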
A notable exception is the knowledge based recommender system, which does not rely on the
past preferences of users; instead, it considers the requirements that users specify. The
knowledge based recommender systems are able to decide which product should be ranked high or
low. For instance, a product that is not commonly purchased lies among the lowest ratings,
but if people suddenly start purchasing that item, the knowledge based recommender system
raises its rating automatically and starts suggesting the item to new users.
Recommender systems typically work in one of two ways. They can rely on the properties of the
items that a customer likes, which are analyzed to determine what else the customer may like;
or they can rely on the preferences of other customers, which the system uses to compute a
similarity index between customers and recommend items accordingly. It is also possible to
combine both techniques to build a more robust recommender system.
A recommender system is a tool that lets algorithm developers predict what a customer may
like from a list of given items. The basic principle of recommendation is that significant
dependencies exist between users and the items they select. For example, a user who is
interested in an educational program is more likely to be interested in another program on
the same subject than in an action film. In many cases, where products are classified into
groups of similar items, these groups show significant correlations that can be used to make
more accurate recommendations. On the other hand, the dependencies between users and items
may exist at the finer granularity of individual items rather than categories. These
dependencies can be represented in a ratings matrix, and predictions for target users are
made from this matrix. The larger the number of rated items available for a user, the easier
it is to make robust predictions about that user's future behavior.
A wide range of models can be used to predict user preferences. For example, the aggregate
purchasing or rating behavior of different users can be used to form clusters of similar
users who are interested in similar items. The interests and activities of these clusters can
be used to make recommendations to individual members of the cluster. If a new user from
outside the cluster orders an item associated with it, the system starts suggesting the other
items from that cluster.
Several approaches can be used in a recommender system to make suggestions to users.

Approaches
Most recommender systems adopt one of two basic strategies: collaborative filtering or
content-based filtering. Other approaches, such as hybrid filtering, also exist.
Figure 2: Recommendation filtering techniques
Collaborative filtering
Collaborative filtering algorithms are based on user behavior. A collaborative filtering
model can be built from the behavior of a single user, but it gives better results when it
also draws on the behavior of other users with similar tastes. Collaborative filtering is
used to predict missing ratings: the ratings that multiple users have given to items are used
to make recommendations to both new and existing users. Consider a digital library in which
user ratings indicate whether users like or dislike specific books. Most users will have seen
only a small fraction of the large collection of available books. Consequently, most of the
ratings are unspecified. The ratings that users have provided are called observed ratings,
while the unspecified ratings are called unobserved or missing ratings.
Collaborative filtering rests on one fundamental idea: unspecified ratings can be imputed
because the observed ratings are often highly correlated across users and items. For example,
consider two users who have very similar tastes. If the ratings that both have specified are
very similar, the underlying algorithm can identify their similarity. It is then very likely
that, for an item that only one of them has rated, the other would give a similar rating.
This similarity can be used to make inferences about incompletely specified values.
As another example of collaborative filtering, suppose we are building a website that
suggests articles to readers. With collaborative filtering, the system records which users
have subscribed to and read each article, and it forms groups of users who prefer similar
articles. From this information it is easy to identify the articles that are most popular
within a group and to recommend them to group members who have not yet read or subscribed to
them.
Collaborative filtering uses two different methods:
• The memory based method; and
• The model based method.
Memory based collaborative filtering is also known as neighborhood collaborative filtering.
It predicts the ratings that are still unspecified on the basis of neighborhoods. The
neighborhood can be defined in two ways: user based and item based collaborative filtering.
User based collaborative filtering identifies users who have shown similar preferences in the
past; if two users have rated the same items in a similar way, the unobserved ratings of one
can be predicted from the observed ratings of the other. Item based collaborative filtering
identifies the items that are most similar to the target item; if a user likes the target
item, he or she is likely to prefer other items similar to it.
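User based neighborhood prediction can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not a production algorithm: the similarity measure (one divided by one plus the mean absolute rating difference on co-rated items) is chosen here for readability and stands in for measures such as Pearson correlation, and the users, books and ratings are hypothetical.

```python
def similarity(a, b):
    """Similarity in (0, 1] based on ratings for items both users rated.

    Assumption: 1 / (1 + mean absolute difference); 0.0 if nothing in common.
    """
    common = set(a) & set(b)
    if not common:
        return 0.0
    mean_abs_diff = sum(abs(a[i] - b[i]) for i in common) / len(common)
    return 1.0 / (1.0 + mean_abs_diff)


def predict(ratings, user, item):
    """Predict a missing rating as a similarity-weighted average of the
    ratings that other users gave to the same item."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        weight_total += weight
    if weight_total == 0.0:
        return None  # no neighbor has rated this item
    return weighted_sum / weight_total


# Hypothetical observed ratings on a 1-5 scale; alice has not rated book_c
ratings = {
    "alice": {"book_a": 5, "book_b": 4},
    "bob": {"book_a": 5, "book_b": 4, "book_c": 5},
    "carol": {"book_a": 1, "book_b": 2, "book_c": 1},
}
print(predict(ratings, "alice", "book_c"))
```

Because alice agrees with bob on every co-rated book and disagrees with carol, bob's rating dominates the weighted average, so the predicted rating for book_c is high.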
Content-based Filtering Technique
In a content-based recommender system, the descriptive attributes of items play the essential
role in making predictions. These descriptions are referred to as content. The past ratings
and purchasing behavior of a user are combined with the content of the items in order to
arrive at predictions. The basic idea is that a user's interests can be determined from the
features of the items that the user has rated or used previously. Content-based systems
suggest items by comparing the descriptions of items with a user's profile: the features of
items are matched against the features in the profile to obtain a user-item similarity, and
the best-matching items are recommended. For recommending documents such as articles, papers,
web logs, web pages and publications, content-based filtering (CBF) is regarded as the most
successful filtering technique. CBF uses many different models, such as the vector space
model, neural networks and decision trees, to measure document similarity.
This approach may use historical browsing data, for example which blogs the user reads and
the attributes of those blogs. If a user regularly reads history articles and comments on
blogs about software, content-based filtering uses this history to recommend similar content.
Another example of CBF is News Dude, a personal news system that uses synthesized speech to
read news stories to users. The TF-IDF model is used to describe news stories in order to
determine short-term recommendations, which are then compared using the cosine similarity
measure.
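The combination of TF-IDF weighting and cosine similarity mentioned above can be sketched as follows. This is a minimal illustration, not the News Dude implementation: the weighting formula (term frequency times log(N / document frequency)), the whitespace tokenizer, the function names and the three sample headlines are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn each document into a sparse {term: weight} TF-IDF vector.

    Assumed weighting: (count / document length) * log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical news headlines
docs = [
    "stock markets rise on bank earnings",
    "bank earnings lift stock markets",
    "local team wins football final",
]
vectors = tf_idf_vectors(docs)
print(cosine(vectors[0], vectors[1]))  # two finance stories
print(cosine(vectors[0], vectors[2]))  # finance story vs. sports story
```

The two finance headlines share several weighted terms and therefore score higher than the finance and sports pair, which share no terms at all.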
Hybrid Filtering
Hybrid filtering combines other recommendation techniques in order to handle a wide variety
of inputs, since a recommender system has the flexibility to handle different types of task
at the same time. Where many kinds of input are available, there is an opportunity for
hybridization, in which aspects of different types of systems are combined to achieve the
best of all worlds.
Problem Statement
After analyzing the Jinnah Business Review (JBR) of CUST, we found that it lacks a
recommender system. A recommender system guides the choices of users. It increases the
probability that users receive relevant information, which saves them time and makes their
work more efficient. It evaluates value, style or perspective on the basis of the experience
of different individuals, and it uses a user's preferences to make accurate recommendations
related to his or her interests. The purpose of our study is to introduce a recommender
system for JBR that measures the similarity between articles in order to make relevant
articles easier for users to find. To evaluate the similarity of text files, the proposed
recommender system uses the content-based technique.
In this study we focus on content-based filtering techniques because we are building a
recommender system for the Jinnah Business Review. The motivation is to give JBR a system
that measures the similarity between articles so that users can find relevant information
quickly. In general, two text documents are considered similar if they convey similar ideas
and are semantically close. Alternatively, "similarity" can be used in the context of
duplicate detection. We consider similarity in terms of the ideas that articles share. We
measure article similarity with the vector space model, using cosine similarity. In this
model, articles are represented as vectors rather than as text, and the angle between two
vectors characterizes their similarity. A small angle indicates that the articles are more
similar, and a large angle indicates that they are less similar. Cosine similarity can be
used to compare user profiles, item profiles or text documents. Using cosine similarity, we
determine whether two articles express similar ideas.
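The relationship between the angle and the similarity of two articles can be made concrete with a short Python sketch. For two vectors A and B, the cosine of the angle between them is the dot product of A and B divided by the product of their lengths. The sketch below assumes the simplest possible representation, raw term counts from whitespace-separated words; the function names and the three sample titles are hypothetical and are not taken from JBR.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a vector of raw term counts (an assumption;
    weighted schemes such as TF-IDF could be used instead)."""
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| * |v|); 0.0 if either vector is empty."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def angle_degrees(u, v):
    """Angle between two term vectors, in degrees."""
    # Clamp to [-1, 1] to guard against floating-point rounding
    cos_theta = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.degrees(math.acos(cos_theta))


# Hypothetical article titles
a = term_vector("islamic banking growth")
b = term_vector("islamic banking risk")
c = term_vector("employee motivation survey")

print(angle_degrees(a, b))  # smaller angle: the titles share two terms
print(angle_degrees(a, c))  # larger angle: the titles share no terms
```

Titles that share most of their terms are separated by a small angle, while titles with no terms in common are orthogonal, which is the largest angle possible for count vectors because counts are never negative.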
CHAPTER 02
BACKGROUND
Background of the Recommender System
A recommender system suggests relevant items or products to users. Its fundamental purpose is
to increase the merchant's profit by increasing sales. Sales increase only when the
recommender system shows items that match, or are related to, the customer's interests: the
more relevant the items shown, the more items the user is likely to purchase, which leads to
higher profitability for the merchant. Because the system attracts users through the items it
suggests, these items must be chosen carefully and must be of interest to the user.
The first requirement of any recommended item is that it must be relevant to the user,
because users generally buy products that they find interesting. Relevance is the primary
goal of a recommender system: a product that is not relevant will not attract the customer.

It is human nature to be attracted by new and unusual things. An item attracts a user most
when the user has not seen it before, whereas familiar items do not. If the recommender
system shows the same items again and again, the customer becomes bored. Repeatedly
recommending common items also reduces sales diversity.
Another way to attract users is to show something unexpected, so that the user feels lucky to
have discovered the item. A merely new item may not attract the user as much as an unexpected
one. This works when the recommender system genuinely surprises the user, rather than simply
showing something that he or she has not seen before, because an item may be new and yet
obvious given the user's interests. For instance, if a new English language institute opens
in the neighbourhood and is recommended to a user who is usually interested in language
schools, it may not attract the user: it is new but not surprising. If, on the other hand, a
Chinese or Japanese language institute is suggested, it may surprise the user and gain his or
her attention. Surprising recommendations help to increase sales diversity by promoting items
that were not popular before, and they open up new areas of interest for users.
Greater diversity in recommendations can also increase sales. Recommender systems commonly
suggest a list of top-k items. When all the suggested items are very similar, the risk
increases that the user will not like any of them. When the list contains items of different
types, there is a greater chance that the user will like at least one of them. Diversity has
the benefit of ensuring that the user does not get bored by repeated recommendations of
similar items.

Besides these primary goals, recommender systems also serve a number of secondary goals that
benefit both the user and the merchant. From the user's perspective, recommendations can
improve overall satisfaction with the Web site. For instance, a user who repeatedly receives
relevant recommendations from Amazon.com will be more satisfied and more likely to use the
site again. This improves user loyalty and further increases sales at the site. On the
merchant's side, the recommendation process provides insight into the needs of users and
helps to customize the user experience further. Finally, it is often useful to give the user
an explanation of why a particular item is recommended. For instance, Netflix presents
recommendations alongside previously watched films.
The types of items recommended by such systems vary widely. Some recommender systems, such as
Facebook, do not directly recommend products. Instead, they recommend social connections,
which benefit the site indirectly by increasing its usability and advertising revenue. To
illustrate these goals, we discuss some well-known examples of historical and current
recommender systems. These examples also show the wide variety of recommender systems that
were built either as research prototypes or that are in use today as commercial systems for
solving different business problems.
GroupLens was one of the earliest recommender systems. It was built as a research prototype
for recommending Usenet news. The system collected ratings from Usenet readers and used them
to predict whether other readers would like an article before they read it. Some of the
collaborative filtering algorithms were developed in the GroupLens setting. The general ideas
developed by this group were later extended to other product settings, such as books and
movies; the corresponding recommender systems were known as BookLens and MovieLens,
respectively. Besides its pioneering contributions to collaborative filtering research, the
GroupLens research group was notable for releasing several data sets during the early years
of this field, when data sets for benchmarking were not easily available. Prominent examples
include three data sets from the MovieLens recommender system. These data sets are of
successively increasing size, containing 10^5, 10^6, and 10^7 ratings respectively.
Amazon.com was also one of the pioneers of recommender systems, particularly in the
commercial setting. During its early years, it was one of the few retailers with the
foresight to recognize the value of this technology. Originally founded as a book e-retailer,
the business expanded to virtually all types of products. Amazon.com now sells virtually all
categories of products, such as books, CDs, software, hardware and electronics.
Recommendations on Amazon.com are based on explicitly provided ratings, buying behavior, and
browsing behavior. Ratings are specified on a 5-point scale, with 1 star being the lowest
rating and 5 stars the highest. User-specific buying and browsing data can be collected
easily when users are logged in through Amazon's account authentication mechanism.
Recommendations are also shown to users on the main Web page of the site whenever they log in
to their accounts. In many cases, explanations for recommendations are provided; for
instance, the relationship of a recommended item to previously purchased items may be shown
in the recommender system interface. The buying or browsing behavior of a user can be viewed
as a type of implicit rating, as opposed to an explicit rating, which is specified by the
user. Many commercial systems allow the flexibility of making recommendations on the basis of
both explicit and implicit feedback. In fact, several models have been designed to account
jointly for explicit and implicit feedback in the recommendation process. Some of the
algorithms used by early versions of the Amazon.com recommender system have been described in
the literature.
Social networks often recommend potential friends to users in order to increase the number of
social connections at the site. Facebook is one example of such a social networking Web site.
This kind of recommendation has slightly different goals from a product recommendation. While
a product recommendation directly increases the merchant's profit by facilitating product
sales, an increase in the number of social connections improves a user's experience at the
social network. This, in turn, encourages the growth of the social network. Social networks
depend heavily on the growth of the network to increase their advertising revenues. The
recommendation of potential friends (or links) therefore enables better growth and
connectivity of the network. This problem is also known as link prediction in the field of
social network analysis. Such recommendations are based on structural relationships rather
than on ratings data. Therefore, the nature of the underlying algorithms is completely
different.
History
Recommender systems are a popular and powerful tool of the digital world today. They help
people find what they are looking for among millions of options, because a single user is
offered a great many choices and alternatives. Today we can easily see such a solution at
work on Amazon.com or Netflix, but in the past it was not so easy. People tend to follow one
another: if one person heads in a particular direction, a line of followers soon forms
behind, and similarly, once a user chooses a product, demand for that product can be observed
to grow gradually. Recommender systems work on a similar idea. Information retrieval evolved
in response to the need to ask questions about large collections of documents. Much of the
early computing work was driven by large lawsuits in the computer industry involving
companies such as IBM, but the same technology applies to libraries and their card
catalogues, and to companies that build indexes of the World Wide Web. The principles are the
same. The content base is static, or mostly static: new books are not published nearly as
often as books are read, and new web pages are not published as often as people navigate to
them. The information need, however, is dynamic. That need is what we call a query: an
ephemeral interest that we want an answer to. Because of this balance, it pays to invest time
in indexing everything we can about the content base.
In GroupLens, introduced by Paul Resnick, John Riedl and their students, users who read news
articles through the system rated the articles as they read them. They simply entered a quick
rating from one to five, and users were matched with one another to find other people with
similar tastes. When a user went to a newsgroup to choose which articles to read, he or she
received a personalized prediction of which articles he or she would like or dislike, and how
much, using a nearest-neighbour approach that combined the ratings of similar users. In the
mid and late 1990s, companies were springing up left and right. GroupLens was not alone: a
few months later there was a system called Ringo and Homer from the MIT Media Lab, which
became Firefly Networks. The GroupLens system became the company Net Perceptions, and Firefly
was commercialized through Agents Inc. Work was being done everywhere, and people went out
and put these ideas into business practice. Amazon was only one of the dozens, then hundreds,
and then thousands of commercial uses of recommender technology that began in the mid-1990s
and continue today.
When a user requests recommendations, the system uses its correlations to find a
neighbourhood of users who agree with this user about which films are good. It then uses
those neighbours' opinions to make recommendations. The idea is that if you agree with
someone on many of the things you have both seen, then the things that person has seen and
you have not may be good recommendations for you. To see this as a table rather than as a
blob of users, consider a set of ratings and suppose I am trying to decide whether to watch
Blimp or Rocky XV tonight. We look for users who agree with me in our past ratings. Ben and I
agree fairly well on a couple of films that we have both seen, and Nathan and I also agree,
on a slightly different set of films. Joe and I disagree strongly, but we also disagree
fairly consistently, so perhaps I may like something that he dislikes. Pat and I agree, but
Pat has seen neither Rocky XV nor Blimp, so even though we agree, his opinions are not useful
for working out what I should watch next. Finally, we look at what these users think of the
two films.
Advantages of recommender system
Recommendation systems play a critical role in today's web. Some of the advantages and
disadvantages of recommender systems are listed below.
• A recommender system is based on real user behavior, i.e. objective reality. This is its
greatest advantage: it observes people in their natural environment and bases design
decisions directly on the results. For instance, the "Recommended Post" feature of Facebook
suggests posts based on our likes and preferences.
• A recommendation system helps users identify the data related to their interests or
preferences. It saves them time and makes their study more efficient.
• A recommendation system guides a user's selections by considering the past preferences and
the buying or browsing behaviour of clients.
• In a recommender system, the aggregate purchasing or rating behavior of different clients
can be used to form clusters of similar clients who are interested in similar items. The
interests and activities of these clusters can be used to make suggestions to individual
members of the cluster. If a new user from outside the cluster orders an item associated with
it, the system starts suggesting the other items from that cluster.
Disadvantages of recommender system
• The biggest challenge for a recommender system is the huge volume of data. There are
millions of users, and it is very difficult to manage data at that scale.
• User preferences never remain the same and may change from time to time according to
need, but a recommender system can only recommend items on the basis of past
preferences.
• Another problem in recommender systems is change in the data. Trends are not constant
and keep changing, so an algorithm may not get clear data for its recommendations.
• Many variables are required to make a single recommendation, because the system needs
enough information to give even a simple recommendation, and this makes the
recommender system complex. To satisfy the user, it is important to make a suitable
recommendation.
• Sometimes a user's interests are dynamic and past preferences are the opposite of
current choices, so it is difficult to recommend a list of specific items, because
user behaviour and choices are unpredictable and diverse.

CHAPTER 03
METHODOLOGY
Recommender system
E-commerce is at its peak in the modern era, and the use of recommendation systems for
personalization in e-commerce is increasing day by day. Today recommendation systems have
become a very important aspect of e-commerce systems. The main purpose of a recommendation
system is to make suggestions to the user by providing information about the different items
available. Although recommendation systems are used on a very large scale, they face some
shortcomings and problems.
The first important problem in a recommendation system is predicting the rating value for a
user-item combination. Here the assumption is that the user's preferences for specific items
can be predicted from data. An "𝑀 × 𝑁" matrix is created to record M users and N items, and
the recorded values are used to train the model. Accuracy suffers because the system is based
on an assumption. This problem is often known as the matrix completion problem, because the
matrix of recorded values is incomplete and all the missing values are forecast by learning
algorithms. The other common problem in recommendation systems is the ranking problem. In
practice, recommended items need not be based on the user's past preferences, because a user
may not want an item he liked in the past and may be looking for something new.
Recommendations cannot always be made on the basis of the user's own ratings, so the merchant
should present something new that may attract the user more than past items. The merchant may
also want to promote a newly introduced product. In that case the merchant should recommend
the top-k items to the selected users, and the difficulty lies in deciding which items belong
in the top-k list. For this problem the exact predicted ratings are not necessary. If the first
problem is solved, the second is solved with it, because a ranking can be derived from the
predicted ratings; sometimes the ranking problem can also be solved directly, without solving
the first problem.
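The link between the two problems can be illustrated with a small sketch. It assumes a hypothetical 3 × 4 rating matrix and fills the missing entries with the item mean, which is a deliberately naive stand-in for a learning algorithm. The names `ratings`, `complete` and `top_k_unseen` are illustrative and are not taken from this report.

```python
# Hypothetical 3 x 4 rating matrix (rows: users, columns: items).
# None marks a rating that has not been observed.
ratings = [
    [5, None, 3, None],
    [4, 2, None, 1],
    [None, 1, 4, 5],
]


def item_means(matrix):
    """Mean of the observed ratings in each column (item)."""
    means = []
    for column in zip(*matrix):
        seen = [r for r in column if r is not None]
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return means


def complete(matrix):
    """Matrix completion baseline: fill each gap with the item mean."""
    means = item_means(matrix)
    return [
        [means[j] if r is None else r for j, r in enumerate(row)]
        for row in matrix
    ]


def top_k_unseen(matrix, user, k):
    """Ranking derived from predictions: best k items the user has not rated."""
    filled = complete(matrix)
    unseen = [j for j, r in enumerate(matrix[user]) if r is None]
    return sorted(unseen, key=lambda j: filled[user][j], reverse=True)[:k]


print(top_k_unseen(ratings, user=0, k=2))
```

Once the matrix is completed, the top-k list falls out of a simple sort, which is why solving the prediction problem also answers the ranking problem.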
To make effective recommendations, a recommender system requires a lot of data. Collecting
and managing this data may be the biggest problem in a recommender system, because all the
big companies and e-commerce systems that use recommender systems have millions of users and
a corresponding amount of data. The recommender systems that make outstanding
recommendations, such as those of amazon.com, Facebook, Netflix and Google, hold huge amounts
of user data. For the algorithm to work, the recommender system has to create an "𝑀 × 𝑁"
matrix, which can only be built from a large data set. The more items and users the matrix
includes, the stronger the recommendations. So to get good recommendations you have to manage
huge amounts of data.
Another problem is previous data. The recommender system may show the same items again and
again and ignore new ones, which creates a bias toward previous preferences. It is very hard
to predict which new items will suit the user.

User preferences do not remain the same. A user who chooses an item today may not need it
tomorrow and may look for some other product, while the recommender system keeps recommending
the previous items. For instance, a user may search for T-shirts on amazon.com today and look
for a good literature book the next day, but the recommender system is not able to recommend
the books automatically. That is why very few e-commerce systems have an excellent and strong
recommender system.
Content-based Filtering Technique
Recommender systems are discovery assistants that help their users to find the items they
may like. Generally, recommender systems can be separated into two types:
• Content-based filtering; and
• Collaborative filtering.
In this study, our focus is on content-based filtering models. Before we go on, it is
important to define a couple of terms:
• The items whose attributes can be used as a component of the recommender system are
known as the content.
• The attributes of an item are the description of that particular item, for instance
the collection of words in an article.
Now let us explain what content-based recommender systems actually are.
Suppose someone asks you for a book recommendation. It is quite natural to ask what sorts of
books they like. From there, you can think of a few titles that are similar to the ones they
have enjoyed before. This process of suggesting content on the basis of its attributes is the
fundamental idea of content-based filtering, the technology behind the recommender systems of
Netflix and Pandora.
A content-based recommender system requires data about the available items, which serves as
the content, together with a profile of the user that reflects the user's preferences. In
other words, the past purchasing patterns and earlier ratings of the users are used together
with the content of the items in order to arrive at predictions. This technique recommends
items to users by comparing the user's preferences with the content of the items.
The basic idea behind a content-based recommender system is that a user's interests can be
determined from the features or properties of the items they have rated or used previously.
In a content-based recommender system the descriptions of the target items play an essential
role in making predictions. These descriptions are termed the content. Content-based systems
suggest items by closely comparing the descriptions of the items with the user's profile. The
features of the items are mapped to the features of the user in order to obtain the user-item
similarity, and the best-matched pairs are offered as recommendations. For recommending
documents such as articles, papers, web logs, web pages and publications, content-based
recommender systems are regarded as the most successful filtering technique.
A content-based recommender system works with the information that the user gives, either
explicitly, i.e. in the form of ratings, or implicitly, i.e. simply by clicking on a link. In
light of that information the system creates a user profile, which is then used to prepare
recommendations for the user. As the user provides more input or acts on the suggestions, the
system becomes increasingly accurate.
Figure 3 Recommendation Engine
The fundamental idea behind a content-based recommender system is to provide users with
recommendations related to their choices. In CUST, a user who needs articles to study can
visit the JBRC. Let us assume that a user reads an article from JBR and then wants to review
more articles related to the one he has just read. There is a need for a system through which
the user can easily reach similar articles, and a content-based recommender system can serve
this purpose. Such a system uses the past reading patterns of the users to suggest similar
articles to them. The accuracy of the recommendations can be achieved with a mathematical
approach. Document similarity can be measured using three mathematical approaches:
• Jaccard Similarity
• Euclidean Distance
• Cosine Similarity
Jaccard Similarity
The Jaccard similarity of two sets is the ratio of the size of their intersection to the size
of their union:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
Figure 4 Jaccard Similarity

The union of two sets is obtained by combining all the objects of both sets, while the
intersection contains only the objects that are common to the two sets. A collection of
distinct items is known as a set. For example, 2, 7, 9, d, a are distinct objects, and when
we consider them collectively they form a set, i.e. {2, 7, 9, d, a}.
Figure 5 Set
Figure 6 Union and Intersection representation
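The definition can be sketched in a few lines of Python using the built-in `set` type. The function name `jaccard_similarity` and the second example set are assumptions made for illustration; treating two empty sets as identical is also a convention chosen here, not something stated in this report.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


first = {2, 7, 9, "d", "a"}   # the example set from the text
second = {2, 9, "a", "x"}     # hypothetical second set
print(jaccard_similarity(first, second))  # 3 shared objects out of 6 distinct
```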
Euclidean Distance
Euclidean distance is the straight-line distance between two points in a plane.
Mathematically, it is the square root of the sum of the squared differences between the
corresponding coordinates of the two points.
Figure 7 Euclidean Distance
$$\text{Euclidean Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
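As a minimal sketch, the distance can be computed for points of any dimension by summing the squared coordinate differences and taking the square root. The function name `euclidean_distance` and the sample points are illustrative assumptions.

```python
import math


def euclidean_distance(p, q):
    """Square root of the sum of squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))


# Hypothetical points (x1, y1) = (1, 2) and (x2, y2) = (4, 6).
print(euclidean_distance((1, 2), (4, 6)))
```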
Cosine Similarity
Cosine similarity measures the closeness of two non-zero vectors as the cosine of the angle
between them. It is the normalized dot product of the two vectors. The value is 1 when the
angle is 0°, and it is less than 1 for every other angle. If the two vectors are parallel,
their cosine similarity is 1, while vectors that make an angle of 90° have a cosine
similarity of 0.
$$sim(d_1, d_2) = \frac{V(d_1) \cdot V(d_2)}{\|V(d_1)\| \, \|V(d_2)\|}$$
where V denotes the vectors and $d_1$ and $d_2$ represent the documents.
Figure 8 Cosine Similarity
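The formula translates directly into code: a dot product in the numerator and the product of the two vector lengths in the denominator. This is a sketch under the assumption that documents are already given as equal-length lists of numbers; `cosine_similarity` is an illustrative name.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


print(cosine_similarity([1, 2], [2, 4]))  # parallel vectors, close to 1
print(cosine_similarity([1, 0], [0, 1]))  # perpendicular vectors, 0
```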
All of the mathematical approaches mentioned above can be used to measure the similarity
among documents in order to make recommendations for the users. As we have very little time
to implement our idea of introducing a recommender system for the JBRC of CUST, we focus only
on cosine similarity at this stage. Cosine similarity is an important technique that is used
to make recommendations more accurate.
Vector Space Model
The vector space model (VSM) is an algebraic model, also known as the term vector model, that
represents documents in the form of vectors. These vectors can be added to one another and
can also be multiplied by scalars. In this model the whole document is treated as a bag of
words, and the main goal is to discover the most similar documents. The documents are
considered as vectors instead of texts, and each term of a document has its own axis. We
imagine all the documents as vectors in order to obtain the angle between them; the cosine of
that angle measures how far apart the documents are and therefore indicates their similarity.
The angle between two documents is either zero or positive; it can never be negative. If two
documents make an angle of zero degrees, they are perfectly similar to each other.
Figure 9 Vectors with the angle of 0°
Here vector A is equal to {"H", "E", "L", "L", "O"} and vector B is also equal to
{"H", "E", "L", "L", "O"}. These two vectors are parallel, the angle between them is zero,
and cos 0° = 1. By contrast, the angle between vector A {"H", "E", "L", "L", "O"} and a
vector B equal to {"X", "Y"} is 90°, and cos 90° = 0, which means that the documents are
totally different.
Figure 10 Vector makes an angle of 90°
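The two cases can be checked numerically. The sketch below assumes that each vector is the bag of letters of a word, with one axis per letter and the letter count as the component; `char_cosine` is a hypothetical helper, not part of the report.

```python
import math
from collections import Counter


def char_cosine(a, b):
    """Cosine similarity between the letter-count vectors of two strings."""
    ca, cb = Counter(a), Counter(b)
    # Counter returns 0 for letters missing from the other string.
    dot = sum(ca[ch] * cb[ch] for ch in ca)
    norm_a = math.sqrt(sum(c * c for c in ca.values()))
    norm_b = math.sqrt(sum(c * c for c in cb.values()))
    return dot / (norm_a * norm_b)


print(char_cosine("HELLO", "HELLO"))  # identical bags, angle of 0 degrees
print(char_cosine("HELLO", "XY"))     # no shared letters, angle of 90 degrees
```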
What is a vector? A vector is a quantity that has both a direction and a magnitude. Weight,
force, velocity and momentum are some common examples of vectors, as they have both
properties of a vector, i.e. magnitude and direction. Vectors can be represented in both two
and three dimensions. A vector is drawn as a single line: the length of the line is its
magnitude and the orientation of the line is its direction. The line has an arrowhead at one
end which points in the direction of the vector.
Figure 12 Vector
Figure 11 Vector representation in space
After building a clear understanding of vectors, it is time to place the set of documents in
a space, treating each document as a vector. The vectors may lie in a space of more than 100
dimensions, as each term of the documents has a separate dimension.
Figure 13 Documents representation as vector
In the diagram, $V(d_1)$ and $V(d_2)$ are the vectors obtained from the documents d1 and d2,
while V(Q) is the query vector. The space contains an axis for every term in the documents,
but to keep the diagram simple only two terms are considered, one on each axis. The angle θ
between document 1 and the query vector determines the similarity of the document to the
query, and the cosine of θ is used to measure it.
In information retrieval systems, the term frequency and the inverse document frequency are
important concepts. The term frequency, denoted tf, is the number of times a particular term
appears in a particular document. The inverse document frequency, denoted idf, gives more
weight to terms that occur in few documents. Suppose a user searches Google for
“the rise of technology”. The term “the” will certainly have a higher frequency than the term
“technology”, but from the point of view of the query the term “technology” is far more
important. In such situations tf-idf discounts the impact of highly repeated words when
deciding the significance of a document.
$$idf = \log \frac{N}{df}$$
Where
N = number of documents
df = number of documents in which the particular term appears
So if a term appears in many documents, the idf reduces its weight automatically, and if a
term appears in only a few documents it receives a higher weight. The idf uses the logarithm
in order to dampen this effect.
In measuring document similarity, the most popular weight is the product of tf and idf.
$$\text{tf-idf} = tf \times idf$$
So if a term t appears many times in a small number of documents, it has a high weight, and
if a term t appears only a few times, or appears in most of the documents, it has a low
weight. These tf-idf weights are used for the similarity measure.
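A short sketch of the weighting follows. It assumes a base-10 logarithm and a hypothetical collection of 1000 documents in which "the" occurs in every document and "technology" in only 10; these counts are made up purely for illustration.

```python
import math


def idf(n_docs, doc_freq):
    """Inverse document frequency: log10 of (documents / documents with term)."""
    return math.log10(n_docs / doc_freq)


def tf_idf(tf, n_docs, doc_freq):
    """Term frequency multiplied by inverse document frequency."""
    return tf * idf(n_docs, doc_freq)


N = 1000  # hypothetical collection size
# "the": very frequent in the document but present in every document.
print(tf_idf(tf=50, n_docs=N, doc_freq=1000))
# "technology": rarer in the document but present in few documents.
print(tf_idf(tf=3, n_docs=N, doc_freq=10))
```

Even with a much larger term frequency, "the" ends up with a weight of zero because its idf is log10(1) = 0, while "technology" keeps a positive weight.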
To measure document similarity in a vector space model, we use cosine similarity with tf-idf
weights. As described earlier, the cosine similarity of two non-zero vectors is their
normalized dot product: it is 1 when the vectors are parallel (an angle of 0°), less than 1
for every other angle, and 0 when the vectors make an angle of 90°.
$$sim(d_1, d_2) = \frac{V(d_1) \cdot V(d_2)}{\|V(d_1)\| \, \|V(d_2)\|}$$
Here the numerator is the dot product of the two document vectors, while the denominator is
the product of the Euclidean lengths of the two document vectors. Suppose there is a document
vector $d_1$ with weights $W_1, W_2, W_3$ and another document vector $d_2$ with two weights
$X_1, X_2$. Their dot product is
$$d_1 \cdot d_2 = W_1 \times X_1 + W_2 \times X_2$$
There is no third weight in document vector $d_2$, so it is treated as zero and
$W_3 \times 0 = 0$:
$$d_1 \cdot d_2 = W_1 \times X_1 + W_2 \times X_2 + 0$$
The denominator is the product of the Euclidean lengths of the two document vectors, i.e.
$$\|d_1\| = \sqrt{W_1^2 + W_2^2 + W_3^2}$$
and
$$\|d_2\| = \sqrt{X_1^2 + X_2^2}$$

Methodology
Let us work with a query that contains the words gossip and jealous, so the vocabulary has
two terms, gossip and jealous, and the query contains both of them. Document d1 contains
gossip but not jealous, so if we plot its vector it lies close to the gossip axis (the y
axis). Document d3 contains jealous but not gossip, so its vector lies close to the jealous
axis. Document d2 contains both gossip and jealous, each appearing multiple times. If we do
not convert these vectors into unit vectors, d2 extends far out into the space.
Figure 14 Documents represented as vectors in a plane

The query has both of these terms, and so does document d2, but the Euclidean distance
between the query and d2 is greater than the distance between the query and d3 or between the
query and d1, even though d1 and d3 each contain just one of the two query terms. The
distance between q and d2 is longer because we have not normalized the vectors.
Normalization is important: once we normalize, d2 lies on or very near the query vector. With
d2 close to Q, we can calculate the angle between the two vectors.
There are two ways to think about this. One is to normalize first and then compute the dot
product; the other is to skip the normalization and use the formula for cos θ. The cosine of
the angle θ between the query vector and the document vector is the dot product of the two
vectors divided by the product of their magnitudes.
$$\cos\theta = \frac{\vec{Q} \cdot \vec{D_2}}{|\vec{Q}| \, |\vec{D_2}|}$$
The two views are the same thing. The cosine of the angle θ between the Q vector and the D
vector is the dot product of Q and D divided by the magnitude of Q and the magnitude of D. We
can attach each magnitude to its own vector: Q divided by the magnitude of Q is the unit
vector in the direction of Q, and likewise D divided by the magnitude of D is the unit vector
in the direction of D, i.e.
$$\hat{Q} = \frac{\vec{Q}}{|\vec{Q}|}$$
and
$$\hat{D} = \frac{\vec{D_2}}{|\vec{D_2}|}$$
We can either normalize first, or take the dot product first and then divide it by the
product of the magnitudes. It does not matter which way we do it; in either case we are
measuring the angle between the two vectors, or more specifically the cosine of that angle.
If we normalize, $D_2$ appears very near the query vector. Whether or not we normalize, the
angle between Q and $D_2$ is very small, while the angle between Q and $D_1$ and the angle
between Q and $D_3$ are large. That means Q is closer to $D_2$ than it is to $D_1$ or $D_3$,
so $D_2$ is the most relevant document in this case.
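The effect of normalization can be reproduced with made-up numbers. The sketch below assumes two axes (gossip, jealous) and hypothetical term counts for the query and the three documents; the text does not give actual counts, so the values are illustrative only.

```python
import math


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def unit(v):
    """Length-normalized copy of v."""
    n = norm(v)
    return [x / n for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical counts on the axes (gossip, jealous).
q = [1, 1]
docs = {"d1": [1, 0], "d2": [5, 5], "d3": [0, 1]}

for name, d in docs.items():
    raw_distance = euclid(q, d)                       # misleading for d2
    cos_normalized = dot(unit(q), unit(d))            # normalize, then dot
    cos_formula = dot(q, d) / (norm(q) * norm(d))     # dot, then divide
    print(name, round(raw_distance, 3), round(cos_normalized, 3),
          round(cos_formula, 3))
```

With these counts d2 is the farthest document by raw Euclidean distance yet has the highest cosine, and both routes to the cosine agree.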
If Q and D have already been length normalized, the cosine of the angle between them is just
their dot product, and the dot product is nothing but the sum of the products of their
components: multiply component by component and add the results. For length-normalized Q and
D,
$$\cos(\vec{Q}, \vec{D}) = \vec{Q} \cdot \vec{D} = \sum_{i=1}^{|V|} Q_i D_i$$
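If the vectors are not normalized, we take the dot product and divide it by the product of
the magnitudes of the two vectors. In this case,
$$\cos(\vec{Q}, \vec{D}) = \frac{\vec{Q} \cdot \vec{D}}{|\vec{Q}| \, |\vec{D}|} = \frac{\vec{Q}}{|\vec{Q}|} \cdot \frac{\vec{D}}{|\vec{D}|} = \frac{\sum_{i=1}^{|V|} Q_i D_i}{\sqrt{\sum_{i=1}^{|V|} Q_i^2} \, \sqrt{\sum_{i=1}^{|V|} D_i^2}}$$
• $\frac{\vec{Q}}{|\vec{Q}|} \cdot \frac{\vec{D}}{|\vec{D}|}$ is the dot product of the unit vectors.
• $\frac{\vec{Q} \cdot \vec{D}}{|\vec{Q}| \, |\vec{D}|}$ is the dot product divided by the product of the magnitudes.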
Here $Q_i$ and $D_i$ are tf-idf weights. The ith component of vector Q, $Q_i$, is the tf-idf
weight of the ith term of the vocabulary in the query, and similarly the ith component of
vector D, $D_i$, is the tf-idf weight of the ith term of the vocabulary in the document;
these components are multiplied with one another. In the equation above,
$\cos(\vec{Q}, \vec{D})$ is the cosine similarity of vector Q and vector D, that is, the
cosine of the angle between Q and D.
If we convert the vectors into unit vectors, we obtain a unit circle, and all the vectors
align along this unit circle or, in general, along the corresponding surface in v-dimensional
space. All these vectors have their end points on the unit circle.
Figure 15 Vectors represented in a v dimensional surface
Now let us take an example. Suppose we have three classic novels: Sense and Sensibility
(SaS), Pride and Prejudice (PaP) and Wuthering Heights (WH). Our attention will be on four
terms in the novels: affection, jealous, gossip and wuthering. In this example we do not
apply idf weighting. The term frequency counts of these four terms in the novels are as
follows:

Terms        Term frequency (SaS)   Term frequency (PaP)   Term frequency (WH)
affection    115                    58                     20
jealous      10                     7                      11
gossip       2                      0                      6
wuthering    0                      0                      38
Table 1 Cosine similarity amongst 3 documents
So there are three documents, and because there are four terms we represent these three
documents in a four-dimensional space, which gives three points, one for each document.
Now let us compute the log frequency weight of each term.
Terms        Log weighting of SaS     Log weighting of PaP     Log weighting of WH
affection    1 + log10 115 = 3.06     1 + log10 58 = 2.76      1 + log10 20 = 2.30
jealous      1 + log10 10 = 2.00      1 + log10 7 = 1.85       1 + log10 11 = 2.04
gossip       1 + log10 2 = 1.30       0                        1 + log10 6 = 1.78
wuthering    0                        0                        1 + log10 38 = 2.58
Table 2 Log frequency weighting
Now, if we want to compute how close these points are to one another in v-dimensional space,
we also have to do length normalization. Of course we could compute the angles between the
three vectors directly, but here we first do length normalization and then take the cosine
score.
Each weight is divided by the Euclidean length of its document vector, where
|SaS| = √(3.06² + 2.00² + 1.30² + 0²), |PaP| = √(2.76² + 1.85² + 0² + 0²) and
|WH| = √(2.30² + 2.04² + 1.78² + 2.58²).
Terms        SaS                     PaP                     WH
affection    3.06 / |SaS| = 0.789    2.76 / |PaP| = 0.832    2.30 / |WH| = 0.524
jealous      2.00 / |SaS| = 0.515    1.85 / |PaP| = 0.555    2.04 / |WH| = 0.465
gossip       1.30 / |SaS| = 0.335    0                       1.78 / |WH| = 0.405
wuthering    0                       0                       2.58 / |WH| = 0.588
Table 3 Length Normalization
We can verify that the vectors have unit length. The square root of the sum of the squared
components of each normalized vector is 1 (up to rounding).
• $|SaS| = \sqrt{0.789^2 + 0.515^2 + 0.335^2 + 0^2} \approx 1$
• $|PaP| = \sqrt{0.832^2 + 0.555^2 + 0^2 + 0^2} \approx 1$
• $|WH| = \sqrt{0.524^2 + 0.465^2 + 0.405^2 + 0.588^2} \approx 1$
So we have three vectors (SaS, PaP, WH) in four-dimensional space. First, we compute the
cosine score between the two novels Sense and Sensibility and Pride and Prejudice. Because
the vectors have already been normalized, the cosine score is simply the dot product of SaS
and PaP.
$$\cos(SaS, PaP) = (0.789 \times 0.832) + (0.515 \times 0.555) + (0.335 \times 0) + (0 \times 0) \approx 0.94$$
If we compute the cosine score between Sense and Sensibility and Wuthering Heights, we get
0.79, and the score between Pride and Prejudice and Wuthering Heights is 0.69. The cosine
score between Sense and Sensibility and Pride and Prejudice is quite high, around 0.94, and
it is higher than the other two scores. Mathematically, SaS and PaP are therefore closer to
each other, because the cosine of the angle between them is greater than the others, i.e.
cos(SaS, PaP) > cos(SaS, WH). One reason for their similarity might be that both novels have
the same author.
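The whole worked example can be reproduced in a few lines. This sketch takes the term counts from Table 1, applies the log frequency weight 1 + log10(tf) (with 0 for a count of 0) and length-normalizes each vector before taking dot products. The helper names are illustrative assumptions, and because the code keeps full precision the third decimal may differ slightly from the rounded table values.

```python
import math

# Term counts in the order: affection, jealous, gossip, wuthering (Table 1).
counts = {
    "SaS": [115, 10, 2, 0],
    "PaP": [58, 7, 0, 0],
    "WH": [20, 11, 6, 38],
}


def log_weight(tf):
    """Log frequency weight: 1 + log10(tf), or 0 when the term is absent."""
    return 1 + math.log10(tf) if tf > 0 else 0.0


def normalize(v):
    """Divide every component by the Euclidean length of the vector."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def cosine(a, b):
    """For unit vectors the cosine is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


vectors = {
    name: normalize([log_weight(tf) for tf in tfs])
    for name, tfs in counts.items()
}

for pair in [("SaS", "PaP"), ("SaS", "WH"), ("PaP", "WH")]:
    print(pair, round(cosine(vectors[pair[0]], vectors[pair[1]]), 2))
```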
The example above uses the concept of vector normalization. Now we consider another example
in which the similarity between documents and a query is measured. Suppose we have three
documents:
D1 = People injured in an accident
D2 = People are travelling in a bus
D3 = The bus has an accident.
We also have a query vector. We remove the stop words from the query so that it can be
treated as a bag of words.
Q = “people bus accident”
According to the details above, the total number of documents is n = 3. Now we calculate the
tf-idf weights.
• $idf = \log_{10} \frac{n}{df}$
• $\text{tf-idf} = tf \times idf$
              Term frequency                        Weights = tf × idf
Terms         Q   D1  D2  D3   df  n/df  idf      Q       D1      D2      D3
people        1   1   1   0    2   1.5   0.1760   0.1760  0.1760  0.1760  0
injured       0   1   0   0    1   3     0.4771   0       0.4771  0       0
in            0   1   1   0    2   1.5   0.1760   0       0.1760  0.1760  0
accident      1   1   0   1    2   1.5   0.1760   0.1760  0.1760  0       0.1760
are           0   0   1   0    1   3     0.4771   0       0       0.4771  0
travelling    0   0   1   0    1   3     0.4771   0       0       0.4771  0
a             0   0   1   0    1   3     0.4771   0       0       0.4771  0
bus           1   0   1   1    2   1.5   0.1760   0.1760  0       0.1760  0.1760
the           0   0   0   1    1   3     0.4771   0       0       0       0.4771
has           0   0   0   1    1   3     0.4771   0       0       0       0.4771
an            0   1   0   1    2   1.5   0.1760   0       0.1760  0       0.1760
Figure 16 tf-idf weights
After calculating the tf-idf weights, we calculate the length of each vector, which is its
Euclidean length (norm), summing over the terms j:
$$|D_i| = \sqrt{\sum_j w_{i,j}^2}$$
So the Euclidean lengths of the vectors are:
$$|D_1| = \sqrt{0.1760^2 + 0.4771^2 + 0.1760^2 + 0.1760^2 + 0.1760^2} = 0.5930$$
$$|D_2| = \sqrt{0.1760^2 + 0.1760^2 + 0.4771^2 + 0.4771^2 + 0.4771^2 + 0.1760^2} = 0.8809$$
$$|D_3| = \sqrt{0.1760^2 + 0.1760^2 + 0.4771^2 + 0.4771^2 + 0.1760^2} = 0.7404$$
$$|Q| = \sqrt{0.1760^2 + 0.1760^2 + 0.1760^2} = 0.3048$$
After calculating the Euclidean lengths, we compute the dot product of the query vector with
each document vector, summing over the terms j:
$$Q \cdot D_i = \sum_j w_{Q,j} \times w_{i,j}$$
Document 1 shares two terms with the query, people and accident, so only these two terms
contribute to the sum. The dot product is a plain sum of products; no square root is taken.
$$Q \cdot D_1 = (0.1760 \times 0.1760) + (0.1760 \times 0.1760) = 0.0620$$
Document 2 also shares two terms with the query, people and bus:
$$Q \cdot D_2 = (0.1760 \times 0.1760) + (0.1760 \times 0.1760) = 0.0620$$
Document 3 also shares two terms with the query, accident and bus:
$$Q \cdot D_3 = (0.1760 \times 0.1760) + (0.1760 \times 0.1760) = 0.0620$$
After computing the dot products, the final step in measuring the similarity is to find the
cosine of the angle between each document vector and the query vector.
Document 1
$$\cos\theta(D_1) = \frac{Q \cdot D_1}{|Q| \times |D_1|} = \frac{0.0620}{0.3048 \times 0.5930} \approx 0.343$$
Document 2
$$\cos\theta(D_2) = \frac{Q \cdot D_2}{|Q| \times |D_2|} = \frac{0.0620}{0.3048 \times 0.8809} \approx 0.231$$
Document 3
$$\cos\theta(D_3) = \frac{Q \cdot D_3}{|Q| \times |D_3|} = \frac{0.0620}{0.3048 \times 0.7404} \approx 0.275$$
So we see that document $D_1$ has the greatest value of cos θ, so it is the most similar to
the query. Because all three dot products are equal, the ranking is decided by the vector
lengths: $D_1$, the shortest vector, scores highest, followed by $D_3$ and then $D_2$.
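The query example can be checked end to end with a short script. It is a sketch that assumes lower-cased whitespace tokenization, raw term counts for tf and idf = log10(n / df) as defined above; the variable and function names are illustrative. A cosine value always lies between -1 and 1, which is a useful sanity check on any hand calculation.

```python
import math

docs = {
    "D1": "people injured in an accident",
    "D2": "people are travelling in a bus",
    "D3": "the bus has an accident",
}
query = "people bus accident"

tokenized = {name: text.split() for name, text in docs.items()}
vocab = sorted({term for tokens in tokenized.values() for term in tokens})
n = len(docs)

# df: number of documents that contain each term.
df = {t: sum(t in tokens for tokens in tokenized.values()) for t in vocab}
idf = {t: math.log10(n / df[t]) for t in vocab}


def tfidf_vector(tokens):
    """tf-idf weight of every vocabulary term for one bag of words."""
    return [tokens.count(t) * idf[t] for t in vocab]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


q_vec = tfidf_vector(query.split())
scores = {name: cosine(q_vec, tfidf_vector(tokens))
          for name, tokens in tokenized.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, round(scores[name], 3))
```

The script ranks D1 first, then D3, then D2.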
CHAPTER 04
RECOMMENDATIONS & CONCLUSION
Advantages of content based system
• Recommendations are made on the basis of the user's own ratings, so a unique
recommendation can be made according to the user's taste.
• A content-based recommender system is more efficient because it uses only the content
of each item to make recommendations, so the problem of huge data does not occur
while running the algorithm.
• The biggest advantage of the content-based recommender system is that its
methodology is simple and logical compared with other systems.
• The recommendations of a content-based system are made from the interests of a
single user, so the system does not require the interests of other users.
• A content-based recommender system can explain the logic behind its recommendations.
This ability makes it more beneficial than other systems.
Drawbacks of content based recommender system
• The biggest drawback of the content-based recommender system is the huge size of the
data set of items. Because a content-based system works with the sets of items that
relate most closely to user preferences, it is hard to examine the user's choice for
every item.
• The algorithm must use the content of each item browsed by users, and it is difficult
to consider every term of the set in a content-based system.
• The results of any recommender system are not 100% accurate, so it is not easy to
estimate the needs of the user accurately.
• The content-based system is complex because there is no data that exactly defines
the user's interests. The choices of different users are unclear and vary from time to
time according to events in their lives.
 To implement the content based recommender system in an organization is very
expensive, as it requires the highly professional experts to manage the system which i
 This system increases the labour expenses that are not affordable for many businesses.
Conclusion
In this project we have studied recommender systems: their background, the different types,
the pioneers of recommender systems, world-famous examples and how they work. Our main focus
was JBR, and we analysed the problems and flaws that exist in the JBRC of CUST. At present
papers are simply deposited in the JBRC, as it has no recommender system. To address these
shortcomings of JBR in practice, we have built a prototype example using the vector space
model with cosine similarity. Due to shortage of time and some other limitations, we could
not bring it into running form. We identified that JBR lacks a semantic system. The user may
not find the relevant item, because the system is not able to recommend exactly what the user
is looking for, or anything the user does not have in mind at that moment, but a recommender
system can reduce all these flaws of JBR. It will work on the basis of past information. The
user will type a query and the system will give quick access to the articles that are closest
to the query. This will be helpful for the users and will also enhance efficiency at work.
These recommender systems can also be used in a broader sense. A database can be created for
the searched articles, and the system will rank the articles in the database on the basis of
user priority; in such a situation knowledge-based recommender systems will be used. These
systems are considered intelligent, as they are able to change the priority automatically by
observing changes in user actions.
Introducing such a system requires a great deal of time, and in this project we are short of
time, so initially we have focused on a very small portion. At this stage we consider only
the abstracts of papers and compare their similarity using the vector space model and cosine
similarity.


