Project Report


Table of Contents

CHAPTER 01
INTRODUCTION
    Introduction of Recommender System
    Approaches
        Collaborative filtering
        Content-based Filtering Technique
        Hybrid Filtering
    Problem Statement

CHAPTER 02
BACKGROUND
    Background of the Recommender System
    History
        Advantages of recommender system
        Disadvantages of recommender system

CHAPTER 03
METHODOLOGY
    Recommender system
    Content-based Filtering Technique
        Jaccard Similarity
        Euclidean Distance
        Cosine Similarity
    Vector Space Model
    Methodology

CHAPTER 04
RECOMMENDATIONS & CONCLUSION
    Advantages of content based system
    Drawbacks of content based recommender system
    Conclusion

CHAPTER 01

INTRODUCTION

Introduction of Recommender System

E-commerce is growing rapidly, and with it the volume of information available on the internet, which makes it difficult to find the required information in a timely manner. Recommender systems address this problem. They are becoming increasingly popular in the business world, and almost every consumer website now has its own recommender system. Recommender systems are used in many areas, such as research papers, movies, videos, news, books, songs, and products.

A recommender system can be defined as an information filtering system that predicts a user's preferences from that user's own choices and from the behavior of other users; in other words, it is a discovery assistant that helps users identify items they may like. These systems suggest items to a user on the basis of the user's past purchases and searches and on the behavior of other users. This makes it easier for customers to discover items that they might not have found by themselves.

A recommender system allows customers to give input about their likes and dislikes. Netflix is a good example: customers can give feedback on their choices by rating items with a single click. Users rate items with numerical values; the five-star rating system is a prominent example.

Figure 1: Five star rating system


Other types of feedback are less explicit but easy to collect in the Web-centric paradigm. In addition to ratings, feedback can be obtained by recording the buying and browsing behavior of customers. Online platforms such as Amazon.com, Facebook, YouTube, Daraz.pk, and OLX.com use this type of feedback. The entity to which a recommendation is provided is referred to as the user (or customer), and the product being recommended is referred to as an item. Because past interests are often good indicators of future preferences, a recommender system analyzes users' past behavior to build a picture of their tastes. Suppose a customer visits a website and orders shampoo, conditioner, and body wash. This buying pattern is recorded in the database, and when another customer orders, for example, shampoo, the recommender system suggests conditioner and body wash as well. Similarly, YouTube suggests clips to its users on the basis of their previous browsing history.
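The shampoo example can be sketched in a few lines of Python. This is only a minimal illustration, assuming that orders are available as plain lists of item names; the function names build_co_purchase_counts and suggest are hypothetical and are not part of any system described in this report.

```python
from collections import Counter
from itertools import permutations


def build_co_purchase_counts(orders):
    """Count, for every item, how often each other item shares an order with it."""
    counts = {}
    for order in orders:
        # set() removes duplicates so an item is counted once per order
        for a, b in permutations(set(order), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def suggest(counts, item, k=2):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]


# Hypothetical order history recorded in the database
orders = [
    ["shampoo", "conditioner", "body wash"],
    ["shampoo", "conditioner"],
    ["toothpaste", "body wash"],
]
counts = build_co_purchase_counts(orders)

# A new customer orders shampoo; the system suggests what earlier buyers added
print(suggest(counts, "shampoo"))
```

Counting co-purchases is the simplest possible choice here; a production system would normally also normalize for item popularity.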

A notable exception is the knowledge-based recommender system, which does not rely on the past preferences of users; instead, it considers the requirements specified by the users. Knowledge-based recommender systems are described as intelligent enough to decide which product should be ranked high or low. For instance, a product that is rarely purchased sits among the lowest-rated items; if people suddenly start purchasing it, the system automatically raises its rating and starts suggesting it to new users.

Ideally, a recommender system works in one of two ways. It can rely on the properties of the items a customer likes, which are analyzed to determine what else the customer may like; or it can rely on the preferences of other customers, which the system uses to measure the similarity between customers and then recommend items accordingly. It is also possible to combine both techniques to build a more robust recommender system.

A recommender system is a tool that lets algorithm developers predict what a customer is likely to prefer from a list of given items. The basic principle of recommendation is that significant dependencies exist between users and the items they select. For example, a user who is interested in an educational program is more likely to be interested in another program on the same subject than in an action film. In many cases, where products are classified into categories of similar items, these categories may show significant correlations that can be used to make more accurate recommendations. Alternatively, the dependencies may be present at the finer granularity of individual items rather than categories. These dependencies can be represented in the form of a ratings matrix, and predictions for target users are made on the basis of this matrix. The larger the number of rated items available for a user, the easier it is to make robust predictions about that user's future behavior.

A wide range of models can be used to predict user preferences. For example, the collective purchasing or rating behavior of different users can be used to form clusters of similar users who are interested in similar items. The interests and actions of these clusters can then be used to make recommendations to individual members. If a new user from outside a cluster orders an item associated with that cluster, the system starts suggesting other items from the same cluster.

Several approaches can be used in a recommender system to make suggestions to users.


Approaches

Most recommender systems adopt one of two basic strategies: collaborative filtering or content-based filtering. Other approaches, such as hybrid filtering, also exist.

Figure 2: Recommendation filtering techniques

Collaborative filtering

Collaborative filtering algorithms are based on user behavior. A collaborative filtering model can be built from the behavior of a single user, but to produce better results it is usually also built from the behavior of other users with similar choices or tastes. Collaborative filtering is used to predict missing ratings: the ratings that many users have given to items are used to make recommendations both to new users and to existing ones. Consider a digital library in which user ratings of books express likes or dislikes. Most users will have viewed only a small fraction of the large collection of available books. Consequently, most of the ratings are unspecified. The ratings that users have specified are called observed ratings, while the unspecified ones are called unobserved or missing ratings.
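The distinction between observed and unobserved ratings can be made concrete with a small ratings matrix. The sketch below is illustrative only: the users, books, and rating values are invented, and None is used as an assumed marker for a missing rating.

```python
# Rows are users, columns are books; None marks an unobserved (missing) rating.
# All names and values are hypothetical.
ratings = {
    "user_a": {"book_1": 5, "book_2": None, "book_3": 3},
    "user_b": {"book_1": None, "book_2": 4, "book_3": None},
    "user_c": {"book_1": 4, "book_2": None, "book_3": None},
}


def split_ratings(ratings):
    """Separate (user, item) pairs into observed and unobserved ratings."""
    observed, unobserved = [], []
    for user, row in ratings.items():
        for item, value in row.items():
            target = observed if value is not None else unobserved
            target.append((user, item))
    return observed, unobserved


observed, unobserved = split_ratings(ratings)

# Fraction of the matrix that collaborative filtering would have to predict
sparsity = len(unobserved) / (len(observed) + len(unobserved))
print(f"{len(observed)} observed, {len(unobserved)} missing, sparsity {sparsity:.2f}")
```

Even in this tiny example more than half of the entries are missing, which is the situation collaborative filtering is designed to handle.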

Collaborative filtering rests on one basic idea: unspecified ratings can be imputed because the observed ratings are often highly correlated across users and items. For example, consider two users with very similar tastes. If the ratings they have both specified are very similar, the underlying algorithm can identify their similarity. In such cases, it is very likely that where only one of them has specified a rating, the other would rate the item similarly. This similarity can be used to make inferences about incompletely specified values.

As another example, suppose we are building a website that suggests articles to readers. With collaborative filtering, the system records which users have subscribed to and read each article, and forms groups of users who prefer similar articles. From this information it is easy to identify the most popular articles within a group, and those articles can then be recommended to group members who have not yet read or subscribed to them.

Collaborative filtering uses two different methods:

• memory-based, and

• model-based.

Memory-based collaborative filtering is also known as neighborhood-based collaborative filtering. It predicts unspecified ratings on the basis of their neighborhoods. A neighborhood can be defined in two ways: user-based and item-based. In user-based collaborative filtering, the system identifies users whose past preferences are similar to those of the target user; if two users have rated the same items in a similar way, the unobserved ratings of one can be predicted from the observed ratings of the other. In item-based collaborative filtering, the system identifies the items that are most similar to the target item; if a particular user likes the target item, that user is likely to prefer other items similar to it.
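The user-based variant can be sketched as follows. This is a minimal illustration rather than the method used in this project: the text does not fix a similarity measure, so cosine similarity over co-rated items is an assumption, and the users, articles, ratings, and function names are hypothetical.

```python
import math


def similarity(ratings, u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = [item for item in ratings[u] if item in ratings[v]]
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item, k=2):
    """Predict a missing rating as the similarity-weighted average of the
    ratings given to `item` by the k most similar users who rated it."""
    neighbours = sorted(
        ((similarity(ratings, user, other), other)
         for other in ratings
         if other != user and item in ratings[other]),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # no usable neighbour has rated this item
    return sum(sim * ratings[other][item] for sim, other in neighbours) / total


# Only observed ratings are stored; a missing key means "unobserved".
ratings = {
    "ali":  {"a1": 5, "a2": 4},
    "sara": {"a1": 5, "a2": 4, "a3": 5},
    "omar": {"a1": 1, "a2": 5, "a3": 2},
}

# sara agrees with ali more closely than omar does, so her rating of a3
# carries more weight in the prediction for ali.
print(round(predict(ratings, "ali", "a3"), 2))
```

An item-based version follows the same pattern with the roles of users and items swapped: similarities are computed between items, and the prediction averages the target user's own ratings of the most similar items.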

Content-based Filtering Technique

In a content-based recommender system, the descriptive attributes of items play an essential role in making predictions. These descriptions are referred to as content. The past purchases and prior ratings of users are combined with the content of the items to arrive at predictions. The basic idea is that user interests can be modeled on the basis of the features or properties of the items they have rated or used previously. Content-based systems suggest items by comparing the descriptions of the items with a user's profile: item features are matched against user features to obtain a user-item similarity, and the best-matching pairs are recommended. For recommending documents such as articles, papers, blogs, web pages, and publications, content-based filtering (CBF) is regarded as the most successful technique. CBF uses many different models, such as the vector space model, neural networks, and decision trees, to measure document similarity.

This approach may use historical browsing data, for example, which blogs the user reads and the attributes of those blogs. Suppose a user usually reads history articles and comments on blogs about software; content-based filtering will then use that history to recommend similar content.

Another example of CBF is News Dude, a personal news system that uses synthesized speech to read news stories to users. It applies the TF-IDF model to describe news stories in order to determine short-term recommendations, which are then compared using the cosine similarity measure.
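The combination of TF-IDF weighting and cosine similarity can be sketched in plain Python. This is not News Dude's implementation; it is a minimal illustration that assumes whitespace tokenization, raw term frequency, and idf = log(N / df), and the example stories and function names are invented.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumes whitespace tokenization, tf = count / document length and
    idf = log(N / df); real systems often use smoothed variants.
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical news stories
stories = [
    "election results announced today",
    "election campaign results",
    "football match today",
]
vectors = tf_idf(stories)

# Stories 0 and 1 share two terms; stories 1 and 2 share none.
print(cosine(vectors[0], vectors[1]), cosine(vectors[1], vectors[2]))
```

Terms that occur in every document receive an idf of zero under this formula, so they do not contribute to the similarity.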

Hybrid Filtering

Hybrid filtering combines other recommendation techniques in order to handle a large variety of inputs, since a recommender system has the flexibility to use different types of input for the same task. Where a wide variety of inputs is available, there are many opportunities for hybridization, in which aspects of different types of systems are combined to achieve the best of all worlds.

Problem Statement

After analyzing the Jinnah Business Review (JBR) of CUST, we found that it lacks a recommender system. A recommender system guides the choices of users. It increases the probability that users receive relevant information and benefits them by saving time and making their work more efficient. It builds an assessment of value, style, or perspective from the experiences of different individuals, and it uses users' preferences to make recommendations relevant to their interests. The purpose of our study is to introduce a recommender system for JBR in which the similarity between articles is measured for the convenience of users. To evaluate the similarity of text files, the proposed recommender system uses the content-based technique.

In this study we focus on content-based filtering because we are building a recommender system for the Jinnah Business Review. With this system, users will be able to find relevant information quickly. In general, two text documents are considered similar if they describe similar ideas and are semantically close. "Similarity" can also be used in the context of duplicate detection; here, we consider similarity in terms of the ideas shared between articles. We quantify article similarity with the vector space model, using cosine similarity. In this model, articles are treated as vectors rather than as text, and the angle between two vectors characterizes their similarity: a small angle indicates that the articles are more similar, and a large angle indicates that they are less similar. Cosine similarity can be used to compare user profiles, item profiles, or text documents. Using cosine similarity, we determine whether or not two articles express similar ideas.
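For reference, the relationship between the angle and the similarity score can be written out as follows. The symbols a and b are introduced here only for illustration; they stand for the term vectors of two articles with n terms each.

```latex
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}}\;\sqrt{\sum_{i=1}^{n} b_i^{2}}}

% Worked example with two hypothetical term-count vectors:
% a = (1, 1, 0), \quad b = (1, 0, 1)
\cos\theta = \frac{1\cdot 1 + 1\cdot 0 + 0\cdot 1}{\sqrt{2}\,\sqrt{2}} = \frac{1}{2},
\qquad \theta = 60^{\circ}
```

For vectors with non-negative entries, such as term counts, the value lies between 0 (an angle of 90 degrees, no shared terms) and 1 (an angle of 0 degrees, same direction).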
CHAPTER 02

BACKGROUND

Background of the Recommender System

A recommender system suggests relevant items or products to users. Its fundamental purpose is to increase the merchant's profit by increasing sales. Sales increase only when the system shows items that match, or are relevant to, the customer's interests: the more relevant the items shown, the more the user purchases, which leads to higher profitability for the merchant. Because the system attracts users through its suggestions, the suggested items must be chosen carefully and must be of interest to the user.

The first and most fundamental requirement of any recommended item is relevance. Users generally buy products they find interesting, so an item that is not relevant will not attract the customer. Relevance is therefore the primary goal of a recommender system.


It is human nature to be attracted to new and unique things. An item attracts a user most when the user has not seen it before; familiar items do not. If the recommender system shows the same items again and again, the customer gets bored. Repeatedly recommending popular items also reduces sales diversity.

Another way to attract users is to show something unexpected, so that the user feels lucky to have discovered it. A merely new item may not attract the user as much as an unexpected one. This works when the system truly surprises the user, rather than just showing something the user has not seen before, because an item may be new and yet match the user's known interests. For instance, if a new English language institute opens in the neighbourhood and is recommended to a user who is usually interested in language schools, the recommendation is new to the user but not surprising, and it may not attract attention. A recommendation for a Chinese or Japanese language institute, on the other hand, may surprise the user and gain attention. Surprising recommendations help increase the sales diversity of items that were not previously popular, and they open up new areas of interest for users.

Greater diversity in recommendations can also increase sales. Recommender systems commonly suggest a list of top-k items. When all the suggested items are very similar, there is a higher risk that the user will not like any of them. When the list contains items of different types, there is a greater chance that the user will like at least one of them. Diversity has the benefit of ensuring that the user does not get bored by repeated recommendations of similar items.


Besides these concrete goals, a recommender system also meets a number of soft goals, from the perspective of both the user and the merchant. From the user's perspective, recommendations can improve overall satisfaction with the website. For instance, a user who repeatedly receives relevant recommendations from Amazon.com will be more satisfied with the experience and more likely to use the site again. This can improve user loyalty and further increase sales. At the merchant's end, the recommendation process can provide insight into the needs of users and help customize the user experience further. Finally, it is often useful to give the user an explanation of why a particular item is recommended. For instance, Netflix presents recommendations alongside previously watched movies.

There is wide diversity in the types of items recommended by such systems. Some recommender systems, such as Facebook's, do not directly recommend products. Instead, they recommend social connections, which benefit the site indirectly by increasing its usability and advertising revenue. To understand the nature of these goals, we discuss some well-known examples of historical and current recommender systems. These examples also show the broad diversity of recommender systems, which were built either as research prototypes or are in use today as commercial systems for solving different business problems.

GroupLens was one of the earliest recommender systems; it was built as a research prototype for recommending Usenet news. The system collected ratings from Usenet readers and used them to predict whether other readers would like an article before they read it. Some of the earliest collaborative filtering algorithms were developed in the GroupLens setting. The general ideas developed by this group were later extended to other item settings, such as books and movies; the corresponding recommender systems were known as BookLens and MovieLens, respectively. Besides its pioneering contributions to collaborative filtering research, the GroupLens research group was notable for releasing several data sets during the early years of the field, when data sets were not easily available for benchmarking. Prominent examples include three data sets from the MovieLens recommender system. These data sets are of successively increasing size, containing 10^5, 10^6, and 10^7 ratings, respectively.

Amazon.com was also one of the pioneers of recommender systems, particularly in the commercial setting. In its early years it was one of the few retailers with the foresight to recognize the value of this technology. Originally founded as a book e-retailer, the business expanded to virtually all types of products, and Amazon.com now sells virtually all categories of items, such as books, CDs, software, and electronics. Recommendations on Amazon.com are based on explicitly provided ratings, buying behavior, and browsing behavior. Ratings are specified on a 5-point scale, with 1 star as the lowest rating and 5 stars as the highest. User-specific buying and browsing data can be collected easily when users are logged in through the account authentication mechanism supported by Amazon. Recommendations are also shown to users on the main page of the site whenever they log in to their accounts. In many cases, explanations for the recommendations are given; for instance, the relationship of a recommended item to previously purchased items may be included in the recommender system interface.

The buying or browsing behavior of a user can be viewed as a type of implicit rating, as opposed to an explicit rating, which is specified by the user. Many commercial systems allow recommendations to be made on the basis of both explicit and implicit feedback, and several models have been designed to account jointly for both. Some of the algorithms used by early versions of the Amazon.com recommender system are discussed in the literature.

Social networks often recommend potential friends to users in order to increase the number of social connections at the site. Facebook is one example of such a social networking website. This kind of recommendation has slightly different goals from a product recommendation. While a product recommendation directly increases the merchant's profit by facilitating product sales, an increase in the number of social connections improves a user's experience of the social network. This, in turn, encourages the growth of the network. Social networks depend heavily on growth to increase their advertising revenue, so recommending potential friends (or links) promotes better growth and connectivity of the network. This problem is known as link prediction in the field of social network analysis. Such recommendations are based on structural relationships rather than ratings data, so the underlying algorithms are quite different.

History

Recommender systems are a popular and powerful tool of the digital world today. They help people find what they are looking for among millions of options, because a single user is offered a great many choices and alternatives. Today we can easily see this at work on Amazon.com or Netflix, but in the past it was not so easy. People tend to follow one another: if one person heads in a particular direction, a line of followers often forms behind; similarly, once a user chooses a product, demand for that product can gradually increase. Recommender systems work on a similar idea.

Information retrieval evolved in response to the need to ask questions of a large collection of documents. Much of the early computing work was driven by large lawsuits in the computer industry involving companies such as IBM, but the same technology applies to libraries and their card catalogues, and to companies that build indexes of the World Wide Web. The principles are the same. The content base is static, or mostly static: new books are not published as often as existing ones are read, and new web pages are not published as often as people navigate to them. The information need, however, is dynamic. That need is what we call a query: an ephemeral interest that we want answered. Because of this balance, it pays to invest time in indexing everything we can about the content base.

The idea behind GroupLens, introduced by Paul Resnick, John Riedl, and their students, was that users reading news articles through GroupLens would rate the articles as they read them. They simply entered a quick rating from one to five, and users were matched with one another to find other people with similar tastes. When a user went to a newsgroup to choose which articles to read, the system gave a personalized prediction of how much the user would like or dislike each article, using a nearest-neighbour approach that combined the ratings of similar users.

In the mid and late 1990s, companies were springing up left and right. GroupLens was not alone: a few months later there were systems called Ringo and HOMR from the MIT Media Lab, which were commercialized as Agents Inc., later renamed Firefly Networks. The GroupLens system became the company Net Perceptions. Work was moving quickly, and people took these ideas into business practice. Amazon was only one example of the dozens, then hundreds, and then thousands of commercial uses of recommender technology, starting in the mid-1990s and continuing to today.

When a user requests recommendations, the system uses its correlations to find a neighborhood of users who agree with this user about which films are good. It then uses those neighbors' opinions to make recommendations. The idea is that if two people agree on many of the things both have already seen, then the things seen by one of them may be good recommendations for the other. As an example, suppose I am trying to decide whether to watch Blimp or Rocky XV tonight. We look for users who agree with me in our past ratings. Ben and I agree fairly well on a couple of films that we have both seen, and Nathan and I also agree, with one of the films swapped for another. Joe and I disagree strongly, but we disagree fairly consistently, so perhaps I may like something that he dislikes. Pat and I agree, but Pat has not seen either Rocky XV or Blimp, so despite our agreement his opinions are not useful for working out what I should watch next. We then look at what these users think of the two films.

Advantages of recommender system

Recommender systems play a critical role on today's web. Some of their advantages and disadvantages are listed below.

• A recommender system is based on actual user behavior, that is, on objective reality. This is its greatest advantage: it observes people in their natural environment and bases design decisions directly on the results. For instance, the "Recommended Post" feature of Facebook suggests posts based on our likes and preferences.

• A recommender system helps users find information related to their interests or preferences. This saves time and makes their work more efficient.

• A recommender system guides users' selections by taking into account their past preferences and their buying or browsing behaviour.

• The collective purchasing or rating behavior of different users can be used to form clusters of similar users who are interested in similar items. The interests and actions of these clusters can then be used to make recommendations to individual members. If a new user from outside a cluster orders an item associated with that cluster, the system starts suggesting other items from the same cluster.

Disadvantages of recommender system

 The biggest challenge for a recommender system is the sheer volume of data. There are millions of users, and data at that scale is very difficult to manage.

 User preferences do not stay the same; they change from time to time according to need, but a recommender system can only recommend items on the basis of past preferences.

 Another problem is change in the data. Trends are not constant, so an algorithm may not have clear data on which to base a recommendation.

 Many variables are required to make even a single simple recommendation, which makes the recommender system complex. To satisfy the user, the recommendation must be suitable and relevant.

 Sometimes a user's interests are dynamic and past preferences are the opposite of current choices, so it is difficult to recommend a list of specific items. User behaviour and choices are diverse and hard to predict.


CHAPTER 03

METHODOLOGY

Recommender system

E-commerce is at its peak in the modern era, and recommendation systems are increasingly used in e-commerce for personalization. They have become a very important part of e-commerce systems. The main purpose of a recommendation system is to make suggestions to the user by providing information about the different items available. Although recommendation systems are used on a very large scale, they face a number of shortcomings and problems.

The first formulation of the recommendation problem is the prediction of the rating value for a user-item combination. The assumption is that data describing users' preferences for specific items is available. An "𝑀 × 𝑁" matrix is created for the M users and N items, and the recorded values are used to train a model. Because the predictions rest on this assumption, their accuracy can be limited. This is often known as the matrix completion problem, because the matrix is only partially filled in and all the remaining values are predicted by a learning algorithm. The second common formulation is the ranking problem. In practice the recommended items need not follow the user's past preferences: a user may not want an item he liked in the past again and may be looking for something new, so recommendations cannot always be made on the basis of past ratings alone. The merchant should therefore present something new that may attract the user more than past items, and may also want to promote a newly introduced product. The merchant can recommend the top-k items to a particular user, or select the top-k users to target for a particular item. The difficulty lies in deciding which items belong in the top-k. In this formulation the exact values of the predicted ratings are not needed, only their order. If the first problem is solved, the second can be solved as well, because the predicted ratings can simply be ranked; in some cases the ranking problem can also be solved directly, without first solving the prediction problem.

To make effective recommendations, a recommender system requires a lot of data. Collecting and managing this data may be the biggest problem in a recommender system, because the large companies and e-commerce systems that use recommender systems have millions of users and correspondingly large volumes of data. The systems known for outstanding recommendations, such as amazon.com, Facebook, Netflix and Google, hold huge amounts of user data. For the algorithm to work, the recommender system has to build the "𝑀 × 𝑁" matrix, which is only useful when it is built from a large amount of data. The more users and items the matrix includes, the stronger the recommendations, so good recommendations require managing huge volumes of data.

Another problem is reliance on previous data. The recommender system may show the same items again and again and ignore new ones, which biases it towards previous preferences. It is very hard to predict which new items a user will want.

User preferences do not remain the same. A user who chooses an item today may not need it tomorrow and may look for a different product, yet the recommender system keeps recommending the previous items. For instance, a user may search for T-shirts on amazon.com today and look for a good literature book the next day, but the recommender system cannot automatically switch to recommending books. That is why very few e-commerce systems have an excellent and strong recommender system.

Content-based Filtering Technique

A recommender system is a discovery assistant that helps its users find items they may like. Recommender systems can generally be divided into two types:

 Content-based filtering; and

 Collaborative filtering.

In this study, our focus is on content-based filtering models. Before we go on, it is important to define a couple of terms:

 The content is the set of item attributes that can be used as input to the recommender system.

 Attributes, or traits, are the descriptions of a particular item, for instance the collection of words in an article.

What, then, is a content-based recommender system?

Suppose someone asks you for a book recommendation. It is natural to ask what sorts of books they like; from there, you can think of a few titles that are similar to the ones they have enjoyed before. This process of suggesting content on the basis of its attributes is the fundamental idea of content-based filtering, the technology behind the recommender systems of Netflix and Pandora.

A content-based recommender system requires descriptive data about the available items (the content) together with a profile of the user that reflects the user's preferences. In other words, the users' past purchasing patterns and earlier ratings are combined with the content of the items in order to arrive at predictions. The technique recommends items to users by comparing the user's preferences with the content of the items.

The basic idea behind the content-based recommender system is that a user's interests can be determined from the features or properties of the items the user has rated or used previously. The descriptions of the target items play an essential role in making predictions; these descriptions are termed content. Content-based systems suggest items on the basis of a close comparison between the descriptions of the items and the user's profile. The features of the items are matched against the features of the user's profile to obtain a user-item similarity, and the best-matching items are recommended. For recommending documents such as articles, papers, web logs, web pages and publications, content-based recommender systems are regarded as the most successful filtering technique.

A content-based recommender system works with the information the user provides, either explicitly, in the form of ratings, or implicitly, for example by clicking on a link. From that information the system builds a user profile, which is then used to prepare recommendations for the user. As the user provides more input or acts on the suggestions, the system becomes increasingly accurate.

Figure 3 Recommendation Engine

The fundamental idea behind the content-based recommender system is to give users recommendations related to their choices. At CUST, a user who needs articles to study can visit the JBRC. Suppose a user reads an article from JBR and then wants to read more articles related to it. A system is needed through which the user can easily reach similar articles, and a content-based recommender system can provide this. Such a system uses the user's past reading patterns to suggest similar articles. The accuracy of the recommendations depends on how document similarity is measured mathematically. Document similarity can be measured using three mathematical approaches:

 Jaccard Similarity

 Euclidean Distance

 Cosine Similarity

Jaccard Similarity

Jaccard similarity is the ratio of the size of the intersection of two sets to the size of their union:

$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $$

Figure 4 Jaccard Similarity


The union of two sets is obtained by combining all the objects of both sets, while the intersection contains only the objects common to both sets. A collection of distinct items is known as a set. For example, 2, 7, 9, d and a are distinct objects, but taken together they form a set, i.e. {2, 7, 9, d, a}.

Figure 5 Set

Figure 6 Union and Intersection representation
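As a small illustration of Jaccard similarity, the sketch below compares the example set from the text with a second set. The second set is invented for the example, and returning 1.0 for two empty sets is a convention chosen here, not something the text specifies.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(union)

s1 = {2, 7, 9, "d", "a"}   # the set used in the text
s2 = {7, 9, "a", "x"}      # hypothetical second set
print(jaccard(s1, s2))     # 3 shared objects / 6 distinct objects = 0.5
```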

Euclidean Distance

Euclidean distance is the straight-line distance between two points in a plane. In mathematics, it is the square root of the sum of the squared differences between the corresponding coordinates of the two points.

Figure 7 Euclidean Distance

$$ \text{Euclidean Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} $$
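A minimal sketch of the distance calculation, generalized from two coordinates to any number of coordinates. The function name and the sample points are illustrative only.

```python
from math import sqrt

def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((1, 2), (4, 6)))  # sqrt(3^2 + 4^2) = 5.0
```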

Cosine Similarity

Cosine similarity measures the closeness of two non-zero vectors, here the vectors representing products or items, as the cosine of the angle between them. It is computed as the normalized dot product of the two vectors. If the two vectors point in the same direction (an angle of 0°), the cosine similarity is 1; for every other angle it is less than 1, and vectors that make an angle of 90° have a cosine similarity of 0.

$$ sim(d_1, d_2) = \frac{V(d_1) \cdot V(d_2)}{\|V(d_1)\| \, \|V(d_2)\|} $$

where V denotes a vector and $d_1$ and $d_2$ represent the documents.

Figure 8 Cosine Similarity
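The formula can be sketched directly in Python. The function name and the sample vectors are illustrative; raising an error for a zero vector is a choice made here because the cosine is undefined in that case.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

print(cosine_similarity([1, 2], [2, 4]))  # parallel vectors: approximately 1
print(cosine_similarity([1, 0], [0, 3]))  # perpendicular vectors: 0.0
```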

All of the above mathematical approaches can be used to measure the similarity among documents in order to make recommendations for users. Because we have very little time to implement our idea of a recommender system for the JBRC of CUST, we focus only on cosine similarity at this stage. Cosine similarity is an important technique that is used to make recommendations more accurate.

Vector Space Model

The vector space model (VSM) is an algebraic model, also known as the term vector model, used to represent documents as vectors. These vectors can be added to one another and multiplied by scalars. In this model each document is treated as a bag of words, and the main goal is to find the most similar documents. Documents are handled as vectors rather than as text, and each term has its own axis in the vector space. Treating the documents as vectors lets us obtain the angle between them, and the cosine of that angle is used to measure how close the documents are. This measure indicates the similarity between the documents. With non-negative term weights it is either zero or positive; it can never be negative. If two documents make an angle of zero degrees with each other, they are perfectly similar.

Figure 9 Vectors with the angle of 0°

Here vector A is {"H", "E", "L", "L", "O"} and vector B is also {"H", "E", "L", "L", "O"}. These two vectors are parallel, the angle between them is zero and cos 0° = 1. In contrast, the angle between vector A {"H", "E", "L", "L", "O"} and vector B {"X", "Y"} is 90°, because the two share no components, and cos 90° = 0, which means that the documents are totally different.

Figure 10 Vector makes an angle of 90°

What is a vector? A vector is a quantity that has both a magnitude and a direction. Weight, force, velocity and momentum are common examples of vectors, as they have both properties. Vectors can be represented in two or three dimensions. A vector is drawn as a straight line segment: the length of the line is its magnitude and the orientation of the line is its direction. The line has an arrowhead at one end, which points in the direction of the vector.

Figure 12 Vector
Figure 11 Vector representation in space

With a clear understanding of vectors, we can now place the set of documents in a space, treating each document as a vector. These vectors can live in a space of more than 100 dimensions, since each term in the documents has a separate dimension.
Figure 13 Documents representation as vector

In the diagram, 𝑉(𝑑1) and 𝑉(𝑑2) are the vectors obtained from documents d1 and d2, while V(Q) is the query vector. The space has one axis for every term in the documents, but to keep it simple only two terms, one per axis, are shown. The angle θ between document 1 and the query vector determines the similarity of the document to the query, and the cosine of θ is used as the similarity score between them.

In information retrieval, term frequency and inverse document frequency are important concepts. The term frequency, denoted tf, is the number of times a particular term appears in a particular document. The inverse document frequency, denoted idf, gives more weight to terms that appear in few documents. Suppose a user searches Google for “the rise of technology”. The term “the” will certainly have a higher frequency than the term “technology”, but “technology” is far more important from the point of view of the query. In such situations, tf-idf discounts the impact of very common words when deciding the significance of a document.

$$ idf = \log \frac{N}{df} $$

Where

N = total number of documents

df = number of documents that contain the particular term

So, if a term appears in many documents, the idf automatically reduces its weight, and if a term appears in only a few documents it receives a higher weight. The idf uses the logarithm in order to dampen this effect.

In measuring document similarity, the most popular weight is the product of tf and idf.

$$ \text{tf-idf} = tf \times idf $$

So if a term t appears many times in a small number of documents it has a high weight, and if a term t appears only a few times, or appears in most of the documents, it has a low weight. These tf-idf weights are used for the similarity measure.
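A short sketch of the two formulas. The collection size and document frequencies are invented for illustration (a hypothetical collection of 1000 documents in which "the" occurs in every document and "technology" in only 10), and base-10 logarithms are assumed.

```python
from math import log10

def idf(n_docs, df):
    """Inverse document frequency: log10(N / df)."""
    return log10(n_docs / df)

def tf_idf(tf, n_docs, df):
    """Term frequency multiplied by inverse document frequency."""
    return tf * idf(n_docs, df)

N = 1000  # hypothetical collection size
print(tf_idf(tf=5, n_docs=N, df=1000))  # "the": appears everywhere, weight 0.0
print(tf_idf(tf=3, n_docs=N, df=10))    # "technology": rare term, weight 6.0
```

Even though "the" occurs more often in the document, its weight collapses to zero because it carries no information about which document is relevant.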

To measure document similarity in the vector space model, we use cosine similarity with tf-idf weights. As described earlier, cosine similarity is the normalized dot product of two non-zero vectors, i.e. the cosine of the angle between them: it is 1 when the vectors are parallel (0°), less than 1 for every other angle, and 0 when the vectors make an angle of 90°.

$$ sim(d_1, d_2) = \frac{V(d_1) \cdot V(d_2)}{\|V(d_1)\| \, \|V(d_2)\|} $$

Here the numerator is the dot product of the two document vectors, while the denominator is the product of their Euclidean lengths.

Suppose a document vector $d_1$ has the weights $W_1, W_2, W_3$ and another document vector $d_2$ has only two weights, $X_1, X_2$. Because $d_2$ has no third weight, that component is taken as zero and $W_3 \times 0 = 0$, so the dot product of the two document vectors is

$$ d_1 \cdot d_2 = W_1 X_1 + W_2 X_2 + 0 $$

The denominator is the product of the Euclidean lengths of the two document vectors, i.e.

$$ \|d_1\| = \sqrt{W_1^2 + W_2^2 + W_3^2} $$

and

$$ \|d_2\| = \sqrt{X_1^2 + X_2^2} $$
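The missing-weight case can be sketched by storing each document as a mapping from term to weight, so that a term absent from a document simply contributes zero. The term names and weight values below are hypothetical stand-ins for W1..W3 and X1, X2.

```python
from math import sqrt

# Hypothetical weights: d1 has three terms, d2 has only two of them.
d1 = {"t1": 0.5, "t2": 0.3, "t3": 0.8}   # W1, W2, W3
d2 = {"t1": 0.4, "t2": 0.6}              # X1, X2 (no third weight)

def dot(a, b):
    """Sum of products over terms; a term missing from b counts as 0."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())

def length(a):
    """Euclidean length of a sparse vector."""
    return sqrt(sum(w * w for w in a.values()))

def cosine(a, b):
    return dot(a, b) / (length(a) * length(b))

print(round(dot(d1, d2), 2))     # 0.5*0.4 + 0.3*0.6 + 0.8*0 = 0.38
print(round(cosine(d1, d2), 3))  # 0.532
```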


Methodology

Let us work with a vocabulary of two terms, gossip and jealous, and a query that contains both of them. Document d1 contains gossip but not jealous, so if we plot its vector it lies close to the y axis, which is the gossip axis. Document d3 contains jealous but not gossip, so it lies close to the jealous axis. Document d2 contains both gossip and jealous, each appearing many times. If we do not convert these vectors to unit vectors, d2 extends far out into the space, as the figure shows.

Figure 14 Documents represented as vectors in a plane


The query contains both terms, and so does document d2. Yet the Euclidean distance between the query and d2 is greater than the distance between the query and d3 or between the query and d1, even though d1 and d3 each contain only one of the two query terms. The distance between q and d2 is longer only because we have not normalized the vectors. Normalization is important: once d2 is normalized it becomes almost identical to the query vector, and we can then compare the vectors by the angle between them.
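The effect of normalization can be checked numerically. In the sketch below the axes are (gossip, jealous) and the counts are invented to match the description: d1 has only gossip, d3 has only jealous, and d2 has both terms several times.

```python
from math import sqrt

# Axes: (gossip, jealous). All counts are hypothetical.
q  = (1, 1)
d1 = (1, 0)
d2 = (5, 5)
d3 = (0, 1)

def length(v):
    return sqrt(sum(x * x for x in v))

def euclidean(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def normalize(v):
    """Scale v to unit length."""
    n = length(v)
    return tuple(x / n for x in v)

def cosine(u, v):
    """Dot product of the two unit vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

for name, d in (("d1", d1), ("d2", d2), ("d3", d3)):
    print(name, round(euclidean(q, d), 2), round(cosine(q, d), 2))
# d2 is the farthest from q by Euclidean distance but has the highest cosine
```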

There are two ways to think about this. One is to normalize first and then compute the dot product. The other is to skip the normalization and use the formula for cos θ: the cosine of the angle between the query vector and the document vector is their dot product divided by the product of the magnitudes of the two vectors.

$$ \cos\theta = \frac{\vec{Q} \cdot \vec{D_2}}{|\vec{Q}| \, |\vec{D_2}|} $$

Equivalently, the magnitudes can be combined with the vectors themselves: the vector Q divided by its magnitude is the unit vector in the direction of Q, and likewise the vector D divided by its magnitude is the unit vector in the direction of D, i.e.

$$ \hat{Q} = \frac{\vec{Q}}{|\vec{Q}|} $$

and

$$ \hat{D} = \frac{\vec{D_2}}{|\vec{D_2}|} $$

We can either normalize first, or take the dot product and then divide it by the product of the magnitudes. It does not matter which way we do it; in either case we are measuring the angle between the two vectors, or more specifically the cosine of that angle. Once normalized, 𝐷2 lies very close to the query vector. Whether or not we normalize, the angle between Q and 𝐷2 is very small, while the angles between Q and 𝐷1 and between Q and 𝐷3 are large. That means Q is closer to 𝐷2 than it is to 𝐷1 or 𝐷3, so 𝐷2 is the most relevant document in this case.

If the vectors are already expressed as unit vectors, the cosine of the angle between them is just their dot product. So if Q and D have already been length-normalized, we multiply them component by component and add up the products:

$$ \cos(\vec{Q}, \vec{D}) = \vec{Q} \cdot \vec{D} = \sum_{i=1}^{|V|} Q_i D_i $$

If they are not normalized, we take the dot product and divide it by the product of the magnitudes of the two vectors. In this case,

$$ \cos(\vec{Q}, \vec{D}) = \frac{\vec{Q} \cdot \vec{D}}{|\vec{Q}| \, |\vec{D}|} = \frac{\vec{Q}}{|\vec{Q}|} \cdot \frac{\vec{D}}{|\vec{D}|} = \frac{\sum_{i=1}^{|V|} Q_i D_i}{\sqrt{\sum_{i=1}^{|V|} Q_i^2} \, \sqrt{\sum_{i=1}^{|V|} D_i^2}} $$

 $\frac{\vec{Q}}{|\vec{Q}|} \cdot \frac{\vec{D}}{|\vec{D}|}$ = dot product of the unit vectors

 $\frac{\vec{Q} \cdot \vec{D}}{|\vec{Q}| \, |\vec{D}|}$ = dot product divided by the product of the magnitudes

Here 𝑄𝑖 and 𝐷𝑖 are tf-idf weights. The i-th component of the vector Q, 𝑄𝑖, is the tf-idf weight of the i-th term of the vocabulary in the query, and similarly the i-th component of the vector D, 𝐷𝑖, is the tf-idf weight of the i-th term of the vocabulary in the document; these are multiplied with one another. In the equation above, $\cos(\vec{Q}, \vec{D})$ is the cosine similarity of Q and D, i.e. the cosine of the angle between the vectors Q and D.

If we convert the vectors to unit vectors, their end points all lie on the unit circle or, in general, on the surface of the unit sphere in |V| dimensions.
Figure 15 Vectors represented in a v dimensional surface

Now let us take an example. Suppose we have three classic novels: Sense and Sensibility, Pride and Prejudice and Wuthering Heights. We focus on four terms from the novels: affection, jealous, gossip and wuthering. In this example we do not apply idf weighting. The term frequencies of these four terms in the novels are as follows:


Terms        SaS                PaP                WH
             (term frequency)   (term frequency)   (term frequency)
affection    115                58                 20
jealous      10                 7                  11
gossip       2                  0                  6
wuthering    0                  0                  38

Table 1 Cosine similarity amongst 3 documents

These are three documents and four terms, so we represent the three documents in a four-dimensional space and get three points, one for each document.

Now let us compute the log-frequency weight of each term; since idf is not used in this example, this weight is used on its own.
Terms        Log weighting of SaS      Log weighting of PaP      Log weighting of WH
affection    1 + log10 115 = 3.06      1 + log10 58 = 2.76       1 + log10 20 = 2.30
jealous      1 + log10 10 = 2.00       1 + log10 7 = 1.85        1 + log10 11 = 2.04
gossip       1 + log10 2 = 1.30        0                         1 + log10 6 = 1.78
wuthering    0                         0                         1 + log10 38 = 2.58

Table 2 Log frequency weighting
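The log-frequency weights in Table 2 can be reproduced with a short sketch. The function name is illustrative; a term that does not occur is given weight 0, as in the table.

```python
from math import log10

def log_tf_weight(tf):
    """1 + log10(tf) for a term that occurs, 0 for a term that does not."""
    return 1 + log10(tf) if tf > 0 else 0.0

# Term counts from Table 1, in the order affection, jealous, gossip, wuthering.
counts = {
    "SaS": [115, 10, 2, 0],
    "PaP": [58, 7, 0, 0],
    "WH":  [20, 11, 6, 38],
}

for novel, tfs in counts.items():
    print(novel, [round(log_tf_weight(tf), 2) for tf in tfs])
# SaS [3.06, 2.0, 1.3, 0.0]
# PaP [2.76, 1.85, 0.0, 0.0]
# WH [2.3, 2.04, 1.78, 2.58]
```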

To compute how close these points are to one another in the |V|-dimensional space, we also have to length-normalize them. We could of course compute the angle between the three vectors directly, but here we length-normalize first and then take the cosine score.
Each weight is divided by the Euclidean length of its document vector, where |SaS| = √(3.06² + 2² + 1.30² + 0²), |PaP| = √(2.76² + 1.85² + 0² + 0²) and |WH| = √(2.30² + 2.04² + 1.78² + 2.58²).

Terms        SaS                     PaP                     WH
affection    3.06 / |SaS| = 0.789    2.76 / |PaP| = 0.832    2.30 / |WH| = 0.524
jealous      2.00 / |SaS| = 0.515    1.85 / |PaP| = 0.555    2.04 / |WH| = 0.465
gossip       1.30 / |SaS| = 0.335    0                       1.78 / |WH| = 0.405
wuthering    0                       0                       2.58 / |WH| = 0.588

Table 3 Length Normalization

We can verify that the vectors have unit length: the square root of the sum of the squared components of each vector is approximately 1.

 |SaS| = √(0.789² + 0.515² + 0.335² + 0²) ≈ 1

 |PaP| = √(0.832² + 0.555² + 0² + 0²) ≈ 1

 |WH| = √(0.524² + 0.465² + 0.405² + 0.588²) ≈ 1

So we have three vectors (SaS, PaP, WH) in a four-dimensional space. First, we compute the cosine score between Sense and Sensibility and Pride and Prejudice. Because the vectors have already been normalized, the cosine score is simply the dot product of SaS and PaP.

cos(SaS, PaP) = (0.789 × 0.832) + (0.515 × 0.555) + (0.335 × 0) + (0 × 0) ≈ 0.94

The cosine score between Sense and Sensibility and Wuthering Heights is 0.79, and the score between Pride and Prejudice and Wuthering Heights is 0.69. The score between Sense and Sensibility and Pride and Prejudice, about 0.94, is higher than the other two. This shows that SaS and PaP are closer to each other than either is to WH, because the cosine of the angle between them is the greatest, e.g. cos(SaS, PaP) > cos(SaS, WH). One possible reason for their similarity is that both novels were written by the same author.
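The three cosine scores can be recomputed from the rounded log-frequency weights of Table 2. This is a sketch for checking the arithmetic; the helper names are illustrative.

```python
from math import sqrt

# Rounded log-frequency weights from Table 2
# (order: affection, jealous, gossip, wuthering).
weights = {
    "SaS": [3.06, 2.00, 1.30, 0.0],
    "PaP": [2.76, 1.85, 0.0, 0.0],
    "WH":  [2.30, 2.04, 1.78, 2.58],
}

def normalize(v):
    """Divide every component by the Euclidean length of the vector."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

unit = {name: normalize(v) for name, v in weights.items()}

scores = {
    (a, b): dot(unit[a], unit[b])
    for a, b in [("SaS", "PaP"), ("SaS", "WH"), ("PaP", "WH")]
}
for pair, score in scores.items():
    print(pair, round(score, 2))
# ('SaS', 'PaP') 0.94
# ('SaS', 'WH') 0.79
# ('PaP', 'WH') 0.69
```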

The above discussed example uses the concept of normalization of vectors. Now we will

consider another example in which the similarity among the documents will be measured.

Suppose we have three documents such as

D1 = People injured in an accident

D2 = People are travelling in a bus


D3 = The bus has an accident.

We also have a query. Stop words are removed from the query so that it can be treated as a bag of words.

Q = “people bus accident”

From the details above, the total number of documents is n = 3. Now we calculate the tf-idf weights.

 $idf = \log_{10} \frac{n}{df}$

 tf-idf = $tf \times idf$
              Term frequency                             Weights = tf × idf
Terms         Q    D1   D2   D3    df   n/df   IDF       Q        D1       D2       D3
people        1    1    1    0     2    1.5    0.1760    0.1760   0.1760   0.1760   0
injured       0    1    0    0     1    3      0.4771    0        0.4771   0        0
in            0    1    1    0     2    1.5    0.1760    0        0.1760   0.1760   0
accident      1    1    0    1     2    1.5    0.1760    0.1760   0.1760   0        0.1760
are           0    0    1    0     1    3      0.4771    0        0        0.4771   0
travelling    0    0    1    0     1    3      0.4771    0        0        0.4771   0
a             0    0    1    0     1    3      0.4771    0        0        0.4771   0
bus           1    0    1    1     2    1.5    0.1760    0.1760   0        0.1760   0.1760
the           0    0    0    1     1    3      0.4771    0        0        0        0.4771
has           0    0    0    1     1    3      0.4771    0        0        0        0.4771
an            0    1    0    1     2    1.5    0.1760    0        0.1760   0        0.1760

Figure 16 tf-idf weights

After calculating the tf-idf weights, we calculate the length of each vector, which is its Euclidean length (norm).

$$ |D_i| = \sqrt{\sum_j w_{i,j}^2} $$

So the Euclidean lengths of the vectors are:

$$ |D_1| = \sqrt{0.1760^2 + 0.4771^2 + 0.1760^2 + 0.1760^2 + 0.1760^2} = 0.5930 $$

$$ |D_2| = \sqrt{0.1760^2 + 0.1760^2 + 0.4771^2 + 0.4771^2 + 0.4771^2 + 0.1760^2} = 0.8809 $$

$$ |D_3| = \sqrt{0.1760^2 + 0.1760^2 + 0.4771^2 + 0.4771^2 + 0.1760^2} = 0.7404 $$

$$ |Q| = \sqrt{0.1760^2 + 0.1760^2 + 0.1760^2} = 0.3048 $$

After calculating the Euclidean lengths, we compute the dot product of each document vector with the query vector.

$$ Q \cdot D_i = \sum_j w_{Q,j} \times w_{i,j} $$

Document 1 shares two terms with the query, people and accident, so we add the products of the query weight and the document weight for these two terms.

$$ Q \cdot D_1 = (0.1760 \times 0.1760) + (0.1760 \times 0.1760) = 0.061952 $$

Document 2 also shares two terms with the query, people and bus, so we add the products of the query weight and the document weight for these two terms.

$$ Q \cdot D_2 = (0.1760 \times 0.1760) + (0.1760 \times 0.1760) = 0.061952 $$

Document 3 also shares two terms with the query, accident and bus, so we add the products of the query weight and the document weight for these two terms.

$$ Q \cdot D_3 = (0.1760 \times 0.1760) + (0.1760 \times 0.1760) = 0.061952 $$

After computing the dot products, the final step in measuring the similarity between each document and the query is to find the cosine of the angle between the document vector and the query vector.

Document 1

$$ \cos\theta(D_1) = \frac{Q \cdot D_1}{|Q| \times |D_1|} = \frac{0.061952}{0.3048 \times 0.5930} = 0.3428 $$

Document 2

$$ \cos\theta(D_2) = \frac{Q \cdot D_2}{|Q| \times |D_2|} = \frac{0.061952}{0.3048 \times 0.8809} = 0.2307 $$

Document 3

$$ \cos\theta(D_3) = \frac{Q \cdot D_3}{|Q| \times |D_3|} = \frac{0.061952}{0.3048 \times 0.7404} = 0.2745 $$

So we see that document $D_1$ has the greatest value of cos θ, followed by $D_3$ and then $D_2$, so $D_1$ is the most similar to the query.
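The whole worked example can be recomputed end to end with a short sketch. It assumes the same setup as the text (three documents, base-10 idf, raw term counts as tf, the query weighted like a document); the variable and function names are illustrative. Note that the dot product is a plain sum of products, so with non-negative weights every cosine score lies between 0 and 1.

```python
from collections import Counter
from math import log10, sqrt

docs = {
    "D1": "people injured in an accident",
    "D2": "people are travelling in a bus",
    "D3": "the bus has an accident",
}
query = "people bus accident"

tokenized = {name: text.lower().split() for name, text in docs.items()}
n = len(docs)

# df = number of documents that contain each term
df = Counter(term for tokens in tokenized.values() for term in set(tokens))
idf = {term: log10(n / count) for term, count in df.items()}

def tfidf_vector(tokens):
    """Sparse tf-idf vector: term -> tf * idf (terms unknown to idf are dropped)."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b)

q_vec = tfidf_vector(query.split())
scores = {name: cosine(q_vec, tfidf_vector(tokens)) for name, tokens in tokenized.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, round(scores[name], 2))
# D1 0.34
# D3 0.27
# D2 0.23
```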
CHAPTER 04

RECOMMENDATIONS & CONCLUSION

Advantages of content based system

 Recommendations are based on the user's own ratings, so each recommendation can be tailored to the user's taste.

 A content-based recommender system is more efficient because it uses only the content of each item to make recommendations, so the problem of huge data does not arise when running the algorithm.

 The biggest advantage of the content-based recommender system is that its methodology is simple and logical compared with other systems.

 Recommendations are derived from the interests of a single user, so the interests of other users are not required.

 A content-based recommender system can explain the reasoning behind its recommendations, which makes it more useful than other systems.

Drawbacks of content based recommender system

 The biggest drawback of the content-based recommender system is the huge size of the item data set. Because the system relies on the sets of items that relate most closely to the user's preferences, it is hard to examine the user's choice for every item.

 The algorithm must use the content of each item browsed by users, and it is difficult to take every term of every item into account.

 The results of any recommender system are not 100% accurate, so it is not easy to estimate the user's needs precisely.

 The content-based system is complex because no data exactly defines a user's interests. Users' choices are unclear and vary from time to time according to events in their lives.

 To implement the content based recommender system in an organization is very

expensive, as it requires the highly professional experts to manage the system which i

 This system increases the labour expenses that are not affordable for many businesses.

Conclusion

In this project we have studied recommender systems: their background, the different types, the pioneers of the field, world-famous examples and how they work. Our main focus was JBR; we analyzed the problems and flaws that exist in the JBRC of CUST. At present papers are simply dumped into the JBRC, as it has no recommender system. To address these shortcomings, we have built a prototype example using the vector space model with cosine similarity. Owing to a shortage of time and some other limitations, we could not bring it to a running form. We identified that JBR lacks a semantic system. A user may not find the relevant item because the system cannot recommend exactly what the user is looking for, or suggest anything the user did not have in mind at that moment; a recommender system can reduce these flaws of JBR. It will work on the basis of past information: the user types a query and the system gives quick access to the articles closest to that query. This will be helpful for users and will also improve efficiency at work.

Recommender systems can also be used in a broader sense. A database of the searched articles can be created, and the system can rank the articles in the database according to user priority; in such a situation knowledge-based recommender systems would be used. These systems are considered intelligent, as they can change the priority automatically by observing changes in user actions.

Introducing such a system requires a great deal of time, which this project lacks, so initially we have focused on a very small portion. At this stage we consider only the abstracts of papers and compare their similarity using the vector space model and cosine similarity.
