Data Science

Introduction:
A recommender system, or a recommendation system (sometimes replacing 'system'
with a synonym such as platform or engine), is a subclass of information filtering
system that seeks to predict the "rating" or "preference" a user would give to an item.[1][2]
Recommender systems are used in a variety of areas, with commonly recognised
examples taking the form of playlist generators for video and music services, product
recommenders for online stores, or content recommenders for social media platforms and
open web content recommenders.[3][4] These systems can operate using a single input, like
music, or multiple inputs within and across platforms like news, books, and search queries.
There are also popular recommender systems for specific topics like restaurants
and online dating. Recommender systems have also been developed to explore research
articles and experts,[5] collaborators,[6] and financial services.[7]
Recommender systems usually make use of either or both collaborative filtering and
content-based filtering (also known as the personality-based approach),[8] as well as other
systems such as knowledge-based systems. Collaborative filtering approaches build a
model from a user's past behavior (items previously purchased or selected and/or
numerical ratings given to those items) as well as similar decisions made by other users.
This model is then used to predict items (or ratings for items) that the user may have an
interest in.[9] Content-based filtering approaches utilize a series of discrete, pre-tagged
characteristics of an item in order to recommend additional items with similar properties.[10]
Current recommender systems typically combine one or more approaches into a hybrid
system.
The differences between collaborative and content-based filtering can be demonstrated by
comparing two early music recommender systems – Last.fm and Pandora Radio.
• Last.fm creates a "station" of recommended songs by observing what bands and
individual tracks the user has listened to on a regular basis and comparing those
against the listening behavior of other users. Last.fm will play tracks that do not
appear in the user's library, but are often played by other users with similar
interests. As this approach leverages the behavior of users, it is an example of a
collaborative filtering technique.
• Pandora uses the properties of a song or artist (a subset of the 400 attributes
provided by the Music Genome Project) to seed a "station" that plays music with
similar properties. User feedback is used to refine the station's results,
deemphasizing certain attributes when a user "dislikes" a particular song and
emphasizing other attributes when a user "likes" a song. This is an example of a
content-based approach.
Each type of system has its strengths and weaknesses. In the above example, Last.fm
requires a large amount of information about a user to make accurate recommendations.
This is an example of the cold start problem, and is common in collaborative filtering
systems.[11][12][13][14][15] Pandora, by contrast, needs very little information to start, but it is far more
limited in scope (for example, it can only make recommendations that are similar to the
original seed).
Recommender systems are a useful alternative to search algorithms since they help users
discover items they might not have found otherwise. Of note, recommender systems are
often implemented using search engines indexing non-traditional data.
Recommender systems were first mentioned in a technical report as a "digital bookshelf"
in 1990 by Jussi Karlgren at Columbia University,[16] and implemented at scale and worked
through in technical reports and publications from 1994 onwards by Jussi Karlgren, then at
SICS,[17][18] and research groups led by Pattie Maes at MIT,[19] Will Hill at Bellcore,[20]
and Paul Resnick, also at MIT,[21][22] whose work with GroupLens was awarded the
2010 ACM Software Systems Award.

Montaner provided the first overview of recommender systems from an intelligent agent
perspective.[23] Adomavicius provided a new, alternate overview of recommender systems.[24]
Herlocker provides an additional overview of evaluation techniques for recommender
systems,[25] and Beel et al. discussed the problems of offline evaluations.[26] Beel et al.
have also provided literature surveys on available research paper recommender systems
and existing challenges.[27][28][29]
Recommender systems have been the focus of several granted patents. [30][31][32][33][34]
Recommender systems are tools designed for interacting with large and complex
information spaces and prioritizing items in these spaces that are likely to be of interest to
the user. This area of expertise, christened in 1995, has grown enormously in the variety of
problems addressed and techniques employed as well as in its practical applications.
Personalized recommendations are an important part of many on-line e-commerce
applications like Amazon.com, Netflix, and Pandora. The wealth of practical application
experience has become an inspiration for researchers to extend the reach of
recommender systems into new and challenging areas.
Recommender systems are so commonplace now that many of us use them without even knowing
it. Because we can't possibly look through all the products or content on a website, a
recommendation system plays an important role in helping us have a better user experience, while
also exposing us to more inventory we might not discover otherwise.
Some examples of recommender systems in action include product recommendations on Amazon,
Netflix suggestions for movies and TV shows in your feed, recommended videos on YouTube,
music on Spotify, the Facebook newsfeed and Google Ads.
An important component of any of these systems is the recommender function, which takes
information about the user and predicts the rating that user might assign to a product, for example.
Predicting user ratings, even before the user has actually provided one, makes recommender
systems a powerful tool.
Recommender systems are valuable in sectors ranging from
entertainment to e-commerce. They have proven to be instrumental in
raising company revenues and customer satisfaction.
It is therefore worthwhile for machine learning enthusiasts to understand them and become
familiar with the related concepts.
As the amount of available information increases, new problems arise as people
find it hard to select the items they actually want to see or use. This is where
recommender systems come in. They help us make decisions by learning our
preferences or by learning the preferences of similar users.
They are used by almost every major company in some form or other. Netflix uses them to
suggest movies to customers, YouTube uses them to decide which video to play next on
autoplay, and Facebook uses them to recommend pages to like and people to follow.
In this way recommender systems have helped organizations retain customers by providing
tailored suggestions specific to the customer's needs. According to a study by McKinsey,
35 percent of what consumers purchase on Amazon and 75 percent of what they watch on
Netflix come from product recommendations based on such algorithms.
Why DATA Science? What are the fields used?
The scope of data science solutions grows exponentially each day. It is not surprising if
you think of them as tools designed to meet your specific business needs and optimize
particular business processes. Data science helps companies make better decisions, and
recommender systems help data scientists succeed in it.
Recommendation systems have impacted or even redefined our lives in many ways. One
example of this impact is how our online shopping experience is being redefined. As we
browse through products, the Recommendation system offers recommendations of
products we might be interested in. Regardless of the perspective, business or
consumer, Recommendation systems have been immensely beneficial. And big data is the
driving force behind Recommendation systems. A typical Recommendation system cannot
do its job without sufficient data and big data supplies plenty of user data such as past
purchases, browsing history, and feedback for the Recommendation systems to provide
relevant and effective recommendations. In a nutshell, even the most advanced
Recommenders cannot be effective without big data.
How does a Recommendation system work?
A Recommendation system works in well-defined, logical phases which are data collection,
ratings, and filtering. These phases are described below.
Data collection
Let us assume that a user of the Amazon website is browsing books and reading the details.
Each time the reader clicks on a link, an event such as an Ajax event could be fired. The
event type could vary depending on the technology used. The event then could make an
entry into a database which usually is a NoSQL database. The entry is technical in content
but in layman’s language could read something like “User A clicked Product Z details
once”. That is how user details get captured and stored for future recommendations.
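The capture step described above can be sketched in a few lines of Python. This is only an illustration: the `Event` and `EventLog` names, the field names, and the in-memory counter standing in for the NoSQL database are all assumptions, not the schema any real site uses.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One user interaction; the field names are illustrative only."""
    user_id: str
    product_id: str
    action: str  # e.g. "click", "like", "add_to_cart", "purchase"


class EventLog:
    """In-memory stand-in for the database that stores fired events."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, event: Event) -> None:
        # Each fired event increments a per-(user, product, action) counter.
        self._counts[event] += 1

    def count(self, user_id: str, product_id: str, action: str) -> int:
        # Events that were never recorded count as 0.
        return self._counts[Event(user_id, product_id, action)]


log = EventLog()
log.record(Event("user_a", "product_z", "click"))
print(log.count("user_a", "product_z", "click"))  # 1
```

The stored entry corresponds to the plain-language record "User A clicked Product Z details once".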
How does the Recommendation system capture the details? If the user has logged in, then
the details are extracted either from an HTTP session or from cookies. If
the Recommendation system depends on cookies, the data is available only
as long as the user keeps using the same device. Events are fired in almost every case, such as a
user liking a product, adding it to a cart, or purchasing it. That is how user details are
stored. But that is just one part of what Recommenders do.
The following paragraphs show how Amazon offers its product recommendations to a user
who is browsing for books:
• As shown by the image below, when a user searched for the book Harry Potter and
the Philosopher’s Stone, several recommendations were given.
In another example, a customer who searched Amazon for Canon EOS 1200D 18MP
Digital SLR Camera (Black) was given several recommendations for camera
accessories.
Ratings
Ratings are important because they tell you what a user feels about a product.
A user’s feelings about a product are reflected to an extent in the actions he or she takes,
such as liking, adding to the shopping cart, purchasing or just clicking. Recommendation
systems can assign implicit ratings based on these actions. The maximum rating is 5. For
example, purchasing can be assigned a rating of 4, likes can get 3, clicking can get 2 and
so on. Recommendation systems can also take into account the ratings and feedback users
provide.
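A minimal sketch of this rating step follows. The weights for purchase, like and click are the ones given above; the rule that the strongest observed action sets the rating, and the rule that a rating supplied by the user overrides the inferred one, are assumptions made for the example.

```python
# Weights for purchase, like and click come from the text; the "strongest
# action wins" rule and the explicit-rating override are assumptions.
IMPLICIT_RATING = {"purchase": 4, "like": 3, "click": 2}
MAX_RATING = 5


def implicit_rating(actions, explicit=None):
    """Return a rating between 0 and MAX_RATING for one user-product pair."""
    if explicit is not None:
        # A rating the user actually gave takes precedence over inferred ones.
        return min(explicit, MAX_RATING)
    # Unknown actions contribute 0; the strongest observed action sets the rating.
    return max((IMPLICIT_RATING.get(a, 0) for a in actions), default=0)


print(implicit_rating(["click", "like"]))      # 3
print(implicit_rating(["click", "purchase"]))  # 4
print(implicit_rating([], explicit=5))         # 5
```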
Filtering
Filtering means selecting products based on ratings and other user data. Recommendation
systems use three types of filtering: collaborative, user-based and a hybrid approach. In
collaborative filtering, users’ choices are compared and recommendations are given on that basis.
For example, if user X likes products A, B, C, and D and user Y likes products A, B, C, D
and E, then it is likely that user X will be recommended product E because users X and Y
are very similar in their choice of products.
Several well-known brands such as Facebook, Twitter, LinkedIn, Amazon, Google News,
Spotify and Last.fm use this model to provide effective and relevant recommendations. In
user-based filtering, the user’s own browsing history, likes, purchases and ratings are taken into
account before providing recommendations. This model is used by many well-known brands
such as IMDB, Rotten Tomatoes and Pandora. Many companies also use a hybrid
approach. Netflix is known to use a hybrid approach.
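The X and Y example of collaborative filtering can be made concrete with a short sketch. It assumes Jaccard overlap as the similarity between two users' liked products and a hypothetical threshold `min_similarity`; neither choice comes from the text, and user Z is invented to show a dissimilar user being ignored.

```python
def jaccard(a, b):
    """Overlap between two sets of liked products, from 0 (none) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_from_similar_users(target, likes, min_similarity=0.5):
    """Suggest products liked by sufficiently similar users but not by target."""
    scores = {}
    for other, items in likes.items():
        if other == target:
            continue
        sim = jaccard(likes[target], items)
        if sim < min_similarity:
            continue
        for item in items - likes[target]:
            # Accumulate similarity so items backed by closer users rank higher.
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)


likes = {
    "X": {"A", "B", "C", "D"},
    "Y": {"A", "B", "C", "D", "E"},
    "Z": {"Q", "R"},  # hypothetical third user with unrelated tastes
}
print(recommend_from_similar_users("X", likes))  # ['E']
```

Users X and Y share four of five products (similarity 0.8), so product E is suggested to X, while nothing from Z is suggested.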
Role of big data
As stated earlier, big data primarily drives what Recommenders do. Recommenders
cannot do a thing without a constant supply of data. However, the role of big data goes
beyond just data. The above operations require substantial computing capacity that
can run for hours. Hadoop can provide this by spreading the work across a cluster of
machines. To reduce the manual work needed to write code, identify the right algorithms
and similarity methods, and perform other tasks, Mahout could
be used.
Mahout is a library of machine learning algorithms. It provides options
for choosing the recommendation algorithm, the number of nearest neighbors and the similarity
method. Although it is an ordinary Java library, its distributed algorithms were built to run on Hadoop.
To make your tasks even easier, you can use a tool known as PredictionIO, which bundles
both Mahout and Hadoop and, what is more, provides a nice user interface.
So, the role of big data can be summed up as providing meaningful, actionable data fast and
providing the necessary setup to quickly process the data. It is obvious that traditional
technologies are not meant to process such large volumes of data so quickly. So, it will not
suffice to just have big data in order to provide strong recommendations.
The Amazon use case
How Amazon uses the powerful duo of big data and Recommendation System is worth a
study. Amazon has been in certain ways a pioneer of ecommerce but more important than
that accolade is how it is driving its revenue up by providing more and more effective
recommendations.
Buying can be both impulsive and planned and Amazon is smartly tapping into the
impulsive shopper’s mind by providing relevant and useful product recommendations. For
that, it is relentlessly working on making its Recommendation engine more powerful.
Shopping has a connection with psychology. Shoppers buy for instant gratification, instant
mood uplift, social esteem and reasons not even known to them clearly.
Amazon is smart enough to take these factors into account. And now, it is working on a
system called predictive dispatch which means that its Recommendation engine can
predict what the customer is going to buy and make arrangements for a speedy dispatch.
What makes Amazon’s achievements more creditable is that, unlike Facebook,
which also relies heavily on big data and knows a lot of details about its subscribers, all
Amazon knows about its customers is their spending patterns.
Amazon has been cashing in on this knowledge smartly in an attempt to get more out of your
pockets. It is a difficult job to analyze spending patterns, likes and product preferences and
provide effective recommendations on that basis alone. And now, Amazon is trying to offer
its tools and technologies that use big data and Recommendation systems so
effectively for sale to other corporations that use big data. So, Amazon’s product ads will
start to appear more frequently on other websites as well and that is going to drive up
sales.
The following image shows how big companies have been using the power of big data and
Recommendation engines.
Limits of Recommendation systems
For all their efficiencies, Recommendation systems are not foolproof.
Recommenders have been known to suffer from the following limitations:
• Recommenders depend totally on data and their operators must constantly supply them
with large volumes of data. That is why smaller firms are at a disadvantage compared with
bigger firms such as Google and Amazon.
• Recommenders may find it difficult to exactly identify user choice patterns if the
user preferences tend to vary quickly, as in fashion. Recommenders depend a lot
on historic data but that may not be suitable for certain product niches.
• Recommenders face problems with unpredictable items. For example, there are
certain movie types that evoke extreme reactions such as love or hate. It is
extremely difficult to provide recommendations for such items.
Summary
While big data and Recommendation engines have already proved an extremely useful
combination for big corporations, it raises a question of whether companies with smaller
budgets can afford such investments. It is encouraging for such companies that big data
tools and technologies are relatively more affordable. Product recommendations are
extremely important to provide a good user experience from the customer’s viewpoint.
Also, from the company’s viewpoint, it takes into account unknown factors that can make a
customer buy products which might seem unlikely. As the above image shows, the power
of Recommenders is getting bigger.
Bio: Kaushik Pal (www.techalpine.com) has 16 years of experience as a technical
architect and software consultant in enterprise application and product development. He
is interested in new technology and innovation as well as technical writing. His main
focuses are web architecture, web technologies, Java/J2EE, open source, big data and
semantic technologies.
Types of data utilised by recommender systems.
In addition to relationships, recommender systems utilize the following kinds of data:
User Behavior Data
User behavior data is information about how the user engages with the product. It can be
collected from ratings, clicks and purchase history.
User Demographic Data
User demographic information is related to the user’s personal information such as age, education,
income and location.
Product Attribute Data
Product attribute data is information related to the product itself such as genre in case of books, cast
in case of movies, cuisine in case of food.
HOW DO WE PROVIDE DATA FOR RECOMMENDER SYSTEMS?
Data can be provided in a variety of ways. There are two particularly important methods, explicit
and implicit rating.
Explicit Ratings
Explicit ratings are provided by the user and state the user’s preference directly. Examples include star
ratings, reviews, feedback, likes and following. Since users don't always rate products, explicit
ratings can be hard to get.
Implicit Ratings
Implicit ratings are derived from the way users interact with an item. The preference is inferred from the
user’s behavior, and such ratings are easy to get because users generate them simply by clicking around.
Examples include clicks, views and purchases.
(Note: Views and purchases can be a better signal for recommendations, as users will have spent time and
money on what matters most to them.)
Product Similarity (Item-Item Filtering)
Product similarity is one of the most useful approaches for suggesting products based on how much the user
is likely to like them. If the user is browsing or searching for a particular product, they can be
shown similar products. Users often expect to find products they want quickly and move on if they
have a hard time finding the relevant product. When the user clicks on one product we can
show another similar product, or if the user buys the product we can email the user advertisements
or coupons based on a similar product. Product similarity is particularly useful when we don’t know
much about the user yet, but we do know what products they're viewing.
Amazon suggesting similar products.
User Similarity (User-User Filtering)
User similarity measures how alike two users are. If two users have
similar preferences for a product we can assume they have similar interests. It’s like a friend
recommending a product.
User Similarity
Amazon Customer Similarity

One shortcoming of user similarity, however, is that it requires a large amount of user data to suggest products.
This is called the cold start problem because beginning the recommendation process requires previous
data from users. A newly launched e-commerce website, for example, suffers from the cold start
problem because it doesn't have a large number of users.
Product similarity doesn’t have this problem because it just requires product information and the
user’s preference. Netflix, for example, avoids this issue by asking users for their likes when they start a
new subscription.
Similarity measure methods:

Minkowski distance: When the dimensions of a data point are numeric, the general
form of the distance is called the Minkowski distance. Manhattan and Euclidean distance
are its special cases of order 1 and 2.
Manhattan distance: The distance between two points measured along axes at right
angles.
Manhattan Distance
Euclidean Distance: The square root of the sum of squares of the differences between the
coordinates, as given by the Pythagorean theorem.
Cosine Similarity: Measures the cosine of the angle between two vectors. It is a judgment of
orientation rather than magnitude between two vectors with respect to the origin. The cosine of 0
degrees is 1, which means the data points are similar, and the cosine of 90 degrees is 0, which means the data
points are dissimilar.
Pearson Coefficient: It is a measure of linear correlation between two random variables and ranges
between -1 and 1.
Pearson Correlation Coefficient
Hamming Distance: All the measures discussed above apply to continuous
variables. In the case of categorical variables, Hamming distance can be used instead.
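The measures listed above can be written out in plain Python as follows. This is a sketch for small, equal-length vectors; the function names are chosen for the example, and returning 0.0 when a vector has zero length is an assumption made to avoid division by zero.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance of order p between equal-length numeric vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def manhattan(x, y):
    # Minkowski distance of order 1: sum of absolute differences.
    return minkowski(x, y, 1)


def euclidean(x, y):
    # Minkowski distance of order 2: straight-line distance.
    return minkowski(x, y, 2)


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    # Pearson correlation is the cosine similarity of the mean-centered vectors.
    return cosine_similarity(dx, dy)


def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


u, v = [0, 0], [3, 4]
print(manhattan(u, v))                    # 7.0
print(euclidean(u, v))                    # 5.0
print(cosine_similarity([1, 0], [0, 1]))  # 0.0
print(hamming(["red", "S", "cotton"], ["red", "M", "cotton"]))  # 1
```

The two perpendicular vectors have a cosine similarity of 0, matching the 90-degree case described above, and the two categorical records differ in exactly one attribute.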
DATA MINING IN RECOMMENDER APPLICATIONS

The term data mining refers to a broad spectrum of mathematical modeling techniques and
software tools that are used to find patterns in data and use these to build models. In the
context of recommender applications, the term data mining is used to describe the collection
of analysis techniques used to infer recommendation rules or build recommendation models
from large data sets. Recommender systems that incorporate data mining techniques make
their recommendations using knowledge learned from the actions and attributes of users.
These systems are often based on the development of user profiles that can be persistent
(based on demographic or item “consumption” history data), ephemeral (based on the actions
during the current session), or both. These algorithms include clustering, classification
techniques, the generation of association rules, and the production of similarity graphs
through techniques such as Horting.
Science Behind Recommendations
There are three major types of recommender systems:
• Content-based filtering
• Collaborative filtering
• Hybrid recommender systems
These methods can rely on user behavior data, including activities, preferences, and likes, or
can take into account the description of the items that users prefer, or both.
Content-based filtering
This method works based on the properties of the items that each user likes, discovering
what else the user may like. It takes into account multiple keywords. Also, a user profile is
designed to provide comprehensive information on the items that a user prefers. The system
then recommends some similar items that users may also want to purchase.
content-based filtering
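As a rough illustration of content-based filtering, the sketch below builds a keyword profile from the items a user liked and scores the remaining items against it. The catalogue, its tags and the scoring rule (summed profile weight of an item's tags divided by its tag count) are hypothetical choices for the example, not a description of any production system.

```python
from collections import Counter

# Hypothetical catalogue: each item carries a set of pre-assigned keywords.
CATALOGUE = {
    "book_1": {"fantasy", "magic", "school"},
    "book_2": {"fantasy", "magic", "dragons"},
    "book_3": {"cooking", "vegetarian"},
    "book_4": {"fantasy", "school"},
}


def build_profile(liked):
    """Count how often each keyword appears among the items the user liked."""
    profile = Counter()
    for item in liked:
        profile.update(CATALOGUE[item])
    return profile


def recommend_by_content(liked, top_n=2):
    profile = build_profile(liked)
    scores = {}
    for item, tags in CATALOGUE.items():
        if item in liked:
            continue
        # Score = summed profile weight of the item's tags, divided by tag count.
        scores[item] = sum(profile[t] for t in tags) / len(tags)
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return [i for i in ranked if scores[i] > 0][:top_n]


print(recommend_by_content(["book_1"]))  # ['book_4', 'book_2']
```

No other user's behavior is consulted here: the recommendation depends only on item properties and the one user's profile.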
Collaborative filtering
Recommendation engines can rely on the likes and desires of other users to compute a similarity
index between users and recommend items to them accordingly. This type of filtering relies
on user opinion instead of machine analysis to accurately recommend complex items, such
as movies or music tracks.
collaborative filtering

The collaborative filtering algorithm comes in two variants. The system can search for look-alike
users, which is user-user collaborative filtering. Recommendations then depend on a
user profile. But such an approach requires a lot of computational resources and is hard
to implement for large-scale databases.
Another option is item-item collaborative filtering. The system finds similar items and
recommends these items to a user on a case-by-case basis. It is a resource-saving approach,
and Amazon utilizes it to engage customers and improve sales volumes.
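Item-item collaborative filtering can be sketched as below. The ratings table is invented, and cosine similarity over each item's ratings (with unrated entries treated as 0) is one common choice assumed for the example; the text does not say which measure Amazon uses.

```python
import math

# Hypothetical ratings: user -> {item: rating}. Missing entries mean "unrated".
RATINGS = {
    "ann": {"camera": 5, "tripod": 4, "lens": 5},
    "bob": {"camera": 4, "tripod": 5},
    "cat": {"camera": 5, "lens": 4, "novel": 1},
    "dan": {"novel": 5},
}


def item_vector(item):
    """Ratings given to one item, keyed by user."""
    return {u: r[item] for u, r in RATINGS.items() if item in r}


def item_similarity(a, b):
    """Cosine similarity between two items, treating unrated as 0."""
    va, vb = item_vector(a), item_vector(b)
    dot = sum(va[u] * vb[u] for u in va.keys() & vb.keys())
    norm = math.sqrt(sum(x * x for x in va.values())) * math.sqrt(
        sum(x * x for x in vb.values())
    )
    return dot / norm if norm else 0.0


def similar_items(item, top_n=2):
    """Return the top_n items most similar to the given one."""
    items = {i for r in RATINGS.values() for i in r} - {item}
    ranked = sorted(items, key=lambda i: (-item_similarity(item, i), i))
    return ranked[:top_n]


print(similar_items("camera"))  # ['lens', 'tripod']
```

Because item similarities change more slowly than individual user profiles, they can be computed ahead of time and looked up when a user views an item, which is one reason this variant is considered resource-saving.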
Hybrid recommender systems

It is also possible to combine both types to build a more robust recommendation engine.
This method generates collaborative and content-based predictions and combines them
to increase performance.
We have already mentioned Netflix, and this provider of media services uses a hybrid system
to win customer loyalty. Users get movie recommendations based on their habits and the
characteristics of content they prefer.
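One simple way to combine the two kinds of prediction is a weighted average of their scores, sketched below. The weighted blend, the `weight` parameter and the sample scores are assumptions for illustration; the text does not describe how Netflix or any other provider actually combines its predictions.

```python
def hybrid_scores(collab, content, weight=0.5):
    """Blend two score dictionaries; weight is the share given to collab."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    blended = {}
    for item in collab.keys() | content.keys():
        # An item missing from one recommender simply scores 0 there.
        blended[item] = (weight * collab.get(item, 0.0)
                         + (1 - weight) * content.get(item, 0.0))
    return blended


# Hypothetical scores from a collaborative and a content-based recommender.
collab = {"film_a": 0.9, "film_b": 0.2}
content = {"film_b": 0.8, "film_c": 0.6}
scores = hybrid_scores(collab, content, weight=0.5)
best = max(scores, key=scores.get)
print(best)  # film_b
```

With equal weights, film_b wins because both recommenders give it some support, even though neither ranks it first on its own.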