
Book-based question

Strengths and Weaknesses of Neighborhood-Based Methods

Neighborhood methods have several advantages related to their simplicity and
intuitive approach. They are easy to implement and debug. It is often easy to
justify why a specific item is recommended, and the interpretability of
item-based methods is particularly notable.
The main disadvantage of these methods is that the offline phase can sometimes be
impractical in large-scale settings. The offline phase of the user-based method
requires at least O(m^2) time and space. This might sometimes be too slow or
space-intensive with desktop hardware, when the number of users m is of the order
of tens of millions.
Nevertheless, the online phase of neighborhood methods is always efficient. The
other main disadvantage of these methods is their limited coverage because of
sparsity. For example, if none of John’s nearest neighbors have rated Terminator, it
is not possible to provide a rating prediction of Terminator for John. On the other
hand, we care only about the top-k items of John in most recommendation settings.
If none of John’s nearest neighbors have rated Terminator, then it might be
evidence that this movie is not a good recommendation for John. Sparsity also
creates challenges for robust similarity computation when the number of mutually
rated items between two users is small.
Q2. Efficient Implementation and Computational Complexity in
Neighborhood-based methods
Neighborhood-based methods are always used to determine the best item
recommendations for a target user or the best user recommendations for a target
item. The aforementioned discussion only shows how to predict the ratings for a
particular user-item combination, but it does not discuss the actual ranking process.
A straightforward approach is to compute all possible rating predictions for the
relevant user-item pairs (e.g., all items for a particular user) and then rank them.
While this is the basic approach used in current recommender systems, it is
important to observe that the prediction process for many user-item combinations
reuses many intermediate quantities.
Neighborhood-based methods are always partitioned into an offline phase and an
online phase. In the offline phase, the user-user (or item-item) similarity values and
peer groups of the users (or items) are computed. For each user (or item), the
relevant peer group is prestored on the basis of this computation. In the online
phase, these similarity values and peer groups are leveraged to make predictions.
Let n' be the maximum number of specified ratings of a user (row), and m' be the
maximum number of specified ratings of an item (column). Note that O(n') is the
maximum running time for computing the similarity between a pair of users (rows),
and O(m') is the maximum running time for computing the similarity between a pair
of items (columns).
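
The split between the two phases can be sketched as follows. This is a minimal illustration, not a reference implementation: the toy ratings, the function names, and the choice of Pearson similarity with a mean-centered weighted average for prediction are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. Unspecified ratings are absent.
ratings = {
    "John":  {"Gladiator": 5, "Ben-Hur": 4, "Godfather": 1},
    "Alice": {"Gladiator": 4, "Ben-Hur": 5, "Godfather": 2, "Terminator": 2},
    "Bob":   {"Gladiator": 1, "Ben-Hur": 2, "Godfather": 5, "Terminator": 5},
}

def mean(user):
    """Mean of all ratings specified by the user."""
    vals = ratings[user].values()
    return sum(vals) / len(vals)

def pearson(u, v):
    """Pearson-style similarity over the mutually rated items of u and v.

    Assumption: each user's mean is taken over all of that user's ratings.
    The cost is proportional to the number of ratings of the two users.
    """
    common = ratings[u].keys() & ratings[v].keys()
    mu_u, mu_v = mean(u), mean(v)
    num = sum((ratings[u][i] - mu_u) * (ratings[v][i] - mu_v) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mu_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings[v][i] - mu_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)

def build_peer_groups(k):
    """Offline phase: prestore up to k positively correlated peers per user."""
    peers = {}
    for u in ratings:
        sims = [(pearson(u, v), v) for v in ratings if v != u]
        sims = [(s, v) for s, v in sims if s > 0]
        peers[u] = sorted(sims, reverse=True)[:k]
    return peers

def predict(peers, u, item):
    """Online phase: mean-centered weighted average over the stored peers."""
    rated = [(s, v) for s, v in peers[u] if item in ratings[v]]
    if not rated:
        return None  # limited coverage: no stored peer has rated the item
    num = sum(s * (ratings[v][item] - mean(v)) for s, v in rated)
    den = sum(abs(s) for s, _ in rated)
    return mean(u) + num / den

peers = build_peer_groups(k=2)               # computed once, offline
print(predict(peers, "John", "Terminator"))  # cheap lookup-based step, online
```

The online step only touches the prestored peer group, which is why it stays cheap even when the offline similarity computation is expensive. Returning `None` when no peer has rated the item mirrors the coverage limitation caused by sparsity.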

1. Cosine variant function on the raw ratings.

In some implementations of the raw cosine, the normalization factors in the
denominator are based on all the specified items and not the mutually rated items.
In general, the Pearson correlation coefficient is preferable to the raw cosine
because of the bias adjustment effect of mean-centering. This adjustment accounts
for the fact that different users exhibit different levels of generosity in their global
rating patterns.
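
The effect of mean-centering can be seen in a small sketch. The two rating dictionaries and the function names below are hypothetical, and the cosine shown is the variant normalized over mutually rated items only.

```python
from math import sqrt

def raw_cosine(ru, rv):
    """Cosine on the raw ratings, normalized over mutually rated items."""
    common = ru.keys() & rv.keys()
    num = sum(ru[i] * rv[i] for i in common)
    den = sqrt(sum(ru[i] ** 2 for i in common)) * sqrt(sum(rv[i] ** 2 for i in common))
    return num / den if den else 0.0

def pearson_sim(ru, rv):
    """Pearson-style similarity: same form as the cosine, after subtracting each user's mean."""
    common = ru.keys() & rv.keys()
    mu_u = sum(ru.values()) / len(ru)
    mu_v = sum(rv.values()) / len(rv)
    num = sum((ru[i] - mu_u) * (rv[i] - mu_v) for i in common)
    den = sqrt(sum((ru[i] - mu_u) ** 2 for i in common)) * sqrt(
        sum((rv[i] - mu_v) ** 2 for i in common)
    )
    return num / den if den else 0.0

# Hypothetical users: one rates generously, one strictly, with opposite preferences.
generous = {"A": 5, "B": 4, "C": 5, "D": 3}
strict = {"A": 1, "B": 2, "C": 1, "D": 3}

print(raw_cosine(generous, strict))   # positive, because all raw ratings are positive
print(pearson_sim(generous, strict))  # negative, because the preferences are opposed
```

Because raw ratings are all positive, the raw cosine stays high even for users whose tastes are opposed. Subtracting each user's mean removes the generosity offset, so the opposition shows up as a negative correlation.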

2. What is significance weighting


The reliability of the similarity function Sim(u, v) is often affected by the number
of common ratings |Iu ∩ Iv| between users u and v. When the two users have only
a small number of ratings in common, the similarity function should be reduced
with a discount factor to de-emphasize the importance of that user pair. This
method is referred to as significance weighting.
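
One common form of the discount scales the similarity by min(|Iu ∩ Iv|, β) / β, where β is a threshold on the number of common ratings. The sketch below assumes this form. The function name and the default β = 50 are illustrative choices, not values taken from the text above.

```python
def discounted_sim(sim, num_common, beta=50):
    """Scale Sim(u, v) down when users share fewer than beta common ratings.

    sim        -- the undiscounted similarity Sim(u, v)
    num_common -- |Iu ∩ Iv|, the number of mutually rated items
    beta       -- hypothetical threshold; pairs at or above it are not discounted
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    return sim * min(num_common, beta) / beta

# A high similarity based on only 5 common ratings is heavily discounted,
# while the same similarity based on 80 common ratings is left unchanged.
print(discounted_sim(0.9, 5))
print(discounted_sim(0.9, 80))
```

The discounted value would typically be used both for choosing the peer group and for weighting the peers' ratings in the prediction.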
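3. What is bias adjustment
In neighborhood-based methods, bias adjustment refers to mean-centering: each
user's mean rating is subtracted from that user's ratings before similarities or
predictions are computed, which accounts for different users exhibiting different
levels of generosity. This should not be confused with bias in the bias-variance
sense of machine learning, where bias creates consistent errors because the model
is too simple for the requirement, and variance creates errors because the model
fits trends or data points that do not exist.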
4. Write about the two ways in which neighborhood-based collaborative filtering
algorithms can be formulated.
5. What is long tail rating
The "long tail" in the context of rating frequencies refers to the
distribution of ratings across items, where a few popular items receive
a large number of ratings, while the vast majority of items receive
relatively few. When items are ordered by decreasing rating frequency,
this highly skewed distribution shows a long tail on the right side of
the rating frequency graph.
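
A rating frequency distribution of this kind can be tabulated with a short sketch. The rating events below are made up for illustration, and the function name is hypothetical.

```python
from collections import Counter

def rating_frequencies(rating_events):
    """Return (item, number_of_ratings) pairs sorted by decreasing frequency."""
    counts = Counter(item for _user, item, _rating in rating_events)
    return counts.most_common()

# Hypothetical (user, item, rating) events: one popular item, several rare ones.
events = [
    ("u1", "Gladiator", 5), ("u2", "Gladiator", 3), ("u3", "Gladiator", 4),
    ("u4", "Gladiator", 2), ("u1", "Ben-Hur", 4), ("u2", "Ben-Hur", 5),
    ("u3", "Nero", 1), ("u4", "Spartacus", 3),
]

for item, freq in rating_frequencies(events):
    print(item, freq)
```

Plotting the frequencies in this sorted order gives the curve whose right-hand side forms the long tail.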
