Bayesian Adaptive User Profiling With Explicit and Implicit Feedback
ABSTRACT

Research in information retrieval is now moving into a personalized scenario where a retrieval or filtering system maintains a separate user profile for each user. In this framework, information delivered to the user can be automatically personalized and catered to each individual user's information needs. However, a practical concern for such a personalized system is the "cold start problem": any user new to the system must endure poor initial performance until sufficient feedback from that user is provided.

To solve this problem, we use both explicit and implicit feedback to build a user's profile and use Bayesian hierarchical methods to borrow information from existing users. We analyze the usefulness of implicit feedback and the adaptive performance of the model on two data sets gathered from user studies in which users' interaction with a document, or implicit feedback, was recorded along with explicit feedback. Our results are two-fold: first, we demonstrate that the Bayesian modeling approach effectively trades off between shared and user-specific information, alleviating poor initial performance for each user. Second, we find that implicit feedback has very limited and unstable predictive value by itself and only marginal value when combined with explicit feedback.

Categories and Subject Descriptors

H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Information filtering, relevance feedback

General Terms

Algorithms, Human Factors

Keywords

Information Retrieval, User Modeling, Bayesian Statistics, Implicit Feedback

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CIKM'06, November 5–11, 2006, Arlington, Virginia, USA.
Copyright 2006 ACM 1-59593-433-2/06/0011 ...$5.00.

1. INTRODUCTION

Although ad hoc retrieval systems have become part of the daily life of internet users and digital library users, it is clear that there is great variety in users' needs. Such systems therefore cannot offer the best possible service, since they do not tailor information to individual user needs.

IR research is now moving into a more complex environment, with user-centered or personalized adaptive information retrieval, information filtering, and recommendation systems as major research topics. As opposed to traditional ad hoc systems, a personalized system adaptively learns a profile for each user, automatically catering to the user's specific needs and preferences [1].

To learn a reliable user-specific profile, an adaptive system usually needs a significant amount of explicit feedback (training data) from the user. However, the average user does not like to answer a lot of questions or provide explicit feedback on items they have seen. Meanwhile, a user does not want to endure poor performance while the system is "training" and expects a system to work reasonably well as soon as he/she first uses it. Good initial performance is an incentive for the user to continue using the system. Thus an important aspect of personalization is to develop a system that works well initially, with little explicit user feedback.

Much prior research has explored the usefulness of implicit feedback [10], because it is easy to collect and requires no extra effort from the user. On the other hand, user-independent systems perform reasonably well for most users, mainly because the system parameters have been tuned to reasonable values based on thousands of existing users. These observations suggest at least two possible approaches to improving the early-stage performance of a personalized system: using cheap implicit feedback from the user, and borrowing information from other users. Both approaches may help the system reduce its uncertainty about the user, especially when the user has just started using the system and has not provided much explicit feedback.

This paper explores both directions under a single, unified Bayesian hierarchical modeling framework. Intuitively, combining the two approaches under this framework may achieve a good tradeoff between bias and variance, especially for a new user. This is because including implicit feedback decreases the bias and increases the variance of the learner, while borrowing information from other users through a Bayesian prior in our framework has the inverse effect. We demonstrate, empirically, that the hierarchical model controls the tradeoff between shared and personal information, thereby alleviating poor initial performance. We also evaluate the long-term usefulness of different types of feedback for predicting a user's rating for a document.

The paper is organized as follows. We begin with a brief review of related work in Section 2. We then describe the Bayesian hierarchical modeling approach to learning user-specific models (Section 3) and introduce a computationally efficient model, the hierarchical Gaussian network, as an example to be used in our experiments (Section 4). Sections 5 and 6 describe the experimental methodology and results on two real user study data sets. Section 7 concludes.

2. RELATED WORK

Implicit feedback is a broad term covering any kind of natural interaction of a user with a document [16]. Examples of implicit feedback are mouse and keyboard usage, page navigation, book-marking, and editing. While the focus of much emerging research, there is still some debate over the value of implicit feedback. For example, it has been observed in numerous controlled studies [3, 12, 6, 4] that there is a clear correlation between the time a user spends viewing a document and how useful they found that document. However, Kelly et al. [9] demonstrate, in a more naturalistic setting, that this correlation varies significantly across tasks and conclude that the usefulness of implicit feedback is conditioned heavily on what type of information the user is seeking. Standard machine learning techniques such as SVMs [8], neural networks [12], and graphical models [4][18] have been used to explore the usefulness of implicit feedback, with varying degrees of success. In our work we use a hierarchical Bayesian model to integrate implicit feedback.

The idea of borrowing information from other users to serve the information needs of a user is widely studied in the area called collaborative filtering, and Bayesian hierarchical modeling has been applied in this context [17]. The major differences in our work are: 1) we focus on developing complex user models that go beyond a relevance-based content model by including other explicit feedback (such as topic familiarity and readability) and implicit feedback (such as user actions and system context); 2) we focus on the adaptive learning environment and analyze the online performance of the learning system; this analysis makes it clear how an adaptive system can benefit from the hierarchical Bayesian framework; and 3) we use a Gaussian network (Gaussian user models with a Gaussian prior), while the prior work uses different functional forms, such as multinomial user models with a Dirichlet prior [17]. Our method is more computationally efficient and also raises fewer privacy concerns when sharing information across users.

3. HIERARCHICAL BAYESIAN FRAMEWORK

One concern about using user-specific models, or profiles, is that there is an initial period where the user must endure poor performance. On the other hand, existing user-independent systems seem to work well by tuning parameters for the general public. This motivates us to borrow information from existing users to serve a new user. Here we adopt a principled approach for doing this based on Bayesian hierarchical models.

Let f^u represent the model of a user u. Over time, the user interacts with the system and provides explicit and implicit feedback about documents the user has read. Let a d-dimensional random variable x^u represent the information about a document for user u. Each dimension of x^u corresponds to a feature; the features could be user actions on the document, the document relevance scores the system derives from the user's explicit feedback on other documents, user-independent features of the document, and so on. The user u may provide a rating y^u for a document^1 x^u according to the user model f^u. For now, a model is a function that takes information about a document and returns an estimate of whether the user likes the document or not (a rating): f^u : x^u → y^u. The functional form of f^u varies across learning systems, and we delay this discussion until Section 4. We make no assumption about the distribution of documents.

A Bayesian learning system begins with a certain prior belief, P(f|θ), about the distribution of user models. In the simplest terms, the hierarchical Bayesian framework can be written as

    f^u ~ P(f|θ)
    y = f^u(x)

and is also illustrated in Figure 1. Note that this is a very general framework and can accommodate any class of functions for which a reasonable prior can be specified. This includes any function parameterized by real numbers (e.g. SVMs, neural networks with fixed topology) and regression trees [2].

A personalized system tries to learn the user model f^u for a user u. As the user uses the system, the system receives a sample of document–rating pairs, D^u = {(x_i^u, y_i^u) | i = 1 ... N_u}, where N_u is the number of training examples for user u. Using Bayes' rule, the system can update its belief about the user model based on the data and get the posterior distribution over user models:

    P(f^u | θ, D^u) = P(D^u | f^u, θ) P(f^u | θ) / P(D^u | θ)
                    = P(f^u | θ) ∏_{i=1}^{N_u} [ P(f^u(x_i^u) = y_i^u | f^u) / P(f^u(x_i^u) = y_i^u | θ) ]

To find the maximum a posteriori (MAP) model, f^u_MAP, we can ignore the denominator, since it is the same for all models. Then we have:

    f^u_MAP = argmax_f P(f | θ, D^u)
            = argmax_f P(f | θ) ∏_{i=1}^{N_u} P(f(x_i^u) = y_i^u | f)
            = argmax_f log P(f | θ) + Σ_{i=1}^{N_u} log P(f(x_i^u) = y_i^u | f)    (1)

Equation 1 shows clearly how the fitness of a model decomposes into the model's prior probability and the data likelihood given the model.

Incorporating a prior distribution automatically manages the tradeoff between generic user information and user-specific information, based on the amount of training data available for the specific user. When a user u starts using a system, the system has almost no information about the user. Because N_u for the user is small, the prior distribution P(f|θ) is a major contributor when estimating the user model f̂ (first term in Equation 1). If the system has a reliable prior, it can perform well even with little data for the user. How can we find a good prior for f? We treat user u's model f^u as a sample from the distribution P(f|θ). Given a set of user models, we can choose θ to maximize their likelihood.

As the system gets more training data for this particular user, f^u_MAP is dominated by the data likelihood (second term in Equation 1) and the prior learned from other users

^1 This paper sometimes refers to x^u as simply a "document" even though it has features of the user's interaction with it.
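The MAP tradeoff in Equation 1 can be made concrete with a small Gaussian instance. The sketch below is illustrative only, not the paper's hierarchical Gaussian network: it assumes linear user models w with prior N(μ, σ²_p I) and Gaussian rating noise, under which Equation 1 reduces to ridge regression shrunk toward the prior mean; all function names and variance values are our own assumptions.

```python
import numpy as np

def fit_prior(user_models):
    """Empirical Bayes: estimate the prior N(mu, s2*I) from existing users'
    fitted model weights, i.e. choose theta to maximize their likelihood."""
    W = np.asarray(user_models, dtype=float)
    return W.mean(axis=0), float(W.var()) + 1e-6

def map_user_model(X, y, mu, s2_prior=1.0, s2_noise=0.25):
    """MAP estimate of a linear user model under a Gaussian prior N(mu, s2_prior*I).

    Maximizes log P(w|theta) + sum_i log P(y_i | w.x_i) (cf. Equation 1);
    with Gaussian prior and noise this is ridge regression toward mu.
    With no data the estimate is exactly the prior mean; with many
    examples the data-likelihood term dominates."""
    X = np.asarray(X, dtype=float).reshape(-1, len(mu))
    A = X.T @ X / s2_noise + np.eye(len(mu)) / s2_prior
    b = X.T @ np.asarray(y, dtype=float) / s2_noise + mu / s2_prior
    return np.linalg.solve(A, b)
```

For a brand-new user (N_u = 0) the call returns μ, the behavior borrowed from existing users; as N_u grows, the estimate moves toward the user's own least-squares fit, mirroring the prior/likelihood tradeoff discussed above.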
Figure 2: Mean squared error for ZID of the model with Prior, the model with No prior, the Generic user-independent model, and the Moving Average estimator; legend: (P)rior, (N)o Prior, (G)eneric, (A)verage. The bottom part of the plot shows the results of a 95% confidence t-test comparing the Prior model to the others. A solid block indicates that the Prior performed better.

Figure 3: Mean squared error for the first 100 documents in ZID. See Figure 2 caption.
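The evaluation protocol behind these learning curves can be sketched in a few lines: at each step the model predicts the incoming document's rating, the squared error is recorded, and the example is then revealed to the learner. The sketch below is our own illustration of this online setup, together with the (A)verage baseline from the figure legend (a moving average over the user's past ratings); the names and the running-mean output are assumptions, not the paper's code.

```python
import numpy as np

def prequential_mse(docs, ratings, predict, update):
    """Online evaluation: for each document, predict its rating from the
    examples seen so far, record the squared error, then train on it.
    Returns the running mean squared error after each document."""
    errors = []
    for x, y in zip(docs, ratings):
        errors.append((predict(x) - y) ** 2)
        update(x, y)
    return np.cumsum(errors) / np.arange(1, len(errors) + 1)

class MovingAverage:
    """The (A)verage baseline: predict the mean of the user's past ratings."""
    def __init__(self, default=0.0):
        self.n, self.total, self.default = 0, 0.0, default

    def predict(self, x):
        return self.total / self.n if self.n else self.default

    def update(self, x, y):
        self.n += 1
        self.total += y
```

Running each learner through the same loop yields directly comparable MSE-over-time curves like those plotted here.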
or if the user has a new interest. User bias in rating has been previously observed and explicitly modeled [14]. Our results not only confirm the prior work but also suggest a

Figure 4: Results for ZI (performance (micro) over time with implicit feedback only). See Figure 2 caption.

Figure 2 shows that the Prior and No Prior models are better than the Moving Average when sufficient training examples (more than 200) are available. This suggests that a large amount of implicit feedback from a particular user could be useful, which is consistent with Teevan et al. [15].

Results for Claypool's data set (C) are shown in Figure 5. The range of the x-axis is much smaller than that of Figure 2, because the number of examples per user was far fewer in this study. The hierarchical user model performs similarly to the moving average, while the no-prior model performs worse than both. This might be due to the fact that the system has not accumulated enough implicit feedback to make it useful. Thus, the relative performance of the models is similar to that of the early stage of ZI, which also contains only implicit feedback. Compared with ZI, the absolute performance on this data set is worse. The task in Claypool's user study was unrestricted, so the correlation between implicit feedback and user ratings is smaller than in ZI. This is not surprising, since previous studies on the usefulness of implicit feedback have reported both negative and positive results [9][13]. In particular, Kelly and Belkin showed there to be very little correlation across tasks between time spent viewing a page and how useful users thought the page was. Our results support Kelly's conclusions despite the fact that over the entire data set C, with all users aggregated, there is a slightly positive correlation between the time spent viewing a page and user rating.

Finally, in order to compare performance across different feature sets, we give a side-by-side plot of the hierarchical model's performance on the data sets ZI, ZID, and ZD (Figure 6). It shows that implicit feedback is unreliable by itself and has only marginal value when combined with explicit feedback.

6.1 Further Discussion

In order to better understand why implicit feedback does not help at the early stage of learning and becomes useful at the later stage when there is a large amount of data, we can decompose the generalization error of a learning algorithm into 3 parts:
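For squared-error loss these are the standard noise, squared-bias, and variance terms; writing f for the true rating function and f̂ for the learned model, the usual textbook form is:

```latex
\mathbb{E}\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\sigma^2}_{\text{noise}}
  + \underbrace{\bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\bigl[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\bigr]}_{\text{variance}}
```

In this decomposition, the prior borrowed from other users reduces the variance term for a new user at some cost in bias, while implicit-feedback features can reduce bias but add variance that only a large amount of data averages out.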
Figure 5: Performance (micro) over time with implicit feedback only (Claypool).

Figure 6: Hierarchical model performance (micro average) over time.
8. ACKNOWLEDGMENTS
This research was funded in part by a Google Research Award. Any opinions, findings, conclusions, or recommendations expressed in this paper are the authors' and do not necessarily reflect those of the sponsors.