Electronic Commerce Research and Applications 35 (2019) 100857

Contents lists available at ScienceDirect

Electronic Commerce Research and Applications


journal homepage: www.elsevier.com/locate/elerap

Collaborative targeting: Biclustering-based online ad recommendation

Mehmet Türkay Yoldar a,⁎, Uğur Özcan b

a Department of Management Info. Sys., Institute of Informatics, Gazi University, 06680 Ankara, Turkey
b Department of Industrial Engineering, Faculty of Engineering, Gazi University, 06570 Ankara, Turkey

A R T I C L E  I N F O

Keywords:
Behavioral targeting
Biclustering
Collaborative filtering
Computational advertising
Online advertising
Ordered weighted averaging
Recommender systems

A B S T R A C T

In online advertising, it is essential to show appropriate ads to target users. However, this is a challenging process. Although conventional targeting methods yield successful results, they cannot effectively select different ads for all users. In this study, we explore collaborative filtering techniques on an online ad dataset. We propose a method of recommending different and effective ads to users. The proposed method, which is based on biclustering and ordered weighted average aggregation operators, can address situations such as the lack of implicit feedback on items. We present the results of an offline analysis of the proposed method together with those of collaborative filtering methods. It is shown that collaborative filtering methods are beneficial, and that the proposed method provides superior results, especially in systems where user navigation histories are well known.

1. Introduction

Online advertising (sometimes called Internet ads) is one of the most effective ways of broadening reach, finding new customers, creating new revenue sources, and increasing recognition for businesses of all sizes. Online ad revenues, which range from display ads to paid social media promotions, increased to USD 72.5 billion in 2016 and by approximately 14% (to USD 83 billion) in 2017 (IAB, 2018).

The primary stakeholders in online ads are users, advertisers, publishers, and ad delivery networks. Advertisers want to promote their products or services through the Internet sites (or mobile applications) of the publishers. Given the number and diversity of advertisers and publishers, there is a need for third-party service providers and ad delivery networks to facilitate this collaboration (Chen et al., 2016).

Each display of an ad is called an impression, each successful interaction between an ad and a user is called a click-through (or click), and the number of clicks per impression is called the click-through rate (CTR) (Rosenkrans, 2007).

The first banner ads, which appeared in 1993, are considered the oldest standard format (Chen et al., 2016). Despite the enormous growth of online ads since then, CTRs of between 2% and 4% (Kazienko and Adamski, 2007) at the beginning of the 2000s have fallen sharply, to a current average of 0.05% (SmartInsights, 2018).

The methods used for increasing CTRs can be evaluated under the field of computational advertising; CTR methods are only a tiny part of this area. In a complex ecosystem, computational advertising, which tries to model the journey from an ad impression to a possible purchase (Rajan, 2017), is a scientific subdiscipline of online ads where information retrieval, statistical modeling, machine learning, and optimization meet (Dave and Varma, 2014). The main problem of computational advertising is to find the best ads for a given user in a particular context (Broder, 2008).

Targeting is one of the main steps used to make ad impressions meaningful and valuable. The most common targeting methods used in computational advertising are contextual targeting, demographic targeting, location-based targeting, and behavioral targeting.

Currently, the effectiveness of targeting techniques has increased considerably through the use of big data; however, some problems remain. An ad cannot be adequately displayed unless its content is associated with either user history or website content. Ad impressions based on demographic data (such as age, location, gender, and income) remain superficial when not combined with other methods. The central question in this study is how to show different ads to users beyond contextual and behavioral targeting. We investigate the use of recommender systems, specifically collaborative filtering techniques, in the online ad domain. This study examines one additional research question: How do we recommend ads to new users with recommender systems on a sparse dataset? Users mostly avoid interacting with ads, which makes it challenging to collect implicit feedback. The cold start problem is one of the known issues for recommender systems, and the ad dataset is very


⁎ Corresponding author.
E-mail address: mehmetturkay.yoldar@gazi.edu.tr (M.T. Yoldar).

https://doi.org/10.1016/j.elerap.2019.100857
Received 18 December 2018; Received in revised form 8 May 2019; Accepted 8 May 2019
Available online 11 May 2019
1567-4223/ © 2019 Elsevier B.V. All rights reserved.

sparse by nature.

In this study, we propose a new model and compare it to state-of-the-art recommendation methods. Behavioral targeting is one of the methods used in computational advertising, and collaborative filtering is a state-of-the-art technique of recommendation systems. The proposed method aims to solve the problem of effective ad impressions, taking into consideration situations in which user navigation histories are known but interaction with ads is low (or nonexistent). While conventional targeting methods consider the individual history of users, the proposed method creates collaborative recommendations. The utility matrix required by collaborative filtering methods reduces the quality of referrals if the target user does not click on ads at all. To handle this dilemma, the proposed method is based on biclustering and ordered weighted averaging (OWA) aggregation operators. The required user profiles are obtained by web usage mining, while users are clustered through biclustering.

In contrast to other similar works, the input to the web usage mining is not user interactions on a single website or application, but all navigation history and behavior. Ad recommendation is performed by using the OWA operator upon a joint decision by the users in the bicluster. To the best knowledge of the authors, there is no similar research in the literature on recommending ads to users who do not interact with ads at all, or on recommending ads beyond well-known targeting methods. At the same time, the lack of personally identifiable information in both the input data and the created user profiles suggests that the developed method may be an alternative for today's rising privacy concerns.

2. Background

This section presents the underlying concepts of this work and a brief literature review covering fundamental and recent research on behavioral targeting, collaborative filtering, user profiling, web usage mining, biclustering, and OWA aggregation operators.

2.1. Behavioral targeting

Behavioral targeting can be described as an effective way of showing ads by evaluating information gathered from navigation patterns, such as user-visited pages and web searches (Levene, 2011; Yan et al., 2009). Retargeting, which can be considered a type of behavioral targeting, is defined by Goldfarb (2014) as an ad impression to the user based on content that they have already searched for or seen. For example, a person who looks for a mobile phone on an e-commerce site may encounter that site's mobile phone ads on other websites. Similarly, people searching for a mobile phone on a search engine may encounter mobile phone ads on other sites they visit, even if they do not visit any shopping sites.

All stakeholders in the field of online ads collect and store a large amount of information about users. Stakeholders who collect this information use it to conduct research for improving their products. Users' past navigation behavior is also crucial for the implementation of behavioral targeting methods. This information can be used to reveal general characteristics of users, such as gender, a tendency to purchase sporting goods, or whether they are planning a vacation (Evans, 2009).

2.2. Collaborative filtering

Collaborative filtering allows users to provide ratings about a collection of items; when sufficient information is gathered on the system, the system can make recommendations to each user (Bobadilla et al., 2013). User ratings can also be implicitly collected by mining user activity and behavior (Anand and Mobasher, 2003; Herlocker et al., 2000; Nasraoui et al., 2008) or retrieved from other systems or domains (Burke, 2002; Li, 2011).

Collaborative filtering algorithms can be classified into two main categories: neighborhood approaches and latent factor models (Koren and Bell, 2015). In neighborhood approaches (also called memory-based algorithms), the entire rating collection is utilized to make recommendations or predict the rank of items. User-based collaborative filtering (UBCF) algorithms gather item ratings from similar users to make recommendations (Schafer et al., 2007), whereas item-based collaborative filtering (IBCF) algorithms find similar items and then recommend them to target users (Sarwar et al., 2001; Schafer et al., 2007).

In contrast, latent factor models (also referred to as model-based algorithms) generate recommendations by first building a model, which is done with a variety of machine learning algorithms such as matrix factorization. Weighted regularized matrix factorization (WRMF) (Hu et al., 2008; Pan et al., 2008) is a well-known matrix factorization algorithm for implicit feedback datasets. Bayesian personalized ranking (BPR) (Rendle et al., 2009), which is the maximum posterior estimator from a Bayesian analysis, treats top-N recommendation as a ranking problem; it can be combined with other methods such as matrix factorization (BPRMF). Sparse linear methods (SLIMs) are state-of-the-art recommendation approaches based on matrix factorization that rely on regularized l1-norm and l2-norm optimization for top-N recommender systems (Ning and Karypis, 2011).

Sparsity is one of the major challenges that degrades the performance of recommender systems. A dataset often has a massive number of items but insufficient user reviews; thus, the possibility of finding similar users is minimal (Lü et al., 2012). There are recent studies on reducing sparsity by using transfer learning for collaborative filtering (Li et al., 2009a,b) or cross-domain recommendations (Fernández-Tobías et al., 2012; Gao et al., 2013; Li, 2011). Another challenge, the cold start problem, is an issue where the recommender system cannot predict or make a recommendation effectively for new users. New items (or even new domains) suffer from insufficient data (Schafer et al., 2007). Solutions to this problem include building hybrid techniques (Adomavicius and Tuzhilin, 2005), tracking user interactions on different sites (Lü et al., 2012), and using cross-domain recommender systems (Fernández-Tobías et al., 2012).

Because our datasets are very sparse, our proposed method draws on cross-domain recommendation. One task of a cross-domain recommender system is learning about users and items in the source domain to increase the quality of the recommendations for items in the target domain (Fernández-Tobías et al., 2012). Similarly, in our proposed method, user profiling is performed on the source domain; then, the recommendations are made in the target domain. In the next subsection, we provide a brief review of user profiling.

2.3. User profiling

The ultimate goal of any system adapted to users is to provide what they need without explicitly asking for it (Mulvenna et al., 2000). The most crucial step in targeting online ads is the user profile. Existing personalization methods are based on direct or indirect expressions of user interest (Mobasher et al., 2001). The data required from the user can either be extracted from past actions or demanded directly. To create a user profile, it may be preferable to generate probability-based models from sequential user actions, apply time-based weighting by activity type and content, consider the number of events in specific categories, or use clustering and other techniques (Mobasher et al., 2002; Schiaffino and Amandi, 2009). To obtain the data needed for a user profile, implicit statements such as page views and product purchases can be utilized, as well as explicit statements such as ratings, registration forms, or product ratings (Eirinaki and Vazirgiannis, 2003).

Various studies (Golemati et al., 2007; Hoppe et al., 2013; Mobasher et al., 2000; Nasraoui et al., 2000; Qiu and Cho, 2006; Raad et al., 2010; Soltysiak and Crabtree, 1998; Sugiyama et al., 2004)


showed that in building user profiles, reducing the human factor by replacing it with computers yields both quick and objective results. The information gathered from user behaviors can be analyzed by data mining and machine learning methods, and user profiles can be created from it. All personalization approaches, particularly methods based on data mining, require an accurate collection of data that reflects the interests of users. Personalized systems differ not only in the methods used to produce suggestions or make predictions but also in the way user profiles are created (Mobasher, 2007).

Rule-based and content-based personalization systems typically create an individual user profile of user interest and use that profile only to tailor future interactions with that user. A significant disadvantage of approaches based on individual profiles is that the recommendations are limited because they focus on the user's previous interests; in other words, such systems cannot recommend items the user would be interested in when that specific interest is not yet known. In rule-based systems, although user profiles are based on personal and demographic data, the use of demographic data in the suggestion process is not very common. The reason is that these data are more difficult to collect from the Internet, and the collected data tend to be of poor quality. Besides, recommendations based only on demographic data have been shown to be less accurate than suggestions based on content or user behavior (Pazzani, 1999). In collaborative filtering, the system uses not only the profile of the active user but also those of other users. Profiles are usually represented as a set of ratings that show user preferences over a subset of items. The active user's profile is used to find other users with similar tastes (Mobasher, 2007).

2.4. Web usage mining

In this study, web usage mining is utilized in building user profiles because the primary data source is the log file, which contains implicitly collected records from users.

Web mining is defined as the use of data mining methods, techniques, and models on web-based data, structures, or usage patterns. Web usage mining refers to the application of data mining methods to reveal patterns in web usage data to better understand the needs of web applications as well as users (Srivastava et al., 2000). The most significant difference that distinguishes web usage mining from web structure mining and web content mining is that it reflects user behavior. Analysis of user behavior can provide information that can lead to customization and personalization of the user experience (Markov and Larose, 2007).

Web usage mining, which does not have its own specific algorithm, follows the typical data mining cycle. Mobasher (2007) stated that web usage mining has excellent flexibility in using different data sources extensively and that customization tasks can be better integrated with other existing applications. The web usage mining cycle consists of data collection, preprocessing, pattern discovery, and analysis stages (Kosala and Blockeel, 2000; Varnagar et al., 2013).

Web usage mining, used in conjunction with personalization approaches such as collaborative filtering, can boost the performance of recommender systems that are otherwise insufficient to handle issues such as subjective user ratings, scalability, or high dimensionality. Web usage mining also helps in the effective use of collaborative filtering on anonymous users (Cho et al., 2002; Mobasher et al., 2001, 2002).

Most of the data mining approaches used for personalization are considered extensions of collaborative filtering. In these approaches, user profiles are built with the help of different algorithms, taking into consideration the preferences or navigation patterns of the users. These user profiles can be used to generate suggestions or to estimate the behavior of the target user (Mobasher, 2007).

Nasraoui et al. (2008) summarized their approach to user profiling by using web usage mining: user sessions extracted from web log files are preprocessed and then clustered. User profiles are created from these clusters, and the profiles are enriched with the help of additional resources. Mele (2013) used collaborative filtering and web usage mining to improve search engine performance by suggesting more related web pages; user navigation logs, which contain click records, were used to identify users with similar interests and tastes. Adeniyi et al. (2016) used the k-nearest neighbor method in a recommender system to process existing user click-through data with web usage mining and matched users with specific user groups to build a suggestion set for the target user more quickly; in this way, they expected to overcome the problem of scaling.

2.5. Biclustering

In this study, we applied biclustering methods, which allow working with high-dimensional data and with objects that belong to multiple clusters. Unlike other clustering methods, biclustering creates submatrices from data matrices, taking objects and their attributes into consideration equally. While conventional methods cluster objects according to their attribute values, biclustering can group attributes as well as objects and thereby identify patterns that other methods cannot. Biclustering methods are used in different applications: gene expression analysis (Cheng and Church, 2000), collaborative filtering (Fabricio et al., 2007), and even the field of online ads (Ignatov et al., 2012).

Biclustering methods can be classified according to the strategies applied by the algorithms: greedy, exhaustive enumeration, distribution parameter identification, and divide-and-conquer (Padilha and Campello, 2017). The δ-bicluster algorithm (Cheng and Church, 2000), order-preserving submatrix (Ben-Dor et al., 2003), conserved gene expression motifs (xMOTIFs) (Murali and Kasif, 2003), the iterative signature algorithm (Bergmann et al., 2003), large average submatrices (Shabalin et al., 2009), and qualitative biclustering (Li et al., 2009a,b) are well-known greedy algorithms. Two popular exhaustive enumeration algorithms are differentially expressed biclusters (Serin and Vingron, 2011) and the statistical-algorithmic method for bicluster analysis (Tanay et al., 2002). Spectral biclustering (Kluger et al., 2003) and the plaid model (Lazzeroni and Owen, 2002) are frequently mentioned distribution parameter identification algorithms in the literature. The binary inclusion-maximal biclustering (BiMax) algorithm is one of the divide-and-conquer algorithms. BiMax finds all biclusters that are inclusion-maximal, i.e., biclusters that are not part of any larger bicluster. BiMax finds the most comprehensive submatrices with a value of "1" by separating the "0"s and "1"s in the binary data matrix, realizing the divide-and-conquer strategy (Prelić et al., 2006). We preferred this algorithm because it determines biclusters in a reasonable time and is as effective as other biclustering algorithms.

There are recent studies using biclustering with collaborative filtering. Symeonidis et al. (2008) proposed the nearest-biclusters method to group users and items at the same time. The method was combined with two different biclustering algorithms, BiMax and xMOTIFs, and the results show that it improves the performance of the collaborative filtering process considerably. Fabricio et al. (2007) developed an immune-inspired biclustering technique to account for the existing duality between users and items, rather than considering only the similarities between users or between items, and reported better results than other collaborative filtering methods in the literature. Alqadah et al. (2015) developed a recommendation system using a bicluster-based collaborative filtering method to obtain sensitive and regional results; their method provides better results than the commonly accepted algorithms, especially on sparse data.

Symeonidis et al. (2009) applied a biclustering method to group users in their work. The use of groups instead of individual users reflects the preferences of whole communities and leads to better explanations because of the broader user preferences of collaborative features.
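Inclusion-maximality, the property BiMax targets, can be illustrated with a naive brute-force sketch. This is not the authors' implementation and not BiMax's divide-and-conquer strategy; exhaustive enumeration is only practical for tiny matrices, but it makes the definition concrete:

```python
from itertools import combinations

def inclusion_maximal_biclusters(matrix):
    """Enumerate all-ones (row_set, col_set) submatrices of a small binary
    matrix and keep only the inclusion-maximal ones, i.e. those not
    strictly contained in a larger all-ones submatrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    candidates = []
    # For every non-empty row subset, take all columns that are 1 in every row.
    for r in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), r):
            cols = frozenset(c for c in range(n_cols)
                             if all(matrix[i][c] == 1 for i in rows))
            if cols:
                candidates.append((frozenset(rows), cols))
    # Keep only candidates not strictly contained in another candidate.
    maximal = {
        (r1, c1) for (r1, c1) in candidates
        if not any((r1, c1) != (r2, c2) and r1 <= r2 and c1 <= c2
                   for (r2, c2) in candidates)
    }
    return maximal

# A tiny user x category incidence matrix.
M = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
print(inclusion_maximal_biclusters(M))
```

For this 3 x 3 example the enumeration yields four inclusion-maximal biclusters, e.g. rows {0, 1} with columns {0, 1}; none of them can be extended by a row or column without introducing a zero.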


[Figure: four panels plotting the membership function Q(r) over r ∈ [0, 1] for each quantifier: a) few (a = 0.05, b = 0.15); b) at least half (a = 0, b = 0.5); c) as many as possible (a = 0.5, b = 1); d) most (a = 0.3, b = 0.8).]

Fig. 1. Linguistic quantifiers: few, at least half, as many as possible, and most.
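The four panels of Fig. 1 can be reproduced from the standard piecewise-linear quantifier form used with OWA operators (Eq. (5)): Q(r) is 0 below a, ramps linearly on [a, b], and is 1 above b. A minimal sketch, assuming a < b:

```python
def quantifier(a, b):
    """Piecewise-linear linguistic quantifier Q(r): 0 for r < a, a linear
    ramp on [a, b], and 1 for r > b. Assumes a < b."""
    def q(r):
        if r < a:
            return 0.0
        if r <= b:
            return (r - a) / (b - a)
        return 1.0
    return q

# Parameter pairs taken from the four panels of Fig. 1.
few = quantifier(0.05, 0.15)
at_least_half = quantifier(0.0, 0.5)
as_many_as_possible = quantifier(0.5, 1.0)
most = quantifier(0.3, 0.8)

print(at_least_half(0.25))  # halfway up the ramp of [0, 0.5]
print(most(0.9))            # above b, so fully satisfied
```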

Ignatov et al. (2012) developed an algorithm that generates dense approximate biclusters from binary data in a study on concept-based biclustering for online ads. Zhang et al. (2014) proposed a biclustering and fusion-based recommendation technique for the cold start problem, and then compared their method to the UBCF and IBCF algorithms. In a similar study, Kant and Mahara (2018) fused IBCF and UBCF in a weighted-sum approach and used biclustering for neighborhood formation to handle the sparsity problem, which led to improved prediction results.

2.6. Ordered weighted averaging aggregation operators

The aggregation of preferences, criteria, or similarities takes place at various stages in recommender systems. Generally, aggregation is performed using the arithmetic mean or the maximum and minimum operators. Many other aggregation operators, which may lead to more appropriate recommendations, are often ignored (Beliakov et al., 2011). Beliakov et al. (2011) also noted that using aggregation operators suited to the specifications of the recommender system might improve recommendation results, but that using more complex operators does not by itself lead to more accurate recommendations.

In this study, OWA operators are used to perform the aggregation operations. Yager (1988) introduced OWA operators as a new aggregation method; they subsequently became an object of research and have been applied in many fields (Emrouznejad and Marra, 2014).

An OWA operator of dimension n is a mapping F: R^n → R with an associated n-dimensional weight vector

W = [w_1, w_2, ⋯, w_n]^T  (1)

such that w_j ∈ [0, 1] and ∑_{j=1}^{n} w_j = 1; moreover,

F(a_1, ⋯, a_n) = ∑_{j=1}^{n} w_j b_j  (2)

where b_j is the j-th largest of the a_i (Yager, 1988).

The most important step of the OWA operator is the reordering of the arguments: the weights are associated with a particular position in the ordering rather than with a specific argument. If the vector B contains the ordered arguments, the OWA operator can be expressed as

F(a_1, ⋯, a_n) = W^T B  (3)

where W^T is the transpose of the weight vector (Yager, 1988; Yager and Filev, 1999).

2.6.1. Determining OWA weights
One way of calculating the weights of the OWA operator is by using linguistic quantifiers (Herrera and Herrera-Viedma, 1997; Herrera et al., 1996; Yager, 1988, 1993). Proportional quantifiers are used to represent relative amounts, such as at least half and most. A proportional quantity can be represented by a fuzzy subset Q on the range [0, 1]. For any r ∈ [0, 1], the membership function Q(r) indicates how well the proportion r is compatible with the meaning of the quantifier. The weights of the OWA aggregation operator are expressed as

w_j = Q(S_j / T) − Q(S_{j−1} / T)  (4)

where b_j is the j-th largest a_i value, u_j is the importance level of this criterion, S_j = ∑_{k=1}^{j} u_k, T = ∑_{k=1}^{n} u_k, and Q is the membership function. The linguistic membership function is

Q(r) = 0 if r < a;  (r − a)/(b − a) if a ≤ r ≤ b;  1 if r > b  (5)

where different a and b values correspond to different linguistic quantifiers. Some linguistic quantifiers are shown in Fig. 1.

3. Collaborative targeting method

This section presents our proposed method: collaborative targeting. First, we describe our datasets and data preparation steps. Next, we detail the profile building, which is followed by the biclustering of users and their interests. After that, we explain the process of finding out which biclusters the target user belongs to. Finally, we describe how ad recommendations are made to the target user based on the joint decision of the bicluster users.

Table 1 presents some of the variables used in this section and their descriptions.

3.1. Description of datasets

In this study, we use one ad domain-specific real-world dataset and one synthetic dataset, which is generated based on the real-world dataset.

3.1.1. Real-world dataset
The real-world dataset contains three web logs, briefly named impression logs, click logs, and user navigation logs. The data were collected between November 1, 2015 and December 31, 2015 by the ad delivery network, whose purpose in collecting these data is to serve relevant ads to its customers. The data are not publicly available. All the datasets correspond to 2 months of activity for 250 anonymous users.

All data are composed of a set of columns that represent features of the users, ads, and navigation behaviors such as website visits or

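As an illustration of Eqs. (1)–(5), the following sketch derives OWA weights from a quantifier and importance levels and then aggregates. It is a generic implementation of Yager's operator under the quantifier form of Eq. (5), not the authors' code:

```python
def quantifier(a, b):
    """Piecewise-linear quantifier of Eq. (5); assumes a < b."""
    def q(r):
        return 0.0 if r < a else 1.0 if r > b else (r - a) / (b - a)
    return q

def owa(values, importances, q):
    """OWA aggregation: sort the arguments in descending order (b_j with
    importance u_j), derive weights via w_j = Q(S_j/T) - Q(S_{j-1}/T)
    (Eq. (4)), then return the weighted sum of Eq. (2)."""
    pairs = sorted(zip(values, importances), key=lambda p: p[0], reverse=True)
    total = sum(u for _, u in pairs)   # T, the sum of all importances
    weights, running = [], 0.0
    for _, u in pairs:
        prev = running
        running += u                   # partial sum S_j
        weights.append(q(running / total) - q(prev / total))
    return sum(w * b for w, (b, _) in zip(weights, pairs))

# With equal importances and the identity quantifier Q(r) = r
# (a = 0, b = 1), OWA reduces to the plain arithmetic mean.
print(owa([0.2, 0.8, 0.5], [1, 1, 1], quantifier(0.0, 1.0)))
```

Swapping in a different quantifier changes how the ordered positions are rewarded; for example, with `quantifier(0.0, 0.5)` ("at least half") and two equally important arguments, all of the weight lands on the larger one.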

Table 1
Variable descriptions.

Variable | Name | Description
A | User ad matrix | A matrix that contains CTR values.
g | Global interest vector | A vector that contains the distribution of all user interests.
I | User interest matrix | A matrix that contains user interests. To generate the user interest matrix, global interest is applied to the local interest matrix.
i | User interest | User interest in a category. Global interest is applied; thus, popular categories are discounted.
imp | User importance | A value between 0 and 1 reflecting the importance of a user. Users with higher importance have more influence in the recommendation phase.
L | Local interest matrix | A matrix that contains local interests.
l | Local interest | User interest in a category without the influence of the global interest.
p | User interest vector | A vector that contains a user's interests.
V | User visit matrix | A matrix that contains the numbers of visits.
v | Number of visits | The total number of user visits to a category.
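As an illustration, the user ad matrix A of Table 1 can be accumulated from per-user, per-campaign impression and click events; the record layout below is hypothetical, not the paper's log schema:

```python
from collections import defaultdict

def build_user_ad_matrix(impressions, clicks):
    """Build the user ad matrix A of Table 1: A[user][campaign] is the
    click-through rate, i.e. clicks divided by impressions.
    Both inputs are lists of (user_id, campaign_id) events."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for key in impressions:
        shown[key] += 1
    for key in clicks:
        clicked[key] += 1
    A = defaultdict(dict)
    for (user, campaign), n in shown.items():
        A[user][campaign] = clicked[(user, campaign)] / n
    return A

# u1 saw campaign c1 four times and clicked once; saw c2 twice, never clicked.
A = build_user_ad_matrix(
    impressions=[("u1", "c1")] * 4 + [("u1", "c2")] * 2,
    clicks=[("u1", "c1")],
)
print(A["u1"]["c1"], A["u1"]["c2"])  # 0.25 0.0
```

Only user/campaign pairs with at least one impression get an entry, which is what makes A as sparse as Table 4 reports.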

Table 2 Table 4
Details of each log file. Statistical properties of the click data.
Log type No. of records No. of properties Statistics Value (Real) Value (Synthetic)

Impression logs 1 254 961 72 Number of ad campaigns 356 1427


Click logs 5 109 72 Number of users 250 7921
Navigation logs 1 321 757 6 Unique clicks against campaigns 1146 89 924
Unique clicks per user 4.58 11.35
Unique clicks per campaign 3.22 63.01
Number of clicks of the most clicked user (per 20 38
searches. The impression logs contain the raw data for each displayed
campaign)
ad. Each record includes 72 features, including the metadata of the ad, Number of clicks of the least clicked user (per 2 4
visual properties of the ad, campaign details (in which the ad is linked), campaign)
information (location, language, browser, etc.) about the user who sees Total clicks for the top 10% users 27% 21%
the ad, and time stamp. A campaign is a group of ads that promote the Total clicks for the top 10% campaigns 87% 38%
Sparsity 0.987 0.992
same product or service. The click logs contain raw data for each
clicked ad. Each record contains the same features as the impression Columns corresponding to Value (Real) and Value (Synthetic) present the sta-
logs. Navigation logs contain raw data for user navigational patterns. tistics for the real-world dataset and synthetic dataset, respectively.
Each record consists of six features including user information, navi-
gation address, and time stamp. The numbers of records per log are Table 5
listed in Table 2. The sparsity of the click data is approximately 99% by Statistical properties of the navigation data.
the nature of the ad domain.
Statistics                               Value (Real)   Value (Synthetic)

Number of categories                     61             61
Mean                                     1.69%          1.69%
Median                                   0.57%          1.42%
Most visited category                    12.05%         3.04%
Least visited category                   < 0.01%        0.09%
Total visits for the top 10% category    47%            19%

Columns corresponding to Value (Real) and Value (Synthetic) present the statistics for the real-world dataset and the synthetic dataset, respectively.

3.1.2. Data preparation
The primary operations performed are the cleaning, reduction, and enrichment processes that are frequently used in data mining. No additional cleansing has been performed on the obtained data. Because some fields in the data are the same for all records (such as the location of users and linguistic information), these fields are ignored in further processing. Considering that the ad suggestion is based on the campaign to which the ad is linked, various fields of the ad (such as height, width, and size) are ignored, and the unique identifier of the campaign is considered instead.

Two additional services, named the demographic service and the domain service, were prepared for data enrichment. These services utilize third-party services or data sources to create a meaningful profile for users, and they generate new data from the activities that the user has performed. The domain service returns the categories to which a given website belongs. For example, for Elsevier.com, the research/reference category information is obtained. This service utilizes the OpenDNS Domain Tagging database (OpenDNS, 2016), which was created by categorizing Internet sites by folksonomy. There are 61 main categories, such as games, travel, and movies. The demographic service returns information on the age range and gender distribution for a given category. This service utilizes the DoubleClick database (DoubleClick, 2016). Table 3 presents the details of the data provided from external sources.

Tables 4 and 5 list some statistics of the click data and navigation data, respectively, after the data preparation steps are applied. Fig. 2 shows the high-level architecture of these services, including the data preprocessing steps.

3.1.3. Synthetic dataset
Because our real-world dataset is relatively small, we generated a synthetic dataset to meet specific needs. The process of creating a

Table 3
Details of external data.

Data Type             | No. of Records | Description
Categories            | 61             | Unique categories retrieved from the domain service
Domain–category data  | 35 816         | List of categories to which the domains are related; this is the primary data used to build the user visit matrix
Demographics data     | 61             | Demographics data for each category in octet form¹

¹ Each octet contains the probabilities of being in the 18–24, 25–34, 35–44, 45–54, 55–64, and 65+ age ranges and the likelihood of each sex.
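The enrichment step of Table 3 can be approximated as a small lookup that turns raw navigation logs into per-category visit counts (the input to the user visit matrix of Section 3.3.1). The domain map and the log below are hypothetical illustrations, not the OpenDNS data itself; this is a sketch, not the authors' implementation.

```python
from collections import Counter, defaultdict

# Hypothetical domain -> category map, standing in for the
# OpenDNS Domain Tagging lookup used by the domain service.
DOMAIN_CATEGORIES = {
    "elsevier.com": ["research/reference"],
    "espn.com": ["sports"],
}

def build_visit_counts(nav_log):
    """Count visits per (user, category) from (user, domain) pairs."""
    counts = defaultdict(Counter)
    for user, domain in nav_log:
        for category in DOMAIN_CATEGORIES.get(domain, []):
            counts[user][category] += 1
    return counts

log = [("alice", "espn.com"), ("alice", "espn.com"),
       ("alice", "elsevier.com"), ("brad", "espn.com")]
counts = build_visit_counts(log)
```

Aggregating these counts over all users and categories directly yields the user visit matrix V of Eq. (6).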

M.T. Yoldar and U. Özcan Electronic Commerce Research and Applications 35 (2019) 100857

Fig. 2. High-level architectural design of data preprocessing and integration with external services.

synthetic dataset is by implementing small scripts, which work according to some statistical distributions based on a few parameters. We generated and restricted the synthetic dataset based on the following requirements:

• The navigation data are randomized and distributed based on the real-world dataset.
• The users are grouped to share some common interests. The group size is limited to a maximum of 3% of the total population, and the interest category size of a group is limited from 4 to 16.
• The most popular ads are limited to 7.5% of all ads.
• The average CTR is limited to 0.75%.
• The clicks for the ads are distributed statistically according to an inverse power-law distribution (1/x^n); n is selected as 0.25 for a smoother distribution of clicks between ads.
• The number of clicks per user is randomly distributed between 4 and 40.

The tables also indicate some statistics for the click data and navigation data of the synthetic dataset, respectively. The data preparation steps were not applied to the synthetic dataset.

3.2. Overview

The collaborative targeting method can be divided into three processes: a) profile building creates the required matrices for the other operations from the user navigation data and calculates the user importance, b) finding user and interest groups creates a bicluster set by performing biclustering over the user interest matrix, and c) recommendation finds the nearest biclusters to the target user and uses the impression and click data to generate ad recommendations for the user. The proposed method is outlined in Fig. 3, which shows how an ad suggestion is made for a target user.

The user visit and user interest matrices are formed from the navigation data. The user importance data and the set of biclusters are generated from these matrices. When a target user requests an ad recommendation, the nearest biclusters are found based on the interests of the target user. From the impression and click data, a user ad matrix is constructed for the users of the selected biclusters. Finally, ad recommendations are generated for the target user.

3.3. Profile building

The preparation of the user visit and user interest matrices is required to obtain user profiles, which contain the user interest against predefined categories. These preparation steps are detailed in this subsection. Another matrix, the user ad matrix, is also prepared to be used for the recommendation. The user importance, which plays a key role in the recommendation, is also introduced.

3.3.1. User visit matrix
The navigation data of the user are processed, and the user visit matrix is prepared. This matrix consists of how many times a user visited a category. We describe the user visit matrix as

V = [v_{u,c}]        (6)

where V is the user visit matrix, and v_{u,c} is the number of visits for user u and category c.

3.3.2. User interest matrix
We need a matrix that we can work on to bicluster users and categories of interest. Instead of biclustering the user visit matrix, we create a new binary matrix to perform fast and efficient processing, neglecting categories that users visit frequently but are not interested in and taking the global interest into account.

The user visit matrix contains information on how many times a user has visited a category. The fact that two different users (e.g., Alice and Brad) have visited the same category equally often does not mean much by itself. Assume that the total number of visits by Alice is 1000 and the total number of visits by Brad is 50 000. If both users have made 300


Fig. 3. Outline of the proposed method.

Table 6
User visit matrix, V.
User Category 1 Category 2 Category 3 Category 4 Category 5 Total (User)

Alex 30 5 17 66 32 150
Brad 25 5 60 47 13 150
Cheryl 5 2 18 3 7 35
Total (Category) 60 12 95 116 52 335

Table 7
Local user interest matrix, L.

User     Category 1   Category 2   Category 3   Category 4   Category 5
Alex     0.20         0.03         0.11         0.44         0.21
Brad     0.17         0.03         0.40         0.31         0.09
Cheryl   0.14         0.06         0.51         0.09         0.20

Table 8
Global interest vector, g.

                  Category 1   Category 2   Category 3   Category 4   Category 5
Global Interest   0.18         0.04         0.28         0.35         0.16

Table 9
User interest matrix, I.

User     Category 1   Category 2   Category 3   Category 4   Category 5   Total
Alex     1            0            0            1            1            3
Brad     0            0            1            0            0            1
Cheryl   0            1            1            0            1            3

Table 10
User importance vector, imp.

User     Importance
Alex     0.528
Brad     0.791
Cheryl   0.309

Table 11
User ad matrix, A.

User   ad1        ad2   ad3        ad4   ad5      ad6   ad7        …   ad356
1      0          0     0          0     0        0     0.157523   …   0
2      0          0     0          0     0        0     0          …   0
3      0          0     0.51729    0     0        0     0          …   0
…      …          …     …          …     …        …     …          …   …
250    0.120182   0     0.117218   0     0.2321   0     0          …   0

visits to the websites of the Sports category, the interest of these two users in the Sports category cannot be considered the same (for Alice and Brad, the rates of visits to the Sports category are 30% and 0.6%, respectively). It is necessary to pay attention to the rate of visits rather than the number of visits by the user.

Similarly, some categories are visited more frequently by all users than other categories. For example, while the websites of the Sports category receive an average of 10% of all user visits, the Toy category receives only 0.5%. Alice made 5% of her visits to the Sports category and 2% to the Toy category. Alice will appear relevant to the Sports category only when evaluated within itself, but she will have a higher interest in the Toy category compared to other users.

To address these two conditions, we introduce two interest types:


Fig. 4. Flowchart for selecting the top-N ads for a recommendation.

Table 12
Illustration of the reordering step for OWA.
i   User   a_i   u_i              j   User   b_j   u_j

1 2 0 0.055125 → (reordering) 1 58 0.315536 0.015847


2 15 0 0.124215 2 68 0.315536 0.074681
3 17 0 0.057386 3 43 0.29546 0.062233
4 22 0 0.0785 4 93 0.243545 0.067826
5 30 0.2321 0.096331 5 30 0.2321 0.096331
6 43 0.29546 0.062233 6 2 0 0.055125
7 48 0 0.064936 7 15 0 0.124215
8 52 0 0.054962 8 17 0 0.057386
9 55 0 0.127911 9 22 0 0.0785
10 58 0.315536 0.015847 10 48 0 0.064936
11 68 0.315536 0.074681 11 52 0 0.054962
12 71 0 0.06863 12 55 0 0.127911
13 74 0 0.105373 13 71 0 0.06863
14 75 0 0.06179 14 74 0 0.105373
15 76 0 0.067028 15 75 0 0.06179
16 89 0 0.059802 16 76 0 0.067028
17 93 0.243545 0.067826 17 89 0 0.059802
18 95 0 0.067422 18 95 0 0.067422
19 102 0 0.069605 19 102 0 0.069605
20 109 0 0.068092 20 109 0 0.068092
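The reordering and aggregation that Table 12 illustrates can be sketched as follows. This is a minimal OWA/IOWA implementation; the regular-increasing-monotone (RIM) quantifier form for the weight vector is our assumption, it omits the user-importance weighting that the paper folds into the weights, and the CTR numbers are made up.

```python
import numpy as np

def quantifier_weights(n, a=0.05, b=0.25):
    """Weights from a RIM quantifier Q(r) = clip((r - a)/(b - a), 0, 1):
    w_j = Q(j/n) - Q((j-1)/n). a=0.05, b=0.25 corresponds to 'few'."""
    r = np.arange(n + 1) / n
    q = np.clip((r - a) / (b - a), 0.0, 1.0)
    return np.diff(q)

def owa(values, weights):
    """Plain OWA: reorder the arguments in descending order, then weight."""
    return float(np.sort(values)[::-1] @ weights)

def iowa(values, inducers, weights):
    """Induced OWA: reorder the arguments by the inducing variable."""
    order = np.argsort(inducers)[::-1]
    return float(np.asarray(values)[order] @ weights)

# Hypothetical per-user CTRs for one ad in a 6-user bicluster.
ctr = np.array([0.0, 0.3, 0.0, 0.1, 0.2, 0.0])
w = quantifier_weights(len(ctr))
score_owa = owa(ctr, w)                  # aggregate the CTRs directly
clicked = (ctr > 0).astype(float)        # 1 if the user clicked the ad
score_iowa = iowa(clicked, ctr, w)       # click indicators, induced by CTR
```

With the "few" quantifier, the IOWA score reaches 1 as soon as the first few induced positions correspond to clicks, which mirrors the ads with IOWA = 1 at the top of Table 14.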


local interest and global interest. Then, we combine these two types of interest into the user interest matrix. Afterward, this matrix is used for biclustering.

The local interest of a user in a category is the ratio of the number of visits to that category to the total number of navigations over all categories. Thus, it is a value between 0 and 1:

l_{u,c} = v_{u,c} / Σ_{c*} v_{u,c*}        (7)

where l_{u,c} is the local interest of user u in category c. The user interest vector contains the interests of a user for all categories:

p_u = [l_{u,c_1}, l_{u,c_2}, ⋯, l_{u,c_n}]^T        (8)

where p_u is the user interest vector for user u, and n is the number of categories. We can calculate the local interest matrix via the user visit matrix:

L = V / (V j)        (9)

where L is the local interest matrix, and V j is the vector of row sums of the user visit matrix (j = (1, 1, ⋯, 1)^T is the column vector of ones), by which each row of V is divided. To find the interest-dense categories for each user, we need to define a global interest vector:

g = j^T L / Σ j^T L        (10)

where g is the global interest vector, and j^T L is the vector of column sums of the local interest matrix. We can calculate the user interest matrix I as follows:

I = L / g        (11)

Each element of the matrix is then transformed based on a defined threshold value to binarize the user interest matrix:

i_{u,c} = { 1, i_{u,c} ≥ t
          { 0, i_{u,c} < t        (12)

where t is the threshold value. A high threshold (> 1) makes it easier to find users with more precise interests, but selecting an excessively high (> 3) value may cause no users to match each other. Finally, we have a binary user interest matrix to which the global interest has also been applied. It contains binary data for the user interest, with users in its rows and categories in its columns:

I = ⎡ i_{u_1,c_1} ⋯ i_{u_1,c_n} ⎤
    ⎢      ⋮      ⋱      ⋮      ⎥
    ⎣ i_{u_m,c_1} ⋯ i_{u_m,c_n} ⎦        (13)

where m is the number of users, and n is the number of categories. If the user's activities have not created any relation to a category, then I(u, c) = 0. Otherwise, if the user has an interest in a category, I(u, c) = 1.

3.3.3. User importance
We desire that each user in the system has a different degree of importance in the recommendation phase. In this way, we can highlight the tastes of the more valuable users. We consider two criteria in judging which of two compared users is more important. First, the user has more visits. Second, the user is interested in a small number of categories. Because we want the user importance to have smooth values between 0 and 1, we normalize the ratio of these two values exponentially. The user importance is defined as

imp = (V̇ / İ)^{1/e}        (14)

Table 13
Weight vectors obtained for two different linguistic quantifiers: Few and Most.

User   b_j       u_j       w (a = 0.05, b = 0.25, Few)   w (a = 0.3, b = 0.8, Most)
58     0.31554   0.01585   0                             0
68     0.31554   0.07468   0.06266                       0
43     0.29546   0.06223   0.21494                       0
93     0.24355   0.06783   0.23426                       0
30     0.2321    0.09633   0.33271                       0
2      0         0.05513   0.15544                       0
15     0         0.12422   0                             0.08558
17     0         0.05739   0                             0.07928
22     0         0.0785    0                             0.10845
48     0         0.06494   0                             0.08971
52     0         0.05496   0                             0.07593
55     0         0.12791   0                             0.17671
71     0         0.06863   0                             0.09481
74     0         0.10537   0                             0.14557
75     0         0.06179   0                             0.08536
76     0         0.06703   0                             0.05859
89     0         0.0598    0                             0
95     0         0.06742   0                             0
102    0         0.06961   0                             0
109    0         0.06809   0                             0
OWA                        0.21755                       0
IOWA                       0.84457                       0

Table 14
OWA and IOWA values calculated for each ad and sorted (the sorting varies by operator).

Top-N   Ad     OWA        Ad     IOWA
1       ad28   0.309263   ad7    1
2       ad44   0.302389   ad28   1
3       ad7    0.287252   ad44   1
4       ad46   0.217554   ad46   0.844570
5       ad24   0.192411   ad1    0.786439
6       ad8    0.151433   ad5    0.727088
7       ad25   0.140774   ad8    0.679124
8       ad47   0.134341   ad24   0.665121
9       ad5    0.104363   ad47   0.521154
10      ad34   0.089903   ad6    0.50607
11      ad1    0.077375   ad25   0.417876
12      ad6    0.07233    ad34   0.344299
13      ad18   0.046889   ad18   0.15763
14      ad23   0.016631   ad21   0.061186
15      ad11   0.015206   ad23   0.061186
16      ad21   0.012095   ad11   0.060665
17      ad20   0.010739   ad20   0.04094
18      ad30   0.000504   ad30   0.001544
…       …      0          …      0
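The profile-building steps can be reproduced on the running example of Tables 6–10 with a short script. The paper's pipeline is in R, so this Python sketch is an approximation: the global interest g is taken here from the per-category visit totals, which reproduces Table 8, and the normalization inside Eq. (14) is our assumption, so only the ordering of the importance values is checked against Table 10.

```python
import numpy as np

# User visit matrix V from Table 6 (rows: Alex, Brad, Cheryl).
V = np.array([[30.0, 5.0, 17.0, 66.0, 32.0],
              [25.0, 5.0, 60.0, 47.0, 13.0],
              [ 5.0, 2.0, 18.0,  3.0,  7.0]])

# Eq. (9): local interest = row-normalized visit counts.
L = V / V.sum(axis=1, keepdims=True)

# Global interest vector; computed here from the category visit
# totals, which reproduces Table 8 (0.18, 0.04, 0.28, 0.35, 0.16).
g = V.sum(axis=0) / V.sum()

# Eqs. (11)-(12): divide local by global interest, binarize at t = 1.
I = (L / g >= 1.0).astype(int)

# Eq. (14), loosely: importance grows with the number of visits and
# shrinks with the number of interest categories, smoothed by the
# exponent 1/e. The normalization by the maximum ratio is an
# assumption of this sketch, chosen to keep the values in (0, 1].
ratio = V.sum(axis=1) / I.sum(axis=1)
imp = (ratio / ratio.max()) ** (1 / np.e)
```

The binarized matrix equals Table 9, and the resulting importance ordering (Brad > Alex > Cheryl) agrees with Table 10.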

Table 15
Categorization of ads.

           | Selected                                   | Not Selected                                   | Total
Relevant   | Recommended and clicked by the target user | Clicked by the target user but not recommended | All ads clicked by the target user
Irrelevant | Recommended but not clicked                | Not recommended and not clicked                | All ads not clicked by the target user
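In terms of this categorization, precision, recall, and F1 (Eqs. (16)–(18) in Section 4.1) reduce to simple set operations. A minimal sketch on hypothetical ad ids:

```python
def precision_recall_f1(recommended, clicked):
    """Precision, recall, and F1 for one target user, where
    `recommended` are the selected ads and `clicked` the relevant ones."""
    hits = len(set(recommended) & set(clicked))
    p = hits / len(recommended) if recommended else 0.0
    r = hits / len(clicked) if clicked else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1

# Hypothetical top-4 recommendation list against three clicked ads.
p, r, f1 = precision_recall_f1(["ad1", "ad2", "ad3", "ad4"],
                               ["ad2", "ad4", "ad5"])
```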


Table 16
Parameters and their optimized values.
Parameter Reference Value (Real) Value (Synthetic)

Threshold (1) 3.3.2 User Interest Matrix → Eq. (12) 1 1.5


Number of biclusters 3.4 Finding User and Interest Groups 25 50
Minimum number of rows in bicluster 3.4 Finding User and Interest Groups 10 32
Minimum number of columns in bicluster 3.4 Finding User and Interest Groups 6 8
Threshold (2) 3.5.1 Finding Biclusters to Which User Belongs → Algorithm 1 0.02 0.05
Bicluster limit 3.5.1 Finding Biclusters to Which User Belongs → Algorithm 1 3 3
Threshold (3) 3.5.2 Selecting Top-N Ads to Recommend 0.1 0.05
Q(r) 3.5.2 Selecting Top-N Ads to Recommend a = 0.05, b = 0.25

Fig. 5. The k-fold cross-validation method applied to the proposed method. The figure illustrates how the training and test data are used.
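The split that Fig. 5 illustrates can be sketched as a plain k-fold partition of the evaluation items; the paper performs this with the caret package and k = 11, so the helper below is only an illustration.

```python
import random

def k_fold_indices(n_items, k=11, seed=0):
    """Shuffle item indices and deal them into k roughly equal folds;
    each fold serves once as test data and k-1 times as training data."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = k_fold_indices(100)
```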

Table 17
Evaluation metrics for compared methods – Experiment 1.

@N   Metric   Popular   IBCF    UBCF    WRMF    BPRMF   SLIM    Proposed Method
1    P        0.069     0.412   0.412   0.167   0.511   0.509   0.356
     R        0.041     0.100   0.105   0.023   0.125   0.120   0.085
     F1       0.051     0.161   0.167   0.040   0.201   0.195   0.137
5    P        0.039     0.353   0.353   0.278   0.419   0.417   0.367
     R        0.109     0.394   0.416   0.401   0.470   0.465   0.439
     F1       0.057     0.372   0.382   0.328   0.443   0.440   0.400
10   P        0.031     0.271   0.276   0.217   0.310   0.307   0.278
     R        0.152     0.588   0.595   0.583   0.653   0.639   0.650
     F1       0.051     0.371   0.377   0.316   0.421   0.414   0.389
20   P        0.018     0.185   0.171   0.133   0.216   0.215   0.218
     R        0.180     0.731   0.718   0.713   0.820   0.819   0.740
     F1       0.033     0.296   0.276   0.225   0.342   0.341   0.336

The column corresponding to @N presents the number of recommendations. Rows corresponding to P, R, and F1 present the precision, recall, and F1 values, respectively. Bold numbers are the best performance and underlined numbers are the second-best performance in terms of the given metrics for each @N value.

where imp is the user importance, V̇ is the normalized vector of row sums of the user visit matrix, V j (the numbers of visits), and İ is the normalized vector of row sums of the user interest matrix, I j (the numbers of interests). The value of the user importance is between 0 and 1, and higher values are more important than lower ones.

Example 1. Consider a microsystem that has three users (let us call them Alex, Brad, and Cheryl) and five categories, with the user visit matrix given in Table 6. This example demonstrates how the user interest matrix and the user importance are calculated. By applying Eq. (9) to the given V, the local user interest matrix is calculated (Table 7). The global interest vector is calculated by using Eq. (10), and it is presented in Table 8.

Eq. (11) and Eq. (12) are applied together to create the user interest matrix (Table 9). Finally, the user importance is calculated by using Eq. (14) (Table 10). The effect of the global interest can be seen in Cheryl's interests for Category 1 and Category 2. Cheryl visited Category 1 more than twice as often as Category 2. However, this is also the case for the other users. Therefore, Cheryl has no particular interest in Category 1, but she does in Category 2.


Fig. 6. Precision, recall, and F1 values of the proposed and compared methods – Experiment 1.

Table 18
Evaluation metrics for compared methods – Experiment 2.

@N   Metric   Popular   IBCF    UBCF    WRMF    BPRMF   SLIM    Proposed Method
1    P        0.175     0.176   0.278   0.600   0.371   0.371   0.582
     R        0.029     0.056   0.107   0.123   0.104   0.104   0.071
     F1       0.049     0.085   0.155   0.204   0.163   0.163   0.126
5    P        0.100     0.188   0.233   0.440   0.297   0.300   0.516
     R        0.068     0.361   0.425   0.397   0.418   0.437   0.323
     F1       0.081     0.248   0.301   0.417   0.348   0.356   0.397
10   P        0.085     0.153   0.161   0.310   0.218   0.218   0.442
     R        0.117     0.503   0.582   0.517   0.581   0.585   0.551
     F1       0.098     0.235   0.252   0.388   0.317   0.317   0.490
20   P        0.064     0.099   0.100   0.245   0.156   0.154   0.331
     R        0.169     0.638   0.741   0.762   0.765   0.758   0.641
     F1       0.093     0.171   0.176   0.371   0.259   0.256   0.436

The column corresponding to @N presents the number of recommendations. Rows corresponding to P, R, and F1 present the precision, recall, and F1 values, respectively. Bold numbers are the best performance and underlined numbers are the second-best performance in terms of the given metrics for each @N value.

Another observation is that Alex and Brad have the same number of total visits; however, based on the distribution of their visits, Alex has three interests, whereas Brad has only one. Because Brad has a more focused interest than Alex and the same number of total visits, Brad is more important than Alex. Of course, if Brad shows interest in different categories, his importance will decrease over time. On the other hand, Alex and Cheryl have the same number of interests, but Alex has more visits. This makes Alex more important than Cheryl. The importance of these users is in line with the user importance vector.

3.3.4. User ad matrix
The impression data and click data of a user are processed, and the user ad matrix is prepared. This matrix contains campaign-based CTR values for a single user. Simply put, it is the ratio of how many times a user views and clicks on the ads of an ad campaign. A sample user ad matrix is illustrated in Table 11.

3.4. Finding user and interest groups

The biclustering operation is performed on the user interest matrix to find the user and interest groups. Because the user interest matrix is a binary matrix, the use of the BiMax algorithm is deemed suitable. Because it finds all biclusters that are inclusion-maximal, the biclusters can partially overlap; thus, the target user can belong to more than one bicluster.

We used the R package named biclust (Kaiser et al., 2015) to perform the biclustering. This package provides several algorithms for finding biclusters in two-dimensional data. When performing biclustering with the BiMax algorithm, parameters such as the minimum numbers of rows and columns of a bicluster and the maximum number of biclusters to be found are specified. The selection of these parameters is explained in Section 4.2.

3.5. Recommendation

Recommendations are created in two main steps. First, we determine the biclusters to which the target user belongs, and then we select the top-N recommendations based on the joint decision of the biclusters' users.

3.5.1. Finding biclusters to which user belongs
The Jaccard index, which is well suited to binary data, is calculated to find which bicluster or biclusters the target user belongs to. For each bicluster, the Jaccard index is calculated as

J(X, Y) = |X ∩ Y| / |X ∪ Y|        (15)

where X is the set of target user interests and Y is the set of bicluster interests. We call these values the bicluster scores.

Because the biclusters may overlap each other, a target user can be a member of more than one bicluster. Starting with the highest-scored bicluster, if the scores of the other biclusters are within a determined threshold value¹, the users of all of those biclusters are obtained. If the bicluster limit to be included is 1, only the first bicluster is considered. Algorithm 1 is used for this operation.

¹ The threshold value is between 0 and 1. If the threshold is 0, the next bicluster must have exactly the same bicluster score as the current bicluster. If the threshold is 1, all biclusters are considered unless the bicluster limit is exceeded.


Fig. 7. Precision, recall, and F1 values of the proposed and compared methods – Experiment 2.
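As a toy stand-in for BiMax (which the paper runs through the biclust package), the inclusion-maximal all-ones submatrices of a small binary interest matrix can be enumerated by brute force. This is exponential in the number of columns and only meant to make the notion of a bicluster concrete, not to reproduce the BiMax algorithm:

```python
from itertools import combinations

import numpy as np

def maximal_biclusters(B, min_rows=2, min_cols=2):
    """Enumerate inclusion-maximal all-ones submatrices of the binary
    matrix B. Brute force over column subsets; a toy, not BiMax itself."""
    found = []
    for k in range(min_cols, B.shape[1] + 1):
        for cols in combinations(range(B.shape[1]), k):
            rows = tuple(np.flatnonzero(B[:, list(cols)].all(axis=1)))
            if len(rows) >= min_rows:
                found.append((rows, cols))
    # discard biclusters contained in a strictly larger one
    return [(r, c) for r, c in found
            if not any((r, c) != (r2, c2)
                       and set(r) <= set(r2) and set(c) <= set(c2)
                       for r2, c2 in found)]

B = np.array([[1, 1, 0],      # a tiny binary user interest matrix
              [1, 1, 1],
              [0, 1, 1]])
clusters = maximal_biclusters(B)
```

On this matrix the two maximal biclusters overlap in the middle user, which is exactly the kind of overlap that lets a target user belong to more than one bicluster.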

Table 19
Evaluation metrics for compared methods – Experiment 3.

@N   Metric   Popular   IBCF    UBCF    WRMF    BPRMF   SLIM    Proposed Method
1    P        0.061     0.278   0.500   0.333   0.390   0.399   0.184
     R        0.037     0.051   0.139   0.069   0.092   0.095   0.105
     F1       0.046     0.086   0.217   0.114   0.149   0.154   0.134
5    P        0.033     0.278   0.378   0.233   0.308   0.315   0.163
     R        0.096     0.323   0.435   0.297   0.374   0.383   0.461
     F1       0.049     0.299   0.404   0.261   0.338   0.346   0.241
10   P        0.024     0.233   0.261   0.178   0.235   0.237   0.124
     R        0.122     0.459   0.562   0.452   0.531   0.533   0.689
     F1       0.041     0.309   0.357   0.255   0.325   0.328   0.210
20   P        0.017     0.174   0.156   0.108   0.154   0.154   0.091
     R        0.151     0.578   0.636   0.510   0.611   0.614   0.763
     F1       0.030     0.268   0.250   0.179   0.246   0.247   0.163

The column corresponding to @N presents the number of recommendations. Rows corresponding to P, R, and F1 present the precision, recall, and F1 values, respectively. Bold numbers are the best performance and underlined numbers are the second-best performance in terms of the given metrics for each @N value.

Algorithm 1: Get the users of the biclusters to which the target user belongs

Input: Target User, Biclusters, Threshold, Bicluster Limit
Output: List of Users
get the scores of all biclusters that match with the Target User;
while the bicluster score is within the Threshold value and the Bicluster Limit is not exceeded do
    get the users from the current bicluster;
    if the List of Users already exists then
        add new and unique users to the existing List of Users;
    else
        create a new List of Users;
        add the users into the List of Users;
    end
end
return List of Users;

3.5.2. Selecting top-N ads to recommend
The selection of ads is based on the joint decision of the users of the biclusters to which the target user belongs. The OWA and induced OWA (IOWA) (Yager and Filev, 1999) operators are used for this. The CTR values of the users for an ad from the user ad matrix form the vector a_i of Eq. (2). The descending order of these values forms the vector b_j of Eq. (2). When each user is considered as a criterion, the weights can be calculated by using Eq. (4). The a and b parameters of the linguistic quantifier are selected as 0.05 and 0.25, respectively. These values coincide with "at least one" and "few." The OWA values are calculated separately for each ad after the weight vector is obtained. At the same time, the IOWA values are also calculated. The a_i value is taken as the order-inducing variable u_i for IOWA. The a_i values are changed to 1 if a_i > 0 and to 0 if a_i = 0. With this change, we only consider whether or not the user clicked on the ad, while the CTR values are used as the order-inducing variables. We used the R package named agop (Gagolewski and Cena, 2014) to calculate the OWA and IOWA values.

Finally, the values obtained for each ad are arranged in descending order so that the specified number (top-N) of ad suggestions is presented to the target user. A threshold value between 0 and 1 can be selected to limit the number of false positive recommendations. As the threshold value approaches 1, the number of recommendations is reduced, but the true positive rate improves. This is useful in cases where other ad targeting options are available. The complete flowchart of the ad recommendation is illustrated in Fig. 4.

Example 2. This example describes the OWA calculation and the top-N recommendation shown in Fig. 4 for a single ad in a bicluster of 20 users. Table 12 illustrates the reordering step. Only five users of this bicluster have clicked the ad. The weights are calculated by taking the user importance into consideration.

Table 13 provides the weight vectors obtained for two different linguistic quantifiers. If the "Most" linguistic quantifier is selected for the weight calculation, this ad will not be recommended, because at least seven users would have to have clicked it, and even this would not guarantee a recommendation. If "Few" is selected, this ad has a chance of being recommended, because it only requires the ad to be clicked by at least two users. After the weights are calculated, the OWA and IOWA values are computed for each ad separately. The OWA and IOWA values obtained for all ads are sorted in Table 14. Finally, the top-N ads are recommended to the target user. Because the OWA and IOWA values are different, the recommendation lists are also different.

4. Experiments and results

In this section, the accuracy metrics used in the evaluation of recommender systems are first introduced. Then, we discuss how we optimized the parameters of the proposed method to increase its


Fig. 8. Precision, recall, and F1 values of the proposed and compared methods – Experiment 3.

Table 20
Evaluation metrics for compared methods – Experiment 4.

@N   Metric   Popular   IBCF    UBCF    WRMF    BPRMF   SLIM    Proposed Method
1    P        0.076     0.370   0.431   0.331   0.309   0.413   0.554
     R        0.009     0.049   0.058   0.037   0.036   0.039   0.066
     F1       0.016     0.086   0.102   0.067   0.065   0.071   0.118
5    P        0.059     0.320   0.350   0.248   0.282   0.337   0.460
     R        0.032     0.203   0.227   0.137   0.133   0.140   0.261
     F1       0.041     0.248   0.276   0.176   0.181   0.198   0.333
10   P        0.054     0.295   0.281   0.208   0.251   0.256   0.360
     R        0.058     0.322   0.337   0.220   0.197   0.270   0.387
     F1       0.056     0.308   0.307   0.214   0.221   0.263   0.373
20   P        0.047     0.233   0.171   0.147   0.137   0.201   0.228
     R        0.099     0.391   0.397   0.315   0.321   0.340   0.459
     F1       0.064     0.292   0.239   0.201   0.192   0.253   0.304

The column corresponding to @N presents the number of recommendations. Rows corresponding to P, R, and F1 present the precision, recall, and F1 values, respectively. Bold numbers are the best performance and underlined numbers are the second-best performance in terms of the given metrics for each @N value.

efficiency. Finally, the proposed method is compared to the state-of-the-art recommendation methods: IBCF, UBCF, WRMF, BPRMF, and SLIM.

4.1. Evaluation metrics

Precision and recall are the two most commonly used evaluation metrics for recommender systems (Herlocker et al., 2004). According to the survey by Bobadilla et al. (2013), the cross-validation, precision, recall, and receiver operating characteristic (ROC) curve metrics are widely used to evaluate top-N recommendations. In addition, precision and recall provide more information than the widely used ROC in binary classifications, especially on datasets with unbalanced distributions (Saito and Rehmsmeier, 2015).

To calculate precision and recall, the item set must be separated into relevant and irrelevant classes. Precision is the probability that a selected item is relevant; it is calculated as the ratio of the relevant items selected to the total number of selected items. Recall is the probability that a relevant item will be selected; it is the ratio of the relevant items selected to the total number of available relevant items (Herlocker et al., 2004). Table 15 presents the categorization of ads in this study. According to their definitions, precision and recall can be expressed as

P = |clicked ∩ recommended| / |recommended|        (16)

R = |clicked ∩ recommended| / |clicked|        (17)

where clicked is the set of all ads clicked by the target user (in other words, the relevant items), recommended is the set of recommended ads (in other words, the selected items), P is precision, and R is recall.

One of the main difficulties of using precision and recall to compare different algorithms is that precision and recall must be considered together to evaluate the performance of an algorithm thoroughly. It has been observed that precision and recall are inversely linked (Herlocker et al., 2004). The F1 metric combines precision and recall into a single value:

F1 = 2PR / (P + R)        (18)

4.2. Parameter optimization

We introduced some parameters of the proposed method in the previous section. Each parameter affects the performance of the proposed method. Similar to the methods used by Zhen et al. (2009) and Symeonidis et al. (2008), we optimized one parameter of the proposed method at a time with the other parameters fixed. The parameters and their optimized values for our datasets are presented in Table 16.

4.3. Experimental setup

The proposed method is compared to five other recommendation methods, namely IBCF, UBCF, WRMF, BPRMF, and SLIM. These methods cover a variety of the current state-of-the-art methods that perform well on the top-N recommendation task. In addition to these methods, we also share the results of Popular recommendations. We used the R package named recommenderlab (Hahsler, 2017) to work with Popular, IBCF, and UBCF. For WRMF, BPRMF, and SLIM, we used the Librec (Guo, Zhang, Sun, and Yorke-Smith, 2015) library.

The k-fold cross-validation method (Kohavi, 1995) was applied to measure the performance of the recommendation methods. In this method, the dataset is divided into two specific parts: test data and training data. The method is trained through the training data and then


Fig. 9. Precision, recall, and F1 values of the proposed and compared methods – Experiment 4.

Table 21 4.4. Results of experiments


Evaluation metrics for compared methods – Experiment 5.
@N Metric Popular IBCF UBCF WRMF BPRMF SLIM Proposed Five experiments were performed with the entire and different
Method subsets of both the real-world dataset and the synthetic dataset. The
results we present are averages over 11 folds. It is useful to note that
1 P 0.036 0.293 0.283 0.261 0.269 0.357 0.218
while other methods are provided with at least one positive target user-
R 0.006 0.044 0.043 0.037 0.045 0.051 0.038
F1 0.010 0.077 0.074 0.064 0.077 0.090 0.064
ad relation during evaluation (given ≥ 1), no information is given
5 P 0.037 0.257 0.265 0.178 0.174 0.231 0.265 about the user-ad relationship to the proposed method (given = 0).
R 0.021 0.188 0.200 0.111 0.127 0.113 0.176
F1 0.027 0.217 0.228 0.137 0.147 0.151 0.211
4.4.1. Experiment 1 – Entire dataset (real-world dataset)
10 P 0.034 0.223 0.206 0.141 0.132 0.167 0.237
R 0.040 0.276 0.284 0.167 0.171 0.217 0.342 In the first experiment, the entire real-world dataset is used. The
F1 0.037 0.246 0.239 0.153 0.149 0.189 0.280 results of the experiment are displayed in Table 17. which includes
20 P 0.035 0.172 0.115 0.095 0.090 0.116 0.105 detailed evaluation metrics for the compared methods. Note that bold
R 0.085 0.312 0.311 0.230 0.247 0.299 0.300
and underlined values indicate the best performance and second best
F1 0.049 0.222 0.168 0.134 0.132 0.167 0.155
performance, respectively. Fig. 6 shows the precision, recall, and F1
The column corresponding to @N presents the number of recommendations. values, respectively, at different @N values. Ideally, it is desirable that
Rows corresponding to P, R, and F1 present the precision, recall, and F1 values, the values be close to 1.
respectively. Bold numbers are the best performance and underlined numbers are the second-best performance in terms of the given metrics for each @N value.

tested using the test data. This process is repeated for each piece. We used the R package named caret (Kuhn, 2017) to validate the recommendation methods by k-fold cross-validation. Details of the method are shown in Fig. 5, where recommendations refer to ads that are recommended to the target user, and clicks refer to ads that are clicked by the target user.
The k value (number of folds) is selected as 11. According to Kohavi (1995), moderate k values between 10 and 20 reduce the variance, whereas smaller k values increase it. The experiments are run for different N values (numbers of recommendations): 1, 3, 5, 7, 10, 15, and 20. For each setup, precision, recall, and F1 values are calculated.
To test the proposed method in exceptional and boundary situations, two more subsets are selected from the real-world dataset. One of these subsets contains the 100 users who visit the most (used in Experiment 2), and the other contains the 100 users with the fewest visits (used in Experiment 3). Additionally, the synthetic dataset is used in Experiment 4, which allows us to perform an evaluation on a larger dataset. Finally, a subset of the synthetic dataset, from which the most popular ads are removed, is used for the long-tail experiment (Experiment 5).

BPRMF and SLIM are observed to be two methods that are close to each other and give the best results. IBCF and UBCF also yield close results, but their performances are lower. When the poor performance of WRMF is disregarded due to its low precision values, the performance of the proposed method falls between the neighborhood-based approaches and the latent factor models. The differences between the methods are not significant² (α = 0.05, with 99% confidence) for this experiment. For this dataset, we can say that although the proposed method is evaluated at given = 0, its performance is equivalent to the other methods' performances.

4.4.2. Experiment 2 – Top 100 users (real-world dataset)
We use the same setup as in Experiment 1; the only difference is the dataset. Instead of the full dataset, we select a subset containing the 100 users who visit the most. The results of the experiment are displayed in Table 18. Fig. 7 shows the precision, recall, and F1 values, respectively, at different @N values.
Different from the results of the previous experiment, WRMF exhibits a better performance at lower @N values, and the proposed method gives the best results at higher @N values. In particular, the strong precision values of the proposed method have resulted in higher F1 values. The high precision and high recall values show that

² We perform significance testing using ANOVA with Tukey's pairwise comparison method. The Popular method is excluded from statistical calculations.
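The procedure in this footnote (one-way ANOVA over the methods' per-fold scores, followed by Tukey's pairwise comparison) can be sketched in pure Python. This is an illustrative sketch, not the authors' code: the paper does not show its R implementation, and a full Tukey step would additionally need critical values of the studentized-range distribution.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic over per-fold scores.

    groups: one list of cross-validation scores (e.g., F1 per fold)
    per method. Returns (F, df_between, df_within); a large F suggests
    at least one method's mean score differs from the others.
    """
    k = len(groups)                      # number of methods compared
    n = sum(len(g) for g in groups)      # total number of fold scores
    grand_mean = sum(sum(g) for g in groups) / n
    # variation explained by the choice of method
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # residual variation across folds within each method
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w
```

In practice, the F statistic is compared against the F distribution at the chosen α, and Tukey's HSD (built on MS_within = ss_within / df_w) then localizes which method pairs differ.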

M.T. Yoldar and U. Özcan Electronic Commerce Research and Applications 35 (2019) 100857

Fig. 10. Precision, recall, and F1 values of the proposed and compared methods – Experiment 5.

Table 22
Overall comparison of the best methods and the proposed method.

Experiment                | Best Method              | Second-Best Method       | Proposed Method | Performance Difference
E1 – Entire Dataset (R)   | BPRMF (0.3692)           | SLIM (0.3650)            | 0.3306          | −10%
E2 – Top 100 Users (R)    | Proposed Method (0.3765) | WRMF (0.3622)            | 0.3765          | +4%
E3 – Bottom 100 Users (R) | UBCF (0.3185)            | SLIM (0.2881)            | 0.1885          | −41%
E4 – Entire Dataset (S)   | Proposed Method (0.2985) | UBCF (0.2440)            | 0.2985          | +22%
E5 – Long Tail (S)        | IBCF (0.1993)            | Proposed Method (0.1921) | 0.1921          | −4%
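The precision, recall, and F1 values reported at each cutoff @N in these tables follow the usual top-N definitions: precision is the fraction of the N recommended ads that were actually clicked in the held-out fold, and recall is the fraction of the clicked ads that were recommended. A minimal per-user sketch (function and variable names are illustrative, not taken from the paper's implementation):

```python
def precision_recall_f1_at_n(recommended, clicked, n):
    """Top-N accuracy metrics for one user.

    recommended: ranked list of ad ids suggested to the user
    clicked: set of ad ids the user actually clicked (held-out fold)
    n: cutoff, i.e., the @N value
    """
    top_n = recommended[:n]
    hits = sum(1 for ad in top_n if ad in clicked)
    precision = hits / n
    recall = hits / len(clicked) if clicked else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

The values reported in the tables would then be averages of these per-user scores over all test users and folds.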

the appropriate ads are successfully found and that unsuitable ads can also be filtered. SLIM and BPRMF performed similarly. The proposed method's performance is 26% higher than that of WRMF, and 55% higher than those of SLIM and BPRMF, at @N = 10. As in the previous experiment, UBCF and IBCF have shown similar performances, but their performances are lower than those of the other methods. The proposed method's better performance is statistically significant compared to UBCF and IBCF, but not compared to WRMF, SLIM, and BPRMF (α = 0.05, with 99% confidence). The higher performance of the proposed method is expected, because it has an advantage when the knowledge about user navigational patterns is strong, whereas the other methods are not affected positively or negatively by that information.

4.4.3. Experiment 3 – Bottom 100 users (real-world dataset)
In this experiment, we select a subset containing the 100 users with the fewest visits. The results of the experiment are displayed in Table 19. Fig. 8 shows the precision, recall, and F1 values, respectively, at different @N values.
UBCF outperforms the other methods overall. Although the proposed method has a better recall at larger @N values, it has a lower precision. The high recall and low precision values show that the appropriate ads are successfully found, but unsuitable ads cannot be filtered efficiently; therefore, they are also recommended to the target user. This made the proposed method perform worse than the other methods. The root cause of this issue is that the proposed method loses its advantage when the information about user navigational patterns is scarcer than usual. On the dataset used in this experiment, the performance of the proposed method is significantly different from that of UBCF, but not from those of SLIM, IBCF, WRMF, and BPRMF; on the other hand, the performances of all the other methods are not significantly different from one another (α = 0.05, with 99% confidence).

4.4.4. Experiment 4 – Entire dataset (synthetic dataset)
The entire synthetic dataset is used in this experiment to evaluate all methods on a larger dataset. The results of the experiment are displayed in Table 20. Fig. 9 shows the precision, recall, and F1 values, respectively, at different @N values.
The proposed method outperforms the other methods in terms of precision, recall, and F1. UBCF is the second-best performing method. The proposed method's performance is 24% higher than that of UBCF at @N = 10. One reason for the better performance is that, when the synthetic data were generated, the real-world dataset was taken into consideration, and the distribution of user interests was smoother. In fact, the results of this experiment are consistent with those of Experiment 2: both datasets have users with rich navigational histories. Although the proposed method has a better performance, the differences between the methods are not significant (α = 0.05, with 99% confidence).

4.4.5. Experiment 5 – Long tail (synthetic dataset)
In this last experiment, we used a subset of the synthetic dataset to examine the performance of the methods on the long tail. The results of the experiment are displayed in Table 21. Fig. 10 shows the precision, recall, and F1 values, respectively, at different @N values.
IBCF shows the best performance, while the proposed method performs second best in terms of F1. The results are interesting because, for each @N value, a different method shows a better performance. For @N = 1 and @N = 5, SLIM and UBCF have higher precision and recall values, respectively. When the number of recommendations is increased, the proposed method and IBCF show a better performance. Overall, there is no statistically significant difference between the methods (α = 0.05, with 99% confidence).

5. Summary

Table 22 summarizes the results of the experiments in terms of the


best and second-best performing methods, as well as the performance difference between the proposed method and the best performing methods. If the proposed method is the best performer, it is compared against the second-best method.

6. Conclusion

This study was undertaken to explore the preparation of ad recommendations effectively, beyond conventional targeting methods. In contrast to contextual targeting and behavioral targeting, which consider individual user histories, ads that users click on are recommended to each other. The second aim of this study was to design a recommendation framework that would be able to suggest effective ads to cold users. For this purpose, users are clustered by their interests instead of the ads they are interested in; therefore, users who have never clicked on ads (or have very few clicks) have also been successfully served with recommended ads. Collaborative filtering methods were also examined and compared with the proposed method. The experimental results confirm that the proposed method will be useful especially on systems where rich user navigational data are available. Moreover, the proposed method achieves a performance similar to that of state-of-the-art methods in other cases.
This study strengthens the idea that, when user navigation history is known, it is possible to recommend ads collaboratively. Our findings point out that the proposed method is valuable on navigational-history-rich datasets, and as effective as other state-of-the-art methods on other datasets.
We were only able to perform our work on a limited set of data, and we conducted our verifications offline. In future work, it would be possible to conduct an online evaluation on a larger dataset. Second, our proposed method does not take demographic characteristics into account when profiling users, but this would be possible with a larger dataset. Furthermore, considering the impact of time decay when forming user interests would result in more robust user profiles. In addition, using the proposed method with different linguistic quantifiers may alleviate problems such as the long-tail one. Finally, the proposed method may be evaluated on benchmark datasets such as MovieLens, BookCrossing, and Last.fm against other algorithms. This would require drastic changes to the method; however, it would allow the performance of the method to be analyzed on domains other than the ad domain.

Declaration of Competing Interest

The authors acknowledge no conflict of interest in this study.

Acknowledgment

This research did not receive any specific grant from funding agencies in the public, commercial, or nonprofit sectors. This study represents part of the Ph.D. thesis submitted by M. T. Yoldar to the Department of Management Information Systems, Institute of Informatics of Gazi University, Ankara, Turkey.

References

Adeniyi, D., Wei, Z., Yongquan, Y., 2016. Automated web usage data mining and recommendation system using K-nearest neighbor classification method. Appl. Comput. Inf. 12 (1), 90–108.
Adomavicius, G., Tuzhilin, A., 2005. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowledge Data Eng. 17 (6), 734–749.
Alqadah, F., Reddy, C.K., Hu, J., Alqadah, H.F., 2015. Biclustering neighborhood-based collaborative filtering method for top-N recommender systems. Knowl. Inf. Syst. 44 (2), 475–491.
Anand, S.S., Mobasher, B., 2003. Intelligent techniques for web personalization. In: Proceedings of the 2003 International Conference on Intelligent Techniques for Web Personalization.
Beliakov, G., Calvo, T., James, S., 2011. Aggregation of preferences in recommender systems. In: Ricci, F., Rokach, L., Shapira, B. (Eds.), Recommender Systems Handbook. Springer, Boston, MA, pp. 705–734.
Ben-Dor, A., Chor, B., Karp, R., Yakhini, Z., 2003. Discovering local structure in gene expression data: the order-preserving submatrix problem. J. Comput. Biol. 10 (3–4), 373–384.
Bergmann, S., Ihmels, J., Barkai, N., 2003. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys. Rev. E 67 (3), 031902.
Bobadilla, J., Ortega, F., Hernando, A., Gutiérrez, A., 2013. Recommender systems survey. Knowl. Based Syst. 46, 109–132.
Broder, A.Z., 2008. Computational advertising and recommender systems. In: Proceedings of the 2008 ACM Conference on Recommender Systems. ACM Press, New York.
Burke, R., 2002. Hybrid recommender systems: survey and experiments. User Model. User-Adapted Interaction 12 (4), 331–370.
Chen, G., Cox, J.H., Uluagac, A.S., Copeland, J.A., 2016. In-depth survey of digital advertising technologies. IEEE Commun. Surveys Tutorials 18 (3), 2124–2148.
Cheng, Y., Church, G.M., 2000. Biclustering of expression data. In: Presented at the Intelligent Systems in Molecular Biology Conference, San Diego, CA.
Cho, Y.H., Kim, J.K., Kim, S.H., 2002. A personalized recommender system based on web usage mining and decision tree induction. Expert Syst. Appl. 23 (3), 329–342.
Dave, K., Varma, V., 2014. Computational advertising: techniques for targeting relevant ads. Foundations Trends Inf. Retrieval 8 (4–5), 263–418.
DoubleClick, 2016. DoubleClick by Google. Retrieved from https://www.doubleclickbygoogle.com/.
Eirinaki, M., Vazirgiannis, M., 2003. Web mining for web personalization. ACM Trans. Internet Technol. (TOIT) 3 (1), 1–27.
Emrouznejad, A., Marra, M., 2014. Ordered weighted averaging operators 1988–2014: a citation-based literature survey. Int. J. Intell. Syst. 29 (11), 994–1014.
Evans, D.S., 2009. The online advertising industry: economics, evolution, and privacy. J. Econ. Perspect. 23 (3), 37–60.
Fabricio, O., Ferreira, H.M., Von Zuben, F.J., 2007. Applying biclustering to perform collaborative filtering. In: Proceedings of the Seventh International Conference on Intelligent Systems Design and Applications. IEEE Computer Society Press, Los Alamitos, CA.
Fernández-Tobías, I., Cantador, I., Kaminskas, M., Ricci, F., June 18–19, 2012. Cross-domain recommender systems: a survey of the state of the art. Paper presented at the Spanish Conference on Information Retrieval, Escuela Técnica Superior de Ingeniería Informática of the Universidad Politécnica de Valencia.
Gagolewski, M., Cena, A., 2014. agop: Aggregation Operators Package for R. Retrieved from http://agop.rexamine.com/.
Gao, S., Luo, H., Chen, D., Li, S., Gallinari, P., Guo, J., September 10–14, 2013. Cross-domain recommendation via cluster-level latent factor model. Paper presented at the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Skopje, Macedonia.
Goldfarb, A., 2014. What is different about online advertising? Rev. Ind. Organ. 44 (2), 115–129.
Golemati, M., Katifori, A., Vassilakis, C., Lepouras, G., Halatsis, C., April 23–26, 2007. Creating an ontology for the user profile: method and applications. In: Proceedings of the First International Conference on Research Challenges in Information Science, Ouarzazate, Morocco.
Guo, G., Zhang, J., Sun, Z., Yorke-Smith, N., 2015. LibRec: a Java library for recommender systems. Paper presented at the UMAP Workshops.
Hahsler, M., 2017. recommenderlab: Lab for Developing and Testing Recommender Algorithms. Retrieved from https://cran.r-project.org/package=recommenderlab.
Herlocker, J.L., Konstan, J.A., Riedl, J., 2000. Explaining collaborative filtering recommendations. In: Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work.
Herlocker, J.L., Konstan, J.A., Terveen, L.G., Riedl, J.T., 2004. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst. 22 (1), 5–53.
Herrera, F., Herrera-Viedma, E., 1997. Aggregation operators for linguistic weighted information. IEEE Trans. Syst. Man Cybernet. Part A: Syst. Hum. 27 (5), 646–656.
Herrera, F., Herrera-Viedma, E., Verdegay, J., 1996. Direct approach processes in group decision making using linguistic OWA operators. Fuzzy Sets Syst. 79 (2), 175–190.
Hoppe, A., Nicolle, C., Roxin, A., 2013. Automatic ontology-based user profile learning from heterogeneous web resources in a big data context. Proc. VLDB Endowment 6 (12), 1428–1433.
Hu, Y., Koren, Y., Volinsky, C., 2008. Collaborative filtering for implicit feedback datasets. In: Proceedings of the Eighth International Conference on Data Mining. IEEE Computer Society Press, Los Alamitos, CA.
IAB, 2018. IAB Internet Advertising Revenue Report. Retrieved from https://www.iab.com/wp-content/uploads/2018/05/iab-2017-full-year-internet-advertising-revenue-report.rev_.pdf.
Ignatov, D.I., Kuznetsov, S.O., Poelmans, J., 2012. Concept-based biclustering for internet advertisement. In: Proceedings of the IEEE 12th International Conference on Data Mining Workshops. IEEE Computer Society Press, Los Alamitos, CA.
Kaiser, S., Santamaria, R., Khamiakova, T., Sill, M., Theron, R., Quintales, L., Leisch, F., De Troyer, E., 2015. biclust: Bicluster Algorithms. Retrieved from https://cran.r-project.org/package=biclust.
Kant, S., Mahara, T., 2018. Nearest biclusters collaborative filtering framework with fusion. J. Comput. Sci. 25, 204–212.
Kazienko, P., Adamski, M., 2007. AdROSA: adaptive personalization of web advertising. Inf. Sci. 177 (11), 2269–2295.
Kluger, Y., Basri, R., Chang, J.T., Gerstein, M., 2003. Spectral biclustering of microarray data: coclustering genes and conditions. Genome Res. 13 (4), 703–716.
Kohavi, R., 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Francisco.
Koren, Y., Bell, R., 2015. Advances in collaborative filtering. In: Ricci, F., Rokach, L.,


Shapira, B., Kantor, P.B. (Eds.), Recommender Systems Handbook. Springer, Berlin-Heidelberg, Germany, pp. 77–118.
Kosala, R., Blockeel, H., 2000. Web mining research: a survey. ACM SIGKDD Explorations Newslett. 2 (1), 1–15.
Kuhn, M., 2017. caret: Classification and Regression Training. Retrieved from https://cran.r-project.org/package=caret.
Lazzeroni, L., Owen, A., 2002. Plaid models for gene expression data. Statistica Sinica, 61–86.
Levene, M., 2011. An Introduction to Search Engines and Web Navigation. John Wiley, New York.
Li, B., 2011. Cross-domain collaborative filtering: a brief survey. In: Proceedings of the 2011 23rd IEEE International Conference on Tools with Artificial Intelligence. IEEE Computer Society Press, Los Alamitos, CA.
Li, B., Yang, Q., Xue, X., 2009a. Transfer learning for collaborative filtering via a rating-matrix generative model. In: Proceedings of the 26th Annual International Conference on Machine Learning. ACM Press, New York.
Li, G., Ma, Q., Tang, H., Paterson, A.H., Xu, Y., 2009b. QUBIC: a qualitative biclustering algorithm for analyses of gene expression data. Nucl. Acids Res. 37 (15), e101.
Lü, L., Medo, M., Yeung, C.H., Zhang, Y.-C., Zhang, Z.-K., Zhou, T., 2012. Recommender systems. Phys. Rep. 519 (1), 1–49.
Markov, Z., Larose, D.T., 2007. Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage. John Wiley, New York.
Mele, I., 2013. Web usage mining for enhancing search-result delivery and helping users to find interesting web content. In: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. ACM Press, New York.
Mobasher, B., 2007. Data mining for web personalization. In: Brusilovski, P., Kobsa, A., Nejdl, W. (Eds.), The Adaptive Web. Springer, Berlin-Heidelberg, Germany, pp. 90–135.
Mobasher, B., Cooley, R., Srivastava, J., 2000. Automatic personalization based on web usage mining. Commun. ACM 43 (8), 142–151.
Mobasher, B., Dai, H., Luo, T., Nakagawa, M., 2001. Improving the effectiveness of collaborative filtering on anonymous web usage data. In: Proceedings of the IJCAI 2001 Workshop on Intelligent Techniques for Web Personalization. AAAI Press, Menlo Park, CA.
Mobasher, B., Dai, H., Luo, T., Nakagawa, M., 2002. Discovery and evaluation of aggregate usage profiles for web personalization. Data Mining Knowledge Discov. 6 (1), 61–82.
Mulvenna, M.D., Anand, S.S., Büchner, A.G., 2000. Personalization on the net using web mining: introduction. Commun. ACM 43 (8), 122–125.
Murali, T., Kasif, S., January 3–7, 2003. Extracting conserved gene expression motifs from gene expression data. Paper presented at the Pacific Symposium on Biocomputing, Lihue, HI.
Nasraoui, O., Frigui, H., Krishnapuram, R., Joshi, A., 2000. Extracting web user profiles using relational competitive fuzzy clustering. Int. J. Artif. Intell. Tools 9 (4), 509–526.
Nasraoui, O., Soliman, M., Saka, E., Badia, A., Germain, R., 2008. A web usage mining framework for mining evolving user profiles in dynamic web sites. IEEE Trans. Knowledge Data Eng. 20 (2), 202–215.
Ning, X., Karypis, G., 2011. SLIM: sparse linear methods for top-N recommender systems. In: Proceedings of the 2011 IEEE 11th International Conference on Data Mining. IEEE Computer Society Press, Los Alamitos, CA.
OpenDNS, 2016. OpenDNS Domain Tagging. Retrieved from https://community.opendns.com/domaintagging/.
Padilha, V.A., Campello, R.J., 2017. A systematic comparative evaluation of biclustering techniques. BMC Bioinf. 18 (1), 55.
Pan, R., Zhou, Y., Cao, B., Liu, N.N., Lukose, R., Scholz, M., Yang, Q., 2008. One-class collaborative filtering. IEEE Computer Society Press, Los Alamitos, CA.
Pazzani, M.J., 1999. A framework for collaborative, content-based and demographic filtering. Artif. Intell. Rev. 13 (5–6), 393–408.
Prelić, A., Bleuler, S., Zimmermann, P., Wille, A., Bühlmann, P., Gruissem, W., Hennig, L., Thiele, L., Zitzler, E., 2006. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22 (9), 1122–1129.
Qiu, F., Cho, J., 2006. Automatic identification of user interest for personalized search. In: Proceedings of the 15th International Conference on World Wide Web. ACM Press, New York.
Raad, E., Chbeir, R., Dipanda, A., 2010. User profile matching in social networks. In: Proceedings of the 13th International Conference on Network-Based Information Systems (NBiS). IEEE Computer Society Press, Washington, DC.
Rajan, S., 2017. The evolution of computational advertising. In: Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval. ACM Press, New York.
Rendle, S., Freudenthaler, C., Gantner, Z., Schmidt-Thieme, L., 2009. BPR: Bayesian personalized ranking from implicit feedback. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, Arlington, VA.
Rosenkrans, G., 2007. Online advertising metrics. Vol. 15. Idea Group Reference, Hershey, PA.
Saito, T., Rehmsmeier, M., 2015. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One 10 (3), e0118432.
Sarwar, B., Karypis, G., Konstan, J., Riedl, J., 2001. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web. ACM Press, New York.
Schafer, J.B., Frankowski, D., Herlocker, J., Sen, S., 2007. Collaborative filtering recommender systems. In: Brusilovski, P., Kobsa, A., Nejdl, W. (Eds.), The Adaptive Web. Springer, Berlin-Heidelberg, Germany, pp. 291–324.
Schiaffino, S., Amandi, A., 2009. Intelligent user profiling. In: Bramer, M. (Ed.), Artificial Intelligence: An International Perspective. Springer, Berlin-Heidelberg, Germany, pp. 193–216.
Serin, A., Vingron, M., 2011. DeBi: discovering differentially expressed biclusters using a frequent itemset approach. Algorithms Mol. Biol. 6 (1), 18.
Shabalin, A.A., Weigman, V.J., Perou, C.M., Nobel, A.B., 2009. Finding large average submatrices in high dimensional data. Annals Appl. Stat. 3 (3), 985–1012.
SmartInsights, 2018. Average Display Advertising Clickthrough Rates. Retrieved from https://www.smartinsights.com/internet-advertising/internet-advertising-analytics/display-advertising-clickthrough-rates/.
Soltysiak, S., Crabtree, I., 1998. Automatic learning of user profiles: towards the personalisation of agent services. BT Technol. J. 16 (3), 110–117.
Srivastava, J., Cooley, R., Deshpande, M., Tan, P.N., 2000. Web usage mining: discovery and applications of usage patterns from web data. ACM SIGKDD Explorations Newslett. 1 (2), 12–23.
Sugiyama, K., Hatano, K., Yoshikawa, M., 2004. Adaptive web search based on user profile constructed without any effort from users. In: Proceedings of the 13th International Conference on World Wide Web. ACM Press, New York.
Symeonidis, P., Nanopoulos, A., Manolopoulos, Y., 2009. MoviExplain: a recommender system with explanations. In: Proceedings of the Third ACM Conference on Recommender Systems. ACM Press, New York.
Symeonidis, P., Nanopoulos, A., Papadopoulos, A.N., Manolopoulos, Y., 2008. Nearest-biclusters collaborative filtering based on constant and coherent values. Inf. Retrieval 11 (1), 51–75.
Tanay, A., Sharan, R., Shamir, R., 2002. Discovering statistically significant biclusters in gene expression data. Bioinformatics 18 (Suppl. 1), S136–S144.
Varnagar, C.R., Madhak, N.N., Kodinariya, T.M., Rathod, J.N., February 21–22, 2013. Web usage mining: a review on process, methods and techniques. In: Presented at the International Conference on Information Communication and Embedded Systems, Chennai, Tamil Nadu, India.
Yager, R.R., 1988. On ordered weighted averaging aggregation operators in multicriteria decisionmaking. IEEE Trans. Syst. Man Cybernet. 18 (1), 183–190.
Yager, R.R., 1993. Families of OWA operators. Fuzzy Sets Syst. 59 (2), 125–148.
Yager, R.R., Filev, D.P., 1999. Induced ordered weighted averaging operators. IEEE Trans. Syst. Man Cybernet. Part B (Cybernetics) 29 (2), 141–150.
Yan, J., Liu, N., Wang, G., Zhang, W., Jiang, Y., Chen, Z., 2009. How much can behavioral targeting help online advertising? In: Proceedings of the 18th International Conference on World Wide Web. ACM Press, New York.
Zhang, D., Hsu, C.-H., Chen, M., Chen, Q., Xiong, N., Lloret, J., 2014. Cold-start recommendation using bi-clustering and fusion for large-scale social recommender systems. IEEE Trans. Emerging Top. Comput. 2 (2), 239–250.
Zhen, Y., Li, W.-J., Yeung, D.-Y., 2009. TagiCoFi: tag informed collaborative filtering. In: Proceedings of the Third ACM Conference on Recommender Systems. ACM Press, New York.

