
Graph Convolutional Networks for Task-specific Embeddings in Multimodal and Multi-relational Data
Ankit Gandhi, Anil R. Yelundur, Arijit Biswas, Vineet Chaoji


India Machine Learning, Amazon
{ganankit, yelundur, barijit, vchaoji}@amazon.com

Abstract

There are billions of ASINs, hundreds of millions of customers, and millions of sellers interacting in complex ways within the Amazon network. It is important to semantically represent and understand these entities so that ML models involving them can be improved. Since these entities form a multi-entity, multi-relational graph, we should exploit the graph structure to learn better representations of them. Recently, Graph Convolutional Networks (GCNs) have been used effectively to represent graph-structured data, achieving state-of-the-art results on various graph-level tasks. In this work, we exploit the graph structure for several important business problems within Amazon, apply GCNs to learn embeddings of the corresponding entities, and use these embeddings along with other entity features to improve model performance. Specifically, we apply this framework to three different problems: (i) identifying abusive entities (sellers and reviewers) in the seller-reviewer-ASIN graph in a semi-supervised fashion, (ii) predicting whether a transaction will be electronic given a combination of seller, customer, and ASIN, and (iii) predicting the return probability of an ASIN sold by a seller. We obtain relative improvements of 32.14%, 4.75%, and 1.73% on these problems respectively.

1 Introduction
Various entities such as ASINs, sellers, and customers in Amazon constitute a graph, in which they interact in various ways. For example, customers can interact with ASINs through clicks, views, purchases, ratings, reviews, etc. Sellers and ASINs have "in-stock" or "sell" relationships. Customers often rate or write reviews about sellers. Graphs are an extremely powerful tool to capture and represent such relationships in a seamless manner. The entities can be represented as nodes of a graph and their relationships as edges. This results in a multimodal¹ and multi-relational² graph. In addition to the nodes and edges, each node can be described by a set of historical features characterizing its intrinsic properties.

¹ A graph with different kinds of nodes (sellers, ASINs, customers, etc.).
² A graph with multiple types of edges between nodes (e.g., purchase and click edges between customers and ASINs).
Recently, researchers have proposed various methods that are capable of learning from graph-
structured data [6, 9, 15, 21, 29, 30, 31, 7]. These methods have been shown to boost the performance
of many graph-level tasks such as node classification, link prediction, graph classification, sub-
graph classification, etc. One of the most promising approaches among these is the use of Graph
Convolutional Networks (GCNs)[15]. GCNs learn to aggregate feature information from local
graph neighborhoods using neural networks. The convolution operation in a GCN transforms and
combines feature information from a node’s one-hop graph neighborhood, and when multiple such
convolutions are stacked over each other, the node representation in-effect can capture information
from distant parts of the graph. Unlike content-based deep and shallow models such as RNNs and CNNs, GCNs leverage both the graph structure and the content information (defined as initial node features) for improved representation and modeling. In this paper, we extend GCNs to multimodal and multi-relational graphs to learn task-specific node embeddings for different Amazon problems. We apply GCNs to three important business applications (described below) to capture structural information and show a significant impact on each.
(i) Identification of abusive reviewers and sellers in a semi-supervised setting: Amazon cus-
tomers rely significantly on product reviews and ratings while making buying decisions on e-
commerce platforms. Given their influence on customer spends, product reviews are a fertile
ground for abuse. A common form of abuse involves a seller running campaigns soliciting fake, genuine-looking positive reviews for their own products or fake negative reviews about their competitors' products. The paid reviewers have varied modi operandi. They can either create their own accounts and start posting paid (fake) reviews, or they can hijack an inactive account in good standing and post seemingly innocuous reviews from it. In order to maintain customer trust, it is imperative for Amazon to detect abusive sellers and reviewers on its platform. In this work, we use known abusive sellers and reviewers from the past as partial supervision in the seller-reviewer-ASIN graph to detect potentially abusive entities. We show that the GCN-based approach is more generic and identifies more types of abuse than tensor decomposition based methods [28, 27], with improved performance.
(ii) Predicting whether a transaction is going to be electronic or not: In emerging markets such
as India, cash payment at the time of delivery is a preferred option for many customers. Although
cash-on-delivery (COD) provides convenience to customers, it drives up significant operational costs
for Amazon. Hence, as Amazon scales up within emerging marketplaces, it would like to transition
the COD customers to electronic payment methods [1]. One possible way to do this is by incentivizing customers. For example, Amazon could pay a flat cashback of $15 for any Visa card transaction, or 5% cashback up to $50 for any Mastercard transaction. However, it is not feasible to incentivize every customer for every transaction. Hence, we would like to decide intelligently when to give an incentive to a customer. For example, if a transaction has a very high likelihood of being electronic anyway, we would offer no incentive or only a small one. Hence, one of the important ML problems for customer incentivization is to determine the probability of an electronic transaction and use it to decide the incentive strategy. For this problem, we capture the graph topology using GCNs in the customer-seller-ASIN graph and predict whether the transaction is going to be electronic or not.
(iii) Predicting the return probability of an ASIN sold by a seller: In this problem, we predict the
probability of an ASIN sold by a particular seller being returned (returned due to a seller controllable
reason code). This is important for several use cases at Amazon, e.g., product quality identification (poor quality, unsafe/expired, item not as described in the Amazon catalog), seller performance (late shipment, poor packaging), seller abuse and fraud, etc. ASIN return probability is also used to estimate Product and Seller Trust Scores [2]. In this work, we estimate the ASIN return probability using GCNs on a seller-ASIN graph and show that it improves on the production system, which does not capture any structural information [2].
Using GCNs, we observe a relative improvement of 32.14%, 4.75% and 1.73% for the above three
business problems respectively. The rest of the paper is organized as follows. Section 2 discusses related work and background for the problems considered. Section 3 describes the GCN approach for multimodal and multi-relational graphs, along with feature details, the graph construction process, and training details for each problem. Section 4 presents the experimental results and implementation details. Section 5 concludes with final remarks.

2 Related Work
Learning Embeddings: In the past, there have been several works on learning entity embeddings
using deep neural networks. One of the first such works was word2vec [20]. Within Amazon as well, various methods have been proposed to learn representations of entities such as sellers, customers, ASINs, Prime, and interests [24, 5, 4, 18]. However, these embeddings do not capture structural information by modeling interactions among entities in a graph.
Graph Embeddings and GCNs: Neural networks on graphs distill high-dimensional information
about each node’s neighborhood into a dense vector. Recently, there has been a significant rise in the

methods that rely on the concept of graph convolutions. A number of works have been published in
different domains leading to state-of-the-art results on benchmarks such as node classification, link
prediction, and recommender systems [6, 9, 15, 21, 29, 30, 31, 7]. These methods have outperformed
the random-walk based graph embedding techniques [8, 22]. Our proposed GCN network is very similar to the network used for identifying drug side effects [31]; however, we extend it to multimodal and multi-relational graphs. Although GCNs have been used in social networks, knowledge graphs, recommendation engines, and computational biology, they have not yet been used to model e-commerce graphs. To the best of our knowledge, this is the first time GCNs have been applied to learn entity representations across such a wide variety of e-commerce business problems.
Abuse in Amazon E-commerce: In Amazon e-commerce, sellers are the major drivers of review-related abuse. A common form of abuse is a seller running a review campaign, soliciting fake good reviews for their own products or fake bad reviews about competitors' products. There has been a lot of attention recently on finding fake reviewers on online e-commerce platforms. Jindal et al. [14] were among the first to show that review spam exists and proposed simple text-based features to classify fake reviewers. Abusive reviewers have grown in sophistication ever since, employing professional writing skills to avoid detection via text-based techniques. To detect more complex fake-review patterns, researchers have proposed graph-based approaches such as approximate bipartite cores/lockstep behavior among reviewers [17, 3, 11, 13], network footprints of reviewers in the reviewer-product graph [25], and anomalies in rating distributions [10]. Some recent research has also shown the importance of time in identifying fake reviews, because abusers must produce as many reviews as possible in a short period of time to be economically viable; methods based on time-series analysis [16, 26] have been proposed. Tensor-based methods such as CrossSpot [12], M-Zoom [23], and MultiAspectForensics [19] identify dense blocks in tensors and can also be applied to our setting of identifying fake reviews.
For detecting seller-reviewer collusion, the authors of the AMLC papers [28, 27] formulate the problem as detecting dense bipartite cores in the seller-reviewer graph that satisfy certain constraints. Given the lack of labeled data but the availability of associated metadata (such as timestamps and ratings), they first formulate the problem as unsupervised tensor decomposition. Subsequently, they propose a semi-supervised tensor decomposition technique that leverages the training data collected thus far. To account for correlations between different forms of abuse, such as review abuse and product quality abuse, they propose extensions that incorporate multiple binary targets based on the logistic model with Pólya-Gamma data augmentation. They show that their semi-supervised approach beats state-of-the-art baselines in identifying abusive sellers on Amazon datasets. However, their methodology intrinsically detects only lockstep behavior, i.e., dense bipartite cores between entities, a.k.a. the paid-reviewer abuse behavior defined in PRMO (Paid Reviewer Modus Operandi). Tensor decomposition based approaches are unable to detect other forms of abuse that do not necessarily manifest as dense bipartite cores, such as VCAC (Veteran Customer Account Compromised). GCN, on the other hand, is not limited to detecting abuses that manifest as dense bipartite cores; it can detect more generic types of abuse, such as VCAC, in a semi-supervised setting. Our experimental results show that in a semi-supervised setting, GCN is on par (w.r.t. AUC) with the semi-supervised tensor decomposition approach in detecting the PRMO abuse type, but significantly outperforms it in detecting VCAC abuse, an abuse type that does not necessarily manifest as dense bipartite cores. This implies that the GCN approach is capable of detecting many different types of abuse in a semi-supervised setting.

3 Proposed Method
In this section, we describe GCNs for a multimodal and multi-relational graph. Our task is to
generate high-quality embeddings or representations of nodes for the above tasks by utilizing the
structural information as well as the side information available with each node. Using GCNs, we
build an end-to-end trainable model for different business problems. Subsequently, we also discuss
the feature details, graph construction process and the neural network architectural details for each of
the problems.
The idea behind GCNs is to learn how to transform and propagate information, captured by node feature vectors, across the graph. To generate the embedding for a node, a GCN uses a localized convolutional module that aggregates information from the node's neighborhood. Each node's convolutional module unrolls into a different neural network architecture depending on its neighborhood, but all modules share

Figure 1: Architectural details of a 2-layer Graph Convolutional Network for a multimodal and multi-relational graph. Left – a small example graph with three different node types (e.g., sellers, customers, ASINs) and six different relation types (e.g., purchase, customers from the same geographical location, etc.). Right – the network architecture computing the embedding of node 1 ($h_1^2$). In Layer 1, the network is expanded for node 8 only; the networks for the other nodes are defined analogously based on their neighbors (shown with dotted arrows). The node-1 embedding $h_1^2$ is computed using its embedding from the previous layer ($h_1^1$) and the previous-layer embeddings of its neighbors ($h_2^1, h_3^1, h_4^1, h_6^1, h_7^1, h_8^1, h_{11}^1, h_{12}^1, h_{13}^1$). Best viewed in color.

the same set of parameters across all nodes. This makes the parameter complexity of GCNs independent of the graph size. Hence, even for new/unseen nodes, we can obtain embeddings directly by applying the learned convolutional modules, without retraining the network. In GCNs, we stack multiple such convolutional modules to capture information about the graph topology. Each application of the convolutional module yields a new representation for a node, and stacking multiple convolutions incorporates information about the graph structure farther away from that node. Initial node features (side information/node properties) are provided as input to the GCN, and the node embeddings are then computed by applying the series of convolutional modules. This is shown pictorially in Figure 1. For the multi-relational case, we learn a separate convolutional module for each relation type. A similar GCN model has been used for detecting drug side effects in [31].
Let $G = \{V, R\}$ denote the graph, with nodes $v_i \in V$ (e.g., sellers, customers, ASINs) and relations $(v_i, r, v_j) \in R$, where $r$ is the edge type between nodes $v_i$ and $v_j$ (e.g., purchase, view, click). In addition, let $x_1, x_2, \ldots, x_n$ denote the side information assigned to each node in the graph, given as real-valued feature vectors. Without loss of generality, the business problems above can be mapped to either node classification or link (relation) prediction in the graph. For node classification, the task is to predict the labels of the nodes in the test set ($N_{test}$), given the labels of a few nodes in the train set ($N_{train}$). For link prediction, the task is to predict edges/relations $r$ between pairs of nodes; let $E_{train}$ and $E_{test}$ denote the sets of training and testing edges respectively. With this aim, we use a non-linear, multi-layer GCN model that operates directly on the graph $G$. The GCN (Figure 1) first produces embeddings for the nodes and then uses those embeddings for the node classification and link prediction tasks.
We describe the embedding generation and classification procedures of the GCN in Algorithm 1. The parameters we learn are the weight matrices of each convolutional layer per relation ($W_r^k$, $\forall k, \forall r$) and the parameters of the final dense layer used for classification ($W$, or $W_r$ $\forall r$). We learn these parameters by minimizing the cross-entropy loss on the training set, $N_{train}$ or $E_{train}$ (labeled nodes or labeled edges).

Algorithm 1 Embedding generation and classification using GCNs
Input: $G = \{V, R\}$; input features $x_1, x_2, \ldots, x_n$; depth $K$; $N_r^v$ – set of neighbors of node $v$ under relation $r$; $N_{train}$ or $E_{train}$ – set of training labels; $\phi$ – a non-linear function
Output: labels for the test set, $N_{test}$ or $E_{test}$
1: procedure EmbeddingGeneration
2:   $h_v^0 \leftarrow x_v, \forall v \in V$   ▷ initialize with node features
3:   for $k = 1 \ldots K$ do   ▷ $K$-depth neural network
4:     for $v \in V$ do   ▷ iterate over all nodes
5:       $h_v^k = \phi\left(\sum_r \sum_{j \in N_r^v} n_r^{vj} W_r^k h_j^{k-1} + n^v W^k h_v^{k-1}\right)$, where $n_r^{vj} = 1/\sqrt{|N_r^v||N_r^j|}$ and $n^v = 1/\sqrt{|N_r^v|}$ are normalization constants based on neighborhood size   ▷ update node $v$'s embedding
6:       $h_v^k \leftarrow h_v^k / \|h_v^k\|_2$   ▷ $\ell_2$-normalize the embedding
7:   $z_v = h_v^K, \forall v \in V$   ▷ $z_v$ is the final learned embedding of node $v$
8: procedure Classification
9:   if node classification then
10:    $l_i = \mathrm{softmax}(W^T z_i), \forall i \in N_{test}$   ▷ predict node labels
11:   else if link prediction then
12:    $l_r^{ij} = \mathrm{softmax}(W_r^T\, \mathrm{concat}[z_i, z_j]), \forall (i, j, r) \in E_{test}$   ▷ predict edge labels

We generalize Algorithm 1 to the minibatch setting when learning the parameters.
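As a concrete illustration of the classification step (lines 8–12 of Algorithm 1), the sketch below shows the two heads on top of the learned embeddings; the function and variable names are ours, not the paper's.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def node_head(z_i, W):
    # Node classification (line 10): softmax over W^T z_i.
    return softmax(W.T @ z_i)

def link_head(z_i, z_j, W_r):
    # Link prediction for relation r (line 12): softmax over
    # W_r^T applied to the concatenated pair of node embeddings.
    return softmax(W_r.T @ np.concatenate([z_i, z_j]))
```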

3.1 Abusive Sellers and Reviewers Detection

Graph Construction and Training: For this problem, we use the review rating and the time of the rating (provided by reviewers) to construct the graph. We build a tripartite graph containing reviewers, sellers, and ASINs. The relation/edge between nodes is defined by the review rating and the time of the rating (discretized into bins). There are $N_r \times N_t$ relation types in the graph, where $N_r$ is the number of review ratings and $N_t$ is the number of time bins. For a particular review, one of the $N_r$ rating relations is active based on the rating, and similarly one of the $N_t$ time relations is active based on the time of the rating. For example, if reviewer 'r' has given a rating of 5 at (discretized) time 10 to seller 's' for an ASIN 'a', then (i) an edge is added between 'r' and 'a' for rating relation 5 and time relation 10, and (ii) an edge is added between 'r' and 's' for rating relation 5 and time relation 10. We do not add an edge between 's' and 'a', as it is indirectly captured through transitivity. The graph structure is shown in Figure 2. A small set of abusive nodes is known to us from currently known abusive reviewers and/or sellers. The nodes labeled as non-abusive are a subset of nodes that have a very low score from the unsupervised model of [28] and are currently not associated with any form of abuse. Our task is to label the remaining nodes in the graph and identify potential abusers using GCNs; a sketch of this graph construction follows.
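A minimal sketch of the construction, assuming reviews arrive as (reviewer, seller, asin, rating, time_bin) tuples with the time already discretized; all identifiers here are illustrative.

```python
from collections import defaultdict

def build_review_graph(reviews):
    """reviews: iterable of (reviewer, seller, asin, rating, time_bin) tuples,
    where rating is in 1..N_r and time_bin is an already-discretized index
    in 0..N_t-1. Returns a dict mapping each relation type, identified by
    its (rating, time_bin) pair, to the list of edges carrying it."""
    edges_by_relation = defaultdict(list)
    for reviewer, seller, asin, rating, time_bin in reviews:
        rel = (rating, time_bin)
        # Reviewer-ASIN and reviewer-seller edges only; the seller-ASIN
        # association is captured indirectly through transitivity.
        edges_by_relation[rel].append((("reviewer", reviewer), ("asin", asin)))
        edges_by_relation[rel].append((("reviewer", reviewer), ("seller", seller)))
    return edges_by_relation
```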
Node Features: The inputs to the GCN are the initial node features ($x_1, x_2, \ldots, x_n$), where the nodes correspond to reviewers, ASINs, and sellers. We obtain the initial node features from the output of the unsupervised Logistic CP tensor decomposition model of [28]. For example, the node feature for seller $i$ is row $i$ of the factor matrix corresponding to the seller mode. These node features are abstract representations/embeddings of how likely each node is to belong to a dense bipartite core. We observed that seeding the GCN model with node features that encapsulate this overall clustering behavior is much more effective than using random seeds or one-hot encodings; a small sketch of this seeding follows.
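A small sketch of the seeding, assuming the decomposition yields one factor matrix per mode (reviewer, seller, ASIN); the names are illustrative, not from the paper's code.

```python
def seed_node_features(factor_matrices, node_index):
    """factor_matrices: dict mode -> (num_nodes_in_mode, rank) factor matrix
    from the unsupervised tensor decomposition.
    node_index: dict mode -> {node_id: row index in that factor matrix}.
    Returns a dict (mode, node_id) -> initial feature vector."""
    features = {}
    for mode, F in factor_matrices.items():
        for node_id, row in node_index[mode].items():
            # Node i's feature is row i of its mode's factor matrix.
            features[(mode, node_id)] = F[row]
    return features
```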

3.2 Predicting whether a transaction is electronic or not

Graph Construction and Training: In this problem, we use customers' historical transactions to construct the graph. Here also, we construct a tripartite graph, comprising sellers, customers, and ASINs. The relation/edge between nodes is defined by the historical transactions; we consider only one relation type in the graph, based on customer purchases. For example, if customer 'c' has purchased an ASIN 'a' from a seller 's', then (i) an edge is added between 'c' and 'a' for the purchase relation, and (ii) an edge is added between 'c' and 's' for the purchase relation. This graph structure is shown in Figure 2.
Figure 2: The graph structures for the three different business problems to which GCNs are applied.

For all these transactions (customer purchases), the payment method is known to us: either electronic or COD. Given a tuple <customer, ASIN, seller>, we train the GCN to predict whether the transaction will be electronic or COD using the historical data (lines 11–12 in Algorithm 1). The classification procedure concatenates the learned customer, seller, and ASIN embeddings to predict whether the transaction will be electronic or not, as sketched below.
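A hedged sketch of this three-way concatenation head, a variant of line 12 in Algorithm 1 with three embeddings instead of two; all names are illustrative.

```python
import numpy as np

def predict_electronic(z_customer, z_seller, z_asin, W):
    """Concatenate the three learned embeddings and score the transaction.
    W: (d_c + d_s + d_a, 2) weights of the final dense layer."""
    z = np.concatenate([z_customer, z_seller, z_asin])
    logits = W.T @ z
    e = np.exp(logits - logits.max())
    return e / e.sum()  # [P(COD), P(electronic)] (class ordering illustrative)
```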
Node Features: We initialize the customer nodes with a set of features derived from their various historical attributes and characteristics. The feature set includes information such as total purchases, order rejections, electronic payment rate, and product returns over trailing one-month, three-month, and six-month windows from a particular date. In addition, the feature set captures whether a customer has registered instruments for electronic transactions and the total balance in her wallet. We use a 78-dimensional feature vector to capture all of the above. For ASIN and seller nodes, we use a feature set capturing the payment success rate and electronic payment rate of the ASIN/seller over trailing 15-day, one-month, three-month, and six-month windows from a particular date; the seller and ASIN nodes each have an initial feature vector of dimension 8, computed as sketched below.
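A minimal sketch of building such trailing-window rate features, assuming per-day transaction counts are available; the field and function names are ours.

```python
from datetime import timedelta

def trailing_rate(daily_electronic, daily_total, as_of, window_days):
    """Electronic-payment rate over the trailing window ending at `as_of`.
    daily_electronic/daily_total: dict mapping date -> count."""
    days = [as_of - timedelta(d) for d in range(window_days)]
    total = sum(daily_total.get(d, 0) for d in days)
    electronic = sum(daily_electronic.get(d, 0) for d in days)
    return electronic / total if total else 0.0

def seller_rate_features(daily_electronic, daily_total, as_of):
    # One rate per trailing window: 15 days, 1, 3, and 6 months.
    return [trailing_rate(daily_electronic, daily_total, as_of, w)
            for w in (15, 30, 90, 180)]
```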

3.3 Predicting the return probability of an ASIN sold by a seller

Graph Construction and Training: We construct the graph in this problem from the ASINs offered by sellers; customers are not considered. We construct a bipartite graph comprising sellers and ASINs, where an edge connects a seller node and an ASIN node if that ASIN is offered by the seller. The constructed graph is shown in Figure 2. Our task is to predict the return probability of an ASIN sold by a seller. We use the historical data of offers (<seller, ASIN> tuples) that were returned/not returned to train our GCN network. Once the network is trained, it can estimate the return probability of any offer <seller, ASIN> using the seller and ASIN embeddings.

Node Features: We use different seller attributes to initialize the seller nodes in the graph: seller rating, seller feedback, tenure, location, past returns and concessions, seller suspension history, current offers, units sold, GMS over different trailing durations, and seller performance metrics. For ASIN nodes, we use attributes such as title, description, bullets, rating, review text, image quality, brand, category, GL, historical return rate, and availability history to create a feature vector. Along with the seller and ASIN features, we also have a few offer-level (seller-ASIN combination) features: ASIN price, days since available, fulfillment channel, and price competitiveness. These offer-level features are concatenated just before the classification layer (lines 11–12 in Algorithm 1) to make the prediction, as sketched below.

Figure 3: Comparison of ROC curves between GCN and the semi-supervised tensor decomposition technique [27, 28] for different abuse types: (a) seller abuse, (b) reviewer PRMO abuse, (c) reviewer VCAC abuse.

In this work, we have modeled the Amazon environment as a set of nodes in two or three disjoint sets (bipartite or tripartite graphs); however, the approach can easily be extended to other graphs.
4 Experimental Results
This section outlines our experiments to validate the efficacy of GCNs for different business problems
within Amazon. We present results for the three business problems discussed in the paper and compare them with the performance of state-of-the-art systems in Sections 4.1, 4.2, and 4.3 respectively. In Section 4.4, we discuss the training details of GCNs and other implementation details.

4.1 Abusive Sellers and Reviewers Detection

Seller Dataset: The input data corresponds to Amazon review data from May 2017 to October 2017 for the US marketplace. Each row of this data indicates an association between the three entities (seller, reviewer, ASIN) in the form of a review: reviewer ID, ASIN, seller ID, rating, and time. Note that the rating is an integer between 1 and 5, and time is converted to a week index. This data consists of roughly 1.25 million rows with 25,218 unique reviewers, 476,518 unique ASINs, and 74,745 unique sellers. We use this data to construct the graph in Figure 2. For labeled data, we have 1,482 sellers known a priori to be guilty of review abuse, i.e., the positive set. To this, we add an additional 2,446 sellers that currently have no associated abuse and also have the lowest scores from the unsupervised Logistic CP tensor decomposition model of [28], i.e., the negative set.
Reviewer Dataset: This corresponds to Amazon review data from September 2017 to November 2017 for the US marketplace. The data consists of roughly 3.5 million rows with 92,199 unique reviewers and 1,472,641 unique ASINs, and is used to construct the graph. For labeled data, we consider two kinds of abuse: (i) PRMO abuse, with 8,132 reviewers labeled positive and 21,197 labeled negative, and (ii) VCAC abuse, with 3,549 reviewers labeled positive and 15,030 labeled negative.
In both datasets, we randomly divide the labeled data into train (80%) and test (20%) sets. All performance metrics are reported on the test data. Figure 3 shows the performance of GCNs compared to the semi-supervised tensor decomposition methods [27, 28] for abusive seller and reviewer detection. As can be seen in the plots, GCNs perform as well as the tensor decomposition methods for identifying abusive sellers and PRMO-abusive reviewers. However, in the case of VCAC-abusive reviewers, GCNs significantly outperform the semi-supervised tensor decomposition method, achieving an AUC of 0.74 compared to 0.56 (a relative improvement of ∼32% in AUC). This happens because the methods in [27, 28] are restricted to identifying dense cores in the bipartite graph and cannot detect abuses other than lockstep behavior; their performance is only slightly better than that of a random classifier for identifying VCAC abuse.

4.2 Predicting whether a transaction is electronic or not

Dataset: For this problem, the input data corresponds to all transactions from July 15–28, 2018 for the IN marketplace. Each record in this data indicates an association between a customer, a seller, and an ASIN. There are approximately 1.78 million records with 1.23 million unique customers, 36,766 unique sellers, and 500,595 unique ASINs. We use this data to construct the customer-seller-ASIN graph. For each of these records, we also have a label: whether the transaction was electronic or COD. For evaluating performance on this task, we use all transactions from July 29–30, 2018; the evaluation set has around 94,373 records. The goal is to accurately predict whether the transaction will be electronic or not for each record in this set.
The baseline model for this task is an XGBoost classifier using all the features defined in Section 3.2. In this model, for a <customer, seller, ASIN> tuple, we concatenate all their features and train a classifier to predict whether a transaction involving these three entities will be electronic or not. We achieve an AUC of 0.802 with this model. In contrast, when we model the problem using GCNs and capture the structural information, we achieve an AUC of 0.842 (a relative increase of 4.98% in performance). Note that the seller, customer, and ASIN nodes in the GCN model are initialized with the same set of features used in the XGBoost classifier. This shows that leveraging the structural information and graph topology among these entities helps learn better representations and hence results in improved performance.

4.3 Predicting the return probability of an ASIN sold by a seller

Dataset: The input data corresponds to all the offers (<ASIN, seller> combinations) sold in one week for the IN marketplace. Each record in the data denotes an association between a seller and an ASIN. The data has 3.3 million records with 862,739 unique ASINs and 40,322 unique sellers. We construct the ASIN-seller graph using these records (see Figure 2). Each record also carries a label: whether the ASIN was returned due to a seller-controllable reason code or not. Our task is to predict this label for out-of-time (future) offers; our test dataset has 0.76 million records. We do not consider any customer information in this problem, to allow a fair comparison with the production system, which does not use any customer attributes [2].
The baseline/production model for this task is an XGBoost model trained on ASIN, seller, and ASIN-seller combination attributes. In this approach, a model is trained to predict the return probability of an ASIN at the offer level by combining all these features, achieving an AUC of 0.702 on the test set. When we model the problem with GCNs using the same set of features, we achieve an AUC of 0.714 (a relative improvement of 1.7%). We initialize the seller and ASIN nodes in the graph with the seller and ASIN attribute features respectively. Features from the ASIN-seller combination are concatenated before the classification layer in the GCN, along with the ASIN and seller embeddings, to predict the return probability of the ASIN. Note that this is an extremely challenging problem, and even a small improvement in performance is significant.

4.4 Implementation Details

For all problems, we used a two-layer GCN containing 64 and 32 hidden units respectively. We set the learning rate to 0.001 and the minibatch size to 512. For the electronic transaction and return probability prediction problems, we train the GCN for 1 epoch (the loss saturates within a single epoch). For abuse detection, since labeled examples are very few, we train the GCN alternately to predict the labeled nodes (minimizing the cross-entropy loss for labeled nodes) and to predict the input edges (minimizing the cross-entropy loss for link prediction): for every 10 iterations of training on input edges, the network is trained for a single iteration on labeled nodes (a sketch of this schedule follows). Empirically, we observe that this alternating training acts as a good regularizer for the node labeling task and helps avoid overfitting. In total, we train the GCN for 1,500 iterations for abuse detection. The code is implemented in TensorFlow, and it takes ∼16 hours to train a network for 1,500 iterations on an m4.4xlarge EC2 instance. However, the training time could be reduced significantly by training on GPUs.
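A minimal sketch of the 10:1 alternating schedule described above; `link_prediction_step` and `node_classification_step` stand in for one minibatch gradient update on the respective loss (both names are ours).

```python
def train_alternating(num_iterations, link_prediction_step,
                      node_classification_step, link_steps_per_node_step=10):
    """Alternate link-prediction and node-classification updates.
    Every `link_steps_per_node_step` link updates, do one node update."""
    for it in range(1, num_iterations + 1):
        link_prediction_step()  # minimize cross-entropy for input edges
        if it % link_steps_per_node_step == 0:
            node_classification_step()  # minimize cross-entropy for labeled nodes

# Usage with hypothetical step functions:
# train_alternating(1500, link_step_fn, node_step_fn)
```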

5 Conclusion
In this work, we apply GCNs to multimodal and multi-relational Amazon graph data. We show that performance on various business problems can be improved significantly when structural information is encoded while learning representations for different entities. In the future, we would like to apply this formulation to other business problems; for abuse detection in particular, we would like to explore (a) the signatures the model picks up to classify reviewers as VCAC-abusive, (b) extracting dense bipartite cores for collusion detection from the node embeddings, (c) applying the model to other kinds of abuse such as sock puppets, and (d) including other input node features with node-specific attributes such as customer clusters, concessions, etc.

References
[1] https://w.amazon.com/index.php/EPR%20Machine%20Learning.
[2] https://w.amazon.com/index.php/InternationalCountryExpansion/A2I/
SellerExperience/MarketPlaceTrust/TrustScore.
[3] A. Beutel, W. Xu, V. Guruswami, C. Palow, and C. Faloutsos. CopyCatch: stopping group attacks by spotting lockstep behavior in social networks. In Proceedings of the 22nd International Conference on World Wide Web, pages 119–130, 2013.
[4] Arijit Biswas and Subhajit Sanyal. Cust2vec: A multi-task recurrent neural network for learning
customer embeddings. AMLC, 2017.
[5] Arijit Biswas and Subhajit Sanyal. Seller2vec: A multi-modal & multi-task deep neural network
for seller representations. AMLC, 2018.
[6] Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst.
Geometric deep learning: going beyond euclidean data. CoRR, 2016.
[7] Mahsa Ghorbani, Mahdieh Soleymani Baghshah, and Hamid R. Rabiee. Multi-layered graph
embedding with graph convolutional networks. CoRR, 2018.
[8] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. CoRR,
2016.
[9] William L. Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs: Methods
and applications. CoRR, 2017.
[10] B. Hooi, N. Shah, A. Beutel, S. Günnemann, L. Akoglu, M. Kumar, D. Makhija, and C. Faloutsos. BIRDNEST: Bayesian inference for ratings-fraud detection. In SIAM International Conference on Data Mining (SDM), 2016.
[11] B. Hooi, H. Song, A. Beutel, N. Shah, K. Shin, and C. Faloutsos. FRAUDAR: Bounding graph fraud in the face of camouflage. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 895–904, 2016.
[12] M. Jiang, A. Beutel, P. Cui, B. Hooi, S. Yang, and C. Faloutsos. A general suspiciousness metric for dense blocks in multimodal data. In IEEE International Conference on Data Mining (ICDM), pages 781–786, 2015.
[13] M. Jiang, P. Cui, A. Beutel, C. Faloutsos, and S. Yang. Inferring lockstep behavior from connectivity pattern in large graphs. Knowledge and Information Systems, 48(2):399–428, 2015.
[14] N. Jindal and B. Liu. Opinion spam and analysis. In Proceedings of the 2008 International Conference on Web Search and Data Mining, pages 219–230, 2008.
[15] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional
networks. CoRR, 2016.
[16] H. Li, Z. Chen, A. Mukherjee, B. Liu, and J. Shao. Analyzing and detecting opinion spam on a
large-scale dataset via temporal and spatial patterns. In Proceedings of the 9th International
AAAI Conference on Web and Social Media (ICWSM), pages 26–29, 2015.
[17] Y. Li, O. Martinez, X. Chen, and J. E. Hopcroft. In a world that counts: Clustering and detecting
fake social engagement at scale. In Proceedings of the 25th International Conference on World
Wide Web, pages 111–120, 2016.
[18] Ben London and Hyokun Yun. Product2vec: A multi-task learning framework for cold-start
product recommendation. AMLC, 2016.
[19] K. Maruhashi, F. Guo, and C. Faloutsos. MultiAspectForensics: Pattern mining on large-scale
heterogeneous networks with tensor analysis. In Advances in Social Networks Analysis and
Mining (ASONAM), pages 203–210, 2011.

[20] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word
representations in vector space. CoRR, 2013.
[21] Federico Monti, Michael M. Bronstein, and Xavier Bresson. Geometric matrix completion with
recurrent multi-graph neural networks. CoRR, 2017.
[22] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social
representations. CoRR, 2014.
[23] K. Shin, B. Hooi, and C. Faloutsos. M-Zoom: fast dense-block detection in tensors with quality guarantees. In Joint European Conference on Machine Learning and Knowledge Discovery in
Databases (ECML/PKDD), pages 264–280, 2016.
[24] Jessie (Zheshen) Wang, Ted Sandler, Siwei Jia, and Jason Yang Wi. Prime2vec: Learning a
shared representation of customers using multi-task deep neural network. AMLC, 2018.
[25] J. Ye and L. Akoglu. Discovering opinion spammer groups by network footprints. In Joint
European Conference on Machine Learning and Knowledge Discovery in Databases, pages
267–282, 2015.
[26] J. Ye, S. Kumar, and L. Akoglu. Temporal opinion spam detection by multivariate indicative
signals. In International AAAI Conference on Web and Social Media, pages 743–746, 2016.
[27] Anil R. Yelundur, Srinivasan H. Sengamedu, and Andrea Effgen. Detection of seller-reviewer
collusion using multi-target tensor decomposition. AMLC, 2017.
[28] Anil R. Yelundur, Srinivasan H. Sengamedu, Bamdev Mishra, Jagdeep Pani, Andrea Effgen,
Michael Smyth, and Kim Wilber. Bayesian semi-supervised tensor decomposition using natural
gradients for collusion detection. AMLC, 2018.
[29] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure
Leskovec. Graph convolutional neural networks for web-scale recommender systems. CoRR,
2018.
[30] Jiaxuan You, Rex Ying, Xiang Ren, William L. Hamilton, and Jure Leskovec. Graphrnn: A
deep generative model for graphs. CoRR, 2018.
[31] Marinka Zitnik, Monica Agrawal, and Jure Leskovec. Modeling polypharmacy side effects with
graph convolutional networks. CoRR, 2018.
