
Decision Support Systems 56 (2013) 211–222

Capturing the essence of word-of-mouth for social commerce: Assessing the quality of online e-commerce reviews by a semi-supervised approach

Xiaolin Zheng a,⁎, Shuai Zhu a, Zhangxi Lin b,c

a College of Computer Science, Zhejiang University, No. 38, Zheda Road, Hangzhou 310027, China
b Center for Advanced Analytics and Business Intelligence, Texas Tech University, Lubbock, TX 79409-2101, USA
c Key Lab of Financial Intelligence and Financial Engineering, Southwestern University of Finance and Economics, Chengdu, Sichuan 611130, China

⁎ Corresponding author. Tel.: +86 571 87951453; fax: +86 571 87951453.
E-mail addresses: xlzheng@zju.edu.cn (X. Zheng), zscs@zju.edu.cn (S. Zhu), zhangxi.lin@ttu.edu (Z. Lin).
http://dx.doi.org/10.1016/j.dss.2013.06.002

Article history:
Received 28 April 2012
Received in revised form 28 April 2013
Accepted 9 June 2013
Available online 15 June 2013

Keywords:
Online review
Review quality
Review mining
Semi-supervised learning
Social network

Abstract

In e-commerce, online product reviews significantly influence the purchase decisions of buyers and the marketing strategies employed by vendors. However, the abundance of reviews and their uneven quality make distinguishing between useful and useless reviews difficult for potential customers, thereby diminishing the benefits of online review systems. To address this problem, we develop a semi-supervised system called Online Review Quality Mining (ORQM). Embedded with independent component analysis and semi-supervised ensemble learning, ORQM exploits two opportunities: the improvement of classification performance through the use of a few labeled instances and numerous unlabeled instances, and the effectiveness of the social characteristics of e-commerce communities as identifiers of influential reviewers who write high-quality reviews. Three complementary experiments on datasets from Amazon.com show that ORQM exhibits remarkably higher performance in classifying reviews of different quality levels than do other well-accepted state-of-the-art text mining methods. The high performance of ORQM is also consistent and stable even under limited availability of labeled instances, thereby outperforming other baseline methods. The experiments also reveal that (1) the social features of reviewers are important in deriving better classification results; (2) classification results are affected by product type given the different purchase habits of consumers; and (3) reviews are contingent on the inherent nature of products, such as whether they are search goods or experience goods, and digital products or physical products, through which purchase decisions are influenced.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Empowered by cutting-edge social media, internet-based social communities have significantly contributed to the success of online businesses [1]. Current online retail commerce is characterized by many-to-many transactions, instead of the traditional one-to-one relationship between sellers and buyers, closely tying the parties involved in various social networks that center on a given product. That is, sellers form alliances through links to products for brand selling, buyers share shopping experiences in online virtual communities, and these two types of networks interact with each other, thereby generating positive propensity towards a more effective market. E-commerce oriented communities have thus far remained popular given that they are constantly updated with enriched features and services, enabling buyers to enjoy informative shopping facilities and retailers to build strong customer loyalty. Such communities have also considerably improved market efficiency. All these advantages indicate the advent of social shopping and social commerce. The former is based primarily on word-of-mouth in electronic marketing, and the latter refers to the retail and referral network of individual sellers/shops for products sold online. Both are underpinned by advanced online social media.

E-commerce communities have given rise to a substantial volume of consumer-generated information, including online reviews of products or sellers, online transaction ratings, and scores on the different criteria provided by the electronic market. Such information exerts a huge influence on the evolution of e-commerce. The influx of user-generated contents reflects the wide acceptance of key Web 2.0 characteristics, such as user-centered design and information sharing. User-generated contents, typically product reviews, can be exploited through econometrics and data mining analysis [2]. With the help of a modified regression model, retailers can also predict the effectiveness of reviews in sales generation. Determining consumers' true opinions from reviews of existing product improvements or new product developments is helpful for manufacturers. Online review analysis can help retailers implement targeted marketing, sales prediction, and customer relationship management. From a consumer's perspective, effective and authentic reviews have become invaluable sources of shared opinions about products; these opinions have a guiding effect on purchase decisions.
The abundance of online reviews causes two problems: information overload and quality discrepancy. In particular, the quality of reviews dramatically varies, from very helpful to useless and even spam-like, diminishing the benefits gained from online reviews. From a buyer's perspective, therefore, low-quality reviews pose difficulties in comprehensively evaluating product quality. Specifically, negative evaluations from credible reviewers tend to give rise to herd mentality [3]. In establishing market efficiency and providing benefits to retailers and consumers, the aforementioned issues make identifying true buyer opinions a critical yet challenging task.

To evaluate the quality of online reviews, researchers have devoted considerable effort to developing text analytics methods, including review quality classification [4,5] and spam detection [6]. These methods employ large-scale training datasets to build classification or prediction models, with data labeling as one of the major tasks in data pre-processing. However, manually labeling a large dataset incurs high costs and is impractical when applied in industry, given the diversity of reviews and the numerous factors that influence product evaluations. In addition, a single model may exhibit unstable performance because of a lack of generalizability. The major drawback of these methods is that they do not fully use the social characteristics of e-commerce communities, which are distinct attributes of advanced e-commerce. Thus far, many data mining techniques for knowledge discovery from online reviews have presented unstable or poor performance when applied to actual situations.

On the basis of the discussion above, we argue that the current imperfections in review quality classification, which is the preliminary task in further data analysis, can be rectified by a semi-supervised classification approach [7]. This branch of machine learning has gained increasing popularity because of its practical value and versatile performance. The most critical advantage of semi-supervised classification is that it requires training datasets that contain only a few labeled instances but numerous unlabeled ones. The general idea of this method is to train a classifier from a dataset that contains both labeled and unlabeled records, instead of training one with only labeled records. Prior experiments show that, with a sufficiently accurate classifier, semi-supervised learning enables highly accurate classification [8]. Inspired by these outcomes, we design and implement the Online Review Quality Mining System (ORQM) based on the semi-supervised classification approach for review quality classification. ORQM is also reinforced with a Co-EM version of the ensemble selection method [7] to optimize the accuracy of classification, and with an independent component analysis (ICA)-based method for pre-processing mapped features. Specifically, ORQM incorporates the comprehensive social features of reviewers, thereby enabling consumers to take advantage of potentially high-quality reviews from influential evaluators. The experiments on Amazon.com review datasets, which contain information on physical and digital products in four categories and on both search goods and experience goods, indicate:

1. The performance of ORQM is superior to that of the most popular supervised methods in terms of seven extensively applied metrics (accuracy (ACC), F-score, area under the receiver operating characteristic curve (AUC), average precision (APR), root mean square error (RMSE), mean cross-entropy (MXE), and their mean) [10]. Specifically, ICA-based pre-processing further improves and stabilizes the performance of ORQM.
2. ORQM works more effectively than do baseline methods even under a very small number of labeled samples. Adding more labeled samples steadily improves the performance of the proposed system.
3. The social traits of reviewers are more helpful than other features in enhancing quality classification performance, which is also influenced by product type. All the methods exhibit a higher performance in the IT product datasets than in the cultural product datasets.

The rest of the paper is organized as follows. Section 2 provides a background to quality mining and presents the literature review. Section 3 formalizes the problem. Section 4 discusses the ORQM system. Section 5 describes the experiments on the Amazon.com online review datasets, including details on the performance evaluation, model validation, and feature selection. The conclusions and future research directions are presented in Section 6.

2. Related work

Knowledge discovery from online reviews (i.e., review mining) is an interdisciplinary research area that features econometric analysis, consumer psychological modeling, statistical linguistics, natural language processing, opinion mining, and machine learning [9]. It has received much attention from researchers in economics, management, behavioral science, psychology, computer science, and sociology.

In the early stage of review mining research, efforts were devoted to identifying the polarity of reviews (positive or negative) [10]. Particular attention was later paid to determining the influence of product reviews on the purchase intentions of consumers from the perspectives of marketing and sociology. For example, Lee et al. [3] investigated the conformity effect of negative reviews on marketing. Econometrists and management experts studied the economic value of reviews and determined consumer needs. Ghose et al. [11] estimated the feature weights of reviews and predicted the sales fluctuations influenced by different features. Lee et al. [12] used a combination of association rule mining and graph analysis to accurately identify customer needs.

Despite the contributions of these studies, the robustness and reliability of their results are undermined by the insufficient focus on review quality. Given the popularity of online review systems and the abundance of review information, review quality has emerged as an important research issue in social shopping and social commerce. User ratings generally demonstrate an unbalanced distribution [13], a bias further reflected as the effects of certain factors on product reviews [14]. Therefore, Pipino et al. [15] conducted subjective and objective assessments of data quality, in conjunction with three functional forms of objective metrics, to investigate distribution patterns. To enhance the efficiency of this approach, researchers directed more focus toward feature selection for product quality assessments based on online reviews. Social features, such as trust features in social networks, improve the accuracy of predicting review quality [16]. Users constantly search for helpful reviews that enable them to more efficiently and precisely make decisions, giving rise to the popularity of helpfulness ratings, an extended feature of online reviews [17]. Moghaddam et al. [18] suggested that rating the helpfulness of online reviews be personalized because utility is a subjective concept. Liu et al. [19] found that the helpfulness ratings assigned by regular users differ from those provided by sellers, who design specific features for detecting helpful reviews on the basis of a retailer's perspective. Other relevant studies emphasized the detection of spam reviews and untrustworthy members. Hu et al. [20] empirically found that review abuse occurs in Amazon.com and Barnes & Noble, and Ku et al. [21] proposed a method for distinguishing members in opinion-sharing communities for products; the distinction is conducted on the basis of member reviews and trust networks [20].

Most of these methods rely on large manually labeled training datasets, making them time consuming and less scalable. For example, Godfrey et al. [22] reported that human experts have to spend as many as 400 h transcribing an hour of conversational speech corpus. Furthermore, methods that rely solely on a single machine-learning model normally suffer from diminished generalizability and overlook social attributes.
Semi-supervised learning and ensemble learning are among the promising methods for overcoming these problems. Semi-supervised learning models are derived from traditional learning methods, including semi-supervised SVM [23], graph-based semi-supervised learning [24], and co-training [25]. Co-EM, a variant of co-training, has been extensively used in semi-supervised learning. In the co-training technique, an instance in a dataset has two views, each described by a different feature set with complementary information on the instance [23]. Two separate classifiers are trained using the labeled instances in each view, and the most confident unlabeled instance is applied until all the instances are exhausted. Additionally, Co-EM [26] improves co-training by building a probability model that enables the classifiers to teach each other, and leverages the EM algorithm to estimate parameters.

Semi-supervised learning markedly enhances the performance of a classification model while requiring considerably fewer labeled instances than does supervised learning; the unlabeled instances in a training dataset also contribute to classification or cluster modeling. Nonetheless, semi-supervised learning does not always yield good results [27]. To acquire the full benefits of this approach, a co-training based method must typically comply with two assumptions [28]:

1. Each classifier trained on a corresponding view of instances performs sufficiently well when provided enough labeled instances.
2. Views are conditionally independent of one another given their class labels.

Assumption 1 ensures that each classifier provides the other classifier with more accurately classified instances, and the second assumption guarantees that the newly added unlabeled instances are informative enough to update the classification model. If Assumption 2 did not hold, the selected instances would be highly similar, and therefore less useful, eventually diminishing model performance.

Given the mandatory compliance with the two assumptions, some co-training based methods, such as Co-EM SVM [23] and Bayesian co-training [25], manually assign independent views, an approach that suffers from non-generalizability and inaccuracy. Moreover, the performance of single SVM and Bayesian classifiers may not be sufficiently satisfactory for labeling unlabeled data. These drawbacks motivate our decision to embed the ORQM system with ICA [29] and to use semi-supervised ensemble learning [30] to address the aforementioned problems.

3. Problem formalization

The extent of uselessness varies per case; useful and useless reviews are not clearly distinguished, challenging the implementation of ORQM. People may also have various criteria for product review quality, which depend on their roles in an e-commerce process, personal experiences or backgrounds, and product types. In a broad sense, therefore, the quality of an online review is subjective and context relevant. A helpful review likely possesses the following characteristics:

1. It provides as many appraisable contents as possible to encompass detailed descriptions of product features. For example, a helpful cell phone review tends to exhaustively describe product aspects, including operating system, display, battery duration, and weight. It also presents comprehensive personal experiences and opinions that do not merely echo other product descriptions or reviews;
2. It has few spelling and grammatical errors, uses short sentences made up of familiar words or terms, and presents more relevant contents about a product;
3. Its provider tends to be active in reviewer communities and receives positive feedback from others. The review systems of leading e-commerce sites, such as Amazon.com, are actually social networks. The implicit and explicit feedback in such social networks represents the strength of the connection among the reviewers in these networks. Therefore, the average support number of a review is key to the quality of the review [18].

We define useless reviews as those containing spamming or low-quality contents [4]. Review spam is analogous to Web spam [31], through which spammers post malicious appraisals of specific products to damage the reputation of these goods or mislead consumers. Although low-quality reviews are not products of malicious intent, they provide little information, and therefore little additional value, to consumers. Carelessly written reviews may contain numerous syntactic or grammatical mistakes and little personal empirical feedback. Shoppers are unlikely to refer to these reviews when making purchase decisions.

A review system is related to three entities: the items evaluated (products and services), the reviews of each item, and the communities to which reviewers/consumers belong. We denote these entities as I, R, and C, respectively. The demographics of the reviewers and their social network information form the social context of reviews. Consequently, we define the social feature of a review as follows:

Definition 1. For each review R, the social feature (SF) of R is denoted as SF(R) = ⟨D, SR, C⟩, which includes consumers (C), their social relationships (SR), and their demographics (D).

Traditional quality studies rely primarily on text-based features [15,32], which are defined as follows:

Definition 2. The text-based features (TB) of a review R are defined as TB(R) = ⟨ID, CD, RD⟩, which includes intrinsic data quality (ID), contextual data quality (CD), and representational data quality (RD) [33].

A review R can be either labeled (L) or unlabeled (U). Each labeled review r_i ∈ L is assigned a numeric score S_{r_i}, which represents the true helpfulness degree of r_i. S_{r_i} can be collected from a feedback system available on many online social commerce platforms, such as the helpful voting system on Amazon.com, or manually labeled by annotators. In this paper, we adopt the former.

Consequently, the input raw data for ORQM can be denoted as {L ∪ U, S_{r_i}, SF(R), TB(R)}. A classification function f is trained to estimate whether a review is helpful or useless (including spamming, replicating, or low-quality reviews) using the following formula:

f(R_i) \to \begin{cases} \text{Useful}, & \text{if the } i\text{th review is useful} \\ \text{Spam} \mid \text{Duplicate} \mid \text{Low-quality}, & \text{otherwise} \end{cases} \qquad (1)

where R := {∪ R_i, i = 1, …, n} and R_i is the ith review. f maps the review feature space into the numerical quality value space. The continuous value of f can be further converted into a binary decision (helpful or useless) by ROC analysis using kappa statistics, which is measured by two annotators to determine a threshold θ.¹ Previous studies focused on manipulating supervised learning models [34,32] to find a quality predictor P on {L, TB(R)}. Our novel semi-supervised method, i.e., Co-EM ensemble learning, takes advantage of L, U, and SF to enhance P.

The three steps taken in verifying the effectiveness of our methods are as follows. First, we select three typical supervised learning models and three typical semi-supervised models as baseline predictors, together with our model trained on {SF(R), TB(R)}, to compare model performance. Then, we alter the input of each model by gradually increasing its number of input instances. Finally, we investigate the manner by which the quality predictor P is enhanced by the addition of SF. In the succeeding section, we discuss the mechanism of our ORQM system.

¹ In our experiments, we found that if θ was set to 0.6, the error rate could be minimized. Thus, if a review has more than 60% helpful votes, it can be classified as helpful, and vice versa.
4. The anatomy of ORQM

ORQM incorporates the social characteristics of reviewers to evaluate the quality of reviews with an extended Co-EM ensemble learning method. The ORQM system comprises three components: the system structure, the features of reviews extracted for data mining, and the principle of Co-EM ensemble learning with ICA-based data transformation. These components constitute the mechanism of ORQM.

As shown in Fig. 1, review mining in ORQM involves three stages accomplished by three subsystems: feature extraction, ICA, and Co-EM ensemble learning. The feature extraction subsystem pre-processes raw datasets to build a primitive feature space. The ICA subsystem converts the feature space into an appropriate form to optimize the performance of Co-EM ensemble learning. The Co-EM ensemble learning subsystem classifies online reviews into two classes in accordance with their characteristics.

In the ORQM workflow, the ICA subsystem transforms the original feature space into two mutually independent projection spaces. This transformation underpins the Co-EM ensemble learning subsystem, enabling the satisfaction of Assumption 2, as stated in Section 2. Moreover, the AdaBoost-enhanced SVM [35] in the Co-EM ensemble learning subsystem manipulates the ensemble classifiers for Co-EM learning, enabling the satisfaction of Assumption 1. In this way, ORQM can match the best-case behavior of co-training based methods.

4.1. Feature extraction

The features extracted from online review corpuses contribute major information to review quality mining. In a study by Wang and Strong [33], a hierarchical text-based quality framework was constructed along three dimensions: intrinsic data quality, contextual data quality, and representational data quality. According to the authors of [18], social features are specifically important in online review quality mining because these attributes, which are used to quantify the social characteristics of reviewers, are strongly associated with review quality. This relationship stems from the social contingence of the information sent by customers and retailers on social networks. On the basis of these considerations, we observe that reviewers with solid reputations post more high-quality reviews than do ordinary reviewers, an observation also confirmed in [36]. Thus, we claim that social features are effective components of review mining because they capture the social characteristics of reviewers and their relationships in social networks. We select six features from the social category (Table 1).

The rationale behind the selection of these social features lies in the fact that people with strong reputations in an e-commerce community tend to provide more influential discussions, making their reviews more helpful [1,2]. The more detailed considerations for these features are as follows. Historical ranking (f28) and recent ranking (f27) of reviewers reflect that reviewers with high ranks, more personal information, and more helpfulness votes are expected to be high-quality review writers. Social commerce communities, such as Amazon.com, often rank reviewers in accordance with the number of reviews, helpfulness rate, and total helpfulness votes received. Therefore, we take advantage of current and historical rank to distinguish high-quality reviews.

A social network normally contains some hub nodes with a high degree of centrality [37]; these nodes are recognized as opinion leaders, and hold highly regarded reputations and influence in marketing. Studies also show that opinion leaders may deliver extra product information, influence the product adoption process, and enlarge market size [38], making them suitable agents for viral marketing [39]. In our work, the Top reviewer flag (f30) is used for representing the influence of opinion leaders.

Fig. 1. ORQM structure.
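Before detailing each subsystem, a schematic sketch of the Fig. 1 workflow may help; every function below is a hypothetical stub standing in for the corresponding subsystem, not the authors' implementation.

```python
import numpy as np

# Hypothetical skeleton of the ORQM workflow in Fig. 1:
# feature extraction -> ICA view splitting -> Co-EM ensemble learning.
rng = np.random.default_rng(0)

def extract_features(reviews):
    # Stub: ORQM builds the 32 features of Appendix A here.
    return rng.random((len(reviews), 32))

def ica_split(X):
    # Stub: a plain random column split; Section 4.2 replaces this with a
    # genuine ICA transformation so the two views are (approximately)
    # conditionally independent.
    idx = rng.permutation(X.shape[1])
    half = X.shape[1] // 2
    return X[:, idx[:half]], X[:, idx[half:]]

def coem_ensemble_fit(view1, view2, labels):
    # Stub for the Co-EM ensemble learner of Section 4.3.
    return {"view1_dim": view1.shape[1], "view2_dim": view2.shape[1]}

reviews = ["great phone, long battery life", "bad", "arrived late but works"]
labels = np.array([1, 0, 1])
v1, v2 = ica_split(extract_features(reviews))
print(coem_ensemble_fit(v1, v2, labels))
```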


Total helpful votes (f29) received by each reviewer are defined as the degree of attention that a reviewer receives. To measure the extent of a reviewer's activity in a community, we use the Number of disclosed demographics (f31), including the number of friends, interests, and locations, and the Number of reviews posted by each reviewer (f32).

Table 1
Social features.

Variables   Description
NRR (f27)   Reviewer rank in the last few days
CRR (f28)   Historical reviewer rank
HV (f29)    Total helpfulness votes received
TP (f30)    Top reviewer flag, set if the reviewer is of top type
NPI (f31)   Number of personal information items offered
TRN (f32)   Number of total posted reviews

In addition to social features, text-based features in the intrinsic, contextual, and accessible categories are carefully designed to capture the quality of reviews, following previous research. The extraction of these features is discussed in the experiment section, and their descriptions are provided in Appendix A. In total, we use 32 features falling under four categories for review mining.

4.2. ICA-based data transformation

As a statistical method for separating a multivariate vector into additive subcomponents, ICA is widely used in statistics, signal processing (blind source separation), wavelet transforms, and machine learning. It converts original multi-dimensional random vectors into statistically independent components [29]. In ORQM, ICA is the key step in transforming the original feature space into a conditionally independent projection feature space, which makes the input for Co-EM ensemble learning more robust. A typical ICA problem is defined as follows.

Definition 3. X = (x_1, x_2, …, x_m)^T is an observed m-dimensional random vector, and S = (s_1, s_2, …, s_n)^T is an n-dimensional component vector whose components are statistically mutually independent. The ICA problem is a linear transformation:

X = WS \qquad (2)

where W is an m × n static matrix to be estimated, and the components s_i must be as statistically independent from one another as possible [29].

ICA can be conducted with the help of mutual information, which estimates W by minimizing the mutual information among the features of X [29,40]:

\min I(X) = -H(X) + \sum_{i=0}^{N-1} H(X(i)) \qquad (3)

where X(i) is the ith component of X, I(X) is the measure of mutual information for X, and H(X(i)) is the entropy of X(i).

An approximate expression of I(X) can be obtained using the Edgeworth expansion [41]:

I(X) \approx C - \sum_{i=0}^{N-1} \left[ \frac{1}{12}\kappa_3^2(X(i)) + \frac{1}{48}\kappa_4^2(X(i)) + \frac{7}{48}\kappa_4^4(X(i)) - \frac{1}{8}\kappa_3^2(X(i))\kappa_4(X(i)) \right] \qquad (4)

where κ_i denotes the cumulant of its corresponding random variable.

The optimization problem can be solved by gradient descent. The t-step iteration is obtained as follows:

\begin{cases} W(t) = W(t-1) + \mu(t)\left[ W^{-T}(t-1) - E(\phi(S)x^T) \right] \\ W(t) = W(t-1) + \mu(t)\left[ I - E(\phi(S)S^T) \right] W^{-T}(t-1) \end{cases} \qquad (5)

After a statistically independent feature vector S is generated by ICA, it can be randomly split into two vectors, and two Co-EM ensemble classifiers can be trained by mutual reinforcement (the pseudo-code of the ICA algorithm can be found in Appendix B).

Finally, the values of the available features extracted from online reviews, denoted as V, are split into two disjoint sets V1 and V2. Each instance can then be represented as {L ∪ U, V1(SF(R) ∪ TB(R)), V2(SF(R) ∪ TB(R))}, where the second and third components are vectors over V1 and V2, respectively.
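As an illustration of Eq. (5), the sketch below runs a natural-gradient ICA update in NumPy and then randomly splits the recovered components into the two views V1 and V2. The tanh score function, learning rate, and iteration count are illustrative assumptions; the natural-gradient form multiplies the bracket by W rather than W^{-T}, the usual simplification of the raw gradient, so this is a sketch of the technique rather than the exact update above.

```python
import numpy as np

def ica_unmix(X, n_iter=200, mu=0.01, seed=0):
    """Natural-gradient ICA sketch; X has shape (features m, samples N)."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=1, keepdims=True)              # center each feature
    m, N = X.shape
    W = np.eye(m) + 0.1 * rng.standard_normal((m, m))  # unmixing matrix
    for _ in range(n_iter):
        S = W @ X                                      # source estimate
        phi = np.tanh(S)                               # score function phi(S)
        # W += mu * (I - E[phi(S) S^T]) W  (natural-gradient variant of Eq. (5))
        W += mu * (np.eye(m) - (phi @ S.T) / N) @ W
    return W @ X

# Recover quasi-independent components, then split them into two views.
rng = np.random.default_rng(1)
S = ica_unmix(rng.standard_normal((8, 500)))
perm = rng.permutation(S.shape[0])
V1, V2 = S[perm[:4]], S[perm[4:]]
print(V1.shape, V2.shape)   # (4, 500) (4, 500)
```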
4.3. Co-EM ensemble learning

Co-training and Co-EM methods can use unlabeled data to enhance performance when these data are trained on independent views, but Co-EM additionally requires a classifier that can estimate the class probability of an instance. The Co-EM SVM method that we adopt is an extension of the general Co-EM method, reinforced by SVM for text mining [23]. We further improve Co-EM SVM by enhancing SVM with the ensemble approach.

Krogh [42] stated that an ensemble-enhanced model exhibits high generalization performance when the average error rate of the component classifiers is low and the extent of difference among the component classifiers is sufficiently large. To deliver versatile performance, therefore, diversity among component classifiers must be kept as high as possible when ORQM exploits unlabeled instances. To develop the Co-EM ensemble learning algorithm, three problems must be solved: constructing robust ensembles to satisfy the diversity requirement, estimating the class probability of unlabeled instances, and developing a learning algorithm that uses unlabeled instances.

To address the first problem, we introduce the AdaBoost algorithm [35] into the SVM ensemble construction (AdaBoost_SVM). The proposed AdaBoost_SVM obtains better generalization ability than SVM alone for imbalanced classification tasks [35], consequently producing high benefits in quality classification problems, as the real quality distribution is extremely biased [13]. AdaBoost_SVM re-weights the instances by first assigning a training sample a large weight δ and then decreasing this weight, so that the SVM classifiers initially perform poorly before acquiring stronger ability in later rounds, steadily improving the classification performance. Controlling δ not only prevents AdaBoost_SVM from overfitting but also facilitates higher generalization performance.
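A rough stand-in for the AdaBoost_SVM component ensemble can be built with scikit-learn (assuming a version ≥ 1.2, where the base learner is passed as `estimator`); SAMME boosting needs only hard predictions plus sample weights, which SVC provides. This is a sketch of the technique, not the authors' exact configuration, and the δ-style weight control described above is omitted.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

# Synthetic, deliberately imbalanced data as a stand-in for review features.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0.8).astype(int)   # minority positive class

ada_svm = AdaBoostClassifier(
    estimator=SVC(kernel="rbf", gamma=0.1),  # weak-ish RBF-SVM components
    n_estimators=10,
    algorithm="SAMME",                       # works with hard SVC predictions
)
ada_svm.fit(X, y)
# decision_function supplies the AS(x) scores used by Eqs. (6)-(8) below.
print(ada_svm.score(X, y), ada_svm.decision_function(X[:3]))
```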
To solve the second problem, we follow the principles of semi-supervised learning. Let us denote U+ and U− as the helpful and useless unlabeled reviews, respectively, and L+ and L− as the helpful and useless labeled reviews, respectively. The task is to estimate the class probability P̂(y | x_i), x_i ∈ U, with y = helpful for U+ or useless for U−. As P̂(y | x_i) = P̂(x_i | y)P̂(y)/p(x_i), the prior probabilities P̂(y) can be derived from L+ and L−, e.g., P̂(+) = |L_+| / |L|. In each iteration, the unlabeled data are split into U+ and U− with the ratio P̂(y), and the unlabeled instance with the highest AS(x_i*) is appended to L+, where AS(·) denotes the AdaBoost_SVM classifier.

If the decision function value is assumed to be normally distributed, p(AS(X) | y) ∼ N[μ, σ²] [43], then according to the law of large numbers, μ and σ² for U+ and U− can be estimated as follows, where y ∈ {helpful, useless} and C is a constant:

\mu_y = \frac{C}{|U_y| + |L_y|} \left( |L| \sum_{x \in L} AS(x) + |U| \sum_{x \in U} AS(x) \right) \qquad (6)
\sigma_y^2 = \frac{C}{\sqrt{|U_y| + |L_y|^2}} \sum_{x \in U,\, (x,y) \in L} \left( AS(x) - \mu_y \right)^2 \qquad (7)

Finally, the class probabilities P̂(y | x_i) can be inferred from the Gaussian maximum likelihood of Eqs. (6) and (7) and the prior probabilities P̂(y):

\hat{p}(y \mid x_i) = \frac{N[\mu_y, \sigma_y^2](AS(x_i))\, \hat{P}(y)}{N[\mu_y, \sigma_y^2](AS(x_i))\, \hat{P}(y) + N[\mu_{\bar{y}}, \sigma_{\bar{y}}^2](AS(x_i))\, \hat{P}(\bar{y})} \qquad (8)

where ȳ denotes the complementary class.
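The following NumPy/SciPy sketch shows the spirit of Eqs. (6)–(8): fit a Gaussian to the ensemble's decision values for each class and combine it with the label priors. It is a simplified reading, with the moments estimated from labeled scores only rather than the weighted labeled-plus-unlabeled combination above, and the score arrays are synthetic stand-ins for AS(x).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
scores_L_pos = rng.normal(1.0, 0.5, 40)    # AS(x) on labeled helpful reviews
scores_L_neg = rng.normal(-1.0, 0.5, 60)   # AS(x) on labeled useless reviews
scores_U = rng.normal(0.0, 1.0, 10)        # AS(x) on unlabeled reviews

# Priors from the labeled set, P(+) = |L+| / |L|.
p_pos = len(scores_L_pos) / (len(scores_L_pos) + len(scores_L_neg))
p_neg = 1.0 - p_pos

# Gaussian parameters per class, a simplified reading of Eqs. (6)-(7).
mu_pos, sd_pos = scores_L_pos.mean(), scores_L_pos.std()
mu_neg, sd_neg = scores_L_neg.mean(), scores_L_neg.std()

# Posterior P(helpful | x) via the ratio in Eq. (8).
num = norm.pdf(scores_U, mu_pos, sd_pos) * p_pos
den = num + norm.pdf(scores_U, mu_neg, sd_neg) * p_neg
print(num / den)
```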
For the third problem, we designed a method for training ensembles based on labeled data, unlabeled data, and the class probabilities p̂(y | x_i). In the simplest form, in each iteration of Co-EM ensemble learning, the ensemble AS_i examines each instance in U. If the number of component classifiers voting for some label exceeds a given threshold τ, the unlabeled instances, together with their class probabilities p̂(y | x_i), are placed in the labeled dataset L.

However, in some cases, the number of unlabeled instances added to the labeled dataset L can be very large, or even equal to the size of U. If the learned model, especially in the initial iterations, has not yet fit the underlying normal distribution well, performance may then suffer from the large amount of automatically mislabeled data.

Nigam et al. [44] therefore proposed that each unlabeled instance be assigned a fixed weight. Following a similar idea, in our model the weight of an instance is given by the probabilistic confidence of the ensemble. Introducing this soft weight not only reduces the negative effect of large amounts of automatically labeled data but also makes the algorithm insensitive to τ.
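One round of the resulting loop might look like the sketch below: unlabeled instances scored confidently by the ensemble are moved into the labeled pool with soft weights given by the ensemble's probabilistic confidence, rather than by a hard vote-count threshold τ. All names and the confidence cutoff are schematic assumptions, not the paper's exact algorithm.

```python
import numpy as np

def coem_round(X_lab, y_lab, w_lab, X_unl, posterior, conf=0.8):
    """Move confidently scored unlabeled points into the labeled pool.

    `posterior` plays the role of p(y|x) from Eq. (8); the confidence
    cutoff `conf` is an illustrative assumption.
    """
    take = np.maximum(posterior, 1 - posterior) >= conf   # confident ones
    y_new = (posterior[take] >= 0.5).astype(int)
    w_new = np.maximum(posterior[take], 1 - posterior[take])  # soft weights
    X_lab = np.vstack([X_lab, X_unl[take]])
    y_lab = np.concatenate([y_lab, y_new])
    w_lab = np.concatenate([w_lab, w_new])
    return X_lab, y_lab, w_lab, X_unl[~take]

rng = np.random.default_rng(0)
X_l, y_l, w_l = rng.normal(size=(20, 5)), rng.integers(0, 2, 20), np.ones(20)
X_u, post = rng.normal(size=(30, 5)), rng.uniform(size=30)
X_l, y_l, w_l, X_u = coem_round(X_l, y_l, w_l, X_u, post)
print(len(y_l), len(X_u))   # labeled pool grows, unlabeled pool shrinks
```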
Utilizing ensemble learning makes labeling the unlabeled instances much more accurate than using a single classifier, but the misclassification of some unlabeled instances is unavoidable. Moreover, one of the disadvantages of the EM algorithm is that it tends to converge to local optima. Thus, we add the slack variable Cs to each component SVM classifier of the ensemble, following the smoothing strategy of TSVM [45].

The pseudo-code of Co-EM ensemble learning is available in Appendix C. The algorithm first transforms the original features into statistically independent feature subsets V1 and V2. It then initializes and trains two SVM ensembles, E1 and E2, on V1 and V2, respectively. The unlabeled data are split iteratively according to the estimated probability through mutual learning iterations. After convergence, the unlabeled data are used up, and the trained classifiers for quality classification are obtained.

5. Experimental evaluation

5.1. Experiment design

5.1.1. Experiment scheme
We conduct three complementary experiments. The first is designed to compare ORQM with several state-of-the-art supervised, semi-supervised, and unsupervised methods. The second experiment evaluates these models along the dimensions of different sample sizes and scenarios with/without ICA. The last experiment is intended to examine the effects of social features on ORQM.

These experiments match the three indispensable steps of review quality mining systems: review corpus shifting, pre-processing, and learning. Consequently, these tests adequately cover the dimensions of scope, depth, and factorial diversity in each sub-system of ORQM.

5.1.2. Dataset
We collect data from Amazon.com, one of the major representative sources of research data. The June 2006 dataset contains 5.8 million reviews posted by 2.14 million reviewers for 1.2 million products in four categories (Table 2). We select the four categories on the basis of size; categories that are too large or too small are excluded. The same dataset categories were also used by Jindal et al. [6]. These reviews cover both search products, such as DVD/VHS and mProducts (IT products such as computers), and experience products, such as music and books. All reviews contain information about the review contents, product attributes, and reviewer attributes (Table 3).

Table 2
Descriptions of Amazon.com review datasets.

Category    Reviews     Products    Reviewers
Music       1,327,456   221,432     503,884
Books       2,493,087   637,120     1,076,746
DVD/VHS     633,678     60,292      250,693
mProducts   228,422     36,692      165,608
All         5,838,032   1,195,133   2,146,048

Table 3
Variables of the experiment dataset.

Review dataset            Product dataset        Reviewer dataset
Reviewer id               Product id             Reviewer id
Product id                Product name           Reviewer name
Date                      Brand                  Rank
Helpful feedback number   Sales price            Top k
All feedback number       List price             Location
Rating                    Product description    Birthday
Title                                            Total review number
Body                                             New reviewer rank
                                                 Classic reviewer rank
                                                 Total helpful votes
                                                 Total votes

5.1.3. Metrics
As previously stated, we apply accuracy (ACC), F-score, area under the receiver operating characteristic curve (AUC), average precision (APR), root mean square error (RMSE), mean cross-entropy (MXE), and their mean [46] in assessing the performance of ORQM in review quality mining. These metrics are commonly used to measure the performance of comparable models; thus, they serve as standards for testing the performance of ORQM. Among the measures, ACC and F-score are threshold metrics that usually assume a fixed threshold; AUC and APR are order metrics; RMSE and MXE are probability metrics; and the mean is the average value of the other six metrics.
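Six of the seven metrics can be computed directly with scikit-learn, as in the sketch below; the seventh ("mean") averages the others. Note that Table 4 reports all metrics on a higher-is-better scale, and since the normalization of RMSE and MXE is not spelled out in the text, the raw values are shown here.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, roc_auc_score,
                             average_precision_score, mean_squared_error,
                             log_loss)

# y_prob is a stand-in for a model's predicted probability of "helpful".
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_prob = np.array([0.9, 0.2, 0.8, 0.6, 0.4, 0.7, 0.1, 0.3])
y_pred = (y_prob >= 0.5).astype(int)          # fixed-threshold decision

acc = accuracy_score(y_true, y_pred)          # threshold metric
f1 = f1_score(y_true, y_pred)                 # threshold metric
auc = roc_auc_score(y_true, y_prob)           # order metric
apr = average_precision_score(y_true, y_prob) # order metric
rmse = np.sqrt(mean_squared_error(y_true, y_prob))  # probability metric
mxe = log_loss(y_true, y_prob)                # probability metric
print(acc, f1, auc, apr, rmse, mxe)
```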
5.2. Data extraction and transformation

Two tasks are conducted in preparing pre-text-mining data: feature extraction and transformation, both of which involve many technical details. The following are the approaches applied to extract or transform a number of important intrinsic features (two of these computations are sketched in code at the end of this subsection):

1. We use the latent Dirichlet allocation (LDA) method [47] for topic discovery, a utility implemented by LingPipe. Given the number of topics (i.e., Feature f2, the value of which is determined by applying Gibbs sampling in parameter estimation [48]), the LDA utility can decompose the term frequency vector of a document into a series of orthogonal vectors of term frequencies;
2. Using Bo's method (f3) [9] as the basis, we train another dynamic language model to label each review as either subjective or objective, which then generates the statistics of subjective and objective sentences in reviews, including Feature Subject (f7), Object (f8), and the Ratio of subjective and objective sentences (f9, i.e., SOR);
3. We jointly use LDA [9] and the Inquirer dictionary² to construct two TF-IDF vectors: the Tf-idf vector of product feature words (f10) and the Tf-idf vector of sentiment words (f11);
4. We derive f5, the degree of consistency, which indicates the deviation of a review from its rating, according to Eq. (9) below, where P denotes the polarity score and R is the product rating provided by the reviewer. As 0 < P < 100 and 1 ≤ R ≤ 5, we adopt a linear scheme to map R into bands of 20 in P:

\text{consistency} = \begin{cases} 0, & (R-1) \times 20 < P < R \times 20 \\ \left\lfloor \dfrac{P - (R-1) \times 20}{20} \right\rfloor, & \text{else} \end{cases} \qquad (9)

5. With regard to the Flesch reading ease score (f21), we adopt the approach of [49] to quantify the legibility of text using the following formula, where ASL stands for average sentence length and ASW stands for the average number of syllables per word:

\text{Score} = 206.835 - (1.015 \times ASL) - (84.6 \times ASW) \qquad (10)

6. Other derived features include (a) the Polarity feature (f3), the polarity score of a review, built by the hierarchical classification method for sentiment analysis, and (b) Feature WR (f6), the ratio of nouns, verbs, adjectives, and adverbs in a review, for which a part-of-speech (POS) tagger is trained on the Brown corpus.

These approaches are mainly processed through LingPipe³ and the Inquirer dictionary, and the ICA-based feature transformation algorithm is implemented with the MILCA⁴ toolkit. The resultant outcomes are review feature vectors to be used in further analytical processes.
5.3. Experiment results and analysis

5.3.1. Evaluating the performance of different review mining models in terms of the seven metrics
We measure Co-EM ensemble learning and the baseline methods on the Amazon.com dataset using 10-fold cross-validation. Our baseline methods incorporate supervised learning models, including Random Forest, SVM, and Logistic Regression; semi-supervised models, including Co-EM SVM, Co-Training, and Co-EM Bayesian; and an unsupervised one, namely, KNN clustering. These algorithms are implemented using the machine learning toolkit WEKA⁵. Two sets of 10,000 labeled reviews sampled from the dataset are used for training and testing, respectively. The experiment results are listed in Table 4, from which we obtain the following findings.

Finding 1. Co-EM ensemble learning outperforms other methods in review quality classification, measured with seven metrics, on datasets from four kinds of products.

Table 4 shows that, under most of the metrics, the semi-supervised methods produce generally better results than do the supervised methods because the former use unlabeled data to improve performance. The results on the four distinguishable products show that Co-EM ensemble learning achieves remarkably higher performance on each metric than do the supervised methods, especially on ACC and AUC. Co-EM ensemble learning also improves on related semi-supervised methods, such as Co-EM SVM.

Finding 2. Review quality mining with nearly all algorithms, as previously itemized, performs well on the mProduct dataset but relatively underperforms on the music, book, and DVD/VHS datasets.

We attribute this outcome to the heterogeneity of reviews relevant to the nature of the products, in which herd mentality significantly influences virtual social relationships and the effects of online reviews [50,51].

mProducts are search products with tangible characteristics that can be described more objectively, whilst music and books are experience goods with more unstructured features, resulting in more subjective reviews. Thus, online search results for search goods have less depth (time per page) and higher breadth (total number of pages) than those for experience goods [20]. In this way, online reviews exert less influence on consumer search and purchase behavior for search goods than for experience goods. If the reviews of search goods are meticulously written, the comments in the reviews can well match the expectations of a prospective consumer. From this perspective, such reviews are good predictors of product quality, and the responses to these reviews are accurate. As the features presented realistically reflect helpfulness, they in turn enhance the classification performance.

By contrast, the experiences of reviewers of experience goods, such as DVDs, music, and books, may not be directly adopted by others. Features in many reviews on experience goods for determining helpfulness could become deviational. This outcome eventually diminishes the perceived ability of prospective consumers to identify helpful reviews for their decisions [11]. Specifically, models on the book dataset perform the worst, whereas models on the music dataset perform the best. We infer that books are the most experience-independent products, which also have minimal explicit features for quality assessment. Therefore, our experimental finding in this aspect is consistent with the previous discussion.
5.3.2. Testing the influence of labeled samples
The second experiment is designed to evaluate the effect of different training sizes on performance. We manipulated ORQM to explore the effect of the training sample size and the effect of ICA on the performance of the system. The latter was done by comparing the performance of ORQM with that of a modified version without ICA. Based on the results, we obtained the following findings:

Finding 3. At different levels of available labeled samples, ORQM consistently outperforms other models on the AUC metric.

Fig. 2 shows that ORQM delivers the highest AUC performance even under extremely few labeled samples. ROC analysis (AUC) is designed to evaluate the classification task with varying class distributions, which are caused by different training samples. Hence, observing the AUC of these models is suitable. Supervised methods perform poorly under an excessively small labeled training dataset because of their incapability to use massive unlabeled data. As the number of labeled samples increases, ORQM and the other semi-supervised methods behave more stably than do supervised methods, fully demonstrating the power of semi-supervised learning when adequately conceived and configured.

² http://www.wjh.harvard.edu/inquirer/.
³ LingPipe is a toolkit for processing text using computational linguistics that includes topic classification, POS tagging, named entity recognition, clustering, character language modeling, and sentiment analysis, among others. For more details, refer to the homepage at http://alias-i.com/lingpipe.
⁴ MILCA stands for Mutual Information Least-dependent Component Analysis. It is an ICA algorithm that uses an accurate mutual information (MI) estimator to find the least dependent components under a linear transformation (http://www.klab.caltech.edu/kraskov/MILCA/).
⁵ WEKA is a collection of machine learning algorithms implemented in Java by the University of Waikato under the principle of open source (http://www.cs.waikato.ac.nz/ml/weka/).
Table 4
Results of the different algorithms on the Amazon.com datasets.

Category    Algorithm             ACC     F-score  AUC     APR     RMSE    MXE     Mean
mProducts   Co-EM Ensemble        0.9915  0.9744   0.9969  0.9899  0.9807  0.9978  0.9885
            Co-EM SVM             0.9609  0.874    0.981   0.9773  0.9319  0.9692  0.9491
            Co-training           0.9699  0.8671   0.9608  0.9793  0.937   0.9704  0.9474
            Co-EM Bayesian        0.9794  0.88     0.985   0.9821  0.935   0.9841  0.9576
            Random forest         0.9129  0.852    0.977   0.963   0.929   0.9293  0.9272
            LibSVM                0.9101  0.879    0.98    0.9603  0.9207  0.921   0.9285
            Logistic regression   0.8244  0.8156   0.89    0.8894  0.86    0.884   0.8606
            KNN                   0.7631  0.7583   0.7814  0.7745  0.7592  0.7702  0.7678
Music       Co-EM Ensemble        0.906   0.9      0.971   0.9435  0.9193  0.9214  0.9269
            Co-EM SVM             0.883   0.849    0.944   0.9207  0.9005  0.9112  0.9014
            Co-training           0.854   0.8374   0.9398  0.91    0.8866  0.9009  0.8881
            Co-EM Bayesian        0.889   0.8672   0.952   0.9401  0.9017  0.912   0.9103
            Random forest         0.876   0.849    0.94    0.9271  0.8871  0.9094  0.8981
            LibSVM                0.835   0.808    0.9373  0.9331  0.9147  0.92    0.8914
            Logistic regression   0.701   0.6878   0.81    0.776   0.7193  0.72    0.7357
            KNN                   0.692   0.6749   0.7245  0.7137  0.6883  0.7059  0.6999
Books       Co-EM Ensemble        0.8879  0.8637   0.9492  0.9277  0.8973  0.9003  0.9044
            Co-EM SVM             0.8491  0.8102   0.9223  0.9039  0.8651  0.8874  0.873
            Co-training           0.8227  0.7972   0.9036  0.8741  0.8419  0.864   0.8506
            Co-EM Bayesian        0.828   0.8097   0.8993  0.8793  0.8324  0.8542  0.8505
            Random forest         0.8641  0.8479   0.9217  0.9115  0.8847  0.8994  0.8882
            LibSVM                0.8041  0.7769   0.8974  0.8505  0.8211  0.8479  0.833
            Logistic regression   0.6842  0.6659   0.7795  0.7359  0.6914  0.7126  0.7116
            KNN                   0.7144  0.6831   0.7292  0.7261  0.7014  0.7185  0.7121
DVD/VHS     Co-EM Ensemble        0.9019  0.8881   0.9599  0.9331  0.9097  0.9217  0.9191
            Co-EM SVM             0.8541  0.8215   0.9117  0.8973  0.8676  0.8797  0.872
            Co-training           0.8083  0.7768   0.8794  0.8411  0.8192  0.8215  0.8244
            Co-EM Bayesian        0.8872  0.8769   0.9257  0.9193  0.8992  0.9137  0.9037
            Random forest         0.7036  0.6875   0.7795  0.7437  0.7179  0.7338  0.7277
            LibSVM                0.6473  0.6105   0.7705  0.7392  0.7075  0.7212  0.6994
            Logistic regression   0.4741  0.4629   0.6038  0.5731  0.5059  0.5302  0.525
            KNN                   0.7241  0.6938   0.7381  0.7332  0.7119  0.7286  0.7216

Finding 4. ICA plays a critical role in ORQM.

The results from a comparison experiment without ICA show that, when the number of labeled samples exceeded a given threshold, the performance of ORQM would diminish upon training on larger datasets. The decline in performance is due to the inappropriate pre-processing of randomly splitting features without ICA for Co-EM ensemble learning, under which the independence assumption no longer holds. Thus, the ICA-based pre-processing increases and stabilizes the performance of ORQM.

5.3.3. Evaluating the effects of social features
The last experiment evaluates the effectiveness of social features in ORQM. The performances of ORQM measured in AUC and ACC are listed in Table 5 with different combinations of features. The following are the main findings:

Finding 5. Social features contribute the most to the performance of ORQM.

Social features implicitly take advantage of viral marketing (VM), which diffuses opinions of products or advertisements through social networks in a self-replicating way, analogous to the spread of viruses or computer viruses [52]. The mechanism of VM lies in the cascade effect originated by influencers [53], who have high social influence and active social actions. Thus, the reviews of influencers will likely change the minds and behaviors of consumers.

We find that reviewers with a high reputation or high centrality are more likely to post high-quality reviews. These opinion leaders also exert great influence, namely, diffusion in the social network for marketing, which inevitably influences the recognition degree of the quality of their online reviews [54,36]. Thus, consumers in the social commerce community are more likely to follow these opinion leaders. With more consent on their reviews, these opinion leaders are more likely to write more high-quality reviews, forming a virtuous circle, also known as positive psychology [55]. Thus, social features are strong indicators of review quality.

Finding 6. Next to social features, intrinsic features extracted from the reviews of search goods significantly influence the performance of ORQM.

This finding is inferred from the results of mProduct data mining in Table 5 but is not applicable to other product categories. This discrepancy is attributed to the nature of intrinsic features, which reflect review topics, product features, and sentiments, making these qualities more suitable for products with clearly distinguishable features, such as search goods.

6. Conclusion and future work

The review quality problem in e-commerce communities has drawn considerable research attention, because the absence of an effective mechanism for review quality control casts doubt on the results of studies on opinion mining, sentiment classification, and review summarization. Review quality mining that guarantees the dependability of the results of the aforementioned studies can provide a solid background for successive related research. Furthermore, placing these studies within a quality control structure yields more accurate results.

In this study, we present the ORQM system to conduct robust, practical, and high-performance review quality mining. The main distinguishing trait of ORQM is that it provides comprehensive functionalities to evaluate the quality of reviews, which not only cover text-based features, such as intrinsic, contextual, and accessible ones, but also creatively introduce social network features in assessing the significance of online reviews. These social features can capture the social characteristics of reviewers and their relationships within the online business social network, and help to enhance the accuracy and performance of quality classification.

Table 5
Results of the feature combinations for ORQM.

Dataset    Feature combination        ACC     AUC
mProduct   Intrinsic feature (1)      0.8415  0.8742
           Contextual feature (2)     0.8766  0.8907
           Accessibility feature (3)  0.9036  0.9328
           Social feature (4)         0.9212  0.9391
           (1) + (4)                  0.9263  0.9471
           (2) + (4)                  0.9211  0.9457
           (3) + (4)                  0.9364  0.9572
           (1) + (2) + (3) + (4)      0.9915  0.9969
Music      Intrinsic feature (1)      0.8201  0.8492
           Contextual feature (2)     0.8679  0.8845
           Accessibility feature (3)  0.8602  0.8798
           Social feature (4)         0.8814  0.9132
           (1) + (4)                  0.8871  0.9265
           (2) + (4)                  0.8899  0.9395
           (3) + (4)                  0.8931  0.9463
           (1) + (2) + (3) + (4)      0.906   0.971
Books      Intrinsic feature (1)      0.8126  0.8386
           Contextual feature (2)     0.8204  0.8399
           Accessibility feature (3)  0.8549  0.8736
           Social feature (4)         0.8605  0.8994
           (1) + (4)                  0.8671  0.9197
           (2) + (4)                  0.8692  0.9225
           (3) + (4)                  0.8843  0.9339
           (1) + (2) + (3) + (4)      0.8879  0.9492
DVD/VHS    Intrinsic feature (1)      0.8253  0.8446
           Contextual feature (2)     0.8271  0.8495
           Accessibility feature (3)  0.8566  0.8796
           Social feature (4)         0.8792  0.8997
           (1) + (4)                  0.8894  0.9214
           (2) + (4)                  0.8831  0.9183
           (3) + (4)                  0.8902  0.9328
           (1) + (2) + (3) + (4)      0.9019  0.9599

Fig. 2. Algorithm comparison on the mProducts (a), music (b), books (c), and DVD/VHS (d) datasets of Amazon.com using varied numbers of labeled samples.

More important, from a technical aspect, we combine ensemble learning and Co-EM learning to enhance the performance of the system. Leveraging the Co-EM version of ensemble learning enables semi-supervised learning, which reduces the requirement for labeled samples while delivering more versatile performance compared with existing methods. In addition, ICA pre-processing ensures the robustness and stability of ORQM. These improvements are confirmed or supported by three carefully designed experiments.

As a fundamental framework for review quality evaluation in the e-commerce realm, ORQM should not only serve the perspectives of retailers and manufacturers but also benefit common online shopping consumers. On one hand, a promising opportunity is to use ORQM as a basis for examining the economic value of reviews. High-quality reviews reflect how people evaluate a product, which features satisfy their needs, and what types of products they prefer. For example, marketing practitioners can build personalized recommendation systems by incorporating ORQM to improve system performance; manufacturers can customize their products to more precisely fit consumer needs; and retailers can adjust their advertising strategies to highlight preferred product features for sales promotions. On the other hand, conducting research from the perspective of consumers is also a worthwhile endeavor. Consumers rely heavily on high-quality reviews in making purchase decisions. To save consumers the effort of reading a large volume of product reviews, online stores, such as Amazon, provide sentiment information, the frequency of review access, and the usefulness points assigned to a review by other consumers. However, these utilities do not indicate the quality of a review, while a poorly written or spamming review can sometimes be misleading. Therefore, a customer-oriented review search system or a review recommendation system can be valuable to social commerce, with review quality mining as a vital component. In this way, we can regard ORQM as the fundamental work for building an online review filtering system in a social commerce environment, in which ORQM can serve as a quality control subsystem for filtering out useless reviews. Online consumers can then refer to historical purchasing records and comments more easily, without being distracted by piles of useless review information.

Though researchers have presented plenty of creative proposals to move review quality mining forward, many challenges remain on our future research agenda. Specifically, people have different individual perspectives in judging a review's quality; their criteria may vary continuously between helpfulness and uselessness. Thus, a unified model that can cope with diverse consumer tastes rooted in different purposes is urgently needed. We expect the social network-based approach to be a potential solution to this challenge. A promising direction is crowdsourced clustering, a type of crowdsourcing method [56]. For example, by delegating certain steps of the method to the public, the provision of user-friendly and adaptive quality estimations for different user types will become possible. In this context, even when social information becomes less available because of privacy concerns or other reasons, a social inference mechanism [57] can still be introduced to resolve the shortage of social characteristics.

Acknowledgments

This work was supported in part by the National Key Technology R&D Program (No. 2012BAH16F02), the National Natural Science Foundation of China (Grant Nos. 61003254 and 91218301), and the Fundamental Research Funds for the Central Universities, as well as the Financial Service Innovation Team Development Project at Southwestern University of Finance and Economics (2012).

Appendix A. Features

A.1. Intrinsic features

Intrinsic features are designed to capture the quality inherent in the lexical, syntactical, and semantic constitution of an individual review, that is, whether the review contains substantive information that may interest consumers. In accordance with this definition, we selectively adopted four features previously proposed as intrinsic features and carefully added seven new ones (Table A.6). Among these features, polarity (f3) and consistency (f5) are introduced to detect spam reviews, as spammers tend to deviate from normal practice.

Table A.6
Intrinsic features.

Variables          Description                                           Notes
RL (f1)            Length of review
Topic (f2)         Topic number of review                                NEW
Polarity (f3)      Polarity score of the review                          NEW
Pfeatures (f4)     Product feature number in the review                  NEW
Consistency (f5)   Consistency score by comparing f3 and rating          NEW
WR (f6)            Ratio of nouns, verbs, adjectives, and adverbs
                   in the review
Subject (f7)       Subject sentence number and subject sentence ratio
Object (f8)        Object sentence number and object sentence ratio
SOR (f9)           Ratio of subject and object sentences                 NEW
FTFIDF (f10)       Tf-idf vector of product feature words                NEW
STFIDF (f11)       Tf-idf vector of sentiment words                      NEW
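To make these definitions concrete, the sketch below computes three of the intrinsic features (f1, f6, and f11) with NLTK and scikit-learn. It is an illustration under stated assumptions: the small sentiment lexicon is a placeholder, and the NLTK tokenizer and tagger models must be downloaded beforehand.

# Illustrative computation of intrinsic features f1 (review length),
# f6 (content-word ratio), and f11 (tf-idf vector of sentiment words).
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer

SENTIMENT_WORDS = {"good", "bad", "great", "poor", "excellent", "terrible"}

def intrinsic_features(review_text):
    tokens = nltk.word_tokenize(review_text)
    tags = nltk.pos_tag(tokens)
    content = [t for t, tag in tags if tag.startswith(("NN", "VB", "JJ", "RB"))]
    return {
        "RL": len(tokens),                         # f1: length of review
        "WR": len(content) / max(len(tokens), 1),  # f6: noun/verb/adj/adv ratio
    }

def sentiment_tfidf(corpus):
    # f11: tf-idf restricted to a sentiment vocabulary, fit over a corpus
    vec = TfidfVectorizer(vocabulary=sorted(SENTIMENT_WORDS))
    return vec.fit_transform(corpus)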
A.2. Contextual features

Contextual features (Table A.7) are proposed to evaluate the context of each review, as data quality must be aligned to a certain context [33]. Here, the context includes the other reviews listed for the same product and the editorial descriptions.

Table A.7
Contextual features.

Variables    Description                                                      Notes
Simi (f12)   Cosine similarity between review and product description
Dup (f13)    Cosine similarity between current and previously posted reviews  NEW
ED (f14)     Elapsed time after a review was posted
GS (f15)     Helpfulness score: helpfulness votes divided by total votes
Rat (f16)    Product rating
Div (f17)    Deviation between current rating and the average rating          NEW
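As an illustration of how the two similarity features can be computed, the sketch below derives Simi (f12) and Dup (f13) from TF-IDF vectors and cosine similarity. The paper does not prescribe a particular weighting scheme, so the details here are assumptions.

# Illustrative computation of Simi (f12) and Dup (f13).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def simi_and_dup(review, product_description, previous_reviews):
    docs = [review, product_description] + list(previous_reviews)
    tfidf = TfidfVectorizer().fit_transform(docs)
    simi = cosine_similarity(tfidf[0], tfidf[1])[0, 0]      # f12
    dup = 0.0
    if previous_reviews:
        # f13: strongest overlap with any previously posted review
        dup = float(cosine_similarity(tfidf[0], tfidf[2:]).max())
    return simi, dup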
A.3. Accessibility features

Accessibility features (Table A.8) measure the readability of a review. Consumers tend to read reviews that are neither too long and too complex nor too short. Reviews with a moderate amount of information, medium length, and elaborated presentations are more acceptable to consumers.

Table A.8
Accessibility features.

Variables    Description
SN (f18)     Sentence number of review
ASL (f19)    Average sentence length
ASW (f20)    Average number of syllables per word
FRE (f21)    Flesch Reading Ease score
FKG (f22)    Flesch-Kincaid Grade score
SMOG (f23)   Years of education required (SMOG grade)
SE (f24)     Number of spelling errors
ALC (f25)    Average length of sentences
ALW (f26)    Average length of words
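The two Flesch scores follow standard published formulas, illustrated in the sketch below. The regular-expression sentence splitter and the vowel-group syllable counter are rough approximations, not the exact tools used in the paper.

# Illustrative computation of FRE (f21) and FKG (f22) using the
# standard Flesch formulas.
import re

def count_syllables(word):
    # crude approximation: one syllable per contiguous vowel group
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syll = sum(count_syllables(w) for w in words)
    fre = 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syll / n_words)
    fkg = 0.39 * (n_words / sentences) + 11.8 * (n_syll / n_words) - 15.59
    return {"FRE": fre, "FKG": fkg}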

Appendix B. Independent component analysis preprocessing

Please refer to Algorithm 1 for details.

Algorithm 1. ICA(DL, Seed, T)

Input:
  Original n-dimensional vector DL = {x1, x2, …, xn};
  Seed: seed for random function;
  T: threshold for iteration.
Output:
  Mutually statistically independent vectors V1 = {s11, s12, …, s1k} and V2 = {s21, s22, …, s2(n−k)}.

1: V1 = V2 = null;
2: while |W(t) − W(t − 1)| > T do
3:   W(t) = W(t − 1) + μ(t)(W^(−T)(t − 1) − E[ϕ(S)x^T]);
4:   W(t) = W(t − 1) + μ(t)(I − E[ϕ(S)S^T])W^(−T)(t − 1);
5: end while
6: S = W^(−1)DL;
7: for i = 1 to length(S) do
8:   {V1, V2} = RandomSplit(S, Seed);
9: end for
10: return V1, V2.
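For readers who want a concrete starting point, the following Python sketch approximates the same preprocessing, using scikit-learn's FastICA in place of the gradient update in lines 2-5 of Algorithm 1; the even split into two views is an assumption for illustration.

# Sketch of the Appendix B preprocessing: recover statistically
# independent components, then randomly split them into two views.
import numpy as np
from sklearn.decomposition import FastICA

def ica_two_views(X, seed=0):
    """X: (n_samples, n_features) review feature matrix."""
    S = FastICA(random_state=seed, max_iter=1000).fit_transform(X)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(S.shape[1])
    half = S.shape[1] // 2
    return S[:, idx[:half]], S[:, idx[half:]]   # views V1 and V2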

Appendix C. Co-EM ensemble learning algorithm

Please refer to Algorithms 2 and 3 for details. Algorithm 2 describes the preprocessing step, and Algorithm 3 shows the transformation and learning steps.

Algorithm 2. Co-EM-Ensemble(DL, DU, C, T, Hi, Hc, MR, Seed, SI, SVMP, E, CS), part 1

Input:
  Labeled review feature vectors DL;
  Unlabeled review feature vectors DU;
  Slack parameters C;
  Iteration number T;
  Hill climb iteration number Hi;
  Optimization metric Hc;
  Ratio of models randomly chosen from the library in each iteration MR;
  Seed of random number generator Seed;
  Sort initialization SI;
  Pool of SVMs with different kernels, etc. SVMP;
  Ensemble E;
  Smoothing factor CS.
Output:
  Trained function Ce.

1: CS = 1/2^T;
2: for i = 1 to capacity(S) do
3:   if length(E) ≤ SI then
4:     E ← E + SVMi;
5:     Sort(E, Hc(SVMi));
6:   else
7:     if Ei−1 < Hc(SVMi) then
8:       Ei = Ei−1 + SVMi;
9:     end if
10:  end if
11: end for

Algorithm 3. Co-EM-Ensemble(DL, DU, C, T, Hi, Hc, MR, Seed, SI, SVMP, E, CS), part 2

1: ICA(DL, DU) → {V1, V2};
2: Random(Seed) → K;
3: E1 = E2 = null;
4: for j = 1 to Hi do
5:   SVMj ← RandomSubsetK(SVMP);
6:   if Hc(SVMj, V2) > Hc(E1,2, V2) then
7:     E1,2 ← E1,2 + SVMj;
8:   else
9:     return E1, E2;
10:  end if
11: end for
12: p̂(y|x, i) ← DL;
13: for k = 1 to T do
14:   for V = 1, 2 do
15:     DU+ = (p̂(y = 1) × |DU|){DU};
16:     DU− = DU − DU+;
17:     μ+, μ−, σ+, σ− ← DL, DU;
18:     ∀xk ∈ DU: p̂(y|x, k) ← Ek−1,V;
19:     {VV} → EVk with CS;
20:   end for
21:   CS = 2CS;
22: end for
23: return ½(E1T + E2T) → Ce.
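A compact Python sketch of the core Co-EM exchange in Algorithm 3 is given below. It keeps only the essential loop, in which two view-specific classifiers repeatedly relabel the unlabeled pool for each other, and omits the ensemble library, the hill climbing of Algorithm 2, and the smoothing factor CS; a single SVM per view stands in for the paper's full ensemble.

# Simplified Co-EM loop over two ICA views. Each classifier is
# retrained on the labeled data plus pseudo-labels produced by the
# classifier of the *other* view.
import numpy as np
from sklearn.svm import SVC

def co_em(v1_l, v2_l, y_l, v1_u, v2_u, iterations=10):
    views_l, views_u = (v1_l, v2_l), (v1_u, v2_u)
    models = [SVC(probability=True).fit(views_l[v], y_l) for v in (0, 1)]
    for _ in range(iterations):
        for v in (0, 1):
            other = 1 - v
            y_u = models[other].predict(views_u[other])  # labels from other view
            X = np.vstack([views_l[v], views_u[v]])
            y = np.concatenate([y_l, y_u])
            models[v] = SVC(probability=True).fit(X, y)
    return models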
References

[1] A. Stephen, O. Toubia, Deriving value from social commerce networks, Journal of Marketing Research 47 (2) (2009) 215–228.
[2] N. Archak, A. Ghose, P. Ipeirotis, Show me the money!: deriving the pricing power of product features by mining consumer reviews, Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2011, pp. 56–65.
[3] J. Lee, D. Park, I. Han, The effect of negative online consumer reviews on product attitude: an information processing view, Electronic Commerce Research and Applications 7 (3) (2008) 341–352.
[4] J. Liu, Y. Cao, C. Lin, Y. Huang, M. Zhou, Low-quality product review detection in opinion summarization, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, ACL, 2007, pp. 334–342.
[5] X. Yu, Y. Liu, X. Huang, A. An, A quality-aware model for sales prediction using reviews, Proceedings of the 19th International Conference on World Wide Web, ACM, 2010, pp. 1217–1218.
[6] N. Jindal, B. Liu, E. Lim, Finding unusual review patterns using unexpected rules, Proceedings of the 19th ACM International Conference on Information and Knowledge Management, ACM, 2010, pp. 1549–1552.
[7] A. Mantrach, N. Van Zeebroeck, P. Francq, M. Shimbo, H. Bersini, M. Saerens, Semi-supervised classification and betweenness computation on large, sparse, directed graphs, Pattern Recognition 44 (6) (2011) 1212–1224.
[8] B. Kulis, S. Basu, I. Dhillon, R. Mooney, Semi-supervised graph clustering: a kernel approach, Machine Learning 74 (1) (2009) 1–22.
[9] B. Pang, L. Lee, Opinion mining and sentiment analysis, Foundations and Trends in Information Retrieval 2 (1–2) (2008) 1–135.
[10] B. Pang, L. Lee, A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts, Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, ACL, 2004, p. 271.
[11] A. Ghose, P. Ipeirotis, Estimating the helpfulness and economic impact of product reviews: mining text and reviewer characteristics, IEEE Transactions on Knowledge and Data Engineering 23 (10) (2011) 1498–1512.
[12] T. Lee, Needs-based analysis of online customer reviews, Proceedings of the Ninth International Conference on Electronic Commerce, vol. 258, ACM, 2007, pp. 311–318.
[13] N. Hu, J. Zhang, P. Pavlou, Overcoming the J-shaped distribution of product reviews, Communications of the ACM 52 (10) (2009) 144–147.
[14] A. Talwar, R. Jurca, B. Faltings, Understanding user behavior in online feedback reporting, Proceedings of the 8th ACM Conference on Electronic Commerce, ACM, 2007, pp. 134–142.
[15] L. Pipino, Y. Lee, R. Wang, Data quality assessment, Communications of the ACM 45 (4) (2002) 211–218.
[16] H. Min, J. Park, Identifying helpful reviews based on customer's mentions about experiences, Expert Systems with Applications 39 (15) (2012) 11830–11838.
[17] Y. Lu, P. Tsaparas, A. Ntoulas, L. Polanyi, Exploiting social context for review quality prediction, Proceedings of the 19th International Conference on World Wide Web, ACM, 2010, pp. 691–700.
[18] C. Au Yeung, T. Iwata, Strength of social influence in trust networks in product review sites, Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, ACM, 2011, pp. 495–504.
[19] S. Moghaddam, M. Jamali, M. Ester, ETF: extended tensor factorization model for personalizing prediction of review helpfulness, Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, ACM, 2012, pp. 163–172.
[20] Y. Liu, J. Jin, P. Ji, J. Harding, R. Fung, Identifying helpful online reviews: a product designer's perspective, Computer-Aided Design 45 (2013) 180–194.
[21] N. Hu, L. Liu, V. Sambamurthy, Fraud detection in online consumer reviews, Decision Support Systems 50 (3) (2011) 614–626.
[22] J. Godfrey, E. Holliman, J. McDaniel, Switchboard: telephone speech corpus for research and development, Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-92), vol. 1, 1992, pp. 517–520.
[23] U. Brefeld, T. Scheffer, Co-EM support vector learning, Proceedings of the Twenty-First International Conference on Machine Learning, vol. 21, ACM, 2004, pp. 121–128.
[24] R. Johnson, T. Zhang, Graph-based semi-supervised learning and spectral kernel design, IEEE Transactions on Information Theory 54 (1) (2008) 275–288.
[25] S. Yu, B. Krishnapuram, R. Rosales, R. Rao, Bayesian co-training, Journal of Machine Learning Research 12 (2011) 2649–2680.
[26] S. Bickel, T. Scheffer, Estimation of mixture models using Co-EM, 16th European Conference on Machine Learning, 2005, pp. 35–46.
[27] Y. Li, Z. Zhou, Towards making unlabeled data never hurt, Proceedings of the Twenty-Eighth International Conference on Machine Learning, ACM, 2011, pp. 1081–1088.
[28] A. Blum, T. Mitchell, Combining labeled and unlabeled data with co-training, Proceedings of the 11th Annual Conference on Computational Learning Theory, ACM, 1998, pp. 92–100.
[29] A. Hyvarinen, Independent Component Analysis by Minimization of Mutual Information, Helsinki University of Technology, 1997.
[30] L. Shi, X. Ma, L. Xi, Q. Duan, J. Zhao, Rough set and ensemble learning based semi-supervised algorithm for text classification, Expert Systems with Applications 38 (5) (2011) 6300–6306.
[31] E. Lim, V. Nguyen, N. Jindal, B. Liu, H. Lauw, Detecting product review spammers using rating behaviors, 19th International Conference on Information and Knowledge Management, ACM, 2010, pp. 939–948.
[32] Z. Zhang, B. Varadarajan, Utility scoring of product reviews, Proceedings of the 15th ACM International Conference on Information and Knowledge Management, ACM, 2006, pp. 51–57.
[33] R. Wang, D. Strong, Beyond accuracy: what data quality means to data consumers, Journal of Management Information Systems 12 (4) (1996) 5–33.
[34] S. Kim, P. Pantel, T. Chklovski, M. Pennacchiotti, Automatically assessing review helpfulness, Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, ACL, 2006, pp. 423–430.
[35] X. Li, L. Wang, E. Sung, AdaBoost with SVM-based component classifiers, Engineering Applications of Artificial Intelligence 21 (5) (2008) 785–795.
[36] P. Van Eck, W. Jager, P. Leeflang, Opinion leaders' role in innovation diffusion: a simulation study, Journal of Product Innovation Management 28 (2) (2011) 187–203.
[37] J. Goldenberg, S. Han, D. Lehmann, J. Hong, The role of hubs in the adoption processes, Journal of Marketing 73 (2) (2009) 1–13.
[38] S. Aral, D. Walker, Creating social contagion through viral product design: a randomized trial of peer influence in networks, Management Science 57 (9) (2011) 1623–1639.
[39] W. Chen, C. Wang, Y. Wang, Scalable influence maximization for prevalent viral marketing in large-scale social networks, 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2010, pp. 1029–1038.
[40] H. Stogbauer, A. Kraskov, S. Astakhov, P. Grassberger, Least-dependent-component analysis based on mutual information, Physical Review E 70 (6) (2004) 066123.
[41] A. Papoulis, Probability, Random Variables and Stochastic Processes, 3rd ed., McGraw-Hill, New York, 1991.
[42] A. Krogh, J. Vedelsby, Neural network ensembles, cross validation, and active learning, Advances in Neural Information Processing Systems, 1995, pp. 231–238.
[43] C. Chen, Y. Tseng, Quality evaluation of product reviews using an information quality framework, Decision Support Systems 50 (4) (2011) 755–768.
[44] K. Nigam, A. McCallum, S. Thrun, T. Mitchell, Text classification from labeled and unlabeled documents using EM, Machine Learning 39 (2) (2000) 103–134.
[45] X. Peng, A ν-twin support vector machine (ν-TSVM) classifier and its geometric algorithms, Information Sciences 180 (20) (2010) 3863–3875.
[46] R. Caruana, A. Niculescu-Mizil, Data mining in metric space: an empirical analysis of supervised learning performance criteria, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2004, pp. 69–78.
[47] I. Porteous, D. Newman, A. Ihler, A. Asuncion, P. Smyth, M. Welling, Fast collapsed Gibbs sampling for latent Dirichlet allocation, 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 569–577.
[48] S. Moghaddam, M. Ester, ILDA: interdependent LDA model for learning latent aspects and their ratings from online product reviews, Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2011, pp. 665–674.
[49] W.H. DuBay, The Principles of Readability, Impact Information, 2004.
[50] H. Bloom, The Global Brain: The Evolution of Mass Mind from the Big Bang to the 21st Century, John Wiley and Sons, New York, 2000.
[51] C. McPhail, The Myth of the Madding Crowd, Aldine de Gruyter, 1991.
[52] J. Leskovec, L. Adamic, B. Huberman, The dynamics of viral marketing, ACM Transactions on the Web (TWEB) 1 (1) (2007) 5.
[53] C. Kiss, M. Bichler, Identification of influencers—measuring influence in customer networks, Decision Support Systems 46 (1) (2008) 233–253.
[54] A. Shoham, A. Ruvio, Opinion leaders and followers: a replication and extension, Psychology and Marketing 25 (3) (2008) 280–297.
[55] C. Snyder, Positive Psychology: The Scientific and Practical Explorations of Human Strengths, Sage, Thousand Oaks, CA, 2007.
[56] R. Gomes, P. Welinder, A. Krause, P. Perona, Crowdclustering, Technical Report CNS-TR-2011.001, California Institute of Technology, Pasadena, CA, 2011.
[57] M. Bilenko, M. Richardson, Predictive client-side profiles for personalized advertising, 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2011, pp. 413–421.

Xiaolin Zheng is an associate professor in the College of Computer Science at Zhejiang University, where he directs the Modern Service Innovation Lab. His research focuses on data mining for social networks, electronic commerce, and service computing. He is a senior member of the CCF (China Computer Federation), a committee member in Service Computing of the CCF, a committee member in Cloud Computing of the CIC (China Institute of Communication), and a member of the IEEE and the ACM. He has received several awards as a key contributor, including the Second Class Prize of the Outstanding Achievement Award for Scientific Research in Colleges and Universities (2010), the IBM Outstanding Teachers Award (2009), and recognition as an Excellent Scholar in the First Alibaba Young Scholars Program (2009).

Shuai Zhu is a student at Zhejiang University in China pursuing his M.S. degree in Computer Science. His current research interests include review mining, natural language processing, machine learning, and computational advertising. He has received several awards and honors, including innovation awards from Tencent, Inc. and outstanding honors from his school.

Dr. Zhangxi Lin is an associate professor at the Rawls College of Business Administration and a co-director of the Center for Advanced Analytics and Business Intelligence at Texas Tech University. He received his first master's degree in computer science in 1982 from Tsinghua University and another master's degree in economics in 1996 from the University of Texas at Austin. He earned his Ph.D. degree in information systems in 1999 from the University of Texas at Austin. Zhangxi Lin's research interests include data communications, business intelligence, electronic commerce, and knowledge-based systems. In the last ten years, he has published more than a hundred papers in internationally refereed journals and conferences. Zhangxi Lin is a member of the Association for Information Systems and INFORMS.
