
Opinion Digger: An Unsupervised Opinion Miner from Unstructured Product Reviews

Samaneh Moghaddam
School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
sam39@cs.sfu.ca

Martin Ester
School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
ester@cs.sfu.ca

ABSTRACT
Mining customer reviews (opinion mining) has emerged as an interesting new research direction. Most of the reviewing websites, such as Epinions.com, provide some additional information on top of the review text and overall rating, including a set of predefined aspects and their ratings, and a rating guideline which shows the intended interpretation of the numerical ratings. However, the existing methods have ignored this additional information. We claim that using this information, which is freely available, along with the review text can effectively improve the accuracy of opinion mining. We propose an unsupervised method, called Opinion Digger, which extracts important aspects of a product and determines the overall consumer satisfaction with each, by estimating a rating in the range from 1 to 5. We demonstrate the improved effectiveness of our method on a real-life dataset that we crawled from Epinions.com.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information filtering; I.2.7 [Natural Language Processing]: Text Analysis

General Terms
Algorithms, Design, Experimentation

Keywords
Opinion Mining, Text Mining, Aspect Extraction, Rating Prediction, Sentiment Analysis

1. INTRODUCTION
In this paper, we propose an unsupervised method, called 'Opinion Digger', for mining and summarizing opinions from unstructured customer reviews. Opinion Digger can help users make better decisions by providing a summary view of the ratings for the major aspects of the product. An aspect (also called product feature) is an attribute or component of the product that has been commented on in a review, e.g. 'battery life' and 'zoom' for a digital camera. The problem definition is illustrated in Figure 1. Opinion Digger takes unstructured reviews (full-text reviews), a set of predefined aspects, and a rating guideline as input, and outputs a set of additional aspects (not provided in the input), plus the estimated rating of each.

Figure 1: Problem Definition

The aspect-based view of the reviews provided by Opinion Digger not only helps users gain some insight into the quality of the product and enables them to compare different products, but can also be used as input for various computer systems. Extracted aspects and their estimated ratings can be used in summarization systems to find sentences which summarize the review more accurately, in recommendation systems to provide explanations for recommendations, in opinion retrieval systems, and in opinion-based question answering systems to answer opinion-based questions by comparing aspects and ratings of different products.

On top of the review text and overall rating, most reviewing websites, such as Epinions.com, Tripadvisor.com, and Amazon.com, provide some additional information including:

• A set of predefined aspects (we call them known aspects) and their ratings. Known aspects are key aspects of a product category that users are requested to rate (e.g. 'Ease of Use' and 'Durability' for camcorders).

• A rating guideline which describes the intended interpretation of the numerical ratings, i.e. the correspondence between ratings and adjectives (e.g. 'excellent':5, 'good':4, 'average':3, 'poor':2, and 'terrible':1 on Epinions.com).
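As a concrete illustration (not part of the original paper), the per-review input described by these two bullets can be represented as a small Python structure. The field names and the example review are hypothetical; only the adjective-to-rating mapping comes from the Epinions.com guideline quoted above.

```python
# Hypothetical representation of one review as Opinion Digger receives it.
# Field names and example values are illustrative, not taken from the paper's dataset.
review = {
    "product": "Canon PowerShot",            # product the review belongs to (made-up example)
    "category": "Digital Camera",             # product category
    "known_aspect_ratings": {                 # predefined (known) aspects rated by the user
        "Ease of Use": 4,
        "Durability": 5,
    },
    "text": "The battery life is too short, but it has great picture quality.",
    "overall_rating": 4,
}

# Rating guideline: correspondence between adjectives and numerical ratings (Epinions.com).
rating_guideline = {"excellent": 5, "good": 4, "average": 3, "poor": 2, "terrible": 1}
```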

However, the existing methods have ignored this additional information, which is freely available on the review websites. Since none of the existing benchmark datasets contains this important information, we crawled the well-known review website Epinions.com and built a new dataset containing 2.5K reviews about 40 products in 5 categories.

The main contributions of this paper are as follows:

• We propose an unsupervised method for aspect extraction from unstructured reviews using known aspects.

• We introduce an unsupervised method for aspect rating (on a scale from 1 to 5) based on the rating guideline.

In aspect extraction, Opinion Digger uses known aspects to mine opinion patterns from reviews and to determine the threshold for the number of matchings. In aspect rating, Opinion Digger uses the rating guideline to estimate the rating of sentiments (adjectives) and aspects. Note that the ratings of known aspects are withheld from the learning method and are used only for evaluation purposes. The experimental evaluation on our Epinions.com dataset supports our claim that the use of this additional information effectively improves the accuracy of opinion mining.
2. PROBLEM STATEMENT
Let C = {C_1, C_2, ..., C_k} be a set of product categories, such as 'Cellular Phone' and 'MP3 Player'. For each product category C_i we have a set of products P_i = {P_{i,1}, P_{i,2}, ..., P_{i,n}}, such as 'Apple Smartphone' and 'Nokia 6210' for the category 'Cellular Phone'. For each product P_{i,j} there is a set of reviews R_{i,j} = {r_{i,j,1}, r_{i,j,2}, ..., r_{i,j,m}}. Each review r_{i,j,t} contains pairs of <known aspect, rating>, the review text (a sequence of words), and the overall rating. In our dataset, the review text contains complete sentences (unstructured reviews). In the following we define the basic notations we use throughout this paper.

Aspect: An aspect is an attribute or component of the product that has been commented on in a review. For example, 'battery life' in the opinion sentence 'The battery life is too short'.

Known Aspect: Known aspects are predefined aspects for each category in the review website for which users explicitly expressed ratings. Each category C_i has a set of known aspects K_i provided by the review website (e.g. 'battery life' and 'durability' for the camcorder category).

Sentiment: Sentiment is a linguistic term which refers to the direction in which a concept or opinion is interpreted [5]. We use sentiment in a more specific sense as an opinion about a product aspect expressed by an adjective. For example, 'great' is a sentiment for the aspect 'picture quality' in the sentence 'It has great picture quality'.

Orientation: A sentiment can be classified on an n-level orientation scale. On a two-level orientation scale (polarity), a sentiment is either positive or negative. While most opinion mining research has considered the two-level orientation scale, most review websites use five-level orientations, presented as stars (1 to 5 stars). In this paper we use five-level orientation and rating as synonyms.

Rating Guideline: On some review websites, when a user writes a review he is asked to assign an overall rating to the product. These websites provide guidelines for users in assigning overall ratings. For example, Epinions.com provides a rating guideline stating that "rating 5 means 'excellent', rating 4 means 'good', rating 3 means 'average', rating 2 means 'poor', and rating 1 means 'terrible'".

Problem Definition: Given a set of reviews R about multiple products that can be from different categories from C, a set of known aspects K_i for each category C_i, and a rating guideline for the review site, we want to extract a set of aspects A_i = {a_{i,1}, a_{i,2}, ..., a_{i,z}} for each category C_i represented in the input, and also to estimate the rating of each a_{i,t} for each product P_{i,j} ∈ C_i based on the sentiments people expressed in the set of reviews R_{i,j}.

3. ASPECT EXTRACTION
To extract a set of aspects for product P_{i,j}, Opinion Digger uses the collection of all reviews available for P_{i,j} in the dataset, R_{i,j}. Note that all of the reviews were automatically Part-Of-Speech (POS) tagged (i.e. it was determined whether each word is a noun, verb, adjective, etc.). Opinion Digger uses the POS tagger built into NLTK (http://www.nltk.org/) to generate the POS tag of each word. An aspect can be expressed by a noun, adjective, verb or adverb, but recent research [5] shows that 60-70% of aspects are explicit nouns. In addition, in reviews people are more likely to talk about product aspects, which suggests that aspects should be frequent nouns. But are all frequent nouns aspects? In the next subsection, we first explain how Opinion Digger finds frequent nouns, and then answer this question.

3.1 Finding Frequent Noun Phrases
Frequent noun finding is a basic method in opinion mining for extracting aspects. Irrelevant content tends to differ from review to review [5], so nouns that are infrequent are likely to be non-aspects or less important aspects.

Potential Aspect: A potential aspect for the product P_{i,j} is a noun phrase which is frequent in the set of reviews R_{i,j}.

For each product, Opinion Digger finds frequent noun phrases as a set of potential aspects. Opinion Digger finds the stem of each noun using the Porter Stemmer algorithm [7]. It eliminates all stop words using a standard stop-word list (http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words), and only keeps nouns with non-stopword stems. Then Opinion Digger applies the Apriori algorithm on the remaining nouns to find all multi-word noun phrases which are frequent, such as 'photo quality' and 'LCD display'. The support of each phrase is equal to the number of times it appears in the review collection. In our work, we use a minimum support of 1% to find frequent noun phrases (potential aspects), as is done in [4] for the same purpose.
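The following is a minimal sketch of this frequent-noun-phrase step, not the authors' implementation. It uses NLTK for tagging and stemming, approximates the Apriori step by counting contiguous noun n-grams, and assumes that the 1% support threshold is taken relative to the number of sentences; the stop-word set shown is a small stand-in for the Glasgow list cited above.

```python
from collections import Counter

import nltk                       # requires the 'punkt' and POS-tagger models to be downloaded
from nltk.stem import PorterStemmer

STOPWORDS = {"the", "a", "an", "it", "this", "that", "of", "is", "was"}  # stand-in stop-word list
stemmer = PorterStemmer()

def potential_aspects(review_texts, min_support=0.01, max_phrase_len=3):
    """Return frequent noun phrases (potential aspects) from a product's reviews."""
    sentences = [s for text in review_texts for s in nltk.sent_tokenize(text)]
    phrase_counts = Counter()
    for sentence in sentences:
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        # keep nouns whose stems are not stop words
        nouns = [w.lower() for w, tag in tagged
                 if tag.startswith("NN") and stemmer.stem(w.lower()) not in STOPWORDS]
        # count noun n-grams up to max_phrase_len as candidate phrases
        for n in range(1, max_phrase_len + 1):
            for i in range(len(nouns) - n + 1):
                phrase_counts[" ".join(nouns[i:i + n])] += 1
    threshold = min_support * len(sentences)          # assumed reading of "1% minimum support"
    return {phrase for phrase, count in phrase_counts.items() if count >= threshold}
```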
3.2 Mining Opinion Patterns
In this sub-phase Opinion Digger uses the known aspects and mines a set of POS patterns that they match. We emphasize that the mined patterns are independent of product categories, so Opinion Digger learns the patterns across all reviews. In addition, opinion patterns depend on the structure of the reviews; therefore, if they are mined from semi-structured reviews (or unstructured reviews), they can be applied to semi-structured reviews (or unstructured reviews) to extract aspects.

To mine patterns, Opinion Digger first finds matching phrases for each of the known aspects. It searches for each known aspect in the reviews and finds its nearest adjective in that sentence segment as the corresponding sentiment. It saves the sentence segment between these two as a matching phrase and takes the POS tags of all its words as a pattern. It replaces the tag of the known aspect with the special tag '_ASP' to identify which part of the pattern is the aspect. For example, one of the patterns mined using the known aspect 'movie quality' is '_JJ_ASP' (i.e. adjective + aspect), which was extracted from 'great movie quality'. After mining all POS patterns, Opinion Digger uses Generalized Sequential Pattern (GSP) mining [9] to find the frequent patterns. We use 1% as the minimum support, as is done in [6].
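Below is a sketch of this pattern-mining step under our own simplifications: for every occurrence of a known aspect, the nearest adjective in the sentence is located, the POS tags of the segment between them form the pattern, and the aspect's tags are collapsed into '_ASP'. The GSP frequent-pattern step is approximated by a simple support count, and all function names are ours.

```python
from collections import Counter
import nltk

def mine_opinion_patterns(sentences, known_aspects, min_support=0.01):
    """Collect frequent POS patterns around known aspects (simplified Section 3.2)."""
    pattern_counts = Counter()
    for sentence in sentences:
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        words = [w.lower() for w, _ in tagged]
        tags = [t for _, t in tagged]
        for aspect in known_aspects:
            asp = aspect.lower().split()
            for i in range(len(words) - len(asp) + 1):
                if words[i:i + len(asp)] != asp:
                    continue
                asp_end = i + len(asp) - 1
                adjectives = [j for j, t in enumerate(tags) if t.startswith("JJ")]
                if not adjectives:
                    continue
                # nearest adjective to the aspect occurrence in this sentence
                j = min(adjectives, key=lambda p: min(abs(p - i), abs(p - asp_end)))
                lo, hi = min(i, j), max(asp_end, j)
                # POS tags of the segment, with the aspect collapsed to '_ASP'
                segment = tags[lo:i] + ["_ASP"] + tags[asp_end + 1:hi + 1]
                pattern_counts[tuple(segment)] += 1   # e.g. ('JJ', '_ASP') ~ the paper's '_JJ_ASP'
    threshold = min_support * max(len(sentences), 1)
    return {p for p, c in pattern_counts.items() if c >= threshold}
```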

3.3 Filtering Out Non-Aspects
In this paper we employ a simple constraint on the number of matching patterns; we leave more complicated and potentially more accurate constraints to future work. We define the factor P_num, which is the number of opinion patterns that are matched at least once by the potential aspect. Since the average value of P_num for known aspects is 2, a potential aspect is filtered out if P_num < 2. After applying this constraint on the potential aspects, for each product in each category, Opinion Digger outputs a list of filtered potential aspects to the next phase. Applying opinion patterns can eliminate most of the non-aspects from the set of potential aspects. However, some of them still remain, since they match the minimum number of patterns.
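A minimal sketch of the P_num filter follows. It assumes a hypothetical helper matches(aspect, pattern, sentences) that tests whether the potential aspect occurs at least once in a text segment whose POS tags follow the given pattern (for instance, by reusing the pattern-extraction logic sketched above); the function and argument names are ours.

```python
def filter_non_aspects(potential_aspects, patterns, sentences, matches, min_pnum=2):
    """Keep only potential aspects whose P_num reaches the threshold (2 in the paper)."""
    kept = []
    for aspect in potential_aspects:
        # P_num: number of distinct opinion patterns matched at least once by this aspect
        p_num = sum(1 for pattern in patterns if matches(aspect, pattern, sentences))
        if p_num >= min_pnum:
            kept.append(aspect)
    return kept
```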
4. ASPECT RATING
Current research mostly considers only two orientations for an opinion, positive and negative, and does not express the strength of the positiveness or negativeness of an opinion. In other words, it does not clarify how positive or negative an opinion is, and to what extent a reviewer recommends (or does not recommend) the product to others. In this paper, we consider a 5-level orientation scale (1 to 5 stars) and estimate the rating of each aspect of a product on that scale.

Note that the aspect rating phase is performed for each product separately. For each product P_{i,j} and each aspect a_{i,t}, Opinion Digger first extracts the nearest sentiments to each occurrence of that aspect in the set of reviews R_{i,j}. Sentiments are usually the nearest adjectives in the same sentence segment which describe the quality of the aspect. Then the set of rated adjectives provided by Epinions.com (the rating guideline) is used, and a k-nearest-neighbor (KNN) algorithm is applied to estimate the rating of each extracted sentiment. Wordnet [2] is used to compute the similarity between adjectives for the KNN algorithm. Finally, Opinion Digger aggregates the ratings of all sentiments expressed about each aspect to estimate its rating. For each product, Opinion Digger outputs a set of extracted aspects and their estimated ratings.

The rating guideline provided by Epinions.com is illustrated in Figure 2. It shows that on a 5-level orientation scale, most adjectives have two nearest neighbors, like 'defective', which is placed between 'poor' and 'terrible'. Some of the adjectives, like those semantically placed above 'excellent' or below 'terrible', have just one nearest neighbor. Therefore, we set k equal to 2 and use a 2-NN algorithm for aspect rating.

Figure 2: Sentiment Rating Space

For each sentiment snt, Opinion Digger performs a breadth-first search in the Wordnet synonymy graph with a maximum depth of 5 to find two rated synonyms from the rating guideline. Then it uses a distance-weighted nearest neighbor algorithm with a continuous-valued target function to return the weighted average of the ratings of the 2 nearest neighbors as the estimated rating for the sentiment. So, the rating of the sentiment snt is equal to r_snt = (Σ_i w_i × r_i) / (Σ_i w_i), where r_i is the rating of the neighbor n_i and w_i = 1/distance(snt, n_i). Here distance(snt, n_i) denotes the minimum path distance between the neighbor n_i and the sentiment snt in the Wordnet hierarchy.
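The following is a sketch of this distance-weighted 2-NN rating step based on our reading of the description above, not the authors' code. It uses NLTK's WordNet interface, expands the synonymy graph through lemmas and 'similar to' links between adjective synsets, and all function names are ours; the handling of a sentiment that is itself a guideline adjective (distance 0) is an assumption.

```python
from collections import deque
from nltk.corpus import wordnet as wn   # requires the NLTK WordNet corpus to be downloaded

RATING_GUIDELINE = {"excellent": 5, "good": 4, "average": 3, "poor": 2, "terrible": 1}

def neighbors(word):
    """Adjectives one synonymy step away from `word` in WordNet."""
    related = set()
    for synset in wn.synsets(word):
        if synset.pos() not in ("a", "s"):           # keep adjective (and satellite) synsets only
            continue
        related.update(l.name().lower() for l in synset.lemmas())
        for similar in synset.similar_tos():         # 'similar to' links between adjective synsets
            related.update(l.name().lower() for l in similar.lemmas())
    related.discard(word)
    return related

def rate_sentiment(sentiment, max_depth=5, k=2):
    """Estimate a 1-5 rating for an adjective; returns None if no rated synonym is found."""
    distances = {}                                    # guideline word -> minimum BFS distance
    frontier = deque([(sentiment.lower(), 0)])
    seen = {sentiment.lower()}
    while frontier:
        word, depth = frontier.popleft()
        if word in RATING_GUIDELINE and word not in distances:
            distances[word] = max(depth, 1)           # assumption: clamp distance 0 to 1
        if depth < max_depth:
            for nxt in neighbors(word):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
    nearest = sorted(distances.items(), key=lambda kv: kv[1])[:k]
    if not nearest:
        return None
    weighted = [(RATING_GUIDELINE[w], 1.0 / d) for w, d in nearest]
    return sum(r * w for r, w in weighted) / sum(w for _, w in weighted)
```

An aspect's rating would then be obtained, as described above, by averaging rate_sentiment over all sentiments extracted for that aspect in the product's reviews.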
5. EXPERIMENTAL RESULTS
We evaluate our method from different points of view: the accuracy of aspect extraction and the accuracy of aspect rating.

5.1 Dataset
Since none of the existing benchmark datasets for opinion mining contains known aspects and a rating guideline, we built a crawler to extract reviews from the Epinions.com website. Our dataset contains 2.5K reviews in five product categories: camcorder, cellular phone, digital camera, DVD player, and MP3 player. We selected eight products in each category with different overall ratings. Table 1 shows the distribution of reviews across the rating scale.

Table 1: Distribution of Reviews Across the Rating Scale
Category         #Rev.  1-star  2-star  3-star  4-star  5-star
Camcorder         197    33%     7%      8%     24%     28%
Cellular Phone    630    19%     9%      9%     27%     37%
Digital Camera    707    17%     6%      8%     26%     43%
DVD Player        324    24%     7%     10%     31%     28%
MP3 Player        625    12%     6%      8%     31%     43%
Overall          2483    21%     7%      8%     28%     36%

For each category, the number of reviews and the percentage of reviews that have a given overall rating are shown. For each review we recorded the following information: the known aspects and their ratings, the full review text, and the overall rating of that review. Table 2 shows the distribution of known aspect ratings across the rating scale. The most frequent rating is 4, since it has been assigned to 38% of the known aspects. We will use this rating in the evaluation of aspect rating.

Table 2: Distribution of Known Aspects Across the Rating Scale
Category         1-star  2-star  3-star  4-star  5-star
Camcorder         7.9%    6.9%    0.3%    39%    19.9%
Cellular Phone    4%      7.1%    0.2%    37%    33%
Digital Camera    3%      3.9%    0.1%    37.2%  44.3%
DVD Player        7.3%    9%      0.2%    37.3%  28.3%
MP3 Player        9.8%    9.7%    0.2%    41.9%  15.8%
Overall           5.4%    6.5%    0.2%    38%    33.4%

In addition, we manually created a set of "true" aspects for each product category as a gold standard. We asked several judges to read the reviews for each category and provide a set of aspects for each category based on the reviews. We will use this gold standard in the evaluation of aspect extraction.

5.2 Evaluation of Aspect Extraction
Since there seems to be no automatic way of validating the correctness of extracted aspects, all of the current aspect extraction techniques [4, 6] have been evaluated against a manually created gold standard (true aspects).
Precision is equal to the number of extracted aspects which are true, over the total number of extracted aspects, and recall is equal to the percentage of true aspects which were extracted by the method.
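For concreteness, a small sketch of how these two measures are computed against the gold standard is shown below; the variable names and the example aspect sets are ours.

```python
def precision_recall(extracted, gold):
    """Precision and recall of aspect extraction against a gold-standard aspect set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = extracted & gold
    precision = len(true_positives) / len(extracted) if extracted else 0.0
    recall = len(true_positives) / len(gold) if gold else 0.0
    return precision, recall

# Example: two of three extracted aspects are true -> precision ~0.67, recall 0.5.
print(precision_recall({"battery life", "zoom", "box"},
                       {"battery life", "zoom", "lcd", "price"}))
```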
To have a fair comparison, the comparison partners should be unsupervised and should take the same review structure as input. Since Opinion Digger takes unstructured reviews as input, and Feature-Based Summarization (FBS) [4] is the only method applicable to unstructured reviews, FBS is chosen as a comparison partner. We compare our proposed aspect extraction method with three methods: the Naive baseline (as also used in [4, 6, 3]), FBS, and Com-FBS. In the Naive baseline, the top k frequent noun phrases for each product are selected as aspects; in this paper, since none of the gold standard lists has more than 15 aspects, we set k equal to 15. In FBS [4], two types of pruning are applied to frequent noun phrases to extract aspects from reviews. We propose Com-FBS as another comparison partner, a method which applies FBS to extract aspects for each product and considers only candidate aspects which appear in at least half of the products of the corresponding category.

Table 3: Average Precision and Recall of Aspect Extraction
                Naive          FBS           Com-FBS       Op. Digg.
Catg.         Prc.   Rec.   Prc.   Rec.   Prc.   Rec.   Prc.   Rec.
Cam.          20%    23%    43%    45%    61%    53%    77%    82%
Cel.          33%    42%    61%    57%    72%    68%    86%    92%
Dig.          40%    45%    67%    43%    75%    52%    87%    79%
DVD.          20%    31%    40%    49%    57%    64%    70%    90%
Mp3.          27%    29%    52%    52%    68%    66%    81%    91%
Avg           28%    34%    53%    49%    67%    61%    80%    87%

Table 3 shows that each refinement of the method achieves a substantial performance gain in terms of precision and recall. FBS outperforms Naive, showing the gain of the pruning methods proposed by [4]. Com-FBS outperforms FBS, demonstrating the benefit of aggregation over product categories, as proposed in our approach. Finally, the full Opinion Digger method is the clear winner, demonstrating that using the freely available additional information about known aspects effectively improves the accuracy of aspect extraction.

The comparison of the precision and recall of the different categories, together with the number of reviews available per category, shows that all of the methods perform better for those categories that have more reviews (e.g. digital camera and cellular phone).
5.3 Evaluation of Aspect Rating
Estimated ratings are evaluated using Ranking Loss, which measures the average distance between the true and the predicted numerical ratings [1, 8].
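Read this way, ranking loss is the mean absolute difference between predicted and true ratings, following the measure used in [1, 8]; the minimal sketch below reflects that reading, and the example values are ours.

```python
def ranking_loss(predicted, true):
    """Mean absolute difference between predicted and true ratings (paired lists)."""
    assert len(predicted) == len(true) and predicted, "need paired, non-empty ratings"
    return sum(abs(p - t) for p, t in zip(predicted, true)) / len(predicted)

# Example: predictions (4, 3, 5) against true ratings (5, 3, 3) give a loss of 1.0.
print(ranking_loss([4, 3, 5], [5, 3, 3]))
```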
Table 4 shows the ranking loss of our algorithm, OPD, along with various comparison partners, including the simple MAJORITY baseline (as also used in [8]), the POLARITY baseline (similar to the method used in [4, 10, 11]), and the PRank method (as also used in [8]) proposed by [1]. All of these methods output a rating in the range from 1 to 5, like OPD. As mentioned in Section 5.1, a rating of 4 is the most common rating for all aspects, and thus a prediction of all 4's for the known aspects gives a MAJORITY baseline and a natural indication of task difficulty [8]. In the POLARITY baseline [1, 8], the rating of an aspect is determined by aggregating the polarity of the corresponding adjectives. This method starts from a set of seed adjectives which are labeled as positive or negative, and uses a bootstrapping mechanism to determine the polarity (positive or negative) of a new adjective. A new adjective is labeled as positive/negative if it appears in the synset (synonym list in Wordnet) of one of the positive/negative seed sets. The other comparison partner, PRank, is a general ranking algorithm which finds a ranking-prediction rule for ranking the input instances (aspects) [1].

Table 4: Ranking Loss of Estimated Ratings for Known Aspects for Different Methods
Category         MAJORITY  POLARITY  PRank  OPD
Camcorder          1.06      0.88    0.74   0.56
Cellular Phone     1.05      0.84    0.72   0.52
Digital Camera     1.04      0.81    0.63   0.49
DVD Player         0.92      0.74    0.52   0.29
Mp3 Player         1.07      0.91    0.75   0.6
Average            1.03      0.84    0.67   0.49

As Table 4 shows, OPD achieves an average ranking loss of 0.49 (on a 5-star scale), compared to 0.67 for PRank and 0.84 for POLARITY. A substantial gain in performance is observed across all aspects of each product category.

6. SUMMARY
In this paper we proposed an unsupervised approach for mining unstructured product reviews which provides a set of product aspects and estimates their ratings. As input, Opinion Digger takes a set of known aspects and a rating guideline in addition to the review text for each product. Our approach consists of two main phases: extracting product aspects and estimating aspect ratings. Our experimental evaluation on data from the Epinions.com website demonstrates the accuracy of the proposed method.

7. REFERENCES
[1] K. Crammer and Y. Singer. Pranking with ranking. In NIPS '01.
[2] C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA, 1998.
[3] H. Guo, H. Zhu, Z. Guo, X. Zhang, and Z. Su. Product feature categorization with multilevel latent semantic association. In CIKM '09.
[4] M. Hu and B. Liu. Mining and summarizing customer reviews. In KDD '04.
[5] B. Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, chapter 11. Springer, 2007.
[6] B. Liu, M. Hu, and J. Cheng. Opinion observer: analyzing and comparing opinions on the web. In WWW '05.
[7] M. F. Porter. An algorithm for suffix stripping. In Readings in Information Retrieval, 1997.
[8] B. Snyder and R. Barzilay. Multiple aspect ranking using the good grief algorithm. In HLT-NAACL '07.
[9] R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In EDBT '96.
[10] P. D. Turney. Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In ACL '02.
[11] J. Zhu, H. Wang, B. K. Tsou, and M. Zhu. Multi-aspect opinion polling from textual reviews. In CIKM '09.
