19 (Colleta Sentimen)

See discussions, stats, and author profiles for this publication at: https://www.researchgate.
net/publication/264233723
Combining Classiﬁcation and Clustering for Tweet Sentiment Analysis
Conference Paper · January 2014

DOI: 10.13140/2.1.1060.9281
CITATIONS READS
21 3,434
4 authors:
Luiz F. S. Coletta Nádia Félix

Dell Technologies Universidade Federal de Goiás
26 PUBLICATIONS 408 CITATIONS 52 PUBLICATIONS 687 CITATIONS
SEE PROFILE SEE PROFILE
Eduardo R Hruschka Estevam Rafael Hruschka

University of São Paulo 135 PUBLICATIONS 5,021 CITATIONS
114 PUBLICATIONS 4,744 CITATIONS
SEE PROFILE
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
Automatic Genetic Generation of Fuzzy Classification Systems. View project
Prophet View project
All content following this page was uploaded by Luiz F. S. Coletta on 26 July 2014.
The user has requested enhancement of the downloaded file.

Combining Classification and Clustering for Tweet
Sentiment Analysis
Luiz F. S. Coletta∗ , Nádia F. F. da Silva∗ , Eduardo R. Hruschka∗ Estevam R. Hruschka Jr.†

∗ Instituteof Mathematics and Computer Science
University of Sao Paulo (USP) at Sao Carlos, Brazil
{luizfsc,nadia,erh}@icmc.usp.br
† Department of Computer Science
Federal University of Sao Carlos (UFSCAR) at Sao Carlos, Brazil

estevam@dc.ufscar.br
Abstract—The goal of sentiment analysis is to determine is considered a classification problem and, analogously to what
opinions, emotions, and attitudes presented in source material. In happens for large documents, sentiments of tweets can be
tweet sentiment analysis, opinions in messages can be typically expressed in different ways [14]. Specifically, they can be
categorized as positive or negative. To classify them, researchers classified according to the existence of sentiment — i.e., if
have been using traditional classifiers like Naive Bayes, Maximum there is sentiment in a message, such message is considered
Entropy, and Support Vector Machines (SVM). In this paper, we
show that a SVM classifier combined with a cluster ensemble can
polar (categorized as positive or negative), otherwise it is
offer better classification accuracies than a stand-alone SVM. In considered neutral. Some authors consider six “universal”
our study, we employed an algorithm, named C3 E-SL, capable emotions [15]: anger, disgust, fear, happiness, sadness, and
to combine classifier and cluster ensembles. This algorithm can surprise. In our work, we adopt the view that tweet sentiments
refine tweet classifications from additional information provided can be either positive or negative, as in [16], [17], [18], [19].
by clusterers, assuming that similar instances from the same
clusters are more likely to share the same class label. The Many researchers have been focusing on the use of tra-
resulting classifier has shown to be competitive with the best ditional classifiers like Naive Bayes, Maximum Entropy, and
results found so far in the literature, thereby suggesting that the Support Vector Machines (SVM) to classify tweets. We show
studied approach is promising for tweet sentiment classification. that the combination of a supervised model and a cluster
ensemble [20], based on different data partitions, can improve
Keywords—Tweet Sentiment Analysis, Classification; Support
Vector Machines, Clustering, Cluster Ensemble.
tweet classification accuracy. To the best of our knowledge,
no study has explored the combination of supervised and
unsupervised models for tweet classification. Our study takes
I. I NTRODUCTION into account an algorithm introduced in [21], whose name is
Twitter is a popular microblogging service in which users C3 E (from Consensus between Classification and Clustering
post status messages, called “tweets”, with no more than 140 Ensembles). This algorithm assumes that clusterers can provide
characters. Such environment represents one of the largest supplementary constraints that help to classify new data —
and most dynamic datasets of user generated content [1], [2]. in particular, that similar instances are more likely to share
Tweets can express opinions on different topics, which can the same class label. To classify tweets, we have used a C3 E
help to direct marketing campaigns to share consumer opinions version based on Squared Loss (SL) function, named C3 E-SL
concerning brands and products [3], outbreaks of bullying [22]. Such an algorithm needs the information (a priori) of
[4], events that generate insecurity [5], and acceptance or two parameters, namely: the relative importance of classifier
rejection of politicians [6], all in an electronic word-of-mouth and cluster ensembles, and the number of iterations of the
way. Automatic tools have helped decision makers to ensure algorithm. To estimate these parameters, we have used a Dy-
efficient solutions for a plenty of problems. The focus of this namic Differential Evolution (D2 E) algorithm [22]. Our study
paper is on the sentiment analysis of tweets. shows that the refinements provided by C3 E-SL (optimized by
means of D2 E) can improve the classification accuracy of a
Sentiment analysis aims at identifying opinions, emotions, stand-alone SVM classifier for tweet sentiment analysis.
and attitudes contained in source material like documents,
short texts, and sentences — e.g., from reviews [7], [8], [9], II. R ELATED WORK
[10], blogs [11], [12], and news [13]. In such domains, we
usually deal with large text corpora and “formal language”. Several studies on the use of stand-alone classifiers for
However, taking into account sentiment analysis of tweets, the tweet sentiment analysis are available in the literature. Some of
concerns are different, in particular due to: (i) the frequency them propose the use of emoticons and hashtags for building
of misspellings and slang in tweets is much higher than in the training set, as Go et al. [23] and Davidov et al. [24], who
other domains, since users frequently post messages (that can identified tweet polarity by using emoticons as class labels.
contain a specific cultural vocabulary) using many different Others have used the characteristics of the social network as
electronic devices (e.g., cell phones and tablets); (ii) Twitter networked data, such as Hu et al. [25]. In their paper, emotional
users post messages on a variety of topics, unlike other sites, contagion theories are materialized based on a mathematical
which are tailored to specific topics. Tweet sentiment analysis optimization formulation for the supervised learning process.
Approaches that integrate learning methods and opinion More precisely, the objective becomes to minimize J in (1)
mining lexicon-based techniques have been studied by Agar- with respect to {yi }ni=1 :
wal et al. [26], Read [27], Zhang et al. [28], and Saif et
al. [29]. They have used lexicons, part-of-speech, and writing
style as linguistics resources. In a similar context, Saif et al. 1X 1 X
J= kyi − π i k2 + α sij kyi − yj k2 (1)
[30] introduced an approach to add semantics to the sentiment 2 2
i∈X (i,j)∈X
analysis training set as an additional feature. For each extracted
entity (e.g., iPhone), they added its respective semantic concept
(like “Apple’s product”) and measured the correlation of the By keeping {yj }nj=1 \ {yi } fixed, we can minimize J in
representative concept as negative/positive sentiments. Equation (1) for every yi by setting:
To achieve better results in classification/clustering ap-

∂J
plications, many approaches based on ensembles have been = 0. (2)
developed and used in practice — for classification problems ∂yi
[31], [32], as for clustering problems [33], [34], [35], [36].
To take advantage of both labeled and unlabeled data, several Considering that the similarity matrix S is symmetric and
researches have designed ways of combining classifiers and observing that ∂kxk2
∂x = 2x, we obtain:
clusterers [22], [37], [38], [39], [40], [41], [21], [42], [43].
Acharya et al. [21] and Gao et al. [43], in particular, deal X
π i + α′ sij yj
with the combination of a handful of classifiers and clusterers
j6=i
with the ultimate goal of classifying new data. Coletta et yi = X , (3)
al. [22] used a Squared Loss (SL) function in the optimiza- 1 + α′ sij
tion algorithm introduced in [21]. Such an algorithm, named j6=i
C3 E-SL, has presented attractive empirical results, besides
being computationally efficient in practice [22], [37], [21]. where α′ = 2α has been set for mathematical conve-
Our work explores the combination of classifiers and nience. Equation (3) can be computed iteratively, for all
clusterers in tweet sentiment classification by using C3 E-SL i ∈ {1, 2, ..., n}, until a maximum number of iterations (I),
[22]. We consider that the additional information provided in order to obtain posterior class probability distributions for
by data partitions built from bag-of-words representation with the instances in X .
lexicons scores can boost classification accuracies yielded by Figure 1 shows an overview of our approach to refine tweet
classifiers. In this sense, clusterers can discover a type of “topic classification. In order to optimize the two user-defined param-
structure” present in the data encoding that is in the form of eters of C3 E-SL, namely the coefficient α, which controls the
meta-information. relative importance of classifier and cluster ensembles, and its
number of iterations, I, we have used a differential evolution
III. R EFINING TWEET CLASSIFICATION VIA CLUSTERING algorithm, as described next.
In [21], the authors designed a framework to combine clas- A. Estimating the C3 E-SL Parameters
sifiers and clusterers aiming at generating a more consolidated
classification. Such a framework, whose core is an optimization The C3 E-SL user-defined parameters, α and I, can be
algorithm named C3 E, assumes that an ensemble of classifiers successfully estimated by means of a Dynamic Differential
(consisting of one or more classifiers) have been previously in- Evolution (D2 E) [22] algorithm, which extends DE [45], [46]
duced from a training set. This ensemble estimates initial class by sampling values for its control parameters, F and Cr.
probability distributions for every instance xi of a target/test The dynamic adaptation takes place if in two consecutive
set X = {xi }ni=1 . These probability distributions are stored generations, there are no changes in the average fitness of the
as c-dimensional vectors {πi }ni=1 , where c is the number current population. To do so, D2 E uses the following rules:
of classes. Assuming that a cluster ensemble can provide
supplementary constraints for such a previous classification,
the probability distributions in {πi }ni=1 are refined taking if (f¯G = f¯G−1 )

<d3 , d4 >,
into account a similarity matrix S = {sij }ni,j=1 — with the <F, Cr>G+1 = (4)
<F, Cr>G , otherwise
rationale that similar instances are more likely to share the
same class label. Matrix S captures similarities between the
instances of X . In particular, each entry corresponds to the where d3 ∈ [0.1, 1], d4 ∈ [0, 1] are values randomly chosen
relative co-occurrence of two instances in the same cluster from a uniform distribution, f¯G and f¯G−1 are the average fit-
[20], [36] — considering all the data partitions built from X . ness of the population in generation G and G− 1, respectively.
It allows F and Cr to assume new values if there is no im-
The classification refinement is obtained from a posterior provement in two consecutive generations. Therefore, D2 E can
class probability distribution for every instance in X — defined avoid local minima (with sufficiently high values of F and/or
as a set of vectors {yi }ni=1 — that is provided by C3 E. Cr), as well as it can emphasize exploitation (with sufficiently
In [21], C3 E exploits general properties of a large class of low values of F and/or Cr), making the algorithm more robust
loss functions, described by Bregman divergences [44]. If we with respect to initial values of its control parameters. D2 E
opt to use a Squared Loss (SL) function as the Bregman is based on the strategy “DE/best/2/bin” [46] and starts with
divergence, the optimization problem can be simplified [22]. N p = 20, F = 0.25, and Cr = 0.25.
Fig. 1. Improving tweet classification by using C3 E-SL algorithm [22]. Class probability distributions computed by SVM are refined taking into account the
information of clusters on the data to be classified.
To optimize the C3 E-SL parameters, D2 E aims at min- 3) Sanders - Twitter Sentiment Corpus: It consists of hand-
imizing the misclassification rate provided by the algorithm classified tweets collected from four Twitter search terms:
in a validation set by combining different values of α = @apple, #google, #microsof t, #twitter. Each tweet has a
{0, 0.001, 0.002, ..., 0.15} and I = {1, 2, ..., 10}. After doing sentiment label: positive, neutral, negative, and irrelevant. As
that, C3 E-SL with α∗ and I ∗ can be used to classify tweets in in [48], we conducted experiments considering only the set
a target/test set. formed from 570 positive tweets and 654 negative ones.
4) Stanford - Twitter Sentiment Corpus: This dataset [23]
IV. E XPERIMENTAL RESULTS has 1.6 million training tweets (half tweets with positive
emoticons and another half with negative emoticons) collected
Experiments were conducted to evaluate the improvements by using a scraper that queries the Twitter API. The test set was
achieved when a SVM classifier is combined with a cluster collected by searching the Twitter API with specific queries
ensemble by means of C3 E-SL algorithm (described in Section including product names, companies, and people. 359 tweets
III). Comparisons were made between the stand-alone SVM were manually annotated — 177 with negative class label and
and its outcomes refined by the C3 E-SL approach. In addition, 182 with positive class label. We used four training sets (with
for comparison purposes, the best classification results reported 500; 1,000; 2,000; and 3,000 tweets) sampled from the original
in the literature are also presented. Details on the used datasets training set.
and the performed preprocessing steps are described next.
B. Preprocessing
A. Datasets Before addressing the preprocessing step, it is instructive to
describe the way the tweets were encoded. We employed bag-
Experiments have been carried out on four representative of-words, where tweets are described by a table in which the
datasets obtained from tweets on different subjects: columns represent the terms (or existing words) in the tweets.
1) Health Care Reform (HCR): This dataset was built by The values in the table represent the frequency of the terms in
crawling tweets containing the hashtag “#hcr” (health care the tweet. Hence, a collection of tweets can be represented as
reform) in March 2010 [2]. A subset of this corpus was illustrated in Table I, in which there are n tweets and m terms.
manually annotated as positive, negative, and neutral. The Each tweeti = (ai1 , ai2 , · · · , aim ), where aij is the frequency
neutral tweets were not considered in the experiments. Thus of term tj . The frequency can be computed in various ways.
our training set contains 621 examples (215 positive and 406 We used binary frequency, assuming that a term is considered
negative) whereas the test set contains 665 (154 positive and “frequent” if it occurs in more than one tweet.
511 negative). TABLE I. R EPRESENTATION OF TWEETS AS BAG - OF - WORDS .
2) Obama-McCain Debate (OMD): This dataset was con- t1 t2 ··· tm
structed from 3,238 tweets crawled during the first U.S.
presidential TV debate that took place in September 2008 [47]. tweet1 a11 a12 ··· a1m
The sentiment ratings of the tweets were acquired by using the tweet2 a21 a22 ··· a2m
Amazon Mechanical Turk1. Each tweet was rated as positive,
··· ··· ··· ··· ···
negative, mixed, and other. “Other” tweets are those that could
not be rated. We kept only the positive and negative tweets tweetn an1 an2 ··· anm
rated by at least three voters, which comprise a set of 1,906
tweets, from which 710 are positive and 1,196 are negative. To preprocess the tweet datasets, we first removed retweets,
stop words, links, URLs, mentions, punctuation, and accentu-
1 https://www.mturk.com/ ation. Stemming was performed in order to minimize sparsity.
We also used the opinion lexicon2 proposed by Hu and Liu For the latter, in particular, we sampled training sets (with
[9], who created a list of 4,783 negative words and 2,006 pos- different sizes) from its original training set. For Obama-
itive ones. This list was compiled over many years, and each McCain Debate (OMD) and Sanders datasets, results were also
word indicates an opinion. Positive opinion words are used obtained over 10 runs, but considering the 2×5 cross-validation
to express desired states, while negative opinion expresses the procedure [54].
opposite. Examples of positive opinion words are: beautiful,
wonderful, good, and amazing. Examples of negative opinion E. Empirical Results
words are: bad, poor, and terrible. We computed and included
the number of positive and negative lexicons in each tweet Table II shows results provided by SVM and their refine-
in order to enrich the data. Emoticons available in the tweets ments by using C3 E-SL on each dataset. The best results found
have also been used in the feature set. Thus, the number of in the literature are also reported. The F-Measure (F1) has
positive and negative emotions were counted to complement been showed because some datasets have unbalanced data. By
the information provided by the bag-of-words and lexicons. observing these results, one can see that the combination of a
classifier with clusterers, as done by C3 E-SL algorithm, can
C. SVM and Cluster Ensemble Settings provide improvements in comparison with the stand-alone use
of SVM, which is considered a state-of-the-art classifier.
For simplicity and aimed at showing a proof of concept,
we have used an ensemble formed by a single component, For the HCR dataset, C3 E-SL provided substantial en-
namely a non-linear SVM with RBF kernel. The RBF kernel hancements over the SVM results. Also, better results than
parameters, C and γ, were estimated by grid-search on the those reported in the literature [2], [30] were obtained —
training set according to Hsu et al. [50]. As for the clustering except for the F-Measure computed on positive tweets.
component, we built a similarity matrix from five data parti- For OMD and Sanders datasets, we can note some slight
tions induced by using K-Medoids clustering algorithm [51] improvements, which are close to the best results in the
with cosine similarity. Based on Kuncheva et al. [52], [53], literature [25], [30], [48]. Considering the positive class F-
each data partition assumed a specific value for the number of Measure on OMD, C3 E-SL achieved better results (80.58%)
clusters k = {2,3,5,6,8}. These values were randomly selected in comparison to that found in the literature (70.30%).
and fixed for all datasets.
For Stanford, we performed experiments on four different
D. Optimization of C3 E-SL training sets (sampled with 500; 1,000; 2,000; and 3,000
balanced tweet examples from the original training set). The
The C3 E-SL optimization consists of searching for values C3 E-SL classification refinements provided better results than
of its parameters, α and I, that could yield to good classifi- SVM in all these scenarios (except for that with 2,000 training
cation accuracy. Ideally, we are looking for an optimal pair of tweets). C3 E-SL also showed competitive accuracies in com-
values (α∗ and I ∗ ) that results in the most accurate classifier parison with the results found in the literature. However, it
for tweet sentiment analysis. In order to estimate these values, is worth noticing that the results reported in [49], [30] were
we employ D2 E algorithm (addressed in Section III-A and that obtained from the whole training set (with 1.6 million tweets).
shown good results in other application domains [22]). So our results are very encouraging, thus suggesting that
The procedure adopted involves three datasets: training, the algorithm being studied is promising for tweet sentiment
validation, and test [54]. The test set (with instances to be classification.
classified) has not been used in the process of building the
ensemble of classifiers — i.e., it is an independent set of data V. C ONCLUSIONS
not used at all to optimize any parameter of the algorithm
and the resulting classifier. Thus, as usual and in conformity Tweet sentiment analysis has been extensively studied in
with real-world applications (on tweet classification), only the recent years. To classify tweet messages, researchers have been
training and validation sets are used to optimize the parameters using stand-alone classifiers like Support Vector Machines
of C3 E-SL. In this work, the validation set was formed with (SVM). We showed that the combination of a SVM classifier
half the training set available. In this sense, firstly a SVM is with a cluster ensemble can boost the classification of tweets
built with 50% of the training set. Then, D2 E estimates α∗ and — with the assumption that similar instances are more likely
I ∗ by minimizing the misclassification rate in the validation to share the same class label. For tweet classification, in
set (formed with the other 50% of the available training set), particular, such assumption sounds reasonable, since tweet
where a cluster ensemble was induced. Thereafter, the resulting messages of a specific topic (to be explored) usually have a
values for the parameters are finally employed to assess the great number of opinions that can be (dis)similar to each other,
classification accuracy in the test set where, again and as but either classified as positive or negative. Different clusters
requested by C3 E-SL, a cluster ensemble similar to the one can encompass similar positive/negative opinions, thereby pro-
built for the validation set must be induced. In this step, a new viding additional information about the structure of the data,
SVM was built considering all the available labeled data (i.e., which can help to classify tweets by using C3 E-SL.
considering 100% of the original training set). Experimental results obtained on four datasets showed
For Health Care Reform (HCR) [30] and Stanford [23] that the combination of a SVM classifier with a ensemble
datasets, results were obtained over 10 runs from the same cluster, as done by C3 E-SL algorithm, can improve the tweet
training and test sets available in their public resources. classification. In most of the cases, C3 E-SL provided bet-
ter results than the stand-alone SVM. Furthermore, C3 E-SL
2 Available at http://www.cs.uic.edu/∼ liub/FBS/sentiment-analysis.html. showed competitive results in comparison with those found in
TABLE II. AVERAGE CLASSIFICATION ACCURACIES (%) AND F-M EASURES (%) OBTAINED BY USING SVM AND C3 E-SL ON EACH DATASET. F OR THE
S TANFORD DATASET, WE SAMPLED TRAINING SETS WITH DIFFERENT SIZES ( BEST RESULTS IN BOLD ). F OR COMPARISON PURPOSES , THE BEST RESULTS
FROM THE LITERATURE ARE ALSO PRESENTED .
Accuracy (%) F1 - Positive Class (%) F1 - Negative Class (%)

Dataset
SVM C3 E-SL SVM C3 E-SL SVM C3 E-SL
Health Care Reform (HCR) 74.29 79.62 43.56 32.13 83.35 88.00
Literature [2], [30] 71.20 50.30 86.00
Obama-McCain Debate (OMD) 75.15 75.18 80.14 80.58 67.24 64.24
Literature [25], [30] 76.30 70.30 85.40
Sanders 82.11 82.15 80.43 80.80 83.48 83.27
Literature [48] 84.40 - -
Stanford - trained with 500 tweets 74.65 77.69 74.65 77.33 74.65 78.02
∗ ∗ ∗
Literature [49], [30] 87.20 82.50 85.30
∗
By using the whole training set — i.e., 1.6 million tweets.
the literature. We note that for the HCR dataset our accuracy is [4] J.-M. Xu, K.-S. Jun, X. Zhu, and A. Bellmore, “Learning from
the best one reported so far in the literature. For the Stanford bullying traces in social media.” in HLT-NAACL. The Association
dataset, the best accuracy found, obtained with the complete for Computational Linguistics, 2012, pp. 656–666.
training set (formed by 1,600,000 tweets), is 87.20%, whereas [5] M. Cheong and V. C. Lee, “A microblogging-based approach to ter-
our approach trained with only 0.19% of these data achieved an rorism informatics: Exploration and chronicling civilian sentiment and
response to terrorism events via twitter,” Information Systems Frontiers,
average accuracy of 81.84%. Although more experiments are vol. 13, no. 1, pp. 45–59, Mar. 2011.
necessary before stating stronger claims, the studied algorithm
[6] N. A. Diakopoulos and D. A. Shamma, “Characterizing debate perfor-
has shown to be promising for tweet sentiment analysis. mance via aggregated twitter sentiment,” in Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems, ser. CHI ’10.
There are several aspects that can be investigated in the New York, NY, USA: ACM, 2010, pp. 1195–1198.
future. For example, ways of inducing “good” data partitions
[7] P. D. Turney, “Thumbs up or thumbs down?: semantic orientation
to compose a cluster ensemble deserve attention. In particular, applied to unsupervised classification of reviews,” in Proceedings of
clustering algorithms such as those based on Latent Dirichlet the 40th Annual Meeting on Association for Computational Linguistics,
Allocation [55] can be used to explicitly find topics in data. ser. ACL 02. Stroudsburg, PA, USA: Association for Computational
Also, the use of active learning to leverage the combination Linguistics, 2002, pp. 417–424.
results of classifiers and clusterers is a promising future work. [8] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? sentiment classifi-
cation using machine learning techniques,” in Proceedings of EMNLP,
2002, pp. 79–86.
ACKNOWLEDGMENT
[9] M. Hu and B. Liu, “Mining and summarizing customer reviews,” in
The authors would like to acknowledge Brazilian Re- Proceedings of the tenth ACM SIGKDD international conference on
search Agencies Capes (Proc. DS-7253238/D), CNPq (Proc. Knowledge discovery and data mining, ser. KDD ’04. New York, NY,
USA: ACM, 2004, pp. 168–177.
303348/2013-5) and FAPESP (Proc. 2013/07375-0 and
2010/20830-0) for the financial support. [10] ——, “Mining opinion features in customer reviews,” in Proceedings
of the 19th national conference on Artifical intelligence, ser. AAAI’04.
AAAI Press, 2004, pp. 755–760.
R EFERENCES [11] B. He, C. Macdonald, J. He, and I. Ounis, “An effective statistical
[1] A. Ritter, S. Clark, Mausam, and O. Etzioni, “Named entity recognition approach to blog post opinion retrieval,” in Proceedings of the 17th
in tweets: An experimental study,” in Proceedings of the Conference ACM conference on Information and knowledge management, ser.
on Empirical Methods in Natural Language Processing, ser. EMNLP CIKM ’08. New York, NY, USA: ACM, 2008, pp. 1063–1072.
’11. Stroudsburg, PA, USA: Association for Computational Linguistics, [12] P. Melville, W. Gryc, and R. D. Lawrence, “Sentiment analysis of
2011, pp. 1524–1534. blogs by combining lexical knowledge with text classification,” in
[2] M. Speriosu, N. Sudan, S. Upadhyay, and J. Baldridge, “Twitter polarity Proceedings of the 15th ACM SIGKDD international conference on
classification with label propagation over lexical links and the follower Knowledge discovery and data mining, ser. KDD ’09. New York, NY,
graph,” in Proceedings of the First Workshop on Unsupervised Learning USA: ACM, 2009, pp. 1275–1284.
in NLP, ser. EMNLP ’11. Stroudsburg, PA, USA: Association for [13] A. Balahur, R. Steinberger, M. Kabadjov, V. Zavarella, E. van der
Computational Linguistics, 2011, pp. 53–63. Goot, M. Halkia, B. Pouliquen, and J. Belyaeva, “Sentiment analysis
[3] B. J. Jansen, M. Zhang, K. Sobel, and A. Chowdury, “Twitter power: in the news,” in Proceedings of the Seventh International Conference
Tweets as electronic word of mouth,” J. Am. Soc. Inf. Sci. Technol., on Language Resources and Evaluation (LREC’10), N. C. C. Chair),
vol. 60, no. 11, pp. 2169–2188, nov 2009. K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner,
and D. Tapias, Eds. Valletta, Malta: European Language Resources [34] P. Hore, L. Hall, and D. Goldgof, “A scalable framework for cluster
Association (ELRA), may 2010. ensembles,” Pattern Recognition, vol. 42, no. 5, pp. 676–688, 2009.
[14] A. Montoyo, P. Martnez-Barco, and A. Balahur, “Subjectivity and [35] A. Gionis, H. Mannila, and P. Tsaparas, “Clustering aggregation,” in
sentiment analysis: An overview of the current state of the area and ACM Trans. Knowl. Discov. Data. New York, NY, USA: ACM, 2007,
envisaged developments,” Decision Support Systems, vol. 53, no. 4, pp. vol. 1.
675 – 679, 2012. [36] A. Strehl and J. Ghosh, “Cluster Ensembles - A Knowledge Reuse
[15] P. Ekman, Emotion in the Human Face. Cambridge University Press, Framework for Combining Multiple Partitions,” Journal of Machine
1982, vol. 2. Learning Research, vol. 3, pp. 583–617, 2002.
[16] B. Liu, Sentiment Analysis and Opinion Mining, ser. Synthesis Lectures [37] A. Acharya, E. R. Hruschka, J. Ghosh, and S. Acharyya, “An optimiza-
on Human Language Technologies. Morgan & Claypool Publishers, tion framework for combining ensembles of classifiers and clusterers
2012. with applications to non-transductive semi-supervised learning and
transfer learning,” ACM Transactions on Knowledge Discovery from
[17] ——, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, 2014, (to appear).
Data, ser. Data-Centric Systems and Applications. Secaucus, NJ, USA:
Springer-Verlag New York, Inc., 2006. [38] H. Gan, N. Sang, R. Huang, X. Tong, and Z. Dan, “Using clustering
analysis to improve semi-supervised classification,” Neurocomputing,
[18] ——, Sentiment analysis and subjectivity. Taylor and Francis Group, vol. 101, pp. 290 – 298, 2013.
Boca, 2010.
[39] A. Rahman and B. Verma, “Cluster-based ensemble of classifiers,”
[19] A. Agarwal and P. Bhattacharyya, “Sentiment analysis: A new approach Expert Systems, vol. 30, no. 3, pp. 270–282, 2013.
for effective use of linguistic knowledge and exploiting similarities in a [40] R. Soares, H. Chen, and X. Yao, “Semisupervised classification with
set of documents to be classified,” in Proceedings of the International cluster regularization,” Neural Networks and Learning Systems, IEEE
Conference on Natural Language Processing (ICON), 2005. Transactions on, vol. 23, no. 11, pp. 1779–1792, 2012.
[20] J. Ghosh and A. Acharya, “Cluster ensembles,” Wiley Interdisc. Rew.: [41] Q. Qian, S. Chen, and W. Cai, “Simultaneous clustering and classifica-
Data Mining and Knowledge Discovery, vol. 1, no. 4, pp. 305–315, tion over cluster structure representation,” Pattern Recognition, vol. 45,
2011. no. 6, pp. 2227 – 2236, 2012.
[21] A. Acharya, E. R. Hruschka, J. Ghosh, and S. Acharyya, “C3E: A [42] W. Cai, S. Chen, and D. Zhang, “A multiobjective simultaneous learning
framework for combining ensembles of classifiers and clusterers,” in framework for clustering and classification,” Neural Networks, IEEE
Multiple Classifier Systems. LNCS Vol. 6713, Springer, 2011, pp. Transactions, vol. 21, no. 2, pp. 185–200, 2010.
269–278.
[43] J. Gao, F. Liangy, W. Fanz, Y. Suny, and J. Han, “Graph-based
[22] L. F. S. Coletta, E. Hruschka, A. Acharya, and J. Ghosh, “A differential Consensus Maximization among Multiple Supervised and Unsupervised
evolution algorithm to optimise the combination of classifier and cluster Models,” in 23rd Annual Conference on Neural Information Processing
ensembles,” International Journal of Bio-Inspired Computation, 2014, Systems, December 2009.
(to appear). [44] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, “Clustering with
[23] A. Go, R. Bhayani, and L. Huang, “Twitter sentiment classification us- bregman divergences,” J. Mach. Learn. Res., vol. 6, pp. 1705–1749,
ing distant supervision,” Unpublished Manuscript, Stanford University, 2005.
pp. 1–6, 2009. [45] R. Storn and K. Price, “Differential evolution - a simple and efficient
[24] D. Davidov, O. Tsur, and A. Rappoport, “Enhanced sentiment learning heuristic for global optimization over continuous spaces,” Jornal of
using twitter hashtags and smileys,” in Proceedings of the 23rd Interna- Global Optimization, vol. 11, no. 4, pp. 341–359, 1997.
tional Conference on Computational Linguistics: Posters, ser. COLING [46] K. Price, “Differential evolution: a fast and simple numerical optimizer,”
’10. Stroudsburg, PA, USA: Association for Computational Linguistics, in Fuzzy Information Processing Society, 1996. NAFIPS. 1996 Biennial
2010, pp. 241–249. Conference of the North American. IEEE, 1996, pp. 524–527.
[25] X. Hu, L. Tang, J. Tang, and H. Liu, “Exploiting social relations for [47] D. A. Shamma, L. Kennedy, and E. F. Churchill, “Tweet the debates:
sentiment analysis in microblogging,” in Proceedings of the sixth ACM understanding community annotation of uncollected sources,” in Pro-
international conference on Web search and data mining, 2013. ceedings of the first SIGMM workshop on Social media. ACM, 2009,
[26] A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R. Passonneau, pp. 3–10.
“Sentiment analysis of twitter data,” in Proceedings of the Workshop [48] D. Ziegelmayer and R. Schrader, “Sentiment polarity classification
on Languages in Social Media, ser. LSM ’11. Stroudsburg, PA, USA: using statistical data compression models.” in ICDM Workshops,
Association for Computational Linguistics, 2011, pp. 30–38. J. Vreeken, C. Ling, M. J. Zaki, A. Siebes, J. X. Yu, B. Goethals, G. I.
[27] J. Read, “Using emoticons to reduce dependency in machine learning Webb, and X. Wu, Eds. IEEE Computer Society, 2012, pp. 731–738.
techniques for sentiment classification,” in Proceedings of the ACL [49] A. Bakliwal, P. Arora, S. Madhappan, N. Kapre, M. Singh, and
Student Research Workshop, ser. ACLstudent ’05. Stroudsburg, PA, V. Varma, “Mining sentiments from tweets,” in Proceedings of the 3rd
USA: Association for Computational Linguistics, 2005, pp. 43–48. Workshop on Computational Approaches to Subjectivity and Sentiment
Analysis (WASSA), vol. 12, 2012.
[28] L. Zhang, R. Ghosh, M. Dekhil, M. Hsu, and B. Liu, “Combining
lexicon-based and learning-based methods for twitter sentiment analy- [50] C.-W. Hsu, C.-C. Chang, C.-J. Lin et al., “A practical guide
sis.” HP Laboratories, 2011, technical Report. to support vector classification, 2003,” Paper available at
http://www.csie.ntu.edu.tw/ cjlin/papers/guide/guide.pdf, 2010.
[29] S. Mohammad, S. Kiritchenko, and X. Zhu, “Nrc-canada: Building
the state-of-the-art in sentiment analysis of tweets,” in Proceedings of [51] L. Kaufman and P. Rousseeuw, Clustering by means of medoids. North-
the seventh international workshop on Semantic Evaluation Exercises Holland, 1987.
(SemEval-2013), Atlanta, Georgia, USA, June 2013. [52] L. Kuncheva, S. Hadjitodorov, and L. Todorova, “Experimental com-
[30] H. Saif, Y. He, and H. Alani, “Semantic sentiment analysis of twitter,” in parison of cluster ensemble methods,” in Information Fusion, 2006 9th
Proceedings of the 11th international conference on The Semantic Web International Conference on, July 2006, pp. 1–7.
- Volume Part I, ser. ISWC’12. Berlin, Heidelberg: Springer-Verlag, [53] L. I. Kuncheva and S. T. Hadjitodorov, “Using diversity in cluster
2012, pp. 508–524. ensembles,” in Systems, man and cybernetics, 2004 IEEE international
conference on, vol. 2. IEEE, 2004, pp. 1214–1219.
[31] N. C. Oza and K. Tumer, “Classifier ensembles: Select real-world
applications,” Information Fusion, vol. 9, no. 1, pp. 4–20, 2008. [54] I. H. Witten and E. Frank, Data mining: practical machine learning
tools and techniques, 2nd ed., ser. Morgan Kaufmann Series in Data
[32] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algo- Management Systems. Morgan Kaufmann, 2005.
rithms. Hoboken, NJ: Wiley-Interscience, 2004.
[55] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,”
[33] R. Ghaemi, N. Sulaiman, H. Ibrahim, and N. Mustapha, “A Survey: J. Mach. Learn. Res., vol. 3, pp. 993–1022, Mar. 2003.
Clustering Ensembles Techniques,” Proceedings of World Academy of
Science, Engineering and Technology, vol. 38, February 2009.
View publication stats

19 (Colleta Sentimen)

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

19 (Colleta Sentimen)

Uploaded by

Copyright:

Available Formats

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

Combining Classiﬁcation and Clustering for Tweet Sentiment Analysis

Conference Paper · January 2014

Luiz F. S. Coletta Nádia Félix

SEE PROFILE SEE PROFILE

Eduardo R Hruschka Estevam Rafael Hruschka

Automatic Genetic Generation of Fuzzy Classification Systems. View project

Prophet View project

The user has requested enhancement of the downloaded file.

Luiz F. S. Coletta∗ , Nádia F. F. da Silva∗ , Eduardo R. Hruschka∗ Estevam R. Hruschka Jr.†

Federal University of Sao Carlos (UFSCAR) at Sao Carlos, Brazil

To achieve better results in classification/clustering ap-

Accuracy (%) F1 - Positive Class (%) F1 - Negative Class (%)

Literature [2], [30] 71.20 50.30 86.00

Obama-McCain Debate (OMD) 75.15 75.18 80.14 80.58 67.24 64.24

Literature [25], [30] 76.30 70.30 85.40

Sanders 82.11 82.15 80.43 80.80 83.48 83.27

Literature [48] 84.40 - -

View publication stats

You might also like