
Poznań Studies in Contemporary Linguistics 55(2), 2019, pp.

445–468
© Faculty of English, Adam Mickiewicz University, Poznań, Poland
doi: 10.1515/psicl-2019-0016

SENTIMENT ANALYSIS FOR POLISH

ALEKSANDER WAWER
Institute of Computer Science, Polish Academy of Sciences, Warsaw
axw@ipipan.waw.pl ORCID 0000-0002-7081-9797

ABSTRACT

This article is a comprehensive review of freely available tools and software for sentiment analysis of texts written in Polish. It covers solutions which deal with all levels of linguistic analysis, from word-level through phrase-level up to sentence-level sentiment analysis. Technically, the tools include dictionaries, rule-based systems and deep neural networks. The text also describes a solution for finding opinion targets. The article also contains remarks comparing the landscape of tools available for Polish with that for English. It is useful from the standpoint of multiple disciplines: not only information technology and computer science, but also applied linguistics and the social sciences.

KEYWORDS: Sentiment analysis; opinion extraction; opinion targets; deep neural net-
works.

1. Introduction

The purpose of this article is to review various tools available for Polish lan-
guage sentiment analysis and related areas. The understanding of sentiment
analysis in this text is a very broad one: it consists of distinguishing whether a
document, sentence, phrase or word is positive, negative or neutral. The field is
closely related to opinion mining and opinion classification.
It is worth noting that multiple different definitions of sentiment analysis
and opinion mining are in use. In his book, Bing Liu (Liu 2012) defines an opinion as a quintuple composed of an entity, its aspect, a sentiment on this aspect, a holder and the time when it is expressed. The sentiment is positive, negative or neutral, or is expressed with different strength/intensity levels. In this understanding, sentiment analysis is one of the sub-problems of the wider area of opinion extraction.
This article describes tools freely available as of mid-2018 for automatically analyzing sentiment in Polish-language texts. The main focus is on software applications, though the notion of tools is understood more widely here and includes dictionaries applicable to automated text analysis. The article also deals, although more briefly, with related resources on which the software tools depend: datasets and corpora of various kinds used to train the models, as is the case with machine learning approaches.
The topic of tools for sentiment analysis, as defined in this text, covers all of its possible sub-types. The main categorization is based on the levels at which sentiment analysis is carried out. It begins with word-level analysis (where dictionaries play a dominant role), continues with phrase-level analysis, including phrases derived from the syntactic analysis of a sentence, and sentence-level analysis, and ends with document-level analysis. These levels are obviously interrelated: distinguishing whether a sentence is positive or negative has to take into account word- and phrase-level information. Nevertheless, the levels are considered separate in many scenarios.
Opinion target, or aspect-based, sentiment analysis is another subproblem described in this paper. It refers to a fine-grained analysis which decomposes opinions into the various aspects of the objects under evaluation. It is typically applied to product reviews, as they contain elaborate statements about features which, considered together with the associated evaluative (sentiment-expressing) vocabulary, constitute a list of advantages and disadvantages.
As mentioned before, the main intention of this text is to describe
tools publicly and freely available, usable for other researchers, either as soft-
ware packages or online through web application interfaces (APIs). This as-
sumption excludes various commercially available toolkits, offered by multiple
companies for the purpose of analyzing opinions in Polish language texts. Such
tools are typically available via web interfaces. Their usability is mostly relevant
for marketers, as they often include an option to automatically gather data from
social media. We only briefly mention approaches and experiments described in
various papers that did not publish their related tools.
The article is organized by levels on which sentiment analysis is carried out.
We begin with word-level by describing sentiment dictionaries in Section 2. We
then move to phrase-level, discussing rule-based tools in Section 3 and recurrent
neural networks in Section 4. Finally, Section 5 covers tools for detecting opinion targets. The article is concluded in Section 7, which contains a discussion of possible future developments in the area of sentiment analysis.

2. Word level: Sentiment dictionaries

In this section we discuss the most basic building blocks of word-level senti-
ment analysis: dictionaries. Typically, these resources assign sentiment values
(for example labels: positive, neutral, negative) to lexical units, often word base
forms. Section 2.1 covers automatically generated dictionaries; Section 2.2 discusses manually created ones.

2.1. Automatically generated dictionaries

2.1.1. From word co-occurrences: Słownik Wydźwięku

Automatically generated sentiment dictionaries were the first publicly available resources in Polish. The methods of their generation include supervised and weakly supervised machine learning approaches. In both of these methods, an algorithm is presented with a number of words with known, human-labeled sentiments.
In the supervised method, the algorithm learns to predict sentiment from
aggregated information acquired by analyzing occurrences of these words in a
corpus. Trained models based on those algorithms may then be used to predict the sentiments of previously unseen words and, in theory, assign sentiment labels to large sets of words, possibly even the whole Polish lexicon. The main advantage of this method is the low cost of sentiment label acquisition: it requires only small sets of sentiment-annotated words, used as examples to train the automated models. The obvious downside is average quality: automated methods are prone to errors because predicting word sentiment is a difficult task. One example of a fully supervised approach was described in (Wawer and Rogozińska
2012). To predict a word’s sentiment, the authors used aggregated contexts of its
occurrences in the National Corpus of Polish (NKJP). The aggregated contexts
were formed by counting each co-occurring word in concordance windows and
thus forming a vector of co-occurrences. Such vectors were then used as input
of multiple classification methods such as Random Forest and SVM, to predict a
word’s sentiment in a supervised manner.
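The two-step supervised pipeline described above can be sketched compactly. Everything below is a toy stand-in: an invented English corpus replaces the NKJP concordances, and a simple nearest-profile comparison replaces the Random Forest and SVM classifiers used in the cited work.

```python
from collections import Counter

# Toy corpus standing in for NKJP concordance data (invented).
corpus = [
    "great wonderful masterpiece".split(),
    "great superb masterpiece".split(),
    "boring terrible disaster".split(),
    "boring awful disaster".split(),
]

# Small seed lexicon of human-labeled words: the training examples.
seeds = {"wonderful": 1, "superb": 1, "terrible": -1, "awful": -1}

def cooc_vector(word, window=2):
    """Aggregate counts of words co-occurring with `word` in a concordance window."""
    counts = Counter()
    for sent in corpus:
        for i, w in enumerate(sent):
            if w == word:
                counts.update(sent[max(0, i - window):i] + sent[i + 1:i + window + 1])
    return counts

def predict(word):
    """Nearest-profile stand-in for a trained classifier: compare the
    word's co-occurrence vector with each seed word's vector."""
    vec = cooc_vector(word)
    scores = {1: 0, -1: 0}
    for seed, label in seeds.items():
        svec = cooc_vector(seed)
        scores[label] += sum(vec[k] * svec[k] for k in vec)  # dot product
    return 1 if scores[1] >= scores[-1] else -1
```

In this toy setup, unseen words inherit the label of the seed words whose contexts they share.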
An example of the weakly supervised approach to predicting a word's sentiment was described in (Wawer 2012). This method requires only small sets of
so-called paradigm words (about 5 to 10 words with a clear positive or negative sentiment orientation). To predict the sentiment of an input word, one counts its co-occurrences in concordance windows with each of the paradigm words. If it co-occurs more often with positive paradigm words than with negative ones, it is classified as positive, and vice versa. The exact formula used to predict a word's sentiment is called SO-PMI (Semantic Orientation – Pointwise Mutual Information). Because the sets of paradigm words are small (as mentioned, 5–10 in each class), it is crucial that the co-occurrence frequency estimates are not biased by too small a corpus. Originally, the method was applied to English words using the Yahoo search engine, obtaining co-occurrence counts from the entire World Wide Web. Its Polish equivalent, as in (Wawer 2012), only used the National Corpus of Polish, a much smaller data source.
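The SO-PMI computation itself is compact. The sketch below follows the Turney-style formulation with a tiny invented corpus and paradigm-word lists; the add-one smoothing is my addition, to avoid taking the logarithm of zero for unseen pairs.

```python
import math

# Tiny toy corpus standing in for the National Corpus of Polish (invented).
corpus = ("dobry wspanialy film . zly okropny film . "
          "dobry swietny aktor . zly fatalny aktor").split()

pos_paradigm = ["dobry", "wspanialy"]  # 'good', 'wonderful'
neg_paradigm = ["zly", "okropny"]      # 'bad', 'horrible'

def hits(word):
    """Corpus frequency of a word."""
    return corpus.count(word)

def near(w1, w2, window=2):
    """Count co-occurrences of w1 and w2 within a concordance window."""
    n = 0
    for i, tok in enumerate(corpus):
        if tok == w1:
            ctx = corpus[max(0, i - window):i] + corpus[i + 1:i + window + 1]
            n += ctx.count(w2)
    return n

def so_pmi(word):
    """PMI-based association with positive paradigm words minus the
    association with negative ones; positive score = positive word."""
    n = len(corpus)
    score = 0.0
    for p in pos_paradigm:
        score += math.log2((near(word, p) + 1) * n / (hits(word) * hits(p)))
    for q in neg_paradigm:
        score -= math.log2((near(word, q) + 1) * n / (hits(word) * hits(q)))
    return score
```

On this toy corpus, `swietny` ('great') co-occurs with a positive paradigm word and receives a positive score, while `fatalny` ('dreadful') receives a negative one.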
Both methods discussed above were applied to a list of over 4 thousand
Polish words. The dictionary is available for download.1
It contains scores from the fully supervised method for the following sentiment class schemes:
– 2-class: Neutral words, marked as (0), are distinguished from all other
words, regardless of their polarity (1).
– 3-class: Words are either negative (−1), neutral (0) or positive (1).
– 5-class: Words are classified as very negative (−2), negative (−1), neutral
(0), positive (1) and very positive (2).
It is important to note that in the 3-class and 5-class scoring, ‘neutral’ includes both neutral and subjective words.
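The relation between the three schemes can be made concrete with two small helper functions. The collapsing rules below are my interpretation of how the scales relate, not documented behaviour of the dictionary itself.

```python
def to_3class(score5):
    """Collapse the 5-class scale {-2, ..., 2} to the 3-class scale {-1, 0, 1}."""
    return max(-1, min(1, score5))

def to_2class(score5):
    """Collapse to the 2-class scheme: neutral (0) vs. polar (1)."""
    return 0 if score5 == 0 else 1
```

For example, a very negative word (−2) becomes negative (−1) in the 3-class scheme and simply polar (1) in the 2-class scheme.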
The dictionary also contains weakly supervised sentiment estimates com-
puted using a regression-like SO-PMI formula. Negative numbers should in
theory be assigned to negative words and vice versa. The absolute value of a number indicates the strength of a word's sentiment orientation. However, the reported accuracy of SO-PMI is lower than that of the fully supervised method.
The results reported in (Wawer 2012) were highly promising, with an accuracy of over 0.8 measured on a subset of the lexicon. Unfortunately, a random examination of the dictionary reveals a number of rather serious errors, which indicates that the accuracy at the level of the whole four-thousand-word dictionary might be lower than reported for the subset used in evaluation.

1
http://zil.ipipan.waw.pl/SlownikWydzwieku

2.1.2. Compiled from reviews

Sentiment dictionaries may also be compiled from corpora with sentiment labels
attached to documents, such as customer reviews with ratings. Positive reviews tend to contain more positively oriented words, while negative words appear more often in negative reviews. A method of sentiment dictionary acquisition which exploits this intuition has been described in Haniewicz et al. (2013). Unfortunately, the dictionary discussed in that article is not publicly available. It is based on
another resource (a semantic network), also not publicly available. The reported
evaluation of the dictionary is performed on reviews from the same domain as
those used to generate the dictionary and reaches 0.79 in terms of accuracy
computed on the level of reviews. Word-level accuracy is likely lower.
For languages other than Polish, other approaches have been proposed. For example, Rill et al. (2012) introduced a method to derive sentiment phrases from reviews. The authors generated lists of opinion-bearing phrases with opinion values in a continuous range between −1 and 1.

2.2. Manually created dictionaries

2.2.1. plWordNet Emo

The only publicly available, manually created Polish-language sentiment dictionary is the one described in (Zaśko-Zielińska et al. 2015). The dictionary, as described in the article, is annotated at the level of lexical units from plWordNet. It contains sentiment annotations for 32 thousand lexical units: 19.6 thousand nouns and 11.6 thousand adjectives. It has five polarity labels (strong positive, weak positive, neutral, weak negative, strong negative).
One interesting property of this dictionary is that it allows a word to have multiple, possibly even opposing, sentiment labels depending on its sense. This is done by linking the units to their synsets, determined by word senses and the synonymy relation. In most cases this turns out to be sufficient. However, about 5% of the lexical units were marked as ambiguous: even within their senses as determined by the synonymy relation and synsets, annotators identified various sentiment polarities.
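The sense-level design can be illustrated with a toy lookup table. The lemmas, sense numbers and labels below are invented for illustration; the actual resource keys sentiment to plWordNet lexical units.

```python
# Invented miniature in the spirit of plWordNet Emo: sentiment is keyed
# by (lemma, sense number), not by the bare lemma.
SENSE_SENTIMENT = {
    ("zamek", 1): "neutral",           # 'castle'
    ("zamek", 2): "neutral",           # 'zipper'
    ("ciepły", 1): "weak positive",    # 'warm' (temperature)
    ("ciepły", 2): "strong positive",  # 'warm' (of a person)
}

def word_sentiments(lemma):
    """All labels a lemma can take across its senses; more than one
    distinct label means disambiguation is required before lookup."""
    return {label for (lem, _), label in SENSE_SENTIMENT.items() if lem == lemma}

def lookup(lemma, sense):
    """Lookup needs an already-disambiguated sense: this is exactly the
    word sense disambiguation bottleneck discussed below."""
    return SENSE_SENTIMENT.get((lemma, sense))
```

A lemma such as `ciepły` maps to different labels depending on the sense chosen, which is why the dictionary cannot be applied to raw text without sense disambiguation.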
The usage of this dictionary to label the text (assign word sentiments) re-
quires determining senses of ambiguous words by analyzing sentence- or even
document-level context of their appearances. In automated analysis, a word
sense disambiguation algorithm should decide which sense has been used in a
given sentence. Unfortunately, assigning correct plWordNet senses to word occurrences is a difficult task. Słowosieć (the Polish WordNet) suffers from a notoriously high number of senses which are difficult to distinguish. It is often difficult to imagine the actual uses of some of the rarer senses, even for a human.
The practical usage of the sentiment dictionary based on plWordNet in automated scenarios, such as software applications and scripts, is therefore limited by the quality of available solutions to the word sense disambiguation problem.
The creators of plWordNet provide a tool for automated word sense disambiguation. The algorithm is inspired by PageRank-style approaches and has been described in (Kędzia et al. 2015). The tool is available via either a web interface2 or a REST API. In the latter case, example client scripts in Java and Python demonstrate how to use it for automated processing.3 The tool takes Polish-language texts as input and returns word senses as recognized in actual sentences. It requires that morphosyntactic analysis and disambiguation have been carried out previously.
The reported accuracy depends on algorithm options and part-of-speech
(POS). The worst precision was reported for verbs (up to 0.37). Noun senses
tended to be recognized with somewhat better precision (up to 0.58). Compre-
hensive evaluation may be found in Tables 6–10 in Zaśko-Zielińska et al.
(2015).
Overall, performance at such a level may seriously degrade the quality of a sentiment analysis solution designed using the dictionary.

2.3. Sentiment dictionaries: Overview of current state

The state of dictionaries can be summarized as follows. There are two available resources, each with some drawbacks. The first is automatically generated, with lexeme-level sentiment assignments; it suffers from relatively low quality of sentiment labels, but is usable for automated text processing. The second is a dictionary with manual, word sense-level sentiment assignments. Its quality is unquestionably better, but it is less feasible for automated use due to the word sense disambiguation issues described previously.
Possibly the best choice for real-world usage is to combine both dictionaries into one list, making sure that the sentiment labels correspond to the most frequent sense of each word. This approximates lexeme-level annotation.

2
http://ws.clarin-pl.eu/wsd.shtml
3
http://nlp.pwr.wroc.pl/redmine/projects/nlprest2/wiki

3. Phrase level: Sentipejd

This section describes a tool for rule-based phrase-level sentiment computation. Although the principles behind this approach are rather simple, to the best of our knowledge the only such publicly available tool for Polish is Sentipejd (Buczyński and Wawer 2008a; Buczyński and Wawer 2008b).
Access to this tool is possible via a Multiservice web API.4 Example client
scripts that demonstrate how to use it for automated processing are available via
Git.5
The service provides convenient integration with prerequisite processing steps such as morphosyntactic analysis with disambiguation. In order to use the Sentipejd module, one needs to run a POS tagger first, as the tool requires morphologically disambiguated text as its input.
Sentipejd is a sentiment-oriented grammar for Spejd, a multi-purpose shallow parsing engine (Przepiórkowski and Buczyński 2007). Sentipejd was developed to deal with expressions of sentiment spanning multiple words. Specifically, it is capable of detecting multi-word phrases and inverting or erasing their sentiments according to a provided set of rules, processed as a cascade.
Sentipejd requires a sentiment dictionary to pre-process data at the individual-word level. The dictionary used in Spejd's Multiservice version is a manually created list of over 3 thousand words with labeled sentiments (positive and negative). It has not been listed in the previous section as it is not publicly available.
Let us examine Sentipejd's history and its original intended usage. The rules were developed to increase the accuracy of product review classification. Their output, aggregated with word-level dictionary analysis (aggregated information about the frequencies of negative and positive words), may then be used as input to a classification algorithm which decides whether a review is positive, neutral or negative (or uses any other more or less fine-grained categorization).
The most detailed presentation of rule types in Sentipejd can be found in Buczyński and Wawer (2008b). In that paper, Spejd formalism rules are provided to exemplify each type. Here we simply list and discuss the rule types. They are as follows:

– Affirmation – emphasis of sentiment (e.g. ‘very good’). Can be used to double-count such sentiment expressions.

4
http://multiservice.nlp.ipipan.waw.pl/
5
http://git.nlp.ipipan.waw.pl/multiservice/clients
– Negation – reversal of sentiment polarity. For example, polecam ‘I recommend’ should remain positive, while nie polecam ‘I do not recommend’ switches positive into negative.
– Nullification – lack of a certain property, which erases existing sentiment regardless of its value. For example nie mam zastrzeżeń ‘I have no objections’ or zero wad ‘zero defects’.
– Limitation – a special type of polarity reversal. A limiting expression tells us that an expression of positive or negative sentiment has only a very limited extent, therefore hinting that the general sentiment of the review is the opposite of the expression. Examples: jedyny problem ‘the only problem’, jedyna zaleta ‘the only advantage’.
– Negative Modification – an adjective of negative sentiment preceding a noun of positive sentiment, for example koszmarna jakość ‘nightmarish quality’.

There is one more rule type possible in Sentipejd, not listed above: a simple listing of all multi-word expressions that carry sentiment as a whole, while none of their words carries any sentiment when considered individually. This scenario is similar to idiomatic expression detection.
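The flavour of such a rule cascade can be sketched in a few lines. This is a toy illustration of the Negation, Nullification and Limitation rule types over (word, sentiment) pairs, not the actual Spejd formalism.

```python
# Toy Sentipejd-style cascade. Tokens are (word, sentiment) pairs with
# sentiment in {-1, 0, 1}; each rule merges a trigger word with the
# token immediately after it. Trigger lists are illustrative.
NEGATORS = {"nie"}               # 'not'
NULLIFIERS = {"zero"}            # as in zero wad 'zero defects'
LIMITERS = {"jedyny", "jedyna"}  # 'the only'

def apply_cascade(tokens):
    out, i = [], 0
    while i < len(tokens):
        word, sent = tokens[i]
        if word in NEGATORS and i + 1 < len(tokens):
            nxt, s = tokens[i + 1]
            out.append((word + " " + nxt, -s))  # Negation: reverse polarity
            i += 2
        elif word in NULLIFIERS and i + 1 < len(tokens):
            out.append((word + " " + tokens[i + 1][0], 0))  # Nullification: erase
            i += 2
        elif word in LIMITERS and i + 1 < len(tokens):
            nxt, s = tokens[i + 1]
            out.append((word + " " + nxt, -s))  # Limitation: hint at the opposite
            i += 2
        else:
            out.append((word, sent))
            i += 1
    return out
```

For example, the cascade turns the positive polecam into the negative phrase nie polecam, and erases the negative sentiment of wad inside zero wad.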
The application of the rules in Buczyński and Wawer (2008b) was demonstrated to increase review classification accuracy by up to four percent, depending on the setting. The effect was more noticeable in the case of wordier, longer product reviews.
To sum up, Sentipejd improves over word-level, dictionary-only analysis by extending it into phrase-level sentiment analysis. It is a highly versatile tool that can quite easily be adapted to many domains by changing or fine-tuning its sentiment lexicon. The rules, as they are now implemented, are not overly lexicalized and should be easy to re-use.
Its main drawback is the lack of support for sentence-level sentiment and for handling more complex phrasal structures. In particular, it is not clear how to combine sentiment from multiple sources: for example, from a negative word and a positive phrase which both occur in the same sentence. Which takes precedence?
This boils down to the fact that the grammars which power Spejd are shallow, and often there is no single, usable syntactic link between such phrases and words (which would allow a rule to decide which takes precedence). It is hard, if not entirely impossible, to enumerate all possible cases using shallow grammar rules such as Sentipejd's. The difficulty is compounded by the relatively free word order of Polish sentences.

Solutions have been proposed to handle this problem in modern-day sentiment analysis (also for Polish), as presented in Section 4. The next section presents tools suitable for computing the sentiment of any syntactically coherent phrase, provided that it can be formed from a sentence's dependency parse graph.

4. Phrase and sentence level: Recurrent neural networks

4.1. Introduction

This section presents tools that are suitable not only for all previously mentioned tasks (word-level and phrase-level sentiment analysis) but also for sentence-level analysis. As previously mentioned, these tools can predict the sentiment of almost any phrase, including phrases much more complex than those handled by Sentipejd. The only restriction on phrase type is syntactic: phrases have to be formed from sub-trees of a sentence's dependency tree.
There are several specific features that all tools presented here share.
First of all and most broadly, they are all based on deep learning. This no-
tion is used to describe multi-layer neural networks that became heavily popular
in natural language processing after 2013.
Arguably, their popularity can be explained at least partially by the seminal paper by Mikolov et al. (2013). The study demonstrated interesting regularities in word representations learned using neural networks. The representations, called word embeddings, proved very promising in solving word analogy tasks using simple arithmetic. In the famous example, the arithmetic operation on the vectors of three words, "King − Man + Woman", results in a vector very close to "Queen". This indicates that words with similar meanings tend to have similar word embedding vectors. It quickly became obvious that such word representations, combined with multi-layer neural networks trained on very large corpora, allow researchers to match (or in some cases surpass) the existing state of the art in many areas of natural language processing.
Second, within the broad area of deep learning, all tools presented in this section are based on recurrent neural networks (RNNs) and their sub-types, such as long short-term memory networks (LSTMs) (Hochreiter and Schmidhuber 1997). RNNs are currently fundamental building blocks not only for sentiment analysis but for many other language processing tasks, including machine translation, language modeling and question answering.
Third, as is the case with most natural language processing tools of this type, they use pre-trained word embedding vectors to represent a word's meaning.

Fourth, the tools share the same task. Given the dependency tree of a sentence, the goal is to provide the correct sentiment for each possible sub-tree (phrase); phrases correspond to sub-trees of the dependency parse tree. By convention, as discussed in Section 4.2, sentiment values are assigned to whole phrases (and, in some cases, whole sentences), regardless of their type. Typically, applications compute sentiment recursively, starting from leaves and smaller phrases, then expanding to larger phrases and taking into account the sentiment values already computed for their nested sub-phrases. This is equivalent to recursively folding the tree in a bottom-up fashion; sentence-level sentiment is then the value produced by the predictive model after folding the whole sentence.
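The bottom-up folding idea can be illustrated with a toy recursive function. The combination rule below (a sign flip triggered by nie plus clipping to {−1, 0, 1}) is purely illustrative; the actual systems learn their combination functions from data.

```python
def fold(node):
    """Fold a dependency sub-tree bottom-up into a phrase sentiment in
    {-1, 0, 1}. A node is (word, own_sentiment, children)."""
    word, sent, children = node
    child_scores = [(child[0], fold(child)) for child in children]
    if any(w == "nie" for w, _ in child_scores):
        sent = -sent  # a negation child flips the head's polarity
    total = sent + sum(s for w, s in child_scores if w != "nie")
    return max(-1, min(1, total))
```

For nie polecam ('I do not recommend'), folding the tree rooted at the positive polecam with a nie child yields a negative sentence-level sentiment.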
The next section briefly discusses the resource used to train the neural network models and describes the output data format.

4.2. Polish Sentiment Treebank

The dataset used to train the neural network models described in Section 4.3 is a dependency treebank with sentiment annotations. It is available for download from <http://zil.ipipan.waw.pl/TreebankWydzwieku>.
The inspiration for the Polish Sentiment Treebank comes from English, where multiple tools and approaches emerged after the publication of the Stanford Sentiment Treebank (Socher et al. 2013). This English reference dataset contains 9645 sentences from the movie review domain. It has been widely used for evaluating multiple deep learning approaches, from simple recursive neural networks to, more recently, complex Tree LSTMs (Tai et al. 2015a).
For each sentence, its overall sentiment (neutral, positive or negative), as well as the sentiments of each sub-phrase (sub-tree) and each leaf word, have been assigned by a linguist.
The sentiment annotation of each token corresponds to the overall sentiment of the whole phrase under it and included within it. Specifically:

– For every leaf token, the sentiment field describes the sentiment of that single word or token.
– For every non-leaf token (a node with a non-empty set of children), the sentiment field describes the sentiment of the whole phrase formed by the sub-tree rooted at this token (that is, this token and all tokens below it).
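This convention can be made concrete with a small sketch: every sub-tree of the dependency tree is an annotation target. The tree encoding below is hypothetical, not the treebank's actual file format.

```python
# A node is (word, children); this encoding is invented for illustration.
def phrase(node):
    """Collect the words of the phrase rooted at `node`: the span that a
    non-leaf sentiment annotation refers to."""
    word, children = node
    words = [word]
    for child in children:
        words.extend(phrase(child))
    return words

def annotation_targets(node):
    """Every sub-tree is an annotation target with its own sentiment label."""
    word, children = node
    targets = [phrase(node)]
    for child in children:
        targets.extend(annotation_targets(child))
    return targets

# nie polecam tego filmu ('I do not recommend this film'), toy parse
tree = ("polecam", [("nie", []), ("filmu", [("tego", [])])])
```

For this four-token toy parse, there are four annotation targets: the whole sentence, the single tokens nie and tego, and the phrase tego filmu.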

Together, the current version of the treebank consists of 6555 sentiment-annotated phrases from the parse trees of 1200 sentences. It has been built from two parts. The first part is composed of the subset of the Składnica treebank containing sentences with at least one sentiment-bearing word; it consists of 235 sentences (1915 sentiment-annotated multiword phrases). The second part consists of 965 sentences from a product review corpus available from <http://zil.ipipan.waw.pl/OPTA/>, with 4640 sentiment-annotated multiword phrases.
The dataset is the first Polish language corpus with fully labeled parse trees
that allows for analysis of compositional effects of sentiment in Polish. It cap-
tures complex sentiment-related phenomena at the intersection of syntax and
semantics. The dataset enables us to train compositional models that are based
on supervised and structured machine learning techniques. It is especially well-
suited for recurrent neural networks.
The dataset was used extensively during the PolEval 2017 competition as
training and test data for participating systems. It provided the opportunity to
design, train and compare such algorithms on the Polish language data sets.
Figure 1 presents one example sentence from the Polish Sentiment Treebank.6 The figure also illustrates the dependency parse structure. Only selected phrase-level (sub-tree) sentiment annotations are shown. The sentence is a mixture of positive and negative sentiments, with the negative taking precedence over the positive: the positive sentiment ('a year ago was really ok') is overridden by the more recent negative experience ('I do not recommend'). Clearly, understanding the overall sentiment composition requires capturing the time frame of both experiences.

Figure 1. An example sentence from the Polish Sentiment Treebank.

The next section discusses systems submitted to the sentiment analysis subtask
of the Poleval 2017 competition.

6
The English translation of the sentence is as follows: ‘Generally I do not recommend, but a year
ago was really ok’.

4.3. Overview of PolEval Systems

Among the multiple systems submitted to the PolEval 2017 sentiment competition (Wawer and Ogrodniczuk 2017), three implementations are publicly available; they are presented in the next sections.
The architectures used by the three presented solutions, Tree-LSTM-NR (Ryciak 2017), "Alan Turing climbs a tree" (Żak and Korbak 2017) and Tree-LSTM-ML (Lew and Pęzik 2017), are all variants of the prototypical Tree LSTM introduced in (Tai et al. 2015b). The Tree LSTM is widely considered the best representative of RNNs suitable for sentence parse data, in the form of both constituency and dependency trees.
As in many other deep learning architectures, words are represented by word embedding vectors. A typical starting point in these approaches is to begin at the bottom leaves and move upwards, averaging over the hidden and cell states of all of a node's children. The averaged states from the sub-phrases below are then used in computations at higher levels, along with the direct children of a given node. Sentiment values are then produced at any level for each non-terminal node. They can also be passed upwards as one of the variables, along with the hidden and cell states.
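A scalar (one-dimensional) sketch of the Child-Sum Tree-LSTM cell may make the handling of child states concrete. It follows the formulation of Tai et al. (2015), which sums rather than averages child hidden states; the weight values below are arbitrary illustrative constants, not trained parameters.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Scalar weights: arbitrary illustrative constants, not trained values.
W_i, U_i, b_i = 0.5, 0.5, 0.0   # input gate
W_f, U_f, b_f = 0.5, 0.5, 1.0   # forget gates (one per child)
W_o, U_o, b_o = 0.5, 0.5, 0.0   # output gate
W_u, U_u, b_u = 0.5, 0.5, 0.0   # candidate update

def tree_lstm_node(x, children):
    """One Child-Sum Tree-LSTM step. `x` is the node's (scalar) word
    embedding; `children` is a list of (h, c) pairs from folded sub-trees."""
    h_tilde = sum(h for h, _ in children)  # sum of child hidden states
    i = sigmoid(W_i * x + U_i * h_tilde + b_i)
    o = sigmoid(W_o * x + U_o * h_tilde + b_o)
    u = math.tanh(W_u * x + U_u * h_tilde + b_u)
    # one forget gate per child, conditioned on that child's hidden state
    c = i * u + sum(sigmoid(W_f * x + U_f * h_k + b_f) * c_k
                    for h_k, c_k in children)
    h = o * math.tanh(c)
    return h, c
```

Folding a tree then amounts to calling `tree_lstm_node` at the leaves (with an empty child list) and passing the resulting (h, c) pairs upwards to the parents.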
The conditions of the PolEval competition allowed up to three submissions from the same author or team. This usually served as a means to try out different parameters or settings of the predictive models. The next sections contain discussions of each of the contributed systems and details on where to download the implementations. Interestingly, each of the systems was built using a different deep learning framework (Theano, TensorFlow and PyTorch).

4.3.1. Tree-LSTM-NR (Ryciak 2017)

Tree-LSTM-NR works on the principle of propagating information along the branches of a tree. The algorithm begins at the leaves and moves upwards, aggregating information in each node from its subtrees. In the end, the network reaches the root of the tree and produces an output: the prediction of sentence-level sentiment. At each non-terminal node it produces an output for the corresponding sub-tree.
Tree-LSTM-NR was submitted in three variants, which differed only in the training method:
– predictions 1 and 2: During the training phase, the neural network model (epoch number) was selected as the one which performed best on validation data. The splits of sentences into train and validation sets in predictions 1 and 2 differed slightly in size and selection method.
– predictions 3: The length of the training phase was determined using an early stopping condition set to 5 epochs, measured on test data. The model was then re-trained on the combined train and validation sets for 15 more epochs.

Tree-LSTM-NR used dependency parse information and sentiment annotations for each phrase (including single tokens and whole sentences). Additionally, 300-dimensional word2vec vectors trained on a combination of Wikipedia and the National Corpus of Polish were used for word representation.
The tool, implemented in Theano, is available for download.7

4.3.2. Tree-LSTM-ML (Lew and Pęzik 2017)

Like all other models in the PolEval sentiment subtask, the model was created using an LSTM recurrent neural network architecture in a tree topology. For training, the authors used syntactic dependency trees without dependency labels. The model was built as follows:
– Leaves of the network are embeddings of the leaves of the dependency tree.
– Subtrees of the network are embeddings of subtrees of the dependency tree; they are created by combining the embedding of the root of a dependency subtree one-by-one with its children (from left to right) using the LSTM equations.
– The embedding of each subtree is projected onto its sentiment label.

The tool, implemented in TensorFlow, is available for download.8

4.3.3. Alan Turing climbs a tree (Żak and Korbak 2017)

The system used in this solution was a Child-Sum Tree-LSTM deep neural network (as described by Tai et al. 2015a), fine-tuned for dealing with morphologically rich languages. Fine-tuning included applying a custom regularization technique (zoneout, described by Krueger et al. 2016, and further adapted for Tree-LSTMs) as well as using pre-trained word embeddings enhanced with subword information (fastText, as in Bojanowski et al. 2016).

7
https://github.com/norbertryc/poleval
8
https://github.com/michal-lew/tree-lstm
The authors experimented with two variants: summing the children's hidden and cell states (sum-child in the paper) and choosing one child to pass its hidden and cell state upwards (choose-child in the paper). They also experimented with using either word2vec or fastText (Bojanowski et al. 2016) for word representation. These choices are reflected in the three variants submitted for evaluation.
The system did not achieve good results in the official evaluation phase. As
the authors claimed, this was due to a mistake in the code. After fixing the prob-
lematic issue, the system achieved the best measured accuracy of 0.807 on the
test data set.
The tool implemented in PyTorch is available for download.9

4.4. Quality measurements

Given files with the results of dependency syntax analysis for each sentence, participants of PolEval 2017 were to provide sentiment labelings: for leaves, the sentiment values of the specific words; for non-leaf tokens, sentiment values reflecting the sentiment of the overall phrase formed by the sub-tree rooted at this token (that is, this token and all tokens below it). The labelings obtained from participants were compared with the organizers' test set of gold (reference) annotations.
To evaluate the performance of the systems submitted to Poleval, their mi-
cro accuracy has been calculated against the provided test data. Micro accuracy
means that each sub-tree has been taken into account when calculating scores. It
is a global sum of correctly labeled sentiment scores, over cases belonging to
each class.10
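The difference between micro and macro accuracy can be illustrated with a short Python sketch (toy labels, not PolEval data):

```python
from collections import defaultdict

def micro_accuracy(gold, pred):
    """Global proportion of correctly labeled sub-trees, over all cases pooled."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)

def macro_accuracy(gold, pred):
    """Accuracy per class (positive, neutral, negative), then the class average."""
    correct, total = defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        total[g] += 1
        correct[g] += (g == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

gold = [1, 1, 0, 0, 0, 0, -1]   # sentiment labels of sub-trees
pred = [1, 0, 0, 0, 0, 0, -1]
print(round(micro_accuracy(gold, pred), 3))  # 6/7 -> 0.857
print(round(macro_accuracy(gold, pred), 3))  # (0.5 + 1.0 + 1.0)/3 -> 0.833
```

With imbalanced classes the two measures can diverge substantially, which is why micro accuracy weighs each sub-tree, not each class, equally.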
The official results measured at the PolEval competition are presented in
Table 1.
The competing systems were heavily tuned to obtain the highest possible prediction quality, using regularization techniques such as dropout, zoneout or L2. The winning system, Tree-LSTM-NR, was a Tree-LSTM variant with information from dependency labels and word2vec (Mikolov et al. 2013) embeddings generated for Polish from Wikipedia and the National Corpus of Polish (www.nkjp.pl).

9 https://github.com/tomekkorbak/treehopper
10 Micro is opposed to macro accuracy, where for each class (positive, neutral, negative) its accuracy is computed separately, to later compute the average of class scores.

Table 1. Evaluation: micro accuracy of each system, measured at the official PolEval 2017 evaluation.

System Variant Accuracy


Tree-LSTM-NR (predictions 2) 0.795
Tree-LSTM-NR (predictions 3) 0.795
Tree-LSTM-NR (predictions 1) 0.779
Tree-LSTM-ML (test_file) 0.768
Alan Turing climbs a tree (fast_emblr) 0.678
Alan Turing climbs a tree (slow_emblr) 0.671
Alan Turing climbs a tree (ens) 0.670

4.5. Neural networks for phrase and sentence-level sentiment: A summary

It is difficult to assess which system to recommend. Accuracy can be assumed to be similar across all tests. After fixing the problematic issue, “Alan Turing climbs a tree” may perhaps have a slight edge. From the standpoint of the underlying deep learning frameworks, selecting solutions based on TensorFlow and PyTorch is more advisable, since Theano’s life cycle has ended: it will not be further developed or maintained.
None of the systems is currently deployed via a web interface or available through an API. Users need to download the tools and configure them on their own machines. It is worth noting that solutions of this type, namely deep multilayer neural networks, have different requirements than the applications described previously. Technically, the speed of computation is raised significantly by using graphics processing units (GPUs). Such setups might be especially advisable for medium-sized and large corpora.

5. Finding opinion targets

Opinion target extraction is the task of recognizing the words towards which an opinion (sentiment) is expressed. Typically, in the domain of product reviews, these are aspect (also called attribute) terms related to the reviewed entity, or the entity itself. For instance, in the domain of photo cameras, opinions may be expressed towards a camera itself (the entity) but also its aspects, such as the lens and the battery. Various aspects may have different or even opposing attached sentiments: someone may like the lens but dislike the battery.

Opinion target words are usually domain-dependent. In product reviews, opinions are expressed about properties of products, specific to each product type. For example, when reviewing perfumes, people express opinions about various aspects of the smell, its durability, the bottle, and so on. When reviewing phones, they comment on aspects such as battery time, the screen, or application performance. All these constitute opinion targets, and as these examples show, the lists vary between domains. Similarly, in social media, the targets of opinions depend on the current topic under discussion.
A description of experiments on automated opinion target extraction can be found in Wawer (2015). The work was conducted on a corpus of product reviews downloaded from one of the biggest Polish opinion aggregation websites, for two types of products: clothes and perfumes. The corpus contained annotations of opinion target and sentiment words taken from earlier studies. After multiple steps, the author arrived at 1737 annotated sentence-level pairs of sentiments and opinion targets.
The first part of the research focused on identifying possible dependency tree paths between sentiment words and their targets. To start this process, the sentences were parsed using the MaltParser dependency parser and a model for the Polish language (http://zil.ipipan.waw.pl/PDB/PDBparser). A dependency tree path in this sense is a description of the dependency labels and tokens (using their part-of-speech information) that are encountered when traversing a parse tree from a sentiment word to its target.
Such paths can be described in multiple ways. The method assumed in Wawer (2015) was a rule-based formalism similar to Stanford’s tools Tregex and Semgrex (Levy and Andrew 2006). It allows for addressing attributes of tokens (for example, POS and lemma) and for expressing the direction of dependencies. In this pattern matching system, tokens are enclosed in [..] and dependency relations are marked as < or >, depending on the direction. The label type is specified after the dependency relation operator. The pattern matching system used in Wawer (2015) also allows one to specify that an encountered token belongs to a given POS type (e.g. [pos:verb] to specify verbs). Patterns should be read from left (the first token on the left being the sentiment word) to right (the last token being the target word).
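How such a pattern might be evaluated over a parsed sentence can be sketched as follows. This is an illustrative re-implementation, not the actual pattern-matching engine; in particular, the interpretation of < as “move from a dependent to its governor” (and > as the opposite) is an assumption made for the sketch:

```python
# Toy dependency-parsed sentence: "dobra bateria" ("good battery").
# Each token: id, lemma, pos, head (0 = root), deprel (label to its head).
sentence = [
    {"id": 1, "lemma": "dobry",   "pos": "adj",   "head": 2, "deprel": "adjunct"},
    {"id": 2, "lemma": "bateria", "pos": "subst", "head": 0, "deprel": "root"},
]

def match_path(tokens, start_id, pattern):
    """Follow a path like [('<', 'adjunct', 'subst')] from the token start_id.

    '<' = move to the governor (head) of the current token, checking that
          the current token attaches to it with the given dependency label;
    '>' = move to a child attached with the given label.
    Returns the id of the final (target) token, or None if the path fails.
    """
    by_id = {t["id"]: t for t in tokens}
    current = by_id[start_id]
    for direction, label, pos in pattern:
        if direction == "<":
            if current["deprel"] != label or current["head"] == 0:
                return None
            nxt = by_id[current["head"]]
        else:  # '>'
            children = [t for t in tokens
                        if t["head"] == current["id"] and t["deprel"] == label]
            if not children:
                return None
            nxt = children[0]
        if nxt["pos"] != pos:
            return None
        current = nxt
    return current["id"]

# Pattern [pos:adj] <adjunct [pos:subst], starting from the sentiment adjective:
target = match_path(sentence, start_id=1, pattern=[("<", "adjunct", "subst")])
print(target)  # 2, i.e. "bateria" -- the opinion target
```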
The second part of the work deals with using such path descriptions alone for finding relations between sentiments and their potential targets. Using annotated pairs of sentiments and their targets, a set of possible paths has been generated. Table 2 presents only the top five most frequently encountered patterns, along with their precision (P) and the numbers of correct (C) and erroneous (E) pair matchings. A more extensive list of such patterns can be found in Wawer (2015).

Table 2. The top five most frequent dependency patterns: precision (P),
number of correct (C) and erroneous (E) matches.

Path P C E
[pos:adj] <adjunct [pos:subst] 0.886 396 51
[pos:fin] >comp [pos:prep] >comp [pos:subst] 0.814 48 11
[pos:adj] >adjunct [pos:prep] >comp [pos:subst] 0.906 48 5
[pos:adj] <adjunct [pos:subst] >adjunct [pos:prep] >comp [pos:subst] 0.333 16 32
[pos:adj] <pd [pos:fin] >subj [pos:subst] 0.909 40 4

The most frequent pattern is simply a sentiment-bearing adjective that modifies its target, a noun (as in the pairs “good battery”, “poor lens”). The pattern has relatively high precision, almost 0.9. The other patterns, except for the fourth one, also have high precision. Unfortunately, the precision of some less frequent patterns reported in Wawer (2015) is low.
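The precision values in Table 2 are simply the ratio of correct matches to all matches:

```python
# Precision of a dependency pattern, as reported in Table 2: P = C / (C + E).
def precision(correct, erroneous):
    return correct / (correct + erroneous)

# The most frequent pattern, [pos:adj] <adjunct [pos:subst]:
print(round(precision(396, 51), 3))  # 0.886

# The fourth, low-precision pattern:
print(round(precision(16, 32), 3))   # 0.333
```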
There are other serious issues with using such patterns as the only method of identifying opinion targets. For example, it is not clear how to proceed if, for one particular sentiment word, multiple patterns match, each pointing to a different potential target word. It is possible to circumvent this problem by ranking the dependency patterns and taking the one with the highest overall accuracy, although the solution obtained would not be a very accurate one.
These issues indicate that in order to reach high recall (which means taking into account low-precision dependency paths as well) while capturing only correct sentiment–target occurrences (preserving high precision), one has to propose some other tool.
The solution proposed in Wawer (2015) is to take such paths as its starting point and employ a machine learning method, which examines the context of each dependency rule occurrence and decides whether the potential target word should be matched with its sentiment word or not. In other words, the proposed method is to apply a rule matching engine and use its output as input features for a machine learning classifier. The method applied for this task is Conditional Random Fields (CRF; Lafferty et al. 2001), a default choice for natural language sequence labeling tasks in the era prior to deep learning.
In this approach, identifying target words is treated as part of labeling a sequence. The input is formed from multiple features, such as information about word sentiment (S), dependency labels (dep) and parts-of-speech (POS); markers are also placed at tokens that are pointed to by dependency rules for potential target extraction. This has been tried in two variants, by either marking that any rule matches for a token (ruleAny) or by indicating the particular rule that matched (ruleID). Experiments have also been carried out to investigate the influence of adding lexical information (lemma).
Table 3 presents the results of various input feature spaces for the CRF predictions of target words. The results are average values of precision (P), recall (R) and F1 measured in 10-fold cross-validation. A more comprehensive version of the table can be found in Wawer (2015).

Table 3. Target word extraction using the CRF algorithm and various input feature combinations.

Features P R F1
lemma 0.586 0.33 0.421
lemma + POS + dep 0.553 0.466 0.505
POS + dep 0.548 0.426 0.478
POS + dep + ruleAny 0.783 0.891 0.833
POS + dep + ruleAny + ruleID + S 0.829 0.889 0.857

The main conclusion is that models without lexical features (lemma) outperformed ones with lexical features by a large margin. The value of this result is that it carries a hope of domain independence: machine learning from syntactic structures appears to be much more universally applicable than lexicalized variants, tied to specific words that are usually attached to some domain. Generally, the reported CRF results (especially for the models that use rule-based features) leave little room for further improvement due to their high overall performance.
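The F1 values in Table 3 are the harmonic mean of precision and recall; recomputing them from the reported P and R reproduces the table to within rounding (the reported scores are averages over folds, so the last digit can differ by ±0.001):

```python
# F1 as the harmonic mean of precision (P) and recall (R).
def f1(p, r):
    return 2 * p * r / (p + r)

# Best feature set (POS + dep + ruleAny + ruleID + S), reported F1 = 0.857:
print(round(f1(0.829, 0.889), 3))  # close to the reported value

# POS + dep + ruleAny, reported F1 = 0.833:
print(round(f1(0.783, 0.891), 3))
```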
The best performing feature set, namely the last line in Table 3, has been implemented in the Python language and made publicly available as a tool called OPFI. The package requires that the input be dependency-parsed and pre-annotated with sentiment at the word level. The first requirement can be fulfilled using the MaltParser dependency parser and the models available for download from <http://zil.ipipan.waw.pl/PDB/PDBparser>. The second requirement, word-level sentiment annotation, can be fulfilled using one of the dictionaries described in Section 2. Using rule-based sentiments is also possible, but not advisable: sentiment polarities are not relevant for opinion targets. What matters is the placement of sentiment-bearing words, regardless of their phrase-level or contextual polarity modifications.

Input for OPFI is in the tab-separated CoNLL format. Tokens are placed in rows, and their properties, such as POS, are separated by tabs. Sentences are separated by a single empty row. More descriptions of the opinion finding tool OPFI (along with some unsuccessful deep learning experiments) can be found in Wawer (2016).
The opinion finding tool OPFI is available for download.11
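Reading input of this shape is straightforward; the sketch below splits tab-separated rows into sentences on empty lines. The column layout in the sample is an illustrative guess, not OPFI’s exact column specification:

```python
def read_conll(text):
    """Split tab-separated CoNLL-style input into sentences of token rows.

    Tokens are rows with tab-separated properties; sentences are
    separated by a single empty row.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():          # an empty row ends a sentence
            if current:
                sentences.append(current)
                current = []
        else:
            current.append(line.split("\t"))
    if current:                       # flush the final sentence
        sentences.append(current)
    return sentences

sample = "1\tdobry\tadj\n2\tbateria\tsubst\n\n1\tkiepski\tadj\n2\tobiektyw\tsubst\n"
sentences = read_conll(sample)
print(len(sentences))        # 2
print(sentences[0][1][1])    # bateria
```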

6. Comparison with English

This section contains a short comparison with tools available for English, the language selected here as a reference since it is without doubt the one with the most available tools and resources. By comparing with English, one can clearly see what is missing for Polish. The review in this section is by no means comprehensive and focuses only on major areas, as a more detailed comparison would fall outside the scope of this article.
The first area, entirely missing for Polish and closely related to sentiment analysis, is subjectivity detection. In English, the development of this subjectivity-related field has been facilitated by corpora such as MPQA (Deng and Wiebe 2015) and the IMDB subjectivity dataset (Pang and Lee 2004). One of the available tools is OpinionFinder (Wilson et al. 2005). It offers not only sentiment analysis in the sense of polarity recognition, but also subjectivity analysis: automatically identifying when opinions, sentiments, speculations, and other private states are present in text. More recent applications offering sentence-level subjectivity classification are usually based on bidirectional recurrent neural networks.12
Another area not addressed for Polish is document-level sentiment classification and, more generally, the problem of computing the sentiment of longer utterances. In English, the available tools are usually based on the machine learning paradigm, with models often trained on the IMDB movie review corpus (Pang and Lee 2004; Pang et al. 2002). This data set serves as an example in multiple machine learning and sentiment analysis tutorials that demonstrate how to build a document-level sentiment classification tool, including the well-known deep learning package Keras.13

11 http://zil.ipipan.waw.pl/OPTA
12 For example https://github.com/fractalego/subjectivity_classifier
13 The source code of the example is available for download from: https://github.com/keras-team/keras/blob/master/examples/imdb_lstm.py

Challenges facing the development of tools related to sentiment analysis in Polish, as compared to English, fall into at least two categories. The first is the relatively complex syntax and difficult morphosyntactics: processing Polish texts to compute their sentiment often requires a more complex NLP pipeline, with morphosyntactic disambiguation, than in the case of English. The situation has improved with the advent of deep learning, since such solutions often work in an end-to-end manner. The second issue, compared to English, is the size of the community of researchers dedicated to sentiment analysis problems. In Poland, the total number of labs dealing systematically with computational linguistics and natural language processing is probably fewer than five. Sentiment analysis, which is just one of multiple areas in NLP, is thus within the research scope of a small group of people. For English, the community of researchers is spread across multiple English-speaking countries and is larger by several orders of magnitude. This observation can be extrapolated also to the amounts of research grants.

7. Summary

The article presents a review of publicly available tools and selected resources for sentiment analysis in Polish, covering word-level, phrase-level and sentence-level analysis. Tools and resources known from the literature but not publicly available were mentioned only briefly or skipped entirely. The article reflects the state of knowledge of its author as of mid-2018. It is entirely possible that we were not aware of some items that fulfill our criteria and should have been described here, or that were under preparation at the time of writing and therefore not yet indexed by search engines and scholarly databases.
The list of applications and resources described more thoroughly includes:
two publicly available dictionaries, one rule-based phrase-level tool, three deep
neural network solutions for phrase-level and sentence-level analysis. Finally, it
also contains a description of an application for matching sentiments with their
targets.
The main focus was to present available software applications. Sentiment dictionaries were the only resources treated in a manner similar to applications, because of their usability: they make it possible to compute sentiment with very little programming effort. In order to develop a full-fledged dictionary-based sentiment analysis solution in Polish, one would need to supplement them with a lemmatizer (morphosyntactic disambiguator) such as Pantera (Acedański 2010), integrated in the Multiservice API14, or WCRFT (Radziszewski 2013), integrated in the REST API of CLARIN-PL services.15 In the case of plWordNet Emo, one also needs to apply a solution for word sense disambiguation.
The presented tools and resources have been developed over the course of the last decade, and the technologies behind them reflect the length of that period: they range from dictionaries and engines using manually and semi-manually crafted rules (a typical approach a decade ago), through more recent advances in machine learning, to the newest deep learning solutions released as part of the PolEval 2017 competition.
A comment on usability should be made in the context of the opinion finding tool presented in Section 5. While it is not directly intended to compute sentiment, it makes sentiment analyzers more usable by enabling the design of more complex tools, targeted at analyzing large corpora and social media streams for opinions expressed about entities and their attributes, perhaps providing aggregated reports about trends over time. Similar services have been offered on commercial terms by multiple companies in Poland.
To the best of our knowledge, no document-level tools are available for sentiment categorization of larger texts. The problem is not trivial, especially given that product reviews are often mixed, containing both favourable (positive) and unfavourable (negative) opinions about some aspects of products. How such ambivalence translates to overall sentiment is a difficult subject. It is possible to use word-level dictionaries as well as phrase-level and sentence-level applications to produce per-document aggregations. Then, the decision whether a document should be classified as positive or negative may be based on the counts of detected sentiment-bearing words, phrases or sentences. The classification decision (for example, a threshold on the ratio of positive to negative items, such as words, for considering a document positive or negative) has to take into account the bias towards positivity present in most corpora (Kloumann et al. 2012).
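One way such a per-document aggregation with a positivity-adjusted threshold might look is sketched below; the threshold value and the label names are illustrative assumptions, not empirically tuned settings:

```python
def document_polarity(sent_labels, pos_threshold=0.7):
    """Classify a document from counts of sentiment-bearing items.

    sent_labels: per-word (or per-phrase/sentence) polarity labels.
    pos_threshold: the fraction of positive items among all polar items
    required to call the document positive; it is set above 0.5 to
    compensate for the positivity bias of most corpora
    (Kloumann et al. 2012).
    """
    pos = sent_labels.count("pos")
    neg = sent_labels.count("neg")
    if pos + neg == 0:
        return "neutral"              # no sentiment-bearing items at all
    ratio = pos / (pos + neg)
    return "positive" if ratio >= pos_threshold else "negative"

# A mixed review: a majority of positive words, but not enough to clear the bar.
print(document_polarity(["pos", "neut", "pos", "neg", "pos", "neg"]))  # negative
```

A real tool would, at a minimum, tune the threshold on held-out data and handle negation and phrase-level polarity shifts before counting.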
Future developments in Polish sentiment analysis will likely involve deep
learning methods, not only improved over the current state, but also applied to
multiple new tasks. The research areas will likely include multimodal analysis
(for instance, text analyzed jointly with related speech, pictures or videos), anal-
ysis of not just sentiments but emotions (already initiated by plWordNet Emo;
Zaśko-Zielińska et al. 2015).

14 http://multiservice.nlp.ipipan.waw.pl/en/
15 http://ws.clarin-pl.eu/

REFERENCES
Acedański, S. 2010. “A morphosyntactic brill tagger for inflectional languages”. In:
Loftsson, H., E. Rögnvaldsson and S. Helgadóttir (eds.), Advances in natural lan-
guage processing. Berlin, Heidelberg: Springer. 3–14.
Bojanowski, P., E. Grave, A. Joulin and T. Mikolov. 2016. “Enriching word vectors
with subword information”. arXiv preprint arXiv:1607.04606.
Buczyński, A. and A. Wawer. 2008a. “Shallow parsing in sentiment analysis of product
reviews”. Proceedings of the Partial Parsing Workshop at LREC. 14–18.
Buczyński, A. and A. Wawer. 2008b. “Automated classification of product review sen-
timents in Polish”. Proceedings of the Intelligent Information Systems (IIS).
Deng, L. and J. Wiebe. 2015. “MPQA 3.0: An entity/event-level sentiment corpus”. In:
Mihalcea, R., J. Yue Chai and A. Sarkar (eds.), HLT-NAACL. The Association for
Computational Linguistics. 1323–1328.
Haniewicz, K., W. Rutkowski, M. Adamczyk and M. Kaczmarek. 2013. “Towards the
lexicon-based sentiment analysis of Polish texts: Polarity lexicon”. In: Bǎdicǎ, C.,
N. Thanh Nguyen and M. Brezovan (eds.), Computational collective intelligence.
Technologies and applications. Berlin: Springer. 286–295.
Hochreiter, S. and J. Schmidhuber. 1997. “Long short-term memory”. Neural Computa-
tion 9(8). 1735–1780. doi:10.1162/neco.1997.9.8.1735.
Kędzia, P., M. Piasecki and M. Orlińska. 2015. “Word sense disambiguation based on large scale Polish CLARIN heterogeneous lexical resources”. Cognitive Studies 15. 269–292.
Kloumann, I. M., C. M. Danforth, K. D. Harris, C. A. Bliss and P. S. Dodds. 2012. “Pos-
itivity of the English language”. PloS One 7(1). e29484.
Krueger, D., T. Maharaj, J. Kramár, M. Pezeshki, N. Ballas, N. R. Ke, A. Goyal, et al. 2016. “Zoneout: Regularizing RNNs by randomly preserving hidden activations”. CoRR abs/1606.01305. <http://arxiv.org/abs/1606.01305>
Lafferty, J. D., A. McCallum and F. C. N. Pereira. 2001. “Conditional random fields:
Probabilistic models for segmenting and labeling sequence data”. Proceedings of
the Eighteenth International Conference on Machine Learning (ICML ’01). San
Francisco, CA: Morgan Kaufmann Publishers Inc. 282–289.
Levy, R. and G. Andrew. 2006. “Tregex and tsurgeon: Tools for querying and manipu-
lating tree data structures”. Proceedings of the Fifth International Conference on
Language Resources and Evaluation. 2231–2234.
Lew, M. and P. Pęzik. 2017. “A sequential child-combination tree-LSTM network for sentiment analysis”. In: Vetulani, Z. (ed.), Proceedings of the 8th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics. Poznań.
Liu, B. 2012. Sentiment Analysis and Opinion Mining. Morgan and Claypool Publishers.
Mikolov, T., K. Chen, G. Corrado and J. Dean. 2013. “Efficient estimation of word rep-
resentations in vector space”. CoRR abs/1301.3781.
<http://arxiv.org/abs/1301.3781>.
Pang, B. and L. Lee. 2004. “A sentimental education: Sentiment analysis using subjec-
tivity summarization based on minimum cuts”. Proceedings of ACL. 271–278.
Pang, B., L. Lee and S. Vaithyanathan. 2002. “Thumbs up? Sentiment classification us-
ing machine learning techniques”. CoRR cs.CL/0205070.
<http://arxiv.org/abs/cs.CL/0205070>
Przepiórkowski, A. and A. Buczyński. 2007. “Spade: Shallow Parsing and Disambigua-
tion Engine”. Proceedings of the 3rd Language and Technology Conference
(LTC’07). Poznań.
Radziszewski, A. 2013. “A tiered CRF tagger for Polish”. In: Bembenik, R., Ł. Skonieczny, H. Rybiński, M. Kryszkiewicz and M. Niezgódka (eds.), Intelligent tools for
building a scientific information platform: Advanced architectures and solutions.
Berlin: Springer. 215–230.
Rill, S., J. Scheidt, J. Drescher, O. Schütz, D. Reinel and F. Wogenstein. 2012. “A ge-
neric approach to generate opinion lists of phrases for opinion mining applications”.
Proceedings of the first international workshop on issues of sentiment discovery
and opinion mining (WISDOM ’12). New York: ACM. 7:1–7:8.
Ryciak, N. 2017. “Polish language sentiment analysis with tree-structured long short-
term memory network”. In: Vetulani, Z. (ed.), Proceedings of the 8th Language and
Technology Conference: Human Language Technologies as a Challenge for Com-
puter Science and Linguistics. Poznań.
Socher, R., A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Y. Ng and C. Potts.
2013. “Recursive deep models for semantic compositionality over a sentiment tree-
bank”. Proceedings of the 2013 Conference on Empirical Methods in Natural Lan-
guage Processing. Stroudsburg, PA: Association for Computational Linguistics.
1631–1642.
Tai, K. S., R. Socher and C. D. Manning. 2015a. “Improved semantic representations
from tree-structured long short-term memory networks”.
<http://arxiv.org/abs/1503.00075>
Wawer, A. 2012. “Mining co-occurrence matrices for SO-PMI paradigm word candi-
dates”. Proceedings of the Student Research Workshop at the 13th Conference of
the European Chapter of the Association for Computational Linguistics (EACL’12
SRW). Avignon: Association for Computational Linguistics. 74–80.
Wawer, A. 2015. “Towards domain-independent opinion target extraction”. IEEE Inter-
national Conference on Data Mining Workshop, ICDMW 2015. Atlantic City, NJ,
November 14–17, 2015. IEEE. 1326–1331. doi:10.1109/ICDMW.2015.255.
Wawer, A. 2016. “OPFI: A tool for opinion finding in Polish”. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Paris: European Language Resources Association (ELRA).
Wawer, A. and M. Ogrodniczuk. 2017. “Results of the PolEval 2017 competition: Sentiment analysis shared task”. 8th Language and Technology Conference: Human
Language Technologies as a Challenge for Computer Science and Linguistics.
Wawer, A. and D. Rogozińska. 2012. “How much supervision? Corpus-based lexeme
sentiment estimation”. 2012 IEEE 12th International Conference on Data Mining
Workshops (ICDMW). 724–730. doi:10.1109/ICDMW.2012.119
Wilson, T., P. Hoffmann, S. Somasundaran, J. Kessler, J. Wiebe, Y. Choi, C. Cardie, E.
Riloff and S. Patwardhan. 2005. “OpinionFinder: A system for subjectivity analy-
sis”. Proceedings of HLT/EMNLP on Interactive Demonstrations (HLT-Demo ’05),
Stroudsburg, PA: Association for Computational Linguistics. 34–35.
Zaśko-Zielińska, M., M. Piasecki and S. Szpakowicz. 2015. “A large wordnet-based sentiment lexicon for Polish”. Proceedings of the International Conference Recent Advances in Natural Language Processing. Hissar, Bulgaria: INCOMA Ltd. 721–730. <http://www.aclweb.org/anthology/R15-1092>
Żak, P. and T. Korbak. 2017. “Fine-tuning tree-LSTM for phrase-level sentiment classi-
fication on a Polish dependency treebank”. In: Vetulani, Z. (ed.), Proceedings of the
8th Language and Technology Conference: Human Language Technologies as a
Challenge for Computer Science and Linguistics. Poznań.

Address correspondence to:


Aleksander Wawer
Institute of Computer Science
Polish Academy of Sciences
Linguistic Engineering Group
Jana Kazimierza 5
01-248 Warszawa
Poland
axw@ipipan.waw.pl
