Multimodal Emoji Prediction

Francesco Barbieri♦  Miguel Ballesteros♠  Francesco Ronzano♥  Horacio Saggion♦

♦ Large Scale Text Understanding Systems Lab, TALN, UPF, Barcelona, Spain
♠ IBM Research, U.S.
♥ Integrative Biomedical Informatics Group, GRIB, IMIM-UPF, Barcelona, Spain

{name.surname}@upf.edu, miguel.ballesteros@ibm.com

Abstract

Emojis are small images that are commonly included in social media text messages. The combination of visual and textual content in the same message builds up a modern way of communication that automatic systems are not yet used to dealing with. In this paper we extend recent advances in emoji prediction by putting forward a multimodal approach that is able to predict emojis in Instagram posts. Instagram posts are composed of pictures together with texts which sometimes include emojis. We show that these emojis can be predicted by using the text, but also by using the picture. Our main finding is that incorporating the two synergistic modalities in a combined model improves accuracy in an emoji prediction task. This result demonstrates that these two modalities (text and images) encode different information on the use of emojis and can therefore complement each other.

1 Introduction

In the past few years the use of emojis in social media has increased exponentially, changing the way we communicate. The combination of visual and textual content poses new challenges for information systems, which need to deal not only with the semantics of text but also with that of images. Recent work (Barbieri et al., 2017) has shown that textual information can be used to predict emojis associated to text. In this paper we show that, in the current context of multimodal communication where texts and images are combined in social networks, visual information should be combined with texts in order to obtain more accurate emoji-prediction models.

We explore the use of emojis in the social media platform Instagram. We put forward a multimodal approach to predict the emojis associated to an Instagram post, given its picture and text[1]. Our task and experimental framework are similar to those of Barbieri et al. (2017); however, we use different data (Instagram instead of Twitter) and, in addition, we rely on images to improve the selection of the most likely emojis to associate to a post. We show that a multimodal approach (textual and visual content of the posts) increases the emoji prediction accuracy compared to one that only uses textual information. This suggests that textual and visual content embed different but complementary features of the use of emojis.

[1] In this paper we only utilize the first comment issued by the user who posted the picture.

In general, an effective approach to predict the emoji to be associated to a piece of content may help to improve natural language processing tasks (Novak et al., 2015), such as information retrieval, generation of emoji-enriched social media content, and suggestion of emojis when writing text messages or sharing pictures online. Given that emojis may also mislead humans (Miller et al., 2017), the automated prediction of emojis may help to achieve better language understanding. As a consequence, by modeling the semantics of emojis, we can improve highly subjective tasks like sentiment analysis, emotion recognition and irony detection (Felbo et al., 2017).

2 Dataset and Task

Dataset: We gathered Instagram posts published between July 2016 and October 2016 and geo-localized in the United States of America. We considered only posts that contained a photo together with a related user description of at least 4 words and exactly one emoji.

Moreover, as done by Barbieri et al. (2017), we considered only the posts which include one and only one of the 20 most frequent emojis (the most frequent emojis are shown in Table 3). Our dataset is composed of 299,809 posts, each containing a picture, the text associated to it and only one emoji. In the experiments we also considered the subsets of the 10 (238,646 posts) and 5 (184,044 posts) most frequent emojis, similarly to the approach followed by Barbieri et al. (2017).
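
For concreteness, the filtering step just described can be sketched as follows (a minimal sketch in Python; the helper names, the top_emojis argument, and the choice of stripping the emoji from the caption to use it as the label are illustrative assumptions, not taken from our pipeline; multi-codepoint emojis such as the US flag would need extra handling):

    def keep_post(caption, top_emojis):
        # Keep captions with at least 4 words and exactly one emoji from the top-20 set.
        found = [ch for ch in caption if ch in top_emojis]
        words = [tok for tok in caption.split() if tok not in top_emojis]
        return len(words) >= 4 and len(found) == 1

    def build_dataset(posts, top_emojis):
        # posts: iterable of (caption, image_path) pairs;
        # returns (text, image_path, emoji) triples, with the emoji removed from the text.
        dataset = []
        for caption, image_path in posts:
            if not keep_post(caption, top_emojis):
                continue
            emoji = next(ch for ch in caption if ch in top_emojis)
            text = "".join(ch for ch in caption if ch not in top_emojis).strip()
            dataset.append((text, image_path, emoji))
        return dataset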

Task: We extend the experimental scheme of Barbieri et al. (2017) by also considering visual information when modeling posts. We cast the emoji prediction problem as a classification task: given an image or a text (or both inputs in the multimodal scenario), we select the most likely emoji that could be added to (and thus used to label) such contents. The task for our machine learning models is, given the visual and textual content of a post, to predict the single emoji that appears in the input comment.

3 Models

We present and motivate the models that we use to predict an emoji given an Instagram post composed of a picture and the associated comment.

3.1 ResNets

Deep Residual Networks (ResNets) (He et al., 2016) are Convolutional Neural Networks which were competitive in several image classification tasks (Russakovsky et al., 2015; Lin et al., 2014) and have been shown to be one of the best CNN architectures for image recognition. ResNet is a feed-forward CNN that exploits “residual learning” by bypassing two or more convolution layers (like similar previous approaches (Sermanet and LeCun, 2011)). We use an implementation of the original ResNet where the scale and aspect ratio augmentation are from Szegedy et al. (2015), the photometric distortions are from Howard (2013), and weight decay is applied to all weights and biases (instead of only the weights of the convolution layers). The network we used is composed of 101 layers (ResNet-101), initialized with pretrained parameters learned on ImageNet (Deng et al., 2009). We use this model as a starting point and later fine-tune it on our emoji classification task. The learning rate was set to 0.0001 and we stopped training early when there was no improvement on the validation set.
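
A minimal fine-tuning sketch consistent with this setup is shown below (PyTorch/torchvision; the ImageNet-pretrained ResNet-101, the 20-way output layer, the 0.0001 learning rate and the early stopping follow the description above, while the optimizer choice, the patience value and the omitted data augmentation are assumptions of the sketch):

    import torch
    import torch.nn as nn
    from torchvision import models

    def build_emoji_resnet(num_emojis=20):
        # Start from ResNet-101 pretrained on ImageNet (Deng et al., 2009) ...
        model = models.resnet101(pretrained=True)
        # ... and replace the final fully connected layer with a num_emojis-way classifier.
        model.fc = nn.Linear(model.fc.in_features, num_emojis)
        return model

    def finetune(model, train_loader, val_loader, patience=3, max_epochs=50):
        # Learning rate 0.0001 as stated above; the Adam optimizer is an assumption.
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
        criterion = nn.CrossEntropyLoss()
        best_val, epochs_without_gain = float("inf"), 0
        for epoch in range(max_epochs):
            model.train()
            for images, labels in train_loader:
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()
            # Early stopping: stop when the validation loss no longer improves.
            model.eval()
            with torch.no_grad():
                val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
            if val_loss < best_val:
                best_val, epochs_without_gain = val_loss, 0
            else:
                epochs_without_gain += 1
                if epochs_without_gain >= patience:
                    break
        return model
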
3.2 FastText

FastText (Joulin et al., 2017) is a linear model for text classification. We decided to employ FastText as it has been shown that, on specific classification tasks, it can achieve competitive results, comparable to complex neural classifiers (RNNs and CNNs), while being much faster. FastText represents a valid approach when dealing with social media content classification, where huge amounts of data need to be processed and new and relevant information is continuously generated. The FastText algorithm is similar to the CBOW algorithm (Mikolov et al., 2013), where the middle word is replaced by the label, in our case the emoji. Given a set of N documents, the loss that the model attempts to minimize is the negative log-likelihood over the labels (in our case, the emojis):

    loss = -(1/N) * sum_{n=1}^{N} e_n log(softmax(B A x_n))

where e_n is the emoji included in the n-th Instagram post, represented as a one-hot vector and used as label, A and B are affine transformations (weight matrices), and x_n is the unit vector of the bag of features of the n-th document (comment). The bag of features is the average of the input words, represented as vectors with a look-up table.
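
The objective above can be written compactly as follows (a sketch in PyTorch rather than the official fastText implementation; the dimensions and the EmbeddingBag-based lookup are illustrative):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FastTextClassifier(nn.Module):
        # Bag-of-features text classifier in the spirit of Joulin et al. (2017).
        def __init__(self, vocab_size, embed_dim=300, num_emojis=20):
            super().__init__()
            self.A = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")  # look-up table, averaged
            self.B = nn.Linear(embed_dim, num_emojis)                     # label projection

        def forward(self, token_ids, offsets):
            hidden = self.A(token_ids, offsets)   # x_n: average of the input word vectors
            return self.B(hidden)                 # logits = B A x_n

    # The negative log-likelihood over the emoji labels,
    #   loss = -(1/N) * sum_n e_n log(softmax(B A x_n)),
    # is exactly what F.cross_entropy computes for integer labels.
    def batch_loss(model, token_ids, offsets, emoji_labels):
        return F.cross_entropy(model(token_ids, offsets), emoji_labels)
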
3.3 B-LSTM Baseline

Barbieri et al. (2017) propose a recurrent neural network approach for the emoji prediction task. We use this model as a baseline, to verify whether FastText achieves comparable performance. They used a Bidirectional LSTM with character representations of the words (Ling et al., 2015; Ballesteros et al., 2015) to handle orthographic variants (or even spelling errors) of the same word that occur in social media (e.g. cooooool vs cool).
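
A rough sketch of such a character-based bidirectional LSTM classifier is given below (PyTorch; it follows the general idea of composing word vectors from characters and is not the exact architecture or the hyper-parameters of Barbieri et al. (2017)):

    import torch
    import torch.nn as nn

    class CharWordBiLSTM(nn.Module):
        def __init__(self, n_chars, n_emojis, char_dim=50, word_dim=100, hidden=150):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
            # Character BiLSTM: builds a word vector from the word's spelling,
            # so "cooooool" and "cool" receive related representations.
            self.char_lstm = nn.LSTM(char_dim, word_dim // 2, bidirectional=True, batch_first=True)
            # Word-level BiLSTM over the sequence of char-derived word vectors.
            self.word_lstm = nn.LSTM(word_dim, hidden, bidirectional=True, batch_first=True)
            self.out = nn.Linear(2 * hidden, n_emojis)

        def forward(self, char_ids):
            # char_ids: (n_words, max_word_len) character indices for one message
            chars = self.char_emb(char_ids)
            _, (h_char, _) = self.char_lstm(chars)              # (2, n_words, word_dim // 2)
            words = torch.cat([h_char[0], h_char[1]], dim=-1)   # (n_words, word_dim)
            _, (h_word, _) = self.word_lstm(words.unsqueeze(0)) # the message as a batch of one
            message = torch.cat([h_word[0], h_word[1]], dim=-1) # (1, 2 * hidden)
            return self.out(message)                            # emoji logits
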
4 Experiments and Evaluation

In order to study the relation between Instagram posts and emojis, we performed two different experiments. In the first experiment (Section 4.2) we compare the FastText model with the state of the art on emoji classification (B-LSTM) by Barbieri et al. (2017). Our second experiment (Section 4.3) evaluates the visual (ResNet) and textual (FastText) models on the emoji prediction task. Moreover, we evaluate a multimodal combination of both models, respectively based on visual and textual inputs. Finally, we discuss the contribution of each modality to the prediction task.

We use 80% of our dataset (introduced in Section 2) for training, 10% to tune our models, and 10% for testing (selecting the sets randomly).

4.1 Feature Extraction and Classifier

To model visual features we first fine-tune the ResNet (the process described in Section 3.1) on the emoji prediction task, then extract the vectors from the input of the last fully connected layer (before the softmax). The textual embeddings are the bag of features shown in Section 3.2 (the x_n vectors), extracted after training the FastText model on the emoji prediction task.

With respect to the combination of textual and visual modalities, we adopt a middle fusion approach (Kiela and Clark, 2015): we associate to each Instagram post a multimodal embedding obtained by concatenating the unimodal representations of the same post (i.e. the visual and textual embeddings), previously learned. Then, we feed a classifier[2] with the visual (ResNet), textual (FastText), or multimodal feature embeddings, and test the accuracy of the three systems.

[2] An L2-regularized logistic regression.
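
The fusion and classification steps can be sketched as follows (scikit-learn; it assumes the unimodal embeddings have already been extracted as NumPy arrays, the regularization strength C is a placeholder, and the L2-regularized logistic regression matches the classifier mentioned in footnote [2]):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fuse(visual_emb, textual_emb):
        # Middle fusion: concatenate the previously learned unimodal embeddings of each post.
        return np.concatenate([visual_emb, textual_emb], axis=1)

    def train_multimodal_classifier(visual_train, textual_train, y_train, C=1.0):
        # visual_train: (n_posts, d_vis) ResNet penultimate-layer activations
        # textual_train: (n_posts, d_txt) FastText bag-of-features vectors (the x_n above)
        # y_train: the gold emoji of each post
        clf = LogisticRegression(penalty="l2", C=C, max_iter=1000)
        clf.fit(fuse(visual_train, textual_train), y_train)
        return clf

The unimodal systems are the same classifier fitted on the visual or the textual embeddings alone.
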
4.2 B-LSTM / FastText Comparison

To compare the FastText model with the word- and character-based B-LSTMs presented by Barbieri et al. (2017), we consider the same three emoji prediction tasks they proposed: the top-5, top-10 and top-20 emojis most frequently used in their Tweet datasets. In this comparison we used the same Twitter datasets. As we can see in Table 1, the FastText model is competitive, and it is also able to outperform the character-based B-LSTM in one of the emoji prediction tasks (top-20 emojis). This result suggests that we can employ FastText to represent social media short texts (such as Twitter or Instagram) with reasonable accuracy.

         top-5              top-10             top-20
         P     R     F1     P     R     F1     P     R     F1
    BW   61    61    61     45    45    45     34    36    32
    BC   63    63    63     48    47    47     42    39    34
    FT   61    62    61     47    49    46     38    39    36

Table 1: Comparison of B-LSTM with word modeling (BW), B-LSTM with character modeling (BC), and FastText (FT) on the same Twitter emoji prediction tasks proposed by Barbieri et al. (2017), using the same Twitter dataset.

4.3 Multimodal Emoji Prediction

We present the results of the three emoji classification tasks, using the visual, textual and multimodal features (see Table 2).

The emoji prediction task seems difficult by just using the image of the Instagram post (Visual), even if it largely outperforms the majority baseline[3] and weighted random[4]. We achieve better performance when we use feature embeddings extracted from the text. The most interesting finding is that when we use a multimodal combination of visual and textual features, we get a non-negligible improvement. This suggests that these two modalities embed different representations of the posts, and when used in combination they are synergistic. It is also interesting to note that the more emojis there are to predict, the larger the improvement the multimodal system provides over the text-only system (3.28% for the top-5 emojis, 7.31% for the top-10 emojis, and 13.42% for the top-20 emojis task).

[3] Always predict the most frequent emoji.
[4] Random prediction keeping the label distribution of the training set.

          top-5              top-10             top-20
          P     R     F1     P     R     F1     P     R     F1
    Maj   7.9   20.0  11.3   2.7   10.0  4.2    0.9   5.0   1.5
    W.R.  20.1  20.0  20.1   9.8   9.8   9.8    4.6   4.8   4.7
    Vis   38.6  31.1  31.0   26.3  20.9  20.5   20.3  17.5  16.1
    Tex   56.1  54.4  54.9   41.6  37.5  38.3   36.7  29.9  31.3
    Mul   57.4  56.3  56.7   42.3  40.5  41.1   36.6  35.2  35.5
    %     2.3   3.5   3.3    1.7   8     7.3    -0.3  17.7  13.4

Table 2: Prediction results for the top-5, top-10 and top-20 most frequent emojis in the Instagram dataset: Precision (P), Recall (R), F-measure (F1). Experimental settings: majority baseline (Maj), weighted random (W.R.), visual (Vis), textual (Tex) and multimodal (Mul) systems. In the last line we report the percentage improvement of the multimodal over the textual system.

4.4 Qualitative Analysis

In Table 3 we show the results for each class in the top-20 emojis task.

    E        %      Tex   Vis   MM      E        %      Tex   Vis   MM
    (emoji)  17.46  0.62  0.35  0.69    (emoji)  3.68   0.22  0.15  0.29
    (emoji)  9.10   0.45  0.30  0.47    (emoji)  3.55   0.20  0.02  0.26
    (emoji)  8.41   0.32  0.15  0.34    (emoji)  3.54   0.13  0.02  0.2
    (emoji)  5.91   0.23  0.08  0.26    (emoji)  3.51   0.26  0.17  0.31
    (emoji)  5.73   0.35  0.17  0.36    (emoji)  3.31   0.43  0.25  0.45
    (emoji)  4.58   0.45  0.24  0.46    (emoji)  3.25   0.12  0.01  0.16
    (emoji)  4.31   0.52  0.23  0.53    (emoji)  3.14   0.12  0.02  0.15
    (emoji)  4.15   0.38  0.26  0.49    (emoji)  3.11   0.34  0.11  0.36
    (emoji)  3.84   0.19  0.1   0.22    (emoji)  2.91   0.36  0.04  0.37
    (emoji)  3.73   0.13  0.03  0.16    (emoji)  2.82   0.45  0.54  0.59

Table 3: F-measure on the test set for the 20 most frequent emojis (E), using the three different models. “%” indicates the percentage of the class in the test set.

The emojis with the highest F1 using the textual features are the most frequent one (0.62) and the US flag (0.52). The latter seems easy to predict since it appears in specific contexts: when the word USA/America is used (or when American cities are referred to, like #NYC).

The hardest emojis to predict for the text-only system are the two gesture emojis (0.12 and 0.13). The first one is often selected when the gold-standard emoji is the second one, or is mispredicted by wrongly selecting other emojis.

Another relevant confusion scenario related to emoji prediction was spotted by Barbieri et al. (2017): relying on Twitter textual data, they showed that one emoji was hard to predict as it was used similarly to another one. When we consider Instagram data instead, the same emoji is easier to predict (0.23), even if it is still often confused.

When we rely on visual content (the Instagram picture), the emojis which are easily predicted are the ones whose associated photos are similar to each other. For instance, most of the pictures associated to the dog emoji are dog/pet pictures. Similarly, another emoji is predicted along with very bright pictures taken outside, and a third is correctly predicted along with pictures related to gym and fitness. The accuracy of a fourth emoji is also high, since most posts including it are related to fitness (and the pictures are simply either selfies at the gym, weight-lifting images, or protein food).

Employing a multimodal approach improves performance. This means that the two modalities are somehow complementary, and adding visual information helps to solve potential ambiguities that arise when relying only on textual content. In Figure 1 we report the confusion matrix of the multimodal model. The emojis are plotted from the most frequent to the least, and we can see that the model tends to mispredict emojis by selecting more frequent emojis (the left part of the matrix is brighter).

Figure 1: Confusion matrix of the multimodal model. The gold labels are plotted on the y-axis and the predicted labels on the x-axis; the matrix is normalized by rows.
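
The row-normalized matrix of Figure 1 corresponds to the following computation (a sketch; labels are assumed to be passed from the most to the least frequent emoji so that frequent classes appear on the left):

    import numpy as np
    from sklearn.metrics import confusion_matrix

    def row_normalized_confusion(y_true, y_pred, labels):
        # Rows are gold emojis, columns are predicted emojis; each row sums to 1.
        cm = confusion_matrix(y_true, y_pred, labels=labels).astype(float)
        return cm / cm.sum(axis=1, keepdims=True)
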
4.4.1 Saliency Maps

In order to show the parts of the image most relevant for each class, we analyze the global average pooling (Lin et al., 2013) on the convolutional feature maps (Zhou et al., 2016). By visually observing the image heatmaps of the set of Instagram post pictures, we note that in most cases it is quite difficult to determine a clear association between the emoji used by the user and some particular portion of the image. Detecting the correct emoji given an image is harder than a simple object recognition task, as the emoji choice depends on the subjective emotions of the user who posted the image.
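
A possible implementation of these class activation maps for the fine-tuned ResNet-101 of Section 3.1 is sketched below (PyTorch; the forward hook on layer4 and the bilinear upsampling are implementation choices of this sketch, not a description of our exact code):

    import torch
    import torch.nn.functional as F

    def class_activation_map(model, image, emoji_class):
        # Heatmap for one emoji class: weighted sum of the last conv feature maps,
        # using the weights of the final fully connected layer (Zhou et al., 2016).
        model.eval()
        feats = {}
        hook = model.layer4.register_forward_hook(lambda m, i, o: feats.update(maps=o))
        with torch.no_grad():
            logits = model(image.unsqueeze(0))      # image: (3, H, W) preprocessed tensor
        hook.remove()
        fmaps = feats["maps"][0]                    # (K, h, w) feature maps
        weights = model.fc.weight[emoji_class]      # (K,) class weights
        cam = torch.einsum("k,khw->hw", weights, fmaps)
        cam = F.relu(cam)                           # keep only positively contributing regions
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalize to [0, 1]
        # Upsample to the input resolution for overlaying on the picture.
        cam = F.interpolate(cam[None, None], size=image.shape[1:], mode="bilinear",
                            align_corners=False)[0, 0]
        return cam, logits.softmax(dim=-1)[0]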

In Figure 2 we show the first four predictions of the CNN for three pictures, and where the network focuses (in red). We can see that in the first example the network selects the smile with sunglasses because of the legs at the bottom of the image, the dog emoji is selected while focusing on the dog in the image, and the smiling emoji while focusing on the person in the back, who is lying on a hammock. In the second example the network again selects the same emoji due to the water and part of the kayak, the heart emoji while focusing on the city landscape, and the praying emoji while focusing on the sky. The same “praying” emoji is also selected when focusing on the luxury car in the third example, probably because the same emoji is used to express desire, i.e. “please, I want this awesome car”.

Figure 2: Three test pictures. From left to right, we show the four most likely predicted emojis and their corresponding class activation mapping heatmaps.

It is interesting to note that images can give context to textual messages, as in the following Instagram posts: (1) “Love my new home” (associated to a picture of a bright garden, outside) and (2) “I can’t believe it’s the first day of school!!! I love being these boys’ mommy!!!! #myboys #mommy” (associated to a picture of two boys wearing two blue shirts). In both examples the textual system predicts the same emoji, while the multimodal system correctly predicts both of them: the blue color in the picture associated to (2) helps to change the color of the heart, and the sunny/bright picture of the garden in (1) helps to predict the correct emoji.

5 Related Work

Modeling the semantics of emojis, and their applications, is a relatively novel research problem with direct applications in any social media task. Since emojis do not have a clear grammar, their role in text messages is not clear. Emojis have been considered function words or even affective markers (Na’aman et al., 2017) that can potentially affect the overall semantics of a message (Donato and Paggio, 2017).

Emojis can encode different meanings, and they can be interpreted differently. Emoji interpretation has been explored user-wise (Miller et al., 2017), location-wise, specifically across countries (Barbieri et al., 2016b) and cities (Barbieri et al., 2016a), gender-wise (Chen et al., 2017), and time-wise (Barbieri et al., 2018).

Emoji semantics and usage have been studied with distributional semantics, with models trained on Twitter data (Barbieri et al., 2016c), Twitter data together with the official Unicode descriptions (Eisner et al., 2016), or text from a popular keyboard app (Ai et al., 2017). In the same context, Wijeratne et al. (2017a) propose a platform for exploring emoji semantics. In order to further study emoji semantics, two datasets with pairwise emoji similarity, with human annotations, have been proposed: EmoTwi50 (Barbieri et al., 2016c) and EmoSim508 (Wijeratne et al., 2017b). Emoji similarity has also been used for proposing efficient keyboard emoji organization (Pohl et al., 2017). Recently, Barbieri and Camacho-Collados (2018) show that emoji modifiers (skin tones and gender) can affect the semantic vector representation of emojis.

Emojis play an important role in the emotional content of a message. Several sentiment lexicons for emojis have been proposed (Novak et al., 2015; Kimura and Katsurai, 2017; Rodrigues et al., 2018), and studies on emotion and emojis have also been published recently (Wood and Ruder, 2016; Hu et al., 2017).

During the last decade several studies have shown how sentiment analysis improves when we jointly leverage information coming from different modalities (e.g. text, images, audio, video) (Morency et al., 2011; Poria et al., 2015; Tran and Cambria, 2018). In particular, when we deal with social media posts, the presence of both textual and visual content has promoted a number of investigations on sentiment or emotions (Baecchi et al., 2016; You et al., 2016b,a; Yu et al., 2016; Chen et al., 2015) or emojis (Cappallo et al., 2015, 2018).

6 Conclusions

In this work we explored the use of emojis in a multimodal context (Instagram posts). We have shown that using a synergistic approach, thus relying on both the textual and visual contents of social media posts, we can outperform state-of-the-art unimodal approaches (based only on textual contents). As future work, we plan to extend our models by considering the prediction of more than one emoji per social media post and also considering a larger number of labels.

Acknowledgments

We thank the anonymous reviewers for their important suggestions. Francesco B. and Horacio S. acknowledge support from the TUNER project (TIN2015-65308-C5-5-R, MINECO/FEDER, UE) and the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).

References

Wei Ai, Xuan Lu, Xuanzhe Liu, Ning Wang, Gang Huang, and Qiaozhu Mei. 2017. Untangling emoji popularity through semantic embeddings. In ICWSM, pages 2–11.
Claudio Baecchi, Tiberio Uricchio, Marco Bertini, and Alberto Del Bimbo. 2016. A multimodal feature learning approach for sentiment analysis of social network multimedia. Multimedia Tools and Applications 75(5):2507–2525.
Miguel Ballesteros, Chris Dyer, and Noah A. Smith. 2015. Improved transition-based parsing by modeling characters instead of words with LSTMs. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, pages 349–359.
Francesco Barbieri, Miguel Ballesteros, and Horacio Saggion. 2017. Are emojis predictable? In Proceedings of the 2017 Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Valencia, Spain.
Francesco Barbieri and Jose Camacho-Collados. 2018. How Gender and Skin Tone Modifiers Affect Emoji Semantics in Twitter. In Proceedings of the 7th Joint Conference on Lexical and Computational Semantics (*SEM 2018). New Orleans, LA, United States.
Francesco Barbieri, Luis Espinosa-Anke, and Horacio Saggion. 2016a. Revealing patterns of Twitter emoji usage in Barcelona and Madrid. Frontiers in Artificial Intelligence and Applications (Artificial Intelligence Research and Development) 288:239–244.
Francesco Barbieri, German Kruszewski, Francesco Ronzano, and Horacio Saggion. 2016b. How Cosmopolitan Are Emojis? Exploring Emojis Usage and Meaning over Different Languages with Distributional Semantics. In Proceedings of the 2016 ACM on Multimedia Conference. ACM, Amsterdam, Netherlands, pages 531–535.
Francesco Barbieri, Luis Marujo, William Brendel, Pradeep Karuturim, and Horacio Saggion. 2018. Exploring Emoji Usage and Prediction Through a Temporal Variation Lens. In 1st International Workshop on Emoji Understanding and Applications in Social Media (at ICWSM 2018).
Francesco Barbieri, Francesco Ronzano, and Horacio Saggion. 2016c. What does this emoji mean? A vector space skip-gram model for Twitter emojis. In LREC.
Spencer Cappallo, Thomas Mensink, and Cees GM Snoek. 2015. Image2emoji: Zero-shot emoji prediction for visual media. In Proceedings of the 23rd ACM International Conference on Multimedia. ACM, pages 1311–1314.
Spencer Cappallo, Stacey Svetlichnaya, Pierre Garrigues, Thomas Mensink, and Cees GM Snoek. 2018. The new modality: Emoji challenges in prediction, anticipation, and retrieval. arXiv preprint arXiv:1801.10253.
Fuhai Chen, Yue Gao, Donglin Cao, and Rongrong Ji. 2015. Multimodal hypergraph learning for microblog sentiment prediction. In Multimedia and Expo (ICME), 2015 IEEE International Conference on. IEEE, pages 1–6.
Zhenpeng Chen, Xuan Lu, Sheng Shen, Wei Ai, Xuanzhe Liu, and Qiaozhu Mei. 2017. Through a gender lens: An empirical study of emoji usage over large-scale Android users. arXiv preprint arXiv:1705.05546.
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09.
Giulia Donato and Patrizia Paggio. 2017. Investigating redundancy in emoji use: Study on a Twitter-based corpus. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 118–126.
Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak, and Sebastian Riedel. 2016. emoji2vec: Learning emoji representations from their description. arXiv preprint arXiv:1609.08359.
Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, and Sune Lehmann. 2017. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. In EMNLP.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.
Andrew G Howard. 2013. Some improvements on deep convolutional neural network based image classification. arXiv preprint arXiv:1312.5402.
Tianran Hu, Han Guo, Hao Sun, Thuy-vy Thi Nguyen, and Jiebo Luo. 2017. Spice up Your Chat: The Intentions and Sentiment Effects of Using Emoji. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM). AAAI.
Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag of tricks for efficient text classification. In Proceedings of the 2017 Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Valencia, Spain.
Douwe Kiela and Stephen Clark. 2015. Multi- and cross-modal semantics beyond vision: Grounding in auditory perception. In EMNLP, pages 2461–2470.

Mayu Kimura and Marie Katsurai. 2017. Automatic construction of an emoji sentiment lexicon. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017. ACM, pages 1033–1036.
Min Lin, Qiang Chen, and Shuicheng Yan. 2013. Network in Network. In International Conference on Learning Representations.
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In European Conference on Computer Vision. Springer, pages 740–755.
Wang Ling, Chris Dyer, Alan W Black, Isabel Trancoso, Ramon Fermandez, Silvio Amir, Luis Marujo, and Tiago Luis. 2015. Finding function in form: Compositional character models for open vocabulary word representation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, pages 1520–1530.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Hannah Miller, Daniel Kluver, Jacob Thebault-Spieker, Loren Terveen, and Brent Hecht. 2017. Understanding emoji ambiguity in context: The role of text in emoji-related miscommunication. In 11th International Conference on Web and Social Media, ICWSM 2017. AAAI Press.
Louis-Philippe Morency, Rada Mihalcea, and Payal Doshi. 2011. Towards multimodal sentiment analysis: Harvesting opinions from the web. In Proceedings of the 13th International Conference on Multimodal Interfaces. ACM, pages 169–176.
Noa Na’aman, Hannah Provenza, and Orion Montoya. 2017. Varying linguistic purposes of emoji in (Twitter) context. In Proceedings of ACL 2017, Student Research Workshop, pages 136–141.
Petra Kralj Novak, Jasmina Smailović, Borut Sluban, and Igor Mozetič. 2015. Sentiment of emojis. PloS ONE 10(12):e0144296.
Henning Pohl, Christian Domin, and Michael Rohs. 2017. Beyond just text: Semantic emoji similarity modeling to support expressive communication. ACM Transactions on Computer-Human Interaction (TOCHI) 24(1):6.
Soujanya Poria, Erik Cambria, and Alexander Gelbukh. 2015. Deep convolutional neural network textual features and multiple kernel learning for utterance-level multimodal sentiment analysis. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2539–2544.
David Rodrigues, Marília Prada, Rui Gaspar, Margarida V Garrido, and Diniz Lopes. 2018. Lisbon Emoji and Emoticon Database (LEED): Norms for emoji and emoticons in seven evaluative dimensions. Behavior Research Methods, pages 392–405.
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. 2015. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115(3):211–252.
Pierre Sermanet and Yann LeCun. 2011. Traffic sign recognition with multi-scale convolutional networks. In Neural Networks (IJCNN), The 2011 International Joint Conference on. IEEE, pages 2809–2813.
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9.
Ha-Nguyen Tran and Erik Cambria. 2018. Ensemble application of ELM and GPU for real-time multimodal sentiment analysis. Memetic Computing 10(1):3–13.
Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, and Derek Doran. 2017a. EmojiNet: An open service and API for emoji sense discovery. In International AAAI Conference on Web and Social Media (ICWSM 2017). Montreal, Canada.
Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, and Derek Doran. 2017b. A semantics-based measure of emoji similarity. In International Conference on Web Intelligence (Web Intelligence 2017). Leipzig, Germany.
Ian Wood and Sebastian Ruder. 2016. Emoji as emotion tags for tweets. In Emotion and Sentiment Analysis Workshop, LREC.
Quanzeng You, Liangliang Cao, Hailin Jin, and Jiebo Luo. 2016a. Robust visual-textual sentiment analysis: When attention meets tree-structured recursive neural networks. In Proceedings of the 2016 ACM on Multimedia Conference. ACM, pages 1008–1017.
Quanzeng You, Jiebo Luo, Hailin Jin, and Jianchao Yang. 2016b. Cross-modality consistent regression for joint visual-textual sentiment analysis of social multimedia. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. ACM, pages 13–22.
Yuhai Yu, Hongfei Lin, Jiana Meng, and Zhehuan Zhao. 2016. Visual and textual sentiment analysis of a microblog using deep convolutional neural networks. Algorithms 9(2):41.

Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. 2016. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2921–2929.
