
Demonstration Paper SIGIR’17, August 7-11, 2017, Shinjuku, Tokyo, Japan

Social Media Image Recognition for Food Trend Analysis


Giuseppe Amato, Paolo Bolettieri, Vinicius Monteiro de Lira∗, Cristina Ioana Muntean, Raffaele Perego, Chiara Renso
ISTI-CNR, Pisa, Italy
∗ Also Federal University of Pernambuco, Brazil, and University of Pisa, Italy.

ABSTRACT
An increasing number of people share their thoughts and the images of their lives on social media platforms. People are exposed to food in their everyday lives and share online what they are eating by means of photos taken of their dishes. The hashtag #foodporn is constantly among the most popular hashtags on Twitter, and food photos are the second most popular subject on Instagram after selfies. The system we propose, WorldFoodMap, captures the stream of food photos from social media and, thanks to a CNN food image classifier, identifies the categories of food that people are sharing. By collecting food images from the Twitter stream and associating a food category and a location with them, WorldFoodMap makes it possible to investigate and interactively visualize the popularity and trends of the shared food all over the world.

KEYWORDS
food image recognition; social media streaming; deep neural network

1 INTRODUCTION
We are increasingly experiencing the phenomenon called eat and tweet, where a growing number of people share photos of the food they cook and eat through social channels like Instagram, Twitter, and Facebook. Social media statistics report about 600 million users on Instagram1 and about 500 million posts published daily on Twitter2. Food photos are Instagram's second most popular subject, surpassed only by selfies, counting more than 300 million photos3. This phenomenon, commonly known as food porn, leads people to glamorize the food they are cooking or eating by posting beautiful and attractive pictures. Instagram counts over 100 million posts with the #foodporn hashtag4. These numbers potentially give a broad and interesting measure of what people are eating worldwide in near real time. However, the photo caption or post comments do not always describe the food in the shot (e.g., beef steak or pizza); rather, they often provide generic and contextual comments (e.g., "yesterday night dinner"). Therefore, analyzing this social phenomenon requires recognizing the dish captured in the photo by classifying the image into a number of food categories, and associating it with the place where the photo was taken or where the user is from. The proposed WorldFoodMap system has been designed and developed with the objective of visualizing, in an interactive way, the popularity and trends of the categories of food shared on social media worldwide. WorldFoodMap is equipped with an image recognition engine specifically trained on food images, methods to detect and capture posts with food images from media streams, methods to properly locate them on the globe, and analysis methods for computing popularity and trend measures.

The WorldFoodMap image recognition engine leverages Deep Learning techniques based on Deep Convolutional Neural Networks (CNNs) [10]. We use a pre-trained GoogLeNet [12] CNN, fine-tuned using training images from the ETHZ Food-101 dataset [4], containing in total 101,000 images belonging to 101 food categories. Food recognition is obtained using a k-NN classifier on the deep features extracted from image queries and from the images of a training set composed of the ETHZ Food-101 and UPMC Food-101 [13] datasets.

The potential users of WorldFoodMap range from researchers in social media mining to domain-specific stakeholders such as health and nutrition experts. Although in this paper the tool is instantiated in the food domain, the proposed solution is not domain dependent, and the whole system can easily be instantiated in other domains by training a specific image recognition network. Potential users thus also include domain experts interested in having a global and timely vision of any image-based social media phenomenon (e.g., travel and cultural heritage, selfies, cats and dogs).

To the best of our knowledge, WorldFoodMap is the first proposal to interactively show food trends based on the recognition of images from media streams. The research work on food recognition from photos is, however, receiving increasing interest in the literature. Preliminary works on visual food recognition employed aggregations of hand-crafted features, capturing mainly color and texture information. Leveraging HOG features and color histograms, Kawano et al. proposed FoodCam [8], a smartphone system for real-time, user-aided visual food recognition. In PlateClick [14], a food preference elicitation system uses a deep CNN to retrieve visually similar food images from visual quizzes for recommendation purposes. Christodoulidis et al. [5] trained a CNN to recognize patches of an image of a dish among only seven different food classes, using a sliding-window classification and a majority voting scheme to predict the food class. All these solutions are limited in the number of food classes they can recognize, or require a high cognitive load from the user.

1 https://instagram-press.com/
2 goo.gl/SjX6jo
3 goo.gl/jGvkvD
4 Taken from the Instagram web site on February 26, 2017.


2 SYSTEM ARCHITECTURE
The architecture of WorldFoodMap, illustrated in Figure 1, is organized into four layers: the Data Gathering layer collects Twitter data from the Streaming API; the Data Processing layer classifies the images according to 101 food categories and identifies the location of each post; the Data Analysis layer computes trend and popularity measures of food categories at a per-country level; and, finally, the Interactive Visualization layer provides an interactive web interface from which users can query tendencies and popularities of food categories or visualize the incoming stream of food photos.

Figure 1: WorldFoodMap Architecture
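To make the layering concrete, the following minimal Python sketch shows how the four layers could be wired together for each incoming tweet; every class, field, and method name here is an illustrative placeholder, not the actual WorldFoodMap implementation.

# Minimal sketch of the four-layer pipeline; every name below is a
# placeholder, not the actual WorldFoodMap code base.
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, Optional

@dataclass
class FoodItem:
    """One analysed tweet: when, what, and where."""
    timestamp: datetime
    image_url: str
    category: Optional[str] = None   # one of the 101 food categories
    country: Optional[str] = None

class WorldFoodMapPipeline:
    def __init__(self, gatherer, classifier, locator, analyzer):
        self.gatherer = gatherer      # Data Gathering layer
        self.classifier = classifier  # Data Processing: Image Classifier
        self.locator = locator        # Data Processing: Location Identifier
        self.analyzer = analyzer      # Data Analysis layer

    def run(self) -> Iterator[FoodItem]:
        for tweet in self.gatherer.stream():          # filtered tweets with images
            item = FoodItem(tweet.created_at, tweet.image_url)
            item.category = self.classifier.classify(tweet.image_url)
            item.country = self.locator.locate(tweet)
            if item.country is not None:              # keep geo-referenced items
                self.analyzer.add(item)               # feeds the time series
                yield item                            # feeds the streaming view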
2.1 Data Gathering
WorldFoodMap continuously collects tweets through the Streaming API provided by Twitter5. The Twitter stream is filtered for tweets related to food by means of a list of manually selected relevant keywords and hashtags (e.g., #food, #foodporn, #instafood) that can easily be changed or extended. The tweets passing the filter are further checked for the presence of images. We discard tweets without images, since there is no visual content to analyse.
5 dev.twitter.com
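As an illustration, this collection step could be implemented as follows with the tweepy library (3.x streaming API); the credentials, the keyword list, and the enqueue_for_processing helper are placeholders.

# Sketch of the keyword-filtered collection step (tweepy 3.x API);
# credentials and the downstream queue are placeholders.
import tweepy

FOOD_TAGS = ["#food", "#foodporn", "#instafood", "#breakfast", "#dinner", "#lunch"]

def enqueue_for_processing(status, photos):
    # Placeholder: in practice, push to the Data Processing queue.
    print(status.id, photos[0])

class FoodStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        # Keep only tweets that actually carry a photo: no image, no analysis.
        media = status.entities.get("media", [])
        photos = [m["media_url"] for m in media if m.get("type") == "photo"]
        if photos:
            enqueue_for_processing(status, photos)

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
stream = tweepy.Stream(auth=auth, listener=FoodStreamListener())
stream.filter(track=FOOD_TAGS)  # server-side keyword filter on the public stream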
2.2 Data Processing
There are two main processing units for information extraction from tweets: the Image Classifier and the Location Identifier.

The Image Classifier. Visual food recognition leverages Deep Learning techniques based on CNNs. Deep learning techniques have outperformed previous state-of-the-art techniques in several applications, such as image classification [1, 3, 6], image retrieval, and object recognition. The activation values of the internal layers of a trained CNN have been effectively used as visual features to compare objects for the task on which the neural network was trained [2, 6]: if two images are similar, the corresponding activation values of the internal layers are similar as well. The internal layers of the neural network learn to recognize the visual aspects of images that are salient to the training domain. The layers closer to the output of the network capture more semantic knowledge (similar objects, similar scenes), while the layers closer to the input capture more syntactic knowledge (shapes, geometry, colours). In many cases, the activation values of the chosen layer, used as features, are treated as vectors of real values and compared using the Euclidean distance. The visual features extracted with this approach are often referred to as deep features.

In order to perform food recognition, we use a pre-trained GoogLeNet [12], fine-tuned with a further training process on images from the ETHZ Food-101 dataset [4]. The fine-tuned network is used to perform deep feature extraction from food images: specifically, the activation values of the pool5/7x7_s1 layer of the network, obtained when it receives an image as input, are used as food-specialized deep features. Food recognition is obtained using a k-NN classifier on the deep features extracted from the image queries and from the images of the training set. As training set for the k-NN classifier, we use the union of the ETHZ Food-101 dataset (which was also used for the refinement of the GoogLeNet) and the UPMC Food-101 [13] dataset.

This approach of using a k-NN classifier on the deep features extracted from the GoogLeNet model tuned on food images has the advantage of being less prone to over-fitting, especially on classes with few training examples. Moreover, it can easily be extended to recognize new classes of food: the feature representation learnt by the neural network is shared by all training images of all classes, so classes with fewer training examples can still exploit the full power of the learnt features, and, to add an additional food category, it is sufficient to extract the deep features from the new training examples and include them in the k-NN classification process. In preliminary experiments, the best accuracy is obtained with k equal to 15, and in 95% of the cases the correct food class is among the three highest-ranked classes.
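As a sketch of this stage, assuming the deep features have already been extracted from the pool5/7x7_s1 layer and saved as NumPy arrays, the k-NN step could look as follows with scikit-learn; the file paths and the extract_deep_features helper are hypothetical.

# Sketch of the k-NN classification over pre-extracted deep features;
# paths and the extract_deep_features helper are hypothetical.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# train_features: (N, D) pool5/7x7_s1 activations of the ETHZ Food-101 and
# UPMC Food-101 training images; train_labels: one of 101 categories each.
train_features = np.load("train_deep_features.npy")
train_labels = np.load("train_labels.npy", allow_pickle=True)

# k=15 gave the best accuracy in the preliminary experiments; Euclidean
# distance matches how the deep features are compared.
knn = KNeighborsClassifier(n_neighbors=15, metric="euclidean")
knn.fit(train_features, train_labels)

def classify(image_path, top=3):
    """Return the `top` most likely food categories for one query image."""
    query = extract_deep_features(image_path).reshape(1, -1)  # hypothetical helper
    votes = knn.predict_proba(query)[0]           # neighbour-vote share per class
    best = np.argsort(votes)[::-1][:top]          # in 95% of cases the correct
    return [(knn.classes_[i], votes[i]) for i in best]  # class is in the top 3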
Location Identifier. The Location Identifier extracts city and country information from each tweet. As the primary source of location information we use the GPS and Place fields of the tweet, when present. However, these data are very sparse, being present in less than 10% of the tweets. When the primary source is not available, a second source used to infer the tweet location is the free-text user location field in the profile of the user posting the message. We identify the city and country from the content of this field based on data from the Geonames6 dictionary, which feeds a simple "parsing and matching" heuristic procedure. This technique provides reliable geo-location information in the presence of meaningful user location data [11]. All the tweets for which the above geo-referencing process succeeds are used for the time-series analysis discussed below. For the streaming visualization, however, only the tweets having GPS or Place information are used, as it requires precise localization.
6 http://www.geonames.org/
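A minimal sketch of this fallback heuristic, with the Geonames dictionary reduced to an in-memory sample and the matching rules simplified for illustration, could be:

# Simplified "parsing and matching" heuristic over the user location field;
# the real system uses the full Geonames dictionary, here reduced to a sample.
GEONAMES_CITIES = {"tokyo": "Japan", "pisa": "Italy", "new york": "United States"}
GEONAMES_COUNTRIES = {"japan": "Japan", "italy": "Italy", "usa": "United States"}

def locate(tweet):
    """Return a country name, or None when the tweet cannot be geo-referenced."""
    # Primary source: the GPS/Place fields, present in less than 10% of tweets.
    if getattr(tweet, "place", None) is not None:
        return tweet.place.country
    # Fallback: parse the free-text user location field against Geonames.
    text = (getattr(tweet.user, "location", None) or "").lower()
    for part in (p.strip() for p in text.split(",")):
        if part in GEONAMES_COUNTRIES:
            return GEONAMES_COUNTRIES[part]
        if part in GEONAMES_CITIES:
            return GEONAMES_CITIES[part]
    return None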
2.3 Data Analysis
The Data Analysis layer materializes the output of the Data Processing layer as time series of items and provides analytics functionalities. Each item encapsulates the timestamp of the tweet, the attached image, the dish category, and the location from where the tweet was posted.


This information is used to compute long-term trends for specific food categories and to detect short-term popularity bursts.

Trend analysis. Trends can be identified by looking at time series over a fixed time period [7]. A trend can be either positive or negative, and it determines a tendency over the observed period. In WorldFoodMap, we observe food trends to see how the different food categories are trending, both worldwide and in a specific country. To find a trend, we first correlate time with the frequency of a given food category in the relevant tweets posted at the specific time and place. Then we fit the observed data with a simple linear regression: the objective is to find the line at + b that best fits the data, i.e., that minimizes the sum of squared errors Σ_t [(at + b) − y_t]² between the fitted line and the observed frequencies y_t. The trend is given by the slope a of this least-squares line.
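For instance, a minimal implementation of this fit, taking the per-interval tweet counts of one food category, could be:

# Least-squares fit of y = a*t + b over a category's tweet counts;
# the slope a is the trend (positive: trending up, negative: down).
import numpy as np

def trend_slope(frequencies):
    """frequencies: tweet counts of one food category per time step."""
    t = np.arange(len(frequencies), dtype=float)
    a, b = np.polyfit(t, frequencies, deg=1)  # minimises sum_t [(a*t + b) - y_t]^2
    return a

# Example: a category gaining popularity over six time steps.
print(trend_slope([10, 12, 11, 14, 15, 17]))  # positive slope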
Popularity analysis. Another view on how culinary habits are changing is obtained by a short-term analysis aimed at detecting whether a certain food category presents bursty behavior. We investigate whether a food category is gaining or losing popularity by looking at the deviation of its frequency from the mean, observed over a time interval configurable by the user. This can be expressed by calculating the standard score [9], a measure providing an assessment of how far off-target a process is operating. The standard score z is calculated as z = (x − µ)/σ, where x is the value of the current observation, µ is the mean of the population, and σ is the standard deviation of the population. It represents the signed number of standard deviations by which an observation lies above or below the mean: a positive score indicates a datum above the mean, a negative score indicates a datum below the mean, and a score of 0 means the observation equals the mean.

A bursty phenomenon occurs far from the average, so the standard score can be a good indicator of a trending situation. A negative score can indicate a sudden fall in popularity, whereas a positive one indicates an increasing number of observations. For example, we may see bursty dishes in periods close to seasonal holidays, when tweets with typical dishes show increased frequencies: around Thanksgiving day, in the US, we may see many images of roasted turkey, which may appear as a bursty dish in our analysis. In addition to popularity, WorldFoodMap provides a frequency-based analysis visualizing the frequency of a given food category over a given temporal interval.
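A minimal implementation of this burst indicator over the counts of one category, assuming the user-configured interval is simply a list of observations, could be:

# Standard score of the current observation against the observed interval.
import numpy as np

def standard_score(frequencies):
    """z = (x - mu) / sigma, with x the latest count in the interval."""
    freq = np.asarray(frequencies, dtype=float)
    mu, sigma = freq.mean(), freq.std()
    if sigma == 0:
        return 0.0                      # flat series: no burst by definition
    return (freq[-1] - mu) / sigma

# Roast turkey around Thanksgiving: the last count sits far above the mean.
print(standard_score([12, 15, 11, 14, 13, 80]))  # large positive z: bursty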
2.4 Interactive Visualization
WorldFoodMap enables interactive analysis through a responsive web interface adapted to both mobile and desktop web browsers. The interface is designed to be intuitive and simple. Among the interactive functionalities available through the WorldFoodMap web interface, we highlight the Trends and Popularity view, depicting the worldwide trending and popular foods, and the Streaming View. For trends and popularity we have two visualization modes: the trend reports, illustrating in tabular form the results of the global or country-level trend and popularity analysis, and the worldwide map, plotting the same information as a choropleth map. The Streaming View, instead, pinpoints on a worldwide map the exact locations of the geotagged and classified images gathered in near-real time from the Twitter stream. All these visualizations can be personalized and refined through a number of user-level settings, such as the focus on a specific food category (e.g., Lasagna Bolognese, Steak), the temporal interval over which to perform the popularity and trend analysis, or the focus on a specific country.

In addition, WorldFoodMap provides functionalities for real-time notification of rapid changes in the trends of observed foods. Once logged into the system, a user can set up custom alerts for following food trends around the world. The user specifies the category of food, the country, a time window (e.g., last semester, last 2 years), a temporal interval for observing the trend (e.g., monthly, weekly), and the percentage of variation. An alert is triggered and notified to the user when, for the matching configuration of country and food category, the trending rate is higher or lower than the specified threshold.
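A possible representation of such an alert and of its trigger condition, with illustrative field names rather than the actual WorldFoodMap schema, is:

# Illustrative alert record and trigger check; field names are placeholders.
alert = {
    "category": "Sashimi",
    "country": "Japan",
    "time_window": "last semester",   # e.g. last semester, last 2 years
    "interval": "weekly",             # e.g. monthly, weekly
    "variation_pct": 20.0,            # threshold on the percentage of variation
}

def should_notify(alert, trending_rate_pct):
    """Trigger when the rate moves beyond the threshold in either direction."""
    return abs(trending_rate_pct) >= alert["variation_pct"]

print(should_notify(alert, trending_rate_pct=35.0))  # True: notify the user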
3 THE DEMONSTRATION
Any user may interact with the demo to browse the trends of selected food categories or to visualize the streaming information, where tweets with their location and classified food category are plotted on the map in near-real time. These two views correspond to the modules depicted in Figure 1 and discussed in Section 2: the popularity and trends visualization and the streaming visualization.

Trends and Popularity Visualization. WorldFoodMap provides two visualization modes for trend and popularity analysis: reports with a tabular view and a worldwide choropleth map. The reports view allows the user to perform custom analyses either by country or by food category. The reports by country show the ranking of trending food categories for the country selected by the user; for example, in Figure 2(a) WorldFoodMap reports the trending food categories in Japan. The reports by food category, instead, show the ranking of countries where the selected food category is or is not trending; Figure 2(b) reports the countries where 'Sashimi' is trending. Furthermore, for both reports the ranking can also be ordered by popularity variation, and the results can be visualized in summary tables or in a time-series-style line chart.

The visualization on the worldwide map has a more immediate impact, as the user can see the upward and downward trends and the popularity at country level for a given food category thanks to the use of a color scale, as in Figure 2(c): darker colors represent a higher value, while lighter colors represent a lower trend or popularity value. By selecting either trends or popularity, the user indicates whether WorldFoodMap computes trending values or popularity variances, as detailed in Section 2.3. The user can change the food category by selecting an item from the list of the 101 food categories learned by our classifier model, and can set a time-window filter for computing the trend and popularity variations. For example, as we can see in Figure 2(c), the user chooses "Lasagna Bolognese" from the list of food categories, and the map displays the countries where it is trending up (USA, Iceland) or trending down (Canada, France).


Figure 2: WorldFoodMap screenshots. (a) Food trends in a country; (b) food trends by dish category; (c) map of trends for a food category by country; (d) streaming visualization of tweets with the classified image food category.

Streaming visualization. Another visualization is the worldwide stream map, giving a real-time view of food tweets as they appear on the network. Through this visualization it is possible to see what people are sharing "now" all around the world. Given the location data, we can place each item on the world map and annotate it with the food category label provided by the image classifier. The tweets remain visible on the map for a fixed temporal window, or until the density of images on the map becomes unreadable. A simple example, selecting one tweet from Japan showing an image classified in the sashimi category, can be seen in Figure 2(d). From this interface the user can visualize the stream of tweets by selecting among a list of relevant hashtags provided by WorldFoodMap, such as #food, #breakfast, #dinner, #lunch, #foodporn, and #instafood.

More details of the tool and additional screenshots are available at the URL: http://worldfoodmap.isti.cnr.it

Acknowledgments. This work was supported by the EC H2020 INFRAIA-1-2014-2015 SoBigData (654024).

REFERENCES
[1] Giuseppe Amato, Fabio Carrara, Fabrizio Falchi, Claudio Gennaro, Carlo Meghini, and Claudio Vairo. 2017. Deep learning for decentralized parking lot occupancy detection. Expert Syst. Appl. 72 (2017), 327–334.
[2] Giuseppe Amato, Fabrizio Falchi, Claudio Gennaro, and Fausto Rabitti. 2016. YFCC100M-HNfc6: A Large-Scale Deep Features Benchmark for Similarity Search. In SISAP 2016.
[3] Artem Babenko, Anton Slesarev, Alexandr Chigorin, and Victor Lempitsky. 2014. Neural codes for image retrieval. In Computer Vision – ECCV 2014. Springer.
[4] Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. 2014. Food-101 – Mining Discriminative Components with Random Forests. In Eur. Conf. on Comp. Vision.
[5] Stergios Christodoulidis, Marios Anthimopoulos, and Stavroula Mougiakakou. 2015. Food Recognition for Dietary Assessment Using Deep Convolutional Neural Networks. Springer.
[6] Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. 2013. DeCAF: A deep convolutional activation feature for generic visual recognition. arXiv preprint arXiv:1310.1531 (2013).
[7] James Douglas Hamilton. 1994. Time series analysis. Vol. 2. Princeton University Press.
[8] Yoshiyuki Kawano and Keiji Yanai. 2015. FoodCam: A real-time food recognition system on a smartphone. Multimedia Tools and Applications 74, 14 (2015).
[9] Erwin Kreyszig. 2007. Advanced engineering mathematics. John Wiley & Sons.
[10] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097–1105.
[11] Jukka-Pekka Onnela, Samuel Arbesman, Marta C. González, Albert-László Barabási, and Nicholas A. Christakis. 2011. Geographic constraints on social network groups. PLoS ONE 6, 4 (2011), e16939.
[12] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going Deeper with Convolutions. In Computer Vision and Pattern Recognition (CVPR). http://arxiv.org/abs/1409.4842
[13] Xin Wang, Devinder Kumar, Nicolas Thome, Matthieu Cord, and Frédéric Precioso. 2015. Recipe recognition with large multimodal food dataset. In 2015 IEEE International Conference on Multimedia. 1–6.
[14] Longqi Yang, Yin Cui, Fan Zhang, John P. Pollak, Serge Belongie, and Deborah Estrin. 2015. PlateClick: Bootstrapping Food Preferences Through an Adaptive Visual Interface. In Proceedings of the 24th ACM CIKM.
