4th IFAC Symposium on Telematics Applications
November 6-9, 2016. UFRGS, Porto Alegre, RS, Brazil
Available online at www.sciencedirect.com
IFAC-PapersOnLine 49-30 (2016) 257–262

Using Big Data and Real-Time Analytics to Support Smart City Initiatives

Arthur Souza ∗ Mickael Figueredo ∗∗ Nélio Cacho ∗ Daniel Araújo ∗ Carlos A. Prolo ∗

∗ Department of Informatics and Applied Mathematics, Federal University of Rio Grande do Norte, Natal, Brazil
∗∗ School of Science and Technology, Federal University of Rio Grande do Norte, Natal, Brazil

Abstract: A central issue in the context of smart cities is the capability to acquire timely information about city events. This paper describes a platform which focuses on processing messages posted on the Twitter social network. Key issues here are the high throughput, since a large volume of data per second needs to be processed, and the need to process ill-formed natural language texts. With these in mind, the platform has pipelined modules for robust, fast, real-time tweet acquisition and storage, filtering of several kinds, natural language processing and sentiment analysis, which feed a final analysis and visualization module. A case study of sentiment analysis during the 2014 FIFA World Cup in Brazil is used to validate the effort made so far.

© 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: Smart cities, social networks, tweets, natural language processing.
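
As a rough illustration of the pipelined design summarized in the abstract, the Python sketch below chains a filtering stage, a polarity stage and a term counter over a handful of example tweets. It is not the platform's implementation: the function names, the example tweets and the tiny English lexicon are hypothetical, and the real system relies on dedicated streaming and NLP libraries rather than on a plain loop.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, used only to make the sketch self-contained.
LEXICON = {"great": 1, "love": 1, "good": 1, "bad": -1, "awful": -1, "hate": -1}

# Special terms that carry no content of their own (retweet and attribution markers).
SPECIAL_TERMS = re.compile(r"\b(?:RT|via)\b", re.IGNORECASE)


def preprocess(text):
    """Filter stage: drop special terms, treat hashtags as plain words, tokenize."""
    text = SPECIAL_TERMS.sub(" ", text)
    text = text.replace("#", " ")
    return re.findall(r"[a-z']+", text.lower())


def polarize(tokens):
    """Polarity stage: label a token list as positive, negative or neutral."""
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def run_pipeline(tweets):
    """Run every tweet through the stages and keep a running term count."""
    counts = Counter()
    results = []
    for text in tweets:
        tokens = preprocess(text)
        counts.update(tokens)
        results.append((text, polarize(tokens)))
    return results, counts


if __name__ == "__main__":
    labelled, term_counts = run_pipeline([
        "RT @fan: Great game in Natal! #WorldCup",
        "awful traffic via @news",
    ])
    for text, label in labelled:
        print(label, "|", text)
    print(term_counts.most_common(3))
```

Each stage only consumes the output of the previous one, which is the property that lets such stages be distributed across processing nodes in a streaming framework.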

1. INTRODUCTION

A city's growth adds complexity and management challenges for government authorities in dealing with problems related to water supply, local waste disposal, urban traffic management, health, education, public safety, economy, environment and tourism. In this sense, the great challenge to be faced is to ensure sustainable urbanization associated with socioeconomic progress.

Politicians around the world are seeking answers and ways to deal with these challenges. One of the proposed strategies encompasses the creation of smart cities. Caragliu et al. (2011) argue that a city can be defined as “smart” when there is investment in human and social capital, as well as in information and communication technology (ICT) infrastructure.

A smart city incorporates a large number of systems, which represent the most basic infrastructure for integrating the real and virtual worlds. One of the great challenges in deploying smart cities is the extraction of relevant information from the ICT infrastructure of cities. For Komninos et al. (2013), such extraction usually relies on sensors installed to capture the flow of vehicles and water and energy consumption, thus requiring high public investment for the development of smart cities.

To overcome this difficulty, some studies (Doran et al. (2013); Anantharam et al. (2015)) suggest using social media to identify the perception of residents and visitors about a particular city. For example, social media can be used to obtain relevant information about the situation of public transport, traffic and environmental conditions, public safety and general events in cities. In this sense, the purpose of this paper is to present a platform that uses social media as a data source to support the decisions of policymakers in the context of a smart city.

The proposed platform was implemented and tested by analyzing 7.5 million tweets. The results show that it is possible to identify relevant information about points of agglomeration and concentration of users, movement of users and the user perception about city events. Overall, the initial results suggest that data collected from social media can be applied to the effective management of smart city initiatives. The remainder of this paper is organized as follows. Section 2 describes some details of the Natal Smart City initiative. Section 3 presents the architecture and the implementation details of the platform. Section 4 describes the evaluation process. Section 5 presents some related works. Finally, Section 6 provides some concluding remarks.

2. NATAL SMART CITY INITIATIVE

Natal is located in the northeast of Brazil, on the Atlantic Ocean. The capital city of the state of Rio Grande do Norte is home to approximately 862,000 people. Natal was a host city of the 2014 FIFA World Cup. Although Natal was not a venue for the World Cup knockout stage, it hosted 4 games in the group stage with an average attendance of 40,000 fans at each game. In total, Natal received around 173,000 tourists during the World Cup period. According to a study performed by ForwadKeys and Pires&Associados (2014), Natal presented the highest growth in the number of bookings among all host cities when compared to the same period in 2013, with bookings increasing by 1000%.

The high number of tourists puts severe pressure on the urban infrastructure and services related to transportation,

2405-8963 © 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2016.11.121

safety and water consumption. In order to handle such pressure, the Natal city council, in partnership with the public and private sectors, has engaged in an initiative to transform Natal into a smart city. The development of a smart city facilitates seamless access to value-added services, such as access to real-time information on the public transportation network, enriches tourist experiences and enhances city competitiveness (Buhalis and Amaranggana (2013)).

The purpose of the Natal smart city approach is to accelerate and enable the delivery of outcomes across various sectors through a truly integrated approach. For instance, the plan creates a network infrastructure, named Giga Metropole, that has an optical backbone of approximately 160 km, as well as a passive network of approximately 300 km, to interconnect public institutions in the state of Rio Grande do Norte. More precisely, the Giga Metropole will benefit around 650 public and private institutions in Natal’s metropolitan area, including 350 state and municipal public basic education schools, police stations, universities and technical schools, teaching laboratories, and 10 hospitals.

The implementation of smart city initiatives usually requires the use of a variety of sensors and applications that are connected and can interact with each other to create controlled environments that can be adjusted in real time (Hancke et al. (2012)). In many cases the potential of smart cities depends on the density and integration of the sensors and their applications. Examples of applications for smart cities include vibration sensors to monitor the flow of vehicles and the integrity of the pavement (López-Higuera et al. (2011)), sensors to predict traffic conditions and optimize public lighting in Lyon, France (Perchet (2013)), and cameras around the city to identify incidents in Liverpool, England (Coleman and Sim (2000)).

However, the use of a variety of sensors poses a number of challenges related to the management of devices with restricted physical capabilities (such as energy, processing, and memory), to security and privacy (particularly of citizens, critical infrastructure and systems information), and to dependability requirements (reliability and availability), since defects can lead to system failures, resulting in financial losses and harm to the environment or to people. In addition to all these challenges, Brazilian cities face very limited budgets to spend on ICT.

In this context, some studies (Doran et al. (2013); Anantharam et al. (2015)) suggest using social media to identify in real time the perception of tourists or residents about a particular event in a city. According to Doran et al. (2013), sensors can be useful for identifying “what” is happening, but are unable to identify “why” and “how” such an incident occurs. For this reason, Doran et al. (2013) suggest using social media to capture the human perception of events. This suggestion is based on the fact that perceptions of a city event are often described through comments on social media. One of the main social media networks currently available is Twitter. Twitter has over 500 million users and generates around 500 million posts (tweets) per day (Telegraph (2013)).

The use of social media as a source of information does not require the deployment and maintenance of a comprehensive IT infrastructure spread throughout the city. It needs only a computer connected to the Internet to make use of the API provided by the social network. Despite this convenience, the use of social media in the context of smart cities poses several challenges. The first is related to the capability to process in real time all posts (about 6,000 per second (Telegraph (2013))) sent on various subjects. The second challenge is related to the capability to interpret such posts. The sentiment expressed by a tweet can be used to identify the occurrence of events. The third challenge is to store such data in a way that makes it possible to generate appropriate views. In order to address these challenges, the next section describes a platform developed and implemented to support smart city initiatives.

3. PROPOSED PLATFORM

People's perceptions about events and issues they encounter in their cities are often embodied in the words, terms and phrases that form their spoken language, and now also in their social media posts (Doran et al. (2013)). Social media produce millions of posts broadcast over time. These posts need to be analyzed in order to extract situation awareness about people's behavior in a smart city. This work has adapted and optimized a set of machine learning and natural language processing techniques to deal with real-time and high-volume text streams. These techniques are packaged in the proposed software platform.

3.1 Real-time Stream Infrastructure

The proposed platform uses a real-time processing infrastructure as a central component. This infrastructure is implemented with Apache Storm (Apache (2015a)). Storm is a free and open source stream-processing framework capable of processing one million 100-byte messages per second per node (Apache (2015a)). A Storm cluster is formed by a distributed network of processing nodes that process data compartmentalized in tuples. For this, three components are defined: (i) Zookeeper, (ii) Nimbus and (iii) the Supervisor. Zookeeper (Apache (2015b)) is a high-performance service that coordinates distributed applications through configuration management, naming, synchronization and group services. In Storm's architecture, it stores the synchronization data and the processing state of the tuples that will be handled by the supervisor nodes. A Supervisor represents the node(s) of the Storm cluster responsible for the data processing. Finally, Nimbus is the primary node of the Storm cluster, responsible for distributing the code to be processed, assigning tasks to supervisor nodes and monitoring faults.

The workflow in the Storm architecture is structured by the Topology concept. Topologies are mechanisms for computational organization and are defined as a processing graph where each node contains processing logic and the links between nodes indicate that data can be exchanged between them. The data stream is represented by Stream elements. A Stream is a sequence of tuples that can be affected by spout and bolt components. Spouts and bolts have interfaces that can be used by developers to implement the program logic. Spouts are elements that receive a data stream and organize it into
of Trident. The Section 3.2 lists and explains in detail the


element of the topology.

3.2 Natural Language Infrastructure

The Natural Language Processing (NLP) (Bird et al.


(2009)), covers any type of manipulation of a computer in
a language used for communication between humans. In
our case, the main goal is to analyze tweet texts. The top-
left of this topology takes inputs from a public streaming
Twitter API (Twitter4j). This data source is called spout
(rectangle in Figure 1) and creates a stream with tuples
that contain the following data: name, age and city of
Twitter users, tweet ID, latitude, longitude, date and time
of the post, and body of the tweet. Some of these data, such
as the user’s city, are only obtained when the user allows
access to your profile on Twitter.
Tuples created by spout are passed onto units of computa-
tion called function (ellipses in Figure 1). The first function
in our topology uses the Language Detection Library to
detect the language of the post. The Language Detection
function allows performing specific actions for the feelings
of tourists from different countries that may visit a smart
city. Then, the data stream is modified by a filter (rhombus
in Figure 1) that performs a preprocessing that includes
the removal of special terms (RT, via, etc.) and hashtags
treatment according to the language. Subsequently, the
processing makes the polarization of sentences posted in
tweets. The polarization is to identify whether a post is
positive, negative or neutral sentiment on an issue. The
proposed platform uses two polarizing mechanisms: one for
English sentences and other for Portuguese. The polariza-
tion of the English sentences uses the Stanford CoreNLP
library (SNLPG (2014)). Sentences in Portuguese were
polarized by function PolarizerPortuguese, for this we used
the SentiLex (Silva et al. (2012)).
Lemmatizer functions takes into account the language
of the post to reduce inflectional forms and sometimes
derivationally related forms of a word to a common base
form lemma (Jurafsky and Martin (2009)). For instance,
in English, if confronted with the token saw, Lemmatiza-
Fig. 1. Natural Language processing Topology tion function would attempt to return either see or saw
depending on whether the use of the token was as a verb
tuples. Bolts consume stream elements from a spout or a or a noun. The Lemmantizer tool for Portuguese (NILC
bolt. Each topology can still be seen as a spouts and bolts (2015)) was used to make the textual processing lemma in
package. Portuguese (function LemmantizerPT-BR) while Stanford
CoreNLP (SNLPG (2014)) was used to process lemma in
The construction of topologies to real-time processing can
English (function LemmantizerEng). Finally, SplitWords
be performed only by defining spouts and bolts, however,
and WordsCount functions are responsible for keeping up-
for greater flexibility and efficiency, Storm also makes
to-date the number of terms (words, hashtags) mostly us
available a high-level abstraction called Trident. Trident
ed in Twitter in a similar concept to the top trends.
allows to apply group operations, aggregation, joins and
filters on streams. Thus an operation with counters, re- The data storage is executed by the States: TweetsState
moval or joint streams is facilitated. The logic processing and WordsState. TweetsState represents a tweet with all
on Trident is encapsulated in functions. Each function attributes cited above. WordsState represents a word in
is implemented to receive a modified flow from others conjunction with the number of times it has appear. These
abstractions like spouts, filters or other functions. Trident states store the processed tweets in three databases: Mon-
also defines elements to be used for persistence. The persis- goDB (MongoDB (2015)) , PostGIS (PostGis (2015)) and
tence of entities in Trident is carried out by elements called Titan (TitanDB (2015)) . MongoDB is an open-source
States. States elements store data by implementing the document database that provides high performance, high
StateUpdater interface. Figure 1 shows how Storm frames availability, and automatic scaling. MongoDB was used to
the natural language processing techniques used in this create two data collections. The first collection stores all
work. The topology defined in Figure 1 uses the concepts tweets, creating a log for tweets. In turn, the second collec-

259
2016 IFAC TA
260
November 6-9, 2016. Porto Alegre, Brazil Arthur Souza et al. / IFAC-PapersOnLine 49-30 (2016) 257–262

tion stores the most commonly used terms, their common


base form (lemma) and the time they were collected. In
order to understand people behavior, the platform uses the
PostGIS database to create a cross-reference among the
spatial information (latitude, longitude and time) gener-
ated by the Twitter posts and the list of georeferenced city
places obtained from Google Places. This cross-reference
allows identifying: (i) which are the most or the least
visited places, (ii) what are the user perception about
city places, (iii) and which kind of places are preferred
by different target groups. Finally, Titan is used to store,
in the form of a graph, the relationship between users and
tweets. Titan is a distributed graph database optimized
for storing and querying graph structures. Like Storm and
MongoDB, Titan databases can run as a cluster and can
scale horizontally to accommodate increasing data volume
and user load. The graph structure of Titan facilitates the
implementation of algorithms to discover the underlying
rules governing the behaviour of people in a social network,
such as centrality and closeness.

3.3 Analyses and Visualization

The analysis and visualization component also uses bolts
and functions to implement the data analysis. These
elements are omitted from Figure 1 to keep the figure
simple. Processed data is retrieved from the databases,
and functions generate the results that are displayed by
the graphical interface of the platform. The graphical
interface is a web dashboard implemented in HTML /
JavaScript, using the Google Maps API V3 and the Google
Charts library to generate maps and charts, respectively.

4. EVALUATION

4.1 Methodology

The platform ran from the 10th of June 2014 to the 15th of
July 2014. The initial date was two days before the opening
ceremony and the closing date was two days after the
closing ceremony. The platform automatically collected
all tweets containing at least one of the following terms:
Salvador, Manaus, Natal, RiodeJaneiro, Recife, SaoPaulo,
BeloHorizonte, PortoAlegre, Fortaleza, Cuiaba, Brasilia
and Curitiba. These terms correspond to the names of the
World Cup host cities. During the assessment period, the
platform processed approximately 7.5 million tweets.

4.2 Data Results

The proposed platform supports different analyses
and allows different filters to be applied to the collected
dataset. For instance, when only the tweets with latitude
and longitude are taken into account, the number of tweets
drops from 7.5 million to 286,000. Based on this
subset (tweets with latitude and longitude), the proposed
platform generates a heat map (shown in Figure 2) to
depict the places where most of the tweets were posted. A
heat map is a data visualization tool used to easily
identify clusters and find where there is a high concentration
of a particular activity. In Figure 2, the areas
in green indicate that there were fewer than 10 posts
during the period studied. The areas in yellow indicate that
there were between 11 and 999 posts. Finally, the areas in red
indicate that there were more than 1,000 posts during the
period of study. The analysis of Figure 2 shows that most
of the red areas (higher density of posts) are in Brazil.
In fact, 81.16% of all posts originated in Brazilian
territory. Other South American countries accounted for
5.69% of the posts, whereas North and Central America
accounted for 4.89% and 2.75%, respectively. Despite being
a continent with a great tradition in football, Europe
accounts for only 3.05% of the posts.

Fig. 2. Heat map representing the places where tweets were posted

These data show an implicit feature of Twitter, i.e., that it
is used to share information and describe the day-to-day
activities of people's lives (Wu et al. (2014)). According to
(ForwadKeys and Pires&Associados (2014)), 80% of users
use Twitter to update their followers on what they are doing,
while the remaining 20% use Twitter to send general
background information. The small number of posts from
outside Brazil helps to confirm the notion that Twitter users
have little reciprocity in the exchange of messages among
users (Buhalis and Amaranggana (2013)), unlike other
social media, suggesting that the main goal of Twitter is
not maintaining relationships, but disseminating personal
news.

Fig. 3. Tourist nationality

In order to analyze the tourism demand, two additional
filters were defined. The first selected only the posts of
Twitter users whose original location was not in Brazil
(based on the user profile information), and the second
restricted the selection to tweets posted within the perimeter
of the metropolitan region of Natal (i.e., posts sent from Natal,


but from foreign users). These two filters generated
a subset comprising 7,465 posts. Based on this subset,
it was found that Natal received, during the FIFA World
Cup, tourists of 25 different nationalities (see Figure 3),
such as Americans (39.60%), Mexicans (11.56%), British
(8.38%), Uruguayans (7.51%), and Italians (6.65%). In
addition, it was possible to identify the languages used
in Twitter posts. In total, 11 different languages were
identified, English being the most used with 41% of the
posts, followed by Spanish (25%) and Italian (18%). For
example, a majority of the posts were
in English on the date of the match between USA and
Ghana (i.e., 16th June). Real-time identification of tourists
is another piece of information relevant to the context of
smart cities. With such information, the public managers of
the tourism sector can improve the tourist infrastructure in
those areas, while the public safety managers can optimize
the distribution of vehicles to cover the places of greatest
tourist presence.

Fig. 4. Places where the crowds were concentrated

According to Figure 4, the main posting area was the
Arena das Dunas Stadium, then a live music venue near
the stadium, followed by the hotel zone, the airport,
and lastly, the FIFA Fan Fest. These results suggest that the
location of this music venue close to the stadium probably
favored it in comparison with the FIFA Fan Fest (the official
music venue). Finally, it was observed that 72% of posts
in English were positive, 18% neutral and 10% negative.
Most of the negative posts were related to the bus service,
since bus drivers went on strike during the competition in
Natal. In order to test the validity and reliability of the
data obtained, we compared our results with other
research conducted during the same period. One of the
studies was performed by the Spanish company Forward
Data, which, in partnership with Pires & Associados in Brazil,
researched the nationalities of foreign tourists to Brazil.
The results of (ForwadKeys and Pires&Associados (2014))
show that for the city of Natal, 29% of bookings were
made by North Americans, 14% by Uruguayans and 7% by
Italians. Another survey, undertaken by the Association
of Hotels (ABIH-RN (2014)) with a thousand tourists,
identified that 25.22% of international tourists were North
Americans. Despite there being no consensus among the
studies, both found a higher number
of reservations for North Americans, thus validating
the data generated by the platform. This shows that
the analysis of Twitter posts for tourist profiles within
the perimeter of the Natal metropolitan
region gave an accurate representation of the tourists'
countries of origin when compared to other research. In the
context of a smart city, this information is extremely
important because it can guide the definition of specific
marketing plans for the main source markets. In addition,
municipal managers can develop
campaigns to train local tour operators in the best way
to welcome tourists, for example through training in new
languages.

4.3 Performance Evaluation

The use of third-party libraries in the proposed platform
created a challenge: how to use them together with
Storm. The main problem was how to access the files needed
to run the libraries used in natural language processing,
such as a Lemmatizer tool for Portuguese (NILC (2015)),
which consists of a pre-packaged jar file. To use this tool,
it is necessary to create a process that
accesses files at a fixed physical location, both for writing
and for reading. The solution of using a network directory
with read and write permission was rejected due
to its lack of scalability, as well as the large amount of
network traffic resulting from file access. Focusing on
scalability, we created a web service to encapsulate
the Lemmatizer tool for Portuguese. The biggest concern
in the use of a web service was how it would affect the
performance of the proposed platform. We adopted
connection pooling with the aim of always
keeping connections to the web service available, so that
the execution of the topology is not interrupted.

We leveraged the work of Bedini et al. (2013) to define
the cost of processing tweets. We start from
the definition that the CPU processing cost of a processing
bolt per time unit is modeled as the sum of the cost to
process all input tuples received per time unit and the cost
of processing all resulting output tuples, for all possible
tuple types this bolt can receive (Bedini et al. (2013)).
Namely, the cost of processing is directly impacted by the
values of the input and output data of functions. In order
to hold this effect fixed in the analysis, we selected a
finite set of tweets to be sent to the defined
topology (see Fig. 1). Thus, the same tweets would be
processed in the same order in the different scenarios
of analysis. The objective of the evaluation was to measure
the average processing time of tweets and the number of
tweets processed per second. Using the topology defined
in Figure 1, we sent the same set of tweets (109 tweets) at
a fixed rate (one every 1 second) to avoid simultaneous
capture.

Table 1. Rate and Time Process of Tweets

Storm Worker Nodes   Average Process Time / Tweet   Tweets / Second
1                    2.82 sec                       0.35 tweets
2                    2.32 sec                       0.42 tweets
3                    2.12 sec                       0.47 tweets
4                    2.39 sec                       0.41 tweets
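
The connection pooling mentioned in Section 4.3 can be illustrated with a minimal sketch. The paper does not show its implementation, so everything named here is an assumption: LemmatizerConnection is a hypothetical stand-in for an open connection to the lemmatizer web service, and its lemmatize method only lowercases the word instead of calling a real service. The point shown is only that a fixed set of connections is opened once and then borrowed and returned by callers.

```python
import queue
from contextlib import contextmanager


class LemmatizerConnection:
    """Hypothetical stand-in for an open connection to the lemmatizer web service."""

    def __init__(self, conn_id):
        self.conn_id = conn_id

    def lemmatize(self, word):
        # Placeholder only: a real client would send the word to the service.
        return word.lower()


class ConnectionPool:
    """Fixed-size pool: connections are created once and reused by callers."""

    def __init__(self, size):
        self._idle = queue.Queue(maxsize=size)
        for conn_id in range(size):
            self._idle.put(LemmatizerConnection(conn_id))

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free (or raises queue.Empty on timeout).
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller failed.
            self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()


pool = ConnectionPool(size=2)
with pool.connection() as conn:
    lemma = conn.lemmatize("Estadio")
    idle_during_use = pool.idle_count()
```

Because callers block on the queue instead of opening a new connection per tweet, the number of simultaneous connections to the service stays bounded by the pool size.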

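
The two quantities reported in Table 1 can be computed as in the following sketch. It is an illustration under stated assumptions, not the measurement code used in the paper: the function name summarize_run is hypothetical, the sample durations are made up, and tweets are assumed to be processed one after another so that the elapsed time is the sum of the per-tweet durations. Under that assumption the rate is the reciprocal of the average time, and the rows of Table 1 are close to this relation (for example, 1 / 2.82 ≈ 0.35).

```python
def summarize_run(durations_sec):
    """Return (average seconds per tweet, tweets per second) for one run.

    Assumes tweets are processed sequentially, so the total elapsed
    time equals the sum of the individual durations.
    """
    if not durations_sec:
        raise ValueError("at least one measurement is required")
    total = sum(durations_sec)
    average_time = total / len(durations_sec)
    throughput = len(durations_sec) / total
    return average_time, throughput


# Made-up per-tweet durations, for illustration only.
avg, rate = summarize_run([2.0, 3.0, 2.5, 2.5])
```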

The result shown in Table 1 reveals that the maximum
average time for processing the tweets was approximately
2.5 seconds. Taking into account that the average runtime
of the natural language processing libraries is 5 seconds when
Storm is not used, we concluded that the use of Storm
improves the performance and availability of the proposed
platform. It was also observed that the use of the web service
did not impact the platform performance.

5. RELATED WORK

Social networks are widely used to collect information
about people and events. For example, (Anantharam et al.
(2015)) and (Telegraph (2013)) describe approaches that
capture tweets within a certain radius around an application
point. Those approaches then use probabilistic
models to identify problems relating to traffic and other
events. Our approach builds on the solutions defined
by (Anantharam et al. (2015)) and (Telegraph (2013))
and adds, as a further contribution, a platform that supports
the treatment and identification of events in real time
using multiple languages, together with analysis and
georeferenced data visualization.

6. CONCLUSION

This paper presented a platform that uses social media
as a data source to support the decisions of policymakers
in the context of smart city initiatives. The Social Smart
City platform aims to enhance citizens' experience by
collecting and analysing Twitter posts in real time. This
paper detailed the topology responsible for collecting,
processing, and storing the Twitter posts. Some of the
data gathered by the platform was analysed to show how
the platform can be used by a smart city initiative. The
results showed that it is possible to identify the nationality,
the language of the posts, the sentiment, and the points
of agglomeration of visitors during a big event. Overall,
the results suggest that data collected from Twitter posts can
be applied to the effective management of smart city
initiatives.

ACKNOWLEDGMENTS

This work was partially supported by the Smart Metropolis
Project.

REFERENCES

ABIH-RN (2014). Turista na copa em Natal. Available at
http://tinyurl.com/jabtme2.
Anantharam, P., Barnaghi, P., Thirunarayan, K., and
Sheth, A. (2015). Extracting city traffic events
from social streams. ACM Trans. Intell. Syst. Technol.,
6(4), 43:1–43:27. doi:10.1145/2717317. URL
http://doi.acm.org/10.1145/2717317.
Apache, S.F. (2015a). Apache Storm. Available at
https://storm.apache.org/.
Apache, S.F. (2015b). Apache Storm. Available at
https://storm.apache.org/.
Bedini, I., Sakr, S., Theeten, B., Sala, A., and Cogan, P.
(2013). Modeling performance of a parallel streaming
engine: bridging theory and costs. In Proceedings of
the 4th ACM/SPEC International Conference on
Performance Engineering, 173–184. ACM.
Bird, S., Klein, E., and Loper, E. (2009). Natural language
processing with Python. O'Reilly Media, Inc.
Buhalis, D. and Amaranggana, A. (2013). Smart tourism
destinations. In Information and Communication
Technologies in Tourism 2014, 553–564. Springer.
Caragliu, A., Del Bo, C., and Nijkamp, P. (2011). Smart
cities in Europe. Journal of Urban Technology, 18(2), 65–82.
Coleman, R. and Sim, J. (2000). You'll never walk alone:
CCTV surveillance, order and neo-liberal rule in Liverpool
city centre. The British Journal of Sociology, 51(4),
623–639.
Doran, D., Gokhale, S., and Dagnino, A. (2013). Human
sensing for smart cities. In Proceedings of the 2013
IEEE/ACM International Conference on Advances in
Social Networks Analysis and Mining, 1323–1330. ACM.
ForwadKeys and Pires&Associados (2014). FIFA World
Cup shakes Brazilian tourism trends. Available at
http://tinyurl.com/jraruc6.
Hancke, G.P., Hancke Jr, G.P., et al. (2012). The role of
advanced sensing in smart cities. Sensors, 13(1), 393–425.
Jurafsky, D. and Martin, J.H. (2009). Speech & language
processing. 2ed. Pearson Education India.
Komninos, N., Pallot, M., and Schaffers, H. (2013). Special
issue on smart cities and the future internet in Europe.
Journal of the Knowledge Economy, 4(2), 119–134.
López-Higuera, J.M., Rodriguez Cobo, L., Incera, A.Q.,
and Cobo, L.R. (2011). Fiber optic sensors in structural
health monitoring. Lightwave Technology, Journal of,
29(4), 587–608.
MongoDB (2015). MongoDB database. Available at
https://www.mongodb.org/.
NILC, I.C.f.C.L.U. (2015). Lemmatizer for Portuguese.
Available at http://tinyurl.com/jzsxxv3.
Perchet, A.C. (2013). Reduce traffic congestion and
improve mobility information services in urban places.
Available at http://bit.ly/XtEeqm/.
PostGis (2015). Spatial and geographic objects for
PostgreSQL. Available at http://postgis.net/.
Silva, M.J., Carvalho, P., and Sarmento, L. (2012). Building
a sentiment lexicon for social judgement mining. In
Computational Processing of the Portuguese Language,
218–228. Springer.
SNLPG, T.S.N.L.P.G. (2014). Stanford CoreNLP:
a suite of core NLP tools. Available at
http://nlp.stanford.edu/software/corenlp.shtml.
Telegraph, D. (2013). Twitter in numbers. Available at
http://tinyurl.com/pdc5f6c.
TitanDB (2015). Titan distributed graph database.
Available at http://thinkaurelius.github.io/titan/.
Wu, F., Lei, T.K.H., Li, Z., and Han, J. (2014). Movemine
2.0: Mining object relationships from movement data.
Proceedings of the VLDB Endowment, 7(13), 1613–1616.
