
Analyzing Tweets to Aid Situational Awareness

Tim van Kasteren, Birte Ulrich, Vignesh Srinivasan, and Maria Niessen

AGT group (R&D) GmbH,


Hilpertstr. 35, 64295 Darmstadt, Germany
{tkasteren,bulrich,vsrinivasan,mniessen}@agtinternational.com
http://www.agtinternational.com

Abstract. Social media networks can be used to gather near real-time
information about safety and security events. In this paper we analyze
Twitter data that was captured around fifteen real-world safety and
security events and use a number of analytical tools to help understand
the effectiveness of certain features for event detection and to study how
this data can be used to aid situational awareness.

Keywords: Social Media Analytics, Situational Awareness

1 Introduction

The popularity of social media networks provides us with a constant flow of
information that can be used as a low-cost global sensing network for gathering
near real-time information about safety and security events. This information
can be very valuable to emergency response teams, who rely on an accurate
situational awareness picture of the emergency at hand. A more accurate
situational awareness picture allows a better and faster response, which results
in less damage and fewer casualties.
The microblogging service Twitter has become very popular and has been
reported to sometimes spread news before traditional news channels [1, 5].
However, the data obtained from Twitter is very diverse (i.e. there are few
constraints on what users can post), and it is not well understood which
information relevant to emergencies is present in the data and at which point
in time it appears.
In this paper we analyze Twitter data that was captured around fifteen
real-world safety and security events from four categories (accidents, natural
disasters, crowd gatherings and terrorist attacks). Using a number of analytical
tools, we identify recurring patterns and event-specific characteristics that
provide a basis for creating automated classification algorithms, and we show
the possibilities of using this data to aid situational awareness.
The remainder of this paper is organized as follows: Section 2 discusses related
work. In Section 3, we present the data and in Section 4, the analysis of the data.
Finally, Section 5 summarizes our findings.

2 Related Work
In previous work, Twitter has been analyzed to understand its usage in
emergency situations. Hughes et al. analyze Twitter usage during two emergency
events and two political conventions by comparing the number of tweets per day
and the relative number of reply tweets with general Twitter usage. They show
that these statistics differ when an event is taking place [4]. Vieweg et al.
analyze which information can be extracted from tweets to enhance situational
awareness during natural disasters. On the basis of two datasets, one for a
flood and one for a wildfire, the authors show that the percentages of
geo-location usage, location references, situational updates and retweets are
higher for a fast-spreading wildfire than for a slowly rising flood. This
indicates that the features of the datasets reflect the actual emergency
situation (i.e. warning, impact or recovery phase) and the type of disaster
itself (unexpected and fast, or predictable and slow) [6].
Twitter data is also used to detect safety and security events automatically.
Walther et al. present an event detection system that uses geospatial
information from tweets to automatically identify and classify events. Their
approach is evaluated on events captured from real-world Twitter data [7].
Sakaki et al. give an example of how semantic analysis of tweets can be used to
detect natural disasters such as earthquakes and typhoons; using the
geo-location information of the tweets, they estimate the locations of the
earthquakes and typhoons [5]. Bouma et al. present a method for automatic
anomaly detection in Twitter data using a correlation analysis of a number of
variables, such as sentiment, retweets, post frequencies and other metadata [2].
In this work we analyze tweets for a large number of events from different
categories to find features that consistently help detect events.

3 Data
We collected data for fifteen safety and security events from four categories:
accidents (e.g. train, plane or car crash), natural disasters (e.g. flood,
earthquake), crowd gatherings (e.g. festival, demonstration) and terrorist
attacks (including mass shootings). Accidents and terrorist attacks are
unexpected and unplanned events; natural disasters are sometimes predicted, as
with the Acapulco floods; and crowd gatherings are planned in the case of
festivals but unplanned in the case of raids. The data was gathered by
monitoring the news for reported events and collecting one to two weeks of
tweets using the Twitter REST API (version 1.1). The tweets were collected using
an empty query (i.e. no keyword filtering) and a geocode centered on the event
with a 15-mile radius. Details of the events can be found in Table 1.
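For illustration, the collection setup can be expressed as request parameters for the v1.1 `search/tweets` endpoint. This is a minimal sketch, not the authors' script: it assumes the endpoint's `geocode` parameter ("latitude,longitude,radius" with an `mi` or `km` unit), its `count` limit of 100 tweets per page, and backward paging via `max_id`. The helper names and the coordinates are hypothetical placeholders, authentication is omitted, and the empty `q` simply mirrors the description above.

```python
from typing import List, Optional
from urllib.parse import urlencode

# Twitter REST API v1.1 search endpoint (the API version named in the paper).
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def geocode_param(lat: float, lon: float, radius_miles: float = 15) -> str:
    """Format a v1.1 `geocode` value: "latitude,longitude,radius" plus a unit suffix."""
    return f"{lat},{lon},{radius_miles:g}mi"


def search_request_url(lat: float, lon: float, max_id: Optional[int] = None) -> str:
    """Build the URL for one result page: empty query, geocode filter, 100 tweets max."""
    params: dict = {"q": "", "geocode": geocode_param(lat, lon), "count": 100}
    if max_id is not None:
        # Page backwards in time: only tweets with an id <= max_id are returned.
        params["max_id"] = max_id
    return f"{SEARCH_URL}?{urlencode(params)}"


def next_max_id(page_ids: List[int]) -> int:
    """max_id for the next (older) page: one below the oldest tweet id seen so far."""
    return min(page_ids) - 1


# Hypothetical placeholder coordinates; no request is actually sent here.
print(search_request_url(52.0, 5.0))
```

Each request covers at most one page; repeating it with `max_id` taken from `next_max_id` walks back through the available results until a page comes back empty.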

4 Analysis
The collected data was analyzed using counts as well as Twitter-specific,
text-based, image-based and location-based tools. Our goal was to better
understand how people tweet about an event and what information can be
extracted. These results clarify the limitations of Twitter data and the
effectiveness of certain features when creating event detection algorithms. We
study the data collected before, during and after the event to identify the
potential of using this data for prediction, detection and investigation,
respectively.

Description                    Category             Tweets  Days  Dates
Antwerp Tomorrowland festival  Crowd gathering     110,161     6  24-29 July, 2013
Zurich ZuriFascht festival     Crowd gathering      41,102     4  06-09 July, 2013
Santiago train crash           Accident            156,881     7  23-29 July, 2013
Leiden factory fire            Accident            184,063     6  14-19 August, 2013
Chicago car crash              Accident             16,343     5  14-18 August, 2013
Heidelberg shooting            Terrorist attack     21,043     5  16-20 August, 2013
Vadodara building collapse     Accident             11,501     4  25-28 August, 2013
Cairo raids                    Crowd gathering   4,641,634    11  09-19 August, 2013
Frankfurt tree on railroad     Accident            125,833     7  30 Aug.-05 Sept., 2013
Zevenaar factory fire          Accident            156,129     8  02-09 September, 2013
Washington shooting            Terrorist attack    997,001     3  15-17 September, 2013
Acapulco floods                Natural disaster    202,915    12  11-22 September, 2013
Nairobi mall attack            Terrorist attack  1,212,249     9  16-24 September, 2013
India cyclone                  Natural disaster      1,684     6  07-12 October, 2013
Philippines earthquake         Natural disaster    597,043    10  08-17 October, 2013

Table 1. List of events for which Twitter data was captured.

4.1 Counts
Our first analysis counts the number of tweets per hour. Plotted over time
(Fig. 1a), these counts show a very clear recurring pattern when no event is
taking place, which reflects the common daily activities of users. This has also
been reported in previous work [3].
For the majority of the events we recorded data for, we see a very clear
increase in the number of tweets shortly after the event takes place. The size
of this increase reflects the impact of the event on the public and is strongly
related to the severity of the event, such as the number of people affected by a
natural disaster or the number of deaths in an accident or terrorist attack. A
count-based analysis can therefore indicate a big event, but it does not help to
determine the category of the event.
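The hourly counting, together with one simple way to compare an hour against the usual daily pattern, can be sketched as follows. This is an illustrative assumption rather than the paper's method: timestamps are taken to be parsed `datetime` values in a single time zone, and the ratio of an hour's count to the mean count for the same hour of the day in an event-free reference period is a hypothetical measure of how unusual that hour is.

```python
from collections import Counter
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, List


def hourly_counts(timestamps: Iterable[datetime]) -> Counter:
    """Number of tweets per hour, keyed by the timestamp truncated to the full hour."""
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)


def hour_of_day_baseline(counts: Counter, reference_hours: List[datetime]) -> Dict[int, float]:
    """Mean count for each hour of the day (0-23) over event-free reference hours."""
    per_hour: Dict[int, List[int]] = {}
    for hour in reference_hours:
        per_hour.setdefault(hour.hour, []).append(counts[hour])
    return {hod: mean(values) for hod, values in per_hour.items()}


def excess_ratio(counts: Counter, baseline: Dict[int, float], hour: datetime) -> float:
    """Observed count in `hour` divided by the usual count for that hour of the day."""
    observed = counts[hour]
    expected = baseline.get(hour.hour, 0.0)
    if expected == 0:
        # No baseline activity: any tweet at all is unusual; none means no change.
        return float("inf") if observed else 1.0
    return observed / expected
```

A ratio well above 1 flags an unusually busy hour, but, as noted above, it says nothing about the category of the event.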

4.2 Twitter-specific analysis


The Twitter-specific analysis consisted of counting the number of tweets over
time and computing the relative occurrence of hashtags, retweets, replies and
mentions in the tweets. These measures indicate how users are using the Twitter
network. Hashtags generally identify a concept related to a message. Retweets
are a Twitter-specific mechanism that allows users to repost a previously posted
message and therefore help to broadcast the information contained in the tweet.
Replies allow users to respond to a previously posted message and are mainly
used in discussions. Finally, mentions allow users to refer to other users
within their messages and indicate social references.
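The relative occurrence of these four features can be computed directly from stored tweets. A minimal sketch, assuming tweets kept as v1.1 JSON dictionaries, in which a retweet carries a `retweeted_status` field, a reply has a non-null `in_reply_to_status_id`, and hashtags and mentions are listed under `entities.hashtags` and `entities.user_mentions`; the function name is hypothetical.

```python
from typing import Dict, List


def twitter_feature_shares(tweets: List[dict]) -> Dict[str, float]:
    """Fraction of tweets that are retweets or replies, or contain hashtags / mentions."""
    n = len(tweets)
    if n == 0:
        return {"retweets": 0.0, "replies": 0.0, "hashtags": 0.0, "mentions": 0.0}
    entities = [t.get("entities") or {} for t in tweets]
    return {
        # A v1.1 retweet embeds the original tweet in "retweeted_status".
        "retweets": sum("retweeted_status" in t for t in tweets) / n,
        # A reply points at the tweet it answers via "in_reply_to_status_id".
        "replies": sum(t.get("in_reply_to_status_id") is not None for t in tweets) / n,
        # Non-empty entity lists mean the tweet uses at least one hashtag / mention.
        "hashtags": sum(bool(e.get("hashtags")) for e in entities) / n,
        "mentions": sum(bool(e.get("user_mentions")) for e in entities) / n,
    }
```

In v1.1 data a retweet's `user_mentions` typically include the original author, so the retweet and mention shares are not independent of each other.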
Comparing these measures during the event with those before the event, we
observe an increase in retweets and mentions and often a decrease in replies.
For events that cause a significant increase in the total number of tweets, and
thus have a high social impact, this effect is very substantial. For the train
crash in Santiago, for example, the share of retweets rises from 22% before the
event to 69% during the event, the share of mentions grows from 35% to 72%, and
replies drop from 26% to 10%. The rise in retweets may be explained by people's
desire to pass on information about severe events that they have not witnessed
themselves. The drop in replies further supports this, since people are not
directing their communication to one specific receiver.
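The size of these shifts is easier to compare when the reported Santiago percentages are expressed as percentage-point changes and as ratios. The snippet below only re-expresses the numbers given above; the helper name is hypothetical.

```python
from typing import Tuple

# Santiago train crash: share of tweets (%) before vs. during the event, as reported above.
SANTIAGO = {"retweets": (22, 69), "mentions": (35, 72), "replies": (26, 10)}


def describe_change(before: int, during: int) -> Tuple[int, float]:
    """Return (change in percentage points, ratio during/before)."""
    return during - before, during / before


for feature, (before, during) in SANTIAGO.items():
    points, ratio = describe_change(before, during)
    print(f"{feature}: {points:+d} pp, x{ratio:.2f}")
# retweets: +47 pp, x3.14
# mentions: +37 pp, x2.06
# replies: -16 pp, x0.38
```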
Overall, we see some significant changes in the Twitter-specific measures
during events, especially when events are completely unexpected, like severe
accidents and terrorist attacks.

4.3 Text analysis


The text-based analysis operates on the message contained in a tweet. Basic
cleaning operations were performed such as converting the text to lower case
and removing punctuation marks and stop words. To analyze the resulting data
we calculated the most frequent hashtags, words, bi-grams and tri-grams. The
resulting most frequent words give an indication of the content of the majority
of the tweets and help to identify the nature of events. Moreover, we calculated
the TF-IDF of the most frequent words over time: the term frequency (TF) was
determined per hour for all tweets around the time of the event while the inverse
document frequency (IDF) was calculated for a week of Twitter data at the same
location but weeks after the event. Hence the TF-IDF gives a normalized view
of which keywords are used out of the ordinary and can be used to detect an
event.
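The cleaning and counting steps can be sketched as below. This is a minimal illustration under our own assumptions, not the paper's pipeline: the tiny English stop-word list, the regular expressions for URLs and tokens, and the choice to keep `#`/`@` prefixes so that hashtags can be counted separately are all hypothetical.

```python
import re
from collections import Counter
from typing import List, Set

# Illustrative stop-word list only; a real analysis would use a full list per language.
STOP_WORDS: Set[str] = {"the", "a", "an", "and", "of", "in", "to", "is", "at", "on", "for"}
URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")  # words, optionally prefixed by # or @


def clean_tokens(text: str) -> List[str]:
    """Lower-case, drop URLs and punctuation (keeping #/@ prefixes), remove stop words."""
    tokens = TOKEN_RE.findall(URL_RE.sub(" ", text.lower()))
    return [t for t in tokens if t not in STOP_WORDS]


def ngram_counts(texts: List[str], n: int) -> Counter:
    """Count word n-grams over all tweets, ignoring hashtags and mentions."""
    counts: Counter = Counter()
    for text in texts:
        words = [t for t in clean_tokens(text) if t[0] not in "#@"]
        counts.update(zip(*(words[i:] for i in range(n))))
    return counts


def hashtag_counts(texts: List[str]) -> Counter:
    """Count hashtags (case-insensitive) over all tweets."""
    return Counter(t for text in texts for t in clean_tokens(text) if t.startswith("#"))
```

The most frequent items then follow from `Counter.most_common`, e.g. `ngram_counts(texts, 2).most_common(10)` for the top bigrams.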
For the analyzed events, bigger events cause Twitter users to converge on a few
hashtags to refer to the event, giving rise to very frequent hashtags describing
the event, such as #tomorrowland (Antwerp) or #accidente (Santiago train
accident). However, especially for accidents, users mostly refer to the location
of the event during and after it, for example #santiago, #zevenaar, #leiderdorp
(Leiden fire) and #dossenheim (Heidelberg shooting).
Smaller events do not always show up in the frequent hashtags, but they can
still be found in the tweet texts and among the most frequent words and n-grams. The
most frequent words and n-grams can be informative from an investigation point
of view, especially for events with a high social impact. For example, the most
frequent words during and after the Santiago train accident all relate to the
accident, but there is a shift in focus from emergency response (donar sangre
(donate blood), hospital, extrema necesidad (dire need)) to the effects of the
accident a few days later (necesitan psicologos (psychologists needed)). TF-IDF
is most informative about the shift in focus of tweet content compared with
normal situations. Figure 1 shows (a) the raw count of tweets around Zevenaar
and (b) the TF-IDF of three words: lekker, a very common Dutch word whose TF-IDF
approximately follows the daily pattern of the counts, and brand and zevenaar,
which show a major peak around the start of the fire that is hardly visible in
the raw counts.

[Figure 1: (a) total tweet counts and (b) TF-IDF values, 3 Sept to 9 Sept]

Fig. 1. (a) The total counts of tweets before, during (between approximately 6 am and
12 am on the 6th of September), and after the fire in Zevenaar and (b) the TF-IDF
measure for three words: lekker (nice), brand (fire), and Zevenaar (the town).
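The hourly TF-IDF measure described in this section can be sketched as follows. The paper does not spell out what counts as a document or which IDF variant was used, so this sketch assumes raw occurrences per hour as TF and a smoothed IDF, log((1 + N) / (1 + df)), computed over tweets from an event-free reference week with each tweet treated as one document. The function names and the toy tokens are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, List

Tokens = List[str]


def make_idf(reference_tweets: List[Tokens]) -> Callable[[str], float]:
    """Smoothed IDF from an event-free reference period; each tweet is one document."""
    n_docs = len(reference_tweets)
    doc_freq = Counter(word for tokens in reference_tweets for word in set(tokens))

    def idf(word: str) -> float:
        # log((1 + N) / (1 + df)): words unseen in the reference get the maximum, log(1 + N).
        return math.log((1 + n_docs) / (1 + doc_freq[word]))

    return idf


def hourly_tfidf(tweets_by_hour: Dict[Hashable, List[Tokens]], word: str,
                 idf: Callable[[str], float]) -> Dict[Hashable, float]:
    """TF = occurrences of `word` in that hour's tweets; returns TF * IDF per hour."""
    weight = idf(word)
    return {hour: sum(tokens.count(word) for tokens in tweets) * weight
            for hour, tweets in tweets_by_hour.items()}
```

With this weighting, a word that is common in the reference week (such as lekker) is damped, while a word that is rare there (such as brand) is amplified whenever it appears, which is consistent with such words peaking while the raw counts barely change.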

4.4 Image analysis

For our image-based analysis, we extracted all the URLs from the tweets and
downloaded images using an automated script. We manually inspected the
collected images to determine which were related to the event.
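The URL extraction and download step could look roughly like this. The sketch assumes v1.1 tweet JSON, where attached photos appear under `entities.media` (with a `media_url_https` field) and other links under `entities.urls` (with an `expanded_url` field). The helper names are hypothetical, and `download_image` keeps a response only if the server reports an `image/*` content type, so links to web pages that merely embed a picture are skipped.

```python
import urllib.request
from typing import List, Optional


def candidate_urls(tweet: dict) -> List[str]:
    """Photo URLs and expanded links from one tweet, de-duplicated in order."""
    entities = tweet.get("entities") or {}
    urls = [m["media_url_https"] for m in entities.get("media", []) if "media_url_https" in m]
    urls += [u["expanded_url"] for u in entities.get("urls", []) if u.get("expanded_url")]
    return list(dict.fromkeys(urls))


def download_image(url: str, timeout: float = 10.0) -> Optional[bytes]:
    """Fetch `url`; return the body only if the server reports an image content type."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.headers.get("Content-Type", "").startswith("image/"):
                return resp.read()
    except (OSError, ValueError):
        # Network errors, HTTP errors and malformed URLs are treated as "no image".
        pass
    return None
```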
For most events, no event-related images were available before the event.
The exceptions were predicted natural disasters, for which we found images
containing warnings. For example, in the case of the Acapulco floods, a few
images show weather institutes' forecasts of storms over the region.
The percentage of event-related images increases sharply during the event
compared with before the event. Almost half of the images collected during the
event are about the event, indicating a strong interest among users in sharing
news or information about the event. Although many of the posted images are
duplicates, some of them have considerable potential to assist an operator in
assessing an emergency situation. In particular, these images might show more
information than can be obtained from an emergency phone call. For example, in
the case of Acapulco, we found images of people looting local stores, as well as
of damaged and obstructed roads.
Especially during the event, we see that the number of posted images related to
the event is relatively high, which is the time at which the need for information
is most critical.
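Once images are downloaded, exact duplicates can be grouped cheaply by hashing their bytes, so that manual inspection only needs to look at each distinct image once, and the share of event-related images per period follows from the manual labels. A minimal sketch, assuming SHA-256 over the raw bytes; this only catches byte-identical copies, so resized or re-encoded versions of the same picture would need a perceptual hash instead. The helper names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def group_exact_duplicates(images: Iterable[Tuple[str, bytes]]) -> Dict[str, List[str]]:
    """Group image URLs by the SHA-256 of their bytes; identical copies share a key."""
    groups: Dict[str, List[str]] = {}
    for url, data in images:
        groups.setdefault(hashlib.sha256(data).hexdigest(), []).append(url)
    return groups


def event_related_share(labels: List[bool]) -> float:
    """Share of images that a human annotator marked as event-related."""
    return sum(labels) / len(labels) if labels else 0.0
```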
After the event, the percentage of images about the event initially stays high.
In particular, images capturing heroic deeds (e.g. someone rescuing a child) are
widely shared and retweeted. The number of event-related images then decreases
over time.

4.5 Location analysis


The location-based analysis relies on the GPS coordinates that Twitter users
can submit together with a tweet. On average, approximately 10% of all
collected tweets contain GPS coordinates. We analyzed the number of tweets as a
function of the distance to the event. For most of the events we analyzed, we
see a clear increase in the number of tweets close to the event location during
and after the event. This indicates that events can be detected based on
geo-spatial clusters of tweets.
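Counting tweets as a function of distance to the event can be sketched with the haversine great-circle distance. A minimal sketch under stated assumptions: a spherical Earth with a mean radius of 6371 km, points given as (latitude, longitude) in decimal degrees, and 2 km distance rings as a hypothetical bin width. If coordinates are read from the v1.1 `coordinates` field, which uses GeoJSON order (longitude first), they must be swapped before use.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_histogram(points: Iterable[Tuple[float, float]],
                       event: Tuple[float, float], bin_km: float = 2.0) -> Counter:
    """Tweets per distance ring; ring i covers [i * bin_km, (i + 1) * bin_km) km."""
    lat0, lon0 = event
    return Counter(int(haversine_km(lat0, lon0, lat, lon) // bin_km) for lat, lon in points)
```

Comparing the histogram for a window during the event with one from before it shows whether the innermost rings gain tweets, which is the geo-spatial clustering effect described above.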

5 Conclusion
We analyzed fifteen safety and security events from four categories using
various analytical tools. Our findings show the potential of using Twitter to
aid situational awareness and help to understand the effectiveness of certain
features for event detection. Counts, Twitter-specific features, highly
retweeted images and tweets with GPS coordinates are useful for detecting events
with a high social impact, but they lack the volume needed to detect smaller
events. Only the text-based TF-IDF feature is sensitive enough to detect both
high-impact and low-impact events.

References
1. H. Becker, M. Naaman, and L. Gravano. Beyond trending topics: Real-world event
identification on Twitter. In ICWSM, 2011.
2. H. Bouma, S. Raaijmakers, A. Halma, and H. Wedemeijer. Anomaly detection for
internet surveillance. In SPIE Defense, Security, and Sensing, pages 840807-840807.
International Society for Optics and Photonics, 2012.
3. M. Cataldi, L. Di Caro, and C. Schifanella. Emerging topic detection on Twitter
based on temporal and social terms evaluation. In Proceedings of the Tenth
International Workshop on Multimedia Data Mining, page 4. ACM, 2010.
4. A. L. Hughes and L. Palen. Twitter adoption and use in mass convergence and
emergency events. Int. Journal of Emergency Management, 6(3):248-260, 2009.
5. T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes Twitter users: real-time
event detection by social sensors. In Proc. of the 19th int. conf. on WWW, 2010.
6. S. Vieweg, A. L. Hughes, K. Starbird, and L. Palen. Microblogging during two
natural hazards events: what Twitter may contribute to situational awareness. In
Proc. of the SIGCHI Conf. on Human Factors in Comp. Sys., 2010.
7. M. Walther and M. Kaisser. Geo-spatial event detection in the Twitter stream. In
Advances in Information Retrieval, pages 356-367. Springer, 2013.
