Professional Documents
Culture Documents
Why We Twitter: Understanding Microblogging Usage and Communities
Why We Twitter: Understanding Microblogging Usage and Communities
Why We Twitter: Understanding Microblogging Usage and Communities
General Terms
Social Network Analysis, User Intent, Microblogging, Social
Media
1. INTRODUCTION
Microblogging is a relatively new phenomenon defined as “a
form of blogging that lets you write brief text updates (usu-
ally less than 200 characters) about your life on the go and
send them to friends and interested observers via text mes-
saging, instant messaging (IM), email or the web.” 1 . It is
1
http://en.wikipedia.org/wiki/Micro-blogging
Permission to make digital or hard copies of all or part of this work for Figure 1: An example Twitter homepage with up-
personal or classroom use is granted without fee provided that copies are dates talking about daily experiences and personal
not made or distributed for profit or commercial advantage and that copies interests.
bear this notice and the full citation on the first page. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific 2
http://www.twitter.com
permission and/or a fee. 3
Joint 9th WEBKDD and 1st SNA-KDD Workshop ’07 , August 12, 2007 , http://www.jaiku.com
4
San Jose, California , USA . Copyright 2007 ACM 1-59593-444-8...$5.00. http://www.pownce.com
Compared to regular blogging, microblogging fulfills a need to understand the user intentions and community structure
for an even faster mode of communication. By encourag- in microblogging. From our analysis, we find that the main
ing shorter posts, it lowers users’ requirement of time and types of user intentions are: daily chatter, conversations,
thought investment for content generation. This is also one sharing information and reporting news. Furthermore, users
of its main differentiating factors from blogging in general. play different roles of information source, friends or informa-
The second important difference is the frequency of update. tion seeker in different communities.
On average, a prolific bloger may update her blog once ev-
ery few days; on the other hand a microblogger may post The paper is organized as follows: in Section 2, we describe
several updates in a single day. the dataset and some of the properties of the underlying
social network of Twitter users. Section 3 provides an anal-
With the recent popularity of Twitter and similar microblog- ysis of Twitter’s social network and its spread across geogra-
ging systems, it is important to understand why and how phies. Next, in Section 4 we describe aggregate user behav-
people use these tools. Understanding this will help us ior and community level user intentions. Section 5 provides
evolve the microblogging idea and improve both microblog- a taxonomy of user intentions. Finally, we summarize our
ging client and infrastructure software. We tackle this prob- findings and conclude with Section 6.
lem by studying the microblogging phenomena and analyz-
ing different types of user intentions in such systems.
2. DATASET DESCRIPTION
Much of research in user intention detection has focused on Twitter is currently one of the most popular microblogging
understanding the intent of a search queries. According to platforms. Users interact with this system by either using a
Broder [5], the three main categories of search queries are Web interface, IM agent or sending SMS updates. Members
navigational, informational and transactional. Understand- may choose to make their updates public or available only to
ing the intention for a search query is very different from friends. If user’s profile is made public, her updates appear
user intention for content creation. In a survey of bloggers, in a “public timeline” of recent updates. The dataset used
Nardi et al. [26] describe different motivations for “why in this study was created by monitoring this public timeline
we blog”. Their findings indicate that blogs are used as a for a period of two months starting from April 01, 2007 to
tool to share daily experiences, opinions and commentary. May 30, 2007. A set of recent updates were fetched once
Based on their interviews, they also describe how bloggers every 30 seconds. There are a total of 1,348,543 posts from
form communities online that may support different social 76,177 distinct users in this collection.
groups in real world. Lento et al. [21] examined the im-
portance of social relationship in determining if users would Twitter allows a user, A, to “follow” updates from other
remain active in a blogging tool called Wallop. A user’s re- members who are added as “friends”. An individual who is
tention and interest in blogging could be predicted by the not a friend of user A but “follows” her updates is known as
comments received and continued relationship with other a “follower”. Thus friendships can either be reciprocated or
active members of the community. Users who are invited by one-way. By using the Twitter developer API5 , we fetched
people with whom they share pre-exiting social relationships the social network of all users. We construct a directed
tend to stay longer and active in the network. Moreover, cer- graph G(V, E), where V represents a set of users and E
tain communities were found to have a greater retention rate represents the set of “friend” relations. A directed edge e
due to existence of such relationships. Mutual awareness in exists between two users u and v if user u declares v as
a social network has been found effective in discovering com- a friend. There are a total of 87,897 distinct nodes with
munities [23]. 829,053 friend relation between them. There are more nodes
in this graph due to the fact that some users discovered
In computational linguists, researchers have studied the prob- though the link structure do not have any posts during the
lem of recognizing the communicative intentions that un- duration in which the data was collected. For each user, we
derlie utterances in dialog systems and spoken language in- also obtained their profile information and mapped their
terfaces. The foundations of this work go back to Austin location to a geographic coordinate, details of which are
[2], Stawson [32] and Grice [14]. Grosz [15] and Allen [1] provided in the following section.
carried out classic studies in analyzing the dialogues be-
tween people and between people and computers in coopera- 3. MICROBLOGGING IN TWITTER
tive task oriented environments. More recently, Matsubara This section describes some of the characteristic properties
[24] has applied intention recognition to improve the per- of Twitter’s Social Network including it’s network topology
formance of automobile-based spoken dialog system. While and geographical distribution.
their work focusses on the analysis of ongoing dialogs be-
tween two agents in a fairly well defined domain, studying
user intention in Web-based systems requires looking at both 3.1 Growth of Twitter
the content and link structure. Since Twitter provides a sequential user and post identifier,
we can estimate the growth rate of Twitter. Figure 2 shows
In this paper, we describe how users have adopted a spe- the growth rate for users and Figure 3 shows the growth rate
cific microblogging platform, Twitter. Microblogging is rel- for posts in this collection. Since, we do not have access to
atively nascent, and to the best of our knowledge, no large historical data, we can only observe its growth for a two
scale studies have been done on this form of communication month time period. For each day we identify the maximum
and information sharing. We study the topological and geo- value for the user identifier and post identifier as provided
graphical structure of Twitter’s social network and attempt 5
http://twitter.com/help/api
Twitter Growth Rate (Users) Twitter Growth Rate (Posts)
6500000 70000000
Growth of Users Growth of Posts
65000000
6000000
60000000
5500000 55000000
50000000
Max UserID
Max PostID
5000000
45000000
40000000
4500000
35000000
4000000 30000000
25000000
3500000
20000000
3000000 15000000
1-Apr 7-Apr 14-Apr 21-Apr 29-Apr 5-May 11-May 1-Apr 7-Apr 14-Apr 21-Apr 29-Apr 5-May 11-May
April - May 2007 April - May 2007
Figure 2: Twitter User Growth Rate. Figure shows Figure 3: Twitter Posts Growth Rate. Figure shows
the maximum userid observed for each day in the the maximum post ID observed for each day in the
dataset. After an initial period of interest around dataset. Although the rate at which new users are
March 2007, the rate at which new users are joining joining the network has slowed, the number of posts
Twitter has slowed. are increasing at a steady rate.
by the Twitter API. By observing the change in these val- has been shown that many properties including the degree
ues, we can roughly estimate the growth of Twitter. It is distributions on the Web follow a power law distribution
interesting to note that even though Twitter launched in [19, 6]. Recent studies have confirmed that some of these
2006, it really became popular soon after it won the South properties also hold true for the blogosphere [31].
by SouthWest (SXSW) conference Web Awards6 in March,
2007. Figure 2 shows the initial growth in users as a result Property Twitter WWE
of interest and publicity that Twitter generated at this con- Total Nodes 87897 143,736
ference. After this period, the rate at which new users are Total Links 829247 707,761
joining the network has slowed. Despite the slow down, the Average Degree 18.86 4.924
number of new posts is constantly growing, approximately Indegree Slope -2.4 -2.38
doubling every month indicating a steady base of users gen- Outdegree Slope -2.4 NA
erating content. Degree correlation 0.59 NA
Diameter 6 12
Following Kolari et al. [18], we use the following definition Largest WCC size 81769 107,916
of user activity and retention: Largest SCC size 42900 13,393
Clustering Coefficient 0.106 0.0632
Definition A user is considered active during a week if he Reciprocity 0.58 0.0329
or she has posted at least one post during that week.
Table 1: Twitter Social Network Statistics
Definition An active user is considered retained for the
given week, if he or she reposts at least once in the following
X weeks. Table 1 describes some of the properties for Twitter’s social
network. We also compare these properties with the corre-
Due to the short time period for which the data is available sponding values for the Weblogging Ecosystems Workshop
and the nature of Microblogging we decided to use X as a (WWE) collection [4] as reported by Shi et al. [31]. Their
period of one week. Figure 4 shows the user activity and study shows a network with high degree correlation (also
retention for the duration of the data. About half of the shown in Figure 6) and high reciprocity. This implies that
users are active and of these half of them repost in the fol- there are a large number of mutual acquaintances in the
lowing week. There is a lower activity recorded during the graph. New Twitter users often initially join the network
last week of the data due to the fact that updates from the on invitation from friends. Further, new friends are added
public timeline are not available for two days during this to the network by browsing through user profiles and adding
period. other known acquaintances. High reciprocal links has also
been observed in other online social networks like Livejour-
nal [22]. Personal communication and contact network such
3.2 Network Properties as cell phone call graphs [25] also have high degree corre-
The Web, blogosphere, online social networks and human
lation. Figure 5 shows the cumulative degree distributions
contact networks all belong to a class of “scale-free net-
[27, 8] of Twitter’s network. It is interesting to note that
works” [3] and exhibit a “small world phenomenon” [33]. It
the slopes γin and γout are both approximately -2.4. This
6
http://2007.sxsw.com/ value for the power law exponent is similar to that found for
Indegree Distribution of Twitter Social Network
1
10
Indegree
User Activity and Retention γin = −2.412
0
10
60000
50000
−1
10
Number of Users
40000
P(x≥ K)
−2
30000 10
20000
−3
10
10000
−4
0 10
1 2 3 4 5 6 7 8
Week
Asia 6753
−1
Oceania 910 P(x≥ K)
10
Others 78
−3
Unknown 38994 10
−4
Table 2: Table shows the geographical distribution 10
45° N
Table 4: Network properties of social networks
within a continent. Europe and Asia have a higher
reciprocity indicating closer ties in these social net-
°
0
works. (N.A = North America)
180° W 135° W 90° W 45° W 0° 45° E 90° E 135° E 180° E
°
45 S
LL = 2 ∗ (a ∗ log( E1
a
) + b ∗ log( E2
b
))
• Daily Chatter Most posts on Twitter talk about daily
where E1 = c ∗ a+b
and E2 = d ∗ a+b routine or what people are currently doing. This is the
c+d c+d
largest and most common user of Twitter
Figure 12 shows the most descriptive terms for each day
• Conversations In Twitter, since there is no direct way
of the week. Some of the extracted terms correspond to
for people to comment or reply to their friend’s posts,
recurring events and activities significant for a particular
early adopters started using the @ symbol followed by
day of the week for example “school” or “party”. Other
a username for replies. About one eighth of all posts
terms are related to current events like “easter” and “EMI”.
in the collection contain a conversation and this form
of communication was used by almost 21% of users in
the collection.
Key Terms
going:222 just:218
work:170 night:143
bed:140 time:139
good:137 com:130
lost:124 day:122
home:112
listening:111 today:100
new:98 got:97
gspn:92
watching:92 kids:88
morning:81 twitter:79
getting:77 tinyurl:75
lunch:74 like:72
podcast:72 watch:71
ready:70 tv:69
need:64 live:61
tonight:61 trying:58
love:58 cliff:58
dinner:56
Key Terms
just:312 com:180
work:180 time:149
listening:147 home:145
going:139 day:134
got:126 today:124
good:116 bed:114
night:112 tinyurl:97
getting:88 podcast:87
dinner:85 watching:83
like:78 mass:78
lunch:72 new:72
ll:70 tomorrow:69
ready:64 twitter:62
working:61 tonight:61
morning:58 need:58
great:58 finished:55
tv:54
• Sharing information/URLs About 13% of all the posts Despite infrequent updates, certain users have a large
in the collection contain some URL in them. Due to number of followers due to the valuable nature of their
the small character limit a URL shortening service like updates. Some of the information sources were also
TinyURL9 is frequently used to make this feature fea- found to be automated tools posting news and other
sible. useful information on Twitter.
• Reporting news Many users report latest news or com- • Friends Most relationships fall into this broad cate-
ment about current events on Twitter. Some auto- gory. There are many sub-categories of friendships
mated users or agents post updates like weather re- on Twitter. For example a user may have friends,
ports and new stories from RSS feeds. This is an in- family and co-workers on their friend or follower lists.
teresting application of Twitter that has evolved due Sometimes unfamiliar users may also add someone as
to easy access to the developer API. a friend.
Figure 10: Example Communities in Twitter Social Network. Key terms indicate that these communities
are talking mostly about technology. The user Scobliezer connects multiple communities in the network.
update your personal network on a holiday plan or a post in improving them and adding new features that would re-
to share an interesting link with co-workers. Multiple user tain more users. In this work, we have identified different
intentions have led to some users feeling overwhelmed by types of user intentions and studied the community struc-
microblogging services [20]. Based on our analysis of user tures. Currently, we are working on automated approaches
intentions, we believe that the ability to categorize friends of detecting user intentions with related community struc-
into groups (e.g. family, co-workers) would greatly benefit tures.
the adoption of microblogging platforms. In addition fea-
tures that could help facilitate conversations and sharing 7. ACKNOWLEDGEMENTS
news would be beneficial. We would like to thank Twitter Inc. for providing an API
to their service and Pranam Kolari, Xiaolin Shi and Amit
6. CONCLUSION Karandikar for their suggestions.
In this study we have analyzed a large social network in a
new form of social media known as microblogging. Such net- 8. REFERENCES
works were found to have a high degree correlation and reci- [1] J. Allen. Recognizing intentions from natural language
procity, indicating close mutual acquaintances among users. utterances. Computational Models of Discourse, pages
While determining an individual user’s intention in using 107–166, 1983.
such applications is challenging, by analyzing the aggregate [2] J. Austin. How to Do Things with Words. Oxford
behavior across communities of users, we can describe the University Press Oxford, 1976.
community intention. Understanding these intentions and [3] A.-L. Barabasi and R. Albert. Emergence of scaling in
learning how and why people use such tools can be helpful random networks. Science, 286:509, 1999.
Weblogs and Social Media (ICWSM 2007), March
2007.
[19] R. Kumar, P. Raghavan, S. Rajagopalan, and
A. Tomkins. Trawling the Web for emerging
cyber-communities. Computer Networks (Amsterdam,
Netherlands: 1999), 31(11–16):1481–1493, 1999.
[20] A. Lavallee. Friends swap twitters, and frustration -
new real-time messaging services overwhelm some
users with mundane updates from friends, March 16,
2007.
[21] T. Lento, H. T. Welser, L. Gu, and M. Smith. The ties
that blog: Examining the relationship between social
ties and continued participation in the wallop
Figure 12: Distinctive terms for each day of the week weblogging system, 2006.
ranked using Log-likelihood ratio. [22] D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan,
and A. Tomkins. Geographic routing in social
networks. Proceedings of the National Academy of
[4] Blogpulse. The 3rd annual workshop on weblogging Sciences,, 102(33):11623–1162, 2005.
ecosystem: Aggregation, analysis and dynamics, 15th [23] Y.-R. Lin, H. Sundaram, Y. Chi, J. Tatemura, and
world wide web conference, May 2006. B. Tseng. Discovery of Blog Communities based on
[5] A. Broder. A taxonomy of web search. SIGIR Forum, Mutual Awareness. In Proceedings of the 3rd Annual
36(2):3–10, 2002. Workshop on Weblogging Ecosystem: Aggregation,
[6] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, Analysis and Dynamics, 15th World Wid Web
S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Conference, May 2006.
Graph structure in the web. In Proceedings of the 9th [24] S. Matsubara, S. Kimura, N. Kawaguchi,
international World Wide Web conference on Y. Yamaguchi, and Y. Inagaki. Example-based Speech
Computer networks : the international journal of Intention Understanding and Its Application to In-Car
computer and telecommunications netowrking, pages Spoken Dialogue System. Proceedings of the 19th
309–320, Amsterdam, The Netherlands, The international conference on Computational
Netherlands, 2000. North-Holland Publishing Co. linguistics-Volume 1, pages 1–7, 2002.
[7] A. Clauset, M. E. J. Newman, and C. Moore. Finding [25] A. A. Nanavati, S. Gurumurthy, G. Das,
community structure in very large networks. Physical D. Chakraborty, K. Dasgupta, S. Mukherjea, and
Review E, 70:066111, 2004. A. Joshi. On the structural properties of massive
[8] A. Clauset, C. R. Shalizi, and M. E. J. Newman. telecom call graphs: findings and implications. In
Power-law distributions in empirical data, Jun 2007. CIKM ’06: Proceedings of the 15th ACM international
[9] Comscore. http://www.usatoday.com/tech/ conference on Information and knowledge
webguide/2007-05-28-social-sites_N.htm. management, pages 435–444, New York, NY, USA,
[10] I. Derenyi, G. Palla, and T. Vicsek. Clique percolation 2006. ACM Press.
in random networks. Physical Review Letters, [26] B. A. Nardi, D. J. Schiano, M. Gumbrecht, and
94:160202, 2005. L. Swartz. Why we blog. Commun. ACM,
[11] D. Donato, L. Laura, S. Leonardi, and S. Millozzi. 47(12):41–46, 2004.
Large scale properties of the webgraph. European [27] M. E. J. Newman. Power laws, pareto distributions
Physical Journal B, 38:239–243, March 2004. and zipf’s law. Contemporary Physics, 46:323, 2005.
[12] G. W. Flake, S. Lawrence, C. L. Giles, and F. Coetzee. [28] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek.
Self-organization of the web and identification of Uncovering the overlapping community structure of
communities. IEEE Computer, 35(3):66–71, 2002. complex networks in nature and society. Nature,
[13] M. Girvan and M. E. J. Newman. Community 435:814, 2005.
structure in social and biological networks, Dec 2001. [29] J. Pontin. From many tweets, one loud voice on the
[14] H. Grice. Utterers meaning and intentions. internet. The New York Times, April 22, 2007.
Philosophical Review, 78(2):147–177, 1969. [30] P. Rayson and R. Garside. Comparing corpora using
[15] B. J. Grosz. Focusing and Description in Natural frequency profiling, 2000.
Language Dialogues. Cambridge University Press, New [31] X. Shi, B. Tseng, and L. A. Adamic. Looking at the
York, New York, 1981. blogosphere topology through different lenses. In
[16] A. Java. http://ebiquity.umbc.edu/blogger/2007/ Proceedings of the International Conference on
04/15/global-distribution-of-twitter-users/. Weblogs and Social Media (ICWSM 2007), March
[17] J. M. Kleinberg. Authoritative sources in a 2007.
hyperlinked environment. Journal of the ACM, [32] P. Strawson. Intention and Convention in Speech
46(5):604–632, 1999. Acts. The Philosophical Review, 73(4):439–460, 1964.
[18] P. Kolari, T. Finin, Y. Yesha, Y. Yesha, K. Lyons, [33] D. J. Watts and S. H. Strogatz. Collective dynamics of
S. Perelgut, and J. Hawkins. On the Structure, ‘small-world’ networks. Nature, 393(6684):440–442,
Properties and Utility of Internal Corporate Blogs. In June 1998.
Proceedings of the International Conference on