FriendFeed Under a Microscope: A Visual Exploration of the FriendFeed Community

Authors

Guangming Lang, Lei Shi, Ryan Burton, Kai Wang, Xiaowen Zhang

Abstract

This paper examines general activity in the FriendFeed community during August and
September 2010 through four types of visualization: bar charts for the sources of entries and
users, interactive scatter plots for the relationship between comments and likes, word clouds
for content, and an animation for user popularity.

Introduction

FriendFeed is a multilingual social networking service (SNS) website. It allows people to share
short text messages, pictures, and videos with their followers, and it allows users to comment
directly under the original entries or to push a button to like them. Users can
publish an entry directly using their FriendFeed accounts, or they can publish content on
external services like Twitter, Facebook, or blogs and channel that content to FriendFeed. We
investigate what these services are. Tens of thousands of messages are published on
FriendFeed every day. Some are liked by others, some receive comments, and some fail to
generate any response from anyone. We investigate how exactly this plays out. We are also
curious about what is most talked about on FriendFeed. Finally, we find out who the most
followed users are and how quickly they gain followers.

In this paper, we visualize two months of data retrieved using the FriendFeed API [1]. For
the question about entry sources, we create a bar chart to display their distribution. We
create a second bar chart to display the distribution of user sources, that is, the sources that
directed users to FriendFeed at the time they registered, and we compare the two. We address
the entry engagement question with two scatter plots. One reveals the overall relationship
between the number of comments and the number of likes received by the same entry,
with zooming functionality. The other reveals the same relationship with the underlying
data grouped by day, with a time slider. For the question about the actual content, we choose
to study the English content by creating three word clouds, one each for the most popular
nouns, verbs, and adjectives/adverbs. For the question about user popularity, we make an
animation and host the data on the Amazon cloud because of its large size. Our final system is
a website called FriendFeed Under a Microscope [3], which is built around these eight
visualizations. The rest of the paper discusses each visualization in detail and ends with a
section on the evaluation of the system.
Bar Charts for Sources of Entries and Users

We use the Processing programming language to create two bar charts, one for the sources of
entries and one for the sources of users (Figures 1.1 and 1.2). The two charts are designed
differently in order to compare design ideas.

Figure 1.1 shows the popularity of entry sources. Each entry posted to FriendFeed may be
relayed from an external source. Either the source is associated with a user’s feed and
imported automatically, or the user posts content manually to share with others. This chart
explicitly categorizes the larger sources; one can see upon exploration that Twitter, Google
Reader, and Delicious are among the most numerous. The other services are much less popular
in comparison, which illustrates the common long-tail phenomenon. At the far left is an “Other”
category, which houses almost 160,000 different sources. These are usually miscellaneous
individual Web pages, blog posts, and news stories. Graphing them individually would extend
the tail much farther.
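
The grouping of the long tail into an “Other” bar can be sketched in Python. This is only an illustration of the idea, not the authors' Processing code; the function name, the `top_n` cutoff, and the sample source names are all hypothetical.

```python
from collections import Counter

def top_sources_with_other(sources, top_n):
    """Count entries per source and fold everything outside the top_n into 'Other'."""
    counts = Counter(sources)
    top = counts.most_common(top_n)
    other_total = sum(counts.values()) - sum(n for _, n in top)
    bars = list(top)
    if other_total > 0:
        bars.append(("Other", other_total))
    return bars

# Hypothetical sample: one source name per entry.
sample = (["twitter"] * 5 + ["googlereader"] * 3 + ["delicious"] * 2
          + ["blog-a", "blog-b", "news-c"])
bars = top_sources_with_other(sample, top_n=3)
for name, count in bars:
    print(name, count)
```

Folding the tail into a single bar keeps the chart readable, at the cost of hiding the individual small sources.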

This chart presents basic interactivity in the form of more detail on demand. Hovering over a bar
will highlight it and show the precise number of entries from the source.

Figure 1.2 shows the popularity of user sources. FriendFeed connects up to 58 external
services. We use a combination of a bar chart and a pie chart to visualize the popularity of
these external services. The pie chart, with its legend, shows the shares of the top 10 services.
One can see from the exploration that Blog, RSS, and Twitter are the top three services. Each
service with a share above 1% is assigned a unique color, while all sources with
a share below 1% have the same color. The x-axis is not labeled because of the
limited space between bars. Instead, we use interaction to display this information: when
the mouse moves over a bar, that bar is highlighted and its name and popularity value
are displayed just above it.
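
The 1% coloring rule can be sketched as follows. This is a minimal Python illustration under stated assumptions: the counts, color names, and function name are hypothetical, and a share of exactly 1% is treated here as falling in the shared-color group, a case the text does not specify.

```python
def assign_colors(counts, palette, shared_color="gray", threshold_pct=1.0):
    """Return {service: (share_pct, color)}.

    Services whose share exceeds threshold_pct get the next unused palette color;
    all others get shared_color. The palette must hold at least as many colors
    as there are services above the threshold.
    """
    total = sum(counts.values())
    colors = iter(palette)
    styled = {}
    for service, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        share = 100.0 * n / total
        color = next(colors) if share > threshold_pct else shared_color
        styled[service] = (share, color)
    return styled

# Hypothetical registration counts per service.
counts = {"blog": 600, "rss": 300, "twitter": 95, "tiny": 5}
styled = assign_colors(counts, palette=["red", "green", "blue"])
```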

Both charts show the dominance of a few sources and the familiar long-tail phenomenon.
Not surprisingly, all of the top entry sources except “identi.ca” and “tumblr” are also
among the top sources through which people registered on FriendFeed.
Figure 1.1 Sources of Entries

Figure 1.2 Sources of Users

Interactive Scatter Plots for Comments-Likes Relationship

The success of a micro-blogging or social networking service is determined by how
engaged its users are. FriendFeed allows users to comment on each other’s entries or
simply express affinity by clicking a “Like” button. This results in three types of
engagement: speaking alone, having a conversation or discussion, and liking.

We explore the relationship among them with scatter plots, in which each dot represents the
set of entries that share the same two coordinates (the number of comments and the number of
likes an entry received in the two-month period). Because there are four
types of entries (text only, image only, video only, and image and video only), we use four
colors, one per type, and four radio buttons to switch between them. Initially, we experimented
with check boxes, but the result was an unintelligible mesh of colors because dots from all four
types overlaid one another. By choosing the radio button approach, we lose the benefit
of comparing across the four types but gain clarity for each type. Another challenge is that we
have 12,437,376 entries, and 89% of them receive no comments or likes. This leads to a
high density around the point (0,0) and slows down the visualization. We decided to draw
a single dot there; when the user moves the mouse over the dot, it shows the number of
entries.
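
Collapsing entries that share the same coordinates into one dot can be sketched in Python. This is a minimal illustration, not the authors' implementation; the sample data and function name are hypothetical.

```python
from collections import Counter

def aggregate_dots(entries):
    """Map each (comments, likes) coordinate to the number of entries sharing it."""
    return Counter((comments, likes) for comments, likes in entries)

# Hypothetical sample: one (comments, likes) pair per entry.
entries = [(0, 0), (0, 0), (0, 0), (2, 1), (2, 1), (0, 5)]
dots = aggregate_dots(entries)

# Only one dot is drawn per coordinate; the count is what a hover would display.
origin_share = dots[(0, 0)] / len(entries)
```

With this aggregation the number of dots to draw depends on the number of distinct coordinates rather than the number of entries, which is what relieves the density at (0,0).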

We see that most people speak alone, and that the more comments an entry receives, the fewer
likes it receives, and vice versa, indicating an inverse relationship between the two. Video
entries and image and video only entries tend to receive more likes than comments [4]. We are
also interested in how engagement changes over time, so we group the data by day and
introduce a time slider underneath the x-axis, laid over a time-series plot of the daily total
number of entries. We also use dot size to indicate the total number of entries published on
each day. As before, we display the four types of entries separately, using the radio buttons to
switch. We find that the days with the largest and the smallest number of entries differ from
one entry type to another. Image and video entries tend to receive
more comments and likes over time compared to text only, image only, or video only entries.
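
The per-day grouping behind the time-slider plots can be sketched in Python. This is a minimal illustration under assumptions: the record layout `(day, comments, likes)` and the field names are hypothetical, chosen to match the totals and averages named in the figure captions.

```python
from collections import defaultdict

def daily_engagement(entries):
    """entries: iterable of (day, comments, likes). Returns {day: stats dict}."""
    acc = defaultdict(lambda: [0, 0, 0])  # [entry count, comments, likes]
    for day, comments, likes in entries:
        row = acc[day]
        row[0] += 1
        row[1] += comments
        row[2] += likes
    return {
        day: {
            "entries": n,            # could drive the dot size
            "total_comments": c,
            "total_likes": l,
            "avg_comments": c / n,
            "avg_likes": l / n,
        }
        for day, (n, c, l) in acc.items()
    }

# Hypothetical sample records.
sample = [("2010-08-01", 2, 0), ("2010-08-01", 0, 4), ("2010-08-02", 1, 1)]
stats = daily_engagement(sample)
```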
Figure 1.3 The distribution of entries by # of comments and likes

Figure 1.4 Total # of comments and likes entries receive on a given day

Figure 1.5 Average # of comments and likes entries receive on a given day

Word Clouds for Contents

More than 27 languages are used on FriendFeed. We are interested in discovering what
people talk about when they post in English. To do so, we pre-processed the content of
each FriendFeed entry using slide7, a Creative Commons language auto-detection tool written
by Fabio Celli [2]. Slide7 is able to achieve 70% accuracy.

With the English text identified, we wrote a script in Haskell to parse a file of the aggregated
content and produce a list of words with their counts. We then selected the
1000 most frequent words and manually categorized them into nouns, verbs, and
adjectives/adverbs. We further merged redundant words: if two words have the same
meaning but different forms, we choose one form and sum the frequencies for a final count. For
example, (web, 72534) and (website, 44180) become (web, 116714); (make, 128867) and
(making, 36166) become (make, 165033); (large, 12269) and (big, 51931) become (large,
64200). Finally, we used the word cloud generator from Many Eyes [5] to generate three
dynamic word clouds.
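
The counting and merging steps can be sketched in Python (the authors' script was written in Haskell and is not reproduced here). The tokenizing pattern and the function names are assumptions; the `canonical` mapping stands in for the manual merging decisions, and the counts are the ones quoted above.

```python
from collections import Counter
import re

def count_words(text):
    """Lowercase the text and count runs of letters, digits, and apostrophes."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def merge_variants(counts, canonical):
    """Sum the counts of words that map to the same canonical form."""
    merged = Counter()
    for word, n in counts.items():
        merged[canonical.get(word, word)] += n
    return merged

# Counts quoted in the text; the mapping encodes the manual merge choices.
counts = Counter({"web": 72534, "website": 44180, "make": 128867,
                  "making": 36166, "large": 12269, "big": 51931})
canonical = {"website": "web", "making": "make", "big": "large"}
merged = merge_variants(counts, canonical)
```

Here `merged["web"]` is 116714, `merged["make"]` is 165033, and `merged["large"]` is 64200, matching the totals given in the text.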

As we can see from the visualization of nouns (Figure 3.1), people are very conscious of the
year “2010.” People talk more about “article” and “blog” than “books.” The perennial themes
that have concerned humans for thousands of years show up again here in their modern English
expressions: “business, money, home, love, men, and women.” Things that did not exist fifty
years ago are among the most talked-about items: “app, Blog, email, Google, Facebook, iPhone,
online, twitter, and web.” The verbs (Figure 3.2) reveal a strong sense of independence,
purpose, and reason. Just look at the most frequently used verbs: “get, go, make, use, like,
Google, find, know, think, and want.” In addition, “rt,” meaning retweet, shows up a lot,
indicating that many tweets are channeled to FriendFeed from Twitter and that the users behind
them are very engaged, since they are retweeting others’ tweets. The adjectives/adverbs
(Figure 3.3) reveal an overwhelmingly positive and spontaneous attitude: “best, first, good,
great, free, just, new, now.” At the same time, the presence of “really” and “probably”
suggests that people are skeptical or less confident. Finally, “social” is apparently a
buzzword too.

Figure 3.1 Wordcloud for English Nouns

Figure 3.2 Wordcloud for English Verbs


Figure 3.3 Wordcloud for English Adjectives/Adverbs

An Animation for User Popularity

To make sense of the patterns by which users on FriendFeed follow the “stars” on the service,
we created an animated visualization using a mixture of Processing, server-side Perl, and
Amazon Web Services to show when each of the top 90 users gains followers, as seen
in Figure 4.1.

Figure 4.1: Animated Bar Chart of Follower Counts


Using data collected with millisecond granularity and aggregated by second for the visualization,
we show how the counts evolve over time with new followers falling from the top onto the bar
associated with the star. The height of the bar changes to reflect the new accumulated follower
count. The diameter of each of the falling circles is proportional to the number of followers for
each second, which can reflect sudden bursts in popularity. Interactivity is present for viewing
precise follower counts on hover.
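
The per-second aggregation and the proportional circle size can be sketched in Python. This is a minimal illustration, not the authors' Perl or Processing code; the sample timestamps, the function names, and the `pixels_per_follower` scale factor are hypothetical.

```python
from collections import Counter

def followers_per_second(timestamps_ms):
    """Bucket millisecond follow timestamps into whole seconds."""
    return Counter(ts // 1000 for ts in timestamps_ms)

def circle_diameter(count, pixels_per_follower=4.0):
    """Diameter proportional to the number of followers gained in that second."""
    return count * pixels_per_follower

# Hypothetical follow events for one user, in milliseconds.
events = [1000, 1500, 1999, 2000, 5250]
per_second = followers_per_second(events)
diameters = {sec: circle_diameter(n) for sec, n in per_second.items()}
```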

The abstract clock in the upper-left corner shows the proportion of the two months that has
passed, and the text shows the day being visualized. The animation is sped up so that one
second of playback corresponds to three hours of data.
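
The stated speed-up (one second of viewing time per three hours of data) gives a rough playback length. The Python sketch below assumes the data spans August 1 through September 30, 2010; the text only says August to September, so the exact dates and the function names are assumptions.

```python
from datetime import date

HOURS_PER_PLAYBACK_SECOND = 3  # from the text: one second covers three hours

def playback_seconds(start, end_exclusive):
    """Viewing time in seconds needed to play the whole date range."""
    days = (end_exclusive - start).days
    return days * 24 / HOURS_PER_PLAYBACK_SECOND

def clock_fraction(elapsed_seconds, total_seconds):
    """Proportion of the period that has passed, as the clock would show it."""
    return min(1.0, elapsed_seconds / total_seconds)

# Assumed range: August 1 through September 30, 2010 (61 days).
total = playback_seconds(date(2010, 8, 1), date(2010, 10, 1))
halfway = clock_fraction(244, total)
```

Under that assumption the full animation runs for 488 seconds, a little over eight minutes, and each visualized day lasts eight seconds of playback.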

Being able to see the rate of follower acquisition (or lack thereof) for these prominent users can
lead to some interesting insights. Barack Obama, the most popular user, gained the majority
of his followers for August and September within a very small time window. Other users gained
few or no followers during these two months, which might suggest that as “stars” they
had already gained most of their loyal followers and have stopped gaining new ones. Still other
users’ follower bases grew more steadily over time, with one or two followers gained at a time.

Evaluation

To get feedback on our visualizations, we used two methods: a survey and one-on-one
user interviews. We chose the survey because it can bring us general feedback from a wide
population. With the one-on-one interviews, we wanted to learn about users’ impressions and
comments in more detail.

We proposed the following four questions to our users:

● Are you surprised by what the visualization reveals?
● How well did the visualization answer the question?
● Is the visualization intuitive enough?
● If you were the visualization designer, what would you do differently?

With these questions, we wanted to learn users’ general impressions of our
visualizations, including whether a visualization made them see the question and data
differently, how difficult they found it to understand, and any other suggestions or comments
they wished to provide. We used these questions in both the survey and the user interviews.
For the interviews, however, we added follow-up questions to dig deeper into users’ thinking
and get more firsthand feedback.

We got a lot of positive feedback from both the survey and the user interviews. In general,
people like our visualizations; they find them intuitive and feel they answer the questions we
posed reasonably well. They also gave us many criticisms and suggestions for improvement,
which we were even gladder to see.

For Each Visualization:

Visualization 1 - Bar Charts for Sources of Users

1. People were surprised by the result. They expected Facebook and Twitter to be at the
top, but that is not the case. Instead, Blog and RSS feeds take the
largest share.
2. People want the labels to be attached to the lines, without having to hover over them.

Visualization 2 - Bar Charts for Sources of Entries

1. People like this visualization. The trend is simple and clear. However, they pointed out
that the chart has a very long tail. It is hard to read the sources in this long tail
because the bars are so narrow and small. Users told us that they still consider those
sources important and would like to know more about them. They
suggested changing the scale of the axes or improving the
design of the bar chart to make these bars more visible.
2. Some users want a bigger typeface, because the current one is
hard to read.

Visualization 3 - Interactive Scatter Plots for Comments-Likes Relationship

1. The wording of this visualization’s question needs to be refined to better define the scope
and purpose of the visualization.
2. Users find this visualization a little hard to read because almost all the data are
clustered in the left corner of the chart. They think the zoom-in feature is helpful
and really easy to use; however, they would find the visualization easier to read
if the data were spread out a little.
3. At a glance, users do not realize that they can interact with the pink box (in the
zoom-in version). Some people said a video demo or brief instructions would be helpful.

Visualization 4 - Interactive Scatter Plots for Comments-Likes Relationship by Day

1. Users pointed out that the visualization does not use the same scale on both axes,
so the trend is a little hard to tell.
2. Users think this visualization is easy to understand; however, they want to see all
four types of data shown simultaneously on one chart to help them compare the
similarities and differences among the data sets.

Visualization 5 - Word Clouds

1. A single word might belong to more than one category; for example, “love” could be a noun
and also a verb. Users were confused about how such words were assigned to a word cloud and
whether the separation was biased.
2. In the word cloud for verbs, users were confused by the word “rt.” The word “rt”
means retweet, and there was a debate about whether we should count it as a verb.
3. In the adjectives/adverbs word cloud, the word “new” stands outside the cloud. One
user said that, looking at it, he felt we had put the word “new” there on purpose to
emphasize it. In fact, this is caused by the layout we chose for the word cloud, which is a
little misleading.

Visualization 6 - Animation of the 90 “Stars” on FriendFeed Accumulating Followers

1. The wording of this visualization’s question needs to be refined to better define the scope
and purpose of the visualization.
2. The number format in this visualization has low readability. For a number such as 9840000,
users suggested using “,” as a separator, as in 9,840,000.
3. The clock is not obvious to users. People know the time should change, but
because of the running speed of the visualization, the date change is almost invisible to
them. Some suggested making it more obvious by blinking or changing colors
when the date changes.
4. Users find the size of the dropping balls confusing.
5. Some really small balls are invisible when they are dropping.
6. Users would like to control the speed and direction of
the visualization: speed up, slow down, pause, move forward, move backward,
and replay.
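
The separator suggestion in item 2 is simple to illustrate. The Python sketch below uses the built-in `,` format option; the visualization itself was written in Processing, so this only shows the intended output, and the function name is hypothetical.

```python
def with_thousands_separator(n):
    """Format an integer with ',' between groups of three digits."""
    return format(n, ",")

label = with_thousands_separator(9840000)
print(label)
```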

We value all the feedback we received, which helped us move the project forward. After the
evaluation, we made some improvements to our visualizations based on the feedback. Because of
time constraints, we could not incorporate the remaining suggestions into the current
visualizations, so we leave them as future directions for this project.

Conclusion

We created eight web-based visualizations of a large dataset from the FriendFeed
community [1]. These visualizations focus on sources, content, user engagement, and user
popularity. They can be accessed at http://www.micuby.com/infoviz649report/index.html.

References:

1. Fabio Celli, F. Marta L. Di Lascio, Matteo Magnani, Barbara Pacelli, and Luca Rossi. Social
network data and practices: the case of FriendFeed. In International Conference on Social
Computing, Behavioral Modeling and Prediction, Lecture Notes in Computer Science. Springer,
Berlin, 2010.
2. Fabio Celli, software website: http://clic.cimec.unitn.it/fabio/
3. FriendFeed Under A Microscope: http://www.micuby.com/infoviz649report/index.html
4. http://www.micuby.com/infoviz649report/viz3.html
5. http://www-958.ibm.com/software/data/cognos/manyeyes/
