
Analysis of Connectivity and Viewership between Twitch Content Creators using Graph Visualization and Community Detection


Varun Deliwala
B.Tech (CSE)
Ahmedabad University
Ahmedabad, Gujarat, India
varun.d@ahduni.edu.in

Shail Patel
B.Tech (CSE)
Ahmedabad University
Ahmedabad, Gujarat, India
shail.p1@ahduni.edu.in

Sahil Miskeen
B.Tech (CSE)
Ahmedabad University
Ahmedabad, Gujarat, India
sahil.m@ahduni.edu.in

ABSTRACT
With the growing number of content creators and streamers on social media and streaming platforms, the connectivity between content creators has increased sharply, and so has the need to analyze it. Using concepts from social network analysis, analyzing such a network can show new content creators what to do and what to avoid, and whom they should and should not follow.

KEYWORDS
Social Network Analysis, Visualization, Twitch, Community Detection, Distribution Analysis.

1 Introduction
Users on a social media platform are connected based on whom they follow and who follows them. Twitch is a live video streaming service that focuses on video game live streaming, including broadcasts of esports competitions, and also offers music broadcasts, creative content, and "in real life" streams. On such a platform, the connection between two nodes is based on the follow/subscribe system. When it comes to communities, however, two streamers can be considered to be in the same community based on the games they play and the type of content they produce for the public.

As online gaming has boomed over the years, content creation has grown with it. Viewership has grown, and so has the number of creators who make content on a daily basis. This growth raised the question of whether content creation is a short-term, low-risk, high-return gig or is actually gruelling for everyone. It also raised the question of how content creators are actually connected when it comes to creating similar content.

This work uses social network analysis concepts to address a simple problem, then compares the results obtained and analyzes the visualized graphs alongside them. After forming suitable data fields and graphs from which sound inferences can be made, the analysis relies on comparative models. The results and hypotheses in this work come from a comparison-based approach: we build graphs for the different datasets and compare how the results vary across them. This lets us draw conclusions such as which language's veteran streamers are most connected to new streamers, and which type of content (mature or non-mature) viewers prefer in which language. Overall, we aim to present the results of a study that analyzes the network, checks which people are connected to which type of people, and gives an overview of the current streaming scene for 6 European languages.

2 Data Exploration
The dataset covers streamers from 6 different languages: German, British English, Spanish, French, Portuguese, and Russian. For each language we are given two files, one describing the nodes and the other describing the edges. The edges are directed and carry no attributes such as weights. Beyond the graph itself, the data offers little, so extra fields have to be created to make the analysis possible in the first place. Exploring the data therefore established the need for new fields. We also need to extract the traits of the nodes in each graph, which is possible with built-in networkx functions such as degree and the assortativity coefficients.
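As an illustration of the two-file layout (one node table, one directed edge table per language), the following minimal sketch parses both tables and appends degree columns to the node table. The column names (`from`, `to`, `id`, `days`, `views`, `mature`) and the inline sample rows are assumptions made for this example and are not taken from the dataset. The study itself relied on networkx for this step; the sketch uses only the Python standard library so that it runs without extra dependencies.

```python
import csv
import io
from collections import defaultdict

# Hypothetical file contents: the column names and rows below are
# assumptions for illustration, not the real dataset.
EDGES_CSV = """from,to
0,1
0,2
1,2
3,0
"""

NODES_CSV = """id,days,views,mature
0,1200,50000,False
1,300,800,True
2,2100,120000,False
3,900,4000,False
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def degree_table(edge_rows):
    """Count in-degree and out-degree per node from directed edge rows."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for row in edge_rows:
        out_deg[row["from"]] += 1
        in_deg[row["to"]] += 1
    return in_deg, out_deg


nodes = read_rows(NODES_CSV)
edges = read_rows(EDGES_CSV)
in_deg, out_deg = degree_table(edges)

# Append the degree columns to the node table.
for node in nodes:
    node["in_degree"] = in_deg[node["id"]]
    node["out_degree"] = out_deg[node["id"]]
    node["degree"] = node["in_degree"] + node["out_degree"]

for node in nodes:
    print(node["id"], node["in_degree"], node["out_degree"], node["degree"])
```

With a real dataset, the two inline strings would be replaced by the contents of the node and edge files for one language.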

Ahmedabad University ‘22, Ahmedabad, Gujarat

Table 1: Data details for the 6 datasets, one per language.

3 Process
The first step with a dataset that is not particularly large but is spread out is to partition it along different fields. We added the fields days_compare and views_compare so that nodes can be assigned to categories and better told apart. To split nodes into the three categories "New", "Mid", and "Old", the metric must be chosen so that no group becomes too niche. We assumed that the distribution of days is approximately normal, so "Mid" covers one standard deviation on either side of the mean, which is roughly 68% of the data. The remaining data on the two sides was labelled "New" and "Old": "New" when the number of days is less than (mean - standard deviation), and "Old" when it is more than (mean + standard deviation).

For views, we simply split the nodes into two parts: those with fewer views than average and those with more views than average. With the two new fields days_compare and views_compare in place, we could begin the basic analysis. Using networkx, we then appended the degree of each node to the nodes table of each dataset.

Once the data was preprocessed and loaded into Gephi, we could analyze the datasets in different formats: node size and color can be changed as needed, and the degree range of the visualized graph can be filtered. The first analysis, on which the others build, was the modularity class distribution. Using the Louvain algorithm, which is built into Gephi, we partitioned the nodes into modularity classes. The algorithm was originally proposed for the fast unfolding of communities in large networks and is based on modularity: it tries to maximize the difference between the actual number of edges within a community and the number expected by chance. After visualizing and analyzing each graph, we aimed to draw conclusions about the different cases while also plotting distributions, such as the degree distribution of nodes in the different age classes (based on days_compare).

Along with the traits of the nodes, we analyzed the assortativity of all the graphs and their reciprocities using Python and networkx. The assortativity coefficients showed that in none of the graphs do nodes tend to connect to similar nodes; instead, nodes tend to connect to dissimilar nodes. The reciprocity of every graph was 0, which indicates that no edge is returned by an edge in the opposite direction anywhere in the graph.

Table 2: Assortativity coefficient and reciprocity of the graphs formed by the datasets.
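To make the two graph-level measures concrete, here is a small dependency-free sketch. It assumes one common definition of each: reciprocity as the fraction of directed edges whose reverse edge also exists, and degree assortativity as the Pearson correlation, taken over edges, between the out-degree of the source and the in-degree of the target. networkx provides `degree_assortativity_coefficient` and `overall_reciprocity` for the same purpose; the version below is only an approximation of those ideas on a hypothetical toy edge list and is not claimed to reproduce the library's numbers in every edge case.

```python
from math import sqrt


def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def degree_assortativity(edges):
    """Pearson correlation over edges: source out-degree vs target in-degree."""
    out_deg, in_deg = {}, {}
    for u, v in edges:
        out_deg[u] = out_deg.get(u, 0) + 1
        in_deg[v] = in_deg.get(v, 0) + 1
    xs = [out_deg[u] for u, _ in edges]
    ys = [in_deg[v] for _, v in edges]
    n = len(edges)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined without variance
    return cov / sqrt(var_x * var_y)


# Hypothetical toy graph: node 0 follows three others, node 4 follows node 1.
toy_edges = [(0, 1), (0, 2), (0, 3), (4, 1)]

toy_reciprocity = reciprocity(toy_edges)
toy_assortativity = degree_assortativity(toy_edges)
print(toy_reciprocity, toy_assortativity)
```

On this toy graph no edge is returned, so the reciprocity is 0, and the coefficient is negative because the high-out-degree node mostly points at low-in-degree nodes. This is the same qualitative pattern reported for the six datasets.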

4 Comparative studies and Findings


4.1 Modularity/Views
The first comparison was between the modularity class and the views a streamer received. We wanted to understand whether there is any relation between the communities and the highest view counts. Such a relation would tell us whether a single community is responsible for high viewership or, in simple words, which type of community is more popular.
4.1.1 Results
We found that the top streamers are spread over different modularity classes in the British English and French datasets. The same holds for the Russian data, although a few modularity classes dominate strongly there, so we conclude that the spread is good but a few classes are much more dominant. In the Portuguese dataset, one particular modularity class is clearly the most dominant. Thus the top streamers are spread over different modularity classes in some cases, while in others streamers from a few communities hold most of the viewers.

Figure 1: Graph visualized for the French dataset using Gephi, where the color is given by the modularity class and the size of the nodes is given by the number of views.
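The comparison of modularity class against views can be approximated numerically as well as visually. The sketch below is an illustration under assumptions, not the procedure used in the study: it ranks streamers by views, keeps the top few, and counts how many fall into each modularity class. The rows, the class labels, and the `top_n` cut-off are all hypothetical.

```python
from collections import Counter

# Hypothetical rows: (streamer id, modularity class, total views).
streamers = [
    ("a", 0, 900_000),
    ("b", 1, 750_000),
    ("c", 0, 20_000),
    ("d", 2, 640_000),
    ("e", 1, 5_000),
    ("f", 0, 310_000),
]


def top_streamer_classes(rows, top_n):
    """Count how many of the top_n most-viewed streamers fall in each class."""
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    return Counter(cls for _, cls, _ in ranked[:top_n])


def dominant_share(class_counts):
    """Share of the top streamers that belong to the single largest class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


counts = top_streamer_classes(streamers, top_n=4)
share = dominant_share(counts)
print(dict(counts), share)
```

A dominant share close to 1 would correspond to the Portuguese case, where one class holds most of the top streamers, while a low share corresponds to the more even spread seen in the British English and French datasets.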

Figure 2: Comparing different cases in different datasets via graphs (modularity class vs views) (RU and PTBR)

Figure 3: Comparing different cases in different datasets via graphs (modularity class vs views) (FR and ENGB)

4.2 Modularity/Days
Here we compared the modularity class with the number of days streamers have been on the platform. This lets us understand how communities and streamers of different ages are connected, and whether any community is dominated by a particular age segment.

Figure 4: Comparing different cases in different datasets via graphs (modularity class vs days) (RU and PTBR)

Figure 5: Comparing different cases in different datasets via graphs (modularity class vs days) (FR and ENGB)

4.2.1 Results
We found no bias in any of the languages when considering the time streamers have spent on the platform for any particular community. Streamers of all ages are spread over the different modularity classes regardless of the language they stream in.

4.3 days_compare/Views
The days_compare field labels creators as new, mid, or old. We compared it with the views to understand which age group dominates in terms of views.
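The days_compare and views_compare fields follow the rule described in the Process section: one standard deviation around the mean for the age classes, and the average as the cut-off for views. A minimal sketch of that rule is given below. The sample rows are hypothetical, the population standard deviation is an assumption (the text does not say which estimator was used), and treating a value exactly equal to the average as "Low" is likewise an assumption.

```python
from statistics import mean, pstdev


def days_category(days, mu, sigma):
    """Label a node New, Mid, or Old relative to mean +/- one std deviation."""
    if days < mu - sigma:
        return "New"
    if days > mu + sigma:
        return "Old"
    return "Mid"


def views_category(views, average):
    """Label a node High if it has more views than average, otherwise Low."""
    return "High" if views > average else "Low"


def add_compare_fields(rows):
    """Add days_compare and views_compare to each node dictionary in place."""
    mu = mean(row["days"] for row in rows)
    sigma = pstdev(row["days"] for row in rows)  # assumption: population SD
    average_views = mean(row["views"] for row in rows)
    for row in rows:
        row["days_compare"] = days_category(row["days"], mu, sigma)
        row["views_compare"] = views_category(row["views"], average_views)
    return rows


# Hypothetical node table.
sample = [
    {"id": 0, "days": 100, "views": 500},
    {"id": 1, "days": 900, "views": 2000},
    {"id": 2, "days": 1000, "views": 8000},
    {"id": 3, "days": 1100, "views": 1500},
    {"id": 4, "days": 1900, "views": 90000},
]

labelled = add_compare_fields(sample)
for row in labelled:
    print(row["id"], row["days_compare"], row["views_compare"])
```

Because view counts are typically skewed by a few very large streamers, a mean-based cut-off places most nodes in the "Low" group, as the sample shows.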


Figure 6: Comparing different cases in different datasets via graphs (days_compare vs views) (ENGB, FR, and DE)

4.3.1 Results
Here we deduced that for English the top tier is mainly dominated by older streamers. For French the dominance spreads out somewhat, and for German the spread increases so much that we cannot tell which age group dominates. We concluded that the German audience is the most accepting and shows little bias toward how old or new a streamer is, whereas the opposite holds for the English audience.

4.4 days/views_compare
The views_compare field has two values, high and low views. We compared it with the number of days to see how the days reflect the dominance of newer and older streamers.

Figure 7: Comparing different cases in different datasets via graphs (views_compare vs days) (ENGB, FR, and DE)

4.4.1 Results
Comparing the two, we could deduce that for English the older streamers again received the most views compared to newer streamers. This effect weakens a little in the French dataset and decreases significantly in the German dataset, where the change is clearly visible in the graph.

4.5 Mature/Non-Mature Content
Mature versus non-mature content is recorded as a binary value. By separating the two, we used a simple bar graph to understand whether viewers prefer mature content over non-mature content. This can again help streamers understand which type of content could work better for them.

Figure 8: Mature vs Non-Mature content for different datasets

4.5.1 Results
We found that streamers who posted non-mature content got more views, and this was true for every dataset. In the German dataset, however, the difference between mature and non-mature content was much smaller than in the others, so we can say that German viewers are more tolerant of mature content than viewers in other languages.

4.6 Degree Distribution for the nodes of different degrees
The nodes here are divided into dNew, dMid, and dOld. We then compared these degrees for all the sources with all the targets to find out which of the three is the most dominant.

Figure 9: Degree Distribution for the nodes of different degrees (Russian, French, Portuguese and German)

4.6.1 Results
After analyzing the degree distribution, we found that for all the languages except Russian and Portuguese, dMid dominated and also increased more gradually than the other two. Setting these two outliers aside, we could conclude that mid-aged streamers have more engagement: new streamers may not yet have built a fan base, and the strategies of older streamers may have become outdated, so mid-aged streamers tend to dominate.
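One way to compare degrees across the age classes is to tabulate the degree distribution separately for each class and then compare a summary such as the mean degree. The sketch below shows that idea under stated assumptions: the rows are hypothetical, and the study's own dNew, dMid, and dOld comparison over sources and targets may have been constructed differently.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (node id, days_compare class, degree).
node_rows = [
    ("a", "New", 2),
    ("b", "Mid", 5),
    ("c", "Mid", 5),
    ("d", "Mid", 9),
    ("e", "Old", 3),
    ("f", "Old", 1),
]


def degree_distribution_by_class(rows):
    """Map each age class to a Counter of {degree: number of nodes}."""
    distribution = defaultdict(Counter)
    for _, age_class, degree in rows:
        distribution[age_class][degree] += 1
    return distribution


def mean_degree_by_class(rows):
    """Average degree of the nodes in each age class."""
    totals = defaultdict(int)
    sizes = defaultdict(int)
    for _, age_class, degree in rows:
        totals[age_class] += degree
        sizes[age_class] += 1
    return {cls: totals[cls] / sizes[cls] for cls in totals}


distribution = degree_distribution_by_class(node_rows)
mean_degree = mean_degree_by_class(node_rows)
print({cls: dict(counter) for cls, counter in distribution.items()})
print(mean_degree)
```

In this made-up sample the "Mid" class has the highest mean degree, which mirrors the pattern reported for the non-outlier languages.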

Also, for the two outliers, we think this anomaly could mean that these two languages are new to the platform and still growing there. New streamers would then get more attention because the platform is not yet as popular, and the older streamers might have become inactive as well.

4.7 Nodes of different degrees/dNew
Next we analyzed which degree was more dominant when looking specifically at dNew nodes. We wanted to understand the relationships between all the streamers and identify the age group with the most engagement and the best connections. We repeated the process for dMid and dOld to be more case-specific and found some interesting results.

4.7.1 Results
Surprisingly, the results were the same across the languages apart from the two outliers, Russian and Portuguese. The mid-aged streamers had the strongest results, and the graph shows a significant and gradual increase for them, which suggests that mid-aged streamers have better connections.

4.8 No of streamers/days_compare
We also counted how many streamers fall into each age class, comparing days_compare with the number of streamers in each category using a simple bar graph.

Figure 10: Mature vs Non-Mature content for the different datasets

4.8.1 Results
This analysis lets us pinpoint the number of content creators in each age class. It is evident that mid-aged streamers were the largest group for all the languages, and again there was not much difference between the new and old streamers.

4.9 No of views/days_compare
Here we took the total views and compared them with days_compare to see how many views were associated with each age category. We did this to again understand which age category is more dominant in terms of the views it receives.

Figure 11: No of views/days_compare for datasets DE, ENGB, ES, and FR, with the outliers circled.

4.9.1 Results
Although the analysis above suggested a significant difference between the mid-aged and old-aged streamers, this analysis showed that their views are still very close to each other. The new streamers had very few views in all cases. In fact, in the English dataset the old-aged streamers had more views than the mid-aged streamers, which means that even though there are fewer of them, the older streamers are highly dominant. In the French data, new streamers received a better share of views than in the other languages, so we could say that French viewers are more experimental and supportive of new streamers. This showed us that the number of streamers in an age group does not determine how many views the group gets; people like to connect to veteran streamers as much as to mid-level streamers.

5 Hypothesis
Hypothesis 1: Based on how people tend to think, we hypothesized that old streamers must be highly connected and have a higher degree. These higher degrees must lead to a larger number of views than for any other age category. The outcome is discussed in the conclusion.

Hypothesis 2: If more streamers belong to a particular age group, then the age group with the highest count of streamers would also have the highest cumulative views. The outcome of this hypothesis is described in the conclusion below.
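Hypothesis 2 can be checked mechanically once each streamer has an age class and a view count. The following sketch, built on hypothetical rows, counts streamers and sums views per age class and then tests whether the largest class is also the most viewed one. It illustrates the check only and does not reproduce the figures of the study.

```python
from collections import defaultdict

# Hypothetical rows: (days_compare class, total views of one streamer).
view_rows = [
    ("New", 1_000),
    ("New", 2_000),
    ("Mid", 30_000),
    ("Mid", 20_000),
    ("Mid", 10_000),
    ("Mid", 15_000),
    ("Old", 80_000),
]


def summarize_by_age(rows):
    """Return {age class: (number of streamers, cumulative views)}."""
    counts = defaultdict(int)
    views = defaultdict(int)
    for age_class, view_count in rows:
        counts[age_class] += 1
        views[age_class] += view_count
    return {cls: (counts[cls], views[cls]) for cls in counts}


def hypothesis_2_holds(summary):
    """True if the class with the most streamers also has the most views."""
    most_streamers = max(summary, key=lambda cls: summary[cls][0])
    most_views = max(summary, key=lambda cls: summary[cls][1])
    return most_streamers == most_views


summary = summarize_by_age(view_rows)
print(summary)
print(hypothesis_2_holds(summary))
```

In this sample the "Mid" class has the most streamers but the single "Old" streamer has more cumulative views, so the check returns False. This is the kind of outcome described for the English dataset in Section 4.9.1.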


6 Conclusion
Certain regions are more sensitive to the content they watch (mature or non-mature). While most regions prefer non-mature content, the German region has a lower bias toward non-mature content than the others. The reason might be that its viewers are above a specific age limit and can understand and appreciate mature content better than viewers in the other regions.

An interesting observation from the study is that the assortativity is negative for all the regions. This suggests that streamers streaming a particular kind of content are less likely to be connected to other streamers streaming the same content. This might be because they prefer to watch other content rather than the same things they produce. Another inference might be that they do not follow streamers with the same content because of competition between them.

The study of the Twitch social network gave us several insights. We inferred that the Russian and Portuguese regions are comparatively new in terms of reach within the Twitch network. Regions similar to them would therefore have to be characterized by a new metric and would not fit the existing metrics of the comparative study. This is based on the anomaly that in these regions, in particular, the newer nodes have a higher degree than the older and mid-range nodes (based on the days they have been on the platform).

We found that the views need to be divided into four broad categories (low, mid, high, and very high). The study shows a large gap between the number of old and mid streamers: the mid streamers outnumber the old ones by a factor of almost 3 in every region observed. However, the views attributed to the old and mid streamers are almost the same, and in some cases the total views of the old streamers are even higher than those of the mid streamers. From this observation, we concluded that a fourth category needs to be added to compare effectively the views associated with the different lengths of time streamers have been on the platform.
