
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/313476197

SOCIAL NETWORK ANALYSIS FOR MARKET TRENDS IDENTIFICATION: A PRELIMINARY STUDY

Conference Paper · October 2014


2 authors:

Evelyn Farias Reinaldo Gomes


Universidade Federal de Campina Grande (UFCG) Universidade Federal de Campina Grande (UFCG)

All content following this page was uploaded by Evelyn Farias on 08 February 2017.



ISBN: 978-989-8533-24-1 © 2014

SOCIAL NETWORK ANALYSIS FOR MARKET TRENDS IDENTIFICATION: A PRELIMINARY STUDY

Evelyn Farias and Reinaldo Gomes


Computing and System Department/ Federal University of Campina Grande – Campina Grande, Brazil

ABSTRACT
The study and analysis of social networks has been widely used to predict outcomes that solve real-world problems. In
these networks, users share information, generating useful data for the analysis of various aspects, including market
content. This article presents a preliminary and simplified study of how content generated by users' activity on the
Twitter microblog can show current trends in the smartphone market. By choosing five sets of keywords and collecting
five sets of data, we analyzed the results to determine whether the choice of keywords and the number of collected
tweets influence the outcome of the research, and which companies, known worldwide as leaders in the smartphone
market, have greater influence among social network users. Finally, we compared our results with those of recent market
research, showing that this simplified experiment can identify a real trend in the smartphone market.

KEYWORDS
Smartphone. Social Networks. Data Analysis. Twitter.

1. INTRODUCTION
Social networks have gained popularity quickly among people, organizations or other social entities, giving
their users an easy way to interact, communicate and share content with each other. Such networks have
grown so much that today they are part of the lives of their users.
The study and analysis of social networks make it possible to model many complex real-world problems. In this
context, several companies and organizations use social networks in the marketing strategies for their products and
services. The analysis of social networks for identifying market trends has shown many advantages, because social
media are updated frequently and, according to [5], yield relatively unbiased results.
For this research, the microblog Twitter [11] was chosen as a source of data for analysis, due to its
simplicity and ease of data collection. Twitter has over 500 million active users, generating a data volume of
430 million tweets and handling over 1.6 million search queries per day [1], proving to be very useful in
identifying trends through various topics that affect different populations.
Thus, this research aims to conduct a preliminary study of social network analysis that can show which
are the most influential companies in the smartphone market. To that end, data was collected, for
approximately 48 hours, from tweets of Twitter users.
The experiment, as a business problem, aims to identify signs that can actually show real trends in product
popularity. As a technical problem, it highlights the relevance of the choice of keywords, which must actually
generate useful results for the proposed analysis. Poorly chosen keywords can generate biased or even
meaningless results. The choice of keywords is one of the risks to the validity of this experiment and will be
addressed later in this article.
Social network analysis has been widely used, and there is a large number of papers proposing solutions to
real-world problems. Many works focus on predicting market trends, from sales of a product or service to the
stock market, using data extracted from several different social networks.
All reviewed studies [1][4][5][6] focus on predicting trends using data collected from Twitter, and take a deeper
approach to the subject, using more powerful crawlers, greater data filtering and sentiment analysis to extract
users' opinions from their mentions in the social networks.

13th International Conference WWW/Internet 2014

Asur and Huberman [4] use the analysis of data from tweets to predict users' opinions about certain
products, presenting a detailed statistical analysis with a regression model for the problem.
The other studies reviewed [1][5][6] focus on the study of graph networks, analyzing connections between users
and tweets, as well as on sentiment (opinion) analysis. It is notable in all of these studies that a longer period
of data collection is necessary.
This study focuses on determining whether the products and companies currently regarded as the most popular in
the smartphone market are indeed the most popular when measured by the number of mentions in tweets of Twitter
users. We collected five sets of data in parallel, each corresponding to a set of keywords. The collection process
was automated and performed at the same time on the same machine.
The results of the analysis showed that the popularity of companies and products surveyed, even in a
small and preliminary study, corresponds to the most current market research where Apple, with the iPhone,
wins as the most popular manufacturer, and Android wins as the most popular operating system.
This article is divided into five sections. In the related work section we present the most relevant studies
to our research, showing important points covered in each. In the methodology section, we present how the
experiment was done, from choosing the keywords to the data collection. We recapitulate the purpose of the
experiment and present the research questions and hypotheses. In the implementation section, we focus on
the analysis of the collected data and answer the research questions. Finally, in the conclusions section, we
comment on the results of the experiment and suggest improvements for future experiments. The references
section brings together all the studies and papers on which this study was based.

2. METHODOLOGY
This research is experimental and aims to compare the results of the analysis of the collected data sets, each
corresponding to a set of keywords. The inputs of the experiment are the keywords. For each run, one set of
keywords is used, and each set corresponds to a level of the input factor. The results of the experiment are
analyzed using two metrics: the number of tweets per minute and the total number of tweets obtained in each run.
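The two metrics above can be illustrated with a minimal Python sketch. It assumes the tweet timestamps are already available as `datetime` objects. The function name `tweets_per_minute` and the sample timestamps are hypothetical and are not taken from the study. Note that minutes without any tweet are simply absent from the result, which is a simplification.

```python
from collections import Counter
from datetime import datetime


def tweets_per_minute(timestamps):
    """Count tweets in each calendar minute (minutes with no tweets are omitted)."""
    counts = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    return dict(sorted(counts.items()))


# Made-up timestamps for illustration only (not data from the study).
sample = [
    datetime(2014, 7, 19, 4, 0, 5),
    datetime(2014, 7, 19, 4, 0, 40),
    datetime(2014, 7, 19, 4, 1, 10),
]

per_minute = tweets_per_minute(sample)   # metric 1: tweets per minute
total_tweets = sum(per_minute.values())  # metric 2: tweets obtained in the run
```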
The experiment aims to answer the following research questions through their associated hypotheses:

• P1: Is the number of keywords relevant to the research result?
  o H0-0: The number of keywords influences the research result.
  o H0-1: The number of keywords does not influence the research result.

• P2: Is the volume of data collected relevant to the research result?
  o H1-0: The volume of data collected influences the research result.
  o H1-1: The volume of data collected does not influence the research result.

The keywords were chosen in an attempt to avoid bias in the search process. The search terms in the
Twitter Search API [9] can be combined with AND or OR connectors. In our experiment, the keywords in each
set were connected by OR connectors. Twitter currently supports at most 140 characters per tweet, so it would
be impractical to combine many different search terms with the AND connector: doing so would limit the results
more and more as the input level grows, and could collect more spam tweets. Some spam tweets can be easily
identified because they include links to an external URL, more than one hashtag on disparate topics, or a
number of suggestive keywords (e.g., “iphone”, “android”, “mobile”) [12], and because they are abnormally
repeated (the same tweet sent by different users too many times).
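The spam indicators listed above can be sketched as simple checks in Python. This is only an illustration and not a filter used in this study. The function name `spam_signals` and the thresholds `max_repeats` and `min_keywords` are hypothetical, and the check for hashtags on disparate topics is reduced to counting hashtags, since deciding whether topics are unrelated would need more than a regular expression.

```python
import re

# Example suggestive keywords taken from the text; a real list would be longer.
SUGGESTIVE = {"iphone", "android", "mobile"}


def spam_signals(text, repeat_count, max_repeats=10, min_keywords=2):
    """Return one boolean per spam indicator; thresholds are illustrative assumptions."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {
        "has_url": bool(re.search(r"https?://\S+", text)),
        # Simplification: counts hashtags without checking that topics are disparate.
        "multiple_hashtags": len(re.findall(r"#\w+", text)) > 1,
        "many_keywords": len(words & SUGGESTIVE) >= min_keywords,
        # repeat_count: how many times the same text appeared in the collected data.
        "repeated": repeat_count > max_repeats,
    }


suspect = spam_signals("Win a free #iphone #android mobile http://example.com", 25)
ordinary = spam_signals("I love my new phone", 1)
```

A tweet could then be flagged when several indicators are true at once. How many indicators should be required is a design choice that the text does not specify.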
Table 1 shows the keywords chosen for each input level. A maximum level of 5 was defined because the present
study is a small experiment. For results focused on a particular company or product, it would be interesting to
use more terms (increasing the input level) to obtain more specific, less generic search results. For the purpose
of our experiment, the input terms had to be as neutral as possible, so the terms smartphone and mobile were
chosen. The term mobile carries some risk, because it can bring in a lot of data not related to smartphones, but
for the analysis we intend to do this extraneous data will not present problems. The terms iPhone, android and
windows phone were then added: the first because it is the best-known device, the second because it is the most
used operating system [7], and the last because it is better known by its device type [8].


Table 1. Sets of Keywords


Number Keywords
1 Smartphone
2 smartphone, mobile
3 smartphone, mobile, iPhone
4 smartphone, mobile, iPhone, android
5 smartphone, mobile, iPhone, android, windows phone
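As an illustration of how the cumulative keyword sets in Table 1 map to OR-connected search queries, consider the following minimal Python sketch. The helper names `keyword_sets` and `or_query` are hypothetical. Quoting the multi-word term so that it is searched as an exact phrase is also an assumption, since the exact query strings used in the configuration files are not reproduced here.

```python
# Terms in the order they are added, one per input level (from Table 1).
TERMS = ["smartphone", "mobile", "iPhone", "android", "windows phone"]


def keyword_sets(terms):
    """Map each input level to its cumulative list of keywords."""
    return {level: terms[:level] for level in range(1, len(terms) + 1)}


def or_query(keywords):
    """Join keywords with OR; multi-word terms are quoted as exact phrases (assumption)."""
    quoted = [f'"{k}"' if " " in k else k for k in keywords]
    return " OR ".join(quoted)


sets_by_level = keyword_sets(TERMS)
queries = {level: or_query(kws) for level, kws in sets_by_level.items()}
```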

The data was collected using the free and open source tool NodeXL [10]. This tool makes it possible to run
data collections from some social networks via the command line. Through a configuration file, the collection
can be automated with a task scheduler.
The NodeXL tool already incorporates graph searches, and the collected data is returned in the GraphML file
format. This format contains several layers with vertices of the graph of connections between users and
collected tweets. For our experiment, we used only the first layer, which contains the tweets and their
timestamps.
The search query is defined in NodeXL's configuration file. Five configuration files, one for each set of
keywords, were written for the data collection process.
In Table 2 we can see how the automation of data collection was organized. A weekend was chosen
because users tweet more on Saturday and Sunday than on any other day of the week [2].
Table 2. Scheduling and Automation Of Data Collection.
Parameter                                  Value
Collection period of time                  Weekend (approx. 48 hours)
Days of collection                         July 19, 20 and 21 of 2014
Collection schedules                       4am of July 19, 2014 to
(according to timestamps of the tweets)    5am of July 21, 2014
Automatic collection period of time        Every 30 minutes a new collection
The period of 30 minutes between collections was defined after tests with various intervals, from 5 minutes to
1 hour. These tests indicated that the data returned by the search is renewed about every 30 minutes, so
spacing the collections 30 minutes apart was the best way to obtain relatively continuous data.
Data collection was performed on a single machine, through a task scheduler. The data for each set of
keywords were collected in parallel, at the same time.
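The schedule can be sketched in Python as a list of collection times spaced 30 minutes apart. This is an illustration only: the window below reuses the first and last tweet timestamps reported in Table 2 as if they were the scheduler's start and end times, which is an assumption, and the function name `collection_times` is hypothetical. In the experiment itself the runs were triggered by a task scheduler, not by a script like this one.

```python
from datetime import datetime, timedelta


def collection_times(start, end, step_minutes=30):
    """List the start time of every collection run between start and end, inclusive."""
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += timedelta(minutes=step_minutes)
    return times


# Window assumed from the tweet timestamps in Table 2.
schedule = collection_times(datetime(2014, 7, 19, 4, 0), datetime(2014, 7, 21, 5, 0))
```

The same list would be used for each of the five keyword sets, since they were collected in parallel.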

3. IMPLEMENTATION
The collected data had to be transformed into simple tables for statistical analysis. We could verify that with
the increase in the level of the input, the volume of collected data increased, so more tweets were captured.
This shows that, as expected, the number of keywords affects the volume of collected data.
As expected, the collected data was not normally distributed. There is a large number of spam tweets per
minute, which generate outliers in the analysis, but they were not removed from the data because removing them
would not influence the outcome of the experiment. Our objective is to evaluate the popularity of products;
therefore, spam tweets are still part of what we want to measure in this preliminary study.
Figure 1 shows the confidence intervals, at a significance level of 5%, for the mean number of tweets per minute
for each set of keywords. Interestingly, from level 3 to level 5 the confidence intervals overlap, which
indicates that it is not possible to say which of these keyword levels yields a higher number of tweets per
minute.
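To make the comparison of confidence intervals concrete, the following Python sketch computes an approximate 95% interval for a mean and checks whether two intervals overlap. The method used to produce Figure 1 is not stated in this paper, so the normal approximation (z = 1.96) is an assumption, and the function names are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    mean = statistics.fmean(values)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean - z * sem, mean + z * sem


def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Made-up tweets-per-minute values for illustration only.
low, high = mean_confidence_interval([10, 12, 14, 16, 18])
```

When two intervals overlap, as reported for levels 3 to 5, the data does not allow us to claim that one level has a higher mean than the other.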


Figure 1. Confidence intervals at 95% of the mean of tweets per minute


The vertical separation of the means at levels 1, 2 and 3 is notable, as is the considerable increase in the
width of the confidence intervals starting at level 3. This shows us that the choice of keywords does, in fact,
influence the amount of data obtained.
Figure 2 shows the boxplots of tweets per minute for each level of input. Again, we see the difference
between the inputs starting at level 3. From this level on, the medians are very close, but the spread of the
boxplots varies significantly, showing an increase in tweets per minute as the level of the input increases.
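The quantities a boxplot displays can be computed directly, as in the Python sketch below. The outlier rule of 1.5 times the interquartile range is the usual boxplot convention and is an assumption here, since the settings used to draw Figure 2 are not stated. The function name `boxplot_summary` and the sample values are hypothetical.

```python
import statistics


def boxplot_summary(values):
    """Quartiles, interquartile range and outliers under the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


# Made-up tweets-per-minute values with one burst, for illustration only.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

A burst of tweets in a single minute appears in `outliers` while leaving the median almost unchanged, which matches the observation of close medians with differing spread.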

Figure 2. Boxplots of the tweets per minute


Table 3 shows the proportion of tweets mentioning each product-related search term at each level of input.
When the input has only neutral terms, such as smartphone and mobile at levels 1 and 2, Android has a
higher proportion of mentions than the other terms. Starting at level 4, when the terms iPhone and android are
both included in the input, the proportions of tweets mentioning them are close to each other at both levels 4
and 5. At level 5, the term windows phone is added to the input, which gives us more tweets mentioning it, but
the number is still not very significant.
Table 3. Proportions of Each Search Query in Each Input Level.
Amount of keywords    Most popular in the Smartphone Market
                      iPhone    Android    Windows Phone
1                     0.3%      1.4%       0.05%
2                     0.1%      0.5%       0.08%
3                     10.4%     1.5%       0.01%
4                     7.2%      6.5%       0.02%
5                     7.6%      6.1%       0.13%
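Proportions of this kind can be obtained by counting, for each term, the tweets whose text contains it. The Python sketch below assumes a case-insensitive substring match, which may differ from the matching actually applied to the collected data. The function name `mention_proportions` and the sample tweets are hypothetical.

```python
def mention_proportions(tweets, terms):
    """Percentage of tweets whose text contains each term (case-insensitive)."""
    total = len(tweets)
    if total == 0:
        return {term: 0.0 for term in terms}
    lowered = [t.lower() for t in tweets]
    return {
        term: 100.0 * sum(term.lower() in text for text in lowered) / total
        for term in terms
    }


# Made-up tweets for illustration only.
sample_tweets = [
    "New iPhone is out",
    "android update today",
    "my smartphone broke",
    "iphone vs Android",
]
proportions = mention_proportions(sample_tweets, ["iPhone", "android", "windows phone"])
```

Because one tweet can mention several terms, the percentages for a level do not need to add up to 100%.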

We can conclude that, when the input terms are more neutral, android has a greater number of mentions in
users’ tweets, whereas when the input terms are less neutral, iPhone has a greater number of mentions. This
leads us to the conclusion that android is the most mentioned, or most popular, operating system, and that
iPhone is the most popular device in the smartphone market.

Top Smartphone OEMs
3 Month Avg. Ending Mar. 2014 vs. 3 Month Avg. Ending Dec. 2013
Total U.S. Smartphone Subscribers Age 13+
Source: comScore MobiLens

                                Share (%) of Smartphone Subscribers
                                Dec-13    Mar-14    Point Change
Total Smartphone Subscribers    100.0%    100.0%    N/A
Apple                           41.8%     41.4%     -0.4
Samsung                         26.1%     27.0%     0.9
LG                              6.6%      6.7%      0.1
Motorola                        6.7%      6.4%      -0.3
HTC                             5.7%      5.4%      -0.3
Figure 3. Top smartphone manufacturers in the USA


Source: comScore, 2014
Figure 3 shows the manufacturers of the most popular smartphones in the United States between
December 2013 and March 2014. Apple, the manufacturer of the iPhone, comes out ahead of the others.
Figure 4 shows the most popular smartphone platforms in the United States from December 2013 to
March 2014. Android is the most popular platform, followed by iOS (iPhone).


Top Smartphone Platforms
3 Month Avg. Ending Mar. 2014 vs. 3 Month Avg. Ending Dec. 2013
Total U.S. Smartphone Subscribers Age 13+
Source: comScore MobiLens

                                Share (%) of Smartphone Subscribers
                                Dec-13    Mar-14    Point Change
Total Smartphone Subscribers    100.0%    100.0%    N/A
Android                         51.5%     52.2%     0.7
Apple                           41.8%     41.4%     -0.4
BlackBerry                      3.4%      2.7%      -0.7
Microsoft                       3.1%      3.3%      0.2
Symbian                         0.2%      0.2%      0.0

Figure 4. Top smartphone platforms in the USA


Source: comScore, 2014
The results of our experiment are consistent with market research conducted in the real world. This indicates
that the analysis of social networks can be used to address real problems.
The number of keywords is relevant to the outcome of the experiment, as previously shown. The choice
of these keywords is very important to the validity of the experiment, because it can greatly alter the value of
the data collected.
Given the number of outliers, we cannot say whether the volume of data is relevant to the research. However,
we find that a longer collection period is necessary to allow further data filtering, which would yield
stronger data and hence more reliable results.

4. CONCLUDING REMARKS
Through data analysis we found that the choice and the number of keywords influence the outcome of the
experiment. This answers our first research question, accepting its null hypothesis that the number of
keywords is relevant to the experiment results.
Observing the collected data and the results of its analysis, we found that a more robust and reliable
experiment would need a larger amount of data. Thus, we can answer our second research question,
tending to accept its null hypothesis that the volume of data influences the outcome of the experiment.
The experiment, although simple and preliminary, was able to indicate, through the mentions made by social
network users, actual trends in the popularity of products. However, for a more robust and reliable experiment
in which we can assert market trends without comparing them with real-world research results, we need a longer
period of data collection, a more carefully studied set of keywords and improved filtering of the data used.
This is a starting point for further research in social network analysis, showing that, through a simple
social network like Twitter, it is possible to obtain data relevant to solving real problems. In future work, we
suggest, in addition to the changes mentioned above, sentiment analysis of the users’ mentions for a
deeper analysis of the popularity of brands, products, or services. Experiments using the same model of
analysis can be done in other areas, not only the marketing of products.


REFERENCES
[1] Tumasjan, Andranik, Timm O. Sprenger, Philipp G. Sandner, and Isabell M. Welpe. “Predicting elections with
Twitter: What 140 characters reveal about political sentiment”. In Proceedings of the Fourth International AAAI
Conference on Weblogs and Social Media, pp. 178-185. 2010.
[2] “Strategies for Effective Tweeting: A Statistical Review”, in www.salesforce.com/marketing-clould, pp.06-07, 2012.
[3] Erika Jurisová. “The impact of social networking on business and business ethics”. Accessible at:
http://www.cutn.sk/Library/proceedings/mch_2013/editovane_prispevky/46.%20Juri%C5%A1ov%C3%A1.pdf
[4] Sitaram Asur, Bernardo A. Huberman, "Predicting the Future with Social Media," wi-iat, vol. 1, pp.492-499, 2010
IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2010
[5] David Alfred Ostrowski. “Identification of Trends in Consumer Behavior through Social Media”. Accessible at:
http://www.iiis.org/CDs2013/CD2013SCI/SCI_2013/PapersPdf/DW543ZE.pdf
[6] David Alfred Ostrowski. “Social Network Analysis for Consumer Behavior Prediction”. Accessible at:
http://worldcomp-proceedings.com/proc/p2012/ICA3445.pdf
[7] http://www.ibtimes.com/android-market-share-nears-52-percent-apple-iphone-still-most-popular-device-us-723349
[8] http://www.wpcentral.com/its-official-windows-phone-third-most-popular-smartphone-os
[9] http://dev.twitter.com/docs/api/1.1/get/search/tweets
[10] http://nodexl.codeplex.com
[11] http://www.twitter.com
[12] Sarita Yardi, Daniel Romero, Grant Schoenebeck, Danah Boyd. “Detecting Spam in a Twitter Network”. First
Monday, Peer-reviewed Journal on the Internet, Vol. 15, 2010
