
SOCIAL SCIENCES

Social media for large studies of behavior


Large-scale studies of human behavior in social media need to be held to higher
methodological standards
By Derek Ruths1* and Jürgen Pfeffer2

1Department of Computer Science, McGill University, Montreal, Quebec H3A 0G4, Canada. 2Institute for Software Research, Carnegie Mellon University, Pittsburgh, PA 15213, USA. *E-mail: derek.ruths@mcgill.ca

POLICY

On 3 November 1948, the day after Harry Truman won the United States presidential elections, the Chicago Tribune published one of the most famous erroneous headlines in newspaper history: "Dewey Defeats Truman" (1, 2). The headline was informed by telephone surveys, which had inadvertently undersampled Truman supporters (1). Rather than permanently discrediting the practice of polling, this event led to the development of more sophisticated techniques and higher standards that produce the more accurate and statistically rigorous polls conducted today (3).

Now, we are poised at a similar technological inflection point with the rise of online personal and social data for the study of human behavior. Powerful computational resources combined with the availability of massive social media data sets have given rise to a growing body of work that uses a combination of machine learning, natural language processing, network analysis, and statistics for the measurement of population structure and human behavior at unprecedented scale. However, mounting evidence suggests that many of the forecasts and analyses being produced misrepresent the real world (4–6). Here, we highlight issues that are endemic to the study of human behavior through large-scale social media data sets and discuss strategies that can be used to address them (see the table). Although some of the issues raised are very basic (and long-studied) in the social sciences, the new kinds of data and the entry of a variety of communities of researchers into the field make these issues worth revisiting and updating.
REPRESENTATION OF HUMAN POPULATIONS. Population bias. A common assumption underlying many large-scale social media-based studies of human behavior is that a large-enough sample of users will drown out noise introduced by peculiarities of the platform's population (7). However, substantial population biases vary across different social media platforms (8). For instance, Instagram is "especially appealing to adults aged 18 to 29, African-American, Latinos, women, urban residents" (9), whereas Pinterest is dominated by females, aged 25 to 34, with an average annual household income of $100,000 (10). These sampling biases are rarely corrected for (if even acknowledged).
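One standard correction, borrowed from survey research, is post-stratification: reweight the sampled users so that their demographic composition matches that of the target population. Below is a minimal sketch of the idea; the demographic groups, target shares, and field names are hypothetical placeholders, not measured values.

```python
# Post-stratification sketch: reweight platform users so their demographic
# mix matches a target (e.g., census) population. All numbers below are
# hypothetical placeholders for illustration.
from collections import Counter

def poststratify_weights(users, target_shares):
    """Return a per-user weight so weighted demographic shares match the target.

    users: list of dicts with a 'group' key (e.g., an age bracket).
    target_shares: dict mapping group -> share of the target population.
    """
    counts = Counter(u["group"] for u in users)
    n = len(users)
    weights = []
    for u in users:
        sample_share = counts[u["group"]] / n
        # Upweight underrepresented groups, downweight overrepresented ones.
        weights.append(target_shares[u["group"]] / sample_share)
    return weights

# Hypothetical example: a platform sample skewed toward 18-29 year olds.
users = [{"group": "18-29"}] * 70 + [{"group": "30-49"}] * 20 + [{"group": "50+"}] * 10
target = {"18-29": 0.25, "30-49": 0.40, "50+": 0.35}  # assumed census shares
w = poststratify_weights(users, target)
# A weighted mean of any per-user outcome now reflects the target population.
```

Reweighting can only correct along dimensions that are actually observed; biases on unmeasured attributes survive it, which is why quantifying a platform's user base remains the first step.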
Proprietary algorithms for public data. Platform-specific sampling creates further problems. For example, the highest-volume source of public Twitter data, which is used by thousands of researchers worldwide, is not an accurate representation of the overall platform's data (11). Furthermore, researchers are left in the dark about when and how social media providers change the sampling and/or filtering of their data streams. So long as the algorithms and processes that govern these public data releases are largely dynamic, proprietary, and secret or undocumented, designing reliable and reproducible studies of human behavior that correctly account for the resulting biases will be difficult, if not impossible. Academic efforts to characterize aspects of the behavior of such proprietary systems can provide the details needed to begin reporting biases.

The rise of "embedded researchers" (researchers who have special relationships with providers that give them elevated access to platform-specific data, algorithms, and resources) is creating a divided social media research community. Such researchers, for example, can see a platform's inner workings and make accommodations, but may not be able to reveal their corrections or the data used to generate their findings.
Reducing biases and flaws in social media data

DATA COLLECTION
• 1. Quantifies platform-specific biases (platform design, user base, platform-specific behavior, platform storage policies)
• 2. Quantifies biases of available data (access constraints, platform-side filtering)
• 3. Quantifies proxy population biases/mismatches

METHODS
• 4. Applies filters/corrects for nonhuman accounts in data
• 5. Accounts for platform and proxy population biases:
  a. Corrects for platform-specific and proxy population biases, OR
  b. Tests robustness of findings
• 6. Accounts for platform-specific algorithms:
  a. Shows results for more than one platform, OR
  b. Shows results for time-separated data sets from the same platform
• 7. For new methods: compares results to existing methods on the same data
• 8. For new social phenomena or methods or classifiers: reports performance on two or more distinct data sets (one of which was not used during classifier development or design)

Issues in evaluating data from social media. Large-scale social media studies of human behavior should address the issues listed and discussed herein (further discussion in supplementary materials).
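Several of these checks are mechanical enough to script. As one illustration of check 6b, the sketch below runs the same estimate on two time-separated samples from one platform and reports the gap; the data, labels, and the estimate itself are assumptions for illustration only.

```python
# Sketch of check 6b: compare an estimate across two time-separated samples
# from the same platform. Data and the estimate are hypothetical.
def positive_rate(posts):
    """Fraction of posts labeled positive; stands in for any study estimate."""
    return sum(1 for p in posts if p["label"] == "positive") / len(posts)

def robustness_gap(sample_t1, sample_t2):
    """Absolute difference between the two periods' estimates."""
    return abs(positive_rate(sample_t1) - positive_rate(sample_t2))

# Hypothetical samples collected months apart.
january = [{"label": "positive"}] * 60 + [{"label": "negative"}] * 40
june = [{"label": "positive"}] * 52 + [{"label": "negative"}] * 48
gap = robustness_gap(january, june)
# A large gap suggests the finding tracks platform or population drift,
# not a stable behavioral signal; report it either way.
print(f"estimate gap across periods: {gap:.2f}")
```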

REPRESENTATION OF HUMAN BEHAVIOR. Human behavior and online platform design. Many social forces that drive the formation and dynamics of human behavior and relations have been intensively studied and are well known (12–14). For instance, homophily ("birds of a feather flock together"), transitivity ("the friend of a friend is a friend"), and propinquity ("those close by form a tie") are all known by designers of social media platforms and, to increase platform use and adoption, have been incorporated in their link suggestion algorithms. Thus, it may be necessary to untangle psychosocial from platform-driven behavior. Unfortunately, few studies attempt this.

Social platforms also implicitly target and capture human behavior according to behavioral norms that develop around, and as a result of, the specific platforms. For instance, the ways in which users view Twitter as a space for political discourse affect how representative political content will be. The challenge of accounting for platform-specific behavioral norms is compounded by their temporal nature: They change with shifts in population composition, the rise and fall of other platforms, and current events (e.g., revelations concerning the interest in and tracking of social media platforms by intelligence services). In the absence of new methodologies, we must rely on assessments of where such entanglements likely occur.

Distortion of human behavior. Developers of online social platforms are building tools to serve a specific, practical purpose—not necessarily to represent social behavior or provide good data for research. So, the way data are stored and served can destroy aspects of the human behavior of interest. For instance, Google stores and reports final searches submitted, after auto-completion is done, as opposed to the text actually typed by the user (5); Twitter dismantles retweet chains by connecting every retweet back to the original source (rather than the post that triggered that retweet). There are valid, practical reasons for platforms to make such design decisions, but in many cases these either obscure or lose important aspects of the underlying human behavior. Quantifying and, if possible, correcting for these storage and access policies should be part of the data set reporting and curation process.

Nonhumans in large-scale studies. Despite attempts by platform designers to police accounts, there are large populations of spammers and bots masquerading as "normal" humans on all major online social platforms. Moreover, many prominent individuals maintain social media accounts that are professionally managed to create a constructed image or even behave so as to strategically influence other users. It is hard to remove or correct for such distortions.
ISSUES WITH METHODS. Proxy population mismatch. Every social media research question defines a population of interest: e.g., voting preference among California university students. However, because human populations rarely self-label, proxy populations of users are commonly studied instead, for example, the set of all Facebook users who report attending a UC school. Yet the quantitative relation between the proxy and the original population is typically unknown—a source of potentially serious bias. A recent study revealed that this proxy effect has caused substantially incorrect estimates of political orientation on Twitter (6).

Incomparability of methods and data. With few exceptions, the terms of usage for social media platforms forbid the retention or sharing of data sets collected from their sites. As a result, canonical data sets for the evaluation and comparison of computational and statistical methods—common in many other fields—largely do not exist. Furthermore, few researchers publish code implementing their methods. The result is a culture in which new methods are introduced (and often touted as being "better") without having been directly compared to existing methods on a single data set. Given platforms' understandable sensitivity to user privacy and the competitive value of their data, the research community will likely resolve these comparability issues more quickly by focusing on enforcing the sharing of methods at publication time.

Multiple comparison problems. The body of social media analysis that concerns the development of user/content classification and prediction has unaddressed issues with overfitting. Specifically, when building a computational machine that recognizes two or more classes (of users, for example), it is customary to introduce tens to hundreds of features as the basis for the classifier. At the very least, the reported performance of the classifier should take into account the number of features being used. Of greater concern, however, is the extent to which the classifier's performance is a result of "feature hunting"—testing feature after feature until one is found that delivers significant performance on the specific data set. Standard practices of reporting the P value for classifiers based on the number of features involved, as well as keeping a data set independent of the training set for final classifier evaluation, would work toward addressing these issues (15).
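Both safeguards can be made concrete. The sketch below illustrates the multiple-comparisons side with a Bonferroni correction, the bluntest standard adjustment: when many features are screened, the significance threshold is divided by the number of tests. The P values here are hypothetical.

```python
# Sketch: guard against "feature hunting" by (a) holding out a final test set
# and (b) Bonferroni-adjusting the significance threshold for the number of
# features screened. P values below are hypothetical.
def bonferroni_significant(p_values, alpha=0.05):
    """Return indices of features that survive correction for multiple tests."""
    threshold = alpha / len(p_values)  # stricter bar as more features are tried
    return [i for i, p in enumerate(p_values) if p < threshold]

# Hypothetical screening of 100 features; one looks "significant" at p = 0.04,
# which is roughly what chance alone would produce at alpha = 0.05.
p_values = [0.04] + [0.5] * 99
survivors = bonferroni_significant(p_values)
print(f"features surviving correction: {survivors}")  # [] -> likely noise

# Final performance should then be reported only on a held-out test set that
# played no role in feature selection or training.
```

Less conservative procedures (e.g., false discovery rate control) exist, but any correction beats reporting the best of a hundred uncorrected tests; the held-out test set then guards the final performance claim.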
Multiple hypothesis testing. In an academic culture that celebrates only positive findings, a meta-issue emerges as multiple groups report successes in modeling or predicting a specific social phenomenon. Without seeing the failed studies, we cannot assess the extent to which successful findings are the result of random chance. This issue has been observed when predicting political election outcomes with Twitter (16). We are not the only field struggling with this issue (17). Solutions to this problem could involve enabling the publication of negative results or requiring the use of more data sets in a single study (so as to permit the calculation of a significance score within the study itself).
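That within-study significance score is straightforward to compute. If a method "succeeds" on k of n independent data sets, a binomial test gives the probability of doing at least that well by chance; the numbers below are hypothetical.

```python
# Sketch: significance of k successes on n independent data sets when each
# success would occur by chance with probability p0. Numbers are hypothetical.
from math import comb

def binomial_p_value(k, n, p0):
    """P(at least k successes out of n) under the chance model."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

# Hypothetical: a predictor "wins" on 5 of 6 elections, where a coin flip
# would win each with probability 0.5.
p = binomial_p_value(k=5, n=6, p0=0.5)
print(f"chance of doing this well by luck: {p:.3f}")  # ~0.109 -> unconvincing
```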
CONCLUSIONS. The biases and issues highlighted above will not affect all research in the same way. Well-reasoned judgment on the part of authors, reviewers, and editors is warranted here. Many of the issues discussed have well-known solutions contributed by other fields such as epidemiology, statistics, and machine learning. In some cases, the solutions are difficult to fit with practical realities (e.g., as in the case of proper significance testing), whereas in other cases the community simply has not broadly adopted best practices (e.g., independent data sets for testing machine learning techniques) or the existing solutions may be subject to biases of their own. Regardless, a crucial step is to resolve the disconnect that exists between this research community and other (often related) fields with methods and practices for managing analytical bias.

Moreover, although the issues highlighted above all have different origins and specific solutions, they share in common the need for increased awareness of what is actually being analyzed when working with social media data. ■

REFERENCES AND NOTES
1. This was not the first or last such erroneous prediction, e.g., the Literary Digest on the 1936 U.S. presidential election.
2. F. Mosteller, H. Hyman, P. J. McCarthy, E. S. Marks, D. B. Truman, The Pre-Election Polls of 1948 (Bulletin 60, Social Science Research Council, New York, 1949).
3. I. Crespi, Public Opinion, Polls, and Democracy (Westview Press, Boulder, CO, 1989).
4. Z. Tufekci, in ICWSM '14: Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media (AAAI, Palo Alto, CA, 2014).
5. D. Lazer, R. Kennedy, G. King, A. Vespignani, Science 343, 1203 (2014).
6. R. Cohen, D. Ruths, in ICWSM '13: Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media (AAAI, Palo Alto, CA, 2013), pp. 91–99.
7. V. Mayer-Schoenberger, K. Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think (Houghton Mifflin Harcourt, New York, 2013).
8. A. Mislove, S. Lehmann, Y.-Y. Ahn, J.-P. Onnela, J. N. Rosenquist, in ICWSM '11: Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (AAAI, Palo Alto, CA, 2011), pp. 554–557.
9. M. Duggan, J. Brenner, The demographics of social media users; www.pewinternet.org/2013/02/14/the-demographics-of-social-media-users-2012/.
10. 13 'pinteresting' facts about Pinterest users; www.pinterest.com/pin/234257618087475827/.
11. F. Morstatter, J. Pfeffer, H. Liu, in Proceedings of the Web Science Track at the 23rd Conference on the WWW (Association for Computing Machinery, New York, 2014), pp. 555–556.
12. M. McPherson et al., Annu. Rev. Sociol. 27, 415 (2001).
13. F. Heider, J. Psychol. 21, 107 (1946).
14. L. Festinger, S. Schachter, K. Back, in Social Pressure in Informal Groups, L. Festinger, S. Schachter, K. Back, Eds. (MIT Press, Cambridge, MA, 1950), chap. 4.
15. S. J. Russell, P. Norvig, Artificial Intelligence: A Modern Approach (Pearson Education, Upper Saddle River, NJ, 2003).
16. H. Schoen et al., Internet Res. 23, 528 (2013).
17. J. P. A. Ioannidis, PLOS Med. 2, e124 (2005).

SUPPLEMENTARY MATERIALS
www.sciencemag.org/content/346/6213/1063/suppl/DC1

10.1126/science.1257756
