The SAGE Handbook of Qualitative Data Collection

Troubling the Concept of Data in Qualitative Digital Research

Author: Annette N. Markham


Book Title: The SAGE Handbook of Qualitative Data Collection
Chapter Title: "Troubling the Concept of Data in Qualitative Digital Research"
Pub. Date: 2018
Access Date: March 23, 2021
Publishing Company: SAGE Publications Ltd
City: 55 City Road
Print ISBN: 9781473952133
Online ISBN: 9781526416070
DOI: http://dx.doi.org/10.4135/9781526416070.n33
Print pages: 511-523
© 2018 SAGE Publications Ltd All Rights Reserved.
This PDF has been generated from SAGE Knowledge. Please note that the pagination of the online
version will vary from the pagination of the print book.
SAGE Reference
© Uwe Flick, 2018

Troubling the Concept of Data in Qualitative Digital Research

Annette N. Markham

Introduction

In late 2016, the unanticipated results of the US presidential election sent a shock wave through the world,
including the digital data science community. All the election polls predicted that Hillary Clinton would win.
All of them turned out to be wrong. Even the Republican candidate did not expect to win, based on the data
analytics. It seemed a failure of data.1

This chapter focuses on the concept of data to clarify how it operates on our research sensibilities. By
deconstructing the concept, we can better situate it, consider whether we should use the term at all,
or be clearer in our definitions of what we mean when we explain our research to others. Drawing on
current critical academic responses2 to the rise of data and big data, I posit that data operates on at least two
levels: as thing and as ideology. Though inextricable in practice, we can separate these concepts momentarily
to begin to identify how quite different meanings might be operating in our theoretical frameworks, research
design, and everyday activities. Once these dual levels are recognized – a process that requires conscious
and critical self-reflexivity – one can more strategically frame and use the interpretation of data in multiple and
nuanced ways, to add layers of meaning or augment the analytical processes.

The term ‘data’ refers to many things (see Flick, Chapter 1, this volume, and Lindgren, Chapter 28,
this volume). For example, we could think of data as the representation of traces of human and non-
human behaviors and experiences, isolated and observed as discrete objects. While not the only way to
describe data, this conceptualization has become prominent in the so-called digital age, information age,
or internet age for good reason. Our social situations are increasingly embedded in, or saturated with,
digital and global networks of information flows. We leave traces everywhere when we connect to the
internet. Massive amounts of information can be collected. Any of us who use the internet know that we are
continually producing data that will be archived – by us, by marketers, by the companies who provide our
devices, platforms, apps, and so forth. The information itself is microscopic and detailed. Whether produced
deliberately or not, it is possible to archive these traces, transforming them into units of information that can
then be combined with data that has been produced, archived, and transformed elsewhere.

Computation of large datasets can reveal interesting patterns and yield novel insights about human behavior.
Perhaps because data is so plentiful, minuscule, and detailed, we – and here I mean data scientists as well as
politicians and citizens – can sometimes forget that it is not meaningful in itself. This mistake sometimes takes
the form of assuming the parts add up to the whole. Or conflating data with knowledge. Whatever the specific
form of faulty reasoning, overvaluing the immediate meaning and truth value of data is a problem amplified
by the size and number of datasets involved in even the most basic of algorithmic calculations. We have
reached an era when we no longer have the human computational power to calculate the math necessary
to analyze massive datasets.3 Algorithms take the place of human cognition and we must trust various self-
learning mathematical models (or neural networks) to make the computations for us.

‘Data’ is not a bad term in itself, but because its value in this decade of big data is overstated, many faulty
logics and premises about data, truth, and algorithmic computation can end up influencing how we make
sense of the world around us. Returning to November 9, 2016, the night of the US elections, shock about
this unforeseen turn of predicted outcomes shifted to anger; social media exploded into heated arguments
about who or what was to blame for such miscalculations in the expected results. Was it Hillary Clinton's team
that failed to analyze the data correctly? Were journalists biased against and therefore blind to the poor or
uneducated whites who came out in droves to vote for Donald Trump? Did the polling companies collect data
inadequately? Were the algorithms and formulas underlying the polls or forecasts wrong? One might ask: why
was everyone so shocked? As boyd and Crawford (2012) remind us in their critique of exaggerations about
the power of big data, data never speak for themselves, but are interpreted. But when we are continually
confronted by ‘facts’ that are beyond our human cognitive ability to double check, common sense can fail. We
simply believe what is seen, or more precisely, what the numbers tell us. At least in the immediate aftermath
of the 2016 elections, frustration continued to grow.

As many realized and have discussed since, finding the best tools or metrics for collecting and analyzing data
is not the answer. It is the failure of interpretation that always catches us in the end. This became even
more apparent as we faced a third shock wave in the weeks after the election – that fake news had
been widely planted, believed, and spread through social networking sites like Facebook. Alongside ‘post-
truth’ as OED's word of the year,4 the public was continually reminded that any supposed fact or truth could
be believed, despite the blatant absence of any evidence.

Let me underline the key point in relation to this chapter: Interpretation, not data, is where we should be
focusing our attention. This should be a comfortable statement for those familiar with epistemologies and
approaches that are labeled qualitative, for whom the term ‘data’ has been problematized for decades. ‘Data’
and ‘Computation’ are not favored terms to describe qualitative methods because they symbolically indicate
an approach fundamentally opposed to the hallmark of qualitative inquiry, which is inductive, immersive, and
interpretive. How should qualitative researchers respond to this new tidal shift toward datafication? How do
we design studies when ‘data’ becomes the predominant concept for giving shape or meaning to cultural
materiality? We could simply refuse to use the term, since it does not fit well with the qualitative enterprise.
Or we could try to replace ‘data’ with other terms. Neither option confronts the more insidious problem, which
is not data itself, or the growth of computation as a way of knowing, but the revival and creeping spread
of positivist procedures and frameworks. Retaining the strength of the qualitative approach requires critical
awareness of what guides our practical choices and everyday practices.

The importance of disturbing concepts

Concepts are multiplicitous and therefore ambiguous. They shape and target our sensibilities and thus
function as powerful guides for action. But because concepts are composed of multiple meanings, they shift in
meaning and emphasis over time and use. They also allow for specification and transformation within context.
Both the concept of ‘data’ and the concept of ‘digital’ are shifty creatures in this decade, since they are in high
and varied use across multiple stages of discourse. It is important to continually re-examine our concepts, to
keep them from stabilizing, at which point they take on more power than they deserve to define and delimit
what and how we make sense of the world.

Throughout what follows, I propose we continually disturb the concept of data, testing it against other viable
terminologies that might frame and inform our inquiry practices. In such a way, we can better articulate what
we mean when we use the term to describe what we're up to, whether this is related to ‘the digital’ or not.
We'll also have better research results in the end, since this reflexive exercise can do nothing but improve our
research designs.

Another concept, ‘digital', is central to this chapter. As a concept, this term functions on multiple levels. Most
directly, whatever we call ‘the digital’ is born from the transformation of tangible, visual, or audible analog
material into a binary system of zeros and ones, which become ‘on’ or ‘off’ electrical impulses in a computer
system. Networked, the digital travels through the Internet. Machine language systems help us transform
these bits back into meaningful information. This definition functions at a literal level to help us comprehend
what is meant by the term. It is at other levels where we find meaning for what we might label digital identity,
digital media, digital culture, or digital data.
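The literal definition above – analog material transformed into zeros and ones and back into meaningful information – can be illustrated with a toy Python sketch of my own (not from the chapter): a word reduced to the bits a computer actually stores, then reconstructed.

```python
# Digitization in miniature: text becomes a binary representation, and
# machine conventions (here, UTF-8) transform the bits back into meaning.
text = "data"

# Each character becomes one or more bytes; each byte is eight bits.
bits = " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))
print(bits)  # 01100100 01100001 01110100 01100001

# The reverse transformation: bits are meaningless until a shared
# encoding scheme turns them back into information.
decoded = bytes(int(b, 2) for b in bits.split()).decode("utf-8")
print(decoded)  # data
```

The point the sketch makes concrete is that the bits alone carry no meaning; the encoding convention on both ends is what makes them information.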

Digital never stands alone as a topic of inquiry. As a modifier for some other concept, digital operates
metonymically to stand in for countless modes of interaction, types of information, platforms for interaction,
and cultural formations. For the past two decades, we have been living on a planet where many social if not
human processes are digitally saturated, Internet-mediated, and globally networked. It is high time to move
beyond ‘digital’ as the default modifier, grammatically and conceptually speaking, especially because to be
useful at all, the term must be defined each time it is used, to identify the specificity of meaning within context
(for more on this injunction, see also Bakardjieva, 2012; Deuze, 2012; Hine, 2015; Horst and Miller, 2012;
Markham, 2017; Turkle, 2011).

Below, I offer a framework for digital researchers to consider how data, first as things, can be conceptually
specified as: a) background information, b) emergent materiality, c) fragment of artifact, and d) evidence. I
then shift to the second frame: a discussion of data as an ideology, initially discussing the prevailing data
science ideologies and later offering an alternative that is more aligned with interpretive ontologies. I conclude
the chapter by returning to the basics of strong qualitative approaches as a reminder of the importance of –
in this era of data-everything – troubling and reflexively tweaking research design concepts so that they work
for us rather than the other way around.

Data as thing5

It's easy to conceptualize data as a thing since this is how it is presented to us in everyday advertisements,
statistical graphs, and poll percentages. Long before digital archives, geolocation tracking apps, self-tracking
devices, and SurveyMonkey, humans transformed actions, behaviors, and objects into units of information
that can be examined closely. In ancient times, we charted the position of stars, giving common shapes to
the heavens and providing maps for humans to navigate without land-based reference points. Transforming
motions into discrete units in the early 1900s, Frank and Lillian Gilbreth, along with Frederick Taylor, founded
a still current practice of using time and motion studies to standardize movements of workers on factory
assembly lines or behind the counter at any fast food restaurant, like McDonald's. Transforming beliefs into
numeric values is standard practice in psychology and sociology, where we might measure everything from
personality to voting preference.

In addition to dissecting and separating a whole into component parts, datafication is also the result of
abstraction, whereby the complex is rendered sensible by means of simplification and categorization. The
manner of abstraction depends on one's goal, of course, but in any case, the outcome of any abstraction will
be largely if not entirely based on what is focused on and therefore what is seen.6

Whatever we call data, therefore, is the material result of a series of choices made at critical junctures. An
important quality of a qualitative approach is to pay attention to and honor the complexity of lived experience.
Generally, this means researchers transform practices, experience, conversations, and movements into
standardized units with great care, only after long consideration about whether or not this ‘theme’ or ‘code’
adequately captures the essence or meaning. This is not about condensing meaning, but trying to grasp what
is essentially complex. The idea of ‘data’ sits uncomfortably within this effort. As experience is categorized to
make sense of a situation or phenomenon, or more colloquially what Marcus (1998) would call ‘what is going
on here', the process is, through and through, one of interpretation. The key to good qualitative inquiry is
honing this interpretive strength. Thus, data can become a double-bind term in qualitative research because it
focuses our attention on exactly the wrong part of the process, yet at the same time gives a common ground
vocabulary to help others identify the focus of our gaze.

The basic starting point for working through and around this double bind is to more precisely define what
we mean when we're using the term data. This will help us find distinctions and clarify what it is not. There
are many useful textbook(ish) guides or typologies (see, for example, the typology described by Lindgren,
Chapter 28, this volume) to help us conceptualize data in a broad sense. Here, I specify four ways in which
we can think about data as a thing:

1. First, data as background information: what we use to search for good research questions. It is the stuff7 that helps us engage in a process called pattern recognition.
2. Second, data as emergent: the stuff, material or otherwise, that emerges as you focus – on a phenomenon, on a situation, on a text, on a stream of tweets.
3. Third, data as a fragment of an artifact: whether we say it's emergent, found, or made, data is a partial, non-representative signal or indicator of something else.
4. Fourth, data as the word we use for evidence: the stuff we use to support a claim, focus readers' attention on something, or provide details in larger arguments.

1. Data as Background Information

Data helps us search for patterns. In this sense, we can use pre-existing data sets, or create our own, to wander around,
searching for interesting occurrences or trends. In a playful way, one can experiment with different analytical
tools to see how these occurrences, interactions, or trends might be visualized. In doing so, we might be
hoping a research question will emerge. We might be looking for verification of a hunch, to locate a specific
target for inquiry. Many tools allow us to access and mess around in massive data sets. Data scraping8 is a
common technique for gathering large amounts of information. Scraping can be used to search for a bigger picture or to
find multiple standpoints. Here, we might be seeking a broad view to counter our own seeing, which cannot
help but be situated in what is one's own ‘filter bubble’ or ‘echo chamber'. Scraping can be interesting and
might identify some patterns, but we must calculate the payoff of such a cumbersome and multistage process.
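To make the technique concrete, here is a minimal, hypothetical scraping sketch (my own illustration, not tied to any tool the chapter names): it extracts one kind of trace – post titles – from raw HTML, using only Python's standard library. The markup and class names are invented; real scraping would fetch live pages (e.g. via `urllib`) and must respect a site's terms of service and robots.txt.

```python
from html.parser import HTMLParser

# Invented sample markup standing in for a fetched page.
SAMPLE_HTML = """
<ul>
  <li class="post">Troubling data</li>
  <li class="post">On interpretation</li>
  <li class="ad">Sponsored content</li>
</ul>
"""

class PostScraper(HTMLParser):
    """Collects the text of <li class="post"> elements, ignoring the rest."""

    def __init__(self):
        super().__init__()
        self.in_post = False
        self.posts = []

    def handle_starttag(self, tag, attrs):
        # Only <li class="post"> counts as the kind of trace we chose to see.
        self.in_post = tag == "li" and dict(attrs).get("class") == "post"

    def handle_data(self, data):
        if self.in_post and data.strip():
            self.posts.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_post = False

scraper = PostScraper()
scraper.feed(SAMPLE_HTML)
print(scraper.posts)  # ['Troubling data', 'On interpretation']
```

Note how the scraper enacts the chapter's larger point: the `class="post"` filter decides in advance which traces become data at all, and everything else – here, the sponsored item – simply never appears in the dataset.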

Patterns are not the end point, but the beginning point. After the identification of patterns, the qualitative
researcher begins to test whether the patterns articulate or point to valuable questions to ask. Here, then,
data is used as background information to inform one's research inquiry. It informs the direction and type of
interpretation that will follow.

Importantly, it is not necessary to use computation to identify patterns. Indeed, the hallmark of qualitative
inquiry is that it is driven not by data but by our questions. These questions emerge because we've already
noticed patterns in our own prior contact with, connection to, or immersion in the phenomenon. One's study
of any digital media context is likely not coming out of the blue, but as a result of the informal inquiry we were
engaged in before we called it research. The data for such background knowledge may therefore be different
from any archived dataset one might access.

2. Data as Emergent

In qualitative inquiry, anything we consider to be an object for analysis is generated by our choice to focus on
certain particularities versus others. When we choose an angle for our attention, whether we are an algorithm
or a person, certain data will become more likely or possible than other options. So, data is not pre-formed,
but is made, apparating9 when we choose to focus on it.

Qualitative inquiry is an emergent process; informal interest or immersion in a particular context drives us
to ask scholarly questions. At that point, we begin to define that which we study in more concrete terms
and make official those things we've likely been doing a long time. We start to write down or track our
observations, record our conversations with people, and archive images, texts, or other materials. Some
aspects of the situation fed our interest long before we started collecting anything we might label as data.
When we identify these activities as part of our research process, we can shift our notion from data as an all-
encompassing or a priori object into a more nuanced notion that data emerges as ‘object’ or ‘data’ because
we have decided to focus on a particular question, which highlights certain material and obscures the rest.

This conceptualization of data as an emergent rather than a priori aspect of inquiry foregrounds the
importance of the questions we cannot help but ask when we focus our gaze in specific directions. Focusing
on the questions allows the possibilities for what counts as data to blossom. We are no longer trapped by
thinking that we only collect what is visible or archivable, or that we must collect all of whatever it is.

3. Digital Data as Fragments or Artifacts

Often, we collect data about people, places, or things when we don't have access to the original context.
Alternatively, data are10 continually produced or derived from lingering traces of presence or actions that
are caught and stored automatically or for unrelated reasons (e.g. time and date stamps in the metatag
information on photos, or location data that might be collected coincidentally by a car hire company and end
up passing through many database aggregators to then end up in the hands of social media marketers).
When we reflect on what data represent (browsing through this book as well as other ongoing critiques of
data, big data, and datafication), it should be clear that data are always a partial representation of some
specific aspect of a thing, not the entirety. As Baym (2013) and others have argued, even the most robust
digital data metric yields a partial and unrepresentative sample. As she argues, this is not a technical glitch
to be solved, but a characteristic of human interaction, which naturally occurs in ways that are not traceable
because these interactions are not signals given or given off, in the Goffmanian sense, but meanings
emerging from and happening throughout interactions. Her argument is one of many that problematizes the
mistaken assumptions that everything that is digital qualifies as data and everything that happens through
digital media is capturable as data.

Data are like pot shards to an archeologist, fragments of information that can be used, along with other
sensibilities, to make sense of the situation. Put together, these shards might construct the likeness of the
thing, but they can never be the thing itself, as Latour (1999) noted about soil in the Amazon. Because
meaning occurs outside that which can be collected as data, these fragments might convey meaning to us,
but only after they are pieced together, situated after the fact into a context that fundamentally no longer
exists. As Geertz (1973) eloquently noted about our efforts to interpret culture, ‘what we call our data are really
our own constructions of other people's constructions of what they and their compatriots are up to’ (1973, p.
10).

In a practical sense, as many authors in this Handbook discuss, collecting data is a deliberate sampling
method that requires careful analysis about what is actually represented. If we think of data as clues,
presented in fragments and moments, it can help us determine what, among this and other stuff, is needed
to fully address one's research question. Extending my repeated theme of the chapter, the key to good digital
data collection is not bound to either the data or the methods, but rather to the good fit between what is
available and what questions one can feasibly and sensibly address.

4. Data as Evidence

This last point highlights how we use data. Starting from the outset of a project, research design emerges
from early immersion in a context, where one's questions (research questions or just everyday questions)
prompt a direction for the analytical gaze. As these questions turn our heads, data emerges that helps us
address these questions. The entire process may appear on the surface to be data-driven, because there
is a tight rhythm between stimuli and sensemaking. Put more simply, we are in a dynamic relationship with
the contexts we study. As we enter, move within, and move through this relationship, it might initially seem
strange, but over time, we start to make sense. This iterative and reflexive dynamic – as a long string of
scholars have articulated in better ways than I ever could – narrows our attention. Data might be used (or
become) our focus (object) of analysis but in a larger sense, the process of analysis and interpretation yields
an account, whereby we incorporate data as supporting evidence.

To finish this section, it's important to reiterate that the idea of data is highly ambiguous. While useful, these
four definitions are not mutually exclusive or all inclusive. Many such typologies exist; this one focuses
specifically on how research is designed if data is conceptualized as a thing. It should be considered a starting
point for specifying the many meanings that data inherits throughout different stages of the research project.

Data as ideology

The shift toward ‘big data’ in the first decade of the twenty-first century may have been a pragmatic adoption
of a term to describe massive amounts of information and the exponential growth of computational power,
but it has strengthened a steadily encroaching ideological shift that includes the growing operationalization of
human experience into discrete data points, numeric units, or pre-determined categories of meaning, which
can, because of their concrete qualities, be captured, recombined with data points from other humans, and
analyzed to find large cultural patterns. This influences all forms of scientific inquiry, including qualitative
inquiry.

Over the past 300 years, the concept of data shifted from something that could be taken for granted in
argument, to that which is pre-semantic, pre-factual, and pre-analytical (Rosenberg, 2013). As a matter of
rhetoric, Rosenberg continues, data has become understood as distinguishable from ‘fact’ in that it (data) is
irreducible: ‘When a fact is proven false, it ceases to be a fact. False data is data nonetheless’ (2013, p. 18).

The rhetorical power of such logic is potent. When events, people, behaviors, interactions, or other dynamic
human processes are framed as data that can be collected, these complexities necessarily end up gaining
objective, or even obdurate shapes and qualities, because the baseline conceptualization of data specifies
that data can be held, measured, and aggregated with other things.

It is important to recognize and actively counter this ideology since it is both alluring and deeply flawed. In
qualitative research, it can trap us into believing that we have access to complete records of what happened,
or can capture every element of the social situation, when even in the most surveilled and archived situations,
this is not possible. Baym (2013) reinforces this point in her analysis of the limitations of how social media
is measured through various metrics, noting, ‘it has never been more essential to remind ourselves what
data are not seen, and what cannot be measured'. Social media metrics, she reminds us, are not only non-
representative samples, they are also skewed by algorithms that foreground some content over other content.
Most of all, she continues, ‘their meanings – seemingly so obvious – are inherently ambiguous'. Ultimately,
she concludes, ‘it is not clear what visible social media metrics might mean or, more accurately, what range
of meanings they collapse'.

Despite the good sense we might apply in situ during our qualitative studies, it is difficult to resist the ideology
behind big data. As boyd and Crawford (2012) aptly note, ‘Big Data not only refers to very large data sets and
the tools and procedures used to manipulate and analyze them, but also to a computational turn in thought
and research' (2012, p. 3, drawing on Burkholder, 1992). This computational turn, they continue, functions as
orthodoxy, radically shifting how we think about research (2012, pp. 3–4). The undercurrent of the past five
years of big data discourse is one ‘where all other forms of analysis can be sidelined by production lines of
numbers, privileged as having a direct line to raw knowledge’ (2012, p. 4).

Even though we might know better when we give it careful consideration, this ‘direct line’ boyd and Crawford
mention seems to work in our everyday experiences with our platforms, apps, and devices, where big data
computation is almost magical in its eerie accuracy. Take personalized advertising, which now occurs almost
instantaneously online. Predictive modeling has become a sophisticated tool for marketing, whereby snippets
of code, algorithms in this case, are used to filter and process massive data sets from multiple sources. Just
a few clicks are needed to activate thousands of processes that occur over distributed computer networks.
There is a fuzzy relationship between what you click on and what you desire, but the result of this algorithmic
processing can appear seamless. It is only recognized as magic when the algorithms fail and we see ads that
are totally off target. Otherwise, the system just works. I see bargains and sales on items I pause on in my
Instagram feed. News is targeted and fine-tuned depending on friend networks, organizational affiliation, and
geolocation. This same sort of computation can work on billions of data points across the genome to locate
the most precise genetic markers and, through DNA analysis services such as 23andMe, find genealogical
lines for us that we never knew existed.

Repeated exposure to such computational finesse can function powerfully to reinforce the premise that data
itself gives us insight, that data simply speaks for itself. However, when we thingify everything, we set and
spring multiple traps. We set a flawed premise in place when we separate, flatten, and equalize everything
as a unit of information. More subtly, we can be snared by our own common premise that ‘data analysis’ is
the underlying framework for our studies. Over time, as Rosenberg's careful rhetorical analysis of the term
highlights, prioritizing data as the central element of a study can mislead readers of our research to believe
that data is not simply one of many sources for analytical inspiration, but the entirety of what the researcher
needs. This is not to say we should dismiss data as being unimportant in our research projects. Rather, the
goal may be to shift the ideology back to the core of qualitative epistemologies, to refocus our attention.

Let me offer some provocation: For the qualitative researcher, data is a red herring of the brightest color. It
directs our attention to objects, pieces of texts, and the outcomes of interactions rather than the interactions
themselves, all the while distracting us from the point that this is not where meaning resides. Data, as an
ideological concept, beguiles us to think that it is the reality, when in fact it is only a fragment or artifact, not
even of the whole, but only of whatever we chose to focus on as we made decisions that led us to where we
are situated. Focusing on data can easily distract us from the much larger and complex process and practice
of qualitative inquiry. If we're going to use the term ‘data', we must co-opt it through an interpretive ontology.


In 2004, I articulated questions that, for me, were crucial for the ethical and nuanced study of people
embedded in digital contexts:

As researchers and members of various communities and cultures, what do we use to construct a sense of
who the Other really is?

In what ways do our methods of comprehending [social life] either disavow or validate multiplicitous,
polyvocal, ever-shifting constructions of identity?

To what extent do we acknowledge our own participation in the construction of the subject of inquiry?
(Markham, 2004, p. 372)

These are still vital questions, in that they cut to the heart of what many of us seek when we study people.
These questions frame an ethically sensitive approach for digital research design, returning specifically to
the relationship between the researcher and that which is researched and what happens when we extract
certain elements of experience from their moment and transform them into something more abstract than they
were. Transformation and abstraction are necessary to sensemaking. These two activities are the outcomes of
decisions we make at critical junctures throughout the process of inquiry, each of which impacts what is later
understood to be known.

In this sense, we only ever make data. When we pick up various tools to collect, manage, sort, and
analyze any phenomenon, we are choosing specific lenses through which we can better comprehend the
subject. These tools, whether in the form of philosophical prose, theoretical premises, conceptual models, or
methodological techniques, will help us focus our attention. Combined, they enable us to carve out meaning
from the many overlapping contexts of the situation. Our unique set of tools also hides, obscures, minimizes,
sets aside, and otherwise filters out other possibilities. Our senses are rarely, if ever, unhooked from these
filters, many of which are so taken for granted as natural ways of knowing that we don't notice how with them
we encapsulate, control, and otherwise trap the Other into boxes that fit them perfectly because their potential
has been shaped to make the perfect fit.

Using Interpretive Ontologies as an Active Disturbance to Data Science

This century marks the age of distraction. Amazon's ‘Alexa’, my latest gadget, is an electronic sensor in my
house that also conveniently functions as a speaker for my music. This IoT (Internet of Things) device listens
to me while I move around the house. I talk with it and, as I do, it learns my habits. We can connect this
sensor to our lightbulbs, thermostats, and health care facilities. The value of this device is convenience. The
amazement, contentment, and occasional irritation I feel distract me from the facts that I must pay for
many additional services to get it to work seamlessly, that it continually collects and shares data about me with
multiple third parties, and that it is changing how I make queries in my head.11

As qualitative researchers in this century, we need an ideology that provokes, raises questions, and keeps us
out of our comfort zone. I find an occasional shake-up necessary for myself, because the world around us is so
deceptively ‘figured out’. In this decade, we rarely question why the interface of Facebook looks as it
does. News is fed (through the feed). All friends are the same (unless we choose otherwise). In this decade,
we don't remain disturbed long enough to shift our methods to fit a critical mindset in any sustained
way. Consider how quickly we forget that the US, as Snowden revealed in 2013, collected massive data on
its citizens without justification; or that various social media platforms like OkCupid or Facebook experiment
with the emotional responses of citizens without letting us know we're part of a laboratory study in the guise
of A/B testing.12 These are only two of many critical turning points with actual and serious consequences for
the shape of future societies.

We need to remain disturbed by the rising tide of datafication. Data analytics give us powerful tools to see,
combine, compute, and understand phenomena like never before. This can have a remarkable and positive
impact, so the data era is, in many ways, a wonderful and astonishing time to be a researcher. At the same
time, these results coincide with a strong trend across all industries toward transforming ‘humans (and their
data) into data’ (Grinter, 2013, p. 10). Qualitative researchers are uniquely equipped to resist deeply flawed
but popular ideas that we should collect and archive everything because we have the technical ability to do
so; that data is neutral and objective; that data can be stripped and cleaned adequately to protect individuals;
or that given the right technologies, data collection can be complete.

An ideology of data for qualitative researchers necessarily includes competing elements, or even dialectics.
On the one hand, if we maintain focus on the question, our position, and what is going on around us, as we
study the phenomenon, we need not really worry about data at all. We're finding it all the time, generating it
as we sense and pay attention to patterns in our own unique, human ways. As we continue to focus attention
on the situation, data will continue to be created as we let certain stuff emerge from the larger flows of social
life. Likewise, other data will be lost to us, not because we once had it and lost it, but because it is not part
of what we are attending to. For some other researcher, a different set will emerge because of their particular
way of seeing and attending to the details. And in the end, we'll select certain data and ignore others when we
tell everyone else what we think we know. This is a natural part of storytelling as well as science.

On the other hand, qualitative researchers should not ignore data, datafication, or the research designs that
undergird data science. We need to comprehend the logics of computational research design in order to combat
it when necessary. For example, the ability to aggregate and cross-reference is used for everything from building
location-based apps that label certain neighborhoods as ‘sketchy’, as in the case of SketchFactor, an app that
eventually shut down amid accusations that it was racist (Marantz, 2015), to conducting predictive policing of
citizens (Brayne, 2015), a science and practice that gains more ground annually. In both cases, we should be
disturbed that data is used in these ways, because these sciences rely on algorithmic abstraction that is not
infallible; but to respond adequately, we must understand the nuances of how the second case results
from vastly different research efforts than the first.

This does not mean we should give ground regarding the fundamental principles or practices of qualitative
inquiry. As I have written elsewhere:

One type of response to the rise of data, datafication and big data has been to defend ethnographic research
within the discursive frame of data, insisting that ethnography is about ‘small data', ‘all ethnography is big
data', or ‘big data needs thick data'. These responses help justify ethnography but yield epistemological
ground, so that the entire baseline for appropriate ethnographic inquiry shifts to a new register. (Markham,
2017, p. 8)

When qualitative researchers use phrases like ‘data collection’, it should be clear that this terminology
is more a convenience for separating and managing materials, or a rhetorical tactic, than an ontological claim.
As a strategic choice, this terminology usefully situates one's research inside the sciences within research
institutions. It helps social studies to be accepted as social sciences.

Meanwhile, we cannot forget that the impact of totalizing the ‘stuff’ of culture as data that can be collected,
measured, and analyzed through scientific methods is devastating. The power of computation is undeniable.
By continually collecting and archiving information, as well as transforming historical paper documents into
digital form, we can use data science methods to see significant political, biological, climate, and social trends.
As datafication is normalized, those qualities of inquiry that are not quantifiable become abnormal. How can
we measure embodiment, emotion, and other complexities that function as material evidence as well as
an interpretive process (Davies and Spencer, 2010)? We may insist that qualitative inquiry is fundamentally
interpretive, but this is a difficult stance to maintain when the systems surrounding qualitative methods
actively de-legitimize interpretation as an individual, subjective, yet key part of the process.

The ongoing risk, which the interpretive movement has long sought to combat (see the many authors in the
edited collections of Denzin and Lincoln, e.g. 2017), is that qualitative inquiry is subsumed within a larger
paradigm of data science, whereby a qualitative perspective is seen to contribute merely a type of analysis,
rather than a worldview. This is exacerbated by the division of methods into the faulty categories of qualitative
and quantitative, which implies that the enterprise is essentially the same except that the method of handling
and analyzing data differs.

To pre-empt and resist this trend, we must continue to highlight and explore what qualitative modes of inquiry
are for and how these ways of knowing are performed. This means, at the very least, promoting interpretation
as the more central consideration than data; defining data more precisely and variously as it is actually used
in one's study; and finally, highlighting interpretive conceptual frameworks and ideologies.

Conclusion

Some of the authors in Part IV of this volume propose novel ways to deal with, or think about, data within
qualitative analysis of complex digital situations. They all argue for the same thing: qualitative research is
driven by context and question, not data. They all understand that collecting digital data is a misnomer. As
we make choices along our pathways to meaning, we continually sample from the past and present material
world, whether that is represented in atoms or bits. We speculate, using a what-if or other future-oriented
focus. We play around with ideas, move them around on paper or in concept maps. The authors represented
here each understand, sometimes through trial and error, that putting data at the forefront of qualitative inquiry
is the worst sort of trap because it deludes us into thinking we have already collected the knowledge when, in
actuality, what qualitative inquiry produces is a bricolage of multiple voices, actors, and perspectives filtered
through our own unique gaze and interpretive lenses.

Notes

1. It is difficult to pinpoint a single example from the tsunami of tweets and news stories about ‘data failure’,
but representative examples include Republican strategist Mike Murphy's early statement on MSNBC and Twitter
that ‘data has died’, Wired's later report, ‘Trump's Win Isn't the Death of Data – It Was Flawed All Along’
(Metz, 2016), and a November 10 story in the New York Times, ‘How Data Failed Us in Calling an Election’
(Lohr and Singer, 2016).

2. This ongoing and important discussion can be followed across the edited collection by Lisa Gitelman, 2013;
a special issue of First Monday in 2013; the special issue of Media, Culture, & Society in 2015 that responded
to the landmark article by boyd and Crawford in 2012; the blog series coming out of Data & Society, and the
‘data ethnographies’ series emerging from the Digital Ethnography Research Centre at RMIT, 2015.

3. For a short review of some key shifts in how we're thinking about data processing in the era of big data,
see Ouellette (2013).

4. https://en.oxforddictionaries.com/word-of-the-year/word-of-the-year-2016.

5. I use singular versus plural grammar deliberately here, to reflect my argument that it is deceptively easy
to treat data as if it has an incontrovertible ‘itness'. While some may argue that the plural is grammatically
correct, ‘it seems preferable in modern English to allow context to determine whether the term should be
treated as a plural or as a collective singular, since the connotations are different’ (Rosenberg, 2013, p. 19).

6. For an excellent popular history and discussion of data, datafication, and big data, see Mayer-Schönberger
and Cukier (2014).

7. Here, I really do mean ‘stuff’ rather than any other word. In this sense, I take inspiration from various
sources that the Stanford Encyclopedia of Philosophy calls ‘stuff ontology’ (Steen, 2016). If we temporarily
carve out information for inspection from the universe of matter/thought or, as Karen Barad (2007) might say,
‘material discursives', an appropriate term for this selection is ‘stuff'. It is not data, not object, not matter, and
not thing, but an amalgam or combination that, if not countable, might be measurable, and if not measurable,
at least analyzable.

8. The recent invention of ‘scraping’ allows one to gather information from a website, platform, account, or
database and transform it into something that other programs can read, or that is easier to work with because
it is standardized data. For purposes of this example, we can use the simple definition of web scrapers by
developer Macwright: ‘web scrapers do the task of a very industrious user – they navigate websites, parse
pages, and save information in bulk’ (2012, n.p.). Information is transformed from one state to another to
create a standardized dataset that can be read or used by other programs. This is often used for large-scale
qualitative analysis, aggregation and computation, or storage.
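To make the transformation described in this note concrete, the following sketch parses a small, hypothetical HTML fragment and flattens it into a standardized CSV dataset, as a web scraper would do at scale across many pages. It uses only Python's standard library; the page markup, field names, and `PostScraper` class are invented for illustration and do not correspond to any actual website or platform.

```python
import csv
import io
from html.parser import HTMLParser

# A hypothetical fragment of a web page listing user posts.
PAGE = """
<div class="post" data-user="anna"><p>First thoughts on data</p></div>
<div class="post" data-user="ben"><p>Replying to Anna</p></div>
"""

class PostScraper(HTMLParser):
    """Collects each post's author and text into uniform records."""
    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "post":
            self._current = {"user": attrs.get("data-user", ""), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data.strip()

    def handle_endtag(self, tag):
        if tag == "div" and self._current is not None:
            self.records.append(self._current)
            self._current = None

def scrape_to_csv(html_text):
    """Transform loosely structured markup into a standardized CSV dataset."""
    scraper = PostScraper()
    scraper.feed(html_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["user", "text"])
    writer.writeheader()
    writer.writerows(scraper.records)
    return out.getvalue()

print(scrape_to_csv(PAGE))
```

A real scraper would add HTTP fetching, pagination, and rate limiting on top of this parsing step; the point of the sketch is only that information held in one presentational form is re-rendered as uniform rows that other programs can compute over, which is precisely the transformation of state this note describes.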

9. I mean apparate versus appear, because it conveys the sense of magically appearing. The term originates
in the fictional series, Harry Potter. I suppose we could also use Schrödinger's cat as an illustrative example.

10. In this section I return to the plural form of data, since it makes more sense in this category.

11. This is not unique to Amazon's ‘Alexa'. To note, my statement here is part of a larger set of arguments
about how media influence us, starting with McLuhan's famous and still-important observations that the
‘medium is the message’ (1964).

12. To read more about these cases, see Greenwald (2013), Luca (2014), and McNeal (2014) respectively.

Further Reading

Gitelman, Lisa (ed.) (2013) Raw Data is an Oxymoron. Cambridge, MA: MIT Press.
Markham, Annette and Baym, Nancy (eds.) (2009) Internet Inquiry: Conversations about Method. London:
Sage.
Pink, Sarah, Ardevol, Elisenda, and Lanzeni, Débora (eds.) (2016) Digital Materialities: Design and
Anthropology. Huntingdon: Bloomsbury.

References

Bakardjieva, M. (2012) ‘Reconfiguring the mediapolis: New media and civic agency’, New Media & Society,
14(1): 63–79.
Barad, Karen (2007) Meeting the Universe Halfway: Quantum Physics and the Entanglement of Matter and
Meaning. Durham, NC: Duke University Press.
Baym, N. (2013) ‘Data not seen: The uses and shortcomings of social media metrics', First Monday, 18(10)
available at http://www.ojphi.org/ojs/index.php/fm/article/view/4873/3752.
boyd, d. and Crawford, K. (2012) ‘Critical questions for big data', Information, Communication & Society,
15(5): 662–79.
Brayne, S. (2015) Stratified Surveillance: Policing in the Age of Big Data. Unpublished dissertation.
Davies, James and Spencer, Dimitrina (eds.) (2010) Emotions in the Field: The Psychology and Anthropology
of Fieldwork Experience. Stanford, CA: Stanford University Press.
Denzin, Norman K. and Lincoln, Yvonna S. (eds.) (2018) The Sage Handbook of Qualitative Research, 5th ed.
Thousand Oaks, CA: Sage.
Deuze, Mark (2012) Media Life. Cambridge and Malden, MA: Polity Press.
Geertz, Clifford (1973) The Interpretation of Cultures: Selected Essays. New York: Basic Books.
Gitelman, Lisa (ed.) (2013) Raw Data is an Oxymoron. Cambridge, MA: MIT Press.
Greenwald, G. (2013, June 6) ‘NSA collecting phone records of millions of Verizon customers daily’, The
Guardian. Accessed January 1, 2017 from: https://www.theguardian.com/world/2013/jun/06/nsa-phone-records-
verizon-court-order.
Grinter, B. (2013) ‘A big data confession', Interactions, 20(4): 10–11.
Hine, Christine (2015) Ethnography for the Internet: Embedded, Embodied and Everyday. Huntingdon:
Bloomsbury.
Horst, Heather and Miller, Daniel (2012) Digital Anthropology. London: Berg.
Latour, Bruno (1999). Pandora's Hope: Essays on the Reality of Science Studies. Cambridge, MA: Harvard
University Press.
Lohr, S. and Singer, N. (2016, November 10) ‘How data failed us in calling an election', New York Times.
Accessed January 1, 2017 from: https://www.nytimes.com/2016/11/10/technology/the-data-said-clinton-
would-win-why-you-shouldnt-have-believed-it.html.
Luca, M. (2014, June 29) ‘Were OkCupid's and Facebook's experiments unethical?', Harvard Business
Review. Accessed January 1, 2017 from: https://hbr.org/2014/07/were-okcupids-and-facebooks-experiments-
unethical.
Macwright, T. (2012) ‘On scrapers', Blog entry. Accessed January 1, 2017 from: http://www.macwright.org/
2012/09/06/scrapers.html.
Marantz, A. (2015, July 29) ‘When an app is called racist', New Yorker Magazine.
Marcus, George E. (1998) Ethnography Through Thick and Thin. Princeton, NJ: Princeton University Press.
Markham, Annette (2004) ‘Internet as research context', in Clive Seale, Jaber Gubrium, David Silverman, and
Giampietro Gobo (eds.), Qualitative Research Practice. London: Sage, pp. 358–74.
Markham, Annette (2013) ‘Remix culture, remix methods: Reframing qualitative methods for social media
contexts', in Norman K. Denzin and Michael Giardina (eds.), Global Dimensions of Qualitative Inquiry. Walnut
Creek, CA: Left Coast Press, pp. 63–81.
Markham, Annette N. (2017) ‘Ethnography in the digital era: From fields to flow, descriptions to interventions',
in Norman K. Denzin and Yvonna S. Lincoln (eds.), The Sage Handbook of Qualitative Research, 5th ed.
Thousand Oaks, CA: Sage, pp. 650–68.
Markham, A. N. and Lindgren, S. (2014) ‘From object to flow: Network sensibility, symbolic interactionism,
and social media', Studies in Symbolic Interaction, 43: 7–41.
McLuhan, Marshall (1964) Understanding Media: The Extensions of Man. New York: McGraw Hill.
McNeal, G. (2014, June 28) ‘Facebook manipulated user news feeds to create emotional responses',
Forbes Magazine. Accessed January 1, 2017 from: http://www.forbes.com/sites/gregorymcneal/2014/06/28/
facebook-manipulated-user-news-feeds-to-create-emotional-contagion/#4c9ef6cc5fd8.
Metz, C. (2016, 9 Nov) ‘Trump's win isn't the death of data – It was flawed all along’, Wired. Accessed January
1, 2017 from: https://www.wired.com/2016/11/trumps-win-isnt-death-data-flawed-along/.
Ouellette, J. (2013, October 9) ‘The future fabric of data analysis’, Quanta Magazine. Accessed December 9,
2017 from: https://www.quantamagazine.org/20131009-the-future-fabric-of-data-analysis/.
Rabinow, Paul, Marcus, George E., Faubion, James D., and Rees, Tobias (2008) Designs for an Anthropology
of the Contemporary. Durham, NC: Duke University Press.
Rosenberg, Daniel (2013) ‘Data before the fact’, in Lisa Gitelman (ed.), ‘Raw Data’ is an Oxymoron.
Cambridge, MA: MIT Press, pp. 15–40.
Steen, M. (2016) ‘The metaphysics of mass expressions', in Edward N. Zalta (ed.), The Stanford Encyclopedia
of Philosophy (Winter 2016 Edition). URL = https://plato.stanford.edu/archives/win2016/entries/metaphysics-
massexpress/.
Turkle, Sherry (2011) Alone Together: Why We Expect More from Technology and Less from Each Other.
New York: Basic Books.
http://dx.doi.org/10.4135/9781526416070.n33
