BUZATO - 2017 - Capitulo - Critical Data Literacies

Construções de sentido e letramento digital crítico
na área de línguas/linguagens
BUZATO, M. Critical Data Literacies: going beyond words to challenge the illusion of a literal world. In:
TAKAKI, N. H.; MONTE MOR, W. (Eds.). Construções de sentido e letramento digital crítico na área de
línguas e linguagem. Campinas, SP: Pontes Editores, 2017. p. 119–142.
CRITICAL DATA LITERACIES: GOING BEYOND WORDS TO
CHALLENGE THE ILLUSION OF A LITERAL WORLD
Marcelo El Khouri Buzato
The ideal of an education committed to promoting social
agency and social transformation in the pursuit of a fuller humanity
in today’s and tomorrow’s world requires that we keep on training
not only critically literate students, but also critically literate
teachers, teacher educators and teaching education researchers.
From the Freirean perspective advocated here (FREIRE, 2000),
critical literacies would be those that require and hone one’s
abilities and disposition not only to develop an awareness of
causal chains among elements and situations previously hidden
behind supposedly neutral words, but also to uncover muted words
behind supposedly neutral situations that must be interpreted and
talked about. I will argue that two major twists to the Freirean
conceptualization must be urgently considered: the multimodal,
multimedia, multisemiotic nature of texts, and the (in)effectiveness
of seeking to uncover cause-consequence mechanisms when
“correlation” threatens to replace “causation” as the primary
strategy in the knowledge/truth and power-seeking regimes of our
epoch. The three examples that follow illustrate the case.
First example: a global manufacturer of wearable devices has
recently reported that, on average, people in Japan sleep less
than six hours per night while people in Dubai often sleep until
11:00 a.m. (WALCH et al., 2016). They know it from compiling data
off the digital health tracking devices they sell around the world.
But, what do those numbers mean? That wealthy Westerners are
lazy whereas the Japanese work hard? That sleep-deprived people
like fish while lie-a-beds prefer lamb? The two conclusions are
equally stupid, of course. The first uses a correlation between
biometric and biographical data to support a cultural stereotype;
the second pushes the correlation envelope under an absurd
cause-consequence relation. Both, however, come with a kind and
a volume of data that were previously impossible to get.
Second example: giant Chinese e-retailer Alibaba found out
that the larger the breasts a (Chinese) woman has, the more money she
tends to spend in her shopping overall. They know it from tracking
the sales data on brassieres and seeking patterns with no previous
model – otherwise, who would’ve thought of that? Well, in China,
there is a correlation between higher income and engagement with
western beauty stereotypes, including cosmetic surgery for double
eyelids and breast implants (WEN, 2013). But two correlations put
together don’t necessarily make a cause-consequence explanation,
do they? The only “sure” thing this correlation can mean, which is
actually the only thing that matters to Alibaba, is that a good way
to improve revenues is to make frequent sales on larger-size bras.
There is a “critical way” to read this situation that is not based on
causality, but on a certain type of “correlation”: one could think
of Alibaba’s data scientists (symbolically) probing their female
clients’ bodies in order to reach deeper into their pockets as related
(metaphorically?) to a brothel’s habitué probing into his own pocket
in order to reach a “purchasable breast”.
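The claim that two correlations put together do not make a cause-consequence explanation can be illustrated with a toy simulation. The sketch below is only an illustration under stated assumptions: the numbers are synthetic, and the names `pearson`, `simulate` and the "hidden" variable are hypothetical, not taken from any retailer's data. Two observed variables are each generated from the same hidden variable plus independent noise, and they end up correlated with each other although neither influences the other.

```python
# Toy illustration with synthetic numbers (not real retail data):
# a hidden variable drives two observed variables, which then correlate
# with each other even though neither one causes the other.
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Return (a, b): two variables that each depend only on a hidden one."""
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n)]   # hypothetical shared driver
    a = [h + rng.gauss(0, 1) for h in hidden]      # observed variable A
    b = [h + rng.gauss(0, 1) for h in hidden]      # observed variable B
    return a, b


a, b = simulate()
r = pearson(a, b)
print(f"correlation between A and B: {r:.2f}")
```

In this setup the expected correlation between A and B is 0.5, yet intervening on A would leave B untouched, which is the gap between pattern and explanation that the example above points to.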
Third example: Google Flu/Dengue Trends was a service created
in 2008 that could allegedly predict flu and dengue outbreaks in
any region of the US up to ten days in advance. It did so by relating
data on flu/dengue-related word searches (cough, fever, headache,
etc.) and information about flu/dengue-related doctor visits from
government databases. After initial success, the system began to
fail miserably until it was quietly discontinued in 2015. Why did
it fail, despite the volume and “reliability” of data? Because the
causal hypothesis was wrong: people use flu/dengue-related words
for many reasons (as in homonyms, metaphors, looking for news
about diseases, looking for poems, because they feel cold or a
movie gives them the chills, and so on); in short, it failed because
correlation could not make up for a computer’s inability to deal
with situated meanings in pretty much the same way a patient is
often unable to tell what a “situated symptom” really means without
the help of a doctor who “speaks” causality properly.
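The problem of situated meanings can be made concrete with a deliberately naive sketch. It does not reproduce how Google Flu/Dengue Trends actually worked; the term list, the queries and the function `mentions_flu_term` are invented for illustration. The point is only that counting word occurrences cannot tell an ill person from a film title, a metaphor or a news search.

```python
# Hypothetical term list and search queries, invented for illustration only.
FLU_TERMS = {"fever", "cough", "chills", "headache"}

QUERIES = [
    "fever and cough for three days",      # plausibly written by someone ill
    "saturday night fever soundtrack",     # a film title
    "movie that gives you the chills",     # a metaphor
    "news about dengue fever outbreak",    # reading the news, not ill
    "best pizza near me",                  # unrelated
]


def mentions_flu_term(query):
    """Return True if any word of the query appears in FLU_TERMS."""
    return any(word in FLU_TERMS for word in query.lower().split())


flagged = [q for q in QUERIES if mentions_flu_term(q)]
print(f"{len(flagged)} of {len(QUERIES)} queries flagged as flu-related")
for q in flagged:
    print(" -", q)
```

Every query containing a listed word is flagged, so the film title and the metaphor count exactly as much as the query that plausibly signals a symptom.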
These examples illustrate what could be called two new ways
to go back and forth between words and the world today. First,
words have lost quite a lot of their prestige in representing and
decoding the world. It is numbers, datasets, data analytics based
on machine learning, artificial intelligence and data visualization
genres based on algorithms that are expected to unveil the world.
And these forms of representation and “knowledge extraction”,
as data scientists put it, which were previously reserved for truly
“scientific purposes”, are now up for grabs for anyone literate
enough in these multisemiotic discursive technologies in order to
seek patterns among things that are not related in culture or in
the human mind.
Second, these discursive technologies are meant to unveil
features of the world that are often less subservient to the
semantics of natural language than they are representable through
math/statistics and/or visuals. But, for most citizens, that kind of
math/statistics and visualization genres are basically impossible to
question, not only because they are not knowledgeable in math
and statistics, but because the “truths” represented this way are
protected by a thick and heavy veil of statistical pragmatism.
How can we engage with these new ways of going back and
forth between words and the world in order to “update” and
strengthen the theory and practice of critical literacies in our
epoch? In particular, what strategies can be employed by policy
makers and practitioners to bring this version of critical literacies to
the foreground of graduate studies and graduate research training
in the Humanities?
The suggestion made here is that three basic steps could
be taken in order for these questions to be answered in the long
run. First, we must have a better understanding of the concept of
Big Data and the practices of Data Science, with their respective
implications. On top of that, we should devote some attention to
what the literature from other fields of educational research calls
Data Literacy. From there, we should be in a position to determine
what Critical Data Literacies should mean in applied language
and literacy studies, and have some idea of the complexities and
challenges inherent to such an interdisciplinary object.
BIG DATA, DATA SCIENCE AND DATA LITERACY:
PRACTICES, MEANINGS AND IMPLICATIONS
Big Data
Big Data (henceforth, BD) is data that comes in high
volume (petabytes – billions of megabytes), high variety (e.g.
demographics, business transactions, tweets, biometric data,
internet of things sensors, dereferencing, browsing and calling
records, brassiere sizes, etc.) and high velocity (collected and
transmitted in real time). According to Schroeder (2014, p.6), BD
represents “a step change in the scale and scope of knowledge
about a given phenomenon”.
Boyd and Crawford (2012, p. 662), who are social scientists,
provide an alternative definition of BD as “a cultural, technological,
and scholarly phenomenon that rests on the interplay of technology,
analysis, and mythology...”. Mythology, they say, because BD
enthusiasts and evangelists like Anderson believe it to represent a
“superior form of knowledge” that dismisses the need for theories
and/or philosophical underpinnings.
Put more directly, the myth says that, after BD, correlation
supersedes causation (KITCHIN, 2014), and no one has been as
bombastic about this allegation as Anderson (2008, para. 7):
Out with every theory of human behavior, from linguistics
to sociology. Forget taxonomy, ontology, and
psychology. Who knows why people do what they do?
The point is they do it, and we can track and measure
it with unprecedented fidelity. With enough data, the
numbers speak for themselves.
Despite Anderson’s and his like-minded fellows’ naïveté in
believing that “numbers speak for themselves”, access to the
numbers and the data analytics does matter for democratic
societies, or else companies and governments will be in an even
more privileged position to establish monological versions of “the
Truth”.
Finally, as BD gets gathered from countless sources in a plethora
of ways, scholars are talking of an unrestricted “datafication” of the
world (MAYER-SCHÖNBERGER; CUKIER, 2013). Because datafication
is already advanced and applies to many fields, from genomics to
urban planning, from national security to digital humanities and
from financial transactions to biometric information, there’s an
urgent need for democratic societies to establish clearer BD ethical
standards (METCALF; CRAWFORD, 2016) so that critical citizens
can easily understand that BD is not only a means to empowering
their social agencies, but also “a leap in how data can be used to
manipulate people in more powerful ways” (SCHROEDER, 2014, p.4).
Data Science
Prensky (2012, p. 207), an enthusiast of BD epistemology,
argues that
scientists no longer have to make educated guesses,
construct hypotheses and models, and test them
with data-based experiments and examples. Instead,
they can mine the complete set of data for patterns
that reveal effects, producing scientific conclusions
without further experimentation.
Implicit in the quote, for those who go by a more heterodox
notion of science, is a job description for a technician with
a mathematics, statistics, engineering or computer science
background known in the job market as “the data scientist”.
Heterodox because Data Science (henceforth, DS) is but an
expansion and evolution of efforts formerly applied to business
analysis, not scientific endeavors. It incorporates computer science,
statistics, and applied mathematics into new highly automated
methods – hence the evolution – for analyzing high volumes of
data and “extracting knowledge” from them.
There are basically three features of DS that make it a special
case in contrast with ordinary forms of quantitative analytics: the
kind and amount of data that can be analyzed, the tools and methods
used to deal with such data and the discursive technologies used
to represent the targeted phenomena and communicate findings.
DS experts claim to be able to extract knowledge from
unstructured, “raw” and “unclean” data, that is, data from videos,
texts and human behavior, not previously specified by categories,
ontologies and conceptual relations. Because the vast amount
of data provides narrow confidence intervals, it is possible to treat
patterns and tendencies as facts, without an underlying
cause-consequence rationale. Such unprecedented statistical robustness
provides DS and DS findings with an aura of scientific facticity and
indisputability that makes them influential in decision-making
processes in business, government, security and other fields.
Since there is no human-based method capable of carrying
out such “knowledge finding” work at such scale in such a short
time, DS draws heavily on Machine Learning (henceforth ML) and
Artificial Intelligence (henceforth AI). ML is about “training” systems
to acquire behaviors for which they have not been programmed.
This is done by feeding the computer with known input-output
relations and getting it to progressively build an internal model
or path that allows for reliable output predictions for subsequent
inputs.
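A minimal sketch may help readers unfamiliar with ML picture what "feeding the computer with known input-output relations" amounts to. It assumes the simplest possible internal model, a straight line fitted by least squares; the training numbers and the names `fit_line` and `predict` are invented for illustration and do not stand for any particular ML system.

```python
def fit_line(inputs, outputs):
    """Least-squares fit of: output = slope * input + intercept."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    sxx = sum((x - mean_x) ** 2 for x in inputs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted model to an input it has not seen before."""
    slope, intercept = model
    return slope * x + intercept


# Known input-output pairs (invented numbers) used for "training".
train_inputs = [1.0, 2.0, 3.0, 4.0]
train_outputs = [3.1, 4.9, 7.2, 8.8]

model = fit_line(train_inputs, train_outputs)
print("prediction for a new input of 5.0:", predict(model, 5.0))
```

Nothing in the code states why outputs grow with inputs; the model only encodes the regularity found in the examples, which is the inductive character the text attributes to ML.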
While ML is totally inductive, AI can also work deductively,
inferring contextual/environmental conditions from co-related
data and making decisions that would resemble those of an
intelligent human being in the target situation (e.g. choosing the
right translation of a homonym by considering its collocations or
pulling the brakes if a person is detected by a sensor in front of a
self-driving car).
Obviously, the greater the amount of correlated data points
available is, the more robust ML and AI-based findings will be.
But for humans to be able to reason with and from such findings,
they have to be represented in human-friendly ways. That’s where
Information Visualization (henceforth IV) comes in.
IV helps humans read the data in the sense of carrying out
an “interactive data exploration” in quest of structure or patterns
that point to relationships and trends (BURLEY, 2010). IV amplifies
human cognition by expanding human working memory, facilitating
perceptual inference of relationships, supporting human perceptual
monitoring of a large number of potential events, and so on.
In addition, IV facilitates the communication of findings when
patterns/trends are easier to recognize (visually) than to describe
(verbally) (BURLEY, 2010).
Unlike regular scientific visualizations, IV might represent
“abstract concepts and relationships that do not necessarily have
a counterpart in the physical world” (BURLEY, 2010, Section 2.0).
That is not to say that such concepts and relationships have no
worldly impact. Actually, not only powerful social agents such as
governments and large businesses are relying more and more on
BD and data analytics to make decisions, but, more importantly,
newly found correlations with no underpinning scientific theory
(such as the ones in examples 1 and 2 above) are continuously
inviting us to reread and correlate certain traits in people, or things,
or places that simply didn’t use to mean anything culturally, and
using such traits as predictors, contextual cues and category markers
that, quite probably, will become additional psychological stressors
to people and even new social stigmas.
Dealing with IV, either as writer or as reader, requires the
integration of several literacies (visual, statistical and computer
literacies, to say the least) in pursuit of different types of analysis
(temporal, geospatial, topical, network, etc.) supported by
appropriate representation types (chart, graph, map, network
layout, etc.). BD, however, requires non-conventional IV, or, more
specifically, new techniques in Data Visualization1 (henceforth,
DV). In fact, the very traits that characterize it (volume, variety and
velocity) pose the hardest technical challenges to generating good
BD visualizations (WANG; WANG; ALEXANDER, 2015). In many cases,
because of processing limitations, data volume and data dimensions
have to be reduced. Also, the visual perceptual apparatus imposes
limits on what can be visualized by humans, e.g. high rates of image
change cause users not to react to meaningful quantity or intensity
changes on display. Thus, those who produce the visualizations are
constantly facing a dilemma: let every data point be shown and
overwhelm human perceptual and cognitive capacities or reduce
the data and risk neglecting interesting structures or outliers?
(WANG; WANG; ALEXANDER, 2015). Apparently a datafied world is
often willing to tell us more than we can read and turn into words,
so those who select and clean data are, in many ways, deciding
what is not to be read in the world.
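The dilemma between showing every data point and reducing the data can be sketched with a very small example. It assumes one common reduction strategy, averaging consecutive values into buckets; the series and the function name `downsample_by_mean` are invented for illustration and are not drawn from the works cited above.

```python
def downsample_by_mean(values, bucket_size):
    """Replace each run of `bucket_size` consecutive values by its mean."""
    reduced = []
    for start in range(0, len(values), bucket_size):
        bucket = values[start:start + bucket_size]
        reduced.append(sum(bucket) / len(bucket))
    return reduced


# Invented series: 100 unremarkable readings and one extreme outlier.
series = [10.0] * 100
series[42] = 500.0

reduced = downsample_by_mean(series, 20)
print("points before and after reduction:", len(series), len(reduced))
print("largest value before and after:", max(series), max(reduced))
```

After the reduction the display needs only five points instead of a hundred, but the single extreme reading is diluted into an unremarkable average, so a reader of the reduced chart has no way of knowing it was ever there.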
Data that does get through to the productive stage of DV
1 Data visualization can be described as a subdomain of information that specializes in representing
complex data in such a way as to provide the viewer with qualitative insights.
must also be represented through formats that are amenable to
multiple variables and dimensions, supporting interactive, dynamic
explorations by the user. This implies the use of a whole series of
different types of graphs and diagrams – often integrated with
dashboards and other data-exploration and filtering tools – many of
which are totally unfamiliar to the ordinary citizen. When the right
combination of data, DV tool and visualization type is achieved,
however, the result makes the usual visual representations of
quantitative meanings look as abstract and lifeless as the numbers
behind them, and can even look artistic. The example in Figure
1 illustrates (see ZOSS, 2016, for a good taxonomy of other data
visualization types).
Figure 1 – Brazil’s Racial Dotmap (PEREIRA, 2015). Blue stands for whites, green for mixed
races (pardo), red for black, yellow for yellow/mongolian and brown for indigenous
Figure 1 is a one-to-one dot density map2 comprising three
dimensions in the data from Brazil’s 2010 census made available
online by IBGE (the Brazilian Institute of Geography and Statistics):
geographic distribution, population density, and racial diversity.
“One-to-one” means that each dot represents one person3.
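The logic of a one-to-one dot density map can be sketched in a few lines. The sketch below is not the procedure used for Figure 1; it assumes a single hypothetical census tract with a rectangular bounding box and invented head counts, and it places one dot per counted person at a random position inside the tract, colored according to the scheme in the caption of Figure 1. The names `TRACT` and `one_to_one_dots` are illustrative.

```python
import random

# Hypothetical census tract: bounding box and invented head counts per color.
TRACT = {
    "bbox": (0.0, 0.0, 1.0, 1.0),  # (min_x, min_y, max_x, max_y)
    "counts": {"blue": 40, "green": 35, "red": 20, "yellow": 3, "brown": 2},
}


def one_to_one_dots(tract, seed=0):
    """Return one (x, y, color) dot per counted person in the tract."""
    rng = random.Random(seed)
    min_x, min_y, max_x, max_y = tract["bbox"]
    dots = []
    for color, count in tract["counts"].items():
        for _ in range(count):
            x = rng.uniform(min_x, max_x)
            y = rng.uniform(min_y, max_y)
            dots.append((x, y, color))
    return dots


dots = one_to_one_dots(TRACT)
print(len(dots), "dots drawn for", sum(TRACT["counts"].values()), "people")
```

Even in this toy version, the position of each dot inside the tract is produced by the map maker's procedure rather than by the person it stands for, and the color categories are fixed before any dot is drawn.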
Figure 2 - Percentage distribution of the population according to sex and race (color) in Brazil,
2009 (IPEA, 2011, p.16). Red clothing represents “black”, Blue clothing represents “white”
and yellow clothing represents “all others”.
2 Dot density maps portray the geographic distribution of discrete phenomena using an arrangement
of identical point symbols to represent the global distribution of a phenomenon and comparing
relative densities of instances of the phenomenon across spatial regions. One-to-many
maps are also used, in which case every dot represents a group instead of an individual.
3 The project was developed by Pereira (2015), who replicated a similar visualization made for
the US (CABLE, 2013).
Whites4 Blacks5 Mixed race (pardo)6
Figure 3 - Percent distribution of races (colors) per state in Brazil according to Wikipedia
(COMPOSIÇÃO ÉTNICA DO BRASIL, 2016)
Even though Figures 1, 2 and 3 are alike in bringing more
“contextual information” to the quantitative information7 they
represent than would a table or a pie/bar chart, Figures 2 and 3
are no match for Figure 1 when it comes to “experiencing” such
meanings as facts.
First, Figure 1 does a much better job at representing “experiential
meanings” per se, that is, presenting lived experiences by
reference to the processes, participants and circumstances that
constitute them and communicating movement, action and events
organized through narrative/descriptive and spatial/temporal
relations (O’HALLORAN, 2008).
For instance, while Figure 3 suggests racial homogeneity through
single coloring spaces within artificial border lines and racial
density through single color gradients, Figure 1 reveals degrees of
racial homogeneity through same color dot density and racial
heterogeneity through clustering of multiple color dots regardless of
abstract geopolitical borders. Although Figure 2 does a much better
job than Figures 3 and 1 at showing that the ultimate reference
of the represented numbers are “people” (not dots or terrain), it
shows nothing of the spatial-temporal organizational and narrative
relations conveyed by the other two figures.
4 By Henriquehorta (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)],
via Wikimedia Commons
5 By Henriquehorta (Own work) [CC BY-SA 3.0 (https://commons.wikimedia.org/w/index.php?curid=25541905)],
via Wikimedia Commons
6 By Henriquehorta (Own work) [CC BY-SA 3.0 (https://commons.wikimedia.org/w/index.php?curid=25540988)],
via Wikimedia Commons
7 These maps are claimed to be accurate by their authors at the time of their publication. The point
of presenting them here is merely to discuss the way they represent quantitative information,
so the (in)accuracy of the numbers is not to be considered. Reliable up-to-date information on
Brazilian demographics can be found on The Brazilian Institute of Geography and Statistics –
IBGE's website at <http://www.ibge.gov.br>, accessed Apr. 5, 2017.
In addition, although both Figure 1 and Figure 2 use primary
colors to represent the color/race attribute, in Figure 1 it is arguable
that this non-iconic choice is aimed at preserving the visibility of the
data on larger scales while, in Figure 2, the iconic relations between
(skin) color and race, that IBGE supports, are downplayed by the
unnecessary use of primary colors in the clothes. Along with the
standardized shapes of the icons and their groupings in an abstract
geometrical space (not a territory or place, as in Figures 1 and
3), this use of red, blue and yellow projects a nonconflicting and
unequivocal relation among organized and ordered people from
different races. To use skin color instead of clothing color would
perhaps bring too much concreteness to the neutrality, rationality
and equity the chart wants to project onto the messy lived space/
experience of race in Brazil.
The messiness and liveliness of racialized geographic spaces
as an “experiential meaning” is precisely what Figure 1 is best at
showing, regardless of its own attempts at abstraction and neutrality.
Even if we disregard the fact that Figure 1, when seen online, can
show color variation and shading dynamically as one zooms in and
out of the map, there is this additional kind of effect to it that is
“orientational” in nature (LEMKE, 1998; 2002), that is, a meaning
that installs itself in the stance the reader is impelled to assume
towards the text and towards the interlocutor who is offering the
text as information.
Here is the effect: in Figure 1, the sense of accuracy, richness
and objectivity provided by the geospatial data instills in the reader
the funny feeling that she herself is one of those dots. This
sense of an objective and finely grained – almost live – indexicality
inclines the reader to
1. evaluate the text as “realistic” even though the degree of iconicity
is low (dots for bodies, primary colors for skin colors);
2. feel “involved” in the represented phenomenon, the same kind
of feeling one gets when watching a subjective camera shot in a
movie or using a discursive marker such as “The way I see it is…”
in a text, simply because of her sheer awareness of being one of
those dots;
3. take the other interactive participant, that is, the author(s) of
the map, as a voiceless, math-speaking machine who does nothing
but reveal a facet of reality that humans are not able to see by
themselves, and, consequently, forget that the data used depends
on human designs and choices based on specific ideologies such
as the “race ontology” (JAMES, 2017) adopted by IBGE.
To sum it up, as BD and DV remediate traditional forms of
representing the world to make them more “objective” and
“unmediated”, the reader is invited to help fill the gap between
signifier and signified by lending both her data and her subjectivity
to the process. Critique, consequently, can’t be only about removing
the veil of hegemonic meanings from one’s eyes; it has to be
about removing the reader from the fantasy of having access to a
literal world.
Big Data Literacy
Schield (2004, p.8) defines data literacy (henceforth DL) as
the ability to “access, assess, manipulate, summarize, and present
data”, which is inter-related with two other literacies: information
literacy, i.e. “the ability to read, interpret and evaluate information”
and statistical literacy, i.e. “the ability to think critically about basic
descriptive statistics”. In Schield’s hierarchical model (Figure 4),
this triad supports a “critical thinking perspective” for dealing with
data, and the analysis, interpretation and evaluation of information
involves knowing what comparisons to make among statistics,
knowing how to communicate data and findings effectively, and
understanding not only the socioconstructive character of
statistics, but also the role of context in its interpretation.
Figure 4 – Schield’s (2004, p.8) critical thinking perspective in data literacy
More recently, Bhargava and D’Ignazio (2015) defined DL
as “the ability to read, work with, analyze and argue with data”
(BHARGAVA; D’IGNAZIO, 2015, p.1, my emphasis). By “arguing with
data”, they mean “using data to support a larger narrative intended
to communicate some message to a particular audience” (BHARGAVA;
D’IGNAZIO, 2015, p.1). But if DL entails understanding the
socioconstructive character of statistics and the role of context in
its interpretation, we must take “data” not only as a means to make
a point, but also as an entity that can and ought to be engaged
and “argued with”.
It is precisely on the notion of “engagement with data” that
Data-Pop Alliance8 (2015, p. 8) bases their definition of DL: “the
desire and ability to constructively engage in society through or about
data”. Unlike the two aforementioned definitions, Data-Pop’s (2015,
p. iv) conceptualization enmeshes technical skills with attitudes
8 Data-Pop Alliance is a coalition created by the Harvard Humanitarian Initiative (HHI), the
MIT Media Lab, and the Overseas Development Institute (ODI) to “promote a people-centered
Big Data revolution” (DATA POP ALLIANCE, 2015, p. iii).
and dispositions that are at the core of any critical literacy: “desire”
highlights technology as an enabler of the human intentions and
capabilities; “ability” refers to a continuum of mutually implicated
degrees and subtypes of skills; “constructively engage in society”
points to active citizens with a sense of purpose to engage with
data. Finally, and most importantly, “through or about data” means
that it is possible “to engage as data literate individuals without
being able to conduct advanced analytics” (DATA-POP ALLIANCE,
2015, p. iv), in pretty much the same way sociocultural models
consider that anyone who lives in a society that is functionally
organized around written signs is literate in a way, even if they
can’t read or write properly. As a matter of fact, most of the highly
alphabetically literate people in the world are, at present, data
illiterate individuals living in datafied societies. They do produce
data all the time, but have no idea what Open Data9 repositories
can do for democracy or how social media APIs10 can be tapped
into for research in the social sciences or the humanities.
A view of DL as a set of social practices should not trick us
into thinking about neatly segregated skills and scripts that can
be learned or used in isolation, though. As was the case with the
digital literacies we began to be concerned with in the 1990s, it is
important to realize that every literacy, seen as a social practice, is
a network of other literacies (BUZATO, 2009; 2012; 2013). In the DL
network, according to the Data Pop Alliance (2015, p.9) group, are:
• Scientific literacy: the application of scientific concepts and
experimentation methods for personal decision-making and civic
participation;
• Computational literacy: the attitude and skills for seeking algorithmic
approaches and mathematical/logical modelling to problems;
• Digital literacy: the ability to find, evaluate, utilize, share, and
create content using information technologies;
• Media literacy: the skills involved in producing media and having
a critical understanding of issues involving modes, languages,
production processes, and audiences.
9 Open data is “data that can be freely used, re-used and redistributed by anyone - subject only, at
most, to the requirement to attribute and share alike” (DIETRICH et al, 2009). It abides by the
same general “open” principles as open software, i.e. the data must be available as a whole and
at no more than a reasonable reproduction cost, it can be modified, re-used and redistributed,
including the intermixing with other datasets.
10 An application program interface (API) is a piece of software that allows other pieces of software
to communicate even though the codes involved are not mutually known. Open or public APIs
are published on the Internet and shared freely by developers who want to write programs that
work inside other programs or use data collected by other applications.
To sum it up, engaging with society “through or about data”
refers to (organized) citizens taking advantage of datafication to
“change things” through access to open data, appropriation of data
analytics and visualization techniques, and even managing and
selling their own data, in pretty much the same way a community
of organic farmers can grow and profit from food in an ethical,
sustainable way. This alluring picture is, up to a certain extent,
analogous to previous naive discourses about “digital inclusion”.
But, as we now realize, the fact that most of us quickly got “digitally
included” as computers and smartphones became accessible
consumer goods (BUZATO, 2010), did not mean we necessarily became
more engaged citizens. In fact, we are far more “BD-included”
today than we realize, but mostly by unwittingly supplying data
about our daily lives of which the potential meanings or worth we
are not aware, and to companies we’ve never heard about. This is
but one reason why we should invest in “critical digital literacies”
as a research object and a teaching objective.
FROM DATAFICATION TO CRITICAL DATA STUDIES
Critical Data Studies (henceforth CDS) is a young interdisciplinary
field devoted to approaching datafication and BD through
voices alternative to the tech-centered, marketing-oriented voices
of the “Big Data evangelists”. By focusing on the epistemological,
ethical, political, economic, historic, geographic and philosophic
aspects of datafication, CDS addresses “the multitude of ways that
already-composed data structures inflect and interact with society,
its organization and functioning, and the resulting impact on
individuals’ daily lives” (ILLIADIS; RUSSO, 2016, p.1). CDS is a critical
approach inasmuch as it questions many assumptions about Big
Data “by locating instances where Big Data may be naively taken to
denote objective and transparent informational entities” (ILLIADIS;
RUSSO, 2016, p.1).
Although many CDS practitioners are sociologists or philosophers,
the field is open to practitioners from any discipline. Dalton
and Thatcher (2014), for example, are geographers who argue we
must pose a series of crucial interdisciplinary questions about BD.
For instance, how do people resist aspects of BD? How can BD be
shown to be as much the product of society as society’s producer?
In addition, assuming that “data is never raw”, they suggest we
need to investigate what is quantified, stored, sorted and what is
neglected or discarded and why. This is, of course, analogous to our
customary questions about texts, the choices made by text authors
and what is silenced when a text gives voice to something else.
Dalton and Thatcher (2014) also ask self-reflexive questions
that are fit for all of those involved in CDS: what, in their praxis
as geographers, can contribute to a critical approach to BD? They
conclude that, among other contributions, geographers are used
to dealing with a broad range of approaches and mixed methods
research designs that can help counterbalance the radically
quantitative, neo-positivistic bias of DS. This is another way of saying
what, in their language, they put this way: DS should not be limited
to studying space – i.e. abstract location –, but help us understand
place – i.e. locations with meaning and history, created and sustained
by human experience (TUAN, 1979).
Like geographers, linguists and researchers in digital humanities
are already engaged with BD – e.g. using mega-corpora
and big geolinguistics databases (DI NUNZIO; POLETTO, 2016) or
performing historical sentiment mining from large public media
collections (EIJNATTEN; PIETERS; VERHEUL, 2013) – and there’s no
reason why researchers in applied language and literacy studies
should not become users of DS, too. But, since “applied” for us
also means privileging place over space, an important contribution
from our field to CDS would be the establishment of situated and
critical data literacies (henceforth CDL) as a research object and an
educational objective in our field.
Graduate studies in critical data literacies: harboring an interdisciplinary object
What is the best way to harbor a “new” (interdisciplinary)
object in a given research/teaching area? Possibly, by using the
object as an instrument, experimenting with it and, then, engaging
in critical reflection about the experience. Is it feasible, however,
in the case of such radically different kinds of knowledge and
worldviews as are those of applied language/literacy researchers/
educators and data scientists?
It is obvious that, like any literacy, CDLs require specific sets
of skills (DATA POP ALLIANCE, 2014) which, unfortunately, too
many of us in the humanities lack. That does not mean that we are
supposed to engage in research and teaching of DL subserviently or
monologically. Like geographers, the practitioners in our field have
experience with mixed method designs and qualitative approaches
that can not only be associated with and counterpoised to positivistic
data analytics, but also help DS practitioners and ourselves develop
a sense of reflexivity in their own fields (BUZATO, 2016).
This means that, yes, ideally, we should use what BD and DV
have to offer for the purposes of our own research – e.g. acquiring
awareness of correlations between key elements in language use,
literacy events and language learning that could not be measured
in the past –, but also look over such uses reflexively, so we gain,
and share, a better understanding of BD and DV as mediations and
their sociocultural and educational entailments.
Apart from that, ethnographic-qualitative research on BD,
DL and DV as situated social practices is needed so that we gain
understanding of how, in practice, the world and the words (whether
actual words, or pictures, numbers, dots, colors etc.) are negotiated
in these communities. How does a fundamentally quantitative and
correlational view of the world get translated into the words and
texts (even if mathematical and info-graphical representations)
typical of those literacies? How do the discourses prevalent in
those practices relate to power relations involving technicians,
machines, people and data in specific assemblages of actor-
networks (LATOUR, 1987; BUZATO, 2012; 2013)? Ideally, we should
be able to carry out ethnographies of data labs, interview data
scientists, go to data analytics and data visualization conferences
to observe the discussions and negotiations of meanings among
the “natives”, and so on. Likewise, we should learn about “data
agencies” in the communities of “ordinary people”. What kinds of
data does the community produce? To whom do the data belong?
What do they mean locally and to the DS team? What meanings can
a community make of their data, given that sufficient cognitive and
technological means are available? How do those meanings relate
to the situated meanings of their everyday discursive practices?
Final remarks
On ending this chapter, I would like to suggest possible steps
towards introducing CDL both as object and as instrument in
applied language and literacy studies postgraduate programs.
An obvious first step is to bring BD, DV and DL to the syllabuses
and research agendas, an endeavor to which this chapter itself is
meant to contribute. Right now, most of the DL-related literature
is authored by mathematics educators, engineers and business
people, and there are often interesting language and literacy issues
there which could be explored.
We should also find ways to make training in the literacies
listed by Data Pop Alliance (2014) available in our postgraduate and
graduate programs on a permanent basis, perhaps by designing
credit validation policies and interdisciplinary courses for that
purpose. These literacies should be framed as a kind of “second
language learning” experience, in the sense that people should
learn how to engage with and about data around relevant, real-life
topics by making meanings from the new semiotic resources made
available to them by their expert-peers.
More importantly, these studies and teaching initiatives will
be effective and critical in the positive sense of the word only
inasmuch as they sustain a respectful, but inquisitive, dialogue
with the practitioners of DS and DV whose quantitative, data-
centric worldviews are behind the whole process of datafication
of the world. We ought to do it so the concept and practice of
CDL become productive in many fields apart from education and
citizen participation. Of course, this presupposes that just like we
“qualitative” people are willing to learn the basics of DS, so our
“quantitative” counterparts should be open to acquiring some basic
knowledge and appreciation of concepts in linguistics, semiotics,
discourse analysis and so on. And, likewise, this should be done
in a “qualitative as a second language” fashion for them, perhaps
through self-ethnographies in their own field.
All of this must sound idealistic, if not naïve, given the current
state of interdisciplinary work in the Brazilian Academia. But I
believe pursuing humans’ ontological vocation, as Freire urged us
to do, is crucially an interdisciplinary enterprise. Unless we define
human happiness and dignity as “ontological surrender.”
References
ANDERSON, Chris. The End of Theory: The Data Deluge Makes the
Scientific Method Obsolete. Wired, 2008. Available at:
<http://www.wired.com/2008/06/pb-theory/>. Access: 26 Jul. 2016.
BHARGAVA, Rahul; D’IGNAZIO, Catherine. Designing Tools and Activities
for Data Literacy Learners. In: Data Literacy Workshop at Web Science.
Oxford, UK: University of Oxford, 30 Jun., 2015. Available at:
<https://www.media.mit.edu/publications/designing-tools-and-activities-for-data-literacy-learners/>.
Access: 18 Aug. 2016.
BOYD, Danah; CRAWFORD, Kate. Critical questions for big data:
provocations for a cultural, technological, and scholarly phenomenon.
Information, Communication & Society, v. 15, n. 5, p. 662–679, 2012.
BURLEY, Diana. Information Visualization as a Knowledge Integration
Tool. Journal of Knowledge Management Practice, v. 11, n. 4, 2010.
BUZATO, Marcelo El Khouri. Cultura digital e apropriação ascendente:
apontamentos para uma educação 2.0. Educação em Revista, v. 26, n. 3,
p. 283–303, 2010. DOI: 10.1590/S0102-46982010000300014
BUZATO, Marcelo El Khouri. Letramento e inclusão: do estado-nação à era
das TIC. DELTA: Documentação de Estudos em Lingüística Teórica e Aplicada,
v. 25, n. 1, p. 01-38, 2009. DOI: 10.1590/S0102-44502009000100001
BUZATO, Marcelo El Khouri. Letramentos em rede: textos, máquinas,
sujeitos e saberes em translação. Revista Brasileira de Linguística
Aplicada, v. 12, n. 4, p. 783–809, 2012.
DOI: 10.1590/S1984-63982012000400007
BUZATO, Marcelo El Khouri. Mapping Flows of Agency in New Literacies:
Self and Social Structure in a Post-social World. In: JUNQUEIRA,
Eduardo S.; BUZATO, Marcelo E. K. (Org.). New Literacies,
New Agencies: a Brazilian perspective. New Literacies and Digital
Epistemologies. New York: Peter Lang, 2013.
BUZATO, Marcelo. Towards an interdisciplinary ICT applied ethics:
language matters. Revista Brasileira de Linguística Aplicada, v. 16, n. 3,
p. 493–519, 2016. DOI: 10.1590/1984-6398201610240
CABLE, Dustin. The Racial Dot Map. Demographics - Weldon Cooper
Center for Public Service. 2013. Available at: <http://demographics.
coopercenter.org/racial-dot-map/>. Access: 28 Jan. 2017.
COMPOSIÇÃO ÉTNICA DO BRASIL. In: Wikipédia, a enciclopédia livre.
[S.l: s.n.], 2016. Available at:
<https://pt.wikipedia.org/w/index.php?title=Composi%C3%A7%C3%A3o_%C3%A9tnica_do_Brasil&oldid=47362644>.
Access: 4 Apr. 2017.
DALTON, Craig M.; THATCHER, Jim. What does a critical data studies
look like, and why do we care? Seven points for a critical approach
to ‘Big Data.’ Society & Space Open Site. 2014. Available at:
<http://societyandspace.com/material/commentaries/craig-dalton-and-jim-thatcher-what-does-a-critical-data-studies-look-like-and-why-do-we-care-seven-points-for-a-critical-approach-to-big-data/>.
Access: 6 Sep. 2016.
DATA POP ALLIANCE. Beyond Data Literacy: Reinventing Community
Engagement and Empowerment in the Age of Data. (White Paper)
2015. Available at:
<https://datatherapy.files.wordpress.com/2015/10/beyond-data-literacy-2015.pdf>.
Access: 05 Jan. 2017.
DI NUNZIO, Giorgio Maria; POLETTO, Cecilia. A Love-Hate Relationship
for Big Data and Linguistics: Present Issues and Future Possibilities. In:
Proceedings of 1st International Workshop on Accessing Cultural Heritage
at Scale (ACHS’16). Newark, USA: ACHS, 2016, v. 1611. Available at:
<http://ceur-ws.org/Vol-1611/paper3.pdf>. Access: 07 Jan. 2017.
DIETRICH, Daniel; GRAY, Jonathan; MCNAMARA, Tim et al. What’s
open data? Open Data Handbook. 2009. Available at:
<http://opendatahandbook.org/guide/en/what-is-open-data/>.
Access: 4 Feb. 2017.
EIJNATTEN, Joris van; PIETERS, Toine; VERHEUL, Jaap. Big Data for Global
History: The Transformative Promise of Digital Humanities. BMGN - Low
Countries Historical Review, v. 128, n. 4, p. 55, 2013.
DOI: 10.18352/bmgn-lchr.9350
FREIRE, Paulo. Pedagogy of the oppressed. 30th anniversary ed. New York:
Continuum, 2000.
ILIADIS, Andrew; RUSSO, Federica. Critical data studies: An introduction.
Big Data & Society, v. 3, n. 2, p. 1–7, 2016.
IPEA-INSTITUTO DE PESQUISA ECONÔMICA APLICADA (Org.). Retrato das
desigualdades de gênero e raça. 4. ed. Brasília: Ipea, 2011.
JAMES, Michael. Race. In: ZALTA, Edward N. (Org.). The Stanford
Encyclopedia of Philosophy. Spring 2017 ed. [S.l.]: Metaphysics Research
Lab, Stanford University, 2017. Available at:
<https://plato.stanford.edu/archives/spr2017/entries/race/>. Access: 1 Apr. 2017.
KITCHIN, R. Big Data, new epistemologies and paradigm shifts. Big Data
& Society, v. 1, n. 1, 1 Apr. 2014. Available at:
<http://bds.sagepub.com/lookup/doi/10.1177/2053951714528481>. Access: 26 Jul. 2016.
KRAMER, A. D. I.; GUILLORY, J. E.; HANCOCK, J. T. Experimental evidence
of massive-scale emotional contagion through social networks.
Proceedings of the National Academy of Sciences, v. 111, n. 24,
p. 8788–8790, 2014.
LATOUR, Bruno. Science in action: how to follow scientists and engineers
through society. Cambridge, Mass.: Harvard University Press, 1987.
LEMKE, J. L. Travels in hypermodality. Visual Communication, v. 1, n. 3, p.
299–325, 1 Oct. 2002. DOI: 10.1177/147035720200100303
LEMKE, Jay L. Resources for attitudinal meaning: Evaluative orientations
in text semantics. Functions of Language, v. 5, n. 1, p. 33–56, 1998.
DOI: 10.1075/fol.5.1.03lem
MAYER-SCHÖNBERGER, Viktor; CUKIER, Kenneth. Big data: a revolution
that will transform how we live, work, and think. Boston: Houghton
Mifflin Harcourt, 2013.
METCALF, Jacob; CRAWFORD, Kate. Where are human subjects in Big
Data research? The emerging ethics divide. Big Data & Society, v. 3, n.
1, 2016. DOI: 10.1177/2053951716650211
O’HALLORAN, Kay L. Systemic functional-multimodal discourse analysis
(SF-MDA): constructing ideational meaning using language and visual
imagery. Visual Communication, v. 7, n. 4, p. 443–475, 2008.
DOI: 10.1177/1470357208096210
PEREIRA, Rafael H. M. Urban Demographics: Brazil Racial Dotmap. 2015.
Available at: <http://urbandemographics.blogspot.com.br/2015/12/
brazil-racial-dotmap.html>. Access: 28 Jan. 2017.
PRENSKY, Marc. From digital natives to digital wisdom: hopeful essays for 21st
century learning. Thousand Oaks, Calif: Corwin, 2012.
SCHROEDER, Ralph. Big Data and the brave new world of social
media research. Big Data & Society, v. 1, n. 2, 2014.
DOI: 10.1177/2053951714563194
SCHIELD, Milo. Information Literacy, Statistical Literacy, Data Literacy.
IASSIST Quarterly, v. 28, p. 6–11, 2004.
TUAN, Yi-Fu. Space and Place: Humanistic Perspective. In: GALE, Stephen;
OLSSON, Gunnar (Orgs.). Philosophy in Geography. Dordrecht: Springer
Netherlands, 1979, p. 387–427.
WALCH, O. J.; COCHRAN, A.; FORGER, D. B. A global quantification of
“normal” sleep schedules using smartphone data. Science Advances, v.
2, n. 5, p. e1501705–e1501705, 2016.
WANG, Lidong; WANG, Guanghui; ALEXANDER, Cheryl Ann. Big Data and
Visualization: Methods, Challenges and Technology Progress. Digital
Technologies, v. 1, n. 1, p. 33–38, 2015. DOI: 10.12691/dt-1-1-7
WEN, Hua. Buying beauty: cosmetic surgery in China. Hong Kong: Hong
Kong Univ. Press, 2013.
ZOSS, Angela. Introduction to Data Visualization: Visualization Types.
Duke University Libraries. Available at:
<http://guides.library.duke.edu/datavis/vis_types>. Access: 28 Jan. 2017.