The Humanities in the Digital: Beyond Critical Digital Humanities
© The Editor(s) (if applicable) and The Author(s) 2023. This book is an open access
publication.
Open Access This book is licensed under the terms of the Creative Commons Attribution
4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits
use, sharing, adaptation, distribution and reproduction in any medium or format, as long as
you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license and indicate if changes were made.
The images or other third party material in this book are included in the book’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the book’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly
from the copyright holder.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are
exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information
in this book are believed to be true and accurate at the date of publication. Neither the
publisher nor the authors or the editors give a warranty, expressed or implied, with respect
to the material contained herein or for any errors or omissions that may have been made.
The publisher remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
This Palgrave Macmillan imprint is published by the registered company Springer Nature
Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To my family
FOREWORD
The pervasiveness of the digital is at times hard to discern in our daily habits. Even the least computationally
savvy scholar regularly makes use of digital resources and activities. Of
course, discrepancies arise across cultures and geographies, and certainly
not all scholarship in every community exists in the same networked
environment. Plenty of ‘traditional’ scholarship continues in direct contact
with physical artefacts of every kind. But much of what has been considered
‘the humanities’ within the long traditions of Western and Asian culture
now functions on foundations of digitised infrastructure for access and
distribution. To a lesser degree, it also functions with the assistance of
computational processing that assists not only search and retrieval, but also
analysis and presentation in quantitative and graphical display.
As we know, familiar technologies tend to become invisible as they
increase in efficiency. The functional device does not, generally, call atten-
tion to itself when it performs its tasks seamlessly. The values of technolog-
ical optimisation privilege these qualities as virtues. Habitual consumers of
streaming content are generally uninterested in a Brechtian experience of
alienation meant to raise awareness of the conditions of viewing within
a cultural-ideological matrix. Such interventions would be tedious and
distracting and only lead to frustration whether part of an entertainment
experience or a scholarly and pedagogical one. Consciousness raising
through aesthetic work has gone the way of the early twentieth-century
avant-garde ‘slap in the face of public taste’ and other tactics to shock the
bourgeoisie. The humanities must stream along with the rest of content
delivered on demand and in consumable form, preferably with special
effects and in small packets of readily digested material. Immersive Van
Gogh and Frida Kahlo exhibits have joined the traditional experience of
looking at painting. The expectations of gallery viewers are now hyped
by theme park standards. The gamification of classrooms and learn-
ing environments caters to attention-distracted participants. Humanities
research struggles in such a context even as knowledge of history, ancient,
indigenous and classical languages and expertise in scholarly methods such
as bibliography and critical editing are increasingly rare.
Meanwhile, debates in digital humanities have become fractious in
recent years with ‘critical’ approaches pitting themselves against earlier
practices characterised as overly positivistic and reductive. Pushback and
counterarguments have split the field without resulting in any substantive
innovation in computational methods, just a shift in rhetoric and claims.
Algorithmic bias is easier to critique than change, even if advocates
assert the need to do so. The question of whether statistical and formal
methods imported from the social sciences can adequately serve humanistic
scholarship remains open and contentious several decades into the use
of now-established practices. Even exuberant early practitioners rarely
promoted digital humanities as a unified or salvific approach to scholarly
transformation, though they sometimes adopted computational techniques
somewhat naively, using programs and platforms in a black box mode,
unaware of their internal workings.
The instrumental aspects of digital infrastructure are only one topic
of current dialogues. From the outset of my own encounter with digital
humanities, sometime in the mid-1990s, what I found intriguing were the
intellectual exigencies asserted as a requirement for working within the
constraints of these formal systems. While all of this has become normative
in the last decades, a quarter of a century ago, the recognition that much
that had been implicit in humanities scholarship had to be made explicit
within a digital context produced a certain frisson of excitement. The
task of rethinking how we thought, of thinking in relation to different
requirements, of learning to imagine algorithmic approaches to research,
to conceptualise our explorations in terms of complex systems, emergent
properties, probabilistic results and other frameworks infused our research
with exciting possibilities.
As I have noted, much of that novelty has become normative—no longer
called self-consciously to attention—just as awareness of the interface is lost
in the familiar GUI screens. Still, new questions did get asked and answered
through automated methods—mainly benefits at scale, the ‘reading’ of
a corpus rather than a work, for instance. Innovative research crosses
material sciences and the humanities, promotes non-invasive archaeology
and supports authentication and attribution studies. Errors and mistakes
abound, of course, and the misuse of platforms and processes is a regular
feature of bad digital humanities—think of all those network diagrams
whose structure is read as semantic even though it is produced to optimise
screen legibility. Misreading and poor scholarship are hardly exclusive to
digital projects even if the bases for claims to authority are structured
differently in practices based on human versus machine interpretation.
Increased sophistication in automated processes (such as named entity
recognition, part of speech parsing, visual feature analysis etc.) continues
to refine results. But the challenge that remains is to learn to think in and
with the technological premises. A digital project is not just an automated
version of a traditional project; it is a project conceived from the outset
in terms that structure the problems and possible outcomes according
to computational conditions. Debates that pit disciplines
against each other, critical methods against technical ones and so on are
distractions from the core issues: How is humanities scholarship to
proceed? What intellectual expertise is required to work with and read into
and through the processes and conditions in which we conceptualise the
research we do?
Digital objects and computational processes have specific qualities that
are distinct from those of analogue ones, and learning to think in those
modes is essential to conceptualising the questions these environments
enable, while recognising their limitations. Thinking as an algorithm, not
just ‘using’ technology, requires a shift of intellectual understanding. But
knowledge of disciplinary domains and traditions remains a crucial part
of the essential expertise of the scholar. Without subject area expertise,
humanities research is pointless whether carried out with digital or tradi-
tional methods. As long as human beings are the main players—agents or
participants—in humanities research, no substitute or surrogate for that
expertise can arise. When that ceases to be the case, these questions and
debates will no longer matter. No sane or ethical humanist would hasten
the arrival of the moment when we cease to engage in human discourse.
No matter how much agency and efficacy are imagined to emerge in the
application of digital methods, or how deeply we may come to love our
robo-pets and AI-bot-assistants, the humanities are still intimately and
ultimately linked to human experience of being in the world. Finally, the
challenge to infuse computational methods with humanistic values, such
as the capacity to tolerate ambiguity and complexity, remains. What, after
all, is a humanistic algorithm, a bias-sensitive digital format or a self-
conscious interface? What interventions in the technology would result in
these transformations?
Lorella Viola has much to say on these matters and works from experi-
ence that combines hands-on engagement with computational methods
and a critical framework that advances insight and understanding. So,
machine and human readers, turn your attention to her text.
PREFACE

The definition presents two communities which are antithetical in their essence and which, despite
intersecting—as it is said a few lines below—remain fundamentally separate.
This description of digital humanities reflects a specific mental model,
that knowledge is made up of competences delimited by established,
disciplinary boundaries. Should there be overlapping spaces, boundaries
do not dissolve, nor do they merge; rather, disciplines further specialise and
create yet new fields.
When the COVID-19 pandemic was at its peak, I was spending my
days in my loft in Luxembourg and most of my activities were digital. I
was working online, keeping contact with my family and friends online,
watching the news online and taking online courses and online fitness
classes. I even took part in an online choir project and in an online pub
quiz. Of course for my friends, family and acquaintances, it was not much
different. As the days became weeks and the weeks became months and
then years, it was quite obvious that it was no longer a matter of having
the digital in our lives; rather, now everyone was in the digital.
This book is titled ‘The Humanities in the Digital’ as an intentional
reference to the LARB’s interview series. With this title, I wanted to mark
how the digital is now integral to society and its functioning, including how
society produces knowledge and culture. The changed word order is meant
to signal conclusively the obsolescence of binary modulations in relation
to the digital which continue to suggest a division, for example, between
digital knowledge production and non-digital knowledge production. Not
only that. It is the argument of this book that dual notions of this kind are
the spectre of a much deeper fracture, that which divides knowledge into
disciplines and disciplines into two areas: the sciences and the humanities.
This rigid conceptualisation of division and competition, I maintain, is
complicit in having promoted a narrative which has paired computational
methods with exactness and neutrality, rigour and authoritativeness whilst
stigmatising consciousness and criticality as carriers of biases, unreliability
and inequality.
This book argues against a compartmentalisation of knowledge and
maintains that division into disciplines is not only unhelpful and conceptually
limiting, but especially after the exponential digital acceleration brought
about by the 2020 COVID-19 pandemic, also incompatible with the
current reality. In the pages that follow, I analyse many of the different
ways in which reality has been transformed by technology—the pervasive
adoption of big data, the fetishisation of algorithms and automation and
the digitisation of education and research—and I argue that the full
digitisation of society, brought to its non-reversible turning point by the
2020 health crisis, urgently requires a re-theorisation of the current model
of knowledge creation.

ACKNOWLEDGEMENTS
Beyond our many meetings, I have also much enjoyed our chats during lunch and coffee
breaks. A very great thank you to Sean Takats, Machteld Venken, Andreas
Musolff, Angela Cunningham, Joris van Eijnatten and Andreas Fickers for
taking the time to comment on earlier versions of this book; thanks for
being my critical readers. And thank you to the C2DH; in the Centre, I
found fantastic colleagues and impressive expertise and resources. I could
not have asked for more.
I am deeply grateful to Jaap Verheul and Joris van Eijnatten for providing
me with invaluable intellectual support and guidance when I was taking my
first steps in digital humanities. You have both always shown respect for my
ideas and for me as a professional and a learner and I will always cherish
the memory of my time at Utrecht University.
My sincere thanks to the Transatlantic research project Oceanic
Exchanges (OcEx) of which I had the privilege to be part. In those years, I
had the opportunity to work with leading historians and computer scientists;
without OcEx, I would not be the scholar that I am today.
Very special thanks to my family, to whom this book is dedicated, for
always listening, supporting me and encouraging me to aim high and never
give up. Your love and care are my light through life; I love you with all my
heart. And a very big thank you to my little nephew Emanuele, who thinks
I must be very cold away from Sicily. Your drawings of us in the Sicilian
sun have indeed warmed many cold, Luxembourgish winter days.
I heartily thank Johanna Drucker for writing the Foreword to this book
and more importantly for producing inspiring and important scholarship.
Thanks also to those who have supported this book project right from
the start, including the anonymous reviewers who have provided insights
and valuable comments, contributing considerably to improving my work.
And finally, thanks to all those who, directly or indirectly, have been
involved in this venture; in sometimes unpredictable ways, you have all
contributed to making this book possible.
CONTENTS
4 How Discrete
6 Conclusion
References
Index
ABOUT THE AUTHOR
ACRONYMS
AI Artificial intelligence
API Application Programming Interface
BoW Bag of words
CAGR Compound annual growth rate
CDH Critical Digital Humanities
CHS Critical Heritage Studies
C2DH Centre for Contemporary and Digital History
DDTM Discourse-driven topic modelling
DeXTER DeepteXTminER
DH Digital humanities
DHA Discourse-historical approach
DHARPA Digital History Advanced Research Projects Accelerator
GNM GeoNewsMiner
GPE Geopolitical entity
GUI Graphical user interface
IR Information retrieval
ITM Interactive topic modelling
LARB Los Angeles Review of Books
LDA Latent Dirichlet Allocation
LDC Least developed countries
LLDC Landlocked developing countries
LOC Locality entity
ML Machine learning
NA Network analysis
NDNP National Digital Newspaper Program
CHAPTER 1

The Humanities in the Digital

The ultimate, hidden truth of the world is that it is something that we make,
and could just as easily make differently. (Graeber 2013)
According to the Gartner Hype Cycle (Fenn and Raskino 2008), only some of these processes are actually
expected to eventually reach the status of 'Plateau of Productivity',
and if there is a cost to adapting slowly, the cost to being wrong is higher.
The 2020 health crisis changed all this. In just a few months’ time, the
COVID-19 pandemic accelerated years of change in the functioning of
society, including the way companies in all sectors operated. In 2020, the
McKinsey Global Institute surveyed 800 executives from a wide variety of
sectors based in the United States, Australia, Canada, China, France, Ger-
many, India, Spain and the United Kingdom (Sua et al. 2020). The report
showed that since the start of the pandemic, companies had accelerated
the digitisation of both their internal and external operations by three to
four years, while the share of digital or digitally enabled products in their
portfolios had advanced by seven years. Crucially, the study also provided
insights into the long-term effects of such changes: companies claimed
that they were now investing in their long-term digital transformations
more than in anything else. According to a BDO report on the digital
transformation brought about by the COVID crisis (Cohron et al. 2020,
2), just as businesses that had developed and implemented digital
strategies prior to the pandemic were in a position to leapfrog their
less digital competitors, organisations that did not adapt their digital
capabilities for the post-coronavirus future would simply be surpassed.
Higher education has also been deeply affected. Before the COVID-19
crisis, higher education institutions tended to view technology not as a
critical component of their success but as one piece of the pedagogical
puzzle, useful both to achieve greater access and as a source of cost
efficiency. For example, many academics had never designed or delivered
a course online, carried out online student supervisions, served as online
examiners or presented at or attended an online conference, let alone
organised one. According to the United Nations
Educational, Scientific and Cultural Organization (UNESCO), at the first
peak of the crisis in April 2020, more than 1.6 billion students around
the world were affected by campus closures (UNESCO 2020). As on-
campus learning was no longer possible, demand for online courses
saw an unprecedented rise. Coursera, for example, experienced a 543%
increase in new course enrolments between mid-March and mid-May
2020 alone (DeVaney et al. 2020). Having to adapt quickly to the virtual
switch—much more quickly than they had considered feasible before the
outbreak—universities and higher education institutions were forced to
implement some kind of temporary digital solution to meet the demands
of remote provision. It is now widely acknowledged that the COVID-19
pandemic has marked a clear turning point of historic proportions for
technology adoption, one in which the paradigm shift towards
digitisation has been sharply accelerated.
Yet if during the health crisis companies and universities were forced
to adopt similar digitisation strategies, almost three years after the start
of the pandemic things between the two sectors look different
again. To succeed and adapt to the demands of the new digital market,
companies understood that in addition to investing massively in their
digital infrastructures, they crucially also had to create new business models
that replaced the existing ones which had simply become inadequate
to respond to the rules dictated by new generations of customers and
technologies. The digital transformation has therefore required a deeper
transformation in the way businesses structured their organisations,
thought about market challenges and approached problem-solving (Morze
and Strutynska 2021). In contrast, it appears that higher education has
returned to looking at technology as a means for incremental change, once
again as a way to enhance learning approaches or for cost reduction
purposes, but its disruptive and truly revolutionary impact continues to be
poorly understood and on the whole under-theorised (Branch et al. 2020;
Alenezi 2021). For instance, although universities and research institutes
have to various degrees digitised pedagogical approaches, added digital
skills to their curricula and favoured the use and development of digital
methods and tools for research and teaching, technology is still treated
as something contextual, something that happens alongside knowledge
creation.
Knowledge creation, however, happens in society. And while society
has been radically transformed by technology, which has in turn trans-
formed culture and the way society creates it, universities continue to adopt
an evolutionary approach to the digital (Alenezi 2021): more or less
gradual adjustments are made to incorporate it but the existing model of
knowledge creation is left essentially intact. The argument that I advance
in this book is on the contrary that digitisation has involved a much greater
change, a more fundamental shift for knowledge creation than the current
model of knowledge production accommodates. This shift, I claim, has
in fact been in—as opposed to towards—the digital. As societies are in
the digital, one profound consequence of this shift is that research and
knowledge are also in turn inevitably mediated by the digital to various
degrees. As a bare minimum, for example, regardless of the discipline,
a post-COVID researcher is someone able to embrace a broad set of
digital tools effectively. Yet what this entails in terms of how knowledge
production is now accordingly lived, reimagined, conceptualised, managed
and shared has not yet been adequately explored, let alone formally
addressed. In relation to knowledge creation, what I therefore argue for
is a revolutionary rather than an evolutionary approach to the digital.
Whereas an evolutionary approach to the digital extends the existing
model of knowledge creation to incorporate the digital in some form of
supporting role, a revolutionary approach calls for a different model which
entirely reconceptualises the digital and how it affects the very practices
of knowledge production. Indeed, claiming that the shift has been in the
digital acknowledges conclusively that the digital is now integral to not
only society and its functioning, but crucially also to how society produces
knowledge and culture.
Crucially, such a different model of knowledge production must break
with the obsolescence of persisting binary modulations in relation to the
digital—for example between digital knowledge creation and non-digital
knowledge creation—in that they continue to suggest artificial divisions. It
is the argument of this book that dual notions of this kind are the spectre of
a much deeper fracture, that which divides knowledge into disciplines and
disciplines into two areas: the sciences and the humanities. Significantly, a
consequence of the shift in the digital is that reality has been complexified
rather than simplified. Many of the multiple levels of complexity that
the digital brings to reality are so convoluted and unpredictable that
the traditional model of knowledge creation based on single discipline
perspectives and divisions is not only unhelpful and conceptually limiting,
but especially after the exponential digital acceleration brought about by
the 2020 COVID-19 pandemic, also incompatible with the current reality
and no longer suited to understand and explain the ramifications of this
unpredictability.
In arguing against a compartmentalisation of knowledge which essen-
tially disconnects rather than connects expertise (Stehr and Weingart
2000), I maintain that the insistent rigid conceptualisation of division
and competition is complicit in having promoted a narrative which has
paired computational methods with exactness and neutrality, rigour and
authoritativeness whilst stigmatising consciousness and criticality as carriers
of biases, unreliability and inequality. The book is therefore primarily a
reflection on the separation of knowledge into disciplines and of disciplines
into the sciences vs the humanities and discusses its contemporary relevance
and adequacy in relation to the ubiquitous impact of digital technologies
on society and culture. In the pages that follow, I analyse many of
the different ways in which reality has been transformed by technology—
the different ways in which reality has been transformed by technology—
the pervasive adoption of big data, the fetishisation of algorithms and
automation, the digitisation of education and research and the illusory,
yet believed, promise of objectivism—and I argue that the full digitisation
of society, already well on its way before the COVID-19 pandemic but
certainly brought to its non-reversible turning point by the 2020 health
crisis, has added even further complexity to reality, exacerbating existing
fractures and disparities and posing new complex questions that urgently
require a re-theorisation of the current model of knowledge creation in
order to be tackled.
In advocating for a new model of knowledge production, the book
firmly opposes notions of divisions, particularly a division of knowledge
into monolithic disciplines. I contend that the recent events have brought
into sharper focus how understanding knowledge in terms of discipline
compartmentalisation is anachronistic and not equipped to encapsulate
and explain society. The pandemic has ultimately called for a reconcep-
tualisation of knowledge creation and practices which now must operate
beyond outdated models of separation. In moving beyond the current rigid
framework within which knowledge production still operates, I introduce
different concepts and definitions in reference to the digital, digital objects
and practices of knowledge production in the digital, which break with
dialectical principles of dualism and antagonism, including dichotomous
notions of digital vs non-digital positions.
This book focuses on the humanities, the area of academic knowledge
that has already undergone radical transformation by the digital in the
last two decades. I start by retracing schisms in the field between the
humanities, the digital humanities (DH) and critical digital humanities
(CDH); these are embedded, I argue, within the old dichotomy of
sciences vs humanities and the persistent physics envy in our society and
by extension, in research and academic knowledge. I especially challenge
existing notions such as that of ‘mainstream humanities’ that characterise
it as a field that is seemingly non-digital but critical. I maintain that in the
current landscape, conceptualisations of this kind read more as a
nostalgic invocation of a reality that no longer exists, perhaps as an attempt
to reconstruct the core identity of a pre-digital scholar who now more than
ever feels directly threatened by an aggressive other: the digital. Equally
irrelevant and unhelpful, I argue, is a further division of the humanities into
DH and CDH. In pursuing this argumentation, I examine how, on the one
hand, scholars arguing in favour of CDH claim that the distinction between
digital and analogue is pointless; therefore, humanists must embrace the
digital critically; on the other hand, by creating a new field, i.e., CDH,
they fall into the trap of factually perpetuating the very separation between
digital and critical that they define as no longer relevant.
In pursuing my case for a novel model of knowledge creation in the
digital, throughout the book, I analyse personal use cases; specifically, I
examine how I have addressed in my own work issues in digital practice
such as transparency, documentation and reproducibility, questions about
reliability, authenticity and biases, and engaging with sources through
technology. Across the various examples presented in the following chap-
ters, this book demonstrates how a re-examination of digital knowledge
creation can no longer be achieved from a distance, but only from the
inside, that the digital is no longer contextual to knowledge creation but
that knowledge is created in the digital. This auto-ethnographic and self-
reflexive approach allows me to show how my practice as a humanist in the
digital has evolved over time and through the development of different
digital projects. My intention is not to simply confront algorithms as
instruments of automation but to unpack ‘the cultural forms emerging in
their shadows’ (Gillespie 2014, 168). Expanding on critical posthumanities
theories (Braidotti 2017; Braidotti and Fuller 2019), to this aim I then
develop a new framework for digital knowledge creation practices—the
post-authentic framework (cf. Chap. 2)—which critiques current positivis-
tic and deterministic views and offers new concepts and methods to be
applied to digital objects and to knowledge creation in the digital.
A little less than a decade ago, Berry and Dieter (2015) claimed that
we were rapidly entering a world in which it was increasingly difficult
to find culture outside digital media. The major premise of this book is
that especially after COVID-19, all information is now digital and even
more, algorithms have become central nodes of knowledge and culture
production with an increased capacity to shape society at large. I therefore
maintain that universities and higher education institutions can no longer
afford to consider the digital as something that is happening to knowledge
creation. It is time to recognise that knowledge creation is happening in
the digital. As digital vs non-digital positions have entirely lost relevance,
we must recognise that the current model of knowledge grounded in
rigid divisions is at best irrelevant and unhelpful and at worst artificial and
harmful. Scholars, researchers, universities and institutions therefore have
a central role to play in assessing how digital knowledge is created not
just today, but also for the purpose of future generations, and clear
responsibilities to shoulder, those that come from being in the digital.
Big data analytics has also been incorporated into the banking sector
for tasks such as improving the accuracy of risk models used by banks and
financial institutions. In credit management, banks use big data to detect
fraud signals or to understand customer behaviour from the analysis
of investment patterns, shopping trends, motivation to invest and personal
or financial background. According to recent predictions, the market of
big data analytics in banking could rise to $62.10 billion by 2025 (Flynn
2020). Ever larger and more complex data-sets are also used for law and
order policy (e.g., predictive policing), for mapping user behaviour (e.g.,
social media), for recording speech (e.g., Alexa, Google Assistant, Siri) and
for collecting and measuring the individual’s physiological data, such as
their heart rate, sleep patterns, blood pressure or skin conductance. And
these are just a few examples.
More data and therefore more accuracy and freedom from subjectivity
were also promised to research. Disciplines across scientific domains have
increasingly incorporated technology within their traditional workflows
and developed advanced data-driven approaches to analyse ever larger and
more complex data-sets. In the spirit of breaking with old patterns of opaque
practices, it is the humanities, however, that has arguably been impacted
the most by this explosion of data. Thanks to the endless flow of searchable
material provided by the Digital Turn, humanists could now finally move
beyond the fully hermeneutical tradition, believed to perpetuate discrimination and
biases.
This looked like ‘that noble dream’ (Novick 1988). Millions of records
of sources seemed to be just a click away. Any humanist scholar with a
laptop and an Internet connection could potentially access them, explore
them and analyse them. Even more revolutionary was the possibility of
finally being able to draw conclusions from objective evidence and so dismiss
all accusations that the humanities was a field of obscure, non-replicable
methods. Through large quantities of ‘data’, humanists could now under-
stand the past more wholly, draw more rigorous comparisons with the
present and even predict the future. This ‘DH moment’, as it was called,
was perfectly in line with a more global trend for which data was (and to a
large extent still is) presumed to be accurate and unbiased, therefore more
reliable and ultimately, fairer (Christin 2016). The ‘DH promise’ (Thomas
2014; Moretti 2016) was a promise of freedom, freedom from subjectivity,
from unreliability, but more importantly from the supposed irrelevance of
the humanities in a data-driven world. It was also soaked in positivistic
hype about the endless opportunities of data-driven research methods.
But it is not just tech giants and academic research that jumped on the
suspicious big data and AI bandwagon; governments around the world
have also been exploiting this technology for matters of governance, law
enforcement and surveillance, such as blacklisting and the so-called pre-
dictive policing, a data-driven analytics method used by law enforcement
departments to predict perpetrators, victims or locations of future crimes.
Predictive policing software analyses large sets of historic and current crime
data using machine learning (ML) algorithms to determine where and
when to deploy police (i.e., place-based predictive policing) or to identify
individuals who are allegedly more likely to commit or be a victim of a crime
(i.e., person-based predictive policing). While supporters of predictive
policing argue that these systems help predict future crimes more accurately
and objectively than police’s traditional methods, critics complain about
the lack of transparency in how these systems actually work and are used
and warn about the dangers of blindly trusting the supposed rigour of this
technology. For example, in June 2020, Santa Cruz, California—one of
the first US cities to pilot this technology in 2011—was also the first city
in the United States to ban its municipal use. After nine years, the city of
Santa Cruz decided to discontinue the programme over concerns of how
it perpetuated racial inequality. The argument is that, as the data-sets used
by these systems include only reported crimes, the obtained predictions are
deeply flawed and biased and result in what could be seen as a self-fulfilling
prophecy. In this respect, Matthew Guariglia maintains that ‘predictive
policing is tailor-made to further victimize communities that are already
overpoliced—namely, communities of colour, unhoused individuals, and
immigrants—by using the cloak of scientific legitimacy and the supposed
unbiased nature of data’ (Guariglia 2020). Despite other examples of
predictive policing programmes being discontinued following audits and
lawsuits, at the moment of writing, more than 150 cities in the United
States have adopted predictive policing (Electronic Frontier Foundation
2021). Outside of the United States, China, Denmark, Germany, India,
the Netherlands and the United Kingdom are also reported to have tested
or deployed predictive policing tools.
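The self-fulfilling dynamic that critics describe can be made concrete with a toy simulation. The sketch below, in Python, uses entirely invented numbers, district names and allocation rules; it models no deployed system, but it shows how sending patrols where the *record* of crime is highest can make two districts with identical underlying crime rates diverge in that record.

```python
# A toy simulation of the feedback loop critics describe: all numbers,
# names and rules are invented for illustration only.
import random

random.seed(42)

# Two districts with the SAME underlying crime rate; district A simply
# starts with more *reported* crime because it is already more policed.
true_rate = {"A": 0.10, "B": 0.10}
reported = {"A": 60, "B": 40}   # the historic, biased record
patrols = {"A": 0, "B": 0}

for week in range(52):
    total = sum(reported.values())
    # 'Place-based' allocation: patrols follow the reported record.
    for d in patrols:
        patrols[d] = round(10 * reported[d] / total)
    # Recorded crime depends on patrol presence: more patrols, more
    # detections, even though the underlying rate never changes.
    for d in reported:
        incidents = sum(random.random() < true_rate[d] for _ in range(100))
        detection = min(1.0, 0.3 + 0.07 * patrols[d])
        reported[d] += round(incidents * detection)

print(reported, patrols)
# District A ends with far more recorded crime and far more patrols
# than B, although both districts were identical: the record, not
# reality, drives the allocation.
```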
The problem with predictive policing has little to do with intentionality
and a lot to do with the limits of computation. Computer algorithms are a
finite list of instructions designed to perform a computational task in order
to produce a result, i.e., an output of some kind. Each task is therefore
performed based on a series of instructed assumptions which, far from
being unbiased, are not only obfuscated by the complexity of the algorithm
but also invisible to those affected by its outputs. As Boyd (2018) explains:
[…] if they [computers] are fed a pile of data and asked to identify
correlations in that data, they will return an answer dependent solely on
the data they know and the mathematical definition of correlation that they
are given. Computers do not know if the data they receive is wrong, biased,
incomplete, or misleading. They do not know if the algorithm they are told to
use has flaws. They simply produce the output they are designed to produce
based on the inputs they are given.
Boyd gives the example of a traffic violation: a red light run by someone
who is drunk vs by someone who is experiencing a medical emergency. If
the latter scenario is not embedded into the model as a specific exception,
then the algorithm will categorise both events as the same traffic violation.
The crucial difference in decision-making processes between humans and
algorithms is that humans are able to make a judgement based on a
combination of factors such as regulations, use cases, guidelines and,
fundamentally, environmental and contextual factors, whereas algorithms
still have a hard time mimicking the nature of human understanding.
Human understanding is fluid and circular, whilst algorithms are linear
and rigid. Furthermore, the data-sets on which computational decision-
making models are based are inevitably biased, incomplete and far from
being accurate because they stem from the very same unequal, racist, sexist
and biased systems and procedures that the introduction of computational
decision-making was intended to prevent in the first place.
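Boyd's example can be restated as a toy rule-based classifier. In the Python sketch below, with invented rules and category labels, the naive model collapses the two events into a single category precisely because the contextual exception was never encoded.

```python
# A toy rule-based classifier illustrating Boyd's example; the rules
# and category names are invented for illustration only.
from dataclasses import dataclass

@dataclass
class Event:
    ran_red_light: bool
    medical_emergency: bool  # contextual factor the model may ignore

def classify_naive(e: Event) -> str:
    # The model 'knows' only what it was instructed to check.
    return "traffic_violation" if e.ran_red_light else "no_violation"

def classify_with_exception(e: Event) -> str:
    # The emergency counts only because someone thought to encode it.
    if e.ran_red_light and e.medical_emergency:
        return "justified_emergency"
    return classify_naive(e)

drunk_driver = Event(ran_red_light=True, medical_emergency=False)
emergency = Event(ran_red_light=True, medical_emergency=True)

print(classify_naive(drunk_driver), classify_naive(emergency))
# -> traffic_violation traffic_violation  (the two events collapse together)
print(classify_with_exception(drunk_driver), classify_with_exception(emergency))
# -> traffic_violation justified_emergency
```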
Moreover, systems become increasingly complex and what might be
perceived as one algorithm may in fact be many. Indeed, some systems
can reach a level of complexity so deep that understanding the intricacies
and processes according to which the algorithms perform the assigned tasks
becomes problematic at best, if at all possible (Gillespie 2014). Although
this may not always have serious consequences, it is nevertheless worthy of
close scrutiny, especially because today complex ML algorithms are used
extensively, and more and more in systems that operate fundamental social
functions such as the already cited healthcare and law and order, but as a
matter of fact they are still ‘poorly understood and under-theorized’ (Boyd
2018). Despite the fact that they are assumed to be, and often advertised
as being, neutral, fair and accurate, each algorithm within these complex
systems is in fact built according to a set of assumptions and cultural values
that reflect the strategic choices made by their creators according to specific
logics, be these corporate or institutional.
Another largely distorted view surrounding digital and algorithmic
discourse concerns data. Although algorithms and data are often thought
to be two distinct entities independent from each other, they are in
fact two sides of the same coin. Indeed, to fully understand how an
algorithm operates the way it does, one needs to look at it in combination
with the data it uses, better yet at how the data must be prepared for
the algorithm to function (Gillespie 2014). This is because in order for
algorithms to work properly, that is, automatically, information needs to be
rendered into data, e.g., formalised according to categories that will define
the database records. This act of categorising is precisely where human
intervention hides. Gillespie pointedly remarks that far from being a neutral
and unbiased operation, categorisation is in fact 'a powerful
semantic and political intervention' (Gillespie 2014, 171): deciding what
the categories are, what belongs in a category and what does not are
all powerful worldview assertions. Database design can therefore have
potentially enormous sociological implications which to date have largely
been overlooked (ibid.).
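To make the point concrete, the sketch below shows a hypothetical schema and keyword rule, written in Python; nothing here reproduces any real platform's system. The single categorisation step quietly decides what a product *is*, and therefore what may be done with it, anticipating the case discussed next.

```python
# A hypothetical classification schema; category names and keyword
# rules are invented and do not reproduce any real platform's system.
ALLOWED = {"fashion_apparel", "home_goods"}   # categories eligible for ads

def categorise(description: str) -> str:
    # The schema has no category for adaptive fashion, so a keyword
    # rule decides for it: this is where the worldview assertion happens.
    medical_markers = ("wheelchair", "adaptive", "prosthetic")
    if any(word in description.lower() for word in medical_markers):
        return "medical_devices"
    return "fashion_apparel"

def ad_approved(description: str) -> bool:
    return categorise(description) in ALLOWED

print(ad_approved("evening dress"))                       # True
print(ad_approved("adaptive evening dress, seated fit"))  # False: filed as 'medical'
```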
A recent example of the larger repercussions of these powerful worldview
assertions concerns fashion companies catering to people with disabilities,
whose requests to advertise on Facebook have been systematically
rejected by Facebook's automated advertising centre. Again, the reason
for the rejection is unlikely to have anything to do with intentionally
discriminating against people with disabilities, but it is to be found in
the way fashion products for people with disabilities are identified (or
rather misidentified) by Facebook algorithms that determine products’
compliance with Facebook policy. Specifically, these items were categorised
as ‘medical and health care products and services including medical devices’
and as such, they violated Facebook’s commercial policy (Friedman 2021).
Although these companies had their ads approved after appealing Face-
book's decision, episodes like this one reveal not only the deep cracks in
ML models, but worse, the strong biases in society at large. To paraphrase
Kate Crawford, every classification system in machine learning contains
a worldview (Crawford 2021). In this particular case, the implicit bias is
that products designed for people with disabilities belong to the medical
sphere rather than to fashion.
‘undisciplined.’ Thus, in the process of research, new and ever finer structures
are constantly created as a result of this behaviour. This is (exceptions
notwithstanding) the very essence of the innovation process, but it takes
place primarily within disciplines, and it is judged by disciplinary criteria of
validation. (Weingart 2000, 26–27)
The author argues that starting from the early nineteenth century when
the separation and specialisation of science into different disciplines was
created, interdisciplinarity became a promise, the promise of the unity of
science which would in the future be actualised by reducing the
fragmentation into disciplines. Today, however, interdisciplinarity seems to
have lost interest in that promise, as the discourse has shifted from the idea
of ultimate unity to that of innovation through a combination of variations
(ibid., 41). For example, in his essay Becoming Interdisciplinary, McCarty
(2015) draws a close parallel between the struggle of dealing with the post-
World War II overwhelming amount of available research that inspired
Vannevar Bush’s Memex and the situation of contemporary researchers.
Bush (1945) maintained that the investigator could not find time to deal
with the increasing amount of research, which had grown far beyond
anyone's ability to make real use of the record. The difficulty was, in his
view, that if on the one hand ‘specialization becomes increasingly necessary
for progress’, on the other hand, ‘the effort to bridge between disciplines
is correspondingly superficial.’ The keyword on which we should focus our
attention, McCarty argues, is superficial (2015, 73). Becoming genuinely
interdisciplinary demands depth of engagement across fields (Deegan
and McCarty 2011). This would also include a deep understanding
of disciplines and approaches other than one’s own (Gooding 2020).
Indeed, the contemporary notion of interdisciplinarity based on the idea
that innovation is better achieved by recombining ‘bits of knowledge
from previously different fields’ into novel fields is bound to create more
specialisation and therefore new boundaries (Weingart 2000, 40).
The schism of the humanities between ‘mainstream humanities’ and
digital humanities, and later between digital humanities and critical digital
humanities, perfectly illustrates the issue. In 2012, Alan Liu wrote a
provocative essay titled Where Is Cultural Criticism in the Digital Humanities? (Liu 2012).
The essay was essentially a plea for DH to embrace a wider engagement
with the societal impact of technology. It was very much the author’s hope
that the plea would help to convert this ‘deficit’ into ‘an opportunity’,
the opportunity being for DH to gain a long overdue full leadership,
as opposed to a ‘servant’ role within the humanities. In other words, if
DH wanted to finally become recognised as a legitimate partner of
'mainstream humanities', it needed to incorporate cultural criticism into
its practices and stop pushing buttons without reflecting on the power
of technology.
In the aftermath of Liu’s essay, reactions varied greatly with views
ranging from even harsher accusations towards DH to more optimistic
perspectives, and some also offering fully programmatic and epistemolog-
ical reflections. Some scholars, for example, voiced strong concerns about
the wider ramifications of the lack of cultural critique in DH, what has
often been referred to as ‘the dark side of the digital humanities’ (Grusin
2014; Chun et al. 2016), the association of DH with the ‘corporatist
restructuring of the humanities’ (Weiskott 2017), neoliberalism (Allington
et al. 2016), and white, middle-class, male dominance (Bianco 2012). Two
controversial essays in particular, one published in 2016 by Allington et
al. (op. cit.) and the other a year later by Brennan (2017) argued that,
in a little over a decade, the myopic focus of DH on neoliberal tooling
and distant reading had accomplished nothing but consistently pushing
aside what has always been the primary locus of humanities investigation:
intellectual practice.
This view was also echoed by Grimshaw (2018) who indicted DH
for going to bed with digital capitalism, ‘an online culture that is anti-
diversity and enriching a tiny group of predominantly young white men’
(2). Unlike Weiskott (2017), however, who argued that 'there is no such
thing as “the digital humanities”', meaning that DH is merely an
opportunistic investment and a marketing ploy that does not really alter the
core of the humanities, Grimshaw maintained that this kind of pandering
causes rot at the heart of humanistic knowledge and practice. This
he calls ‘functionalist DH’, the use of tools to produce information in
line with managerial metrics but with no significant knowledge value (6).
Grimshaw strongly criticises DH for having betrayed the promise of
being a new discipline of emancipation and for being in fact 'nothing
more than a tool for oppression’. The digital transformation of society,
he continues, has resulted in increased inequality, wider economic gap,
an upsurge in monopolies and surveillance, lack of transparency of big
data, mobbing, trolling, online hate speech and misogyny. Rather than
resisting it, DH is guilty of having embraced such culture, of operating
within the framework of lucrative tech deals which perpetuate and reinforce
the neoliberal establishment. Digital humanists are establishment curators
and no longer capable of critical thought; DH is therefore totally unequipped
to rethink and criticise digital capitalism. Although he acknowledges the
emergence of critical voices within DH, he also strongly advocates a more
radical approach which would then justify the need for a ‘new’ field,
an additional space within the university where critique, opposition and
resistance can happen (7). This space of resistance and critical engagement
with digital capitalism is, he proposes, critical digital humanities (CDH).
Over the years, other authors such as Hitchcock (2013), Berry (2014)
and Dobson (2019) have also advocated critical engagement with the
digital as the epistemological imperative for digital humanists and have
identified CDH as the proper locus for such engagement to take place. For
example, according to Hitchcock, humanists who use digital technology
must ‘confront the digital’, meaning that they must reflect on the contex-
tual theoretical and philosophical aspects of the digital. For Berry, CDH
practice would allow digital humanists to explore the relationship between
critical theory and the digital and it would be both research- and practice-
led. Equally for Dobson, digital humanists must endlessly question the
cultural dimension and historical determination of the technical processes
behind digital operations and tools. With perhaps the sole exception of
Grimshaw (op. cit.) who is not interested in practice-led digital enquiry,
the general consensus is on the urgency of conducting critically engaged
digital work, that is, drawing from the very essence of the humanities, its
intrinsic capacity to critique.
However, whilst these proposed methodologies do not differ dramati-
cally across authors, there seems to be disagreement about the scope of the
enquiry itself. In other words, the open question around CDH would not
concern so much the how (nor the why) but the what for. For example,
Dobson (2019) is not interested in a critical engagement with the digital
that aims to validate results; this would be a pointless exercise as the
distinction between the subjectivity of an interpretative method and the
objectivity of both data and computational methods is illusory. He claims
(ibid., 46):
…
there is no such thing as contextless quantitative data. […] Data are imag-
ined, collected, and then typically segmented. […] We should doubt any
attempt to claim objectivity based on the notion of bypassed subjectivity
because human subjectivity lurks within all data. This is because data do
not merely exist in the world, but are abstractions imagined and generated
by humans. Not only that, but there always remain some criteria informing
the selection of any quantity of data. This act of selection, the drawing of
boundaries that names certain objects a data-set introduces the taint of the
human and subjectivity into supposedly raw, untouched data.
other words, by ‘fixing’ the lack of critical engagement of the field, the
function of CDH would be to strengthen DH, thus markedly diverging
from Dobson.
Albeit from different epistemological points of view, these reflections
share similar methodological and ethical concerns and question the lack of
critical engagement of DH, be they historical, cultural or political. I argue
however that this reasoning exposes at least three inconsistencies. Firstly,
in earlier perspectives (e.g., Liu 2012), the sciences are deemed to be
obviously superior to the humanities and yet, as soon as the computational
is incorporated into the field, the value of the humanities seems to have
decreased rather than increased. For example, Bianco (2012) advocates
a change in the way digital humanists ‘legitimise’ and ‘institutionalise’
the adoption of computational practices in the humanities. Such change
would require not simply defending the legitimacy or advocating the
‘obvious’ supremacy of computational practices but by reinvesting in the
word humanities in DH. The supremacy of the digital would then be
understood as a combination of superiority, dominance and relevance that
computational practices—and by extension, the hard sciences (i.e., physics
envy)—are believed to have over the humanities. However, as Grimshaw
(2018) also argued later, in the process of incorporating the computational
into their practices, the humanities forgot all about questions of power,
domination, myth and exploitation and have become less and less like the
humanities and more and more like a field of execute button pushers.
Despite acknowledging the illusoriness of bypassed subjectivity, this view shows how
deeply rooted in the collective unconscious is the myth surrounding
technology and science which firmly positions them as detached from
human agency and distinctly separated from the humanities.
Secondly and following from the first point, these views all share a
persistent dualistic, opposing notion of knowledge, which in one form or
another, under the disguise of either freshly coined or well-seasoned terms,
continue to reflect what Snow famously called ‘the two cultures’ of the
humanities and the sciences (2013). Such separation is typically verbalised
in competing concepts such as subjectivity vs objectivity, interpretative vs
analytical and critical vs digital. Despite using terms that would suggest
union (e.g., ‘incorporated’), the two cultures remain therefore clearly
divided. The conceptualisation of knowledge creation which continues to
compartmentalise fields and disciplines, I argue, is also reflected in the clear
division between the humanities, DH and CDH. This model, I contend, is
highly problematic because besides promoting intense schism, it inevitably
multiplies boundaries. The way forward, I argue, is not through
the creation of niche fields, let alone exclusively within the humanities, but
through a reconceptualisation of knowledge creation itself.
The post-authentic framework that I propose in this book moves beyond
the existing breakdown of disciplines which I see as not only unhelpful
and conceptually limiting but also harmful. The main argument of this
book is that it is no longer solely the question of how the digital affects
the humanities but how knowledge creation more broadly happens in
the digital. Thinking in terms of yet another field (e.g., CDH) where
supposedly computational science and critical enquiry would meet in this
or that modulation, for this or that goal, still reiterates the same bound-
aries that hinder that enquiry. Similarly, claiming that DH scholarship
conducts digital enquiry suggests that humanities scholarship does not
happen in the digital and therefore it continually reproduces the outmoded
distinction between digital and analogue as well as the dichotomy between
digital/non-critical and non-digital/critical. Conversely, calls for a CDH
presuppose that DH is never critical (or worse, that it cannot be critical
at all) and that the humanities can (should?) continue to defer their
appointment with the digital, and disregard any matter of concern that
has to do with it, ultimately implying that to remain unconcerned by the
digital is still possible.
But the digital affects us all, including (perhaps especially) those who
do not have access to it. The digital transformation exacerbates the already
existing inequalities in society as those who are the most vulnerable such
as migrants, refugees, internally displaced persons, older persons, young
people, children, women, persons with disabilities, rural populations and
indigenous peoples are disproportionately affected by the lack of digital
access. The digital lens provided by the 2020 pandemic has therefore
magnified the inequality and unfairness that are deeply rooted in our
societies. In this respect, for example, on 18 July 2020, UN Secretary-
General Antonio Guterres declared (United Nations 2020a):
The structures are by no means fixed and irreplaceable, but they are social
constructs, products of long and complex social interactions, subject to social
processes that involve vested interests, argumentation, modes of conviction,
and differential perceptions and communications.
In this view, areas of knowledge do not compete against each other but benefit from a
mutually compensating relationship. Building on the notion of monism in
posthuman theory (Braidotti and Fuller 2019, 16) (cf. Sect. 1.2) in which
differences are not denied but which at the same time do not function
hierarchically, symbiosis and mutualism help refigure our understanding
of knowledge creation not as a space of conflict and competition but as a
space of fluid interactions in which differences are understood as mutually
enriching.
Symbiosis and mutualism are central concepts of the post-authentic
framework that I propose in this book, a theoretical framework for knowl-
edge creation in the digital. If collaboration across areas of knowledge has
so far been largely an option, often motivated more by a grant-seeking
logic than by genuine curiosity, the digital calls for an actual change in
knowledge culture. The question we should ask ourselves is not ‘How can
we collaborate?’ but ‘How can we contribute to each other?’. Concepts such
as those of symbiosis and mutualism could equally inform our answer when
asking ourselves the question ‘How do we want to create knowledge and
how do we want to train our next generation of students?’.
To answer this question, the post-authentic framework starts by recon-
ceptualising digital objects as much more complex entities than just
collections of data points. Digital objects are understood as the conflation
of humans, entities and processes connected to each other according to
the various forms of power embedded in computational processes and
beyond and which therefore bear consequences (Cameron 2021). As
such, digital objects transcend traditional questions of authenticity because
digital objects are never finished, nor can they be finished. Countless
versions can continuously be created through processes that are shaped
by past actions and in turn shape the following ones. Thus, in the post-
authentic framework, the emphasis is on both products and processes
which are acknowledged as never neutral and as incorporating external,
situated systems of interpretation and management. Specifically, I take
digitised cultural heritage material as an illustrative case of a digital object
and I demonstrate how the post-authentic framework can be applied to
knowledge creation in the digital. Throughout the chapters of this book,
I devote specific attention to four key aspects of knowledge creation in
the digital: creation of digital material in Chap. 2, enrichment of digital
material in Chap. 3, analysis of digital material in Chap. 4, and visualisation
of digital material in Chap. 5.
NOTES
1. The Gartner Hype Cycle is a model that describes a generally applicable
path a technology takes in terms of expectations. It states that the initial,
overly positive reception is followed by a 'Trough of Disillusionment'
during which the hype collapses due to disappointed expectations.
Some technologies then manage to climb the 'Slope of Enlightenment'
and eventually plateau at a status of steady productivity.
2. This is not to be mistaken for Amazon Mechanical Turk, a crowdsourcing website that facilitates the remote hiring of ‘crowdworkers’ to perform on-demand tasks that cannot be handled by computers. It is operated under Amazon Web Services and is owned by Amazon.
3. Compound annual growth rate.
4. https://github.com/lorellav/DeXTER-DeepTextMiner.
CHAPTER 2
The Importance of Being Digital
different disciplines can happen (Fickers and Heijden 2020). Within this
institutional and conceptual framework, I conceived DeXTER as a post-
authentic research activity to critically assess and implement different
state-of-the-art natural language processing (NLP) and deep learning
techniques for the curation and visualisation of digital heritage material.
DeXTER’s ultimate goal was to bring the utilised techniques into as close
an alignment as possible with the principle of human agency (cfr. Chap. 3).
The larger ecosystem of the ChroniclItaly collections thus exemplifies
the evolving nature of digital objects and how international and national
processes interweave with wider external factors, all impacting differen-
tially on the objects’ evolution. The existence of multiple versions of
ChroniclItaly, for example, is in itself a reflection of the incompleteness
of the Chronicling America project to which titles, issues and digitised
material are continually added. ChroniclItaly and ChroniclItaly 2.0 include
seven titles and issues from 1898 to 1920 that portray the chronicles of
Italian immigrant communities from four states (California, Pennsylvania,
Vermont, and West Virginia); ChroniclItaly 3.0 expands the two previous
versions by including three additional titles published in Connecticut and
pushing the overall time span to cover from 1898 to 1936. In terms of issues, ChroniclItaly 3.0 almost doubles the number of included pages compared to its predecessors: 8653 vs the 4810 of the previous versions. This is
a clear example of how the formation of a digital object is impacted by the
surrounding digital infrastructure, which in turn is dependent on funding
availability and whose very constitution is shaped by the various research
projects and the involved actors in its making.
These impressive figures on the whole may point to the influential role of the Italian-language press not just for the immigrant community but within
the wider American context, too. At a time when the mass migrations were
causing a redefinition of social and racial categories, notions of race, civilisa-
tion, superiority and skin colour had polarised into the binary opposition of
white/superior vs non-white/inferior (Jacobson 1998; Vellon 2017; Viola
and Verheul 2019a). The whiteness category, however, was rather complex
and not at all based exclusively on skin colour. Jacobson (1998) for instance
describes it as ‘a system of “difference” by which one might be both white
and racially distinct from other whites’ (ibid., p. 6). Indeed, during the
period covered by the ChroniclItaly collections, immigrants were granted ‘white’ privileges depending not on how white their skin might have been, but rather on how white they were perceived (Foley 1997). Immigrants in
the United States who were experiencing this uncertain social identity
situation have been described as ‘conditionally white’ (Brodkin 1998),
‘situationally white’ (Roediger 2005) and ‘inbetweeners’ (among others
Barrett and Roediger 1997; Guglielmo and Salerno 2003; Guglielmo
2004; Orsi 2010).
This was precisely the complicated identity and social status of Italians,
especially of those coming from Southern Italy; because of their challeng-
ing economic and social conditions and their darker skin, both other ethnic
groups and Americans considered them socially and racially inferior and
often discriminated against them (LaGumina 1999; Luconi 2003). For
example, Italian immigrants would often be excluded from employment and housing opportunities and fall victim to social discrimination, exploitation, physical violence and even lynching (LaGumina 1999; Connell and
Gardaphé 2010; Vellon 2010; LaGumina 2018; Connell and Pugliese
2018). The social and historical importance of Italian immigrant newspapers lies in how they advocated for the rights of the community they represented, crucially acting as powerful forces of inclusion, community building and national identity preservation, as well as tools of language and cultural retention. At the same time, because this advocacy role was often paired with the condemnation of American discriminatory practices, these newspapers also played a decisive role in transforming American society at large, undoubtedly contributing to the tangible shaping of the country.
The immigrant press and the ChroniclItaly collections can therefore be
an extremely valuable source to investigate specifically how the internal
mechanisms of cohesion, class struggle and identity construction of the
Italian immigrant community contributed to transforming America.
Lastly, these collections can also bring insights into the Italian immi-
grants’ role in the geographical shaping of the United States. The majority of the 4 million Italians who had arrived in the United States—mostly uneducated and mostly from the south—had done so as the result of chain migration. Naturally, they would settle close to relatives and friends, creating self-contained neighbourhoods clustered according to different regional and local affiliations (MacDonald and MacDonald 1964).
Through the study of the geographical places contained in the collections
as well as the place of publication of the newspapers’ titles, the ChroniclItaly
collections provide an unconventional and traditionally neglected source for studying the transformative role of migrants in host societies.
On the whole, however, the novel contribution of the ChroniclItaly
collections comes from the fact that they allow us to devote attention to
the study of historical migration as a process experienced by the migrants
themselves (Viola 2021). This is rare: in discourse-based migration research, the analysis tends to focus on discourse about migrants rather than discourse by migrants (De Fina and Tseng 2017; Viola 2021). Instead, through the
analysis of migrants’ narratives, it is possible to explore how displaced
individuals dealt with social processes of migration and transformation
and how these affected their inner notions of identity and belonging.
A large-scale digital discourse-based study of migrants’ narratives creates
a mosaic of migration, a collective memory constituted by individual
stories. In this sense, the importance of being digital lies in the fact that
this information can be processed on a large scale and across different migrant communities. The digital therefore also offers the possibility, perhaps unimaginable before, of a kaleidoscopic view that simultaneously
apprehends historical migration discourse as a combination of inner and
outer voices across time and space. Furthermore, as records are regularly
updated, observations can be continually enriched, adjusted, expanded,
recalibrated, generalised or contested. At the same time, mapping these
narratives creates a shimmering network of relations between the past
migratory experiences of diasporic communities and contemporary migra-
tion processes experienced by ethnic groups, which can then be compared and analysed from the perspective of both active participants and spectators.
Abby Smith Rumsey said that the true value of the past is that it is
the raw material we use to create the future (Rumsey 2016). It is only
through gaining awareness of these spatio-temporal correspondences that
the past can become part of our collective memory and, by preventing us
from forgetting it, of our collective future. Understanding digital objects
through the post-authentic lens entails that great emphasis must be placed on the processes that generate the mappings of the correspondences. The post-authentic framework recognises that these processes cannot be neutral as they stem from systems of interpretation and management which are situated and therefore partial. These processes are never complete, nor can they be completed; as such, they require constant updating and critical supervision.
In the next chapter, I will illustrate the second use case of this book—
data augmentation; the case study demonstrates that the task of enriching
a digital object is a complex managerial activity, made up of countless
critical decisions, interactions and interventions, each one having conse-
quences. The application of the post-authentic framework for enriching
ChroniclItaly 3.0 demonstrates how symbiosis and mutualism can guide the way the interaction with the digital unfolds in the process of knowledge
creation. I will specifically focus on why computational techniques such
as optical character recognition (OCR), named entity recognition (NER),
geolocation and sentiment analysis (SA) are problematic and I will show
how the post-authentic framework can help address the ambiguities and
uncertainties of these methods when building a source of knowledge for
current and future generations.
NOTES
1. For a review of the discussion, see, for example, Cameron (2007, 2021).
2. https://www.marktechpost.com/2019/01/30/will-machine-learning-
enable-time-travel/.
3. These may sometimes be referred to as ‘wicked problems’; see, for instance,
Churchman (1967), Brown et al. (2010), and Ritchey (2011).
4. https://chroniclingamerica.loc.gov/.
5. https://utrecht-university.shinyapps.io/GeoNewsMiner/.
6. https://github.com/lorellav/DeXTER-DeepTextMiner.
7. https://www.c2dh.uni.lu/about.
CHAPTER 3
The Opposite of Unsupervised
When you control someone’s understanding of the past, you control their
sense of who they are and also their sense of what they can imagine becoming.
(Abby Smith Rumsey, 2016)
collections. It soon became obvious that the solution was to simplify and
improve the process of exploring digital archives, to make information
retrievable in more valuable ways and the user experience more meaningful
on the whole. Naturally, within the wider incorporation of technology in
all sectors, it is not at all surprising that ML and AI have been more than
welcomed to the digital cultural heritage table. Indeed, AI is particularly appreciated for its capacity to automate lengthy and tedious processes that nevertheless enhance exploration and retrieval for conducting more in-depth analyses, such as the task of annotating large quantities of digital textual material with referential information. As this technology continues to develop together with new tools and methods, it is increasingly used to help institutions fulfil the main purposes of heritagisation:
knowledge preservation and access.
One widespread way to enhance access is through ‘content enrichment’
or just enrichment for short. It consists of a wide range of techniques implemented to achieve several goals, from improving the accuracy of metadata for better content classification1 to annotating textual content
with contextual information, the latter typically used for tasks such as
discovering layers of information obscured by data abundance (see, for
instance, Taylor et al. 2018; Viola and Verheul 2020a). There are at least
four main types of text annotation: entity annotation (e.g., named entity
recognition—NER), entity linking (e.g., entity disambiguation), text clas-
sification and linguistic annotation (e.g., parts-of-speech tagging—POS).
Content enrichment is also often used by digital heritage providers to
link collections together or to populate ontologies that aim to standardise
procedures for digital sources preservation, help retrieval and exchange
(among others Albers et al. 2020; Fiorucci et al. 2020).
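To make two of these annotation types concrete, the short sketch below runs entity annotation and linguistic annotation with spaCy, assuming its small English model (en_core_web_sm); the sentence is invented and the library choice is illustrative, not necessarily the one used by the providers discussed here.

```python
import spacy

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("Italian newspapers flourished in Philadelphia after 1898.")

# Entity annotation (NER): referential units labelled with a type
print([(ent.text, ent.label_) for ent in doc.ents])
# e.g., [('Italian', 'NORP'), ('Philadelphia', 'GPE'), ('1898', 'DATE')]

# Linguistic annotation (POS tagging): one grammatical label per token
print([(token.text, token.pos_) for token in doc])
```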
The theoretical relevance of performing content enrichment, especially
for digital heritage collections, lies precisely in its great potential for discov-
ering the cultural significance underneath referential units, for example, by
cross-referencing them with other types of data (e.g., historical, social, tem-
poral). We enriched ChroniclItaly 3.0 for NER, geocoding and sentiment
within the context of the DeXTER project. Informed by the post-authentic
framework, DeXTER combines the creation of an enrichment workflow
with a meta-reflection on the workflow itself. Through this symbiotic
approach, our intention was to prompt a fundamental rethink of both the
way digital objects and digital knowledge creation are understood and the
practices of digital heritage curation in particular.
models, leading to the current strong bias towards English models, data-
sets and tools and an overall digital language and cultural injustice.
Although the non-English content in Chronicling America has been
reviewed by language experts, many additional OCR errors may have
originated from markings on the material pages or a general poor condition
of the physical object. Again, the specificity of the source adds further
complexity to the many problematic factors involved in its digitisation; in
the case of ChroniclItaly 3.0, for example, we found that OCR errors were
often rendered as numbers and special characters. To alleviate this issue, we decided to remove such items from the collection. This step affected the material differently, not just across titles but even across issues of the same title. Figure 3.1 shows, for example, the impact of this operation on Cronaca Sovversiva, one of the newspapers collected in ChroniclItaly 3.0 with the longest publication record, spanning almost the entire archive period, 1903–1919. On the whole, this intervention reduced the total
number of tokens from 30,752,942 to 21,454,455, equal to about 30% of
overall material removed (Fig. 3.2). Although with sometimes substantial
variation, we found the overall OCR quality to be generally better in the
Fig. 3.4 Distribution of entities per title after intervention. Positive bars indicate
a decreased number of entities after the process, whilst negative bars indicate an
increased number. Figure taken from Viola and Fiscarelli (2021b)
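Returning to the token-removal step described above, a minimal sketch of how tokens rendered purely as numbers and special characters might be filtered out; the regular expression is an illustrative choice, not the exact DeXTER implementation (see the project repository for the documented workflow).

```python
import re

def remove_ocr_noise(text: str) -> str:
    """Drop tokens made up only of digits and/or special characters,
    a frequent rendering of OCR errors in this material."""
    tokens = text.split()
    # Keep a token only if it contains at least one letter
    kept = [t for t in tokens if re.search(r"[^\W\d_]", t)]
    return " ".join(kept)

sample = "Il popolo ,%$ 18.98 it@liano"
print(remove_ocr_noise(sample))  # -> "Il popolo it@liano"
```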
future work may also focus on performing NER using a more fine-tuned
algorithm so that the LOC-type entities could also be geocoded.
exist in a vacuum and all human exchanges are always context-bound, view-
pointed and processual (see Langacker 1983; Talmy 2000; Croft and Cruse
2004; Dancygier and Sweetser 2012; Gärdenfors 2014; Paradis 2015). In
fields such as corpus linguistics, for example, which heavily rely on manually
annotated language material, disagreement between human annotators on the same annotation decisions is in fact expected and taken into account when
drawing linguistic conclusions. This factor is known as ‘inter-annotator
agreement’ and it is rendered as a measure that calculates the agreement
between the annotators’ decisions about a label. The inter-annotator
agreement measure is typically a percentage and depends on many factors
(e.g., number of annotators, number of categories, type of text); it can
therefore vary greatly, but generally speaking, it is never expected to be
100%. Indeed, in the case of linguistic elements whose annotation is highly
subjective because it is inseparable from the annotators’ culture, personal
experiences, values and beliefs—such as the perception of sentiment—this
percentage has been found to remain at 60–65% at best (Bobicev and
Sokolova 2018).
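As a concrete illustration, agreement between two annotators can be computed as a raw percentage, or with a chance-corrected coefficient such as Cohen’s kappa; the labels below are invented for the example.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical sentiment labels assigned to the same ten sentences
annotator_a = ["pos", "neg", "neu", "pos", "pos", "neg", "neu", "pos", "neg", "neu"]
annotator_b = ["pos", "neu", "neu", "pos", "neg", "neg", "neu", "pos", "neg", "pos"]

# Raw percentage agreement
agreement = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)
print(f"Percentage agreement: {agreement:.0%}")  # 70%

# Chance-corrected agreement
print(f"Cohen's kappa: {cohen_kappa_score(annotator_a, annotator_b):.2f}")
```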
The post-authentic framework to digital knowledge creation introduces a counter-narrative to the dominant discourse that oversimplifies automated algorithmic methods such as SA as objective and unproblematic, and it encourages a more honest conversation across fields and in society. It
acknowledges and openly addresses the interrelations between the chosen
technique and its deep entrenchment in the system that generated it. In the
case of SA, it advocates more honesty and transparency when describing
how the sentiment categories have been identified, how the classification
has been conducted, what the scores actually mean, how the results have
been aggregated and so on. At the very least, an acknowledgement of such
complexities should be present when using these techniques. For example,
rather than describing the results as finite, unquestionable, objective and
certain, a post-authentic use of SA incorporates full disclosure of the
complexities and ambiguities of the processes involved. This would con-
tribute to ensuring accountability when these analytical systems are used in
domains outside of their original conception, when they are implemented to inform centralised decisions that affect citizens and society at large, or when they are used to interpret the past or write the future past.
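One way to operationalise such disclosure, sketched below, is to record alongside each sentiment score the decisions that produced it; the field names and values are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class SentimentRecord:
    """A sentiment result packaged with the decisions behind it."""
    text_id: str          # hypothetical identifier for the analysed sentence
    score: float          # e.g., -1.0 (negative) to 1.0 (positive)
    model: str            # which classifier produced the score
    unit_of_analysis: str # sentence, paragraph, entity context, ...
    aggregation: str      # how individual scores were combined, if at all
    caveats: str          # known ambiguities and limitations

record = SentimentRecord(
    text_id="cronaca_sovversiva_1905_03_04_s17",
    score=-0.4,
    model="google-cloud-natural-language (accessed 2021)",
    unit_of_analysis="sentence containing the target entity",
    aggregation="mean of sentence scores per issue",
    caveats="Scores reflect the provider's training data; category "
            "boundaries are undocumented and culturally situated.",
)
print(json.dumps(asdict(record), indent=2))
```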
The decision of how to define the scope (see for instance Miner 2012)
prior to applying SA is a good example of how the post-authentic frame-
work can inform the implementation of these techniques for knowledge
creation in the digital. The definition of the scope includes defining
[Fig. 3.5 Logarithmic distribution of selected entities for SA across titles (y-axis: newspaper titles; x-axis: number of entities, 0–80). Figure taken from Viola and Fiscarelli (2021b)]
On the opposite end of the scale, empirical works are believed to be—at least in theory—fully replicable. Thus, despite the still unresolved debate on the ‘R-words’, over the years protocols and standards for replication in science have been perfected and systematised. When computers started to be used for experiments and data analysis, things became complicated. Plesser (2018), for instance, explains how it became apparent that the canonical margins for experimental error somehow did not apply to digital research:
In the next chapter, I will illustrate the third use case of the book, the
application of the post-authentic framework to digital analysis. Through
the example of topic modelling, I will show how the post-authentic
framework can guide a deep understanding of the assemblage of culture
and technology in software and help us achieve the interpretative potential
of computation. I will specifically discuss the implications for knowledge
creation of the transformation of continuous material into discrete form—
binary sequences of 0s and 1s—with particular reference to the notions
of causality and correlations. Within this broader discussion, I will then
illustrate the example of topic modelling as a computational technique
that treats a collection of texts as discrete data, and I will focus on the
critical aspects of topic modelling that are highly dependent on the sources:
pre-processing, corpus preparation and deciding on the number of topics.
The topic modelling example ultimately shows how producing digital
knowledge requires sustained engagement with software, in the form of
fluid, symbiotic exchanges between processes and sources.
NOTES
1. See, for instance, the Europeana Semantic Enrichment Framework at
https://pro.europeana.eu/share-your-data/enrichment.
2. usa in Italian means ‘he/she/it uses’.
3. For the documentation of the training of the Italian model used to annotate
ChroniclItaly 3.0 including information on the output format and F1 score,
please see Viola et al. (2019); Viola and Fiscarelli (2021b).
4. https://github.com/lorellav/DeXTER-DeepTextMiner.
5. This logarithmic function is the inverse function of the exponential func-
tion.
6. https://cloud.google.com/natural-language/docs/analyzing-sentiment.
7. https://console.cloud.google.com/.
8. https://cloud.google.com/natural-language/docs/basics#interpreting_
sentiment_analysis_values.
9. See note 5.
10. For an overview of the so-called R debate, please see Rougier et al. (2016).
CHAPTER 4
How Discrete
If you torture the data long enough, it will confess to anything. (Attributed
to Ronald H. Coase, 1960)
devised such techniques because the principles upon which they are based
are very clearly defined by their creators and understood in those circles.
It may on the contrary have huge consequences when these methods are
passively transferred into other disciplines or practices. In his analysis of
informational approaches in cancer biology research, Longo (2018), for
example, critiques the extensive use of computer science terminology such as ‘instructions’ and ‘to reprogram a deprogrammed DNA’, and in general the description of DNA as a computer program and of genes as information carriers.
He argues (88):
make these methods extremely popular outside their fields of origin, but
at the same time, it has obfuscated the well-defined laws upon which they
are based. Longo claims:
effects’ (Holbach 1770). This view has been criticised over the course
of the twentieth century for several shortcomings such as the proximity
of elements in determining cause-effect relationships, predictability as the
main criterion for establishing causation and the reduction of causality
essentially to a mere temporal relationship. Discoveries of and advances in
differential equations, atomic physics and quantum mechanics have further consolidated such criticisms, eventually leading to the current separation
of causality from determinism. Particularly in quantum mechanics, recent
experiments have provided strong evidence for the validity of this notion
of causality without determinism. In this view, consequent states of a
quantum system are related to its antecedent states by a form of conditional
dependency (Weinert 2005, 241) as opposed to every event having a
unique cause that precedes it.
Coming back to the distinction between discrete and continuous struc-
tures, this means that in discrete systems, there is no deterministic cause-
effect relationship, because points are totally separated from each other,
whereas in continuous systems, causal relations can be observed and
measured, but not predicted3 (Longo 2018, 86). Though it may appear
inconsequential at first, this observation about causality has specific and
profound implications that stretch well beyond mathematical and phys-
ical reasoning. Stating that in discrete structures (such as, say, a database where something belongs to either one category or another) no cause-effect relationship between observed phenomena can be established, only a probabilistic one, essentially means that explanations for such phenomena cannot be found, only correlations. If two random variables are correlated,
or as noted by Calude and Longo (2017), co-related, it means that they
are associated according to a statistical measure, that they co-occur. This
statistical measure is rendered by a correlation coefficient, a number
between −1 and 1 that expresses the strength of the linear relationship
between two numeric variables. If two variables are positively correlated (e.g., they both increase), then the correlation coefficient will be closer to 1; if there is a negative correlation (i.e., they are inversely correlated), it will be closer to −1; and it will be closer to 0 if there is no correlation at all. It is a well-
established fact in statistics and beyond that a correlation coefficient per se
is not enough to explain the cause for the patterns that are captured.4
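A minimal numerical illustration of the coefficient just described, using invented values; the computation shows the statistic itself, not an explanation of its cause.

```python
import numpy as np

# Two invented variables: e.g., yearly counts of two co-occurring entities
x = np.array([12, 18, 25, 31, 40, 47])
y = np.array([30, 34, 51, 60, 78, 95])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is the Pearson coefficient between x and y
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson correlation coefficient: {r:.2f}")  # close to 1: strong positive co-relation
# A value near 1 says only that x and y increase together;
# it does not say that one causes the other.
```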
The identification of statistical correlations is nevertheless an important
factor in understanding the relationship between two quantitative variables
and it remains an insightful method that can potentially lead to significant
discoveries. Indeed, the observation of correlations is at the foundation
of the classic scientific method in the sense that starting from the mea-
surement of correlated phenomena, scientists have been able to formulate
theories that could be tested and later confirmed or disproved. The history
of science is full of extraordinary achievements which originated from
mere observations of not-so-obviously correlated phenomena, for example,
distributional semantics theory, a famous linguistic theory that stemmed
from the intuition of Zellig S. Harris and John R. Firth, two semanticists
(though Harris was also a statistical mathematician). This intuition—
famously captured by Firth’s quote ‘You shall know a word by the company
it keeps’ (1957, 11)—acknowledges the relevance of words’ collocation
(i.e., the place of occurrence of words) in determining their meaning. The
core idea behind Harris and Firth’s work on collocational meaning and
distributional semantics is that meanings do not exist in isolation; rather,
words that are used and occur in the same contexts tend to purport similar
meanings (Harris 1954, p. 156).
In those days, gaining access to real language data was costly and
very time-consuming and for a long time, it was not possible to test
this theory. But more recently, new advances in computer science merged
with huge quantities of naturally occurring language material, including
digitised historical data-sets, have indeed proven that languages are not
deterministic systems—as previously believed—but that they should be thought of as ‘probabilistic, analogical, preferential systems’ (Hanks 2013,
310). As intuitively theorised in distributional semantics, words do not
have a one-to-one relationship with meaning because meanings are not
precise, exact or stable. On the contrary, words in isolation do not possess
any meaning and meanings can only be entailed from words’ context. As
argued by Harris, ‘We cannot say that each morpheme or word has a single
or central meaning, or even that it has a continuous or coherent range
of meanings’ (Harris 1954, 151). Sixty years after its initial formulation,
distributional semantics theory laid the basis for Google’s renowned
word2vec algorithm, and today, it constitutes the theoretical background
of NLP studies concerned with language and meaning, including topic modelling itself (cfr. Sect. 4.4).
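The distributional idea can be made tangible with a toy word-embedding model; the corpus below is far too small to be meaningful and serves purely to show the mechanics, assuming the gensim library as one possible implementation.

```python
from gensim.models import Word2Vec

# A toy corpus: words occurring in similar contexts end up with similar vectors
sentences = [
    ["il", "giornale", "pubblica", "notizie"],
    ["il", "quotidiano", "pubblica", "notizie"],
    ["la", "comunità", "legge", "il", "giornale"],
    ["la", "comunità", "legge", "il", "quotidiano"],
]

model = Word2Vec(sentences, vector_size=20, window=2, min_count=1, seed=1, epochs=200)
# 'giornale' and 'quotidiano' share contexts, so their vectors should be close
print(model.wv.similarity("giornale", "quotidiano"))
```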
Coming back full circle to causality, correlations and patterns, a correla-
tion measure only informs us of the strength of a relationship between two
variables, whereas the patterns tell us that certain regularities can be found
in how the observed variables are distributed. Hence, because they highlight trends in the data, correlations and patterns may potentially have predictive power, but neither of them provides causal explanations for the analysed
coherent substructure, let alone their wider historical, social and cultural
entrenchment.
As said earlier in the chapter, topic modelling provides a probabilistic
representation of how words are distributed in documents according to
statistical calculations, that is, correlations. This means that words are
considered to be discrete elements; for example, in the corpus preparation
stage (cfr. Sect. 4.5.2), words are transformed into numeric variables and
their distribution across documents is represented as a distribution matrix.
What topic modelling then does is to measure the strength of the linear relationship between these numeric variables. But topic modelling also
treats the corpus itself as a collection of discrete data, which means that
each text is also processed as a separate entity totally disconnected from all
the other texts in the batch. This is true regardless of whether the input
is all the chapters from the same book, all the issues of a newspaper or
all the abstracts ever submitted to an academic journal under the keyword
tag ‘topic modelling’. In other words, it is a computational technique that
efficiently identifies patterns of words’ distribution, but because it lacks
the words’ underlying continuous structure—the infinity of language—no
cause-effect relationship of the correlated phenomena can be established,
i.e., the meaning of such patterns.
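A compact sketch of the pipeline just described: texts become a document-term matrix of discrete counts, and a topic model then estimates word distributions over it. The toy documents are invented and scikit-learn is assumed as one possible implementation, not the specific software discussed in this book.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "migration ships arrival port work",
    "work wages strike labour union",
    "port ships cargo arrival harbour",
    "union strike labour rights wages",
]

# Words become columns of a discrete count matrix: the continuous
# structure of language is lost at this step.
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-4:]]
    print(f"Topic {i}: {top}")
```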
Another issue with topic modelling is that it assumes that an a priori fixed
number of topics—which in any case is decided more or less arbitrarily—
is represented in different proportions in all the documents. Hence, if the
algorithm is instructed to find X number of topics, it will build a model that
fits that number. This assumption behind the technique cannot but paint
a rather artificial and non-exhaustive picture of the documents’ content
as it is hard to imagine how in reality, a fixed number of topics could
adequately represent the actual content of all the analysed documents.
Thus, correlations will surely be identified but not all these correlations
will necessarily carry significance, that is, meaning. Moreover, as countless
parameters can be tweaked, the smallest change will output a different
model, in which different correlations will be found and many others will
be missing. Conversely, even when the same parameters from the same
software are used on the same data-set, the algorithm will output a slightly
different model, which indeed proves once again that patterns will always
be identified, regardless of their significance. I will return to this point in
Sect. 4.5.3.
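The run-to-run variation just described can be checked directly by refitting the toy model from the previous sketch with different random seeds; again, this is an illustrative exercise, not the book’s actual experimental setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "migration ships arrival port work",
    "work wages strike labour union",
    "port ships cargo arrival harbour",
    "union strike labour rights wages",
]
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)
terms = vectorizer.get_feature_names_out()

# Same data, same parameters, different seed: the topics' top words can
# shift, illustrating that each run yields one of many possible models.
for seed in (0, 42):
    lda = LatentDirichletAllocation(n_components=2, random_state=seed).fit(dtm)
    for i, topic in enumerate(lda.components_):
        top = [terms[j] for j in topic.argsort()[-4:]]
        print(f"seed={seed} topic {i}: {top}")
```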
4.5.1 Pre-processing
In Chap. 3, I illustrated how pre-processing operations are far from being
standard and how it is in fact required that each intervention is carefully
assessed by scholars and practitioners and evaluated on a case-by-case basis.
In my discussion, I considered the many influential factors at play (e.g., the
for the specific task of topic modelling, additional linguistic decisions must also be evaluated; here I discuss stemming and lemmatisation.
Although both aim to obtain a word root by reducing the inflection in
words, these operations are built on very different assumptions. Stemming
deletes the initial or final characters in a token based on a list of common
prefixes and suffixes that may typically occur in the inflected words of a lan-
guage (e.g., states → state). It is therefore language-dependent as it relies
on limited cases which would apply exclusively to certain languages that
follow specific inflection rules. Thus, for languages with fairly regular inflection rules such as English, stemming may work reasonably well; but applied to highly inflectional languages such as Italian, with its many exceptions and irregularities, the algorithm would almost certainly perform poorly. Another strong limitation of stemming is that in many
cases—including low-inflectional languages—the output would not be an
actual word, meaning that the operation is likely to introduce new errors.
On the other hand, as it is not a particularly advanced technique, stemming
does not require a long processing time or processing power, and therefore
this solution may be implemented when working with particularly large
corpora or when constrained by time limitations.
Lemmatising is on the contrary a much more sophisticated technique as
it is based on more solid linguistic principles than stemming. By means of
detailed dictionaries that contain lemmas and by examining words’ context,
a lemmatising algorithm analyses the morphology of each word and then transforms it into its grammatical root (e.g., better → good). Especially
in the case of topic modelling in which the output is essentially a list of
words without any context, lemmatising can be very helpful to distinguish
between homonyms, words that have the same spelling, sometimes the
same pronunciation too but which in fact possess different meanings. For
example, the word mento in Italian can mean either ‘chin’ or ‘I lie’. A
lemmatising algorithm would theoretically be able to infer the use of mento from its context and distinguish it from its homonym; in this case,
the different outputs would be mento (i.e., chin) for the former and mentire
(i.e., to lie) for the latter. Because of its complexity, however, lemmatising
may require a long time and very high processing power to perform, and so
in the case of large size collections or depending on the available means and
resources, it may not be ideal. Additionally, if on the one hand lemmatising is effective at differentiating between homonyms, on the other the reduction
of all inflected words to their lemma may cause information loss. For
instance, it would no longer be possible to recognise the tense (present,
past, future) or the grammatical person (I, they, you, etc.) of the verbs,
the gender or number of the nouns, the degree of the adjectives (e.g.,
superlative, comparative), etc.
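The contrast between the two operations can be seen by running both on the same Italian tokens; a sketch assuming NLTK’s Italian Snowball stemmer and spaCy’s small Italian model (it_core_news_sm), whose exact outputs may vary across versions.

```python
import spacy
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("italian")
# Requires: python -m spacy download it_core_news_sm
nlp = spacy.load("it_core_news_sm")

for token in nlp("Le comunità italiane pubblicavano giornali"):
    # Stemming chops affixes and may return non-words;
    # lemmatisation returns dictionary forms but discards
    # tense, person, number and gender information.
    print(f"{token.text:15} stem={stemmer.stem(token.text):12} lemma={token.lemma_}")
```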
Whether this type of information is relevant or not depends once again on several factors such as the type of data-set (e.g., size,
content), the context of the digital analysis, the language of the data-set
and the specific research question(s); researchers should therefore carefully
evaluate pros and cons of implementing this operation. For example, in
researching narratives of migration as they were told by Italian American
migrants, the cons of implementing either stemming or lemmatising would
in my opinion exceed the pros. Italian is a highly inflectional language
and a great deal of linguistic information is encoded in suffixes and
prefixes; stemming therefore ill suits it. Similarly, lemmatising the corpus
would also cause the loss of information encoded in inflected words (e.g.,
verbs expressed in the first person, collective concepts expressed by plural
nouns) which could bring valuable insights into the cognitive, subjective
dimension of the stories told by the migrants.
Finally, whether or not to perform either of these operations is very much dependent on the language of the data-set, not just because dif-
ferent languages have different inflection rules, but crucially also because
not all languages are equally resourced digitally. Indeed, as discussed in
Sect. 2.2, the digital consequence of the fact that most mass digitisation
projects have been carried out in the United States and later in Europe
is that computational resources available for languages other than English
continue to remain on the whole scarce. Such Anglophone-centricity is
often still a barrier to researchers, teachers and curators whose sources
are in languages other than English. Indeed, the comparative lack of
computational resources in other languages often dictates which tasks can
be performed, with which tools and through which platforms (Viola and
Fiscarelli 2021b). Moreover, even when adaptations for other languages
may be possible, identifying which changes should be implemented, and
perhaps more importantly, understanding the impacts these may have, is
often unclear (Mahony 2018). This includes lemmatising algorithms and dictionaries, which do not yet exist for all languages; therefore, for particularly under-resourced languages, stemming may be the only, far from ideal,
option.
of the high risk of attributing meaning to patterns and trends that in reality
may be ‘spurious’ in the mathematical sense, i.e., meaningless (Calude and
Longo 2017) (cfr. Sect. 4.3). Naturally, the risk is even higher when the
technique is adopted uncritically, especially in fields outside of computer
science. The authors clarified that although typically it is implicitly assumed
that the identified latent spaces will be semantically salient, in reality, this is
not at all what the promise of topic modelling is about. Since then, others
(see for instance Bail 2018) have also openly acknowledged the limitations
of the technique and repeatedly attempted to reframe topic modelling as
‘a tool for reading’ rather than a tool for meaning, that is, an exploratory tool which, in order to obtain more nuanced and reliable findings, should be integrated with other methods. In this respect, for instance, sociologist
Chris Bail (ibid.) notes:
Despite this rather humble assessment of the promise of topic models, many
people continue to employ them as if they do in fact reveal the true meaning
of texts, which I fear may create a surge in “false positive” findings in studies
that employ topic models.
On the one hand, this approach allows the researcher to closely examine the varied topics’ structures before deciding on the most coherent model; on the other, it may lead analysts to prefer a model that seems to confirm their a priori ideas, thus resulting in biased interpretations. This approach may work fairly well in those cases when
the analyst has extensive knowledge of the material, the field and the
period of reference of the collection among others, but it is generally
not recommended in statistics; in the words of statistician Stephen M.
Stigler: ‘Beware of the problem of testing too many hypotheses; the more
you torture the data, the more likely they are to confess, but confessions
obtained under duress may not be admissible in the court of scientific
opinion’ (Stigler 1987).
More often, however, very little is known about the actual content of the documents, as true content is exactly what the technique is wrongly believed to be able to find; this belief provides the original justification for using the method, which goes like this: due to the increasingly large size of
available digital material, it is not possible for researchers and practitioners
to explore the documents through traditional close reading methods; not
only would this be too time-consuming but also somewhat less efficient as a
machine will always outperform humans in identifying patterns. Although
this is in principle true as clarified earlier, the assumption that all the found
patterns are intrinsically meaningful is not. To meet this challenge, research
has been conducted towards implementing statistical methods that could
help researchers and practitioners find the craved ‘optimal number of
topics’. Two of the most common methods are model perplexity and
topic coherence, measures that score the statistical quality of different
topic models based on the topics’ compositions in several models. Though not unanimously shared, the assumption behind these techniques is that higher statistical quality yields more interpretable topics. Model
perplexity (also known as predictive likelihood) estimates the likelihood of new (i.e., unseen) text under a pre-trained model. The
lower the perplexity value, the better the model predicts the distribution
of the words that appear in each topic. However, studies have shown
that optimising a topic model for perplexity does not necessarily increase
topics’ interpretability, as perplexity and human judgement are often not
correlated, and sometimes even slightly anti-correlated (Jacobi et al. 2015,
7).
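Both measures are implemented in common libraries; the sketch below, assuming gensim and a toy corpus, scores candidate numbers of topics by perplexity and by topic coherence (the measure discussed next).

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.models.coherencemodel import CoherenceModel

texts = [
    ["migration", "ships", "arrival", "port"],
    ["work", "wages", "strike", "union"],
    ["port", "ships", "cargo", "arrival"],
    ["union", "strike", "labour", "wages"],
]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

for k in (2, 3, 4):
    lda = LdaModel(corpus, num_topics=k, id2word=dictionary, random_state=0)
    # gensim returns a per-word log-likelihood bound; the corresponding
    # perplexity is 2 ** (-bound), so a higher bound means lower perplexity
    bound = lda.log_perplexity(corpus)
    coherence = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                               coherence="c_v").get_coherence()
    print(f"k={k}  log-likelihood bound={bound:.2f}  coherence={coherence:.2f}")
```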
Topic coherence was developed to compensate for this shortcoming
and it has become popular over the years. What the method is designed
NOTES
1. “Bloquer le pays ne permet pas d’endiguer l’épidémie”.
2. For a detailed and in-depth historical discussion on causality in physics and
philosophy, I refer the reader to Weinert (2005).
3. Please note that not everyone agrees with this view and that there are still
unanswered questions around causality, particularly in relation to discrete
phenomena in quantum mechanics. See, for instance, Le Bellac (2006) and
Jaeger (2009).
4. A well-known phrase that synthesises this fact is ‘correlation does not mean
causation’.
5. Not previously annotated material.
CHAPTER 5
What the Graph
Figures don’t lie, but liars do figure. (Attributed to Carroll D. Wright, 1889)
[Fig. 5.1 Distribution of issues within ChroniclItaly 3.0 per title (y-axis: newspaper titles; x-axis: years, 1898–1937). Red lines indicate at least one issue in a three-month period. Figure taken from Viola and Fiscarelli (2021b)]
Allowing the analyst to know that the material for the digital analysis is unevenly distributed acts as a way to highlight those problematic aspects of digital research and digital objects that precede the analysis itself but which nevertheless influence how the technique may be applied and the results interpreted. Figure 5.2 shows how this step could be handled in
the interface. Once the documents are uploaded, the analyst is prompted
by a question asking them about the potential presence of metadata infor-
mation. With this question the intention is to maintain contact with the
continuous aspect of the digital object hidden by its discrete representation
and further altered by the topic modelling algorithm, which treats the documents, too, as a collection of discrete data.
If the analyst chooses ‘yes’, the metadata information would then be
used to create a dynamic, interactive visualisation inspired by the one
displayed in Fig. 5.1; this would display how the files are distributed in
the collection, ultimately creating room for reflection and awareness. In
the case of ChroniclItaly 3.0, for example, this visualisation displays the
number of published issues on a specific day, month or year and by
which titles; the display of this information allows the analyst to promptly
identify the difference in the frequency rate of publication across titles and
potential gaps in the collection (Fig. 5.3). The post-authentic framework
to visualisation signals the importance of maintaining the connection with
the digital object, understood as an organic, problematic entity. Such
connection is acknowledged as an essential element of the process of
knowledge creation in the digital in that it favours a more engaged, critical approach to digital objects and creates a space in which more informed decisions can be made, ultimately answering the need for digital data and visualisation literacy.
Fig. 5.5 Interface for topic modelling: data pre-processing. The wireframe dis-
plays how the post-authentic framework to UI could make pre-processing more
transparent to users. Wireframe by the author and Mariella de Crouy Chanel
Fig. 5.6 Interface for topic modelling: data pre-processing (stemming and lem-
matising). The wireframe displays how the post-authentic framework to UI could
make stemming and lemmatising more transparent to users. Wireframe by the
author and Mariella de Crouy Chanel
and settings, compare results and make more informed decisions. In this
way, the interface would actualise a counterbalancing narrative to the dominant positivist discourse that equates the removal of the human—which in any case is illusory—with the removal of biases. On the contrary, the argument
I advance in this book is that it is only through the active and conscious
participation of the human in processes of data creation, tools’ selection,
methods’ and algorithms’ implementation that such biases can in fact be
identified, acknowledged and to an extent, addressed.
The post-authentic framework to knowledge creation in the digital
advocates a more participatory, critical approach towards digital methods
and tools, particularly if they are applied for humanistic enquiry. Against a
purely correlations-driven big data approach, it offers a more complex and
nuanced perspective that challenges current views sidelining human agency
Fig. 5.7 Interface for topic modelling: corpus preparation. The wireframe dis-
plays how the post-authentic framework to UI could make corpus preparation more
transparent to users. Wireframe by the author and Mariella de Crouy Chanel
relationship between the use of linguistic forms and culture (Diehl 2019).
In branches of DH such as digital history and digital cultural heritage,
NA is also considered to be an efficient method to intuitively reduce
complexity (Düring et al. 2015). This may be due to the fact that this
technique benefits particularly from attractive visualisations which support
the impression that explanations for social events are accurate, complete,
detailed and scientific, naturally adding to the allure of using it.
However, a typically omitted, yet rather critical issue of NA is that the
graphs can only display the nodes and attributes that are modelled; as
these stem from samples which by definition are incomplete and which
undergo several layers of manipulation, transformation and selection, the
conclusions the graphs suggest will always be partial and potentially based on over-represented actors or, conversely, on under-represented social categories. In the case of a digital object such as the cultural heritage collection
ChroniclItaly 3.0 which aggregates sources heterogeneously distributed
(cfr. Sect. 5.2), this issue is particularly significant as any resulting graph
depends on the modelled newspaper (e.g., mainstream vs anarchist), on the
type and number of entities included and excluded and on the attributes’
variables (e.g., frequency of co-occurrence, number of relations, sentiment
polarity), to name but a few. Each one of these factors can dramatically influence the network displays and consequently impact the resulting interpretation of the past.
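To make the modelling choices concrete: in a library such as NetworkX, every node, edge and attribute has to be explicitly declared, so whatever is not modelled simply does not exist in the resulting graph; the entity names and values below are invented for illustration.

```python
import networkx as nx

G = nx.Graph()
# Each edge encodes a co-occurrence relation with chosen attributes;
# selection decisions upstream (which titles, which entities, which
# attributes) bound every conclusion the graph can support.
G.add_edge("sicilia", "palermo", co_occurrences=34, sentiment=0.2)
G.add_edge("sicilia", "new york", co_occurrences=12, sentiment=-0.1)

# The graph 'knows' only what was put into it
print(G.edges(data=True))
```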
The project’s GitHub repository3—which is to be understood as an integral part of the visualisation interface—is a good example of how the post-authentic framework can guide the actualisation of principles of transparency, accountability and reproducibility and how it values ambiguity and uncertainty. The DeXTER GitHub repository documents, explains
and motivates all the interventions on the data, including reporting on the
processes of entity selection (cfr. Sect. 3.3). The aim is to warn the analyst
that despite being (too) often presented as a statement of fact, a visually
displayed network is a mediated and heavily processed representation of the
modelled actors. As such, the post-authentic framework does not solely
aim to increase trust in the data and how it is transformed, but also
to acknowledge uncertainty in both the data lifecycle and the resulting
graphs and finally to expose and accept how these may be untrustworthy
(Boyd Davis et al. 2021, 546). The act of making explicit the interpretative
work of shaping the data is what Drucker calls ‘exposing the enunciative
workings’ (2020, 149):
For data production, the task is to expose some of the procedures and steps
by which data is created, selected, cleaned, and processed. Retracing the
statistical processes, showing the datamodel and what has been eliminated,
averaged, reduced, and changed in the course of the lifecycle would put the
values of the data into a relative, rather than declarative, mode. This is one of
the points of connection with the interface system and task of exposing the
enunciative workings.
By acknowledging that the displayed entities are not all the entities in
the collection but in fact a representative, yet small, selection, DeXTER
encourages close engagement with the NA graphs; it does not try to
remove uncertainty but points to where it is. At the same time, it recognises
the management of data as an act of constant creation, rather than a mere
observation of neutral phenomena. For example, the process of entity
selection as I described it in Sect. 3.4 created a subset of the most fre-
quently occurring entities distributed proportionately across the different
newspapers. With this intervention, we aimed to alleviate the issue of source
over-representation due to some titles being much larger than others
and to reduce complexity in the resulting network graphs, notoriously considered the downside of NA. At the same time, however, this
intervention may cause the least occurring entities to be under-represented
in the visualisations. Thus, the transparent and detailed documentation
of how we intervened on the data that originates the NA visualisations
counterbalances the illusion of neutrality and completeness often conveyed
by ultra-polished NA visualisations.
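A sketch of a proportional selection of the kind described, assuming entity frequencies held in a pandas DataFrame; the threshold and column names are illustrative, not DeXTER’s actual parameters.

```python
import pandas as pd

# Invented entity counts per newspaper title
df = pd.DataFrame({
    "title":  ["Cronaca Sovversiva"] * 3 + ["La Rassegna"] * 3,
    "entity": ["sicilia", "roma", "milano", "sicilia", "napoli", "roma"],
    "count":  [120, 80, 15, 12, 9, 3],
})

# Keep, per title, the entities above that title's own median frequency,
# so that large titles do not crowd out smaller ones.
selected = df[df["count"] > df.groupby("title")["count"].transform("median")]
print(selected)
```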
Another issue of NA data modelling concerns the theoretical assumption
upon which the technique is based. As a bare minimum, a network visual-
isation connects nodes through a line (i.e., edge) that carries information
on the type of relation between the nodes (i.e., attributes). Nodes are
understood as discrete objects, i.e., completely independent from each
other (cfr. Chap. 4); this ultimately means that the nodes are modelled
to remain stable and that the emphasis is on the relations, as these are
believed to provide adequate explanations of social phenomena. However,
this type of modelling arguably paints a rather artificial picture of both the phenomena and the actors, who remain unaffected by the changing
relationships between them. To put it in Drucker’s words:
Fig. 5.8 DeXTER default landing interface for NA. The red oval highlights the
time bar (historicise feature)
Fig. 5.9 DeXTER default landing interface for NA. The red oval highlights the
different title parameters
for example, observe not just how the relationships between entities change
over time but also the entities themselves. It is for instance possible to
explore how entities of interest were mentioned by migrants over time:
by selecting/deselecting specific titles (cfr. Fig. 5.9) of different political
orientation and geographical location, by selecting the frequency rate
and sentiment polarity (cfr. Fig. 5.10) to observe the prevailing emotional
attitude of the sentences in which the entities were mentioned together as
well as their frequency of occurrence.
By visualising both entities and relations and by creating dynamic and
interactive NA visualisations, the DeXTER interface on the whole aims
to provide several viewpoints on the same data, and it effectively shows
how several dimensions of observation dramatically affect the graphical
arrangements. In the case of the historicisation feature, for example, as the
Fig. 5.10 DeXTER default landing interface for NA. The red ovals highlight the
frequency and sentiment polarity parameters
Fig. 5.11 DeXTER: egocentric network for the node sicilia across all titles in the
collection in sentences with prevailing positive sentiment
of their choice as the focal node, this third visualisation displays the actors
mentioned in specific newspapers on specific days. In this way, the issue-
focused network offers an additional perspective on the same digital object, potentially contributing valuable insights for the analysis of how events and
actors of interest were portrayed by migrants of different political affiliation
and who were based in different parts of the United States. Thus, instead of
offering one obvious meaning, DeXTER offers multiple perspectives, and
by capturing heterogeneous contexts, it creates a tension that the analyst
is encouraged to resolve through independent assessment (Gaver et al.
2003). Figure 5.13 shows the default issue-focused network graph.
DeXTER’s visualisation of sentiment as an attribute of NA is also guided
by post-authentic principles. As already discussed in Sect. 3.4, SA is a com-
putational technique that aims to identify the prevailing emotional attitude,
i.e., the sentiment, in a given text (or portions of a text); the sentiment is
then typically categorised according to three labels, i.e., positive, negative and neutral.
Fig. 5.12 DeXTER: network for the ego sicilia and alters across titles in the
collection in sentences with prevailing positive sentiment
are purposely smudged and pale, and pastel shades are preferred over
bright, solid shades; the aim is to openly acknowledge SA as ambiguous,
situated and therefore open to interpretation, rather than precise, neutral
and certain. By exposing these inconsistencies, post-authentic visualisations
on the whole question the main positivist discourse around technology. We
achieved this goal by providing a transparent documentation of how we
identified the sentiment categories, how we aggregated the results, how
we conducted the classification, how we interpreted the scores and how
we rendered them in the visualisation, in the openly available dedicated
GitHub repository which also includes the code, links to the original and
processed material and the files documenting the manual interventions.
Finally, guided by the post-authentic framework, DeXTER emphasises
the continuous making and re-making of data; this process of forming,
arranging and interpreting data is encoded within the interface itself.
Through the tab ‘Data’, users can at any point access and download the
data behind the visualisations as they reflect users’ selection of filters and
NOTES
1. A human-in-the-loop method requires human and machine intelligence to create machine learning models. In this approach, humans interact with the algorithm during training by tuning and testing.
2. https://github.com/DHARPA-Project.
3. https://github.com/lorellav/DeXTER-DeepTextMiner.
CHAPTER 6
Conclusion
Philosophers until now have only interpreted the world in various ways.
The point, however, is to change it. (Karl Marx, 1846)
a close connection with the digital object; for example, in the chapter,
I stressed how aspects such as pre-processing, corpus preparation and
choosing the number of topics, typically regarded as unproblematic, are in
fact fundamental moments within a topic modelling workflow in which
the analyst is required to make countless choices. The example of topic
modelling demonstrates how the post-authentic framework can guide the
exploration, questioning and challenging of the interpretative potential of
computation.
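To illustrate just one of those choices, the minimal sketch below, which assumes Gensim and a toy corpus invented for the example, trains LDA models with different numbers of topics and compares their coherence (Röder et al. 2015). The score can rank candidate models, but it cannot decide for the analyst; the final choice remains an interpretive act.

    # Minimal sketch of the 'number of topics' decision in an LDA workflow.
    # Gensim is assumed; the five toy documents stand in for a real corpus.
    from gensim.corpora import Dictionary
    from gensim.models import CoherenceModel, LdaModel

    texts = [
        ["migration", "ship", "arrival", "new", "york"],
        ["harvest", "village", "famine", "sicilia"],
        ["strike", "labour", "wages", "factory"],
        ["ship", "passage", "migration", "harbour"],
        ["wages", "strike", "union", "factory"],
    ]

    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    # Train one model per candidate topic count and score each with the
    # c_v coherence measure (Röder et al. 2015).
    for k in (2, 3, 4):
        lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                       random_state=0, passes=10)
        coherence = CoherenceModel(model=lda, texts=texts,
                                   dictionary=dictionary,
                                   coherence="c_v").get_coherence()
        print(k, round(coherence, 3))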
Operating within the post-authentic framework crucially means
acknowledging digital objects as living entities that have far-reaching,
unpredictable consequences; the continually changing complexity of the
networks of processes and actors involved must therefore always be critically
supervised. The visualisation of a digital object is one such process.
The post-authentic framework opposes an uncritical adoption of digital
methods and points to the intrinsic dynamic, situated, interpreted and
partial nature of the digital. Despite often being employed as exact ways
of presenting reality, visualisations are deeply ambiguous techniques
which embed numerous human decisions and judgement calls. In
Chap. 5, I illustrated how the post-authentic framework can be applied to
visualisation by discussing two examples: efforts towards the development
of a user interface (UI) for topic modelling and the design choices for
developing the app DeXTER, the interactive visualisation interface to
explore ChroniclItaly 3.0. I specifically centred my discussion on how
the ambiguities and uncertainties of topic modelling, network analysis
(NA) and SA can be encoded visually. A key notion of the post-authentic
framework is the acknowledgement of curatorial practices as manipulative
interventions, and of the fact that it is precisely through exposing
ambiguities and uncertainties that knowledge creation in the digital can be
kept honest and accountable for current and future generations.
Through the application of the post-authentic framework to these four
case examples, the book aimed to show how an uncritical and naïve
approach to the use of computational methods is bound to reproduce
the very opaque processes that the publicised algorithmic discourse claims
to break and, more worryingly, contributes to making society worse. The
book was therefore also a contribution to working towards systemic change
in knowledge creation practices and, by extension, in society at large; it
provided a new set of notions and methods that can be implemented when
collecting, assessing, reviewing, enriching, analysing and visualising digital
material. It is this more problematised notion of the digital, conceptualised
in the framework, that highlights how its transcending nature renders old
dichotomies between digital and non-digital knowledge creation no longer
relevant and, in fact, harmful.
The digitisation of society, already well on its way before the COVID-19
pandemic but pushed to an irreversible turning point by the 2020 health
crisis, has brought into sharper focus how the digital
exacerbates existing fractures and disparities in society. Unable to deal
adequately with the complexity of society and social change, the cur-
rent model of knowledge creation urgently requires a re-theorisation.
This book is therefore a wake-up call for understanding the digital as
no longer merely contextual to knowledge creation and for recognising that
a model of disciplinary compartmentalisation sustains an anachronistic and
ill-equipped way of encapsulating and explaining society. All information is now
digital and algorithms are increasingly central nodes of knowledge
and culture production, with an increased capacity to shape society at
large. As digital vs non-digital positions have entirely lost relevance, it
has become increasingly futile to create ultra-specialised disciplines from
other disciplines’ overlapping spaces or indeed to invest energy in trying
to define those, such as in the case of DH; the digital transformation
has magnified the inadequacy of a mono-perspective approach, the legacy
of a model of knowledge that compartmentalises competing disciplines.
Scholars, researchers, universities and institutions must acknowledge the
central role they have to play in assessing how knowledge is created not
just today, but also for future generations.
The new theoretical and methodological framework that I proposed in
this book moves beyond the current static conceptualisation of knowledge
production which praises interdisciplinarity but forces knowledge into rigid
categories. On the contrary, the framework offered novel concepts and
terminologies that break with dialectical principles of dualism and antago-
nism, including dichotomous notions of digital vs non-digital, sciences vs
the humanities, authentic vs non-authentic and computational/neutral vs
non-computational/biased. The re-devised notions, practices and values
that I offered help re-figure the way in which society conceptualises data,
technology, digital objects and the process of knowledge creation in the
digital.
My re-examination of the current model of knowledge includes not just
scholarship but pedagogy too. And whilst this is not the main focus of
this book, the arguments I put forward here for scholarship equally apply
to pedagogy. In order to achieve systemic change, academic programmes,
too, must be rethought.
REFERENCES
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res
3(Jan):993–1022
Bobicev V, Sokolova M (2018) Thumbs up and down: sentiment analysis of medical
online forums. In: EMNLP 2018. https://doi.org/10.18653/v1/W18-5906
Boccagni P, Schrooten M (2018) Participant observation in migration studies:
an overview and some emerging issues. In: Zapata-Barrero R, Yalaz E (eds)
Qualitative research in European migration studies. IMISCOE Research Series.
Springer International Publishing, Cham, pp 209–225. https://doi.org/10.
1007/978-3-319-76861-8_12
Boyd D (2016) Undoing the neutrality of big data. Fla Law Rev Forum 67:226–
232
Boyd D (2018) Don’t believe every AI you see. https://www.newamerica.org/pit/
blog/dont-believe-every-ai-you-see/
Boyd Davis S, Vane O, Kräutli F (2021) Can I believe what I see? Data visualization
and trust in the humanities. Interdiscip Sci Rev 46(4):522–546, https://
doi.org/10.1080/03080188.2021.1872874, https://www.tandfonline.com/
doi/full/10.1080/03080188.2021.1872874
Braidotti R (2017) Posthuman critical theory. J Posthuman Stud 1(1):9. https://
doi.org/10.5325/jpoststud.1.1.0009
Braidotti R (2019) A theoretical framework for the critical posthumanities. Theory
Cult Soc 36(6):31–61. https://doi.org/10.1177/0263276418771486
Braidotti R, Fuller M (2019) The posthumanities in an era of unexpected
consequences. Theory Cult Soc 36(6):3–29. https://doi.org/10.1177/
0263276419860567
Branch JW, Burgos D, Serna MDA, Ortega GP (2020) Digital transformation in
higher education institutions: between myth and reality. In: Burgos D (ed)
Radical solutions and eLearning. Lecture Notes in Educational Technology.
Springer Singapore, Singapore, pp 41–50, https://doi.org/10.1007/978-
981-15-4952-6_3
Brennan T (2017) The digital humanities bust. The Chronicle of Higher
Education, Washington. https://www.chronicle.com/article/the-digital-
humanities-bust/. Section: The Review
Brodkin K (1998) How Jews became white folks and what that says about race in
America. Rutgers University Press, New Brunswick. OCLC: 237334921
Bronstein JL (ed) (2015) Mutualism, 1st edn. Oxford University Press, Oxford.
OCLC: ocn909326420
Brown VA, Harris JA, Russell JY (eds) (2010) Tackling wicked problems through
the transdisciplinary imagination. Earthscan, London
Brown-Martin G, Yiannouka SN, Tavakolian N (2014) Learning {re}imagined:
how the connected society is transforming learning. Bloomsbury Qatar Foun-
dation Publishing, Qatar. OCLC: 905268079
Graeber D (2013) The democracy project: a history, a crisis, a movement, 1st edn.
Spiegel & Grau, New York
Gretarsson B, O’Donovan J, Bostandjiev S, Höllerer T, Asuncion AU, Newman
D, Smyth P (2012) TopicNets: visual analysis of large text corpora with topic
modeling. ACM Trans Intell Syst Technol 3(2):23:1–23:26. https://doi.org/
10.1145/2089094.2089099
Grimshaw M (2018) Towards a manifesto for a critical digital humanities:
critiquing the extractive capitalism of digital society. Palgrave Commun
4(1):21. https://doi.org/10.1057/s41599-018-0075-y, http://www.nature.
com/articles/s41599-018-0075-y
Grothendieck A (1986) Récoltes et Semailles, Part I, II. Réflexions et témoignage
sur un passé de mathématicien. Gallimard, Paris
Grusin R (2014) The dark side of digital humanities: dispatches from two recent
MLA conventions. Differences 25(1):79–92. https://doi.org/10.1215/
10407391-2420009, https://read.dukeupress.edu/differences/article/25/
1/79-92/60682
Guariglia M (2020) Technology can’t predict crime, it can only weaponize prox-
imity to policing. https://www.eff.org/deeplinks/2020/09/technology-cant-
predict-crime-it-can-only-weaponize-proximity-policing
Guglielmo TA (2004) White on arrival: Italians, race, color, and power in Chicago,
1890–1945. Oxford University Press, New York. OCLC: 1090473075
Guglielmo J, Salerno S (2003) Are Italians white?: how race is made in America.
Routledge, New York. OCLC: 51622484
Hall S (1999) Whose heritage? Un-settling ‘the heritage’, re-imagining the
post-nation. Third Text 13(49):3–13. https://doi.org/10.1080/
09528829908576818, http://www.tandfonline.com/doi/abs/10.1080/
09528829908576818
Hall G (2002) Culture in bits: the monstrous future of theory. Continuum,
London. http://site.ebrary.com/id/10224963. OCLC: 320326067
Hanks P (2013) Lexical analysis: norms and exploitations. MIT Press, Cambridge.
http://site.ebrary.com/id/10651991. OCLC: 907618528
Hannak A, Soeller G, Lazer D, Mislove A, Wilson C (2014) Measuring price
discrimination and steering on e-commerce web sites. In: Proceedings of
the 2014 conference on internet measurement conference. ACM, Vancouver,
pp 305–318. https://doi.org/10.1145/2663716.2663744, https://dl.acm.
org/doi/10.1145/2663716.2663744
Harris ZS (1954) Distributional structure. Word J Linguist Circ N Y 10(2–3):146–
162
Harris M (1976) History and significance of the EMIC/ETIC distinction.
Annu Rev Anthropol 5(1):329–350. https://doi.org/10.1146/annurev.an.
05.100176.001553
Harrison R (2013) Heritage: critical approaches. Routledge, Milton Park
Liu A (2012) Where is cultural criticism in the digital humanities? In: Gold MK (ed)
Debates in the digital humanities. University of Minnesota Press, Minneapo-
lis, pp 490–510. https://doi.org/10.5749/minnesota/9780816677948.003.
0049
Liu B (2020) Sentiment analysis: mining opinions, sentiments, and emotions.
Cambridge University Press, Cambridge
Longo G (2018) Information and causality: mathematical reflections on cancer
biology. Organisms J Biol Sci 2:83–104. https://doi.org/10.13133/
2532-5876_3.15
Longo G (2019) Quantifying the world and its webs: mathematical discrete vs
continua in knowledge construction. Theory Cult Soc 36(6):63–72. https://
doi.org/10.1177/0263276419840414
Luconi S (2003) Frank L. Rizzo and the whitening of Italian Americans in
Philadelphia. In: Guglielmo J, Salerno S (eds) Are Italians white? How race
is made in America. Routledge, New York, pp 177–191
MacDonald JS, MacDonald LD (1964) Chain migration ethnic neighborhood
formation and social networks. Milbank Mem Fund Q 42(1):82. https://
doi.org/10.2307/3348581, https://www.jstor.org/stable/3348581?origin=
crossref
Mahony S (2018) Cultural diversity and the digital humanities. Fudan J Human Soc
Sci 11(3):371–388. https://doi.org/10.1007/s40647-018-0216-0, http://
link.springer.com/10.1007/s40647-018-0216-0
Mandell L (2019) Gender and cultural analytics: finding or making stereotypes?
In: Gold MK, Klein L (eds) Debates in the digital humanities. University of
Minnesota Press, Minneapolis
Manovich L (2002) The language of new media, 1st edn. Leonardo, MIT Press,
Cambridge
McCallum AK (2002) MALLET: a machine learning for language toolkit. http://
mallet.cs.umass.edu
McCarty W (2015) Becoming interdisciplinary. In: Schreibman S, Siemens R,
Unsworth J (eds) A new companion to digital humanities. John Wiley & Sons
Ltd., Chichester, pp 67–83. https://doi.org/10.1002/9781118680605.ch5,
https://onlinelibrary.wiley.com/doi/10.1002/9781118680605.ch5
McKeon M (1994) The origins of interdisciplinary studies. Eighteenth Century
Stud 28(1):17. https://doi.org/10.2307/2739220
McNally PF (2002) Double fold: libraries and the assault on paper, Nicholson
Baker. Bibliogr Soc Can 40(1). https://doi.org/10.33137/pbsc.v40i1.18264,
https://jps.library.utoronto.ca/index.php/bsc/article/view/18264
McPherson T (2019) Why are the digital humanities so white? Or thinking
the histories of race and computation. In: Gold MK (ed) Debates in
the digital humanities. University of Minnesota Press, Minneapolis
Rhodes LD (2010) The ethnic press: shaping the American dream. Lang, New
York. OCLC: 705943676
Ricœur P (2003) The rule of metaphor: the creation of meaning in language.
Routledge Classics, London. OCLC: 1229646734
Riedl M, Padó S (2018) A named entity recognition shootout for German. In:
Proceedings of the 56th annual meeting of the association for computational
linguistics (volume 2: short papers). Association for Computational Linguistics,
Melbourne, pp 120–125. https://doi.org/10.18653/v1/P18-2020
Risam R (2015) Beyond the margins: intersectionality and the digital humanities.
Digit Human Q 9(2)
Ritchey T (2011) Wicked problems – social messes: decision support modelling
with morphological analysis. Risk, Governance and Society, vol 17. Springer,
Berlin
Röder M, Both A, Hinneburg A (2015) Exploring the space of topic coherence
measures. In: Proceedings of the eighth ACM international conference on web
search and data mining – WSDM ’15. ACM Press, Shanghai, pp 399–408.
https://doi.org/10.1145/2684822.2685324
Roediger DR (2005) Working toward whiteness: how America’s immigrants
became white: the strange journey from Ellis Island to the suburbs. BasicBooks,
New York. OCLC: 475064525
Rorty R (2004) Being that can be understood is language. In: Krajewski B (ed)
Gadamer’s repercussions, 1st edn. Reconsidering Philosophical Hermeneutics.
University of California Press, Berkeley, pp 21–29
Rougier NP (2016) R-words · Issue #5 · ReScience/ReScience-article. https://
github.com/ReScience/ReScience-article/issues/5
Rumsey AS (2016) When we are no more: how digital memory is shaping our
future. https://www.youtube.com/watch?v=_ZJDFDscmWE
Rumsey AS, Digital Library Federation (2001) Strategies for building digitized
collections. Digital Library Federation, Council on Library and Information
Resources, Washington. https://lccn.loc.gov/2002726838
Sánchez RT, Santos AB, Vicente RS, Gómez AL (2019) Towards an uncertainty-
aware visualization in the digital humanities. Informatics 6(3):31. https://doi.
org/10.3390/informatics6030031
Schwab K (2017) The fourth industrial revolution. Penguin, London. OCLC:
1004975093
Shanmugam R (2018) Elements of causal inference: foundations and learning
algorithms. J Stat Comput Simul 88(16):3248–3248. https://doi.org/10.
1080/00949655.2018.1505197
Sievert C, Shirley K (2014) LDAvis: a method for visualizing and interpreting
topics. In: Proceedings of the workshop on interactive language learning, visu-
alization, and interfaces. Association for Computational Linguistics, Baltimore,
pp 63–70. https://doi.org/10.3115/v1/W14-3110
Viola L, Verheul J (2020b) One hundred years of migration discourse in the times: a
discourse-historical word vector space approach to the construction of meaning.
Front Artif Intell 3:64. https://doi.org/10.3389/frai.2020.00064, https://
www.frontiersin.org/article/10.3389/frai.2020.00064/full
Viola L, De Bruin J, van Eijden K, Verheul J (2019) The GeoNewsMiner (GNM):
an interactive spatial humanities tool to visualize geographical references in
historical newspapers (v1.0.0). https://github.com/lorellav/GeoNewsMiner
Viola L, Cunningham AR, Jaskov H (2021) Introducing the DHARPA project:
an interdisciplinary lab to enable critical DH practice. In: The humanities in
a digital world, virtual. https://doi.org/10.5281/zenodo.5109974, https://
zenodo.org/record/5109974
Waldrop MM (1992) Complexity: the emerging science at the edge of order and
chaos. Simon & Schuster, New York. OCLC: 26310607
Wallach HM (2006) Topic modeling: beyond bag-of-words. In: Proceedings of
the 23rd international conference on Machine learning - ICML ’06, ACM
Press, Pittsburgh, pp 977–984. https://doi.org/10.1145/1143844.1143967,
http://portal.acm.org/citation.cfm?doid=1143844.1143967
Wang X, McCallum A, Wei X (2007) Topical N-grams: phrase and topic dis-
covery, with an application to information retrieval. In: Seventh IEEE inter-
national conference on data mining (ICDM 2007). IEEE, Omaha, pp 697–
702. https://doi.org/10.1109/ICDM.2007.86, http://ieeexplore.ieee.org/
document/4470313/
Ware C (2021) Information visualization: perception for design.
Elsevier, Amsterdam. https://www.sciencedirect.com/science/book/
9780128128756. OCLC: 1144746447
Weinelt B (2016) Digital transformation of industries: societal implications.
Technical report, World Economic Forum. https://reports.weforum.org/
digital-transformation/wp-content/blogs.dir/94/mp/files/pages/files/dti-
societal-implications-white-paper.pdf
Weinert F (2005) The scientist as philosopher: philosophical consequences of great
scientific discoveries. Springer, Berlin
Weingart P (2000) Interdisciplinarity: the paradoxical discourse. In: Stehr
N, Weingart P (eds) Practising interdisciplinarity. University of Toronto
Press, Toronto, pp 25–42. https://doi.org/10.3138/9781442678729-004,
https://www.degruyter.com/document/doi/10.3138/9781442678729-
004/html
Weiskott E (2017) There is no such thing as ‘the digital humanities’. The Chronicle
of Higher Education. https://www.chronicle.com/article/there-is-no-such-
thing-as-the-digital-humanities/
Wilding R (2007) Transnational ethnographies and anthropological imaginings
of migrancy. J Ethn Migr Stud 33(2):331–348. https://doi.org/10.1080/
13691830601154310
Williams L (2019) What computational archival science can learn from art
history and material culture studies. In: 2019 IEEE international conference
on big data (big data). IEEE, Los Angeles, pp 3153–3155. https://doi.
org/10.1109/BigData47090.2019.9006527, https://ieeexplore.ieee.org/
document/9006527/
Wills C (2005) Destination America. DK Publishers, New York. OCLC: 61361495
Windhager F, Federico P, Schreder G, Glinka K, Dork M, Miksch S, Mayr
E (2019a) Visualization of cultural heritage collection data: state of the
art and future challenges. IEEE Trans Vis Comput Graph 25(6):2311–
2330. https://doi.org/10.1109/TVCG.2018.2830759, https://ieeexplore.
ieee.org/document/8352050/
Windhager F, Salisu S, Mayr E (2019b) Exhibiting uncertainty: visualizing data
quality indicators for cultural collections. Informatics 6(3):29. https://doi.
org/10.3390/informatics6030029, https://www.mdpi.com/2227-9709/6/
3/29
Winter T (2011) Post-conflict heritage, postcolonial tourism: culture, politics
and development at Angkor, 1st edn. No. 21 in Routledge studies in Asia’s
transformations, Routledge, London
Zuanni C (2020) Heritage in a digital world: algorithms, data, and future
value generation. Grazer theologische Perspektiven 3(2):236–259. https://doi.
org/10.25364/17.3:2020.2.12, https://digital.obvsg.at/limina/periodical/
titleinfo/5552067. Medium: PDF. Publisher: Universität Graz. Version Num-
ber: Version of Record
INDEX

G
Gartner Hype Cycle, 3
Gensim, 99
Geocoding, 58, 61, 67, 68
Geolocation, 33, 55, 141
GeoNewsMiner, 50, 64
Graphical User Interface, 50
Graph theory, 124

H
Heritage, 47
Heritagisation, 58
Higher education, 4, 34
Historical migration, 54
Historical newspapers, 48
Human-in-the-loop, 109
The humanities, 76, 83, 85, 86, 110

I
Immigrant communities, 52
Immigrant press, 52
Information Retrieval, 71, 99, 111
Information visualisation, 107, 108, 111
Interactive Topic Modelling, 109
Inter-annotator agreement, 73
Interdisciplinarity discourse, 19, 20, 27, 144
Interpreting the topics, 101
Interspecific competition, 45, 139
Issue-focused network, 130
Italian American diaspora, 70
Italian American migration, 52, 53
Italian immigrant communities, 51
Italian immigrant newspapers, 53
Italian Transatlantic migration, 124
IVisClustering, 109

K
Knowledge creation, 3, 8, 17, 26, 27, 44, 55, 58, 77, 144

L
L’Italia, 64, 67
La Rassegna, 67
Language injustice, 68
Latent Dirichlet Allocation, 82, 95, 99–101, 114
LDAvis, 109
Lemmatisation, 59, 97
Lemmatising, 118
Library of Congress, 43, 48, 96, 113
Linguistic categories, 72
Lowercasing, 59, 61

M
Machine learning, 13–15, 58, 62, 84, 90, 91, 104, 108, 113, 142
Mainstream humanities, 21
MALLET, 99, 113
Markers of identity, 66, 70, 124
Mass migrations, 53
Material culture, 39
Materiality of the sources, 61, 63, 65, 66, 95, 113, 114, 118
McKinsey Global Institute, 4, 5
Meaning, 88, 91
Mechanical Turk, 10, 17
Metaphors, 81–83, 142
Microfilm, 43, 46, 49, 61, 62
Microtargeting, 12
Migrant communities, 48
Migrants’ narratives, 54, 66
Migration, 54
Model of knowledge creation, 137