
The Humanities in the Digital: Beyond Critical Digital Humanities

Lorella Viola
University of Luxembourg
Esch-sur-Alzette, Luxembourg

ISBN 978-3-031-16949-6    ISBN 978-3-031-16950-2 (eBook)
https://doi.org/10.1007/978-3-031-16950-2

© The Editor(s) (if applicable) and The Author(s) 2023. This book is an open access
publication.
Open Access This book is licensed under the terms of the Creative Commons Attribution
4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits
use, sharing, adaptation, distribution and reproduction in any medium or format, as long as
you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license and indicate if changes were made.
The images or other third party material in this book are included in the book’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the book’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly
from the copyright holder.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are
exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information
in this book are believed to be true and accurate at the date of publication. Neither the
publisher nor the authors or the editors give a warranty, expressed or implied, with respect
to the material contained herein or for any errors or omissions that may have been made.
The publisher remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Cover illustration: © Alex Linch shutterstock.com

This Palgrave Macmillan imprint is published by the registered company Springer Nature
Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Alla mia famiglia
FOREWORD

No single line of development leads to the intersection of digital methods and scholarship in the humanities. The mid-twentieth-century activity of
Roberto Busa is frequently cited in origin stories of digital humanities. His
collaboration with IBM was undertaken when Thomas Watson saw the
potential for automating the Jesuit scholar’s concordance of the works of
St Thomas Aquinas.
But instruments essential to automation include intellectual and tech-
nological components with much longer histories. These connect to such
mechanical precedents as the Pascaline, a calculating device named for its
seventeenth-century inventor, the philosopher Blaise Pascal, and to the
finely crafted Jacquard punch cards used to direct shuttle movements in
the nineteenth-century textile industry. In addition, a wide array of systems
of formal logic, procedural mathematics and statistical methods developed
over centuries have provided essential foundations for computational
operations.
Many features of contemporary networked scholarship can be tracked
to earlier information systems of knowledge management and even ancient
strategies of record-keeping. Scholarly practices were always mediated
through technologies and infrastructures whether these were hand-copied
scrolls and codices, shelving systems and catalogues or other methods
for search, retrieval and reproduction. The imprints of Babylonian grids,
arithmetic visualisations, legacy metadata and classification schemes remain
present across contemporary knowledge work. Now that the role of
digital technology in scholarship has become conspicuous in any and every
domain, the line between technological and humanistic domains is sometimes hard to discern in our daily habits. Even the least computationally
savvy scholar regularly makes use of digital resources and activities. Of
course, discrepancies arise across cultures and geographies, and certainly
not all scholarship in every community exists in the same networked
environment. Plenty of ‘traditional’ scholarship continues in direct contact
with physical artefacts of every kind. But much of what has been considered
‘the humanities’ within the long traditions of Western and Asian culture
now functions on foundations of digitised infrastructure for access and
distribution. To a lesser degree, it also functions with the assistance of
computational processing that supports not only search and retrieval, but also
analysis and presentation in quantitative and graphical display.
As we know, familiar technologies tend to become invisible as they
increase in efficiency. The functional device does not, generally, call atten-
tion to itself when it performs its tasks seamlessly. The values of technolog-
ical optimisation privilege these qualities as virtues. Habitual consumers of
streaming content are generally uninterested in a Brechtian experience of
alienation meant to raise awareness of the conditions of viewing within
a cultural-ideological matrix. Such interventions would be tedious and
distracting and only lead to frustration whether part of an entertainment
experience or a scholarly and pedagogical one. Consciousness raising
through aesthetic work has gone the way of the early twentieth-century
avant-garde ‘slap in the face of public taste’ and other tactics to shock the
bourgeoisie. The humanities must stream along with the rest of content
delivered on demand and in consumable form, preferably with special
effects and in small packets of readily digested material. Immersive Van
Gogh and Frida Kahlo exhibits have joined the traditional experience of
looking at painting. The expectations of gallery viewers are now hyped
by theme park standards. The gamification of classrooms and learn-
ing environments caters to attention-distracted participants. Humanities
research struggles in such a context even as knowledge of history, ancient,
indigenous and classical languages and expertise in scholarly methods such
as bibliography and critical editing are increasingly rare.
Meanwhile, debates in digital humanities have become fractious in
recent years with ‘critical’ approaches pitting themselves against earlier
practices characterised as overly positivistic and reductive. Pushback and
counterarguments have split the field without resulting in any substantive
innovation in computational methods, just a shift in rhetoric and claims.
Algorithmic bias is easier to critique than change, even if advocates
assert the need to do so. The question of whether statistical and formal
methods imported from the social sciences can adequately serve humanistic
scholarship remains open and contentious several decades into the use
of now-established practices. Even exuberant early practitioners rarely
promoted digital humanities as a unified or salvific approach to scholarly
transformation, though they sometimes adopted computational techniques
somewhat naively, using programs and platforms in a black box mode,
unaware of their internal workings.
The instrumental aspects of digital infrastructure are only one topic
of current dialogues. From the outset of my own encounter with digital
humanities, sometime in the mid-1990s, what I found intriguing were the
intellectual exigencies asserted as a requirement for working within the
constraints of these formal systems. While all of this has become normative
in the last decades, a quarter of a century ago, the recognition that much
that had been implicit in humanities scholarship had to be made explicit
within a digital context produced a certain frisson of excitement. The
task of rethinking how we thought, of thinking in relation to different
requirements, of learning to imagine algorithmic approaches to research,
to conceptualise our explorations in terms of complex systems, emergent
properties, probabilistic results and other frameworks infused our research
with exciting possibilities.
As I have noted, much of that novelty has become normative—no longer
called self-consciously to attention—just as awareness of the interface is lost
in the familiar GUI screens. Still, new questions did get asked and answered
through automated methods—mainly benefits at scale, the ‘reading’ of
a corpus rather than a work, for instance. Innovative research crosses
material sciences and the humanities, promotes non-invasive archaeology
and supports authentication and attribution studies. Errors and mistakes
abound, of course, and the misuse of platforms and processes is a regular
feature of bad digital humanities—think of all those network diagrams
whose structure is read as semantic even though it is produced to optimise
screen legibility. Misreading and poor scholarship are hardly exclusive to
digital projects even if the bases for claims to authority are structured
differently in practices based on human versus machine interpretation.
Increased sophistication in automated processes (such as named entity
recognition, part of speech parsing, visual feature analysis etc.) continues
to refine results. But the challenge that remains is to learn to think in and
with the technological premises. A digital project is not just an automated
version of a traditional project; it is a project conceived from the outset
in terms that structure the problems and possible outcomes according
to what automated and computational processes enable. Using statistical
sampling methods is not a machine-supported version of serendipity or
chance encounter—it is a structurally and intellectually different activity.
Photoshop is not just a camera on steroids; it is a means of abstracting
visual information into discrete components that can be manipulated in
ways that were not conceptually or physically possible in wet darkrooms.
Similarly, other common programs like Gephi, Cytoscape, Voyant and
Tableau contain conceptual features un-thought and unthinkable in ana-
logue environments—but they need to be engaged with an understanding
of what those features allow.
Detractors scoff, sceptics cringe and the naysayers of various critical
stripes protest that all of this aligns with various agendas—political, neolib-
eral, free market or whatever—as if intellectual life had ever been free
of conditions and contexts. Where were those pure humanities scholars
of a bygone era? Working for the Church? The State? Elite universities?
Administrative units within the legal systems of national power structures?
The science labs that hatched nefarious outcomes in the name of pure
research? Finger pointing and head wagging will not change the reality that
the humanities have always been integrated into civilisations and cultures
to serve partisan agendas and hegemonic power structures. Poetics and
aesthetics provide insight into the conditions from which they arise; they
are not independent of them. We no longer subscribe to the tenets of Matthew Arnold's belief that the 'best that has been made and thought' of human expression contributes to moral uplift and improvement. Everything is
complicit. Digital humanities is hardly the first or likely to be the last
instrument of exclusivity or oppression—as well as liberation and social
progress.
The labour of scholarship continues along with the pedagogy to sustain
it. This activity imprints many values and judgements in the materials and
methods on which it proceeds. Basic activities like classification model the
way objects are found and identified. Crucial decisions about digitisation—
size, scale, quality and source—affect what is presented for study. Terms of
access and use create hierarchies among communities, some of whom have
more resources than others. In short, at every point in the chain of interre-
lated social and material activities that create digital assets, implementation
and intellectual implications are combined. The charge to address social ills
and inequities freights projects in digital humanities with tasks of reparation
and redress, asking that it bear the weight of an entire agenda of social
justice. The relation between ethics and application to digital work raises
more institutional resource and epistemological issues than technical ones.
Fairness requires equal opportunity for skilled production and access, as
well as a share in the interpretative discourse. No substitute exists for doing
the work and having the educational and material resources to do it. The
first step in transforming a field is the choice to acquire its competencies.
Ignorant critique is as pernicious and ineffectual as unthinking practice.
Lorella Viola’s argument about the current state of digital scholarship
is meant to shift the frameworks for understanding these issues. Historical
tensions are evident in the way her subtitle frames her work as ‘Beyond
Critical Digital Humanities’. Acknowledging debates that have often pitted
first wave digital humanists against later critics, she positions her own
research as ‘post-authentic’ by contrast. This term signals dismissal of the
last shred of belief that digital and computational techniques were value
neutral or promoted objectivity (a position taken only by a fraction of
practitioners). But it also distances her from the standard ‘critique’ of
these methods—that they are tools of a neoliberal university environment
promoting entrepreneurial approaches to scholarship at the expense of
some other not very clearly articulated alternative (another very worn out
line of discussion).
Keen to move beyond all this, Viola advocates ‘symbiotic’ and ‘mutual-
ist’ as concepts that eschew many old binarisms and disciplinary bound-
aries. While acknowledging the range of work on which she herself
is building, and the historical development of positions and counter-
positions, she seeks to integrate critical principles into digital methods and
projects from within her own experience of their practice. Her work is
grounded in knowledge of text analysis and computational linguistics as
well as interface design and visualisation. While her summary of polarised
positions forms the opening section of this book, and underpins much of
what she offers as an alternative, her vision of the way forward is synthetic
and affirmative. Throughout, she invokes a post-authentic framework
that emphasises critical engagement with digital operations as mediations.
She frequently reiterates the point that geo-coding or sentiment analysis
works within the dominant power structures that privilege Anglo-centric
approaches and English language materials. Such biases are in part due
to the historical site in which the work arose. Certain environments have
more resources than others. The issue now is to create opportunities for
transformation.
The larger assertions of Viola’s project are crucial: that artificial bina-
risms that pit traditional/analogue and computational/digital approaches
against each other, critical methods against technical ones and so on are
distractions from the core issues: How is humanities scholarship to proceed? What intellectual expertise is required to work with and read into
and through the processes and conditions in which we conceptualise the
research we do?
Digital objects and computational processes have specific qualities that
are distinct from those of analogue ones, and learning to think in those
modes is essential to conceptualising the questions these environments
enable, while recognising their limitations. Thinking as an algorithm, not
just ‘using’ technology, requires a shift of intellectual understanding. But
knowledge of disciplinary domains and traditions remains a crucial part
of the essential expertise of the scholar. Without subject area expertise,
humanities research is pointless whether carried out with digital or tradi-
tional methods. As long as human beings are the main players—agents or
participants—in humanities research, no substitute or surrogate for that
expertise can arise. When that ceases to be the case, these questions and
debates will no longer matter. No sane or ethical humanist would hasten
the arrival of the moment when we cease to engage in human discourse.
No matter how much agency and efficacy are imagined to emerge in the
application of digital methods, or how deeply we may come to love our
robo-pets and AI-bot-assistants, the humanities are still intimately and
ultimately linked to human experience of being in the world. Finally, the
challenge to infuse computational methods with humanistic values, such
as the capacity to tolerate ambiguity and complexity, remains. What, after
all, is a humanistic algorithm, a bias-sensitive digital format or a self-
conscious interface? What interventions in the technology would result in
these transformations?
Lorella Viola has much to say on these matters and works from experi-
ence that combines hands-on engagement with computational methods
and a critical framework that advances insight and understanding. So,
machine and human readers, turn your attention to her text.

Los Angeles, CA, USA Johanna Drucker


May 2022
PREFACE

In 2016, the Los Angeles Review of Books (LARB) conducted a series of interviews with both scholars and critics of digital humanities titled 'The
Digital in the Humanities’. The aim of the special interview series was
‘to explore the intersection of the digital and the humanities’ (Dinsman
2016) and the impact of that intersection on teaching and research. As I
was reading through the various interviews collected in the series, there
was something that recurrently caught my attention. Despite an extensive
use of terminology that attempted to communicate ideas of unity—for
example, digital humanities is described as a field that ‘melds computer
science with hermeneutics’—it gradually became obvious to me how the
traditional rigid notions of separation and dualism that characterise our
contemporary model of knowledge creation were creeping in, surrepti-
tiously but consistently. The following excerpt from the editorial to the
special issue provides a good example (emphasis mine):

“digital humanities” seems astoundingly inappropriate for an area of study that includes, on one hand, computational research, digital reading and
writing platforms, digital pedagogy, open-access publishing, augmented
texts, and literary databases, and on the other, media archaeology and theories
of networks, gaming, and wares both hard and soft. (ibid.)

Language is see-through. It is the functional description of our mental models, that is, it expresses our conceptual understanding of the world.
In the example above, the use of the construction ‘on one hand…on
the other’ conveys an image of two distinct, contrasting polarised enti-

xiii
xiv PREFACE

ties which are essentially antithetical in their essence and which despite
intersecting—as it is said a few lines below—remain fundamentally separate.
This description of digital humanities reflects a specific mental model,
that knowledge is made up of competences delimited by established,
disciplinary boundaries. Should there be overlapping spaces, boundaries
do not dissolve nor do they merge; rather, disciplines further specialise and
create yet new fields.
When the COVID-19 pandemic was at its peak, I was spending my
days in my loft in Luxembourg and most of my activities were digital. I
was working online, keeping contact with my family and friends online,
watching the news online and taking online courses and online fitness
classes. I even took part in an online choir project and in an online pub
quiz. Of course for my friends, family and acquaintances, it was not much
different. As the days became weeks and the weeks became months and
then years, it was quite obvious that it was no longer a matter of having
the digital in our lives; rather, now everyone was in the digital.
This book is titled ‘The Humanities in the Digital’ as an intentional
reference to the LARB’s interview series. With this title, I wanted to mark
how the digital is now integral to society and its functioning, including how
society produces knowledge and culture. The word order change is meant
to signal conclusively the obsolescence of binary modulations in relation
to the digital which continue to suggest a division, for example, between
digital knowledge production and non-digital knowledge production. Not
only that. It is the argument of this book that dual notions of this kind are
the spectre of a much deeper fracture, that which divides knowledge into
disciplines and disciplines into two areas: the sciences and the humanities.
This rigid conceptualisation of division and competition, I maintain, is
complicit in having promoted a narrative which has paired computational
methods with exactness and neutrality, rigour and authoritativeness whilst
stigmatising consciousness and criticality as carriers of biases, unreliability
and inequality.
This book argues against a compartmentalisation of knowledge and
maintains that division into disciplines is not only unhelpful and conceptually
limiting, but especially after the exponential digital acceleration brought
about by the 2020 COVID-19 pandemic, also incompatible with the
current reality. In the pages that follow, I analyse many of the different
ways in which reality has been transformed by technology—the pervasive
adoption of big data, the fetishisation of algorithms and automation and
the digitisation of education and research—and I argue that the full
PREFACE xv

digitisation of society, already well on its way before the COVID-19 pandemic but certainly brought to its non-reversible turning point by the
2020 health crisis, has added levels of complexity to reality that our model
of knowledge, based as it is on a single-discipline perspective, can no longer
explain. With this book, my intention is to have the necessary conversation
that the historical moment demands.
The book is therefore primarily a reflection on the separation of knowl-
edge into disciplines and of disciplines into the sciences vs the humanities
and discusses its contemporary relevance and adequacy in relation to
the ubiquitous impact of digital technologies on society and culture. In
arguing in favour of a reconfigured model of knowledge creation in the
digital, I propose different notions, practices and values theorised in a novel
conceptual and methodological framework, the post-authentic framework.
This framework offers a more complex and articulated conceptualisation
of digital objects than the one found in dominant narratives which reduce
them to mere collections of data points. Digital objects are understood
as living compositions of humans, entities and processes interconnected
according to various modulations of power embedded in computational
processes, actors and societies. Countless versions can be created through
such processes which are shaped by past actions and in turn shape the
following ones; thus digital objects are never finished nor can they be finished, ultimately transcending traditional questions of authenticity.
Digital objects act in and react to society and therefore bear con-
sequences; the post-authentic framework rethinks both products and
processes which are acknowledged as never neutral, incorporating external,
situated systems of interpretation and management. Taking the humanities
as a focal point, I analyse personal use cases in a variety of applied contexts
such as digital heritage practices, digital linguistic injustice, critical digital
literacy and critical digital visualisation. I examine how I addressed in my
own work issues in digital practice such as transparency, documentation
and reproducibility, questions about reliability, authenticity, biases, ambi-
guity and uncertainty and engaging with sources through technology. I
discuss these case examples in the context of the novel conceptual and
methodological framework that I propose, the post-authentic framework.
By recognising the larger cultural relevance of digital objects and the
methods to create them, analyse them and visualise them, throughout the
chapters of the book, I show how the post-authentic framework affords
an architecture for issues such as transparency, replicability, Open Access,
sustainability, data manipulation, accountability and visual display.
xvi PREFACE

The Humanities in the Digital ultimately aims to address the increasingly pressing questions: how do we create knowledge today? And how do we
want the next generation of students to be trained? Beyond the rigid model
of knowledge creation still fundamentally based on notions of separation
and competition, the book shows another way: knowledge creation in the
digital.

Esch-sur-Alzette, Luxembourg Lorella Viola


July 2022
ACKNOWLEDGEMENTS

The Humanities in the Digital is published Open Access thanks to support from three funding schemes: the Fonds National de la Recherche
Luxembourg (FNR)—RESCOM: Scientific Monographs scheme, the Lux-
embourg Centre for Contemporary and Digital History (C2DH)—Digital Research Axis Fund and the C2DH Open Access Fund. My sincerest
thanks for funding this book project. Your support accelerates discovery
and creates fairer access to knowledge that is open to all.
The research carried out for this book was supported by FNR. Parts
of the use cases illustrated in the book, including the conceptual work
done towards developing the interface for topic modelling illustrated in
Chap. 5, stem from research carried out within the project: DIGITAL HISTORY ADVANCED RESEARCH PROJECTS ACCELERATOR—DHARPA.
The discussions about network analysis and sentiment analysis and the
interface examples of DeXTER described in Chap. 5 are the result of
research carried out within the THINKERING GRANT which was awarded
to me by the C2DH.
I would like to express my deepest gratitude to Sean Takats for his
support and continuous encouragement during these years; without it,
writing this book would have been much harder. A big thank you to the
DHARPA project as a whole which has been instrumental for elaborating
the framework proposed in this book. I have been really lucky to be part
of it.
I warmly thank Mariella de Crouy Chanel; my exchanges with you
helped me identify and solve challenges in my work. Outside of official

xvii
xviii ACKNOWLEDGEMENTS

meetings, I have also much enjoyed our chats during lunch and coffee
breaks. A very great thank you to Sean Takats, Machteld Venken, Andreas
Musolff, Angela Cunningham, Joris van Eijnatten and Andreas Fickers for
taking the time to comment on earlier versions of this book; thanks for
being my critical readers. And thank you to the C2DH; in the Centre, I found fantastic colleagues and impressive expertise and resources. I could not have asked for more.
I am deeply grateful to Jaap Verheul and Joris van Eijnatten for providing
me with invaluable intellectual support and guidance when I was taking my
first steps in digital humanities. You have both always shown respect for my
ideas and for me as a professional and a learner and I will always cherish
the memory of my time at Utrecht University.
My sincere thanks to the Transatlantic research project Oceanic
Exchanges (OcEx) of which I had the privilege to be part. In those years, I
had the opportunity to work with leading historians and computer scientists;
without OcEx, I would not be the scholar that I am today.
Very special thanks to my family, to whom this book is dedicated, for
always listening, supporting me and encouraging me to aim high and never
give up. Your love and care are my light through life; I love you with all my
heart. And a very big thank you to my little nephew Emanuele, who thinks
I must be very cold away from Sicily. Your drawings of us in the Sicilian
sun have indeed warmed many cold, Luxembourgish winter days.
I heartily thank Johanna Drucker for writing the Foreword to this book
and more importantly for producing inspiring and important science.
Thanks also to those who have supported this book project right from
the start, including the anonymous reviewers who have provided insights
and valuable comments, contributing considerably to improving my work.
And finally, thanks to all those who, directly or indirectly, have been
involved in this venture; in sometimes unpredictable ways, you have all
contributed to making this book possible.
CONTENTS

1 The Humanities in the digital 1

2 The Importance of Being Digital 37

3 The Opposite of Unsupervised 57

4 How Discrete 81

5 What the Graph 107

6 Conclusion 137

References 147

Index 169

xix
ABOUT THE AUTHOR

Lorella Viola is a research associate at the Luxembourg Centre for Contemporary and Digital History (C2DH), University of Luxembourg. She
is co-Principal Investigator in the Luxembourg National Research Fund
project DHARPA (Digital History Advanced Research Project Accelera-
tor). She was previously a research associate at Utrecht University, where
she was Work Package Leader in the Transatlantic digital humanities
research project Oceanic Exchanges. Currently, she is co-editing the vol-
ume Multilingual Digital Humanities (Routledge) which brings together,
advances and reflects on recent work on the social and cultural relevance
of multilingualism for digital knowledge production. Her scholarship has
appeared, among others, in Digital Scholarship in the Humanities, Frontiers
in Artificial Intelligence, International Journal of Humanities and Arts
Computing and Reviews in Digital Humanities.

xxi
ACRONYMS

AI Artificial intelligence
API Application Programming Interface
BoW Bag of words
CAGR Compound annual growth rate
CDH Critical Digital Humanities
CHS Critical Heritage Studies
C2DH Centre for Contemporary and Digital History
DDTM Discourse-driven topic modelling
DeXTER DeepteXTminER
DH Digital humanities
DHA Discourse-historical approach
DHARPA Digital History Advanced Research Projects Accelerator
GNM GeoNewsMiner
GPE Geopolitical entity
GUI Graphical user interface
IR Information retrieval
ITM Interactive topic modelling
LARB Los Angeles Review of Books
LDA Latent Dirichlet Allocation
LDC Least developed countries
LLDC Landlocked developing countries
LOC Locality entity
ML Machine learning
NA Network analysis
NDNP National Digital Newspaper Program

xxiii
xxiv ACRONYMS

NEH National Endowment for the Humanities


NER Named entity recognition
NLP Natural language processing
OcEx Oceanic Exchanges
OCR Optical character recognition
ORG Organisation entity
PER Person entity
POS Part-of-speech tagging
SA Sentiment analysis
SIDS Small island developing states
TF-IDF Term frequency-inverse document frequency
UI User interface
UN United Nations
UNESCO United Nations Educational, Scientific and Cultural Organization
LIST OF FIGURES

Fig. 3.1 Variation of removed material (in percentage) across issues/years of Cronaca Sovversiva 63
Fig. 3.2 Impact of pre-processing operations on ChroniclItaly 3.0
per title. Figure taken from Viola and Fiscarelli (2021b) 64
Fig. 3.3 Variation of removed material (in percentage) across issues/years of L'Italia 65
Fig. 3.4 Distribution of entities per title after intervention. Positive
bars indicate a decreased number of entities after the
process, whilst negative bars indicate an increased number.
Figure taken from Viola and Fiscarelli (2021b) 68
Fig. 3.5 Logarithmic distribution of selected entities for SA across
titles. Figure taken from Viola and Fiscarelli (2021b) 75
Fig. 5.1 Distribution of issues within ChroniclItaly 3.0 per title. Red
lines indicate at least one issue in a three-month period.
Figure taken from Viola and Fiscarelli (2021b) 115
Fig. 5.2 Wireframe of a post-authentic interface for topic
modelling: sources upload. The wireframe displays how the
post-authentic framework to metadata information could
guide the development of an interface. Wireframe by the
author and Mariella de Crouy Chanel 116
Fig. 5.3 Post-authentic framework to sources metadata information
display. Interactive visualisation available at https://
observablehq.com/@dharpa-project/timestamped-corpus.
Visualisation by the author and Mariella de Crouy Chanel 117

xxv
xxvi LIST OF FIGURES

Fig. 5.4 Post-authentic interface for topic modelling: data preview. The wireframe displays how the post-authentic framework
could guide the development of an interface for exploring
the sources. Wireframe by the author and Mariella de Crouy
Chanel 119
Fig. 5.5 Interface for topic modelling: data pre-processing. The
wireframe displays how the post-authentic framework to
UI could make pre-processing more transparent to users.
Wireframe by the author and Mariella de Crouy Chanel 120
Fig. 5.6 Interface for topic modelling: data pre-processing
(stemming and lemmatising). The wireframe displays how
the post-authentic framework to UI could make stemming
and lemmatising more transparent to users. Wireframe by
the author and Mariella de Crouy Chanel 121
Fig. 5.7 Interface for topic modelling: corpus preparation. The
wireframe displays how the post-authentic framework to UI
could make corpus preparation more transparent to users.
Wireframe by the author and Mariella de Crouy Chanel 122
Fig. 5.8 DeXTER default landing interface for NA. The red oval
highlights the time bar (historicise feature) 127
Fig. 5.9 DeXTER default landing interface for NA. The red oval
highlights the different title parameters 128
Fig. 5.10 DeXTER default landing interface for NA. The red ovals
highlight the frequency and sentiment polarity parameters 129
Fig. 5.11 DeXTER: egocentric network for the node sicilia across all
titles in the collection in sentences with prevailing positive
sentiment 131
Fig. 5.12 DeXTER: network for the ego sicilia and alters across
titles in the collection in sentences with prevailing positive
sentiment 132
Fig. 5.13 DeXTER: default issue-focused network graph 133
CHAPTER 1

The Humanities in the digital

The ultimate, hidden truth of the world is that it is something that we make,
and could just as easily make differently. (Graeber 2013)

1.1 IN THE DIGITAL


The digital transformation of society was saluted as the imperative, unstoppable revolution that would provide unparalleled opportunities to
our increasingly globalised societies. Among other benefits, it was praised
for being able to accelerate innovation and economic growth, increase
flexibility and productivity, reduce waste consumption, simplify and facil-
itate services and information provision and improve competitiveness by
drastically reducing development time and cost (Komarčević et al. 2017).
At the same time, however, warnings about the dramatic and disruptive
changes and outcomes that it would inevitably carry accompanied the
considerable hype. For example, several economists raised serious concerns
about the major risks that would derive from the digital transformation of
society. A non-negligible number of evidence-based studies projected a rise
in social inequality, job loss and job insecurity, wage deflation, increased
polarisation in society, issues of environmental sustainability, local and
global threats to security and privacy, decrease in trust, ethical questions
on the use of data by organisations and governments and online profiling,
outdated regulations, issues of accountability in relation to algorithmic
governance, erosion of the social security and intensification of isolation,


anxiety, stress and exhaustion (e.g., Autor et al. 2003; Cook and Van Horn
2011; Hannak et al. 2014; Lacy and Rutqvist 2015; Weinelt 2016; Frey
and Osborne 2017; Komarčević et al. 2017; Schwab 2017).
Despite all the evidence, however, the extraordinary collective advan-
tages presented by the new technologies were believed to far outweigh
the risks (Weinelt 2016; Komarčević et al. 2017; Schwab 2017). Indeed,
the prevailing tendency was to describe these great dangers rather as
‘challenges’ which, however significant, were believed to be within govern-
ments’ reach. The digital transformation of society would have undoubt-
edly provided unprecedented ‘opportunities’ to collaborate across geogra-
phies, sectors and disciplines, so naturally, on the whole, the highly praised
positives of the digital revolution overshadowed the negatives. Some
experts comment that this is in fact hardly surprising as in order for a
revolution to be accomplished, the necessary support must be mobilised
by governments, universities, research institutions, citizens and businesses
(Komarčević et al. 2017).
Thus, in the last decade, though with differences across countries,
both the public and the private sector have embraced the digital trans-
formation (European Center for Digital Competitiveness 2021). Govern-
ments around the world have increasingly implemented comprehensive
technology-driven programmes and legal frameworks aimed at boosting
innovation and entrepreneurship, whilst the industrial sector as a whole
has invested massively in digitising business processes, work organisation
and culture, modalities of market access, models of management and
relationships with customers (ibid.). The digital transformation has then
over the years forced businesses and governments to revolutionise their
infrastructures to incorporate an effective and comprehensive digital strat-
egy. Indeed, as always in history, the choice between adopting the new technology or not has quickly become one between innovation and extinction.
The digital transformation has profoundly affected research as well. The
incorporation of technology in scholarship practice and culture, the imple-
mentation of data-driven approaches and the size and complexity of usable
and used data have increased exponentially in natural, computational, social
science and humanities research. The ‘Digital Turn’, as it is called, has
almost forced scholars to integrate advanced quantitative methods in their
research, and in the humanities at large, it has, for example, led to the
emergence of completely new fields such as, of course, digital humanities (DH) (Viola and Verheul 2020b).
Institutionally, universities have in contrast been slow to adapt.
Although bringing the digital to education and research has been on
higher education institutions’ agendas for years, the changes have always
been set to be implemented gradually over the span of several years.
Universities have in other words adopted an evolutionary approach to the
digital (Alenezi 2021), according to which digital benefits are incorporated
within an existing model of knowledge creation. This means that, on the
one hand, the integration of the digital into knowledge creation practices
and the combination of methods and perspectives from different disciplines
are highly encouraged and much praised as the most effective way to
accelerate and expand knowledge. At the same time, however, technology
and the digital are seen as entities somewhat separate or indeed separable
from knowledge creation itself. This moderate approach allows a gradual
pace of change, and it is generally praised for its capacity to minimise
disruptions while at the same time allowing change (Komarčević et al.
2017; Microsoft Partner Community 2018).
The reasons why universities have traditionally chosen this strategy
are various and complex, but generally speaking they all have something
in common. In his book Learning Reimagined, Graham Brown-Martin
(2014) argues that the current model of education is still the same as the
one that was set to prepare the industrial workforce of the nineteenth-
century factories. This model was designed to create workers who would
do their job silently all day to produce identical products; collaboration,
creativity and critical thinking were precisely what the model aimed to
discourage. As this system has become less and less relevant over the years,
it has become increasingly costly to replace the existing infrastructures,
including to radically rethink teaching and learning practices and to re-
devise a new model of knowledge creation that would suit higher education's mission while at the same time responding to the needs of the
new digital information and knowledge landscape. Therefore, for higher
education institutions, the preferred strategy has traditionally been to
progressively integrate digital tools in their existing systems, as a means to
advance educational practices whilst containing the exorbitant costs that a
true revolution would entail, including the inevitable disruptive changes.
After all, despite what the word ‘revolution’ may suggest, these complex
and radical processes are painfully slow and always require years to be
implemented. In fact, as the ‘Gartner Hype Cycle’ of technology1 indicates
(Fenn and Raskino 2008), only some of these processes are actually
expected to eventually reach the virtual status of ‘Plateau of Productivity’
and if there is a cost to adapting slowly, the cost to being wrong is higher.
The 2020 health crisis changed all this. In just a few months’ time, the
COVID-19 pandemic accelerated years of change in the functioning of
society, including the way companies in all sectors operated. In 2020, the
McKinsey Global Institute surveyed 800 executives from a wide variety of
sectors based in the United States, Australia, Canada, China, France, Ger-
many, India, Spain and the United Kingdom (Sua et al. 2020). The report
showed that since the start of the pandemic, companies had accelerated
the digitisation of both their internal and external operations by three to
four years, while the share of digital or digitally enabled products in their
portfolios had advanced by seven years. Crucially, the study also provided
insights into the long-term effects of such changes: companies claimed
that they were now investing in their long-term digital transformations
more than in anything else. According to a BDO report on the digital
transformation brought about by the COVID crisis (Cohron et al. 2020,
2), just as businesses that had developed and implemented digital
strategies prior to the pandemic were in a position to leapfrog their
less digital competitors, organisations that would not adapt their digital
capabilities for the post-coronavirus future would simply be surpassed.
Higher education has also been deeply affected. Before the COVID-19
crisis, higher education institutions would look at technology’s strategic
importance not as a critical component of their success but more as one
piece of the pedagogical puzzle, useful both to achieve greater access
and as a source of cost efficiency. For example, many academics had
never designed or delivered a course online, carried out online students’
supervisions, served as online examiners and presented or attended an
online conference, let alone organise one. According to the United Nations
Educational, Scientific and Cultural Organization (UNESCO), at the first
peak of the crisis in April 2020, more than 1.6 billion students around
the world were affected by campus closures (UNESCO 2020). As on-
campus learning was no longer possible, demands for online courses
saw an unprecedented rise. Coursera, for example, experienced a 543%
increase in new course enrolments between mid-March and mid-May
2020 alone (DeVaney et al. 2020). Having to adapt quickly to the virtual
switch—much more quickly than they had considered feasible before the
outbreak—universities and higher education institutions were forced to
implement some kind of temporary digital solutions to meet the demands
of students, academics, researchers and support staff. At the peak of the
pandemic, classes needed to be moved online practically overnight, and so
did all sorts of academic interactions that would typically occur face-to-face:
supervisions, meetings, seminars, workshops and conferences, to name but
a few. Universities and research institutes didn’t have much choice other
than to respond rapidly. Thus, just like in the business sector, the shift
towards digital channels had to happen fast as those institutions that did
not promptly and successfully achieve the transition towards the digital
were at high risk of reducing their competitiveness dramatically, and not
just in the near-term.
The sudden accelerated digital shift by universities is one aspect of
society’s forced digital switch during 2020. Remote work, omnichan-
nel commerce, digital content consumption, platformification and digital
health solutions are also examples of how society was kept afloat by the
migration to the digital during the pandemic. This is not the kind of process
that can be fully reversed. On the contrary, the most significant changes
such as remote working, online offerings and remote interactions are in fact
the most likely to remain in the long term, at least in some hybrid form.
According to the McKinsey Global Institute survey (op. cit.), because such
changes reflect new health and hygiene sensitivities, respondents were more
than twice as likely to believe that there won’t be a full return to pre-crisis
norms at all. Similarly, higher education predictions concerning digital or
digitally enhanced offerings anticipated that these were likely to stay even
after the health crisis would be resolved. Dynamic and blended approaches
are therefore likely to become the ‘new normal’ as they allow universities to
minimise potential teaching and learning disruptions in case of emergency
and more importantly, they can now be implemented at a moment’s notice.
Consequently, instructors are more and more required to reimagine their
courses for an online format. The same goes for all the other aspects
of a scholar’s life such as conference presentations, seminars, workshops,
supervisions and exams, as well as research-specific tasks, including data
gathering and analysis.
COVID-19 has finally also changed the role of technology particularly
with regard to its crucial function in universities’ risk mitigation strategies.
According to the 2020 Coursera guide for universities to build and scale
online learning programmes, universities that today are investing heavily
in their digital infrastructures will be able to seamlessly pivot through any
crisis in the future (DeVaney et al. 2020, 1). Although the digitisation of
society was already underway before the crisis, it is argued in these reports
that the COVID-19 pandemic has marked a clear turning point of historic
proportions for technology adoption for which the paradigm shift towards
digitisation has been sharply accelerated.
Yet if during the health crisis companies and universities were forced
to adopt similar digitisation strategies, almost three years after the start
of the pandemic, now things between the two sectors look different
again. To succeed and adapt to the demands of the new digital market,
companies understood that in addition to investing massively in their
digital infrastructures, they crucially also had to create new business models
that replaced the existing ones which had simply become inadequate
to respond to the rules dictated by new generations of customers and
technologies. The digital transformation has therefore required a deeper
transformation in the way businesses were structuring their organisations,
thought about market challenges and approached problem-solving (Morze
and Strutynska 2021). In contrast, it appears that higher education has
returned to looking at technology as a means for incremental change, once
again as a way to enhance learning approaches or for cost reduction
purposes, but its disruptive and truly revolutionary impact continues to be
poorly understood and on the whole under-theorised (Branch et al. 2020;
Alenezi 2021). For instance, although universities and research institutes
have to various degrees digitised pedagogical approaches, added digital
skills to their curricula and favoured the use and development of digital
methods and tools for research and teaching, technology is still treated
as something contextual, something that happens alongside knowledge
creation.
Knowledge creation, however, happens in society. And while society
has been radically transformed by technology which has in turn trans-
formed culture and the way it creates it, universities continue to adopt
an evolutionary approach to the digital (Alenezi 2021): more or less
gradual adjustments are made to incorporate it but the existing model of
knowledge creation is left essentially intact. The argument that I advance
in this book is on the contrary that digitisation has involved a much greater
change, a more fundamental shift for knowledge creation than the current
model of knowledge production accommodates. This shift, I claim, has
in fact been in—as opposed to towards—the digital. As societies are in
the digital, one profound consequence of this shift is that research and
knowledge are also in turn inevitably mediated by the digital to various
degrees. As a bare minimum, for example, regardless of the discipline,
a post-COVID researcher is someone able to embrace a broad set of
digital tools effectively. Yet what this entails in terms of how knowledge
production is now accordingly lived, reimagined, conceptualised, managed
and shared has not yet been adequately explored, let alone formally
addressed. In relation to knowledge creation, what I therefore argue for
is a revolutionary rather than an evolutionary approach to the digital.
Whereas an evolutionary approach to the digital extends the existing
model of knowledge creation to incorporate the digital in some form of
supporting role, a revolutionary approach calls for a different model which
entirely reconceptualises the digital and how it affects the very practices
of knowledge production. Indeed, claiming that the shift has been in the
digital acknowledges conclusively that the digital is now integral to not
only society and its functioning, but crucially also to how society produces
knowledge and culture.
Crucially, such a different model of knowledge production must break
with the obsolescence of persisting binary modulations in relation to the
digital—for example between digital knowledge creation and non-digital
knowledge creation—in that they continue to suggest artificial divisions. It
is the argument of this book that dual notions of this kind are the spectre of
a much deeper fracture, that which divides knowledge into disciplines and
disciplines into two areas: the sciences and the humanities. Significantly, a
consequence of the shift in the digital is that reality has been complexified
rather than simplified. Many of the multiple levels of complexity that
the digital brings to reality are so convoluted and unpredictable that
the traditional model of knowledge creation based on single-discipline
perspectives and divisions is not only unhelpful and conceptually limiting,
but especially after the exponential digital acceleration brought about by
the 2020 COVID-19 pandemic, also incompatible with the current reality
and no longer suited to understand and explain the ramifications of this
unpredictability.
In arguing against a compartmentalisation of knowledge which essen-
tially disconnects rather than connects expertise (Stehr and Weingart 2000), I maintain that the insistent rigid conceptualisation of division and competition is complicit in having promoted a narrative which has
paired computational methods with exactness and neutrality, rigour and
authoritativeness whilst stigmatising consciousness and criticality as carriers
of biases, unreliability and inequality. The book is therefore primarily a
reflection on the separation of knowledge into disciplines and of disciplines
into the sciences vs the humanities and discusses its contemporary relevance
and adequacy in relation to the ubiquitous impact of digital technolo-
gies on society and culture. In the pages that follow, I analyse many of
the different ways in which reality has been transformed by technology—
the pervasive adoption of big data, the fetishisation of algorithms and
automation, the digitisation of education and research and the illusory,
yet believed, promise of objectivism—and I argue that the full digitisation
of society, already well on its way before the COVID-19 pandemic but
certainly brought to its non-reversible turning point by the 2020 health
crisis, has added even further complexity to reality, exacerbating existing
fractures and disparities and posing new complex questions that urgently
require a re-theorisation of the current model of knowledge creation in
order to be tackled.
In advocating for a new model of knowledge production, the book
firmly opposes notions of divisions, particularly a division of knowledge
into monolithic disciplines. I contend that the recent events have brought
into sharper focus how understanding knowledge in terms of discipline
compartmentalisation is anachronistic and not equipped to encapsulate
and explain society. The pandemic has ultimately called for a reconcep-
tualisation of knowledge creation and practices which now must operate
beyond outdated models of separation. In moving beyond the current rigid
framework within which knowledge production still operates, I introduce
different concepts and definitions in reference to the digital, digital objects
and practices of knowledge production in the digital, which break with
dialectical principles of dualism and antagonism, including dichotomous
notions of digital vs non-digital positions.
This book focuses on the humanities, the area of academic knowledge
that had already undergone radical transformation by the digital in the
last two decades. I start by retracing schisms in the field between the
humanities, the digital humanities (DH) and critical digital humanities
(CDH); these are embedded, I argue, within the old dichotomy of
sciences vs humanities and the persistent physics envy in our society and
by extension, in research and academic knowledge. I especially challenge
existing notions such as that of ‘mainstream humanities’ that characterise
it as a field that is seemingly non-digital but critical. I maintain that in the
current landscape, conceptualisations of this kind have more the colour of a
nostalgic invocation of a reality that no longer exists, perhaps as an attempt
to reconstruct the core identity of a pre-digital scholar who now more than
ever feels directly threatened by an aggressive other: the digital. Equally
irrelevant and unhelpful, I argue, is a further division of the humanities into
DH and CDH. In pursuing this argumentation, I examine how, on the one
hand, scholars arguing in favour of CDH claim that the distinction between
digital and analogue is pointless; therefore, humanists must embrace the
digital critically; on the other hand, by creating a new field, i.e., CDH,
they fall into the trap of effectively perpetuating the very separation between
digital and critical that they define as no longer relevant.
In pursuing my case for a novel model of knowledge creation in the
digital, throughout the book, I analyse personal use cases; specifically, I
examine how I have addressed in my own work issues in digital practice
such as transparency, documentation and reproducibility, questions about
reliability, authenticity and biases, and engaging with sources through
technology. Across the various examples presented in the following chap-
ters, this book demonstrates how a re-examination of digital knowledge
creation can no longer be achieved from a distance, but only from the
inside, that the digital is no longer contextual to knowledge creation but
that knowledge is created in the digital. This auto-ethnographic and self-
reflexive approach allows me to show how my practice as a humanist in the
digital has evolved over time and through the development of different
digital projects. My intention is not to simply confront algorithms as
instruments of automation but to unpack ‘the cultural forms emerging in
their shadows’ (Gillespie 2014, 168). Expanding on critical posthumanities
theories (Braidotti 2017; Braidotti and Fuller 2019), to this aim I then
develop a new framework for digital knowledge creation practices—the
post-authentic framework (cf. Chap. 2)—which critiques current positivis-
tic and deterministic views and offers new concepts and methods to be
applied to digital objects and to knowledge creation in the digital.
A little less than a decade ago, Berry and Dieter (2015) claimed that
we were rapidly entering a world in which it was increasingly difficult
to find culture outside digital media. The major premise of this book is
that especially after COVID-19, all information is now digital and even
more, algorithms have become central nodes of knowledge and culture
production with an increased capacity to shape society at large. I therefore
maintain that universities and higher education institutions can no longer
afford to consider the digital as something that is happening to knowledge
creation. It is time to recognise that knowledge creation is happening in
the digital. As digital vs non-digital positions have entirely lost relevance,
we must recognise that the current model of knowledge grounded in
rigid divisions is at best irrelevant and unhelpful and at worst artificial and
harmful. Scholars, researchers, universities and institutions have therefore
a central role to play in assessing how digital knowledge is created not
just today, but also for the purpose of future generations, and clear
responsibilities to shoulder, those that come from being in the digital.
1.2 THE ALGORITHM MADE ME DO IT!
Computational technology such as artificial intelligence (AI) can be
thought of in many ways as being like a 'Mechanical Turk'.2 The Mechanical
Turk or simply ‘The Turk’ was a chess-playing machine constructed by
Wolfgang von Kempelen in the late eighteenth century. The mechanism
appeared to be able to play a game of chess against a human opponent
completely by itself. The Turk was brought to various exhibitions and
demonstrations around Europe and the Americas for over eighty years and
won most of the games played, defeating opponents such as Napoleon
Bonaparte and Benjamin Franklin. In reality, the Mechanical Turk was a
complex, mechanical illusion that was in fact operated by a human chess
master hiding inside the machine.
AI and digital technology resemble the Mechanical Turk in that the choices
and actions hidden from view are precisely what create the illusion of both
a fully autonomous process and an impartial output. And just as the
Mechanical Turk was celebrated and paraded,
the ‘Digital Turn’ and its flow of data have been applauded and welcomed
practically ubiquitously. Indeed, hyped up by the reassuring promises of
neutrality, objectivity, fairness and accuracy held out by digital technology
and data, both industry and academia have embraced the so-called big
data revolution: data-sets so large and complex that no traditional
software—let alone humans—could ever analyse them. In 2017,
IBM reported that more than 90% of the world’s data had appeared in
the two previous years alone. Today, in sectors such as healthcare, big
data is being used to reduce healthcare costs for individuals, to improve
the accuracy and the waiting time for diagnoses, to effectively avoid
preventable diseases or to predict epidemic outbreaks. The market of big
data analytics in healthcare has continually grown and not just since the
COVID-19 pandemic. According to a 2020 report about big data in
healthcare, the global big data healthcare analytics market was worth over
$14.7 billion in 2018 and $22.6 billion in 2019, and was expected to be worth
$67.82 billion by 2025. A more recent projection in June 2020 estimated
this growth to reach $80.21 billion by 2026, exhibiting a CAGR3 of 27.5%
(ResearchAndMarkets.com 2020).
Big data analytics has also been incorporated into the banking sector
for tasks such as improving the accuracy of risk models used by banks and
financial institutions. In credit management, banks use big data to detect
fraud signals or to understand customer behaviour from the analysis
of investment patterns, shopping trends, motivation to invest and personal
or financial background. According to recent predictions, the market of
big data analytics in banking could rise to $62.10 billion by 2025 (Flynn
2020). Ever larger and more complex data-sets are also used for law and
order policy (e.g., predictive policing), for mapping user behaviour (e.g.,
social media), for recording speech (e.g., Alexa, Google Assistant, Siri) and
for collecting and measuring the individual’s physiological data, such as
their heart rate, sleep patterns, blood pressure or skin conductance. And
these are just a few examples.
More data and therefore more accuracy and freedom from subjectivity
were also promised to research. Disciplines across scientific domains have
increasingly incorporated technology within their traditional workflows
and developed advanced data-driven approaches to analyse ever larger and
more complex data-sets. In the spirit of breaking the old schemes of opaque
practices, it is the humanities, however, that has arguably been impacted
the most by this explosion of data. Thanks to the endless flow of searchable
material provided by the Digital Turn, humanists could now finally change
the fully hermeneutical tradition, believed to perpetuate discrimination and
biases.
This looked like ‘that noble dream’ (Novick 1988). Millions of records
of sources seemed to be just a click away. Any humanist scholar with a
laptop and an Internet connection could potentially access them, explore
them and analyse them. Even more revolutionising was the possibility to
finally be able to draw conclusions from objective evidence and so dismiss
all accusations that the humanities was a field of obscure, non-replicable
methods. Through large quantities of ‘data’, humanists could now under-
stand the past more wholly, draw more rigorous comparisons with the
present and even predict the future. This ‘DH moment’, as it was called,
was perfectly in line with a more global trend whereby data was (and to a
large extent still is) presumed to be accurate and unbiased, therefore more
reliable and ultimately, fairer (Christin 2016). The ‘DH promise’ (Thomas
2014; Moretti 2016) was a promise of freedom, freedom from subjectivity,
from unreliability, but more importantly from the supposed irrelevance of
the humanities in a data-driven world. It was also soaked in positivistic
hypes about the endless opportunities of data-driven research methods
in general and for humanities research in particular, such as the artful
deception of suddenly being able to access everything or the scientistic
belief in data as being more reliable than sources.
Following this positivistic hype, however, the unquestioning belief in
the endless possibilities and benefits of applying computational techniques
for the good of society and research started to be harshly criticised for being
false and unrealistic (cfr. Sect. 1.3). The alluring and reassuring promises of
data neutrality, objectivity, fairness and accuracy have indeed been found
illusory, algorithms and data-driven methods even more biased than the
interpretative act itself (Dobson 2019) and, ironically, in desperate need of
human judgement if they are not to cause harm (Gillespie 2014).
Particularly the indiscriminate use of big data in domains of societal
influence such as bureaucracy, policy-making or policing has started to raise
fundamental questions about democracy, ethics and accountability. For
example, data companies hired by politicians all over the world have used
questionable methods to mine the social media profiles of voters to influ-
ence election results through a technique called microtargeting that uses
extremely targeted messages to influence users’ behaviour. Although it is
true that this technique has proven highly effective for marketing purposes,
the causality of political microtargeting remains largely under-researched
and therefore still poorly understood. The fact remains, however,
that the use of personal data collected without the user’s knowledge or
permission to build sophisticated profiling models raises ethical and privacy
issues. For example, in 2015, Cambridge Analytica acquired the personal
data of about 87 million Facebook users without their explicit permission.
Their data had been collected via the 270,000 Facebook users who had
given the third-party app ‘This Is Your Digital Life’ access to information
on their friends’ network. Cambridge Analytica had acquired and used such
data claiming it was exclusively for academic purposes; Facebook had then
allowed the app to harvest data from the Facebook friends of the app’s
users which were subsequently used by Cambridge Analytica. In this way,
although only 270,000 people had given permission to the app, data was
in fact collected from 87 million users. This revealed an alarming privacy and
personal data management loophole in Facebook’s privacy agreement; it
raised serious concerns about how digital private information is collected,
stored and shared not just by Facebook but by companies in general and
how these opaque processes often leave unaware individuals completely
powerless.
But it is not just tech giants and academic research that jumped on the
suspicious big data and AI bandwagon; governments around the world
have also been exploiting this technology for matters of governance, law
enforcement and surveillance, such as blacklisting and the so-called pre-
dictive policing, a data-driven analytics method used by law enforcement
departments to predict perpetrators, victims or locations of future crimes.
Predictive policing software analyses large sets of historic and current crime
data using machine learning (ML) algorithms to determine where and
when to deploy police (i.e., place-based predictive policing) or to identify
individuals who are allegedly more likely to commit or be a victim of a crime
(i.e., person-based predictive policing). While supporters of predictive
policing argue that these systems help predict future crimes more accurately
and objectively than traditional policing methods, critics complain about
the lack of transparency in how these systems actually work and are used
and warn about the dangers of blindly trusting the supposed rigour of this
technology. For example, in June 2020, Santa Cruz, California—one of
the first US cities to pilot this technology in 2011—was also the first city
in the United States to ban its municipal use. After nine years, the city of
Santa Cruz decided to discontinue the programme over concerns of how
it perpetuated racial inequality. The argument is that, as the data-sets used
by these systems include only reported crimes, the obtained predictions are
deeply flawed and biased and result in what could be seen as a self-fulfilling
prophecy. In this respect, Matthew Guariglia maintains that ‘predictive
policing is tailor-made to further victimize communities that are already
overpoliced—namely, communities of colour, unhoused individuals, and
immigrants—by using the cloak of scientific legitimacy and the supposed
unbiased nature of data’ (Guariglia 2020). Despite other examples of
predictive policing programmes being discontinued following audits and
lawsuits, at the moment of writing, more than 150 cities in the United
States have adopted predictive policing (Electronic Frontier Foundation
2021). Outside of the United States, China, Denmark, Germany, India,
the Netherlands and the United Kingdom are also reported to have tested
or deployed predictive policing tools.
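To make the self-fulfilling prophecy described above concrete, consider a minimal, purely illustrative sketch in Python; the district names, figures and detection rates are hypothetical and serve only to show how allocating patrols on the basis of recorded crime amplifies an initial disparity in reporting:

# Hypothetical sketch: two districts with identical underlying crime rates,
# but district_a starts with more *recorded* incidents. Patrols follow the
# records; detection rises where patrols go; the gap widens every year.

recorded_crime = {"district_a": 60, "district_b": 50}  # initial reported counts
true_rate = {"district_a": 50, "district_b": 50}       # identical underlying rates

for year in range(5):
    hotspot = max(recorded_crime, key=recorded_crime.get)  # the 'predicted' hotspot
    for district, rate in true_rate.items():
        detection = 0.9 if district == hotspot else 0.5    # more patrols, more records
        recorded_crime[district] += int(rate * detection)
    print(year, recorded_crime)

# district_a pulls further ahead every year even though the underlying rates
# never differ: the prediction produces the very data that confirms it.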
The problem with predictive policing has little to do with intentionality
and a lot to do with the limits of computation. Computer algorithms are a
finite list of instructions designed to perform a computational task in order
to produce a result, i.e., an output of some kind. Each task is therefore
performed based on a series of instructed assumptions which, far from
being unbiased, are not only obfuscated by the complexity of the algorithm
itself but also artfully hidden by the surrounding algorithmic discourse
which socially legitimises its outputs as objective and reliable. The truth
is, however, that computers are extremely efficient and fast at automating
complex and lengthy processes but that they perform rather poorly when
it comes to decision-making and judgement. In the words of Danah Boyd
(2016, 231):
[…] if they [computers] are fed a pile of data and asked to identify
correlations in that data, they will return an answer dependent solely on
the data they know and the mathematical definition of correlation that they
are given. Computers do not know if the data they receive is wrong, biased,
incomplete, or misleading. They do not know if the algorithm they are told to
use has flaws. They simply produce the output they are designed to produce
based on the inputs they are given.
Boyd gives the example of a traffic violation: a red light run by someone
who is drunk vs by someone who is experiencing a medical emergency. If
the latter scenario is not embedded into the model as a specific exception,
then the algorithm will categorise both events as the same traffic violation.
The crucial difference in decision-making processes between humans and
algorithms is that humans are able to make a judgement based on a
combination of factors such as regulations, use cases, guidelines and,
fundamentally, environmental and contextual factors, whereas algorithms
still have a hard time mimicking the nature of human understanding.
Human understanding is fluid and circular, whilst algorithms are linear
and rigid. Furthermore, the data-sets on which computational decision-
making models are based are inevitably biased, incomplete and far from
being accurate because they stem from the very same unequal, racist, sexist
and biased systems and procedures that the introduction of computational
decision-making was intended to prevent in the first place.
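Boyd's point can be made concrete with a minimal, purely illustrative sketch in Python; the rules, field names and events below are hypothetical and simply show that the algorithm treats every red-light run identically unless a human has anticipated and encoded the exception:

# Hypothetical rule-based classifier: it only 'knows' the rules it was given.

def classify(event: dict) -> str:
    if not event.get("ran_red_light"):
        return "no violation"
    # Unless the exception is explicitly encoded, every red-light run is the
    # same violation, regardless of context.
    if event.get("medical_emergency"):
        return "exempt (medical emergency)"
    return "traffic violation"

drunk_driver = {"ran_red_light": True, "intoxicated": True}
emergency_run = {"ran_red_light": True, "medical_emergency": True}

print(classify(drunk_driver))   # traffic violation
print(classify(emergency_run))  # exempt only because a human wrote the rule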
Moreover, systems become increasingly complex and what might be
perceived as one algorithm may in fact be many. Indeed, some systems
can reach a level of complexity so deep that understanding the intricacies
and processes according to which the algorithms perform the assigned tasks
becomes problematic at best, if at all possible (Gillespie 2014). Although
this may not always have serious consequences, it is nevertheless worthy of
close scrutiny, especially because today complex ML algorithms are used
extensively, and more and more in systems that operate fundamental social
functions such as the already cited healthcare and law and order, but as a
matter of fact they are still ‘poorly understood and under-theorized’ (Boyd
2018). Despite the fact that they are assumed to be, and often advertised
as being neutral, fair and accurate, each algorithm within these complex
systems is in fact built according to a set of assumptions and cultural values
that reflect the strategic choices made by their creators according to specific
logics, be they corporate or institutional.
Another largely distorted view surrounding digital and algorithmic
discourse concerns data. Although algorithms and data are often thought
to be two distinct entities independent from each other, they are in
fact two sides of the same coin. In fact, to fully understand how an
algorithm operates the way it does, one needs to look at it in combination
with the data it uses, better yet at how the data must be prepared for
the algorithm to function (Gillespie 2014). This is because in order for
algorithms to work properly, that is, automatically, information needs to be
rendered into data, e.g., formalised according to categories that will define
the database records. This act of categorising is precisely where human
intervention hides. Gillespie pointedly remarks that far from being a neutral
and unbiased operation, categorisation is in fact 'a powerful
semantic and political intervention' (Gillespie 2014, 171): deciding what
the categories are, what belongs in a category and what does not are
all powerful worldview assertions. Database design can therefore have
potentially enormous sociological implications which to date have largely
been overlooked (ibid.).
A recent example of the larger repercussions of these powerful world-
view assertions concerns fashion companies catering to people with disabilities,
whose requests to advertise on Facebook have been systematically
rejected by Facebook's automated advertising centre. Again, the reason
for the rejection is unlikely to have anything to do with intentionally
discriminating against people with disabilities, but it is to be found in
the way fashion products for people with disabilities are identified (or
rather misidentified) by Facebook algorithms that determine products’
compliance with Facebook policy. Specifically, these items were categorised
as ‘medical and health care products and services including medical devices’
and as such, they violated Facebook’s commercial policy (Friedman 2021).
Although these companies had their ads approved after appealing against Face-
book’s decision, episodes like this one reveal not only the deep cracks in
ML models, but worse, the strong biases in society at large. To paraphrase
Kate Crawford, every classification system in machine learning contains
a worldview (Crawford 2021). In this particular case, the implicit bias in
Facebook's database worldview would be that a person with disability is not
believed to possibly have an interest in fashion as a form of self-expression.
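To illustrate how such a worldview can become baked into an automated compliance filter, consider a minimal, purely hypothetical sketch in Python; the category labels and the keyword rule are invented for illustration and are not Facebook's actual policy logic:

# Hypothetical category-based ad filter: the worldview lives in the mapping
# from product descriptions to categories, not in any explicit intention.

PROHIBITED = {"medical and health care products and services"}

def auto_categorise(description: str) -> str:
    # Crude keyword rule: adaptive or assistive clothing is filed as medical.
    if "adaptive" in description.lower() or "assistive" in description.lower():
        return "medical and health care products and services"
    return "clothing and accessories"

def approve_ad(description: str) -> bool:
    return auto_categorise(description) not in PROHIBITED

print(approve_ad("Adaptive denim jacket with magnetic closures"))  # False: rejected
print(approve_ad("Denim jacket with magnetic closures"))           # True: approved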
Despite the growing evidence as well as statements of acknowledge-
ment—‘Raw Data is an oxymoron’, Lisa Gitelman wrote in 2013 (Gitel-
man 2013)—in most of the public and academic discourse, data continues
to be exalted as being exact and unarguable, mostly still thought of as
a natural resource rather than a cultural, situated one. On the contrary,
it is the uncritical use of data to make predictions in matters of welfare,
homelessness, crime and child protection to name but a few which has
created systems that are, in Virginia Eubanks’ words, ‘Automating Inequal-
ity’ (2017). The immediate, profound and dangerous consequence of the
indiscriminate use of automated systems is that the resulting decisions
are remorselessly blamed on the targeted individual and justified morally
through the legitimisation of practices believed to be evidence-based,
therefore accurate and unbiased. This is what Boyd calls ‘dislocation of
liability’ (2016, 232) for which decision-makers are distanced from the
humanity of those affected by automated procedures.
In this book, I advance a critique of the mainstream big data and
algorithmic discourse which continues to fetishise data as impartial and
somewhat pre-existing and which obscures the subjective and interpretative
dimension of collecting, selecting, categorising and aggregating, i.e., the
act of making data. I argue that following the shift in the digital rapidly
accelerated by the pandemic, a new set of notions, practices and values
needs to be devised in order to re-figure the way in which we conceptualise
data, technology, digital objects and on the whole the process of digital
knowledge creation. Drawing on posthumanist studies (Braidotti 2017;
Braidotti and Fuller 2019; Braidotti 2019) and on recent theories of digital
cultural heritage (Cameron 2021), to this end, I present a novel framework:
the post-authentic framework. With this framework, I propose concepts,
practices and values that recognise the larger cultural relevance of digital
objects and the methods to create them, analyse them and visualise them.
Significantly, the post-authentic framework problematises digital objects
as unfinished, situated processes and acknowledges the limitations, biases
and incompleteness of tools and methods adopted for their analysis in the
process of digital knowledge creation. In this way, the framework ultimately
introduces a counterbalancing narrative in the main positivist discourse that
equates the removal of the human—which in any case is illusory—with the
removal of biases. Indeed, as the promises of a newly found freedom from
subjectivity are increasingly found to be false, the post-authentic framework
acts as a reminder that in our own time, computational technology is like
the Mechanical Turk of that earlier century.
Featuring a range of personal case studies and exploring a variety
of applied contexts such as digital heritage practices, digital linguistic
injustice, critical digital literacy and critical digital visualisation, I devote
specific attention to four key aspects of knowledge creation in the digital:
creation of digital material, enrichment of digital material, analysis of digital
material and visualisation of digital material. My intention is to show
how contributions towards systemic change in research and,
by extension, in society at large can be made when collecting,
assessing, reviewing, enriching, analysing and visualising digital material.
Throughout the chapters, I use the post-authentic framework to discuss
these various case examples and to show that it is only through the
conscious awareness of the delusional belief in the neutrality of data, tools,
methods, algorithms, infrastructures and processes (i.e., by acknowledging
the human chess master hiding inside the Turk) that the embedded biases
can be identified and addressed.
My argument is closely related to the notion of ‘originary technicity’
(see, for instance, Heidegger 1977; Clark 1992; Derrida 1994;
Beardsworth 1996; Stiegler 1998) which rejects the Aristotelian view
of technology as merely utilitarian. Originary technicity claims that
technology is not simply a tool that humans deploy for their own ends,
because humans are always invested in the technology they develop. In this
way, technology (e.g., AI and algorithms) becomes in turn a central node
of knowledge and culture production and the knowledge and culture
so produced shape humans and their vision of the world in a mutually
reinforcing cycle. Culture is incorporated in technology as it is built by
humans who then use technology to produce culture. Hence, as the
very concept of an absolute objectivity when adopting computational
techniques (or in general, for that matter) is an illusion, so are the notions
of ‘fully autonomous’ or ‘completely unbiased’ processes. An uncritical
approach to the use of computational methods, I maintain, not only
reinforces the very old schemes of obscure practices that digital
technology claims to break, but more importantly it can make society
worse.
This is a reality that can no longer be ignored and which can only
be confronted through a reconfiguration of our model of knowledge
creation. This re-examination would relinquish illusory positivistic notions
and acknowledge digital processes as situated and partial, as an extremely
convoluted assemblage of components which are themselves part of wider
networks of other entities, processes and mechanisms of interaction.
Broadly, the argument that I advance is that the current model of
knowledge must be re-figured to incorporate this critical awareness, ever
more necessary in order to address the new challenges brought by the
pandemic and the digital transformation of society. The shift in the digital
has created a complexity that a model of knowledge supporting divisive
positions (i.e., between on one side disciplines that are digital and therefore
believed to be objective and on the other disciplines that are non-digital
and therefore biased) cannot address.
I start my argument for an urgent knowledge reconceptualisation by
building upon posthuman critical theory (Braidotti 2017) which argues
that the matter ‘is not organized in terms of dualistic mind/body oppo-
sitions, but rather as materially embedded and embodied subjects-in-
process’ (16). In this regard, posthuman critical theory introduces the
helpful notion of monism (cfr. Chap. 2), in which the power of differences
is not denied but at the same time, it is not structured according to
principles of oppositions, and therefore it does not function hierarchically
(ibid.). A model of knowledge in the digital equally abandons dichotomous
ideas that continue to be at the foundation of our conceptualisation of
knowledge formation, such as digital vs non-digital positions, critical vs
technological and, the biggest of all, that of the sciences vs the humanities.

1.3 A TALE OF TWO CULTURES
The hyper-specialisation of research that a discipline-based model of knowl-
edge creation inevitably entails, and how such a solid structure impedes
rather than advances knowledge, have been debated in the academic forum
for years (e.g., Klein 1983; Thompson Klein 2004; Chubin et al. 1986;
Stehr and Weingart 2000; McCarty 2015). As the rigid organisation
into disciplines has begun to dissolve over the course of the twenty-first
century, observers started to suggest that the existing model of knowledge
production was increasingly inadequate to explain the world and that it
was in fact modern society itself that was calling for its reconceptualisation.
Weingart and Stehr (2000), for instance, proposed that ‘one may have to
add a postdisciplinary stage to the predisciplinary stage of the seventeenth
and eighteenth centuries and the disciplinary stage of the nineteenth and
twentieth centuries’ (ibid., xii). At the same time, however, the undeniable
amalgamation of disciplines was affecting areas of knowledge unevenly;
authors noticed how, for example, in fields such as the natural sciences with
a problem-solving orientation and where knowledge production is typically
fast, boundaries between disciplines were much more blurred than in the
humanities (ibid.).
The Digital Turn seemed to be capable of changing this tradition. The
dynamic and disruptive effect of the digital on knowledge creation, and
on humanities scholarship in particular, appeared to be correcting this
unevenness and making the humanities interdisciplinary. Scholars observed
how the digital was not only challenging and transforming structures
of knowledge but that it was also creating new structures (e.g., digital
humanities, digital history, digital cultural heritage) (Klein 2015; Cameron
and Kenderdine 2007; Cameron 2007). The field of DH, it was argued,
would in this sense be ‘naturally’ interdisciplinary as it provides new
methods and approaches which necessarily require new practices and new
ways of collaborating. Another ‘promise’ of DH was that of being able
to ‘transform the core of the academy by refiguring the labor needed for
institutional reformation’ (Klein 2015, 15).
After the initial enthusiasm and despite many examples around the
world of interdisciplinary initiatives, academic programmes, departments
and centres (Stehr and Weingart 2000; Deegan and McCarty 2011; Klein
2015), in twenty years, the rigid division into disciplines has however not
changed much; it remains the persistent dominant model in use for knowl-
edge production, and true collaboration is on the whole rare (Deegan and
McCarty 2011, 2). Indeed, what these cases of interdisciplinarity show
is a common trend: when disciplines share similar interests, rather than
boundaries dissolving and merging as interdisciplinary discourse usually
claims, what in fact tends to happen is that in order to respond to the new
external challenges, disciplines further specialise and by leveraging their
overlapping spaces, they create yet new fields. This modern phenomenon
has been referred to as ‘The paradox of interdisciplinarity’ (Weingart
2000):
interdisciplinarity […] is proclaimed, demanded, hailed, and written into
funding programs, but at the same time specialization in science goes on
unhampered, reflected in the continuous complaint about it. […] The
prevailing strategy is to look for niches in uncharted territory, to avoid
contradicting knowledge by insisting on disciplinary competence and its
boundaries, to denounce knowledge that does not fall into this realm as
‘undisciplined.’ Thus, in the process of research, new and ever finer structures
are constantly created as a result of this behaviour. This is (exceptions
notwithstanding) the very essence of the innovation process, but it takes
place primarily within disciplines, and it is judged by disciplinary criteria of
validation. (Weingart 2000, 26–27)
The author argues that starting from the early nineteenth century when
the separation and specialisation of science into different disciplines was
created, interdisciplinarity became a promise, the promise of the unity of
science which would one day be actualised by reducing the
fragmentation into disciplines. Today, however, interdisciplinarity seems to
have lost interest in that promise as the discourse has shifted from the idea
of ultimate unity to that of innovation through a combination of variations
(ibid., 41). For example, in his essay Becoming Interdisciplinary, McCarty
(2015) draws a close parallel between the struggle of dealing with the post-
World War II overwhelming amount of available research that inspired
Vannevar Bush’s Memex and the situation of contemporary researchers.
Bush (1945) maintained that the investigator could not find time to deal
with the increasing amount of research, which had grown far beyond
anyone’s ability to make real use of the record. The difficulty was, in his
view, that if on the one hand ‘specialization becomes increasingly necessary
for progress’, on the other hand, ‘the effort to bridge between disciplines
is correspondingly superficial.’ The keyword on which we should focus our
attention, McCarty argues, is superficial (2015, 73):
Bush’s geometrical metaphor (superficies, having length or breadth with-
out thickness), though undoubtedly intended as merely a common adjec-
tive, makes the point elaborated in another context by Richard Rorty
(2004/2002): that the implicit model of knowledge at work here privileges
singular truth at depth, reached by the increasingly narrower focus of
disciplinary specialization, and correspondingly trivializes plenitude on the
surface, and so the bridging of disciplines.
According to Rorty, being interdisciplinary does not mean looking for
the one answer but going superficial, i.e., wide, to collect multiple voices
and multiple perspectives (2004). It has been argued, however, that true
collaboration requires a more fundamental shift in the way knowledge
creation is conceived than simply studying a common question or problem
from different perspectives (van den Besselaar and Heimeriks 2001; Dee-
gan and McCarty 2011). This would also include a deep understanding
of disciplines and approaches other than one’s own (Gooding 2020).
Indeed, the contemporary notion of interdisciplinarity based on the idea
that innovation is better achieved by recombining ‘bits of knowledge
from previously different fields’ into novel fields is bound to create more
specialisation and therefore new boundaries (Weingart 2000, 40).
The schism of the humanities between ‘mainstream humanities’ and
digital humanities, and later between digital humanities and critical digital
humanities, perfectly illustrates the issue. In 2012, Alan Liu wrote a
provocative essay titled Where Is Cultural Criticism in the DH? (Liu 2012).
The essay was essentially a plea for DH to embrace a wider engagement
with the societal impact of technology. It was very much the author’s hope
that the plea would help to convert this ‘deficit’ into ‘an opportunity’,
the opportunity being for DH to gain a long overdue full leadership,
as opposed to a ‘servant’ role within the humanities. In other words, if
the DH wanted to finally become recognised as legitimate partners of
‘mainstream humanities’, they needed to incorporate cultural criticism in
their practices and stop pushing buttons without reflecting on the power
of technology.
In the aftermath of Liu’s essay, reactions varied greatly with views
ranging from even harsher accusations towards DH to more optimistic
perspectives, and some also offering fully programmatic and epistemolog-
ical reflections. Some scholars, for example, voiced strong concerns about
the wider ramifications of the lack of cultural critique in DH, what has
often been referred to as ‘the dark side of the digital humanities’ (Grusin
2014; Chun et al. 2016), the association of DH with the ‘corporatist
restructuring of the humanities’ (Weiskott 2017), neoliberalism (Allington
et al. 2016), and white, middle-class, male dominance (Bianco 2012). Two
controversial essays in particular, one published in 2016 by Allington et
al. (op. cit.) and the other a year later by Brennan (2017) argued that,
in a little over a decade, the myopic focus of DH on neoliberal tooling
and distant reading had accomplished nothing but consistently pushing
aside what has always been the primary locus of humanities investigation:
intellectual practice.
This view was also echoed by Grimshaw (2018) who indicted DH
for going to bed with digital capitalism, ‘an online culture that is anti-
diversity and enriching a tiny group of predominantly young white men’
(2). Unlike Weiskott (2017), however, who argued 'There is no such
thing as "the digital humanities"', meaning that DH is merely an
opportunistic investment and a marketing ploy but it doesn’t really alter the
core of the humanities, Grimshaw maintained that this kind of pandering
causes rot at the heart of the humanistic knowledge and practice. This
he calls ‘functionalist DH’, the use of tools to produce information in
line with managerial metrics but with no significant knowledge value (6).
Grimshaw strongly criticises DH for having betrayed the promise of
being a new discipline of emancipation and for being in fact ‘nothing
more than a tool for oppression’. The digital transformation of society,
he continues, has resulted in increased inequality, wider economic gap,
an upsurge in monopolies and surveillance, lack of transparency of big
data, mobbing, trolling, online hate speech and misogyny. Rather than
resisting it, DH is guilty of having embraced such culture, of operating
within the framework of lucrative tech deals which perpetuate and reinforce
the neoliberal establishment. Digital humanists are establishment curators
and no longer capable of critical thought; DH is therefore totally unequipped
to rethink and criticise digital capitalism. Although he acknowledges the
emergence of critical voices within DH, he also strongly advocates a more
radical approach which would then justify the need for a ‘new’ field,
an additional space within the university where critique, opposition and
resistance can happen (7). This space of resistance and critical engagement
with digital capitalism is, he proposes, critical digital humanities (CDH).
Over the years, other authors such as Hitchcock (2013), Berry (2014)
and Dobson (2019) have also advocated critical engagement with the
digital as the epistemological imperative for digital humanists and have
identified CDH as the proper locus for such engagement to take place. For
example, according to Hitchcock, humanists who use digital technology
must ‘confront the digital’, meaning that they must reflect on the contex-
tual theoretical and philosophical aspects of the digital. For Berry, CDH
practice would allow digital humanists to explore the relationship between
critical theory and the digital and it would be both research- and practice-
led. Equally for Dobson, digital humanists must endlessly question the
cultural dimension and historical determination of the technical processes
behind digital operations and tools. With perhaps the sole exception of
Grimshaw (op. cit.) who is not interested in practice-led digital enquiry,
the general consensus is on the urgency of conducting critically engaged
digital work, that is, drawing from the very essence of the humanities, its
intrinsic capacity to critique.
However, whilst these proposed methodologies do not differ dramati-
cally across authors, there seems to be disagreement about the scope of the
enquiry itself. In other words, the open question around CDH would not
concern so much the how (nor the why) but the what for. For example,
Dobson (2019) is not interested in a critical engagement with the digital
that aims to validate results; this would be a pointless exercise as the
distinction between the subjectivity of an interpretative method and the
objectivity of both data and computational methods is illusory. He claims
(ibid., 46):
there is no such thing as contextless quantitative data. […] Data are imag-
ined, collected, and then typically segmented. […] We should doubt any
attempt to claim objectivity based on the notion of bypassed subjectivity
because human subjectivity lurks within all data. This is because data do
not merely exist in the world, but are abstractions imagined and generated
by humans. Not only that, but there always remain some criteria informing
the selection of any quantity of data. This act of selection, the drawing of
boundaries that names certain objects a data-set introduces the taint of the
human and subjectivity into supposedly raw, untouched data.
As ‘There is no such thing as the “unsupervised”’ (ibid., 45), the aim of
CDH is to thoroughly critique any claimed objectivity of all computational
tools and methods, to be suspicious of presumed human-free approaches
and to acknowledge that complete de-subjectification is impossible. The
aim of CDH, he argues, is not to expand the set of questions in DH, as
in Berry and Fagerjord’s view (2017), but to challenge the very notion of
a completely objective approach. In this sense, CDH is the endless search
for a methodology, the very essence of humanistic enquiry.
Berry (2014) also starts from the assumption that the notion of objective
data is illusory; however, he reaches opposite conclusions about what the
aim of CDH is. For him and Fagerjord (2017), CDH would provide
researchers with a space to conduct technologically engaged work, that is,
work that uses technology but also draws on a vast range of theoretical
approaches (e.g., software studies, critical code studies, cultural/critical
political economy, media and cultural studies). This would allow scholars
from many critical disciplines to tackle issues such as the historical context
of any used technology and its theoretical limitations, including, for
instance, a commitment to its political dimension. By doing so, CDH
would address the criticism about the lack of cultural critique in DH and
it would enrich DH with other forms of scholarly work (ibid., 175). In
other words, by ‘fixing’ the lack of critical engagement of the field, the
function of CDH would be to strengthen DH, thus markedly diverging
from Dobson.
Albeit from different epistemological points of view, these reflections
share similar methodological and ethical concerns and question the lack of
critical engagement of DH, be it historical, cultural or political. I argue
however that this reasoning exposes at least three inconsistencies. Firstly,
in earlier perspectives (e.g., Liu 2012), the sciences are deemed to be
obviously superior to the humanities and yet, as soon as the computational
is incorporated into the field, the value of the humanities seems to have
decreased rather than increased. For example, Bianco (2012) advocates
a change in the way digital humanists ‘legitimise’ and ‘institutionalise’
the adoption of computational practices in the humanities. Such change
would require not simply defending the legitimacy or advocating the
‘obvious’ supremacy of computational practices but by reinvesting in the
word humanities in DH. The supremacy of the digital would then be
understood as a combination of superiority, dominance and relevance that
computational practices—and by extension, the hard sciences (i.e., physics
envy)—are believed to have over the humanities. However, as Grimshaw
(2018) also argued later, in the process of incorporating the computational
into their practices, the humanities forgot all about questions of power,
domination, myth and exploitation and have become less and less like the
humanities and more and more like a field of execute button pushers.
Despite acknowledging that freedom from subjectivity is an illusion, this view shows how
deeply rooted in the collective unconscious is the myth surrounding
technology and science which firmly positions them as detached from
human agency and distinctly separated from the humanities.
Secondly and following from the first point, these views all share a
persistent dualistic, opposing notion of knowledge, which in one form or
another, under the disguise of either freshly coined or well-seasoned terms,
continues to reflect what Snow famously called 'the two cultures' of the
humanities and the sciences (2013). Such separation is typically verbalised
in competing concepts such as subjectivity vs objectivity, interpretative vs
analytical and critical vs digital. Despite using terms that would suggest
union (e.g., ‘incorporated’), the two cultures remain therefore clearly
divided. The conceptualisation of knowledge creation which continues to
compartmentalise fields and disciplines, I argue, is also reflected in the clear
division between the humanities, DH and CDH. This model, I contend, is
highly problematic because besides promoting intense schism, it inevitably
leads disciplines to operate within a hierarchical, competitive structure in
which they are far from equal. For example, Liu’s critique mirrors the
persistent dichotomy of science vs humanities: due to the lack of cultural
criticism—typical of the sciences but not of the humanities—DH is not
humanities at all. DH may be instrumental to the humanities (i.e., the
humanities is superior to DH but inferior to the sciences), but it is reduced
to a servant role. Hence, if typical descriptions of DH as a space in which
the two worlds—the sciences and the humanities—‘meld’ seem to initially
suggest a harmonious and egalitarian coexistence, in reality the way this
relationship plays out is anything but.
The third contradiction refers to what Berry and Fagerjord (2017)
(among others) point out in reference to the digital transformation of
society that ‘The question of whether something is or is not “digital”
will be increasingly secondary as many forms of culture become mediated,
produced, accessed, distributed or consumed through digital devices and
technologies” (13). Humanists, they claim, must relinquish any com-
parative notion of digital vs analogue as this contrast ‘no longer makes
sense’ (ibid., 28). What humanists need to do instead, they continue, is
to reflect critically on the computational and on the ramifications of the
computational in a dedicated space which, like Grimshaw and Dobson, they
also suggest calling CDH, thus circling back to the second contradiction.
If the humanities are critical and if the distinction between digital and
analogue ‘no longer makes sense’, then by insisting on establishing a
CDH, they fail to transcend the very same distinction between digital and
analogue that they claim to be nonsensical.
While I see the validity and truth in the debates that have animated past
DH scholarship, I also argue that the reason for these inconsistencies is to
be found in the specific model of knowledge within which these scholars
still operate: a model in which knowledge is divided into competing
disciplines. Behind the pushes to relinquish ideas of divisions and embrace
the digital is a persistent disciplinary structure of knowledge which, despite
the declared novelty, is bound to the epistemology of the last century.
Instead, I maintain, we should not accommodate the digital within the
existing disciplinary structure as it is the structure of knowledge itself and
its conceptualisation into separate fields and worldviews that has to change.
The current model of knowledge creation, grounded in division and
competition, is unequipped to explain the complexities of the world and
the 2020 pandemic has magnified the urgency of adopting a strong critical
stance on the digital transformation of society. This cannot happen through
the creation of niche fields, let alone exclusively within the humanities, but
through a reconceptualisation of knowledge creation itself.
The post-authentic framework that I propose in this book moves beyond
the existing breakdown of disciplines which I see as not only unhelpful
and conceptually limiting but also harmful. The main argument of this
book is that it is no longer solely the question of how the digital affects
the humanities but how knowledge creation more broadly happens in
the digital. Thinking in terms of yet another field (e.g., CDH) where
supposedly computational science and critical enquiry would meet in this
or that modulation, for this or that goal, still reiterates the same bound-
aries that hinder that enquiry. Similarly, claiming that DH scholarship
conducts digital enquiry suggests that humanities scholarship does not
happen in the digital and therefore it continually reproduces the outmoded
distinction between digital and analogue as well as the dichotomy between
digital/non-critical and non-digital/critical. Conversely, calls for a CDH
presuppose that DH is never critical (or worse, that it cannot be critical
at all) and that the humanities can (should?) continue to defer their
appointment with the digital, and disregard any matter of concern that
has to do with it, ultimately implying that to remain unconcerned by the
digital is still possible.
But the digital affects us all, including (perhaps especially) those who
do not have access to it. The digital transformation exacerbates the already
existing inequalities in society as those who are the most vulnerable such
as migrants, refugees, internally displaced persons, older persons, young
people, children, women, persons with disabilities, rural populations and
indigenous peoples are disproportionately affected by the lack of digital
access. The digital lens provided by the 2020 pandemic has therefore
magnified the inequality and unfairness that are deeply rooted in our
societies. In this respect, for example, on 18 July 2020, UN Secretary-
General Antonio Guterres declared (United Nations 2020a):
COVID-19 has been likened to an x-ray, revealing fractures in the fragile
skeleton of the societies we have built. It is exposing fallacies and falsehoods
everywhere: the lie that free markets can deliver healthcare for all; the fiction
that unpaid care work is not work; the delusion that we live in a post-racist
world; the myth that we are all in the same boat. While we are all floating on
the same sea, it’s clear that some are in super yachts, while others are clinging
to the drifting debris.
The post-authentic framework that I propose in this book is a con-
ceptual framework for knowledge creation in the digital; it rejects the
view of the digital as crossing paths with disciplines, intersecting, melting,
merging, meeting or any other verb that suggests that separate entities
are converging but which leave the model of knowledge essentially unaf-
fected. I maintain that this sort of worldview is obsolete, even dangerous;
researchers can no longer justify statements such as ‘I’m not digital’ as
we are all in the digital. But rather than seeing this transformation as a
threat, some sort of bleak reality in which critical thinking no longer has a
voice and everything is automated, I see it as an opportunity for change of
historic proportion. Any process of transformation fundamentally changes
all the parts involved; if we accept the notion of digital transformation
with regard to society, we also have to acknowledge that as much as the
digital transforms society, the way society produces knowledge must also
be transformed. This entails acknowledging the unsuitability of current
frameworks of knowledge creation for understanding the deep implications
of technology on culture and knowledge and for meeting the world
challenges complexified by the digital. This book wants to signal how the
digital acceleration brought by the events of 2020 adds new urgency to
an issue already identified by scholars some twenty years ago, one that
can no longer be postponed. Hall for instance argued (2002, 128):
We cannot rely merely on the modern “disciplinary” methods and frame-
works of knowledge in order to think and interpret the transformative effect
new technology is having on our culture, since it is precisely these methods
and frameworks that new technology requires us to rethink.
I therefore suggest we stop using the term ‘interdisciplinarity’ alto-
gether. As it contains the word discipline, albeit in reference to breaking,
crossing, transcending disciplines’ boundaries and all the other usual
suspects that typically recur in interdisciplinarity discourse, I believe that
the term continues to refer to the exact same notions of knowledge com-
partmentalisation that the digital transformation requires us to relinquish.
In my view, thinking in these terms is not helpful and does not adequately
respond to the consequences of the digital transformation that society,
higher education and research have undergone. Based on separateness
and individualism, the current model of knowledge creation restricts our
ability to identify and access the various complexities of reality. Traditional
binary views of deep/significant vs superficial/trivial, digital/non-critical
vs non-digital/critical and the sciences vs the humanities may appear firm,
but only because we exaggerate their fixity. Similarly, the separation into
disciplines may seem inevitable and fixed, but in reality the majority of
norms and views are arbitrary, neither unavoidable nor final and, therefore,
completely alterable. Weingart, for instance, states (Weingart 2000, 39):
The structures are by no means fixed and irreplaceable, but they are social
constructs, products of long and complex social interactions, subject to social
processes that involve vested interests, argumentation, modes of conviction,
and differential perceptions and communications.
With specific reference to the current model of knowledge creation, for
example, Stichweh (2001) reminds us that the organisation of universities
in academic departments is rather a recent phenomenon, ‘an invention
of nineteenth century society’ (13727); in fact, to paraphrase McKeon,
the apparently monolithic integrity of disciplines as we know them may
sometimes obscure a radically disparate and interdisciplinary core (1994).
The argument I reiterate in this book is that the current landscape requires
us to move on from this model, beyond (not away from) thick description of
single-discipline case studies, and to recognise not only that knowledge is
much more fluid than we are accustomed to think, but also that the digital
transcends artificial discipline boundaries.
In the chapters that follow, I take an auto-ethnographic and self-reflexive
approach to show how the application of the post-authentic framework that
I have developed has informed my practice as a humanist in the digital.
More broadly, I show how the framework can guide a conceptualisation
of knowledge creation that transcends discipline boundaries, especially
digital vs non-digital positions. Thinking in terms of in the digital—and
no longer and the digital—thus bears enormous potential for tangibly
undisciplining knowledge, for introducing counter-narratives in the digital
capitalistic discourse, for developing, encouraging and spreading a digital
conscience and for taking an active part in the re-imagination of post-
authentic higher education and research. The world has entered a new
dimension in which knowledge can no longer afford to see technology
and its production simply as instrumental and contextual or as an object of
critique, admiration, fear or envy. In my view, the current landscape is much
more complex and has now much wider implications than those identified
so far. In this book, I want to elaborate on them, not with the purpose of
rejecting previous positions but to provide additional perspectives which
I think are urgently required especially as a consequence of the 2020
pandemic.
In what still is predominantly a binary conceptual framework, e.g.,
the sciences vs the humanities, the humanities vs DH and DH vs CDH,
this book provides a third way: knowledge creation in the digital. The
book argues that the new paradigm shift in the digital—as opposed to
towards the digital—considerably accelerated by the COVID-19 pandemic,
positions knowledge creation beyond such outdated dichotomous conceptualisa-
tions. We develop technology at a blistering pace, but our capacity
to misuse it, abuse it and do harm grows just as fast. It is therefore everyone's duty to
argue against any claimed computational neutrality but more importantly
to relinquish outmoded and rather presumptuous perspectives that grant
humanists alone a moral monopoly on criticism and critique.
Indeed, as we are all in the digital, critical engagement cannot afford
to remain limited exclusively to a handful of scholars who may or may
not have interest in practice-led digital research—but who are in the
digital nevertheless—as this would tragically create more fragmentation,
polarisation and ultimately harm.
This is not a book about CDH, nor is it a book about DH, nor is it
about the digital and the humanities or the digital in the humanities. What
this book is about is knowledge in the digital.
1.4 OH, THE PLACES YOU’LL GO!
The digital transformation of society—and therefore of academia and
of knowledge creation more generally—will not be stopped, let alone
reversed. The claim I advance in this book is that, whilst a great deal of talk
has so far revolved around the impact of the digital on individual fields,
how the model of knowledge creation should be transformed accordingly
has largely been overlooked. I argue that the increasing complexity of the
world brought about by the digital transformation now demands a new
model of knowledge to understand, explain and respond to the reality
of ubiquitous digital data, algorithmic automated processes, computa-
tional infrastructures, digital platforms and digital objects. I contend that
such engagement should not unfold as coming from a place of criticism
per se but that it should be seized as a historic opportunity for truly
decompartmentalising knowledge and reconfiguring the way we think
about it. A decompartmentalised model of knowledge does not denature
disciplines but it breaks the current opposing, hierarchical structure in
which disciplines still operate. The digital transformation finally forces us
to go back to the fundamental questions: how do we create knowledge and
how do we want to train our next generation of students?
Be it in the form of data, platforms, infrastructures or tools, across
the humanities, scholars have pointed out the interfering nature of the
digital at different levels and have called for a reconfiguration of research
practice conceptualisations (e.g., Cameron and Kenderdine 2007; Drucker
2011, 2020; Braidotti 2019; Cameron 2021; Fickers 2022). Fickers,
for instance, proposes digital hermeneutics as a helpful framework to
address both the archival and historiographical issues ‘raised by changing
logics of storage, new heuristics of retrieval, and methods of analysis and
interpretation of digitized data’ (2020, 161). In this sense, the digital
hermeneutics framework combines critical reflection on historical practice
with digital literacy, for instance by embedding digital source criticism,
a reflection on the consequences for the epistemology of history of the
transformation from sources to data through digitisation.
With specific reference to cultural heritage concepts and their relation
to the digital, Cameron (2007; 2021) refigures digital cultural heritage
curation practices and digital museology by problematising digital cul-
tural heritage as societal data, entities with their own forms of agency,
intelligence and cognition (Cameron 2021). By reflecting on the wider
consequences of the digital on heritage for future generations includ-
ing Western perspectives, climate change, environmental destruction and
injustice, the scholar proposes a more-than-human digital museology
framework which recognises the impact of AI, automated systems and
infrastructures as part of a wider ecology of components in digital cultural
heritage practices.
On the mediating role of the digital for the visual representation of mate-
rial destined for humanistic enquiry, Drucker (2004; 2011; 2013; 2014;
2020) has also long advocated a critical stance and a more problematised
approach. She has, for example, proposed alternative ways of visualising
digital material that would expose rather than hide the different stages
of mediation, interpretation, selection and categorisation that typically
disappear in the final graphical display. Her work introduces an important
counter-narrative in the public and academic discourse which predom-
inantly exalts data, computational processes and digital visualisations as
unarguable and exact.
These contributions are all unmistakable signs of the decreasing rele-
vance of the current model of knowledge production following the digital
transformation of society and of the fact that the notion that the digital is
something that ‘happens’ to knowledge creation is entirely anachronistic
now. At the same time, however, these past approaches insist on disciplinary
competence and indeed are modulated primarily within the fields and for
the disciplines they originate from (e.g., digital history, digital cultural
heritage, the humanities). The post-authentic framework that I propose
here attempts to break with the ‘paradox of interdisciplinarity’ in relation
to the digital, whereby knowledge is not truly undisciplined but the digital
is incorporated in existing fields and creates yet new fields, hence new
boundaries. The post-authentic framework incorporates all these recent
perspectives but at the same time it goes beyond them; as it intentionally
refers to digital objects rather than to the disciplines within which they
are created, it provides an architecture for issues such as transparency,
replicability, Open Access, sustainability, accountability and visual display
with no specific reference to any discipline.
I build my argument for advocating the post-authentic framework to
digital knowledge creation and digital objects upon recent theories of
critical posthumanities (Braidotti 2017; Braidotti and Fuller 2019). In
recognising that current terminologies and methods for posthuman knowl-
edge production are inadequate, critical posthumanities offers a more
holistic perspective on knowledge creation, and it is therefore particularly
relevant to the argument I advance in this book. With specific reference
to the need for novel notions that may guide a reconceptualisation of
knowledge creation, Braidotti and Fuller (Braidotti 2017; Braidotti and
Fuller 2019) propose Transversal Posthumanities, a theoretical framework
for the Critical Posthumanities. With this framework, they introduce the
concept of transversality, a term borrowed from geometry that refers to
the understanding of spaces in terms of their intersection (Braidotti and
Fuller 2019, 1). Although the main argument I advance in this book is
also that of an urgent need for knowledge reconfiguration, I maintain
that transversality still suggests a view of knowledge as solid and thus
it only partially breaks with the outdated conceptualisation of discipline
compartmentalisation that it aims to relinquish. To actualise a remodelling of
knowledge, I introduce two concepts: symbiosis and mutualism. In Chap. 2,
I explain how the notion of symbiosis—from Greek ‘living together’—
embeds in itself the principle of knowledge as fluid and inseparable.
Similarly, borrowed from biology, the notion of mutualism proposes that
areas of knowledge do not compete against each other but benefit from a
mutually compensating relationship. Building on the notion of monism in
posthuman theory (Braidotti and Fuller 2019, 16) (cfr. Sect. 1.2) in which
differences are not denied but at the same time do not function
hierarchically, symbiosis and mutualism help refigure our understanding
of knowledge creation not as a space of conflict and competition but as a
space of fluid interactions in which differences are understood as mutually
enriching.
Symbiosis and mutualism are central concepts of the post-authentic
framework that I propose in this book, a theoretical framework for knowl-
edge creation in the digital. If collaboration across areas of knowledge has
so far been largely an option, often motivated more by a grant-seeking
logic than by genuine curiosity, the digital calls for an actual change in
knowledge culture. The question we should ask ourselves is not ‘How can
we collaborate?’ but ‘How can we contribute to each other?’. Concepts such
as those of symbiosis and mutualism could equally inform our answer when
asking ourselves the question ‘How do we want to create knowledge and
how do we want to train our next generation of students?’.
To answer this question, the post-authentic framework starts by recon-
ceptualising digital objects as much more complex entities than just
collections of data points. Digital objects are understood as the conflation
of humans, entities and processes connected to each other according to
the various forms of power embedded in computational processes and
beyond and which therefore bear consequences (Cameron 2021). As
such, digital objects transcend traditional questions of authenticity because
digital objects are never finished nor can they be finished. Countless
versions can continuously be created through processes that are shaped
by past actions and in turn shape the following ones. Thus, in the post-
authentic framework, the emphasis is on both products and processes
which are acknowledged as never neutral and as incorporating external,
situated systems of interpretation and management. Specifically, I take
digitised cultural heritage material as an illustrative case of a digital object
and I demonstrate how the post-authentic framework can be applied to
knowledge creation in the digital. Throughout the chapters of this book,
I devote specific attention to four key aspects of knowledge creation in
the digital: creation of digital material in Chap. 2, enrichment of digital
material in Chap. 3, analysis of digital material in Chap. 4, and visualisation
of digital material in Chap. 5.
The second content chapter, Chap. 3, focuses on the application of
the post-authentic framework to the task of enriching digital material;
I use DeXTER – DeepteXTminER4 and ChroniclItaly 3.0 (Viola and
Fiscarelli 2021a) as case examples. DeXTER is a workflow that implements
deep learning techniques to contextually augment digital textual material;
ChroniclItaly 3.0 is a digital heritage collection of Italian American news-
papers published in the United States between 1898 and 1936. In the
chapter, I show how symbiosis and mutualism have guided each action of
DeXTER’s enrichment workflow, from pre-processing to data augmenta-
tion. My aim is to exemplify how the post-authentic framework can guide
interaction with the digital not as a strategic (grant-oriented) or instrumen-
tal (task-oriented) collaboration but as a cognitive mutual contribution. I
end the chapter arguing that the task of augmenting information of cultural
heritage material holds the responsibility of building a source of knowledge
for current and future generations. In particular, the use of methods such
as named entity recognition (NER), geolocation, and sentiment analysis
(SA) requires a thorough understanding of the assumptions behind these
techniques, constant update and critical supervision. In the chapter, I
specifically discuss the ambiguities and uncertainties of these methods and I
show how the post-authentic framework can help address these challenges.
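To make the kind of operations at stake concrete, the sketch below shows a minimal named entity recognition and geolocation pass of the sort an enrichment workflow such as DeXTER performs. It is illustrative only: spaCy's pretrained Italian model and the Nominatim geocoder stand in for the deep learning components, whereas the actual pipeline, models and parameters are documented in the DeXTER repository.

    # Illustrative only: a minimal NER + geolocation pass of the kind an
    # enrichment workflow performs. spaCy's Italian model and the Nominatim
    # geocoder stand in for DeXTER's actual deep learning components.
    # Requires: pip install spacy geopy && python -m spacy download it_core_news_sm
    import spacy
    from geopy.geocoders import Nominatim

    nlp = spacy.load("it_core_news_sm")                 # pretrained Italian pipeline
    geolocator = Nominatim(user_agent="chroniclitaly-sketch")

    text = "Il console d'Italia a New York ha visitato la comunità di Filadelfia."
    doc = nlp(text)

    for ent in doc.ents:
        print(ent.text, ent.label_)                     # entity and predicted label
        if ent.label_ in ("LOC", "GPE"):                # place names only
            place = geolocator.geocode(ent.text)        # network call, may return None
            if place is not None:
                print("  ->", place.latitude, place.longitude)

Even in so small an example, the choice of model, of which entity labels count as places and of which gazetteer resolves them are all interpretative decisions of the kind discussed in Chap. 3.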
In Chap. 4, I illustrate how the post-authentic framework can be applied
to the analysis of a digital object through the example of topic modelling,
a distant reading method born in computer science and widely used in the
humanities to mine large textual repositories. In particular, I highlight how
through the deep understanding of the assemblage of culture and technol-
ogy in the software, the post-authentic framework can guide us towards
exploring, questioning and challenging the interpretative potential of com-
putation. Drawing on the mathematical concepts of discrete vs continuous
modelling of information, in the chapter I reflect on the implications
for knowledge creation of the transformation of continuous material into
discrete form, binary sequences of 0s and 1s, and I especially focus on the
notions of causality and correlations. I then illustrate the example of topic
modelling as a computational technique that treats continuous material
such as a collection of texts as discrete data. I bring critical attention to
problematic aspects of topic modelling that are highly dependent on the
sources: pre-processing, corpus preparation and deciding on the number of
topics. The topic modelling example ultimately shows how post-authentic
knowledge creation can be achieved through a sustained engagement with
software, also in the form of a continuous exchange between processes
and sources. Guided by symbiosis and mutualism, such dialogue maintains
the interconnection between two parallel goals: output—any processed
information—and outcome, the value resulting from the output (Patton
2015).
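A minimal sketch may help clarify what this discretisation looks like in practice. The snippet below uses gensim's LDA implementation on a toy corpus; the tokenisation, the (English) stop-word list and the number of topics are placeholders chosen for illustration, not the settings used for ChroniclItaly 3.0, and each of them is precisely the kind of interpretative decision examined in Chap. 4.

    # Illustrative topic modelling with gensim's LDA on a toy corpus. Each text
    # is reduced to a discrete bag of word counts; the stop-word list, the
    # tokenisation and the number of topics are upstream choices that shape
    # what the model can 'see'.
    # Requires: pip install gensim
    from gensim import corpora
    from gensim.models import LdaModel
    from gensim.parsing.preprocessing import STOPWORDS   # note: an English list
    from gensim.utils import simple_preprocess

    documents = [
        "gli emigrati italiani arrivano a New York in cerca di lavoro",
        "il giornale pubblica notizie dalla comunità di Filadelfia",
        "la festa religiosa riunisce la colonia italiana della città",
    ]

    # Pre-processing: lowercasing, tokenising, removing stop words. Applying an
    # English stop-word list to Italian text removes almost nothing, itself an
    # instance of the English-centric tooling discussed later in the book.
    texts = [[t for t in simple_preprocess(d) if t not in STOPWORDS]
             for d in documents]

    dictionary = corpora.Dictionary(texts)               # word <-> integer id
    corpus = [dictionary.doc2bow(t) for t in texts]      # discrete word counts

    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=2,                         # a decision, not a given
                   random_state=0, passes=10)

    for topic_id, top_words in lda.print_topics(num_words=5):
        print(topic_id, top_words)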
Operating within the post-authentic framework crucially means
acknowledging digital objects as having far-reaching, unpredictable
consequences; as the complex pattern of interrelationships among
processes and actors continually changes, interventions and processes
must always be critically supervised. One such process is the provision of
access to digital material through visualisation. In Chap. 5, I argue that
the post-authentic framework can help highlight the intrinsic dynamic,
situated, interpreted and partial nature of computational processes and
digital objects. Thus, whilst appreciating the benefits of visualising digital
material, the framework rejects an uncritical adoption of digital methods
and it opposes the main discourse that still presents graphical techniques
and outputs as exact, final, unbiased and true. In the chapter, I illustrate
how the post-authentic framework can be applied to the visualisation of
cultural heritage material by discussing two examples: efforts towards the
development of a user interface (UI) for topic modelling and the design
choices for developing the app DeXTER, the interactive visualisation
interface that explores ChroniclItaly 3.0. Specifically, I present work
done towards visualising the ambiguities and uncertainties of topic
modelling, network analysis (NA) and SA, and I show how key concepts
and methods of the post-authentic framework can be applied to digital
knowledge visualisation practices. I centre my argumentation on how the
acknowledgement of curatorial practices as manipulative interventions can
be encoded in the interface. I end the discussion by arguing that it is in
fact through the interface display of the ambiguities and uncertainties of
these methods that the active and critical participation of the researcher
is acknowledged as required, keeping digital knowledge honest and
accountable.
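As a minimal illustration of this principle (and emphatically not of the DeXTER interface itself, which is discussed in Chap. 5), the sketch below displays a document's full topic distribution rather than collapsing it into a single 'winning' topic, so that the probabilistic, interpreted nature of the output stays visible to the user.

    # Illustrative only: keep uncertainty visible by plotting a document's full
    # topic mixture instead of its single most probable topic. The numbers are
    # hypothetical model output, not results from the book's case studies.
    # Requires: pip install matplotlib
    import matplotlib.pyplot as plt

    topic_distribution = {0: 0.41, 1: 0.33, 2: 0.18, 3: 0.08}   # hypothetical

    fig, ax = plt.subplots(figsize=(5, 3))
    ax.bar([f"topic {k}" for k in topic_distribution],
           list(topic_distribution.values()))
    ax.set_ylabel("probability")
    ax.set_title("Topic mixture for one issue: no single 'true' topic")
    plt.tight_layout()
    plt.show()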
In the final chapter, Chap. 6, I review the main formulations of this book
project and I retrace the key concepts and values at the foundation of the
post-authentic framework proposed here. I end the chapter with a few
additional propositions for remodelling the process of digital knowledge
production that could be adopted to inform the restructuring of academic
and higher education programmes.

NOTES
1. The Gartner Hype Cycle of technology is a cycle model that explains a
generally applicable path a technology takes in terms of expectations. It states
that after the initial, overly positive reception follows a ‘Trough of Disillu-
sionment’ during which the hype collapses due to disappointed expectations.
Some technologies manage to then climb the ‘Slope of Enlightenment’ to
eventually plateau to a status of steady productivity.
2. This is not to be mistaken for the Amazon Mechanical Turk, which is a crowd-
sourcing website that facilitates the remote hiring of ‘crowdworkers’ to
perform on-demand tasks that cannot be handled by computers. It is
operated under Amazon Web Services and is owned by Amazon.
3. Compound annual growth rate.
4. https://github.com/lorellav/DeXTER-DeepTextMiner.

Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/
4.0/), which permits use, sharing, adaptation, distribution and reproduction in any
medium or format, as long as you give appropriate credit to the original author(s)
and the source, provide a link to the Creative Commons license and indicate if
changes were made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line
to the material. If material is not included in the chapter’s Creative Commons
license and your intended use is not permitted by statutory regulation or exceeds
the permitted use, you will need to obtain permission directly from the copyright
holder.
CHAPTER 2

The Importance of Being Digital

A perspective is by nature limited. It offers us one single vision of a landscape.
Only when complementary views of the same reality combine are we capable
of achieving fuller access to the knowledge of things. The more complex
the object we are attempting to apprehend, the more important it is to
have different sets of eyes, so that these rays of light converge and we can
see the One through the many. That is the nature of true vision: it brings
together already known points of view and shows others hitherto unknown,
allowing us to understand that all are, in actuality, part of the same thing.
(Grothendieck 1986)

2.1 AUTHENTICITY, COMPLETENESS AND THE DIGITAL


For the past twenty years, digital tools, technologies and infrastructures
have played an increasingly determining role in framing how digital objects
are understood, preserved, managed, maintained and shared. Even in tra-
ditionally object-centred sectors such as cultural heritage, digitisation has
become the norm: heritage institutions such as archives, libraries, museums
and galleries continuously digitise huge quantities of heritage material. The
most official indication of this shift towards the digital in cultural heritage is
perhaps provided by UNESCO which, in 2003, recognised that the world’s
documentary heritage was increasingly produced, distributed, accessed
and maintained in digital form; accordingly, it proclaimed digital heritage

© The Author(s) 2023 37


L. Viola, The Humanities in the Digital: Beyond Critical Digital
Humanities, https://doi.org/10.1007/978-3-031-16950-2_2
38 L. VIOLA

as common heritage (UNESCO 2003). Unsurprisingly yet significantly,


the acknowledgement was made in the context of endangered heritage,
including digital, whose conservation and protection must be considered
‘an urgent issue of worldwide concern’ (ibid.).
The document also officially distinguished between heritage created
digitally (from then on referred to as digitally born heritage), that is,
heritage for which no other format but the digital object exists, and
digitised heritage, heritage ‘converted into digital form from existing
analogue resources’ (UNESCO 2003). Therefore, as per heritage tradition,
the semantic motivation behind digitisation was that of preserving cultural
resources from feared deterioration or permanent disappearance. It has been
argued, however, that by distinguishing between the two types of digital
heritage, the UNESCO statement de facto framed the digitisation process
as a heritagising operation in itself (Cameron 2021). Consequently, to
the classic cultural heritage paradigm ‘preserved heritage = heritage worth
preserving’, UNESCO added another layer of complexity: the equation
‘digitised = preserved’ (ibid.).
UNESCO’s acknowledgement of digital heritage and in particular of
digitised heritage as common heritage has undoubtedly had profound
implications for our understanding of heritage practices, material culture
and preservation. For example, by officially introducing the digital in
relation to heritage, UNESCO’s statement deeply affected traditional
notions of authenticity, originality, permanent preservation and complete-
ness which have historically been central to heritage conceptualisations.
For the purposes of this book, I will simplify the discussion1 by saying
that more traditional positions have insisted on the intrinsic lack of
authority of copies, what Benjamin famously called the ‘aura’ of an object
(Benjamin 1939). Museums’ culture has conventionally revolved around
these traditional, rigid rules of originality and authenticity, established
as the values legitimising them as the only accredited custodians of true
knowledge. Historically, such understanding of heritage has sadly gone
hand in hand with a very specific discourse, the one dominated by Western
perspectives. These views have been based on ideas of old, grandiose sites
and objects as being the sole heritage worthy of preservation which have
in turn perpetuated Western narratives of nation, class and science (ACHS
2012).
More recent scholarship, however, has moved away from such object-
centred views and reworked conventional conceptualisations of authentic-
ity and completeness in relation to the digital (see for instance, Council
on Library and Information Resources 2000; Jones et al. 2018; Gori-
unova 2019; Zuanni 2020; Cameron 2021; Fickers 2021). From the
1980s onwards, for example, the influence wielded by postmodernism
and post-colonialism theories has challenged these traditional frameworks
and brought new perspectives for the conceptualisation of material culture
(see for instance, Tilley 1989; Vergo 1989). The idea key to this new
approach particularly relevant to the arguments advanced in this book is
that material culture does not intrinsically possess any meanings; instead,
meanings are ascribed to material culture when interpreting it in the
present. As Christopher Y. Tilley famously stated, ‘The meaning of the past
does not reside in the past, but belongs in the present’ (Tilley 1989, 192).
According to this perspective, the significance of material culture is not
eternal and absolute but continually negotiated in a dialectical relationship
with contemporary values and interactions. For example, in disciplines such
as museum studies, this view takes the form of a critique of the social and
political role of heritage institutions. Through this lens, museums are not
seen as neutral custodians of material culture but as grounded in Western
ideologies of elitism and power and representing the interests of only a
minority of the population (Vergo 1989).
Such considerations have led to the emergence of new disciplines such
as Critical Heritage Studies (CHS). In CHS, heritage is understood as a
continuous negotiation of past and present modularities in the acknowl-
edgement that heritage values are not fixed nor universal, rather they
are culturally situated and constantly co-constructed (Harrison 2013).
Though still aimed at preserving and managing heritage for future genera-
tions, CHS are resolutely concerned with questions of power, inequality
and exploitation (Hall 1999; Butler 2007; Winter 2011) thus showing
much of the same foci of interest as critical posthumanities (Braidotti 2019)
and perfectly intersecting with the post-authentic framework I propose in
this book.
The official introduction of the digital in the context of cultural heritage
has necessarily become intertwined with the political and ideological legacy
concerning traditional notions of original and authentic vs copies and
reproductions. Simplistically seen as mere immaterial copies of the original,
digital objects could not but severely disrupt these fundamental values, in
some cases going as far as being framed as ‘terrorists’(Cameron 2007, 51),
that is destabilising instruments of what is true and real. In an effort to
defend material authenticity as the sole element defining meaning, digital
artefacts were at best bestowed an inferior status in comparison to the
originals, a servant role to the real.
The parallel with DH vs ‘mainstream humanities’ is hard to miss (cfr.
Chap. 1). In 2012, Alan Liu had defined DH as ‘ancillary’ to mainstream
humanities (Liu 2012), whereas others (Allington et al. 2016; Brennan
2017, e.g.,) had claimed that by incorporating the digital into the human-
ities, its very essence, namely agency and criticality, was violated, one
might say polluted. In opposition to the analogue, the digital was seen
as an immaterial, agentless and untrue threatening entity undermining the
authority of the original. Similar to digital heritage objects, these criticisms
of DH did not problematise the digital but simplistically reduced it to a
non-human, uncritical entity.
Nowadays, this view is increasingly challenged by new conceptual
dimensions of the digital; for instance Jones et al. (2018) argue that ‘a
preoccupation with the virtual object—and the binary question of whether
it is or is not authentic—obscures the wider work that digital objects
do’ (Jones et al. 2018, 350). Similarly, in her exploration of the digital
subject, Olga Goriunova (2019) reworks the notion of distance in Valla
and Benenson’s artwork in which a digital artefact is described as ‘neither
an object nor its representation but a distance between the two’ (2014).
Far from being a blank void, this distance is described as a ‘thick’ space in
which humans, entities and processes are connected to each other (ibid.,
4) according to the various forms of power embedded in computational
processes. According to this view, the concept of authenticity is considered
in relation to the digital subject, i.e., the digital self, which is rethought
as a much more complex entity than just a collection of data points and
at the same time, not quite a mere extension of the self. More recently,
Cameron (2021) states that in the context of digital cultural heritage, the
very conceptualisation of a digital object escapes Western ideas of curation
practices, and authenticity ‘may not even be something to aspire to’ (15).
This chapter wants to expand on these recent positions, not because I
disagree with the concepts and themes expressed by these authors, but
because I want to add a novel reflection on digital objects, including
digital heritage, and on both theory and practice-oriented aspects of
digital knowledge creation more widely. I argue that such aspects are in
urgent need of reframing not solely in museum and gallery practices, and
heritage policy and management, but crucially also in any context of digital
knowledge production and dissemination where an outmoded framework
of discipline compartmentalisation persists. Taking digital cultural heritage
as an illustrative case of a digital object typical of humanities scholarship, I
devote specific attention to the way in which digitisation has been framed
and understood and to the wider consequences for our understanding of
heritage, memory and knowledge.

2.2 DIGITAL CONSEQUENCES


This book challenges traditional notions of authenticity by arguing for
a reconceptualisation of the digital as an organic entity embedding past,
present and future experiences which are continuously renegotiated during
any digital task (Cameron 2021). Specifically, I expand on what Cameron
calls the ‘ecological composition concept’ (ibid., 15) in reference to digital
cultural heritage curation practices to include any action in a digital
setting, also understood as bearing context and therefore consequences.
She argues that the act of digitisation does not merely produce immaterial
copies of their analogue counterparts—as implied by the 2003 UNESCO
statement with reference to digitised cultural heritage—but by creating
digital objects, it creates new things which in turn become alive, and
which therefore are themselves subject to renegotiation. I further argue
that any digital operation is equally situated, never neutral as each in turn
incorporates external, situated systems of interpretation and management.
For example, the digitisation of cultural heritage has been discursively
legitimised as a heritagising operation, i.e., an act of preservation of cultural
resources from deterioration or disappearance. Though certainly true to an
extent, preservation is only one of the many aspects linked to digitisation
and by far not the only reason why governments and institutions have
started to invest massively in it. In line with the wider benefits that digiti-
sation is thought to bring at large (cfr. Chap. 1), the digitisation of cultural
heritage is believed to serve a range of other more strategic goals such as
fuelling innovation, creating employment opportunities, boosting tourism
and enhancing visibility of cultural sites including museums, libraries and
archives, all together leading to economic growth (European Commission
2011).
Inevitably, the process of cultural heritage digitisation itself has therefore
become intertwined with questions of power, economic interests, ideolog-
ical struggles and selection biases. For instance, after about two decades of
major, large-scale investments in the digitisation of cultural heritage, self-
reported data from cultural heritage institutions indicate that in Europe,
only about 20% of heritage material exists in a digital format (Enumerate
Observatory 2017), whereas globally, this percentage is believed to remain
at 15%.2 Behind these percentages, it is very hard not to see the colonial
ghosts from the past. CHS have problematised heritage designation not
just as a magnanimous act of preserving the past, but as ‘a symbol of
previous societies and cultures’ (Evans 2003, 334). When deciding which
societies and whose cultures, political and economic interests, power
relations and selection biases are never far away. For example, particularly
in the first stages of large-scale mass digitisation projects, special collections
often became the prioritised material to be digitised (Rumsey and Digital
Library Federation 2001), whereas less mainstream works and minority
voices tended to be largely excluded. Typically, libraries needed to decide
what to digitise based on cost-effectiveness analyses and so their choices were
often skewed by economic imperatives rather than ‘actual scholarly value’
(Rumsey and Digital Library Federation 2001). The UNESCO-induced
paradigm ‘digitising = preserving’ contributed to communicate the idea
that any digitised material was intrinsically worth preserving, thus in
turn perpetuating previous decisions about what had been worth keeping
(Crymble 2021).
There is no doubt that today’s under-representation of minority voices
in digital collections directly mirrors decades of past decisions about what
to collect and preserve (Lee 2020). In reference to early US digitisation
programmes, for example, Rumsey Smith points out that as a direct
consequence of this reasoning:

foreign language materials are nearly always excluded from consideration,


even if they are of high research value, because of the limitations of optical
character recognition (OCR) software and because they often have a limited
number of users. (Rumsey and Digital Library Federation 2001, 6)

This has in turn had other repercussions. As most of the digitised


material has been in English, tools and software for exploring and analysing
the past have primarily been developed for the English language. Although
in recent years greater awareness around issues of power, archival biases,
silences in the archives and lack of language diversity within the context of
digitisation has certainly developed not just in archival and heritage studies,
but also in DH and digital history (see for instance, Risam 2015; Putnam
2016; Earhart 2019; Mandell 2019; McPherson 2019; Noble 2019), the
fact remains that most of that 15% is the sad reflection of this bitter legacy.
Another example of the situated nature of digitisation is microfilming.
In his famous investigative book Double Fold, Nicholson Baker (2002) doc-
uments in detail the contextual, economic and political factors surrounding
microfilming practices in the United States. Through a zealous investiga-
tion, he tells us a story involving microfilm lobbyists, former CIA agents
and the destruction of hundreds of thousands of historical newspapers.
He pointedly questions the choices of high-profile figures in American
librarianship such as Patricia Battin, previous Head Librarian of Columbia
University and the head of the American Commission on Preservation
and Access from 1987 to 1994. From the analysis of government records
and interviews with persons of interest, Baker argues that Battin and the
Commission pitched the mass digitisation of paper records to charitable
foundations and the American government by inventing the ‘brittle book
crisis’, the apparent rapid deterioration that was destroying millions of
books across America (McNally 2002). In reality, he maintains, her advocacy
was part of an agenda to provide content for the microfilming
technology.
In advocating for preservation, Baker also discusses the limitations of
digitisation and some specific issues with microfilming, such as loss of
colour and quality and grayscale saturation. Such issues have had over
the years unpredictable consequences, particularly for images. In historical
newspapers, some images used to be printed through a technique called
rotogravure, a type of intaglio printing known for its good quality image
reproduction and especially well-suited for capturing details of dark tones.
Scholars (i.e., Williams 2019; Lee 2020) have pointed out how the
grayscale saturation issue of microfilming directly affects images of Black
people as it distorts facial features by achromatising the nuances. In the
case of millions and millions of records of images digitised from microfilm
holdings, such as the 1.56 million images in the Library of Congress’
Chronicling America collection, it has been argued that the microfilming
process itself has acted as a form of oppression for communities of colour
(Williams 2019). This together with several other criticisms concerning
selection biases have led some authors to talk about Chronicling White
America (Fagan 2016).
In this book I argue in favour of a more problematised conceptualisation
of digital objects and digital knowledge creation as living entities that bear
consequences. To build my argument, I draw upon posthuman critical the-
ory which understands the matter as an extremely convoluted assemblage
of components, ‘complex singularities relate[d] to a multiplicity of forces,
entities, and encounters'(Braidotti 2017, 16). Indeed, for its deconstruct-
ing and disruptive take, I believe the application of posthumanities theories
has great potential for refiguring traditional humanist forms of knowledge.
Although I discuss examples of my own research based on digital cultural
heritage material, my aim is to offer a counter-narrative beyond cultural
heritage and with respect to the digitisation of society. My intention is to
challenge the dominant public discourse that continues to depict the digital
as non-human, agentless, non-authentic and contextless and by extension
digital knowledge as necessarily non-human, cultureless and bias-free. The
digitisation of society sharply accelerated by the COVID-19 pandemic has
added complexity to reality, precipitating processes that have triggered
reactions with unpredictable, potentially global consequences. I therefore
maintain that with respect to digital objects, digital operations and to the
way in which we use digital objects to create knowledge, it is the notion of
the digital itself that needs reframing. In the next section, I introduce the
two concepts that may inform such radical reconfiguration: symbiosis and
mutualism.

2.3 SYMBIOSIS, MUTUALISM AND THE DIGITAL OBJECT


This book recognises the inadequacy of the traditional model of knowledge
creation, but it also contends that the 2020 pandemic-induced pervasive
digitisation has added further urgency to the point that this change can
no longer be deferred. Such a re-figured model, I argue, must conceptualise
the digital object as an organic, dynamic entity which lives and evolves
and bears consequences. It is precisely the unpredictability and long-term
nature of these consequences that now pose extremely complex questions
which the current rigid, single discipline-based model of knowledge cre-
ation is ill-equipped to approach.3 This book is therefore an invitation for
institutions as well as for us as researchers and teachers to address what
it means to produce knowledge today, to ask ourselves how we want our
digital society to be and what our shared and collective priorities are, and
so to finally produce the change that needs to happen.
As a new principle that goes beyond the constraints of the canonical
forms, posthuman critical theory has proposed transversality, ‘a pragmatic
method to render problems multidimensional’ (Braidotti and Fuller 2019,
1). With this notion of geometrical transversality that describes spaces ‘in
terms of their intersection' (ibid., 9), posthuman critical theory attempts to
capture ‘relations between relations’. I argue, however, that the suggested
image of a transversal cut across entities that were previously disconnected,
e.g., disciplines, does not convey the idea of fluid exchanges; rather, it
remains confined in ideas of separation and interdisciplinarity and therefore
it only partially breaks with the outdated conceptualisations of knowledge
compartmentalisation that it aims to disrupt. The term transversality, I
maintain, ultimately continues to frame knowledge as solid and essentially
separated.
This book firmly opposes notions of divisions, including a division of
knowledge into monolithic disciplines, as they are based on models of
reality that support individualism and separateness which in turn inevitably
lead to conflict and competition. To support my argument of an urgent
need for knowledge reconfiguration and for new terminologies, I propose
to borrow the concept of symbiosis from biology. The notion of symbiosis
from Greek ‘living together’ refers in biology to the close and long-
term cooperation between different organisms (Sims 2021). Applied to
knowledge remodelling and to the digital, symbiosis radically breaks with
the current conceptualisation of knowledge as a separate, static entity,
linear and fragmented into multiple disciplines and of the digital as an
agentless entity. To the contrary, the term symbiosis points to the continual
renegotiation in the digital of interactions, past, present and future systems,
power relations, infrastructures, interventions, curations and curators,
programmers and developers (see also Cameron 2021).
Integral to the concept of symbiosis is that of mutualism; mutualism
opposes interspecific competition, that is, when organisms from different
species compete for a resource, resulting in benefiting only one of the
individuals or populations involved (Bronstein 2015). I maintain that the
current rigid separation in disciplines resembles an interspecific competi-
tion dynamic as it creates the conditions for which knowledge production
has become a space of conflict and competition. As it is not only outdated
and inadequate but indeed deeply concerning, I therefore argue that the
contemporary notion of knowledge should not simply be redefined but
that it should be reconceptualised altogether. Symbiosis and mutualism
embed in themselves the principle of knowledge as fluid and inseparable in
which areas of knowledge do not compete against each other but benefit
from a mutually compensating relationship. When asking ourselves the
questions ‘How do we produce knowledge today?’, ‘How do we want our
next generation of students to be trained?’, the concepts of symbiosis and
mutualism may guide the new reconfiguration of our understanding of
knowledge in the digital.
Symbiosis and mutualism are central notions for the development of a
more problematised conceptualisation of digital objects and digital knowl-
edge production. Expanding on Cameron’s critique of the conceptual
attachment to digital cultural heritage as possessing a complete quality
of objecthood (Cameron 2021, 14), I maintain that it is not just digital
heritage and digital heritage practices that escape notions of completeness
and authenticity but in fact all digital objects and all digital knowledge
creation practices. According to this conceptualisation, any intervention
on the digital object (e.g., an update, data augmentation interventions,
data creation for visualisations) should always be understood as the sum of
all the previously made and concurrent decisions, not just by the present
curator/analyst, but by external, past actors, too (see for instance, the
example of microfilming discussed in Sect. 2.2). These decisions in turn
shape and are shaped by all the following ones in an endless cycle that
continually transforms and creates new object forms, all equally alive, all
equally bearing consequences for present and future generations. This is
what Cameron calls the ‘more-than-human’, a convergence of the human
and the technical.
I maintain, however, that the ‘more-than-human’ formulation still
presupposes a lack of human agency in the technical (the supposedly non-
human) and therefore a yet again binary view of reality. In Cameron’s view,
the more-than-human arises from the encounter of human agency with the
technical, which therefore would not possess agency per se. But agency
does not uniquely emerge from the interconnections between let’s say the
curator (what could be seen as ‘the human’) and the technical components
(i.e., ‘the non-human’) because there is no concrete separation between the
human and the technical and in truth, there is no such thing as neutral
technology (see Sect. 1.2). For example, in the practices of early large-
scale digitisation projects, past decisions about what to (not) digitise have
eventually led to the current English-centric predominance of data-sets,
software libraries, training models and algorithms. Using this technology
today contributes to reinforcing Western, white worldviews not just in digital
practices, but in society at large.
Hence, if Cameron believes that framing digital heritage as ‘possessing
a fundamental original, authentic form and function […] is limiting’
(ibid.,12), I elaborate further and maintain that it is in fact misleading.
Indeed, in constituting and conceptualising digital objects, the question of
whether it is or it is not authentic truly doesn’t make sense; digital objects
transcend authenticity; they are post-authentic. To conceptualise digital
objects as post-authentic means to understand them as unfinished processes
that embed a wide net of continually negotiable relations of multiple
internal and external actors, past, present and future experiences; it means
to look at the human and the technical as symbiotic, non-discriminable
elements of the digital’s immanent nature which is therefore understood as
situated and consequential. To this end, I introduce a new framework that
could inform practices of knowledge reconfiguration: the post-authentic
framework. The post-authentic framework problematises digital objects
by pointing to their aliveness, incompleteness and situatedness, to their
entrenched power relations and digital consequences. Throughout the
book, I will unpack key theoretical concepts of the post-authentic frame-
work and, through the illustration of four concrete examples of knowledge
creation in the digital—creation of digital material, enrichment of digital
material, analysis of digital material and visualisation of digital material—I
evaluate its full implications for knowledge creation.

2.4 CREATION OF DIGITAL OBJECTS


The post-authentic framework acknowledges digital objects as situated,
unfinished processes that embed a wide net of continually negotiable
relations of multiple actors. It is within the post-authentic framework that
I describe the creation of ChroniclItaly 3.0 (Viola and Fiscarelli 2021a), a
digital heritage collection of Italian American newspapers published in the
United States by Italian immigrants between 1898 and 1936. I take the
formation and curation of this collection as a use case to demonstrate how
the post-authentic framework can inform the creation of a digital object
in general, reacting to and impacting on institutional and methodological
frameworks for knowledge creation. In the case of ChroniclItaly 3.0, this
includes effects on the very conceptualisation of heritage and heritage
practices.
Being the third version of the collection, ChroniclItaly 3.0 is in itself a
demonstration of the continuously and rapidly evolving nature of digital
research and of the intrinsic incompleteness of digital objects. I created
the first version of the collection, ChroniclItaly (Viola 2018) within the
framework of the Transatlantic research project Oceanic Exchanges (OcEx)
(Cordell et al. 2017). OcEx explored how advances in computational
periodicals research could help historians trace and examine patterns of
information flow across national and linguistic boundaries in digitised
nineteenth-century newspaper corpora. Within OcEx, our first priority
was therefore to study how news and concepts travelled between Europe
and the United States and how, by creating intricate entanglements of
informational exchanges, these processes resulted in transnational linguistic
and cultural contact phenomena. Specifically, we wanted to investigate
how historical newspapers and Transatlantic reporting shaped social and
cultural cohesion between Europeans in the United States and in Europe.
One focus was specifically on the role of migrant communities as nodes
in the Transatlantic transfer of culture and knowledge (Viola and Verheul
2019a). As the main aim was to trace the linguistic and cultural changes
that reflected the migratory experience of these communities, we first
needed to obtain large quantities of diasporic newspapers that would
be representative of the Italian ethnic press at the time. Because of the
project’s time and costs limitations, such sources needed to be available for
computational textual analysis, i.e., already digitised. This is why I decided
to machine harvest the digitised Italian American newspapers from Chron-
icling America,4 the Open Access, Internet-based Library of Congress
directory of digitised historical newspapers published in the United States
from 1777 to 1963. Chronicling America is also an ongoing digitisation
project which involves the National Digital Newspaper Program (NDNP),
the National Endowment for the Humanities (NEH), and the Library of
Congress. Started in 2005, the digitisation programme continuously adds
new titles and issues through the funding of digitisation projects awarded
to external institutions, mostly universities and libraries, and thus in itself
it encapsulates the intrinsic incompleteness of digital infrastructures and
digital objects and the far-reaching network of influencing factors and
actors involved.
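A sketch of what such machine harvesting can look like is given below. It queries the public Chronicling America search API for Italian-language pages and prints a snippet of the OCR text of each result; the endpoint, parameter names and the 'ocr_eng' field reflect my reading of the Library of Congress documentation and should be checked against it, and the snippet is not the script actually used to build ChroniclItaly.

    # Illustrative harvesting from the Chronicling America JSON API. Endpoint,
    # parameters and the 'ocr_eng' field should be verified against the current
    # Library of Congress documentation; this is not the ChroniclItaly script.
    # Requires: pip install requests
    import requests

    SEARCH_URL = "https://chroniclingamerica.loc.gov/search/pages/results/"
    params = {
        "language": "ita",    # Italian-language pages
        "format": "json",
        "rows": 20,           # results per request
        "page": 1,
    }

    response = requests.get(SEARCH_URL, params=params, timeout=30)
    response.raise_for_status()

    for item in response.json().get("items", []):
        print(item.get("title"), item.get("date"))
        ocr_text = item.get("ocr_eng", "")     # raw OCR text of the page
        print(ocr_text[:200], "...")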
This wider net of interrelations that influence how digital objects come
into being and which equally influenced the ChroniclItaly collections
can be exemplified by the criteria to receive the Chronicling America
grant. In line with the main NDNP’s aim ‘to create a national digital
resource of historically significant newspapers published between 1690
and 1963, from all the states and U.S. territories’ (emphasis mine NEH
2021, 1), institutions should digitise approximately 100,000 newspaper
pages representing their state. How this significance is assessed depends
on four principles. First, titles should represent the political, economic
and cultural history of the state or territory; second, titles recognised as
‘papers of record’, that is containing ‘legal notices, news of state and
regional governmental affairs, and announcements of community news
and events’ are preferred (ibid., 2). Third, titles should cover the majority
of the population areas, and fourth, titles with longer chronological runs
and that have ceased publication are prioritised. Additionally, applicants
must commit to assemble an advisory board including scholars, teachers,
librarians and archivists to inform the selection of the newspapers to be
digitised. The requirement that most heavily conditions which titles are
included in Chronicling America, however, is the existence of a complete,
or largely complete microfilm ‘object of record’ with priority given to
higher-quality microfilms. In terms of technical requirements, this criterion
is adopted for reasons of efficiency and cost; however, as in the past
microfilming practices in the United States were entrenched in a complex
web of interrelated factors (cfr. Sect. 2.2), the impact of this criterion on
the material included in the directory incorporates issues such as previous
decisions of what was worth microfilming and more importantly, what was
not.
Furthermore, to ensure consistency across the diverse assortment of
institutions involved over the years and throughout the various grant cycles,
the programme provides awardees with further technical guidelines. At
the same time, however, these guidelines may cause over-representation
of larger or mainstream publications; therefore, to counterbalance this
issue, titles that give voice to under-represented communities are highly
encouraged. Although certainly mitigated by multiple review stages (i.e.,
by each state awardee’s advisory board, by the NEH and peer review
experts), the very constitutional structure of Chronicling America reveals
the far-reaching net of connections, economic and power relations, mul-
tiple actors and factors influencing the decisions about what to digitise.
Significantly, it exposes how digitisation processes are intertwined with
individual institutions’ research agendas and how these may still embed
and perpetuate past archival biases.
The creation of ChroniclItaly therefore ‘inherits’ all these decisions
and processes of mediation and in turn embeds new ones such as those
stemming from the research aims of the project within which it was created,
i.e., OcEx, and the expertise of the curator, i.e., myself. At this stage, for
example, we decided to not intervene on the material with any enriching
operation as ChroniclItaly mainly served as the basis for a combination
of discourse and text analyses investigations that could help us research to
which extent diasporic communities functioned as nodes and contact zones
in the Transatlantic transfer of information.
As we explored the collection further, we realised however that to limit
our analyses to text-based searches would not exploit the full potential of
the archive; we therefore expanded the project with additional grant money
earned through the Utrecht University’s Innovation Fund for Research in
IT. We made a case for the importance of experimenting with computa-
tional methodologies that would allow humanities scholars to identify and
map the spatial dimension of digitised historical data as a way to access
subjective and situational geographical markers. It is with this aim in mind
that I created ChroniclItaly 2.0 (Viola 2019), the version of the collection
annotated with referential entities (i.e., people, places, organisations). As
part of this project, we also developed the app GeoNewsMiner (GNM)5
(Viola et al. 2019). This is an interactive graphical user interface (GUI)
to visually and interactively explore the references to geographical entities
in the collection. Our aim was to allow users to conduct historical, finer-
grained analyses such as examining changes in mentions of places over time
and across titles as a way to identify the subjective and situational dimension
of geographical markers and connect them to explicit geo-references to
space (Viola and Verheul 2020a).
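A toy sketch of the kind of aggregation that sits behind such an interface is given below: counting how often annotated place entities are mentioned per year. The data structure is hypothetical; GeoNewsMiner's actual input format and processing are documented in its repository.

    # Illustrative aggregation behind a 'place mentions over time' view. The
    # (year, entity, label) triples are hypothetical NER output, not the
    # annotation format actually used by ChroniclItaly 2.0 or GeoNewsMiner.
    from collections import Counter

    annotations = [
        (1905, "New York", "LOC"),
        (1905, "Palermo", "LOC"),
        (1906, "New York", "LOC"),
        (1906, "Roma", "LOC"),
        (1906, "New York", "LOC"),
    ]

    mentions_per_year = Counter(
        (year, place) for year, place, label in annotations if label == "LOC"
    )

    for (year, place), count in sorted(mentions_per_year.items()):
        print(year, place, count)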
The creation of the third version of the collection, ChroniclItaly
3.0, should be understood in the context of yet another project,
DeepteXTminER (DeXTER)6 (Viola and Fiscarelli 2021b) supported
by the Luxembourg Centre for Contemporary and Digital History’s
(C2 DH—University of Luxembourg) Thinkering Grant. Named after a blend of
the verbs tinkering and thinking, this grant funds research applying the
method of ‘thinkering’: ‘the tinkering with technology combined with
the critical reflection on the practice of doing digital history’ (Fickers
and Heijden 2020). As such, the scheme is specifically aimed at funding
innovative projects that experiment with technological and digital tools for
the interpretation and presentation of the past. Conceptually, the C2 DH
itself is an international hub for reflection on the methodological and
epistemological consequences of the Digital Turn for history;7 it serves
as a platform for engaging critically with the various stages of historical
research (archiving, analysis, interpretation and narrative) with a particular
focus on the use of digital methods and tools. Physically, it strives to
actualise interdisciplinary knowledge production and dissemination by
fostering ‘trading zones’ (Galison and Stump 1996; Collins et al. 2007),
working environments in which interactions and negotiations between
different disciplines can happen (Fickers and Heijden 2020). Within this
state-of-the-art natural language processing (NLP) and deep learning
techniques for the curation and visualisation of digital heritage material.
DeXTER’s ultimate goal was to bring the utilised techniques into as close
an alignment as possible with the principle of human agency (cfr. Chap. 3).
The larger ecosystem of the ChroniclItaly collections thus exemplifies
the evolving nature of digital objects and how international and national
processes interweave with wider external factors, all impacting differen-
tially on the objects’ evolution. The existence of multiple versions of
ChroniclItaly, for example, is in itself a reflection of the incompleteness
of the Chronicling America project to which titles, issues and digitised
material are continually added. ChroniclItaly and ChroniclItaly 2.0 include
seven titles and issues from 1898 to 1920 that portray the chronicles of
Italian immigrant communities from four states (California, Pennsylvania,
Vermont, and West Virginia); ChroniclItaly 3.0 expands the two previous
versions by including three additional titles published in Connecticut and
pushing the overall time span to cover from 1898 to 1936. In terms of
issues, ChroniclItaly 3.0 almost doubles the number of included pages
compared to its predecessors: 8653 vs 4810 of its previous versions. This is
a clear example of how the formation of a digital object is impacted by the
surrounding digital infrastructure, which in turn is dependent on funding
availability and whose very constitution is shaped by the various research
projects and the involved actors in its making.

2.5 THE IMPORTANCE OF BEING DIGITAL


Understanding digital objects as post-authentic objects means to acknowl-
edge them as part of the complex interaction of countless factors and
dynamics and to recognise that the majority of such factors and dynamics
are invisible and unpredictable. Due to the extreme complexity of inter-
related forces at play, the formidable task of writing both the past in the
present and the future past demands careful handling. This is what Braidotti
and Fuller call ‘a meaningful response move from the relatively short chain
of intention-to-consequence […] to the longer chains of consequences in
which chance becomes a more structural force’ (2019, 13). Here chance is
understood as the unpredictable combination of all the numerous known
and unknown actors involved, conscious and unconscious biases, past,
present and future experiences, and public, private and personal interests.
With specific reference to the ChroniclItaly collections, for example, in
addition to the already discussed multiple factors influencing their creation,
many of which date even decades before, the nature itself of this digital
object and of its content bears significance for our conceptualisation of
digital heritage and more broadly, for digital knowledge creation practices.
The collections collate immigrant press material. The immigrant press
represents the first historical stage of the ethnic press, a phenomenon
associated with the mass migration to the Americas between the 1880s
and 1920s, when it is estimated that over 24 million people from all
around the world arrived in America (Bandiera et al. 2013). Indeed, as
immigrant communities were growing exponentially, so did the immigrant
press: at the turn of the twentieth century, about 1300 foreign-language
newspapers were being printed in the United States with an estimated
circulation of 2.6 million (Bjork 1998). By giving immigrants all sorts
of practical and social advice—from employment and housing to religious
and cultural celebrations and from learning English to acquiring American
citizenship—these newspapers truly helped immigrants to transition into
American society. As immigrant newspapers quickly became an essential
element at many stages in an immigrant’s life (Rhodes 2010, 48), the
immigrant press is a resource of particularly valuable significance not only
for studying the lives of many of the communities that settled in the United
States but also for opening a comprehensive window onto the American
society of the time (Viola and Verheul 2020a).
As far as the Italians were concerned, it has been calculated that by 1920,
they represented more than 10% of the non-US-born population
(about 4 million) (Wills 2005). The Italian community was also among
the most prolific newspaper producers; between 1900 and 1920, there
were 98 Italian titles that managed to publish uninterruptedly, whereas at
its publication peak, this number ranged between 150 and 264 (Deschamps
2007, 81). In terms of circulation, in 1900, 691,353 Italian newspapers
were sold across the United States (Park 1922, 304), but in New York
alone, the circulation ratio of the Italian daily press is calculated as one
paper for every 3.3 Italian New Yorkers (Vellon 2017, 10). Distribution
and circulation figures should however be doubled or perhaps even tripled,
as illiteracy levels were still high among this generation of Italians and
newspapers were often read aloud (Park 1922; Vellon 2017; Viola and
Verheul 2019a; Viola 2021).
These impressive figures on the whole may point to the influential role of
the Italian language press not just for the immigrant community but within
the wider American context, too. At a time when the mass migrations were
causing a redefinition of social and racial categories, notions of race, civilisa-
tion, superiority and skin colour had polarised into the binary opposition of
white/superior vs non-white/inferior (Jacobson 1998; Vellon 2017; Viola
and Verheul 2019a). The whiteness category, however, was rather complex
and not at all based exclusively on skin colour. Jacobson (1998) for instance
describes it as ‘a system of “difference” by which one might be both white
and racially distinct from other whites’ (ibid., p. 6). Indeed, during the
period covered by the ChroniclItaly collections, immigrants were granted
‘white’ privileges depending not on how white their skin might have been,
rather on how white they were perceived (Foley 1997). Immigrants in
the United States who were experiencing this uncertain social identity
situation have been described as ‘conditionally white’ (Brodkin 1998),
‘situationally white’ (Roediger 2005) and ‘inbetweeners’ (among others
Barrett and Roediger 1997; Guglielmo and Salerno 2003; Guglielmo
2004; Orsi 2010).
This was precisely the complicated identity and social status of Italians,
especially of those coming from Southern Italy; because of their challeng-
ing economic and social conditions and their darker skin, both other ethnic
groups and Americans considered them socially and racially inferior and often discriminated against them (LaGumina 1999; Luconi 2003). For example, Italian immigrants would often be excluded from employment and housing opportunities and be victims of social discrimination, exploita-
tion, physical violence and even lynching (LaGumina 1999; Connell and
Gardaphé 2010; Vellon 2010; LaGumina 2018; Connell and Pugliese
2018). The social and historical importance of Italian immigrant newspapers lies in how they advocated for the rights of the community they represented, crucially acting as powerful forces of inclusion, community building and national identity preservation, as well as tools of language and cultural retention. At the same time, because this advocacy role was often paired with the condemnation of American discriminatory practices, these newspapers also played a decisive role in transforming American society at large, undoubtedly contributing to the tangible shaping of the country.
The immigrant press and the ChroniclItaly collections can therefore be
an extremely valuable source to investigate specifically how the internal
mechanisms of cohesion, class struggle and identity construction of the
Italian immigrant community contributed to transform America.
Lastly, these collections can also bring insights into the Italian immi-
grants’ role in the geographical shaping of the United States. The majority
of the 4 million Italians that had arrived to the United States—mostly
uneducated and mostly from the south—had done so as the result of
chain migration. Naturally, they would settle closely to relatives and
friends, creating self-contained neighbourhoods clustered according to dif-
ferent regional and local affiliations (MacDonald and MacDonald 1964).
Through the study of the geographical places contained in the collections
as well as the place of publication of the newspapers’ titles, the ChroniclItaly
collections provide an unconventional and traditionally neglected source
for studying the transforming role of migrants for host societies.
On the whole, however, the novel contribution of the ChroniclItaly
collections comes from the fact that they allow us to devote attention to
the study of historical migration as a process experienced by the migrants
themselves (Viola 2021). This is rare, as in discourse-based migration research the analysis tends to focus on discourse about migrants, rather than discourse by migrants (De Fina and Tseng 2017; Viola 2021). Instead, through the
analysis of migrants’ narratives, it is possible to explore how displaced
individuals dealt with social processes of migration and transformation
and how these affected their inner notions of identity and belonging.
A large-scale digital discourse-based study of migrants’ narratives creates
a mosaic of migration, a collective memory constituted by individual
stories. In this sense, the importance of being digital lies in the fact that
this information can be processed on a large-scale and across different
migrants’ communities. The digital therefore also offers the possibility, perhaps unimaginable before, of a kaleidoscopic view that simultaneously
apprehends historical migration discourse as a combination of inner and
outer voices across time and space. Furthermore, as records are regularly
updated, observations can be continually enriched, adjusted, expanded,
recalibrated, generalised or contested. At the same time, mapping these
narratives creates a shimmering network of relations between the past
migratory experiences of diasporic communities and contemporary migra-
tion processes experienced by ethnic groups, which can also be compared
and analysed both as active participants and spectators.
Abby Smith Rumsey said that the true value of the past is that it is
the raw material we use to create the future (Rumsey 2016). It is only
through gaining awareness of these spatio-temporal correspondences that
the past can become part of our collective memory and, by preventing us
from forgetting it, of our collective future. Understanding digital objects
through the post-authentic lens entails that great emphasis must be placed on the processes that generate the mappings of the correspondences. The post-authentic framework recognises that these processes cannot be neutral as they stem from systems of interpretation and management which are situated and therefore partial. These processes are never complete, nor can they be completed, and as such they require constant updating and critical supervision.
In the next chapter, I will illustrate the second use case of this book—
data augmentation; the case study demonstrates that the task of enriching
a digital object is a complex managerial activity, made up of countless
critical decisions, interactions and interventions, each one having conse-
quences. The application of the post-authentic framework for enriching
ChroniclItaly 3.0 demonstrates how symbiosis and mutualism can guide
how the interaction with the digital unfolds in the process of knowledge
creation. I will specifically focus on why computational techniques such
as optical character recognition (OCR), named entity recognition (NER),
geolocation and sentiment analysis (SA) are problematic and I will show
how the post-authentic framework can help address the ambiguities and
uncertainties of these methods when building a source of knowledge for
current and future generations.

NOTES
1. For a review of the discussion, see, for example, Cameron (2007, 2021).
2. https://www.marktechpost.com/2019/01/30/will-machine-learning-
enable-time-travel/.
3. These may sometimes be referred to as ‘wicked problems’; see, for instance,
Churchman (1967), Brown et al. (2010), and Ritchey (2011).
4. https://chroniclingamerica.loc.gov/.
5. https://utrecht-university.shinyapps.io/GeoNewsMiner/.
6. https://github.com/lorellav/DeXTER-DeepTextMiner.
7. https://www.c2dh.uni.lu/about.
CHAPTER 3

The Opposite of Unsupervised

When you control someone’s understanding of the past, you control their
sense of who they are and also their sense of what they can imagine becoming.
(Abby Smith Rumsey, 2016)

3.1 ENRICHMENT OF DIGITAL OBJECTS


After the initial headlong rush to digitisation, libraries, museums and other
cultural heritage institutions realised that simply making sources digitally
available did not ensure their use; what in fact became apparent was that
as the body of digital material grew, users’ engagement decreased. This
was rather disappointing but more importantly, it was worrisome. Millions
had been poured into large-scale digitisation projects, pitched to funding
agencies as the ultimate Holy Grail of cultural heritage (cfr. Chap. 2), a
safe, more efficient way to protect and preserve humanity’s artefacts and
develop new forms of knowledge, simply unimaginable in the pre-digital
era. Although some of it was true, what had not been anticipated was the
increasing difficulty experienced by users in retrieving meaningful content,
a difficulty that corresponded to the rate of digital expansion. Especially
when paired with poor interface design, this difficulty left users with an overall unpleasant experience, feeling frustrated, overwhelmed and dissatisfied.
Thus, to realise a return on their investment in digitisation, institutions
urgently needed novel approaches to maximise the potential of their digital

collections. It soon became obvious that the solution was to simplify and
improve the process of exploring digital archives, to make information
retrievable in more valuable ways and the user experience more meaningful
on the whole. Naturally, within the wider incorporation of technology in
all sectors, it is not at all surprising that ML and AI have been more than
welcomed to the digital cultural heritage table. Indeed, AI is particularly
appreciated for its capacity to automate lengthy and boring processes that
nevertheless enhance exploration and retrieval for conducting more in-
depth analyses, such as the task of annotating large quantities of digital
textual material with referential information. As this technology continues to develop together with new tools and methods, it is increasingly used to help institutions fulfil the main purposes of heritagisation:
knowledge preservation and access.
One widespread way to enhance access is through ‘content enrichment’
or just enrichment for short. It consists of a wide range of techniques
implemented to achieve several goals from improving the accuracy of
metadata for better content classification1 to annotating textual content
with contextual information, the latter typically used for tasks such as
discovering layers of information obscured by data abundance (see, for
instance, Taylor et al. 2018; Viola and Verheul 2020a). There are at least
four main types of text annotation: entity annotation (e.g., named entity
recognition—NER), entity linking (e.g., entity disambiguation), text clas-
sification and linguistic annotation (e.g., part-of-speech tagging—POS).
Content enrichment is also often used by digital heritage providers to
link collections together or to populate ontologies that aim to standardise
procedures for digital sources preservation, help retrieval and exchange
(among others Albers et al. 2020; Fiorucci et al. 2020).
The theoretical relevance of performing content enrichment, especially
for digital heritage collections, lies precisely in its great potential for discov-
ering the cultural significance underneath referential units, for example, by
cross-referencing them with other types of data (e.g., historical, social, tem-
poral). We enriched ChroniclItaly 3.0 for NER, geocoding and sentiment
within the context of the DeXTER project. Informed by the post-authentic
framework, DeXTER combines the creation of an enrichment workflow
with a meta-reflection on the workflow itself. Through this symbiotic
approach, our intention was to prompt a fundamental rethink of both the
way digital objects and digital knowledge creation are understood and the
practices of digital heritage curation in particular.
It is all too often assumed that enrichment, or at least parts of it,
can be fully automated, unsupervised and even launched as a one-step
pipeline. Preparing the material to be ready for computational analysis,
for example, often ambiguously referred to as ‘cleaning’, is typically
presented as something not worthy of particular critical scrutiny. We
are misleadingly told that operations such as tokenisation, lowercasing,
stemming, lemmatisation and removing stopwords, numbers, punctuation
marks or special characters don’t need to be problematised as they are
rather tedious, ‘standard’ operations. My intention here is to show how
it is on the contrary paramount that any intervention on the material
is tackled critically. When preparing the material for further processing,
full awareness of the curator’s influential role is required as each of the actions taken triggers different chain reactions and will therefore
output different versions of the material. To implement one operation
over another influences how the algorithms will process such material and
ultimately, how the collection will be enriched, the information accessed,
retrieved and finally interpreted and passed on to future generations (Viola
and Fiscarelli 2021b, 54).
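To make the point tangible, the sketch below (an illustrative Python fragment, not the DeXTER code; the toy stopword list and the example sentence are invented) shows how two seemingly ‘standard’ preparation configurations return different versions of the same Italian sentence, and therefore hand different material to whatever enrichment step follows:

import re

def prepare(text, lowercase=False, remove_stopwords=False, strip_punctuation=False):
    # Each flag is a curatorial decision, not a neutral default:
    # changing it changes the material handed to the next step.
    stopwords = {"il", "la", "del", "di", "e"}  # tiny illustrative list
    if strip_punctuation:
        text = re.sub(r"[^\w\s]", " ", text)
    tokens = text.split()
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if remove_stopwords:
        tokens = [t for t in tokens if t.lower() not in stopwords]
    return tokens

sentence = "Il Presidente Wilson parla della Camera del Senato."
print(prepare(sentence))
print(prepare(sentence, lowercase=True, remove_stopwords=True, strip_punctuation=True))

Run on the same sentence, the two configurations return materially different token lists: in the second, ‘del’ has disappeared and the entity ‘Camera del Senato’ can no longer be reconstructed, which is precisely why stopwords were kept in ChroniclItaly 3.0 before NER.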
Broadly, the argument I present provokes a discussion and critique of the
fetishisation of empiricism and technical objectivity not just in humanities
research but in knowledge creation more widely. It is this critical and hum-
ble awareness that reduces the risks of over-trusting the pseudo-neutrality
of processes, infrastructures, software, categories, databases, models and
algorithms. The creation and enrichment of ChroniclItaly 3.0 show how
the conjuncture of the implicated structural forces and factors cannot be
envisioned as a network of linear relations and as such, cannot be predicted.
The acknowledgement of the limitations and biases of specific tools and
choices adopted in the curation of ChroniclItaly 3.0 takes the form of a
thorough documentation of the steps and actions undertaken during the
process of creation of the digital object. In this way, it is not just the product,
however incomplete, that is seen as worthy of preservation for current and
future generations, but also equally the process (or indeed processes) for
creating it. Products and processes are unfixed and subject to change, they
transcend questions of authenticity; they allow room for multiple versions,
all equally post-authentic, in that they may reflect different curators and
materials, different programmers, rapid technological advances, changing
temporal frameworks and values.
3.2 PREPARING THE MATERIAL


Critically assessing which of the preparatory operations for enrichment should be performed depends on internal factors such as the
language of the collection; the type of material; the specific enrichment
tasks to follow as well as external factors such as the available means and
resources, both technical and financial; the time-frame, the intended users
and research aims; the infrastructure that will store the enriched collection;
and so forth. Indeed, far from being ‘standard’, each intervention needs to
be specifically tailored to individual cases. Moreover, since each operation is in fact an additional layer of manipulation, it is fundamental that
scholars, heritage operators and institutions assess carefully to what degree
they want to intervene on the material and how, and that their decisions
are duly documented and motivated. In the case of ChroniclItaly 3.0,
for example, the documentation of the specific preparatory interventions
taken towards enriching the collection, namely, tokenisation, removing
numbers and dates and removing words with less than two characters and
special characters, is embedded as an integral part of the actual workflow.
I wanted to signal the need for refiguring digital knowledge creation
practices as honest and fluid exchanges between the computational and
human agency, counterbalancing the narrative that depicts computational
techniques as autonomous processes from which the human is (should be?)
removed. Thus, as a thoughtful post-authentic project, I have considered
each action as part of a complex web of interactions between the multiple
factors and dynamics at play with the awareness that the majority of
such factors and dynamics are invisible and unpredictable. Significantly,
the documentation of the steps, tools and decisions serves the valuable
function of acknowledging such awareness for contemporary and future
generations.
This process can be envisioned as a continuous dialogue between
human and artificial intelligence and it can be illustrated by describing
how we handled stopwords (e.g., prepositions, articles, conjunctions)
and punctuation marks when preparing ChroniclItaly 3.0 for enrichment.
Typically, stopwords are reputed to be semantically non-salient and even
potentially disruptive to the algorithms’ performance; as such, they are
normally removed automatically. However, as they are of course language-
bound, removing these items indiscriminately can hinder future analyses, with more destructive consequences than keeping them. Thus, when
enriching ChroniclItaly 3.0, we considered two fundamental factors: the
language of the data-set—Italian—and the enrichment actions to follow,
namely, NER, geocoding and SA. For example, we considered that in
Italian, prepositions are often part of locations (e.g., America del Nord—
North America), organisations (e.g., Camera del Senato—the Senate)
and people’s names (e.g., Gabriele d’Annunzio); removing them could
have negatively interfered with how the NER model had been trained to
recognise referential entities. Similarly, in preparation for performing SA
at sentence level (cfr. Sect. 3.4), we did not remove punctuation marks; in
Italian punctuation marks are typical sentence delimiters; therefore, they
are indispensable for the identification of sentences’ boundaries.
Another operation that we critically assessed concerns the decision to
whether to lowercase the material before performing NER and geocoding.
Lowercasing text before performing other actions can be a double-edged
sword. For example, if lowercasing is not implemented, a NER algorithm
will likely process tokens such as ‘USA’, ‘Usa’, ‘usa’, ‘UsA’ and ‘uSA’ as
distinct items, even though they may all refer to the same entity. This may
turn out to be problematic as it could provide a distorted representation
of that particular entity and how it is connected to other elements in the
collection. On the other hand, if the material is lowercased, it may become
difficult for the algorithm to identify ‘usa’ as an entity at all,2 which may
result in a high number of false negatives, thus equally skewing the output.
We, once again, intervened as human agents: we considered that entities
such as persons, locations and organisations are typically capitalised in
Italian and therefore, in preparation for NER and geocoding, lowercasing
was not performed. However, once these steps were completed, we did
lowercase the entities and following a manual check, we merged multiple
items referring to the same entity. This method allowed us to obtain a
more realistic count of the number of entities identified by the algorithm
and resulted in a significant redistribution of the entities across the different
titles, as I will discuss in Sect. 3.3. Albeit more accurate, this approach did
not come without problems and repercussions; many false negatives are
still present and therefore the tagged entities are NOT all the entities in
the collections. I will return to this point in Chap. 5.
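The merging step described above can be illustrated with a minimal sketch; the entity counts below are hypothetical, and the actual DeXTER merge also involved a manual check of ambiguous forms:

from collections import Counter

# Hypothetical NER output: (surface form, tag) pairs with their raw counts.
raw_counts = Counter({
    ("USA", "GPE"): 412,
    ("Usa", "GPE"): 95,
    ("usa", "GPE"): 7,   # may also be the verb 'usa'; flagged for manual review
    ("New York", "GPE"): 301,
})

merged = Counter()
for (surface, tag), count in raw_counts.items():
    merged[(surface.lower(), tag)] += count  # lowercase only after tagging

print(merged[("usa", "GPE")])  # 514: one entity instead of three case variants

The manual check remains indispensable: lowercased ‘usa’ may equally be the verb, which is exactly the ambiguity discussed above.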
The decision we took to remove numbers, dates and special characters
is also a good example of the importance of being deeply engaged
with the specificity of the source and how that specificity changes the
application of the technology through which that engagement occurs. Like
the large majority of the newspapers collected in Chronicling America, the
pages forming ChroniclItaly 3.0 were digitised primarily from microfilm
holdings; the collection therefore presents the same issues common to
OCR-generated searchable texts (as opposed to born digital texts) such
as errors derived from low readability of unusual fonts or very small char-
acters. However, in the case of ChroniclItaly 3.0, additional factors must
be considered when dealing with OCR errors. The newspapers aggregated
in the collection were likely digitised by different NDNP awardees, who
probably employed different OCR engines and/or chose different OCR
settings, thus ultimately producing different errors which in turn affected
the collection’s accessibility in an unsystematic way. Like all ML prediction
models, OCR engines embed the various biases encoded not only in the
OCR engine’s architecture but more importantly, in the data-sets used for
training the model (Lee 2020). These data-sets typically consist of sets of
transcribed typewritten pages which embed the human subjectivity (e.g.,
spelling errors) as well as individual decisions (e.g., spelling variations).
All these factors have wider, unpredictable consequences. As previously
discussed in reference to microfilming (cfr. Sect. 2.2), OCR technology has
raised concerns regarding marginalisation, particularly with reference to
the technology’s consequences for content discoverability (Noble 2018;
Reidsma 2019). These scholars have argued that this issue is closely related
to the fact that the most largely implemented OCR engines are both
licensed and opaquely documented; they therefore not only reflect the
strategic, commercial choices made by their creators according to specific
corporate logics but they are also practically impossible to audit. Despite
being promoted as ‘objective’ and ‘neutral’, these systems incorporate
prejudices and biases, strong commercial interests, third-party contracts
and layers of bureaucratic administration. Nevertheless, this technology is
implemented on a large scale and it therefore deeply impacts what—on a
large scale—is found and lost, what is considered relevant and irrelevant,
what is preserved and passed on to future generations and what will not
be, what is researched and studied and what will not be accessed.
Understanding digital objects as post-authentic entails being mindful
of all the alterations and transformations occurring prior to accessing the
digital record and how each one of them is connected to wider networks
of systems, factors and complexities, most of which are invisible and
unpredictable. Similarly, any following intervention adds further layers
of manipulation and transformation which incorporate the previous ones
and which will in turn have future, unpredictable consequences. For
example, in Sect. 2.2 I discussed how previous decisions about what was
worth digitising dictated which languages needed to be prioritised, in turn
determining which training data-sets were compiled for different language
models, leading to the current strong bias towards English models, data-
sets and tools and an overall digital language and cultural injustice.
Although the non-English content in Chronicling America has been
reviewed by language experts, many additional OCR errors may have
originated from markings on the material pages or a general poor condition
of the physical object. Again, the specificity of the source adds further
complexity to the many problematic factors involved in its digitisation; in
the case of ChroniclItaly 3.0, for example, we found that OCR errors were
often rendered as numbers and special characters. To alleviate this issue,
we decided to remove such items from the collection. This step impacted the material differently, not just across titles but even across issues of the
same title. Figure 3.1 shows, for example, the impact of this operation on
Cronaca Sovversiva, one of the newspapers collected in ChroniclItaly 3.0
with the longest publication record, spanning almost throughout the entire
archive, 1903–1919. On the whole, this intervention reduced the total
number of tokens from 30,752,942 to 21,454,455, equal to about 30% of
overall material removed (Fig. 3.2). Although with sometimes substantial
variation, we found the overall OCR quality to be generally better in the most recent texts. This characteristic is shared by most OCRed nineteenth-century newspapers, and it has been ascribed to a better conservation status or better initial condition of the originals which overall improved over time (Beals and Bell 2020).

Fig. 3.1 Variation of removed material (in percentage) across issues/years of Cronaca Sovversiva

Fig. 3.2 Impact of pre-processing operations on ChroniclItaly 3.0 per title. Figure taken from Viola and Fiscarelli (2021b)

Figure 3.3 shows the variation of removed material
in L’Italia, the largest newspaper in the collection comprising 6489 issues
published uninterruptedly from 1897 to 1919.
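The proportions plotted in Figs. 3.1, 3.2 and 3.3 can be approximated, in spirit, with a few lines of code. The sketch below (simplified; the sample issue text is invented) counts how much of an issue would be discarded when numbers, tokens containing digits, very short tokens and special characters are filtered out:

import re

def clean_tokens(tokens):
    # Keep only purely alphabetic tokens of at least two characters.
    return [t for t in tokens if len(t) >= 2 and re.fullmatch(r"[^\W\d_]+", t)]

def removed_share(issue_text):
    tokens = issue_text.split()
    kept = clean_tokens(tokens)
    return 0.0 if not tokens else 1 - len(kept) / len(tokens)

issue = "11 ottobre 1906 !!! Grande com1zio a New York ***"
print(f"{removed_share(issue):.0%} of this toy issue would be removed")

Applied per issue and per title, the same calculation yields the kind of variation plotted in the figures above.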
Finally, my experience of previously working on the GeoNewsMiner
(Viola et al. 2019) (GNM) project also influenced the decisions we took
when enriching ChroniclItaly 3.0. As said in Sect. 2.4, GNM loads Chron-
iclItaly 2.0, the version of the ChroniclItaly collections annotated with
referential entities without having performed any of the pre-processing
tasks described here in reference to ChroniclItaly 3.0. A post-tagging
manual check revealed that, even though the F1 score of the NER model—that is, the measure used to test a model’s accuracy—was 82.88, due to OCR errors, the locations occurring fewer than eight times were in fact false positives (Viola et al. 2019; Viola and Verheul 2020a). Hence, the interventions we made on ChroniclItaly 3.0 aimed at reducing the OCR
errors to increase the discoverability of elements that were not identified
in the GNM project. When researchers are not involved in the creation
Fig. 3.3 Variation of removed material (in percentage) across issues/years of L'Italia
of the applied algorithms or in choosing the data-sets for training them—
which especially in the humanities represents the majority of cases—and
consequently when tools, models and methods are simply reused as part of
the available resources, the post-authentic framework can provide a critical
methodological approach to address the many challenges involved in the
process of digital knowledge creation.
The illustrated examples demonstrate the complex interactions between
the materiality of the source and the digital object, between the enrichment
operations and the concurrent curator’s context, and even among the
enrichment operations themselves. The post-authentic framework high-
lights the artificiality of any notion conceptualising digital objects as
copies, unproblematised and disconnected from the material object. Indeed,
understanding digital objects as post-authentic means acknowledging the
continuous flow of interactions between the multiple factors at play, only
some of which I have discussed here. Particularly in the context of digital
cultural heritage, it means acknowledging the curators’ awareness that the
past is written in the present and so it functions as a warning against
ignoring the collective memory dimension of what is created, that is the
importance of being digital.
3.3 NER AND GEOLOCATION


In addition to the typical motivations for annotating a collection with
referential entities such as sorting unstructured data and retrieving poten-
tially important information, my decision to annotate ChroniclItaly 3.0
using NER, geocoding and SA was also closely related to the nature of
the collection itself, i.e., the specificity of the source. One of the richest
values of engaging with records of migrants’ narratives is the possibility to
study how questions of cultural identities and nationhood are connected
with different aspects of social cohesion in transnational, multicultural
and multilingual contexts, particularly as a social consequence of migra-
tion. Produced by the migrants themselves and published in their native
language, ethnic newspapers such as those collected in ChroniclItaly 3.0
function in a complex context of displacement, and as such, they offer deep,
subjective insights into the experience and agency of human migration
(Harris 1976; Wilding 2007; Bakewell and Binaisa 2016; Boccagni and
Schrooten 2018).
Ethnic newspapers, for instance, provide extensive material for inves-
tigating the socio-cognitive dimension of migration through markers of
identity. Markers of identity can be cultural, social or biological such as
artefacts, family or clan names, marriage traditions and food practices,
to name but a few (Story and Walker 2016). Through shared claims of
ethnic identity, these markers are essential to communities for maintaining
internal cohesion and negotiating social inclusion (Viola and Verheul
2019a). But in diasporic contexts, markers of identity can also reveal the
changing subtle renegotiations of migrants’ cultural affiliation in mediating
interests of the homeland with the host environment. Especially when
connected with entities such as places, people and organisations, these
markers can be part of collective narratives of pride, nostalgia or loss, and
their analysis may therefore bring insights into how cultural markers of
identity and ethnicity are formed and negotiated and how displaced indi-
viduals make sense of their migratory experience. The ever-larger amount
of available digital sources, however, has created a complexity that cannot
easily be navigated, certainly not through close reading methods alone.
Computational methods such as NER methodologies, though presenting
limitations and challenges, can help identify names of people, places, brands
and organisations thus providing a way to identify markers of identity on a
large scale.
We annotated ChroniclItaly 3.0 by using a NER deep learning sequence
tagging tool (Riedl and Padó 2018) which identified 547,667 entities
occurring 1,296,318 times across the ten titles.3 A close analysis of the
output, however, revealed a number of issues which required a critical
intervention combining expert knowledge and technical ability. In some
cases, for example, entities had been assigned the wrong tag (e.g., ‘New
York’ tagged as a person), other times elements referring to the same
entity had been tagged as different entities (e.g., ‘Woodrow Wilson’,
‘President Woodrow Wilson’), and in some other cases elements identified
as entities were not entities at all (e.g., venerdí ‘Friday’ tagged as an
organisation). To avoid the risk of introducing new errors, we intervened
on the collection manually; we performed this task by first conducting a
thorough historical triangulation of the entities and then by compiling a
list of the most frequent historical entities that had been attributed the
wrong tag. Although it was not possible to ‘repair’ all the tags, this post-tagging intervention affected the redistribution of 25,713 entities
across all the categories and titles, significantly improving the accuracy
of the tags that would serve as the basis for the subsequent enrichment
operations (i.e., geocoding and SA). Figure 3.4 shows how in some cases
the redistribution caused a substantial variation: for example, the number
of entities in the LOC (location) category significantly decreased in La
Rassegna but it increased in L’Italia. The documentation of these processes
of transformation is available Open Access4 and acts as a way to acknowl-
edge them as problematic, as undergoing several layers of manipulation
and interventions, including the multidirectional relationships between the
specificity of the source, the digitised material and all the surrounding
factors at play. Ultimately, the post-authentic framework to digital objects
frames digital knowledge creation as honest and accountable, unfinished
and receptive to alternatives.
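The post-tagging intervention can be thought of as a reviewed look-up table applied back to the automatic output. The sketch below is a simplified illustration of this logic; the correction rules shown are invented for the example and do not reproduce the actual DeXTER list:

# Manually verified corrections: lowercased surface form -> action.
corrections = {
    "new york": {"action": "retag", "tag": "GPE"},                    # was tagged as a person
    "president woodrow wilson": {"action": "merge", "into": "woodrow wilson"},
    "venerdí": {"action": "drop"},                                    # not an entity at all
}

def apply_corrections(tagged_entities):
    # tagged_entities: list of (surface, tag) pairs returned by the NER model.
    fixed = []
    for surface, tag in tagged_entities:
        rule = corrections.get(surface.lower())
        if rule is None:
            fixed.append((surface, tag))
        elif rule["action"] == "retag":
            fixed.append((surface, rule["tag"]))
        elif rule["action"] == "merge":
            fixed.append((rule["into"], tag))
        # items matching a "drop" rule are simply omitted
    return fixed

print(apply_corrections([("New York", "PER"), ("venerdí", "ORG"), ("Woodrow Wilson", "PER")]))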
Once entities in ChroniclItaly 3.0 were identified, annotated and ver-
ified, we decided to geocode places and locations and to subsequently
visualise their distribution on a map. Especially in the case of large
collections with hundreds of thousands of such entities, their visualisation
may greatly facilitate the discovery of deeper layers of meaning that may
otherwise be largely or totally obscured by the abundance of material
available. I will discuss the challenges of visualising digital objects in
Chap. 5 and illustrate how the post-authentic framework can guide both
the development of a UI and the encoding of criticism into graphical display approaches.

Fig. 3.4 Distribution of entities per title after intervention. Positive bars indicate a decreased number of entities after the process, whilst negative bars indicate an increased number. Figure taken from Viola and Fiscarelli (2021b)
Performing geocoding as an enrichment intervention is another exam-
ple of how the process of digital knowledge creation is inextricably entan-
gled with external dynamics and processes, dominant power structures and
past and current systems in an intricate net of complexities. In the case
of ChroniclItaly 3.0, for instance, the process of enriching the collection
with geocoding information shares much of the same challenges as with
any material whose language is not English. Indeed, the relative scarcity of
certain computational resources available for languages other than English
as already discussed often dictates which tasks can be performed, with
which tools and through which platforms. Practitioners and scholars as
well as curators of digital sources often have to choose between either
creating resources ad hoc, e.g., developing new algorithms, fine-tuning
existing ones, training their own models according to their specific needs
or more simply using the resources available to them. Either option may
not be ideal or even at all possible, however. For example, due to time or resource limitations or to a lack of specific expertise, the first approach
may not be economically or technically feasible. On the other hand, even
when models and tools in the language of the collection do exist—like in
the case of ChroniclItaly 3.0—typically their creation would have occurred
within the context of another project and for other purposes, possibly
using training data-sets with very different characteristics from the material
one is enriching. This often means that the curator of the enrichment
process must inevitably make compromises with the methodological ideal.
For example, in the case of ChroniclItaly 3.0, in the interest of time, we
annotated the collection using an already existing Italian NER model. The
manual annotation of parts of the collection to train an ad hoc model
would have certainly yielded much more accurate results but it would have
been a costly, lengthy and labour-intensive operation. On the other hand,
while being able to use an already existing model was certainly helpful
and provided an acceptable F1 score, it also resulted in a poor individual
performance for the detection of the entity LOC (locations) (54.19%)
(Viola and Fiscarelli 2021a). This may have been due to several factors
such as a lack of LOC-category entities in the data-set used for originally
training the NER model or a difference between the types of LOC entities
in the training data-set and the ones in ChroniclItaly 3.0. Regardless of the
reason, due to the low score, we decided to not geocode (and therefore
visualise) the entities tagged as LOC; they can however still be explored,
for example, as part of SA or in the GitHub documentation available Open
Access. Though not optimal, this decision was motivated also by the fact
that geopolitical entities (GPE) are generally more informative than LOC
entities as they typically refer to countries and cities (though sometimes the algorithm also retrieved counties and States), whereas LOC entities
are typically rivers, lakes and geographical areas (e.g., the Pacific Ocean).
However, users should be aware that the entities currently geocoded are
by no means all the places and locations mentioned in the collection;
future work may also focus on performing NER using a more fine-tuned
algorithm so that the LOC-type entities could also be geocoded.
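For readers unfamiliar with geocoding, the sketch below shows roughly what such a step can look like. It is not the DeXTER implementation: it assumes the third-party geopy library and the public Nominatim service, whose usage policy imposes strict rate limits, and the user agent string is a placeholder:

import time
from geopy.geocoders import Nominatim  # third-party library: pip install geopy

geolocator = Nominatim(user_agent="chroniclitaly-example")  # placeholder user agent

def geocode_entities(gpe_entities):
    # Return {entity: (latitude, longitude)} for the entities the service can resolve.
    coordinates = {}
    for name in gpe_entities:
        match = geolocator.geocode(name)
        if match is not None:
            coordinates[name] = (match.latitude, match.longitude)
        time.sleep(1)  # respect the service's rate limit
    return coordinates

print(geocode_entities(["New York", "Palermo", "Buenos Aires"]))

A production workflow would batch requests, cache results and record failed look-ups, all of which are themselves curatorial decisions worth documenting.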

3.4 SENTIMENT ANALYSIS


Annotating textual material for attitudes—either sentiment or opinions—
through a method called sentiment analysis (SA) is another enrichment technique that can add value to digital material. This method aims to
identify the prevailing emotional attitude in a given text, though it often
remains unclear whether the method detects the attitude of the writer or
the expressed polarity in the analysed textual fragment (Puschmann and
Powell 2018). Within DeXTER, we used SA to identify the prevailing
emotional attitude towards referential entities in ChroniclItaly 3.0. Our
intention was twofold: firstly, to obtain a more targeted enrichment experience than would have been possible by applying SA to the entire
collection and, secondly, to study referential entities as markers of identity
so as to access the layers of meaning migrants attached historically to
people, organisations and geographical spaces. Through the analysis of
the meaning humans invested in such entities, our goal was to delve into
how their collective emotional narratives may have changed over time
(Tally 2011; Donaldson et al. 2017; Taylor et al. 2018; Viola and Verheul
2020a). Because of the specific nature of ChroniclItaly 3.0, this exploration
inevitably intersects with understanding how questions of cultural identities
and nationhood were connected with different aspects of social cohesion
(e.g., transnationalism, multiculturalism, multilingualism), how processes
of social inclusion unfolded in the context of the Italian American diaspora,
how Italian migrants managed competing feelings of belonging and how
these may have changed over time.
SA is undoubtedly a powerful tool that can facilitate the retrieval of
valuable information when exploring large quantities of textual material.
Understanding SA within the post-authentic framework, however, means
recognising that specific assumptions about what constitutes valuable
information, what is understood by sentiment and how it is understood and
assessed guided the design of the technique. All these assumptions are invis-
ible to the user; the post-authentic framework warns the analyst to be wary
of the indiscriminate use of the technique. Indeed, like other techniques
used to augment digital objects including digital heritage material, SA
did not originate within the humanities; SA is a computational linguistics
method developed within natural language processing (NLP) studies as
a subfield of information retrieval (IR). In the context of visualisation
methods, Johanna Drucker has long discussed the dangers of a blind
and unproblematised application of approaches brought in the humanities
from other disciplines, including computer science. Particularly about the
specific assumptions at the foundation of these techniques, she points out,
‘These assumptions are cloaked in a rhetoric taken wholesale from the
techniques of the empirical sciences that conceals their epistemological
biases under a guise of familiarity’ (Drucker 2011, 1). In Chap. 4, I will
discuss the implications of a very closely related issue, the metaphorical
use of everyday lexicon such as ‘sentiment analysis’, ‘topic modelling’ and
‘machine learning’ as a way to create familiar images whilst in fact referring to rather different concepts from what is generally internalised in the
collective image. In the case of SA, for example, the use of the familiar word
‘sentiment’ conceals the fact that this technique was specifically designed
to infer general opinions from product reviews and that, accordingly, it
was not conceived for empirical social research but first and foremost as an
economic instrument.
The application of SA in domains different from its original conception
poses several challenges which are well known to computational linguists—
the technique’s creators—but perhaps less known to others; whilst opinions
about products and services are not typically problematic as this is precisely
the task for which SA was developed, due to their much higher linguistic
and cultural complexity, opinions about social and political issues are much
harder to tackle. This is due to the fact that SA algorithms lack sufficient
background knowledge of the local social and political contexts, not to
mention the challenges of detecting and interpreting sarcasm, puns, plays
on words and ironies (Liu 2020). Thus, although most SA techniques will
score opinions about products and services fairly accurately, they will likely
perform poorly when based on opinionated social and political texts. This
limitation therefore makes the use of SA problematic when other disciplines
such as the humanities and the social sciences borrow it uncritically; worse yet, it raises disturbing questions when the technique is embedded in a range
of algorithmic decision-making systems based, for instance, on content
mined from social media. For example, since its explosion in the early
2000s, SA has been heavily used in domains of society that transcend the
method’s original conception: it is constantly applied to make stock market
predictions and in the health sector and by government agencies to analyse
citizens’ attitudes or concerns (Liu 2020).
In this already overcrowded landscape of interdependent factors, there is
another element that adds yet more complexity to the matter. As with other
computational techniques, the discourse around SA depicts the method as
detached from any subjectivity, as a technique that provides a neutral and
observable description of reality. In their analysis of the cultural perception
of SA in research and the news media, Puschmann and Powell (2018)
highlight for example how the public perception of SA is misaligned with its
original function and how such misalignment ‘may create epistemological
expectations that the method cannot fulfill due to its technical properties
and narrow (and well-defined) original application to product reviews’ (2).
Indeed, we are told that SA is a quantitative method that provides us with
a picture of opinionated trends in large amounts of material otherwise
impossible to map. In reality, the reduction of something as idiosyncratic as the definition of human emotions to two or three categories is highly
problematic as it hides the whole set of assumptions behind the very
establishment of such categories. For example, it remains unclear what is
meant by neutral, positive or negative as these labels are typically presented
as a given, as if these were unambiguous categories universally accepted
(Puschmann and Powell 2018). On the contrary, to put it in Drucker’s
words ‘the basic categories of supposedly quantitative information […] are
already interpreted expressions’ (Drucker 2011, 4).
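To see what this reduction hides in practice, consider the toy mapping below; the emotion labels, scores and bucketing rule are invented for illustration and do not correspond to any specific SA tool:

# A fine-grained prediction collapsed into the usual three buckets.
fine_grained = {"anger": 0.55, "sadness": 0.30, "joy": 0.10, "surprise": 0.05}

def collapse(prediction):
    negative = prediction.get("anger", 0) + prediction.get("sadness", 0)
    positive = prediction.get("joy", 0) + prediction.get("surprise", 0)
    if abs(positive - negative) < 0.1:
        return "neutral/mixed"
    return "positive" if positive > negative else "negative"

print(collapse(fine_grained))  # 'negative': the distinction between anger and sadness is lost

Whatever rule is chosen, the final label erases distinctions that the finer prediction still carried.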
Through the lens of the post-authentic framework, the application of
SA is acknowledged as problematic and so is the intrinsic nature of the
technique itself. An SA task is usually modelled as a classification problem,
that is, a classifier processes pre-defined elements in a text (e.g., sentences),
and it returns a category (e.g., positive, negative or neutral). Although
there are so-called fine-grained classifiers which attempt to provide a more
nuanced distinction of the identified sentiment (e.g., very positive, positive,
neutral, negative, very negative) and some others even return a prediction
of the specific corresponding sentiment (e.g., anger, happiness, sadness),
in the post-authentic framework, it is recognised that it is the fundamental
notion of sentiment as discrete, stable, fixed and objective that is highly
problematic. In Chap. 4, I will return to this concept of discrete modelling
of information with specific reference to ambiguous material, such as
cultural heritage texts; for now, I will discuss the issues concerning the
discretisation of linguistic categories, a well-known linguistic problem.
In his classic book Foundations of Cognitive Grammar, Ronald Lan-
gacker (1983) famously pointed out how it is simply not possible to
unequivocally define linguistic categories; this is because language does not
exist in a vacuum and all human exchanges are always context-bound, view-
pointed and processual (see Langacker 1983; Talmy 2000; Croft and Cruse
2004; Dancygier and Sweetser 2012; Gärdenfors 2014; Paradis 2015). In
fields such as corpus linguistics, for example, which heavily rely on manually
annotated language material, disagreement between human annotators on the same annotation decisions is in fact expected and taken into account when
drawing linguistic conclusions. This factor is known as ‘inter-annotator
agreement’ and it is rendered as a measure that calculates the agreement
between the annotators’ decisions about a label. The inter-annotator
agreement measure is typically a percentage and depends on many factors
(e.g., number of annotators, number of categories, type of text); it can
therefore vary greatly, but generally speaking, it is never expected to be
100%. Indeed, in the case of linguistic elements whose annotation is highly
subjective because it is inseparable from the annotators’ culture, personal
experiences, values and beliefs—such as the perception of sentiment—this
percentage has been found to remain at 60–65% at best (Bobicev and
Sokolova 2018).
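As a toy illustration, raw agreement between two hypothetical annotators of sentiment labels can be computed as below; chance-corrected measures such as Cohen's kappa are normally preferred, but the principle, and the fact that the figure falls well short of 100%, is the same:

annotator_a = ["positive", "negative", "neutral", "negative", "positive", "neutral"]
annotator_b = ["positive", "neutral", "neutral", "negative", "negative", "neutral"]

agreements = sum(a == b for a, b in zip(annotator_a, annotator_b))
raw_agreement = agreements / len(annotator_a)

print(f"Raw inter-annotator agreement: {raw_agreement:.0%}")  # 67% in this toy example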
The post-authentic framework to digital knowledge creation introduces a counter-narrative into the main discourse that oversimplifies automated
algorithmic methods such as SA as objective and unproblematic and
encourages a more honest conversation across fields and in society. It
acknowledges and openly addresses the interrelations between the chosen
technique and its deep entrenchment in the system that generated it. In the
case of SA, it advocates more honesty and transparency when describing
how the sentiment categories have been identified, how the classification
has been conducted, what the scores actually mean, how the results have
been aggregated and so on. At the very least, an acknowledgement of such
complexities should be present when using these techniques. For example,
rather than describing the results as finite, unquestionable, objective and
certain, a post-authentic use of SA incorporates full disclosure of the
complexities and ambiguities of the processes involved. This would con-
tribute to ensuring accountability when these analytical systems are used in
domains outside of their original conception, when they are implemented as the basis for centralised decisions that affect citizens and society at large or when
they are used to interpret the past or write the future past.
The decision of how to define the scope (see for instance Miner 2012)
prior to applying SA is a good example of how the post-authentic frame-
work can inform the implementation of these techniques for knowledge
creation in the digital. The definition of the scope includes defining
problematic concepts of what constitutes a text, a paragraph or a sentence,
and how each one of these definitions impacts on the returned output,
which in turn impacts on the digitally mediated presentation of knowledge.
In other words, in addition to the already noted caveats of applying SA
particularly for social empirical research, the post-authentic framework
recognises the full range of complexities derived from preparing the
material, a process—as I have discussed in Sect. 3.2—made up of countless
decisions and judgement calls. The post-authentic framework acknowl-
edges these decisions as always situated, deeply entrenched in internal and
external dynamics of interpretation and management which are themselves
constructed and biased. For example, when preparing ChroniclItaly 3.0
for SA, we decided that the scope was ‘a sentence’ which we defined as the
portion of text: (1) delimited by punctuation (i.e., full stop, semicolon,
colon, exclamation mark, question mark) and (2) containing only the most
frequent entities. If, on the one hand, this approach considerably reduced
processing time and costs, on the other hand, it may have caused less frequently mentioned entities to be underrepresented. To at least partially overcome this limitation, we used the logarithmic function 2*log25 to obtain a more
homogeneous distribution of entities across the different titles, as shown
in Fig. 3.5.
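A rough sketch of this scope definition is given below. The sentence splitter uses the delimiters listed above; the cap of 2*log2 of an entity's frequency is only one possible reading of the logarithmic selection, offered here for illustration, and the exact procedure is documented in the DeXTER repository:

import math
import re

def sentences_in_scope(text, selected_entities):
    # Split on the chosen delimiters and keep sentences mentioning a selected entity.
    sentences = re.split(r"[.;:!?]", text)
    return [s.strip() for s in sentences
            if s.strip() and any(e.lower() in s.lower() for e in selected_entities)]

def sentence_cap(entity_frequency):
    # One possible reading of the logarithmic selection: cap at 2 * log2(frequency).
    return max(1, round(2 * math.log2(entity_frequency)))

text = "Woodrow Wilson parla agli italiani. Il tempo è bello! New York accoglie i nuovi arrivati."
print(sentences_in_scope(text, ["Woodrow Wilson", "New York"]))
print(sentence_cap(1024))  # an entity mentioned 1,024 times would contribute at most 20 sentences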
As for the implementation of SA itself, due to the lack of suitable
SA models for Italian when DeXTER was carried out, we used the
Google Cloud Natural Language Sentiment Analysis 6 API (Application
Programming Interface) within the Google Cloud Platform Console,7 a
console of technologies which also includes NLP applications in a wide
range of languages. The SA API returned two values: sentiment score and
sentiment magnitude. According to the available documentation provided
by Google,8 the sentiment score—which ranges from −1 to 1—indicates
the overall emotion polarity of the processed text (e.g., positive, negative,
neutral), whereas the sentiment magnitude indicates how much emotional
content is present within the document; the latter value is often propor-
tional to the length of the analysed text. The sentiment magnitude ranges
from 0 to 1, whereby 0 indicates what Google defines as ‘low-emotion
content’ and 1 indicates ‘high-emotion content’, regardless of whether the
emotion is identified as positive or negative. The magnitude value is meant
to help differentiate between low-emotion and mixed-emotion cases, as
they would both be scored as neutral by the algorithm. As such, it alleviates
the issue of reducing something as vague and subjective as the perception
of emotions to three rigid and unproblematised categories. However, the
Fig. 3.5 Logarithmic distribution of selected entities for SA across titles. Figure taken from Viola and Fiscarelli (2021b)
post-authentic framework recognises that any conclusion based on results
derived from SA should acknowledge a degree of inconsistency between
the way the categories of positive, negative and neutral emotion have
been defined in the training model and the writer’s intention in the actual
material to which the model is applied. Specifically, the Google Cloud
Natural Language Sentiment Analysis algorithm differentiates between
positive and negative emotion in a document, but it does not specify what is
meant by positive or negative. For example, if in the model sentiments such
as ‘angry’ and ‘sad’ are both categorised as negative emotions regardless of
their context, the algorithm will identify either text as negative, not ‘sad’
or ‘angry’, thus adding further ambiguity to the already problematic and
non-transparent way in which ‘sad’ and ‘angry’ were originally defined
and categorised. To marginally deal with this issue, we established a
threshold within the sentiment range for defining ‘clearly positive’ (i.e.,
>0.3) and ‘clearly negative’ cases (i.e., <−0.3). The downside of this
approach was however that the algorithm considered all the cases between
these two values as neutral/mixed-emotion cases which inevitably led to
a flattening of nuances. In Chap. 5, I will return to the ambiguities of SA
when discussing the design choices for developing the DeXTER app, the
interactive visualisation tool to explore ChroniclItaly 3.0, and I will present
suggestions towards visualising the complexities and uncertainties in data-
models and visualisation techniques.
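In code, the calls and the thresholding described above might look roughly like the sketch below. It assumes the official google-cloud-language Python client and valid credentials; the example sentence is invented, and only the ±0.3 cut-offs are taken from the actual project:

from google.cloud import language_v1  # official client; requires valid credentials

client = language_v1.LanguageServiceClient()

def classify_sentence(sentence):
    document = language_v1.Document(
        content=sentence,
        type_=language_v1.Document.Type.PLAIN_TEXT,
        language="it",
    )
    sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
    # Thresholds adopted in DeXTER: only clearly positive/negative scores are labelled as such.
    if sentiment.score > 0.3:
        label = "clearly positive"
    elif sentiment.score < -0.3:
        label = "clearly negative"
    else:
        label = "neutral/mixed"
    return label, sentiment.score, sentiment.magnitude

print(classify_sentence("Gli emigrati italiani furono accolti con grande entusiasmo."))

As noted above, the returned score says nothing about which positive or negative emotion was detected, only where the text falls on the polarity scale.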
The application of the post-authentic framework to SA highlights that
the technique is far from being methodologically ideal and it calls attention
to all the uncertainties of using it in fields other than IR and for tasks other than product review, such as the use case discussed here. The post-authentic
framework acts therefore as a warning against these shortcomings and
creates a space for accountability for the adopted curatorial decisions.
Within DeXTER and ChroniclItaly 3.0, we thoroughly documented such
decisions which can be accessed through the openly available dedicated
GitHub repository9 which also includes the code, links to the original and
processed material, and the files documenting the manual interventions.
Ultimately, the post-authentic framework counterbalances the main public
discourse—separate from computational research—which promotes SA as
an exact way to measure emotions and opinions, it recognises when its use
is disconnected from its original purpose, and it accordingly advocates the
reworking of the user’s epistemological expectations.
In this respect, the implementation of the post-authentic framework
for knowledge creation in the digital relates to one of the central pillars
of science, that of replicability (or reproducibility/repeatability).10 The
principle postulates that following a study’s detailed descriptions, claims
and conclusions obtained by scientists can be verified by others. This is
done in the name of transparency, traceability and accountability, which are
also fundamental aspects of post-authentic work. The difference however
lies in the purpose of these fundamental notions; whereas in science they are
primarily aimed at allowing independent confirmation of a study’s results,
within the post-authentic framework, they are not solely concerned with
this specific scientific goal and in fact they move beyond it. For example,
traditionally, a study is believed to be replicable if sufficient transparency has been provided about the data, the research purposes, the method, the conclusions, etc., and yet some studies can be perfectly transparent and not
at all replicable (Peels 2019; Viola 2020b). This is, for instance, believed to
be the case especially in the humanities for which the very nature of some
studies can make replication impossible, for example, due to a particularly
interpretative analysis (Peels 2019).
On the opposite end of the scale, empirical works are believed to be—at
least in theory—fully replicable. Thus, despite the still unresolved debate
on the ‘R-words’, over the years, protocols and standards for replication in
science have been perfectioned and systematised. When computers started
to be used for experiments and data analysis, things turned complicated.
Plesser (2018), for instance, explains how it became apparent that the
canonical margins for experimental error did not somehow apply to digital
research:

Since digital computers are exact machines, practitioners apparently assumed
that results obtained by computer could be trusted, provided that the
principal algorithms and methods employed were suitable to the problem
at hand. Little attention was paid to the correctness of implementation,
potential for error, or variation introduced by system soft- and hardware,
and to how difficult it could be to actually reconstruct after some years—or
even weeks—how precisely one had performed a computational experiment.
(Plesser 2018, 1)

The post-authentic framework is comfortable with the belief that attain-
ability of complete objectivity (and therefore perfect replicability) is always
but an illusion. Indeed, the post-authentic relevance of transparency, trace-
ability and consequently, accountability lies primarily in the acknowledge-
ment of a collective responsibility, the one that comes with the building of
a source of knowledge for current and future generations. Thus, within the
post-authentic framework, being transparent about both the ‘raw’ and the
processed material, about the methodology, the analytical processes and the
tools assumes a whole new importance: the creation of other digital forms
which allow to trace technical obsolescence, acknowledge power relations
and attempt to fluidly incorporate the exchanges that lead to symbiosis,
not friction, across interactions. As argued by Fiona Cameron with regard
to digital cultural heritage (2021, 12):

[digital cultural heritage] encapsulate[s] other registers of significance, tem-
porality and agency such as planetary technological infrastructures, material
agency, non-human, elemental, and earthly processes, all of which are
invisible figures in their constitution.

The post-authentic framework for digital knowledge creation recognises
that whatever arises out of the confluence of all these different agencies
cannot be fully predicted. Documentation by researchers, museums, archives, libraries, software developers and so on therefore acts as
a means to acknowledge that we are writing the future past and that writing
the past means controlling the future. The post-authentic framework
provides an architecture to meet the need for accountability to current
and future generations.
Finally, the documentation of the interventions has wider resonance
particularly in relation to increasing awareness towards sustainability in
digital knowledge creation. In June 2020, the UN published the Roadmap
for Digital Cooperation report, which set out a list of key actions to be
achieved by 2030 in order to advance a more equitable digital world.
Whilst acknowledging that ‘Meaningful participation in today’s digital
age requires a high-speed broadband connection to the Internet’ (United
Nations 2020b, 5), the report also highlights that half of the world’s
population (3.7 billion people) currently does not have access to the
Internet. The lack of digital access, also commonly referred to as the ‘Digital
Divide’, affects those mostly located in least developed countries (LDCs),
landlocked developing countries (LLDCs) and small island developing
states (SIDS), with an even more acute gap in regions such as sub-Saharan
Africa, where only 11% have access to household computers and 82% lack
Internet access altogether.
The digital inequality worsens the already existing inequalities in society
as those who are the most vulnerable are disproportionately affected by
the divide. Based as they are on a universal vision of digital transformation,
current digital knowledge creation practices therefore face not only the
danger of being available exclusively to half of humanity but also that of
yet again imposing Western-centred perspectives on how knowledge is
created and accessed. The future looks ever more digital and digitally
available repositories will become larger and larger; reconceptualising
digital objects within the post-authentic framework also means fostering
their reconceptualisation not just in terms of what we are digitising but
also how and for whom. In this sense, the creation, curation, analysis and
visualisation of digital objects should whenever possible prefer methods
and practices that make curatorial workflows sustainable, interoperable and
reusable. This should include the storage of the material in an Open Access
repository, the use of freely available and fully documented software and
a thorough documentation of the implemented steps and interventions,
including an explanation of the choices made which will in turn facilitate
research accessibility, transparency and dissemination.

In the next chapter, I will illustrate the third use case of the book, the
application of the post-authentic framework to digital analysis. Through
the example of topic modelling, I will show how the post-authentic
framework can guide a deep understanding of the assemblage of culture
and technology in software and help us achieve the interpretative potential
of computation. I will specifically discuss the implications for knowledge
creation of the transformation of continuous material into discrete form—
binary sequences of 0s and 1s—with particular reference to the notions
of causality and correlations. Within this broader discussion, I will then
illustrate the example of topic modelling as a computational technique
that treats a collection of texts as discrete data, and I will focus on the
critical aspects of topic modelling that are highly dependent on the sources:
pre-processing, corpus preparation and deciding on the number of topics.
The topic modelling example ultimately shows how producing digital
knowledge requires sustained engagement with software, in the form of
fluid, symbiotic exchanges between processes and sources.

NOTES
1. See, for instance, the Europeana Semantic Enrichment Framework at
https://pro.europeana.eu/share-your-data/enrichment.
2. usa in Italian means ‘he/she/it uses’.
3. For the documentation of the training of the Italian model used to annotate
ChroniclItaly 3.0 including information on the output format and F1 score,
please see Viola et al. (2019); Viola and Fiscarelli (2021b).
4. https://github.com/lorellav/DeXTER-DeepTextMiner.
5. This logarithmic function is the inverse function of the exponential func-
tion.
6. https://cloud.google.com/natural-language/docs/analyzing-sentiment.
7. https://console.cloud.google.com/.
8. https://cloud.google.com/natural-language/docs/basics#interpreting_
sentiment_analysis_values.
9. See note 5.
10. For an overview of the so-called R debate, please see Rougier et al. (2016).
CHAPTER 4

How Discrete

If you torture the data long enough, it will confess to anything. (Attributed
to Ronald H. Coase, 1960)

4.1 METAPHORS WITH DESTINY


Metaphors are fascinating and powerful linguistic devices. Over the years,
numerous scholars have indeed extensively explored their manipulative
talent for creating realities (see for instance, Lakoff 1992, 2004, 2008;
Goatly 2007; Mio and Katz 2016). In the context of political discourse
alone, for example, the study of metaphors’ capacity to hide or popularise
latent ideologies, justify or blame governments’ decisions, or strategically
attribute blame goes back decades (e.g., Musolff 2004, 2010, 2014; Goatly
2007; Ottatti et al. 2014; Viola 2020a). Though extremely powerful—
‘Metaphors can kill’ (Lakoff 1992, 1)—metaphors are neither good nor
bad per se; we simply routinely use them, often rather unreflectively, so
that abstract and complex ideas can be processed in a cognitively simplified
way (ibid.). What makes metaphors so effective, particularly conceptual
metaphors, is their use of conceptual frames such as war, disease, sport,
family, religion and others which, by evoking mental images that are famil-
iar to the message receivers, can turn complex concepts into a simple, linear
logic (Viola 2020a). It is thanks to this ‘framing power’ that metaphors’
arguments become plausible and the proposed conclusions are perceived as
unproblematic and even ‘self-evident’ (Musolff 2016, 133). Moreover, as
we mostly use metaphors implicitly, such framing power remains typically
unnoticed and so do metaphors. So, for example, in the context of the
COVID-19 pandemic, when commenting on the effectiveness of Italy’s
decision to institute a national lockdown, the French Prime Minister at the
time, Édouard Philippe, said, ‘To block the country does not allow us to
contain the epidemic’1 (Valeurs actuelles 2020). At the time when the comment
was made, France was adopting much less drastic measures compared to
Italy; therefore, the differences in the two countries’ crisis management
approaches needed to be justified, and in order to be accepted by the
nation, the domestic strategy had to be presented to the public as the
best possible solution (Viola 2022). In this particular example, the framing
power is conveyed by the expression to block the country: the metaphorical
use of the verb to block frames the Italian lockdown measure not only as
overly aggressive but also as wrongly targeted: it is the country that is put to a
halt, not the spread of the virus.
But metaphors are not typically found just in political discourse; scien-
tific discourse also regularly exploits the power of metaphors to simplify
complex concepts. In 2003, Blei et al. published a study which, at the
moment of writing, counts 36,483 citations (2003). The paper tackled
the task of modelling a collection of discrete data, for example, a corpus
of texts, for efficient processing tasks such as classification and content
summarisation. The authors’ basic idea was to model each item in the
collection, e.g., each text, according to the Latent Dirichlet Allocation
(LDA) model, a generative probabilistic model in which documents are
represented as distributions over sets of words statistically likely to occur
together. Although the article itself was titled ‘Latent Dirichlet Allocation’,
the technique described in the article went down in history as topic
modelling. The reason for that may be found in the fact that the authors
had decided to name the above-mentioned sets of words ‘topics’, although
their intention was not to make epistemological claims regarding the latent
variables but to simply ‘exploit text-oriented intuitions’ (996), that is, to
take advantage of a familiar image such as that of topics. In other words,
the term topic was used metaphorically.
A similar observation about the metaphorical use of everyday notions to
refer to techniques which are however based on specific, rather different,
principles may also apply to computational techniques such as ‘sentiment
analysis’ and ‘machine learning’. The metaphorical use of the terms ‘sen-
timent’, ‘learning’ and ‘topic’ may be harmless within the fields that have
devised such techniques because the principles upon which they are based
are very clearly defined by their creators and understood in those circles.
It may on the contrary have huge consequences when these methods are
passively transferred into other disciplines or practices. In his analysis of
informational approaches in cancer biology research, Longo (2018), for
example, critiques the extensive use of computer science terminology such
as ‘instructions’ and ‘to reprogram a deprogrammed DNA’ and, in general, the
description of DNA as a computer program and of genes as information carriers.
He argues (88):

The informational approach in biology conflates the concept of program-
ming on discrete data with the common-sense understanding of ‘informa-
tion’ and ‘computer program’, which are vaguely familiar to everybody [...]
In fact, the use of ‘information’ and ‘programming’ in biology is not scientific
because it neither applies the mathematical invariants proper to information
and programming, nor the theorems proper to the corresponding scientific
disciplines. Instead, it transfers a vague, everyday notion and refers to ‘weak’
meanings.

Longo argues that the metaphorical use of mathematical and compu-
tational language has had enormous consequences for molecular biology
cancer research which essentially studies cancer as the result of DNA de-
programming, inherited or otherwise caused by a carcinogen that disrupts
the DNA ‘encoded instructions’ (92). The use of an everyday notion
such as that of ‘program’, he continues, has also no doubt facilitated
understanding among funding agencies and the public, perhaps even
leading to the exclusion of alternative hypotheses. Similarly, one might
argue that it is the metaphorical use of the word topic that explains why
topic modelling has become so popular beyond computer science and in
the humanities in particular: whereas not everyone may be an expert in
statistical modelling, we are all more or less familiar with a fairly general
conceptualisation of what a topic is. However, what humanities scholars
may not have been too familiar with—and to a large extent, still aren’t—is
the set of assumptions behind a method born in the computer sciences and
adopted in critical research.
The popularity of topic modelling beyond computer science (as well
as SA and ML) is closely related to another phenomenon, well-known in
linguistics: when a metaphor is adopted by a significant part of the linguistic
community, language users may no longer be aware of its metaphorical
use, the metaphor becomes a common meaning and so it dies (Ricœur
2003, 115). The metaphorical use of ‘sentiment’, ‘learning’ and ‘topic’,
I will argue here, has certainly contributed to making these techniques
very popular outside of their field of origin. At the same time, however,
precisely because of this popularity, these meanings have become common
meanings, i.e., ‘dead metaphors’. This in turn has major consequences: the
creation of epistemological expectations that these methods will inevitably
disappoint (Puschmann and Powell 2018). For example, as I have discussed
in Chap. 3 in reference to SA, the familiar word ‘sentiment’ creates a
specific epistemological expectation, namely that it is somehow possible to obtain
a neutral way to assess attitudes and moods in large quantities of material.
Assessment, however, requires language understanding as a prerequisite
and when it comes to machines, this is exactly what they are not able to
do. The post-authentic framework that I advance in this book serves also
as a reminder that these terms are used as mere metaphors.
In the next section, I will discuss a more concerning aspect concealed
by the use of vague, familiar notions such as ‘sentiment’, ‘learning’ and
‘topic’: the underlying process upon which these techniques are based, i.e.,
the elaboration of continuous information into discrete systems and the
implications for causality. In discrete systems, causality is hidden because
information is rendered as exact and separate points, all encoded in one
dimension and according to precise instructions (Longo 2018). The three-
dimensional, causal essence of information cannot be accessed by the
user who, instead, is offered an altered image made up of predictions
of correlations. The resulting information will still refer to its original
continuous structure, but computers will only render it as a sequence of 0s
and 1s, that is in discrete form, thus hiding relational causality.
In the case of SA, this distorted image is reflected in the reduction
of the subjectivity of human emotions to two/three categories, scored
according to probabilistic calculations; in the case of ML, the holistic,
human capacity to acquire knowledge and skills through experience, logic
and contextual factors is reduced to the probabilistic processing of huge,
yet partial, quantities of discrete data; in the case of topic modelling, the
text itself disappears and so does its entrenchment in the wider context
that produced it. In all these cases, the three-dimensional, causal structure
is no longer accessible nor is its historical and social susceptibility as it
is all dissembled by the computational, dualistic system of 0s and 1s.
This conflation of discrete data modelling with familiar notions such as
‘sentiment’, ‘learning’ and ‘topic’ has therefore certainly contributed to
making these methods extremely popular outside their fields of origin, but
at the same time, it has obfuscated the well-defined laws upon which they
are based. Longo claims:

This is an amazing technological achievement: by fine engineering, one may
forget the underlying physical hardware and its continuous flows and just
consider (and work on) the discrete software processes by writing alpha-
numeric programs. (Longo 2018, 87)

In a world where all information is digital, the consequence of this
amazing technological achievement is that it also presents a distorted image
of knowledge because, to paraphrase David Tong, the world does not seem
to be discrete (Tong 2011).
In this chapter, I first examine the implications of adopting discrete
methods and technologies not just as quantitative tools in the humanities
but for knowledge production in general and, more widely, for our
understanding of society. Specifically, I reflect on the notions of causality
and correlations in light of the considerations discussed so far about
the mythicised discourse on data and technology neutrality, the dangers
of using metaphorical language to refer to digital technologies and the
consequent urgent need for knowledge reconfiguration inspired by
symbiosis and mutualism. I then proceed to examine the text mining
technique of topic modelling and the premises on which it is based with
a special focus on its use of discrete mathematics to encode information.
Finally, I illustrate how applying the post-authentic framework to topic
modelling can facilitate critical engagement with this technique, especially
in humanities research.
In my discussion, I argue that such engagement can only happen by
maintaining a sustained connection with the digital object and I demon-
strate how the application of key post-authentic concepts and methods
can be especially effective at three decisive stages in a topic modelling
workflow: pre-processing, corpus preparation and choosing the number
of topics. The post-authentic framework, as the analysis will show, may
be especially effective at prompting the active and reflexive participation
of the user in the process of knowledge production in the digital. In the
next section, I start my argument by discussing the implications of the
‘big data philosophy’, that is, the obsession with patterns and correlations
as opposed to causation, to explain phenomena; I also examine such
implications in relation to topic modelling and its use for knowledge
creation, in humanistic enquiry and beyond.

4.2 CAUSALITY, CORRELATIONS, PATTERNS


Perhaps one of the most significant implications of the ‘Digital Turn’ in the
humanities, more widely in the natural, computational and social sciences,
and more widely still in relation to the digitisation of society is contained
in the notion of discrete vs continuous modelling of information. The
concepts of discrete and continuous and the tension between the two
are at the foundation of mathematical thought and of how mathematical
modelling is used to explain natural phenomena (Fenstad 1985). A way to
understand the crucial difference between discrete and continuous struc-
tures is to consider that in a discrete structure, all points are isolated and
completely disconnected from each other; one can therefore label them and
count them and their count is exact and absolute. On the contrary, one can
only access a continuous structure by measuring it and these measurements
create intervals or fractions of intervals; moreover, in the continuous, a scale
for the measurement has to be set (Longo 2018, 84). Therefore, in discrete
systems, there is no room for approximation, no uncertainty, no nuances, as
something is either one point or another, whereas in the continuous—since
phenomena can only be accessed by measuring them—the measurements
are always approximated (Longo 2019, 64–65).
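To make the contrast more tangible, the following minimal Python sketch—my own illustration, not drawn from Longo—discretises two continuous measurements: once a scale has been chosen, nearby values collapse into the same exact, countable bin and the nuance between them disappears.

```python
# A toy illustration (not from the sources discussed here) of how discretising
# a continuous measurement erases nuance, using NumPy.
import numpy as np

# Two 'continuous' measurements that differ by less than the chosen scale
measurement_a = 21.349
measurement_b = 21.351

# Choosing a scale for the measurement is itself a decision
scale = 0.1
bins = np.arange(20.0, 23.0, scale)

# Discretising: both values fall into the same exact, countable bin
bin_a = np.digitize(measurement_a, bins)
bin_b = np.digitize(measurement_b, bins)

print(bin_a == bin_b)  # True: the difference between the two values is lost
print(len(bins))       # the discrete bins, unlike the measurements, can be counted exactly
```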
Even without going too deep into the full mathematical (and physical!)
ramifications of these two notions, one can intuitively understand that
they refer to very different ways of mathematical thinking. A fundamental
difference particularly relevant to the arguments advanced in this book
is concerned with the understanding of causality, a notion whose theo-
retical conceptualisation from philosophy to physics can be traced back
to antiquity.2 For the sake of the argument advanced in this chapter, I
will summarise the discussion by saying that in the classical worldview
which prevailed until the twentieth century, a mechanistic notion strongly
identified causation with determinism. Determinism can be understood
as the ability to determine the future state of a physical system from its
present state (Weinert 2005, 196). According to this view, also known
as functional view of causation, every event has a unique cause that
precedes it (de Laplace 1820; Stigler 1986; Čapek and Čapek 1961),
and therefore the world is seen as an ‘uninterrupted chain of causes and
effects’ (Holbach 1770). This view has been criticised over the course
of the twentieth century for several shortcomings such as the proximity
of elements in determining cause-effect relationships, predictability as the
main criterion for establishing causation and the reduction of causality
essentially to a mere temporal relationship. Discoveries of and advances in
differential equations, atomic physics and quantum mechanics have further
consolidated such criticisms eventually leading to the current separation
of causality from determinism. Particularly in quantum mechanics, recent
experiments have provided strong evidence for the validity of this notion
of causality without determinism. In this view, consequent states of a
quantum system are related to its antecedent states by a form of conditional
dependency (Weinert 2005, 241) as opposed to every event having a
unique cause that precedes it.
Coming back to the distinction between discrete and continuous struc-
tures, this means that in discrete systems, there is no deterministic cause-
effect relationship, because points are totally separated from each other,
whereas in continuous systems, causal relations can be observed and
measured, but not predicted3 (Longo 2018, 86). Though it may appear
inconsequential at first, this observation about causality has specific and
profound implications that stretch well beyond mathematical and phys-
ical reasoning. Stating that in discrete structures—such as, say, a database
where something belongs to either one category or another—no cause-
effect relationship between observed phenomena can be established but only a
probabilistic one essentially means that explanations for such phenomena
cannot be found, only correlations. If two random variables are correlated,
or as noted by Calude and Longo (2017), co-related, it means that they
are associated according to a statistical measure, that they co-occur. This
statistical measure is rendered by a correlation coefficient, a number
between −1 and 1 that expresses the strength of the linear relationship
between two numeric variables. If two variables are positively correlated
(e.g., they both increase), then the correlation coefficient will be closer to
1, if there is a negative correlation (i.e., they are inversely correlated), it will
be closer to −1, and closer to 0 if there is no correlation at all. It is a well-
established fact in statistics and beyond that a correlation coefficient per se
is not enough to explain the cause for the patterns that are captured.4
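As a concrete, invented illustration of this point, the short Python sketch below computes a correlation coefficient for two variables that are strongly co-related without one causing the other; the figures and the example are mine and purely demonstrative.

```python
# A minimal sketch (invented figures) of computing a correlation coefficient.
import numpy as np

# Two hypothetical yearly series: ice-cream sales and drowning incidents
ice_cream_sales = np.array([120, 135, 150, 160, 180, 200])
drownings = np.array([4, 5, 6, 6, 7, 8])

# Pearson correlation coefficient: a number between -1 and 1
r = np.corrcoef(ice_cream_sales, drownings)[0, 1]
print(round(r, 2))  # close to 1: a strong positive co-relation

# The coefficient only measures how strongly the two series co-occur; it says
# nothing about why they do (here, plausibly a third factor such as summer heat).
```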
The identification of statistical correlations is nevertheless an important
factor in understanding the relationship between two quantitative variables
and it remains an insightful method that can potentially lead to significant
discoveries. Indeed, the observation of correlations is at the foundation
of the classic scientific method in the sense that starting from the mea-
surement of correlated phenomena, scientists have been able to formulate
theories that could be tested and later confirmed or disproved. The history
of science is full of extraordinary achievements which originated from
mere observations of not-so-obviously correlated phenomena, for example,
distributional semantics theory, a famous linguistic theory that stemmed
from the intuition of Zellig S. Harris and John R. Firth, two semanticists
(though Harris was also a statistical mathematician). This intuition—
famously captured by Firth’s quote ‘You shall know a word by the company
it keeps’ (1957, 11)—acknowledges the relevance of words’ collocation
(i.e., the place of occurrence of words) in determining their meaning. The
core idea behind Harris and Firth’s work on collocational meaning and
distributional semantics is that meanings do not exist in isolation; rather,
words that are used and occur in the same contexts tend to purport similar
meanings (Harris 1954, p. 156).
In those days, gaining access to real language data was costly and
very time-consuming and for a long time, it was not possible to test
this theory. But more recently, new advances in computer science merged
with huge quantities of naturally occurring language material, including
digitised historical data-sets, have indeed proven that languages are not
deterministic systems—as previously believed—but that they should be
thought to be ‘probabilistic, analogical, preferential systems’ (Hanks 2013,
310). As intuitively theorised in distributional semantics, words do not
have a one-to-one relationship with meaning because meanings are not
precise, exact or stable. To the contrary, words in isolation do not possess
any meaning and meanings can only be inferred from words’ context. As
argued by Harris, ‘We cannot say that each morpheme or word has a single
or central meaning, or even that it has a continuous or coherent range
of meanings’ (Harris 1954, 151). Sixty years after its initial formulation,
distributional semantics theory laid the basis for Google’s renowned
word2vec algorithm, and today, it constitutes the theoretical background
of NLP studies concerned with language and meaning, including the very
topic modelling (cfr. Sect. 4.4).
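For readers who wish to see the principle at work, the sketch below trains a toy word2vec model with the Gensim library (version 4 or later is assumed to be installed); the corpus is invented and far too small to be meaningful, but it shows how similarity between words is derived purely from the contexts they share.

```python
# An illustrative sketch of distributional semantics in practice, using
# Gensim's word2vec implementation (gensim >= 4) on an invented toy corpus.
from gensim.models import Word2Vec

# Tokenised 'documents': 'migrant' and 'immigrant' occur in identical contexts
toy_corpus = [
    ["the", "migrant", "arrived", "in", "new", "york"],
    ["the", "immigrant", "arrived", "in", "new", "york"],
    ["the", "migrant", "found", "work", "in", "the", "city"],
    ["the", "immigrant", "found", "work", "in", "the", "city"],
]

model = Word2Vec(sentences=toy_corpus, vector_size=50, window=3,
                 min_count=1, epochs=200, seed=1)

# On real data, words used in the same contexts receive similar vectors
print(model.wv.similarity("migrant", "immigrant"))
```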
Coming back full circle to causality, correlations and patterns, a correla-
tion measure only informs us of the strength of a relationship between two
variables, whereas the patterns tell us that certain regularities can be found
in how the observed variables are distributed. Hence, because they highlight
trends in the data, correlations and patterns may potentially have predictive
power, but neither of them provides causal explanations for the analysed
phenomena nor do they intrinsically carry significance. In the next section,
I will elaborate on these reflections to discuss the important implications
for society of operating predominantly within the discrete system of the
contemporary encoding of all digital information, binary sequences of 0s
and 1s. Taking the example of analysis of material that had originally been
conceived of as a coherent entity, i.e., continuous (e.g., a book, a collection
of essays on the same topic, all the issues of a newspaper), I explore the
implications of its digital encoding into discrete form through digitisation
and subsequent digital analysis. One critical implication, I argue, is that
the adoption of an indiscriminate, data-driven approach to analysis risks to
completely disregard context and to attribute meaning to correlations and
patterns per se. Through the example of topic modelling and its application
to the analysis of ChroniclItaly 3.0, further in the chapter, I show how the
application of concepts and methods of the post-authentic framework to
digital knowledge creation can be useful to prompt a critical stance towards
computational methods and tools which I argue is urgently needed for the
configuration of a model for knowledge production in the digital.

4.3 MANY PATTERNS, FEW MEANINGS


Big data analytics (cfr. Sect. 1.2) is supported by the idea that correlations
are expected to be recurrent, i.e., they will iterate similarly along the chosen
parameter, for example, time (Calude and Longo 2017, 602). Recurrent
correlations are an established scientific principle and they can be observed
in natural cycles such as the water cycle and the alternation of seasons.
The recurrence of correlations fits well with deterministic systems, in which
it is believed that one can determine the future state of a physical system
from its present state (cfr. Sect. 4.2). This is precisely what the ‘big data
philosophy’ states: because patterns are expected to be recurrent, the future
can be predicted by statistical algorithms based on the patterns found in
past data, without the need for causal explanation. Naturally, the larger the
data-set, the more accurate the prediction.
This idea that all that counts are the patterns is not in fact new and it can
be traced back to the 1990s and to Complexity Theory (Waldrop 1992).
Complexity Theory argues that there is a hidden order to the behaviour
and evolution of complex systems and chaos can be made manageable
by looking at its underlying, ubiquitous patterns. What these patterns
show is how complex systems work, more specifically how organisations
cope with uncertainty and nonlinearity and manage to remain stable.
The idea behind Complexity Theory is that complex systems are the
assemblage of extremely convoluted factors which make them funda-
mentally unpredictable. Yet, at the same time, complex systems exhibit
order rules according to which independent actors, i.e., discrete elements,
spontaneously self-organise. This contradictory property makes it possible
for patterned behaviour and properties to be observed. It also means,
however, that the meaning of any system is irrelevant as the focus is and
remains on the observed behavioural patterns.
One does not have to dig too deep to see how computer science has
strongly supported Complexity Theory. Indeed, Complexity Theory fits
perfectly with what machines excel at: finding patterns in the data (Turkle
2014). Ever more powerful computers can be given enormous quantities of
data and instructed to find the patterns that human beings will never be
able to find. And it works. Patterns are always found. However, despite
appearing (at first, at least) logically sound and despite being validated by
the cycles present in nature, the discourse surrounding big data analytics
obscures at least four fundamental truths. Firstly, as said earlier in the
chapter, in discrete systems such as a database, no cause-effect relationship
of observed phenomena can be established but only correlations and
patterns. Computers are not programmed to find meanings, only the
patterns; as correlations and patterns do not intrinsically carry significance,
this essentially means that databases provide an a-causal image of the
world (Longo 2018, 86). Thus, what the big data hype obscures is that
today’s computer-dominated world offers us countless patterns but no
explanations for them, and so we are left to deal with a patterned, yet a-
causal, way of making sense of reality.
Secondly, the idea that information is uniquely absorbed from data
is also closely related to Complexity Theory. The theory argues that
complex systems are constantly altered by agents’ interactions through a
process of feedback loops; thanks to their intrinsic capacity to learn from
experience, complex adaptive systems are organic and better evolving.
The big data approach has essentially adopted this theory in toto, but it
seems to have failed to recognise that machines are in fact incapable of
learning. Indeed, the deterministic belief that the future state of a physical
system can be predicted from the observation of its past state, which in
any case has been criticised over the course of the twentieth century and
mostly disproved as discussed in Sect. 4.2, has become conflated into the
metaphorical use of the word ‘learning’ in ML. The familiar notion of
‘learning’ confounds what learning actually means for a machine—finding
correlations and patterns but no causal explanations—with the human
capacity to understand and make sense of the world, i.e., attempting to
find causality.
Thirdly, the big data analytics’ deterministic claim that based on avail-
able data, one can provide accurate predictions of the future without
the need for causal explanation is provably wrong. Calude and Longo
(2017) demonstrated that in a large enough data-set, there will always
be correlations but most of them will be random, i.e., meaningless. This
means that the probability that a series of correlations will be recurrent
as in the natural cycles is extremely low; the authors explain: ‘recurrence
may occur, but only for immense values of the intended parameters and,
thus, an immense database’ (ibid., 609). In other words, the patterns
found in databases do not per se constitute sufficient proof to offer reliable
predictions of the future because most of these patterns will actually
be false positives. In techniques such as topic modelling, an element of
randomness is in fact built into the algorithm itself as initially, documents
are assigned to topics through random probability. Although it is true
that the calculations become increasingly accurate as the algorithm iterates
through more documents, the risk once again is to see meaning where
there is none.
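This point can be simulated in a few lines of Python. The sketch below—my own toy experiment, not taken from Calude and Longo—generates purely random variables with no causal structure whatsoever and still finds a large number of apparently strong correlations.

```python
# A toy simulation of spurious correlations in random data.
import numpy as np

rng = np.random.default_rng(seed=0)
n_variables, n_observations = 1000, 20

# 1000 random 'time series' with no causal structure at all
data = rng.normal(size=(n_variables, n_observations))

# Correlate every variable with every other and count the strong pairs
corr = np.corrcoef(data)
np.fill_diagonal(corr, 0)                 # ignore self-correlation
strong_pairs = np.sum(np.abs(corr) > 0.7) // 2

# Many 'strong' yet entirely meaningless correlations are found
print(strong_pairs)
```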
Fourthly, the fact that databases are exact, i.e., discrete, perpetuates
the false belief that data is also exact, neutral and objective. It is always
emphasised by the ‘big data philosophy’ that statistical algorithms will find
patterns where nobody else can, and because databases are exact, this is
enough. What is on the contrary not at all emphasised is the subjective and
interpretative dimension of collecting, selecting, categorising, aggregating,
in other words of making data. Recognising that data is created makes
the claims of absolute impartiality, exactness and reliability shaky at best
and ethically concerning at worst, particularly when necessarily incomplete,
biased and opaquely collected data is used to make predictions that
influence decision-making processes or produce research findings.
Reassuringly, these limitations have recently started to be at the centre
of the academic debate and have given rise to the so-called causal inference
challenge. In their work The Book of Why (2018), computer scientist Judea
Pearl and mathematician Dana Mackenzie argue that these limitations
make the big data philosophy inadequate to solve our world’s challenges.
They note that as current ML solutions cannot find the causal relations
between patterns, they inevitably fail to generalise beyond the domain
of examples present in a given data-set, which most of the time will
include synthetic data (as opposed to real-world generated data). In other
words, most current ML methods tend to ‘overfit the data’, meaning that
‘they try to learn the past perfectly, instead of uncovering the real/causal
relationships that will continue to hold over time’ (Gonfalonieri 2020).
New avenues in this direction are increasingly being explored and have
resulted in new emerging fields such as causal machine learning (see for
instance, Pearl et al. 2016; Shanmugam 2018; Hernán and Robins 2021).
However, although the interest in this topic has grown exponentially in
the span of only a few years, methods and applications are still at an
experimental stage and, to my knowledge, primarily limited to academic
research.

4.4 THE PROBLEM WITH TOPIC MODELLING


The topic modelling algorithm essentially formalises distributional seman-
tics theory (cfr. Sect. 4.2). However, whereas the focus of distributional
semantics theory is on the meaning of a single word, topic modelling
tries to capture the overall meaning of clusters of words that appear
together (i.e., that are correlated) in a document. Put differently, as single
words do not possess any meaning but meanings can only be inferred from
their context, topic modelling assumes that groups of words also purport
collective meanings, i.e., topics. This all sounds very logical but there is a
caveat. Similar to quantum, computational and genetic systems, languages
are discrete representations (i.e., outputs) of fundamentally continuous
structures (i.e., inputs). This property—called the discrete infinity of
language—essentially means unlimited productivity from limited means
(Chomsky and Smith 2000). It describes the ability of languages to create
an infinite variety of expressions of thought from a limited set of discrete
elements (Studdert-Kennedy and Goldstein 2003). The discrete infinity
of language necessarily entails that languages are intrinsically ambiguous
because meaning is context-bound, but significantly, it indicates that dif-
ferent contexts shape the creation of infinite meanings. The problem with
topic modelling is that it provides a probabilistic representation of words’
distributions in the ingested documents, but it is completely agnostic of the
underlying continuous structure of such documents, such as the ambiguity
of words’ use in each document and across texts as well as the documents’
coherent substructure, let alone their wider historical, social and cultural
entrenchment.
As said earlier in the chapter, topic modelling provides a probabilistic
representation of how words are distributed in documents according to
statistical calculations, that is, correlations. This means that words are
considered to be discrete elements; for example, in the corpus preparation
stage (cfr. Sect. 4.5.2), words are transformed into numeric variables and
their distribution across documents is represented as a distribution matrix.
What topic modelling then does is to measure the strength of the linear
relationship between these numeric variables. But topic modelling also
treats the corpus itself as a collection of discrete data, which means that
each text is also processed as a separate entity totally disconnected from all
the other texts in the batch. This is true regardless of whether the input
is all the chapters from the same book, all the issues of a newspaper or
all the abstracts ever submitted to an academic journal under the keyword
tag ‘topic modelling’. In other words, it is a computational technique that
efficiently identifies patterns of words’ distribution, but because it lacks
the words’ underlying continuous structure—the infinity of language—no
cause-effect relationship of the correlated phenomena can be established,
i.e., the meaning of such patterns.
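The sketch below, which uses scikit-learn's CountVectorizer as one of several possible tools, shows what this discrete representation looks like on three invented sentences: each text becomes a row of word counts, disconnected from the other texts and stripped of word order and context.

```python
# An illustrative sketch (invented sentences) of the discrete document-term
# matrix that underlies topic modelling, built here with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "the ship reached the harbour at dawn",
    "the harbour was crowded with migrants",
    "dawn broke over the crowded ship",
]

vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(documents)

# Each document is now a separate row of word counts; word order, context
# and any link between the documents have disappeared.
print(vectorizer.get_feature_names_out())
print(matrix.toarray())
```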
Another issue with topic modelling is that it assumes that an a priori fixed
number of topics—which in any case is decided more or less arbitrarily—
is represented in different proportions in all the documents. Hence, if the
algorithm is instructed to find X number of topics, it will build a model that
fits that number. This assumption behind the technique cannot but paint
a rather artificial and non-exhaustive picture of the documents’ content
as it is hard to imagine how in reality, a fixed number of topics could
adequately represent the actual content of all the analysed documents.
Thus, correlations will surely be identified but not all these correlations
will necessarily carry significance, that is, meaning. Moreover, as countless
parameters can be tweaked, the smallest change will output a different
model, in which different correlations will be found and many others will
be missing. Conversely, even when the same parameters from the same
software are used on the same data-set, the algorithm will output a slightly
different model, which indeed proves once again that patterns will always
be identified, regardless of their significance. I will return to this point in
Sect. 4.5.3.
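This sensitivity can be illustrated with the following sketch, which assumes the Gensim library and an invented toy corpus: the same data, parameters and software are modelled twice with two different random initialisations, and the composition and ordering of the resulting topics may well differ between the two runs.

```python
# A sketch (toy corpus, gensim assumed installed) of LDA's sensitivity to its
# random initialisation: same corpus, same parameters, different seeds.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

texts = [
    ["harbour", "ship", "dawn", "arrival"],
    ["work", "factory", "wages", "strike"],
    ["ship", "migrants", "harbour", "voyage"],
    ["strike", "workers", "factory", "union"],
]

dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

for seed in (1, 42):
    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=2, random_state=seed, passes=10)
    print(seed, lda.print_topics(num_words=4))
```

Fixing a seed makes a single run reproducible, but it does not remove the arbitrariness of the initialisation itself.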

4.5 ANALYSIS OF DIGITAL OBJECTS: A POST-AUTHENTIC APPROACH TO TOPIC MODELLING
The post-authentic framework for digital knowledge creation contributes
to the urgent need for the establishment of critical data and visualisation
literacy in the current landscape—both public and academic—in which
computational techniques and outputs are predominantly framed as and
often believed to be exact, final, objective and true. Whilst exploiting
the new opportunities offered by computational technologies, the post-
authentic framework rejects an uncritical adoption of digital methods, and
it promotes a model not simplistically oriented towards problem-solving,
solution automation and sleek interface designs but towards encourag-
ing critical engagement and active participation. This ultimately means
recognising that knowledge is fluid and that the complex challenges we
face today therefore require a model of knowledge production that fosters
symbiotic collaborations, fluid exchanges and mutualistic contributions, as
opposed to hierarchical separation and competition.
As an example of how the application of the post-authentic framework
can contribute towards fluid processes of knowledge creation in the digital,
including the need for a less naïve conceptualisation of computational
techniques, digital objects and methods, I discuss here the third use case
of the book: analysis of digital objects. The example of topic modelling
demonstrates how critical engagement with computational techniques is
urgently required to meet the uncertain and problematic aspects of digital
research. For example, in fields such as DH in which this technique is used
extensively, a recent survey on LDA topic modelling (Du 2019) found
that 74% of the surveyed studies didn’t report how their corpora were
prepared, more than 70% didn’t report which tool was used to train their
topic models, almost 57% didn’t report how many topics were trained, and
about 90.5% didn’t report how their topic models were evaluated.
DH is not at all an isolated case, however. Though with some dif-
ferences, a similar trend has also been found in software engineering
research (Silva et al. 2021) where topic modelling is widely used to
analyse online conversations among developers or to improve software
engineering tasks such as source code comprehension. From the analysis of
111 relevant papers, Silva et al. (2021) found both general inconsistency
and the adoption of opaque methods in topic modelling practices, on the
whole pointing to a degree of uncertainty about the specificity of the
technique itself. The highest inconsistency was found with reference to
tasks such as choosing the number of topics, naming the topics and
evaluating the topics’ semantic interpretability. The authors attributed the
lack of specificity of the technique to the fact that the majority of the
surveyed papers had employed LDA ‘as is’, that is, they had adopted the
default parameters as if it were an off-the-shelf tool. This approach, however,
is generally not encouraged; computer scientists openly acknowledge that
finding the meaning behind the identified patterns is highly dependent on
the specifics of the sources because, as argued by Hindle et al. (2015, 510),
‘LDA does not look for the same patterns that people do’.
In this part of the chapter, I illustrate how the post-authentic framework
can be applied to topic modelling to guide a more mindful understanding
of the materiality of the sources. To this end, I deliberately choose cultural
heritage material, sources that are inevitably problematic from a computa-
tional point of view. I then focus on the key aspects of topic modelling that
are highly dependent on the sources and which in my experience have the
most significant impact on the results: pre-processing, corpus preparation
and deciding the number of topics. As a case example, I use the already
discussed Italian American newspapers as collected in ChroniclItaly 3.0
(cfr. Chaps. 2 and 3); my aim is to emphasise how preparing the material
for the analysis is part of the analysis itself. My discussion demonstrates
how, far from being fully automated, neutral and objective, the analysis
of a digital object requires the analyst to make countless decisions which
are yet different from the ones required when preparing the material for
enrichment, even when the same sources are used. Indeed, engagement
with the technique starts much earlier than the algorithm’s implementation
stage, which in any case should also not be performed as a fully automatic
operation. The application of the post-authentic framework allows me to
evidence how LDA may well be an unsupervised technique, but this simply
means that it works with unstructured data,5 not at all—despite what may be
generally believed—that it requires no human intervention.

4.5.1 Pre-processing
In Chap. 3, I illustrated how pre-processing operations are far from being
standard and how it is in fact required that each intervention is carefully
assessed by scholars and practitioners and evaluated on a case-by-case basis.
In my discussion, I considered the many influential factors at play (e.g., the
materiality of the source, the specific task to be performed, the available
resources, both economic and technical) and illustrated how they in turn
are embedded in a complex, wide net of co-dependent actors, elements and
circumstances which have influenced each other and will in turn influence
current and future interventions. The same considerations apply to the
analysis of a digital object; this, I maintain, requires a high level of critical
engagement with the chosen method well before the algorithm’s
implementation stage. In the case of topic modelling, for example, which
takes as its input unstructured data, e.g., plain text, the first thing one needs
to decide is the scope (cfr. Sect. 3.4), that is, what to consider as documents
(i.e., the input) (see for instance, Miner 2012). Topic modelling aims to
represent documents as probabilistic distributions of words; hence, in a
book, the documents could be the book’s pages, whereas on a newspaper’s
page, they could be individual articles and so on. Conceptually, it of course
intuitively makes a difference to search for the topics in a chapter vs the
topics in each page of that chapter. But this is an important decision to
make also from a pragmatic point of view: as topic modelling is essentially
a statistical method, the length of each modelled item, i.e., the document,
does matter. And yet, although this is a rather determining factor, studies
using this method rarely specify how the criteria to decide the scope of the
documents are assessed and, even when mentioned, they are referred to
vaguely. In Silva et al.’s survey of topic modelling in software engineering
research (2021), for example, the authors found that 86% did not mention
such criteria at all nor did they acknowledge documents’ length as being
an important factor; they also found that even when the relevance of the
vocabulary size was acknowledged (14%), about half (7.4%) did not
specify the selection criteria or the document’s length.
In the case of ChroniclItaly 3.0, I considered that each file in the
collection corresponds to the first page of each issue published by the
newspapers on a certain date. This structure mirrors the way the collection
was digitised by the Library of Congress, evidencing once more the
inseparable complexity of relations between digital material and its wider
entrenchment in the surrounding digital infrastructure that created it
and/or provides it. Therefore, I defined as documents each file/issue as
it was in the collection; the decision had the dual advantage of modelling
the documents according to the events narrated on a day/issue basis while
following the Library of Congress metadata schema.
In terms of preparatory operations such as removing stopwords, lower-
casing, removing punctuation, numbers, special characters (cfr. Chap. 3),
for the specific task of topic modelling, additional, specifically linguistic deci-
sions must also be evaluated; here I discuss stemming and lemmatisation.
Although both aim to obtain a word root by reducing the inflection in
words, these operations are built on very different assumptions. Stemming
deletes the initial or final characters in a token based on a list of common
prefixes and suffixes that may typically occur in the inflected words of a lan-
guage (e.g., states → state). It is therefore language-dependent as it relies
on limited cases which would apply exclusively to certain languages that
follow specific inflection rules. Thus, for languages that follow fairly
regular inflection rules such as English, stemming may work reasonably
well, but applied to highly inflectional languages such as Italian, due to its
many exceptions and irregularities, the algorithm would almost certainly
perform poorly. Another strong limitation of stemming is that in many
cases—including low-inflectional languages—the output would not be an
actual word, meaning that the operation is likely to introduce new errors.
On the other hand, as it is not a particularly advanced technique, stemming
does not require a long processing time or processing power, and therefore
this solution may be implemented when working with particularly large
corpora or when constrained by time limitations.
Lemmatising is on the contrary a much more sophisticated technique as
it is based on more solid linguistic principles than stemming. By means of
detailed dictionaries that contain lemmas and by examining words’ context,
a lemmatising algorithm analyses the morphology of each word and it then
transforms it into its grammatical root (e.g., better → good). Especially
in the case of topic modelling in which the output is essentially a list of
words without any context, lemmatising can be very helpful to distinguish
between homonyms, words that have the same spelling, sometimes the
same pronunciation too but which in fact possess different meanings. For
example, the word mento in Italian can mean either ‘chin’ or ‘I lie’. A
lemmatising algorithm would theoretically be able to infer the use of
mento from its context and distinguish it from its homonym; in this case,
the different outputs would be mento (i.e., chin) for the former and mentire
(i.e., to lie) for the latter. Because of its complexity, however, lemmatising
may require a long time and very high processing power to perform, and so
in the case of large size collections or depending on the available means and
resources, it may not be ideal. Additionally, if on the one side lemmatising is
effective at differentiating between homonyms, on the other the reduction
of all inflected words to their lemma may cause information loss. For
instance, it would no longer be possible to recognise the tense (present,
past, future) or the grammatical person (I, they, you, etc.) of the verbs,
the gender or number of the nouns, the degree of the adjectives (e.g.,
superlative, comparative), etc.
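The contrast between the two operations can be illustrated with the sketch below, which assumes that NLTK and spaCy, together with spaCy's it_core_news_sm Italian model, are installed; the sentence is invented and the outputs are purely indicative, as they depend on the stemmer and model versions used.

```python
# An indicative comparison of stemming and lemmatising Italian text,
# assuming nltk and spaCy with the 'it_core_news_sm' model are installed.
from nltk.stem.snowball import SnowballStemmer
import spacy

stemmer = SnowballStemmer("italian")
nlp = spacy.load("it_core_news_sm")

# 'The emigrants arrived tired but full of hope'
sentence = "Gli emigranti arrivarono stanchi ma pieni di speranza"

# Stemming: crude truncation based on suffix lists, often producing non-words
print([stemmer.stem(token) for token in sentence.lower().split()])

# Lemmatising: dictionary- and context-based reduction to grammatical roots,
# at the cost of losing tense, person, number and degree information
print([token.lemma_ for token in nlp(sentence)])
```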
To assess whether this type of information is relevant or not depends
once again on several factors such as the type of data-set (e.g., size,
content), the context of the digital analysis, the language of the data-set
and the specific research question(s); researchers should therefore carefully
evaluate pros and cons of implementing this operation. For example, in
researching narratives of migration as they were told by Italian American
migrants, the cons of implementing either stemming or lemmatising would
in my opinion exceed the pros. Italian is a highly inflectional language
and a great deal of linguistic information is encoded in suffixes and
prefixes; stemming is therefore ill suited to it. Similarly, lemmatising the corpus
would also cause the loss of information encoded in inflected words (e.g.,
verbs expressed in the first person, collective concepts expressed by plural
nouns) which could bring valuable insights into the cognitive, subjective
dimension of the stories told by the migrants.
Finally, whether or not to perform either of these operations is very
much dependent on the language of the data-set, not just because dif-
ferent languages have different inflection rules, but crucially also because
not all languages are equally resourced digitally. Indeed, as discussed in
Sect. 2.2, the digital consequence of the fact that most mass digitisation
projects have been carried out in the United States and later in Europe
is that computational resources available for languages other than English
continue to remain on the whole scarce. Such Anglophone-centricity is
often still a barrier to researchers, teachers and curators whose sources
are in languages other than English. Indeed, the comparative lack of
computational resources in other languages often dictates which tasks can
be performed, with which tools and through which platforms (Viola and
Fiscarelli 2021b). Moreover, even when adaptations for other languages
may be possible, identifying which changes should be implemented, and
perhaps more importantly, understanding the impacts these may have, is
often unclear (Mahony 2018). This includes lemmatising algorithms and
dictionaries which do not yet exist for all languages; therefore, for particularly
under-resourced languages, stemming may be the only, far from ideal,
option.

4.5.2 Corpus Preparation


There are several libraries, for example, in Python or R, as well as off-
the-shelf tools (e.g., MALLET) that implement LDA for topic modelling.
Some allow for more sophisticated parameters than others, but generally
speaking, they all follow the same principles that I have already discussed:
a topic modelling algorithm models a number of documents to find
correlations, essentially by combining term frequency and word collocation
operations. In order to model topics from unstructured text, the material
first needs to be converted into a structured model that allows the
algorithm to perform such calculations, for example, through a method
called bag of words (BoW). What BoW does is to first transform the words
in the documents into numbers, i.e., into ids; this operation is typically
called ‘dictionary’. It then builds a matrix based on the frequency of the
words in the documents.
The generation of a BoW provides a notable example of the decisive
influence of the analyst on algorithmic processes and therefore ultimately,
on the output. Specifically, in order to prepare the dictionary, i.e., the
unique id assignment, the analyst has several so-called optimising oper-
ations at their disposal. For example, one might decide to filter out
‘extremes’, terms in the collection that are particularly frequent or infre-
quent; this operation may be performed in order to obtain what is believed
to be a more representative core vocabulary. There are several ways to
perform this task; for instance, the Python library Gensim (Řehůřek and
Sojka 2010) has a built-in function called filter_extremes which
filters out tokens in the dictionary based on their frequency of occurrence.
The parameters are defined by the user who can decide—though one
might argue somewhat arbitrarily—to keep tokens which are contained
in a defined number of documents (i.e., no more than in X number of
documents and no less than in X number of documents) or to keep only
the first X number of most frequent tokens.
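A minimal sketch of this step with Gensim might look as follows; the corpus is invented and the filtering thresholds are deliberately arbitrary examples of precisely the kind of decision described above.

```python
# A minimal sketch of the dictionary and bag-of-words step with Gensim;
# the corpus is invented and the thresholds are arbitrary choices.
from gensim.corpora import Dictionary

tokenised_docs = [
    ["lavoro", "fabbrica", "sciopero", "operai"],
    ["nave", "porto", "arrivo", "emigranti"],
    ["sciopero", "operai", "salario", "lavoro"],
    ["nave", "emigranti", "viaggio", "porto"],
]

dictionary = Dictionary(tokenised_docs)

# Keep tokens appearing in at least 2 documents and in no more than 60% of
# them, retaining at most the 10,000 most frequent of the survivors
dictionary.filter_extremes(no_below=2, no_above=0.6, keep_n=10000)

# Each document becomes a list of (token id, frequency) pairs
bow_corpus = [dictionary.doc2bow(doc) for doc in tokenised_docs]
print(bow_corpus)
```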
Another very common technique originated in the field of IR and
believed to contribute towards obtaining better topic modelling results
is the term frequency—inverse document frequency method (TF-IDF).
The method also scores the ‘importance’ of a word, also known as weight,
according to its relative frequency, i.e., the frequency of occurrence of
that word with respect to the number of documents in the collection in
which it appears. In this way, the weight of words that are ‘expected’
to appear more frequently—generally speaking non-salient words such as
prepositions, articles and so on but this is also specific to the material—
is resized accordingly. These preparatory operations are believed to help
optimise a corpus for IR tasks (not just topic modelling) and in most
cases, they may succeed. The assumption is, however, that a word is as
important as its relative frequency, which may be true most times, but
not always. Indeed, the possibility of capturing words that are very rare
or that appear in very few documents may be just as valuable in that they
may indicate a sudden shift in the used vocabulary, which may in turn
signal a linguistic change or perhaps even a conceptual one. Furthermore,
and perhaps even more significantly, these techniques only consider the
formal frequency of a word, meaning that they do not cater for how that
word is used. In the words of David Blei (Blei 2012, 82)—one of the
creators of topic modelling: ‘One assumption that LDA makes is the “bag
of words” assumption, that the order of the words in the document does
not matter’. This approach, defined as ‘unrealistic’ by Blei himself, may
work well for grammatical articles, prepositions or particularly recurrent
OCR errors, but as no semantic detection is formally conducted, the
frequency of a word, misleadingly referred to as the weight, becomes the
unique, determining factor in assessing whether a word is worth keeping
or not. What is important to remember is that what is worth keeping
for an algorithm may not reflect at all the writer’s original intention.
Languages may be probabilistic systems, but since words do not have a
one-to-one relationship with meaning, they are fundamentally ambiguous,
preferential systems. For this reason, researchers and practitioners should
assess carefully whether using relative frequency methods is the best option
when preparing the corpus to train the topic models. For example, research
has shown that statistically more accurate models do not necessarily lead
to a higher interpretability of the results (Jacobi et al. 2015).
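The sketch below shows what TF-IDF re-weighting looks like with Gensim on a small invented corpus: the ubiquitous article la loses all weight simply because it occurs everywhere, regardless of how it is actually used in each sentence.

```python
# A sketch of TF-IDF re-weighting with Gensim on a small invented corpus.
from gensim.corpora import Dictionary
from gensim.models import TfidfModel

tokenised_docs = [
    ["la", "nave", "arriva", "al", "porto"],
    ["la", "nave", "parte", "dal", "porto"],
    ["la", "festa", "del", "santo", "patrono"],
]

dictionary = Dictionary(tokenised_docs)
bow_corpus = [dictionary.doc2bow(doc) for doc in tokenised_docs]

tfidf = TfidfModel(bow_corpus)  # learns inverse document frequencies
for doc in tfidf[bow_corpus]:
    # 'la' occurs in every document, so its weight drops to zero and it
    # disappears from the weighted vectors; rarer words gain weight
    print([(dictionary[token_id], round(weight, 2)) for token_id, weight in doc])
```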
As an attempt to retain the meaning of words, a method that aims to
compensate for this shortcoming is preparing the corpus as a dictionary of
n-grams, typically bi-grams or tri-grams. These are pairs or triples of words
that are statistically more likely to occur together than would be expected
if they occurred independently of each other. Several studies (see for instance, Wallach
2006; Wang et al. 2007; Kherwa and Bansal 2020) have indeed reported
that using bi-grams to prepare the corpus may increase topics’ interpretabil-
ity as well as the effectiveness of statistical measures such as perplexity and
coherence (cfr. Sect. 4.5.3), developed to help researchers and practitioners
optimise topic modelling results. Unfortunately, preparing the corpus as a
dictionary of n-grams is a lengthy and intense process which may indeed be
costly and time-consuming, especially in the case of very large repositories.
Furthermore, researchers working on historical material which typically
contains a high number of OCR errors should consider the actual added
value of using this technique. Studies on topic modelling which suggest
novel IR techniques or improved corpus preparation methods such as
those discussed here and which report an increase in the models’ quality
typically make use of digitally born data such as online film reviews, blogs,
news websites’ headlines or contemporary conference proceedings. Being
digitally born, these data-sets are of very high quality, especially compared
to digitised historical material. Indeed, the amount of OCR errors in
historical collections inevitably skews the output, as each word containing
an error will be interpreted by the algorithm as a new word, even if it differs only
by one character. Although pre-processing steps are taken to improve the
quality of the collection, many errors may remain. In most cases, these
errors would not prevent a human from reading and understanding, but
they will interfere with how a machine processes the text. As LDA is a
probabilistic method, regardless of the specific variations in the chosen pre-
processing and corpus preparation techniques, the results will be heavily
reliant on the data quality.
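For readers who wish to see what the n-gram preparation mentioned above looks like in practice, the sketch below uses Gensim’s Phrases model to join frequent word pairs; the sentences, min_count and threshold values are invented for illustration.

```python
# A hedged sketch of bi-gram corpus preparation with Gensim's Phrases model;
# sentences and scoring parameters are toy values chosen for illustration.
from gensim.models.phrases import Phrases

token_lists = [
    ["new", "york", "italian", "immigrants"],
    ["new", "york", "harbor"],
    ["italian", "immigrants", "arrived", "in", "new", "york"],
]

# Pairs of tokens that co-occur more often than the scoring threshold predicts
# by chance may be joined into single tokens such as "new_york".
bigram_model = Phrases(token_lists, min_count=2, threshold=0.5)
bigram_corpus = [bigram_model[tokens] for tokens in token_lists]
print(bigram_corpus)
```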
Finally, it is worth remembering that, due to the intrinsically unstable and non-
deterministic nature of topic modelling, assessing how and to what extent
any of these corpus preparation techniques actually improves the quality of
the models remains difficult. Users should indeed be aware that findings
obtained with topic modelling can never be fully replicated or generalised
even if the same data-sets are used, the same steps are implemented and
the same LDA settings are chosen from the same library/tool (Silva et al.
2021, 120). The post-authentic framework acknowledges such limitations
and it is mindful of drawing conclusions which are based solely on topic
modelling findings.

4.5.3 Number of Topics


The weaknesses and limitations as well as the dangers of overly trusting the
capacity of topic modelling to find meaningful patterns have been openly
acknowledged by several authors, including its very creators. Already in
2009, Chang et al. (2009), for example, compared the task of interpreting
the topics, i.e., finding the semantic meaning of the discovered patterns, to
the Chinese ritual of reading tea leaves. The authors wanted to warn users
of the high risk of attributing meaning to patterns and trends that in reality
may be ‘spurious’ in the mathematical sense, i.e., meaningless (Calude and
Longo 2017) (cfr. Sect. 4.3). Naturally, the risk is even higher when the
technique is adopted uncritically, especially in fields outside of computer
science. The authors clarified that although typically it is implicitly assumed
that the identified latent spaces will be semantically salient, in reality, this is
not at all what the promise of topic modelling is about. Since then, others
(see for instance Bail 2018) have also openly acknowledged the limitations
of the technique and repeatedly attempted to reframe topic modelling as
‘a tool for reading’ rather than a tool for meaning, that is, an exploratory
tool which, in order to obtain more nuanced and reliable findings, should
be integrated with other methods. In this respect, for instance, sociologist
Chris Bail (ibid.) notes:

Despite this rather humble assessment of the promise of topic models, many
people continue to employ them as if they do in fact reveal the true meaning
of texts, which I fear may create a surge in “false positive” findings in studies
that employ topic models.

The application of the post-authentic framework to topic modelling
helps reframe the technique as a statistical tool and resizes the user’s
expectations accordingly. Topic modelling posits a set of multinomial
distributions over words—misleadingly called topics—as being present in
each document in various proportions; it provides fairly accurate models of
documents based on their words’ distribution as grouped into clusters. This
is valuable for obtaining a corpus representation through its words’ distri-
bution and/or for predicting a model of unseen text but the commonly
shared belief that these identified word clusters will also be semantically
meaningful, i.e., that they will be topics in the human sense, remains only
anecdotal (Chang et al. 2009).
The high risk of finding patterns that are in reality meaningless can be
exemplified by the challenge of finding the so-called ‘optimal’ number of
topics. This task requires the user’s input to instruct the algorithm about how
many words’ distributions it has to search for in the corpus, which of course
cannot be known in advance. Depending on individual cases, sometimes
researchers and practitioners may know the collection extensively enough
to feel confident about what this number might be; others prefer building
multiple models with different numbers of topics to subsequently compare
the various compositions of the topics (Viola and Verheul 2019b). If on
the one hand this approach allows the researcher to closely examine the
varied topics’ structures before deciding on the most coherent model,
on the other it has the limitation of potentially leading analysts to prefer a
model that seems to confirm their a priori ideas, thus resulting in biased
interpretations. This approach may work fairly well in those cases when
the analyst has extensive knowledge of the material, the field and the
period of reference of the collection among others, but it is generally
not recommended in statistics; in the words of statistician Stephen M.
Stigler: ‘Beware of the problem of testing too many hypotheses; the more
you torture the data, the more likely they are to confess, but confessions
obtained under duress may not be admissible in the court of scientific
opinion’ (Stigler 1987).
More often, however, very little is known about the actual content of
the documents as true content is exactly what the technique is wrongly
believed to be able to find, which provides the original justifying argument
for using the method. It goes like this: due to the increasingly large size of
available digital material, it is not possible for researchers and practitioners
to explore the documents through traditional close reading methods; not
only would this be too time-consuming but also somewhat less efficient as a
machine will always outperform humans in identifying patterns. Although
this is in principle true as clarified earlier, the assumption that all the found
patterns are intrinsically meaningful is not. To meet this challenge, research
has been conducted towards implementing statistical methods that could
help researchers and practitioners find the craved ‘optimal number of
topics’. Two of the most common methods are model perplexity and
topic coherence, measures that score the statistical quality of different
topic models based on the topics’ compositions in several models. Though
not unanimously, the believed assumption behind these techniques is
that a higher statistical quality yields more interpretable topics. Model
perplexity (also known as predictive likelihood) predicts the likelihood
of new (i.e., unseen) text to appear based on a pre-trained model. The
lower the perplexity value, the better the model predicts the distribution
of the words that appear in each topic. However, studies have shown
that optimising a topic model for perplexity does not necessarily increase
topics’ interpretability, as perplexity and human judgement are often not
correlated, and sometimes even slightly anti-correlated (Jacobi et al. 2015,
7).
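A minimal sketch of how such a perplexity score is commonly obtained with Gensim is given below; the toy corpus, the number of topics and the train/held-out split are assumptions made only to show the mechanics.

```python
# A hedged sketch of scoring an LDA model with perplexity in Gensim;
# corpus, number of topics and split are illustrative assumptions only.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

token_lists = [
    ["italia", "emigrazione", "lavoro"],
    ["sciopero", "lavoro", "anarchia"],
    ["famiglia", "emigrazione", "italia"],
    ["guerra", "patria", "italia"],
    ["lavoro", "sciopero", "fabbrica"],
    ["famiglia", "chiesa", "festa"],
]
dictionary = Dictionary(token_lists)
bow = [dictionary.doc2bow(doc) for doc in token_lists]

train, held_out = bow[:-2], bow[-2:]  # keep two 'unseen' documents aside
lda = LdaModel(corpus=train, id2word=dictionary, num_topics=2,
               random_state=1, passes=10)

# log_perplexity returns a per-word likelihood bound for the held-out chunk;
# a lower perplexity (a higher bound) means the model predicts the unseen
# documents better, not that its topics are more interpretable.
print(lda.log_perplexity(held_out))
```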
Topic coherence was developed to compensate for this shortcoming
and it has become popular over the years. What the method is designed
to do is to model human judgement by scoring the composition of the
topics based on how coherent, i.e., interpretable, they are (Röder et al.
2015). If the coherence score increases as the number of topics increases,
for example, that would suggest that the most interpretable model is
the one that displays the highest coherence value before flattening out
or dropping. Both techniques are widely used to determine the optimal
number of topics; the truth is, however, that neither of these measures is
ideal because what they actually score is the probability of observations
and not their degree of semantic meaning (Chang et al. 2009). In a study
by Chang et al. (2009) about topics’ interpretability, the authors noted
that these traditional metrics do not in fact capture whether topics are
interpretable or not as they optimise topic models for likelihood-based
measures but, as clarified earlier (cfr. Sect. 4.5), ‘LDA does not look for the
same patterns that people do’ (Hindle et al. 2015, 510). In the study, the
authors therefore suggest that practitioners adopt a more critical assessment
of the topics’ quality.
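The sketch below illustrates the common practice of training several models with different numbers of topics and comparing their coherence scores with Gensim’s CoherenceModel; the corpus and the candidate numbers of topics are again invented for illustration, and the scores should be read with the caveats discussed above.

```python
# A hedged sketch of comparing models with different numbers of topics using
# Gensim's CoherenceModel (here with the 'c_v' measure); toy data only.
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

token_lists = [
    ["italia", "emigrazione", "lavoro"],
    ["sciopero", "lavoro", "anarchia"],
    ["famiglia", "emigrazione", "italia"],
    ["guerra", "patria", "italia"],
    ["lavoro", "sciopero", "fabbrica"],
    ["famiglia", "chiesa", "festa"],
]
dictionary = Dictionary(token_lists)
bow = [dictionary.doc2bow(doc) for doc in token_lists]

for num_topics in (2, 3, 4):
    lda = LdaModel(corpus=bow, id2word=dictionary, num_topics=num_topics,
                   random_state=1, passes=10)
    coherence = CoherenceModel(model=lda, texts=token_lists,
                               dictionary=dictionary,
                               coherence="c_v").get_coherence()
    # The model whose coherence peaks before flattening out or dropping is
    # often taken as the 'optimal' one, but the score measures statistical
    # association between top topic words, not semantic meaningfulness.
    print(num_topics, round(coherence, 3))
```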
In this chapter, I have discussed how the use of familiar notions to name
computational techniques such as topic modelling, sentiment analysis and
machine learning has increased their popularity while creating epistemo-
logical expectations that these methods will disappoint. Especially when
these methods are used outside of their field of origin, the confusion thus
generated contributes to obfuscating the mathematical assumptions upon which these techniques are
built, such as the fundamental difference between discrete vs continuous
modelling of information and the consequences that stem from it. In the context
of digital knowledge creation and in relation to the big data philosophy, I
reflected on the significant, yet often overlooked, implications for notions
of causality and correlations. I then applied these considerations to describe
the third use case of the book, analysis of a digital object, and used the
properties and assumptions of topic modelling as the case example of a
widely used computational technique that treats a collection of texts as
discrete data. I have shown how the post-authentic framework can be used
as the applied theory to engage critically with topic modelling by devoting
special attention to the aspects of the analysis that are key for maintaining a
symbiotic connection with the sources: pre-processing, corpus preparation
and the number of topics. Specifically, I have shown how the application
of the post-authentic framework to topic modelling acknowledges the
technique as correct at its core but problematic and therefore in need of critical
engagement.
My intention is not to dismiss topic modelling as woefully inadequate,
but rather to encourage the integration of the method with critical scrutiny
in order to address its limitations. In so doing, I have argued that by
introducing a counter-narrative in the main scientistic discourse, the post-
authentic framework strains the current system and can help us refigure
a novel and more honest model for knowledge production in the digital.
For example, when topic modelling is used for humanistic enquiry such
as the analysis of cultural heritage material as discussed here, the post-
authentic framework serves as a warning that the technique’s limitations
are particularly significant and their impact on the provided interpretation
of the past is problematic. I will return to these points in the next chapter
in which I discuss the fourth and last use case of the book, visualisation
of a digital object. Specifically, I will show how I have applied the post-
authentic framework to prototyping a UI for topic modelling. I will insist
on key aspects that aim to promote the active and reflective participation
of the researcher in the process of digital knowledge production; I will
devote particular attention to the added value of building UI elements
that contribute to the urgent need for the establishment of critical data and
visualisation literacy, especially when computational methods are adopted
in fields outside of their original design.

NOTES
1. “Bloquer le pays ne permet pas d’endiguer l’épidémie” (“Locking down the country does not make it possible to contain the epidemic”).
2. For a detailed and in-depth historical discussion on causality in physics and
philosophy, I refer the reader to Weinert (2005).
3. Please note that not everyone agrees with this view and that there are still
unanswered questions around causality, particularly in relation to discrete
phenomena in quantum mechanics. See, for instance, Le Bellac (2006) and
Jaeger (2009).
4. A well-known phrase that sums up this fact is ‘correlation does not mean
causation’.
5. Not previously annotated material.
CHAPTER 5

What the Graph

Figures don’t lie, but liars do figure. (Attributed to Carroll D. Wright, 1889)

5.1 POKER (INTER)FACES


Data visualisation and information visualisation are commonly used as
synonyms but it has been argued that they in fact mean different things
(Spence 2014; Falkowitz 2019; Ware 2021). The main difference would lie
in the basic distinction between data and information in computer science:
data is understood as raw materials (e.g., numbers), that is, the input, and
believed to not carry any specific meaning per se, whereas information is
the output, i.e., the meaning carried by a set of data. Thus, following this
definition, information visualisation is understood as a cognitive activity
(Spence 2014, 2), the process of discovering the meaning associated with
a set of data, whereas data visualisation is the process of exploring data that
may or may not uncover meaning, i.e., result in information visualisation.
Another way to look at it is to consider the purpose of these two activities;
data visualisation would essentially be a heuristic activity, whereas the
main goal of information visualisation would be to influence a decision-
making process (Falkowitz 2019). The two types of visualisations would
accordingly translate into distinct products: data visualisations would allow
several levels of interactions (e.g., filtering, zooming, selecting, aggregat-

ing), whereas information visualisations would simply show one or limited
viewpoints while obscuring other perspectives more or less deliberately.
Thus, according to this logic, only data that function as cognitive tools
become information and therefore not all data is information.
The post-authentic framework that I advance in this book argues against
binary conceptualisations that misleadingly suggest and continue to perpet-
uate the artificial notion of ‘raw data’, as if data could naturally pre-exist in
a pristine, untouched environment, as if all the steps preceding the visual-
isation, for example selection, collection, compilation, categorisation and
storage, would not already be acts of interpretation and creation (Manovich
2002; Gitelman 2013; Drucker 2020). The post-authentic framework
therefore transcends the distinction between data and information and
between data visualisation and information visualisation; it acknowledges
that data always embed the interpretative dimensions that originated them.
It also recognises that not just the processes of data creation but also
the very tools and methods adopted for creating data are equally situated,
limited and partial. Actions, tools, algorithms, platforms, infrastructures
and methods are never neutral because they themselves stem from systems
that are in turn situated and therefore already interpreted. Hence, whether
the intent is to explore data or to persuade through data, the post-authentic
framework to visualisation advocates transparency in the way the data is
created and conclusions are drawn. In light of the considerations reasoned
in the previous chapters, I will therefore use these terms interchangeably
to signal that we need to move beyond the distinction between data and
information and consequently between data visualisation and information
visualisation because data is always produced to various degrees.
Historically, innovations in data visualisation have originated from
concrete, often practical goals (Friendly 2008, 30) so it is no surprise
that the explosion of data of the last two decades and the subsequent
need to analyse it and interpret it paired with advances in technology and
statistical theory have greatly impacted the field. Indeed, as it is praised for
its capacity to promptly display emerging properties in the data as well as to
enhance access, visualisation has increasingly become an integral part of the
digital. For example, using information visualisation to better understand
the complex, internal processes according to which ML models elaborate
data and provide results has been shown to offer insights that may lead
to more transparency and increased trustworthiness in ML outputs and it
has therefore become very popular in recent years (Chatzimparmpas et al.
2020).
Visualisation has also gained a significant role in the context of analytical
methods, including topic modelling. Studies have argued that graphic
display tools are valuable not only for understanding the models’ results
but, because similarity measures and human interpretation are partially
misaligned (cfr. Chap. 4), also for a general assessment of whether topic
modelling is at all a suitable technique for AI and cognitive modelling
applications (Murdock and Allen 2015, 4284). Several possible visualisa-
tion solutions have therefore over the years been proposed towards solving
some of the already discussed challenges around topic modelling. These
can be roughly divided into two research directions: the use of visualisation
to improve the interpretation of the results and, stemming from the first
one, the use of visualisation to improve the results themselves. Solutions
in the first category try to enhance topics’ interpretability by visualising
the results in a variety of ways using different statistical measures. Termite
(Chuang et al. 2012), for example, allows terms’ comparison within and
across topics using saliency measures based on the concept of weight
(cfr. Chap. 4), but it does not allow for document interactivity. Chaney
and Blei (2012) propose a web-based interface to allow nontechnical
users to navigate the output of a topic model but it is not possible to
draw comparisons of the topics’ distribution across documents. TopicNets
(Gretarsson et al. 2012) visualises the relations between a set of documents
(or parts of documents) and their discovered topics in the form of an
interactive network-type graph (i.e., nodes and edges), but it does not
show topic or document composition. LDAvis (Sievert and Shirley 2014)
visualises terms within a topic according to weighted topic-word and topic-
topic relationships but the connection with the documents is lost. Finally,
Topic Explorer (Murdock and Allen 2015) builds on LDAvis by visualising
topic-document and document-document relationships as well as topic
distribution and document composition.
Studies in the second category allow users to interact with the models
through a variety of human-in-the-loop1 (HINTL) methods. For example,
iVisClustering (Lee et al. 2012) allows users to manually create or remove
topics, merge or split topics and reassign documents to another topic while
visualising topic-document associations in a scatter plot. Using ITM —
Interactive Topic Modelling (Hu et al. 2014)—users can add, emphasise or
ignore words within topics, whereas with UTOPIAN (Jaegul Choo et al.
2013), users can adjust the weights of words within topics, merge and split
topics and generate new topics. Hoque and Carenini (2016) and Cai and
Sun (2018) also propose visual methods to curate the topics by adding
or removing terms within a topic, adjusting the weight, merging similar
topics or splitting mixed ones, manually validating the results and finally
generating new topics on the fly.
As this brief literature review shows, one of the main challenges of topic
modelling, namely interpreting the results, has so far been tackled from
a problem-solving point of view for which the main task is essentially
to best exploit the model’s identified document structure (Blei 2012).
Significantly, what all these studies have in common is the implementation
of visualisation techniques exclusively in the final stage of a topic modelling
workflow, that is, either to interpret the algorithm’s output or when
training the algorithm itself. What these visualisation interfaces clearly show
is the persistent conceptual disconnection between the results and the pro-
cesses that generated them, the common belief that only interventions on
the algorithm or on the final output are worthy of study and examination
and so interventions on the sources-data are dismissed as not immediately
relevant. As I have argued in Chap. 3, these processes of manipulations are
often seen as ‘standard’, unproblematic and inconsequential rather than
as heavy interventions on the sources and therefore on the results. The
post-authentic framework that I propose in this book, on the contrary,
strives to preserve and maintain the connection between the analyst and the
digital object and it opposes any naïve conceptualisation of digital objects
as finished, fixed, unproblematic entities. The post-authentic framework
ultimately sees the human-digital object relationship as an essential compo-
nent of the process of knowledge production in the digital. When applied
to UI, the post-authentic framework is therefore not only mindful of such
connection but it in fact encourages the scholar to be critically aware of it.
My efforts towards building a post-authentic interface for topic modelling
that I present here are therefore guided by this intention to enable users
to actively engage with their digital sources and take ownership of their
interventions but also to self-reflect and critique on those, thus openly
acknowledging the interpretative dimension of the digital research process.
The endless flow of digitised material and the need to store it, access
it and analyse it has impacted the role of visualisation also in those fields
that traditionally relied on material sources, for instance, cultural heritage,
history, linguistics and more widely the humanities. With specific reference
to cultural heritage, for instance, institutions have over the years resorted
more and more to visual means—typically web-based interfaces—as a way
to enhance access to cultural collections for users’ appreciation as well as
for research purposes (Windhager et al. 2019a). In a survey of information
visualisation approaches to digital cultural heritage collections from 2014
to 2017, for example, Windhager et al. (ibid.) found out that visualisations
of digital cultural heritage material have steadily increased, peaking in
2015. At the same time, however, these authors also highlighted that
the seventy visualisation systems, prototypes and platforms they surveyed
shared ‘overly narrow task- and deficiency-driven approaches to
interface design that are grounded in a simplistic user-as-consumer- and
problem solver-model’ (ibid., 13). Drucker (2013; 2014; 2020) has also
long argued that graphical displays in the humanities often adopt a
function- and task-driven UI design and generally lack a critical stance
towards visualisation, evidencing IR intentions rather than the elicitation
of curiosity, thoughtful engagement and reflection.
The post-authentic framework that I advance in this book aims to
contribute to the urgent need for the establishment of critical data lit-
eracy, including visualisation literacy. It conceptualises digital objects as
unfinished, situated processes, and it acknowledges the limitations, biases
and incompleteness of tools and methods adopted for the analysis and
visual representation of digital content. It provides helpful concepts for
a re-theorisation of the process of digital knowledge creation, including
the implementation of re-devised practices which are also acknowledged
as always being adapted, unfixed, unfinished, arranged and interpreted.
Applied to visualisations and interfaces, it acknowledges them as problem-
atic endeavours that embed a wide net of situated processes, and it caters for
their novel conceptualisation as epistemic objects which themselves carry
meanings and therefore bear consequences.
Post-authentic graphical displays counter what I call poker interfaces,
attractive visualisations and sleek interfaces that tend to present infor-
mation as detached from any subjectivity or which obscure or even
break the connection with the digital object and the multiple layers
of manipulation. In this chapter, I discuss two examples of how the
post-authentic framework can be applied to visualisations; in Sect. 5.2,
I examine prototypical work for designing a topic modelling interface
whereas in Sect. 5.3, I present the design choices we took whilst developing
DeXTER, the interactive visualisation app to explore enriched cultural
heritage material currently loaded with ChroniclItaly 3.0 (cfr. Sect. 2.4).
My discussion will specifically revolve around the challenges of promoting
symbiotic exchanges when engaging with software especially focusing
on the efforts we took to expose—rather than hide—the ambiguities
and uncertainties of NA and SA. I end the chapter by acknowledging
digital visualisation as fundamentally a curatorial operation which requires
countless subjective decisions that intervene on the digital object with
several layers of manipulation; the post-authentic framework to graphical
display, I conclude, can guide the encoding of such processes in the
visualisation.

5.2 VISUALISATION OF DIGITAL OBJECTS: TOWARDS A POST-AUTHENTIC USER INTERFACE FOR TOPIC MODELLING
The development of a post-authentic interface for topic modelling should
be understood in the context of the wider project Digital History Advanced
Research Projects Accelerator (DHARPA),2 within which software for DH
research is currently being developed. Originally conceived by Sean Takats,
the DHARPA project today is a team of developers and academics who
continuously contribute to each other’s expertise by sharing knowledge
and practices from a range of disciplines (computer programming, data
engineering, data visualisation, linguistics, geography and various strains
of history) (Cunningham et al 2022). Like DeXTER, DHARPA is hosted
at the C2 DH (cfr. Chap. 2). At the heart of DHARPA is encoding criticism,
the effort of advocating the active and reflexive participation of the scholar
in the process of digital knowledge production (Viola et al. 2021). Digital
tools and techniques have been harshly criticised for alienating humanities
scholars from their sources (ibid.) (cfr. Chap. 1), a bond regarded as crucial
for the pursuit of scholarly enquiry; the driving rationale of DHARPA is
that through critical assessment, contextualisation and documentation of
digital methodologies—which are understood as partial and situated—such
relationship can on the contrary be fortified and expanded. With this aim,
DHARPA is developing software that operationalises critical epistemology
by placing the scholar-source relationship at its centre. The efforts towards
building a post-authentic interface for topic modelling that I present
here are therefore guided by the very same intention to enable users
to actively engage with the digital object and take ownership of their
interventions. Moreover, through the post-authentic lens, my aim is to
openly acknowledge the interpretative dimension of the digital research
process and thus to embed self-reflection and critique into software’s both
back-end and front-end. The confluence of the post-authentic framework,
DeXTER, DHARPA and the C2 DH is a perfect example of how the
notions of symbiosis and mutualism can guide the process of knowledge
creation in the digital.
The post-authentic framework opposes any conceptualisation of digital
objects as something disconnected from the material sources; when applied
to UI, it is therefore oriented towards safeguarding such connection and
encouraging the scholar to be critically aware of it. The example of the
NLP software MALLET (McCallum 2002) illustrates a case in which this
connection is obscured. MALLET is a widely used ML tool for a range
of NLP tasks such as document classification, clustering, topic modelling,
information extraction and others. During the steps of data preparation for
topic modelling (cfr. Sect. 4.5), for example, the analyst is never prompted
to view the results of their interventions and overall, there is little chance
of interacting with the digital object. This does not intrinsically mean that
any topic modelling analysis based on MALLET is to be discarded, but it
does mean that a distance is imposed between the sources and the analyst.
I argue that it is this distance that inevitably causes disconnection and
increases the risk of attributing meaning to spurious patterns (cfr. Sect. 4.5.3).
Indeed, to ensure that the identified patterns carry actual significance, con-
siderable efforts need to be subsequently directed towards regaining this
connection, sometimes in the form of novel analytical methodologies such
as the discourse-driven topic modelling approach (DDTM) we developed
within OcEx (cfr. Sect. 2.4) (Viola and Verheul 2019b). This approach
integrates topic modelling with the discourse-historical approach (DHA)
(Reisigl and Wodak 2001), an applied method of critical discourse analysis
theory (van Dijk 1993) which triangulates linguistic, social and historical
data to understand language use in its full socio-historical context and as a
reflection of its cultural values and political ideologies (Viola and Verheul
2019b). The integration of DHA into topic modelling is particularly useful
for tasks such as topic interpretation and labelling, thus reducing the risk
of attributing meaning to spurious patterns.
Applied to interface design, the post-authentic framework strives
to avoid the human-digital object disconnection by prompting critical
engagement with the specificity of the source. Taking once again the
example of ChroniclItaly 3.0, the post-authentic framework devotes careful
attention to never lose contact with the information embedded in the
filenames themselves. Based on the Library of Congress cataloguing
schema, the filenames carry valuable metadata information including
the reference code of the newspapers’ titles, the page number and the
publication date of each issue (Viola and Fiscarelli 2021a). The reason
why it is so very important to critically engage with this information is
once more due to the specificity of the source. Immigrant newspapers
were constantly on the verge of bankruptcy which caused titles to be
often discontinued; for the same reason, some newspapers could afford
to publish biweekly or even daily issues, while others could only publish
intermittently (Viola and Verheul 2019a,b). This is naturally reflected in
the composition of the collection; newspapers like L’Italia—one of the
most mainstream Italian immigrant publications in the United States at
the time—and Cronaca Sovversiva, the most important anarchic Italian
American newspaper, managed to continuously publish for years, whilst
others like La Rassegna or La Sentinella del West, which came into being
as small, personal projects of their founders, could only survive for a few
months. Although across the entire period of coverage, on the whole
the collection holds a fair balance between the number of issues, the
type of newspaper, the geographical location, the time span and political
orientation of each title, the exploration of the collection’s metadata
highlights factors such as over- or under-representation of some titles
either on the whole or at specific points in time. Figure 5.1 displays how
the issues are diversely distributed throughout the collection.
The application of the post-authentic framework to digital objects
recognises that factors like the heterogeneity of the digital object may
result in potential polarisation of topics and points of view; it therefore
maintains a connection with the digital object by facilitating access to
such information and allowing the researcher to engage critically with
it. By embedding the option to explore the metadata information (if
present), the post-authentic framework signals the acknowledgement of
the continuous underlying structure of a digital object (cfr. Sect. 4.2)
hidden by its digital transformation into discrete form, i.e., sequences of 0s
and 1s. It is indeed this acknowledgment that allows the analyst to obtain
a fuller understanding of the object itself, in turn facilitating fundamental
tasks such as adjusting the research question, resizing expectations and
making sense of the results.
This sustained connection with the materiality of the source has imme-
diate relevance for computational techniques such as topic modelling. As
discussed in Sect. 4.4, the LDA algorithm assumes that a fixed number
of topics is represented in different proportions in all the documents;
this is clearly a rather artificial and unrealistic assumption as it is highly
unlikely that one fixed—and to some extent arbitrary—number of topics
could adequately represent the content of all the ingested documents.
Fig. 5.1 Distribution of issues within ChroniclItaly 3.0 per title (y-axis: newspaper title; x-axis: year, 1898–1937). Red lines indicate at least one issue in a three-month period. Figure taken from Viola and Fiscarelli (2021b)
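A distribution such as the one summarised in Fig. 5.1 can be derived directly from the collection’s metadata. The sketch below shows one possible way of doing so in Python; the filename pattern, title codes and dates are invented for illustration and do not reproduce the actual cataloguing schema used by ChroniclItaly 3.0.

```python
# A hedged sketch of recovering issue metadata from filenames; the filename
# pattern, title codes and dates below are hypothetical placeholders.
import re
from collections import Counter

filenames = [
    "sn0001_1917-03-04_p1.txt",   # hypothetical: title code, date, page
    "sn0001_1917-03-11_p1.txt",
    "sn0002_1905-06-20_p2.txt",
]

pattern = re.compile(r"(?P<title>[^_]+)_(?P<date>\d{4}-\d{2}-\d{2})_p(?P<page>\d+)")

issues_per_title_year = Counter()
for name in filenames:
    match = pattern.match(name)
    if match:
        year = match.group("date")[:4]
        issues_per_title_year[(match.group("title"), year)] += 1

# The resulting counts can feed a plot like Fig. 5.1 (titles vs years),
# exposing gaps and over-represented titles before any topic modelling.
for (title, year), count in sorted(issues_per_title_year.items()):
    print(title, year, count)
```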

Allowing the analyst to know that the material for the digital analysis is
unevenly distributed acts as a way to highlight those problematic aspects
of digital research and digital objects that precede the analysis itself but
which nevertheless influence how the technique may be applied and the
results interpreted. Figure 5.2 shows how this step could be handled in
the interface. Once the documents are uploaded, the analyst is prompted
by a question asking them about the potential presence of metadata infor-
mation. With this question the intention is to maintain contact with the
continuous aspect of the digital object hidden by its discrete representation
and further altered by the topic modelling algorithm which treats the
documents, too, as a collection of discrete data.
If the analyst chooses ‘yes’, the metadata information would then be
used to create a dynamic, interactive visualisation inspired by the one
displayed in Fig. 5.1; this would display how the files are distributed in
the collection, ultimately creating room for reflection and awareness. In
Fig. 5.2 Wireframe of a post-authentic interface for topic modelling: sources upload. The wireframe displays how the post-authentic framework to metadata information could guide the development of an interface. Wireframe by the author and Mariella de Crouy Chanel

the case of ChroniclItaly 3.0, for example, this visualisation displays the
number of published issues on a specific day, month or year and by
which titles; the display of this information allows the analyst to promptly
identify the difference in the frequency rate of publication across titles and
potential gaps in the collection (Fig. 5.3). The post-authentic framework
to visualisation signals the importance of maintaining the connection with
the digital object, understood as an organic, problematic entity. Such
connection is acknowledged as an essential element of the process of
knowledge creation in the digital in that it favours a more engaged, critical
approach to digital objects and it creates a space in which more informed
decisions can be made, ultimately answering the need for digital data
and visualisation literacy.
Fig. 5.3 Post-authentic framework to sources metadata information display. Interactive visualisation available at https://observablehq.com/@dharpa-project/timestamped-corpus. Visualisation by the author and Mariella de Crouy Chanel

The post-authentic framework to interface design aims to make the link
between the analyst, the digital object’s discretised continuous information
and the methods employed to manage it, analyse it and visualise it explicit
at each stage of the digital knowledge creation process. Informed by
these motivations, an interface for topic modelling would facilitate close
engagement, for instance by allowing users to create and preview subsets of
the digital object (e.g., through filtering cfr. Sect. 4.5.2) for further explo-
ration or to test hypotheses on a sample. In this way, the post-authentic
framework signals the rejection of objectivist and positivist understandings
of digital processes which depict data as pre-existing and somewhat fixed.
The interface, on the contrary, would adopt a constructivist principle which
exposes the management of data as a problematic enterprise, a subjective
act made of constant interpretation, manipulation and decisions which
transform, select, aggregate and ultimately create data (Drucker 2011).
Following these principles, the wireframe in Fig. 5.4 displays how sources’
preview could be handled in the interface.
Research that adopts computational techniques rarely acknowledges the
influential role of tools, infrastructures, software, categories, models and
algorithms on the research process or the results, as these are typically
reputed to be neutral. The researcher or curator often provides little or no
documentation of the decisions and the mechanisms that transformed their
sources into data (Viola and Fiscarelli 2021b). Through the chapters of this
book, however, I have demonstrated that transformative operations such as
those directed at the creation, enrichment, digital analysis and visualisation
of a digital object involve an intricate network of complex interactions
between countless elements and factors including the materiality of the
sources, the digital object and the analyst as well as between the operations
themselves. Although often presented as more or less ‘standard’, these
operations on the contrary need to be problematised and tackled criti-
cally. The post-authentic framework to knowledge creation in the digital
acknowledges them as limited and situated, and it prompts a fundamental
rethink of how these operations impact the sources and produce a digital
object; this challenge, I maintain, can be met by maintaining engaged
contact with the digital object. For problematic operations such as pre-
processing, stemming and lemmatising (cfr. Sect. 4.5.2), this connection
can be sustained by prompting engagement, for instance by making
processes readily visible and intelligible to the analyst. The wireframes
in Figs. 5.5 and 5.6 show how these operations would be handled in
the interface. An expandable tool-tip asking ‘What is pre-processing?’
together with i buttons located next to each operation would give users the
possibility to access detailed explanations of the available operations—often
grouped under opaque labels such as ‘data cleaning’—to better understand
the assumptions behind them. The UI would also allow data preview, thus
making the impact of each intervention visible and accessible to the analyst.
These features would create room for more conscious decisions and, at the
same time, they would signal that data is always made.
The post-authentic framework calls upon the scholar’s critical and
active engagement in the process of knowledge creation in the digital
and raises awareness of the limitations, biases and incompleteness of tools
and methods; applied to interface design it can therefore contribute to
the establishment of critical data management and visualisation literacy.
Fig. 5.4 Post-authentic interface for topic modelling: data preview. The wireframe displays how the post-authentic framework could guide the development of an interface for exploring the sources. Wireframe by the author and Mariella de Crouy Chanel
Fig. 5.5 Interface for topic modelling: data pre-processing. The wireframe dis-
plays how the post-authentic framework to UI could make pre-processing more
transparent to users. Wireframe by the author and Mariella de Crouy Chanel

In the interface, this would be achieved by entering into a dialogue
with the researcher, for instance, by asking the question ‘What is corpus
preparation?’ (Fig. 5.7); the combination of expandable tool-tips and i
buttons next to each operation would serve the dual purpose of making
the process of data creation more intelligible to users while maintaining
the connection with the digital object. Indeed, more transparent processes
enable a more conscious participation of the scholar in the fluid exchanges
between computational and human processes which are understood as part
of a wider, complex system of interactions. The post-authentic framework
attempts to reach symbiosis and mutualism (cfr. Sect. 2.2) by making these
exchanges explicit as opposed to a passive and dissociated consumption of such
interactions. To the same aim, the output resulting from implementing
the different methods for corpus preparation would be saved each time
(left panel in Fig. 5.7) so that users could experiment with various methods
Fig. 5.6 Interface for topic modelling: data pre-processing (stemming and lem-
matising). The wireframe displays how the post-authentic framework to UI could
make stemming and lemmatising more transparent to users. Wireframe by the
author and Mariella de Crouy Chanel

and settings, compare results and make more informed decisions. In this
way, the interface would actualise a counterbalancing narrative in the main
positivist discourse that equates the removal of the human—which in any
case is illusory—with the removal of biases. To the contrary, the argument
I advance in this book is that it is only through the active and conscious
participation of the human in processes of data creation, tools’ selection,
methods’ and algorithms’ implementation that such biases can in fact be
identified, acknowledged and to an extent, addressed.
The post-authentic framework to knowledge creation in the digital
advocates a more participatory, critical approach towards digital methods
and tools, particularly if they are applied for humanistic enquiry. Against a
purely correlations-driven big data approach, it offers a more complex and
nuanced perspective that challenges current views sidelining human agency
Fig. 5.7 Interface for topic modelling: corpus preparation. The wireframe dis-
plays how the post-authentic framework to UI could make corpus preparation more
transparent to users. Wireframe by the author and Mariella de Crouy Chanel

and criticality in favour of patterns and correlations. Applied to meth-
ods such as topic modelling, for instance, the post-authentic framework
highlights the assumptions behind the technique, such as discreteness, a-
causality, randomness and text disappearance. Whilst exploiting the new
opportunities offered by computational technologies, it rejects a passive
adoption of these methods, and it highlights the intrinsic dynamic, situated,
interpreted and partial nature of the digital in contrast with the main
discourse that still presents techniques and outputs as exact, final, objective
and true. Applied to UI, it also provides helpful concepts for both its theo-
risation and the implementation of re-devised visualisation practices which
are also acknowledged as being adapted, unfixed, unfinished, arranged and
interpreted.
5.3 DEXTER: A POST-AUTHENTIC APPROACH TO NETWORK AND SENTIMENT VISUALISATION
In the context of visualisation, questions of criticality, transparency, trust
and accountability have increasingly become part of the scientific dis-
course (see for instance Gaver et al. 2003; Drucker 2011, 2013, 2014,
2020; Glinka et al. 2015; Sánchez et al. 2019; Windhager et al. 2019a;
Boyd Davis et al. 2021) and several recommendations for operationalising
critical digital literacy in visual design have been suggested. For example,
the interpretative and evaluative value of ambiguity for design has been
praised by Gaver et al. (2003); Drucker (2020) has proposed a framework
for visualisations that promotes plurality, critical engagement and data
transparency; Windhager et al. (2019a) have suggested design guidelines
that also promote contingency (i.e., acknowledging the incompleteness of
user experience) and empowerment (i.e., encouraging user’s self-activation
and engagement) (141), and Sánchez et al. (2019) have offered a frame-
work for managing uncertainty in DH visualisations. Despite an increased
awareness, however, research in this area points out how intrinsic aspects
of knowledge creation such as ambiguity, uncertainty and errors are
still largely hidden from view and how instead the majority of graphical
displays tend to be sleek visualisations that convey exactness, neutrality and
assertiveness, i.e., poker interfaces.
The post-authentic framework that this book suggests incorporates all
these recent perspectives; however, as it refers to the realm of digital
knowledge that is created daily, at the same time, it goes beyond them.
With specific reference to visualisations, the post-authentic framework
endorses ambiguity, uncertainty and transparency; it acknowledges the
incompleteness and partiality of data, tools and methods and, rather than
muddying them, it exposes their potential untrustworthiness. It is thanks to
this awareness, I maintain, that the post-authentic framework contributes
to keeping the process of knowledge creation in the digital honest and
accountable, both for present and future generations. The visualisations for
NA and SA in the DeXTER app that I present here are a good example of
how the post-authentic framework can actualise these aims when visualising
a digital object.
The DeXTER project is a post-authentic research activity which com-
bines the creation of an enrichment workflow with a meta-reflection
on the workflow itself as well as the creation of an interactive app to
visualise enriched digital heritage collections. This means that the main
intention guiding its design is to provoke independent assessment (Gaver
et al. 2003), to expose inconsistencies and cast doubts on the digital
object and to create a space for interpretation, rather than to provide
one. This includes openly acknowledging that the implementation and
potential value of the used methods are also inextricably intertwined with
the specificity of the source as well as the research context of the related
project. For example, when enriching ChroniclItaly 3.0, we used NA and
SA to explore the several ways in which referential entities relate to each
other in the collection; this included modelling their frequency of co-
occurrence in a sentence and how this changes over time, the prevailing
attitude towards such entities, and connections between entities at specific
points in time (e.g., on the same day) across the different newspapers.
These operations aimed to maximise the potential value of using referential
entities as indicators of markers of identity (cfr. Chap. 3), that is, as a way to
navigate the process of Italian Transatlantic migration as it was narrated by
the different communities of Italian immigrants in the United States. Far
from being standard, techniques and methods are therefore understood as
adapted and chosen and their suitability in need of assessment rather than
assumed to be intrinsically good (or bad).
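The following toy sketch illustrates the kind of sentence-level modelling described above, counting entity co-occurrences and attaching a sentiment score to each pair; the sentences, entities and the naive lexicon-based scorer are invented and do not reflect DeXTER’s actual enrichment pipeline.

```python
# A self-contained toy sketch of sentence-level co-occurrence and sentiment
# aggregation; the data and the lexicon scorer are invented for illustration.
from collections import defaultdict
from itertools import combinations

sentences = [
    ("1916-05-02", "Great celebrations in New York for Italia and its heroes"),
    ("1917-04-07", "Fears in New York as Wilson declares war"),
]
entities = ["New York", "Italia", "Wilson"]
toy_lexicon = {"celebrations": 1, "heroes": 1, "fears": -1, "war": -1}

def toy_sentiment(text):
    # Sum of (invented) lexicon scores; a stand-in for a real SA model.
    return sum(toy_lexicon.get(word.strip(",.").lower(), 0) for word in text.split())

co_occurrence = defaultdict(lambda: {"count": 0, "sentiment": 0})
for date, sentence in sentences:
    present = [e for e in entities if e in sentence]
    for a, b in combinations(sorted(present), 2):
        record = co_occurrence[(a, b)]
        record["count"] += 1
        record["sentiment"] += toy_sentiment(sentence)

print(dict(co_occurrence))
```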
The post-authentic framework can inform the selection of methods by
warning the analyst that techniques developed in other fields for specific
aims and with specific assumptions are not necessarily compatible across
different data types. For example, NA is a method that originates in
mathematics and graph theory (Biggs et al. 1986), and although it has
long been applied across disciplines and for different purposes, it is typically
used to answer questions mostly pertaining to the social sciences. This is
because the underlying assumption is that the discrete modelling of how
actors (e.g., entities) relate to each other (i.e., edges) provides adequate
explanations of social phenomena. For a detailed overview of its application
particularly in modern sociology, I refer the reader to Korom (2015).
Due to its characteristic feature of schematically representing abstract
and often ambiguous information, NA has recently become popular also
in the humanities. In linguistics, for example, NA has been applied to large
textual corpora of naturally occurring language to analyse the relationship
between language and identity in multilingual communities (Lanza and
Svendsen 2007) or to explore complex syntactic and lexical patterns as
networks, for example in language acquisition or language development
studies (Barceló-Coblijn et al. 2017). It has also been argued that NA
could be integrated in sociolinguistics as a way to provide insights into the
relationship between the use of linguistic forms and culture (Diehl 2019).
In branches of DH such as digital history and digital cultural heritage,
NA is also considered to be an efficient method to intuitively reduce
complexity (Düring et al. 2015). This may be due to the fact that this
technique benefits particularly from attractive visualisations which support
the impression that explanations for social events are accurate, complete,
detailed and scientific, naturally adding to the allure of using it.
However, a typically omitted, yet rather critical issue of NA is that the
graphs can only display the nodes and attributes that are modelled; as
these stem from samples which by definition are incomplete and which
undergo several layers of manipulation, transformation and selection, the
conclusions the graphs suggest will always be partial and potentially based
on over-represented actors or conversely, on underrepresented social cate-
gories. In the case of a digital object such as the cultural heritage collection
ChroniclItaly 3.0 which aggregates sources heterogeneously distributed
(cfr. Sect. 5.2), this issue is particularly significant as any resulting graph
depends on the modelled newspaper (e.g., mainstream vs anarchic), on the
type and number of entities included and excluded and on the attributes’
variables (e.g., frequency of co-occurrence, number of relations, sentiment
polarity), to name but a few. Each one of these factors can dramatically
influence the network displays and consequently impact on the provided
interpretation of the past.
The project’s GitHub repository3 —which is to be understood as an integral part of the visualisation interface—is a good example of how the
post-authentic framework can guide the actualisation of principles of trans-
parency, accountability and reproducibility and how it values ambiguity
and uncertainty. The DeXTER’s GitHub repository documents, explains
and motivates all the interventions on the data, including reporting on the
processes of entity selection (cfr. Sect. 3.3). The aim is to warn the analyst
that despite being (too) often presented as a statement of fact, a visually
displayed network is a mediated and heavily processed representation of the
modelled actors. As such, the post-authentic framework does not solely
aim to increase trust in the data and how it is transformed, but also
to acknowledge uncertainty in both the data lifecycle and the resulting
graphs and finally to expose and accept how these may be untrustworthy
(Boyd Davis et al. 2021, 546). The act of making explicit the interpretative
work of shaping the data is what Drucker calls ‘exposing the enunciative
workings’ (2020, 149):
For data production, the task is to expose some of the procedures and steps
by which data is created, selected, cleaned, and processed. Retracing the
statistical processes, showing the datamodel and what has been eliminated,
averaged, reduced, and changed in the course of the lifecycle would put the
values of the data into a relative, rather than declarative, mode. This is one of
the points of connection with the interface system and task of exposing the
enunciative workings.

By acknowledging that the displayed entities are not all the entities in
the collection but in fact a representative, yet small, selection, DeXTER
encourages close engagement with the NA graphs; it does not try to
remove uncertainty but points to where it is. At the same time, it recognises
the management of data as an act of constant creation, rather than a mere
observation of neutral phenomena. For example, the process of entity
selection as I described it in Sect. 3.4 created a subset of the most fre-
quently occurring entities distributed proportionately across the different
newspapers. With this intervention, we aimed to alleviate the issue of source
over-representation due to some titles being much larger than others
and to reduce complexity in the resulting network graphs, notoriously
considered as the downside of NA. At the same time, however, this
intervention may cause the least occurring entities to be under-represented
in the visualisations. Thus, the transparent and detailed documentation
of how we intervened on the data that originates the NA visualisations
counterbalances the illusion of neutrality and completeness often conveyed
by ultra-polished NA visualisations.
Another issue of NA data modelling concerns the theoretical assumption
upon which the technique is based. As a bare minimum, a network visual-
isation connects nodes through a line (i.e., edge) that carries information
on the type of relation between the nodes (i.e., attributes). Nodes are
understood as discrete objects, i.e., completely independent from each
other (cfr. Chap. 4); this ultimately means that the nodes are modelled
to remain stable and that the emphasis is on the relations, as these are
believed to provide adequate explanations of social phenomena. However,
this type of modelling arguably paints a rather artificial picture of both
the phenomena and the actors who remain unaffected by the changing
relationships between them. To put it in Drucker’s words:

This is a highly mechanistic characterization of nodes (and edges), whether
they consist of human beings, institutions, or events which reduce[s] all
relationships to the same presentation and make[s] static representations out
of dynamic conditions. (2020, 180)

NA factually transforms continuous (i.e., inseparable) elements such as
cultural actors into discrete and fixed points; this transformation is further
modelled visually, giving the impression of a neutral, exact and observable
description of their entanglement. The possibility to historicise actors and
relations in DeXTER is a concrete example of how the post-authentic
framework to NA aims to counteract this inevitably artificial ‘flattening
effect’. When developing the DeXTER interface, we decided to model
the data points displayed in the graphs according to several parameters
and attributes that reflect a conceptualisation of networks as lively and
dynamic structures. By sliding the time bar (cfr. Fig. 5.8), the analyst can, for example, observe not just how the relationships between entities change over time but also the entities themselves. It is, for instance, possible to explore how entities of interest were mentioned by migrants over time: by selecting/deselecting specific titles (cfr. Fig. 5.9) of different political orientation and geographical location, and by selecting the frequency rate and sentiment polarity (cfr. Fig. 5.10) to observe the prevailing emotional attitude of the sentences in which the entities were mentioned together as well as their frequency of occurrence.

Fig. 5.8 DeXTER default landing interface for NA. The red oval highlights the time bar (historicise feature)

Fig. 5.9 DeXTER default landing interface for NA. The red oval highlights the different title parameters

Fig. 5.10 DeXTER default landing interface for NA. The red ovals highlight the frequency and sentiment polarity parameters
By visualising both entities and relations and by creating dynamic and
interactive NA visualisations, the DeXTER interface on the whole aims
to provide several viewpoints on the same data, and it effectively shows
how several dimensions of observation dramatically affect the graphical
arrangements. In the case of the historicisation feature, for example, as the data is modelled in reference to the documents’ timestamp, the analyst can swipe the time bar on the top left of the interface to explore the changing
relationships between entities over time and/or at specific intervals. This
adds a historical dimension to the networks and allows the analyst to
observe and engage with changes in the graphs interactively as they reflect
how the displayed entities were mentioned by migrants according to
changing temporal parameters. We also added informative tool-tips next
to each available option to encourage close engagement with the interface,
with the process of data creation, with the method of NA itself and the
meanings offered by these parameters (Gaver et al. 2003).
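In implementation terms, the historicisation feature can be thought of as re-filtering a timestamped edge list and rebuilding the graph for each position of the time bar. The sketch below illustrates this principle only; the column names, the toy values and the helper function are assumptions for the example, not DeXTER's actual code.

import pandas as pd
import networkx as nx

# Hypothetical co-occurrence edge list: each row records two entities
# mentioned in the same sentence, with the issue date, the newspaper title
# and a sentiment score for that sentence. Values are illustrative.
edges = pd.DataFrame({
    "source": ["sicilia", "sicilia", "roma"],
    "target": ["roma", "new york", "napoli"],
    "date": pd.to_datetime(["1905-03-01", "1915-06-12", "1925-01-20"]),
    "title": ["cronaca sovversiva", "la libera parola", "cronaca sovversiva"],
    "sentiment": [0.6, -0.2, 0.1],
})

def network_at(start, end, titles=None):
    """Rebuild the graph for the interval (and titles) selected on the interface."""
    window = edges[(edges["date"] >= start) & (edges["date"] <= end)]
    if titles is not None:
        window = window[window["title"].isin(titles)]
    graph = nx.Graph()
    for _, row in window.iterrows():
        graph.add_edge(row["source"], row["target"],
                       sentiment=row["sentiment"], title=row["title"])
    return graph

# Sliding the time bar amounts to calling the same function with a new interval.
g = network_at("1900-01-01", "1910-12-31", titles=["cronaca sovversiva"])
print(list(g.edges(data=True)))

Because both the nodes and the edges are recomputed from the filtered slice, entities themselves can appear and disappear as the interval moves, which is the behaviour described above.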
The post-authentic framework conceptualises ambiguity and uncer-
tainty as intrinsic elements of knowledge creation in the digital; thus, rather
than rejecting them or obscuring them, it preserves them as opportunities
to reduce the reliance on potentially biased methods and to remind us on the whole of the illusion of certainty (Edmond 2019). Applied to NA, this
means creating a space for interpretation, for instance by exposing the data
multi-dimensional complexity (Windhager et al. 2019b; Drucker 2020).
In the DeXTER interface, this was implemented by providing multi-
perspectivity on the same nodes. DeXTER allows users to explore three
types of networks: two entity-focused graphs (i.e., egocentric networks)
and one issue-focused network. We decided to visualise the networks
as egocentric networks for two reasons. Egocentric networks are local
networks with one central node, known as the ego. This type of network
visualises all the nodes directly connected to the ego, i.e., the alters.
Crossley et al. (2015) suggest that one main advantage of egocentric
networks is that they allow for rich visualisations even when all the entities
in a data-set cannot be mapped because of the network’s large size, which is
indeed the case of ChroniclItaly 3.0 as discussed in Chap. 3. Furthermore,
the extensive information provided about the ego may offer a personal perspective on the node and the alters; indeed, thanks to this property, egocentric
networks are often referred to as cognitive networks (Perry et al. 2018).
We therefore chose egocentric network visualisations for their potential
ability to provide relevant material for the study of migration as experienced
and narrated by the migrants themselves. Starting from a selected entity
of their choice, users can explore several parameters: the net of entities
most frequently mentioned in the same sentence as the ego, the prevailing
emotional attitude in those sentences, the number of times entities were
mentioned together and the titles in which they were mentioned. This
information is encoded and made available to the analyst both through
pop-up tool-tips and through the colour of the edges (i.e., pastel blue
for negative sentiment, white for neutral and pastel red for positive).
Figure 5.11 shows the egocentric network for the GPE entity sicilia
(Sicily) across all the titles of the collection as mentioned in sentences
with prevailing positive sentiment. If ego-network is not selected, the
graph additionally displays the relations among the alters. As shown in
Fig. 5.12, the representation of relations can react significantly to the tiniest
modification of parameters (Windhager et al. 2019b); even when the same
node is selected, the overall offered perspective on the relational structure
of the graph can change significantly.

Fig. 5.11 DeXTER: egocentric network for the node sicilia across all titles in the collection in sentences with prevailing positive sentiment

Fig. 5.12 DeXTER: network for the ego sicilia and alters across titles in the collection in sentences with prevailing positive sentiment
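For readers unfamiliar with the structure of these graphs, the sketch below shows how an egocentric network can be derived from a co-occurrence graph with a library such as NetworkX; the toy graph, the attribute names and the toggle logic are illustrative assumptions, not the DeXTER implementation.

import networkx as nx

# Toy co-occurrence graph; node names, weights and sentiment values are
# purely illustrative.
g = nx.Graph()
g.add_edge("sicilia", "roma", weight=12, sentiment=0.4)
g.add_edge("sicilia", "palermo", weight=7, sentiment=0.8)
g.add_edge("roma", "palermo", weight=2, sentiment=0.1)
g.add_edge("roma", "new york", weight=3, sentiment=-0.1)

# The egocentric network keeps the ego and the alters directly connected to it.
ego = nx.ego_graph(g, "sicilia", radius=1)
print(sorted(ego.nodes()))  # 'new york' is excluded: it is not adjacent to the ego

# A stricter 'ego-network only' view drops the edges among the alters,
# mirroring the toggle described above.
ego_only = ego.copy()
ego_only.remove_edges_from([(u, v) for u, v in ego.edges()
                            if "sicilia" not in (u, v)])
print(list(ego_only.edges()))  # only sicilia-roma and sicilia-palermo remain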
The third type of network visualisation (i.e., issue-focused network) allows the exploration of entities starting from a specific issue. Whereas in an egocentric network users observe a network which has an actor/entity of their choice as the focal node, this third visualisation displays the actors
mentioned in specific newspapers on specific days. In this way, the issue-
focused network offers an additional perspective on the same digital object
potentially contributing valuable insights for the analysis of how events and
actors of interest were portrayed by migrants of different political affiliation
and who were based in different parts of the United States. Thus, instead of
offering one obvious meaning, DeXTER offers multiple perspectives, and
by capturing heterogeneous contexts, it creates a tension that the analyst
is encouraged to resolve through independent assessment (Gaver et al.
2003). Figure 5.13 shows the default issue-focused network graph.

Fig. 5.13 DeXTER: default issue-focused network graph
DeXTER’s visualisation of sentiment as an attribute of NA is also guided
by post-authentic principles. As already discussed in Sect. 3.4, SA is a com-
putational technique that aims to identify the prevailing emotional attitude,
i.e., the sentiment, in a given text (or portions of a text); the sentiment is
then typically categorised according to three labels, i.e., positive, negative or neutral. A problematic aspect of the technique is that it presents these
labels as unambiguous, universally accepted categories, providing a neutral
and observable description of reality, and obscuring the highly problematic
and interpretative quality of the very process of establishment of such
categories (cfr. Sect. 3.4) (Puschmann and Powell 2018). The concept of
‘sentiment score’ additionally reinforces the illusion of objectivity, and it
further obfuscates the inherently vague, profoundly subjective dimension
of emotions and their definitions, a process intrinsically open to multiple
interpretations and subject to ambiguity. As a way to acknowledge the
ambiguities of the assumptions behind the technique and of a ‘senti-
ment score’, DeXTER’s graph colouring scheme is fluid and nuanced (as
opposed to solid colours): the colour gradients go from a darker shade
of blue for the lowest score (i.e., negative) to a darker shade of red for
the highest score (i.e., positive). DeXTER’s visual representation of
sentiment results in a deliberately blurred graph, the borders of the edges
are purposely smudged and pale, and pastel shades are preferred over
bright, solid shades; the aim is to openly acknowledge SA as ambiguous,
situated and therefore open to interpretation, rather than precise, neutral
and certain. By exposing these inconsistencies, post-authentic visualisations
on the whole question the main positivist discourse around technology. We
achieved this goal by providing a transparent documentation of how we
identified the sentiment categories, how we aggregated the results, how
we conducted the classification, how we interpreted the scores and how
we rendered them in the visualisation, in the openly available dedicated
GitHub repository which also includes the code, links to the original and
processed material and the files documenting the manual interventions.
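The colouring logic can be pictured as a simple interpolation from a sentiment score to a pastel gradient rather than a lookup of three solid colours. The function below is a minimal sketch of that idea; the exact colour values, and the assumed score range of [-1, 1], are illustrative and do not reproduce DeXTER's actual palette.

def sentiment_colour(score):
    """Map a sentiment score in [-1, 1] to an (r, g, b) tuple on a pastel
    blue-white-red gradient: the more negative the score, the deeper the
    blue; the more positive, the deeper the red; near zero stays pale."""
    pastel_blue, white, pastel_red = (120, 150, 220), (255, 255, 255), (230, 130, 130)
    score = max(-1.0, min(1.0, score))  # clamp to the expected range
    if score < 0:
        start, end, t = white, pastel_blue, -score
    else:
        start, end, t = white, pastel_red, score
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))

print(sentiment_colour(-0.8))  # a fairly dark pastel blue
print(sentiment_colour(0.05))  # close to white, i.e., visually 'uncertain'

Keeping the neutral zone pale rather than assigning it a third solid colour is one way of visually conceding that scores near zero say very little.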
Finally, guided by the post-authentic framework, DeXTER emphasises
the continuous making and re-making of data; this process of forming,
arranging and interpreting data is encoded within the interface itself.
Through the tab ‘Data’, users can at any point access and download the
data behind the visualisations as they reflect users’ selection of filters and
parameters (e.g., title, time interval, frequency, entity). The intention is to disrupt traditional notions that conceptualise data as fixed, unarguable
and defined. At the same time, DeXTER acknowledges the collective
responsibility of building a source of knowledge for current and future
generations, and it frames the process of knowledge creation in the digital
as accountable, unfinished and receptive to alternatives.
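Conceptually, the ‘Data’ tab exports the very slice of the table that the current filters produce, not a fixed master file. A minimal sketch of that behaviour, with assumed column names and filters, might look as follows:

import pandas as pd

# Hypothetical table behind the visualisation; column names and values are
# illustrative only.
edges = pd.DataFrame({
    "source": ["sicilia", "roma"],
    "target": ["roma", "napoli"],
    "title": ["cronaca sovversiva", "la libera parola"],
    "date": pd.to_datetime(["1905-03-01", "1925-01-20"]),
    "freq": [12, 3],
})

# The user's current selection of filters defines the slice they are viewing...
selection = edges[(edges["title"] == "cronaca sovversiva") &
                  (edges["date"].dt.year.between(1900, 1910))]

# ...and that same slice is what gets downloaded, so the exported data always
# documents the interpretative choices (filters) that produced the view.
selection.to_csv("dexter_selection.csv", index=False)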
Through the exploration of several case studies, i.e., the creation,
enrichment, analysis and visualisation of a digital object, this book argues
that new theoretical paradigms are now urgently required; these must be
centred on a reconceptualisation of digital objects as epistemic objects
which themselves carry meanings and which therefore alter the perception
of knowledge created in a digital environment. With specific reference to
visualisations, interfaces and graphic display, the post-authentic framework
that I propose in this book acknowledges them as problematic endeav-
ours embedding a wide net of situated processes which require more
systematic and sophisticated criteria than over-simplistic user-as-consumer and problem-solver models (Windhager et al. 2019a). The recognition of
such complexities accepts and in fact embraces digital knowledge creation
practices as being embedded in extremely convoluted networks of countless
factors at play which cannot be fully trusted nor predicted. The post-
authentic framework therefore recognises the limitations and biases of
specific tools and techniques and exposes problematic processes such
as data creation, selection and manipulation by openly disclosing their
complexities and lifecycle, by thoroughly documenting the decisions and
actions and by allowing users to access the data behind the visualisations,
including making the acts of transformation explicit.
In the post-authentic interface DeXTER, we actualised this by providing
a space for interpretation and individual assessment, by favouring multi-
perspectivity through different types of network visualisations and by
offering dynamic and interactive graphs. This also arguably alleviates the
issue of displaying artificial pictures of social phenomena due to the tech-
nique’s intrinsic properties for which actors remain stable and unaffected
by the relations. While I am not implying that a post-authentic framework
is the perfect approach to digital knowledge creation practices, I do argue
that, by redefining our understanding of the theoretical dimensions of
digital objects, tools, techniques, platforms, interfaces and infrastructures,
especially for humanistic enquiry, the framework offers theoretical and
methodological criteria that recognise the larger cultural relevance of digi-
tal objects, and it provides an urgently needed architecture for issues such as
transparency, replicability, Open Access, sustainability, data manipulation, accountability and visual display.

NOTES
1. A human-in-the-loop method requires human and machine intelligence to create machine learning models. In this approach, humans interact with the algorithm during training by tuning and testing it.
2. https://github.com/DHARPA-Project
3. https://github.com/lorellav/DeXTER-DeepTextMiner

Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/
4.0/), which permits use, sharing, adaptation, distribution and reproduction in any
medium or format, as long as you give appropriate credit to the original author(s)
and the source, provide a link to the Creative Commons license and indicate if
changes were made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line
to the material. If material is not included in the chapter’s Creative Commons
license and your intended use is not permitted by statutory regulation or exceeds
the permitted use, you will need to obtain permission directly from the copyright
holder.
CHAPTER 6

Conclusion

Philosophers until now have only interpreted the world in various ways.
The point, however, is to change it. (Karl Marx, 1846)

As technology changes, society changes and so the way society produces knowledge and culture also changes. Yet, the predominant model of
knowledge production continues to be one bound to the epistemology of
last century’s industrial societies. In this book, I argued that to respond
to the radical changes brought by the digital transformation of society
and aggravated by the 2020 pandemic, the current model of knowledge
creation must urgently be re-theorised. This means, I contended, pushing
beyond mere observations of how higher education has been transitioning
towards the digital and recognise that a more fundamental question needs
to be asked. For example, it is no longer sufficient to reflect on how
the digital transformation has required teachers to rapidly acquire digital
skills to adapt and rethink their learning methods, or how the digital has
affected branches of knowledge (e.g., humanities) or individual disciplines
(e.g., history), or how differently academics now think about sharing their
research findings (e.g., end-users) or how their research is increasingly
dominated by data rather than by sources, including having to consider
issues of storage, archival, transparency, etc. A different critical awareness
is now required: the shift has been in—as opposed to towards—the digital.
Claiming that the shift has been in the digital acknowledges conclusively
that the digital is now integral to not only society and its functioning, but crucially also to how society produces knowledge and culture. My argument for a new model of knowledge production therefore starts
from recognising that persisting binary modulations in relation to the
digital—for example, between digital knowledge creation and non-digital
knowledge creation—are no longer relevant in that they continue to
suggest artificial, irrelevant divisions. Such divisions, I contended, not
only slow down progress and hinder knowledge advancement, but by
fragmenting expertise, they sustain a model of knowledge that does not
adequately respond to a reality complexified by the digital. It has been
the argument of this book that the digital transformation of society
requires a more problematised understanding of the digital as an organic
entity that brings multiple levels of complexity to reality, many of which
have unpredictable consequences. Our traditional model of knowledge
creation based on single discipline perspectives, hierarchical divisions and
competition is no longer suited to meet the unprecedented challenges
facing societies in the digital.
In this book, I developed a new theoretical and methodological frame-
work, the post-authentic framework, which critiques dominant positivistic
and deterministic views of technology and computational methods and
offers new terminologies, concepts and approaches in reference to the
digital, digital objects and practices of knowledge production in the digital.
The post-authentic framework breaks with dialectical principles of dualism
and antagonism and with the rigid model of knowledge creation that
divides knowledge into disciplines and disciplines into two areas: the
sciences and the humanities. Dual notions of this kind, I argued, are
complicit in an assiduously cultivated discourse that has historically exalted
digital methods as exact, rigorous, neutral, more relevant and funding-
worthy than critical approaches. This includes the cosy and reassuring
myths that data is unarguable, bias-free, precise and reliable, as opposed
to sources and human consciousness which have been more and more
sidelined as carriers of biases, unreliability and inequality.
My reframing of the digital through the post-authentic framework
helps us recognise that the narrative simplification around computational
techniques and the sidelining of consciousness cannot be allowed to continue
because knowledge does not respect the limits of disciplines and the impli-
cations of being in the digital transcend such artificial boundaries. This is a
reality we can no longer ignore and which can only be confronted through a
reconfigured model of knowledge creation that would reconceptualise it as
happening in the digital. The world has entered a new dimension in which
higher education can no longer afford to opportunistically see technology and its production as instrumental and contextual to knowledge and
teaching or simply as an object of critique, admiration, fear or envy. The
post-authentic framework that I proposed in this book functions as a radical
critique of such outdated conceptualisations of the digital and argues that
the current model of knowledge creation with its established boundaries
between disciplines and specialisations is not suited to respond to the
complex challenges of a world in the digital.
Instead, the framework advocates a notion of knowledge as fluid, in
which differences are not rejected but welcomed according to the principles
of symbiosis and mutualism (cfr. Sect. 2.2). Symbiosis and mutualism
oppose models of reality that support individualism and separateness as
inevitably leading to conflict and competition; one such model of reality
is the division of knowledge into monolithic disciplines. Borrowed from
biology, the concept of symbiosis breaks with the current conceptualisation
of knowledge as separate, linear and fragmented into multiple disciplines
and that of the digital as a static, inconsequential entity. To the con-
trary, symbiosis evokes ideas of close and long-term cooperation between
different organisms and the continual renegotiation of interactions; past,
present and future systems; power relations; infrastructures; interventions;
curations and curators; programmers and developers.
Mutualism opposes interspecific competition, that is, when organisms
from different species compete for a resource, resulting in benefiting only
one of the actors involved. I maintained that our model of knowledge
creation based on hierarchical separations between disciplines resembles an
interspecific competition dynamic as it has forced knowledge production
to operate within a space of conflict and competition. This model, I con-
tended, is outdated and inadequate, it traps curiosity into rigid categories,
and it is unsuited to rethink and explain the transformative effect the digital
is having on our culture and society; to use Virginia Eubanks’ words,
it contributes to automating inequality and it can therefore make society
worse. I therefore argued that any re-modulation still operating within the
current disciplinary model of knowledge creation is no longer sufficient;
to this end, I proposed the notions of symbiosis and mutualism to help
us reconceptualise knowledge as fluid and inseparable. Symbiosis and
mutualism shape a model in which curiosity is finally given the long overdue
free rein, in which the different areas of knowledge do not compete against
each other but benefit from a mutually compensating relationship. When
asking ourselves the questions ‘How do we produce knowledge today?’
and ‘How do we want our next generation of students to be trained?’, the concepts of symbiosis and mutualism may guide our answers.
Symbiosis and mutualism are central notions also for the development
of a more problematised conceptualisation of digital objects and digital
knowledge production. The post-authentic framework re-examines the
digital as situated and partial, an extremely convoluted assemblage of
factors and actors, themselves part of wider networks of situated com-
ponents, processes and mechanisms of interaction and the various forms
of power embedded in computational processes and beyond. As such,
far from being mere immaterial copies of the originals, digital objects
are acknowledged as bearing consequences which transcend traditional
questions of authenticity; digital objects are never finished nor can they be
finished; countless versions can endlessly be created through processes that
are shaped by past decisions and in turn shape the following ones. Thus, the
post-authentic framework engages with both products and processes which
are understood as never neutral, as incorporating external, situated systems
of interpretation and management and therefore bearing consequences
which go beyond the object-centred culture of authenticity.
To exemplify this complexity of conflating humans, entities and pro-
cesses and past, present and future experiences, I used ChroniclItaly 3.0,
a digital cultural heritage collection of Italian American newspapers pub-
lished between 1898 and 1936. Specifically, I examined and illustrated how
the application of the post-authentic framework can inform the creation,
enrichment, analysis and visualisation of a digital object. By redefining our
understanding of both the conceptual and concrete dimensions of digital
objects, tools and techniques, the post-authentic framework provides
theoretical and methodological criteria that recognise the larger cultural
relevance of digital objects and the methods to create them, analyse them
and visualise them; it affords an architecture for issues such as transparency,
replicability, Open Access, sustainability, data manipulation, accountability
and visual display.
Central to the framework is the recognition that illusory, positivistic
notions of the digital are ill-suited for the problems of the digital societies
we live in. The post-authentic framework exposes aspects of knowledge
creation in the digital that oppose both the mainstream fetishisation of big data and algorithms and an unproblematised understanding of the digital; it addresses issues such as ambiguity and uncertainty, and
the subjective and interpretative dimension of collecting, selecting, cat-
egorising and aggregating, i.e., the act of creating data. In pursuing
my case for a novel model of knowledge creation in the digital, in the book, I presented a range of personal case studies and examined how the
application of the framework in my own work helped me address aspects
of knowledge creation in the digital such as transparency, documentation
and reproducibility; questions about reliability, authenticity and biases;
and engaging with sources through technology. Using ChroniclItaly 3.0
as digital object, I applied the post-authentic framework to a variety of
applied contexts such as digital heritage practices, digital linguistic injustice,
critical digital literacy and critical digital visualisation and I devoted specific
attention to four key aspects of knowledge creation in the digital: creation
of a digital object in Chap. 2, enrichment of a digital object in Chap. 3,
analysis of a digital object in Chap. 4 and visualisation of a digital object
in Chap. 5. This auto-ethnographic and self-reflexive approach allowed
me to show how a re-examination of digital knowledge creation can no
longer be achieved from a distance, but only from the inside. Ultimately,
the book demonstrated that it is only through the conscious awareness of
the delusional belief in the neutrality of data, tools, methods, algorithms,
infrastructures and processes that the biases embedded in these systems and
amplified by their ubiquitous use can in fact be identified and addressed.
In Chap. 3, for example, I showed how from pre-processing to data
augmentation, the application of the post-authentic framework to the
task of enriching digital material can guide each action of an enrichment
workflow. Using the case examples of DeXTER and ChroniclItaly 3.0
(Viola and Fiscarelli 2021a) and informed by symbiosis and mutualism,
Chap. 3 illustrated how the post-authentic framework can guide the inter-
action with the digital, not as a strategic (grant-oriented) or instrumen-
tal (task-oriented) collaboration but as a cognitive mutual contribution.
In particular, I unpacked the ambiguities and uncertainties of methods
such as optical character recognition (OCR), named entity recognition
(NER), geolocation and sentiment analysis (SA) and showed how the
post-authentic framework can help address these challenges, for instance,
through a thorough understanding of the assumptions behind these tech-
niques, constant update and critical supervision. The framework recognises
curatorial practices as manipulative interventions which especially in the
case of cultural heritage material, bear the consequence of being a source
of knowledge for current and future generations.
This book was also a reflection on the implications of the digital trans-
formation for our perception of the world. Drawing on the mathematical
concepts of discrete vs continuous modelling of information (cfr. Chap. 4),
I discussed some of the repercussions of the transformation of continuous material into discrete form due to the discretisation of society, that is,
binary sequences of 0s and 1s, especially consequential for the notions of
causality and correlations in relation to knowledge creation. In discrete
systems, causality is hidden because information is discretised into exact
and separate points, which must be categorised and made explicit. As a
result, we are given a digitally mediated image of the world, meaning
that the relational causality of continuous information is replaced by
predictions of correlations. Thus, societies in the digital in which the ‘big
data philosophy’ reigns, I argued, are offered countless patterns but no
explanations for them. We, the digital citizens, are left to deal with a
patterned, yet a-causal, way of making sense of reality.
Closely related to this point is the use of metaphorical language to name
computational techniques, such as topic modelling, sentiment analysis
and machine learning (ML); this phenomenon can be seen as a way to
make sense of an a-causal reality. Indeed conflating specific mathematical
concepts such as discrete vs continuous modelling of information with
such familiar notions has created reassuring expectations that machines can learn to understand language and somehow provide neutral, precise
and understandable accounts from large quantities of textual material. In
the case of SA, this altered image is that the subjectivity of human emotions
can be reduced to two/three categories and quantified according to prob-
abilistic calculations; in the case of ML, the unique, holistic human process
of experiential learning and of connecting logic with contextual factors
is discretised into probability scores of huge, yet partial, quantities of
discrete data; in the case of topic modelling, the text itself disappears and
so does its continuous structure, i.e., the wider context that produced
it. The computational disassembling of the causal structure by the dualistic
system of 0s and 1s hides the original continuous nature to which the data
refers. The use of metaphorical language such as ‘sentiment’, ‘learning’
and ‘topic’, I argued, has therefore certainly contributed to making these
methods extremely popular, especially outside their fields of origin, but at
the same time, by obfuscating the precise mathematical laws upon which
these techniques are based, it has created unrealistic beliefs.
The post-authentic framework can be a useful tool to guide the unpack-
ing of properties and assumptions of computational techniques used to
analyse a digital object. Using topic modelling as an example, in Chap. 4,
I showed how the framework can be applied to engage critically with
software. At the core of the framework is the importance of maintaining
a close connection with the digital object; for example, in the chapter,
I stressed how aspects such as pre-processing, corpus preparation and
choosing the number of topics typically reputed as unproblematic are in
fact fundamental moments within a topic modelling workflow in which
the analyst is required to make countless choices. The example of topic
modelling demonstrates how the post-authentic framework can guide the
exploration, questioning and challenging of the interpretative potential of
computation.
Operating within the post-authentic framework crucially means
acknowledging digital objects as living entities that have far-reaching,
unpredictable consequences; the continually changing complexity of
nets involving processes and actors must therefore always be critically
supervised. The visualisation of a digital object is one such process.
The post-authentic framework opposes an uncritical adoption of digital
methods and points to the intrinsic dynamic, situated, interpreted and
partial nature of the digital. Despite being often employed as exact ways
of presenting reality, visualisations are extremely ambiguous techniques
which embed numerous human decisions and judgement calls. In
Chap. 5, I illustrated how the post-authentic framework can be applied to
visualisation by discussing two examples: efforts towards the development
of a user interface (UI) for topic modelling and the design choices for
developing the app DeXTER, the interactive visualisation interface to
explore ChroniclItaly 3.0. I specifically centred my discussion on how
the ambiguities and uncertainties of topic modelling, network analysis
(NA) and SA can be encoded visually. A key notion of the post-authentic
framework is the acknowledgement of curatorial practices as manipulative
interventions and of how it is in fact through exposing the ambiguities and
uncertainties that knowledge creation in the digital can be kept honest and
accountable for current and future generations.
Through the application of the post-authentic framework to these four
case examples, the book aimed to show how an uncritical and naïve
approach to the use of computational methods is bound to reproduce
the very opaque processes that the publicised algorithmic discourse claims
to break, but more worryingly, it contributes to making society worse. The
book was therefore also a contribution to working towards systemic change
in knowledge creation practices and, by extension, in society at large; it
provided a new set of notions and methods that can be implemented when
collecting, assessing, reviewing, enriching, analysing and visualising digital
material. It is this more problematised notion of the digital conceptualised
in the framework that highlights how its transcending nature makes old
dichotomies between digital knowledge creation and non-digital knowl-
edge creation no longer relevant and in fact, harmful.
The digitisation of society, already well on its way before the COVID-19 pandemic but certainly brought to its non-reversible turning point by the 2020 health crisis, has brought into sharper focus how the digital
exacerbates existing fractures and disparities in society. Unable to deal
adequately with the complexity of society and social change, the cur-
rent model of knowledge creation urgently requires a re-theorisation.
This book is therefore a wake-up call for understanding the digital as
no longer contextual to knowledge creation and for recognising that a
discipline compartmentalisation model sustains an anachronistic and ill-equipped way to encapsulate and explain society. All information is now
digital and algorithms are more and more central nodes of knowledge
and culture production with an increased capacity to shape society at
large. As digital vs non-digital positions have entirely lost relevance, it
has become increasingly futile to create ultra-specialised disciplines from
other disciplines’ overlapping spaces or indeed to invest energy in trying
to define those, such as in the case of DH; the digital transformation
has magnified the inadequacy of a mono-perspective approach, legacy
of a model of knowledge that compartmentalises competing disciplines.
Scholars, researchers, universities and institutions must acknowledge the
central role they have to play in assessing how knowledge is created not
just today, but also for future generations.
The new theoretical and methodological framework that I proposed in
this book moves beyond the current static conceptualisation of knowledge
production which praises interdisciplinarity but forces knowledge into rigid
categories. To the contrary, the framework offered novel concepts and
terminologies that break with dialectical principles of dualism and antago-
nism, including dichotomous notions of digital vs non-digital, sciences vs
the humanities, authentic vs non-authentic and computational/neutral vs
non-computational/biased. The re-devised notions, practices and values
that I offered help re-figure the way in which society conceptualises data,
technology, digital objects and the process of knowledge creation in the
digital.
My re-examination of the current model of knowledge includes not just
scholarship but pedagogy too. And whilst this is not the main focus of
this book, the arguments I put forward here for scholarship equally apply
to pedagogy. In order to achieve systemic change, academic programmes
must be updated to include opportunities for critical reflections on the pressing issues stemming from the ubiquitous underpinning of AI in our
societies. Through real use cases similar to those illustrated throughout the
chapters of this book, students would learn about the deep implications of
digital technologies on contemporary culture and society. In the words
of Timnit Gebru, the research scientist who was recently fired by Google
after exposing how strongly biased Google’s AI systems are (Bender et al.
2021), ‘The people creating the technology are a big part of the system. If
many are actively excluded from its creation, this technology will benefit a
few while harming a great many’. Indeed, as technology is a central locus
of knowledge and culture production and AI technology in particular is
dominated by a white, mostly male workforce, the culture that is produced
replicates the biases of the almost entirely male, predominantly white
workforce that is building it.
Although there may not be any initial intention of using biased models,
tech companies become immediately accountable, at least from an ethical
perspective if not yet a legal one, as soon as they refuse to acknowledge
and correct such biases even when these are clearly exposed. If it is true
that governments are spectacularly behind in creating rules for the ethical
use of this technology, it is equally true that big tech companies shouldn’t
wait for laws to be passed. Because of the serious social repercussions of
the technology they create, they have a responsibility to bring this issue to the centre of their organisations. Meanwhile, universities also have a
responsibility to train in ethical digital management the next generation
of thinkers, scholars and academics as well as of digital citizens at large.
Equally, research funding agencies must specifically require that the issue
of digital ethics is explicitly addressed by researchers in their projects, for
instance, by demanding a digital critical component in their proposals. As
users and co-producers of technology, our responsibility is to counterbal-
ance the main AI discourse with new, more honest narratives, to critically
reflect on how we are producing knowledge today and for tomorrow, and
on how we educate the next generation of students and digital citizens. The post-authentic framework of knowledge creation in the digital provides a way to communicate and incorporate values of
honesty, accountability, transparency and sustainability into knowledge. It
reminds us that a racist, sexist, homophobic digital society is not so much
a reflection of human subjectivity in data and algorithms but proof of its
pretend absence.

Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/
4.0/), which permits use, sharing, adaptation, distribution and reproduction in any
medium or format, as long as you give appropriate credit to the original author(s)
and the source, provide a link to the Creative Commons license and indicate if
changes were made.
The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line
to the material. If material is not included in the chapter’s Creative Commons
license and your intended use is not permitted by statutory regulation or exceeds
the permitted use, you will need to obtain permission directly from the copyright
holder.
REFERENCES

ACHS (2012) Manifesto. https://www.criticalheritagestudies.org/history


Albers L, Große P, Wagner S (2020) Semantic data-modeling and long-term
interpretability of cultural heritage data—three case studies. In: Kremers H (ed)
Digital cultural heritage. Springer International Publishing, Cham, pp 239–253.
https://doi.org/10.1007/978-3-030-15200-016
Alenezi M (2021) Deep dive into digital transformation in higher education insti-
tutions. Educ Sci 11(12):770. https://doi.org/10.3390/educsci11120770,
https://www.mdpi.com/2227-7102/11/12/770. Number: 12. Publisher:
Multidisciplinary Digital Publishing Institute
Allington D, Brouillete S, Golumbia D (2016) Neoliberal tools (and archives):
a political history of digital humanities. Los Angeles Review of Books.
https://lareviewofbooks.org/article/neoliberal-tools-archives-political-
history-digital-humanities/. Section: essay
Autor DH, Levy F, Murnane RJ (2003) The skill content of recent technological
change: an empirical exploration. Q J Econ 55
Bail C (2018) Topic modeling. https://cbail.github.io
Baker N (2002) Double fold: libraries and the assault on paper, 1st edn. Vintage
Books, New York
Bakewell O, Binaisa N (2016) Tracing diasporic identifications in Africa’s urban
landscapes: evidence from Lusaka and Kampala. Ethn Racial Stud 39(2):280–
300. https://doi.org/10.1080/01419870.2016.1105994

Bandiera O, Rasul I, Viarengo M (2013) The making of modern America: migratory flows in the age of mass migration. J Dev Econ 102:23–47. https://doi.org/10.
1016/j.jdeveco.2012.11.005, https://linkinghub.elsevier.com/retrieve/pii/
S0304387812000995
Barceló-Coblijn L, Serna Salazar D, Isaza G, Castillo Ossa LF, Bedia MG (2017)
Netlang: a software for the linguistic analysis of corpora by means of complex
networks. PLOS ONE 12(8):e0181341. https://doi.org/10.1371/journal.
pone.0181341, https://dx.plos.org/10.1371/journal.pone.0181341
Barrett JR, Roediger D (1997) Inbetween peoples: race, nationality and the “new
immigrant” working class. J Am Ethn Hist 16(3):3–44. OCLC: 5543891291
Beals M, Bell E (2020) The atlas of digitised newspapers and metadata: reports
from oceanic exchanges. Technical report, Transatlantic Partnership for Social
Sciences and Humanities 2016 Digging into Data Challenge. Artwork Size:
1225056 Bytes. Publisher: figshare
Beardsworth R (1996) Derrida and the political. Routledge, Milton Park
Bender EM, Gebru T, McMillan-Major A, Shmitchell S (2021) On the dangers
of stochastic parrots: can language models be too big? In: Proceedings of the
2021 ACM conference on fairness, accountability, and transparency. ACM, New
York, Virtual Event Canada, pp 610–623. https://doi.org/10.1145/3442188.
3445922, https://dl.acm.org/doi/10.1145/3442188.3445922
Benjamin W (1939) The work of art in the age of mechanical reproduction. In:
Cazeaux C (ed) The continental aesthetics reader, 2nd edn. Routledge, Milton
Park, pp 429–450. https://doi.org/10.4324/9781351226387-29
Berry D (2014) Post-digital humanities: computation and cultural critique in the
arts and humanities. EDUCAUSE Rev 49(3):22–26. https://er.educause.
edu/articles/2014/5/postdigital-humanities-computation-and-cultural-
critique-in-the-arts-and-humanities
Berry DM, Dieter M (eds) (2015) Postdigital aesthetics: art, computation and
design. Palgrave Macmillan, New York
Berry DM, Fagerjord A (2017) Digital humanities: knowledge and critique in a
digital age. Polity, Cambridge Malden. OCLC: 986233467
Bianco J (2012) Chapter 7: this digital humanities which is not one. In: Debates
in the digital humanities. Manifold Scholarship, University of Minnesota.
https://dhdebates.gc.cuny.edu/read/untitled-88c11800-9446-469b-a3be-
3fdb36bfbd1e/section/1a1c7fbc-6c4a-4cec-9597-a15e33315541
Biggs N, Lloyd EK, Wilson RJ (1986) Graph theory, 1736–1936. Clarendon Press,
Oxford
Bjork U (1998) History of the Mass Media in the US: an Encyclopedia Chicago.
Ethnic Press, Fitzroy, pp 207–209
Blei DM (2012) Probabilistic topic models. Commun ACM 55(4):77–
84. https://doi.org/10.1145/2133806.2133826, https://dl.acm.org/doi/
10.1145/2133806.2133826

Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res
3(Jan):993–1022
Bobicev V, Sokolova M (2018) Thumbs up and down: sentiment analysis of medical
online forums. In: EMNLP 2018. https://doi.org/10.18653/v1/W18-5906
Boccagni P, Schrooten M (2018) Participant observation in migration studies:
an overview and some emerging issues. In: Zapata-Barrero R, Yalaz E (eds)
Qualitative research in european migration studies, imiscoe research series.
Springer International Publishing, Cham, pp 209–225. https://doi.org/10.
1007/978-3-319-76861-8-12
Boyd D (2016) Undoing the neutrality of big data. Fla Law Rev Forum 67:226–
232
Boyd D (2018) Don’t believe every ai you see. https://www.newamerica.org/pit/
blog/dont-believe-every-ai-you-see/
Boyd Davis S, Vane O, Kräutli F (2021) Can I believe what I see? Data visualization
and trust in the humanities. Interdiscip Sci Rev 46(4):522–546, https://
doi.org/10.1080/03080188.2021.1872874, https://www.tandfonline.com/
doi/full/10.1080/03080188.2021.1872874
Braidotti (2017) Posthuman critical theory. J Posthuman Stud 1(1):9. https://
doi.org/10.5325/jpoststud.1.1.0009
Braidotti R (2019) A theoretical framework for the critical posthumanities. Theory
Cult Soc 36(6):31–61. https://doi.org/10.1177/0263276418771486
Braidotti R, Fuller M (2019) The posthumanities in an era of unexpected
consequences. Theory Cult Soc 36(6):3–29. https://doi.org/10.1177/
0263276419860567
Branch JW, Burgos D, Serna MDA, Ortega GP (2020) Digital transformation in
higher education institutions: between myth and reality. In: Burgos D (ed)
Radical solutions and eLearning. Lecture Notes in Educational Technology.
Springer Singapore, Singapore, pp 41–50, https://doi.org/10.1007/978-
981-15-4952-6_3
Brennan T (2017) The digital humanities bust. The Chronicle of Higher
Education, Washington. https://www.chronicle.com/article/the-digital-
humanities-bust/. Section: The Review
Brodkin K (1998) How Jews became white folks and what that says about race in
America. Rutgers University Press, New Brunswick. OCLC: 237334921
Bronstein JL (ed) (2015) Mutualism, 1st edn. Oxford University Press, Oxford.
OCLC: ocn909326420
Brown VA, Harris JA, Russell JY (eds) (2010) Tackling wicked problems through
the transdisciplinary imagination. Earthscan, London
Brown-Martin G, Yiannouka SN, Tavakolian N (2014) Learning {re}imagined:
how the connected society is transforming learning. Bloomsbury Qatar Foun-
dation Publishing, Qatar. OCLC: 905268079
Bush V (1945) As we may think. The Atlantic, Washington. https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/.
Section: Technology
Butler B (2007) Return to Alexandria: an ethnography of cultural heritage,
revivalism, and museum memory. Publications of the Institute of Archaeol-
ogy, University College London. Left Coast Press, Walnut Creek. OCLC:
ocn163603253
Cai G, Sun F, Sha Y (2018) Interactive visualization for topic model curation. In:
CEUR workshop proceedings of the IUI workshops, vol 2068
Calude CS, Longo G (2017) The deluge of spurious correlations in big data. Found
Sci 22(3):595–612. https://doi.org/10.1007/s10699-016-9489-4
Cameron F (2007) Beyond the cult of the replicant: museums and historical digital
objects—traditional concerns, new discourses. In: Cameron F, Kenderdine S
(eds) Theorizing digital cultural heritage. The MIT Press, Cambridge, pp 49–
75. https://doi.org/10.7551/mitpress/9780262033534.003.0004
Cameron F (2021) The future of digital data, heritage and curation in a more-than-
human world. Routledge, Abingdon
Cameron F, Kenderdine S (eds) (2007) Theorizing digital cultural heritage:
a critical discourse. Media in transition. MIT Press, Cambridge. OCLC:
ocm62888361
Čapek M (1961) The philosophical impact of contemporary physics. Van Nostrand
Chaney AJB, Blei DM (2012) Visualizing topic models. In: ICWSM, p 4
Chang J, Gerrish S, Wang C, Boyd-graber J, Blei D (2009) Reading tea leaves:
how humans interpret topic models. In: Bengio Y, Schuurmans D, Lafferty J,
Williams C, Culotta A (eds) Advances in neural information processing systems,
vol 22. Curran Associates Inc., Red Hook
Chatzimparmpas A, Martins RM, Jusufi I, Kerren A (2020) A survey of surveys
on the use of visualization for interpreting machine learning models. Inf
Vis 19(3):207–233. https://doi.org/10.1177/1473871620904671, http://
journals.sagepub.com/doi/10.1177/1473871620904671
Chomsky N, Smith N (2000) New horizons in the study of language
and mind, 1st edn. Cambridge University Press, Cambridge. https://
doi.org/10.1017/CBO9780511811937, https://www.cambridge.org/core/
product/identifier/9780511811937/type/book
Christin A (2016) From daguerreotypes to algorithms: machines, expertise,
and three forms of objectivity. ACM SIGCAS Comput Soc 46(1):27–
32. https://doi.org/10.1145/2908216.2908220, https://dl.acm.org/doi/
10.1145/2908216.2908220
Chuang J, Manning CD, Heer J (2012) Termite: visualization techniques for
assessing textual topic models. In: Proceedings of the international work-
ing conference on advanced visual interfaces – AVI ’12. ACM Press, Capri
Island, p 74. https://doi.org/10.1145/2254556.2254572, http://dl.acm.org/citation.cfm?doid=2254556.2254572
Chubin DE, Porter AL, Rossini FA, Connolly T (eds) (1986) Interdisciplinary
analysis and research: theory and practice of problem-focused research and
development. Lomond, Mt. Airy
Chun WHK, Grusin R, Jagoda P, Raley R (2016) The dark side of the dig-
ital humanities. In: Debates in the digital humanities, Chapter 38. Mani-
fold Scholarship, Minnesota. https://dhdebates.gc.cuny.edu/read/untitled/
section/ca35736b-0020-4ac6-9ce7-88c6e9ff1bba
Churchman W (1967) Wicked problems. Manag Sci 14(4):B–141–B–146.
https://doi.org/10.1287/mnsc.14.4.B141, http://pubsonline.informs.org/
doi/abs/10.1287/mnsc.14.4.B141
Clark T (1992) Derrida, Heidegger, Blanchot: sources of Derrida’s notion and
practice of literature. Cambridge University Press, Cambridge
Cohron M, Cummings S, Laroia A, Yavar E (2020) COVID-19 is accelerating
the rise of the digital economy. Digital transformation in the pandemic &
post-pandemic era. Technical report. BDO International. https://www.bdo.
co.za/getattachment/Insights/2020/COVID19/Covid-19-is-accelerating-
the-rise-of-the-digital-e/ADV_DTS_COVID-19-is-Accelerating-the-Rise-
of-the-Digital-Economy_Web.pdf.aspx?lang=en-ZA
Collins H, Evans R, Gorman M (2007) Trading zones and interactional
expertise. Stud Hist Philos Sci A 38(4):657–666. https://doi.org/10.
1016/j.shpsa.2007.09.003, https://linkinghub.elsevier.com/retrieve/pii/
S003936810700060X
Connell WJ, Gardaphé FL (eds) (2010) Anti-Italianism: essays on a prejudice.
Palgrave Macmillan, New York. OCLC: 755003583
Connell WJ, Pugliese SG (eds) (2018) The Routledge history of Italian Americans.
The Routledge Histories. Routledge, New York
Cook G, Van Horn J (2011) How dirty is your data? A look at the energy choices
that power cloud computing. Technical report. Greenpeace International, Ams-
terdam. https://www.greenpeace.org/static/planet4-international-stateless/
2011/04/4cceba18-dirty-data-report-greenpeace.pdf
Cordell R, Beals M, Galina I, Priewe M, Priani E, Salmi H, Verheul J, Alegre R,
Koch S, Hauswedell T, Fyfe P, Hetherington J, Lorang E, Nivala A, Pado S,
Soh LK, Terras M (2017) Oceanic exchanges. Technical report. Transatlantic
partnership for social sciences and humanities 2016 digging into data challenge.
https://osf.io/wa94s/. Publisher: Open Science Framework
Council on Library and Information Resources (2000) Authenticity in a digital
environment. Council on Library and Information Resources, Washington.
OCLC: ocm44417224
Crawford K (2021) Atlas of AI: power, politics, and the planetary costs of artificial
intelligence. Yale University Press, New Haven. OCLC: on1111967630
Croft W, Cruse DA (2004) Cognitive linguistics, 1st edn. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511803864
Crossley N, Bellotti E, Edwards G, Everett MG, Koskinen J, Tranmer M (2015)
Social network analysis for ego-nets. SAGE Publications Ltd., London. OCLC:
ocn918993384
Crymble A (2021) Technology and the historian: transformations in the digital
age. Topics in the digital humanities, University of Illinois Press, Urbana
Cunningham A, Jaskov H, Takats S, Viola L (2022) Introducing the DHARPA
project: an interdisciplinary lab to enable critical DH practice. DHBenelux J
4(1):29–41. In: Viola L, Prokic J, Fokkens A, Caselli Tommaso (eds) Special
issue: The Humanities in a Digital World
Dancygier B, Sweetser E (eds) (2012) Viewpoint in language: a multi-
modal perspective. Cambridge University Press, Cambridge. https://doi.
org/10.1017/CBO9781139084727, http://ebooks.cambridge.org/ref/id/
CBO9781139084727
Deegan M, McCarty W (eds) (2011) Collaborative research in the digital human-
ities. Ashgate Pub, Farnham
De Fina A, Tseng A (2017) Narrative in the study of migrants. In: Canagarajah S
(ed) The routledge handbook of migration and language. Routledge, London,
pp 381–396
de Laplace P (1820) Théorie analytique des probabilités. Oeuvres, V. Courcier
Derrida J (1994) Specters of marx: the state of the debt, the work of mourning and
the new international. Routledge, Milton Park
Deschamps B (2007) La stampa etnica negli Stati Uniti: Tra nostalgia nazionale,
ricostruzione dell’identità e alternativa culturale. In: Pompeo F (ed) La società di
tutti: multiculturalismo e politiche dell’identità, Meltemi, Roma, pp 151–168.
OCLC: 762297797
DeVaney J, Shimshon G, Rascoff M, Maggioncalda J (2020) How can universities
adapt during COVID-19? Technical report. COURSERA. https://www.
timeshighereducation.com/sites/default/files/how-can-universities-adapt-
covid19_whitepaper.pdf
Diehl DK (2019) Language and interaction: applying sociolinguistics to social
network analysis. Qual. Quant. 53(2):757–774, https://doi.org/10.1007/
s11135-018-0787-5, http://link.springer.com/10.1007/s11135-018-0787-
5
Dinsman M (2016) The digital in the humanities: a special interview series. Los
Angeles Review of Books. https://lareviewofbooks.org/feature/the-digital-
in-the-humanities/
Dobson JE (2019) Critical digital humanities: the search for a methodology.
University of Illinois Press, Champaign. OCLC: 1096281560
Donaldson C, Gregory IN, Taylor JE (2017) Locating the beautiful, pic-
turesque, sublime and majestic: spatially analysing the application of aes-
thetic terminology in descriptions of the English Lake District. J Hist Geogr 56:43–60. https://doi.org/10.1016/j.jhg.2017.01.006, https://linkinghub.
elsevier.com/retrieve/pii/S0305748817300178
Drucker J (2011) Humanities approaches to graphical display. Digit Humanit
Q 5(1). http://www.digitalhumanities.org/dhq/vol/5/1/000091/000091.
html
Drucker J (2013) Performative materiality and theoretical approaches to interface.
Digit Humanit Q 7(1). ISBN: 1938-4122
Drucker J (2014) Visualizing temporality: modelling time from the textual record.
http://www.qmul.ac.uk/events/items/2014/119740.html
Drucker J (2020) Visualization and interpretation: humanistic approaches to
display. The MIT Press, Cambridge
Drucker J, Nowviskie B (2004) Speculative computing: aesthetic provocations
in humanities computing. In: Schreibman S, Siemens R, Unsworth J (eds) A
companion to digital humanities. Blackwell, Oxford
Du K (2019) A survey on LDA topic modeling in digital humanities. In: DH2019,
DataverseNL. OCLC: 8556919715
Düring M, Wieneke L, Croce V (2015) Interactive networks for digital cultural her-
itage collections – scoping the future of histograph. In: Cimiano P, Frasincar F,
Houben GJ, Schwabe D (eds) Engineering the web in the big data era, vol 9114.
Lecture Notes in Computer Science. Springer International Publishing, Cham,
pp 613–616. https://doi.org/10.1007/978-3-319-19890-3_41
Earhart AE (2019) “Chapter 18: can information be unfettered? Race and the new
digital humanities canon. In: Gold MK (ed) Debates in the digital humanities.
University of Minnesota Press, Minneapolis
Edmond (2019) Strategies and recommendations for the management of uncer-
tainty in research tools and environments for digital history. Informatics
6(3):36. https://doi.org/10.3390/informatics6030036, https://www.mdpi.
com/2227-9709/6/3/36
Electronic Frontier Foundation (2021) Search the data – atlas of surveillance.
https://atlasofsurveillance.org
Enumerate Observatory (2017) Europeana DSI 2– access to digital resources of
European Heritage. Technical report. Europeana. https://pro.europeana.eu/
Eubanks V (2017) Automating inequality: how high-tech tools profile, police, and
punish the poor, 1st edn. St. Martin’s Press, New York
European Center for Digital Competitiveness (2021) Digital riser report
2021. Techncial report. European Center for Digital Competitiveness, ESCP
Business School. https://digital-competitiveness.eu/wp-content/uploads/
Digital_Riser_Report-2021.pdf
European Commission (2011) Commission recommendation of 27 October 2011
on the digitisation and online accessibility of cultural material and digital
preservation. Official Journal of the European Union. https://eur-lex.europa.
eu/legal-content/EN/TXT/PDF/?uri=CELEX:32011H0711&from=EN
Evans G (2003) Whose heritage is it anyway? Reconciling the ‘National’ and the ‘Universal’ in Quebec City. Br J Can Stud 16(2):333–347. https://doi.
org/10.3828/bjcs.16.2.9, https://online.liverpooluniversitypress.co.uk/doi/
10.3828/bjcs.16.2.9
Fagan B (2016) Chronicling White America. Am Period J Hist Critic 26(1):10–13.
https://muse.jhu.edu/article/613375. Publisher: The Ohio State University
Press
Falkowitz R (2019) Information visualization or data visualization? This view
of service management. https://www.3cs.ch/information-visualization-data-
visualization/
Fenn J, Raskino M (2008) Mastering the hype cycle: how to choose the right
innovation at the right time. Gartner, Inc./Harvard Business School Press series.
Harvard Business Press, Boston. OCLC: ocn213312226
Fenstad JE (1985) The discrete and the continuous in mathematics and the natural
science. No. 1985,9 in Preprint series/Institute of Mathematics, University of
Oslo Pure Mathematics Institute. University of Oslo
Fickers A (2020) Update für die Hermeneutik. Geschichtswissenschaft auf
dem Weg zur digitalen Forensik? Zeithistorische Forschungen/Studies in
Contemporary History 17:157–168. https://doi.org/10.14765/ZZF.DOK-
1765, https://zeitgeschichte-digital.de/doks/1765. Publisher: ZZF – Centre
for Contemporary History: Zeithistorische Forschungen
Fickers A (2021) Authenticity: historical data integrity and the layered materiality
of digital objects. In: Balbi G, Ribeiro N, Schafer V, Schwarzenegger C (eds)
Digital roots. De Gruyter, Berlin, pp 299–312. https://doi.org/10.1515/
9783110740202-017
Fickers A (2022) Digital hermeneutics: the reflexive turn in digital public history?
In: Noiret S, Tebeau M, Zaagsma G (eds) Handbook of digital public history.
De Gruyter, Berlin, pp 139–148. https://doi.org/10.1515/9783110430295-
012
Fickers A, Heijden Tvd (2020) Inside the trading zone: thinkering in a digital
history lab. Digit Human Q 14(3)
Fiorucci M, Khoroshiltseva M, Pontil M, Traviglia A, Del Bue A, James S (2020)
Machine learning for cultural heritage: a survey. Patt Recog Lett 133:102–108.
https://doi.org/10.1016/j.patrec.2020.02.017, https://www.sciencedirect.
com/science/article/pii/S0167865520300532
Firth Jr (1957) Papers in linguistics, 1934–1951, by J.R. Firth. Oxford University
Press, Oxford. OCLC: 867793141
Flynn (2020) How big data analytics are used in the banking industry. https://
opendatascience.com/how-big-data-analytics-are-used-in-the-banking-
industry/
Foley N (1997) The white scourge: Mexicans, Blacks, and poor whites in Texas
cotton culture. University of California Press, Berkeley. OCLC: 231724580
REFERENCES 155

Frey CB, Osborne MA (2017) The future of employment: how suscepti-


ble are jobs to computerisation? Technol Forecast Soc Chang 114:254–
280. https://doi.org/10.1016/j.techfore.2016.08.019, https://linkinghub.
elsevier.com/retrieve/pii/S0040162516302244
Friedman V (2021) Why is Facebook rejecting these fashion ads? The New York
Times, New York. https://www.nytimes.com/2021/02/11/style/disabled-
fashion-facebook-discrimination.html
Friendly M (2008) A brief history of data visualization. In: Handbook of data
visualization. Springer, Berlin, pp 15–56. https://doi.org/10.1007/978-3-
540-33037-0_2
Galison PL, Stump DJ (1996) The disunity of science: boundaries, contexts, and
power. Stanford University Press, Redwood City
Gärdenfors P (2014) The geometry of meaning: semantics based on conceptual
spaces. MIT Press, Cambridge
Gaver WW, Beaver J, Benford S (2003) Ambiguity as a resource for design. In:
Proceedings of the conference on Human factors in computing systems – CHI
’03. ACM Press, Ft. Lauderdale, p 233. https://doi.org/10.1145/642611.
642653, http://portal.acm.org/citation.cfm?doid=642611.642653
Gillespie T (2014) The relevance of algorithms. In: Gillespie T, Boczkowski PJ,
Foot KA (eds) Media technologies. The MIT Press, Cambridge, pp 167–194.
https://doi.org/10.7551/mitpress/9780262525374.003.0009
Gitelman L (ed) (2013) “Raw data” is an oxymoron. Infrastructures series, The
MIT Press, Cambridge
Glinka K, Meier S, Dörk M (2015) Visualising the »Un-seen«: towards critical
approaches and strategies of inclusion in digital cultural heritage interfaces. In:
Busch C, Sieck J (eds) Kultur und Informatik (XIII) – Cross Media, Verlag
Werner Hülsbusch, Glückstadt, pp 105–118
Goatly A (2007) Washing the brain – metaphor and hidden ideology, discourse
approaches to politics, society and culture, vol 23. John Benjamins Pub-
lishing Company, Amsterdam. https://doi.org/10.1075/dapsac.23, http://
www.jbe-platform.com/content/books/9789027292933
Gonfalonieri A (2020) Introduction to causality in machine learning. https://
towardsdatascience.com/introduction-to-causality-in-machine-learning-
4cee9467f06f
Gooding P (2020) The library in digital humanities: interdisciplinary approaches
to digital materials. In: Schuster K, Dunn S (eds) Routledge international
handbook of research methods in digital humanities. Routledge, Milton Park,
p 16
Goriunova O (2019) The digital subject: people as data as persons. Theory Cult
Soc 36(6):125–145. https://doi.org/10.1177/0263276419840409, http://
journals.sagepub.com/doi/10.1177/0263276419840409
156 REFERENCES

Graeber D (2013) The democracy project: a history, a crisis, a movement, 1st edn.
Spiegel & Grau, New York
Gretarsson B, O’Donovan J, Bostandjiev S, Höllerer T, Asuncion AU, Newman
D, Smyth P (2012) TopicNets: visual analysis of large text corpora with topic
modeling. ACM Trans Intell Syst Technol 3(2):23:1–23:26. https://doi.org/
10.1145/2089094.2089099
Grimshaw M (2018) Towards a manifesto for a critical digital humanities:
critiquing the extractive capitalism of digital society. Palgrave Commun
4(1):21. https://doi.org/10.1057/s41599-018-0075-y, http://www.nature.
com/articles/s41599-018-0075-y
Grothendieck A (1986) Récoltes et Semailles, Part I, II. Réflexions et témoignage
sur un passé de mathématicien. Gallimard, Paris
Grusin R (2014) The dark side of digital humanities: dispatches from two recent
MLA conventions. Differences 25(1):79–92. https://doi.org/10.1215/
10407391-2420009, https://read.dukeupress.edu/differences/article/25/
1/79-92/60682
Guariglia M (2020) Technology can’t predict crime, it can only weaponize prox-
imity to policing. https://www.eff.org/deeplinks/2020/09/technology-cant-
predict-crime-it-can-only-weaponize-proximity-policing
Guglielmo TA (2004) White on arrival: Italians, race, color, and power in Chicago,
1890–1945. Oxford University Press, New York. OCLC: 1090473075
Guglielmo J, Salerno S (2003) Are Italians white?: how race is made in America.
Routledge, New York. OCLC: 51622484
Hall S (1999) Un-settling ‘the heritage’, re-imagining the post-
nationWhose heritage? Third Text 13(49):3–13. https://doi.org/10.1080/
09528829908576818, http://www.tandfonline.com/doi/abs/10.1080/
09528829908576818
Hall G (2002) Culture in bits: the monstrous future of theory. Continuum,
London. http://site.ebrary.com/id/10224963. OCLC: 320326067
Hanks P (2013) Lexical analysis: norms and exploitations. MIT Press, Cambridge.
http://site.ebrary.com/id/10651991. OCLC: 907618528
Hannak A, Soeller G, Lazer D, Mislove A, Wilson C (2014) Measuring price
discrimination and steering on e-commerce web sites. In: Proceedings of
the 2014 conference on internet measurement conference. ACM, Vancouver,
pp 305–318. https://doi.org/10.1145/2663716.2663744, https://dl.acm.
org/doi/10.1145/2663716.2663744
Harris ZS (1954) Distributional structure. Word J Linguist Circ N Y 10(2–3):146–
162
Harris M (1976) History and significance of the EMIC/ETIC distinction.
Annu Rev Anthropol 5(1):329–350. https://doi.org/10.1146/annurev.an.
05.100176.001553
Harrison R (2013) Heritage: critical approaches. Routledge, Milton Park
REFERENCES 157

Heidegger M (1977) The question concerning technology. Garland Publishing


Inc., New York. Publisher: Harper & Row New York
Hernán M, Robins JM (2021) Causal inference. What if. Chapman & Hall/CRC
monographs on statistics & applied probability. Chapman & Hall/CRC, Boca
Raton
Hindle A, Bird C, Zimmermann T, Nagappan N (2015) Do topics make sense to
managers and developers? Empir Softw Eng 20(2):479–515. https://doi.org/
10.1007/s10664-014-9312-1
Hitchcock T (2013) Confronting the digital: or how academic history writ-
ing lost the plot. Cult Soc Hist 10(1):9–23. https://doi.org/10.2752/
147800413X13515292098070
Holbach PHTD (1770) Système de la nature. 1. Fayard, Paris
Hoque E, Carenini G (2016) Interactive topic modeling for exploring asyn-
chronous online conversations: design and evaluation of ConVisIT. ACM Trans
Interact Intell Syst 6(1):1–24. https://doi.org/10.1145/2854158, https://
dl.acm.org/doi/10.1145/2854158
Hu Y, Boyd-Graber J, Satinoff B, Smith A (2014) Interactive topic modeling. Mach
Learn 95(3):423–469. https://doi.org/10.1007/s10994-013-5413-0
Jacobi C, van Atteveldt W, Welbers K (2015) Quantitative analysis of large amounts
of journalistic texts using topic modeling. Digit Journal 4(1):89–106. https://
doi.org/10.1080/21670811.2015.1093271
Jacobson MF (1998) Whiteness of a different color: European immigrants and the
alchemy of race. Harvard University Press, Cambridge. OCLC: 439573264
Jaeger G (2009) Entanglement, information, and the interpretation of quantum
mechanics. The frontiers collection, Springer, Berlin
Jaegul Choo, Changhyun Lee, Reddy CK, Park H (2013) UTOPIAN: user-
driven topic modeling based on interactive nonnegative matrix factorization.
IEEE Trans Vis Comput Graph 19(12):1992–2001. https://doi.org/10.
1109/TVCG.2013.212, http://ieeexplore.ieee.org/document/6634167/
Jones S, Jeffrey S, Maxwell M, Hale A, Jones C (2018) 3D heritage visualisa-
tion and the negotiation of authenticity: the ACCORD project. Int J Herit
Stud 24(4):333–353. https://doi.org/10.1080/13527258.2017.1378905,
https://www.tandfonline.com/doi/full/10.1080/13527258.2017.1378905
Kherwa P, Bansal P (2020) Semantic N-gram topic modeling. ICST Trans
Scalable Inf Syst 7(26):163131. https://doi.org/10.4108/eai.13-7-2018.
163131, http://eudl.eu/doi/10.4108/eai.13-7-2018.163131
Klein JT (1983) The dialectic and rhetoric of disciplinary and interdisciplinary.
In: Chubin DE, Porter AL, Rossini FA, Connolly T (eds) Interdisciplinary
analysis and research: theory and practice of problem-focused research and
development, Lomond Publications Inc., Mt. Airy, pp 35–74
Klein JT (2015) Interdisciplining digital humanities: boundary work in an emerg-
ing field. Digital humanities, University of Michigan Press, Ann Arbor
158 REFERENCES

Komarčević M, Dimić M, Čelik P (2017) Challenges and impacts of the digital


transformation of society in the social sphere. SEER J Lab Soc Aff East
Eur 20(1):31–48. http://www.jstor.org/stable/26379907. Publisher: Nomos
Verlagsgesellschaft mbH
Korom P (2015) Network analysis, history. In: International encyclopedia of the
social & behavioral sciences. Elsevier, Amsterdam, pp 524–531. https://doi.
org/10.1016/B978-0-08-097086-8.03226-8
Lacy P, Rutqvist J (2015) Waste to wealth: the circular economy advantage.
Palgrave MacMillan, Houndmills. OCLC: 922951290
LaGumina SJ (1999) WOP! a documentary history of anti-Italian discrimination
in the United States. Guernica, Toronto. OCLC: 245879956
LaGumina SJ (2018) Discrimination, prejudice and Italian American history. In:
Connell WJ, Pugliese SG (eds) The Routledge history of Italian Americans, The
Routledge histories. Routledge, New York, pp 223–238
Lakoff G (1992) Metaphor and war: the metaphor system used to justify war in
the Gulf. J Urban Cult Stud 2(1):59–72. https://escholarship.org/uc/item/
9sm131vj
Lakoff G (2004) Don’t think of an elephant! know your values and frame the
debate: the essential guide for progressives. Chelsea Green Pub. Co, White River
Junction
Lakoff G (2008) The political mind: why you can’t understand 21st-century politics
with an 18th-century brain. Viking, New York. OCLC: ocn213466226
Langacker RW (1983) Foundations of cognitive grammar. Indiana University
Linguistics Club, Bloomington
Lanza E, Svendsen BA (2007) Tell me who your friends are and I might
be able to tell you what language(s) you speak: social network analy-
sis, multilingualism, and identity. Int J Biling 11(3):275–300. https://doi.
org/10.1177/13670069070110030201, http://journals.sagepub.com/doi/
10.1177/13670069070110030201
Le Bellac M (2006) Quantum physics. Cambridge University Press, Cambridge.
https://doi.org/10.1017/CBO9780511616471
Lee B (2020) Compounded mediation: a data archaeology of the newspaper
navigator dataset. Digit Human Q 15(4). https://doi.org/10.17613/K9GT-
6685, https://hcommons.org/deposits/item/hc:32415/. Publisher: Human-
ities Commons
Lee H, Kihm J, Choo J, Stasko J, Park H (2012) iVisClustering: an interac-
tive visual document clustering via topic modeling. Comput Graph Forum
31(3pt3):1155–1164. https://doi.org/10.1111/j.1467-8659.2012.03108.x,
https://onlinelibrary.wiley.com/doi/10.1111/j.1467-8659.2012.03108.x
REFERENCES 159

Liu A (2012) Where is cultural criticism in the digital humanities? In: Gold MK (ed)
Debates in the digital humanities. University of Minnesota Press, Minneapo-
lis, pp 490–510. https://doi.org/10.5749/minnesota/9780816677948.003.
0049
Liu B (2020) Sentiment analysis: mining opinions, sentiments, and emotions.
Cambridge University Press, Cambridge
Longo G (2018) Information and causality: mathematical reflections on cancer
biology. Organ J Biol Sci 2:83–104. Paginazione. https://doi.org/10.13133/
2532-5876_3.15. Artwork size: 83-104. Paginazione publisher: Organisms.
Journal of Biological Sciences
Longo G (2019) Quantifying the world and its webs: mathematical discrete vs
continua in knowledge construction. Theory Cult Soc 36(6):63–72. https://
doi.org/10.1177/0263276419840414
Luconi S (2003) Frank L. Rizzo and the whitening of Italian Americans in
Philadelphia. In: Guglielmo J, Salvatore S (eds) Are Italians white? How race
is made in America. Routledge, New York, pp 177–191
MacDonald JS, MacDonald LD (1964) Chain migration ethnic neighborhood
formation and social networks. Milbank Mem Fund Q 42(1):82. https://
doi.org/10.2307/3348581, https://www.jstor.org/stable/3348581?origin=
crossref
Mahony S (2018) Cultural diversity and the digital humanities. Fudan J Human Soc
Sci 11(3):371–388. https://doi.org/10.1007/s40647-018-0216-0, http://
link.springer.com/10.1007/s40647-018-0216-0
Mandell L (2019) Gender and cultural analytics: finding or making stereotypes?
In: Gold MK, Klein L (eds) Debates in the digital humanities. University of
Minnesota Press, Minneapolis
Manovich L (2002) The language of new media, 1st edn. Leonardo, MIT Press,
Cambridge
McCallum AK (2002) MALLET: a machine learning for language Toolkit. http://
mallet.cs.umass.edu
McCarty W (2015) Becoming interdisciplinary. In: Schreibman S, Siemens R,
Unsworth J (eds) A new companion to digital humanities. John Wiley & Sons
Ltd., Chichester, pp 67–83. https://doi.org/10.1002/9781118680605.ch5,
https://onlinelibrary.wiley.com/doi/10.1002/9781118680605.ch5
McKeon M (1994) The origins of interdisciplinary studies. Eighteenth Century
Stud 28(1):17. https://doi.org/10.2307/2739220
McNally PF (2002) Double fold: libraries and the assault on paper, Nicholson
Baker. Bibliogr Soc Can 40(1). https://doi.org/10.33137/pbsc.v40i1.18264,
https://jps.library.utoronto.ca/index.php/bsc/article/view/18264
McPherson T (2019) “Chapter 9: why are the digital humanities so white? or
thinking the histories of race and computation. In: Gold MK (ed) Debates in
the digital humanities. University of Minnesota Press, Minneapolis
160 REFERENCES

Microsoft Partner Community (2018) Digital transformation – revolution,


evolution or devolution? https://www.microsoftpartnercommu-
nity.com/t5/Partner-to-Partner/Digital-transformation-revolution-evolution-
or-devolution/m-p/4771#M11. Section: Partner-to-Partner
Miner G (ed) (2012) Practical text mining and statistical analysis for non-structured
text data applications, 1st edn. Academic Press, Waltham
Mio J, Katz AN (eds) (2016) Metaphor: implications and applications, first issued in
paperback edn. A psychology press book, Routledge Taylor and Francis Group,
London
Moretti MDiF (2016) The digital in the humanities: an interview with Franco
Moretti. Los Angeles Review of Books. https://lareviewofbooks.org/article/
the-digital-in-the-humanities-an-interview-with-franco-moretti/
Morze NV, Strutynska OV (2021) Digital transformation in society: key aspects
for model development. J Phys Conf Ser 1946(1):012021. https://doi.org/
10.1088/1742-6596/1946/1/012021. Publisher: IOP Publishing
Murdock J, Allen C (2015) Visualization techniques for topic model checking. Proc
AAAI Conf Artif Intell 29(1). https://ojs.aaai.org/index.php/AAAI/article/
view/9268. Number: 1
Musolff A (2004) Metaphor and political discourse: analogical reasoning in debates
about Europe. Palgrave Macmillan, Houndmills. OCLC: 1025286722
Musolff A (2010) Metaphor, nation and the holocaust: the concept of the body
politic. Routledge, New York. OCLC: 901216467
Musolff A (2014) Metaphor, nation and the holocaust: the concept of the body
politic. Routledge, New York. OCLC: 878024183
Musolff A (2016) Political metaphor analysis: discourse and scenarios. Blooms-
bury Academic, an imprint of Bloomsbury Publishing Plc, London. OCLC:
ocn957391701
NEH (2021) National Digital Newspaper Program. Notice of funding opportu-
nity. https://www.neh.gov/grants/preservation/national-digital-newspaper-
program
Noble SU (2018) Algorithms of oppression: how search engines reinforce racism.
New York University Press, New York
Noble SU (2019) Gender and cultural analytics: finding or
making stereotypes? In: Gold MK, Klein L (eds) Debates in
the digital humanities. Univ of Minnesota Press, Minneapolis.
https://dhdebates.gc.cuny.edu/read/untitled-f2acf72c-a469-49d8-be35-
67f9ac1e3a60/section/5aafe7fe-db7e-4ec1-935f-09d8028a2687#ch02
Novick P (1988) That noble dream: the “objectivity question” and the American
historical profession. Ideas in context, Cambridge University Press, Cambridge
Orsi RA (2010) The Madonna of 115th Street: faith and community in Italian
Harlem, 1880–1950, 3rd edn. Yale University Press, New Haven. OCLC:
ocn632079628
REFERENCES 161

Ottatti V, Renstrom R, Price E (2014) The metaphorical framing model: political


communication and public opinion. In: Landau M, Robinson MD, Meier BP
(eds) The power of metaphor: examining its influence on social life. Ameri-
can Psychological Association, Washington, pp 179–202. https://doi.org/10.
1037/14278-009, http://content.apa.org/books/14278-009
Paradis C (2015) Conceptual spaces at work in sensory cognition: domains,
dimensions and distances. In: Applications of conceptual spaces. Springer,
Berlin, pp 33–55
Park RE (1922) The immigrant press and its control. Harper & Brothers, New
York. OCLC: 711682462
Patton MQ (2015) Qualitative research & evaluation methods: integrating theory
and practice, 4th edn. SAGE Publications Inc, Thousand Oaks
Pearl J, Mackenzie D (2018) The book of why: the new science of cause and effect,
1st edn. Basic Books, New York
Pearl J, Glymour M, Jewell NP (2016) Causal inference in statistics: a primer. Wiley,
Chichester
Peels R (2019) Replicability and replication in the humanities. Res Integ Peer Rev
4(1):2. https://doi.org/10.1186/s41073-018-0060-4
Perry BL, Pescosolido BA, Borgatti SP (2018) Egocentric network analysis:
foundations, methods, and models. Cambridge Univerisity Press, Cambridge.
OCLC: 1142180389
Plesser HE (2018) Reproducibility vs. replicability: a brief history of a confused ter-
minology. Front Neuroinform 11:76. https://doi.org/10.3389/fninf.2017.
00076
Puschmann C, Powell A (2018) Turning words into consumer preferences: how
sentiment analysis is framed in research and the news media. Soc Media
Soc 4(3). https://doi.org/10.1177/2056305118797724, http://journals.
sagepub.com/doi/10.1177/2056305118797724
Putnam L (2016) The transnational and the text-searchable: digitized sources and
the shadows they cast. Am Hist Rev 121(2):377–402. https://doi.org/10.
1093/ahr/121.2.377
Řeh˚uřek R, Sojka P (2010) Software framework for topic modelling with large
corpora. In: Proceedings of the LREC 2010 workshop on new challenges for
NLP frameworks. ELRA, Valletta, pp 45–50
Reidsma M (2019) Masked by trust: bias in library discovery. Library Juice Press,
Sacramento
Reisigl M, Wodak R (2001) The discourse-historical approach. In: Wodak R, Meyer
M (eds) Methods of critical discourse analysis. Sage, London, pp 63–94
ResearchAndMarketscom (2020) Big data in healthcare report 2019: global market
analysis by component, deployment, analytics type, application, end-user, and
region (2014–2024). Technical report, ResearchAndMarkets.com. https://
www.researchandmarkets.com
162 REFERENCES

Rhodes LD (2010) The ethnic press: shaping the American dream. Lang, New
York. OCLC: 705943676
Ricœur P (2003) The rule of metaphor the creation of meaning in language.
Routledge Classics, London. OCLC: 1229646734
Riedl M, Padó S (2018) A named entity recognition shootout for German. In:
Proceedings of the 56th annual meeting of the association for computational
linguistics (volume 2: short papers). Association for Computational Linguistics,
Melbourne, pp 120–125. https://doi.org/10.18653/v1/P18-2020
Risam R (2015) Beyond the margins: intersectionality and the digital humanities.
Digit Human Q 9(2)
Ritchey T (2011) Wicked problems – social messes: decision support modelling
with morphological analysis. vol. 17. In: Risk, governance and society. Springer,
Berlin
Röder M, Both A, Hinneburg A (2015) Exploring the space of topic coherence
measures. In: Proceedings of the eighth ACM international conference on web
search and data mining – WSDM ’15. ACM Press, Shanghai, pp 399–408.
https://doi.org/10.1145/2684822.2685324
Roediger DR (2005) Working toward whiteness: how America’s immigrants
became white: the strange journey from Ellis Island to the suburbs. BasicBooks,
New York. OCLC: 475064525
Rorty R (2004) Being that can be understood is language. In: Krajewski B (ed)
Gadamer’s repercussions, 1st edn. Reconsidering Philosophical Hermeneutics.
University of California Press, Berkeley, pp 21–29
Rougier NP (2016) R-words · Issue #5 · ReScience/ReScience-article. https://
github.com/ReScience/ReScience-article/issues/5
Rumsey AS (2016) When we are no more: how digital memory is shaping our
future. https://www.youtube.com/watch?v=_ZJDFDscmWE
Rumsey AS, Digital Library Federation (2001) Strategies for building digitized
collections. Digital Library Federation, Council on Library and Information
Resources, Washington. https://lccn.loc.gov/2002726838
Sánchez RT, Santos AB, Vicente RS, Gómez AL (2019) Towards an uncertainty-
aware visualization in the digital humanities. Informatics 6(3):31. https://doi.
org/10.3390/informatics6030031
Schwab K (2017) The fourth industrial revolution. Penguin, London. OCLC:
1004975093
Shanmugam R (2018) Elements of causal inference: foundations and learning
algorithms. J Stat Comput Simul 88(16):3248–3248. https://doi.org/10.
1080/00949655.2018.1505197
Sievert C, Shirley K (2014) LDAvis: a method for visualizing and interpreting
topics. In: Proceedings of the workshop on interactive language learning, visu-
alization, and interfaces. Association for Computational Linguistics, Baltimore,
pp 63–70. https://doi.org/10.3115/v1/W14-3110
REFERENCES 163

Silva CC, Galster M, Gilson F (2021) Topic modeling in software engineering


research. Empir Softw Eng 26(6):120. https://doi.org/10.1007/s10664-
021-10026-0
Sims M (2021) How to count biological minds: symbiosis, the free energy principle,
and reciprocal multiscale integration. Synthese 199(1):2157–2179. https://
doi.org/10.1007/s11229-020-02876-w
Snow CP (2013) The two cultures and the scientific revolution. Martino Publish-
ing, Mansfield Center. OCLC: 908393033
Spence R (2014) Information visualization. Springer, New York
Stehr N, Weingart P (eds) (2000) Practising interdisciplinarity. University of
Toronto Press, Toronto. OCLC: ocm43283641
Stichweh R (2001) History of scientific disciplines. In: International encyclopedia
of the social & behavioral sciences. Elsevier, Amsterdam, pp 13727–13731.
https://doi.org/10.1016/B0-08-043076-7/03187-9
Stiegler B (1998) Technics and time: the fault of Epimetheus. Stanford University
Press, Redwood City
Stigler SM (1986) Laplace’s 1774 memoir on inverse probability. Stat Sci 1(3):359–
363. Publisher: Institute of Mathematical Statistics
Stigler SM (1987) Testing hypotheses or fitting models? In: Nitecki MH, Hoff-
man A (eds) Neutral models in biology. Oxford University Press, New York.
Publisher: Oxford University Press on Demand
Story J, Walker I (2016) The impact of diasporas: markers of identity. Ethn Racial
Stud 39(2):35–141. https://doi.org/10.1080/01419870.2016.1105999
Studdert-Kennedy M, Goldstein L (2003) Launching language: the gestural origin
of discrete infinity. In: Christiansen MH, Kirby S (eds) Language evolution.
Oxford University Press, Oxford, pp 235–254
Sua A, Cheng WL, Lund S, De Smet A, Robinson O, Sanghvi S (2020) The
postpandemic workforce: responses to a McKinsey global survey of 800
executives. Technical report, McKinsey Global Institute. https://www.
mckinsey.com/featured-insights/future-of-work/what-800-executives-
envision-for-the-postpandemic-workforce
Tally RT (ed) (2011) Geocritical explorations: space, place, and mapping in literary
and cultural studies. Palgrave Macmillan, New York
Talmy L (2000) Toward a cognitive semantics. MIT Press, Cambridge
Taylor J, Donaldson CE, Gregory IN, Butler JO (2018) Mapping digitally,
mapping deep: exploring digital literary geographies. Lit. Geograph. 4(1):10–
19. Number: 1
Thomas WG III (2014) The promise of the digital humanities and the contested
nature of digital scholarship. In: Schriebman S, Siemens R, Unsworth J (eds) A
new companion to digital humanities. Wiley-Blackwell, Hoboken, pp 524–537.
https://digitalcommons.unl.edu/historyfacpub/198
164 REFERENCES

Thompson Klein J (2004) Prospects for transdisciplinarity. Futures 36(4):515–


526. https://doi.org/10.1016/j.futures.2003.10.007, https://linkinghub.
elsevier.com/retrieve/pii/S0016328703001903
Tilley CY (1989) Interpreting material culture. In: Hodder I (ed) The meanings
of things. Routledge, London, pp 185–194
Tong D (2011) Physics and the integers. FQXi Essay Contest 2011. Publisher:
Citeseer
Turkle S (2014) Life on the screen. Simon & Schuster, New York
UNESCO (2003) Records of the general conference, 32nd session, Paris, 29
September to 17 October 2003, vol 1: resolutions – UNESCO digital library.
https://unesdoc.unesco.org/ark:/48223/pf0000133171.page=80
UNESCO (2020) Education: from disruption to recovery. https://en.unesco.
org/covid19/educationresponse
United Nations (2020a) ‘Inequality defines our time’: UN chief delivers hard-
hitting Mandela day message | | UN News. UN News – Global perspective
Human stories. https://news.un.org/en/story/2020/07/1068611
United Nations (2020b) Roadmap for Digital Cooperation. Technical report,
United Nations. https://www.un.org/en/content/digital-cooperation-
roadmap/assets/pdf/Roadmap_for_Digital_Cooperation_EN.pdf
Valeurs actuelles (2020) Coronavirus: après avoir critiqué l’Italie, la France s’en
inspire. https://www.valeursactuelles.com/politique/coronavirus-apres-avoir-
critique-litalie-la-france-sen-inspire-117319
Valla C, Benenson A (2014) Some sites and their artifacts: 123D catch | Rhizome.
https://rhizome.org/editorial/2014/may/8/some-sites-and-their-artifacts-
123d-catch/
van den Besselaar P, Heimeriks G (2001) Disciplinary, multidisciplinary, interdis-
ciplinary: concepts and indicators. In: 8th conference on scientometrics and
infometrics, Sydney
van Dijk TA (1993) Principles of critical discourse analysis. Discourse Soc
4(2):249–283. https://doi.org/10.1177/0957926593004002006
Vellon PG (2010) “Between White Men and Negroes”: the perception of Southern
Italian immigrants through the lens of Italian lynchings. In: Connell WJ,
Gardaphé FL (eds) Anti-Italianism: essays on a prejudice, Palgrave Macmillan,
New York, pp 23–32. OCLC: 755003583
Vellon PG (2017) A great conspiracy against our race: Italian immigrant newspapers
and the construction of whiteness in the early twentieth century. New York
University Press, New York. OCLC: 946161027
Vergo P (1989) The New museology. Reaktion Books, London. OCLC: 21874494
Viola L (2018) ChroniclItaly: a corpus of Italian American newspapers from
1898 to 1920. Utrecht University, Utrecht. https://public.yoda.uu.nl/i-lab/
UU01/T4YMOW.html
REFERENCES 165

Viola L (2019) ChroniclItaly 2.0. a corpus of Italian American newspapers anno-


tated for entities, 1898–1920. https://doi.org/10.24416/UU01-4MECRO
Viola L (2020a) Make Italy great again. Trump’s echo and discursive manipulations
in Salvini’s end of the year Facebook speech. In: Llamas Saíz C, Breeze R
(eds) Metaphor in politics and populism, EUNSA, Navarra, pp 111–142. ISBN:
8431334673. Publisher: EUNSA
Viola L (2020b) Replication, evaluation and quantitative analysis in the DH era:
transparent digital practices and lessons learned from the development of the
GeoNewsMiner. In: DH Benelux 2020 #GoesOnline, World Wide Web, 3rd
– 5th June 2020, Zenodo. https://doi.org/10.5281/ZENODO.3859535,
https://zenodo.org/record/3859535
Viola L (2021) ChroniclItaly and ChroniclItaly 2.0: digital heritage to access
narratives of migration. Int J Human Arts Comput 15(1–2):170–185. https://
doi.org/10.3366/ijhac.2021.0268, https://www.euppublishing.com/doi/
10.3366/ijhac.2021.0268
Viola L (2022) “Italy, for example, is just incredibly stupid now”. European
crisis narrations in relation to Italy’s response to COVID-19. Front Com-
mun 7:757847. https://doi.org/10.3389/fcomm.2022.757847, https://
www.frontiersin.org/articles/10.3389/fcomm.2022.757847/full
Viola L, Fiscarelli AM (2021a) ChroniclItaly 3.0. A deep-learning, contex-
tually enriched digital heritage collection of Italian immigrant newspapers
published in the USA, 1898–1936. https://doi.org/10.5281/ZENODO.
4596345, https://zenodo.org/record/4596345. ISSN: 1613-0073. Version
Number: v3.0.0 Type: dataset
Viola L, Fiscarelli MA (2021b) From digitised sources to digital data: Behind the
scenes of (critically) enriching a digital heritage collection. In: Weber A, Heerlien
M, Gassó Miracle E, Wolstencroft K (eds) Proceedings of the international
conference collect and connect: archives and collections in a digital age, CEUR –
workshops proceedings, vol 2810, pp 51–64. http://ceur-ws.org/Vol-2810/
paper5.pdf
Viola L, Verheul J (2019a) The media construction of Italian identity: a transat-
lantic, digital humanities analysis of italianitá , ethnicity, and whiteness,
1867–1920. Identity 19(4):294–312. https://doi.org/10.1080/15283488.
2019.1681271
Viola L, Verheul J (2019b) Mining ethnicity: discourse-driven topic modelling
of immigrant discourses in the USA, 1898–1920. Digit Scholarsh Human.
https://doi.org/10.1093/llc/fqz068
Viola L, Verheul J (2020a) Machine learning to geographically enrich understudied
sources: a conceptual approach:. In: Rocha P, Steels L, van den Herik H
(eds) Proceedings of the 12th international conference on agents and artificial
intelligence. SCITEPRESS - Science and Technology Publications, Valletta,
pp 469–475. https://doi.org/10.5220/0009094204690475
166 REFERENCES

Viola L, Verheul J (2020b) One hundred years of migration discourse in the times: a
discourse-historical word vector space approach to the construction of meaning.
Front Artif Intell 3:64. https://doi.org/10.3389/frai.2020.00064, https://
www.frontiersin.org/article/10.3389/frai.2020.00064/full
Viola L, De Bruin J, van Eijden K, Verheul J (2019) The GeoNewsMiner (GNM):
an interactive spatial humanities tool to visualize geographical references in
historical newspapers (v1.0.0). https://github.com/lorellav/GeoNewsMiner
Viola L, Cunningham AR, Jaskov H (2021) Introducing the DHARPA project:
an interdisciplinary lab to enable critical DH practice. In: The humanities in
a digital world, virtual. https://doi.org/10.5281/zenodo.5109974, https://
zenodo.org/record/5109974
Waldrop MM (1992) Complexity: the emerging science at the edge of order and
chaos. Simon & Schuster, New York. OCLC: 26310607
Wallach HM (2006) Topic modeling: beyond bag-of-words. In: Proceedings of
the 23rd international conference on Machine learning - ICML ’06, ACM
Press, Pittsburgh, pp 977–984. https://doi.org/10.1145/1143844.1143967,
http://portal.acm.org/citation.cfm?doid=1143844.1143967
Wang X, McCallum A, Wei X (2007) Topical N-grams: phrase and topic dis-
covery, with an application to information retrieval. In: Seventh IEEE inter-
national conference on data mining (ICDM 2007). IEEE, Omaha, pp 697–
702. https://doi.org/10.1109/ICDM.2007.86, http://ieeexplore.ieee.org/
document/4470313/
Ware C (2021) Information visualization: perception for design.
Elsevier, Amsterdam. https://www.sciencedirect.com/science/book/
9780128128756. OCLC: 1144746447
Weinelt B (2016) Digital transformation of industries: societal implications.
Technical report, World Economic Forum. https://reports.weforum.org/
digital-transformation/wp-content/blogs.dir/94/mp/files/pages/files/dti-
societal-implications-white-paper.pdf
Weinert F (2005) The scientist as philosopher: philosophical consequences of great
scientific discoveries. Springer, Berlin
Weingart P (2000) 2. Interdisciplinarity: the paradoxical discourse. In: Stehr
N, Weingart P (eds) Practising interdisciplinarity. University of Toronto
Press, Toronto, pp 25–42. https://doi.org/10.3138/9781442678729-004,
https://www.degruyter.com/document/doi/10.3138/9781442678729-
004/html
Weiskott E (2017) There is no such thing as ‘the digital humanities’. The Chronicle
of Higher Education. https://www.chronicle.com/article/there-is-no-such-
thing-as-the-digital-humanities/
Wilding R (2007) Transnational ethnographies and anthropological imaginings
of migrancy. J Ethn Migr Stud 33(2):331–348. https://doi.org/10.1080/
13691830601154310
REFERENCES 167

Williams L (2019) What computational archival science can learn from art
history and material culture studies. In: 2019 IEEE international conference
on big data (big data). IEEE, Los Angeles, pp 3153–3155. https://doi.
org/10.1109/BigData47090.2019.9006527, https://ieeexplore.ieee.org/
document/9006527/
Wills C (2005) Destination America. DK Publishers, New York. OCLC: 61361495
Windhager F, Federico P, Schreder G, Glinka K, Dork M, Miksch S, Mayr
E (2019a) Visualization of cultural heritage collection data: state of the
art and future challenges. IEEE Trans Vis Comput Graph 25(6):2311–
2330. https://doi.org/10.1109/TVCG.2018.2830759, https://ieeexplore.
ieee.org/document/8352050/
Windhager F, Salisu S, Mayr E (2019b) Exhibiting uncertainty: visualizing data
quality indicators for cultural collections. Informatics 6(3):29. https://doi.
org/10.3390/informatics6030029, https://www.mdpi.com/2227-9709/6/
3/29
Winter T (2011) Post-conflict heritage, postcolonial tourism: culture, politics
and development at Angkor, 1st edn. No. 21 in Routledge studies in Asia’s
transformations, Routledge, London
Zuanni C (2020) Heritage in a digital world: algorithms, data, and future
value generation. Grazer theologische Perspektiven 3(2):236–259. https://doi.
org/10.25364/17.3:2020.2.12, https://digital.obvsg.at/limina/periodical/
titleinfo/5552067. Medium: PDF. Publisher: Universität Graz. Version Num-
ber: Version of Record
INDEX

A
Accountability, 76, 140
Algorithm, 13, 14, 17
Algorithmic discourse, 14–16
Ambiguities, 33, 34, 76, 123, 132, 141, 143
Artificial Intelligence, 10, 13, 17, 30, 58, 109, 145
Authenticity, 32, 38–40, 46, 140

B
Bag of words, 99
Big data, 10, 12, 13, 16, 22, 90
  analytics, 89–91
  culture, 121
  hype, 90
  philosophy, 89, 91, 104, 142
Brittle book crisis, 43

C
C2 DH, 50, 112
Cambridge Analytica, 12
Causal inference challenge, 91
Causality, 33, 79, 84–88, 91, 104, 142
Causal machine learning, 92
Causation, 85, 86
Chain migration, 54
Chronicling America, 43, 48, 49, 51, 61, 63
ChroniclItaly, 47, 51, 53
ChroniclItaly 2.0, 50, 51
ChroniclItaly 3.0, 47, 59, 111, 113, 124, 140
Coherence, 100
Collocation, 88
Completeness, 38
Complexity Theory, 89, 90
Conceptual metaphors, 81
Content enrichment, 58
Continuous, 84, 92
Continuous modelling of information, 33, 141
Corpus linguistics, 73
Corpus preparation, 95, 143
Correlations, 33, 79, 85, 87–91, 93, 104, 122, 142
Coursera, 4
COVID-19 pandemic, 4, 6, 10, 82
Critical data literacy, 116
Critical data management, 118
Critical digital humanities, 8, 21–23, 25, 26, 29
Critical digital literacy, 17, 111, 123
Critical digital visualisation, 17
Critical Heritage Studies, 39, 42
Critical Posthumanities, 9, 31, 39
Critical visualisation literacy, 116
Cultural heritage, 39, 57, 110, 125

D
Database categories, 15
Data visualisation, 34, 107, 108, 143
Dead metaphors, 84
Deep learning, 33, 51, 67
Determinism, 86, 87
DeXTER, 33, 34, 50, 58, 70, 76, 111, 123, 125, 128, 130, 132–134, 141, 143
DHARPA, 112
Diasporic communities, 54
Diasporic newspapers, 48
Digital capitalism, 21, 22
Digital cultural heritage, 30, 46, 65, 77, 111
Digital Divide, 78
Digital heritage, 37, 38, 40, 41, 46, 52, 58, 123
  material, 70
Digital hermeneutics, 30
Digital humanities, 3, 8, 21–23, 25, 26, 29, 40, 42, 94, 112, 123, 125
Digital inequality, 78
Digital knowledge creation process, 8, 16, 32, 40, 43, 52, 60, 65, 67, 68, 73, 78, 85, 89, 94, 104, 110, 111, 117, 118, 121, 123, 129, 134
Digital knowledge production, 46, 105, 112
Digital language injustice, 63, 98
Digital linguistic injustice, 17
Digital objects, 8, 16, 32, 37, 39–41, 43, 44, 46–48, 54, 62, 65, 67, 70, 94, 110, 111, 113, 114, 116, 118, 134, 140
Digital transformation, 1, 2, 4, 6, 18, 22, 25, 26, 29, 31, 32, 78, 141
  of society, 137
Digitally-born heritage, 38
Digital Turn, 2, 10, 11, 19, 50, 86
Digitisation, 41–43, 46, 49, 57, 63, 98
  of society, 44, 86
Digitised heritage, 38
Discourse-driven topic modelling, 113
Discourse-historical approach, 113
Discrete, 72, 82, 84, 92, 126
  vs continuous, 86, 87
  data, 93
  infinity of language, 92
  modelling of information, 33, 141
Distributional semantics theory, 88, 92
Documentation, 60, 67, 76, 78, 112, 118, 126, 133

E
Egocentric network, 130
Encoding of criticism, 68
Enrichment, 59, 60, 70
Entity linking, 58
Ethnic press, 52, 66, 114

F
Facebook, 12, 15
F1 score, 64
Framing power, 82
Functional view of causation, 86

G
Gartner Hype Cycle, 3
Gensim, 99
Geocoding, 58, 61, 67, 68
Geolocation, 33, 55, 141
GeoNewsMiner, 50, 64
Graphical User Interface, 50
Graph theory, 124

H
Heritage, 47
Heritagisation, 58
Higher education, 4, 34
Historical migration, 54
Historical newspapers, 48
Human-in-the-loop, 109
The humanities, 76, 83, 85, 86, 110

I
Immigrant communities, 52
Immigrant press, 52
Information Retrieval, 71, 99, 111
Information visualisation, 107, 108, 111
Interactive Topic Modelling, 109
Inter-annotator agreement, 73
Interdisciplinarity discourse, 19, 20, 27, 144
Interpreting the topics, 101
Interspecific competition, 45, 139
Issue-focused network, 130
Italian American diaspora, 70
Italian American migration, 52, 53
Italian immigrant communities, 51
Italian immigrant newspapers, 53
Italian Transatlantic migration, 124
IVisClustering, 109

K
Knowledge creation, 3, 8, 17, 26, 27, 44, 55, 58, 77, 144

L
L’Italia, 64, 67
La Rassegna, 67
Language injustice, 68
Latent Dirichlet Allocation, 82, 95, 99–101, 114
LDAvis, 109
Lemmatisation, 59, 97
Lemmatising, 118
Library of Congress, 43, 48, 96, 113
Linguistic categories, 72
Lowercasing, 59, 61

M
Machine learning, 13–15, 58, 62, 84, 90, 91, 104, 108, 113, 142
Mainstream humanities, 21
MALLET, 99, 113
Markers of identity, 66, 70, 124
Mass migrations, 53
Material culture, 39
Materiality of the sources, 61, 63, 65, 66, 95, 113, 114, 118
McKinsey Global Institute, 4, 5
Meaning, 88, 91
Mechanical Turk, 10, 17
Metaphors, 81–83, 142
Microfilm, 43, 46, 49, 61, 62
Microtargeting, 12
Migrant communities, 48
Migrants’ narratives, 54, 66
Migration, 54
Model of knowledge creation, 137
Model perplexity, 103
Monism, 32
Mutualism, 31, 33, 45, 46, 55, 85, 113, 120, 139, 141

N
Named Entity Recognition, 33, 55, 58, 61, 141
National Digital Newspaper Program, 48, 62
National Endowment for the Humanities, 48
Natural language processing, 71
NER, 61
Network analysis, 34, 111, 123, 124, 143
N-grams, 100
NLP, 51, 74, 88, 113
Number of topics, 95, 143

O
Oceanic Exchanges, 47, 113
OCR, 62, 63
  errors, 100, 101
Open Access, 31, 78, 140
Optical Character Recognition, 55
Optimal number of topics, 103
Originality, 38
Originary technicity, 17

P
Pandemic, 5, 8, 16, 18, 25, 26, 29, 44, 137
Paradox of interdisciplinarity, 19, 31
Parts-of-Speech Tagging, 58
Patterns, 85, 87–91, 95, 101, 102, 113, 122
Perplexity, 100
Poker interfaces, 123
Post-authentic framework, 16, 27, 28, 31–34, 39, 47, 55, 58, 62, 65, 70, 72, 76, 78, 79, 84, 85, 94, 95, 102, 110, 113, 118, 121, 123, 124, 129, 134, 138, 143
Posthuman critical theory, 18, 31, 43, 44
Predictive likelihood, 103
Predictive policing, 13
Pre-processing, 95, 118, 141, 143
Punctuation marks, 61

R
Replicability, 31, 76, 77, 140
Roadmap for Digital Cooperation, 78
Rotogravure, 43

S
Scope, 73, 74, 96
Sentiment analysis, 33, 34, 55, 58, 61, 70–73, 82, 84, 104, 111, 123, 124, 131, 141–143
Sentiment magnitude, 74
Sentiment score, 74
Significance, 89
Specificity of the sources, 114, 124
Stemming, 59, 97, 118
Stopwords, 59, 60
Sustainability, 78, 140
Symbiosis, 31, 33, 45, 46, 55, 85, 113, 120, 139, 141
Symbiosis and mutualism, 34

T
Term frequency, 99
Termite, 109
Text annotation, 58
TF-IDF, 99
To digital knowledge creation, 73
Tokenization, 59, 60
Topic coherence, 103
Topic Explorer, 109
Topic modelling, 33, 34, 79, 82–85, 88, 91, 92, 94, 96, 97, 99–101, 105, 109, 112–114, 122, 142, 143
Topic modelling interface, 115
TopicNets, 109
Topics’ interpretability, 109
Trading zones, 50
Transatlantic transfer, 48
Transparency, 31, 76, 78, 140
Transversality, 31, 44, 45
Transversal Posthumanities, 31
Two cultures, 24, 25

U
Uncertainties, 33, 34, 76, 123, 125, 141, 143
UNESCO, 4, 37, 41, 42
United Nations, 78
User experience, 58
UTOPIAN, 109

V
Vannevar Bush’s Memex, 20
Visual display, 31
Visualisation, 34, 76
Visualisation literacy, 94, 105, 111, 118

W
Whiteness, 53
Wireframes, 118
Word collocation, 99
Word2vec, 88