
SAGE was founded in 1965 by Sara Miller McCune to support
the dissemination of usable knowledge by publishing innovative
and high-quality research and teaching content. Today, we
publish over 900 journals, including those of more than 400
learned societies, more than 800 new books per year, and a
growing range of library products including archives, data, case
studies, reports, and video. SAGE remains majority-owned by
our founder, and after Sara’s lifetime will become owned by
a charitable trust that secures our continued independence.

Los Angeles | London | New Delhi | Singapore | Washington DC | Melbourne


SAGE Publications Ltd
1 Oliver’s Yard
55 City Road
London EC1Y 1SP

SAGE Publications Inc.
2455 Teller Road
Thousand Oaks, California 91320

SAGE Publications India Pvt Ltd
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road
New Delhi 110 044

SAGE Publications Asia-Pacific Pte Ltd
3 Church Street
#10-04 Samsung Hub
Singapore 049483

Editor: Michael Ainsley
Editorial Assistant: Umeeka Raichura
Production Editor: Anwesha Roy
Copyeditor: Sunrise Setting
Proofreader: Sunrise Setting
Indexer: Elske Janssen
Marketing Manager: Lucia Sweet
Cover Design: Wendy Scott
Typeset by: Cenveo Publisher Services
Printed in the UK

Editorial arrangement & Introduction © Niels Brügger & Ian Milligan, 2019
Chapter 1 © Ian Milligan, 2019
Chapter 2 © Niels Brügger, 2019
Chapter 3 © Peter Webster, 2019
Chapter 4 © Richard Rogers, 2019
Chapter 5 © Valérie Schafer and Benjamin G. Thierry, 2019
Chapter 6 © Francesca Musiani and Valérie Schafer, 2019
Chapter 7 © Ralph Schroeder, 2019
Chapter 8 © Stine Lomborg, 2019
Chapter 9 © Federico Nanni, 2019
Chapter 10 © Michael Stevenson and Anat Ben-David, 2019
Chapter 11 © Anthony Cocciolo, 2019
Chapter 12 © Anat Ben-David and Adam Amram, 2019
Chapter 13 © Justin Joque, 2019
Chapter 14 © Michael L. Nelson and Herbert Van de Sompel, 2019
Chapter 15 © Belinda Barnet, 2019
Chapter 16 © Anne Helmond, 2019
Chapter 17 © Alexander Halavais, 2019
Chapter 18 © Lindsay Poirier, 2019
Chapter 19 © Marc Weber, 2019
Chapter 20 © Gerard Goggin, 2019
Chapter 21 © Andy Famiglietti, 2019
Chapter 22 © Matthew Crain, 2019
Chapter 23 © Ian Milligan, 2019
Chapter 24 © Ignacio Siles, 2019
Chapter 25 © Christina Ortner, Philip Sinner and Tanja Jadin, 2019
Chapter 26 © Madhavi Mallapragada, 2019
Chapter 27 © Allie Kosterich and Matthew Weber, 2019
Chapter 28 © Niels Brügger and Ditte Laursen, 2019
Chapter 29 © James O’Sullivan and Dene Grigar, 2019
Chapter 30 © Valérie Beaudouin, Zeynep Pehlivan and Peter Stirling, 2019
Chapter 31 © Gareth Millward, 2019
Chapter 32 © Peter Webster, 2019
Chapter 33 © Jeremy Wade Morris, 2019
Chapter 34 © Jim McGrath, 2019
Chapter 35 © Gabriele de Seta, 2019
Chapter 36 © Mark McLelland, 2019
Chapter 37 © Susanna Paasonen, 2019
Chapter 38 © Finn Brunton, 2019
Chapter 39 © Michael Nycyk, 2019
Chapter 40 © Jane Winters, 2019

Apart from any fair dealing for the purposes of research or private
study, or criticism or review, as permitted under the Copyright,
Designs and Patents Act, 1988, this publication may be reproduced,
stored or transmitted in any form, or by any means, only with the prior
permission in writing of the publishers, or in the case of reprographic
reproduction, in accordance with the terms of licences issued by
the Copyright Licensing Agency. Enquiries concerning reproduction
outside those terms should be sent to the publishers.

At SAGE we take sustainability seriously.
Most of our products are printed in the UK
using FSC papers and boards. When we
print overseas we ensure sustainable
papers are used as measured by the
PREPS grading system. We undertake an
annual audit to monitor our sustainability.

Library of Congress Control Number: 2018960265

British Library Cataloguing in Publication data

A catalogue record for this book is available from the British Library

ISBN 978-1-4739-8005-1

Contents

List of Figures and Box ix
List of Tables xiv
Notes on the Editors and Contributors xv
Foreword xxv
Steve Jones
Introduction xxviii

PART I THE WEB AND HISTORIOGRAPHY 1

1. Historiography and the Web 3
Ian Milligan

2. Understanding the Archived Web as a Historical Source 16
Niels Brügger

3. Existing Web Archives 30
Peter Webster

4. Periodizing Web Archiving: Biographical, Event-Based,
National and Autobiographical Traditions 42
Richard Rogers

PART II THEORETICAL AND METHODOLOGICAL REFLECTIONS 57

5. Web History in Context 59
Valérie Schafer and Benjamin G. Thierry

6. Science and Technology Studies Approaches to Web History 73
Francesca Musiani and Valérie Schafer

7. Theorizing the Uses of the Web 86
Ralph Schroeder

8. Ethical Considerations for Web Archives and Web History Research 99
Stine Lomborg

9. Collecting Primary Sources from Web Archives: A Tale of Scarcity
and Abundance 112
Federico Nanni
vi THE SAGE HANDBOOK OF WEB HISTORY

10. Network Analysis for Web History 125
Michael Stevenson and Anat Ben-David

11. Quantitative Web History Methods 138
Anthony Cocciolo

12. Computational Methods for Web History 153
Anat Ben-David and Adam Amram

13. Visualizing Historical Web Data 168
Justin Joque

PART III TECHNICAL AND STRUCTURAL DIMENSIONS
OF WEB HISTORY 187

14. Adding the Dimension of Time to HTTP 189
Michael L. Nelson and Herbert Van de Sompel

15. Hypertext before the Web – or, What the Web Could Have Been 215
Belinda Barnet

16. A Historiography of the Hyperlink: Periodizing the Web through the
Changing Role of the Hyperlink 227
Anne Helmond

17. How Search Shaped and Was Shaped by the Web 242
Alexander Halavais

18. Making the Web Meaningful: A History of Web Semantics 256
Lindsay Poirier

19. Browsers and Browser Wars 270
Marc Weber

20. Emergence of the Mobile Web 297
Gerard Goggin

PART IV PLATFORMS ON THE WEB 313

21. Wikipedia 315
Andy Famiglietti

22. A Critical Political Economy of Web Advertising History 330
Matthew Crain

23. Exploring Web Archives in the Age of Abundance:
A Social History Case Study of GeoCities 344
Ian Milligan

24. Blogs 359
Ignacio Siles

25. The History of Online Social Media 372
Christina Ortner, Philip Sinner and Tanja Jadin

PART V WEB HISTORY AND USERS, SOME CASE STUDIES 385

26. Cultural Historiography of the ‘Homepage’ 387
Madhavi Mallapragada

27. Consumers, News, and a History of Change 400
Allie Kosterich and Matthew Weber

28. Historical Studies of National Web Domains 413
Niels Brügger and Ditte Laursen

29. The Origins of Electronic Literature as Net/Web Art 428
James O’Sullivan and Dene Grigar

30. Exploring the Memory of the First World War Using Web Archives:
Web Graphs Seen from Different Angles 441
Valérie Beaudouin, Zeynep Pehlivan and Peter Stirling

31. A History with Web Archives, Not a History of Web Archives: A History
of the British Measles–Mumps–Rubella Vaccine Crisis, 1998–2004 464
Gareth Millward

32. Religion and Web History 479
Peter Webster

33. Hearing the Past: The Sonic Web from MIDI to Music Streaming 491
Jeremy Wade Morris

34. Memes 505
Jim McGrath

35. Years of the Internet: Vernacular Creativity before, on and
after the Chinese Web 520
Gabriele de Seta

36. Cultural, Political and Technical Factors Influencing Early Web
Uptake in North America and East Asia 537
Mark McLelland

37. Online Pornography 551
Susanna Paasonen

38. Spam 564
Finn Brunton

39. Trolls and Trolling History: From Subculture to Mainstream Practices 577
Michael Nycyk

PART VI THE ROADS AHEAD 591

40. Web Archives and (Digital) History: A Troubled Past and a
Promising Future? 593
Jane Winters

Index 607

List of Figures and Box

FIGURES

1.1 The White House viewed in the Wayback Machine 8
1.2 Three Prime Ministers seen in the UK Web Archive’s Shine interface.
Image used with thanks to the British Library 11
4.1 Early blogosphere, with missing archived websites. Collection based on
Eatonweb. Digital Methods Initiative, Amsterdam, 2009 51
4.2 Trackers embedded in The New York Times. Output from Tracker Tracker
tool showing trackers embedded in archived newspaper webpages over time.
Digital Methods Initiative, Amsterdam, 2012 52
11.1 Library of Congress website from year 2002, with text areas highlighted
with black bounding boxes. Webpage is 23.33% text using this method 146
11.2 WhiteHouse.gov from 2002 with text areas highlighted with
black bounding boxes. Webpage is 46.10% text using this method 147
11.3 Percentage of text on webpages 149
13.1 The network of all pages from http://ai.umich.edu, a new unit at the
University of Michigan focused on academic innovation in the digital age.
Each circle (node) around the outside represents a single page and each
line (edge) between them represents a link from one page to another.
The nodes are sorted around the circle based on the next directory in the url.
It is already evident that the left is the highly connected component while the
right is very sparsely connected. This and all of the other network diagrams
made by the author (Figures 13.1 through 13.6) were made
using Cytoscape 171
13.2 The force-directed visualization of the same network from Figure 13.1
is difficult to read due to its density. Perhaps the only structural element that
is noticeable is the relatively large group of nodes in the lower left that appear
to be exclusively linked to from a single node 172
13.3 Looking just at links between the top-level directories (e.g. http://ai.umich.edu/blog
and http://ai.umich.edu/about-ai) gives a smaller network that is
slightly more manageable but still difficult to learn very much from 172
13.4 The same set of pages from Figure 13.1, but only showing edges
where a page links to another page five or more times. This most likely shows
structural links such as those that appear in headers and footers along with a
few places on a page, rather than single mentions in the body of the text.
The main page in the middle is the About page 173
13.5 Bundling the edges from Figure 13.1 provides a clearer visualization.
Edge bundling can be powerful for circle layouts as one can see the
general direction of movement 173
13.6 This final network visualization shows the same network from Figure 13.3
with the first-level directories, but only showing edges with ten or more links.
The edges are also bundled, and labels are scaled based on the node’s
betweenness centrality, which measures how important a node is for
connecting the network 174
13.7 An example of edge and node bundling showing connections between
high-level domains 175
13.8 A hive plot of the pages that make up three of the first-level directories.
The distance from the center represents the degree (number of links coming in
and going out) and the color represents the number of nodes at that position
on the axes. Note that this diagram does not show links within a directory.
The hive plots were produced with Jhive 176
13.9 The same plot as Figure 13.8, but with the axes expanded to
show inter-directory links. Each page is thus shown twice, once on the
main axis and once on the repeated axis 176
13.10 A heatmap showing a subset of first-level directories from http://ai.umich.edu.
A gray square represents a link between pages in those directories. The directory
on the left is the directory the link originates from and the directory at the top is
the destination directory. It should be noted that directed graphs do not produce
symmetrical heatmaps. These heatmaps were made using the
statistical software R 177
13.11 The same heatmap from Figure 13.10, but colored based on the number of links 178
13.12 A scatterplot showing the relationship between links to country
high-level domains and mentions on the BBC website 179
13.13 A dispersion plot showing the frequency of the terms ‘digital’,
‘learning’ and ‘div’ from http://ai.umich.edu/about-ai. The text includes all
of the html and it is readily apparent how much greater the proportion of code
is to the human-readable text. This graph was made using the Natural Language
Toolkit (NLTK), a package for Python 180
13.14 Self-organizing map based on Wikipedia featured article data.
Closer items are more similar. The ‘mountains’ are edges between clusters
and the red lines are links between articles 181
13.15 Small multiples showing the strength of hyperlinked connections
between UK universities under study for 2000, 2005, 2010 182
14.1 The Last-Modified response header often exists for images, pdfs,
and other typically static files 191
14.2 The Last-Modified response header is typically absent from resources with
dynamically constructed representations (i.e., almost all HTML files) 192
14.3 Based on the ‘Accept-Encoding’ request header, the server responds
with a gzipped HTML page, as declared in the ‘Content-Encoding’
response header 192
14.4 URIs, resources, and representations 193
14.5 HTTP response for a Memento from the Internet Archive 194
14.6 HTTP response for a Memento from archive.is 195
14.7 The first ten lines of the TimeMap for http://www.lanl.gov/ 195
14.8 Negotiating with a TimeGate for a Memento of http://www.lanl.gov/
close to October 16, 2013 196
14.9 HTTP response with Memento headers from the W3C MediaWiki 197
14.10 Datetime negotiation with a MediaWiki TimeGate for one second
before the latest Memento; MediaWiki uses the minpast algorithm
instead of mindist 198
14.11 Architectural overview of how the Memento framework allows a
representation of a prior state of a resource to be accessed 198
14.12 A response from an aggregated TimeGate, redirecting to
http://archive.is/20131016225948/http://www.lanl.gov/ 199
14.13 The processed TimeMap showing the hostnames of the eight public
web archives with Mementos for http://www.lanl.gov/ and their respective
Memento counts 199
14.14 A request to the TimeTravel service with URI-R = http://www.lanl.gov/
and datetime=2013-10-16 200
14.15 The response to the request shown in Figure 14.14, with seven archives
holding Mementos for this URI-R (available at: http://timetravel.mementoweb.org/list/20131016000000/http://www.lanl.gov/) 200
14.16 Setting the datetime to October 16, 2013 for http://www.lanl.gov/ 202
14.17 Right-clicking in the middle of the page to expose datetime
negotiation options for http://www.lanl.gov/ 203
14.18 The user is now at the Memento http://archive.is/20131016225948/http://www.lanl.gov/ 204
14.19 Right-clicking in the middle of the Memento to go back to the live web
(i.e., from http://archive.is/20131016225948/http://www.lanl.gov/ back to
http://www.lanl.gov/) 205
14.20 The Internet Archive may hold Mementos for https://www.quora.com/
but is blocking them due to the directives found in https://www.quora.com/robots.txt 206
14.21 https://www.quora.com/ is not in the Internet Archive but is archived
500+ times in seven other archives 206
14.22 The Memento-Datetime and X-Archive-Orig-last-modified
headers establish a range of temporal validity 207
14.23 Memento http://web.archive.org/web/19990129040356/http://www.goes.noaa.gov/browsh2.html 208
14.24 Prima Facie Violative: the embedded JPEG from Figure 14.23 was actually
modified and archived in 2003, not 1999 209
14.25 Primary link is to URI-R, alternate link to URI-M, and a preferred datetime 210
14.26 Primary link is to an aggregator, alternate link to URI-R, and
a preferred datetime 210
18.1 This figure depicts a timeline of systems, languages, and frameworks that have
been advanced in the field of knowledge representation since the 1960s.
It shows how the field has toggled between neat and scruffy approaches to
knowledge representation. Many of the Semantic Web technologies introduced in
the 2000s and 2010s can be said to derive from earlier systems – sharing common
creators, design directives, and worldviews 266
19.1 Early Web browsers, family tree 271
19.2 Viola hypertext system, 1989. Viola was a powerful hypertext system by
student Pei Wei, based around Java-like applets. He later turned it into an
early Web browser 272
19.3 WorldWideWeb browser on the NeXT computer 274
19.4 Screenshot, CERN line-mode browser 275
19.5 Viola browser, screenshot from later version 1993 277
19.6 Midas browser, screenshot from later version 2.1 278
19.7 NCSA Mosaic browser, 1993. Mosaic brought the Web to ordinary users. NCSA’s
‘What’s New’ page effectively became a home page for the entire early Web 280
19.8 Gopher t-shirt in the style of hot-rod artist Big Daddy Roth, ca. 1994.
Gopher was the Web’s most serious competitor. It was developed by Mark
McCahill, Paul Lindner, and Farhad Anklesaria at the University of Minnesota 281
19.9 Web portal site GNN pioneered Web advertising in 1993, with embedded ads
similar to this example from 1995. GNN evolved from a bookstore kiosk
version of ‘The Whole Internet User’s Guide’ based on the early
Viola browser 282
19.10 Mosaic marketing materials 282
19.11 White House site, 1994 283
19.12 Screenshot from Netscape Navigator 284
19.13 CommerceNet Consortium page, 1994 286
19.14 Universal 3-A stock ticker, ca. 1870–80. Among the first dedicated
e-commerce devices, ticker tape machines printed stock prices in real time.
They were named for their ticking sound 287
19.15 Windows 95 box with bundled access to MSN, the last major competitor to
the Web/Microsoft Network (MSN) logo on Windows 95 box. Windows 95
came ready to connect to this initially proprietary network and online service.
MSN later provided Internet access 289
19.16 Kinokuniya bookstore, i-mode site 291
19.17 Cybird’s mobile map, i-mode site 291
20.1 Nokia Communicator 9300 displaying Wikipedia home page 299
20.2 Opera Mini advertising, Opera.com website, 31 December 2005 302
20.3 Mobile website browsing on Opera Mini, Opera.com website, 31
December 2005 303
21.1 The rise of ‘open source’ 317
21.2 Citations in Gaza War article by country of origin 325
23.1 GeoCities.com from 22 October 1996, via Wayback Machine 349
23.2 The link structure of one GeoCities neighborhood, the Enchanted Forest 352
23.3 EnchantedForest/Glade/3891 – the highest ranked site 352
23.4 The TF-IDF search engine 355
29.1 Stuart Moulthrop’s Victory Garden (1992), published by Eastgate Systems 430
29.2 Advert for Judy Malloy’s ‘Uncle Roger: A Party in Woodside’,
from Pathfinders 430
29.3 Electronic Literature Collection: Volume 1, from collection.eliterature.org 433
29.4 All the Delicate Duplicates, by Mez Breeze and Andy Campbell (2013) 437
30.1 Framework 447
30.2 Hyperlink network visualisation with host aggregation (scope SeedURL) 450
30.3 Hyperlink network visualisation with aggregation by seed URL
(scope SeedURL) 451
30.4 Hyperlink network visualisation without filtering, with host aggregation
(scope: all) 452
30.5 Hyperlink network visualisation after filtering, with host aggregation
(scope: all) 452
30.6 Hyperlink network visualisation without filtering, aggregated by host,
remaining in the scope of the seed list 453
30.7 Network visualisation of websites dedicated to WW1 456
30.8 Network visualisation of websites dedicated to WW1 (degree>30) 457
31.1 Internet users (per 100 people) 467
31.2 ‘MMR The Facts’ – front page captured 8 September 2002 469
31.3 ‘MMR The Facts – Your Questions Answered’ – captured 3 December 2002 470
31.4 ‘MMR The Facts – Myths and Truths’ – captured 19 October 2002 472
35.1 A ‘good fortune lantern’ ASCII graphic sent via email by a Chinese student
during the 1992 New Year celebrations 524
35.2 Log-in page screenshot of the Tsinghua University Shuimu Tsinghua BBS,
originally set up on a 386 computer running Linux, using the same PalmBBS
behind the National Taiwan University Coconut Trees BBS 525
35.3 Bilingual coin divination page from one of the earliest Chinese websites
hosted on GeoCities 527
35.4 A QQ group chat window, including a common message feed, a textbox with
multimedia uploading options, different toolbars to access additional services
and a list of the 118 group members 528
35.5 The Sina Blog of Zhou Xiaoping, a 1981-born blogger popular for his
nationalist and anti-American views, as of September 2010 530
35.6 In-app stickers, photos, personalized biaoqing images, short videos,
hongbao red envelopes and emoticons used in interactions across three
WeChat group chats 532

BOX

11.1 Example use of command line tool wget 144


List of Tables

4.1 Select social media platforms with principal functions in order of importance 48
10.1 Example of a matrix with undirected social network data (x signifies a tie) 126
10.2 Social ties represented in two columns (using the same data as in Table 10.1) 126
11.1 Website categories with respective websites 143
11.2 Mean percentage of text on a webpage per year, with standard deviation values 148
23.1 Origin and destination links in GeoCities 351
23.2 Topics in GeoCities 354
30.1 Evolution of the collection ‘Great War on the Web’ 446
30.2 Comparison of strategies for generating the graph 449
Notes on the Editors and Contributors

THE EDITORS

Niels Brügger is Professor at Aarhus University, the School of Communication and Culture.
In 2000 he co-founded the Centre for Internet Studies, Aarhus University, and has headed the
centre since 2010. Since 2014 he has been Head of NetLab, a research infrastructure for the
study of the archived web. His research interests are web historiography, web archiving, and
media theory. Within these fields he has authored a number of publications, including Web 25:
Histories from the first 25 years of the World Wide Web (Ed., Peter Lang, 2017), The Web as
History: Using Web Archives to Understand the Past and the Present (Ed. with Ralph
Schroeder, UCL Press, 2017), and The Archived Web: Doing History in the Digital Age (MIT
Press, 2018). He is co-founder (2017) and managing editor of the international journal Internet
Histories: Digital Technology, Culture and Society (Taylor & Francis/Routledge).

Ian Milligan is an Associate Professor of History at the University of Waterloo, where he
teaches Canadian and digital history. Ian’s work explores how historians can use web archives,
the large repositories of cultural information that the Internet Archive and many other libraries
have been collecting since 1996. He has published two books: the co-authored Exploring Big
Historical Data: The Historian’s Macroscope (2015) and Rebel Youth: 1960s Labour Unrest,
Young Workers, and New Leftists in English Canada (2014). In 2016, Ian was named the
Canadian Society for Digital Humanities/Société canadienne des humanités numériques
(CSDH/SCHN)’s recipient of the Outstanding Early Career Award.

THE CONTRIBUTORS

Adam Amram is a scientific programmer. He holds an MSc from the Department of
Information and Knowledge Management at Haifa University. His research focuses on developing
computational tools for web research.

Belinda Barnet is Senior Lecturer in Media at Swinburne with research interests in digital
cultures, social media, the app economy, data analytics, AI, and the history of digital media.
Her current projects include examining the role of automation in speech rehabilitation in order
to improve the use of cochlear implants in deaf children. Alongside her research work, she has
worked as Service Delivery Manager (Wireless Content Services) for Ericsson Australia. She
is the author of Memory Machines: The Evolution of Hypertext (Anthem Press UK, 2013). You
can find her on Twitter at @manjusrii.

Valérie Beaudouin is a Professor of Social Sciences at Telecom ParisTech. She studies the
changes in social practices related to the digital era. She has been directing and conducting
research on online communities, self-publication, author’s networks, online amateur critiques,
and currently the construction of heritage and digital memories about WWI with the French
National Library. Information technology as a tool for humanities is one of her special interests:
she specialized in text mining and social network tools and more generally on digital
methods for social sciences. She graduated from ENSAE ParisTech as a Statistician Economist
in 1991 and obtained a PhD in Linguistics in 2000 at EHESS (Higher School of Social
Sciences). Her most recent publication, with Dominique Pasquier, is ‘Forms of contribution
and contributors’ profiles: An automated textual analysis of amateur on line film critics’ in New
Media & Society.

Anat Ben-David is a Senior Lecturer in the Department of Sociology, Political Science and
Communication, and Head of the Open Media and Information Lab at the Open University of
Israel. Her research focuses on national web studies and digital sovereignty, web history and
web archive research, and the politics of online platforms. Methodologically, her work specializes
in developing and applying digital and computational methods for web research.

Finn Brunton (finnb.net) is an Assistant Professor in the Department of Media, Culture, and
Communication at New York University. He is the author of Spam: A Shadow History of the
Internet (MIT Press, 2013), Obfuscation: A User’s Guide for Privacy and Protest with Helen
Nissenbaum (MIT Press, 2015), Communicate with Mercedes Bunz (University of Minnesota
Press, 2018), and Digital Cash: The Unknown History of the Anarchists, Immortalists, and
Utopians Who Created Cryptocurrency (Princeton University Press, 2019), as well as numerous
articles and papers.

Anthony Cocciolo is the Dean at Pratt Institute School of Information in New York City. His
research and teaching are in the area of archives and digital preservation. He recently published
Moving Image and Sound Collections for Archivists (Chicago: Society of American Archivists).
Cocciolo completed his doctorate from the Communication, Media and Learning Technologies
Design program at Teachers College, Columbia University, and his B.S. in Computer Science
from the University of California, Riverside.

Matthew Crain is an Assistant Professor in the Department of Media, Journalism, and Film at
Miami University. His research interests include the commercial development of the Internet,
the political economy of the media, communications policy, and critical studies of advertising.
His work has been published in academic journals including New Media & Society and the
International Journal of Communication. He previously taught at Queens College, City
University of New York.

Andy Famiglietti is a Professor of Digital Rhetoric at West Chester University in Pennsylvania.
He has previously published on the rhetoric of Wikipedia editors in First Monday. His ongoing
research projects adapt digital humanities distant reading methods to better understand rhetorical
strategies utilized in Wikipedia and other online spaces. He is also a software developer,
working to build open source educational tools for writing classrooms.

Gerard Goggin is Professor of Media and Communications and ARC Future Fellow,
University of Sydney. He is a leading figure in mobile media and communication research, with
key books including Cell Phone Culture (2006), Global Mobile Media (2011), the Routledge
Companion to Mobile Media (2014), the four-volume Major Works: Mobile Technologies
(2016), and Location Technologies in International Context (2018). Goggin is also a founding
editor of the journal Internet Histories, and editor of the Routledge Companion to Global
Internet Histories (2017).

Dene Grigar is Professor and Director of The Creative Media and Digital Culture Program at
Washington State University Vancouver, whose research focuses on the creation, curation,
preservation, and criticism of electronic literature. She is President of the Electronic Literature
Organization. Grigar has authored 14 media works, including Curlew (2014), A Villager’s Tale
(2011), and When Ghosts Will Die (2008). She curates exhibits of electronic literature and
media art, and has mounted shows at the British Computer Society, the Library of Congress,
the Symposium on Electronic Art, and the Modern Language Association (MLA), among other
venues. With Stuart Moulthrop she developed the methodology for documenting born-digital
media, a project that culminated in an open source, multimedia book, entitled Pathfinders
(2015), and a book of media art criticism, entitled Traversals (2017), for The MIT Press. In
2017, she was awarded the Lewis E. and Stella G. Buchanan Distinguished Professorship by
her university.

Alexander Halavais is an Associate Professor of Social Technologies in the School of Social
and Behavioral Sciences at Arizona State University, where he researches ways in which social
media change the nature of scholarship and learning and allow for new forms of collaboration
and self-government. He directs the Masters in Social Technologies program. The second edition
of his Search Engine Society was published by Polity in 2017, and he is working on a book
tentatively entitled All Seeing.

Anne Helmond is Assistant Professor of New Media and Digital Culture at the Department of
Media Studies at the University of Amsterdam. She is a member of the Digital Methods
Initiative and App Studies Initiative research collectives where she focuses her research on the
infrastructure of social media platforms and mobile apps. Her research interests include digital
methods, software studies, platform studies, app studies, infrastructure studies, and web history.
She currently holds a Veni grant from the Netherlands Organisation for Scientific
Research (NWO) for the project ‘App ecosystems: A critical history of apps’ (2017–20). More
info: annehelmond.nl.

Tanja Jadin is a Professor for E-Learning at the Study Programme Communication and
Knowledge Media at the University of Applied Sciences Upper Austria. She holds a doctoral
degree in Psychology from the University of Salzburg. Her main research interests are computer-supported
collaborative learning, self-regulated learning, informal learning, teaching and
learning with new media, and media literacy.

Steve Jones is a UIC-Distinguished Professor of Communication, Research Associate in the
UIC Electronic Visualization Laboratory, Adjunct Professor of Computer Science, and Adjunct
Research Professor in the Institute of Communications Research at the University of Illinois at
Urbana-Champaign. His research interests include the social history of communication technology,
health and new media, human augmentics, virtual environments and virtual reality, popular
music studies, Internet studies, and media history. He was the founder and first President of the
Association of Internet Researchers and served as Senior Research Fellow at the Pew Internet
and American Life Project. He is editor of New Media & Society, co-editor of Mobile Media &
Communication, and edits the Digital Formations book series for Peter Lang Publishing. His
research has been funded by the National Science Foundation, National Institutes of Health,
National Cancer Institute, Centers for Disease Control, and the Tides Foundation. Jones was
named a Fellow of the International Communication Association in 2012.

Justin Joque is a scholar of philosophy, technology and media and the visualization librarian
at the University of Michigan. He completed his PhD in Communications and Media Studies
at the European Graduate School and holds a Masters in Science of Information from the
University of Michigan. He is most recently the author of Deconstruction Machines: Writing
in the Age of Cyberwar (University of Minnesota Press, 2018).

Allie Kosterich is an Assistant Professor in the Department of Media, Communications, and
Visual Arts at Pace University. Allie’s research examines transformation in the media industry,
particularly at the intersection of organizations, institutions, and digital technologies. Her
recent work focuses on institutional change in news media in reaction to new forms of media
production. This includes a large-scale study examining career histories and skill sets of professional
journalists in the United States. Allie uses mixed methods in her work, including archival
research, interviews, and social network analysis. Allie received her PhD in 2017 from Rutgers
University School of Communication and Information.

Ditte Laursen is Head of Department for Digital Cultural Heritage at The Royal Danish
Library. She is experienced in collection management, IT governance, and research and
development. Her special interests include digital cultural heritage, digital humanities, and
digital research infrastructures. She is the author or co-author of numerous publications
on digital archives, social interaction in, around, and across digital media, and users’
engagement with museums and libraries, all published in international peer-reviewed jour-
nals and anthologies.

Stine Lomborg is Associate Professor in Communication and IT at the University of
Copenhagen. She holds a PhD in Media Studies from Aarhus University. Her research centers
on new models of communication in the context of digital media and empirical studies of the
uses of social and mobile media in the context of self-tracking and online communication. She
is the author of Social Media – Social Genres: Making Sense of the Ordinary (Routledge),
which uses web archival research on social media use in Denmark to understand the develop-
ment of new forms of everyday communication and sociality, and has also authored several
articles reflecting on the methodological, ethical and regulatory implications of digital media.

Madhavi Mallapragada is Associate Professor in the Department of Radio-Television-Film
at the University of Texas at Austin. Her research interests are in the areas of new media, cul-
tural studies, media and diaspora, race and ethnicity, media industries, Asian American media,
and immigrant culture. She is the author of Virtual Homelands: Indian Immigrants and Online
Cultures in the United States (University of Illinois Press, 2014). She is currently working on
a book-length project on the politics of race and ethnicity in US media industries. Her research
has been published in the journals Television and New Media, Communication, Culture &
Critique, New Media and Society, South Asian Popular Culture, and in the edited anthologies
Global Asian American Popular Cultures (2016), Re-Orienting Global Communication: Indian
and Chinese Media Beyond Borders (2010), Critical Cyberculture Studies: Current Terrains,
Future Directions (2006), and Web.studies: Rewiring New Media for the Digital Age
(2000).

Jim McGrath is a Postdoctoral Fellow in Digital Public Humanities at the John Nicholas
Brown Center for Public Humanities and Cultural Heritage (Brown University). His research
interests include digital humanities, public humanities, community archives, electronic litera-
ture, and Internet subcultures. He received his PhD in English from Northeastern University,
where he was also Project Co-Director of Our Marathon: The Boston Bombing Digital Archive.
He is on Twitter @JimMc_Grath.

Mark McLelland is a Professor in the Sociology program at the University of Wollongong.
He is author or editor of over ten books focusing on Japanese popular culture and cultural and
media history.

Gareth Millward is a Wellcome Trust Research Fellow at the Centre for the History of
Medicine, University of Warwick. He held a bursary from the British Library and Institute of
Historical Research to help develop new search tools for historians accessing the Library’s web
archive data in 2014–15. Since then, he has been keen to integrate web archives into his
research, particularly for contemporary events. He specializes in British health policy since
World War II. His PhD focused on disability policy since the 1960s, and he has recently com-
pleted a monograph on the history of British childhood vaccination policy. Since 2017 he has
been researching the policy and rhetoric around British sickness certification from 1945 to the
present.

Jeremy Wade Morris is an Associate Professor of Media and Cultural Studies in the
Department of Communication Arts at the University of Wisconsin-Madison. His research
interests include the digitization of cultural goods and commodities, software and app culture,
the history of sound technologies, and the current state of the popular music industries. He is
the author of Selling Digital Music, Formatting Culture (University of California Press) and
co-editor of a collection on apps and software called Appified: Culture in the Age of Apps (with
Sarah Murray, University of Michigan Press 2018). His work has also appeared in journals such
as New Media & Society, Critical Studies in Media Communication, and Popular Communication.
He is the founder of PodcastRE.org, a database to preserve podcasts and make them more
researchable for scholars of media and audio history.

Francesca Musiani is Associate Research Professor at the French National Centre for
Scientific Research (CNRS), Institute for Communication Sciences (ISCC – CNRS/
Sorbonne University). Her current research focuses on science and technology studies
approaches to the study of Internet governance and privacy. She is one of the Principal
Investigators of the NEXTLEAP project (2016–18, Next-Generation Techno-Social and
Legal Encryption Access and Privacy), funded by the European Commission. Francesca is
the author of Internet et vie privée [Internet and Privacy] (Uppr Editions, 2016) and Nains
sans géants. Architecture décentralisée et services Internet [Dwarfs Without Giants.
Decentralized Architecture and Internet Services] (Presses des Mines, 2013 [2015]), which was
awarded the Prix Informatique et Libertés by the French Privacy and Data Protection
Commission.

Federico Nanni is a Postdoctoral Researcher at the Data and Web Science Group and the
Political Science Department of the University of Mannheim. His research focuses on the
issues that arise when using born-digital documents as primary sources to study the present
times and on adopting (and adapting) Natural Language Processing methods for supporting
works in the digital humanities and computational social sciences. His previous studies have
been published in relevant digital humanities journals, such as Digital Scholarship in the
Humanities and Digital Humanities Quarterly, as well as at important computer science
venues, such as EMNLP, JCDL, and EACL.

Michael L. Nelson is a Professor of Computer Science at Old Dominion University. Prior to
joining ODU, he worked at NASA Langley Research Center from 1991–2002, where he devel-
oped the NASA Technical Report Server (NTRS). He is a co-editor of the OAI-PMH, OAI-
ORE, Memento, ResourceSync, and Robust Links specifications. His research interests include
repository-object interaction and web preservation. More information about Michael can be
found at http://www.cs.odu.edu/~mln.

Michael Nycyk is an independent technology and social researcher affiliated with the Department
of Internet Studies at Curtin University, Perth, Australia and a graduate of the University of
Queensland, Brisbane, Australia, in Information Management and Communication and Language
Studies. His primary interest is in understanding the behaviors of Internet users, policies to
manage these, and the shaping of online identity. His areas of research include understanding
adult cyberbullying, Internet trolling, members’ flaming strategies on YouTube, computer hack-
ers’ behaviors, and analyzing people’s online behaviors, and their management, on social media.
He has published in these areas, including self-published books, and other publications.
Additional areas he has researched and published in include: the use of technologies by older
adults and effective learning strategies, minimizing the digital divide, online learning methods,
knowledge management and electronic records management, and child fostering practices.

Christina Ortner is Professor for Online Communication at the Study Programme
Communication and Knowledge Media at the University of Applied Sciences Upper Austria.
She also teaches communications and qualitative social science at the University of Salzburg
and the University of Applied Science Salzburg. Her research interests include online commu-
nication, social media, audience and reception studies, children, youth and the media, and citi-
zen communication.

James O’Sullivan is Lecturer in Digital Arts and Humanities at University College Cork
(National University of Ireland). He has previously held faculty positions at the University of
Sheffield and Pennsylvania State University. His work has been published in a variety of interdis-
ciplinary journals, including Digital Scholarship in the Humanities, Digital Humanities
Quarterly, and Hyperrhiz: New Media Cultures. He and Shawna Ross are the editors of Reading
Modernism with Machines (Palgrave Macmillan, 2016). He is the author of several collections of
poetry, including Courting Katie (Salmon Poetry, 2017), and is the founding editor of New Binary
Press. Further information on James and his work can be found at josullivan.org.

Susanna Paasonen is Professor of Media Studies at University of Turku, Finland. With an
interest in studies of popular culture, sexuality, affect, and media theory, she is the author of
Carnal Resonance: Affect and Online Pornography (MITP, 2011) and Many Splendored
Things: Thinking Sex and Play (Goldsmiths Press, 2018), co-author of Not Safe for Work: Sex,
Humor and Risk in Social Media with Kylie Jarrett and Ben Light (MITP, forthcoming), and
co-editor of Working with Affect in Feminist Readings: Disturbing Differences (Routledge,
2010, with Marianne Liljeström) and Networked Affect (MITP, 2015, with Ken Hillis and
Michael Petit). She serves on the editorial boards of the journals Sexualities, Porn Studies, New
Media & Society, Social Media + Society, and International Journal of Cultural Studies.

Zeynep Pehlivan is a Research Engineer in the legal deposit team of Ina (French National
Audiovisual Institute). She holds a PhD in Computer Science from the University of Pierre and Marie
Curie (thesis title: ‘Access to web archives: querying, navigating and optimizing’). Her research
focuses on web archiving and access methods to web archives and their optimization. Before joining
Ina, she participated in national and international R&D projects such as SCAPE (Scalable
Preservation Environments, EU FP7) and the LabEx-founded ‘Pasts in the Present: History, heritage,
memory’. She is currently working on social media archiving and mining (e.g. with Twitter).

Lindsay Poirier is a cultural anthropologist and recently completed her PhD in Science and
Technology Studies at Rensselaer Polytechnic Institute. As of January 2019, she will be
Assistant Professor of Data Studies in the Science and Technology Studies Department at
University of California Davis. Her research focuses on digital expertise, data cultures, and the
theorization of digital infrastructure. She has conducted historical research on approaches to
digital knowledge representation in the artificial intelligence community and has conducted
fieldwork within both the Semantic Web community and a community of practitioners building
data standards for the human services. She is also the lead platform architect for the Platform
for Experimental Collaborative Ethnography (PECE) – an open source digital humanities plat-
form, which now supports several international research projects.

Richard Rogers is Professor of New Media and Digital Culture, Media Studies, University of
Amsterdam. He is also Director of the Digital Methods Initiative as well as the Netherlands
Research School for Media Studies (RMeS).

Valérie Schafer has been Professor of Contemporary European History at C2DH (Centre for
Contemporary and Digital History) at the University of Luxembourg since February 2018. She
was previously a researcher at the French National Centre for Scientific Research (CNRS). Her
current research deals with the Internet and Web history. She led the Web90 project funded by
the French National Research Agency (ANR) and dedicated to the French Heritage, Memories
and History of the Web in the 1990s. She is the author of La France en réseaux (années
1960–1980) [France in Networks (1960–1980)] (2012) and she co-authored with Benjamin
Thierry Le Minitel, l’enfance numérique de la France [The Minitel, the French Digital
Childhood] (2012) and with Bernard Tuy Dans les coulisses de l’Internet. RENATER, 20 ans
de technologie, d’enseignement et de recherche [On the Internet’s Sidelines: RENATER, 20
Years of Technology, Teaching and Research] (2013).

Ralph Schroeder is Professor at the Oxford Internet Institute. Before coming to Oxford
University, he was Professor in the School of Technology Management and Economics at
Chalmers University in Gothenburg (Sweden). His recent books include Rethinking Science,
Technology and Social Change (Stanford University Press, 2007), An Age of Limits: Social
Theory for the 21st Century (Palgrave Macmillan), Knowledge Machines: Digital Transformations
of the Sciences and Humanities (MIT Press, 2015, co-authored with Eric T. Meyer), and Social
Theory after the Internet: Media, Technology and Globalization (UCL Press). He is currently
doing research on the social implications of big data and on the uses of digital media by right-
wing populists.

Gabriele de Seta holds a PhD in Sociology from the Hong Kong Polytechnic University and
recently completed a postdoctoral fellowship at the Institute of Ethnology, Academia Sinica in
Taipei, Taiwan. His research work, grounded in ethnographic engagement across multiple sites,
focuses on digital media practices and vernacular creativity in Chinese-speaking areas. He is
also interested in experimental music, Internet art, and the collaborative intersections of anthro-
pology and art practice. Gabriele has published in a wide range of journals, including
Fibreculture, The Information Society, Anthropology Now, and Medien&Zeit, and authored
numerous chapters for handbooks and edited volumes. More information is available on his
website http://paranom.asia

Ignacio Siles is a Professor in the School of Communication at Universidad de Costa Rica. He
is the author of Networked Selves: Trajectories of Blogging in the United States and France (Peter
Lang, 2017), Por un Sueño En.red.ado (EUCR, 2008), and several articles about technology and
society. He obtained his PhD in the Media, Technology, and Society program at Northwestern
University. His current book project examines the transnational history of computer networks in
Central America in the 1990s as a political project of integration and development.

Philip Sinner is a Research Associate and Lecturer at the Department of Communications,
University of Salzburg. His research interests concern aspects of audio visual and online com-
munication with a special focus on social media, younger people, sports, and soccer as well as
on processes of media socialization. Since 2011 he has been a member of the European
Research Network EU Kids Online and the saferinternet.at advisory board, and a committee
member since 2016 of the Austrian No Hate Speech Movement (http://www.nohatespeech.at).
Since 2018 he has been the Early Career Scholars Representative of the newly created ‘Division
Media Sport and Sport Communication’ in the German Communication Association.

Michael Stevenson is a web historian, and Associate Professor of New Media and Digital
Culture in the Media Studies department at the University of Amsterdam. His work is broadly
about the roots and foundations of media practices, genres, and forms that are considered ‘web-
native’ or otherwise specific to the new media landscape. He is currently working on ‘The Web
that Was’, a project funded by the Dutch National Science Foundation (NWO), about the Perl
programming language and the early web. He is still figuring things out.

Peter Stirling is a digital curator in the digital legal deposit team at the Bibliothèque nationale
de France (BnF). He works on the definition of services and tools for users of the web archives
and on digital preservation of the collections. He also participates in day-to-day web archiving
activity and the international activity of the team in the context of the International Internet
Preservation Consortium. He holds an MA in English Literature and an MSc in Information
and Library Studies, and previously worked for an online information portal for health profes-
sionals in the UK and in online information monitoring for the French National Cancer
Institute, before joining the BnF in 2009.

Benjamin G. Thierry is Associate Professor at Sorbonne Université and a researcher at the
French National Center for Scientific Research (Institute for Communication Sciences, CNRS
– Sorbonne Université). He specializes in history of computing and telecommunications. His
long-term research interests involve the socialization of IT (man–machine communication,
digital culture, and links to the general public). He is the co-author with Valérie Schafer of Le
Minitel, l’enfance numérique de la France [The Minitel, the French Digital Childhood] (2012).
He is currently the Vice-President for Digital Projects at Sorbonne Université.

Herbert Van de Sompel is an information scientist at the Los Alamos National Laboratory and
leads the Prototyping Team in the Research Library. The team does research regarding various
aspects of scholarly communication in the digital age. Herbert has played a role in various
interoperability efforts (OAI-PMH, OpenURL, OAI-ORE, info URI, Open Annotation,
ResourceSync, SharedCanvas, Memento, Robust Links) and in the design of scholarly discov-
ery tools (SFX linking server, bX recommender engine). More information about Herbert can
be found at http://public.lanl.gov/herbertv.

Marc Weber is Curatorial Director of the Internet History Program (computerhistory.org/nethistory)
at the Computer History Museum in Silicon Valley. He established Web history as a topic
starting in 1995 with help from Sir Tim Berners-Lee and other online pioneers, and co-founded
two of the first organizations in the field. The Internet History Program has further expanded the
Museum’s leading collection of networking history materials and developed its galleries and
exhibits on connected topics, including the permanent Web, Mobile, and Networking galleries.
Weber has conducted oral histories with several hundred online pioneers. He speaks and pub-
lishes widely and consults to companies, filmmakers and museums on the history of the online
world and has been interviewed on related topics by major media from the BBC to Wired. He
serves on the editorial board of Internet Histories and co-chairs the W3C Web History Community
Group. He is author of an upcoming book from Thomas Dunne Books/St. Martin’s Press on the
evolution of the online world.

Matthew Weber is an Associate Professor in the Hubbard School of Journalism and Mass
Communication at the University of Minnesota. Matthew is an expert on organizational change
and the use of large-scale Web data. His recent work includes a large-scale longitudinal study
examining strategies employed by media organizations for disseminating news and information
through online hyperlink networks. Subsequent research includes an examination of the effec-
tiveness of adopting social media within organizations in order to share knowledge and col-
laborate with teammates. Matthew is also leading an initiative to provide researchers with
access to the Internet Archive (archive.org) in order to study digital traces of news networks.
Matthew received his PhD in 2010 from the Annenberg School of Journalism and
Communication at the University of Southern California.

Peter Webster is an independent scholar and consultant based in the UK. He has published
widely on various aspects of contemporary British religious history from the 1920s to the
1990s, including church and state, the religious arts, evangelicalism, and the relationship of
faith and technology. He has also written extensively on the implications of the digital turn for
historical research, and on web archives in particular. His most recent book, on Walter Hussey,
Anglican patron of the arts, was published in 2017 by Palgrave Macmillan. After working with
digital archives for the University of London, the British Library, and the International Internet
Preservation Consortium, he founded Webster Research and Consulting in 2014. WR&C works
with libraries and archives to understand what users need from digital resources for research,
and works with technologists to meet those needs.

Jane Winters is Chair of Digital Humanities at the School of Advanced Study, University of
London. She has led or co-directed a range of digital projects, including Big UK Domain Data
for the Arts and Humanities; Digging into Linked Parliamentary Metadata; Traces through
Time: Prosopography in Practice across Big Data; and Born Digital Big Data and Approaches
for History and the Humanities. Recent publications include ‘Tackling complexity in humani-
ties big data: From parliamentary proceedings to the archived web’, in Big and Rich Data in
English Corpus Linguistics: Methods and Variations, ed. T. Hiltunen, J. McVeigh and T. Säily
(Helsinki: Varieng, 2017); ‘Breaking in to the mainstream: Demonstrating the value of internet
(and web) histories’, Internet Histories (March 2017); and ‘Web archives for humanities
research: Some reflections’, in The Web as History: Using Web Archives to Understand the Past
and Present, ed. N. Brügger and R. Schroeder (London: UCL Press, 2017).

Foreword
The Web as Counterpart
Steve Jones

In late 2017 the US-based Starz cable television network launched a science fiction espionage
drama titled Counterpart based on the premise that in 1987, during a science experiment, East
German scientists accidentally created a parallel Earth. While a crossing point existed in Berlin
that allowed people to physically move between the worlds, from that point in 1987 onward the
two worlds diverged. Part of the drama involved moments when a person would meet their
counterpart, or a friend, acquaintance or co-worker, from the other world. What would be the
consequences of such encounters? Could one recognize or reconcile with the other?
One way, of the admittedly many, to read Counterpart is as an allegory for the Web. Invented
in 1989, one could imagine its existence as a kind of parallel universe, with crossing points at
screens, perhaps. The interesting element of such a reading, as with most all allegories, is not in
the match or mismatch of the allegorical details so much as it is in the lesson of the parable. The
lesson in this case is not an end result, a single consequence resulting from some prior actions
or motives. It is in the consequences that arise from the moment of divergence. What becomes
of one’s identity, indeed, what becomes of the notion of identity, in a strangely split, parallel,
universe? Therein lies the importance of understanding the history of the Web. It is not merely
a technology, a product of scientific invention. Nor is it merely a new way to do things we had
already been doing (writing, communicating, reading, etc.) or even to do new things altogether.
And it is also not merely a platform on which we share thoughts, ideas, images and sounds. It
is all those things, but greater than their sum.
The Web is, like the parallel Earth of Counterpart, us, and not us, a recognizable, yet strange,
place. How did it come about, what was its development? We can find points of origination
but from those moments onward multiple trajectories emerge. How do we hold the Web still
enough to achieve a degree of satisfactory scrutiny? Numerous scholars are now, thankfully,
asking, and answering, questions such as these. The research and discussion presented in The
SAGE Handbook of Web History provide the context for pressing on with discovery of the Web
as an indivisible part of contemporary human experience, and provide the foundation for future
successful archiving and preservation of the Web’s history.
The significance of The SAGE Handbook of Web History is, among other things, in its recog-
nition of the need for a multifaceted approach to the preservation and study of the history of the
Web. Although the Web seems to be a relatively new phenomenon, it is as if time passes on it in
‘dog years’, at a rate that feels like seven years for each time the Earth travels around the sun.
To many people it likely feels as if it has been around forever, or as long as they can remember,
at least. As I write this foreword Facebook is in the early stages of the Cambridge Analytica
controversy; by the time it is published other controversies will likely have come and gone, too.
The question, then, is how do we capture the essence of the Web, the experience, and the con-
sequences, of it? Even if we can entirely preserve it, from the bits to the screens, the links to the
machines, what might we do to capture the myriad uses to which it is put as well as the affective
dimension of its use?
I am reminded of the early days of the Pew Internet & American Life Project in late 1999
and early 2000 when survey instruments were being developed for use in telephone interviews.
Would it be better to ask respondents if they went online, or if they used the Internet, or if they
accessed the Web? What would be the consequences for the answers we would get with each
question? Even a simple question, such as ‘Did you check your e-mail today?’ elides numer-
ous questions, not the least of which is whether the Web was used for e-mail and how such use
matters, whether it renders e-mail different from other forms of access. Similarly, Web access
via mobile devices and the increasing use of apps, some of which do little more than offer
Web pages while others seem completely apart from it, alters the perception of the Web. The
phenomenological turn expressed in some of the pages in this Handbook is therefore particu-
larly welcome, necessary and important. While I do not wish to argue that there is a distinctive
‘essence’ to the experience of the Web nor to set historical fact apart from experience, it does
seem to me useful to acknowledge that if there is ever to be an interpretive dimension to the
Web’s history then we need to think through the elements of what that dimension might be and
find ways to preserve it, too. Doing so would provide the means for reconstituting the Web as
greater than the sum of its parts and aid in understanding the Web as a multifaceted technology,
at once a medium, built on another medium, that facilitates other media.
It is important to remember therefore that the Web is not merely a technological web but also
an affective one. Consider, for instance, if during the present Cambridge Analytica controversy
Facebook were to find millions of its users deleting their accounts. What would happen to con-
versation threads on Facebook if the company permitted users to delete all of their posts along
with their accounts? Furthermore, those accounts are implicated in how people log in to other
sites and to apps, illustrating that the reach of the Web is at once both outside the scope of what
we consider the Web proper and connected to apps, and also impenetrable as Web content is
increasingly proprietarily held. Users have experience, loyalties, relationships with other users
as well as with sites and with the private and public entities that operate sites.
Furthermore, the Web, by means of its interconnection with other aspects of online (and
sometimes offline, as might be the case with, say, credit card use, travel arrangements, etc.)
activity that is facilitated by the tracking of Web use through cookies and other means, is impli-
cated in a panoply of engagements that people have, often determined by algorithm, whether
with other people, private companies, public institutions or machines and digital assistants. The
rapid shift from plain HTML to Java, JavaScript, APIs, JSON and other software tools that have
made Web browsers a form of operating system has altered not only the experience of the Web
but also its nature. In all likelihood, the majority of the interactions that we have on the Web
are not only irreproducible but also unable to be preserved insofar as capturing them involves
not only the storage of information but also a replication of the processes involved between
the moment of a click and rendering of pixels on the screen. The Web is now greater than the
sum of its parts in no small part because of the processing involved in rendering information on
it, in putting data through the mill that delivers it to our screens.
I realize it may seem as if I am lamenting the development of the Web since the late 1990s,
and while I will admit to not being completely without nostalgia I am not by any means calling
for a return to a simpler Web. I merely want to emphasize that it is important to understand
just how complicated the Web has become and with that added complexity how difficult
yet how very necessary it is to preserve it and to mine its history for insight. Indeed, when I
reflect on the complexity and importance of the Web (and the Internet generally) I am reminded
of John Steinbeck’s plaintive edict about journalism, written in a letter to John P. McKnight,
then at the United States Information Service in Rome, in 1956. ‘What can I say about journal-
ism?’, Steinbeck wrote,

It has the greatest virtue and the greatest evil. It is the first thing a dictator controls. It is the mother of litera-
ture and the perpetrator of crap. In many cases it is the only history we have and yet it is the tool of the worst
men. But over a long period of time and because it is the product of so many men, it is perhaps the purest
thing we have. Honesty has a way of creeping into it even when it was not intended (Steinbeck and Wallsten,
1976: 526).

It takes little imagination to substitute the words ‘Web’ or ‘Internet’ for ‘journalism’ in
Steinbeck’s first sentence, and no more imagination to wonder whether it may be true about
new media. Whether it could truly be said about the Web, perhaps history will tell. That it might
be true is nevertheless a compelling reason to expeditiously and rigorously preserve all that we
can of the Web.

REFERENCE

Elaine Steinbeck and Robert Wallsten, Steinbeck: A Life in Letters. Penguin Books, 1976, p. 526.

Introduction
Niels Brügger and Ian Milligan

The Web has now been with us for over 25 years: new media is simply not all that new anymore.
It has developed to become an inherent part of our social, cultural, economic, and political
lives, and accordingly is now an object of historical study. With web archiving
having begun in 1996, we are also now living in a time when over two decades of the Web have
been collected, preserved, and made accessible – a detailed documentary record of society and
events. Two key points thus lie at the heart of this Handbook: that the history of the Web itself
needs to be studied, but also that its value as an incomparable historical record needs to be
examined as well.
If researchers today want to fully understand the present, as well as our past from the mid
1990s onwards, the Web will play a critical role. While there is no common rule for when a
topic becomes ‘history’, the timeframe seems to be shortening as the speed of information dis-
semination accelerates. For example, as Ian Milligan argues in this book’s first chapter, it took
less than 30 years after the events of 1968 for a varied, developed, and contentious historiography
to emerge; in 2021, we will be marking the 30th anniversary of the creation of the first
website.
Within the last decade, considerable scholarly interest in the Web’s history has emerged.
However, there has as yet been no comprehensive review of the field. Digital historians and their
colleagues in the digital humanities have approached changing historical methods more gener-
ally, but without specific focus on the Web (see for example Cohen and Rosenzweig, 2005;
Gold and Klein, 2016; Graham et al., 2015; Terras et al., 2013). Internet histories provide
specific (and invaluable) accounts of particular technologies – the Web, or the Internet, or
pre-Web technologies like the Minitel for example (Abbate, 2000; Brunton, 2013; Mailland
and Driscoll, 2017) – but this Handbook aims to pull its gaze back from the particular stories,
and present a multifaceted understanding of a developing field. The history of the Web itself
remains relatively understudied, with only a few exceptions (Banks, 2008; Brügger, 2017,
2010; Brügger and Schroeder, 2017; Foot and Schneider, 2006; Gillies and Cailliau, 2000).
In any case, these books only offer specific perspectives on web history, as opposed to the
multifaceted Handbook you are now reading. The SAGE Handbook of Web History also shares
ground with the growing literature within the fields of ‘digital methods’, such as Richard
Rogers’ award-winning Digital Methods, but delves deeper into the specific questions around
the Web (Rogers, 2013). Finally, there is a body of work exploring the archived Web from
the perspective of those who collect, curate, and preserve it; these are useful complementary
works, but do not approach web archives from the perspective of a scholarly user (Brown,
2006; Masanès, 2006).
The time is ripe for this Handbook. In this introduction, we introduce the twin dimensions of
‘web history’ and discuss the structure and content of this Handbook in more detail. We then
provide an overview of the six sections, with some thoughts on how the pieces fit together to
suggest the emergence of a new field of study.

THE TWO DIMENSIONS OF ‘WEB HISTORY’

The title of this Handbook speaks to the book’s emphasis on two different, yet related, forms
of web history. On the one hand, ‘web history’ may refer to the use of the Web of the past as a
historical source in any kind of historical study. Imagine a scholar of higher education who uses
the webpages of universities and government agencies to tell her story; not a history of web-
pages, per se, but a history that happens to use archived websites to craft her narrative. On the
other hand, ‘web history’ can also refer to any study of the history of the Web itself (and where
the Web, of course, can also be a source). The SAGE Handbook of Web History focuses on both
of these two meanings: histories written with the Web as well as histories written of the Web.
The Web is both a historical source and an object of study in its own right.
As a historical source, web archives are unique. Unlike traditional institutional archives, the
snapshots that comprise the archived Web are artifacts created by the archival process itself
(Brügger, 2018). They are in many cases assembled by digital robots, ‘spiders’ which crawl
publicly accessible webpages by starting on a given page, downloading it, following all the
links, downloading that content, following links, and beyond – a potentially infinite process, as
even if the crawler found enough content and ended up back where it started, the Web is always
changing and the process must begin anew (Milligan, 2016). These reconstructions of the live
Web are never exact replicas, even though they may at first glance look complete. This makes
an understanding of web archives especially important. For these reasons, scholars, research-
ers, and students who wish to use archived web content as historical documents need to be
familiar with both the technical and material aspects of web archives, as well as the related
theoretical and epistemological concerns that arise when dealing with these digital artifacts.
Thus, this book offers historians and other researchers a comprehensive resource for navigating
web archives when undertaking historical research.
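The crawling process described above can be made concrete with a small sketch. What follows is only an illustration under stated assumptions, not a description of how any particular web archive operates: the four-page `TOY_WEB` dictionary, the `crawl` function, and the `max_pages` limit are all hypothetical names introduced here, and the 'Web' is an in-memory table of links rather than live pages fetched over a network.

```python
from collections import deque

# Hypothetical toy "Web": each page maps to the list of pages it links to.
# A real crawler would fetch pages over HTTP and parse links out of the HTML.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["a.example/", "b.example/news"],
    "b.example/news": [],
}


def crawl(seed, web, max_pages=100):
    """Breadth-first crawl: start at seed, capture it, follow its links, repeat."""
    queue = deque([seed])   # pages discovered but not yet captured
    seen = {seed}           # avoids capturing the same page twice
    captured = []           # pages in the order they were 'downloaded'
    while queue and len(captured) < max_pages:
        url = queue.popleft()
        links = web.get(url)
        if links is None:   # link points outside the toy Web; nothing to capture
            continue
        captured.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return captured


snapshot = crawl("a.example/", TOY_WEB)
partial = crawl("a.example/", TOY_WEB, max_pages=2)
```

The `max_pages` limit stands in for the practical decision every crawl must make about when to stop. A crawl cut short in this way yields a partial snapshot, which is one simple reason an archived site may look complete at first glance while lacking pages that existed on the live Web.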
At the same time, this collection is also a text that aims to organize and present distinct
approaches to studying histories of the Web. Although ‘the World Wide Web’ is commonly
addressed in the singular, the following chapters include diverse examples of web histories that
consider various national contexts, popular and subcultural practices, material and technologi-
cal considerations, and beyond. By bringing this collection of historical case studies together,
the Handbook is a valuable resource for new media scholars looking to ground contemporary
digital cultures and practices historically.

STRUCTURE AND CONTENT

The Handbook you are now reading speaks to many different scholarly communities across a
diverse set of fields. Represented within the Handbook are communications scholars, histori-
ans, digital humanists, media and cultural studies practitioners, and researchers from the world
of library and information sciences. An international book, bringing together scholars from
multiple continents, the SAGE Handbook of Web History highlights the varied interdisciplinary
perspectives that scholars take towards an understanding of web history.
This book is intended for academics, graduate students, and upper-level undergraduates
within the humanities and the social sciences, in particular those with an interest in how the
past of the Web and of our more recent history can be and has been studied. This is a wide audi-
ence, ranging from scholars of media studies and communication history, to digital humanists,
and historians in general. With growing interest in this field – as several contributors argue,
it would be difficult to imagine doing a history of the 1990s or beyond without using these kinds
of sources – some insight and overall approaches to the field of web history are necessary.
The book is divided into six sections, discussed in more depth below. Each chapter can
be read as a standalone contribution, of course, but also has fruitful interactions with its fel-
low chapters in the section: a perspective from a historian, for example, subsequently comple-
mented by that of a computer scientist.

Part One: The Web and Historiography


In ‘The Web and Historiography’, we provide the fundamentals that underpin the field. In
short: what does it mean to do web history? What is a web archive? Do archived websites
represent a new kind of primary source, or do they fundamentally represent continuity? These
chapters in some ways serve as an extended introduction to the field: from learning about the
existing web archives, to how they have been framed and understood by historians and other
new media scholars, to how they can be used ethically.
The first chapter, ‘Historiography and the Web’, by Ian Milligan, fleshes out some of the
discussions around what it means to think of the Web as a primary source, crucially highlight-
ing dimensions of both scale and scope. With web archives, historians have more information,
created by people who never before would have been part of the historical record. But how can
this information be accessed? Drawing on concepts from the digital humanities, notably that of
distant reading, this chapter argues that the growing centrality of web archives to the historical
profession will require a rethinking of historical methodology – and an understanding of the
place of technology within the historical profession since the Second World War.
We continue these themes in the Handbook’s second chapter, ‘Understanding the Archived
Web as a Historical Source’, by Niels Brügger. In this chapter, the fundamental elements of web
archives are articulated: how they rapidly change (even during the process of collection!) and how
they are fundamentally different from many other kinds of digitized primary sources. The chapter
then articulates a theoretical and methodological framework to carry out web history. To use web
archives requires fundamental knowledge of how they work, which this chapter provides.
With the basic contours of web archives presented, we then pivot towards talking about
‘Existing Web Archives’, a chapter by Peter Webster. Not all web archives are created the
same, and when using them researchers need to be aware of how their scope and structure can
dramatically differ. This chapter presents a historical overview of web archives, looks at how
they are created today, and examines how that affects our current historical arguments and
interpretations.
The ‘Web and Historiography’ section then comes to an end, appropriately, with a chapter
by Richard Rogers which explores ‘Periodizing Web Archiving: Biographical, Event-Based,
National and Autobiographical Traditions’. Rogers looks at the different approaches taken to
web archiving since the advent of the Internet Archive: from trying to save single sites, to under-
standing events, to national web domains or self-expression on social media. After this perio-
dization, Rogers then proposes new methodological approaches to access web archives, from
screencast documentaries to digging into the underlying code that makes a website display.
Taken together, these four chapters serve in many ways as a crash course into the world of
web archives and web archiving: who does it, what it means, how to use it, and the unique
aspects that underlie this type of primary source. From these four unique perspectives, we hope
that readers can come away with an interest in conceiving their own research questions – but
can do so fully informed of the opportunities and pitfalls that may arise.

Part Two: Theoretical and Methodological Reflections


The second section of the book involves theoretical and methodological reflections on how to do web
history: from understanding it from various intellectual perspectives (such as science and tech-
nology studies or taking a quantitative approach) to more technical approaches such as using
network analysis or large-scale text mining to understand the past. With a solid understanding
of how web archives are created as established in the last section, we now seek to put them to
work.
The section begins with Valérie Schafer and Benjamin G. Thierry’s ‘Web History in
Context’, an exploration of how ‘the’ Web needs to be put into context. How has the Web spread
and diffused around society? How can we write a history of the mid 1990s, for example, and
take the Web into proper account? In short, if we are to use the Web as a historical source, we
need to use it properly – and realize that there is no singular ‘the’ Web.
With the Web put into context, what sorts of approaches can one take to its study? One such
perspective is offered in the following chapter, ‘Science and Technology Studies Approaches to
Web History’, by Francesca Musiani and Valérie Schafer. The two authors introduce critical
STS concepts and notions, illuminating them with case studies from web history. The chapter
then fleshes out an in-depth example of web governance as a case study, showing how the Web
can thus be understood as a complex and changing socio-technical system.
If STS offers one theoretical framework, what other approaches could be taken as well? In
‘Theorizing the Uses of the Web’, Ralph Schroeder discusses various approaches to how the
Web is used and aims to point the way forward by developing a new theoretical framework –
and thinking about the implications for policy, ethics, and future scholarship.
Now that we know how to theorize and contextualize collections, the next question is how
we should do so. Ethics are a critical issue when using such diverse digital archives, as Stine
Lomborg’s chapter on ‘Ethical Considerations for Web Archives and Web History Research’
explains. Her article introduces the several dimensions that researchers should consider, from
contexts, subjects of study, methods of data collection, analytical types, and beyond; it is not
about firm answers, but a way of thinking.
With the theoretical underpinnings of the last four chapters under the reader’s belt, we then tran-
sition to discussions of method. How best to explore these archives? Federico Nanni’s ‘Collecting
Primary Sources from Web Archives: A Tale of Scarcity and Abundance’ explores two case studies
where web archives have been put to use to advance historical scholarship: from reconstructing
a university webpage to the study of contemporary events. As a scholar bridging computer sci-
ence and history, Nanni notes how combining traditional historical methods with the methods of
Internet studies and natural language processing could form an intriguing route forward.
The final four chapters in this section form a series of methodological pieces, outlining vari-
ous approaches a scholar can adopt when exploring web history. Michael Stevenson and Anat
Ben-David introduce critical concepts in their ‘Network Analysis for Web History’, providing a
conceptual background for how these key strategies can be applied: both an introduction to
network analysis and hands-on cases on how network analysis has assisted research, search,
social media, and the state of the art with web archives.
Networks are one way to uncover the structure of a site – but so are other methods for taking
lots of web archival data, translating them to numbers, and drawing conclusions from analy-
sis. Anthony Cocciolo’s ‘Quantitative Web History Methods’ chapter explains how to do this,
introducing the field of quantitative research methods and explaining the field through an in-
depth analysis of how we can study the archived Web to see the decrease in text and the rise in
image-based layouts and communication.
What much of the last three chapters have in common is their use of computers – computers
to collect, count, and analyze information. Accordingly, Anat Ben-David and Adam Amram’s
chapter on ‘Computational Methods for Web History’ explores how computational methods
can help us explore web archives, but provides cautionary notes around how methods, tools,
and techniques need to be adapted to the specific nature of the source. They do so through four
computational techniques, drawn from their research projects, showing both the benefits of this
form of research and the limitations.
Finally, working with web archives, given their scale, brings challenges in representing
research. Continuing on from earlier discussions of network analysis, quantitative methods, and
computational approaches more generally, Justin Joque’s ‘Visualizing Historical Web Data’
walks us through how we can represent and explore web archives through data visualization.
Some of this builds on networks specifically, and other parts of the chapter address how we can
visualize both text and change over time as well.

Part Three: Technical and Structural Dimensions of Web History


Web history, indelibly associated with a particular platform – the World Wide Web – requires
that students, scholars, and practitioners have a basic technical understanding of underlying
protocols, infrastructure, and access materials. We need to understand the structure of the Web
as well, such as the hyperlinks that users explore to move from page to page, or the Web brows-
ers such as NCSA Mosaic or Internet Explorer that they used to access it from the early 1990s
onwards.
What better place to start an exploration of the technical underpinnings of web history
than with the Hypertext Transfer Protocol (or HTTP) itself and how it has been extended
through the Memento Protocol to allow the integration of the past and present Web? Michael Nelson
and Herbert Van de Sompel accordingly discuss the history of Unix and HTTP, and introduce the
Memento Protocol in their ‘Adding the Dimension of Time to HTTP’ chapter. Through these
explorations, we can see how the original vision of the Web was stymied in part due to issues
with the filesystem. The work discussed in this chapter has made large amounts of web history
scholarship possible, by allowing versioning and the integration of web archives all around the
world.
We often take the HT in HTTP for granted – the hypertext that underpins the Web today. Yet,
as Belinda Barnet notes in her ‘Hypertext Before the Web – or, What the Web Could Have
Been’ chapter, hypertext stretches back much earlier. Looking at three early systems – Douglas
Engelbart’s NLS, Ted Nelson’s Xanadu,1 and Nelson and Andries van Dam’s HES – this chapter
both introduces the early conceptual underpinnings of the ever-present hyperlink today, and
also allows us to imagine alternative visions of what it could have been.
Focusing more specifically on the hyperlink itself, but continuing to question existing nar-
ratives and putting aside a more simplistic Web 1.0 to Web 2.0 paradigm, Anne Helmond’s
‘A Historiography of the Hyperlink’ explores how the link itself has evolved. Using six key
moments, from proto hypertext to the role of the link before search to how the link is disappear-
ing today, the chapter illuminates the history of the Web itself as well as the link.
Taking a similar historical approach, Alexander Halavais looks at ‘How Search Shaped and
Was Shaped by the Web’. Taking aim at traditional narratives that saw search engines as having
been ‘bolted on’ to the Web, this chapter notes how the Web and search have co-evolved,
a process that continues to shape our modern Web today. Users used
to ‘surf’ the Web, using links, a process which has been profoundly reoriented by search.
Roads taken – or not – lie at the heart of Lindsay Poirier’s chapter on ‘Making the Web
Meaningful: A History of Web Semantics’. In the original vision of Tim Berners-Lee’s Web,
nodes that linked to each other would represent not documents but individuals or objects; while
the Web that emerged was not like this, since 1994 Berners-Lee and others have been calling
for the realization of a ‘semantic Web’. This chapter explores the semantic Web community,
looking back to the 1970s and forward to help us understand how the semantic Web shapes our
understanding of knowledge today.
The final two chapters in this section then address two stories in how we use the Web. The
first, ‘Browsers and Browser Wars’, by Marc Weber, explores the history of how the Web
is accessed: through a web browser. It argues that in the Web’s early days browsers became
the main battleground for overall control of the Web. The first struggle, between the original
browser-editor vision and the simpler ‘read-only’ model that prevailed, was followed by two
commercial ‘browser wars’. While these mostly ended by 1999, a mobile war continues today.
This last current is picked up in the section’s final chapter, with Gerard Goggin’s ‘Emergence
of the Mobile Web’. Users increasingly browse the Web on mobile devices, and it is increas-
ingly mediated through different social media platforms, apps, and other software. This chapter
explores how the mobile Web can be defined and studied.

Part Four: Platforms on the Web


On top of the Web’s structure exist platforms: from collaboratively written encyclopedias like
Wikipedia, to blogging software, to nearly ubiquitous advertisements and social media net-
works like Twitter, Facebook, and, in the past, web publishing platforms like GeoCities. This
section gathers five chapters which take explicitly historical approaches to explore their very
different ‘platforms’. Useful for an understanding both of those particular sites, and how the
Web evolved from the mid 1990s to the present, they help to contextualize the more targeted
case studies that follow.
The first platform we explore is Wikipedia, in Andy Famiglietti’s ‘Wikipedia’. The story
travels from 2001, when Larry Sanger sent his first note about ‘Nupedia’ to a small number of
people on their mailing list, to today, when Wikipedia’s millions of articles constitute the fifth
most visited website in the world. Yet, while it is occasionally understood as an exemplary new
media project, Famiglietti instead shows how it is a historically contingent project, growing
out of the free and open source software movement. Many of these themes come together in an
extensive case study on the 2008 Gaza War article.
With the Web today seemingly dominated by advertising, a historical perspective can
help us understand its growth: how did an information retrieval tool become so saturated
by commercial messaging? In ‘A Critical Political Economy of Web Advertising History’,
Matthew Crain looks at the development of technologies, standards, and practices that
brought advertising into the mainstream of the Web. By using a critical political economy
approach, Crain is able to weave in the interplay between the Web and larger political and
economic questions.
While the Web was developed as a read–write medium, it was difficult for many early web
users without technical knowledge to find a place to contribute – GeoCities provided one such
important platform. In his chapter ‘Exploring Web Archives in the Age of Abundance’, Ian
Milligan argues that these sorts of platforms allow more democratic, accessible histories – Big
Data as an avenue into social history on a large scale – but that the steps to explore such sources
will require a profound rethinking of how historians understand the past.
If GeoCities was an early platform, by the turn of the twenty-first century many web users
were turning to blogs as an outlet for self-expression. In his aptly named chapter ‘Blogs’,
Ignacio Siles explores the story of blogs in the United States: why they arose and how they
continue to develop today. It is also a fascinating example of mixed-method research: draw-
ing on over 100 interviews, archival research in traditional archives as well as on the Web, and
ethnographic studies.
Finally, in our last chapter in this section, Christina Ortner, Philip Sinner, and Tanja Jadin
explore ‘The History of Online Social Media’. Their chapter provides an overview of the social
media phenomenon as it emerged in the late 1990s before expanding dramatically throughout
the 2000s. They do so through a four-phase periodization that explores online interaction before
social media, web-based services, the emergence of diverse social media services, and the eco-
system of mobile apps today. This big picture exploration of modern social media platforms is
a perfect way to round out this section.

Part Five: Web History and Users, some Case Studies


With the foundational pieces complete – including historiography, theory, technical, structural,
and platform dimensions – the SAGE Handbook of Web History then turns itself to the largest
section of the book: understanding web history and users through a series of 14 case studies.
Each of these chapters is an exemplar, ranging from an understanding of online news through
web history, to streaming media file formats, to the Chinese Web, to trolling and memes. A
diverse assemblage of authors and topics, these chapters shed light not only on their particular
topic but also on various methods for undertaking web history.
Like many of our voyages on a web browser, we begin appropriately at ‘home’ – in this
case, with Madhavi Mallapragada’s ‘Cultural Historiography of the “Homepage”’. Her piece
interrogates the concept of the homepage, exploring it as both a technical and cultural concept.
Taking a historical approach, Mallapragada explores the emergence and shifting significance of
the homepage in the 1990s and 2000s, as well as its continued relevance today.
Change is also key to any understanding of news media in the context of web history. Allie
Kosterich and Matthew S. Weber explore the dramatic shifts that have transformed how con-
sumers engage with news in light of the Web in their chapter ‘Consumers, News and a History
of Change’. Their historical lens allows them to convincingly demonstrate that many contem-
porary shifts in how we consume media today can be traced back, through web history, to shifts
on the Web.
Many of our case studies throughout the book, like the last two and several others, are
grounded in particular national contexts. This should not be surprising, Niels Brügger and
Ditte Laursen argue in the next chapter, ‘Historical Studies of National Web Domains’. Despite
the global nature of the Web – we can access content made half the world away as easily as
something written next door – users tend to situate themselves in nations. Accordingly, the two
authors introduce the national Web as an object of analytical inquiry.
The next 11 chapters then move into specific case studies, each of which illuminates the field
of web history as well. James O’Sullivan and Dene Grigar explore ‘The Origins of Electronic
Literature as Net/Web Art’, a chapter which explores the story of e-lit, from its diskette-based
origins to its Web-based homes today. This shift onto the Web had profound consequences for
this medium – as it did, the authors note, for many other cultural objects that migrated to the Web.
Historians who study memory have similarly seen their scholarship transformed by the
Web – our collective memory of events over a century ago is shaped by the Web as well.
Valérie Beaudouin, Zeynep Pehlivan, and Peter Stirling approach this question in their chap-
ter ‘Exploring the Memory of the First World War Using Web Archives: Web Graphs Seen from
Different Angles’. By creating a map of links between First World War websites, they are able
to see how online communities socially organize the memory of the war.
Just as the Web can be studied to see the spread of memory, so too can web archives help
explore the historical roots of contemporary issues such as vaccine refusal. Gareth Millward
addresses this in his ‘A History with Web Archives, Not a History of Web Archives: A History
of the British Measles–Mumps–Rubella Vaccine Crisis, 1998–2004’. His chapter both explores
the MMR vaccine controversy through several archived webpages and demonstrates his substantive
findings as well as the methodological issues he encounters when trying to work with
this material – necessitating engagement with both web archives and traditional resources.
This theme of considering web archives as just one among many primary documents is also
foundational to Peter Webster’s chapter on ‘Religion and Web History’. To study religion over
the last 20 years is to also, in many cases, consider the Web: both how religions are shaped by
the Web, and how web history has interacted with major themes in the field, from secularization
to the place of religion in society. Webster also explores future directions in this field.
We then shift gears away from specific case studies to explorations of particular aspects
of the Web. Jeremy Wade Morris explores the sonic element in his ‘Hearing the Past: The
Sonic Web from MIDI to Music Streaming’. While we tend to focus on visuals and text, Morris
explores what we can learn from web history by listening – accordingly, he covers the history of
sound on the Web (from technological developments to commercialization) – and also explores
the barriers that we face due to obsolete and proprietary file formats.
Considering a broad range of media types, the next chapter, ‘Memes’, by Jim McGrath,
explores how text, images, animated GIFs, sound, and short movies can spread around the
Internet. After an introduction to the concept in general, McGrath explores where memes
emerged on the Web, beginning in early-1990s Usenet groups, through the recent rise of the
‘meme’ term itself on the Internet, using it as an avenue to look at just how quickly the Web
has transformed.
Transformation underpins the next chapter as well, by Gabriele de Seta. ‘Years of the
Internet: Vernacular Creativity before, on, and after the Chinese Web’ explores the evolution of
the Internet in China, focusing on six years – from the first email, to BBSs, and messaging and
chat services. Showing continuities in the vernacular creativity of Chinese Internet users, the
unique evolution of the Web in China can be seen in this important story.
Pulling the gaze back from China alone to considering East Asia as a region, Mark
McLelland’s ‘Cultural, Political and Technical Factors Influencing Early Web Uptake in North
America and East Asia’ explores Taiwan, mainland China, Japan, and Korea. Given the Euro-
American assumptions that underpinned much of the early Web – from QWERTY keyboards to
non-Roman script encoding – the adoption of the Web took a different route there. His chapter
begins with East Asian pre-Web systems before exploring how these early encounters led to
unique web cultures by the mid-to-late 1990s.
Our final three chapters in this section cover three aspects that the Web’s designers may not
have had in mind. The first, ‘Online Pornography’, by Susanna Paasonen, explores the world
of online pornography. The pornography industry, Paasonen argues, has underpinned much of
the Web’s technical development over the last two decades – from credit card processing to
streaming video and advertisements – and the Web has had considerable impact on pornogra-
phy as well.
Spam, the mild irritant that we all face in our inboxes, is another critical element that we
need to understand to grasp how the Web has evolved – just as pornography shaped the Web, so
too did spam, influencing search engines, how we use the Web, legal frameworks, and beyond.
Finn Brunton explores this in his ‘Spam’ chapter, using spam as an avenue into a counter his-
tory of the Web.
The section then concludes with Michael Nycyk’s ‘Trolls and Trolling History: From
Subculture to Mainstream Practices’. This story of a small Internet-based subculture that ended
up shaping the mainstream social media and web platforms helps us take an alternate vision of
the Web’s history. From Usenet to Web 2.0, trolls have evolved alongside the Web.

Part Six: The Roads Ahead


After this array of case studies and web history approaches, it becomes clear that there are
many roads forward in the field of web archives. We give the closing words of the book to Jane
Winters, a professor of digital history, whose chapter ‘Web Archives and (Digital) History: A
Troubled Past and a Promising Future?’ investigates the failure of historians to meaningfully
engage with web archives as of 2018 – despite pressing reasons to do so. She then proposes a
new future, one which sees historians mixing both qualitative and quantitative approaches, to
reclaim the stories of everyday people.

THE NEW FIELD OF WEB HISTORY

If we are going to be able to write histories of the 1990s and beyond, not only of the Web but of
any social, cultural, political, economic, or other phenomenon that was itself reflected in the
Web, we need to understand the context in which these sources were created. In this lies the impor-
tance of web history as a field. A spam message in 1994, or a troll in 2008, or a homepage in 1999
were all created in particular contexts – which all need to be understood for their responsible use.
Similarly, while those who study web history need not be coders, they do need some understand-
ing of the underlying technological platforms and interfaces, from the origins of the hyperlink to
an understanding of how the web browsers that users used to access the Web changed over time.
The Web has now been with us for over a quarter of a century. Historians are not soothsayers,
able to magically use the past to understand the future, but they can begin to get a sense of the
broad contours of an evolving historical force with an eye to what it might mean for particular
outcomes. With such a rich historical record and legacy to draw on, any contemporary
understanding of the Web can only be enriched by an understanding of its historical context; and any
breathless exploration of what the future might hold can benefit from an understanding of the
legacies at play.

ACKNOWLEDGMENTS

Our sincerest thanks, in the first place, to our contributors. We couldn’t have asked for a better
group of scholars to work with as we put this collection together. Many of the contributors
picked up the gauntlet and drafted expansive chapters, informed both by their existing research
agendas and, in many cases, by moves into new areas and avenues. It goes without saying that
this book would not be possible without their phenomenal work.

Our gracious thanks as well to Chris Rojek, the Sociology publisher at SAGE who provided
so much initial support on the project before passing on the baton to editors Mila Steele and
Michael Ainsley. Thanks also to Colette Wilson, Anwesha Roy, Serena Ugolini, and Matthew
Oldfield at SAGE for all of their support with making this Handbook a reality!
We would also both like to thank Megan Sapnar Ankerson from the University of Michigan
for her assistance and guidance in the first stages of bringing our ideas together. Ian Milligan
would also like to thank his colleagues in the Web Archives for Historical Research Group:
Nick Ruest, Jimmy Lin, Ryan Deschamps, and Samantha Fritz, for providing continual
encouragement and insights into the world of web archiving research.

Note
1  Xanadu is a registered trade and service mark of Project Xanadu, Sausalito CA 94965, US.

REFERENCES

Abbate, J. (2000) Inventing the Internet. Cambridge: MIT Press.
Banks, M. (2008) On the Way to the Web: The Secret History of the Internet and Its Founders.
Berkeley: Apress.
Brown, A. (2006) Archiving Websites: A Practical Guide for Information Management Professionals.
London: Facet Publishing.
Brügger, N. (2010) Web History. New York: Peter Lang.
Brügger, N. (Ed.) (2017) Web 25: Histories from the First 25 Years of the World Wide Web. New
York: Peter Lang.
Brügger, N. (2018) The Archived Web: Doing History in the Digital Age. Cambridge: MIT Press.
Brügger, N., and Schroeder, R. (Eds) (2017) The Web as History. London: UCL Press.
Brunton, F. (2013) Spam: A Shadow History of the Internet. Cambridge: MIT Press.
Cohen, D., and Rosenzweig, R. (2005) Digital History: A Guide to Gathering, Preserving, and
Presenting the Past on the Web. Philadelphia: University of Pennsylvania Press.
Foot, K.A., and Schneider, S.M. (2006) Web Campaigning. Cambridge: MIT Press.
Gillies, J., and Cailliau, R. (2000) How the Web was Born: The Story of the World Wide Web.
Oxford: Oxford University Press.
Gold, M.K., and Klein, L.F. (Eds) (2016) Debates in the Digital Humanities 2016. Minneapolis:
University of Minnesota Press.
Graham, S., Milligan, I., and Weingart, S. (2015) Exploring Big Historical Data: The Historian’s
Macroscope. London: Imperial College Press.
Mailland, J., and Driscoll, K. (2017) Minitel: Welcome to the Internet. Cambridge: MIT Press.
Masanès, J. (2006) Web Archiving. Berlin: Springer.
Milligan, I. (2016) ‘Lost in the Infinite Archive: The Promise and Pitfalls of Web Archives’,
International Journal of Humanities and Arts Computing, 10(1): 78–94.
Rogers, R. (2013) Digital Methods. Cambridge: MIT Press.
Terras, M., Nyhan, J., and Vanhoutte, E. (Eds) (2013) Defining Digital Humanities: A Reader.
London: Routledge.
PART I

The Web and Historiography


1
Historiography and the Web
Ian Milligan

INTRODUCTION

The advent of the Web as a primary source will dramatically affect the practice of researching,
writing, and thinking about history. Historians are entering into an era where we will have more
information than ever before, left behind by people who rarely before entered the historical
record. Web archives will fundamentally transform much of what a historian does, requiring a move
towards computational methodologies and the digital humanities.
Web archives matter. One cannot write most histories of the 1990s or later without reference to
web archives, or at the very least to do so would be to neglect a major medium of the period. Web
archivists and other institutions are today engaged in the collaborative effort to ensure that
people in the future know what happened in 1996, or 2001, or 2006, or today. This ensures that we
as a society will have the information that we need to make arguments for justice, for equality,
for policy, for a better understanding of ourselves, and beyond. Web archives will be a foundation
for history.
Crucially, historians need to be ready for this shift. They will soon be writing histories of the
1990s that require web archives to do justice to their topics – and they need to be ready. While
there is no exact metric for when past events become fodder for historical interpretations, it is
worth noting that the first historical narratives of the 1960s in the United States and Canada for
example began to appear in the 1980s; by the 1990s, established monographs and doctoral studies
could be undertaken (Gitlin, 1987; Isserman, 1987; Kostash, 1980; Levitt, 1984; Owram, 1997).
As the Web is now well over 25 years old, we are roughly at the time when the first serious
historical studies will begin to be undertaken, and it is likely that many trailblazing doctoral
students in the field are now beginning to contemplate their first degrees. To not use web
archives would run the very real possibility of fundamentally misrepresenting
4 THE SAGE HANDBOOK OF WEB HISTORY

any of the above topics. This will happen sooner than we think, too. Not only are the 1990s
history, the Web is now over 25 years old, with widespread web archiving beginning over two decades
ago with the Internet Archive in 1996.
This chapter explores what the changing nature of historical scale will mean for historians. It
begins by discussing how web archives will become increasingly central to the historical
profession. Following this, drawing upon Franco Moretti’s concepts of close reading versus distant
reading, it advances a typology of research projects carried out to date. The chapter then
discusses the next directions for the field, especially the growing importance of metadata analysis
rather than exploring content itself. It concludes by situating our contemporary trend into a
‘third wave of computational history’, suggesting how historians could profit by understanding
themselves in their own historical context.

THE GROWING CENTRALITY OF WEB ARCHIVES TO THE HISTORICAL PROFESSION

To reinforce the importance of web archives, consider all of the things that one could not write a
history of without using web archives. Without web archives, one could not write histories of the
late 1990s Tamagotchi trend, figuring out what that meant about our relationship to animals, each
other, and technology; or political histories of the late 1990s on early Internet censorship, from
the V-Chip to the Communications Decency Act in the United States, critical moments in our early
Web history that might have fundamentally changed how we interacted with the medium; or economic
and business histories of the 1990s dot.com bubble; or even events of pivotal significance like
the attacks of September 11th, 2001. Each of the above would be innumerably enhanced by the use of
web archives, and considerably diminished by not considering these sources. This is not a niche
area. Crucially, they underscore that Web histories will not just be histories of the World Wide
Web (although those are important and well represented in this Handbook), but histories that
happen to use the Web as a primary source because of its significant role in knowledge production
and communication.
The novelty of these web archives can be seen in two respects: that of scale, in that we have more
data than ever before, and that of scope, where different kinds of sources that were rarely
preserved before are now being so. In this section, I will explore scale and scope in turn.
We are now working with sources that are being preserved on a different scale than historians are
previously used to working with. In this we are seeing the insights of the late American historian
Roy Rosenzweig borne out, as he foresaw in a 2003 American Historical Review article that
historians were shifting from an environment of scarcity to one of abundance (Rosenzweig, 2003). In
other words, historians have traditionally wished we had more information about the past – now,
when working with web archives, historians are threatened by having too many sources to parse and
explore.
Some examples can bear out the sheer scale of born-digital content being generated every day. A
constantly updated page, ‘My Data is Bigger than Your Data’, published by University of Waterloo
computer scientist Jimmy Lin, gives a tally of the ever-changing boasts of just how big datasets
are. A few examples help to bring this into contrast. In January 2017, Twitter announced that it
was storing over 500 petabytes of information (one petabyte is 1,000 terabytes). The Internet
Archive has over 30 petabytes of archives, with approximately 13 to 15 terabytes per day being
added to its collections. Spotify, a music streaming service, collects over a terabyte of user
data every day from its over 75 million users and one and a half billion playlists. YouTube sees a
petabyte of data

uploaded every single day (Lin, n.d.). Not all of this will be kept. Content on Snapchat, for
example, is filled with largely intended transient content; similarly, Facebook and many corporate
databases will likely not be archived for historical consumption. Given ethical concerns and rights
to privacy, this is not necessarily a bad thing! But even if a fraction of the above is kept,
historians will be challenged to no end. In particular, the Internet Archive’s activities are of
interest, as they are collecting with an eye to future research access. In short, the amount of
information generated on the Web means that our historical record is dramatically changing.
The expansive scope of web archives, too, has the prospect of bringing more people into the
historical record. Much of what can be found in the Internet Archive are primary sources authored
by people who never before would have been part of the historical record. It is not simply that
instead of learning about Tamagotchis from The New York Times or The Guardian we can learn about
them from Web-based sources, but that we can begin to work with the pages of people who actually
used Tamagotchis. Young kids and their parents created sites in the GeoCities child-focused
section, for example, allowing us to work with this innovative primary source (see my chapter on
GeoCities in this Handbook). It is emblematic of a broader shift in the categories of people who
we can learn about (Milligan, 2016). Think of the long list of sources that we will now have
thanks to the widespread advent of web archiving. Pages by children, blogs by everyday teenagers,
students, and adults, or even individuals who can give a unique perspective on unfolding world
events (such as the Russian soldier who posted the MH17 missile launch on Facebook) (Taylor, 2014).
Web archives do not provide a perfect representation of the past – they offer facsimiles of the
original pages, and much is missing or of reduced functionality (Brügger, 2013a) – but neither do
traditional archives, which have had to be very selective with what they select, appraise, and
preserve. We need to mind gaps in coverage and inclusion. This can be done by looking at who uses
the Web and how, often drawing on studies of contemporary or historical studies (Blank and Dutton,
2014; Duggan, 2015). Additionally, we need to understand gaps in how web archives find content on
the Web and what gets preserved and what does not (Thelwall and Vaughan, 2004). But while we need
to understand the context in which web archives are created, we also need to understand how these
collections will in many cases be broader than what has been left behind in the past. Historical
practice has been dominated by archival study since the nineteenth century, often government and
institutionally dominated repositories of information about what has come before. While web
archives certainly have an institutional perspective, such as collections from governments or of
university calendars, for example, the new forms of citizen histories present a more democratic
record of the past.
To reinforce this shift, I often think to my own first book on Canada’s 1960s, where I studied how
young workers, students, and activists engaged with each other (Milligan, 2014). These were
difficult questions to explore, even though the events had only taken place 40 years before.
Student activists do not keep minutes of meetings that took place late in pubs, coffee houses, or
communal homes; nor do young workers who are engaged in illegal wildcat strikes. Union newsletters
tended to stress official lines, leaving little room for alternative visions. My sources then were
crumbs of the past: memories from a few, garnered through around 70 oral interviews, though it was
hard to find names or contemporary contact information: police reports, including of graffiti on
the side of an industrial building; informants; bewildered newspaper articles in the mainstream
media; occasional write-ups in a student newspaper or a manifesto saved in a university archive.
Tackling a similar question today would be different. Think of the resources collected around
Occupy Wall

Street, or the Canadian First Nations #idlenomore movement, or just the regular activities of youth
culture.
Indeed, this animated my own first foray into studying web archives, when I worked with an old
collection of websites that were created by Canadian youth in the 1990s under the auspices of a
Canadian federal training project. When I found these archived websites over a decade later, I
began to realize that these sources were different from what I had seen before (Milligan, 2012).
In digging into youth footprints, I realized that I could even find my own first trace of a
digital past, from when I was an 11-year-old boy in 1995 asking for help about a board game online.
Eleven-year-old boys did not, as a rule, leave sources – yet now they do. This does not make
things easier, of course – one still needs to find these sources, identify who is who if they want
to, and begin to think about what is being crawled and what is not. It is an incomplete record,
affected by profound issues of access, but is certainly bigger and more expansive than what we had
before.

The Infinite Archive

Historians work with a necessarily incomplete record of the past. Indeed, the two terms – history
and the past – are not synonyms. The past happened, whereas history is created from the traces of
the past that persist (from memories, to physical fragments, to archival documents, to tweets). The
vast majority of things that happen are never recorded, something which philosophically remains
the same. But right now a web crawler is following links, downloading pages, following links,
working in that potentially infinite process that creates the archives of our lives. In this
section, I would like to provide some reasons for how these traces of the past that now exist that
would never have existed before matter and have the potential to reshape our historiography.
The advent of web archives has the potential to expand scholarship in several respects. Think of
the blogs and social media that document what people are eating – indeed, one of the canards
against social media is often ‘I don’t care what you had for dinner’. But a sense of what everyday
people eat, consume, and idealize when it comes to food is going to be valuable; historians have
long struggled with really understanding what we eat in our private homes, relying largely on
memory, raw material information, government reports, how people responded to rationing, and the
like (Mosby, 2014). Alternatively, we can have new understandings of social movements like Occupy
Wall Street. If you relied on The New York Times or the Washington Post to understand what
happened in Zuccotti Park in New York City, you would have more of a skewed vision than if you were
able to use the Occupy websites themselves (used for coordination, publicity, and making the
organization come together).
Occupy is also a good way to underline the importance of preserving Web materials. It can help
underline the importance of preserving and documenting this material, and how this is the sort of
work that needs to be done without delay. Of all the Occupy sites, created in the heat of an
ever-evolving movement, two years later only 41% of the sites were active (LaCalle and Reed, 2014).
Unlike physical media, which can – especially if printed on acid-free paper – be surprisingly
durable (imagine a book, placed on a shelf, and coming back to it 20 years later; you can probably
access the content, even if there have been some moisture problems and the like), these sources
require active preservation. Server fees need to be paid, and with only brief interruption, they
can disappear. And apart from high-profile projects that had considerable funding and expertise,
such as the reconstruction of the first web page hosted at CERN or the first North American site
at Stanford, once it is gone it is gone (Karampelas, 2014; Tsukayama, 2013).
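
The crawling process mentioned earlier in this section (follow links, download pages, follow links again) can be sketched as a breadth-first traversal. This is only an illustration under stated assumptions, not a description of how the Internet Archive or any other institution actually crawls: the toy link graph, the `crawl` and `toy_fetch` functions, and the `max_pages` budget are all hypothetical names invented here. The budget stands in for the practical fact that, on the live Web, the queue of pages still to visit never empties.

```python
from collections import deque

# Hypothetical toy Web: each URL maps to the URLs that its page links to.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/", "a.example/"],
    "c.example/": [],
}


def toy_fetch(url):
    """Stand-in for downloading a page and extracting its links."""
    return TOY_WEB.get(url, [])


def crawl(seeds, fetch_links, max_pages):
    """Breadth-first crawl: capture the seeds, then the pages they link to,
    and so on, stopping once max_pages pages have been captured."""
    frontier = deque(seeds)   # pages discovered but not yet captured
    seen = set(seeds)         # avoids queueing the same URL twice
    captured = []
    while frontier and len(captured) < max_pages:
        url = frontier.popleft()
        captured.append(url)
        for target in fetch_links(url):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return captured


print(crawl(["a.example/"], toy_fetch, max_pages=3))
```

With a budget of three pages the sketch never reaches `c.example/`, even though it has been discovered: a small reminder that what is and is not crawled depends on choices made by the archive, not only on what exists.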

Web Archiving Scholarship Today: A Quick Glimpse

All of this is to say that the research landscape for historians seeking to study the 1990s is
changing, and that it matters that historians are ready to engage with this material now. So what
are historians doing with it? We are only now just beginning to see what users can do with web
archives, beyond nostalgic explorations and the such. These largely fall into two rough
categories: traditional close reading, and the computational digital humanities approach of
distant reading (Moretti, 2007).
Some of these histories are close readings, or the inquiry of one website or a few different sites
– dozens, or hundreds even, but still of a scale where the historian can click, scroll, read, and
explore the individual sites to a very high degree. This builds on a great textual tradition
within the historical field, where we dive deeply into one story, flesh out the stories, and in so
doing learn something far more about the broader context of the events that we are studying as
well. Several of these approaches appear in the Handbook chapters that follow in part. Federico
Nanni, for example, has explored the University of Bologna’s website in depth and has tried to
reconstruct the story of that one page, drawing on the Internet Archive, university IT
departments, and eventually using oral histories and personal contacts to tell this deeply textured
story. In so doing, it becomes a good entryway into the robots.txt exclusion protocol and how
websites can be removed, even retroactively, from the Internet Archive; and how a recent history
of the website can be done through a close reading of one domain (Nanni, 2016). Similarly,
Stanford University Libraries carried out an in-depth process to reconstruct the first web server’s
page outside of Europe, that of the Stanford Linear Accelerator Center’s website, using the SLAC
backup system (AlSum, 2015; Deken, 2017). It is an in-depth story of digital preservation, drawing
on source code, newsgroups, generating web archive files from a non-standard web archive, and
beyond. In so doing, this essential work helps us understand the broader challenges of digital
preservation, and how much work goes into individual stories.
Yet we also have operations on a completely different scale, requiring ‘distant reading’. This
leverages the power of what modern data mining and text analysis tools can do on large amounts of
data. This approach of distant reading is inspired by the literary work of Franco Moretti, or the
wide gulfs of time and space that earlier Annales historians like Fernand Braudel explored
(Braudel, 1972; Moretti, 2007). These are stories on a massive, unprecedented scale, aided by
computational analysis. Peter Webster, a pioneering UK historian, has used the link graphs of the
UK Web Archive to explore who linked to creationist websites. Did governments, academics,
bloggers, Christian media, or the mainstream media, for example, link to these sorts of sites?
This was large-scale analysis that would not have been possible if it were to be carried out
manually. In this study, Webster noted that creationists mostly talk to themselves, and are
ignored by academia, media, and the churches; underscoring that even among evangelicals this was a
particularly minority view (Webster, 2014). This use of link graphs was also employed in a recent
piece by Webster, looking at how linking patterns changed in the wake of the Sharia Law
controversy in the United Kingdom; and, for example, how these links might reflect real-world
interest in various dioceses (Webster, 2017). Or, at times, this has even involved looking at
entire national country-code top-level domains (ccTLDs), such as the Danish .dk. Recent work by
Niels Brügger has explored the general challenges and opportunities presented by exploring a
nation’s web domain (Brügger, 2017). There is even promising work that looks at now-defunct
top-level domains, such as the deleted Yugoslavian .yu. Anat Ben-David of the Open University of
Israel conducted pioneering research into the shape of this deleted domain, looking at

how the networked structure of millions of pages dramatically changed between 1996 and 2010. She
found that the internal links emerged after the fall of autocrat Slobodan Milosevic, and then began
to fall apart as new domains for the independent countries emerged; as she puts it, ‘the
intra-domain linking patterns of the .yu domain are closely tied with stability and sovereignty’
(Ben-David, 2015, 2016).
We can see from this very brief overview that there is great power and potential in being able to
use these web archives. Yet the number of scholars working with web archives today is still
relatively small, due in no small part to the problem of access.

Going Wayback: Closely Reading Web Archives

One of the major limitations of a web history, however, is the sheer difficulty of doing this sort
of work. There is a necessity to democratize access to this material, so that historians and
everyday people have better access to this important digital cultural resource. Currently, most
historians and other researchers access web archives via some flavor of a ‘Wayback’ instance. You
can see a version of the Internet Archive’s instance in Figure 1.1. The Wayback Machine, which is
what the Internet Archive calls its version of the Wayback, was launched in 2001 to provide access
to its collections. It is now also available as an open-source project known as ‘Open Wayback’.
The Wayback Machine allows a user to temporally browse ‘back in time’, by rendering collected pages
and making links find the closest snapshot to the page that they are launching their search from.
In short, one can load up the American White House’s home page from December 27th, 1996 and when
clicking on links within the Wayback Machine, one is brought to snapshots of that page as close to
that day as possible.

Figure 1.1 The White House viewed in the Wayback Machine.
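
The ‘closest snapshot’ behaviour described above can be sketched in a few lines. This is a minimal illustration rather than the Wayback Machine’s actual implementation, which the chapter does not describe: the function name `closest_snapshot` and the list of capture times are invented for the example. The one assumption taken from practice is the 14-digit timestamp form (year, month, day, hour, minute, second) that Wayback Machine URLs use.

```python
from datetime import datetime

# 14-digit timestamps (YYYYMMDDhhmmss), the form used in Wayback Machine URLs.
TS_FORMAT = "%Y%m%d%H%M%S"


def closest_snapshot(snapshots, requested):
    """Return the capture timestamp nearest in time to the requested one.

    Illustrative only: captures before and after the requested moment are
    treated alike, and a tie goes to whichever capture is listed first.
    """
    target = datetime.strptime(requested, TS_FORMAT)
    return min(
        snapshots,
        key=lambda ts: abs(datetime.strptime(ts, TS_FORMAT) - target),
    )


# Invented capture times for a single hypothetical page.
captures = ["19961219000000", "19970101120000", "19970410000000"]

# A reader browsing from 27 December 1996 follows a link to this page.
print(closest_snapshot(captures, "19961227000000"))
```

Here the reader is handed the capture from 1 January 1997, because it lies closer to 27 December 1996 than the capture of 19 December does. The point for close reading is that a session that feels like browsing a single day can quietly stitch together pages captured weeks or months apart.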



Users can find content in the Wayback Machine by two means. First, they can type in the URL of the
address that they know they want to find (nytimes.com, for example). Second, they can do a limited
full-text search on the home pages of websites (type ‘New York Times’ and you find the URL of the
page, which can be very useful when pages have changed their URL several times over their life).
Researching with the Wayback Machine lends itself to close reading. A user browses one page at a
time, and is limited by the speed of the Wayback Machine and the underlying server. It is not a
speedy experience, but it does in many ways replicate the experience of working with traditional
documents. Even when working with the Web in other ways, finding URLs through various discovery
methods discussed below, the end point is usually a Wayback Machine. It is how we view the
documents themselves. Yet for most research projects, the Wayback Machine will not be enough. When
working at scale, such as the millions of pages of GeoCities, one cannot click through each page
manually; and the basic search functionality that the Internet Archive provides is limited only to
home pages.
But imagine trying to use the Wayback Machine at scale: you would have to click through pages
manually, limited to home page searches, and beyond – it would take decades, if not more, just to
look at all the pages. Accordingly, new systems of access are being developed to explore this.
Some of these are simply various full-text search interfaces, designed not to handle the massive
amount of data within the Wayback Machine, but to be implemented on relatively smaller
collections. Enter the world of distant reading.

Computational Access to Web Archives: We Want to Read It All but Cannot

Historians like to work with content – but the scale of web archives makes that difficult to
achieve. What should historians then do? Traditionally, the historian’s approach can be likened to
that of a microscope: closely exploring small numbers of documents; in an era of ever-growing
datasets, the macroscope may be the better metaphor (Graham et al., 2015). Niels Brügger has
identified five different layers when it comes to studying the Web: the Web as a whole, web
spheres, websites, web pages, and web elements (Brügger, 2009, 2018). The concept of web spheres
comes from Schneider and Foot, who define this as ‘not simply a collection of websites, but as a
hyperlinked set of dynamically defined digital resources that span multiple websites and are
deemed relevant, or related, to a central theme or “object”’ (Schneider and Foot, 2004: 118).
The debate between a wider focus – thinking of the Web as a whole à la Schneider, Foot, and
Brügger – and a narrower one speaks to the importance of both close and distant reading. In
related work, I have argued that larger perspectives benefit our understanding of the Web, given
the importance of context for understanding single documents (Milligan, 2012, 2016). Other
scholars have noted the importance of a ‘Big Data’ perspective to derive meaning from large
amounts of cultural data (Aiden and Michel, 2013; Mayer-Schönberger and Cukier, 2013; Schroeder,
2014). Some historians, however, based on their experience with web archives have questioned the
idea that bigger is necessarily better. Gareth Millward, who worked with web archives as part of
the BUDDAH project, argued as much in a Washington Post opinion piece:

We’re going to have to realize that we can’t read everything. We already do this with printed
documents, but we need to be more explicit about it and more willing to admit it. Smaller samples
of websites, specifically chosen for their historical importance, can give us a much better
understanding. We can begin to ask questions about how sites are constructed and what information
people and organizations chose to reveal. Similarly, much more focussed searches on smaller time
periods, more marginal topics, or specific cultural groups can produce a more manageable ‘corpus’

for reading and manipulating in the same way we would on our trips to traditional archives.
(Millward, 2015)

Millward also argues that metadata, the links between websites, might be the most useful way
forward – finding common ground with other scholars (Brügger, 2013b).
Metadata is an important concept that thus lies at the heart of web archival research. While a
difficult-to-define term, it can perhaps be best understood – as the American National Security
Agency (NSA) does – as ‘information about content (but not the content itself)’ (Greenwald, 2014:
132). Indeed, the NSA’s definition is a useful beginning point, as the NSA itself underscores the
power of metadata. The 2013 Edward Snowden revelations that the agency had been engaged in
widespread harvesting of American metadata, such as the records of phone calls (who you call, who
calls you, and how long you spoke for), shocked many Americans. Despite agency denials that this
was not surveillance as it did not engage with content, authors such as journalist Glenn Greenwald
convincingly hold that metadata can actually be more revealing than the content itself. A single
call, even if it were to be transcribed and published, would not tell you much about your life.
However, a recurring pattern over months or years, would begin to (Greenwald, 2014). The MIT
Immersion Lab’s tool at https://immersion.media.mit.edu/ also helps to illustrate this, as it
takes a user’s Gmail account, extracts the to, from, cc, and timestamp fields, and begins to
reconstruct your life based on social ties.
A similar approach works well with web archives, as later chapters in this Handbook discuss.
Stevenson and Ben-David explore this at depth, but here is an example. Recall the Toronto
Political Parties and Political Interest Groups collection, collected by the University of Toronto
between 2005 and today. There are some 14 million documents in total, too many to read. Indeed,
some domains consist of thousands of different pages. Using Warcbase, a software program that
works with the underlying WARC files in the collection, we extracted all the hyperlinks between
sites and aggregated them by domain.1
For example, any time a page within the left-wing New Democratic Party of Canada domain (ndp.ca)
linked to another page, we counted the domain that it came from (ndp.ca) and the domain that it
was going to (for example, NewYorkTimes.com). By doing so we could tell stories about the web
archive that eluded single page readings. In these hyperlink graphs, for example, we can see the
left-wing New Democratic Party of Canada (ndp.ca) linking heavily to the centrist Liberal Party of
Canada (liberal.ca), and largely ignoring the right-wing Conservative Party of Canada
(conservative.ca). This is due to several reasons, notably that the Liberals are in power at this
time. Even though they are ideologically closer, it makes more sense to link to attack and
critique the party in power than an opposition party.
This example helps underscore the importance of metadata for exploring archived web material. At
first, the inability to explore web archival data in a manner we are used to in traditional
archives may seem like a downside, but increasingly by leveraging the structured metadata within
archives we are able to quickly find material of relevance.
Other projects have explored the potential of distant reading to explore web archives. In the
United Kingdom, for example, the Big UK Domain Data for the Arts and Humanities (BUDDAH) project
sought to develop technical and methodological approaches for the historical uses of web archives.
Partnering with the British Library to ‘co-produce tools’, the team moved towards new access
methods. The ensuing Shine viewer (Figure 1.2) allows a user to see trends in a large web archival
collection obtained by the British Library from the Internet Archive, which covers the .uk domain
between 1996 and 2013. Consider the example query in Figure 1.2, which compares the relative
frequency of three prime ministers between these years. We can see the blue line (Tony Blair)
diminish in
HISTORIOGRAPHY AND THE WEB 11

Figure 1.2 Three Prime Ministers seen in the UK Web Archive’s Shine interface. Image used
with thanks to the British Library.

frequency, replaced by the red line (Gordon Brown) when he is prime minister, and finally both are eclipsed by the yellow line (David Cameron). A user can click on the lines, and then be brought to a sample page containing that phrase (Jackson et al., 2016).

These interfaces are not perfect. In the above case, the samples are arranged in the order in which they were crawled – earliest to latest. While a laudable reaction to the ‘black box’ of a search engine (like Google) that we may not understand, when working at scale we need to begin to engage with ranking algorithms. Eschewing relevance ranking is akin to doing research where archives are Twitter timelines – ordered only by date, meaning we understand the order they are in but at any scale it begins to overwhelm the human capacity to reason and engage. Yet it can still build considerable interest. In 2015, for example, our research team at the University of Waterloo and York University used the Shine platform to provide access to a ten-year-old collection of Canadian political party and political interest group websites (webarchives.ca). The University of Toronto had been collecting these sites since 2005, comprising some 50 domains of all of Canada’s major political parties, minor political parties, and a somewhat nebulous assemblage of political interest groups covering topics as varied as campaigns to ban land mines, protect the Canadian environment, or fight for social justice for Canada’s First Nations. Given the importance of this collection, once we provided access to it, we had thousands of visitors after the media picked up the story. This underscores the power of an easy-to-use interface.

Shine, and other similar interfaces, speak to the new methods that historians will need as they facilitate both distant and close reading. Later in this book, my Handbook chapter on
12 THE SAGE HANDBOOK OF WEB HISTORY

GeoCities uses some of these approaches to show how they can be deployed in modern historical research. The downside, however, is that not all historians are ready to change their technique.

The Third Wave of Digital History

This is not the first time that historians have worked at scale, but it comes at a moment when the mainstream historical profession appears to be retreating from numbers, teams, and computers. Part of this lies with earlier historical follies. Indeed, we can see the contemporary turn towards the digital humanities as part of a ‘third wave of computational history’ (Milligan, 2012: 30).

The 1960s and 1970s saw computers as indelibly associated with quantitative histories (Anderson, 2008). These pioneering computational historians relied on large mainframe computers and stacks of punchcards, saw considerable advancements in the realms of demographic history and studies of large censuses, and generated arguments foundational to our understanding of social mobility and migration today (Anderson, 2008; Katz, 1975). Others, however, brought hubris to their work, claiming ‘scientific’ rigor (Fogel and Elton, 1984). Perhaps the height of this was the debate around Fogel and Engerman’s Time on the Cross: The Economics of American Negro Slavery, seen by critics as reducing the terrible, human experience of slavery to numbers and tables (Fogel and Engerman, 1974); by the late 1970s, this first wave of computational history was in retreat. Some of these earlier debates tempered enthusiasm for computational histories, which reappeared in the early 1990s with the personal computing revolution. This second wave of computational history was marked by graphical user interfaces, word processing, and the rise of the World Wide Web and attendant scholarly networks such as H-Net. Instead of scholars needing to use punchcards to interact with mainframe computers, they could interact with computers and data directly through their keyboards, mice, and monitors. In 1998, the Journal of the Association for History and Computing began to be published, complementing other 1990s digital humanities-type events such as the 1994 conference Hypertext and the Future of the Humanities (Graham et al., 2015). Since the early 2010s, we have been in a third wave of computational history due to several factors, notably decreasing storage costs, the power of distributed cloud computing, the rise of digital preservation professionals, and open-source software. Storage is cheaper than ever before; we can put it to good use thanks to all of the data that is being collected and made available to researchers; and, crucially, we can harness computing power to begin to access it (Milligan, 2012).

Yet we can learn something from these earlier experiences with digital history. First, as I have argued elsewhere, we need to recognize the subjectivity of the tools that are designed and used (Graham et al., 2015; Milligan, 2012). Claims towards scientific rigor alienated other historians during the first wave of computational history. The results from a ‘distant reading’ algorithm over a web archive may appear to be objective, in that the data and the algorithm combined will produce the same answer. But the subjectivity of tools is embedded in the machines that we use: the decisions on how we structure datasets, tokenize sentences, calculate central nodes in a network diagram, and beyond, all rely on generations of scholars as well as human agency. Just as importantly, even if historians are basing their argument on the exact same evidence, they will not draw the same historical conclusions or arguments. Results always need interpretation. In short, Big Data is not better than earlier forms of exploration, simply different.

CONCLUSIONS

Historians are in some ways at a crossroads, as we seek to grapple with the implications
of scale in the digital age. This also holds true for non-Web-based research, drawing on databases and beyond (Milligan, 2013; Putnam, 2016). Imagine a historian trying to understand the rise of somebody like Donald Trump, elected US president in late 2016. Search interfaces can only get them so far, especially if they rely on keywords; imagine the number of hits they would get if they put his name into the equivalent of a Google search, or newspaper database search, or beyond. They would have no sense of what was important, or what was not important. As Ted Underwood has noted, ‘in a database containing millions of sentences, full-text search can turn up twenty examples of anything’ (Underwood, 2014). In this dystopian scenario, the historian cannot read the tens of thousands of hits that one keyword search returns in one small archive alone; as they begin to put pen to paper, they may imagine they have chosen the right sources, but the ranking algorithm is really writing the work.

To do this work will require algorithmic and computational literacy. But imagine a more sophisticated approach. They decide to take part of a web archive, say a series of blogs around a particular incident during the Trump campaign; or a circumscribed corpus of media articles. Perhaps they use network analysis to filter away extreme, unrepresentative voices, focusing instead on pages that both mentioned Trump prominently and had many other pages (themselves trustworthy based on linking patterns) linking to them. They consult the information they have about the programs they are using to explore this material, recognize the biases in the archive and the algorithm, and begin to pare it down. After this filtering process, perhaps they are down to a manageable set of 500 web pages, which can be closely read. Spanning distant and close, hard work lies ahead but the historian at least has a rigorous pathway forward.

None of this will be easy or straightforward. Our society has been grappling with a profound medium shift with the advent of the World Wide Web. Historians will be no different. It has been over 25 years since the Web was launched, and over 20 years since the beginning of widespread web archiving with the Internet Archive. This will require a new toolkit, ethical sensitivity, and innovative approaches to access. In the Handbook chapters that follow, we will explore the various dimensions along which this rethinking can play out.

Note

1  WARC, or Web ARChive files, are the ISO-standardized file format that web archives use to store their content. In a nutshell, they aggregate all the products of a crawl together with metadata. The best way to understand a WARC file is through a tangible example. Imagine we are preserving a university’s website. Within it are potentially millions of files: HTML files, Word and PDF documents, JPG and PNG images, video, stylesheets, and beyond. A WARC file can aggregate these resources with description, allowing them to be preserved and crucially accessed.

REFERENCES

Aiden, E., and Michel, J.B. (2013) Uncharted: Big Data as a Lens on Human Culture. New York: Riverhead.
AlSum, A. (2015) ‘Reconstruction of the US First Website’, Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries: 285–286.
Anderson, I. (2008) ‘History and Computing’ (http://www.history.ac.uk/makinghistory/resources/articles/history_and_computing.html). Accessed 22 June 2018.
Ben-David, A. (2015) ‘What Does the Web Remember of Its Deleted Past?’ (https://webarchivehistorians.org/2015/09/07/what-does-the-web-remember-of-its-deleted-past/). Accessed 4 October 2016.
Ben-David, A. (2016) ‘What Does the Web Remember of Its Deleted Past? An Archival Reconstruction of the Former Yugoslav Top-Level Domain’, New Media and Society, 18(7): 1103–1119.
Blank, G., and Dutton, W.H. (2014) ‘Next Generation Internet Users: A New Digital Divide’, in M. Graham and W.H. Dutton (Eds.), Society and
the Internet: How Networks of Information and Communication Are Changing Our Lives. New York: Oxford University Press. pp. 36–52.
Braudel, F. (1972) The Mediterranean and the Mediterranean World in the Age of Philip II. Berkeley: UC Press.
Brügger, N. (2009) ‘Website History and the Website as an Object of Study’, New Media and Society, 11(1–2): 115–132.
Brügger, N. (2013a) ‘Web Historiography and Internet Studies: Challenges and Perspectives’, New Media and Society, 15(5): 752–764.
Brügger, N. (2013b) ‘Historical Network Analysis of the Web’, Social Science Computer Review, 31: 306–321.
Brügger, N. (2017) ‘Probing a Nation’s Web Domain: A New Approach to Web History and a New Kind of Historical Source’, in G. Goggin and M. McLelland (Eds.), The Routledge Companion to Global Internet Histories. New York: Routledge. pp. 61–73.
Brügger, N. (2018 forthcoming) The Archived Web: Doing Web History in the Digital Age. Cambridge: MIT Press.
Deken, J.M. (2017) ‘The Web’s First “Killer App”: SLAC National Accelerator Laboratory’s World Wide Web Site, 1991–1993’, in N. Brügger (Ed.), Web 25: Histories from the First 25 Years of the World Wide Web. New York: Peter Lang Publishing. pp. 57–78.
Duggan, M. (2015) ‘The Demographics of Social Media Users’ (http://www.pewinternet.org/2015/08/19/the-demographics-of-social-media-users/). Accessed 10 December 2015.
Fogel, R.W., and Engerman, S.L. (1974) Time on the Cross: The Economics of American Negro Slavery. New York: W. W. Norton & Company.
Fogel, R., and Elton, G. (1984) Which Road to the Past?: Two Views of History. New Haven: Yale University Press.
Gitlin, T. (1987) The Sixties: Years of Hope, Days of Rage. New York: Bantam Books.
Graham, S., Milligan, I., and Weingart, S. (2015) Exploring Big Historical Data: The Historian’s Macroscope. London: Imperial College Press.
Greenwald, G. (2014) No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State. New York: Metropolitan Books.
Isserman, M. (1987) If I Had a Hammer: The Death of the Old Left and the Birth of the New Left. New York: Basic Books.
Jackson, A., Lin, J., Milligan, I., and Ruest, N. (2016) ‘Desiderata for Exploratory Search Interfaces to Web Archives in Support of Scholarly Activities’, Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries: 103–106.
Karampelas, G. (2014) ‘Stanford Libraries Unearths the Earliest U.S. Website’ (http://news.stanford.edu/news/2014/october/slac-libraries-wayback-102914.html). Accessed 4 April 2017.
Katz, M.B. (1975) The People of Hamilton, Canada West: Family and Class in a Mid-Nineteenth-Century City. Cambridge: Harvard University Press.
Kostash, M. (1980) Long Way from Home: The Story of the Sixties Generation in Canada. Toronto: Lorimer.
LaCalle, M., and Reed, S. (2014) ‘Poster: The Occupy Web Archive: Is the Movement Still on the Live Web?’, presented at Digital Preservation 2014, Washington, DC.
Levitt, C. (1984) Children of Privilege: Student Revolt in the Sixties: A Study of Student Movements in Canada, the United States, and West Germany. Toronto: University of Toronto Press.
Lin, J. n.d. ‘My Data Is Bigger Than Your Data’ (http://lintool.github.io/my-data-is-bigger-than-your-data/). Accessed 22 June 2018.
Mayer-Schönberger, V., and Cukier, K. (2013) Big Data: A Revolution That Will Transform How We Live, Work, and Think. Boston: Eamon Dolan/Houghton Mifflin Harcourt.
Milligan, I. (2012) ‘Mining the “Internet Graveyard”: Rethinking the Historians’ Toolkit’, Journal of the Canadian Historical Association, 23(2): 21–64.
Milligan, I. (2013) ‘Illusionary Order: Online Databases, Optical Character Recognition, and Canadian History, 1997–2010’, Canadian Historical Review, 94(4): 540–569.
Milligan, I. (2014) Rebel Youth: 1960s Labour Unrest, Young Workers, and New Leftists in English Canada. Vancouver: UBC Press.
Milligan, I. (2016) ‘Lost in the Infinite Archive: The Promise and Pitfalls of Web Archives’, International Journal of Humanities and Arts Computing, 10(1): 78–94.
Millward, G. (2015) ‘I Tried to Use the Internet to Do Historical Research. It Was Nearly
Impossible’, Washington Post (https://www.washingtonpost.com/posteverything/wp/2015/02/17/i-tried-to-use-the-internet-to-do-historical-research-it-was-nearly-impossible/?utm_term=.77cd9f605120). Accessed 19 February 2015.
Moretti, F. (2007) Graphs, Maps, Trees: Abstract Models for Literary History. New York: Verso.
Mosby, I. (2014) Food Will Win the War: The Politics, Culture, and Science of Food on Canada’s Home Front. Vancouver: UBC Press.
Nanni, F. (2016) ‘Reconstructing a Website’s Lost Past: Methodological Issues Concerning the History of www.unibo.it’, ArXiv160405923.
Owram, D. (1997) Born at the Right Time: A History of the Baby Boom Generation. Toronto: University of Toronto Press.
Putnam, L. (2016) ‘The Transnational and the Text-Searchable: Digitized Sources and the Shadows They Cast’, American Historical Review, 121(2): 377–402.
Rosenzweig, R. (2003) ‘Scarcity or Abundance? Preserving the Past in a Digital Era’, American Historical Review, 108(3): 735–762.
Schneider, S.M., and Foot, K.A. (2004) ‘The Web as an Object of Study’, New Media and Society, 6(1): 114–122.
Schroeder, R. (2014) ‘Big Data: Towards a More Scientific Social Science and Humanities?’, in M. Graham and W.H. Dutton (Eds.), Society and the Internet: How Networks of Information and Communication Are Changing Our Lives. New York: Oxford University Press. pp. 164–176.
Taylor, N. (2014) ‘The MH17 Crash and Selective Web Archiving’ (https://blogs.loc.gov/thesignal/2014/07/21503/). Accessed 12 April 2017.
Thelwall, M., and Vaughan, L. (2004) ‘A Fair History of the Web? Examining Country Balance in the Internet Archive’, Library and Information Science Research, 26(2): 162–176.
Tsukayama, H. (2013) ‘CERN Reposts the World’s First Web Page’, Washington Post (https://www.washingtonpost.com/business/technology/cern-reposts-the-worlds-first-web-page/2013/04/30/d8a70128-b1ac-11e2-bbf2-a6f9e9d79e19_story.html). Accessed 10 June 2014.
Underwood, T. (2014) ‘Theorizing Research Practices We Forgot to Theorize Twenty Years Ago’, Representations, 127(1): 64–72.
Webster, P. (2014) ‘Reading Creationism in the Web Archive’ (https://peterwebster.me/2014/11/18/reading-creationism-in-the-web-archive/). Accessed 23 May 2017.
Webster, P. (2017) ‘Religious Discourse in the Archived Web: Rowan Williams, Archbishop of Canterbury, and the Sharia Law Controversy of 2008’, in N. Brügger and R. Schroeder (Eds.), The Web as History. London: UCL Press. pp. 190–203.
2
Understanding the Archived
Web as a Historical Source
Niels Brügger

Like any other historical study, web histories should be based on a variety of source types such as written documents, print and electronic media, oral histories, and the web itself. As is the case with all source types, the online web must have been collected by someone – an archiving institution, a researcher, or any other person – and it has to be preserved so that it can later be made available to historians.

However, the online web changes rapidly. The lifetime of web content has been debated since at least the late 1990s, and there seems to be agreement that large portions of the web are very ephemeral (cf. e.g. Cho and Garcia-Molina, 1999; Ntoulas et al., 2004; Agata et al., 2014; Jackson, 2015). Web content may have been changed, moved, or deleted for various reasons, and sometimes in the hope of not being found out. For instance, in 2013, news media reported that the UK Conservative Party had deleted more than a decade’s political speeches from its website, some of which might not be in line with the party’s present politics.1 In other cases content is changed or deleted as part of institutional routines, such as is seen with online news media outlets, and sometimes things are just taken off-line without much reflection about why this is done.

This rapid changeability is an everyday experience for web users that they have probably learned to live with, but to those who have taken on the task of collecting and preserving the web for future studies the speed of change constitutes a great challenge. Therefore, much of the online web from the first formative years has been lost, which is a well-known phenomenon within media history where it often takes decades until the technical, organisational, and economic conditions are in place to launch preservation initiatives. Nevertheless, large portions of the online web have been collected since the mid 1990s and are now available in a great number of web archives all over the world, which enables web historians to document and study the changes that have taken place;
for instance, the UK Conservative Party’s political speeches had been archived by the British Library.2

As will be shown in this chapter, the archived web is in many ways fundamentally different from other digital sources, such as digitised documents, print and audio-visual media, and even online media, and therefore web historians have to become familiar with this type of source, its characteristics, and how these characteristics impact its scholarly use. In short, a critical approach to the archived web is needed.

This chapter outlines some key elements in a theoretical and methodological framework for doing web history based on the archived web as a historical source. To provide a starting point for the identification of the specific nature of the archived web, a brief outline is drawn of the wider landscape of digital media and what is termed their ‘digitality’. Before digging deeper into the characteristics of the archived web, the chapter investigates what characterised the online web that was archived as well as the process of web archiving where the online web became the archived web. The stage is then set to outline the nature of the archived web, compared with the online web and other types of archived documents and media.3 Finally, some of the main impacts that the nature of the archived web has on how it can be used as a scholarly source by the web historian are discussed.4

DIGITALITY AND THE ONLINE WEB

The point of departure of this chapter is that all digital media come with their own ‘digitality’, that is their own way of being digital (cf. Brügger, 2018). A digital medium’s digitality sets up an array of possible ways of interacting with the medium and with the textual and semiotic systems that the medium enables (for a general discussion of the ‘relatively fixed features’ of a medium, see Meyrowitz, 1994: 50). Therefore, when setting out to investigate the possible use of the archived web it is pivotal to identify the archived web’s digitality, that is its specific ways of being digital.

Forms of Digitality: Digitised, Born Digital, Reborn Digital

Despite the fact that digital media are different because they each come with their own digitality, they also share some features. An important criterion for determining a digital medium’s digitality is how it became digital, and therefore grouping digital media based on the provenance of their digital nature is reasonable. Based on this approach, three main types can be distinguished: digitised, born digital, and reborn digital material (Brügger, 2012: 104).

Digitised material is material that has previously existed in a non-digital form, but has been transformed to become digital. The non-digital original can, for instance, be handwritten documents, print media, or electronic audio-visual media, and they may have been digitised in a number of ways, from being keyboarded and transferred to punched cards to the scanning of document prints and photos to image files, or the digital recording of analogue sound and moving images.

Born-digital material has never existed in any other form than digital. This type of material was created for and only made available on digital media such as CD-ROMs, DVDs, or computer networks.5 Therefore this type of material does not have any non-digital original to go back to.

Reborn-digital material is born-digital material that has been collected and preserved, and that has been changed in this process to such a degree that it is not identical to the born-digital of which it was made. This could be an emulated computer game, a screen film of an app, or material in a web archive.

Distinguishing between these three specific types of digital material is important
because the digitality of each type has a decisive impact on how it can be interacted with in the processes of collecting and preserving it as well as making it available, for instance to researchers who want to use it as a historical source.

Characteristics of the Online Web

When investigating the archived web, it is often overlooked that it was created on the basis of the online web. The archived web is reborn at the nexus of the online web and the chosen archiving forms and strategies (cf. below), and therefore a number of the main characteristics of the online web influence the nature of the archived web. Or, to use the vocabulary above: the digitality of the archived web is a function of the digitality of the online web as it is combined with specific web archiving forms and strategies. Any form of the online web shares the following three characteristics: it is born with two semiotic layers, it is fragmented, and it is hyperlinked.

The web is born with two semiotic layers

The web consists of two distinct semiotic layers: on the one hand, the text that the web user sees in a browser window on the screen (or hears in the speakers), and, on the other hand, the text that enables the presentation of the semiotic elements on the screen, namely the text written in an HTML file (code and textual content) as well as the elements that are requested and retrieved from one or more web servers. When the web user enters a web address – a URL – in the location bar of a web browser (or clicks a hyperlink), a request is sent to the relevant web server, which then returns the requested HTML document as well as the files to which the HTML document may point (image files, audio or video files, feeds, etc.), and when all the requested material is received it is interpreted in the web browser and displayed in a browser window.6

To highlight this point one can compare the online web to a digitised newspaper. They both come with a visible side – what can immediately be seen when displaying the document – but they differ significantly when it comes to the hidden text ‘below’ this visible surface. The digitised newspaper usually consists of an image file (bitmap or similar) that is then immediately interpreted as a newspaper page when displayed, whereas the online web page has an extra layer of HTML and attached files. Digitised image files may become enriched with character recognition (OCR), but in contrast to the online web this is not an indispensable and inherent part of the document since it is added after digitisation, and the information it conveys is not as rich and complex as is the case with the hidden web text.

The web is born fragmented

The second characteristic of the web is its fragmented nature. What is presented to the web user in the web browser window appears to be a unified web page, but as mentioned above this apparent unity hides the fact that the page comprises patched-together fragments. The HTML file is in itself a structured patchwork of bits and pieces, including textual content and commands, and the various files that are retrieved also constitute individual entities that have to be combined with the rest to form a coherent web page.

Digitised collections can also come as partly fragmented – for instance if parts of the text have been marked with a mark-up language – but these fragments are not there from the outset as an inherent part of the material, as is the case with the online web.

The web is born hyperlinked

Finally, the web is born hyperlinked, since it is possible to connect any fragment of text on one computer with any other semantic entity on the same or on another computer in the computer network.7 One could argue that hyperlinks are not necessary for the web, which may in principle be correct, but in practice a web without
hyperlinks would not work; for instance, one would then have to write exact web addresses in the location bar every time one wanted to go to another web page.

As was the case with the two semiotic layers and the fragmentation of the web, the use of the hyperlink also differs from what is the case in a digitised collection. Hyperlinks may be added to digitised newspapers, but they are not an inherent part of the material since they are added later, and they need not be added.

Web Archiving

Knowledge about the inherent properties of the online web is only one element in the understanding of the archived web, since the archived web is constructed as a combination of the online web and the different forms and strategies used when collecting and preserving it. Therefore, it is important to have a closer look at web archiving. As shall be shown in the following, the process of collecting and preserving the web is in many cases more complex, complicated, and opaque than is the case with establishing a digitised collection.

Forms of collecting and preserving the web

The online web may have been collected and preserved in various ways. The most widespread archiving form is web crawling, which is used by most major national and international web archiving initiatives, but it is also possible to use other forms. Since web historians may have to combine the archived web from different collection types, a broad understanding of web archiving is preferable. Thus, web archiving can be understood as any form of deliberate and purposive collecting and preserving of web material.

The simplest way of preserving a web page is by making an image, either in the form of a screenshot or by the use of dedicated software that contacts a web server and transforms the rendered web page to an image file.

An extended version of preserving the web as an image file is to make a screen movie, that is, filming what happens on the screen when a user moves around on a website, clicks hyperlinks, watches streamed video, etc. When making a screen movie there is no direct connection to a web server, and therefore what is archived is not a function of a hidden HTML file, but rather of the movements made by the individual who records the screen.

Another simple form of web archiving is the downloading of individual files from the web, be that audio or video files that have been part of a web page, or files with only parts of an HTML file such as hyperlinks.

The most complex form of web archiving is web crawling. Web crawling is a sophisticated, systematic, and scalable version of the collecting of individual files, basically because the process can be automated and it scales to large amounts of web material. A list of the web addresses one wants to collect is created (a so-called seed-list), and the archiving software goes through this list by contacting the listed web servers one by one, retrieves the web pages in question, strips all the hyperlink information on each web page, retrieves the material to which these hyperlinks point, strips the hyperlink information on these web pages, etc. And the software continues with this iterative process as far away from the starting points as the web archivist has asked for. The result of web crawling is a collection of HTML files and the files that make up a given web page, whether they are saved as they are retrieved or aggregated into dedicated file formats such as ARC or WARC. Thus, in contrast to the first three mentioned methods of web archiving, web crawling is inextricably linked with the online web’s hidden HTML code, to its fragments, and to its hyperlinks, in brief to the three characteristics of the digitality of the online web. And for the very same reason it is based on a wide range of settings, for instance it has to be specified how many levels away from the original URL the software
is allowed to follow hyperlinks, which specific file types should be included/excluded, if the web crawler is to remain within the limits of a particular web domain, when the archiving should stop, etc. Thus, because web crawling is based on following hyperlinks, and because of the large number of possible and required settings, it can be maintained that to a large extent the archiving actor does not know exactly what is archived.⁸

Strategies of web archiving
One of the main challenges for the individual or institution who wants to collect and preserve the online web is that in most cases it is not possible to archive the entire web in all dimensions, exactly as it looked when online. Therefore, strategies have to be used to collect as much as possible of what is intended to be collected.⁹
When considering the different possible archiving strategies, the archiving actor had to decide how the strategy should relate to three variables – space, time, and possible use – by placing the archiving on a continuum in relation to each variable. As for space the strategies either aim at archiving as much as possible of the web, or they are designed to archive specific fractions of the web. With regard to time, the strategy must address the question of the archiving frequency: either the archiving is continuous, or it focuses on certain delimited points in time. Finally, regarding the possible use of what is archived, the archiving actor must reflect on whether the later use is either unknown or a well-known and precise use case. Thus, the strategies used for archiving the web can be placed at different cross-sections of a grid with all/fractions, continuous/once, and unknown/known use.

Challenges of web archiving
When an archiving actor in the past set out to collect and preserve the web, the overall challenge was how the relations between the concrete form of the online web at the specific period, and the different archiving forms and strategies at hand should be negotiated. However, no matter how this was done the web archivist was facing at least five challenges that all influence the later use of the archived material.
The first challenge is that in most cases there exists no stable original on the web against which the quality of the archiving can be checked. Unlike a digitised newspaper or broadcast programme, the online web may – or may not – be the same when quality is checked as when it was archived, given the possible rapid speed of change of the web.
Second, compared with most other forms of collecting and preserving, including digitisation, the choices of ‘what’ and ‘how’ to preserve are embedded in a much more complex, unsystematic, and less transparent environment. When digitising a collection of newspapers or radio programmes, problems such as brightness or file compression have to be addressed, but although these choices influence the outcome, the number of options is limited and their possible scope is transparent, and if any other actor proceeded in the same way, the result would likely be identical. In contrast, the combination of the online web’s complex digitality and the many forms and strategies for archiving, as well as their mutual calibration, multiplies the number of possible options. And in particular regarding web crawling, the ‘what’ tends to be obscured by the ‘how’, since one may have a clear idea about ‘what’ to archive at the outset, but as a consequence of the multiple choices of the ‘how’, either one does not get what was intended, or one gets more than what was expected.
A third challenge that the web archivist had to struggle with in the past is related to what can be termed the dynamics of updating (Schneider and Foot, 2004: 115; Brügger, 2005: 22–4). As stated above, the web changes rapidly, but updates are not necessarily predictable and regular. Compared with printed or electronic media, the temporality of the web is not that of an entity published on a regular basis, like daily newspapers or a scheduled radio and television programme.
UNDERSTANDING THE ARCHIVED WEB AS A HISTORICAL SOURCE 21

This implies that what was archived at the beginning of one archiving process may have changed as the process progresses. As if, when page 10 was scanned, page 2 was suddenly changed, or halfway through the digitisation of a radio programme, the beginning was altered.
The fourth challenge to the web archive actor was that things are likely to have gone wrong during the web archiving process. Even with simple archiving forms like saving the web as images, the software may not have processed the web page correctly. And with more complex web archiving forms such as web crawling, the number of error sources grows significantly. In contrast, if errors occur during digitisation, they are generally systematic and recurring.
Finally, the fifth challenge revolves around the chosen archiving strategy and the relationship between time and space in the different strategies. Regarding space, the biggest challenge is that web archiving takes time, which means that the closer the strategy is to ‘all’, the longer the archiving will require since it is not possible to archive ‘all’ at the same time. But the closer the strategy approaches ‘fractions’, the smaller the problem becomes. Regarding time, the closer the archiving strategy is to ‘continuous’, the more space is a problem because the archive cannot archive large amounts of the online web continuously. Finally, with regard to the possible scenarios for the use of what is archived, the less that is known about the possible use, the harder it is to choose the right forms and strategies for archiving.
In addition to these five challenges that cut across forms and strategies, the archiving actor has had to face challenges that are specific to how each of the different archiving forms handles the archiving of the three fixed features of the online web’s digitality.
On the one hand, archiving in the form of a picture or a screen movie has only preserved what was visible on the computer screen or in the browser window while the HTML is not archived. In this case what you see is what you get.
On the other hand, web crawling has only preserved the HTML and possibly integrated files or services (images, graphics, a feed, etc.). Therefore, what is collected with web crawling does not mirror what users of the past saw in their browser as such, but rather the HTML code found on the web server in an HTML file, as well as the files that were linked to and that were eventually interpreted in the user’s browser to form the displayed web page.
Therefore, the web archivist was facing the challenge of either preserving a given web entity exactly as it appeared online (or close to), while at the same time excluding the HTML code for good, or preserving the HTML code and other elements that can later be patched to get as close as possible to what once was online.

The Archived Web as a Historical Source

When a web historian finds the archived web and intends to use it as a historical source, it is important to critically reflect on how the online web of the past has been transformed to the archived web that the web historian is interacting with. In short, the web historian needs to start thinking about how the web archivist’s approach to the challenge of combining the online web’s digitality with specific archiving forms and strategies has affected the result, and how this may impact the next steps in the research process. This section shall present some of the main characteristics of the archived web as it can be found in a web collection, whereas the possible impact on the research process is debated in the next section.

Constitutive characteristics of the reborn web
No matter the choice of archiving forms and strategies, the following four constitutive characteristics apply to the archived web. They are considered constitutive since they are a function of the archiving of the digitality of the online web as such.
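Before turning to these characteristics, the contrast between the two archiving forms can be made concrete with a minimal Python sketch. It lists what a crawler would have to fetch in addition to the HTML file itself: the embedded files referenced in the code, and the hyperlinks it may follow. It is an illustration only, not a description of how any particular archive's crawler works; the class name `ResourceLister`, the function `list_resources`, the sample markup, and the `example.org` address are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceLister(HTMLParser):
    """Collect the URLs an HTML file refers to (hypothetical helper).

    Embedded files are those a browser fetches to render the page;
    hyperlinks are targets a user (or a crawler) may follow.
    """

    # (tag, attribute) pairs treated as embedded files; an assumption
    # made for this sketch, not an exhaustive list.
    EMBED_ATTRS = {
        ("img", "src"),
        ("script", "src"),
        ("link", "href"),
        ("iframe", "src"),
        ("source", "src"),
    }

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.embedded = []
        self.hyperlinks = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            # Resolve relative references against the page's own URL.
            url = urljoin(self.base_url, value)
            if (tag, name) in self.EMBED_ATTRS:
                self.embedded.append(url)
            elif tag == "a" and name == "href":
                self.hyperlinks.append(url)


def list_resources(html, base_url):
    """Return (embedded file URLs, hyperlink URLs) found in the HTML."""
    parser = ResourceLister(base_url)
    parser.feed(html)
    return parser.embedded, parser.hyperlinks


# Hypothetical sample page, used only to exercise the sketch.
SAMPLE = """<html><head><link rel="stylesheet" href="/style.css"></head>
<body><img src="logo.gif"><a href="news.html">News</a></body></html>"""

embedded, hyperlinks = list_resources(SAMPLE, "http://example.org/index.html")
print("embedded files:", embedded)
print("hyperlinks:", hyperlinks)
```

Run on the sample, the sketch separates two embedded files (a stylesheet and an image) from one hyperlink. An archiving form based on images would preserve none of these references, only the rendered result.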
Lack of an original
The online web changes fast, and over time it is very likely that the original that was once online has disappeared or changed. Therefore, if things are missing or are not functioning correctly in an archived web entity, one cannot expect to return to the online web to verify the original. The absence of a stable original is one of the most important differences between the digitised collection and the archived web.

Incompleteness
Incompleteness is inherent in any collection. But the incompleteness of the archived web comes in different forms and for various reasons compared with other collections, including digitised collections. By and large the degree of completeness of a digitised collection can be assessed systematically, as the causes of possible shortcomings are more transparent, but it is very difficult to assess the degree of (in)completeness of the archived web compared with what was online with the same level of systematic approach. Due to an opaque combination of the chosen archiving strategy, deliberate omissions, link updating during the archiving, and archiving errors, the web historian using a collection of the archived web should expect that things are missing one way or another.

A unique version and not a copy
Compared with what was once online, the archived web is best understood as a unique version and not a copy on a 1:1 scale. The archiving forms and strategies combined with the fact that things may be missing implies that the same online entity that was collected by two different archiving actors may very well be different versions instead of identical copies of what was online. In addition, each version is only one version among other versions, and it is difficult to maintain that one of them constitutes an original. This uniqueness of each archived web entity is different from what usually happens with a digitised collection, where the result of the digitisation process to a larger extent can be considered a copy that is much closer to being an identical copy of an original.

Temporal and spatial inconsistency between the archived fragments
In their online form the many bits and pieces of the web are always present at the same time and in the same space. The source and target of a hyperlink are there at the same time, and a given website has the extension it has at any given moment. The online web is consistent with regard to time and space, but this is not the case with the archived web.
The possible temporal inconsistency affects all instances of the archived web where hyperlinks are involved since the source of the hyperlink and its target may not have been archived at the same time. Considering that the hyperlink is an inherent element in the digitality of the online web, the temporal inconsistency is very likely to be a widespread phenomenon in a collection of archived web, in particular where strategies close to archiving ‘all’ ‘continuously’ have been used. This possible lack of temporal consistency between link source and target is not seen in a digitised collection, simply because hyperlinks are an optional and controlled complement.
The possible spatial inconsistency is caused by the fact that all web entities were not necessarily archived with the same spatial extension, which can happen because of deliberate choices to discard specific parts of the web during the archiving, because of unexpected problems, or because parts of the web have been deleted or moved during the archiving process. As if, in a digitised newspaper collection, the size of the newspapers was random, some of the copies had only the cover, others only pages 2 and 4, and yet others had all pages.

Specific Characteristics of the Reborn Web

All archived web instances share the constitutive phenomena described above to some extent, but also each of the different types of
collection may have specific properties that are a function of how the content of the collection was created. As mentioned above, a fundamental distinction can be made between what is actually collected and preserved, and as a consequence a given web collection may display what was photographed or recorded at the time of archiving, much like ‘What you see is what you get’, or the collection has to re-assemble the archived fragments in a meaningful way to enable the display or otherwise provide access to the collection’s holdings.

‘What you see is what you get’
Screenshots do not come with any other immediate form of interaction than to look at the image, and moving web elements are not included, nor do hyperlinks work. Individual web pages archived with dedicated software are also still images and therefore do not show moving elements, but hyperlinks may work, although they point to the online web and not to another archived entity; and it is possible to scroll up and down to view the entire web page.
Since screen movies are movies, they allow for moving backwards or forwards. But insofar as they mirror the movements of the individual who initially created the movie, it is only possible to move around in the archived web as filmed. Video may be part of the film, and hyperlinks may have been clicked, but in both cases these features can only be interacted with as part of the movie and not as autonomous entities.

‘What you get is what can be assembled’
A collection of crawled web can best be understood as a giant bucket with billions and billions of individual, but potentially interlinked, files. This implies that the web archive can decide to take files (or parts of them, such as specific code lines in an HTML file) out of the bucket again and combine them in a great variety of ways, some of which may even be different from when the bits and pieces were collected, including ways that by no means make the material appear like what it looked like when it was browsed in the past. A crawled web collection is therefore malleable and there is no predetermined way to make the fragments available.
The web collection owner may decide to re-assemble the fragments in a way as close as possible to what the web looked like in a browser when it was online, which is what is done in the Wayback software, the replay software that was developed by the Internet Archive and is used by many major web archives. The web archive may also give access to hyperlink information whereby the web collection would not look like anything seen in a web browser, but would rather have the form of a file with information about link source and target. Or image files could be singled out to support image analysis, and the collection would then have the form of a collection of this specific file type. In any case it is difficult to say which way best reflects the web that was once online. And the possibilities are almost endless.
In contrast to a digitised collection where textual fragments such as mark-up or OCR are only added at a later stage, if at all, the crawled web collection is reborn as fragmented and marked-up and therefore it can be made available in more – and more differentiated – ways than is possible to do with a digitised collection. However, this high level of flexibility comes at a price, since the collection’s volatile nature also makes it a more unstable source, compared with a digitised collection where one file very often equals one copy of what was initially digitised. In a crawled web collection, what is stable is each of the fragments, but not their re-assembling.
When it comes to completeness, a collection of crawled web is different from a digitised collection as well as the online web, because it is incomplete and too complete at the same time. On the one hand, there is often too little in the collection, basically because all the fragments that were initially online may not have come into the archive, among others because of the choice of archiving form and strategy,
the dynamics of updating, and technical deficiencies. On the other hand, there may be too much in the collection, because there may exist more versions of ‘the same’ that may be close to identical, without being exactly identical; one reason for this is that more crawled hyperlinks may have pointed to the same web entity that has then been archived several times, but often not at the exact same time.
The fact that fragments are being placed in the same bucket in a crawled web collection also affects temporal inconsistency, but in different ways, depending on how the fragments are made available. One example is the Wayback Machine, where the temporal inconsistency comes in an almost imperceptible form, but nevertheless it is there. As previously mentioned, on the online web a web page is patched together from bits and pieces retrieved from a web server at the time they were requested, and something similar happens in the Wayback Machine, except that the bits and pieces are not retrieved from an online web server, but from the web archive’s own collection. But if all the needed bits are not there from the same date and time as the web page itself, the Wayback Machine’s software retrieves the missing elements from a time as close as possible to the time of the web page. Since this can be a question of days, weeks, or even months, the web page that the user is looking at may be patched together from fragments from different points in time: a two-day-old banner ad, an image from the following week, etc. (cf. Hockx-Yu, 2015). What appears to be a temporally ‘flat’ and consistent web page with only one temporality (‘now’) may hold several invisible temporalities, thus including a temporal drift that stretches back and forth in time, whereby it becomes temporally inconsistent as a whole.

THE ARCHIVED WEB’S POSSIBLE IMPACT ON THE SCHOLARLY USE

The characteristics outlined above constitute some of the main elements in the digitality of the archived web, and they should be acknowledged as such by any web historian who intends to use the archived web as a historical source. Furthermore, they should serve as the point of departure for a critical assessment of how they unfold in the specific collection(s) that the web historian intends to use, and thereby of how they can possibly impact the research process, from searching and selecting material to creating a corpus to study.

No Original to Go Back to, Incompleteness, and Unique Versions

The major challenge for any web historical project based on the archived web is the fundamental uncertainty as to the character of the archived web. As we have seen, this is an effect of the combination of the lack of an original, that things are very likely to be missing, and that what are found in a web collection are unique versions. And each of these uncertainties is aggravated because of the opacity of what is actually the case, often due to lack of documentation.
Incompleteness in the source material is as such familiar to any historian, and it is always relative to the concrete research aim, since it is only a problem if what is missing was to be included in the study at all. Nevertheless, the specific challenges related to web archives are different from those related to other collection types, since the main problem is not only a possible lack of (parts of) sources, but rather that it can be very difficult to evaluate to what extent the available versions are identical or not to what was online in the past.
Although the digitality of the online web is one of the reasons that one can sow doubt about the status of the archived web, it also opens up new possibilities that may even help to remedy the above challenges, at least to some extent. What is missing may have left digital traces; for instance, HTML files may provide valuable information about what should have been displayed – for instance, a
file type extension such as .img tells us that an image should have been displayed, and files may even have names that reveal their content.
When it comes to corpus creation – that is, the process of identifying and extracting selections of a web collection – the presence of ‘too much’ of the same may constitute a challenge. When constructing a corpus from a web collection this has to be done in two steps: first, an initial corpus is delimited in time and space, and then the versions that should be included have to be identified and selected.

Possible Temporal and Spatial Inconsistency Between the Archived Fragments

The temporal and spatial inconsistencies between the archived fragments that are potentially inherent to any collection of the archived web constitute a challenge to studies that intend to focus on relations between fragments, which de facto usually means including hyperlinks in the study.
If a project aims to study the hyperlink network between images on a number of websites, based on a collection of websites that spans several weeks, it is very likely that link sources and link targets are not archived at the same time, or that some link targets are not archived at all. In the analysis this will have two implications: first, that linked-to pages or images may have changed or disappeared, and, second, that if the hyperlinks on the linked-to page are also to be included in the analysis as link sources pointing to yet other link targets, the latter targets may have been archived even later, and so on.¹⁰
A similar phenomenon may happen in terms of spatial extension, if the study is based on a collection where some websites were archived in depth, including all levels below the front page, whereas on other websites only the front page or maybe a couple of levels of the website were archived.
In both cases there are no good solutions to overcome these challenges, but only ways of handling and minimising the problems in ways where one has to balance the need for consistency against the need for having something to study. When creating a corpus, the following general rule applies regarding temporal inconsistency: one can select either a short period of time, which will reduce the possible temporal inconsistency, while probably also reducing the amount of material to be studied. Or a longer interval can be chosen, which will raise the possibility of temporal inconsistency, while probably also raising the amount of material to be included. How this trade-off shall be negotiated depends on the concrete research project and on a critical assessment of what is available in a given collection. As to the spatial inconsistency, it is not possible to proceed as with the temporal inconsistency, in the main because of the lack of something similar to a timestamp that could help determine the spatial extent of the individual web entities.

Two Collection Types and their Impact on Web History

In addition to the above-mentioned constitutive characteristics’ impact on the web historian’s use of web collections, it is also important to reflect on how the two types of collections previously identified affect the research process.

Studying a ‘what you see is what you get’ collection
First, source material such as screen images or movies is not easy to search in any systematic and detailed way. Files with individual web pages may be searched individually, but as for the rest, the only way of finding relevant material is to go through all the sources manually. In addition, screen movies can be challenging to navigate since they follow the movements of the individual who did the
filming. Second, combining images of web pages to form an entire website is challenging to do in a consistent way, and just getting an overview of the pages’ interrelations may be difficult, in the main because clickable hyperlinks are missing. Third, in cases where hyperlinks are to be studied, it is a great disadvantage that the code level is not available. Fourth, in terms of possible forms of analysis, the most obvious way of analysing this type of material is a traditional manual document/image analysis. Finally, because of the manual approach in both search and analysis, research projects based on images and movies do not scale well.
Despite these challenges, screen dumps, files with individual web pages, and screen movies also come with some important advantages. First, and most importantly, the images and movies are (by and large) showing exactly what the web entities looked like, thus providing the ‘look and feel’ of the past web, without any possible temporal inconsistencies in the replay. Second, creating a corpus and preserving it is relatively straightforward and can simply be done by creating the needed folders on a computer desktop. Third, although the form of the material tends to enhance reading the text as one would do in a non-digital medium, automated textual analyses cannot be ruled out, since screenshots are bitmaps and they can therefore be enriched with OCR (cf. Cocciolo, 2015).

Studying a ‘what you get is what can be assembled’ collection
If the web historian intends to use a collection of crawled web, this raises new challenges and opens up other possibilities.
For the web historian who wants to analyse the visible side of the web, the fundamental challenge with a crawled web collection is that the visible web has not been archived as such. The researcher does not have immediate access to what the web of the past actually looked like. On the contrary, what is accessible is the previously mentioned giant bucket of possibly interlinked bits and pieces that have to be assembled. One way of doing this is the Wayback Machine. The use of the Wayback Machine affects the steps of the research process in various ways. The first thing is search, and here the main challenge is that as such the Wayback software does not allow for other entry points than the web address (URL), and if the researcher does not know the URL it is impossible to find the relevant material to study. However, some web archives have created indexes that allow for full text search. Once the relevant web pages are found and replayed by the Wayback software, they are shown one by one, each with the possible temporal inconsistency previously described. Obviously, the doubt as to whether the web page embeds a temporal inconsistency or not is a challenge to any researcher wanting to make claims about what a given web page exactly looked like in the past. But the possible remedy may be nearby, because with crawled web the HTML code is available and it is therefore possible to check the timestamps of each individual web element to find out the extent of temporal drift. However, despite the possible inconsistencies, the many possibilities offered by the Wayback Machine’s presentation should also not be forgotten, most notably that researchers have access to a browsable web page with working hyperlinks, although the challenges are just below the surface, since hyperlinks may very well make the user jump in time with each click. If a corpus of web pages has been identified in the Wayback Machine then the next research step would be to analyse the material, and if focus is only on the visible side, historiographical approaches and methods that are usually used for this type of source can be used. However, since the sources come in the form of individual web pages, as seen in the Wayback Machine, the scale of the analysis probably has to be limited.
But the researcher may not want to analyse the visible side at all. The source of such a study would then be the hidden text only, without any interest in what the web actually
looked like when online in the past. This could, for instance, be a study of the hyperlink network, the number of specific file types, streamed video, or the use of blog software (see Brügger and Schroeder (2017) for examples of this type of study). The phases of research look different when studying the hidden side of the web only. Searching in the non-visible texts is often not possible, but if the possibility exists it is usually in the form of access to a file with the index that the Wayback software uses when looking up which bits and pieces to combine in the web page view (a CDX file), which can support users with information about URLs, timestamp, and file type. Creating a corpus based on such access forms to the web collection may also be challenging. Although an index of the collection’s holdings enables the material to be searched, it is not straightforward to come from the identification of the websites one wants to study to the material itself as preserved in the collection. For instance, if the aim is to study the written text on all web pages, access to the body text is needed, and in most web archives this is not possible to get in its HTML form, but only as displayed in its visible form with the Wayback software (cf. Lin et al., 2017). If a researcher has succeeded in creating the corpus she needs, she then has to do the analysis, and this is where the possibilities outcompete the challenges, simply because a corpus based on the above formats is machine-readable, and therefore the analysis can be automated, and analyses of large amounts of data are made possible, just as the analysis can focus on each type of archived fragment that is expressed in the code, thereby making the analysis very fine-grained, despite the large scale.

CONCLUDING REMARKS

The web historians who set out to base their study on the archived web should start by acknowledging that although at first sight an archived web collection may look like a digitised collection or the online web, it is fundamentally different from both, and shall thus be approached differently.
Therefore, it is important to become familiar with the specific digitality of the collection one uses by providing as much information about provenance as possible, and by digging more into how the general characteristics of the archived web play out in each collection. Then it is possible to make as informed choices as possible as to how it makes sense to make selections, create a corpus to study, and perform the analysis in each case. And – most importantly – it is essential to continuously explain and document these methodological reflections about the nature and provenance of the archived web. In many ways these steps resonate with traditional historiographical skills, but the concrete critical work with the sources comes in new forms because of the digitality of the archived web, and therefore traditional approaches have to be reinterpreted and translated to fit this new source environment.

Notes

1 For an overview of other examples see Winters (2017).
2 Histories of web archiving and web archives can be found in Brügger (2011: 29–32; 2018); Webster (2017); Koerbin (2017); and Laursen and Møldrup-Dalum (2017). The latter two mainly focus on web crawling and national web archives. See also Rogers (2018) and Webster (2018).
3 There exist a great variety of international, national, and local web archives, cf. the overviews in IIPC Members, and in List of Web archiving initiatives.
4 The following sections reiterate some of the insights presented in Brügger (2018).
5 The term born digital is also used in Berry (2012: 4); Kirschenbaum (2013); Rogers (2013); and Jones (2014: 6).
6 This is obviously a very simple scenario, but even with more complicated and dynamic web pages, this is basically what happens. In the following the term HTML is used as an umbrella term to cover the different versions of HTML, including XHTML and XML.
7 See Brügger (2017) for a pre-history of the hyperlink, and Helmond (2018) and Barnet (2018) for different approaches to the history of the hyperlink.
8 It is also possible to collect web material from a database, made available through an Application Programming Interface (API), or to collect web material that has been taken off-line and preserved unchanged (backups and the like); for space reasons these archiving forms are not presented here, but they are discussed in Brügger (2018).
9 In addition to the technical challenges outlined in the following paragraphs, a number of legal, curatorial, and organisational issues are also at stake in relation to web archiving strategies.
10 For an overview of historical network analysis of hyperlinks, see Brügger (2013) and Stevenson and Ben-David (2018).

REFERENCES

Agata, T., Miyata, Y., Ishita, E., Ikeuchi, A., and Ueda, S. (2014) ‘Life span of web pages: A survey of 10 million pages collected in 2001’, in Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries. IEEE Press. pp. 463–464.
Barnet, B. (2018) ‘Hypertext before the web – or, what the web could have been’, in N. Brügger & I. Milligan (Eds.), The SAGE Handbook of Web History. London: Sage. pp. 215–226.
Berry, D.M. (2012) ‘Introduction: Understanding the digital humanities’, in D.M. Berry (Ed.), Understanding Digital Humanities. New York, NY: Palgrave Macmillan. pp. 1–20.
Brügger, N. (2005) Archiving websites: General considerations and strategies. Aarhus: Centre for Internet Studies.
Brügger, N. (2017) ‘Connecting textual segments: A brief history of the web hyperlink’, in N. Brügger (Ed.), Web 25: Histories from the First 25 Years of the World Wide Web. New York: Peter Lang Publishing. pp. 3–28.
Brügger, N. (2018) The archived web: Doing history in the digital age. Cambridge, MA: MIT Press.
Brügger, N., and Schroeder, R. (Eds.) (2017) The web as history: Using web archives to understand the past and the present. London: UCL Press.
Cho, J., and Garcia-Molina, H. (1999) ‘The evolution of the web and implications for an incremental crawler’, in Proceedings of the 26th International Conference on Very Large Databases.
Cocciolo, A. (2015) ‘The rise and fall of text on the Web: A quantitative study of Web archives’, Information Research, 20(3), http://www.informationr.net/ir/20-3/paper682.html#.W6Kcpy863OQ [Accessed 19 Sep 2018].
Helmond, A. (2018) ‘A historiography of the hyperlink: Periodizing the web through the changing role of the hyperlink’, in N. Brügger & I. Milligan (Eds.), The SAGE Handbook of Web History. London: Sage. pp. 227–241.
Hockx-Yu, H. (2015) ‘The Unknown Aspects of Web Archives’, paper presented at the conference ‘Web archives as scholarly sources: Issues, practices and perspectives’, 8–10 June, Aarhus.
IIPC Members (n.d.) http://netpreserve.org/about-us/members.
Jackson, A. (2015) ‘Ten years of the UK web archive: What have we saved?’, paper presented at the 2015 IIPC GA, Palo Alto.
Jones, S.E. (2014) The emergence of the digital humanities. New York: Routledge.
Kirschenbaum, M. (2013) ‘The .txtual condition: Digital humanities, born-digital archives, and the future literary’, Digital Humanities Quarterly, 7(1), n.p.
Brügger, N. (2011) ‘Web archiving – between past, present, and future’, in M. Consalvo &
C. Ess (Eds.), The Handbook of Internet Stud- Koerbin, P. (2017) ‘Revisiting the World Wide
ies. Oxford: Wiley-Blackwell. pp. 24–42. Web as artefact: Case studies in archiving
Brügger, N. (2012) ‘When the Present Web is small data for the National Library of Aus-
Later the Past: Web Historiography, Digital tralia’s PANDORA Archive’, in N. Brügger
History, and Internet Studies’, Historical (Ed.), Web 25: Histories from the First 25
Social Research, 37(4), 102–117. Years of the World Wide Web. New York:
Brügger, N. (2013) ‘Historical Network Analysis Peter Lang Publishing. pp. 191–206.
of the Web’, Social Science Computer Laursen, D., and Møldrup-Dalum, P. (2017)
Review, 31(3), 306–321. DOI: 10.1177/ ‘Looking back, looking forward: 10 years of
0894439312454267 development to collect, preserve, and access
UNDERSTANDING THE ARCHIVED WEB AS A HISTORICAL SOURCE 29

the Danish web’, in N. Brügger (Ed.), Web 25: Histories from the First 25 Years of the World Wide Web. New York: Peter Lang Publishing. pp. 207–227.
Lin, J., Milligan, I., Wiebe, J., and Zhou, A. (2017) ‘Warcbase: Scalable analytics infrastructure for exploring web archives’, Journal on Computing and Cultural Heritage, 10(4), https://dl.acm.org/citation.cfm?id=3097570 [Accessed 19 Sep 2018].
List of Web archiving initiatives (n.d.) https://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives.
Meyrowitz, J. (1994) ‘Medium theory’, in D. Crowley & D. Mitchell (Eds.), Communication Theory Today. Cambridge: Polity. pp. 50–77.
Ntoulas, A., Cho, J., and Olston, C. (2004) ‘What’s new on the web? The evolution of the web from a search engine perspective’, in www2004, May 17–22, 2004. New York.
Rogers, R. (2013) Digital methods. Cambridge, MA: MIT Press.
Rogers, R. (2018) ‘Periodizing web archiving: Biographical, event-based, national and autobiographical traditions’, in N. Brügger & I. Milligan (Eds.), The SAGE Handbook of Web History. London: Sage. pp. 42–56.
Schneider, S.M., and Foot, K.A. (2004) ‘The web as an object of study’, New Media & Society, 6(1), 114–122.
Stevenson, M., and Ben-David, A. (2018) ‘Network analysis for web history’, in N. Brügger & I. Milligan (Eds.), The SAGE Handbook of Web History. London: Sage. pp. 125–137.
Webster, P. (2017) ‘Users, technologies, organisations: Towards a cultural history of world web archiving’, in N. Brügger (Ed.), Web 25: Histories from the First 25 Years of the World Wide Web. New York: Peter Lang Publishing. pp. 175–190.
Webster, P. (2018) ‘Existing web archives’, in N. Brügger & I. Milligan (Eds.), The SAGE Handbook of Web History. London: Sage. pp. 30–41.
Winters, J. (2017) ‘Breaking in to the mainstream: Demonstrating the value of internet (and web) histories’, Internet Histories, 1(1–2), 173–179.
3
Existing Web Archives
Peter Webster

Web archives are fast becoming the fundamental source with which the history of the Web is written. Scholars coming to them for the first time are in need of some orientation, however, since those archives are brought into being by many different organisations for varying purposes and by different means. Their scope and structure also vary widely, as do the means of first locating and then using them. This chapter begins with a brief historical sketch of the development of Web archiving over the last 20 years. It then moves on to outline the different means by which archives are created, and what implications those differences have for how they must be interpreted. It outlines the varied kinds of collections in existence, and the different questions of method that this variety raises for scholars. Finally, it details the means by which scholars may first locate archived Web content, and (once located) how it may be used. It is based throughout on the literature available but (as is the nature of very contemporary activity such as this) much of the business of Web archiving has yet to come under systematic scholarly scrutiny. As such, some of the perspectives here are those of a scholar who has also been a participant-observer in some of the developments described; it is in part the articulation of a kind of institutional memory.

A BRIEF HISTORY OF WEB ARCHIVING

Users of traditional paper archives have always known that to understand how the archive came into being is a prerequisite to using it well. (For an extended meditation on this theme, see Thomas et al., 2017). By which criteria was material selected for inclusion (if it was selected at all)? Who was it that devised those criteria, and whose particular interests might they have served, and which others have they occluded? These may seem obvious questions to ask, but users of digital resources have approached them in
different ways to traditional print and manuscript resources, with a critical engagement which, whether greater or less, is surely different in kind. And in order to answer those questions successfully, some acquaintance with the ‘cultural history’ of an archive is necessary. Who (or, which organisations) have created these Web archives, over which time period(s), and in response to which internal and external drivers: legal, economic, organisational? A full history of Web archiving as it has unfolded over 20 years remains to be written, but a brief historical sketch of what content has been collected, by whom and why, is in order here. What follows is in large part derived from my own sketch of the history (Webster, 2017b).

The most well-known of all Web archives was also one of the very first: the Internet Archive, which celebrated 20 years of operation in 2016. Uniquely, it began without any link – personal or institutional – to the library and archive sector. Brewster Kahle, the Archive’s co-founder and still its director, set up the Archive as a non-profit counterpart to his Alexa Internet, an early Web browser toolbar which both helped users navigate the Web and archived it based on those users’ actions. Although Alexa was subsequently acquired by Amazon, the Archive continued its work, motivated by a vision of a comprehensive library of the world’s content (Kimpton and Ubois, 2006; an ethnography of the Archive’s current work is Ogden et al., 2017). Given this maximally comprehensive policy of selection, the Archive’s holdings dwarf those of any other Web archive. By October 2016 it held archived copies of 273 billion webpages from 361 million sites, comprising 510 billion individual time-stamped digital objects (Goel, 2016b). At the same time, it operates without any broader institutional remit or legal or regulatory framework (except for US law in general) and is, in the final instance, answerable only to itself and (indirectly) to its users. The Archive has become a landmark in the information landscape, is very heavily used and has been foundational to the rest of the wider Web archiving community. However, it is under no formal obligation to continue either to collect or to provide access to its content. This is no criticism, but is merely to note the different conditions under which different archives are created.

The other Web archiving organisation with a globally comprehensive selection policy is the Common Crawl Foundation. Like the Internet Archive, its genesis was in the technology hub of California’s Silicon Valley, and (also like the Archive) it operates as a non-profit foundation. Founded in 2007 by former senior Google executive Gil Elbaz, it began comprehensive crawls of the Web in 2008, based on a mission to provide access to Web data at a scale previously only available to large corporations, in order to facilitate research, innovation in business, and use by the general public (Green, 2011).

At about the same time that the Internet Archive was founded, national libraries on three continents were also taking their first steps towards systematic archiving of the Web. In Canada, what was then the National Library of Canada (now part of Library and Archives Canada) instigated the Electronic Publications Pilot Project, which reported in 1995. In common with the majority of national libraries, the NLC had a remit to collect and make available ‘Canada’s published heritage’, now understood to include publications delivered on physical storage media such as disks, or via the Internet (National Library of Canada, 1996). The National Library of Australia was similarly charged with maintaining a comprehensive collection of materials relating to Australia and the Australian people. The PANDORA project was established in 1996, again as a natural extension of that older remit to take in material made available via the Internet. Faced with the need to obtain permission from the owners of websites to harvest their material, and a simple lack of resources, the NLA took a pragmatic decision to take a selective approach from the beginning (Koerbin, 2004: 1–2).

In Sweden, the Royal Library had been responsible for collecting, preserving and providing access to Swedish printed publications since 1661. Once again, the archiving of the Web as analogous to published works was viewed as a natural extension of that remit. In contrast to the Australian case, the Swedish approach was comprehensive, on the grounds that it was in fact more cost-effective than a selective approach, and also due to agnosticism about the relative potential value of different kinds of content: ‘[o]ne doesn’t know what information future generations will consider important’ (Arvidson et al., 2000).

Several countries have long-established systems of legal deposit that entitle national libraries or their analogues to receive copies of everything published within that country. In nations where print legal deposit was in force there have been moves to extend that legal framework to cover non-print content. This movement has of course been wider in scope than the Web, since it tends also to include digital objects that are more obviously ‘published’, in particular scholarly journals and electronic books. One of the first nations to implement such a new law was Denmark in 1997. The relevant act for New Zealand was the National Library Act of 2003, the same year as the Legal Deposit Libraries Act in the UK (Elliott, 2011; Field, 2004; Larsen, 2005). The relevant French legislation dates from 2006, and several others have since followed (Aubry, 2010).

World Web archiving has been shaped to a large extent by the key role taken by the national libraries, who were in close contact and collaborated extensively. In relation to legal deposit in particular, there was international collaboration from the first. A working group on non-print legal deposit was set up by the Conference of Directors of National Libraries, and worked between 1994 and 1996 (Field, 2004: 90). The International Internet Preservation Consortium was formed in 2003 by the Internet Archive and a group of national libraries and has been of vital importance in the growth of the movement (Illien, 2011). The IIPC has acted as a key clearing-house of ideas – technical, curatorial and organisational – as well as being a direct funder of education, research and technical development projects.

Alongside these large-scale and (sometimes) comprehensive approaches, there have been a number of organisations, large and small, that have taken on the task of archiving the Web for more specific purposes. Chief amongst these are university-based archives, acquiring content as part of their wider content development, to support students and staff. One early example, beginning in 2008, is the Columbia University Human Rights Web Archive, a project of the Center for Human Rights Documentation and Research, located within the Columbia University Libraries in New York (CHRDR, 2016). A rather different type of organisation that archives the content of other organisations is the Archive Team, a ‘loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage’ (Archive Team, 2016). The Archive Team began work in 2008 in response to a growing sense of the vulnerability of user-generated content to arbitrary closure by the platforms that hosted it (Scott, 2008, 2011). One such archive is that of the early online community Geocities, abruptly closed by Yahoo in 2009, which is now becoming an object of study for historians (Milligan, 2017: 140). On other occasions, interested citizens have acted to create Web archives in response to rapidly unfolding events of importance, such as the informal grouping of activist librarians that produced the Dale Askey Archive, relating to a lawsuit in Canada that involved issues of freedom of speech (Milligan et al., 2016).

Alongside these several ventures that preserve the content of others, the more recent past has seen an increase in the number of organisations taking steps to preserve their own content. Quite apart from the longer-term interest to scholars, organisations have tended to see the issue in terms of orderly management of closure (in the cases where a site is to be shut down), but also as a logical
extension of older practices of corporate record-keeping: as an aid to future decision-making, and (particularly in industries subject to statutory regulation) as a means of managing risk and as a resource in the case of litigation. Some such archives are publicly available, such as that of the Smithsonian Institution in the United States, or of the UK Parliament.1 However, at present, many of these archives are not open to public use, and as such are hard to document, but they would appear to be a feature of the scene that is growing in importance.

Public organisations operate under a different set of imperatives than private companies, and the final significant strand in the history of Web archiving is the archiving of government records. In some countries there is a clear division between a national library (dealing with the published record) and a national archive (dealing with the unpublished record of government). However, the archiving of government Web estate has not always been easy to place: of which organisation should it be the responsibility? The division has become particularly unclear since the move in several countries towards the delivery of government services on a ‘digital by default’ basis, particularly since 2011 (Lips, 2014). One of the earliest national archives to take on the task was the National Archives of the UK. The government had in 1999 decided that all newly created public records were to be stored and retrieved digitally by 2004, and subsequently also decided that all services to business and to the citizen should be delivered online by 2005. Websites used to deliver those services were to be considered as public records, and so the UK Government Web Archive was formally founded in 2003 (Brown, 2006: 178–9).

HOW IS A WEB ARCHIVE MADE?

Even as the Internet Archive and the national libraries were beginning their work in 1996, there was already a small but growing group of scholars who were studying the early Web and its predecessor technologies (Wellman, 2011). For these, there was no alternative but to act as their own archivist, and this strand of small-scale individual archiving has persisted alongside the larger-scale efforts of libraries, archives and other organisations. In some cases, this has taken the form of image-based records of the visual appearance of a page: either screenshot images or (more recently) using facilities within Web browsers to save a page as a PDF file. This approach involved the loss of much if not all of the functionality of a site. This problem could be avoided by the archiving instead of HTML files and associated objects. This approach, whilst it retained the functionality in the short term, was always vulnerable to potential loss both of the appearance of a resource and its functionality, as the ways in which Web browsers rendered content changed over time. All this having been said, there most likely exists a considerable body of highly significant early Web content in the possession of individual scholars, largely inaccessible to others.

For some researchers, interested in larger bodies of materials than single pages, there were developed various desktop applications that would allow larger-scale and finer-grained crawling of websites. One such was Wget, which allows the harvesting of whole sites and also recursive crawling: the following of a certain number of links from a seed URL. Crucially, in later versions Wget also allowed the output of archived files in the ISO standard WARC file format used by most libraries and archives. Storage in WARC is currently the standard means of avoiding the loss of functionality and appearance inherent in other approaches. Another commonly used tool was HTTrack, which additionally came with a graphical user interface where Wget must be operated from the command line. Most recently the Webrecorder service has allowed users to capture content as they browse it in their own browser, a different approach again.
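
The recursive crawling described above, in which links are followed outward from a seed URL up to a fixed limit, can be sketched in a few lines of Python. This is an illustration under stated assumptions only, and not a description of how Wget or any other tool is implemented: the link graph `LINKS`, the function `crawl` and the `max_depth` parameter are all hypothetical, and the pages are held in memory rather than fetched from the live Web.

```python
from collections import deque

# Hypothetical link graph standing in for live pages: each URL maps to
# the URLs it links to. A real crawler would fetch and parse each page.
LINKS = {
    "http://example.org/": ["http://example.org/a", "http://example.org/b"],
    "http://example.org/a": ["http://example.org/c"],
    "http://example.org/b": ["http://example.org/"],
    "http://example.org/c": ["http://example.org/d"],
    "http://example.org/d": [],
}


def crawl(seed, max_depth, links=LINKS):
    """Return the URLs reachable from seed within max_depth link hops,
    in the order in which a breadth-first crawl would visit them."""
    seen = {seed}
    order = []
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(url, []):
            if target not in seen:  # never queue the same URL twice
                seen.add(target)
                queue.append((target, depth + 1))
    return order


captured = crawl("http://example.org/", max_depth=2)
print(captured)
```

In this toy graph the page `http://example.org/d` lies three hops from the seed, so a crawl limited to two hops never captures it. The same holds for any archive built in this way: content beyond the configured depth is simply absent, whatever its significance.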

The most commonly used crawler for crawling at the scale of whole national domains is Heritrix, first developed by the International Internet Preservation Consortium and made available to the wider archiving community on an open source basis (Kimpton and Ubois, 2006: 211). Each of these scaleable alternatives allows efficient collection of material at scale, but presents the end user with additional questions of interpretation. Each particular crawl must begin with an initial list of seeds, the composition of which has some influence on the content which is collected and in which order. The harvested content is also influenced by other decisions taken when programming the crawl: were the captures from a particular domain capped at a certain file size? Were certain areas on a particular site excluded from the crawl? Did the crawl observe the robots.txt protocol, by which website administrators request (but cannot enforce) that their sites are not crawled? These matters are at present not widely documented in such a way as to aid user understanding of the composition of the final archive.2

Although the precise configuration of people and technologies varies, the several Web archiving organisations that conduct large-scale archiving with their own staff and infrastructure all must cover certain activities. (A useful early summary of the issues was Masanès, 2006.) Policies must be devised to specify which content should be archived, how frequently, at what depth and with which levels of quality assurance (how good a copy is good enough?). Both staff and information systems are required to record the implementation of those policies at an operational level. More staff (and often different individuals) are then required to operate the crawling process itself. Hardware is required to store the data once harvested; further applications to index it for searching, if required (the most commonly used are Apache Solr and Elasticsearch); and specialist expertise is required for long-term preservation of that data. In most cases, some means of making that content discoverable is also then required (of which more below). This often requires the integration of a new kind of data into existing library and archival catalogue systems, and the support of reference librarians who support users directly. New and different kinds of Web application are also often required to render archived content to users both locally and remotely. The most commonly used basic application is known as Wayback (or its open source sibling Open Wayback), but several institutions are now developing tools that go beyond its capabilities (for which, see the ‘Finding and using Web archives’ section, below). As such, operating an end-to-end operation requires a partnership of managers, librarians and/or archivists, and technology professionals specialising in Web crawling, indexing, network and hardware management, digital preservation and application development.

Operating a full end-to-end Web archiving operation at large scale, then, imposes a very particular set of costs on an archiving organisation. Although the costs for hardware and storage capacity are significant, those things are not in short supply; the major limitation is often that on the availability of technical staff. As the technologies used to deliver Web content evolve very quickly, so must the skills of those charged with obtaining successful captures of that content. In contrast, libraries and archives are often already well endowed with expertise in the selection of content for archiving, its description and the necessary quality assurance work. As such, there has developed a small but significant group of providers of outsourcing services for Web archiving. Typically, the lead organisation continues to select and describe content, whilst contracting with one of these providers to conduct the crawling, carry out indexing and provide access.

The various providers in this market may by and large be divided between non-profit organisations and private-sector providers, and between those providing open access to the final archives and those providing access restricted to certain groups (typically the
client and their staff). In fact, it has been the private-sector providers which have tended most often to provide restricted-access archives. These have most often met the needs of individual organisations, most particularly in regulated areas of the private sector, as a means of meeting regulatory requirements for record-keeping and for use in possible litigation. The non-profit providers are dominated by two players, both of whom generally provide openly accessible archives. The Archive-It service was launched by the Internet Archive in 2006, with a particular concentration on serving libraries, archives, universities and other public and third-sector clients, predominantly although not exclusively in North America. In Europe, similar services have been offered since 2004 by the Internet Memory Foundation, via its Internet Memory Research subsidiary. First established as the European Web Archive in Amsterdam in 2004, Internet Memory Research at the time of writing served several European clients from the library and archive sector (Brown, 2006: 18).

HOW ARE WEB ARCHIVES STRUCTURED?

This chapter has dwelt at length on the various means by which a Web archive comes into existence, and the cultural and institutional setting in which that creation occurs. No less important for interpreting the archive correctly is an understanding of its scope: which kinds of content were selected for inclusion. In some cases, notably those of the Internet Archive and Common Crawl, the question is at one level easy to answer: everything. Neither the Archive nor Common Crawl limits its collecting geographically to particular national or local domains, or on grounds of subject matter, or by the technological format in which content is delivered. This intention of comprehensiveness presents both an unprecedented opportunity and a potential trap for the unwary. The frequency at which individual pieces of content are archived varies significantly, and so the scholar must always reckon with the fact that content on the live Web may appear, be altered and indeed disappear without ever being visited by a crawler. Given this, great care must be taken not to equate comprehensiveness as a selection policy with completeness in the archive itself. To add to this indeterminate incompleteness – the ineluctable product of the nature of the medium itself – studies have shown that the unintended effect of the very nature of the crawl process has introduced systematic bias into the composition of the archive. Thelwall and Vaughan (2004) showed significant weighting in the Internet Archive towards certain countries, due to the nature both of the Web itself and of the crawl process. Web archivists in many institutions work hard to eliminate such biases as far as possible, but one of the key challenges in the next few years is for scholars and archivists to work together to understand the ways in which archives are created and the information which scholars need in order to be able to allow for the inevitable gaps and biases they contain.

Amongst those Web archive collections that are bounded by a particular selection policy, by far the largest are those relating to whole national domains, held for the most part by national libraries, and collected under legal deposit provisions. Such archives now exist for Denmark, France, Iceland and the UK, amongst others, and more are being added gradually to the list. Once again, however, some caution is required in understanding precisely how the national domain is defined, as the definition varies considerably between nations. For those nations with one or more country code top-level domains (such as .uk for the UK and .fr for France), registration within that ccTLD defines a particular domain as within scope for legal deposit. The matter is much less clear for nations whose political boundaries do not map precisely onto any one ccTLD: the United States, without any such
ccTLD of any size, is a case in point. In any case, studies show that a large proportion of domains published from within a particular nation are not registered in the local ccTLD, although neither the extent of nor the reasons for this are well understood (Milligan and Smyth, 2018; Webster, 2018). Recognising this, other means are also often used to determine whether a domain may be considered in scope, such as the geographic location of the servers on which content is hosted, or of the residence of the organisation or individual which registered the domain. In addition, some nations with a national language that is wholly or largely unique to that country, such as Denmark, have cast their legal deposit legislation to include content written in that language, even if published elsewhere. There remain broader questions about the nature of the nation that each individual disposition posits. However, a knowledge of the basic assumptions made is essential for scholars wishing to use national legal deposit archives.

To date it has been the national and supranational Web archiving initiatives that have attracted the most attention and attained the highest profile. Although established more recently, there are also many other collections that are demarcated by geopolitical divisions but focussed on smaller units. This has been a relatively weak tradition in Europe, although there are notable examples, such as in the Belgian city of Antwerp (Boudrez and Van den Eynde, 2002). In the United States, there is a somewhat stronger tradition of independent action by individual state archives, such as North Carolina (Martin and Eubank, 2007). Some of these initiatives have a broad remit to capture content relating to the affairs of a city or state more generally; in the case of North Carolina, the content is restricted to state government sites and related content.

Some Web archives are created and made available along different lines than the geographic. Partly due to the very particular technical issues involved in their capture, some archives are focussed solely on audio-visual content. Two prominent examples of this are the Netherlands Institute for Sound and Vision (Nederlands Instituut voor Beeld en Geluid), and (in France) L’Institut national de l’audiovisuel. Both have a national remit, and work closely with the respective national libraries. As a result, it is important that users understand the demarcation of the collections between the pair of institutions in each country.

Other means of demarcation include the sectoral: materials relating to a particular sphere of political, economic or cultural life. This chapter has already described the history of the archiving of national government Web estate, such as the UK Government Web Archive (part of the National Archives) or the Australian Government Web Archive (an initiative of the National Library of Australia). In some cases, an institution is of such particular and singular significance that its corporate archive is at the same time an archive of a whole sphere of activity: a case in point is the Parliamentary Web Archive in the UK. (The UK has only one parliament, after all.) One particular gap in current provision is of business archives: Web archives of the estate of private companies to match the extensive network of paper archives for the same corporations. This is not to say that such content is not being archived, but more that it is (so far) rarely made available for public and scholarly use, apart from within the mass of larger legal deposit collections.

Rather more various, and thus more difficult to interpret, are the many collections that are based on particular subjects. Alongside archiving under legal deposit, the national libraries also have a strong tradition of subject-based collecting. This has often been associated with significant political and cultural events, such as elections or major sporting events. These events, of a wide general interest, and of which the significance is not often in dispute, have served in many cases as a means of demonstrating the importance of Web archiving and of establishing a programme. One of the earliest collections in the UK Web Archive relates to the 2005 general
election; in 1996 the Internet Archive collected the sites of candidates for the presidency of the United States, in partnership with the Smithsonian Institution, and again in 2000 on behalf of the Library of Congress (Kimpton and Ubois, 2006: 202–3). The first collaborative collections created by the members of the IIPC related to successive Olympic Games.

These large general collections have tended to be projects undertaken by the national libraries. However, both the national libraries and university initiatives have focussed collecting on themes that are wider and longer-lasting than particular events. Examples within the UK Web Archive include collections relating to diasporic communities, health, video games and the practice of oral history. From within the universities, one of the earliest examples was the Digital Archive of Chinese Studies (DACHS), a German–Dutch venture (Lecher, 2006). Amongst the collections hosted by Archive-It are the Digital Art Web Archive (Cornell University), Resources in Religion and Theology (Princeton Theological Seminary) and the Innsbruck Newspaper Archive (University of Innsbruck).

These subject-based collections, selective in nature, present the user with a different set of methodological challenges to the comprehensive collections. Since the collections are selective, users must first understand the selection criteria used: criteria which are not always fully documented at the point at which users access the collection. It is also essential to understand the cases where a resource was selected for archiving, but could not be archived, either for technical reasons (that a successful capture could not be obtained) or for legal reasons (that, where the permission of the site owner was required, it could not be obtained). Without this information, robust analytical use of a whole collection is made more difficult.

FINDING AND USING WEB ARCHIVES

One of the consequences of the advent of digital resources has been a fundamental change in the relationship between the providers of objects for study (libraries and archives, by and large) and those who use those objects. Whatever intellectual use a reader might make of a manuscript or printed book, the means by which that object is brought to the reader’s desk does not affect how it is used. With digital resources, design decisions made during the creation of the applications that deliver digital objects to users affect what may or may not be done with those objects. As such, users need not only to engage critically with the object itself and how it came into being, but also with the systems by which they are able to gain access to it. Access to Web archives has been enabled in several different ways, each of which is in a state of constant development as technologies change.

The earliest and most pervasive means of accessing an archived object is by means of a search for its original URI, the unique string of characters that identified it online (often referred to as the URL, although the two are not strictly synonymous). Most crawler software produces the data necessary to enable this as a matter of course as part of the crawl process, often in the CDX format, a kind of finding list for locating an individual URI in a larger WARC file. URI search is the primary way of locating objects in Wayback. Indeed, the Wayback Machine (the main means of accessing the holdings of the Internet Archive) had URI search as its only means of access until 2016. URI search on its own, however, presents some challenges to use of the archived Web as an historical source as it presupposes that the URI of a resource, perhaps long since disappeared from the live Web, is either known by the user or can be discovered by some other means. This point is writ large in the work of Anat Ben-David on reconstructing the defunct domain of the former Yugoslavia; a case in which all the resources from an entire national top-level domain have gone from the live Web (Ben-David, 2016).

In digital scholarly resources for research at large, a standard means of access and one
which users now expect as standard is searching of the full text. This is widely available in Web archives, particularly those which are freely available online. However, at the time of writing, by no means all of the larger Web archives provide full-text search as standard, partly due to the not insignificant technical difficulties in creating the necessary indexes at the scale required. For its first 20 years the Internet Archive – the largest of them all – did not offer full-text search, and the functionality implemented in 2016 offers a search of only part of the full archive, based on the homepages of each domain (Goel, 2016a).3

Both of these modes of access presuppose that the key object of study in a Web archive is an individual archived object, page or site, replayed in a Web browser using Wayback or a similar graphical user interface. Attention amongst Web archives has relatively recently turned to means of representing the contents of Web archives at a more abstract level, and visually, in line with a more general trend towards visualisation of data both for research and in other spheres, not least journalism. The SHINE interface, developed by the British Library and now being used elsewhere, incorporated a visualisation of trends, a means of showing the frequency with which particular search terms appeared within each archived year. Experimentation is also at an early stage with ways to visualise the link graph, the patterns of hyperlinks between different parts of the Web.

Recent years have seen scholars widen their interest to include Web archives as data: either as ARC/WARC files (the standardised file format for Web archives), or metadata derived from them. Relatively few individual researchers have access to the size of local infrastructure necessary to handle data at scale. One response to this is that of Common Crawl, where access to (extremely large) collections of WARC files is via a third-party cloud storage service, on which researchers with the necessary programming skills can run their own scripts for a fee. As well as WARC files, there is now also a proliferation of derived data made available for download and local use. The CDX format, as well as being critical for the provision of a URL search service, also provides a data source with which to begin to analyse the contents of a Web archive at a more abstract level. The British Library has also begun to make available other kinds of derived data about its collections, not least the JISC UK Web Domain Archive Dataset. As well as the associated CDX files from this data, there is available derived data relating to postal codes, file format and host-to-host links, some of which is already being used by scholars (Webster, 2017a; Webster, 2018). Both Common Crawl and the Internet Archive have begun to allow users to exploit other file formats derived from Web archives, notably WAT files (Web Archive Transformation) and WET files (which contain extracted plain text); these too have attracted the attention of Web historians (Milligan, 2016).

Niels Brügger has noted the confusion that surrounds the use of the term ‘archive’ in the context of Web materials; the term ‘webrary’, whilst perhaps less elegant, captures better the rather stronger analogy that exists between websites and traditional publication (Brügger, 2016). However, the ways in which such ‘publication-like’ things may be integrated into traditional library catalogue systems are yet to solidify into conventions and standards that apply across different services, and much Web archive discovery is through dedicated user interfaces which stand alone. Within a Web archive, the particular object of study could be a particular individual website, or a related group of them (a ‘web sphere’); it could equally well be a single PDF document or image file. As such, the appropriate level at which Web archives should be described is more difficult to fix than for printed works. In addition, the traditional library catalogue assumes (for the most part) a relationship in which each individual object is catalogued. In the context of legal deposit archives, containing many millions of domains, and billions of individual resources,
such a relationship is inconceivable in the traditional sense. The vast bulk of archived Web resources will almost certainly never be manually described by a human being, although automated means may well be developed that infer certain descriptive characteristics of a resource from an analysis of its content. As a result, a user’s awareness of a Web archive needs to include an understanding of how and to what extent it has been described, and how that affects the ways in which individual resources may be discovered.

It is also the case that the means by which archives may be accessed varies between and within individual institutions. In many cases, archived content may be accessed freely online and between nations, such as is the case with the Internet Archive, or the Portuguese Web Archive (arquivo.pt). However, the situation is more complex in relation to legal deposit archives, such that in the British case, the Open UK Web Archive is freely available online whereas the UK Legal Deposit Web Archive is not. Most legal deposit dispensations are marked by a compromise between the opposing interests of the owners of content who wish to exercise their rights in that content and generate revenue from it, and scholars and the general public who have an interest in there being a public record of that content. As a result, the routine archiving of copyright material has entailed certain restrictions on access and reuse of that content. (For an example of these negotiations in the UK, see Green, 2012.) In other contexts, access is restricted not so much for these commercial reasons but in order to protect individuals to whom the content may refer. Whatever the underlying reason, access to legal deposit content is often restricted to the premises of the national library, as is the case in France and the UK. In other cases, such as Denmark, access may be gained away from library premises but only to academic users after a process in which their credentials are verified. Access arrangements to Web archives depend on the circumstances in which those archives were created.

CONCLUSION

This chapter has sought to outline some of the key patterns into which Web archiving has fallen over its first 20 years, in terms of organisations, techniques, collections and access. Along the way, it has raised several axes of necessary critical engagement for Web historians regarding the archived Web as a new class of primary source. Some of these issues have their analogues in print, manuscript or other sources; a scholar needs to understand who produced an object, whether it be a manuscript, a painting or an archived PDF. But some of the issues presented here are peculiar to the archived Web, and must be thought through afresh. As this chapter outlined, the technologies that are used to create archived Web resources fundamentally shape those resources, and so understanding those technologies is a prerequisite to understanding the archive. Crucial also is an understanding of how the archive is structured: along national lines, by the institution or sector that created the content, by file format or by a more general subject. Finally, users must also understand something of the means by which they discover, search within, view and analyse archived objects, since those means are both relatively new and in a state of flux and development. That thinking will be greatly enabled by close collaboration between scholars and archivists: a partnership of mutual benefit which shows welcome early signs of growth.

Notes

1  Smithsonian Institution: https://archive-it.org/organizations/660; the UK Parliament Web Archive is at http://webarchive.parliament.uk
2  An example of a well-documented crawl is that of the Internet Archive from 2011: https://archive.org/details/wide00002&tab=about
3  The limitations of this approach are evident in the case of homepages of organisations or individuals that do not have their own domains, such as blogs hosted by platforms such as Blogger or WordPress.
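
The role of the CDX index in URI search, described above under ‘Finding and using Web archives’, can be made concrete with a short sketch. The Python below is illustrative only. It assumes the common 11-field CDX layout (announced in a file’s header line as `CDX N b a m s k r M S V g`); the sample records, file names and function names are hypothetical and are not drawn from any real archive or tool. It shows how a finding list of this kind maps an original URI to the WARC file, byte offset and record length of each capture, and why a user who does not already know the URI gets nothing back.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple

# Assumed field order for the 11-field layout "CDX N b a m s k r M S V g".
# Real CDX files declare their own layout in the header line, so this order
# is an assumption of the sketch, not a universal rule.
FIELDS = ("urlkey", "timestamp", "original", "mimetype", "status",
          "digest", "redirect", "metatags", "length", "offset", "filename")


class Capture(NamedTuple):
    timestamp: str  # 14-digit capture time, YYYYMMDDhhmmss
    original: str   # URI as it was requested by the crawler
    status: str     # HTTP status code recorded at capture time
    filename: str   # WARC file holding the record
    offset: int     # byte offset of the record within that file
    length: int     # compressed record length in bytes


def parse_cdx(lines: Iterable[str]) -> Dict[str, List[Capture]]:
    """Build an in-memory finding list keyed by original URI."""
    index = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("CDX"):
            continue  # skip blank lines and the header line
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip rows that do not match the assumed layout
        row = dict(zip(FIELDS, parts))
        if not (row["offset"].isdigit() and row["length"].isdigit()):
            continue  # some indexes use "-" for unknown values
        index[row["original"]].append(Capture(
            timestamp=row["timestamp"],
            original=row["original"],
            status=row["status"],
            filename=row["filename"],
            offset=int(row["offset"]),
            length=int(row["length"]),
        ))
    for captures in index.values():
        captures.sort(key=lambda c: c.timestamp)  # oldest capture first
    return dict(index)


def lookup(index: Dict[str, List[Capture]], uri: str) -> List[Capture]:
    """Exact-match URI search; returns an empty list for an unknown URI."""
    return index.get(uri, [])


# Hypothetical sample records, invented for illustration.
SAMPLE = """ CDX N b a m s k r M S V g
org,example)/ 20050412093000 http://example.org/ text/html 200 AAAA - - 2048 1024 crawl-0001.warc.gz
org,example)/ 20010203040506 http://example.org/ text/html 200 BBBB - - 1987 512 crawl-0000.warc.gz
org,example)/about 20050412093100 http://example.org/about text/html 404 CCCC - - 311 4096 crawl-0001.warc.gz
""".splitlines()

if __name__ == "__main__":
    idx = parse_cdx(SAMPLE)
    for cap in lookup(idx, "http://example.org/"):
        print(cap.timestamp, cap.filename, cap.offset, cap.length)
```

The sketch matches URIs exactly, which makes the dependence on prior knowledge of the URI visible; production services typically search on a canonicalised key instead, a refinement omitted here.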
REFERENCES

Archive Team (2016) ‘Homepage’, retrieved 3 May 2016 from http://www.archiveteam.org.
Arvidson, A., Persson, K., and Mannerheim, J. (2000) ‘The Kulturarw3 Project: the Royal Swedish web archive – an example of “complete” collection of web pages’. Paper given at 66th IFLA Council and General Conference, retrieved 15 April 2016 from http://archive.ifla.org/IV/ifla66/papers/154-157e.htm
Aubry, S. (2010) ‘Introducing Web archives as a new library service: the experience of the National Library of France’, LIBER Quarterly, 20(2): 179–99.
Ben-David, A. (2016) ‘What does the Web remember of its deleted past? An archival reconstruction of the former Yugoslav top-level domain’, New Media and Society, 18(7): 1103–19.
Boudrez, F., and Van den Eynde, S. (2002) ‘Archiving websites’ [DAVID project report], retrieved 10 January 2017 from http://www.expertisecentrumdavid.be/davidproject/teksten/Rapporten/Report5.pdf
Brown, A. (2006) Archiving websites. A practical guide for information management professionals. London: Facet.
Brügger, N. (2016) ‘Webraries and Web archives: the Web between public and private’, in W. Evans and D. Baker (Eds), The End of Wisdom? The Future of Libraries in a Digital Age. Oxford: Chandos. pp.185–90.
CHRDR [Centre for Human Rights Documentation and Research] (2016) ‘Human rights Web archive’, retrieved 5 May 2016 from http://library.columbia.edu/locations/chrdr/hrwa.html
Elliott, A. (2011) ‘Electronic legal deposit: the New Zealand experience’. Paper given at IFLA conference, retrieved 1 April 2016 from http://www.ifla.org/past-wlic/2011/193-elliott-en.pdf
Field, C.D. (2004) ‘Securing digital legal deposit in the UK: the Legal Deposit Libraries Act 2003’, Alexandria, 16(2): 87–111.
Goel, V. (2016a) ‘Beta Wayback Machine – now with site search!’, Internet Archive Blogs, retrieved 12 January 2017 from http://blog.archive.org/2016/10/24/beta-wayback-machine-now-with-site-search/
Goel, V. (2016b) ‘Defining Web pages, Web sites and Web captures’, Internet Archive Blogs, retrieved 3 January 2017 from https://blog.archive.org/2016/10/23/defining-web-pages-web-sites-and-web-captures/
Green, A. (2012) ‘Introducing electronic legal deposit in the UK: a Homeric tale’, Alexandria, 23(3): 103–9.
Green, L. (2011) ‘Common Crawl enters a new phase’, retrieved 1 January 2017 from http://commoncrawl.org/2011/11/common-crawl-enters-a-new-phase/
Illien, G. (2011) ‘Une histoire politique de l’archivage du web’, Bulletin des bibliothèques de France, 2, retrieved 1 December 2013 from http://bbf.enssib.fr/consulter/bbf-2011-02-0060-012
Kimpton, M., and Ubois, J. (2006) ‘Year-by-year: from an archive of the internet to an archive on the internet’, in J. Masanès (Ed.), Web Archiving. Berlin: Springer. pp.201–12.
Koerbin, P. (2004) ‘Managing Web archiving in Australia. A case study’. Paper given at IWAW, 2004, retrieved 1 May 2016 from http://iwaw.net/04/
Larsen, S. (2005) ‘Preserving the digital heritage: new legal deposit Act in Denmark’, Alexandria, 17(2): 81–7.
Lecher, H. (2006) ‘Small scale academic Web archiving: DACHS’, in J. Masanès (Ed.), Web Archiving. Berlin: Springer. pp.213–26.
Lips, M. (2014) ‘Transforming government – by default?’, in M. Graham and W.H. Dutton (Eds), Society and the Internet. How Networks of Information and Communication are Changing our Lives. Oxford: Oxford University Press. pp.179–94.
Martin, K.E., and Eubank, K. (2007) ‘The North Carolina state government Web archives: a case study of an American government web archiving project’, New Review of Hypermedia and Multimedia, 13(1): 7–26.
Masanès, J. (2006) ‘Web archiving: issues and methods’, in J. Masanès (Ed.), Web Archiving. Berlin: Springer. pp.1–53.
Milligan, I. (2016) ‘Lost in the infinite archive: the promise and pitfalls of Web archives’, International Journal of Humanities and Arts Computing, 10(1): 78–94.
Milligan, I., Ruest, N., and St Onge, A. (2016) ‘The great WARC adventure: using SIPS, AIPS and DIPS to document SLAPPS’, Digital Studies / Le Champ Numerique, retrieved from https://www.digitalstudies.org/articles/10.16995/dscn.18/
Milligan, I. (2017) ‘Welcome to the Web: the online community of GeoCities during the early years of the World Wide Web’, in N. Brügger and R. Schroeder (Eds), The Web as History. London: UCL Press. pp.137–58.
Milligan, I., and Smyth, T. (2018) ‘Studying the Web in the shadow of Uncle Sam: the difficult case of the Canadian web sphere’, in N. Brügger and D. Laursen (Eds), The Historical Web and Digital Humanities: the Case of National Web domains. London: Routledge.
National Library of Canada (1996) ‘Electronic Publications Pilot Project (EPPP). Summary of the final report’, retrieved 22 April 2016 from http://epe.lac-bac.gc.ca/100/200/301/nlc-bnc/eppp_summary-e/ereport.htm
Ogden, J., Halford, S., and Carr, L. (2017) ‘Observing Web archives. A case for an ethnographic study of Web archiving’. Proceedings of WebSci ’17, Troy, NY, USA, 25–28 June 2017, https://doi.org/10.1145/3091478.3091506
Scott, J. (2008) ‘Eviction, or the coming datapocalypse’, retrieved 1 May 2016 from http://ascii.textfiles.com/archives/1617
Scott, J. (2011) ‘Presentation at personal digital archiving conference, Internet Archive’, retrieved 1 April 2016 from https://archive.org/details/PDA2011-jasonscott
Thelwall, M., and Vaughan, L. (2004) ‘A fair history of the Web? Examining country balance in the Internet Archive’, Library and Information Science Research, 26(2): 162–76.
Thomas, D., Fowler, S., and Johnson, V. (2017) The silence of the archive. London: Facet.
Webster, P. (2017a) ‘Religious discourse in the archived Web: Rowan Williams, archbishop of Canterbury, and the sharia law controversy of 2008’, in N. Brügger and R. Schroeder (Eds), The Web as History. London: UCL Press. pp.190–203.
Webster, P. (2017b) ‘Users, technologies, organisations: towards a cultural history of world web archiving’, in N. Brügger (Ed.), Web 25: Histories from the First 25 Years of the World Wide Web. New York: Peter Lang. pp.175–90.
Webster, P. (2018) ‘Lessons from cross-border religion in the Northern Irish web sphere: understanding the limitations of the ccTLD as a proxy for the national web’, in N. Brügger and D. Laursen (Eds), The Historical Web and Digital Humanities: the Case of National Web domains. London: Routledge.
Wellman, B. (2011) ‘Studying the Internet through the ages’, in M. Consalvo and C. Ess (Eds), The Handbook of Internet Studies. Chichester: Wiley-Blackwell. pp.17–23.
4
Periodizing Web Archiving: Biographical, Event-Based, National and Autobiographical Traditions
Richard Rogers

INTRODUCTION: HISTORIOGRAPHIES BUILT INTO WEB ARCHIVES

The purpose of this chapter is to periodize web archiving, in order to discuss four ongoing traditions that form an overlapping and layered history of the implementation and, more so, of the study and use of archived websites. The most contemporary period of web archiving, as with most recent periods generally, is perilous to characterize, but the self-archiving and selfie culture undertaken most visibly on social media platforms such as Facebook and Instagram appears to sit at the end of a spectrum that commenced with the Internet Archive and preserving single websites, and has witnessed a chronology of efforts that include both event-based as well as national web making. I argue that each of these periods corresponds to a particular historiographical tradition, inviting certain kinds of content-based history-writing with the web archives. The piece discusses not only the research that each tradition of web archiving affords but also approaches to studying web archives that (perhaps counterintuitively) are not based on a study of website content. Code-based analysis of archived websites allows for avenues of research both about web archiving – such as the historical reconstruction of the websites missing in archives (through an historical hyperlink network analysis) – as well as histories of the web that are broader social observations, such as the rise of tracking that gathers data on web users (through an over-time analysis of trackers and cookies embedded in archived websites). Ultimately the chapter concludes with a discussion of the so-called ‘crisis in web archive use’ and ways to address it, such as repurposing and building atop existing web archives.

To trace the history of web archiving, I argue, is to appreciate the distinctive historiographical points of view built into web archives since the mid 1990s: from the
biographical (or single site) and the event-based to the national and autobiographical traditions. A discussion of each would concern the implications for (web) history-writing as well as web archiving practices, including a call for the consideration of other scholarly uses of web archives. In order to address the ‘crisis in web archive use’, which primarily concerns their underutilization, one may look to such creative uses as making screencast documentaries of website histories, curating thematic or issue-based collections from existing archives, undertaking historical hyperlink analysis to conjure a past state of the web as well as examining the underlying code of archived websites (for cookies, for example, in order to study tracking or surveillance). It also may be of interest to expand upon archival regimes that privilege only the national and ‘official’ definitions of the public interest that drive selection (national methodologies), in order to enable historical projects that are currently unintended, such as a reconstruction of the Yugoslav web (Ben-David, 2016).1

Before developing the argument, I would like to mention four caveats about the periodization. The first is that I offer a contemporary rather than a sweeping periodization of longue durée such as the medieval, early modern, modern or postmodern. The time span is but two decades. The second is that periodization does not shutter one tradition as it calls another into ascendancy. Whilst epistemologically distinctive and related to specific touchstones in the history of web archiving, the biographical, event-based, national and autobiographical historiographies overlap in time. Each may endure, but one may lack the vibrancy of its early period, having experienced pioneer’s regress; it may have matured. The third caveat is that the periodization does not rest upon a single cause of change, whereupon one would follow back-end technology, front-end interface, institutional involvement or academic paradigm formation (to name a few), and be able to identify the triggering event from one area that prompted the next period. Moreover, the effort here is not to develop a framework that explains (social) change in web archiving outlooks, however much the four areas I just alluded to may be of assistance. Finally, the periodization itself is from a point in time, and as such could be rewritten anew in future, or itself be conceptualized as a period in theory, e.g., the early days of web archiving theory when initial thoughts were formed on how to periodize web archiving history.

The periodization rests on touchstones that in retrospect brought into being archival regimes or repertoires that inform the web historiographies on offer. Here the notion of archival regime refers to the work of Wolfgang Ernst, defining it as ‘not an idiosyncratic choice, but a rule-governed, administratively-programmed operation of inclusions and exclusions’ (2006: 114). These regimes inform the kinds of histories that are in some sense given by the archives, or afforded to be written, if you will. The archives are also historicizable, and may be studied for their periodicity, or reflecting a particular web time when they were created.

SINGLE-SITE HISTORIES, OR THE BIOGRAPHICAL TRADITION

The Internet Archive was founded by Brewster Kahle in 1996 to archive ‘everything’: ‘I usually work on projects from the “you’ve-got-to-be-crazy stage”’ (Reiss, 1996; Livingston, 2008; Kahle and Parejo Vadillo, 2015). The Sisyphean improbability of the ‘everything archive’ project (or its ‘craziness’) lies in at least two elements: one may never capture all of the web (at any one time), and as it grows one is presumably increasingly capturing less of it. At its very outset the Internet Archive presented itself less in the service of future generations, web and other history scholars, the court of law (and copyright infringement) or other use cases to which it subsequently lent itself.
Rather, it offered a solution to a major issue of its time, the ‘404 File Not Found’ problem (Kahle, 1997). Beginning at least in February 1998, Alexa Internet, shortened from Alexandria, made available a browser toolbar with a small button that would pulsate if the website visited on the live web was not found (Alexa, 1998). ‘Wayback’, as the button was named, remarkably supplied the missing page from the Internet Archive.

In exchange for receiving these offline webpages from the archive, one would agree to allow one’s surfing history to be logged so that the archive’s crawlers could visit the sites to archive them. That was the data exchange, in one of the earlier ‘free’ business models (Anderson, 2009). Through the Alexa toolbar, one also could retrieve information about the website one was visiting, such as how fresh (or stale) it was. One could also view related websites, the speed with which the website loaded and even its inlink count as a sign of its popularity or authority. The Wayback Machine (and the toolbar) are thus period pieces, and as such of interest to web historians in how they capture the mid 1990s, where loading speeds, broken links and related destinations were all considered of the moment for the ‘surfer’.

Indeed, even without the toolbar the Wayback Machine, launched in 2001, later maintained the web as a surfer’s medium rather than the searcher’s medium that came after it. The surfing continued within the archive itself. Rather than allowing broken pathways, the archive directed hyperlinks on archived webpages to take the user to the page closest in time to the original, or to the live webpage, if it had not been saved. This multi-temporal archival experience arguably made surfing into one of the Wayback Machine’s uses, together with various forms of historical reconstruction.

From a historiographical point of view, the Internet Archive, with its Wayback Machine, organized the history of the web in a specific manner. Whilst since 2016 one may search for keywords (that appear in URLs or in metatext), the Wayback Machine from the beginning invited one to type in a single URL, such as http://www.google.com, and navigate its history through a listing or calendar of archived instances as well as a timeline.2 The interface thus sees the web as a history of single websites (or more specifically URLs) that evolve through time; one may view their stories through the interface.3

The default setting on the timeline is a monthly impression, with a chunky arrow that invites the user to click through the changes to the website at that historical pace, though through the calendar there is also the opportunity to choose specific dates as well as times of the day of those frequently archived sites. The ones archived most frequently are likely from ‘focused crawls’ rather than the ‘broad crawls’ that capture the vast majority of distinct sites in the archive (Kimpton et al., 2003; Sigurðsson, 2005).

EVENT-BASED SPECIAL COLLECTIONS, OR EVENT-BASED HISTORIOGRAPHY

The second period’s touchstone is the Webarchivist project, and its readiness when 9/11 transpired (Webarchivist, 2001). There were collection concepts (‘web sphere’), server capacity and researchers in place when the airplanes struck the World Trade Center and the Pentagon, out of which came not only the pioneering 9/11 web collection but also solidified event-based web archiving, an idée fixe in the selection repertoire of web archives. The historiographical tradition it spawned is thus based on collecting websites around events, predominantly elections, disasters and (papal and presidential) transitions. It actually began with the Internet Archive’s first project (with the Smithsonian), archiving the 1996 US presidential elections, and the Webarchivist project followed in those footsteps with its plans to capture the 2002 US congressional and other elections, when disaster struck on September 11, 2001. With this
agile, just-in-time archiving, they were able to amass some 25,000 websites (over some four months), now part of the US Library of Congress’s ‘digital collections’. Subsequently, the Webarchivist project continued to collect websites around elections and further expanded its disaster collection work with the special collection surrounding the Asian Tsunami of 2004 (1,500 websites).

Event-based archiving of ‘elections and disasters’ owes greatly not only to the Webarchivist project but also to a particular pioneering collection technique. ‘Web sphere analysis’ is a demarcation technique that defines a collection space substantively, temporally as well as web-topologically. Foot and Schneider describe a web sphere as ‘a set of dynamically defined digital resources, often connected by hyperlinks, spanning multiple websites relevant to a central event, concept or theme and bounded temporally’ (Foot and Schneider, 2002: 225; Schneider and Foot, 2005). Akin to Kahle’s ‘everything’ collection which necessarily evolves as it discovers new websites, the web sphere definition is also innovative for its webbiness, allowing for its dynamic collection through discovering new websites through hyperlink analysis. It is also historicizable, having borrowed from the emergence at the time of hyperlink mapping techniques to map networks of websites around the same issue (Rogers, 2012). The project also was contiguous with contemporaneous assumptions of the metaphysics of the web, portions of which are organized (and readily archivable) as spheres, like (in its day) the up-and-coming blogosphere.

NATIONAL WEB ARCHIVES, OR NATIONAL HISTORIOGRAPHY

The third derives from not so much a moment but rather a long march of the national institutions onto the web, and the growing acceptance of the very idea of the ‘Danish part of the internet’, or the Swedish part or the Czech part, to be archived separately, either by law or custom (Schostag and Fønss-Jørgensen, 2012). Rather than a cyberspace (where everything can be collected) or a sphere (where a thematic event is to be located), from the national point of view the web is carved into relatively tidy, geographical (content) portions for national institutions keeping public records as well as national heritage. In such an archival regime it would be remarkable for national institutions to collect subaltern materials that archive more than national content, be it (in order of importance) from the top-level country domain, lists of nationally significant websites and social media pages, events as well as language detection.

The third corresponding historiographical tradition is thus the national, which began (as with the event-based tradition) as an Internet Archive service to Swedish and Icelandic national libraries, before they and other national libraries settled into their own collecting in the mid 2000s. The Internet Archive would crawl a top-level country domain and make it available to the national library for preservation and access, whereas nowadays, in a turn of the tables, national libraries crawl their ‘own’ domains, without subsequently uploading them as collections to the Internet Archive (often out of concern for copyright infringement, which also keeps them offline and accessible only on site), with the exception of Portugal, which both uploads and makes accessible its archive online (Internet Archive, 2017). Indeed, one of the stated aims of the national web archiving projects is to reduce dependence on foreign services (Gomes et al., 2008). Most other national web archiving initiatives have gradually developed their own archiving capacity, with particular definitions of what constitutes the national.

National libraries and archives are tasked with archiving public records as well as other content of national interest, so the question asked with regard to web archiving by national libraries has to do with what constitutes public interest. What should count, methodologically, as valuable ‘national
content’? Archiving traditions are often adjoined to the technical infrastructure of the web,
so the national becomes the websites using the top-level country domain as well as the
language, together with public interest and heritage definitions. Along these lines the
national libraries have existing appraisal and selection traditions as well as vehicles to
undertake this kind of work, such as national deposit laws, which obligate archiving. Some
countries, such as the Netherlands, do not have such deposit laws, but have similar principles
concerning collecting and preserving the public record as well as cultural heritage. Whether
with or without deposit laws, one could argue the collection approach preserves some
combination of public record for official history and heritage for national history.

Amongst the influential definitions of the constitution of heritage is the Danish, which
collects and stores ‘Danica’ or national cultural heritage, but also incorporates the history
of web archiving into its regime, collecting a couple of events per year. For the Danes, as
well as for other national libraries, websites that use the top-level country domain are to be
archived as well as those intended for a Danish audience, and written in Danish. Websites
about the Danish people, significant Danish personalities or well-known figures, or simply
about Denmark (no matter the language) also would be candidates for archiving. Therefore, if
there is a website in English about the nineteenth-century Danish writer Hans Christian
Andersen, that website could be included. In the crawling regimes, apart from regularly
archiving significant websites and periodically archiving lesser ones, there are also 2–3
‘events’ (such as Danish elections) that Netarkivet, the archiving authority, is prepared to
save. The Dutch (without the legal deposit) make use of a similar definition of ‘the
national’, as do the Portuguese and French, whilst the latter is somewhat broader in its
special collections – apart from the French elections, examples of special collections are
more expansive than events, such as ‘blogs, sustainable development, Web activism’ (BnF,
2017). In all the descriptions of collection-making by national libraries, the national
interest takes precedence.

Such an archiving routine editorializes the web as a national story, and moves web archivalism
far away from the Internet Archive’s initial approach of crowdsourcing URLs through people
installing a toolbar in a ‘grab them all’ tradition, or the hyperlink analysis for
event-based, ‘web sphere’ collections. It often turns ‘events’ largely into national ones.
These days an Asian tsunami likely would not be archived by a European national library. As a
case in point the last international event collected by the UK web archive is from 2004; the
Danish archive appears increasingly to concentrate on national events, with the exception of
international collaborations with, for example, the Czech archive and its Vaclav Havel
collection.4

AUTOBIOGRAPHICAL ARCHIVING

Finally, with the rise of social media (especially Facebook) and mobile platforms challenging
the web as the dominant online content and activity space, has come the lessened capacity to
archive it, or as archival studies scholars have put it: ‘the responsibility of archiving
Facebook data [lies with] individual users’ (Sinn and Syn, 2014); hence the notion of the
autobiographical. In practice, each of the three traditions mentioned above has the means to
retain some social media (and occasionally does archive Facebook ‘pages’ such as We are all
Khaled Said, crucial in the Egyptian Revolution of 2011). That archiving, however, does not
pertain to individual profiles after login.

This fourth tradition (loosely defined) is the development of the capacity of archiving
oneself, especially one’s social media use, which is outside the purview of the other web
archiving traditions. One could refer to such activities as life blogging and quantifying
oneself as precursors to self-archiving online, since these, often health-related, authoring
practices are often in the style of public diaries, worthy of saving, like eighteenth-century
chapbooks and other ‘popular’ sources of everyday life (Darnton, 2009). ‘Mommy’ as well as
‘DIY fatherhood’ blogging are examples from the web, as are the ‘wounded healers of Instagram’
from social media (Ammari et al., 2017; Sanchez-Querubin, 2017). Posting entries on Facebook
(and even Snapchat) could be construed similarly; however it is the request for the data dump
that could be understood at least initially as self-archiving. That is to say, awareness of
the capacity to create collections of one’s own history first relied on knowing one’s digital
rights (so to speak) and asking companies to comply with them. There have arguably been larger
developments in terms of opportunities for self-archiving, and I would like to discuss four
recent ones.

The first development concerns health apps, and the data retained from fitness bands and
watches such as the Apple Watch. Here there are opportunities for ‘personal data requests’
that relate less to one’s writings than to one’s physical state. The second development
concerns the movement of activity from the web to smartphones, and the question as to what to
archive. Apart from making software app collections (practiced by the Internet Archive, for
example), there is a shift in collection from ‘content’ to data. One may access mobile-related
activity data through an API, or application programming interface (Littman et al., 2017).
Such is a more general strategy for social media archiving, be it originating from the web or
from a mobile device (Thomson, 2016). The third development, recording a period of one’s life
in social media, is exemplified by the social media artistic ‘performance’ by Amalia Ulman
called ‘Excellences & Perfections’ (2014). In a period of six months Ulman documents on
Instagram her aspiration to become a Los Angeles ‘It girl’, gaining followers as she moves
through aspirational stages of becoming from ‘trying to be discovered’ to ‘bad girl’ and
finally to ‘healthy lifestyle’ before (surprisingly) posting ‘the end’. Rhizome, the digital
arts group, in producing this work of art, introduced its Web Enact software to capture such
coming of age or other personal developments on social media. Later called webrecorder.io, it
allows one to ‘record’ a social media page. This approach stands in contrast to requesting a
data dump, downloading or taking screenshots of the website, or tapping into the API for data.
Finally, I would like to briefly mention ‘selfie archiving’, however much it has been
undertaken by neither self-archivers nor web archivists. Researchers have created collections
of Instagram photos hashtagged #selfie. In one project by TIME magazine, the goal was to crown
the ‘Selfiest Cities in the World’ (Wilson, 2014). The researchers created a database of
400,000 Instagram photos that were tagged #selfie, determined each selfie’s geolocation and
created a ranked list of cities, where in the event Makati City and Pasig in the Philippines
had the most selfies per 100,000 people, followed by Manhattan, Miami and so forth; the list
is dominated by Asian and North American cities that were subsequently plotted onto a map
showing selfie density. Conducted by a team of researchers led by Lev Manovich, the second
selfie archiving project – Selfiecity – created selfie collections from New York, São Paulo,
Berlin, Bangkok and Moscow (using #selfie and city geolocations in the queries). The
researchers subsequently measured the formal properties of the faces (tilt of the head, smile,
etc.). Amongst other findings, the project found that selfies are largely an undertaking for
23- to 26-year-olds, though in Moscow they tend to trend older and in São Paulo younger.
Furthermore, in Moscow people are slightly gloomier, whereas in São Paulo happier. Thus, the
project outputs city mood gradations. It should be noted that Instagram (and the moody
selfies) are accessible on the web, though most users are mobile-based (on smartphones),
pointing up a number of issues such as the movement of users and their
content production from the web to the app space and also how web archiving may assist with
mobile content archiving.

What are the prospects for archiving social media institutionally? With respect to the
single-site or biographical tradition (and the contents available via the Wayback Machine),
attempts to crawl and archive social media as websites are fraught with issues. Whilst hardly
clear-cut, one could put forward a distinction between social media that are principally
social networking sites or user-generated content platforms, and expect the content platforms
to be more open to crawling and archiving than the social networking ones (Thomson, 2016) (see
Table 4.1). In the event, Facebook and LinkedIn (seeing themselves as principally social
networking sites) do not allow archive crawlers, having opted out through placing a robots.txt
file respected (and archived) by the Internet Archive; most other platforms allow the
archiving of their ‘about’ pages, terms of service, FAQs and other non-password-protected
webpages.5 With respect to archiving the user-generated content, platforms (Flickr, Pinterest,
Reddit, YouTube) allow archiving of their materials, though in certain cases the materials are
personalized (and ranked), so there is skewing.

The overall exception is Twitter, which donates its historical tweets (and archive) to the
Library of Congress, where they are stored. The girth of such big data, however, has proven
too great to make it available to researchers (Zimmer, 2015). Tweet collections also may be
made (but not shared with others) with software tools, such as the Digital Methods
Initiative’s TCAT (Twitter Capture and Analysis Tool), which queries Twitter’s streaming and
REST APIs for hashtags, keywords and user accounts (as well as @mentions). US President Donald
Trump’s tweet collection (i.e., just his user account’s tweets) is significant as are
collections made from the US presidential elections of Hillary Clinton and Trump supporters,
respectively. There are also hashtag collections, such as the infamous #iranelection (around
which a great deal of debate ensued concerning the very idea of a ‘Twitter revolution’). Given
rate limits, questions of collection completeness arise; there is a Twitter-owned service
(GNIP) that provides historical tweet sets at often exorbitant prices.

Each of the other traditions sketched above would tackle archiving social media somewhat
differently. From the event-based tradition, archiving of Facebook pages (rather than
individual profiles) has been practiced, especially in the case of the Egyptian Revolution of
2011 (Runyon and Houlihan, 2012; Urgola and Runyon, 2016). In the national tradition,

Table 4.1. Select social media platforms with principal functions in order of importance
(adapted from Thomson, 2016)

Platform    Year launched   URL               Principal functions in order of importance
Facebook    2005            facebook.com*     social networking, user-generated content
Flickr      2004            flickr.com%       user-generated content, social networking
Instagram   2010            instagram.com%    user-generated content, social networking
LinkedIn    2003            linkedin.com*     social networking, user-generated content
Pinterest   2010            pinterest.com%    user-generated content, social networking
Reddit      2005            reddit.com#%      user-generated content, social networking
Twitter     2006            twitter.com#%     user-generated content, social networking
YouTube     2005            youtube.com#      user-generated content, social networking

* robots.txt exclusion file in place
# ranked content archived
% terms of service archived
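
The robots.txt exclusions marked in Table 4.1 can be checked mechanically. What follows is a minimal sketch, not the Internet Archive's own procedure: it uses Python's standard `urllib.robotparser` on a made-up exclusion file. The sample rules, the `example.org` URLs and the helper name `crawler_may_fetch` are illustrative assumptions, as is the use of `ia_archiver` as the user-agent token for the archive crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, written for illustration only. It shuts out one
# crawler entirely and keeps every other crawler away from /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def crawler_may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "otherbot"):
        for path in ("/about", "/private/profile"):
            url = "https://example.org" + path
            print(agent, path, crawler_may_fetch(SAMPLE_ROBOTS_TXT, agent, url))
```

A crawler that respects the file would skip every URL for which the function returns False; software set to ignore robots.txt simply would not make this check.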
one may wish to archive national public figures and official institutions (Facebook pages as
well as Twitter feeds), as is practiced by the UK National Archives, for example. As mentioned
there is also a movement afoot to shift attention from the HTML to the data, and tap into and
archive the API streams offered by the social media companies. One could imagine,
methodologically, that there would be ‘everything’, ‘event-based’ as well as ‘national’
strategies for API polling. At least with Facebook, which closed down the availability of
extracting personal information, including one’s own as well as that of friends with its 1.2
API version (mid 2015), the full autobiographical (all data about oneself across Facebook)
still would rely on the data dump, though Archive-It’s software would allow one to grab one’s
own profile page, and webrecorder.io affords the means to capture its dynamic history (Rieder,
2015).

ADDRESSING THE CRISIS IN WEB ARCHIVE USE

In 2010 web archiving theorists referred to a crisis in the scholarly use of web archives, for
there were so few users (Dougherty et al., 2010). The crisis has been illustrated through
regular queries in Google Scholar and Google Web Search for the use of the citation preferred
by the Library of Congress; for some years now, typing ‘Archived in the Library of Congress
Web’ into Google Scholar has returned very few results (Rogers, 2013). The search engine
returns are mainly self-citations. The Library of Congress Web has largely special collections
in the event-based tradition, which could account for the low usage. In the single-site
tradition, the employment of the Wayback Machine of the Internet Archive in scholarly work is
greater; however, most articles are about Wayback URLs as references in legal cases and
academic papers (and how Wayback combats live link rot), rather than as sources for web or
digital history, with some notable exceptions (Musso and Merletti, 2016; Chakraborty and
Nanni, 2017). In the national tradition, most national libraries have very few visitors to
their web archives, given access policies. The computers allocated to web archive use by the
National Library of France often lie idle. Annual users of the web archives in Denmark and the
Netherlands number perhaps in the double digits.6

How might one address the crisis? Are there signs that it may be abating? Scholars have
introduced ways and means to increase the use of web archives, such as making available full
text search, building tools to visualize the contents of web archives, treating web archives
as big data, concatenating archives with a common interface or promulgating the creation of
one’s own archives (Hockx-Yu, 2011; Padia et al., 2012; Huurdeman et al., 2013; Memento, 2015;
Archive-It, 2017; Cowls, 2017; Meyer et al., 2017). For (contemporary) historians working with
(web) archivists, one may wish to make one’s own archive with the Archive-It service, and
study it, as in the case of the Egyptian Revolution of 2011 (the uprising of the 25th of
January), a project of the American University in Cairo. The historical account of the rise
and fall of an Islamic punk scene is also illustrative of the approach of making a collection
to study it (Dougherty, 2017).

Another general approach, described below, is making creative use of existing archives with
digital methods (Rogers, 2009). By a digital methods approach is meant the repurposing of
dominant devices and platforms for social and medium research (or, in this case, digital and
web history). How to repurpose the Wayback Machine’s output of single websites? One captures a
website’s history and plays it back in the style of time-lapse photography as a screencast
documentary with voiceover (Rogers, 2017).7 One example is the history of google.com, called
‘Google and the Politics of Tabs’ (Rogers and Govcom.org, 2008). Using the Wayback Machine,
all unique historical instances of google.com are captured, i.e., those pages
that contained changes as signified by an asterisk on the Wayback Machine’s (classic) user
interface. These are loaded into a moviemaker (QuickTime), and then played back to choose
themes for a voiceover. The story is told by noting the gradual changes made over time to the
front page of google.com, especially to the tabs above the search box, where different
services came and went, such as the Google directory, or the human-edited listing. After
appearing with great fanfare in 2000 it was gradually relegated to the ‘more’ button, and then
placed behind ‘even more’, until the directory was finally removed altogether in 2004. Other
services made it all the way to the front-page interface before disappearing, or have staying
power, such as images (Google Image Search). The story told is a web history: in the
screencast documentary of google.com, there is an evident, gradual decline of the directory
over time (and with it the human librarian), and the simultaneous rise of the algorithm or the
back end of search. Other examples of screencast documentaries have been made for media
history (e.g., the evolution of a newspaper on the web) as well as digital history (the
evolution of the US White House, whitehouse.gov).

The second contribution to web archive usage is to create one’s own thematic collection of
websites from one or more existing archives, first demonstrated by the Dutch newspaper, the
NRC Handelsblad (Dohmen, 2007). By largely using the Wayback Machine, the researchers made a
collection of Dutch right-wing and right-wing extremist websites, and devised a keyword query
strategy to answer the question: is Dutch culture hardening and becoming more extreme? To
answer this question, they examined how the language on the websites changed over time,
demonstrating that on the right-wing websites in particular, the language used was becoming
increasingly extreme, approximating that on the right-wing extremist sites. The researchers
cautiously concluded that Dutch culture is hardening, thereby making a contribution, however
modest, to social history. Here it should be noted that such a project could not be realized
with a national archival regime that saves ‘significant’ websites in the public interest for
heritage or, for that matter, an event-based approach. Being able to make a collection of
extremist sites would demand an ‘everything’ approach, augmented perhaps by combing multiple
national web archives that sometimes stray from the national (heritage) methodology.

The third example is a project that again creates a thematic collection of websites within
existing web archives, and conjures a past state of the web, so as to study the missing web
and its significance through historical hyperlink analysis. The past state in question is the
early blogosphere. There is an archived website called Eaton Web, which for many years had an
authoritative list of blogs, and it was considered the portal (or leading directory) for the
blogosphere (albeit largely American and English-language ones). In the summer of 2001, the
website owner, Eaton, gave up listing new blogs because of abundance. To the researchers this
was the sign that the period of the ‘early blogosphere’ was over. Eaton’s last list was batch
queried in the Wayback Machine, and it was found that only 20% or so of blogs were missing
(see Figure 4.1). Consequently, historical hyperlink analysis was performed, which enabled the
researchers to determine the significance of each of the websites in the blogosphere,
including the missing ones, according to network measures. Thus, all the sites were given
historical context, and the past ‘blogosphere’ was depicted (or conjured) as a network. Using
a similar technique of historical hyperlink analysis, the evolution of the Dutch blogosphere
has also been mapped (Weltevrede and Helmond, 2012).

Finally, the fourth approach is to study not the content of a single website, a thematic set
over time, or a past state of the web through hyperlink analysis (including the unarchived
sites), but rather the underlying code of one or more websites. This is a technique colleagues
and I discovered coincidentally by

Figure 4.1 Early blogosphere, with missing archived websites. Collection based on
Eatonweb. Digital Methods Initiative, Amsterdam, 2009.
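
The two steps behind Figure 4.1 (finding which listed blogs are missing from the archive, then ranking every listed blog by the links it receives from archived pages) can be sketched as follows. This is a toy illustration under stated assumptions, not the project's actual tooling: the blog names and link pairs are invented, the archived set stands in for the result of a batch lookup in the Wayback Machine, and the inlink count is only one simple choice of network measure, since the text does not say which measures were used.

```python
from collections import Counter

# Hypothetical data. LISTED stands in for the last directory list, ARCHIVED
# for the subset found in a web archive, and LINKS for (source, target)
# hyperlinks extracted from the archived pages.
LISTED = ["blog-a.example", "blog-b.example", "blog-c.example",
          "blog-d.example", "blog-e.example"]
ARCHIVED = ["blog-a.example", "blog-b.example", "blog-c.example",
            "blog-d.example"]
LINKS = [
    ("blog-a.example", "blog-b.example"),
    ("blog-a.example", "blog-e.example"),
    ("blog-b.example", "blog-e.example"),
    ("blog-c.example", "blog-e.example"),
    ("blog-d.example", "blog-a.example"),
    ("blog-c.example", "blog-c.example"),  # self-link, ignored below
]


def missing_share(listed, archived):
    """Return the set of listed sites absent from the archive and their share."""
    listed = set(listed)
    missing = listed - set(archived)
    return missing, len(missing) / len(listed)


def inlink_counts(links, sites):
    """Count distinct inlinks per site, including sites that are not archived."""
    sites = set(sites)
    counts = Counter({site: 0 for site in sites})
    for source, target in set(links):
        if target in sites and source != target:
            counts[target] += 1
    return counts


if __name__ == "__main__":
    missing, share = missing_share(LISTED, ARCHIVED)
    counts = inlink_counts(LINKS, LISTED)
    print("missing:", sorted(missing), "share:", share)
    for site, n in counts.most_common():
        print(site, n, "(missing)" if site in missing else "")
```

In the toy data the unarchived blog receives the most inlinks, which is the kind of finding the approach is meant to surface: a site can be absent from the archive yet still be placed in the network through the links that archived sites made to it.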

using the browser add-on Ghostery whilst visiting historical websites in the Wayback Machine.
Ghostery shows third-party elements embedded in webpages when visited, including trackers,
third-party cookies and so forth. With Ghostery enabled whilst visiting an archived webpage,
one can view historical trackers (van der Velden, 2014; Deville and van der Velden, 2015).
These trackers may be captured over time for specific websites, showing for example the
history of tracking behavior by The New York Times (Helmond, 2015) (see Figure 4.2).

CONCLUSIONS: FROM SINGLE SITE HISTORIES TO HISTORICAL NETWORK ANALYSIS

The effort here is to periodize web archiving, in order to describe how archival regimes and
periodicity produce distinctive historiographical traditions. Historicized and described are
the single-site (or website biography) approach built into the interface of the Internet
Archive (mid 1990s), the event-based tradition from the Webarchivist project (and particularly
the 9/11 archive), the long march of the national institutions and heritage methodologies
making their own national webs (mid 2000s), and a variety of efforts to save social media, and
particularly individual accounts through recording (as well as data dumps) in the
autobiographical tradition (early 2010s). There is a distinctive chronology of web archiving
with particular key moments per tradition, such as the Alexa toolbar, the ‘web sphere’, the
‘Danish part of the web’, and webrecorder.io, but each archiving regime and associated
historiographical tradition continues.

The dominant approaches to web archiving and their in-built historiographies do not

Figure 4.2 Trackers embedded in The New York Times. Output from Tracker Tracker tool
showing trackers embedded in archived newspaper webpages over time. Digital Methods
Initiative, Amsterdam, 2012.
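
Detecting trackers in the code of an archived page, as in Figure 4.2, amounts to listing the third-party hosts a page embeds and matching them against a tracker database. The sketch below is an illustration only and is neither Ghostery nor the Tracker Tracker tool. The tracker list, the sample HTML and the function names are hypothetical, and the regular expression assumes that archived pages rewrite embedded URLs to the form `/web/<timestamp><modifier>/<original URL>`.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical stand-in for a tracker database.
TRACKER_DOMAINS = {"tracker.example", "ads.example"}

# Assumed shape of rewritten URLs in archived pages, e.g.
# /web/20100101000000js_/http://cdn.tracker.example/t.js
WAYBACK_PREFIX = re.compile(r"^(?:https?://web\.archive\.org)?/web/\d+[a-z_]*/")

SAMPLE_HTML = """
<html><body>
<script src="/web/20100101000000js_/http://cdn.tracker.example/t.js"></script>
<img src="https://web.archive.org/web/20100101000000im_/http://news.example/logo.png">
<iframe src="http://ads.example/frame"></iframe>
<script src="/web/20100101000000js_/http://static.news.example/app.js"></script>
</body></html>
"""


class EmbeddedHostParser(HTMLParser):
    """Collect the hosts of resources embedded via script, img and iframe tags."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in {"script", "img", "iframe"}:
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Strip the archive's rewriting prefix to recover the original URL.
        original = WAYBACK_PREFIX.sub("", src)
        host = urlparse(original).hostname
        if host:
            self.hosts.add(host.lower())


def find_trackers(html, page_host, tracker_domains=TRACKER_DOMAINS):
    """Return embedded third-party hosts that match a known tracker domain."""
    parser = EmbeddedHostParser()
    parser.feed(html)
    third_party = {h for h in parser.hosts
                   if h != page_host and not h.endswith("." + page_host)}
    return {h for h in third_party
            if any(h == d or h.endswith("." + d) for d in tracker_domains)}


if __name__ == "__main__":
    print(sorted(find_trackers(SAMPLE_HTML, "news.example")))
```

Running the same function over captures of one page from successive years would yield the per-year tracker sets from which a history of tracking on that site could be charted.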

preclude others, however much (with the exception of the Internet Archive and some smaller
projects) the emphasis is increasingly on the public interest remit of a heritage
institution.8 As a result, one is able to write national (web) histories from the contents
stored by the national institutions. Such an emphasis, however, would preclude projects such
as the reconstruction and transformation of the Iraqi web before, during and after the Iraq
War, 2003–11. Given the refrain concerning the urgency of archiving the web and born-digital
content, a rejoinder would add: when does it become urgent to archive other cultures than
one’s own (Beunen and Schiphof, 2006)?

Web archives are also underutilized. Many can only be accessed from a library reading room;
for the German national web archive, for example, one would sit behind a national library
computer in Frankfurt or Leipzig. Few do, creating an air of crisis in web archive use.
Brightening the archive is the task of scholars researching archive use, and building tools
atop. Other engagement comes from making one’s own (with Archive-It) or curating collections
from existing archives. Indeed, amongst the creative use of archives for scholarly purposes is
building a thematic collection (e.g., right-wing and right-wing extremist websites in the
Netherlands) and posing research questions about (the hardening of) culture. Other uses of web
archives put forward in the digital methods approach above include the technique of creating a
screencast documentary of the history of a website, conjuring a past state of the web through
historical hyperlink analysis and examining the underlying code of archived websites and
fishing out trackers and third-party cookies with the aid of a tracker database, in order to
put forward a history of (online) surveillance or behavioral targeting. Each approach rests on
the capacity to build software on top of an online web archive, query it, extract data and
make derivative works, which is a research practice different from visiting a library’s
reading room and browsing its web archive.

Notes

1 National web archival regimes have taken root in at least the following countries, whose
national libraries or archives are members of the International Internet Preservation
Consortium (IIPC): Canada, Chile, China, Croatia, Czech Republic, Denmark, Estonia, Germany,
Finland, France, Iceland, Ireland, Israel, Japan, Latvia, Luxembourg, the Netherlands, New
Zealand, Norway, Poland, Portugal, Scotland, Serbia, Singapore, Slovenia, South Korea, Spain,
Sweden, Switzerland, the UK and the United States. Numbering some 50, there are additional
representatives from provinces (Catalonia and Quebec) as well as ones from universities and
other (governmental) institutions. See Niu (2012).

2 The Memento project (mementoweb.org), the initiative that strives to source and output
archived websites from multiple web archives, also invites the single URL as input.

3 The Internet Archive originally was provided to users of the Alexa Toolbar as a solution to
the ‘404 file not found’ error, providing single webpages. The Wayback Machine, as its
interface, also provides single webpages. Since one is prompted to type a URL into the
interface, users presumably would type front pages, or website domains, into the interface,
and proceed from there. I thus refer to the Wayback Machine as organizing single-site
histories. There are other use cases, too, such as pasting a longer URL, in order to look up
copyright infringement. One may paste a specific historical URL that is now offline, thereby
employing it in the manner of the Alexa Toolbar of old.

4 When collaborating with other international institutions (such as with the Czech web
archive), the Danish web archive occasionally makes collections that are not primarily or
partially Danish in focus. Since the early days, however, there appears to be an increasing
tendency to follow the Danish heritage preservation policy, and archive events where Denmark
is involved (Olympics, EU elections), as well as particularly national events such as the
teacher lock-out, a Copenhagen science festival and a national scandal involving credit card
transaction monitoring.

5 Web archiving software, such as Archive-It, has the option to ignore robots.txt, thereby
enabling the archiving of social media pages, groups and profiles, if one is logged in, and
has given user credentials. Otherwise one archives default prompt pages, such as Facebook’s
‘log in or sign up’. (Archive-It does have a default user, Charlie Archivist, who has no
friends.) Archive-It’s instructions are explicit in their call to exclude personal profiles
(Lohndorf, 2017). It should be added
that social media companies such as Facebook and LinkedIn explicitly prohibit crawling without
permission, and list on the robots.txt exclusion page which types of content archive crawlers
may access, i.e., Facebook pages of public figures rather than personal profiles.

6 In the Netherlands it often has been a handful of users. In Denmark special workshops at
Netarkivet organized by the NetLab at Aarhus University have bolstered the numbers.

7 One employs the Wayback Machine link ripper to harvest the URLs of all historical instances
of a webpage, from which one chooses the ones to screenshot. Loading a select list of URLs
into ‘Grab them all’ or another batch screenshot generator creates files to be imported into a
moviemaker.

8 The Common Crawl project (commoncrawl.org) is another exception.

REFERENCES

Alexa (1998) ‘Support’, Alexa.com, https://web.archive.org/web/19980209020820/http://www.alexa.com:80/support/details/index.html
Ammari, T., Schoenebeck, S., and Lindtner, S. (2017) ‘The Crafting of DIY Fatherhood’, Proceedings of CSCW ‘17, New York: ACM, 1109–1122.
Anderson, C. (2009) Free: The Future of a Radical Price. New York: Hyperion.
Archive-It (2017) ‘Archive-It. Web archiving services for archives and libraries’, San Francisco: Internet Archive.
Ben-David, A. (2016) ‘What does the Web remember of its deleted past? An archival reconstruction of the former Yugoslav top-level domain’, New Media & Society, 18(7): 1103–1119.
Beunen, A., and Schiphof, T. (2006) ‘Legal aspects of web archiving from a Dutch perspective’, Centre for Law in the Information Society. Leiden: University of Leiden.
BnF (2017) Digital legal deposit: four questions about Web Archiving at the BnF, Paris: Bibliothèque nationale de France, http://www.bnf.fr/en/professionals/digital_legal_deposit/a.digital_legal_deposit_web_archiving.html
Chakraborty, A., and Nanni, F. (2017) ‘The changing faces of science museums: A diachronic analysis of museum websites’, in Niels Brügger (Ed.), Web 25: Histories of the First 25 Years of the World Wide Web. New York: Peter Lang. pp.157–172.
Cowls, J. (2017) ‘Cultures of the UK Web’, in Niels Brügger and Ralph Schroeder (Eds.), The Web as History. London: UCL Press. pp.220–237.
Darnton, R. (2009) The Case for Books: Past, Present and Future. New York: Public Affairs.
Deville, J., and van der Velden, L. (2015) ‘Seeing the invisible algorithm’, in Louise Amoore and Volha Piotukh (Eds.), Algorithmic Life: Calculative Devices in the Age of Big Data. London: Routledge. pp.87–105.
Dohmen, J. (2007) ‘Opkomst en ondergang van extreemrechtse sites’, NRC Handelsblad, 25 August.
Dougherty, M. (2017) ‘“Taqwacore is Dead. Long Live Taqwacore” or punk’s not dead?: Studying the online evolution of the Islamic punk scene’, in Niels Brügger and Ralph Schroeder (Eds.), The Web as History. London: UCL Press. pp.204–219.
Dougherty, M., Meyer, E.T., Madsen, C., van den Heuvel, C., Thomas, A., and Wyatt, S. (2010) ‘Researcher Engagement with Web Archives: State of the Art’, London: JISC.
Ernst, W. (2006) ‘Dis/continuities: Does the archive become metaphorical in multi-media space?’, in Wendy Chun and Thomas Keenan (Eds.), New Media, Old Media. A History and Theory Reader. New York: Routledge. pp.105–123.
Foot, K.A., and Schneider, S.M. (2002) ‘Online Action in Campaign 2000: An Exploratory Analysis of the U.S. Political Web Sphere’, Journal of Broadcasting & Electronic Media, 46(2): 222–244.
Gomes, D., Nogueira, A., Miranda, J., and Costa, M. (2008) ‘Introducing the Portuguese web archive initiative’, Proceedings of IWAW ‘08, Heidelberg: Springer.
Helmond, A. (2015) ‘The Platformization of the Web’. PhD dissertation, Amsterdam: University of Amsterdam.
Hockx-Yu, H. (2011) ‘The Past Issue of the Web’, Proceedings of WebSci ‘11, New York: ACM.
Huurdeman, H.C., Ben-David, A., and Samar, T. (2013) ‘Sprint methods for web archive research’, Proceedings of WebSci ‘13, New York: ACM.
Internet Archive (2017) Arquivo.pt: the Portuguese web-archive, San Francisco: Internet
PERIODIZING WEB ARCHIVING 55

Archive, https://archive.org/details/ Rogers, R. (2012) ‘Mapping and the politics of


portuguese-web-archive the web’, Theory, Culture & Society, 29(4/5):
Kahle, B. (1997) ‘Archiving the Internet’, Scien- 193–219.
tific American, March: 157–172. Rogers, R. (2013) Digital Methods. Cambridge,
Kahle, B., and Parejo Vadillo, A. (2015) ‘The MA: MIT Press.
Internet Archive: An Interview with Brewster Rogers, R. (2017) ‘Doing web history with the
Kahle’, 19: Interdisciplinary Studies in the Internet Archive: Screencast documentaries’,
Long Nineteenth Century, issue 21. Internet Histories, 1(1–2): 160–172.
Kimpton, M., Stata, R., and Mohr, G. (2003) Rogers, R., and Govcom.org (2008) ‘Google
‘Internet Archive Crawler Requirements and the Politics of Tabs’, video, Amsterdam:
Analysis’, Heritix Internet Archive Webteam Govcom.org.
Confluence. San Francisco: Internet Archive. Runyon, C., and Houlihan, M. (2012) ‘Revolu-
Littman, J., Chudnov, D., Kerchner, D., Peter- tionary libraries: Building collections and
son, C., Tan, Y., Trent, R., Vij, R., and Wrubel, promoting research about the January 25th
L. (2016) ‘API-based social media collecting uprising in Egypt’, Alexandria, 23(2):
as a form of web archiving’, International 73–77.
Journal on Digital Libraries, 28: 1–18. Sanchez-Querubin, N. (2017) ‘The Wounded
Livingston, J. (2008) Founders at Work: Stories Healers of Instagram’, Paper presented at
of Startups’ Early Days. New York: Apress. Trauma Studies in the Digital Age workshop,
Lohndorf, J. (2017) ‘Archiving Facebook’, Netherlands Institute for Advanced Study in
Archive-It Help Center, September, https:// the Humanities and Social Sciences, Amster-
support.archive-it.org/hc/en-us/ dam, 10–12 May.
articles/208333113-Archiving-Facebook Schneider, S.M., and Foot, K.A. (2005) ‘Web
Memento (2015) Memento Guide – Introduction sphere analysis: An approach to studying
to Memento. Mementoweb.org online action’, in Christine Hine (Ed.), Virtual
Meyer, E., Yasseri, T., Hale, S., Cowls, J., Methods: Issues in Social Research on the
Schroeder, R., and Margetts, H. (2017). ‘Ana- Internet. Oxford: Berg. pp.157–170.
lysing the UK web domain and exploring 15 Schostag, S., and Fønss-Jørgensen, E. (2012)
years of UK universities on the web, in Niels ‘Webarchiving: Legal deposit of internet in
Brügger and Ralph Schroeder (Eds.), The Denmark. A curatorial perspective’, Micro-
Web’ as History. London: UCL Press. form & Digitization Review, 41(3/4):
pp.23–44. 110–120.
Musso, M., and Merletti, F. (2016) ‘This is the Sigurðsson, K. (2005) ‘Incremental crawling
future: A reconstruction of the UK business with Heritrix’, Proceedings of IWAW ‘05,
web space (1996–2001)’, New Media & Vienna.
Society, 18(7): 1120–1142. Sinn, D., and Syn, S.Y. (2014) ‘Personal docu-
Niu, J. (2012) ‘An overview of web archiving’, mentation on a social network site: Face-
D-Lib Magazine, 18(3/4). book, a collection of moments from your
Padia, K., AlNoamany, Y., and Weigle, M.C. life?’, Archival Science, 14(2): 95–124.
(2012) ‘Visualizing Digital Collections at Thomson, S.D. (2016) ‘Preserving Social Media:
Archive-It’, Proceedings of JCDL ‘12, New DPC Technology Watch Report 16-01’, Glas-
York: ACM. gow: Digital Preservation Coalition.
Reiss, S. (1996) ‘Internet in a Box’, Wired, 1 Ulman, A. (2014) ‘Excellences and Perfections’,
October, http://www.wired.com/wired/4.10/ New York: Rhizome, http://webenact.rhizome.
scans.html org/excellences-and-perfections
Rieder, B. (2015) ‘The end of Netvizz (?)’, blog Urgola, S., and Runyon, C. (2016) ‘Participatory
post, The Politics of Systems blog, http:// archives: Building on traditions of collabora-
thepoliticsofsystems.net/2015/01/the- tion, openness, and accessibility at the
end-of-netvizz/ American University in Cairo’, in Raymond
Rogers, R. (2009) The End of the Virtual: Digital Pun, Scott Collard and Justin Parrott (Eds.),
Methods. Amsterdam: Amsterdam Univer- Bridging Worlds: Emerging Models and Prac-
sity Press. tices of U.S. Academic Libraries Around the
56 THE SAGE HANDBOOK OF WEB HISTORY

Globe. Chicago: Association of College and Weltevrede, E., and Helmond, A. (2012) ‘Where
Research Libraries. pp.91–103. do bloggers blog? Platform transitions within
van der Velden, L. (2014) ‘The third party diary: the historical Dutch blogosphere’, First
Tracking the trackers on Dutch governmental Monday, 17(2).
websites’, NECSUS. European Journal of Wilson, C. (2014) ‘The Selfiest Cities in the
Media Studies, 25 June. World: TIME’s Definitive Ranking’, TIME
Webarchivist (2001) ‘Please help us build a Magazine, 10 March.
Web Archive of the Sept 11 Attack’, web- Zimmer, M. (2015) ‘The Twitter archive at the
page, Webarchivist.org, 1 October, https:// Library of Congress: Challenges for informa-
web.archive.org/web/20011001200536/ tion practice and information policy’, First
http://webarchivist.org:80/ Monday, 20(7).
PART II

Theoretical and
Methodological Reflections
5
Web History in Context
Valérie Schafer and Benjamin G. Thierry

INTRODUCTION

Insisting on a contextualised history of the Web may at first appear tautologous, given the central role of contextualisation in the practice of history. Nonetheless, the number of factors involved in the deployment, diffusion and usage of the Web raises important questions about the hierarchies to be operated, the periodisation to be adopted and the active strategies to be emphasised.

Emerging from a context to be apprehended on many different levels (global, national and also individual) and within many different areas (technical, economic, political and cultural), the Web also gives rise to new dynamics. They have implications for temporality, for conceptions of space and even of events. Constructing a historical narrative of the Web from the early 1990s until the present should take full account of a phenomenon both global and present in an ever-growing number of human activities. It means that we need to pay equal attention to everything encompassing or encompassed by the history of the Web as an object affecting numerous areas of the economic, political, technical and personal lives of the actors involved (an object determining its own context), no less than as an object set within numerous already identifiable contexts (of time, space and material reality).

This perspective entirely meets the tendencies and issues raised in the history of information technology. Within less than 15 years, this history has developed in a number of directions, as highlighted by Tom Misa (2007) in Understanding How Computing Has Changed the World. An initial focus on technology has given way to work on the wider roots of the information age. Tom Misa anticipated the development of a field that would seek to understand the meeting of information technology and society, increasingly important with the development of networks and of the Web. The dissemination of information technology not only in the professional world, but
also, thanks to micro-computing, in the home, along with the increasing presence of networks in daily life, has led to new approaches taking greater account of social factors. The willingness to take on board the complexity of innovation from multiple perspectives, to rethink the articulation of top-down and bottom-up approaches, along with, for example, Nathan Ensmenger’s advocacy (2012) of a reassessment of the social history of computing, have influenced historians’ views of information technology as well as of networks and the Web.

The presence now of the Web in every area of life demands studies which constitute it simultaneously both as itself a contextualising object and as anchored in a context. Contextualising because today it directly influences an infinite number of objects, which can no longer be studied without taking account of their echoes online. In transmedia dynamics in particular, we can see how the Web is increasingly becoming an ever-present backdrop to controversies, to the configuration of information and even to interpersonal relationships.

Equally, the Web must find its own contextualisation, first of all – since we are historians – from a diachronic perspective. The first challenge is not to remain within the limits set by the technology itself.

This twofold approach (the Web in context and the Web as context) suggests how focused we need to be on the construction of an ever-changing socio-technical reality – a return to a totalising, process-focused history which takes account of ongoing developments. We should resist the effects of any discourse tending to fix realities or establish, in hieratic or metonymic fashion, an image of ‘THE’ Web.

The present analysis falls into three main sections, which interrogate, in turn, the diachronic or temporal contextualisation, the boundaries or spatial contextualisation and finally collective and individual experiences of the Web.

THE WEB FROM A DIACHRONIC AND CHRONOLOGICAL PERSPECTIVE

Can you describe for the reader the D-day, or the H-hour or the M-minute, the precise moment when you invented the World Wide Web?

(His expression suggests he is lost in memories) Well, I was coming down a footpath in the Swiss Alps… (silence) Then the clouds started getting thicker… and darker… (silence) There was a flash of lightning, a peal of thunder, a storm was beginning to blow, when suddenly the clouds parted and…

And?

And nothing! I’m making this up… Ideas never arrive like that, you’re not suddenly struck by an illumination or a revelation, everything you read about Newton’s apple or Archimedes shouting ‘Eureka!’ in his bath-tub, these are fantasies! […].

Tim Berners-Lee (2014)

With this touch of humour Tim Berners-Lee questions the vision – sometimes still present in part of the general public – of a seminal revelation that would give rise to the emergence of an invention. The Web did not come out of the blue in the late 1980s. The notion of hyperlink already existed, and Mosaic was not the first browser; therefore it is the uses, creativity, circulation and appropriation of the World Wide Web and Mosaic, as much as the innovation itself, which ensure success. Thus, it is the genealogies, but also the turning points, and the breaks as well as the continuities, that the history of the Web has to question, interweaving the temporalities of innovation, uses, communication and media.

Genealogies and Transitions: A Process-Focused History

The origins of the Web are the subject of a rich literature featuring two essential aspects. The first of these consists of (infinitely
varied) definitions of a finished Web, in a state allowing it to be defined as such, something existing from that point onwards and not before. The fetishisation of dates has led to the over-determination of particular characteristics, which vary from one writer to another: the appearance of a graphic interface and the marketing of the first search engine are the best-known examples.

The second aspect is the genealogy established in order to explain ‘how the Web happened’. This draws on notions of a longer-term genealogy. The work of Paul Otlet at the turn of the twentieth century or Ted Nelson’s Xanadu project are thus posited as early versions of hypertext, even though the latter is also viewed as an example of vaporware (Wolf, 1995). Meanwhile, other developments, like that of the French Minitel (a videotex online service), are often ignored because they took place in geographical locations discounted by the historiography of modern technologies. Niels Brügger (2016: 1059–60) has already pointed to the many omissions in existing historical works:

Should we celebrate the moment when the idea for an innovation was formulated? Should the thoughts and technologies paving the way for the idea be included? Should the prototype or the finished product be celebrated? Or was the technology not invented until it was made publicly or commercially accessible, or maybe not even until its number of users reached a critical mass? How these questions are to be answered is in part a function of each individual scholar’s specific research question, focus and interest.

He argues that the choice of founding event, the timespan considered and the types of factor taken into account are determined by a researcher’s own background and objectives, as well as by the stated aims of a study.

There is considerable justification for this view. From a more polemical perspective, we should also look to the vested interests that are bound to influence writing and writers on the Web. In particular, these influence genealogical choices concerning the technologies to be studied, sometimes to anachronistic effect. An example of this is Paul Otlet’s Mundaneum. This bibliographic tool (Mazower, 2012; Van den Heuvel and Rayward, 2011) originated in Belgium at the end of the nineteenth century and is now presented as a founding element in genealogies of the Web. Some have even sought to define it as an – at least conceptual – prototype. However, it is unclear whether those working on the technologies essential to the development of the Internet, and then of Web interfaces, had even heard of such tools. Some journalists and writers, for example, have described Mundaneum as a ‘paper Google’. This is not to deny the influence of Otlet’s arguments, despite the presentism inherent in such an anachronistic approach. It does, however, serve to underline the fact that a genealogical history involving elements just as complex as the Web itself requires the most rigorous analysis of any point of contact before actual effects can be deduced or given any weight.

Bearing this in mind, the ‘actual creation’ of an artefact, rather than its abstract conception, is these days taken as the crucial element in a historical narrative. Thus, Paolo Bory, Eleonora Benecchi and Gabriele Balbi (2016), in How the Web was Told: Continuity and Change in the Founding Fathers’ Narratives on the Origins of the World Wide Web, have clearly established the dual narrative of how Ted Nelson invented the term hypertext and Berners-Lee made the system a reality:

Paola Castellucci (1999) argues that, in Weaving the Web, Nelson was portrayed as the artistic and eclectic inventor of the term hypertext, whereas Berners-Lee depicted himself as the inventor of the thing hypertext (pp. 12–18). This thesis has been strengthened by Mike Sendall: ‘Ted Nelson had thought about this forty years ago but it was Tim who went and did it!’ (Gillies and Cailliau, 2000: 201). In other words, depicting Nelson’s hypertext as a potential idea that ‘Berners-Lee transformed into reality […] made Berners-Lee himself the innovator who “activated” and systematised the hypertext technology’ (Bory et al., 2016).
The notion of solitary inventors is now increasingly giving way to the study of the background movements that shaped the Web. It allows us to appreciate the collective and international elements of co-construction. The different ways in which the Web has been used are increasingly recognised as defining actual milestones in the history of networks too.

The Periodisation of Web History

Every historical phenomenon has its own periodisation (Prost, 2010: 118), which the historian must unravel in order to construct the chronology of his/her object. The periodisation of Web history is a challenge facing the historian on several levels.

First to be considered is an overall chronology of the Web, which must not only take account of technological turning points, but also consider developments in media, mediatised and information culture. We can think, for example, of how gifs, memes (Eppink, 2014), spam (Brunton, 2013) and audio formats have been used. We can consider how video usage has grown, with wider and wider transmedia circulation, as well as the ways in which different applications and communication empires, as they develop, offer platforms and services, like Google, YouTube and Twitter, which have tended to establish their own rhythms. Also important is a grasp both of network hybridisation in the creative field (Shifman, 2013) and of network influence on other media systems (content shared with television, cinema, etc.).

Historians need also to consider a communication era that is both heterogeneous in terms of Internet and Web users and global in terms of countries and continents (we return to this in the next section). Temporalities to be distinguished should not simply include the Web 1.0 period, followed by the Web 2.0 (O’Reilly, 2005) and CMS period.

Alongside – and not always corresponding precisely to – the chronological division between the ‘early adopter’ and ‘newbie’ periods, and finally that of adoption by a general public, the division between personal websites, followed by blogs and wikis, and then digital social networks, etc., should be refined too.

Thinking of digital culture should also lead us to question the significant and commonly duplicated historical breaks within the history of the Web, in particular that of Web 2.0, or the ‘participatory turn’.

Like many important concepts, Web 2.0 doesn’t have a hard boundary, but rather a gravitational core. Tim O’Reilly (2005) explains that you can visualise Web 2.0 as a set of principles and practices that tie together a veritable solar system of sites that demonstrate some or all of those principles, at a varying distance from that core. Discourses of actors like Tim O’Reilly demonstrate how much they can be performative. Convinced of the importance of the participatory shift, O’Reilly reverses the burden of proof by explaining that an ‘important concept’ has no solid boundaries but works much like our solar system, with its ellipses and its gravitational influences. For the historian, it is the reverse: a solid concept not only has borders, but it allows complementary boundaries to be drawn at its margins to identify and articulate other objects.

We should thus be taking some distance from the discourses, so widely relayed by the media, of certain digital actors such as Tim O’Reilly or Chris Anderson. Their formulaic skills (‘An attitude, not a technology’, ‘Data is the Next Intel Inside’, etc.; see O’Reilly (2005)) should not lead us to neglect other earlier practices and antecedents, as highlighted by Michael Stevenson (2016):

From celebrations of the electronic frontier and cyberspace to pronouncements of Web 2.0 and, more recently, ‘the sharing economy’, each new paradigm shift on the Web may be conceptualised not just as technological innovation but also as a rhetorical move that revitalises familiar oppositions between the old and the new, thus ‘consecrating’ a new genre or technology as a true departure from old media. How might the persistence of such discourses be explained?
In ‘Rethinking the Participatory Web’, Stevenson (2014) turns to the history of HotWired to illustrate how debates about the value of amateur participation have long been fundamental to the imagination of the Web. The values attached to interaction or participation have also been subject to developments not confined to the Web. Forms of expression on the Web are inextricable from the social representations behind the formats concerned. They relate in turn to imaginaries demanding analysis in terms of both the short-term history of the Web and the longer-term history of written work (Herrenschmidt, 2007).

While the famous rise of the amateur, the place of user-generated content and, more widely, the question of user participation are heuristic structures vital to historians, they do well to take equal account of the rhythms of a particular object. Commercial usage, for example, has not developed at the same rhythm as the activist Web, does not form part of the same structures and does not have the same relationship to the media.

Furthermore, it is important that historians situate the objects of their studies within a broader historical process: an archived web page dedicated, for example, to the abdication of Pope Benedict XVI may be accessed by historians of the papacy and historians of the religious press or of online communication, may be examined and analysed in different ways and also may be inserted into a periodisation particular to the researcher’s own field of study (Schafer and Thierry, 2015). As noted by Antoine Prost (2010: 116), ‘The historian does not reconstruct an entire temporality in every research project, but encounters a time already worked on, already periodised, by other historians’. Any analysis of websites also involves a further temporal dimension, in addition to those already highlighted. On the one hand, as noted by Claude Mussou (2012: 264, our translation): ‘[…] whereas an archive is traditionally composed of documents whose usage value is no longer recognised, selected or organised in accordance with already identified and fixed criteria, the Web archive is constructed “in real time” by the automatic collection of content that, to some extent at least, has not lost its usage value.’

On the other hand, as Niels Brügger (2012: 161–3) puts it, ‘in general it is impossible to archive online web content on a scale of 1:1. […] On the contrary: the web archiving process creates unique versions, each with their own individual “aura”’. Web archives create reconstructions and offer content from the past set within contemporary interfaces where we find temporal leaps created by the reconstruction of links, constituting hybrid and unpublished content that overturns the notion of authenticity (Schafer and Thierry, 2015). As Louise Merzeau (2014) has noted, these are also inextricable from the rhythms of the organisations that have archived the content. She highlights how in less than 20 years the archiving of the Web has passed through different paradigms: from the documentary model, aimed at comprehensive archiving on earlier models like that of the library, it became a dynamic, temporal archive in the image of the Web itself, recognising instability as an indispensable dynamic, not forgetting the notion of the archive as memory, with the model of the intelligent copy or exemplar. As Niels Ole Finnemann emphasises (2015), the archiving of the Web also involves taking account of a temporal dimension with three aspects: the dimension of original content, the ‘cumulative and transformative’ dimension of archiving and finally the dimension of the researcher’s reading, which again becomes a kind of hypertextuality. We can also add a fourth dimension, that of a reading inextricable from its own time and not always immune to presentism or nostalgia.

We see this nostalgia particularly in relation to the 1990s. Historians seeking to reconstruct the Web of that time encounter not only a nostalgic discourse on the Web itself, but also the nostalgia of those for whom that period was a golden age. To them it represents the time of proselytising ‘early adopters’ sharing their knowledge and skills and developing what
came to be known as Netiquette (a portmanteau composed of net and etiquette qualifying courtesy on online discussion groups). This was the time, too, of the ‘handmade Web’, of endless inventiveness, creativity, new kinds of publication that gradually became standardised, especially in the wake of the Internet bubble. Such representations give only a partial picture of the Web of the 1990s, of course, but their prominence highlights the DIY aesthetic and practices and the active involvement of Internet users in creating web pages, which decreased with the introduction of simple publication tools¹.

And as Megan Sapnar Ankerson (2009) has shown by studying the splendour and decadence of the Flash technology, the Web then assumed a new simplicity, a kind of contrition for past excesses. In the salvage work of the Geocities Archive Team and in the Geocities-izer site², which enables the user to give his/her site a ‘90s look’, we see some renewal of interest in a ‘vernacular’ digital heritage (Milligan, 2017) and in the folklore and the coding of those early days.

We can state, therefore, that constructing a chronology of the Web depends on our capacity to establish a multiple genealogy for an object at a crossroads – in technical terms, of course, but also in terms of the offline usages, representations and variations accompanying the diffusion of the Web.

THE BOUNDARIES AND SPATIAL CONTEXTUALISATION OF THE WEB

In 1993 we were not in a position to say where the future home for the Web would be.

In 1995, I am happy to say that two renowned institutions, INRIA and MIT, have allowed us to build one. The idea of having a single location, in the United States, always seemed out of the question to me.

Tim Berners-Lee, Launch of the European Branch of the W3C, 2 November 1995

As noted above by Berners-Lee, the Web did not appear on a specific D-day, H-hour or M-minute, but its British founder reminds us that the Web and the W3C – the consortium that sustains its development and its governance – have several locations too. A general history of the Web has therefore to take into account geopolitical, institutional and legal contexts. It thereby claims a historical geography of the Web, both a ‘classic’ one, based on the centres of power and on national and international issues, and a ‘new’ one of flows (of diffusion, acculturation and adaptation) within particular spaces (local, regional or national).

A Multinational, International and Transnational History

Central to the history and development of the Web is Tim Berners-Lee’s decision to ask CERN to place his invention in the public domain and his preference for open innovation with equally open structures of governance (Berners-Lee, 2000). A focus on the governance of the Web allows us to situate it historically, once again both technically and politically, as well as within a global arena where stakeholder and geographical interests are nonetheless important.

Meanwhile, in 1993, the University of Minnesota decided that companies should pay to use Gopher³, though, at the request of Tim Berners-Lee, CERN agreed to allow free use of Web protocols. The next year, the MIT Laboratory for Computer Science (LCS) became the first host of the World Wide Web Consortium (W3C). In his introduction to Weaving the Web, the head of MIT-LCS, Michael L. Dertouzos, notes that ‘As technologists and entrepreneurs were launching or merging companies to exploit the Web, they seemed fixated on one question: “How can I make the Web mine?”. Meanwhile, Tim Berners-Lee was asking “How can I make the Web yours?”’ (Berners-Lee, 2000: viii). Berners-Lee (2000) also recalls that he wanted the consortium to run on an open process like
the IETF’s, but ‘one that was quicker and more efficient, because we would have to move fast’.

Recalling the establishment of W3C, and the decision to base it in three locations, in the United States, in Europe and finally at Keio in Japan in the mid-1990s, suggests a number of reflections concerning issues of normalisation, regulation, internationalisation and globalisation involved in the development of the Web, and the importance of viewing these in context. Below we highlight two such issues.

First and foremost is Tim Berners-Lee’s decisive preference for openness and multi-stakeholder governance, which Internet actors have continued to develop ever since the 1960s. As Andrew Russell notes in a paper (2003) highlighting the influence of consensus-based, non-hierarchical values throughout the history of the Web and of W3C, ‘consensus is a predominant cultural value of Internet governance, most notably in the Internet Engineering Task Force (IETF). The IETF’s motto and operating philosophy: “We reject: kings, presidents, and voting; we believe in rough consensus and running code” is widely cited as a model for deliberative governance’. The history of Web governance thus suggests a chronology going back to the 1960s, the formation of the Network Working Group⁴, the creation of ‘Requests for Comments’ in the Arpanet project⁵ and even further back. W3C represents a stance on patents and openness inherent in the development of information and network technologies. Nonetheless, Tim Berners-Lee did not place the Web within existing governance structures, particularly those of the Internet, although he was in contact with the IETF:

Berners-Lee understood that ‘running the consortium would always be a balancing act, between taking the time to stay as open as possible and advancing at the speed demanded by the onrush of technology’. He admired the openness of the IETF process, but felt that, given the fast pace of the commercial world that was increasingly adopting the Web, the Web would benefit from a quicker and more efficient process (Russell, 2003).

The decisive factors in his choice thus arose both from a particular historical context – that of openness and the influence of the structures within which software and even the Internet were being developed – and from an immediate context of wishing to propel the development of the technology, the transition of Gopher to a payment model and his own departure from CERN, among other things.

Second, we should note that while the Web was established within a global dynamic, as evidenced not least in its name, it was inextricable from the start from continental and regional geopolitical and institutional interests. Tim Berners-Lee, having left CERN and joined MIT, was keen to retain a European base for the Web – a desire that would lead in particular to the competition to host W3C…

[…] a ‘divine surprise’ for Europe and for INRIA.

After CERN’s withdrawal, the Web could have left for the other side of the Atlantic with Berners-Lee, its founding father, had he not promoted a prominent European role. Despite CERN’s innovations, Europe did not have its own critical mass of activity and development.

It had immediately to collaborate with transnational communities, helping to integrate European building of infrastructure into the larger context of globalization in recent decades. (Griset and Schafer, 2011: 368)

National Adaptations

Beyond these international, transnational and multinational spaces, there is clearly a rich history of adaptations of the Web to specific contexts with different technical cultures and industrial and commercial antecedents, as well as those subject to varying national legislation. Examples are provided by several studies in The Routledge Companion to Global Internet Histories (Goggin and McLelland, 2017), such as Nicholas John’s chapter on the emergence and structuring of ISPs in Israel, or Ignacio Siles’s chapter on the slow passage
from X.25 to TCP/IP in Costa Rica (and the questions about the spread of digital culture
occurring rivalries and debates within the in different production contexts.
country). Other examples include the studies The question of whether there is really
dedicated to Mexico or Brazil, situating them just one Web or a plurality of Webs is thus a
in specific political and economic contexts, or crucial one. A useful image for how national
those focusing on Korea and Taiwan, address- communities have participated in the crea-
ing more specific uses such as BBS and tion of discrete information spaces, defined
e-mails. All of these are enlightening national by language usage and specific reference
paths, which involve the political, economic, points and sometimes governed by the will of
technological, cultural and social history of national institutions, may be that of the archi-
these countries, their inclusion in regional pelago. This is also currently becoming an
groups and environments and their relation- even more appropriate image for the various
ship to the United States. services, applications and spaces on the Web,
As regards the case of France specifi- with the effects of GAFAT (Google, Amazon,
cally, which is very familiar to the authors, Facebook, Apple, Twitter) and of digital
the delay in French adoption of the Web at social networks and digital environments thus
the end of the 1990s was often emphasised requiring the most sophisticated analysis.
by advocates of a faster transition from
Minitel to the Internet. The Internet access
figure of 100,000 in 1995 increased by 1997
to 381,000 for domestic access and 621,000 FROM WEB TO WEBS? COLLECTIVE
for workplace access. Despite this rapid take- AND INDIVIDUAL EXPERIENCES
off, Reuters in 1997 estimated the number of
French websites at 21,367, as compared with Virtual reality came before reality, not the other
825,385 in the United States. Thus, while way around; the reality was in our heads
early adapters in the United States had to deal
with the huge number of novice arrivals on when we imagined the exploits of Captain
the Net from 1993 – an ‘eternal September’ Fracasse6, and we didn’t have to pay for a connec-
tion or a modem.
provoked by AOL’s opening of Usenet
forums to general access, it was only in the Bruno Latour on the France Culture radio pro-
second half of the 1990s that France would gramme Place de la Toile7, 20 November 2009.
experience the still very cautious appearance
of newbies on the Web: Bruno Latour’s somewhat provocative for-
mula that challenges the idea of a virtuality
In 1988, the regulation of domestic traffic was specific to the digital world by insisting on its
only a matter of taming a herd of elephants. In material dimensions, which are also put for-
1995, all our online relationships were ambushed
ward by approaches related to Internet Studies
by millions of little mice, which were much more
difficult to control. And of course releasing the or Infrastructure Studies, invites us to grasp
mice only in clusters, as Netscape did, solved noth- the context that gives birth to these infrastruc-
ing. (Huitema, 1995, authors’ translation) tures, whether informational or material. We
must therefore rethink the place of the hard-
This highlights the need for national, as ware and software computer industry, as well
much as transnational, histories. We should as the economic, regulatory and political
also note, despite these clearly different time- characteristics that frame, or sway under, the
scales, a convergence of reactions on the part development of uses of networks. Speeches
of early adopters on both sides of the Atlantic concerning the information highways, from
(Auray, 2002; Paloque-Bergès, 2011), raising those of Al Gore (Vice President of the
WEB HISTORY IN CONTEXT 67

United States during the Clinton administration) in 1993 to their reworking in Europe (e.g. the Théry report in France, or the European Bangemann report on ‘Europe and the Global Information Society’ in 1994) reflect the growing importance of political and economic stakes at work since the 1990s. Individual, private and domestic aspects also come into play in the history of the Web; together, these factors contribute to experiences that are both singular and plural, individual and collective.

Beyond Technology: Economics and Politics

Technical approaches are beginning to give way to analyses that take account of the relationship of the Web to both politics and economics.

This sometimes focuses on the changing environment of the Web itself and the impact it has on its historical development, challenging its roots and values, as highlighted by Andrew Russell (2003) with the historical study of some debates within the W3C. He enriches the socio-technical approach of protocols and governance that Sandra Braman (2014) had raised for Internet development by also studying the Request for Comments. With particular reference to the debates that took place between 1999 and 2003 around the use of patents in W3C recommendations, Andrew Russell throws light on the impact of a changing context, for example the widespread diffusion of the Web and the growth of commercial interests in the period preceding the Internet bubble. Thus:

[the] history of PPWG⁸ provides an excellent case study for testing the ability for the W3C to garner consensus around an issue that pitted the ‘open code’ values of grassroots Web developers against the commercial interests that concerned Dertouzos and Berners-Lee in the early 1990s. (Russell, 2003)

However, the influence of the Web and of networks more generally on our political ideas (Loveluck, 2015) and democratic practices (Badouard et al., 2016; Mabi, 2016) is increasingly a focus. While the role of the State in the development of information technology before the network era has been the subject of several pioneering studies from the 1990s onwards (Griset, 1998), the role of political contexts in the adoption of the Web has yet to be sufficiently appreciated. Nonetheless, national political configurations have played a crucial role. To return to the example of Minitel, the catch-up period seen in France from the second half of the 1990s highlights a public initiative well behind that seen in the United States. Following a dozen years when France led the way in public access technology, Minitel assumed all the hallmarks of a ‘technical-industrial mis-step’ (Thierry, 2013).

To this must be added the way in which a current and widespread political discourse depicts digital communication, including the Web, as a solution to everything, a tendency denounced by some analysts as solutionism (Morozov, 2014). Here, too, only a long-term perspective on both ‘network politics’ and the ‘politics of networks’ will allow us to see the actions of Snowden, Julian Assange or Bradley Manning in the context both of governments’ digital activity (Rey, 2016) and of a secrecy whose development is only in part attributable to power relations mediated through the Web. Such a perspective allows us to see that not everything happening online is completely new.

Towards a Phenomenology of Web Experience

The position assumed by the Web during the 1990s in the workplace as well as in the private sphere is also a powerful argument for approaching our relationship with all things digital from an experiential perspective as
much as through economic and political analysis. Far from reviving the now-dated metaphor of early analysts who depicted Web users as traversing a virtual world removed from reality, on science-fictional, ‘cyberpunk’ lines (Dery, 1994), the time has come to develop a phenomenology of our real relationship with the Net.

The first stage, unarguably, is to validate the real foundations of our experience and compile a history of the real culture of the virtual, no longer arbitrarily separating real and virtual. Studies, for example, of the economic conditions of consultancy, undertaken from the perspective of a history of pricing and supply (Rebillard, 2012), of the elements of environmental interaction defined by a long-term history of our relationship with digital tools (Bardini, 2000), or indeed of early practitioner communities (Thierry, 2012), can become the basis for appreciating recent innovations and their impact on our everyday experience of the Net (adoption of smartphones, new pricing structures, etc.).

As the most common elements of our practice continue to change, we also need to establish a periodisation of such change, along the lines of Anne Helmond’s work on the hypertext link. She has shown how much the meaning of the term ‘hyperlink’ has changed over time, studying:

the history of the hyperlink from a medium-specific perspective by analyzing the technical reconfiguration of the hyperlink by engines and platforms over time. Hyperlinks may be seen as having different roles belonging to specific periods, including the role of the hyperlink as a unit of navigation, a relationship marker, a reputation indicator and a currency of the web. (Helmond, 2013)

The tools available to historians for carrying out this type of study are developing. For example, Oldweb.today, a service that allows for the navigation of Web archives with browsers from their own time, reminds us that the study of Web archives demands their contextualisation. Beyond the importance of considering the visual and material specificities which framed the context of production and accessibility of these pages – terminal, network and bandwidth – the reading of a Web archive implies an understanding of what the web page, the website and its hyperlinks meant in their time (Schafer et al., 2016).

Finally, analysis of how the Web has changed our understanding of the world is still in its infancy, but is already indicating new and fertile topics for research (such as the massive adoption of smartphones or the online availability of digital photographs (Gunthert, 2015) or, again, the growth of online communities of medical patients, political activists or users of technology). The pioneers and research centre archives which are clearly identified are now being joined by new knowledge banks and new tools that the historian must take advantage of. Web archives revealing the websites (lost or not), the presentation tools (the technologies invoked, the real dissemination and usage of these technologies and the degree to which they have been mastered by the actors involved) and the digital landscapes composed by the various interfaces (especially the structures for presenting information) can lead to the formation of a corpus that will give us access to the various ways in which the Web has been used. This allows us to go beyond the boundaries of how technicians use technology.

The involvement of digital tools in almost every area of society should thus encourage studies both on their diffusion and on their usage.

• Diffusion, especially in terms of how this is anchored in reality. The place of cybercafés as spaces of discovery, learning and relaxation is a striking example from the first period of the ‘mobile Web’ before the advent of the smartphone.
• Usage approached through Web archives and other archives that historians are used to consulting. Looking no further than the social adaptations of the Web since the time of laboratories
and universities, we have the manuals that flourished in the 1990s (Laquey, 1995), the press and the other ‘traditional’ media that allow us to re-anchor the Web within a society in the adoption phase.

CONCLUSION

Building the history of the Web consists above all of confronting multiple problems of categorisation of the objects that make up this complex reality. The Web must be viewed in relation to all the fields that act upon it, not all of which are technical in nature: the experience of users, the political and economic consequences of its development, and so on. To grasp the Web as a whole requires a seemingly paradoxical division in order to emphasise its dimension of global social fact. This paper stressed three essential dimensions, which can contribute to it: temporalities, spaces and materialities.

By introducing the reader to the temporalities, spatialities and materialities that should frame the history of the Web, our narrative takes its roots in Fernand Braudel’s reflections (1969) on history but also revisits recent works such as Geoffrey Bowker’s analysis in Memory Practices in the Sciences, which advocated:

We need to open a discourse – where there is no effective discourse now – about the varying temporalities, spatialities and materialities that we might represent in our databases, with a view to designing for maximum flexibility and allowing as possible for an emergent polyphony and polychrony. Raw data is both an oxymoron and a bad idea; to the contrary, data should be cooked with care. (Bowker, 2006: 183–4)

In context, the Web is polychronous. Here, as already emphasised, we must find the means to articulate complex genealogies and go beyond a history of successes that can only serve to rubber-stamp a dominant model of development. Its profound roots, sometimes unforeseen by its actors themselves, have many long-term genealogies: the search for an effective system of technology-based documentation has haunted minds since at least Vannevar Bush and his famous Memex (Nyce and Kahn, 1991), and even long before a global network of computers was considered. This calls for a reassessment of the importance given to older innovations, but with the risk of an anachronism, as shown by the discourses that try to make the Mundaneum a Google ‘before its time’, which is an obvious sign of the contamination of the past with an obsessive present.

In context, the Web then comes about and develops in spaces that are interwoven with each other and that must be articulated. In its early years of development and after its birth at CERN in Europe, the Web becomes rooted in the North American context and expands rapidly, coming in contact with national spaces that adopt it according to specific constraints and contexts. As a context, the Web pushes us today to think of globalisation in terms of technology and media. However, as a deeply deterritorialised space too, it questions the notion of border and sovereignty, but also the possibility of plurilingualism and the position of States in the governance of the world communications.

In context, the Web can finally be grasped as the emerging place of many individual and collective experiences. These experiences are distributed across time and lead to the succession of definitions of the Web in the minds and discourses of actors that is dependent on the instant at which they experience the Web. The myriad of mediums (screens, devices, interfaces, etc.), speeds and contents draw up as many possibilities for the historian to explore the phenomenologies of the Web. From a 56 kilobits connection, at home or in a cybercafé and with a 13- or 14-inch CRT screen, to the ubiquitous 4G smartphone’s connection, is it always the same Web or is it different experiences of the same artefact?

Finally, as context, what the historian experiences of the Web constitutes the ultimate frontier to be taken into account.
Following what sociologists have been assiduously practising for a while, the historian must make a reflexive analysis of his own history with the Web, in order to avoid taking the current Web for the state of culmination of what is in fact an ever-evolving story. Such consideration of internal and external contexts then further allows us to give due weight to the development of a Web that is constantly being redefined and to analyse recent developments (for example the way that HTML and the page format as the basic unit of information now face competition from the creation of data silos, data encapsulation and the appearance of major new capturers of attention like Facebook), in sum to consider the Web in all its dynamic process of becoming, rather than through a rigid and inevitably outdated definition of what it is presumed to be.

ACKNOWLEDGEMENTS

This study has been carried out within the framework of the Web90 project supported by the French National Research Agency (ANR-14-CE29-0012-01).

Notes

1  See Kyle Chayka (28 October 2014), The Great Web 1.0 Revival, gizmodo.com, http://gizmodo.com/the-great-web-1-0-revival-1651487835
‘Nostalgic for intimacy
When it first got popular, Facebook was like the back room at a club: A cozy space filled with just your friends, everyone clearly connected to everyone else. Now, it’s more like a stadium […]
Nostalgic for free-form self-expression
Niche communities that aren’t overseen by giant companies also have the freedom to evolve on their own and embrace identities and designs that wouldn’t be possible on Facebook […]’.
See also Anil Dash (13 December 2012), The Web We Lost, http://dashes.com/anil/2012/12/the-web-we-lost.html
2  http://www.wonder-tonic.com/geocitiesizer/
3  Gopher is a protocol that was designed for distributing, searching and retrieving documents over the Internet. It would remain free for educational and non-profit institutions.
‘This was an act of treason in the academic community and the Internet community. Even if the university never charged anyone a dime, the fact that the school had announced it was reserving the right to charge people for the use of the gopher protocols meant it had crossed the line. To use the technology was too risky. Industry dropped gopher like a hot potato’ (Berners-Lee, 2000: 73).
4  A group led by Steve Crocker, to develop the host-to-host software in Arpanet.
5  An early packet switching network launched by the Advanced Research Projects Agency (ARPA) of the United States Department of Defense in 1969.
6  A nineteenth-century French adventure novel.
7  Web Square.
8  Patent Policy Working Group.

REFERENCES

Ankerson, M.S. (2009) ‘Historicizing Web Design: Software, Style and the Look of the Web’, in Janet Staiger and Sabine Hake (eds), Convergence Media History. New York, London: Routledge. pp. 192–203.
Auray, N. (2002) ‘L’Olympe de l’internet français et sa conception de la loi civile’, Les Cahiers du Numérique, 3(2): 79–90. https://www.cairn.info/revue-les-cahiers-du-numerique-2002-2-page-79.htm
Badouard, R., Mabi, C. and Sire, G. (2016) ‘Beyond “Points of Control”: Logics of digital governmentality’, Internet Policy Review [online]. https://policyreview.info/articles/analysis/beyond-points-control-logics-digital-governmentality
Bardini, T. (2000) ‘Les promesses de la révolution virtuelle. Genèse de l’informatique personnelle, 1968–1973’, Sociologie et Sociétés, 32(2): 57–72.
Berners-Lee, T. (2014) ‘Oui, le Web est né en France’, Challenges, 17 June. https://www.challenges.fr/high-tech/tim-berners-lee-oui-le-web-est-ne-en-france_151444
Berners-Lee, T. (2000) Weaving the Web. New York: Harper Business.
Bory, P., Benecchi, E. and Balbi, G. (2016) ‘How the Web was told: Continuity and change in the founding fathers’ narratives on the origins of the World Wide Web’, New Media & Society, 18(7): 1066–87.
Bowker, G. (2006) Memory Practices in the Sciences. Cambridge, MA: The MIT Press.
Braman, S. (2014) ‘The geopolitical and the network political: Internet designers and governance’, International Journal of Media and Cultural Politics, 9(2): 277–96.
Braudel, F. (1969) Écrits sur l’Histoire. Paris: Flammarion.
Brügger, N. (2016) ‘Introduction: The Web’s first 25 years’, New Media & Society, 18(7): 1059–65.
Brügger, N. (2012) ‘Web history and the Web as a historical source’, Zeithistorische Forschungen/Studies in Contemporary History, 9: 316–25, http://www.zeithistorische-forschungen.de/2-2012/id%3D4426
Brunton, F. (2013) SPAM: A Shadow History of the Internet. Cambridge, MA: The MIT Press.
Castellucci, P. (1999) Dall’ipertesto al web. Bari: Laterza.
Dery, M. (1994) Flame Wars: The Discourse of Cyberculture. Durham: Duke University Press Books.
Ensmenger, N. (2012) ‘The digital construction of technology: Rethinking the history of computers in society’, Technology and Culture, 53(4): 753–76.
Eppink, J. (2014) ‘A brief history of the GIF’, Journal of Visual Culture, 13(3): 298–306, http://journals.sagepub.com/doi/pdf/10.1177/1470412914553365
Gillies, J. and Cailliau, R. (2000) How the Web Was Born: The Story of the World Wide Web. Oxford: Oxford University Press.
Goggin, G. and McLelland, M. (eds) (2017) The Routledge Companion to Global Internet Histories. New York/Abingdon: Routledge.
Griset, P. and Schafer, V. (2011) ‘Hosting the World Wide Web Consortium for Europe: From CERN to INRIA’, History and Technology, 27(3): 353–70.
Griset, P. (1998) Informatique, Politique Industrielle, Europe, entre Plan Calcul et Unidata, Institut d’Histoire de l’Industrie. Paris: Éditions Rive Droite.
Gunthert, A. (2015) L’Image Partagée. La Photographie Numérique. Paris: Textuel.
Helmond, A. (2013) ‘The algorithmization of the hyperlink’, Computational Culture. http://computationalculture.net/article/the-algorithmization-of-the-hyperlink
Herrenschmidt, C. (2007) Les Trois Écritures: Langue, Nombre, Code. Paris: Gallimard.
Huitema, C. (1995) ‘Les touristes, les éléphants et les souris’, Planète Internet, November–December: 78.
Laquey, T. (1995) Sésame pour Internet: Initiation au Réseau Planétaire. Paris: Addison Wesley France.
Loveluck, B. (2015) Réseaux, Libertés et Contrôle. Une Généalogie Politique d’Internet. Paris: Armand Colin.
Mabi, C. (2016) ‘Analyser les dispositifs participatifs par leur design’, in Christine Barats (ed.), Manuel d’Analyse du Web. Paris: Armand Colin. pp. 33–7.
Mazower, M. (2012) Governing the World, The History of an Idea, 1815 to the Present. London: Allen Lane.
Merzeau, L. (2014) ‘Vers un Web temporel’, Talk at the IIPC General Assembly. http://merzeau.net/vers-un-web-temporel/
Milligan, I. (2017) ‘Welcome to the Web: The Online Community of GeoCities and the Early Years of the World Wide Web’, in Ralph Schroeder and Niels Brügger (eds), The Web as History. London: UCL Press. pp. 137–58.
Misa, T.J. (2007) ‘Understanding how computing has changed the world’, IEEE Annals of the History of Computing, 29(4): 52–63.
Morozov, E. (2014) To Save Everything, Click Here: The Folly of Technological Solutionism. New York: PublicAffairs.
Mussou, C. (2012) ‘Et le Web devint archive: Enjeux et défis’, Le Temps des Médias, 2(19): 259–66.
Nyce, J.M. and Kahn, P. (1991) From Memex to Hypertext: Vannevar Bush and the Mind’s Machine. Boston: Academic Press.
Ole Finneman, N. (2015) ‘Hypertextual relations in digital born Materials’, RESAW conference, ‘Web Archives as Scholarly Sources: Issues, Practices and Perspectives’, 8–10 June, Aarhus University, Denmark.
O’Reilly, T. (2005) ‘What Is Web 2.0, Design Patterns and Business Models for the Next Generation of Software’, oreilly.com. Retrieved from: http://www.oreilly.com/pub/a/web2/archive/what-is-web-20.html
Paloque-Bergès, C. (2011) Entre trivialité et culture: Une histoire de l’Internet vernaculaire. Émergence et médiation d’un folklore de réseau, Thèse de Doctorat en Sciences de l’Information et de la Communication, Université Paris VIII.
Prost, A. (2010 [1996]) Douze Leçons sur l’Histoire. Paris: Éditions du Seuil.
Rebillard, F. (2012) ‘La genèse de l’offre commerciale grand public en France (1995–1996): Entre fourniture d’accès à l’Internet et services en ligne “propriétaires”’, Le Temps des Médias, 18: 65–75.
Rey, O. (2016) Quand le Monde s’est Fait Nombre. Paris: Stock.
Russell, A. (2003) ‘The W3C and its Patent Policy Controversy: A Case Study of Authority and Legitimacy in Internet Governance’, http://arussell.org/papers/alr-tprc2003.pdf
Schafer, V. and Thierry, B. (2015) ‘L’ogre et la toile. Le rendez-vous de l’histoire et des archives du Web’, Socio, 4: 75–96.
Schafer, V., Musiani, F. and Borelli, M. (2016) ‘Negotiating the Web of the past’, French Journal for Media Research, 6, http://frenchjournalformediaresearch.com/lodel/index.php?id=952
Shifman, L. (2013) Memes: In Digital Culture. Boston: The MIT Press.
Stevenson, M. (2016) ‘The cybercultural moment and the new media field’, New Media & Society, 18(7): 1088–102.
Stevenson, M. (2014) ‘Rethinking the participatory Web: A history of HotWired’s “new publishing paradigm”, 1994–1997’, New Media & Society, 18(7): 1331–46.
Thierry, B. (2013) ‘De Tic-Tac au Minitel: la télématique grand public, une réussite française’, Actes du colloque Les ingénieurs des Télécommunications dans la France contemporaine. Réseaux, innovation et territoires (XIXe–XXe siècles). Paris: IGPDE editor.
Thierry, B. (2012) ‘“Révolution 0.1”. Utilisateurs et communautés d’utilisateurs au premier âge de l’informatique personnelle et des réseaux grand public (1978–1990)’, Le Temps des Médias, 1(18): 54–64.
Van den Heuvel, C. and Rayward, B. (2011) ‘Facing interfaces: Paul Otlet’s visualizations of data integration’, Journal of the American Society for Information Science and Technology, 62(12): 2313–26.
Wolf, G. (1995) ‘The curse of Xanadu’, Wired 3, https://www.wired.com/1995/06/xanadu/
6
Science and Technology Studies Approaches to Web History
Francesca Musiani and Valérie Schafer

INTRODUCTION

A decade ago, Barry Wellman affirmed that the mid-nineties had corresponded to the ‘prehistoric’ era in the emerging field of study seeking to explore, analyze and understand the complexity of the social, legal, political and economic relations subtending the development of the Internet (Wellman, 2004: 127). This approach goes hand in hand with Web development, and this interest in renewed approaches to Internet studies is most likely related to the widespread diffusion of the Internet allowed by the Web in the late nineties. Leveraging approaches drawn from science and technology studies (STS), researchers have progressively emphasized a critical approach to the complex nexus of Internet and society. More recently, Janet Abbate notes that:

STS can be useful to address the complex links between Internet technology and culture, which have blurred the frontiers of traditional categories. One of STS’ tenets is to ‘open the black box’ of technology to understand its functioning, and understand how social relations and aims translate into artifacts. STS similarly offer models to describe how human and non-human actors exert joint agency in mediated environments. (Abbate, 2012: 170)

This chapter introduces STS approaches, applying them to Web history, with the intent to show how they can shed further light on it and enrich it. In its first part, the chapter introduces key STS concepts and notions such as agency, co-construction, dispositif, actor–network, boundary object (Star and Griesemer, 1989), controversy and trial, and links them to case studies and examples drawn from the history of the Web such as the creation of W3C, the rise of spam, GIFs and memes, the development of Wikipedia and the role of APIs. The second part of the chapter delves into the governance of the Web as a particularly interesting case study of how STS notions and concepts can be leveraged to advance the analysis of objects
and dynamics central to the Web and its history. Important Web governance-related concepts, such as multi-stakeholderism (Malcolm, 2008) and algorithmic governmentality (Rouvroy and Berns, 2013), are ‘put to the test’ through an STS lens.

This chapter seeks to show the value of a relational, practice-oriented approach – what in STS vocabulary is called ‘unpacking black boxes’, or doing a sociology ‘of assemblages’. The chapter examines how this approach can shed light on the hybrid mobilizations of innovators, consumers, users and entrepreneurs – alongside their creations and cultures – that have made the Web the complex socio-technical system it is today.

WHEN STS AND WEB HISTORY MEET

The encounter between the field of science and technology studies (STS) and the Internet as a subject of study has already proven interesting and fertile in recent times. Notably, a field of study is emerging from the bridging of STS and Internet governance research. Thus, it is interesting to revisit the history of the Web, and more generally of the Internet, through the lens of key concepts in STS. The first part of this chapter aims to demonstrate the effectiveness of a few core STS concepts in the field of Web history. The meeting of STS and the Web first happened with studies of information and communication technologies (ICTs), the Internet in particular. Thus, as a background to this endeavor, the chapter first retraces this meeting, and proceeds to show that more recently, STS have crossed paths with media studies, thus reinforcing the nexus between approaches to the communication field in its past and present dimensions.

When STS Meet ICTs

The ‘STS turn’ is founded in the sociology of innovation’s interest in technical objects and social practices of appropriation of emerging technologies, in media sociology’s increasing interest in ICTs, and the development of information and communication sciences: ICTs become ‘interactional artifacts’ (de Fornel, 1994: 126). With the advent of the Internet, then of the Web, an ‘object-centered’, interdisciplinary field of study follows at the end of the nineties, with the creation of Internet studies. The ‘STS turn’ within this field calls for particular attention to context and situated practices, and for the unveiling of the ‘invisible work’ of Internet and Web innovation. STS approaches put emphasis on the practices that shape the management and the governance of the Internet and its uses as a living reality, and determine the ways in which it operates, works, resists and functions. Furthermore, STS approaches invite us not to consider values and rationalities of Internet and Web practitioners as indicators of how they perceive the world, but to problematize them as resources and categories that they deploy in specific circumstances in order to create and uphold specific configurations – in short, to actively organize their world.

When STS Meet Media Studies

Several disciplines have attempted to ‘think themselves anew’ through the STS lens, or proposed conceptual hybrids. Noortje Marres and Carolin Gerlitz (2015) argue, for example, that digital social research should adopt ‘interface methods’ – a ‘critical and creative engagement with methods development at the intersection of sociology, STS and digital research’ (Marres and Gerlitz, 2015: 21), despite the strong bias in favor of purely quantitative methods that currently informs research on digital media.

Even more recently, Gabriele Balbi, Alessandro Delfanti and Paolo Magaudda (Balbi et al., 2016) have facilitated a conversation on the cross-fertilization of STS and media studies approaches (Badouard et al., 2016). The convergence between the
SCIENCE AND TECHNOLOGY STUDIES APPROACHES TO WEB HISTORY 75

two fields is revealed to be far from linear, with numerous points of friction appearing and notions such as dispositif (which we will come back to later in more detail) revealing their multiple meanings. However, several objects and sensibilities are shared by media studies and STS, especially in terms of common ancestors – e.g. domestication theory as discussed by Oudshoorn and Pinch (2003) – or a shared interest in infrastructures. All these contribute to the dialogue of the sociology (and the history) of the Web, and STS. Their encounter is, in particular, driven by the concept of materiality.

As Balbi et al. (2016) note, the ‘material’ and ‘physical’ aspects of social life have been a crucial object of investigation for STS since the inception of the field. A ‘material turn’ in the study of the Internet and ICTs is fostered by hybrid work between STS and media and communication studies (Hondros, 2015). This work has not always been linear, as Lievrouw (2014) notes, due to STS’ and media studies’ respective tendencies to give prominence to different processes – the co-determinism of the social and the material for STS, versus media studies’ understanding of technology’s materiality as a product of discourses and visions. However, it is increasingly producing notable work, such as Parikka’s media archaeology (2012) and Gitelman’s work (2013) on the infrastructures subtending data.

Badouard et al. (2016), drawing from Latour and Callon’s ‘classic’ STS literature, observe that STS have helped recognize technical artifacts’ status of ‘mediators’, inasmuch as they can modify the performativity of social actions. In this conception, it makes less sense to consider discourses and objects as separate spheres, and more sense to understand discourses as circulating within objects, both spheres co-constructing each other (Boczkowski and Lievrouw, 2007; Gillespie et al., 2014).

The notion of dispositif (in the Foucauldian sense, often translated as ‘device’ in English) and boundary object are among the notable concepts at the intersection of media studies and STS. In classic actor–network theory vocabulary, a socio-technical dispositif (device) is defined as an assemblage of human and non-human actants, where competencies and performances are distributed, and whose existence is enabled by the workings of innovation. Moreover, the notion allows agency (Proulx, 2009) to be integrated in the analysis, for a more fine-grained appreciation of its collective dimension. As Muniesa et al. (2007: 2) point out, ‘devices do things. They articulate actions, they act or make others act’, they make phenomena of translation materially possible. Several authors have underlined the suitability of the ‘device’ notion to be expanded beyond the strictly Foucauldian focus on power relations, social control, disciplinary and normative connotations. Indeed, the more recent uses of the notion by social science researchers call for a reflection on the different forms of tension and mediation that articulate and interact in a number of constantly evolving media, digital and communication devices, including the Web (Appel et al., 2010).

When Web History Meets STS Concepts: The Example of ‘Boundary Objects’

Concepts such as dispositif and mediation prove useful for articulating STS and media studies in a pragmatic approach to the ‘power relations’ inscribed in information and communication tools, as Badouard et al. (2016) conclude. Indeed, the notions of mediation and re-mediation can usefully be applied to the Web and to its archives – which increasingly act as sources for historians. In recent work, we have shown the extent to which mediations and agencies need to be taken into account in order to think of the Web of the past and the archived Web:

Technical and human negotiations at both levels of collection and consultation of the Web archive
76 THE SAGE HANDBOOK OF WEB HISTORY

include many operations: the choices of particular crawl frequencies, depths, domains to be collected, programming of robots, data deduplication processes; the recreation of links and filling of URLs by the access software; the exclusion of specific elements such as advertisements; the creation of platforms and consultation environments offering different designs and functionalities. All these operations bear witness to the ongoing choices that reflect the scope and ambitions established by and for the actors of Web archiving (Schafer et al., 2016).

We can attempt to go further, by showing how several features of born-digital heritage allow us to qualify it as a boundary object, i.e. the concept coined by Susan Leigh Star and James Griesemer (1989) to analytically describe those processes where actors coming from different social worlds, and called upon to cooperate, manage to coordinate despite their diverging points of view: ‘how do they create mutual understandings without losing the diversity of social worlds?’ (Trompette and Vinck, 2009: 6–7). This concept was meant to reduce the inherent asymmetry enshrined in Michel Callon’s original ‘translation’ notion (1986): from a top-down initiative by the innovator or the science entrepreneur, enrolling other actors by exerting control via mandatory passage points, to a more organic view taking into account the co-existence of several translation processes.

This notion is, indeed, to be handled with care – as Susan Leigh Star herself suggests in a subsequent article (2010): since its creation, the concept has mostly been used for its ‘interpretive flexibility’, i.e. the property that allows it to operate as a support of heterogeneous translations, a device of integration for different types of knowledge, of mediation in the processes of coordination of experts and amateurs. However, as Trompette and Vinck (2009) emphasize, other dimensions have been unjustly overlooked or downright forgotten; for example, boundary objects incorporate sets of conventions, standards and norms, typical of specific communities of practice, and allow us to account for the processes of delegation of work or other activities, or for the performative action of artifacts in the production of knowledge. By fully re-instating these dimensions, one is able to account more broadly for the work of coordination, alignment, alliance and translation between the different actors and the worlds they mobilize.

The notion of boundary object, seen in this light, can be applied to the study of born-digital heritage on the Web and beyond the Web, e.g. the discussion lists, newsgroups and websites of the early nineties. For instance, with a view to reconstructing the trajectories of innovation within pioneering user groups, Camille Paloque-Berges (2017) highlights the uses of these technologies in the mid-1990s, and observes that they served a logic of confrontation to social, political and economic norms – such as, respectively, rules of sociability in online public speaking, equipment and techno-scientific development, governance and regulation of networks, and the transition from non-commercial networks to a full-fledged digital economy. Interestingly, the author notes that not only the discussions, but the processes of their archiving facilitated negotiations and cooperation between programmers and pioneer users and within the programming community. Indeed, those processes helped to clarify and make explicit the reconfigurations and re-appropriations of innovation, and contributed to the emergence of the spatial and temporal practical dimensions of innovation (Latzko-Toth, 2010).

As noted previously (Schafer et al., 2016: 13), sufficiently ‘malleable to adapt itself to the local needs and constraints of its different user-types’, capable of existing in different social worlds, all the while satisfying the ‘informational needs’ of each, and being sufficiently robust to maintain a common identity throughout these adaptations (Star and Griesemer, 1989), born-digital heritage carries within itself the challenges of its maintenance, memory, but also of its governance, as we explore in the second part of this chapter.

The prism of boundary objects can be used to investigate several ‘objects’ in Web history – such as the protocols and formats
that have contributed to make the Web what it is today, or animated GIF images. Indeed, taking the issue of formats and particularly of blogs, Ignacio Siles proposes an STS entry into Web history via the notion of closure (Siles, 2011). He observes that practitioners largely consider that weblogs merged other types of sites (diaries, journals) only after 1999 as a result of the emergence of automated blogging software, considered as a defining moment in the early history of blogs. However, Siles remarks, this particular account of the history of blogs neglects other factors that STS perspectives can help to unveil. Examples of this are the conditions under which this Web ‘format’ emerged and how it has merged with other practices and technologies, as well as the ‘shaping’ role of relevant social groups of Web users (diarists, personal publishers), and more broadly the agency of users at large in appropriating both the technical and content dimensions of these sites, which paralleled the development of the software itself. Thus, the author argues:

[b]ecause it involved interactions between several groups of users around a technology and the partial stabilization of its meaning, this case can be conceptualized as an instance of early ‘closure’ (Pinch and Bijker, 1987) [, referring to] the process through which ‘artifacts appear to have fewer problems and become increasingly the dominant form of the technology’ (Kline and Pinch, 1996: 766). Stabilization thus designates the process through which technologies acquire material form and meaning. (Siles, 2011: 738–739)

As for blog formats, the history of the Web provides myriad examples of processes and practices that contribute to attach meaning to artifacts, of different actors’ engagement in the material appropriation of such artifacts, and of their social and technical ‘stabilization’ via ongoing processes of shaping and reshaping. Web history is about the mutual shaping of content and artifacts, developers and users; STS concepts contribute to shed light on the mundane and taken-for-granted practices and discourses that constitute the design, regulation and maintenance of the Web, a system whose crucial importance in our lives today was shaped by its early days in the nineties. Ultimately, these approaches speak to, and unpack, more ‘macro’ issues of politics, power and governance, which will be the subject of the next section.

WEB GOVERNANCE HISTORY THROUGH ‘A STS LENS’

This mutual shaping between artifacts and contents calls for a long-term understanding of the history of the Web and of the shaping of knowledge infrastructures. It also invites us, following Philip Agre, to explore the relation between technical architectures and institutions, particularly the difference between ‘architecture as politics’ and ‘architecture as a substitute for politics’ (Agre, 2003), an argument that is, of course, closely linked to Lawrence Lessig’s famous motto ‘code is law’ and its offspring. Agre argues that technology often comes to us ‘wrapped in histories about politics’, which raises the issue of Web governance.

The notion of Internet governance – since its first widely consensual definition elaborated at the World Summit on the Information Society in Geneva, 2003, and Tunis, 2005 – has led to a number of recent studies seeking to merge it with STS perspectives (e.g. DeNardis, 2014; Epstein et al., 2016). In 2005, the Working Group on Internet Governance defined Internet governance as ‘the development and application by Governments, the private sector and civil society, in their respective roles, of shared principles, norms, rules, decision-making procedures, and programmes that shape the evolution and use of the Internet’, adding that ‘it also includes other significant public policy issues, such as critical Internet resources, the security and safety of the Internet, and developmental aspects and issues pertaining to the use of the Internet’.
This definition is centered on the Internet, not specifically on the Web; furthermore, Web governance has led to a far more limited number of studies that explicitly place this notion at their heart. However, we argue that said definition is relevant for Web governance as well, and can be usefully explored via STS tools, as we will show in the remainder of this section via three cases that speak to three levels of the Web and of its history: the ‘institutional’ governance of the Web as expressed in a wide standardization body such as the W3C; the governance of a Web ‘system’ – as speaking of a mere ‘website’ may now be too reductive – i.e. the foremost online encyclopedia, Wikipedia; and, finally, that of Web archiving, which has in several respects reproduced and prolonged a number of critical issues in Internet governance.

From CERN to the W3C

As previously shown, STS approaches are strongly linked to the sociology of translation, and influenced by the actor–network theory (ANT) (Callon, 1986; Latour, 1987; Law, 1992). By proposing an analysis of the very ‘practical’ ways in which humans and non-humans connect and stabilize, ANT allows an articulate understanding of the interactions leading to the hybrid and unstable associations of ‘the technical’, ‘the political’, ‘society’ and ‘organizations’, and their temporary stabilizations due to ‘translations’. Giving center stage to technical objects in the making, to processes, to the unstable and the transient, ANT considers that the keys to understanding innovation and its politics are the process, the movement and the negotiation, rather than the object or the artifact themselves at any given moment. This framework proves useful to analyze arenas of Internet governance that are, indeed, places of constant and articulate negotiation – such as the World Wide Web Consortium (W3C) – but it also allows us to re-instate the actors of Web history in its socio-technical complexity, showing how the unfolding of this history is most often about alliances and consultations with an increasing number of stakeholders.

The historian can produce an ex post socio-technical analysis of the ways in which choices were made. However, actors have at times taken upon themselves, often in very empirical ways, to undertake a situated socio-technical analysis. As John Law noted, actors become ‘heterogeneous engineers’, able to understand and support the organization of actors and objects – humans and non-humans (Law, 1987). The success of innovation, in this vision, partly relies on innovators’ capacity to interest and enroll other actors around their vision of innovation and their way of articulating it (Callon, 1986).

This approach helps redefine the emergence of the Web, since the work of Tim Berners-Lee and Robert Cailliau, in a specific manner. The emergence of the W3C, further detailed by the ‘Web History in Context’ chapter in this Handbook, can indeed be read according to this particular angle. Let us recall, for example, Tim Berners-Lee’s decision to ask CERN to place his invention in the public domain and his preference for open innovation with equally open structures of governance (Berners-Lee, 2000). At the request of Tim Berners-Lee, CERN agreed to allow free use of Web protocols. The next year, the first international World Wide Web Conference was held at CERN, and the MIT Laboratory for Computer Science (LCS, now CSAIL) became the first host of the World Wide Web Consortium (W3C). In his introduction to Weaving the Web, the head of MIT-LCS, Michael L. Dertouzos, notes that ‘As technologists and entrepreneurs were launching or merging companies to exploit the Web, they seemed fixated on one question: “How can I make the Web mine?”. Meanwhile, Tim Berners-Lee was asking “How can I make the Web yours?”’ (Berners-Lee, 2000: viii); indeed, this citation seems to emphasize Berners-Lee’s capacity to ‘enroll’ around open protocols. Similarly, the establishment of the W3C, and the decision to base it in
three locations, in the United States, in Europe (Griset and Schafer, 2011) and finally at Keio in Japan, further sustains this argument – this capacity of enrollment and translation that originated with Berners-Lee’s newsgroups announcement of the WWW in 1991.

Other examples in Web history provide us with the needed opportunity to study the negotiations that took place within the W3C, but also among actors of a governance that was soon made broader, including, in particular, organized civil society. Janet Abbate (2012: 174–5) demonstrates this with clarity when she describes the controversy that took place at the end of the nineties between the W3C and American activists for civil liberties, on the content filtering system called Platform for Internet Content Selection (PICS). Tim Berners-Lee argued that PICS was a mere tool that allowed users to choose their preferred content, and that Web developers were inspired by ‘a vision of the way in which society needed to be improved [as] nowadays the social can evolve thanks to technology, while earlier on, the only way was to produce laws’ (Harmon, 1998, quoted in Abbate, 2012: 174, our translation). However, civil liberties activists were worried about the power of technologists, summarized under Lessig’s formula ‘Code is Law’ in 1999, a year after The New York Times had dedicated time and space to PICS. As Abbate shows, the PICS system was never widely spread to the general public, and it was replaced ten years or so later by a new filtering system, this time at the service of advertisers seeking to target consumers. Yet it serves as an illustration of Web governance as a complex socio-technical system of systems and it invites an analysis of the plurality and ‘networkedness’ of hybrid devices and arrangements that populate, shape and define the Web (Musiani, 2015).

Other work is reminiscent of this approach, whether it touches upon the general history of the Web and its governance or more precise aspects, for example on the relation we establish to information via search engines. In this regard, Badouard et al. (2014) note that a large majority of links is the prerogative of a small number of websites, and the majority of sites are targeted by only a very small number of links. The authors conclude that this phenomenon implies a specific relation between users and information, which translates, in particular, into a daily imbalance in the agendas of online media, between a minority of over-exposed current news and a majority of confidential subjects. This approach invites historical studies of hyperlinks, such as those of Ian Milligan (2014) and Anne Helmond (2013) – and shows the interest in making these studies STS-informed and revealing of these imbalances and their evolutions.

Similarly, current research on Web tracking appears to fuel both media studies and STS, looking at Web history as an ongoing, technical and human negotiation and hybridization. Lerner et al. (2016) used the tool TrackingExcavator to conduct the most extensive longitudinal study of the third-party web tracking ecosystem to date, retrospectively from 1996 to the present; they argue that:

understanding the trends in the web tracking ecosystem over time – provided for the first time at this scale by our work – is important to future discussions surrounding web tracking, both technical and political. Beyond web tracking, there are many questions about the history and evolution of the web (Lerner et al., 2016).

This study, as those previously mentioned, will certainly contribute toward shedding further light onto the performative function of Web arrangements and the invisibility, pervasiveness and agency of infrastructure that shape the Web. The next section examines how these phenomena can be analyzed at the scale of a single website, such as Wikipedia.

From Wikipedia Governance to Algorithmic Governmentality

Pierre-Carl Langlais’ work (2015) on the emergence of a Wikipedian normativity during the 2000s suggests governance issues
can be understood and analyzed via STS tools. Langlais builds his work on Dominique Cardon and Julien Levrel’s (2009) framework analyzing ‘participatory vigilance’ – a notion developed through their interest in the mechanisms of norm integration in the francophone Wikipedia. He remarks that, until the end of 2003, the issue of references was not present in the foremost online encyclopedia. Indeed, the processes of modification and discussion were prevalent before the situation evolved in the mid-2000s. The 2004 recommendation ‘cite your sources’ is arguably born out of the important augmentation in readership, increasing tensions, and multiple ‘editing wars’. Analyzing the process through which a norm is imposed – which entails heavy discussions on the initial obligation it signifies – Langlais explains:

In early 2007, the normative exigence of reference citation has reached its maturity […] Actually, all is left to do. Non-referenced contributions keep on coming […] Created in January 2007, the References Project initially gathers fifteen or so contributors. The emphasis is placed on the evangelization of the community […] Several practical tools are created to facilitate this evangelization process, including a number of banners signaling the lack or the necessity of a reference. Parallel to this external diffusion work, the References Project promoters develop an important clarification work. The first discussions center on suggestions of tools, and on proposals of rules to adopt. This strong coalescence of the technical sublayer and human organization appears as consubstantial of the wiki system. (2015: 85–6, our translation)

Prominent Internet historian Janet Abbate (2012: 171) argues that since its early stages, the network of networks has been about the merging of infrastructure and culture, which has constituted hybrid agencies, complex interactions between machine- and human-originated agencies in the creation of content and the elaboration of communication practices (ibid.: 176). In this light, we can conceive of Wikipedia as a nexus between technical and human infrastructures, and examine the ways in which contributions and collective, individual behaviors and tools intersect, the different layers of human and technical agency at work. Indeed, increasingly often, robots are undertaking correction and signaling tasks, and the Recent Change patrol, looking for vandalism episodes, benefits from a number of automatic tools to supervise new modifications. As Dominique Cardon emphasizes: ‘The epistemic rationale guiding a large majority of behaviors on Wikipedia must not be looked for in people, nor in the interface, but in the mutual adjustments allowing people to interact on the wiki’ (2015: 19, our translation).

The example of Wikipedia shows how digital environments raise multiple questions on how much autonomy users have, when faced with automatized decision-making. More broadly, architectures, programs, computer codes, algorithms, interfaces and recommendation systems influence and co-shape the possible choices and decision-making capacities of users. In which ways do digital environments introduce visible or invisible elements that are susceptible to influencing participation and decision? Indeed, Web history would greatly benefit from a diachronic study of environments, platforms and Web architectures exploring the evolutions that took place via a strong integration of hardware and software, of material and informational infrastructures, from the DIY of the nineties to the development of Content Management Systems (see, in this volume, the ‘Web History in Context’ chapter).

Code, software, bots or hardware are no longer conceived as ‘mere’ technical devices but as actors themselves, and the result of human decisions at once – in short, as socio-technical constructions. In this view, an analytical examination of the place of robots, of interfaces, of algorithms or infrastructures (Musiani and Schafer, 2011; Star, 1999) is no longer an ‘internalist’ approach consisting in the exploration of ‘black boxes’ and an ‘already-made’ assessment of technology’s capacity to constrain human decision. Instead, it becomes the analysis of a social, political, technical and economic co-construction, to
be conducted all along the digital ‘chain’, from developers to users, exploring technical architectures, upper and lower layers, design, applications, aggregation techniques and algorithms (Barocas et al., 2013; Rieder, 2012), and platforms (Gillespie, 2010).

In this regard, the notion of governmentality seems appropriate as the cornerstone of a theoretical framework that enables us to think of the ways in which technical devices tend to co-structure the range of possible actions (Foucault, 2004). Within the more specific field of digital technologies, the notion of ‘algorithmic governmentality’ (Rouvroy and Berns, 2013) invites us to think how the digital ‘makes’ Internet users do things, and simultaneously, highlights its empowering dimension. In digital environments, different possibilities and multiple constraints merge and simultaneously express themselves in the development of action (Badouard et al., 2016). This fusion of what is possible and what constrains can also be found in the constant negotiations taking place in the field of Web archiving and, in the historian’s eyes, introduce further arenas of ‘mediation’ within the Web of the past.

Web Archiving as a Microcosm of Internet Governance

In 1980, Langdon Winner’s seminal paper asked: ‘Do artefacts have politics?’ By ‘politics’, Winner meant the ‘arrangements of power and authority in human associations as well as the activities that take place within those arrangements’ (Winner, 1980: 123). Applying this hypothesis to the study of Web archiving means to study how its distributed, diffused and technology-embedded nature (DeNardis, 2014: 8) ‘can embody specific forms of power and authority’. Observing infrastructures, Web archives’ design and their stakeholders’ ‘coming together’ entails looking into the scripts (Akrich, 1992) that perform role-sharing, the distribution of competencies and some delegation of rule enforcement to algorithms and automated devices. As noted by Star (1999: 337–9): ‘[…] Study an information system and neglect its standards, wires, and settings, and you miss equally essential aspects of aesthetics, justice, and change’.

This ‘relational’ approach entails important changes in methods, as STS-informed fieldwork can go as far as including the arenas where the shaping of infrastructures and architectures are observed, deconstructed and reconstructed. Arenas where political decisions – explicitly so, or de facto – are taken concerning the code, the technical norms, the ‘tinkering’ and reconfigurations of technical objects (Star and Bowker, 2006: 151–2). In recent work (Schafer et al., 2016) we have set out to apply this perspective to the study of Web archiving, thus seeking to respond to Tim Hitchcock’s interrogation:

Where I end up is seriously jealous of the possibilities; and seriously wondering what the ‘object of study’ might be. In the nature of an archive, the UK Web Archive imagines itself as an ‘object of study’; created in the service of an imaginary scholar. The question it raises is how do we turn something we really can’t understand, cannot really capture as an object of study, to serious purpose? How do we think at one and the same time of the web as alive and dead, as code, text, and image – all in dynamic conversation one with the other. And even if we can hold all that at once, what is it we are asking? (Hitchcock, 2015)

By opening the black boxes of Web archiving (Musiani and Schafer, 2015) and by observing the processes through which the Web of the past is ‘negotiated’ in several arenas of formal and informal governance today (Schafer et al., 2016), we have been able to demonstrate how Web archiving, as several other arenas in the past and the current Internet, relies upon a multi-stakeholder model of governance.

Indeed, the dialectic between different practices and sources of normativity – concurring or complementary – that can be found in Internet governance may as well be found in Web archiving. The initial ‘anarchistic’ and
flamboyant elements of the early Internet, for example, meet their match in the Archive Team’s ironic motto (‘We are going to rescue your shit!’), which has indeed opened up the way to some iconic rescues, such as Geocities from the shutdown by Yahoo!. Twitter and Facebook’s collection and capture of private data for archiving purposes is a testament to the component of Web archiving governance that is dependent upon the role of the private sector. National and international institutions occupy a central role, as illustrated by projects such as the French legal deposit and by entities such as the UNESCO Charter on Digital heritage in 2003, the Internet Archive and the International Internet Preservation Consortium (IIPC) – with nuances ranging from the ‘international’ as the sum of national initiatives, to the ‘transnational’ approaches. Finally, the role of standardization bodies and technical discussions that occupy such a prominent place in Internet governance, like the W3C and the IETF, can also be found in Web archiving with the IIPC meetings.

We have also shown how the notion of co-construction has found its way into Web archiving, where the main categories of Internet governance actors may be found – as well as their tensions. It is the case, for example, of controversies between the common good and proprietary formats, and between different imaginaries of the Internet and the Web. In this regard, the missions established by, or delegated to, Web archiving organizations are interesting to observe, as are the geopolitical and political tensions that contribute to shaping and reshaping them. Brewster Kahle’s September 2014 and November 2016 appeals, respectively (Kahle, 2014, 2016), show how a reflection on the perimeter and mission of his project took place when he raised public attention to China’s blocking of the Internet Archive:

China started blocking the Internet Archive again a couple of months ago, we believe, because they do not like our open access policies. In this way, we have started to understand the power in the hands of the Internet service providers. Let’s keep our access to Internet sites ‘Neutral’ and not at the discretion of companies and governments. (Kahle, 2014)

And when he appealed for funding to create a complete copy of the Archive’s digital collections in Canada after the November 2016 elections in the United States:

We are building the Internet Archive of Canada because, to quote our friends at LOCKSS, ‘lots of copies keep stuff safe.’ […] On November 9th in America, we woke up to a new administration promising radical change. […] It means preparing for a Web that may face greater restrictions. (Kahle, 2016)

As shown in Schafer et al. (2016), Web archiving reactivates the same polarizations, negotiations and dynamics between actors which had emerged at the time of Internet governance’s birth – and it mirrors the fact that the present-day digital world is still developing unevenly.

CONCLUSION

This chapter has made an argument for – and illustrated by means of several examples – the suitability and usefulness of an STS-informed perspective on the history of the Web and Web archives. We have looked back to the first ‘encounters’ between STS and ICTs, and then, with the Internet in particular, and with other disciplines that have sought to investigate the ‘media’ potential of such tools, the chapter has come to examine how STS concepts, in particular that of boundary object in its multiple facets, may be of use in shedding light on Web history and, in particular, on its governance.

Indeed, as the second part of this chapter has shown, ‘tackling the macro questions of politics and power related to IG requires unpacking the micro practices of governance as mechanisms of distributed, semi-formal or reflexive coordination, private ordering, and use of internet resources’ (Epstein et al., 2016). STS can shed light on how, in the wide
and vast march of history producing seemingly stable arrangements, small ‘histories’ of taken-for-granted, mundane activities of design and use contribute to doing and making the Web and its governance.

Incorporating into the analysis of Web history(ies) the intertwining of technical and political governance, the ever-changing games of alliances and power balances between very different actors, the visions, imaginaries and social worlds they each bring to the table, the agency of non-human actors and infrastructures as loci of mediation… these STS sensibilities enable us to connect the micro actions of individuals and the affordance of particular technical artifacts with emergent attributes of large, complex systems (ibid., 2016). As such, they add another layer of understanding and appreciation of how the Web of our (recent) past was negotiated – and still is.

Acknowledgments

This study has been carried out within the framework of the Web90 project supported by the French National Research Agency (ANR-14-CE29-0012-01).

REFERENCES

Abbate, J. (2012) ‘L’histoire de l’Internet au prisme des STS’, Le Temps des Médias, 18: 170–80.
Agre, P. (2003) ‘Peer-to-peer and the promise of Internet equality’, Communications of the ACM, 46(2): 39–42.
Akrich, M. (1992) ‘The de-scription of technical objects’, in Bijker, Wiebe & Law, John (eds.), Shaping Technology/Building Society. Studies in Sociotechnical Change. Cambridge, MA: The MIT Press, pp. 205–24.
Appel, V., Boulanger, H. and Massou, L. (eds., 2010) Les Dispositifs d’Information et de Communication. Concepts, Usages et Objets. Bruxelles: De Boeck.
Badouard, R., Mabi, C. and Sire, G. (2014) ‘Numérique et gouvernementalité. Les technologies entre possibles et contraintes’, SFSIC Annual Conference 2014, https://sfsic2014.sciencesconf.org/30651/document
Badouard, R., Mabi, C., Mattozzi, A., Schubert, C., Sire, G. and Sørensen, E. (2016) ‘STS and media studies: Alternative paths in different countries’, Tecnoscienza, 7(1): 109–28.
Balbi, G., Delfanti, A. and Magaudda, P. (2016) ‘Digital circulation: Media, materiality, infrastructures. An introduction’, Tecnoscienza, 7(1): 7–16.
Barocas, S., Hood, S. and Ziewitz, M. (2013) ‘Governing Algorithms: A Provocation Piece’, Discussion Paper for the Governing Algorithms conference, NYU, May 16–7, 2013, http://ssrn.com/abstract=2245322
Berners-Lee, T. (2000) Weaving the Web. New York: Harper-Business.
Boczkowski, P. and Lievrouw, L. (2007) ‘Bridging STS and communication studies: Scholarship on media and information technologies’, in Hackett, Edward, Amsterdamska, Olga, Lynch, Michael and Wajcman, Judy (eds.), The Handbook of Science and Technology Studies, third edition. Cambridge, MA: The MIT Press, pp. 949–77.
Callon, M. (1986) ‘Some elements of a sociology of translation: Domestication of the scallops and the fishermen of St Brieuc Bay’, in Law, John (ed.), Power, Action and Belief: A New Sociology of Knowledge? London: Routledge, pp. 196–223.
Cardon, D. (2015) ‘Surveiller sans punir’, in Barbe, Lionel, Merzeau, Louise and Schafer, Valérie (eds.), Wikipedia, Objet Scientifique non Identifié. Nanterre: Presses Universitaires de Paris Ouest, pp. 15–39.
Cardon, D. and Levrel, J. (2009) ‘La vigilance participative. Une interprétation de la gouvernance de Wikipédia’, Réseaux, 2(154): 51–89.
de Fornel, M. (1994) ‘Le cadre interactionnel de l’échange visiophonique’, Réseaux, 64: 107–32.
DeNardis, L. (2014) The Global War for Internet Governance. New Haven: Yale University Press.
Epstein, D., Katzenbach, C. and Musiani, F. (2016) ‘Doing internet governance: Practices, controversies, infrastructures, and
84 THE SAGE HANDBOOK OF WEB HISTORY

institutions’, Internet Policy Review, 5(3): DOI: 10.14763/2016.3.435
Foucault, M. (2004) Sécurité, Territoire, Population, Cours au Collège de France 1977–1978. Paris: Seuil.
Gillespie, T. (2010) ‘The politics of “platforms”’, New Media & Society, 12(3): 347–64.
Gillespie, T., Boczkowski, P. and Foot, K. (eds., 2014) Media Technologies: Essays on Communication, Materiality and Society. Cambridge, MA: The MIT Press.
Gitelman, L. (ed., 2013) Raw Data is an Oxymoron. Cambridge, MA: The MIT Press.
Griset, P. and Schafer, V. (2011) ‘Hosting the World Wide Web Consortium for Europe: From CERN to INRIA’, History and Technology, 27(3): 353–70.
Harmon, A. (19 January 1998) ‘Technology to Let Engineers Filter the Web and Judge Content’, The New York Times.
Helmond, A. (2013) ‘The algorithmization of the hyperlink’, Computational Culture. http://computationalculture.net/article/the-algorithmization-of-the-hyperlink
Hitchcock, T. (2015) ‘The UK Web archive, born-digital sources, and rethinking the future of research’, Web Archives for Historians blog, http://webarchivehistorians.org/tag/tim-hitchcock/
Hondros, J. (ed., 2015) ‘The Internet and the material turn. Special issue’, Westminster Papers in Communication and Culture, 10(1): http://www.westminsterpapers.org/27/volume/10/issue/1/
Kahle, B. (2014) ‘Please Help Protect Net Neutrality’, Internet Archive Blog, https://blog.archive.org/2014/09/10/please-help-protect-net-neutrality/
Kahle, B. (2016) ‘Help Us Keep the Archive Free, Accessible, and Reader Private’, Internet Archive Blog, https://blog.archive.org/2016/11/29/help-us-keep-the-archive-free-accessible-and-private/
Kline, R. and Pinch, T. (1996) ‘Users as agents of technological change: The social construction of the automobile in the rural United States’, Technology and Culture, 37(4): 763–95.
Langlais, P.-C. (2015) ‘{{Référence nécessaire}} L’émergence d’une norme wikipédienne (2003–2009)’, in Barbe, Lionel, Merzeau, Louise and Schafer, Valérie (eds.), Wikipedia, Objet Scientifique non Identifié. Nanterre: Presses Universitaires de Paris Ouest, pp. 77–90, http://books.openedition.org/pupo/4106
Latour, B. (1987) Science in Action: How to Follow Scientists and Engineers Through Society. Cambridge, MA: Harvard University Press.
Latzko-Toth, G. (2010) La co-construction d’un dispositif socio-technique de communication: Le cas de l’Internet Relay Chat. PhD dissertation, Université du Québec à Montréal.
Law, J. (1987) ‘Technology and heterogeneous engineering: The case of Portuguese expansion’, in Bijker, Wiebe E., Hughes, Thomas P. and Pinch, Trevor (eds.), The Social Construction of Technological Systems. New Directions in the Sociology and History of Technology. Cambridge, MA: The MIT Press, pp. 111–34.
Law, J. (1992) ‘Notes on the theory of actor-network: Ordering, strategy and heterogeneity’, Systems Practice, 5(4): 379–93.
Lerner, A., Kornfeld Simpson, A., Kohno, T. and Roesner, F. (2016) ‘Internet Jones and the Raiders of the Lost Trackers: An Archaeological Study of Web Tracking from 1996 to 2016’, Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), August 2016, https://trackingexcavator.cs.washington.edu/InternetJonesAndTheRaidersOfTheLostTrackers.pdf
Lievrouw, L.A. (2014) ‘Materiality and media in communication and technology studies: An unfinished project’, in Gillespie, Tarleton, Boczkowski, Pablo and Foot, Kirsten (eds.), Media Technologies: Essays on Communication, Materiality and Society. Cambridge: The MIT Press, pp. 21–52.
Malcolm, J. (2008) Multi-Stakeholder Governance and the Internet Governance Forum. Wembley, WA: Terminus Press.
Marres, N. and Gerlitz, C. (2015) ‘Interface methods: Renegotiating relations between digital social research, STS and sociology’, The Sociological Review, 64(1): 21–46.
Milligan, I. (2014) ‘Extracting links from Geocities and throwing them at the wall’, Blog Digital History, Web Archives, and Contemporary History, https://ianmilligan.ca/2014/07/23/extracting-links-from-geocities-and-throwing-them-at-the-wall/
Muniesa, F., Millo, Y. and Callon, M. (2007) ‘An introduction to market devices’, Sociological Review, 55(2): 1–12.
Musiani, F. (2015) ‘Practice, plurality, performativity and plumbing: Internet governance research meets science and technology studies’, Science, Technology and Human Values, 40(2): 272–86.
Musiani, F. and Schafer, V. (2011) ‘Le modèle Internet en question (années 1970–2010)’, Flux, 85–86(3–4): 62–71.
Musiani, F. and Schafer, V. (2015) ‘Opening the Black Box of Web Archiving: STS Approaches and the Governance of Born-Digital Heritage’, First RESAW Symposium, Web Archives as Scholarly Sources: Issues, Practices, Perspectives, June 8–10, 2015.
Oudshoorn, N. and Pinch, T. (eds., 2003) How Users Matter. The Co-Construction of Users and Technologies. Cambridge, MA: The MIT Press.
Paloque-Berges, C. (2017) ‘Vers des lieux de mémoire réticulaires? Construire un patrimoine de la communication des sciences et des techniques du numérique’, RESET, 6, http://reset.revues.org/839
Parikka, J. (2012) What is Media Archaeology? Cambridge: Polity Press.
Pinch, T.J. and Bijker, W.E. (1987) ‘The social construction of facts and artifacts’, in Bijker, Wiebe E., Hughes, Thomas P. and Pinch, Trevor (eds.), The Social Construction of Technological Systems. New Directions in the Sociology and History of Technology. Cambridge, MA: The MIT Press, pp. 17–50.
Proulx, S. (2009) ‘L’intelligence du grand nombre: La puissance d’agir des contributeurs sur Internet – limites et possibilités’, 7ème colloque du chapitre français de l’ISKO, Intelligence collective et organisation des connaissances, Lyon, 24–26 juin 2009. http://pro.ovh.net/~iskofran/pdf/isko2009/PROULX.pdf
Rieder, B. (2012) ‘What is in PageRank? A historical and conceptual investigation of a recursive status index’, Computational Culture, (2).
Rouvroy, A. and Berns, T. (2013) ‘Gouvernementalité algorithmique et perspectives d’émancipation’, Réseaux, 177(1): 163–96.
Schafer, V., Musiani, F. and Borelli, M. (2016) ‘Negotiating the Web of the past. Web archiving, STS and governance’, French Journal for Media Research, 6, http://frenchjournalformediaresearch.com/lodel/index.php?id=952
Siles, I. (2011) ‘From online filter to web format: Articulating materiality and meaning in the early history of blogs’, Social Studies of Science, 41(5): 737–58.
Star, S.L. (1999) ‘The ethnography of infrastructure’, American Behavioral Scientist, 43(3): 377–91.
Star, S.L. (2010) ‘This is not a boundary object: Reflections on the origin of a concept’, Science, Technology & Human Values, 35(5): 601–17.
Star, S.L. and Bowker, G. (2006) ‘How to Infrastructure’, in Lievrouw, Leah A. (ed.), Handbook of New Media. London: Sage, pp. 151–62.
Star, S.L. and Griesemer, J. (1989) ‘Institutional ecology, “translations” and boundary objects: Amateurs and professionals in Berkeley’s Museum of Vertebrate Zoology, 1907–39’, Social Studies of Science, 19(3): 387–420.
Trompette, P. and Vinck, D. (eds., 2009) ‘Retour sur la notion d’objet-frontière’, Revue d’Anthropologie des Connaissances, 3(1): 5–27.
Wellman, B. (2004) ‘The three ages of Internet studies: Ten, five and zero years ago’, New Media & Society, 6(1): 123–9.
Winner, L. (1980) ‘Do artifacts have politics?’, Daedalus, 109(1): 121–36.
7
Theorizing the Uses of the Web
Ralph Schroeder

INTRODUCTION

The web has been with us for more than a quarter century. It has clearly become a major medium, but it is equally clear that how the web is used does not fit existing theories of mass or interpersonal media. Apart from media and communication studies, the main discipline that might be expected to provide insights into the web is information science. But information science has a rather limited perspective on the web, and hardly deals with how people seek information in everyday life. This chapter will discuss various approaches to how the web is used and point the way forward by developing a theoretical framework. It may seem futile to try to fit the uses of the web into an overarching framework since they are rapidly changing, but a case can also be made for the opposite view: that we can only understand the social implications of the web if we start to pin down how the web fits into people’s overall information and communication uses. And, in the context of this Handbook, unless we know how the web is used, it is difficult to understand other questions about the web, such as how the web should be archived, or how we should think about its history, or tackle various policy questions.

This chapter will give an overview of various perspectives on web uses which can be seen as elements that need to come together for an overall framework for understanding their implications. It will begin by discussing various disciplinary approaches to the web. Next, it will review what is known about everyday information practices related to the web. In the following section, we can take the concrete example of Wikipedia, one of the most well-known and important sources of online information. At that point, we can turn to the shape of the web, its scale, scope, and interconnectedness. Finally, against this background, it will be possible to examine the web as a new information infrastructure. The conclusion can then point to the implications of the framework that has been developed, for policy, ethics, and future web scholarship.
THEORIZING THE USES OF THE WEB 87

DISCIPLINARITY

The reason why people’s information-seeking behaviours are poorly understood is partly due to the legacies of disciplinary specialization. Media and communication scholars have studied email and more recently social media, but they are only beginning to study the web as such. Information scientists have been mainly interested in how researchers and students seek information, but not the population-at-large. And political communication researchers have been concerned with whether digital media democratize or enable greater control – but less so with how people seek political information. The web also has features which set it apart from other media: for example, user-generated content is a major portion of web content, but where does it fit into theories of traditional mass or interpersonal media?

Information seeking has so far been poorly theorized in the social sciences (Sonnenwald, 2016) and there has been scant research on the role of information seeking in everyday life (Rieh, 2004; Savolainen, 2008; Aspray and Hayes, 2011). Information can be defined as codified accounts that can answer questions such as ‘who’, ‘what’, ‘where’, and ‘why’; or, as a cognitive input that makes a difference to the person’s relation to the physical and social environment (a ‘cybernetic’ definition; see Gleick, 2011). This sets information apart from the more complex ‘knowledge’, which can be seen as the more analytical processing or organization of information on one side (Stehr, 1994; Meyer and Schroeder, 2015 for scientific knowledge) and the simpler or raw ‘data’ (Schroeder, 2016) on the other. Information can also be distinguished from communication: information is one way, communication two way – whether one-to-one (interpersonal) or one-to-many (mass). This also makes it possible to distinguish the web from other sources; the web is primarily a one-way online source of information. One complication is that the web allows sharing, via sending links, for example (for sharing news, see Kümpel et al., 2015). Still, these are first sought – and then shared or communicated. There are other complications and clarifications to be discussed below, but the idea of ‘seeking’ information – one way – will do for now.

The problem that information seeking falls between disciplines will only become more acute over time since, increasingly, most forms of accessing any content, including the consumption of news and entertainment, will take place online. However, it may become necessary to distinguish between when people go online to actively seek information as opposed to when they passively consume the pre-packaged sources of news and entertainment with which they are provided in mass media. Another distinction that can be made is between serious and non-serious information, a distinction that can be found (though it is not developed) in Savolainen (2008). Serious information relates to needs, or the practical means to develop one’s capabilities (Sen, 2009), as opposed to wants, which are not required for capabilities but rather for leisure or consumption.

There is a further requirement, related to individuals’ access to and ability to use serious information, which needs to be part of a theory of information, which is that information (like other media) must be reliable, open, and diverse. If it is not, information could be misleading, asymmetric in terms of access (because of paywalls and the like), and skewed towards being dominated by certain groups. Yet as long as there is such a rich and inclusive environment, the need for serious information can be met, which will also enable social scientists to identify the types of information that are central to social change and which types are less so (and in the latter case, more a matter of documentation or academic specialisms rather than part of the role of the web and other media as integral to social transformation). Clearly, identifying information use that is integral to social change in this way is a contentious issue. Yet making such distinctions is part of any social
science of media, just as it is a (somewhat different) task for historians, including historians of the web or those who use the web as a resource for analysing history.

To analyse information seeking on the web, a number of disciplines – media and communications, sociology, information science, and some areas within computer science – can be drawn upon, but there is no single discipline which has done so in a unified way to date. Search engine companies and marketing companies, of course, have lots of knowledge about ‘user behaviour’ in information seeking, though this knowledge hardly finds its way into academia. Yet the main purpose of their knowledge of user behaviour is to tailor information more precisely to certain users or groups and to target them, which can be seen as a more powerful or refined form of mass communication. Still, there are some emerging areas of research, Web Studies (Brügger, 2012) and Web Science (a computer science, ACM, conference, see http://www.websci16.org/) that will be helpful. However, these will need to be connected to established social science disciplines and to long-standing social science debates since social scientists are interested in what information people actually routinely search for.

On this topic, there is as yet little research, though Hektor (2001) undertook a pioneering study even before the web was widely used, examining mostly offline sources. Yet Hektor’s study also points to how little is known about information uses from another perspective, because what he examined, via an in-depth study of Swedes’ daily information uses, is how they sought information from various sources, such as libraries, bus timetables, printed encyclopaedias, recipe books, noticeboards, other people (in person), and the media. What this highlights is that the sources of information before the web were not studied as such (but see Aspray and Hayes, 2011); instead ‘the media’ were studied and other (offline) sources of information fell into various other disciplinary niches without being theorized as part of the information infrastructure of society as seen from an everyday user perspective.

One potential objection to treating the web separately must be confronted: why treat the online separately from offline, and why separate the web when it encompasses so much online material yet overlaps with many internet uses such as sharing links in emails, using apps, and much more? One reason is that the boundaries of content are themselves changing: news, for example, could formerly be restricted to traditional print and television and (often) primarily national news sources, but blogs and YouTube videos and many other sources can now also be regarded as news, and they often transcend borders. And even where audiences for content are still mainly defined by nationality and language, perhaps the most interesting new forms of accessing information are cross-border (for example, diasporas) or across language (as when people use tools like Google Translate). The same applies to other categories of content such as advertising, educational videos, and blogs. It makes little sense to examine only a narrow slice of online content when people often browse and stumble upon unexpected sources across boundaries. And while web uses are comparable to uses of offline and other online sources, the web is clearly also a separable entity, and information seeking on the web a separable activity.

There is a final point about disciplinarity that must be mentioned here: there is the emerging specialist discipline of studying the history of the internet and the web. That is the backdrop to – but not the focus of – this chapter. Instead, the focus here is: what do the uses of the web, in everyday life and for seeking information (though ‘seeking’, as we shall see, is a subset of ‘uses’), tell us about how our media habits (again, media = communication + information) are changing? In other words, what place do web uses and information have in the social science of media? The major part of this change is that the web has added information uses to communication uses, making information seeking a much greater part of everyday life. If this argument
is correct, then this analysis of web uses must inform web history: on the analogy with the history of newspapers, why study the history of newspapers – though that is also a separate enterprise – without also looking at the role of those who read them, and what role they played in society in general? Similarly, with the web: what role does it play as part of our uses and sources of information – in all media, or generally? How do the web’s uses among other media vary? Which groups use this source among other sources? Without this context, any web history will be incomplete. All this can be put differently: what kind of social science of media, with the web as one medium, does web history need? Web history may be concerned with lots of individual topics – the history of companies on the web, the history of fandom on the web, the history of literatures on the web, and many more. But a social science of media provides the most encompassing frame for all these (the uses of the web and other media), while also singling out or separating out the most important ones (serious uses), while more detailed uses are allocated their place for more specialist disciplinary treatments (by, say, business historians, cultural historians, literary historians, and the like). As will be readily apparent, this is partly a matter of the division between social science of the media and the various fields of history, which could be drawn differently. But there must be, for social science, a general impact of the web or way in which the web can be seen as resulting in social change, and understanding this impact or change, I would argue, is a precondition for doing web history, even if it does not determine or limit it.

INFORMATION SEEKING IN EVERYDAY LIFE

Seeking online information is still changing, but it has also become so commonplace as to be invisible: we can think here of how often we, or anyone with internet access, ‘googles’ something every day. We can also think of the variety of questions that are ‘googled’: persons, places, schedules, brands, services, scientific and technological novelties, diseases, popular culture references, and much more. Or again, think of how often we say: ‘what did we do before Google?’ Social science has simply not caught up with how to make sense of the changes brought about by the web as a central and new part of our lives. It is in any event a departure from traditional media, which were, in terms of seeking information, constrained by their physical availability. Note that the perspective taken here is that the important aspect for understanding web uses is what people seek and do with the information – not how it is produced. The structure of what is available and how this material is used is central to everyday life, and can be seen as an extension of society’s media infrastructure (which will be discussed below). The question of how information is produced, even if the possibility of producing user-generated content has also become more widespread, is a separate one.

One approach in respect of what is new here might be to argue, as Castells does (2009), that the main change is ‘self-selection’, whereby people choose the online material that best fits their needs. Yet this approach overlooks that there are new gatekeepers even when information is chosen, as with search engines or social media feeds or other constraints on the visibility or accessibility of web pages. And people do not select information from a limitless set of possibilities; there are constraints to what they seek and find, dictated, among other things, by their digital literacy and their routines or habits, and also how these carry over from traditional media. However, it is true that, apart from these gatekeepers, some online resources both extend existing sources and are openly accessible, and so increase what can be ‘selected’.

In any event, the web as an information source is distinct from the information sources of traditional mass and interpersonal media, but it also competes with other digital media
for attention: for example, even if Wikipedia is often among the top results when using a search engine, there are many other online and offline sources that could be used (with and without the use of search engines). Further, online information sources are examples of content being pushed towards users in a targeted way, as when search results prioritize finding Wikipedia articles (or other information) for particular users because the search algorithm has been tailored towards specific users (Pariser, 2011). Finally, these new online sources add to and complement others, but media systems also shape them: in China, the Chinese-language version of Wikipedia has been blocked for certain periods, and a different online encyclopaedia, Baidu Baike, with content controlled by the company Baidu (one of the biggest in China), which is in turn influenced by the government, is dominant on the mainland (Liao, 2009).

A number of methods have been used to understand information behaviours, including interviews (Rieh, 2004; Savolainen, 2008) and focus groups (Hargittai et al., 2012). Focus groups are useful because they can elicit responses people may not be aware of in their own behaviours (information seeking is often of this nature, since people do not consider it a separate activity, even though researchers do). What Hargittai et al. (2012) found in their study of information seeking is that Americans did not think of themselves as overburdened by the flood of diverse types of information; in other words, there was no self-report of ‘information overload’. What people found objectionable was the low quality and repetitive nature of this information – scandals, endless crime reports, and tabloid-type news. Tellingly, however, they nevertheless consumed this information in abundance.

A different approach is quantitative, and here the study of search engine use in Australia (Waller, 2011a) can be highlighted. This study was based on access to commercial data about users, both their search query keywords and demographic information about who searches for what. The study found that the vast bulk of content sought relates to consumer activity and popular culture. Less than 2% of search queries related to ‘serious information’ (recall the distinction made earlier between serious and non-serious information), such as civic, health, and political information. Yet this small percentage still amounts to millions of searches per day in a country with a relatively small population. Another surprising finding was that search queries hardly varied between socio-economic groups. In any event, the fact that such a small proportion of information is ‘serious’ also raises the question whether information ‘seeking’ is appropriate: most uses of the web are for leisure, ‘surfing’ the web, rather than looking for material. This difference is sometimes labelled ‘directed’ as against ‘undirected’ seeking, but speaking of ‘seeking’ as a subset of ‘uses’ may be more appropriate in the context of everyday life.

Another study, also by Waller (2013), provides a ‘bottom-up’ perspective, since a top-down or quantitative perspective misses how information is actually used. To be sure, online information may be reliable and available, but can people actually use it? Waller (2013) gives the example of an immigrant woman in urban Australia seeking information that she needed to obtain from the Red Cross. Because of her situation, she was unable to access any of the information she needed online. Instead, she had to make several arduous trips to find the relevant Red Cross office in person, and then wait in queues to be seen. It took her several days to obtain information that a young person with the requisite educational and linguistic and digital skills would have found online in minutes! This is admittedly an extreme example, but it behoves researchers who have access to devices, online resources and skills, and with the ability to ask others in their networks – the most extreme ‘information haves’ – to think about people who have access neither to the devices or skills, nor to others who could help them, the ‘information-have-less’ or ‘information-have-nots’ (see Qiu, 2009).
It is also worth bringing to mind how access to appropriate – or again, serious – information is increasingly becoming a vital resource in all walks of life; not just access to, but the ability to find and use information, again, can be seen as a question of justice, essential to developing one’s ‘capabilities’ (Sen, 2009).

The advantage of a bottom-up user perspective is that it can challenge conventional wisdom. One further example can suffice: it is often said that China’s censorship is highly effective and that it is particularly aimed at preventing ‘harm’ by strictly and effectively censoring (among other things) pornographic materials. This may be true for some materials, but as Hockx (2015) has shown in his study of ‘internet literature’, it is also highly misleading. First, by way of context, he notes that around 40% of Chinese access online literature (a specific literary genre rather than simply reading books on Kindle-type devices, and nowadays often on mobile apps) – a uniquely high proportion worldwide. This surprising result is itself worth pursuing from a comparative perspective: is seeking out a web-specific form of literature uniquely popular in China, or can similar phenomena be found around the globe? Second, Hockx documents the popularity of erotic fiction, bordering on pornographic, and particularly among women. While the government partly censors this material, and online publishers are working to contain this phenomenon within limits, it is clear that censorship here, as in politics, is widely circumvented, and that online literature represents a source of online material that transgresses or pushes the boundaries of what is acceptable. These widespread ways of seeking material online are not much discussed in top-down analyses of government internet policies, or indeed in the study of literary and cultural tastes. Yet understanding commonplace ways of accessing online materials such as these is highly revealing about Chinese cultural tastes.

From a bottom-up perspective, it is also important to consider the device that people use to access information. For many in high-income countries, and certainly for the majority in lower-income countries like China and India, smartphones are already the main means of engaging with the internet and the web (Donner, 2015). This is a momentous shift, opening up new ‘digital divides’ (Napoli and Obar, 2015): it may not be vastly important whether people contact each other via apps on smartphones, since they also have other means to do so. But for information, and since their smartphones are becoming constant companions and means for mastering ever greater parts of most people’s lives, and as non-digital sources of information disappear – how people access information via smartphones, and what they access, will become an ever more central question in the social sciences. But the experience of the web on smartphones can be quite different from the experience of the web on a PC.

Another example of these differences or divides is the use of the voice interface for search on smartphones (Siri, the assistant on the Apple iPhone, is the most well-known example): how is what people find via voice different from text searches? Only one technical paper (Guy, 2016) on this question exists to date, which finds that search queries are indeed quite different: longer (unexpectedly so) and more like spoken language (expected). But there is much more research to do on this topic. In the United States, more than 50% of younger people already use voice search on a daily basis (Guy, 2016). This way of searching for information will be important for those who mainly or exclusively use smartphones, or more widely among those who simply find it more convenient in certain situations to search via voice. If the results for search via voice differ in important ways from those obtained via text-based search, then it can be anticipated that significant new information divides will open up.

Not just search engine results, but also how information is passed to ‘audiences’ or ‘consumers’ has changed profoundly. That is because the expertise and commercial competition for ‘attracting eyeballs’ in the
92 THE SAGE HANDBOOK OF WEB HISTORY

‘marketplace of attention’ (Webster, 2014) are changing quickly for the web (and for traditional media). One reason that has already been mentioned in passing is that information is increasingly shared via social media. Hence the analytics for measuring attention and visibility will have to measure web links that are shared. This is an exciting new area which has already yielded some interesting and non-obvious findings (for example, Bright, 2016), for example that what is most shared from the BBC news website does not correspond to what is most read in terms of the stories on the main page.

Finally, it might be possible to develop a taxonomy of the main types of information sought (from studies such as Waller (2011a), discussed earlier). That would also make it possible to examine how people search for particular topics. Anderegg and Goldsmith (2014) show, for example, how interest in climate change waxed and waned, and the search terms used in seeking information on a particular issue (‘global warming’ or ‘global warming hoax’) can be instructive. The popularity of topics could be combined with national and global website rankings and most frequently searched keywords. On the qualitative side, this taxonomy could be further refined with individual web browsing histories to provide rich (‘thick’) descriptions of the types of information people look for – and how this information fits into their everyday lives, meets their needs, and is part of their overall information environment of old and new sources and media.

WIKIPEDIA

Against this background, we can turn to some specific examples. Wikipedia is the only non-commercial website in the top ten around the world (http://www.alexa.com/topsites/countries). Much has been written about the content of Wikipedia and how it is edited (Schroeder and Taylor, 2015), but almost nothing is known about what content is read, even though there are computational tools and data sources for finding out (https://en.wikipedia.org/wiki/Wikipedia:Web_statistics_tool). Global differences in accessing Wikipedia content by language and in different countries are bound to be highly revealing. Equally revealing would be to compare Wikipedia with its rival in China, Baidu Baike, which, as mentioned, is more popular than the Chinese-language version of Wikipedia in China because the government has championed it and curbed Wikipedia (Liao, 2009).

Wikipedia, like other openly available online information sources, extends the range of sources of information available: it is more accessible and more comprehensive than other, similar sources, such as offline encyclopaedias. It has also become a widely used resource for a range of topics for many internet users. Yet there are also new gatekeepers: Waller finds that in Australia, 93% of clicks on Wikipedia come via Google, and Google is the dominant search engine in Australia, with more than a 90% share (Waller, 2011b) of search engine uses (Australia is the country for which the most detailed analysis of Wikipedia uses and online information seeking is available). This new information source also creates new divides: some Australians do not have access to the internet or the digital skills to use it even if they do have access.

An interesting exception to how little we know about Wikipedia web audiences are articles about medicine. According to Heilman and West (2015), there are more than 155,000 of these, more than 29,000 of them in English. How often are they read? In 2013, this English content received 2.28 billion (non-mobile) page views, just under half of medical content in all languages on Wikipedia. ‘Medical content accounted for 0.64% (0.029/4.5 million) of all articles on English Wikipedia, yet these received 2.49% (2277/91,252 million) of all English Wikipedia page views’ (Heilman and West, 2015: 7). This makes medical content on
THEORIZING THE USES OF THE WEB 93

Wikipedia the single largest source of online health information in English, followed by the websites of the National Institutes of Health (NIH), WebMD, and the Mayo Clinic. It can be mentioned that over half the editors of medical content pages were health care professionals. And mobile views of English Wikipedia were over 30% in 2014. This kind of audience or readership information tells us a lot about where people find medical information, subject only to knowing more about how this fits into and complements the other online and offline sources they use.

But while Wikipedia provides a source that is widely regarded as reliable for serious matters (though this will continue to be a matter of debate), most of the content that is sought on Wikipedia via the dominant search engine Google concerns popular culture and the like, which arguably falls into the category of leisure rather than ‘serious’ information (Waller, 2011a). But this is also true of search engine uses generally, which are mainly used for leisure and consumer-related information seeking around the world (Segev and Ahituv, 2010). Thus, even if online information seeking, in this case using Wikipedia as a resource, is mainly for topics that are not crucial, it is nevertheless easy to see that some categories of information (such as for health, science, and civic topics) can play an important role in people’s lives, even if they are far less common. Finally, it can be mentioned that Wikipedia, too, can be seen as part of a new infrastructure: Wikipedia is the only non-commercial site in the top ten, but the top websites worldwide (as we shall see) make up a vast proportion of the uses of the web as a whole.

SCALE, SCOPE, AND INTERCONNECTEDNESS OF THE GLOBAL WEB

How global is this new information environment? To start with, we can say with certainty that there are fundamental divides or boundaries. The best example is China, which is often said to be cut off from the global web. Yet this is misleading in some senses. One reason is that, even in this restrictive environment, savvy internet users who wish to get access to information from within and outside the country can for the most part do so, except where online information has been removed altogether. A different way to gauge the global web is to ask: how many companies dominate online attention (Schroeder, 2014; Pan, 2017)? It turns out that, despite the global dominance of Google and Facebook in online advertising (and of Tencent and Baidu in China), media concentration among old and new media is in fact surprisingly varied across the globe (see Noam, 2016). Finally, Google, Facebook, and Amazon are among the top ten in most countries. Yet, even in China, there are dominant websites that closely emulate these three: Baidu the search engine, Alibaba a major retailer, and Tencent with its social network site WeChat.

Web visibility can be measured by several methods: examining top websites globally and nationally, use of keywords in search queries (Google Trends) and, as discussed, trending keywords for particular topics (Anderegg and Goldsmith, 2014). In addition, more specialized tools can be used, for example for Wikipedia article readership (https://en.wikipedia.org/wiki/Wikipedia:Web_statistics_tool). Hyperlink analysis, which has been a common method to gauge web visibility in the past, has been found to be a poor indicator of online visibility and web audiences (Barnett and Park, 2014; Wu and Ackland, 2014; Taneja, 2016). This is because hyperlinks are often an indication of the aims of webmasters, but they can also serve a number of other functions. Other methods include shared website use for the most frequently visited websites, globally or nationally, using Alexa.com (http://www.alexa.com/) rankings (Barnett and Park, 2014).

Among the most advanced methods to measure global (and national) visibility is to measure online attention. This method (using
data from comScore) was used by Taneja and Webster (2016) and is based on two million panellists from more than 170 countries, measured once per month, and includes the top 1,000 web domains and subdomains, which together account for 99% of web user visits. This method is also able to capture different formats and genres (Taneja and Wu, 2014; Wu and Taneja, 2016). Among the findings is that ‘similarity of languages and a common geographical focus of any two websites offer the best explanations of audience overlap between sites’ and ‘the number of hyperlinks between websites explains very little audience overlap’ (Taneja and Webster, 2016: 175). This finding is based on the idea of ‘audience duplication’, whereby the likelihood of a user visiting one site if he or she visits another is higher than chance. These data are then aggregated to establish patterns of audience attention, which can then be correlated with other factors.

The authors (Wu and Taneja, 2016) found that the clusters of visibility and the most popular websites have changed quickly within the space of the last six years: whereas in 2009, a global/US cluster was most central on the web and at the same time the largest, in 2011 it was overtaken by a Chinese cluster, and there was no longer a global/US cluster, but rather in second place was a US/English cluster followed by a global cluster. The same two clusters occupied the top two spots by size in 2013, but the global cluster (of websites that are not language specific, such as Mozilla and Facebook) had slipped to eighth place (India was ninth and Germany tenth), followed by a number of other clusters including sites in Japan and Russia, but also Spanish-language sites and those in Brazil and France.

What we see here is the evolution of the web as it becomes more oriented towards the global South (Spanish-language sites and sites in Brazil and also India). We also see, with time, that websites of ‘global’ status have become fewer in number among the world’s top 1,000 sites, and we see language playing an increasing role. State policies promoting information and communication technologies are one factor here, and shared language another. Whatever the most important factors may turn out to be, the web is not becoming a single whole, but rather a series of clusters: linguistic, and those that develop due to the policies of states and sites promoting shared interests, such as commerce or personal relations.

More work is necessary to situate these findings within theories of patterns of online attention and visibility and information-seeking behaviours. And ‘online attention’ and ‘online visibility’ are of course different. Yet in a sense what people seek and what is most visible to people should amount to the same thing: ‘attention’ and ‘visibility’ should be two sides of the same coin, much as ‘supply’ should meet ‘demand’ in markets. And ultimately different approaches to measuring what information is sought should converge; as with measuring the most visible and most frequently accessed websites and the surveys which ask people how often they seek information about what. These include the national surveys undertaken by the World Internet Project (http://www.worldinternetproject.net/#about), the International Telecommunications Union (http://www.itu.int/en/ITU-D/Statistics/Pages/default.aspx), or the China Internet Network Information Center (http://www.cnnic.net.cn/). Other measures and approaches, as discussed, include focus groups and browsing histories.

INFORMATION INFRASTRUCTURE

Another way to think about the web is as an infrastructure of information. In this respect, the web is part of a larger ‘infrastructure’ or ‘large technological system’ (Hughes, 1987): the internet. The internet encompasses a wider set of functions, including two-way communication, and seeking information is only one part of what people do
online. And online information and communication are only part of a larger infrastructure of information and communication (that includes print, broadcast, and landline telephony). In this respect there is a deepening and broadening of this infrastructure – rather than an information or communication revolution; people’s uses of information have not changed radically since the internet and the web. To make it even more complicated, only part of the internet and web are (public) infrastructures; much of the internet and web are private (think here of ‘intranets’), and so there are large technological systems rather than infrastructures in the strict sense. But it is important not to exaggerate the influence of this new extension of the information infrastructure: the problem with those who talk of an information flood (Gleick, 2011), or measure the radical increase in the amount of information available (which is true: Hilbert and López, 2011), or talk of an information overload, is that even though the supply of information has increased dramatically, there are limits to the amount of online information that is routinely used (Hargittai et al., 2012; Neuman et al., 2012).

Furthermore, only a small proportion of online information use is interesting for social science – search engine optimization and online advertising and shopping may be interesting for marketing and business scholars, but they are of limited significance to media and communication scholars and those concerned with social transformations. Nevertheless, there has clearly been a change, perhaps not on the scale of the print revolution (Eisenstein, 2005), but a change that can be appreciated when we think of the extent to which people use the web on an everyday basis, now also on mobile devices. People have become tethered to information, just as, with email and social media, they have become more tethered to each other (Schroeder, 2010; 2018).

At the same time, both information-seeking behaviours and the infrastructure are still changing. On the seeking side, for example, there are ‘apps’, which need to be downloaded and in this sense are not part of the open web but are still increasingly used in seeking and using information. There are other ‘walled gardens’ within the web or internet infrastructure, reminiscent of an early era of the web when companies and others sought to be ‘portals’ to the web. At the same time, it can be anticipated that information at our fingertips will come to be ‘taken for granted’ (Ling, 2012) as smartphones and other devices become commonplace, and as the online information infrastructure becomes more enveloping. Put differently, people are becoming dependent on the web infrastructure in a similar way to how they rely on electricity and roads today. This makes it imperative to understand how the web extends the information and communication infrastructure – without exaggerating its effects.

CONCLUSION

Over the course of the coming decades, the web will continue to be only one source of information for people among many, but it is arguably becoming the single most important one. This means that there are policy implications, above all about the diversity, accuracy, and openness of information online. More concretely, is the information diverse enough, or do pages that are manipulated to achieve a high page rank, for example, but that are of low quality, dominate the results? Are the sources reliable when it comes, for example, to political or health-related information? Are results in different languages and the results obtained by people with different levels of digital skills and education (Hargittai and Hsieh, 2013) of equal quality? It may be that skills and coverage of different topics in different languages become more important than, say, questions of access to or censorship of online information – though all
these questions will continue to shape the role of the web.

This chapter has discussed a variety of perspectives on uses of the web. It has suggested that an overall framework of understanding web uses must consist of the infrastructure of the web (and how it is part of broader infrastructures of information), of the content of the web – its scale, scope, and how it is interlinked, and how these relate to how the web is actually used in terms of content that is accessed in everyday life – all in the context of how this fits with or extends other media. Such a framework must be based on what web content is most visible and what receives attention, also compared with other offline, online, and traditional information and communication technology sources (Schroeder, 2016). While this is an ambitious undertaking, it is also possible to examine particular areas, such as the uses of Wikipedia, or access to political information, within this framework. And while this chapter has provided the elements of an overall framework, pulling these elements into a full theoretical account of the web still eludes us; that would require a more comprehensive analysis that includes more empirical material about web uses combined with a close theoretical integration of these elements. Still, it is only against this background that it will be possible to answer other larger questions about the web, such as those addressed in this Handbook, but also the ethical or policy questions raised in the previous paragraph.

In any event, for future media historians and sociologists of new media technologies, the web will provide a treasure trove of data. Questions such as ‘what did people want to know in the early 21st century?’ or ‘what kinds of information did the web provide?’ will open up new ways of understanding the past and the present. The availability of web data means that there will be plenty of data about content that people actively seek out and these data are also more granular than data for traditional media: how do people make choices when faced with alternative search engine results? When do they prefer text or visual types of information when both are available? When are they satisfied with what can be displayed on a small smartphone screen as opposed to needing a full screen and better input controls via a full-size keyboard and mouse? And: when do they use the web, as opposed to asking a friend online or offline or consulting printed sources? As information becomes more ubiquitous, complex, and variegated, scholarship has unique opportunities to produce anatomies of these multifaceted types of social behaviour, which, even though they are still evolving, are also becoming more embedded in all aspects of our lives.

Note: earlier versions of portions of this chapter were published previously in chapter 5 of Schroeder (2018).

REFERENCES

Anderegg, W.R., & Goldsmith, G.R. (2014) ‘Public interest in climate change over the past decade and the effects of the “climategate” media event’. Environmental Research Letters, 9(5): 054005.
Aspray, W., & Hayes, B. (eds.) (2011) Everyday Information: The Evolution of Information Seeking in America. Cambridge, MA: MIT Press.
Barnett, G., & Park, H.-W. (2014) ‘Examining the international internet using multiple measures: New methods for measuring the communication base of globalised cyberspace’. Quality and Quantity, 48: 563–75.
Bright, J. (2016) ‘The social news gap: How news reading and news sharing diverge’. Journal of Communication, 66(3): 343–65.
Brügger, N. (2012) ‘Web historiography and internet studies: Challenges and perspectives’. New Media & Society, doi: 1461444812462852.
Castells, M. (2009) Communication Power. Oxford, UK: Oxford University Press.
Donner, J. (2015) After Access: Inclusion, Development, and a More Mobile Internet. Cambridge, MA: MIT Press.
Eisenstein, E.L. (2005) The Printing Revolution in Early Modern Europe. Cambridge: Cambridge University Press.
Gleick, J. (2011) The Information. London: Harper Collins.
Guy, I. (2016) ‘Searching by talking: Analysis of voice queries on mobile web search’. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, ACM, pp. 35–44.
Hargittai, E., & Hsieh, Y.P. (2013) ‘Digital inequality’. In W. Dutton (Ed.), Oxford Handbook of Internet studies. Oxford, UK: Oxford University Press, pp. 129–50.
Hargittai, E., Neuman, W.R., & Curry, O. (2012) ‘Taming the information tide: Perceptions of information overload in the American home’. The Information Society, 28(3): 161–73.
Heilman, J., & West, A. (2015) ‘Wikipedia and medicine: Quantifying readership, editors, and the significance of natural language’. Journal of Medical Internet Research, 17(3): e62.
Hektor, A. (2001) What’s the use?: Internet and information behavior in everyday life. PhD Thesis, Linkoeping University.
Hilbert, M., & López, P. (2011) ‘The world’s technological capacity to store, communicate, and compute information’. Science, 332(6025): 60–5.
Hockx, M. (2015) Internet Literature in China. New York, NY: Columbia University Press.
Hughes, T. (1987) ‘The evolution of large technological systems’. In W. Bijker, T. Hughes, and T. Pinch (Eds.), The Social Construction of Technological Systems. Cambridge, MA: MIT Press, pp. 51–82.
Kümpel, A.S., Karnowski, V., & Keyling, T. (2015) ‘News sharing in social media: A review of current research on news sharing users, content, and networks’. Social Media + Society, 1(2): doi: 2056305115610141.
Liao, H.-T. (2009) ‘Conflict and consensus in the Chinese version of Wikipedia’. IEEE Technology and Society Magazine, 28(2): 49–56.
Ling, R. (2012) Taken for Grantedness: The Embedding of Mobile Communication into Society. Cambridge, MA: MIT Press.
Meyer, E.T., & Schroeder, R. (2015) Knowledge Machines: Digital Transformations of the Sciences and Humanities. Cambridge, MA: MIT Press.
Napoli, P., & Obar, J. (2015) ‘The emerging mobile internet underclass: A critique of mobile internet access’. The Information Society: An International Journal, 30(5): 323–4.
Neuman, W.R., Park, Y.J., & Panek, E. (2012) ‘Info capacity: Tracking the flow of information into the home: An empirical assessment of the digital revolution in the US from 1960–2005’. International Journal of Communication, 6: 20.
Noam, E. (2016) Who Owns the World’s Media?: Media Concentration and Ownership around the World. New York: Oxford University Press.
Qiu, J. (2009) Working-Class Network Society: Communication Technology and Information Have-Less in Urban China. Cambridge, MA: MIT Press.
Pan, J. (2017) ‘How Market Dynamics of Domestic and Foreign Social Media Firms Shape Strategies of Internet Censorship’. Problems of Post-Communism, 64(3-4): 167–88.
Pariser, E. (2011) The Filter Bubble: What the Internet is Hiding from You. Harmondsworth: Penguin.
Rieh, S.Y. (2004) ‘On the Web at home: Information seeking and Web searching in the home environment’. Journal of the American Society for Information Science and Technology, 55(8): 743–53.
Savolainen, R. (2008) Everyday Information Practices: A Social Phenomenological Perspective. Lanham, MD: Scarecrow Press.
Schroeder, R. (2010) ‘Mobile phones and the inexorable advance of multimodal connectedness’. New Media and Society, 12(1): 75–90.
Schroeder, R. (2014) ‘Does Google shape what we know?’, Prometheus: Critical Studies in Innovation, 32(2): 145–60.
Schroeder, R. (2016) ‘Big Data and Communication Research’, Oxford Research Encyclopedia of Communication, http://communication.oxfordre.com/
Schroeder, R. (2018) Social Theory after the Internet: Media, Technology, and Globalization. London: UCL Press.
Schroeder, R., & Taylor, L. (2015) ‘Big data and Wikipedia research: Social science knowledge across disciplinary divides’. Information, Communication and Society, 18(9): 1039–56.
Segev, E., & Ahituv, N. (2010) ‘Popular searches in Google and Yahoo!: A “Digital Divide” in information uses?’. The Information Society, 26(1): 17–37.
Sen, A. (2009) The Idea of Justice. London: Allen Lane.
Sonnenwald, D. (ed.) (2016) Theory Development in the Information Sciences. Austin: University of Texas Press.
Stehr, N. (1994) Knowledge Societies. London: Sage.
Taneja, H. (2016) ‘Mapping an audience-centric World Wide Web: A departure from hyperlink analysis’. New Media and Society, http://doi.org/10.1177/1461444816642172
Taneja, H., & Webster, J.G. (2016) ‘How do global audiences take shape? The role of institutions and culture in shaping patterns of web use’. Journal of Communication, 66(2): 161–82.
Taneja, H., & Wu, A.X. (2014) ‘Does the Great Firewall really isolate the Chinese? Integrating access blockage with cultural factors to explain web user behavior’. The Information Society, 30(5): 297–309.
Waller, V. (2011a) ‘Not just information: Who searches for what on the search engine Google?’. Journal of the American Society for Information Science and Technology, 62(4): 761–75.
Waller, V. (2011b) ‘The search queries that took Australian internet users to Wikipedia’. Information Research, 16(2), online at http://search.ebscohost.com/login.aspx?direct=true&db=lxh&AN=62852994&site=ehost-live
Waller, V. (2013) ‘Diverse everyday information practices in Australian households’. Library and Information Research, 37(115): 58–79.
Webster, J.G. (2014) The Marketplace of Attention: How Audiences Take Shape in a Digital Age. Cambridge, MA: MIT Press.
Wu, A.X., & Taneja, H. (2016) ‘Reimagining Internet geographies: A user-centric ethnological mapping of the World Wide Web’. Journal of Computer-Mediated Communication, 21(3): 230–46.
Wu, L., & Ackland, R. (2014) ‘How Web 1.0 fails: The mismatch between hyperlinks and clickstreams’. Social Network Analysis and Mining, 4(1): 1–7.
8
Ethical Considerations for Web Archives and Web History Research
Stine Lomborg

INTRODUCTION: ARCHIVES, EVERYWHERE

Digital archives are everywhere. They are repositories of digital communications generated through human activity, different in size and scope, and created by a diverse range of actors – public institutions, commercial actors such as digital media insights companies, researchers and private citizens, to name just a few (Beer and Burrows, 2013). Digital archives serve to fixate and order data that might otherwise quickly vanish or change. Digitalization has enabled a dramatic upsurge in the sheer number and kinds of archives available, owing to the relative ease of archiving, storing and maintaining digital records. Hence, a wealth of information which was not previously saved and organized in databases for later retrieval is now being archived, from entire company intranets and websites to posts on debate forums or social media, to personal profile data and metadata such as time stamps, location, click-through patterns etc. At the same time, the analogue materials of pre-digital archives are digitized and included in cultural heritage archives such as the Europeana collections of European art and mass media content (Europeana.eu).

Web archives constitute a specific type of digital archive, comprising segments of the worldwide web – snapshots of specific sites, collections of communication about a specific event or the entire WWW in the moment and across time. Web archives are typically created using software to ‘crawl’ a portion of the publicly accessible web (e.g. company websites, but not intranets; open Facebook pages, but not private profiles). The process of collecting the archive is automated and unobtrusive (Lomborg, 2012b; Webb et al., 2000), yet the perimeters are predefined by the archiving actor (Brügger, 2011, 2017) (see also Brügger, this volume). The archived data may include different communication modalities (text, images, video, audio), links, metadata etc. While web archives do not
comprise a 1:1 copy of what was actually Old sources may present detailed accounts
on the live web at a given point in time, they constitute web corpora in a form that makes for a useful source for researching the history of the web itself as well as for studying aspects of late modern cultural history since the 1990s and up until today. Web archives may be constructed by the researcher for a specific project, for instance by harvesting data through APIs of specific sites and services (e.g. archives of digital memorial websites as data on practices of online grieving, or the development of specific social networks or political debates on social media across time). Researchers may also use data from existing archives (e.g. the Internet Archive’s Wayback Machine or national web archives), typically larger in scope and maintained by public and private institutions with broader purposes, such as supporting research infrastructures or preserving cultural heritage.

The abundance of archives cannot be separated from the enabling technologies and infrastructures that support them. Digital media not only allow for easy access to a wide range and unprecedented amount of (personal) data, they also provide tools that enable a wide range of actors to collect and mine such data, often requiring little technical knowledge (Kennedy, 2016). Perhaps for this reason, research ethics questions have surfaced quite forcefully in digital media research. At the same time, there is very little research ethics literature to consult for specific guidance when it comes to web historiography and web history research, a specialized, interdisciplinary field of study that has developed at the intersection of history research and internet research proper. Although ethics mix into ongoing epistemological debates in the discipline of history (e.g. on how to use and treat diverse sources of evidence), the vocabulary of research ethics is largely absent (Gallois, 2011). This is perhaps also a result of historians traditionally working with comparably old sources, the appropriate use of which is formally stipulated by archival legislation and local regulations.

of specific individuals who lived hundreds of years ago. But questions of future harm and beneficence for these individuals seem irrelevant when weighed against the greater societal good of writing the history of mankind. Yet web history and web archives put the question of harm to persons back into the research frame. As will be evident, ethics are of crucial importance when writing the histories of the users of the web, their online communications and communities. Therefore, the purpose of this chapter will be to address web history research ethics through the general perspective of internet research ethics, and to sensitize web history to areas of ethical concern, such as expectations of privacy, based on established ethics principles and discussions. This will, hopefully, serve as a starting point for future conversations around ethics in web history research.

Ethics questions regard not just what kinds of data can be collected and archived, but also what can be used in research. There is some convergence in digital research ethics discussions towards the idea that there is no fixed recipe for ethically sound research on digital media – it has proven unfruitful, even impossible, to establish firm rules that can apply across contexts, cultures, research fields etc. What instead seems to crystallize as the mainstream stand in humanistic and social science research ethics regarding digital media is the need for a case-by-case approach, considering a range of issues throughout the research process: from the kinds of data solicited in order to answer a research question and the procedures and tools used for data collection, storage, processing and documentation, to the publication of research findings. Following from this general trend, this chapter does not offer a fixed recipe for ethical decision-making in web history research. It offers guidance by way of reflecting on the kinds of issues and dilemmas that researchers are likely to face in digital media research, and web archival research specifically. It also offers concepts to ‘think with’ in order to
ETHICAL CONSIDERATIONS FOR WEB ARCHIVES AND WEB HISTORY RESEARCH 101

reach ethically sound decisions for specific research endeavours and contexts.

FROM INTERNET RESEARCH ETHICS TO WEB HISTORY ETHICS: DEBATES, PRINCIPLES AND SENSITIZING CONCEPTS

In digital media, ethics questions encompass the responsible and careful collection, processing, storage, reuse and reporting of digital data across contexts – commercial, governmental, personal etc. Internet research ethics deals with these same practices in a specific context, namely that of research, understood as the generation and dissemination of knowledge for the public good and benefit of society. At the core of ethics discussions is the balancing of respect for human subjects with the prospect of societal and human benefits, such as information technology and data-driven innovations. As will be evident below, I base the ethics discussions largely on the human subjects model. Since the purpose of my chapter is to bring web history in dialogue with digital media research, the basis of such dialogue must be to benchmark the new research area against the existing, interdisciplinary best practice.

The core questions of internet research ethics have developed in parallel with the development of digital media – questions about anonymity and authenticity in computer-mediated communication were central in the early days of internet research ethics, whereas today’s ethics debates largely revolve around algorithms, artificial intelligence and big data. Internet research ethics discussions are mainly anchored at the intersection of information and computer ethics and social scientific, empirical inquiry into the uses of digital media. It is roughly modelled after human subjects’ rights and protection principles rooted in medical science, and consolidated in the UN Declaration of Human Rights (1948), the Nuremberg Code (1949) and the Belmont Report (1979) (Buchanan, 2011). It is also influenced by standards and best practices in the various research disciplines that are involved in digital media research. As a result, internet research ethics is pluralistic in nature. Ethics guidelines and recommendations, such as the broadly endorsed guidelines from the Association of Internet Researchers (Ess and Committee, 2002; Markham and Buchanan, 2012), reflect this pluralism. There is no general agreement on specific standards for how to conduct ethically responsible research (e.g. when to seek informed consent), beyond the ground rule that the user’s right to privacy must be balanced against researchers’ rights to pursue knowledge for the benefit of society.

In this section I review key debates in the internet research ethics literature to reveal grey areas and concepts with which to think about ethics and making ethically responsible decisions in research etc. I consider these debates against the context of web history to make a case for when and how it is relevant to apply internet research ethics to this field.

Text Versus Person: What is the Ontological Status of the Data we are Dealing with?

At the heart of internet research ethics discussions lies a distinction between seeing digital data as texts and as human subjects (Markham and Buchanan, 2012; McKee and Porter, 2009), echoing the models of broadcast and interpersonal communication. On the web, ordinary users are productive in new ways, including in the creation of the texts that make up the WWW. This fact provides crucial fuel for the ‘text versus human’ debate. Is a status update on Facebook or a comment uploaded to a news website to be considered a finished product, a textual artefact authored and published by someone? Or should it be seen as a communicative trace or extension of the person? In some cases, the
102 THE SAGE HANDBOOK OF WEB HISTORY

determination of whether to consider digital data as texts or persons may be more or less clear-cut. A piece of digital art or a news article would typically classify as cultural products attributable to an author, who may or may not hold copyright over this material. But in other cases, say a personal blog post, a tweet, a homepage or chat functions on an organization’s intranet, it may be argued that the communication resembles an ongoing interpersonal exchange or a personal confession rather than a textual product, which, in turn, implies seeing the produced data as traces of a person’s practices, or expressions of the person’s self. Both framings of digital data – as text or person – may be valid depending on the study object and research questions of a specific scientific inquiry.

The distinction between data as textual representations or as a form of personhood has disciplinary origins in humanistic and social scientific thinking about the internet as an object of study (Bassett and O’Riordan, 2002; White, 2002). Historiography and web history seem largely aligned with the former, considering archival material as historical artefacts – products of human activity. In some cases, this seems unproblematic, e.g. in studies of the development of the internet and WWW or analyses of the evolution of the design of professional websites over time. In other cases, e.g. the study of the development of specific sub-cultures on the web (e.g. sexual minority groups, political extremist debate fora or camgirls’ websites), using archival data on the participants and their online communications may be more problematic, if only because of the risk of exposing and causing harm to specific private individuals, based on their prior affiliation to a sub-culture. The case-by-case ethical judgement starts with reflections about the textuality or humanness of the data in question.

The human subjects model dominates in ethics discussions, most likely because ethics dilemmas, guidelines and discussions tend to surface in research on the uses and users of the internet. Web traffic patterns, social media network graphs, likes and recommendations, user comments on news sites, personal blogs and so on are all data directly or indirectly traceable to the individuals that have browsed, connected, endorsed, commented and created ‘stuff’ online. Therefore, they are in a sense data on human subjects. To the extent that web archives include such data, which would be the case for archives of social media, for instance, but also other forms of web material traceable to individuals, broadly accepted research ethics standards such as the respect for human subjects, protection from harm and by extension the need to seek informed consent etc. become relevant to historical research that uses such web archival data.

The personal data may be of a more or less sensitive nature. For instance, according to the EU Data Protection Directive, information regarding, for example, religion, political leaning, sexuality and personal health are by law defined as more sensitive than other types of personal information. It is important to note that sensitive information may often be gleaned from the context of communication, or by cross-referencing archives, even if the content itself does not explicitly reveal it.

A case-based approach to research ethics as dealt with during, not just before, the research process, is typically called for in the study of ‘live’ or ‘recent’ events, e.g. a debate of a specific political issue on the agenda in the comment section of an online news outlet, fandom practices in relation to a contemporary artist as displayed on social media etc. But to what extent does it apply to research on the historical web? What difference does temporality, and the temporal distance between researcher and researched implied in historical research, make in terms of research ethics? In principle, why should it make an ethical difference if personal data is found in an archive of the web or on the live web? Ontologically speaking, the data does not change in ‘kind’ over time. Given that the data still refers back to identifiable individuals, ethics dilemmas and questions regarding beneficence, harm, vulnerability
and so on must still apply, prompting the same kind of ethical treatment as research into the live web, if possible. Furthermore, in a historical perspective the temporal distance between personal data put on the web ten or 20 years ago and the present is very small: digital media are comparatively young. Even though this challenge will gradually become smaller as time passes, it seems futile to suggest that some web data is so old that questions of harm – to the involved individuals or their descendants – are no longer relevant to consider. However, sometimes temporality does make a difference for research ethics, at least in practical terms. Seeking informed consent from research subjects encountered in historical archives may be impossible, because the source of data cannot be identified and traced, or because the person is no longer alive, in which cases the researcher may consider asking relatives (who might be harmed due to their affiliation with the deceased in question) to give their consent or otherwise protect the research subject. Informed consent, while certainly not a new problem for historians, thus presents a specific challenge for web history research, one that I will return to at the end of the chapter.

Risk Assessment and Protection from Harm: Who are the Researched?

The question of harm and the ethical responsibility to protect research subjects from harm is crucial. Yet it may be difficult to anticipate the possibly harmful outcomes of being a subject in a digital media research project. How can we ensure that the voluntary sharing of personal data at one point in time does not come to negatively impact the research subject at a later point in time? As Eynon et al. (2017) have argued, it may be particularly difficult to assess harm to and offence of research subjects in the data collection process on digital media, simply because it may be more difficult to establish a rapport with and between them. This challenge is emphasized when dealing with archival data, as will be unpacked at length later in the chapter. Furthermore, it may cause harm if personal data is not stored, encrypted and secured properly, or if a promise or expectation of confidentiality is breached through a third party’s de-anonymizing or hacking of the data set.

One additional point has to be made about the pertinence of web history research reflecting on the issue of harm. When compared with physical, historical archives, digital archives, including web archives, hold three main characteristics, albeit with some variation among archives: increased and easy accessibility across space and time, increased and more thorough search and indexing facilities and easier cross-referencing of archives (Crossen-White, 2015). Some modification must be added: the ease of search, indexing and cross-referencing is conditioned by the degree to which the archiving body has managed to organize the vast and complex data, and this is certainly not always the case. But in principle, as a consequence, for the individual whose data is included in the archives, there is increased chance or risk of their data being discovered and re-identified (e.g. through cross-referencing between archives), and by extension, a greater risk of causing harm and distress for the individual, or descendants of the individual.

One way to begin addressing the question of harm is by considering the types of actors involved in our research. Some require more careful ethical treatment than others: researching children, socially or politically marginalized groups and physically or mentally vulnerable individuals entails a great ethical responsibility to protect participants from being bullied or put on undesirable public display, regardless of whether the online activities we study revolve around their vulnerability (e.g. a patient community for a specific chronic disease; political dissident communication in authoritarian regimes etc.) or are seemingly trivial. Public figures such as
politicians, celebrities or commercial actors may not expect the same degree of cautious treatment. This is because their public communication online is typically tied to their professional capacity and role – a role that carries with it an awareness of the possibility of being put to public scrutiny. Hence, ethical decision-making should start by considering the types of actors from which we gather data in terms of whether they are public figures communicating in their professional role or private individuals, and the perceived vulnerability of these private individuals.

For web history, reflecting on the kinds of subjects involved is particularly crucial, and arguably complements existing practices in historical research around the fair handling of sources. Traditional, physical archives of the past largely comprise material of sufficient public interest or value for cultural heritage, since it was not possible to archive everything. Therefore, to the extent that traditional archives revolve around human subjects, these tend to be public figures (politicians, artists, prominent business men and women etc.). To be sure, archives may include both professional and personal artefacts and records of these persons (e.g. excerpts from public appearances, but also private letters, to which special archival regulations may apply). In comparison, ordinary individuals – their public activities as well as their personal lives – may be less well-represented in historical records. In web archives lay people’s social activities arguably figure more substantially. Many of these activities – interactions and communications – were previously oral and face-to-face only (and therefore not archived), but are now taking place increasingly in writing and images on the web, and therefore they are archived. As a consequence, web archives may involve a shift in the kinds of history that can be written: if everything and anything can be archived at the push of a button and stored on servers or in the cloud, it becomes increasingly possible to write history ‘from below’, giving voice to lay people sources and mundane artefacts, and to the margins of society, e.g. specialized communities or radical sub-cultures. With the democratizing potential comes an increasing ethical responsibility, and an obligation to reflect upon whose stories we are telling, to what extent we are equipped to tell their story, and what kinds of vulnerability and harm we might encounter and nurture when doing it. Finally, while the temporal distance implied in web history research does not weaken the distance between the data and the subject who produced it, time may in some instances change the risk of research causing them harm.

Private and Public Contexts

One additional key distinction concerns whether data is public or private, a principle mainly inherited from the ethics of participant observation in public and private spaces. Yet the distinction also comes in a more textually oriented guise, for instance, differentiating between intent of publication and publicity (Treadwell, 2014). A similar distinction is found in web history and web archival research where the concept of ‘publication’ refers to the kinds of material that have been made available to the public (e.g. Brügger, 2017). The distinction between public and private has been and continues to be promoted by scholarship, claiming that if the data is publicly accessible, then it can be collected without further ethical considerations and procedures such as seeking informed consent. Yet this distinction is contested in digital media research, because many digital media sites and services blur the lines between what is public and private. This blurring of lines in part has to do with the affordances of digital media, including the support for many-to-many communication alongside broadcasting one-to-many and interpersonal communication one-to-one; and the networked infrastructure allowing for easy distribution and interoperation of data between services and platforms etc. Although
many social media services are publicly accessible by default (e.g. Twitter), or semi-public such as Facebook (some parts are open, and it simply requires a profile to access other user profiles, pages and open groups), one of the primary uses of social media is for personal communication – one-to-one and many-to-many. Furthermore, even though users seemingly care about their privacy and ideally should be aware of the possibility of unintended audiences (including researchers, social media insights companies and governmental institutions) gleaning their personal information and communicative exchanges on social media, they are often not. The media are rife with stories about (in particular young) people who share personal and possibly sensitive information on social media, potentially causing harm for those involved. In a historical perspective, similarly personally compromising content may be found in historical archives of specialized forums for discussing contentious issues such as politics, religion or hacktivism, or chatrooms for exploring one’s sexuality. Hence, what was – in the moment of posting – a relatively unnoticeable, safe context at the outskirts of the web (which was, in the early days, a niche medium), may over time become part of public records of the web to which the user cannot regulate access.

Several studies have suggested that users may not understand the possible ramifications of sharing personal data, that the impetus of sharing and relating to others is stronger than the concern for privacy, that they perceive social media as relatively private contexts, or simply that they find it difficult to understand and manage their privacy settings and control their information once posted (e.g. Acquisti and Gross, 2006; Boyd and Hargittai, 2010; Hargittai and Marwick, 2016; Taddicken, 2014). The possible unease that follows from not knowing what one’s data may be used for, not having the means to control access to one’s personal information etc., prompts ethical reasoning about how to handle users’ data in digital media in a responsible manner. That is to say, the user’s perceptions of privacy, understood here roughly in terms of personal information, call for attention in discussions of ethics, even if it is legal to collect and use data in a specific digital context, because it is already in the public realm. This suggests that the distinction between what is a public and private setting is not that helpful for making ethical decisions in digital media research. Instead, as argued by Markham and Buchanan (2015) and Tiidenberg (2017), among others, we must take people’s expectations of privacy seriously in ethical decision-making. In the following I put forward two concepts that do this, and that therefore seem particularly useful when doing web history research and developing web history ethics.

The concept of ‘contextual integrity’ (Nissenbaum, 2010) is an expectations-based framework for ethical reasoning about privacy and the protection of human subjects from harm in digital contexts. Nissenbaum’s point of departure is a rebuttal of historically robust approaches to determining and justifying individuals’ right to privacy in the United States: the individual’s right to protection from the interference of the state (or other specific actors) in private matters, and the right to restrict access if so-called sensitive personal information is at stake, or if there is a reasonable expectation of privacy in context (e.g. the home as a setting in which it is reasonable to expect privacy). These three approaches fall short in a digital context. It may be difficult to determine and control who gets access to an individual’s data (think, for instance, of the case of reselling user data to third parties). It may also be difficult to determine what counts as sensitive information and thus identify possible risks associated with personal data. Personal data that seemed non-sensitive in one context may turn sensitive at a later point in time or a different context. For instance, users of the OkCupid dating site may have felt that sharing information about their sexuality etc. was fairly non-sensitive in
this particular context, or at least a reasonable part of the trade of getting access to potential partners on the site, but then in May 2016 students leaked OkCupid data from 70,000 users in a database in an alleged attempt to benefit the scholarly community. They had scraped the data from the site and made it public without the knowledge and consent of the users. Finally, it may be debatable what counts as a reasonable expectation of privacy in the context of digital media. Thus, instead of thinking of privacy as a matter of universal principles about who should have access to what information in which contexts, Nissenbaum suggests the lens of privacy as ‘contextual integrity’ as a means to consider whether something must be treated as private or not by looking at the specific context, data and actors involved. Every communicative context is guided by two types of norms of information flow: norms regarding what kind of information is appropriate to share in the specific context, and norms regarding who can control the information flow and by extension share, publish or use this information. These norms must be upheld in order to protect individual privacy. From a research ethics perspective, contextual integrity highlights how privacy expectations may be breached if we as researchers collect and use data on users’ digital profiles, communications and actions without their knowledge and consent. The principle of contextual integrity invites web historians to dwell on the possible ethical consequences of repurposing historical web data – from the original context, to the archive and on to the context of web history.

Another useful ethics concept to reflect on the relationship between actors, content and contexts, and specifically the norms of appropriate information flows, is ‘the distance principle’, which refers to the distance between the object of study and the person who produced it (Lomborg, 2012a; Markham and Buchanan, 2015). The distance principle urges the researcher to consider the private/public continuum in terms of the experiential distance between a specific data point and the person who produced it in the first place (Lomborg, 2012a; Markham and Buchanan, 2015). A personal status update is likely to be considered more ‘private’ than a shared news article on Facebook, although this is not always the case. Although both are instances of an individual communicating, the status update originates from the user herself and emphasizes the user as the producer. The personal stakes of the individual are higher in research that reports discursive analyses of web-based communication than in research analysing and visualizing aggregate data, such as communication networks around a specific hashtag on Twitter. In the former example, the research participant is represented through specific instances of communication that are directly linked to the individual, whereas in the latter example, the individual is simply represented as a node in a network. The distance principle stipulates that it may be difficult for the researcher to determine beforehand whether a specific set of data is perceived as private or not for specific research participants. Rather, this determination must rely on a continuous assessment throughout the research process and possibly involve asking the participants themselves.

The distance principle helps unpack the ethics of working with archived web data to document histories of web cultures and identities. In essence, the archive inserts a distance between researcher and research subjects, which is not possible in online ethnographic research, which would typically partly rely on the same kinds of material to study the lived experience and practices of users. Given that the same material is used, one should expect similar ethical dilemmas and procedures. But the nature of the method and the temporal expansion of the data collection process in online ethnography is, in ethical terms, an advantage: research subjects can be approached slowly to build trust, and ethical measures can be built into the process ad hoc through negotiation with the participants throughout the process.
For instance, it is possible to differentiate between different types of participants if necessary. If a research subject discloses information that makes them vulnerable in the sense discussed above, additional measures may be taken to protect their privacy. In contrast, archiving techniques standardize data collection from the outset, because the tools for archiving require the researcher to specify what should be included and left out of the data set. Furthermore, owing to the initial distance between the researcher using the archive and the persons represented in it, it may be more difficult to approach the research subjects for informed consent etc. That is to say, the creation of trustful and respectful relationships with those under study is not built into the method; in fact, contacting participants to establish a rapport would diminish the advantage of the unobtrusiveness of archiving. That is to say, it is the web historian’s job to then think about the possible ethical implications of using the archived data for specific projects with/without measures of privacy protection, consent and so on. As Crossen-White (2015) observes, the distance implied in archival research is likely to lower the researcher’s sense of obligation and responsibility for the research subject, even if – according to the distance principle – the experiential distance between person and data, and researcher and object of the research is fairly small. Arguably, this sense of distance is further enhanced in historical web research, due to the temporal distance between the researcher and the historical data. At the same time, especially when engaging with personal stories as documented on a homepage or blog over time, historical research may bring us a sense of closeness to research subjects even if we don’t directly connect with them.

Data ‘found’ in existing archives or archives ‘made’ by the researcher

Web archives consist of ‘found’ data, insofar as the data was already on the web before becoming included in the archive. Yet they are also ‘made’, because the very process of archiving entails a number of methodological choices about what to include and leave out, how to represent the material in a database and so forth (Jensen, 2012; Lomborg, 2012b). Adding to this basic distinction, the data used for research may be found in pre-existing archives of the web (e.g. the Internet Archive, or the Danish Netarkivet) or made, if the researcher creates the data archive herself. If data is found in an existing archive, the legal issues have been ‘outsourced’ from the research process and already dealt with by the archiving institution, namely those which have to do with the collection of data and data storage. In Denmark, for instance, researchers have to apply for access to Netarkivet by explaining the purpose of research and consenting not to use personal information that may be derived from the archive. In the case of the Internet Archive, anyone can access the archives if they subscribe to the general Terms of Service, which include general agreements not to violate anyone’s privacy, for instance by collecting and storing personal data from the Wayback Machine. It also grants special access to specific collections by application (internetarchive.org). Furthermore, the archival institution or gatekeeper may stipulate ethical rules for the researcher to consent to if using personal data from the archive (McKee and Porter, 2009). The archival gatekeeper’s code of ethical research conduct may vary a great deal according to context and content, and many archives do not explicitly address ethics – only legal matters. Therefore, the researcher should at least maintain ethical reflection regarding data processing, documentation and display of research results based on the archived data. In ethical terms, it may be useful to distinguish between collection and use of the data in question. Arguably, ethical caution is particularly important in the researcher’s selection of material from the archive for documentation in publications and other presentations of research, which put the
archived data of human subjects on public display beyond the archive.

If the archive is produced and maintained by the researcher, their responsibility is to ensure adherence to the legal requirements for protecting personal data as well as all the ethics decisions regarding whether or not to obtain informed consent from those included in the archive, and if so how to anonymize the data, ensure secure data storage and deletion and so on. This responsibility may be an ethical advantage, because it puts the researcher in control of ethical decision-making. Yet it may be complicated further by pressure from funding bodies to make digital data banks available for future research projects. For instance, the researcher may initially have obtained informed consent from those whose data is included in the archive for the specific research project, but this informed consent does not necessarily apply to later research projects using this data. Even if the data is reused in a new research project by the same researcher, informed consent may need re-confirmation. Tiidenberg (2017), for instance, describes how she continually asks her informants in studies of sexy selfies on social media for consent when her research moves into a new phase.

Informed consent and anonymization: procedures and obstacles

Seeking consent from research participants is a standard procedure when collecting personal data on them. The idea of informed consent enforces individual autonomy and beneficence as key principles in human subjects research (Marzano, 2012) and can be modelled as an opt-in ethical procedure, in contrast to the opt-out policies often employed in web archiving, where users typically are included by default and have to request to be removed or ‘muted’ from the archive. Informed consent involves ensuring that research participants receive information about the purpose, methods, perceived risks and benefits of a given study before deciding whether to agree to be part of a given research project. It also allows them to withdraw at any point during the project. Hence, informed consent is an ongoing negotiation between the researcher and the researched. It may be formally ensured through a written and signed consent form, or given orally, for instance before a research interview. One recent study, the emotional contagion study conducted by Facebook (Kramer et al., 2014), caused scholarly and public furore for, among other things, failing to get informed consent from participants (Flick, 2016). The study deployed a large-scale experimental setup and manipulated user newsfeeds to assess the possibly contagious effect of emotions in the network. Contagion was documented, but the result suggested possible harm arising directly from the research setup.

Informed consent is tied to a number of obstacles, some of which are general, some of which are more likely to surface in digital media research, and web history in particular. One general issue involves how the researcher can verify that people are actually properly informed (Eynon et al., 2017). Do they understand the possible ramifications of being part of the study? One specific issue for digital media regards the question of informed consent from whom? For instance, should a researcher studying the uses of a specific hashtag on Twitter, or personal profiles on Facebook, seek informed consent from every user whose data may be present in the data set (e.g. friends tagged in a status update on the Facebook profile)? In reality, such an approach is hardly feasible. Ní Bhroin (2015) has proposed a distinction between core and ancillary research participants as a response to this problem, suggesting that informed consent need only be obtained from core participants. Finally, considering informed consent in connection with specific types of actors involved in research, it seems to pose a dilemma in the context of studying vulnerable subjects and sensitive topics. Such contexts can be tricky to study

openly, and as Eynon et al. (2017) observe, asking for informed consent may cause subjects to withdraw and close the gates for this kind of research before it begins. At the same time, seeking informed consent from perceivably vulnerable subjects is a crucial ethical demand.

While there is no distinction in principle between personal data of the (in the case of the web) not-so-distant past and present, in terms of the ethical treatment of such data, doing web history in an ethically sound manner may be particularly tricky and cumbersome, owing to the possible disappearance of the subjects ‘behind’ the data. In such cases, seeking informed consent from the person is futile. Consent to use the data must be negotiated with multiple other stakeholders, including relatives of the person in question and the archival institution that included the person’s data in the archive in the first place (McKee and Porter, 2009).

Another gold standard of research ethics regards confidentiality, typically handled by ensuring data access only to those authorized by the research participant and by anonymizing research participants in data sets and publications. This would imply removing personally identifying information such as name or user IDs, profile information and personal photos. Often in digital media research this is impracticable. Zimmer (2010), for instance, has forcefully demonstrated that even with large, anonymized data sets, it may be possible to re-identify anonymized participants. By way of combining data points in a publicly released data set on an entire cohort of Facebook users from a US college in 2006, Zimmer shows how easy it may be to de-anonymize participants in a data set, with possibly harmful implications. Interestingly, the researchers who had published the study had allegedly acted in good faith, that is, tried to ensure respect for the involved participants, and the anonymized data set had been publicly released owing to a demand from the funding body for the project. Zimmer uses the case to argue that personally identifiable information not only includes personal characteristics such as name, age, gender, ethnicity etc., but also ‘social’ characteristics such as one’s unique social graph (e.g. the personal network of friends on Facebook) or the combination of cultural tastes (see also Beaulieu and Estalella, 2012). In studies of online discourse, anonymization of participants is similarly difficult insofar as examples of discourse are published as part of the validation and documentation of analysis. In such cases, a simple string search may identify the participant (Lomborg, 2012a). Possibilities of cross-referencing archives could contribute further to making anonymization tricky (Crossen-White, 2015).

If anonymization is not possible, this may imply that some research is off-limits: how can vulnerable subjects and sensitive topics be researched in the context of digital media if we cannot protect their privacy? At the same time, such research is arguably extremely important in order to advance public understanding of, say, mental and physical illness, online racism and bullying etc. with the prospect of general beneficence. The difficulty of ensuring proper anonymization as well as informed consent, as described above, has led scholars to argue for a need to develop other procedures of ethical oversight and handling of subjects (Boyd, 2016), e.g. allowing for fabrication by remixing material from several participants into composite narratives or altering words in quotes to disable string search de-anonymization (Markham, 2012), or a greater emphasis on the ‘ethics of care’ (Held, 2006) to hold researchers accountable for their research subjects (Tiidenberg, 2017). One step in that direction is acknowledging that research ethics and ensuring respect for human subjects is a process that must be built into our research inquiries, not a check-box. Shifting towards opt-in principles and practices might be one crucial step for web archiving institutions and web historians in that direction.
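The string-search risk noted above (Lomborg, 2012a), and the suggestion of altering words in quotes (Markham, 2012), can be illustrated with a minimal sketch. Everything in it is hypothetical and invented for illustration: the two archived posts, the quotes, and the function names `normalize` and `find_verbatim_sources`. It assumes a plain exact-phrase search; real search engines and full-text archive indexes tolerate far more variation, so a light paraphrase should not be read as a guarantee of anonymity.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and keep only word tokens, so that punctuation
    and spacing differences do not hide an otherwise verbatim match."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def find_verbatim_sources(quote: str, corpus: dict[str, str]) -> list[str]:
    """Return the IDs of posts that contain the quote as an exact phrase.

    Both sides are padded with spaces so that matches respect word boundaries.
    """
    needle = f" {normalize(quote)} "
    return [
        post_id
        for post_id, body in corpus.items()
        if needle in f" {normalize(body)} "
    ]


# Hypothetical archived posts, keyed by a pseudonymous ID (illustration only).
corpus = {
    "post-001": "Today I finally told my family about the diagnosis. It was hard.",
    "post-002": "Nothing much happened today, just work and rain.",
}

# A quote published verbatim in a research article leads straight back to its source.
published_quote = "I finally told my family about the diagnosis"
# The same statement with altered wording no longer matches an exact search.
altered_quote = "I eventually spoke to my relatives about my illness"

print(find_verbatim_sources(published_quote, corpus))  # ['post-001']
print(find_verbatim_sources(altered_quote, corpus))    # []
```

Running a check of this kind against the researcher's own collected material before publication is one inexpensive way to see which quotes remain traceable by exact search, although it says nothing about cross-referencing with other archives.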

Conclusion: when do human subjects transform into historical artefacts?

Most historical web data of interest may be considered human subjects data, because it stems from the online activities of private individuals, e.g. personal homepages, comments to online newspaper articles and social media use. It enables researchers to tell tales of the development of (specific sites, practices and discourses on) the WWW over the past 25 years. Even though archived web data becomes historical web data from a researcher’s perspective, in the lives of the persons who produced the data, such data is (sometimes a defining) part of their personal history, and may still hold significant value to them. There is no easy and clear line to suggest when human subjects on the live web become text, or historical artefacts of the web over time. Therefore, web history research deals with similar kinds of ethical dilemmas to those experienced by the broader field of digital media ethics outlined in this chapter. These to some extent challenge existing disciplinary norms and consensus in historical research and archival practices, as outlined through this chapter. On a general level, new digital territories for historical research call for renewed attention to ethical issues and perhaps more explicit responses from historians to general research ethics debates across disciplines.

There are no firm answers to ethics questions; they depend… on the context of inquiry, the types of actors involved, the activities they perform, the methods used for data collection and analysis, the kinds of data sought and so on. Based on exactly these premises, the Association of Internet Researchers, for instance, has developed internationally endorsed guidelines for case-based ethical reflections, which may be useful to consult before and during any digital media research project (Markham and Buchanan, 2012). The case-based approach is advocated in other official national and international guidelines (e.g. the Norwegian National Committees for Research Ethics), as well as in handbooks (Buchanan, 2011; Tiidenberg, 2017) and other research ethics publications (McKee and Porter, 2009). Guidelines, rather than fixed recipes, have the main advantage that they are adaptable to the dynamic nature of digital media, and the methods we use to study them. As technology evolves, new issues may emerge. These developments may twist and turn the directions and recommendations of internet and web history research ethics discussions in the future.

REFERENCES

Acquisti, A. & Gross, R. 2006. Imagined communities: Awareness, information sharing, and privacy on the Facebook. Lecture Notes in Computer Science, 4258, 36–58.
Bassett, E. H. & O’Riordan, K. 2002. Ethics of Internet research: Contesting the human subjects research model. Ethics and Information Technology, 4, 233–247.
Beaulieu, A. & Estalella, E. 2012. Rethinking research ethics for mediated settings. Information, Communication & Society, 15, 23–42.
Beer, D. & Burrows, R. 2013. Popular culture, digital archives and the new social life of data. Theory, Culture & Society, 30, 47–71.
Boyd, D. 2016. Untangling research and practice: What Facebook’s ‘emotional contagion’ study teaches us. Research Ethics, 12, 4–13.
Boyd, D. & Hargittai, E. 2010. Facebook privacy settings: Who cares? First Monday, 15, np.
Brügger, N. 2011. Web archiving – between past, present, and future. In: Consalvo, M. & Ess, C. M. (eds.) The Blackwell Handbook of Internet Studies, pp. 24–42. Oxford: Wiley-Blackwell.
Brügger, N. 2017. Webraries and Web archives: The Web between public and private. In: Baker, D. & Ewans, W. (eds.) The End of Wisdom?: The Future of Libraries in a Digital Age, pp. 185–190. Oxford: Chandos Publishing.
Buchanan, E. A. 2011. Internet research ethics: Past, present, future. In: Consalvo, M. & Ess, C. M. (eds.) The Blackwell Handbook of Internet Studies, pp. 83–108. Oxford: Oxford University Press.

Crossen-White, H. L. 2015. Using digital archives in historical research: What are the ethical concerns for a ‘forgotten’ individual? Research Ethics, 11, 108–119.
Ess, C. M. & Committee, A. E. W. 2002. Ethical decision-making and internet research: Recommendations from the AoIR Ethics Working Committee. Retrieved January 21, 2007, from http://www.aoir.org/reports/ethics.pdf.
Eynon, R., Fry, J. & Schroeder, R. 2017. The ethics of online research. In: Fielding, N. G., Lee, R. M. & Blank, G. (eds.) The Sage Handbook of Online Research Methods, pp. 19–37. 2nd edition. London: Sage.
Flick, C. 2016. Informed consent and the Facebook emotional manipulation study. Research Ethics, 12, 14–28.
Gallois, W. 2011. Ethics and historical research. In: Gunn, S. & Faire, L. (eds.) Research Methods for History, pp. 201–219. Edinburgh: Edinburgh University Press.
Hargittai, E. & Marwick, A. 2016. ‘What can I really do?’ Explaining the privacy paradox with online apathy. International Journal of Communication, 10, 3737–3757.
Held, V. 2006. The Ethics of Care: Personal, Political, Global, Oxford, Oxford University Press.
Jensen, K. B. 2012. Lost, found, and made: Qualitative data in the study of three-step flows of communication. In: Volkmer, I. (ed.) Handbook of Global Media Research, pp. 435–450. Malden, MA: Wiley-Blackwell.
Kennedy, H. 2016. Post, Mine, Repeat: Social Media Data Mining Becomes Ordinary, London, Palgrave Macmillan.
Kramer, A. D. I., Guillory, J. E. & Hancock, J. T. 2014. Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences, 111, 8788–8790.
Lomborg, S. 2012a. Personal internet archives and ethics. Research Ethics, 9, 20–31.
Lomborg, S. 2012b. Researching communicative practice: Web archiving in qualitative social media research. Journal of Technology in Human Services, 30, 219–231.
Markham, A. N. 2012. Fabrication as ethical practice: Qualitative inquiry in ambiguous internet contexts. Information, Communication & Society, 15, 334–353.
Markham, A. N. & Buchanan, E. 2012. Ethical Decision-Making and Internet Research: 2012. Recommendations from the AoIR Ethics Working Committee.
Markham, A. N. & Buchanan, E. 2015. Ethical considerations in digital research contexts. In: Wright, J. (ed.) Encyclopedia for Social & Behavioral Sciences, pp. 606–613. Elsevier Press.
Marzano, M. 2012. Informed consent. In: Gubrium, J. F., Holstein, J. A., Marvasti, A. B. & McKinney, K. D. (eds.) The Sage Handbook of Interview Research: The Complexity of the Craft, pp. 443–456. London: Sage.
McKee, H. A. & Porter, J. E. 2009. The Ethics of Internet Research: A Rhetorical, Case-Based Process, New York, Peter Lang.
Ní Bhroin, N. 2015. Lost in Space? Social Media-Innovation and Minority Language Use. PhD, Oslo University.
Nissenbaum, H. 2010. Privacy in Context: Technology, Policy, and the Integrity of Social Life, Stanford, Stanford University Press.
Taddicken, M. 2014. The ‘privacy paradox’ in the social Web: The impact of privacy concerns, individual characteristics, and the perceived social relevance on different forms of self-disclosure. Journal of Computer-Mediated Communication, 19, 248–273.
Tiidenberg, K. 2018. Ethics in digital research. In: Flick, U. (ed.) The Sage Handbook of Qualitative Data Collection, pp. 466–481. London: Sage.
Treadwell, D. F. 2014. Introducing Communication Research: Paths of Inquiry. 2. ed., London, Sage.
Webb, E. J., Campbell, D. T., Schwartz, R. D. & Sechrest, L. 2000. Unobtrusive Measures (Revised edition), Thousand Oaks, CA, Sage.
White, M. 2002. Representations or people? Ethics and Information Technology, 4, 249–266.
Zimmer, M. 2010. ‘But the data is already public’: On the ethics of research in Facebook. Ethics & Information Technology, 12, 313–325.
9
Collecting Primary Sources from Web Archives: A Tale of Scarcity and Abundance
Federico Nanni

La diversité des témoignages historiques est presque infinie. [The variety of historical evidence is nearly infinite.]
(Bloch, 1949)

The World Wide Web is the largest collection of human testimonies that we have ever had at our fingertips. Spanning from institutional websites to digital libraries, from personal blogs to Twitter accounts of prominent politicians, from online newspapers to large-scale knowledge bases, an immense number of born-digital testimonies are waiting to be retrieved, selected and studied by future historians. In addition to this, while these new resources are piling up steadily in front of our eyes, they are also rapidly replacing their analog counterparts, from printed news articles to personal diaries, from letter correspondences to scientific publications.

Acknowledging this sudden transition in production from printed to digital documents, this chapter presents and discusses some of the new methodological issues that arise when these materials are to be employed as primary sources for studying the recent past. First, an overview of the debate on the historian’s craft is offered. Then, two different case studies that have dealt with the difficulties of adopting born-digital materials in historical work will be described: the first is focused on reconstructing the past of university websites as a new way of studying the recent past of academic institutions; the second retrieves materials from large-scale archives of the web in order to study contemporary socio-political events. These descriptions highlight how a fruitful combination of the historical method with approaches from other research areas, such as internet studies and natural language processing, could support future historians in successfully addressing these issues.

THE HISTORICAL METHOD: TODAY AND TOMORROW

In order to understand how the transition from analog to digital sources is about to
COLLECTING PRIMARY SOURCES FROM WEB ARCHIVES 113

change the historian’s craft, it is first of all essential to examine how the ‘historical method’ (Shafer, 1974) is generally defined and what its major steps are.

Defining a Subject: In the first part of any historical research, the scholar broadly defines the subject of investigation and – together with it – an initial question. The research question, first presented at a coarse-grained level, will be sharpened through the recursive process of collecting sources, interpreting them and by doing so discovering the underlying narrative.

Collecting the Evidence: In order to address the research question, the historian identifies the testimonies upon which she/he builds a narrative through a complex process of collection, analysis and selection of the remains of the past. These testimonies could be physical remains (e.g. buildings, statues), oral memories or printed documents (e.g. chronicles, diaries, articles, census data) and will soon become born-digital documents, such as websites, online forums, email threads, large-scale databases, etc. The process of collecting primary sources has been shaped and sharpened by decades of discussions in historiography both on how to establish the reliability of these materials, for example through source criticism, and on how much ‘true knowledge’ can be derived from them (there are as many interpretations of the same text as there are readers, as Barthes (1967) has taught us).

Interpreting the Evidence: The interpretation of the collected textual sources represents the core of any historical research. For this reason, it has been the central focus of debate across twentieth-century historiography and has experienced drastic transitions in methodology. As a matter of fact, the analysis and interpretation of sources can be conducted in many different ways: traditional historiographical scholarship has relied strongly on hermeneutics and on the careful qualitative examination of documents, while other approaches – which emerged during the second part of the twentieth century, inspired by social science methodologies (see the advent of Cliometrics; Greif, 1997) – have employed census data or economic reports in order to conduct large-scale quantitative analyses.

Through the 1970s and 1980s, postmodern and deconstructionist theories (starting from the works of Barthes, 1967; Derrida, 1997 and Lyotard, 1984, among others) have posed major critiques of the underlying assumption of both traditional and social science historical scholarship that it is possible to discover a ‘unique truth’ about the past through careful analysis of the remains. The enormous impact of these critiques has been remarked upon by many historians (Burke, 2008; Munslow, 2006) and has led to the so-called cultural turn in the profession, which is still reflected strongly today in the community¹.

Presenting a Narrative: The final step of any historical research is to define a narrative and write a history. The creation of a narrative, which is highly connected with the initial definition of the research question, gives the historian the possibility of placing the work she/he is writing as part of a larger contribution to the field. This is achieved in two interconnected ways: first of all, by offering a new/different perspective on the topic under study; in addition to this, by participating in the larger debate in historiography regarding the ways the past can be re-discovered, examined, described and – for certain authors² – even modeled.

A Computational Turn of the Craft?

History has been part of the so-called digital humanities (Schreibman et al., 2004) since their very beginning.³ In particular, during the second part of the twentieth century the potential of computational methods and their impact on the historian’s craft have been recurrent topics in historiography. As Thomas III (2004) remarked, already in 1945 Vannevar Bush, in his famous essay ‘As We May Think’, pointed out that technology could be the solution that would enable us to manage the abundance of scientific and humanistic data (Bush, 1945); in his vision, the Memex could become an extremely useful instrument for historians.

The use of the computer in historical research, which grew significantly between the 1960s and 1970s thanks both to the efforts of the Annales school (see for example Daumard and Furet, 1959) and to its application to the analysis of economic and census

data (Greif, 1997), has been strongly related to the adoption of social science practices in historical studies (Evans, 2001). A pioneering work on the use of database technologies for historical research was conducted by Manfred Thaller during the 1980s (Thaller, 1991).

However, as Milligan (2012) and Robertson (2016) have already remarked, a large majority of the historian community has remained skeptical towards the adoption of computational methods in the craft. This attitude has consolidated in opposition to other humanities disciplines: for example, in the last 30 years the field of literary study has largely experimented with the potential of what they have defined as ‘distant reading’ techniques, in order to extract quantifiable information from large amounts of text (Moretti, 2013). Instead, during the same time, the so-called digital history community (Cohen et al., 2008) has decided to focus primarily on the potentialities of the web as a platform for the collection, presentation and dissemination of material (Cohen and Rosenzweig, 2005) and on the more ‘communicative aspects’ of doing research in the humanities (Robertson, 2016). This can be noticed by observing the importance given to digital public history topics (Noiret, 2015), the relevance of teaching in digital history (Cohen et al., 2008) and the tradition of digital history mapping (Knowles and Hillier, 2008).

In the second decade of the 2000s, thanks in particular to the prompt availability of digitized historical primary sources and the potentialities of web technologies, this skeptical attitude towards computational methods has slowly changed and a few interdisciplinary teams have developed tools in order to help other traditionally trained historians to employ these methods in their work. As Nelson (2016) remarked, the first fruitful applications of these methods for supporting historical narratives can be found in the works of Wilkens (2013) and Blevins (2014), which are robust examples of the beginning of a mature season of digital history.

While this early scholarship based on the use of computational approaches is essential for refreshing the historiographic debate, it is argued in this chapter that the adoption of computational methods cannot be considered per se as a revolutionary turning point for the profession. In fact, use of these approaches is similar to other methodological turning points that historians have already experienced. Milligan (2012), for example, identifies ‘three waves’ of computational history; moreover, during the last ten years the use of computational methods in humanities research has been strongly sustained and encouraged by public and private institutions (from the NEH Digital Humanities Advancement Grants to the Volkswagen Stiftung on ‘Mixed Methods’ in the Humanities) as well as private companies (e.g. Google’s ‘commitment’ to the Digital Humanities) and often mainstream media sources (Rothman, 2014).

Nevertheless, it is argued in this chapter that historiography is about to experience a new and far more conspicuous turning point and that this will have a very strong impact on a specific step of the historian’s craft, namely the way sources are collected from now on. Born-digital documents shared online, their ephemerality, preservation, availability and access are about to pose a large set of new challenges for future historians. In the next decades, the methodological debate in historiography will not only be centered around qualitative over quantitative, distant versus close, hermeneutics against statistical significance, but will also address the needs of the community in finding ways of acquiring knowledge on our recent (digital) past.

The Born-Digital Turn

The transition from analog to born-digital materials is influencing the way historians study the past: materials such as websites, forums, blogs, tweets and emails are in fact

very different from traditional analog and digitized primary sources. Born-digital materials have an extremely short life compared with printed documents as they are significantly more difficult to archive and preserve (LaFrance, 2015). This is due to a vast number of reasons (Brügger, 2005) and the consequence of it has been summarized by Rosenzweig (2003) with the concept of ‘scarcity’ of digital primary sources.

Web pages disappear constantly from the live web (because they are removed by the author or by the owner of the platform, for instance due to copyright issues), leaving a familiar trace of 404 status code messages. Several scholars (Brügger, 2012; Rosenzweig, 2003, among others) have already remarked on the great impact that the ephemerality of web materials will have on the sharing and accessibility of the knowledge produced in the digital age for the next generations of historians – as has been already said, in opposition to the fact that ‘paper survives benign neglect for a long time’ (Davis, 2014):

    The life cycle of most web pages runs its course in a matter of months. In 1997, the average lifespan of a web page was 44 days; in 2003, it was 100 days. Links go bad even faster. A 2008 analysis of links in 2,700 digital resources – the majority of which had no print counterpart – found that about 8 percent of links stopped working after one year. By 2011, when three years had passed, 30 percent of links in the collection were dead. (LaFrance, 2015)

Moreover, while some types of pages disappear more frequently than others (e.g. social media messages as opposed to official statements on administrative websites), those that do survive tend to change very frequently (Dougherty et al., 2010). For example, articles in newspapers (Nanni, 2013) as well as official administrative pages have often been modified without a specific mention (Owen and Davis, 2008). While initiatives such as the Internet Archive have a long tradition of preserving born-digital materials for future research, several issues still exist and new issues continue to emerge – not least due to constant innovations in web technologies. Therefore, researchers have to deal with the collected materials in a highly critical way, as Brügger (2012) described when he introduced his definition of web archive documents as reborn-digital materials:

    One of the main characteristics of web archiving is that the process of archiving itself may change what is archived, thus creating something that is not necessarily identical to what was once online. […] And, second, that a website may be updated during the process of archiving, just as technical problems may occur whereby web elements which were initially online are not archived. Thus, it can be argued that the process of archiving creates the archived web on the basis of what was once online: the born-digital web material is reborn in the archive. (Brügger, 2012: 108)

The difficulties in the preservation of digital sources present a new set of issues for historians who plan to employ them in their work; however, they remain only part of the overall problem. In fact, already in 2003, Rosenzweig envisioned that future historians will not only deal with a consistent scarcity of primary sources, but they will also be challenged by a never before experienced abundance of records of our past. The indispensable need for computational methods for processing and retrieving materials from these huge collections of primary sources has been a central topic of Milligan’s publications (2012, 2016). From his works it emerges that now that the community is dealing with the abundance of born-digital sources, the use of computational approaches is no longer a matter of choice for the digital humanities researcher. Therefore, it becomes essential that researchers adopt these solutions critically, always knowing their potential and limitations, and learn how to combine them fruitfully with the traditional historical method.

While the consequences of the advent of born-digital sources will be revolutionary for our profession, so far ‘very little attention has been paid to the new digital media as historical sources’ (Brügger, 2012), highlighting the fact that, while ‘new media is not that

new anymore’ (Milligan, 2016: 80) for our society, they remain a novelty for historians. The next sections elaborate on this point by describing two very different case studies that have dealt with the use of born-digital documents as primary sources for historical research. The first examines the online presence of the University of Bologna since the early nineties and highlights the importance of combining the traditional historian’s craft with approaches from the field of internet studies.

STUDYING THE RECENT PAST OF ACADEMIC INSTITUTIONS: A TALE OF SCARCITY

Many historians have considered academic institutions as political, economic and social actors; they have also examined how their power, role and influence changed over time, especially in relation to other actors, such as the city, the church and the national government (Brockliss, 1978). In particular, the comprehensive four-volume book series ‘A History of the University in Europe’, commissioned by the European University Association, edited by Hilde de Ridder-Symoens and Walter Rüegg and published between 1992 and 2011 (Rüegg and de Ridder-Symoens, 1992), offers an unprecedented overview of how universities have transformed over the centuries: what they have taught and researched, how they have been institutionalized and how they have interacted with society.

Historians of higher education who presented their research in these volumes have adopted a large variety of primary and secondary sources in their works, from university-archive materials such as matriculation and graduation statistics to academic dissertations, from public reports to large-scale statistical analyses. Based on these data, researchers have described and drawn conclusions on the history of universities across a large variety of topics, such as the way universities have managed resources, the way the admission process changed before and after 1970 and how sciences and humanities have been taught and studied.

The ready availability today of a large variety of born-digital materials, such as syllabi (Cohen, 2005), bachelor’s, master’s and doctoral theses (Ramage, 2011), academic websites (Holzmann et al., 2016b) and their hyperlinked structure (Hale et al., 2014), is about to become a relevant new component of this field of research (Nanni, 2017b). An emblematic example of the new challenges that born-digital documents will pose to historians of higher education is a study on reconstructing the recent past of the University of Bologna through its digital sources (Nanni, 2017a).

The University of Bologna’s website (Unibo.it), initially created in 1993, represents a new category of relevant resource for historians of higher education. The website collects and offers to the reader a large variety of documents, from descriptions of educational projects to overviews of research groups, from reports of collaboration with international institutions to information on opportunities for interaction with the private sector. In addition, it shows how different departments, professors and research teams have been adopting the web – especially in its early days. Among the many relevant examples, one that deserves special mention is that the Astronomy Department of the university was already sharing preprints of its publications online in 1994 as HTML pages, in an early attempt to benefit from the potential of the World Wide Web.

Nevertheless, while Unibo.it represents a useful collection of primary sources, the website was modified several times during its first 20 years and the majority of pages that were published in the past are no longer available on the live web. In particular, the transition to the so-called ‘Portale D’Ateneo’, which started in the early 2000s, required that all
COLLECTING PRIMARY SOURCES FROM WEB ARCHIVES 117

department pages change their structure and adopt a common layout and organization of their content. This has often forced the creation of brand-new department subdomains and the removal of their previous versions from the live web. As an additional issue, the team that has managed the website during this entire transition has not consistently archived the previous versions of the website or documented its work.

Given the fact that, as of 2017, the National Libraries of Florence and Rome are still not part of the International Internet Preservation Consortium (IIPC) and no coordinated project with the specific purpose of preserving the national web-sphere currently exists in Italy, the Internet Archive remains the only resource available for recovering the materials that are no longer available on the University of Bologna website. However, in 2002 a removal request⁴ from the administrative team of Unibo.it was sent to the Internet Archive, and for this reason Unibo.it was inaccessible through the Wayback Machine for more than 13 years. This highly complex situation reflects a new level of difficulty that future historians will encounter while attempting to collect born-digital sources. The next section presents an overview of the variety of sources and methods that have been used to deal with this issue and to reconstruct the past of Unibo.it.

Library and Archive Materials: As an initial step in the research, materials available in the university library and archives were consulted. Among many other documents, a very useful source has been the university yearbook. In the early 1990s only a few pieces of information regarding the website were mentioned in the yearbook; nevertheless, this source offered an initial diachronic overview of the official teams that were managing Unibo.it and was useful for drawing up a list of people to interview.

Interviews: In order to capture the rationale and the changing architecture of the website, the different teams who managed the website were interviewed, together with technicians and researchers who worked on the development of the pages of various departments, especially during the 1990s. Another interesting finding, presumably highly relevant for future historians, was that during the interviews the subjects frequently used public and private backups of emails in order to recall their experience of working on Unibo.it and to confirm passages of the historical reconstruction.

Newspapers: As in previous work (Brügger, 2011), where printed media were used to retrieve information about the web of the past, information related to Unibo.it and the role of the website for the University of Bologna was identified in local and national newspaper archives. During the 1990s, newspapers such as La Repubblica and Il Resto del Carlino published a few short articles covering the new functionalities on the website (e.g. free email accounts for all students, online fee payments, etc.). These publications, together with materials collected from the university digital magazines (Alma2000, AlmaNews, Unibo Magazine), offered an additional overview of how the university decided to promote the website to its audience.

Online Forums: To get a closer look at the everyday use of the website by students and researchers, other materials have been collected and analyzed, starting with student forums (e.g. UniversiBo) and Usenet discussions preserved by Google. These documents, especially those from the 1990s, present the perspective and enthusiasm of a rather small but specific subset of the university community, namely students, researchers and professors in STEM fields, whose departments were among the first to offer access to the web.

Live Web Materials: While the website has been restructured multiple times during its first 20 years online, many resources are still available on the live web and can reveal the current role of the website in the university’s organization and management (e.g. attracting national and international students and researchers, promoting collaborations with the private sector, etc.). Additionally, the social media pages of the institution (such as Facebook, YouTube and Twitter profiles) are becoming key components of its presence online, showing alternative and more informal ways of interacting with users.

Presence of Italian Websites in Other National Web Archives: Aside from the Internet Archive, since 1996 national libraries from all around the
world have also begun to preserve their national web past. PANDORA, started in 1996 by the National Library of Australia, the UK Web Archive (2004), the Netarkivet in Denmark (2005) and the Portuguese Web Archive (2011) are just a few examples of this international endeavor. Given the complexity of defining and preserving what is called a ‘national web-sphere’ (Brügger, 2009), this research also explored the use of foreign web archives as a proxy for studying Unibo.it. The practice of retrieving primary sources related to an Italian university website in foreign web archives could seem rather odd, as the goal of a national web archive is precisely to preserve the web of its country; however, from time to time part of the non-national web also ends up being preserved, unintentionally, by these digital archives.

For example, to archive national web-spheres in an automatic way, archivists could set up crawlers with a specific set of starting points and a maximum number of hyperlinks they can follow from them. A crawler that is set to go at most ten links away from one of these URLs could also end up crawling non-national content, as it will systematically follow all the hyperlinks. For this reason, if the University of Bologna were to organize a Summer School and Aarhus University linked to it from its website, the University of Bologna website (or at least part of it) would be unintentionally preserved in the Danish Web Archive.

As a part of this work, it has been found that both the Portuguese (Arquivo) and Danish (Netarkivet) web archives have preserved parts of Unibo.it several times since 2006.

Cloned Versions of the Website: Among the variety of sources available, one deserves a specific mention. In May 2007, a group of activists decided to create a copy of the Unibo.it web interface, as part of a protest against the European Credit Transfer and Accumulation System (ECTS) for the evaluation of the number of hours of study. At the URL http://www.unibologna.eu an identical version of the website was available, with a description of the reasons for the protest.

This source has been important in this study not only because it documented an innovative way of conducting a protest against an academic institution (by targeting its website), but also because the cloned website was preserved by the Internet Archive.

A Critical Combination of Sources and Methods: The combination of traditional archival practices with approaches from the field of internet studies is essential in the attempt to tackle this emblematic example of scarcity of born-digital primary sources and to reconstruct the past of the University of Bologna website. This new methodology for collecting born-digital evidence has been especially useful in identifying the narrative behind the early years of Unibo.it, which involves the arrival of a Turkish professor from the United States at the university in 1988, the establishment of the second Italian node on the internet and the creation of arguably one of the most relevant university websites in the country.⁵

While the difficulties in reconstructing the recent past of a university website could surprise the reader, as fewer than 30 years have passed since its creation, they only represent a few of the new issues that born-digital sources will pose to future historians.

As has been previously remarked and will be expanded on in the next section, future historians will in fact also be challenged by a never-before-experienced abundance of records of our past. The second case study presented in this chapter focuses on obtaining small topic-specific collections from large-scale archives of the web; by presenting the challenges encountered and describing the solutions adopted, it highlights the importance of fruitfully combining the traditional historical method with approaches from the field of natural language processing.

CREATING POLITICAL EVENT COLLECTIONS: A TALE OF ABUNDANCE

The World Wide Web provides the research community with an unprecedented abundance of primary sources for diachronically tracing, examining and understanding major events and transformations in our society. For two decades, public and private institutions have preserved these born-digital materials for future analysis (Gomes et al., 2011). However, these collections are now so large that – in the
rare cases that they are fully available for research (Hockx-Yu, 2014) – it is not feasible for scholars to study political and social phenomena by examining them in their entirety.

If, for instance, we consider the Internet Archive, during its first 20 years it has preserved almost 500 billion web pages, and as of 2017 it has a collection of around 25 petabytes of data. Since 2001, this collection has been available for research through a URL search tool on the Wayback Machine. In the most recent years, information retrieval systems supporting keyword search over the diachronic layers of web archives have been developed by the research community and employed by institutions such as the UK Web Archive and – since 2017 – also partially by the Internet Archive. In addition to this, out-of-the-box tools such as ArchiveSpark (Holzmann et al., 2016a) and Warcbase (Lin et al., 2017) have been developed by the research community with the specific goal of supporting scholars in gathering information from large-scale web archive collections.

One of the main endeavors of web archive institutions for fostering the use of these new resources is to offer manually curated sub-collections regarding recent socio-political events. On Archive-It – a subscription web archiving service provided by the Internet Archive – a few collections regarding large-scale events such as the Boston Marathon bombing, the Black Lives Matter movement and the Charlie Hebdo terrorist attack are available. The collections are curated by the Archive-It team in conjunction with curators and subject matter experts from institutions around the world.

In addition to manual selection, another solution employed by digital archivists for creating and sharing these event collections is to adopt a filtering approach that presents to the user only those documents that mention the name of the event. This type of approach is common in event-harvesting from Twitter, where researchers collect all tweets that – for example – mention the hashtag of the event.

While both collecting documents from web archives through manual selection and retrieving materials through name-filtering have already proved their usefulness in supporting researchers in the humanities and social sciences (e.g. Small, 2011), they have a few crucial limitations. On the one hand, manual selection is obviously a painstakingly long process – given the previously mentioned difficulties of retrieving information from web archives. On the other hand, collecting documents using the event-name heuristic often misses information on background stories as well as the premises of the examined events. To give a specific example, let us imagine that the goal is to collect primary sources regarding the 2004 Ukraine Orange Revolution. If the adopted method only retrieves documents that mention the name of the event, it will not collect materials that connect the premises of the revolution to the previous controversial presidential election in the country. The same issue will emerge when studying the first free Algerian elections since the country’s independence (1990), which are a premise of the Algerian civil war, or even when investigating the economic crisis behind Fujimori’s auto-golpe in Peru (1992). In this last case, the documents that discuss the adoption of austerity measures will not be part of the collection. Moreover, the name used for referring to an event might change over time or vary between countries and languages: for example, one of the early hashtags used for the 2011 Egyptian Revolution was #jan25, referring to the day it started.

The second case study presented in this chapter is an interdisciplinary project between computer science and political history focused on building more comprehensive sub-collections regarding events such as elections, protests and political crises from large-scale web archives. As part of this research, a system that employs natural language processing methods and information retrieval approaches has been developed, which is able to gather and organize a highly comprehensive collection of sources describing a specific event (Nanni et al., 2017).
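The limitation of the event-name heuristic discussed above can be made concrete with a minimal sketch. It is not the system developed in the project: the four documents, their identifiers and the helper names (`mentions_event`, `name_filter`) are hypothetical and were invented for illustration, and plain case-insensitive substring matching is assumed as the filtering rule.

```python
def mentions_event(text, event_names):
    """Return True if the text contains any of the event names (case-insensitive)."""
    lowered = text.lower()
    return any(name.lower() in lowered for name in event_names)


def name_filter(documents, event_names):
    """Keep only the documents whose text mentions one of the event names."""
    return [doc for doc in documents if mentions_event(doc["text"], event_names)]


# Hypothetical mini-archive, invented for this example only.
DOCUMENTS = [
    {"id": "a", "text": "Crowds gather in Kyiv as the Orange Revolution enters its second week."},
    {"id": "b", "text": "Observers dispute the results of the presidential election run-off in Ukraine."},
    {"id": "c", "text": "Protesters in Cairo organise under the hashtag #jan25."},
    {"id": "d", "text": "A year on, Egypt remembers the Egyptian Revolution."},
]

# Filtering on the event name alone misses the background document "b".
orange_ids = [doc["id"] for doc in name_filter(DOCUMENTS, ["Orange Revolution"])]

# The same happens when an event is referred to by a different name or hashtag.
egypt_narrow_ids = [doc["id"] for doc in name_filter(DOCUMENTS, ["Egyptian Revolution"])]
egypt_wide_ids = [doc["id"] for doc in name_filter(DOCUMENTS, ["Egyptian Revolution", "#jan25"])]
```

In this toy archive, the document on the disputed presidential election is never retrieved for the Orange Revolution, and the #jan25 document is retrieved only once the alternative name is added by hand, which is the kind of manual knowledge a more comprehensive collection method has to supply automatically.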
The developed approach is inspired by the fact that, when historians are conducting the same task manually (i.e. identifying relevant materials across an entire archive), they do not necessarily search only for documents that mention the name of the event. Historians will also try to collect documents that discuss related aspects and provide context, involving for example some of the participants in the event, but not others. If we consider the previous example regarding the Orange Revolution, historians will also be interested in materials from the same period of time discussing the political career of Yulia Tymoshenko or addressing the state of the political relations between Ukraine, Russia and the European Union.

Identifying Related Concepts and Entities: In order to achieve this goal in an automated fashion, the first step is to be able to identify a set of concepts and entities that are relevant to an event. To do so, DBpedia (Auer et al., 2007) was employed. This is a large-scale knowledge base extracted from Wikipedia, where events (such as the Orange Revolution) are represented by nodes and connected through edges (i.e. hyperlinks in Wikipedia) to other related entities.

Retrieving Contextual Passages: For each collected entity and concept, a textual passage presenting it in the context of the event was also extracted from Wikipedia (for example: ‘Yulia Tymoshenko co-led the Orange Revolution and was the first woman appointed Prime Minister of Ukraine’). This is an optimal solution for identifying other terms that could be useful to identify relevant documents.

Ranking Concepts and Entities: Having obtained an initial set of potentially relevant concepts and entities, the goal is to score each of them on how relevant they are to the event. For example, while Yulia Tymoshenko is highly relevant for the Orange Revolution, the European Union played only a marginal role in the event. Different approaches for ranking entities and concepts for relevance were tested and the best performing solution was to compute distances between entities and the event employing out-of-the-box RDF vector representations (Ristoski and Paulheim, 2016).

Finding Mentions in Text: Given the ranked set of entities and concepts, other documents mentioning them in relevant contexts were retrieved from the web archive. In order to go beyond simple string-matching of concepts that were considered relevant (e.g. ‘protests’, ‘revolution’, ‘crisis’, ‘election’), word-embedding representations (Mikolov et al., 2013) have been adopted. Embedding techniques represent each word, entity or concept (e.g. ‘protest’) as a numeric vector of n dimensions. This allows us to measure similarity across different words and to collect relevant materials even if they talk about ‘demonstration’ or ‘crisis’, instead of, for example, mentioning ‘protest’ or ‘revolution’.

Final Collection Building: It could happen that documents mention relevant entities and concepts out of context, for example as part of a comparison: ‘The popular opposition to Ethiopia’s current corrupt regime is comparable to the Orange Revolution in Ukraine’. In order to filter them out and select only the documents that should be included in the event collection, a machine learning approach called Learning to Rank (Liu, 2009) was employed, which, given an initial set of relevant and not relevant documents, learns how to abstract this property and to automate the ranking process.

A Critical Combination of Sources and Methods: The combination of traditional practices of historical research with methodologies and approaches from the fields of natural language processing and information retrieval is essential for dealing with the large abundance of born-digital primary sources. Some of the approaches presented in this chapter have already been adopted in political science research. One of these first studies focused on retrieving documents that referred to political events (e.g. elections) from institutional web collections of the US government in order to define a new measure of ‘attention’ of the US Congress and the president to democratization and electoral practices in other countries, from Zimbabwe to Haiti and Egypt (Elshehawy et al., 2017). By doing so, this initial work highlights both the potential and challenges of using born-digital documents and computational methods for obtaining new insights into the recent political past.

The two case studies presented in this chapter reveal the importance of adopting a highly interdisciplinary approach when dealing with born-digital sources; methodologies from the field of internet
studies could support historians in reconstructing lost web pages, while natural language processing methods could guide them in retrieving documents from large-scale web archives. The final part of this chapter will remark further on this, by discussing the importance of offering this interdisciplinary preparation to future historians in their educational programs.

CONCLUSION: A NEW GENERATION OF HISTORIANS

In recent years, researchers have argued that history, like other humanities disciplines, is reaching a turning point in its methodology (Graham et al., 2015; Nelson, 2016; Scheinfeldt, 2012): sustained by the efforts of many digitization projects, the community has been employing computational methods in order to examine these vast resources and obtain new insights. This change in methodology has reopened a long-term debate regarding the ways textual evidence of the past can and should be properly interpreted.

While for the historical profession it is of course beneficial to constantly debate and criticize the validity of established practices of acquiring knowledge from sources, it is argued in this chapter that the adoption of digitized datasets and computational methods cannot be considered, by itself, the triggering factor of a fundamental turning point in our profession. In fact, adopting (or not) large-scale datasets of digitized sources, together with computational methods, will always remain a choice for the history scholar: Charles Darwin can still be studied without conducting text mining over the collections presented on Darwin Online, just as the London of the eighteenth century can be examined without reading the Proceedings of the Old Bailey Online.

However, it is also argued that history is in fact about to face a paradigm-shifting transition in its methods, but the triggering cause of this transition lies in the born-digital nature of the large majority of sources produced by contemporary societies. This change affects any type of document we create and consume in our everyday life, from bureaucratic forms collected by the public sector to newspaper articles to political mail correspondence to university websites, and it is about to have multifarious consequences for historical research.

Born-digital sources are significantly more complex to archive, collect, analyze and select compared with traditional materials. Websites (such as Unibo.it) are large and variegated collections of documents, which are often not preserved in their entirety by web archive initiatives and can be reconstructed only through the meticulous combination of various pieces of information from different sources. When a resource, such as the institutional website of an administration, is finally re-created, it is often so vast that computational technologies (i.e. natural language processing methods and information retrieval approaches) are necessary for identifying and retrieving specific documents.

The methodological steps overviewed in this chapter for collecting, analyzing and selecting born-digital documents require strong interdisciplinary competences and a highly critical attitude towards sources and methods. In this complex scenario, this chapter concludes by raising a very pressing question: how can the new generations of historians be prepared to face these new challenges?

In recent years, the digital history community has already offered many educational activities on computational methods to its students. From workshops to panels, from courses to summer schools, from tutorials to hackathons, these initiatives have almost always been focused on presenting the potential of new resources, tools and platforms to history students, following an attitude that has been branded ‘more hack, less yack’ (Nowviskie, 2014). While offering hands-on experiences with computational tools is important in order to introduce history students to the digital humanities, a critical approach is strongly needed in order to
properly deal with born-digital sources and computational methods.

For this reason, it is essential that students first of all be guided in shaping their research topics and receive early on in their studies the preparation necessary to support a critical analysis of the born-digital documents and computational methods at their disposal. This will be imperative for a generation of historians who will be able to go beyond an unquestioned adoption of the new sources and tools at their disposal and will instead critically employ them, in search of new historical perspectives.

Notes

1  It is also important to acknowledge that reactions to postmodern approaches are present as well in the historiographic debate (see for example Evans, 2001).

2  See for example the adoption of social science methodologies in historical research in Fogel and Engerman (1974).

3  However, the relationship between history and computing on the one side and literary and linguistic computing on the other side has always been complicated (see for example Robertson, 2016).

4  As described in the FAQ section of the Internet Archive, a website owner can request crawling or archiving of a site to stop and the Internet Archive will endeavor to comply. This will be signaled by a ‘blocked site error’ message such as ‘This URL has been excluded from the Wayback Machine’.

5  In 2001 the University of Bologna website won the ‘WWW’ prize from the Italian economic newspaper Il Sole 24 Ore for the best website in the category ‘School, university and research’. Then, for three consecutive years (2005–7) Unibo.it received the ‘Osc@r del web’ prize for the best Italian public administration website. In 2007 Luigi Nicolais, the Italian Minister of Public Administration, was also present to confer the prize.

REFERENCES

Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., and Ives, Z. (2007) ‘DBpedia: A Nucleus for a Web of Open Data’, Proceedings of the 6th International and 2nd Asian Conference on Semantic Web: 722–735.
Barthes, R. (1967) ‘Discourse on History’, Social Science Information, 6(4): 65–75.
Blevins, C. (2014) ‘Space, Nation, and the Triumph of Region: A View of the World from Houston’, The Journal of American History, 101(1): 122–147.
Bloch, M. (1949) Apologie pour l’Histoire, ou, Métier d’Historien. Armand Colin: Paris.
Brockliss, L.W. (1978) ‘Patterns of Attendance at the University of Paris, 1400–1800’, The Historical Journal, 21(3): 503–544.
Brügger, N. (2005) Archiving Websites: General Considerations and Strategies. The Centre for Internet Research: Aarhus.
Brügger, N. (2009) ‘Website History and the Website as an Object of Study’, New Media & Society, 11(1–2): 115–132.
Brügger, N. (2011) ‘Web Archiving – between Past, Present, and Future’, in M. Consalvo and C. Ess (Eds), The Handbook of Internet Studies, Wiley-Blackwell: Oxford.
Brügger, N. (2012) ‘When the Present Web is Later the Past: Web Historiography, Digital History, and Internet Studies’, Historical Social Research/Historische Sozialforschung, 37(4): 102–117.
Burke, P. (2008) What is Cultural History? Polity: Cambridge (UK).
Bush, V. (1945) ‘As We May Think’, The Atlantic Monthly, 176(1): 101–108.
Cohen, D.J. (2005) ‘By the Book: Assessing the Place of Textbooks in US Survey Courses’, The Journal of American History, 91(4): 1405–1415.
Cohen, D.J., and Rosenzweig, R. (2005) Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web. University of Pennsylvania Press: Philadelphia, PA.
Cohen, D.J., Frisch, M., Gallagher, P., Mintz, S., Sword, K., Taylor, A.M., and Turkel, W.J. (2008) ‘Interchange: The Promise of Digital History’, The Journal of American History, 95(2): 452–491.
Daumard, A., and Furet, F. (1959) ‘Méthodes de l’Histoire Sociale: Les Archives Notariales et la Mécanographie’, Annales: Histoire, Sciences Sociales, 14(4): 676–693.
Davis, C. (2014) ‘Archiving the Web: A Case Study from the University of Victoria’. code{4}lib Journal, 26 (http://journal.code4lib.org/articles/10015)
Derrida, J. (1997) Of Grammatology. Johns Hopkins University Press: Baltimore, MD.
Dougherty, M., Meyer, E.T., Madsen, C.M., Van den Heuvel, C., Thomas, A., and Wyatt, S. (2010) ‘Researcher Engagement with Web Archives: State of the Art’, Preprint on SSRN (https://ssrn.com/abstract=1714997)
Elshehawy, A., Marinov, N., and Nanni, F. (2017) ‘Quantifying Attention to Foreign Elections with Text Analysis of US Congress and the Presidency’, Preprint on SSRN (https://ssrn.com/abstract=2981486)
Evans, R.J. (2001) In Defence of History. Granta Books: London.
Fogel, R.W., and Engerman, S.L. (1974) Time on the Cross. University Press of America: Lanham, MD.
Gomes, D., Miranda, J., and Costa, M. (2011) ‘A Survey on Web Archiving Initiatives’, Proceedings of the 15th International Conference on Theory and Practice of Digital Libraries: 408–420.
Graham, S., Milligan, I., and Weingart, S. (2015) Exploring Big Historical Data: The Historian’s Macroscope. Imperial College Press: London.
Greif, A. (1997) ‘Cliometrics After 40 Years’, The American Economic Review, 87(2): 400–403.
Hale, S.A., Yasseri, T., Cowls, J., Meyer, E.T., Schroeder, R., and Margetts, H. (2014) ‘Mapping the UK Webspace: Fifteen Years of British Universities on the Web’, Proceedings of the 2014 ACM Conference on Web Science: 62–70.
Hockx-Yu, H. (2014) ‘Access and Scholarly Use of Web Archives’, Alexandria: The Journal of National and International Library and Information Issues, 25(1–2): 113–127.
Holzmann, H., Goel, V., and Anand, A. (2016a) ‘Archivespark: Efficient Web Archive Access, Extraction and Derivation’, Proceedings of the 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL): 83–92.
Holzmann, H., Nejdl, W., and Anand, A. (2016b) ‘The Dawn of Today’s Popular Domains: A Study of the Archived German Web Over 18 Years’, Proceedings of the 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL): 73–82.
Knowles, A.K., and Hillier, A. (Eds) (2008) Placing History: How Maps, Spatial Data, and GIS are Changing Historical Scholarship. ESRI: New York.
LaFrance, A. (2015) ‘Raiders of the Lost Web’, The Atlantic, 14 (https://www.theatlantic.com/technology/archive/2015/10/raiders-of-the-lost-web/409210/)
Lin, J., Milligan, I., Wiebe, J., and Zhou, A. (2017) ‘Warcbase: Scalable Analytics Infrastructure for Exploring Web Archives’, Journal on Computing and Cultural Heritage (JOCCH), 10(4): 22.
Liu, T.Y. (2009) ‘Learning to Rank for Information Retrieval’, Foundations and Trends in Information Retrieval, 3(3): 225–331.
Lyotard, J.F. (1984) The Postmodern Condition: A Report on Knowledge. University of Minnesota Press: Minneapolis, MN.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., and Dean, J. (2013) ‘Distributed Representations of Words and Phrases and their Compositionality’, Proceedings of the 26th International Conference on Neural Information Processing Systems: 3111–3119.
Milligan, I. (2012) ‘Mining the “Internet Graveyard”: Rethinking the Historians’ Toolkit’, Journal of the Canadian Historical Association/Revue de la Société Historique du Canada, 23(2): 21–64.
Milligan, I. (2016) ‘Lost in the Infinite Archive: The Promise and Pitfalls of Web Archives’, International Journal of Humanities and Arts Computing, 10(1): 78–94.
Moretti, F. (2013) Distant Reading. Verso Books: London.
Munslow, A. (2006) Deconstructing History. Routledge: New York.
Nanni, F. (2013) ‘L’Archiviazione delle Pagine dei Quotidiani Online’, Diacronie. Studi di Storia Contemporanea, 15(3) (http://www.studistorici.com/wp-content/uploads/2013/10/02_NANNI.pdf)
Nanni, F. (2017a) ‘Reconstructing a Website’s Lost Past: Methodological Issues Concerning the History of www.unibo.it’, Digital Humanities Quarterly, 11(2) (http://www.digitalhumanities.org/dhq/vol/11/2/000292/000292.html)
Nanni, F. (2017b) ‘The Web as a Historical Corpus: Collecting, Analysing and Selecting Sources on the Recent Past of Academic Institutions’, PhD Dissertation, University of Bologna.
Nanni, F., Ponzetto, S.P., and Dietz, L. (2017) ‘Building Entity-Centric Event Collections’,
124 THE SAGE HANDBOOK OF WEB HISTORY

Proceedings of 2017 IEEE/ACM Joint Confer- American Historical Review, 108(3):


ence on Digital Libraries (JCDL): 199–209. 735–762.
Nelson, R.K. (2016) ‘Digital Humanities as Rothman, J. (2014) ‘An Attempt to Discover
Appendix’, American Quarterly, 68(1): the Laws of Literature’, The New Yorker.
131–136. Rüegg, W., and de Ridder-Symoens, H. (Eds)
Noiret, S. (2015) ‘Digital Public History: Bring- (1992) A History of the University in Europe.
ing the Public Back In’, Public History Weekly, Cambridge University Press: Cambridge.
3(13) (http://hdl.handle.net/1814/38393). Scheinfeldt, T. (2012) ‘Sunset for Ideology,
Nowviskie, B. (2014) ‘On the Origin of “Hack” Sunrise for Methodology’, in M.K. Gold and
and “Yack”’, in M.K. Gold and L.F. Klein L.F. Klein (Eds), Debates in Digital Humanities
(Eds), Debates in Digital Humanities (2nd ed.), (1st ed.), University of Minnesota Press. pp.
University of Minnesota Press (http://dhde- 124–127.
bates.gc.cuny.edu/debates/text/58) Schreibman, S., Siemens, R., and Unsworth, J.
Owen, D., and Davis, R. (2008) ‘Presidential (Eds) (2004) A Companion to Digital Human-
Communication in the Internet Era’, Presi- ities. Blackwell Publishing: Oxford.
dential Studies Quarterly, 38(4): 658–673. Shafer, R.J. (1974) A Guide to Historical
Ramage, D.R. (2011) ‘Studying People, Organi- Method. Dorsey Press: Belmont, CA.
zations, and the Web with Statistical Text Small, T.A. (2011) ‘What the Hashtag? A Con-
Models’, PhD Dissertation, Stanford tent Analysis of Canadian Politics on Twitter’,
University. Information, Communication & Society,
Ristoski, P., and Paulheim, H. (2016) ‘RDF2vec: 14(6): 872–895.
RDF Graph Embeddings for Data Mining’, Thaller, M. (1991) ‘The Historical Workstation
Proceedings of the 2016 International Project’, Computers and the Humanities,
Semantic Web Conference: 498–514. 25(2): 149–162.
Robertson, S. (2016) ‘The Differences Between Thomas III, W.G. (2004) ‘Computing and the
Digital History and Digital Humanities’, in Historical Imagination’, in Schreibman, S.,
M.K. Gold and L.F. Klein (Eds), Debates in Siemens, R., and Unsworth, J. (Eds), A Com-
Digital Humanities (2nd ed.), University of panion to Digital Humanities. Blackwell Pub-
Minnesota Press (http://dhdebates.gc.cuny. lishing: Oxford. pp. 56–68.
edu/debates/text/76). Wilkens, M. (2013) ‘The Geographic Imagina-
Rosenzweig, R. (2003) ‘Scarcity or Abundance? tion of Civil War-Era American Fiction’,
Preserving the Past in a Digital Era’, The American Literary History, 25(4): 803–840.
10
Network Analysis for Web History
Michael Stevenson and Anat Ben-David

This chapter provides a conceptual background for applying network analysis in web history research. Rather than a detailed discussion of the methods of network analysis (e.g. Wasserman and Faust, 1994) or a hands-on guide to using network analysis software for web and social media research (e.g. Hansen et al., 2010), this chapter serves as both a brief introduction to network analysis and a critical reflection on its potential for web historical research. It is divided into three main sections discussing: (1) the history and key concepts of network analysis, including the paradigm’s aims and assumptions, (2) the influence of network analysis on the important web genres of search and social media, as well as how network analysis has been applied in research on the ‘live web’ and (3) the main challenges as well as the state of the art for network analysis using web archives. Finally, we conclude with a discussion of future directions for network analysis in web history research.

INTRODUCTION TO NETWORK ANALYSIS

To understand how network analysis may be applied to web history and what is at stake in adopting the network approach, it is necessary to understand the historical and conceptual foundations of social network analysis. This section discusses how this ‘paradigm’ emerged as an approach to studying and explaining social phenomena and highlights how this intellectual history relates to the key assumptions, concepts and methods associated with network analysis.

Roots and Basic Concepts

The roots of network analysis can be found in sociology in the 1930s, in particular in the work of famed psychoanalyst Jacob Moreno (1934), who developed an approach to mapping the interrelationships of groups called ‘sociometry’. For Moreno, as for later
proponents of network analysis, the key to understanding social structure was to see it as a ‘network’ of relations rather than relatively homogenous ‘groups’ of individuals. Moreno devised methods to capture and analyze networks of affinity within communities, and in one case famously argued that an ‘epidemic’ of runaways from a young women’s reformatory school in New York could be explained in large part by looking at the ties of friendship and dislike between them, or their ‘sociometric organization’ (ibid.: 407). Moreno’s key insight was that this spontaneous event was not indicative of ‘mass’ hysteria, but rather reflected how emotions and behaviors could quickly spread within an existing structure of relations: ‘In a way that even the girls themselves may not have been conscious of, it was their location in the social network that determined whether and when they ran away’ (Borgatti et al., 2009: 892).

In subsequent decades, sociologists and anthropologists similarly developed a ‘network’ view of social ties, and in the 1970s social network analysis emerged as a well-defined discipline and ‘formal methodology’ in sociology (Carrington and Scott, 2011: 2). As a formal methodology, social network analysis draws on a branch of mathematics called ‘graph theory’, which provides vocabulary and techniques related to the study of relationships between objects. Graph data may be structured in different ways but is typically formatted as a matrix (Table 10.1) or a set of columns and rows (Table 10.2) representing objects and their connections. In social network analysis a typical matrix will list actors (individuals, institutions, etc.) along each axis, and labels inside the matrix will represent information about their relationships (e.g. ‘admires’ or ‘is friends with …’). Relationships are either directed (e.g. one person admires another) or undirected (e.g. kinship).

Table 10.1 Example of a matrix with undirected social network data (x signifies a tie)

              Person A    Person B    Person C
  Person A    -                       x
  Person B                -           x
  Person C    x           x           -

Table 10.2 Social ties represented in two columns (using the same data as in Table 10.1)

  Person    Shares tie with
  A         C
  C         B

Graph data may be visualized in a network diagram, which consists of nodes (or vertices) representing the graph’s objects and lines (or edges) representing their relationships. This process requires a set of decisions about how the data should be analyzed and depicted. For example, one might only want to visualize certain connections (e.g. only mutual expressions of friendship), while most network diagrams are composed using an algorithm (essentially, a set of rules or instructions) that positions nodes spatially according to the connections between them.

Key aspects of networks that can be represented and explored visually include center and periphery. Although different methods for calculating centrality exist, a central node typically has a high number of connections relative to other nodes, and such measurements may indicate a node’s level of involvement and/or prestige within a network (Wasserman and Faust, 1994: 169–75). Other key characteristics include network density, a ratio between the actual and potential total number of connections in the network, and distance, a measurement of the path between two nodes. Nodes are said to have structural equivalence when they have identical ties, and other (less strict yet more complex) notions of equivalence may be used to identify classes of similar actors (Wasserman and Faust, 1994: 461–502). Along these lines, White et al. (1976) developed the notion of blockmodels to denote sets of nodes with similar patterns of connections within a
larger network. Each block is essentially a subgrouping of nodes with a shared position within the network as well as a shared role, where the latter is also defined in terms of a pattern of links to other blocks (as opposed to inherent characteristics of actors within the group). For example, data from a network of researchers may reveal blocks related to status: a block of leading researchers is defined as those known by the vast majority of the network and connected to each other through interactions and collaborations, while at the same time generally unaware of and not connected to lower-tier researchers (ibid.: 747–9).

Social Network Analysis as a Sociological ‘Perspective’

Although it is tempting to think of network analysis simply as a toolbox for capturing, visualizing and analyzing network data, it is important to see how this methodology is embedded in a larger sociological paradigm or ‘perspective’ that prioritizes social ‘relations’ over social ‘categories’ (Emirbayer and Goodwin, 1994: 1414). For example, the typical sociological survey will seek to find correlations between a particular behavior (e.g. websites visited) and commonly held attributes or demographics (e.g. income level, gender, etc.). By contrast, a network analysis approach would assume that behaviors are more accurately predicted by close examination of one’s network of social ties.

As social network analysis has become widely adopted, one can see this basic assumption about the importance of ‘relations’ at work in a wide range of studies, from questions about the link between power and physical attraction (Martin, 2005) to the study of patent networks in exploring geographical regions of innovation (Fleming et al., 2007). Within this diverse literature, however, general ‘moves’ and concepts can be distilled (Borgatti et al., 2009) that have developed alongside methods for analyzing social networks (Wasserman and Faust, 1994).

In addition to a wealth of tools, concepts and existing theory, previous work on social network analysis provides a good deal of critical self-reflection that will benefit applications within web history. Criticism of network analysis centers on three related issues. First, the concepts, methods and tools offered by social network analysis are all geared towards answering questions of structural context rather than content. Network analysis may provide insight into how, say, a video or meme spreads on the web, or help to identify clusters and cliques among bloggers or social media users, but says very little about the meaning of such content or the character of such communities. Moreover, given the commitment to a relational understanding of social action, a ‘pure’ form of network analysis would seek to transform any questions of content into network terms: the meaning of a meme or video is to be found in the networks it traverses and connects, the character of a community is understood in terms of the connections between its members and its position in relation to other communities. Second, because most network analysis assumes the primacy of networks over their content, it does not address how networks are constituted, maintained and dissolved by cultural practices (Emirbayer and Goodwin, 1994). Third, focusing only on structure leads to narrow explanations (if any) of historical change:

[Network analysis] provides a useful set of tools for investigating the patterned relationships among historical actors. These tools, however, by themselves fail ultimately to make sense of the mechanisms through which these relationships are reproduced or reconfigured over time. (ibid.: 1447)

There is thus an inherent tension between network analysis and the central aim of historical inquiry, namely to provide explanations of events and processes that occur over time. Of course, this does not preclude its use in web history, but any studies that build on the language and methods of network analysis should consider this central tension.
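As a small illustration of the vocabulary introduced under ‘Roots and Basic Concepts’, the following sketch takes the two ties from Tables 10.1 and 10.2 and computes a matrix representation, degree centrality, density, distance and structural equivalence. It is a minimal sketch rather than a reference implementation: the function names are our own, degree centrality is only one of several centrality measures, and structural equivalence is used in its strictest sense (identical ties to all other nodes).

```python
from collections import deque

# Undirected ties taken from Tables 10.1 and 10.2: A-C and C-B.
edge_list = [("A", "C"), ("C", "B")]


def build_adjacency(edges):
    """Turn a two-column edge list into a mapping of node -> set of neighbours."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def to_matrix(adjacency):
    """Return the sorted node labels and a 0/1 matrix (1 signifies a tie)."""
    nodes = sorted(adjacency)
    rows = [[1 if col in adjacency[row] else 0 for col in nodes] for row in nodes]
    return nodes, rows


def degree_centrality(adjacency):
    """Share of the other nodes to which each node is directly tied."""
    others = len(adjacency) - 1
    return {node: len(neighbours) / others for node, neighbours in adjacency.items()}


def density(adjacency):
    """Ratio between the actual and the potential number of undirected ties."""
    n = len(adjacency)
    actual = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    potential = n * (n - 1) / 2
    return actual / potential


def distance(adjacency, start, goal):
    """Number of steps on the shortest path between two nodes (None if unreachable)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, steps = queue.popleft()
        if node == goal:
            return steps
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, steps + 1))
    return None


def structurally_equivalent(adjacency, a, b):
    """True when a and b have identical ties to every other node."""
    return adjacency[a] - {b} == adjacency[b] - {a}


adjacency = build_adjacency(edge_list)
print(to_matrix(adjacency))
print(degree_centrality(adjacency))
print(density(adjacency))
print(distance(adjacency, "A", "B"))
print(structurally_equivalent(adjacency, "A", "B"))
```

In this toy network C is the most central node, and A and B are structurally equivalent because each is tied only to C. Nothing in the calculation refers to who A, B and C are, which is the relational emphasis discussed above.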

Citation Analysis

One form of network analysis that deserves a special mention in relation to the web is citation analysis, as this has influenced both web technology (such as search engines and social media) and tools for web research. Citation analysis is the study of connections between scientific texts by way of bibliographic references. The technique was pioneered in the 1950s by Eugene Garfield to help researchers find the journals and articles that would most interest them, as well as providing libraries with an evaluation tool for managing their journal collections (Garfield, 1955). In addition to information retrieval purposes, Garfield and others were quick to see the potential for employing citation analysis in the history and sociology of science (e.g. de Solla Price and Beaver, 1966).

Citation networks can be compiled and studied in different ways for different purposes. The basic form is the study of direct citations, producing a graph that is particularly suited to overall measures of popularity and centrality. Bibliographic coupling compares references from scientific papers, and serves to indicate topical similarity. A third technique that is particularly prevalent in the sociology of science is co-citation analysis, where a tie between two papers is registered when they are cited together by a third paper. The more often two papers are cited together, the stronger their connection. According to Small (1973), the significance of these ties is ‘subject similarity’ and ‘the association or co-occurrence of ideas’, and co-citation analysis thus offers a potential view not only of structures of influence, but of relationships between particular concepts, ideas or arguments.

Together, social network analysis and citation analysis provide long histories of research and a rich foundation of problems, concepts, vocabularies, methods, tools and critical reflections that may be mined in order to give depth to network analyses of web data and web history. However, the importance of this existing knowledge is not limited to applying network analysis. As we argue in the next section, there is a triadic relationship between network analysis, the web as a medium and web research that is crucial to understanding the web (including its history) as well as the dominant questions and modes of explanation that characterize the body of research that has grown around it.

THE WEB AND NETWORK ANALYSIS

In this section we discuss two important ways the web and social network analysis are related. On the one hand, network analysis is part of the fabric of our experience of the web, as network analysis has influenced key technologies such as search engines and social media. On the other hand, web data and social media data have strong affordances for network analysis, and the growth of the web has been accompanied by a great deal of interest in network analysis as well as the proliferation of network analytical tools.

As a hypertext document system, the World Wide Web is rooted in the science and practice of information retrieval. Given this common lineage with citation analysis, it is no surprise that the web’s basic design has two major affordances for network analysis.

First, hyperlinks serve as an analogy to ties or citations in the sense that social network analysis uses these concepts. Even in his original 1989 proposal, Berners-Lee speculated that the web would allow for (automatic) analysis of connections between researchers, for example to guide the creation of project groups. Second, IP and HTTP data logged by web servers (and data gathered through browser cookies and other tracking technology) align with the kinds of data used for network analysis, including spatial and temporal data (through IP addresses and timestamps), interaction data and data about information flows (cf. Borgatti et al., 2009).
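To make the analogy between hyperlinks and citations concrete, the sketch below treats a handful of hypothetical pages and their outgoing links as a small citation network and derives the three measures described under ‘Citation Analysis’: direct citation counts, bibliographic coupling and co-citation strength. The page names and link data are invented for illustration, and the function names are our own rather than those of any existing tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each citing page mapped to the set of pages it links to.
outlinks = {
    "p1": {"a", "b"},
    "p2": {"a", "b", "c"},
    "p3": {"b", "c"},
}


def direct_citations(outlinks):
    """Count how many citing pages link to each target (in-link count)."""
    counts = Counter()
    for targets in outlinks.values():
        counts.update(targets)
    return counts


def bibliographic_coupling(outlinks):
    """For each pair of citing pages, count the references they have in common."""
    coupling = {}
    for first, second in combinations(sorted(outlinks), 2):
        shared = outlinks[first] & outlinks[second]
        if shared:
            coupling[(first, second)] = len(shared)
    return coupling


def co_citation(outlinks):
    """For each pair of targets, count how many pages cite both of them together."""
    counts = Counter()
    for targets in outlinks.values():
        for pair in combinations(sorted(targets), 2):
            counts[pair] += 1
    return counts


print(direct_citations(outlinks))
print(bibliographic_coupling(outlinks))
print(co_citation(outlinks))
```

Note the difference in direction: bibliographic coupling ties together the citing pages (p1, p2, p3), whereas co-citation ties together the cited ones (a, b, c). In this toy data, a and b are co-cited twice and a and c only once, so a and b would be read as the more closely associated pair.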

One key application of network analysis is in search technology, most famously in Google’s PageRank algorithm. Google co-founders Larry Page and Sergey Brin (together with colleagues) specifically drew on citation analysis and described their algorithm as ‘a method for rating web pages objectively and mechanically’ (Page et al., 1999: 1; cf. Rieder, 2014). Google’s engine marked a major shift in search because of how little the rankings depended on the content of sites, instead relying almost entirely on the larger structure of connections on the web to make determinations about the quality and relevance of a website. Although counting links was one of the many methods used by previous search engines, Google’s algorithm stood out for how it used network structure to make further inferences about the quality of links: a link from a website prominent in the network (such as Yahoo! or the BBC) is valued more highly by PageRank than a link from a website on the periphery (i.e. one with fewer links from the network). In accordance with the central tenet of network analysis, then, the algorithm does not take into account internal attributes of web pages, but instead defines authority in relational terms. As is clear in their descriptions of PageRank and discussion of the search engine elsewhere, Page and Brin see this emphasis on network structure as the key to achieving objective, or ‘organic’, search results, thus echoing Garfield’s (1955) argument that citation analysis provides a more objective measure of scientific quality.

A similar connection is seen between network analysis and social media, particularly in methods for ‘socially’ recommended content. Starting in 2007, Facebook began to see its product less as a specialized ‘social network’ and more as a generalized utility (Farber, 2007). This turn towards seeing itself as a platform – not just for users but for developers, content providers and advertisers – went hand in hand with the company’s promotion of its core feature as a ‘social graph’, or ‘the network of connection and relations between people on the service’ (ibid., n.p.). Around the same time, the company was developing algorithms for selecting and recommending content for users’ news feeds that would come to be known collectively as EdgeRank (Kincaid, 2010). EdgeRank, like Google’s PageRank, is a ranking algorithm that employs a relational approach: it uses data about relationships and user activities to calculate a relevance score, and thus explicitly does not seek to define this importance by looking at the substance of shared content. Where PageRank sorts websites based on hyperlink analysis, EdgeRank sorts newsfeed posts and other timely information based on explicit and implicit ties between people, pages, groups and other entities on Facebook’s social graph.

The blind spots of PageRank and EdgeRank are in some sense also those of network analysis in general. Their relational approaches must be considered when discussing past and present controversies about how they order and recommend content, from issues related to controversial search results to the (discontinued) use of human editors on Facebook. What the examples of PageRank and EdgeRank suggest, then, is that network analysis is not simply a method to be applied to the web’s past, but also a set of ideas and techniques that informed important events and innovations in the medium’s history. They also raise the question of how network analyses in web research in some sense ‘extend’ the methods already embedded in the medium (Rogers, 2013). This is the question we turn to now.

Network Analysis and Web Research

In this section we examine how network analysis has been used to study the live web. Although uses of network analysis are more widespread, we highlight three genres in particular: webometrics, issue networks and controversy mapping, and studying social media data.
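Because link-based ranking recurs throughout these genres, it may help to see the relational logic of PageRank, as described earlier in this section, in miniature. The sketch below is a simplified, textbook-style power iteration and not Google’s production algorithm: the damping factor of 0.85, the number of iterations, the even redistribution of rank from pages without outgoing links, and the page names are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over a dict of page -> linked pages."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {page: (1.0 - damping) / n + damping * dangling / n for page in pages}
        # Every page passes an equal share of its current rank along each outgoing link.
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical web: three pages link to "hub", which links to "a" and "other";
# "lonely" has no incoming links and links only to "b".
links = {
    "fan1": ["hub"],
    "fan2": ["hub"],
    "fan3": ["hub"],
    "hub": ["a", "other"],
    "lonely": ["b"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

Pages ‘a’ and ‘b’ each receive exactly one incoming link, yet ‘a’ ends up with the higher score because its link comes from the well-connected ‘hub’, whereas the link to ‘b’ comes from a page nobody links to. The content of the pages plays no role in the calculation.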

Examples of network analysis in web research prior to the late 2000s were focused nearly exclusively on hyperlinks, considered the ultimate ‘thread’ that seamlessly weaves the web. Park and Thelwall (2003) distinguish between webometrics – an approach rooted in information science – and hyperlink network analysis as an approach rooted in social network analysis. The two approaches, albeit sharing the same object of study and a similar method of quantitatively extracting and mapping hyperlink networks, differ in their focus and research questions. The webometrics approach is primarily focused on the hyperlink’s role in the context of information retrieval and search engine algorithms (Thelwall et al., 2005), with practical aims such as website discovery and ranking. The hyperlink network analysis paradigm, meanwhile, hosts a variety of research questions and interests, ranging from studying social structures and computer-mediated communication to the study of social movements or specific issues (De Maeyer, 2013). Interestingly, the distinction between information science and social sciences as parallel approaches to the study of online networks is also characteristic of the literature on web archive research and historical network analysis in particular, as we will see later.

Among other forms of research, hyperlink network analysis is used in Science and Technology Studies approaches to mapping issues and controversies online (Marres and Rogers, 2008; Marres, 2015). The Issue Crawler (http://www.issuecrawler.net/) is a web-based application that uses concepts and techniques from co-citation analysis to map ‘issue networks’, or heterogeneous sets of actors organized around an issue of public concern. Beginning with a set of websites that act as starting points, the Issue Crawler mines a network of hyperlinks and outputs a network diagram based on ‘co-links’. Examples of issue network analysis include research on how local political issues may be ‘subsumed’ or redefined by the structures of transnational issue networks (Marres and Rogers, 2008), and comparative research on how structures of issue networks relate to the degree to which issue ‘narratives’ are shared (Bennett et al., 2011).

Since the late 2000s, network analysis on the web has increased dramatically, moving beyond hyperlink-based approaches such as webometrics and issue network analysis to incorporate many more types of (social) network data – a development made possible by the rise of social media. While a number of studies use social media data for traditional network analysis questions such as identifying ‘influencers’ within social groups (Aral and Walker, 2012; Stieglitz and Dang-Xuan, 2013), others employ network analysis of social media data in other domains such as news discussion (Bruns and Burgess, 2012). Meanwhile, network analysis tools designed with web and social media data in mind include NodeXL (Smith et al., 2009; Hansen et al., 2010) and Gephi (Bastian et al., 2009).1

The popularity of network analysis for social media is unsurprising, given the many affordances social media data has for this approach. In addition to formalized relationship and activity data, the commercial incentive of turning social networks into ‘platforms’ for developers has ensured structured access in the form of APIs. However, this commercial value means companies exert control over this access, often making it more difficult for researchers to study social media. In addition to issues of access, researchers must engage with the question of what exactly social media data represents, and what assumptions and consequences are attached to different approaches to gathering and analyzing that data. For example, how we study and interpret Twitter data depends on what relations are represented by ‘follows’. Kwak et al. (2011) demonstrate that the structure of relations on Twitter deviates considerably from that of existing social networks, and combine this finding with others to suggest that the medium should be understood as a news medium rather than a social utility. Meanwhile, an analysis of Facebook’s ‘social graph’ suggests commonalities with
other social networks, and provides evidence of its ‘small-world’ nature, meaning relatively few ‘steps’ are necessary for content to be shared across a large portion of the network (Ugander et al., 2011). Where computer scientists have examined the nature of social media data through large-scale topological mappings, media studies researchers have done so through specific case studies. In a study of a political and cultural ‘refraction chamber’ on Twitter, Rieder (2012) analyzes the assumptions embedded in different approaches to collecting the platform’s network data. As Rieder argues, approaches that focus too narrowly on the traditional network analysis topic of ‘diffusion’ will miss out on important ways in which communities appropriate news and information in culturally distinct ways (ibid.).

As we have seen in this section, network analysis and the web are closely linked, from the influence of network analysis on key web genres to a large body of network analytical research and a range of network analysis tools for web research. Now we turn to how network analysis has been applied in web history research, focusing on the key challenges this brings.

APPLYING NETWORK ANALYSIS IN WEB HISTORY RESEARCH

Before discussing the applicability of network analysis in web history research, let us introduce web archives as the primary source used for historical network analysis. Earlier in this chapter, we outlined the historical ties between hyperlinks, search engines and scientometrics. As with Google’s PageRank algorithm, the history of web archives is also rooted in scientometrics and in the history of search engines. Brewster Kahle, the founder of the Internet Archive, was the CEO and founder of Alexa Internet. After Alexa Internet was sold to Amazon in 1996, the Internet Archive was founded as a non-profit organization, taking its harvesting technology and data donation from Alexa Internet. In an early piece about web archiving, Kahle identifies the Internet Archive’s mission with scientometric analysis, noting that the web’s hyperlinked structure functions as an informal citation system and that ‘the study of the evolution of the web’s hyperlink topology may provide insights into what any given community thought was important’ (1997, n.p.). Thus, the Internet Archive, and subsequently many national web archives that were founded in the early 2000s, follow the logic of search engines. However, unlike search engine crawlers, which add newly found pages to the search engine’s index, web archiving crawlers preserve a snapshot of any found page (Masanes, 2006).

Unlike the popular application of network analysis as a paradigm and method for conducting web research, historical network analysis of the web has thus far been limited in scope, primarily due to numerous methodological challenges that relate to the difficulty in adding a temporal dimension to the network diagram, as well as to ontological problems in using archived websites as the primary source from which historical networks are generated.

Temporal Challenges

The network diagram is a spatial representation of relational ties between entities at a given point in time. As long as the temporal dimension is held static, there is no difference in principle between network analyses of contemporary ties – for example, a network of interlinked websites of university students (Adamic and Adar, 2003) – and network analyses of historical ties – for example, a network of the social structure of Roman society (Alexander and Danowski, 1990). Since the web is a dynamic medium in which content and ties change constantly and rapidly, the static network diagram fails to capture the dynamic changes that a network undergoes over time.2 It should be noted that
this constraint poses challenges to any network analysis of web data, whether of the present or of the past, as the content and structure of the network may have changed minutes after the data was captured.

Therefore, one of the first problems in applying historical network analysis to web data is the lack of temporal or longitudinal data. To solve this problem, several researchers have proposed proactively collecting web data in order to analyze changes in its hyperlinked structure over time. Foot et al. (2003), for example, proposed a method of proactive data collection involving machine network mapping and human ethnographic coding to retrospectively study the evolution of linking practices of Congressional candidates during the 2002 campaign season. Baeza-Yates and Poblete (2003) also used proactive crawling of Chilean websites in order to study the evolution of the structure of the Chilean web.

A second challenge in applying historical network analysis to the web relates to the difficulty in determining websites’ specific timestamps. The live web hosts various websites that were published or recently updated at different points in time, and the exact time of publication or update can be estimated through various methods, such as the timestamps embedded in a website’s code, or a response to an HTTP server request. However, as SalahEldeen and Nelson (2013) argue, such timestamps may vary in format or time zone. While SalahEldeen and Nelson propose a method for ‘carbon-dating’ websites through a web application that estimates the creation date for a URI by polling various sources of evidence (such as Bitly for the first time a URI was shortened, a Memento aggregator for the first time it appeared in a public web archive or Google’s time of last crawl), other researchers use web resources with known timestamps to study the evolution of networks over time.3 Buriol et al. (2006), for example, used the explicit timestamps associated with Wikipedia’s entries to study the evolution of Wikipedia’s linked structure over time. Like Wikipedia, web archives also attach specific timestamps to each page or element from the archived website. In this way, Toyoda and Kitsuregawa (2005) used six Japanese web archives to develop a system for visualizing and analyzing the evolution of the web’s structure with a series of website captures from 1996 to 2003.

Despite the advantage of web archives as repositories of longitudinal web data with exact timestamps that may lend themselves to historical network analysis, there are several additional challenges to their use as historical sources for this purpose, as outlined in the following section.

Challenges Related to Web Archives as Specific Historical Sources

On the face of it, web archives should lend themselves to historical network analysis not only because of the exact timestamps attached to each archived snapshot, but also because of an assumed fit between the network paradigm and methods for web archiving. Many web archives are created using a harvesting method that relies on the web’s networked structure to crawl websites and discover more pages. At the same time, the archived websites are preserved as discrete files that are stripped of their original networked structure and contextual environments (Rogers, 2013). Therefore, unlike hyperlink networks of the live web, network analyses of the archived web involve reconstruction, rather than mapping.

One of the challenges in reconstructing historical hyperlink networks from the archived web involves the normalization and selection of the relevant timestamps of the archived snapshots. Since the frequency of archiving is irregular, one cannot ensure that all of the network’s actors have been archived at the same time. Instead, the reconstructed network might represent an artifact of mixed temporalities, rather than a coherent
representation of spatial ties at a specific moment in time. As a hypothetical example, consider a researcher who is interested in following up on the study of Foot et al. (2003) mentioned above, to study the hyperlink network of Congressional candidates’ websites before and after the 2008 US presidential election. The researcher would collect a list of candidates’ websites, and fetch their archived snapshots from the Internet Archive from 2007 and 2009. Subsequently, she would compare differences in the hyperlink network that each list generates. However, a closer inspection of the websites in each list would reveal that some of the websites were archived at the beginning of the year, while others were archived towards its end. The question arises whether the ties that the networks display did indeed exist on the live web, even though the websites from which the ties were extracted are dated almost a year apart. To address this problem, researchers from the Digital Methods Initiative created a tool that renders hyperlink networks based on a selection of archived versions that are closest to 1 July in each year (Weltevrede and Helmond, 2012). A similar initiative that facilitates the reconstruction of hyperlink networks from the archived web is the Web Archives for Longitudinal Knowledge (WALK) project, which offers ready-made graph files for visualizing hyperlink networks of the Canadian national web archiving portal (Milligan et al., 2016). While various approaches will address the issue of inconsistency differently, it should be noted that the problem of temporal difference is an artifact of the archiving process and is not ‘solved’ through any particular mode of analysis, especially since network diagrams – given their emphasis on spatial organization – serve to deemphasize or smooth over such temporal inconsistencies.

Other challenges that are specific to the archived web as a historical source relate to the scope and depth of the archive. Web archives such as the Internet Archive crawl the live web using a ‘breadth-first’ method, focusing on discovering links to other websites from a given homepage of a website. While the majority of national libraries use Heritrix, the breadth-first crawler developed by the Internet Archive (Mohr et al., 2004), some national web archives employ a ‘depth-first’ crawling method that aims to completely archive all the pages of a given website, or opt for a ‘selective method’ of archiving a given set of curated websites (Masanes, 2006). As a result, historical hyperlink analyses of web archives may be incomplete, due to differences between the crawling methods employed (Milligan et al., 2016; Samar et al., 2016).

The problem of completeness of web archives as a historical source for network analysis is addressed by Niels Brügger (2013), who argues that while all web archives are incomplete – due both to partial archiving and to elements missing from the archiving process, such as video and images, which network analysis software cannot address – web archives are also too complete, since there are many archived versions of the same website, from which the researchers have to select.

Finally, it should be noted that most hyperlink network analyses of web archives are in fact reconstructions of ties between outbound links from the archived snapshots, rather than a reconstruction of possible inbound links from other websites outside of the researchers’ corpus. Put differently, as Brügger (2013) notes, while there is only one live web, the reconstructed network is always one instance of many possible networks. To solve this problem, he suggests performing historical network analysis across different web archives.

Examples

Despite the abovementioned challenges, there is a growing literature on web history that employs historical network analysis using web archives, especially from the
perspective of the study of national webs. The application of network analysis in web historical research can be broadly described in three areas of historical questions: (1) the study of the evolution of networks over time, emphasizing the shifting roles of content, issues and actors; (2) the evolution of the web as a medium, emphasizing changes in technologies, features and structure; and (3) a critical study of web archives as a primary source for historical research. While each of these questions can be pursued separately, they are often combined. A combination of historical questions on the evolution of networks of actors and of the web as a medium is evident in a study by Weltevrede and Helmond (2012), who reconstructed the hyperlinked structure of the Dutch blogosphere, to study the history of the web’s blogging platform transitions. Hale et al. (2014) traced the changing hyperlink structure of networks of British universities and found an inverse relationship between the density of links and the geographical distance between universities. Ben-David (2016) reconstructed from the Internet Archive the now vanished domain of the former Yugoslavia, .yu, and performed historical network analysis of the historical domain to demonstrate ties between sovereignty and the cohesive structure of a national web, thereby combining questions on the evolution of historical national webs with a critique of web archives as a primary source for web historiography. Historical network analysis of the archived web has also been used as a method to discover evidence of historical materials that have not been archived. Huurdeman et al. (2015) proposed a method for recovering evidence of the past existence of pages that were crawled but not preserved during the archiving process, using a combination of hyperlink analysis, anchor text and crawl logs.4

As we have seen in this section, researchers interested in applying historical network analysis either proactively archive their own corpora, or rely on archived web data. The plethora of ontological and methodological challenges related to the use of historical network analysis with archived web materials might deter researchers from applying this method. However, when carefully planned, historical network analysis of the archived web may be particularly useful in gaining broad insights about the structural evolution of historical networks over time, as well as for answering different sets of questions on the web’s history. The following section concludes this chapter by putting forward a reflexive approach to the application of network analysis in web history.

BACK IS THE WAY FORWARD

This chapter has sought to provide the necessary concepts, history and theory needed to apply network analysis in web history research. In conclusion, we would like to propose two ways in which ‘back is the way forward’ for both network analysis and those who wish to apply it to web history. First, if methodological innovation follows from novel research problems, web history provides fruitful grounds for new insights relating to network analysis. Whereas social media increasingly offer highly formalized data suitable for network analyses (even if access to this data remains highly controlled), the archived web remains notoriously incomplete. In addition to considering how to approach the material constraints and uneven temporality of web archives, researchers must consider what other, perhaps offline, sources can be mined for historical network data that can be used to triangulate findings from archived web pages. The challenge is perhaps to realize that web history is more ‘networked’ than conventionally understood: beyond hyperlinks, there are implicit links between a range of people, institutions, technologies and so on that can be mapped in order to produce better descriptions of structural relations in the web’s history.
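
One way of making the uneven temporality of web archives concrete is the date-anchored selection described earlier in this chapter, in which the archived version closest to 1 July is kept for each site and year (Weltevrede and Helmond, 2012). The following is only a minimal sketch of that idea and not the Digital Methods Initiative tool itself. It assumes that capture times are available as 14-digit strings of the form YYYYMMDDhhmmss (the form used in Wayback Machine URLs), and the site names, capture lists and function names are hypothetical.

```python
from datetime import datetime

WAYBACK_FORMAT = "%Y%m%d%H%M%S"  # assumed 14-digit capture timestamp format


def closest_to_anchor(timestamps, year, month=7, day=1):
    """Return the capture from `year` that lies closest to the anchor date, or None."""
    anchor = datetime(year, month, day)
    captures = [datetime.strptime(ts, WAYBACK_FORMAT) for ts in timestamps]
    same_year = [c for c in captures if c.year == year]
    if not same_year:
        return None  # the site was not archived at all in that year
    best = min(same_year, key=lambda c: abs(c - anchor))
    return best.strftime(WAYBACK_FORMAT)


def spread_in_days(selected):
    """Days between the earliest and latest selected captures (the residual temporal gap)."""
    times = [datetime.strptime(ts, WAYBACK_FORMAT) for ts in selected if ts is not None]
    return (max(times) - min(times)).days if times else 0


# Hypothetical capture lists for two hypothetical candidate websites.
captures_by_site = {
    "candidate-a.example": ["20070115083000", "20070628120000", "20071201000000"],
    "candidate-b.example": ["20070304000000", "20071120101500"],
}

selected = {site: closest_to_anchor(ts, 2007) for site, ts in captures_by_site.items()}
gap = spread_in_days(selected.values())
print(selected)
print("residual gap in days:", gap)
```

Reporting the residual gap alongside the reconstructed network is a design choice made here for illustration: anchoring reduces temporal inconsistency but, as argued above, does not remove it, since the selected captures in this invented example still lie months apart.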
Second, web historians making use of network analysis may find inspiration from further engagement with existing work in network sociology. For example, further work on the histories of networked groups like the early blogging community (Ammann, 2009; Siles, 2012) may find useful concepts and theory on how structures of relations are co-constituted with shared identities (White, 1992). More generally, web historians will likely seek approaches that transcend the divide between studies of ‘big’ and ‘small’ data, as well as the divide between structural dynamics and situated meaning. Another existing body of work that should prove useful is qualitative approaches to gathering and analyzing network data (Hollstein, 2011). Overall, the challenge is to go beyond understanding network analysis as a methodological toolbox and instead engage with it as a paradigm, one that is suited to particular kinds of research problems and questions.

ACKNOWLEDGMENTS

Research for this chapter was supported by the Dutch National Science Foundation (NWO) in connection with the Veni research project ‘The web that was’ (275-45-006).

Notes

1  Note that some tools such as the Issue Crawler are designed specifically to crawl and analyze the live web. For other tools, such as NodeXL and Gephi, users provide the data themselves, making these tools more suitable when using historical web data.
2  It should be noted that although the literature on social network analysis proposes statistical methods for studying the coevolution of networks using longitudinal data, these models have not been widely adopted in historical network analyses of the web. Snijders (2005), for example, puts forward several statistical models that conceive of network dynamics as a continuous process rather than being bound to the observation moments, because at each moment the current network structure is an important determinant of the likelihood of the changes that might occur next.
3  Memento (Van de Sompel et al., 2009) is a framework for locating past versions of a given web resource through an aggregator of resources from multiple web archives.
4  More examples of network analysis in web history can be found in Brügger and Schroeder (2017).

REFERENCES

Adamic, L.A., and Adar, E. (2003) ‘Friends and Neighbors on the Web’, Social Networks, 25(3): 211–230.
Alexander, M.C., and Danowski, J.A. (1990) ‘Analysis of an Ancient Network: Personal Communication and the Study of Social Structure in a Past Society’, Social Networks, 12(4): 313–335.
Ammann, R. (2009) ‘Jorn Barger, the Newspage Network and the Emergence of the Weblog Community’, Proceedings of the 20th ACM Conference on Hypertext and Hypermedia. Torino, Italy: ACM: 279–288.
Aral, S., and Walker, D. (2012) ‘Identifying Influential and Susceptible Members of Social Networks’, Science, 337(6092): 337–341.
Baeza-Yates, R.A., and Poblete, B. (2003) ‘Evolution of the Chilean Web Structure Composition’, Proceedings of the IEEE/LEOS 3rd International Conference on Numerical Simulation of Semiconductor Optoelectronic Devices, 11–13.
Bastian, M., Heymann, S., and Jacomy, M. (2009) ‘Gephi: An Open Source Software for Exploring and Manipulating Networks’, Proceedings of the Third International ICWSM Conference, 8: 361–362.
Ben-David, A. (2016) ‘What Does the Web Remember of Its Deleted Past? An Archival Reconstruction of the Former Yugoslav Top-Level Domain’, New Media & Society, 18(7): 1103–1119.
Bennett, W.L., Foot, K., and Xenos, M. (2011) ‘Narratives and Network Organization: A Comparison of Fair Trade Systems in Two Nations’, Journal of Communication, 61(2): 219–245.
Borgatti, S.P., Mehra, A., Brass, D.J., and Labianca, G. (2009) ‘Network Analysis in the Social Sciences’, Science, 323(5916): 892–895.
Brügger, N. (2013) ‘Historical Network Analysis of the Web’, Social Science Computer Review, 31(3): 306–321.
Brügger, N., and Schroeder, R. (Eds.) (2017) The Web as History: Using Web Archives to Understand the Past and the Present. London: UCL Press.
Bruns, A., and Burgess, J. (2012) ‘Researching News Discussion on Twitter’, Journalism Studies, 13(5–6): 801–814.
Buriol, L., Castillo, C., Donato, D., Leonardi, S., and Millozzi, S. (2006) ‘Temporal analysis of the Wikigraph’, 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings)(WI’06): 45–51.
Carrington, P.J., and Scott, J. (2011) ‘Introduction’, in J. Scott and P.J. Carrington (Eds.), The SAGE Handbook of Social Network Analysis. London: Sage. pp. 1–8.
De Maeyer, J. (2013) ‘Towards a Hyperlinked Society: A Critical Review of Link Studies’, New Media & Society, 15(5): 737–751.
de Solla Price, D.J., and Beaver, D. (1966) ‘Collaboration in an invisible college’, American Psychologist, 21: 1011–1018.
Emirbayer, M., and Goodwin, J. (1994) ‘Network Analysis, Culture, and the Problem of Agency’, American Journal of Sociology, 99(6): 1411–1454.
Farber, D. (2007) ‘Facebook: The Social Web Utility Company’, ZDnet, 25 Jan. (http://www.zdnet.com/article/facebook-the-social-web-utility-company/)
Fleming, L., King, C., and Juda, A.I. (2007) ‘Small Worlds and Regional Innovation’, Organization Science, 18(6): 938–954.
Foot, K., Schneider, S.M., Dougherty M., et al. (2003) ‘Analyzing Linking Practices: Candidate Sites in the 2002 US Electoral Web Sphere’, Journal of Computer-Mediated Communication, 8(4) (https://doi.org/10.1111/j.1083-6101.2003.tb00220.x)
Garfield, E. (1955) ‘Citation Indexes for Science. A New Dimension in Documentation through Association of Ideas’, International Journal of Epidemiology, 35(5): 1123–1127.
Hale, S.A., Yasseri, T., Cowls, J., Meyer, E.T., Schroeder, R., and Margetts, H. (2014) ‘Mapping the UK Webspace: Fifteen Years of British Universities on the Web’, Proceedings of the 2014 ACM Conference on Web Science: 62–70.
Hansen, D., Shneiderman, B., and Smith, M.A. (2010) Analyzing Social Media Networks with NodeXL: Insights from a Connected World. Burlington, MA: Morgan Kaufmann.
Hollstein, B. (2011) ‘Qualitative Approaches’, in J. Scott and P.J. Carrington (Eds.), The SAGE Handbook of Social Network Analysis. London: Sage. pp. 404–416.
Huurdeman, H., Kamps, J., Samar, T., et al. (2015) ‘Lost but Not Forgotten: Finding Pages on the Unarchived Web’, International Journal on Digital Libraries, 6(3–4): 247–265.
Kahle, B. (1997) ‘Preserving the Internet’, Scientific American, 276(3): 82–83.
Kincaid, J. (2010) ‘EdgeRank: The Secret Sauce That Makes Facebook’s News Feed Tick’, Techcrunch. 22 April (http://social.techcrunch.com/2010/04/22/facebook-edgerank/).
Kwak, H., Chun, H., and Moon, S. (2011) ‘Fragile Online Relationship: A First Look at Unfollow Dynamics in Twitter’, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 16 July.
Marres, N. (2015) ‘Why Map Issues? On Controversy Analysis as a Digital Method’, Science, Technology, & Human Values, 40(5): 655–686.
Marres, N., and Rogers, R. (2008) ‘Subsuming the Ground: How Local Realities of the Fergana Valley, the Narmada Dams and the BTC Pipeline Are Put to Use on the Web’, Economy and Society, 37(2): 251–281.
Martin, J.L. (2005) ‘Is Power Sexy?’, American Journal of Sociology, 111(2): 408–446.
Masanes, J. (2006) ‘Web Archiving Methods and Approaches: A Comparative Study’, Library Trends, 54(1): 72–90.
Milligan, I., Ruest, N., and Lin, J. (2016) ‘Content Selection and Curation for Web Archiving: The Gatekeepers vs. the Masses’, IEEE/ACM Joint Conference on Digital Libraries (JCDL): 107–110.
Mohr, G., Stack, M., Ranitovic, I., et al. (2004) ‘Introduction to Heritrix’, 4th International Web Archiving Workshop: 1–15.
Moreno, J. (1934) Nervous and mental disease monograph series: Who Shall Survive? A New Approach to the Problem of Human Interrelation. Vol. 58. Washington, D.C.: Nervous and Mental Disease Publishing.
Page, L., Brin, S., Motwani, R., and Winograd, T. (1999) ‘The PageRank Citation Ranking: Bringing Order to the Web’, Stanford InfoLab (http://ilpubs.stanford.edu:8090/422/).
Park, H.W., and Thelwall, M. (2003) ‘Hyperlink Analyses of the World Wide Web: A Review’, Journal of Computer-Mediated Communication, 8(4) (https://doi.org/10.1111/j.1083-6101.2003.tb00223.x)
Rieder, B. (2012) ‘The Refraction Chamber: Twitter as Sphere and Network’, First Monday, 17(11) (http://firstmonday.org/ojs/index.php/fm/article/view/4199).
Rieder, B. (2014) ‘What Is in PageRank? A Historical and Conceptual Investigation of a Recursive Status Index’, Computational Culture, 2 (http://computationalculture.net/article/what_is_in_pagerank).
Rogers, R. (2013) Digital Methods. Cambridge, MA: MIT Press.
SalahEldeen, H.M., and Nelson, M.L. (2013) ‘Carbon Dating the Web: Estimating the Age of Web Resources’, arXiv:1304(5213) (https://arxiv.org/abs/1304.5213).
Samar, T., Traub, M.C., van Ossenbruggen, J., and de Vries, A.P. (2016) ‘Comparing Topic Coverage in Breadth-First and Depth-First Crawls Using Anchor Texts’, TPDL 2016: Research and Advanced Technology for Digital Libraries, 10 August.
Siles, I. (2012) ‘The Rise of Blogging: Articulation as a Dynamic of Technological Stabilization’, New Media & Society, 14(5): 781–797.
Small, H. (1973) ‘Co-Citation in the Scientific Literature: A New Measure of the Relationship between Two Documents’, Journal of the Association for Information Science and Technology, 24(4): 265–269.
Smith, M.A., Schneiderman, B., Milic-Frayling, N., et al. (2009) ‘Analyzing (Social Media) Networks with NodeXL’, Proceedings of the Fourth International Conference on Communities and Technologies. ACM: 255–264.
Snijders, T. (2005) ‘Models for Longitudinal Network Data’, in P.J. Carrington, J. Scott, and S. Wasserman (Eds.), Models and Methods in Social Network Analysis. Cambridge: Cambridge University Press. pp. 215–247.
Stieglitz, S., and Dang-Xuan, L. (2013) ‘Social Media and Political Communication: A Social Media Analytics Framework’, Social Network Analysis and Mining, 3(4): 1277–1291.
Thelwall, M., Vaughan, L., and Björneborn, L. (2005) ‘Webometrics’, ARIST 39(1): 81–135.
Toyoda, M., and Kitsuregawa, M. (2005) ‘A System for Visualizing and Analyzing the Evolution of the Web with a Time Series of Graphs’, Proceedings of the Sixteenth ACM Conference on Hypertext and Hypermedia. ACM: 151–160.
Ugander, J., Karrer, B., Backstrom, L., and Marlow, C. (2011) ‘The Anatomy of the Facebook Social Graph’, arXiv:1111(4503).
Van de Sompel, H., Nelson, M., Sanderson, R., Balakireva, L., Ainsworth, S., and Shankar, H. (2009) ‘Memento: Time Travel for the Web’, arXiv:0911(1112) (https://arxiv.org/abs/0911.1112).
Wasserman, S., and Faust, K. (1994) Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press.
Weltevrede, E., and Helmond, A. (2012) ‘Where Do Bloggers Blog? Platform Transitions within the Historical Dutch Blogosphere’, First Monday, 17(2–6). (http://firstmonday.org/ojs/index.php/fm/article/view/3775/3142)
White, H.C. (1992) Identity and Control: A Structural Theory of Social Action. Princeton, NJ: Princeton University Press.
White, H.C., Boorman, S.A., and Breiger, R.L. (1976) ‘Social Structure from Multiple Networks. I. Blockmodels of Roles and Positions’, American Journal of Sociology, 81(4): 730–780.
11
Quantitative Web History Methods
Anthony Cocciolo

INTRODUCTION

This chapter explores how historical research questions, including research questions about the history of the web, can be addressed through quantitative research methods applied to web archives. At a basic level, quantitative methods involve applying a variety of mathematical or statistical analyses to numerical data. These mathematical techniques can range from something as simple as adding up the occurrences of some word to more sophisticated techniques such as analysis of variance (ANOVA), which will be described in more depth in this chapter. Through the use of web archives, quantitative methods can be used to show patterns and changes over time, thus having utility in addressing historical research questions.

In this chapter, a personal use of quantitative research methods with web archives will be discussed as a way of illustrating how they can be used more broadly (Cocciolo, 2015). In 2014, I was interested in what I perceived as a decreased use of text on the web in favor of image-based content, such as video and photographs. I was particularly interested in what seemed like an erosion of written content online in favor of a form of communication that seemed to share commonalities with children’s books, where photographs were accompanied by small amounts of text. In seeing what looked like a movement away from the written word, I was reminded of the work of Walter Ong, who noted the tenacity of orality, or a tendency to attempt to return to an oral culture despite the success and obvious benefits of literacy (Ong, 2002). An oral culture is one without knowledge of literacy, where information, knowledge, and culture are communicated and passed down through means other than the written word, such as through oral storytelling, music, and other non-written means. Was the internet, with its newfound ability to easily stream video and high-resolution imagery, allowing for Ong’s return to orality?
Although I realized that I could not make such sweeping arguments in an academic study – reviewers would be none too pleased – I was still interested in developing a sound method for determining changes in the amount of text delivered to users over time. To study this, web archives are essential because they contain copies of webpages from the past. Although some countries have extensive web archives of their national domain or other collecting areas, in the United States – which was my main study site – the most extensive web archive is the one kept by the Internet Archive that it displays through its WayBack Machine. Thus, I knew that if I was to study changes in the presentation of text over websites used by people in the United States, the web archives kept by the Internet Archive would be an essential resource.

Before I discuss how I applied web archives in my research, I will outline the general steps for engaging in quantitative research using web archives. Each of these steps will be described in more detail in the following sections and will draw on this personal example. These steps are:

1 Developing a research question – First, a research question should be developed that can be addressed in full or in part through web archives. Types of questions that can be explored through such methods will be discussed, as well as those that are better suited for other methods.
2 Securing a corpus – Second, the corpus of web archived content that provides coverage of the areas appropriate to the research question ought to be secured, and ways to gain access to such corpora will be discussed.
3 Numerical translation – Third, the corpus of text needs to be translated into numerical data based on the research question.
4 Analysis – Fourth, using the numerical datasets created in the earlier step, mathematical or statistical analysis techniques can be employed. This can range from simple mathematical functions such as summation, average, or standard deviation to more complex analysis, including statistical techniques such as analysis of variance (ANOVA).
5 Drawing conclusions – Fifth, like all research, conclusions should be drawn based on the analysis.

The aim of this chapter is to offer a starting point for historians interested in applying quantitative research methods to web archives to answer historical research questions, using personal experience as an illustration. However, before the stages are discussed, relevant literature on using quantitative research methods with web archives will be introduced.

RELEVANT LITERATURE

Using quantitative research methods with web archives may be somewhat new to historians. In a traditional sense, historical research involves the close examination of textual records to address historical research questions. The question of whether to use quantitative or qualitative research methods is not generally directed at the historian but rather at the social scientist, such as the psychologist, sociologist, and education researcher. Quantitative and qualitative research – when referred to by social scientists – typically involves studying living people, which may or may not be the case for historians. In social science, qualitative research methods such as interviews and focus groups are often used to get at people’s understanding of something that is not well understood, such as motivations or opinions on some topic. Quantitative research can be used to study an issue or topic that may be better understood, but where there is greater interest in seeing how widespread or generalizable a given view is. Whereas qualitative research may involve analysis of data such as interview transcripts, quantitative research may involve analysis of numerical data such as that generated from a survey. In this chapter, ‘quantitative research’ is used to refer to performing analysis using numerical and statistical techniques on data, specifically web archives. While this method may not be commonly used by historians, it can be used to help address historical research questions in conjunction with other sources of evidence.
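
To make the numerical translation and analysis steps outlined in this chapter more tangible, the following minimal sketch computes an average and a standard deviation for each year and then a one-way ANOVA F statistic across years. It is an illustration under stated assumptions, not the procedure of any particular study: the word counts per homepage are invented, and the function and variable names are hypothetical. Only the Python standard library is used, so no p-value is computed; a statistical package would be needed for that.

```python
from statistics import mean, stdev


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Hypothetical numerical translation: words counted on three archived
# homepages per year (invented numbers, for illustration only).
words_per_homepage = {
    1999: [10, 12, 14],
    2005: [20, 22, 24],
    2014: [12, 14, 16],
}

for year, counts in words_per_homepage.items():
    print(year, "mean:", mean(counts), "standard deviation:", stdev(counts))

f_statistic, df_between, df_within = one_way_anova_f(list(words_per_homepage.values()))
print("F =", f_statistic, "with", df_between, "and", df_within, "degrees of freedom")
```

A large F value indicates that the differences between the yearly averages are large relative to the variation within each year; whether such a difference is statistically significant still has to be judged against the F distribution for the given degrees of freedom.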
Studying webpages and web-based phenomena using quantitative methods is not new, as it is captured in the research subfield of information science known as webometrics. According to Thelwall and Vaughan, ‘Webometrics encompasses all quantitative studies of web-related phenomenon’ (2004: 1213). Thelwall (2009) notes that webometrics can be used for studying a variety of web-based phenomena, such as issues relating to election websites, online academic communication, bloggers as amateur journalists, and social networking. The methods can be used for understanding aspects like web impact assessment, citation impact, trend detection, and search engine optimization, among other possible uses. Webometrics grows out of the subfield of information science known as bibliometrics, which uses quantitative analysis to make measurements related to published books and articles, such as citation analysis to determine impact. Related subfields include informetrics, which is the quantitative study of information and can combine analysis of information in whatever form it may occur.

Björneborn and Ingwersen (2004) highlight four main areas of webometric research: 1) webpage content analysis; 2) web link structure analysis; 3) web usage analysis; and 4) web technology analysis. Notably missing from this list is a longitudinal or time-based dimension. However, webometrics researchers highlight the possibilities opened up by web archives. Björneborn and Ingwersen highlight that ‘Web archaeology… could in this webometric context be important for recovering historical Web developments, for example, by means of the Internet Archive (www.archive.org)’ (2004: 1217). When webometrics was in its early development in the 2000s, web archives such as the Internet Archive only contained a few years of content, making them less appealing for long-term, longitudinal analysis. However, as web archives have persisted, and notable web archives such as the Internet Archive have surpassed 20 years of crawling websites, new opportunities for historical and longitudinal analysis are increasingly possible.

In the field of communication and media studies, researchers have begun to use web archives to create web histories, which Brügger defines as ‘a necessary condition for the understanding of the Internet of the present as well as of new, emerging Internet forms’ (2011: 24). Web histories can include studies of multiple facets of the web, such as national domains, which may look at factors such as volume, space, structure, and content, among others (Brügger, 2014).

STAGE 1 – DEVELOPING A RESEARCH QUESTION OR QUESTIONS

Before progressing further, I must make a quick note on the language used in this chapter, as it varies to some degree by country. By ‘homepage’, I am referring to the start page or initial page of a website. I also use the term website, which refers to an entire collection of webpages under a given domain. For example, the website ‘pepsi.com’ is composed of a homepage and other webpages that are hyperlinked together to form the website.

When engaging in quantitative research using web archives, it is necessary to have a research question that lends itself to such methods. For my project mentioned in the introduction, my research questions are the following:

Is the use of text on the World Wide Web declining? If so, when did it start declining, and by how much has it declined?

The above research questions are well-suited for quantitative methods using web archives. The first reason is that the questions are essentially quantitative in nature: a ‘decline’ and ‘by how much’ are things that can be readily measured numerically by comparing data from some specific year in the past and measuring it against a more recent year. The
second reason is that web archives are the essential resource for seeking answers to the above questions. As the Internet Archive has been archiving the web since 1996 (Goel, 2016), it is possible to use it to analyze homepages for nearly the entire lifespan of the World Wide Web. Although web archives are generally not available for the first few years of the World Wide Web, by the end of the 1990s good web archives – specifically through the WayBack Machine – exist. Thus, web archives work well for studying content from the late 1990s to the present day.

The major limitation when making comparisons between the past and present is that not all present-day webpages existed in the past, and not all past websites continue into the present. Further, some website owners blocked web crawlers because they feared losing control of their content, thus leaving sites like Time Magazine (time.com) poorly represented in 1990s web archives. A further limitation is that most web archives do not copy every webpage within a given domain, but only go a few levels deep from the homepage. While the Internet Archive has very extensive copies of homepages of top-level domain names for many months and years, webpages several levels below the homepage are less well-represented. Thus, using web archives to make comparisons between homepages is much more feasible than making comparisons against some webpage many levels below the homepage. Understanding the strengths and limitations of a particular web archive necessarily impacts the types of research questions and analysis that it can be used to address.

Although there are an infinite number of possible research questions that may make use of quantitative research methods with web archives, some particular components are more appropriate than others. Research questions that look into the occurrence of some ‘thing’ are particularly noteworthy; this can include the occurrence of a word, image, phrase, visual element, hyperlink, person, or place. Basically, if it can be readily identified by a computer or human, it can be used in the research question. Another component can be the co-occurrence of one ‘thing’ with another. Co-occurrence analysis can factor in the distance between each ‘thing’, such as word distance, pixel distance, or number of links away. More sophisticated relationships between one or more ‘things’ can also be studied, such as the nature of sentiments relating one element to another (Liu, 2012). The more complicated the phenomenon – such as sentiment – the more complex the algorithms for identifying it must be. With the increased complexity, there is a greater chance that the algorithm could perform poorly and not correctly identify the sentiment. Thus, simpler phenomena, such as occurrence or co-occurrence, are more straightforward to determine than more sophisticated relationships. Emerging practices that use machine learning and artificial intelligence in algorithms have the potential for identifying complex phenomena and relationships (Kelleher et al., 2015). Although well beyond the scope of this chapter, machine learning techniques that leverage artificial intelligence have potential application to research questions that are inherently historical in nature.

Questions well suited to web archive research are those concerned with the timeframe from the late 1990s to the present, which are the years that are well-represented in some web archives. If earlier years are being included, web archives may need to be augmented with more traditional sources, such as newspapers, magazines, and books. Analysis of such sources can be expedited to some extent by using digitized copies of such works, but print holdings may be needed, as not everything has been digitized and, even where it has, it is not necessarily available to researchers.

In a perfectly linear world, researchers move from creating research questions, to devising methods for addressing those questions, to implementing the methodology, analyzing the data, generating results,
142 THE SAGE HANDBOOK OF WEB HISTORY

and drawing conclusions. However, as many researchers know, the questions are developed in dialectic with the available research methods and data sources; thus each aspect influences the other. Hence, it would not be unusual to refine research questions based on the data that can be secured, or the analysis options available. In the next section, securing the web archive corpus will be discussed.

STAGE 2 – SECURING A WEB ARCHIVE CORPUS

The next step in the research process is securing or identifying a corpus of web archived webpages that can be used or analyzed to address the research questions. Before obtaining a corpus, it is important to understand the ways in which web archives come into existence. Some of the methods used to archive web content are client-side archiving, server-side archiving, and non-web archiving (Masanès, 2006). Client-side archiving is the most popular form of web archiving and is used by the Internet Archive (2018) to collect webpages for display on the WayBack Machine. In this approach, web crawlers act like normal web users and ‘start from seed pages, parse them, extract links, and fetch the linked document’, then reiterate (Masanès, 2006: 23). This method works well for simpler webpages, but could encounter difficulty with webpages that exchange content in between webpage loads, which is popularly known as the Asynchronous JavaScript and XML (AJAX) approach to creating web interfaces. Many social media sites make extensive use of this approach, and without special provisions for web archiving, retrieving this content can be challenging. This could be overcome, but may require manual intervention by a skilled web archivist. Client-side archiving is also challenging when attempting to download large collections of webpages, which could take a long time to completely download. For example, Masanès notes ‘it will take more than three days to archive a site with 100,000 pages’ (2006: 24).

Some of the limitations of client-side web archiving are overcome by server-side web archiving, in which files are copied directly from the server with the site owner’s cooperation. This method was used by the Library of Congress to create an archive of Twitter (Osterberg, 2013). The limitations of this approach are the difficulty of re-creating the webpages so that they are authentic to what the user would have experienced, and the extensive effort required to negotiate the transfer of data.

Perhaps the simplest form of web archiving is to create non-web archives, where web content is printed out or converted to a format like Adobe Acrobat PDF or PNG files and stored using something other than the web (e.g., file folders, directories on a computer) (Masanès, 2006). Although this method has some appeal because of its simplicity, it loses the context in which users experienced the content and the way it was navigated using hyperlinks. It also could lose some of the graphical look and feel of the webpage, which is readily evident when most webpages are saved as PDFs or printed out.

In the case of my research project on the decline of text on the web, I was interested in making comparisons between webpages from today with those in the past. Thus, I needed to use webpages that I knew existed in the past and persist until today. I developed a list of 100 popular and prominent websites in the United States that existed from 1999 to 2014 which were available through the WayBack Machine, using indexes like Alexa’s Top 500 English-language Website index (Alexa, 2003). Popular and prominent websites were selected – rather than websites that may be obscure and unused – because they may better reflect the interests and desires of the general user population. The list is repeated below in Table 11.1.
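The client-side crawling loop described earlier in this section (start from seed pages, parse them, extract links, fetch the linked documents, then repeat) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Internet Archive's crawler: the `LinkExtractor` and `crawl` names, the `fetch` callable, and the `max_pages` limit are all hypothetical, and a production crawler would also need politeness delays, robots.txt handling, and error recovery.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl: fetch a page, parse it, queue its links, repeat.

    `fetch` is a hypothetical callable that takes a URL and returns the
    page's HTML as a string, or None if the page cannot be retrieved.
    Returns a dict mapping each collected URL to its HTML.
    """
    queue = deque(seeds)
    seen = set(seeds)
    archive = {}
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        archive[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before queueing.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return archive


# Usage with an in-memory stand-in for the live web (hypothetical pages).
SITE = {
    "http://example.org/": '<a href="/about">About</a> <a href="/news#top">News</a>',
    "http://example.org/about": '<a href="/">Home</a>',
    "http://example.org/news": "<p>No links here.</p>",
}
pages = crawl(["http://example.org/"], SITE.get)
print(sorted(pages))
```

Passing `fetch` in as a parameter keeps the loop itself independent of any network library, so the same function can be exercised against a dictionary, as here, or against a real HTTP client.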
QUANTITATIVE WEB HISTORY METHODS 143

Table 11.1 Website categories with respective websites

Category | Total websites | Websites
Consumer products and retail | 10 | Amazon.com, Pepsi.com, Lego.com, Bestbuy.com, Mcdonalds.com, Barbie.com, Coca-cola.com, Intel.com, Cisco.com, Starbucks.com
Government | 11 | Whitehouse.gov, SSA.gov, CA.gov, USPS.com, NASA.gov, NOAA.gov, Navy.mil, CDC.gov, NIH.gov, USPS.com, NYC.gov
Higher education | 10 | Berkeley.edu, Harvard.edu, NYU.edu, MIT.edu, UMich.edu, Princeton.edu, Stanford.edu, Columbia.edu, Fordham.edu, Pratt.edu
Libraries | 8 | NYPL.org, LOC.gov, Archive.org, BPL.org, Colapubliclib.org, Lapl.org, Detroit.lib.mi.us, Queenslibrary.org
Magazines | 12 | USNews.com, TheAtlantic.com, NewYorker.com, Newsweek.com, Economist.com, Nature.com, Forbes.com, BHG.com, FamilyCircle.com, Rollingstone.com, NYMag.com, Nature.com
Museums | 10 | SI.edu, MetMuseum.org, Guggenheim.org, Whitney.org, Getty.edu, Moma.org, Artic.edu, Frick.org, BrooklynMuseum.org, AMNH.org
Newspapers | 9 | NYTimes.com, ChicagoTribune.com, LATimes.com, NYDailyNews.com, Chron.com, NYPost.com, Suntimes.com, DenverPost.com, NYPost.com
Online service | 8 | IMDB.com, MarketWatch.com, NationalGeographic.com, WebMD.com, Yahoo.com, Match.com
Technology site | 11 | CNet.com, MSN.com, Microsoft.com, AOL.com, Apple.com, HP.com, Dell.com, Slashdot.org, Wired.com, PCWorld.com, IBM.com
Television | 11 | CBS.com, ABC.com, NBC.com, Weather.com, PBS.org, BBC.co.uk, CNN.com, Nick.com, MSNBC.com, CartoonNetwork.com, ESPN.go.com
Total | 100 |
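The study samples each site in Table 11.1 at three-year intervals from 1999 to 2014 (the rationale is given in the paragraphs that follow), which implies 100 × 6 = 600 archive lookups. The sketch below only builds that list of lookups, following the `https://web.archive.org/web/<timestamp>/<url>` pattern that appears in Box 11.1. It is an illustration, not the PHP script used in the study: the function name, the mid-year timestamp, the `www.` prefix, and the three sample domains are assumptions, and the WayBack Machine is assumed to redirect such a request to the capture nearest the requested time.

```python
SAMPLE_YEARS = range(1999, 2015, 3)  # 1999, 2002, 2005, 2008, 2011, 2014


def snapshot_requests(domains, years=SAMPLE_YEARS):
    """Return one (domain, year, lookup URL) tuple per site per sample year."""
    requests = []
    for domain in domains:
        for year in years:
            # Assumption: a mid-year 14-digit timestamp (YYYYMMDDhhmmss).
            timestamp = f"{year}0701000000"
            url = f"https://web.archive.org/web/{timestamp}/http://www.{domain}/"
            requests.append((domain, year, url))
    return requests


# Three domains from Table 11.1 stand in for the full list of 100.
sample = snapshot_requests(["pratt.edu", "loc.gov", "nytimes.com"])
print(len(sample))   # 3 sites x 6 years = 18
print(sample[0][2])  # https://web.archive.org/web/19990701000000/http://www.pratt.edu/
```

Each returned URL would still need to be checked by hand, since a lookup can resolve to an error page or to a capture from a different year.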

Although the Internet Archive has archived webpages from 1996 onward, the range of webpages archived improved as years advanced, and by 1999 many more websites were being archived than in 1996. Thus, I would begin my comparisons at the year 1999. To show changes over time, I would analyze those websites every three years (1999, 2002, 2005, 2008, 2011, and 2014). In sum, all websites needed to continuously exist between 1999 and 2014, and all those in Table 11.1 met that criterion.

The way that I chose to secure the archived webpages was to use Memento (2018). Memento is a technical framework aimed at a better integration of the current and the past web, and provides a way to issue requests and receive responses from web archives (Van de Sompel et al., 2009). For example, you can submit to the web service a URL and date, and it will bring back the URL for the web archived content. The URL returned can be from a variety of web archives, but for all the sites I submitted, the Internet Archive was the web archive that was brought back by Memento. I developed a PHP script that issued requests to the Memento web service for the 100 websites for six years, and in each case it returned a URL of the content from the Internet Archive, thus producing 600 web archived pages. Note that websites originally included in the 100 websites that were not archived for any given year were removed and replaced with a website from the respective category that had all years archived. Each homepage was then inspected manually to ensure that it was there and was not some type of error page that would throw off the analysis.

Web archived webpages (the HTML and binary files) can be downloaded in a web browser window (using the File -> Save As option), or a script can be used to download the HTML file and related files needed to render the page. For example, the command-line based tool ‘wget’ makes downloading

webpages and related binary files relatively straightforward. For example, issuing the following command via the Windows command line or Macintosh terminal using ‘wget’ will download the Internet Archive’s July 1997 homepage of the Pratt Institute’s website:

Box 11.1 Example use of command line tool wget

    wget -p -e robots=off https://web.archive.org/web/19970713123416/http://www.pratt.edu/

Note that in the above example, an option is passed to wget to ignore the robots.txt file on the Internet Archive’s WayBack Machine, which explicitly blocks all crawlers. It is a strange irony that the site that was built on crawling websites does not allow crawling! However, this is likely not so problematic as long as large amounts of content are not downloaded all at once, which can place strain on a webserver. In fact, the above request will only download a small amount of data. If too many requests are being issued too quickly, it is likely that you could be temporarily blocked. The ‘-p’ option included with the wget command ensures that the crawler downloads all the page prerequisites, such as GIF and JPG files referenced in the HTML page.

In my case, I was not interested in getting the HTML and related binary files (e.g., JPGs, GIFs), but wanted large visual presentations of the webpage. This was because the method I intended to use to determine which parts of the webpage were graphics and which were text was to use a computer vision algorithm, which will be discussed more in the next section. To download the full print-screens, I used a simple Firefox extension called ‘Grab Them All’, which creates full webpage screenshots of pages as PNG files using a seed list of URLs (Grab Them All, 2018). I was able to give it a seed list of 600 URLs, and in half an hour I had large PNG visual representations of those archived webpages.

Brügger writes that a web archived website is typically faulty and deficient compared with the original (Brügger, 2008). Web archived websites can have problems with the links or with the content displayed, among other possible issues. One advantage of creating a graphic version of a web archived webpage is that it stabilizes it: since a PNG file can be only opened and rendered one way, there is no chance that it will be displayed differently on different computers or browsers. However, it has the disadvantage of eliminating the special functions of webpages (e.g., hyperlinks, interactive content, moving image content, etc.). Thus, while it eliminates some problems (e.g., browser obsolescence that may render some functions inoperable), it introduces new limitations (e.g., transforming an inherently interactive medium into a static image).

For projects where only the text is important and not the HTML or related binary files, scripts can be developed which remove the HTML. For example, a regular expression script, implemented in PHP or Python, can remove all HTML markup from a file, leaving only the readable text. Further, the PHP function strip_tags() can remove HTML information from a webpage, leaving only the visible text. Lastly, specialized libraries, such as the Beautiful Soup library for Python, can be used for extracting information from HTML and XML files (Beautiful Soup, 2018; PHP.net, 2018).

When engaging in such tasks as stripping the HTML from a file, it is always a good idea to maintain copies of your data in stages. This is because if you realize further in the process that you made an error (e.g., stripped out more content than was expected), you can refer to copies of the data from the earlier stages.

Note that this is only one method to get access to web archived content. This method diverges from the ‘big-data’ approaches to working with web archives, such as using gigabytes, terabytes, or even petabytes of web archived content in analyses. To gain access to big datasets of web archived content, it is necessary to work more directly with providers, rather than downloading small bits from the web. For example, the

non-profit organization Common Crawl provides researchers access to web archived data (Common Crawl, 2018). Since these datasets are so large, it can be inefficient to make copies of the data, and thus sites like Common Crawl allow users to run their analysis against the cloud-based data using computing services provided by Amazon.

Before discussing big-data approaches in more detail, it is important to make a distinction between data and metadata. Metadata can be defined as ‘data about data’. A familiar example is that a telephone conversation could be considered the data, but the phone number dialed to enable the connection and the duration of the call might be considered metadata. In the case of web archives, the data could be the HTML of a webpage where the metadata may be the URL and the date the copy of the HTML was made.

Common Crawl provides users with access to the web archive data and metadata in three formats: WARC file, metadata-only format (WAT) and text-only format (WET). WARC files are a standards-based, text-based format for representing web-crawled webpages. WARC files include the HTML for crawled webpages, metadata on the crawl (e.g., what day and time the site was crawled), and the binary files encoded in the text-based format. WARC files can be large and difficult to deal with, especially if you are not using all the data provided in them. For example, all the JPGs, GIFs, or other binary data that are part of a webpage get encoded in WARC files, easily making the majority of WARC files comprise nonsensical content because the binary data is not human-readable but machine-readable. If the binary data, such as images, are not necessary for a particular research project, the other formats Common Crawl offers may be better options. These include the metadata-only format (WAT), which provides information like the page title and outgoing links on the page, making it useful for creating networks of webpage linkages. The last format provided is the text-only format (WET), which removes the HTML, JavaScript, and other parts of a webpage that are not shown to users via a web browser. This format can be especially useful if you are interested in the textual content on a webpage.

One limitation of Common Crawl’s data is that its earliest data is from 2008 and 2009, stored in ARC format, which was an earlier web archiving format that preceded WARC. ARC is very similar to WARC, and many tools support both formats, so this should not be a major impediment. However, because this data only begins in 2008, it would not be useful for studying the earliest years of the web. To study webpages from the 1990s, the Internet Archive is likely the best source.

STAGE 3 – NUMERICAL TRANSLATION

Once the corpus has been secured, whether this comprises WARC files, metadata of web crawls, text from crawls, screenshots of webpages, or other manifestations of data from web archives, it is necessary to begin to prepare the data for numerical analysis. Web archives have many facets, and thus the way in which the data gets translated into numerical data depends very much on the research questions. In the case of my research project described earlier, I was interested in using the 600 full-length screenshots, representing 15 years of homepages from 100 popular and prominent websites, to see if the amount of text presented to users was declining. Thus, I needed a method to distinguish the textual content from other content, such as images, videos, or whitespace.

I ended up modifying an open-source web browser extension called Project Naptha (2018), which could be used for detecting blocks of text within an image. This extension implemented an innovative computer vision algorithm called the Stroke Width Transform (SWT). SWT was created by a Microsoft research team who observed that the ‘one feature that separates text from other

elements of a scene is its nearly constant stroke width’ (Epshtein et al., 2010: 2963). During their initial evaluation the algorithm was able to identify text regions within natural images with 90% accuracy.

For example, Figure 11.1 shows this process used on the Library of Congress webpage from Internet Archive’s 2002 collection, with the black boxes identifying the text regions from the images. The algorithm is not without minor inaccuracies. It has identified part of the dome incorrectly as a text area, as well as some other very small areas. Nevertheless, it has an accuracy that is consistent with the findings of the Microsoft researchers. A second example provided is that of the White House website from 2002 using this same process (shown in Figure 11.2).

Using the bounding boxes produced by Project Naptha, a percentage of webpage text to non-text was computed, and recorded into a MySQL database. For example, the

Figure 11.1 Library of Congress website from year 2002, with text areas highlighted with
black bounding boxes. Webpage is 23.33% text using this method.
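One plausible reading of the percentage in the caption is the share of the screenshot's area that falls inside detected text boxes. A minimal sketch of that calculation follows, assuming each bounding box is given as (left, top, width, height) in pixels. The function name and box format are hypothetical and are not Project Naptha's actual output format. Boxes are painted onto a pixel mask so that overlapping boxes are not counted twice.

```python
def percent_text(boxes, page_width, page_height):
    """Percentage of page pixels covered by at least one text bounding box."""
    covered = bytearray(page_width * page_height)  # one flag per pixel
    for left, top, width, height in boxes:
        # Clip each box to the page so stray coordinates cannot overflow.
        x0, y0 = max(left, 0), max(top, 0)
        x1 = min(left + width, page_width)
        y1 = min(top + height, page_height)
        if x1 <= x0 or y1 <= y0:
            continue  # box lies entirely outside the page
        for y in range(y0, y1):
            start = y * page_width + x0
            covered[start:start + (x1 - x0)] = b"\x01" * (x1 - x0)
    return 100.0 * sum(covered) / (page_width * page_height)


# Hypothetical 100 x 60 pixel page with two overlapping text boxes.
boxes = [(0, 0, 50, 20), (25, 10, 50, 20)]
print(round(percent_text(boxes, 100, 60), 2))  # 29.17
```

The two boxes each cover 1,000 pixels and share 250, so the union is 1,750 of 6,000 pixels. Simply summing box areas would overstate the figure whenever boxes overlap.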

webpage shown in Figure 11.1 is 23.33% text, whereas the webpage shown in Figure 11.2 is 46.10% text, which indicates what is readily visible: that Figure 11.2 is more text-heavy than Figure 11.1.

Thus, at this point in my research project, I had created numerical data from 600 screenshots of webpages. The data included URL, percentage of text to non-text, and year. This is only one narrow slice of data that can be generated, and other numerical data can be derived. For example, this could include recording information like word counts, properties of images or videos,

Figure 11.2 WhiteHouse.gov from 2002 with text areas highlighted with black bounding
boxes. Webpage is 46.10% text using this method.
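Each screenshot therefore reduces to one record of URL, year, and percentage of text. The study stored these records in MySQL. The sketch below uses Python's built-in `sqlite3` module as a stand-in so that it runs without a database server. The table and column names are hypothetical, and the two rows use the percentages reported in the captions of Figures 11.1 and 11.2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for illustration
conn.execute(
    """CREATE TABLE page_text (
           url TEXT NOT NULL,
           year INTEGER NOT NULL,
           percent_text REAL NOT NULL,
           PRIMARY KEY (url, year)
       )"""
)
rows = [
    ("loc.gov", 2002, 23.33),         # Figure 11.1
    ("whitehouse.gov", 2002, 46.10),  # Figure 11.2
]
conn.executemany("INSERT INTO page_text VALUES (?, ?, ?)", rows)
conn.commit()

# Average percentage of text per year, the input to the analysis stage.
yearly = conn.execute(
    "SELECT year, AVG(percent_text) FROM page_text GROUP BY year ORDER BY year"
).fetchall()
print(yearly)
```

Keying the table on (url, year) guards against recording the same screenshot twice, which would otherwise distort the yearly averages.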

relative amount of executable code to HTML on webpages, and file sizes, among many other possible numerical properties.

Other methods for translating web archives into numerical data include the use of topic modeling, text mining, and natural language processing (NLP) tools (Graham et al., 2016). NLP is a research field within the discipline of computer science, and has a number of applications that can readily produce numeric data. These can include named entity recognition, such as identifying the number of references to a specific person, place, or thing; sentiment analysis; and topic identification (Jurafsky and Martin, 2008). A number of free tools and open toolkits are available for engaging in natural language processing (Bird et al., 2018; Natural Language Toolkit, 2018; Open NLP, 2018; Stanford CoreNLP, 2018). All of these tools require some experimentation, as well as verification that they are producing the desired result. If such tools are being used, it is important to verify that they are working correctly, such as correctly identifying entities from a pre-existing list or accurately identifying a sentiment. Evaluating the utility of these NLP tools can be accomplished through sampling the source material and the resulting outcome to verify that they are working as expected. This is especially important if the method being used has not been proven to work elsewhere. Although there is no hard and fast rule of how large the sample must be, I suggest at least 10% for unproven methodologies. The evaluation can be enhanced by using two independent evaluators to measure how well the tools are working on the sample data. Ensuring this consistency is often referred to as interrater reliability.

STAGE 4 – ANALYSIS

Once web archives have been used to generate numerical data, this data can be used in analysis. In the research project described here, I was interested in knowing if the amount of text online was declining. I had 600 data points which included URL, year, and percentage of text to non-text that was generated using the computer vision algorithm described in the earlier step. One analysis I wanted to perform was computing the average percentage for each year available: 1999, 2002, 2005, 2008, 2011, and 2014. Although this analysis could be readily achieved in any spreadsheet program such as Microsoft Excel, I used the statistics package SPSS. However, average values alone are not enough evidence that text is rising and falling. For example, say the year 2014 was 30% text because all webpages were approximately 30% text, whereas in 1999 the average was also 30% but half were 15% text and the other half 45% text. In cases such as this one, the standard deviation statistic is important because it clarifies the extent to which the data diverges from the mean, and indicates how the mean should be interpreted. Microsoft Excel can also be used for computing standard deviation, but again I used SPSS. Table 11.2 shows the results of both these computations, and a visualization of the values is shown in Figure 11.3.

Table 11.2 Mean percentage of text on a webpage per year, with standard deviation values

Year | Mean percentage of text on a webpage | Standard deviation
1999 | 22.36 | 15.45
2002 | 30.89 | 14.93
2005 | 32.43 | 14.60
2008 | 31.31 | 15.88
2011 | 28.51 | 15.47
2014 | 26.88 | 13.23

In addition to mean and standard deviation statistics, additional statistical work can be undertaken to provide greater meaning to the average values. In this case, I was interested in whether the means (or the average amount of text on a webpage) were dependent on the year they were produced, or if these means were simply random. Although standard deviation

Figure 11.3 Percentage of text on webpages.
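The summary statistics plotted in Figure 11.3, and the one-way ANOVA discussed in this stage, can also be computed outside SPSS. The sketch below uses only Python's standard library and made-up data that mirror the 30% example given in the text (one year where every page is 30% text, another where half are 15% and half 45%). The function name is hypothetical. It computes each group's mean and sample standard deviation, then the ANOVA F statistic. Turning F into a p-value requires the F distribution, which a statistics package would supply (for instance `scipy.stats.f_oneway`), so that step is omitted here.

```python
from statistics import mean, stdev


def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of samples.

    Assumes at least two groups and some variation within the groups.
    """
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    # Between-group variation: distance of each group mean from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group variation: spread of observations around their group mean.
    ss_within = sum(
        sum((v - m) ** 2 for v in g) for g, m in zip(groups, group_means)
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up percentages of text, NOT the study's data.
by_year = {
    1999: [15.0, 45.0] * 50,  # mean 30, widely spread
    2014: [30.0] * 100,       # mean 30, no spread
}
for year, pcts in by_year.items():
    print(year, mean(pcts), round(stdev(pcts), 2))

# Identical means give F = 0: the year explains none of the variation.
print(one_way_anova_f(list(by_year.values())))
```

The two hypothetical years share a mean of 30 but differ sharply in standard deviation, which is the situation the text warns about when averages are read on their own.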

can measure variation, statistical significance tests can indicate whether this variation is unlikely to be mere chance and instead depends on some other variable or variables. In this case, that variable is the year the webpage was produced, which is referred to as the independent variable because it is a fact that does not depend on other variables. The percentage of text on a webpage could then be considered a dependent variable, whose value is hypothesized to be dependent on the year the webpage was produced.

When engaging in statistical tests, a finding of ‘statistical significance’ indicates that the dependent variable’s values are unlikely to be pure chance but are influenced one way or another by the independent variable. In this project, the specific test I used was the one-way analysis of variance, which is often referred to as one-way ANOVA. The SPSS software facilitates the computation work for this test, and provides results that can be interpreted to conclude whether there is indeed a statistically significant relationship between these variables. When using statistical tests like ANOVA, the existence of a statistically significant relationship is determined by the ‘p-value’ or probability value. Although explaining how p-values work is beyond the scope of this article, the ANOVA in this research produced a p-value of less than .0005, which led to the conclusion that there was a statistically significant relationship between the percentage of text on a webpage and the year it was produced. In the case of the research project discussed here, a ‘one-way ANOVA revealed that the percentage of text on a webpage are not chance occurrences

but rather this percentage is dependent on the year the website was produced’ (Cocciolo, 2015).

It should be acknowledged that many historians may not be particularly well versed in statistics. I received doctoral training at a school of education, where statistics coursework is generally required. However, historians interested in using big data can become more experienced in statistics through their own independent study or through formal coursework. As I do not always use statistical tests in my research, I find myself having to brush up on how to use such tests and how to implement them in computer software, such as SPSS. However, I have found some resources useful, most notably the Laerd Statistics (2018) tutorials on the web. Although they provide some free information, most of it is available via a monthly subscription. The site provides comprehensive discussions of all the different types of statistical tests, with plenty of examples, as well as information on how to implement the tests in SPSS and interpret the results. The subscription costs are low and well worth the small investment. Consulting resources such as this, as well as other resources such as statistics textbooks (e.g., Mendenhall et al., 2012), can be useful in understanding how statistical tests work and how the results should be interpreted, including knowing whether there is a statistically significant relationship between the variables. A further option is to explore online courses, such as courses available through Khan Academy and Coursera.

Note that there are some limitations to using statistics. If the dataset is small, such as under 30 data points, statistics may not be the best tool and results can be augmented with qualitative information. The study described here could be significantly enhanced by making use of a dataset larger than 600 data points. When using statistics, in general, large datasets are better than small, so 6,000, or even six million, data points could enhance the overall significance of the study.

STAGE 5 – DRAWING CONCLUSIONS

Before drawing conclusions from research using quantitative methods, it is necessary to describe the limitations. As mentioned earlier, quantitative methods are well suited to large datasets, such as ones with at least 30 data points. Beyond concerns of dataset size, other limitations can be articulated in the conclusion. For example, in the research study described here, one limitation was that it only included websites that were popular and prominent in the United States, and thus the changes to the composition of those webpages over time may not be a universal phenomenon but rather US-specific. Although webpages in the United States have significant use from individuals outside of the United States, being able to highlight how aspects like website selection impact the conclusions that can be drawn is necessary. Further, issues around the limitations of web archiving – or what got archived and what got missed – could lead to erroneous conclusions. For example, in this study I noted that some popular and prominent websites – such as Time Magazine – are poorly web archived in the WayBack Machine because of limitations defined by Time in its robots.txt file.

The conclusion reached in the study was that ‘the percentage of text on the Web climbed during the turn of the twenty-first century, peaked in 2005, and has been on the decline ever since’ – with the caveat that this finding is based on popular and prominent websites in the United States (Cocciolo, 2015). Other issues arising from limitations of the web archive corpus can also be discussed when drawing conclusions from the research.

CONCLUSION

In conclusion, this paper offered a starting point for historians interested in using quantitative research methods using web archives.

Although the use of such methods requires some study of statistical research methods, as well as implementing scripts using tools like PHP and Python, these can all be readily learned using resources described in this chapter, including online resources. Such methods allow for making sense of large amounts of data and addressing historical research questions with precision. In some cases, the best option may be engaging with others who have the necessary expertise to study the phenomena. These can include computer programmers, statisticians, and those with access to facilities with web archives. Through such collaborations, historians can open up exciting new research avenues using web archives.

REFERENCES

Alexa (2003) ‘Alexa’s Top 500 English-language Website index (2003 web archive)’. Available at: https://web-beta.archive.org/web/20031209132250/http://www.alexa.com/site/ds/top_sites?ts_mode=lang&lang=en [18 February 2018].
Beautiful Soup Python Library (2018) Available at: https://www.crummy.com/software/BeautifulSoup/bs4/doc/ [18 February 2018].
Bird, S., Klein, E., and Loper, E. (2018) ‘Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit’. Available at: http://www.nltk.org/book/ [19 February 2018].
Björneborn, L., and Ingwersen, P. (2004) ‘Toward a basic framework for webometrics’, Journal of the American Society for Information Science and Technology 55(14): 1216–1227.
Brügger, N. (2008) ‘The archived website and website philology: A new type of historical document’, Nordicom Review 29(2): 155–175.
Brügger, N. (2011) ‘Web Archiving – Between Past, Present and Future’, in M. Consalvo and C. Ess (Eds.), The Handbook of Internet Studies. Malden, MA: Wiley-Blackwell. pp. 24–42.
Brügger, N. (2014) ‘Probing a Nation’s Web Domain: A New Approach to Web History and a New Kind of Historical Source’, in G. Goggin and M. McLelland (Eds.), The Routledge Companion to Global Internet Histories. New York: Routledge. pp. 61–73.
Cocciolo, A. (2015) ‘The rise and fall of text on the Web: A quantitative study of Web archives’, Information Research 20(3). Available at: http://www.informationr.net/ir/20-3/paper682.html [19 February 2018].
Common Crawl (2018) Available at: http://commoncrawl.org/ [18 February 2018].
Epshtein, B., Ofek, E., and Wexler, Y. (2010) ‘Detecting text in natural scenes with stroke width transform’, in Proceedings of 2010 IEEE Conference on Computer Vision and Pattern Recognition in San Francisco, CA. New York, NY: IEEE. pp. 2963–2970.
Goel, V. (2016) ‘Defining Web pages, Web sites and Web captures’, Internet Archive Blogs. Available at: https://blog.archive.org/2016/10/23/defining-web-pages-web-sites-and-web-captures/ [19 February 2018].
Grab Them All (2018). Available at: https://addons.mozilla.org/en-US/firefox/addon/grab-them-all/ [19 February 2018].
Graham, S., Milligan, I., and Weingart, S. (2016) Exploring Big Historical Data: The Historian’s Macroscope. London: Imperial College Press.
Internet Archive (2018) ‘The Way Back Machine’. Available at: http://archive.org
Jurafsky, D., and Martin, J.H. (2008) Speech and Language Processing, 2nd edition. New York: Pearson Prentice Hall.
Kelleher, J.D., Mac Namee, B., and D’Arcy, A. (2015) Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples and Case Studies. Cambridge, MA: MIT Press.
Laerd Statistics (2018). Available at: https://statistics.laerd.com/ [19 February 2018].
Liu, B. (2012) Sentiment Analysis and Opinion Mining. San Rafael, CA: Morgan & Claypool.
Masanès, J. (2006) ‘Web Archiving: Issues and Methods’, in J. Masanès (Ed.), Web Archiving. Berlin: Springer. pp. 1–53.
Memento (2018) ‘time travel’. Available at: http://timetravel.mementoweb.org/ [19 February 2018].
Mendenhall, W., Beaver, R.J., and Beaver, B.M. (2012) Introduction to Probability and Statistics. Stamford, CT: Duxbury Press.
Natural Language Toolkit (2018). Available at: http://www.nltk.org/ [19 February 2018].
Ong, W.J. (2002) Orality and Literacy: The Technologizing of the Word. London: Routledge.
Open NLP (2018) Available at: http://opennlp.sourceforge.net/projects.html [19 February 2018].
Osterberg, G. (2013) ‘Update on the Twitter Archive at the Library of Congress’, Library of Congress Blog. Available at: http://blogs.loc.gov/loc/2013/01/update-on-the-twitter-archive-at-the-library-of-congress/ [19 February 2018].
PHP.net (2018) ‘strip_tags’. Available at: http://php.net/manual/en/function.strip-tags.php [18 February 2018].
Project Naptha (2018) Available at: https://projectnaptha.com/ [18 February 2018].
Stanford CoreNLP (2018) Available at: http://stanfordnlp.github.io/CoreNLP/ [19 February 2018].
Thelwall, M. (2009) Introduction to Webometrics: Quantitative Web Research for the Social Sciences. San Rafael, CA: Morgan & Claypool.
Thelwall, M., and Vaughan, L. (2004) ‘Webometrics: An introduction to the special issue’, Journal of the Association for Information Science and Technology 55(14): 1213–1215.
Van de Sompel, H., Nelson, M.L., Sanderson, R., Balakireva, L.L., Ainsworth, S., and Shankar, H. (2009) ‘Memento: Time travel for the Web’. Available at: http://arxiv.org/abs/0911.1112 [18 February 2018].
12
Computational Methods for Web History
Anat Ben-David and Adam Amram

INTRODUCTION

In light of the exponential growth in digital data characterizing the twenty-first century, future historians of our time will have to rely on born-digital materials as primary sources for establishing historical facts. Yet born-digital materials challenge historians’ well-established source criticism techniques used for establishing facts based on the authenticity, authorship and authority of documents, for they are ephemeral, immaterial, fragile and easy to manipulate. For example, the content of websites can be easily modified, tweets are frequently deleted, the number of social media comments and likes can be artificially boosted through click farms, and dubious sources spreading misinformation can be disguised as reliable news organizations. With the commercialization of the web, more than ever before, web data is primarily proprietary, and therefore subject to platforms’ policies and constraints.

Compared with the fragility and contestation of born-digital data, web archives are one of the last non-commercial knowledge devices that can be used to establish historical facts from web data. Web archives capture snapshots of websites at a specific point in time and preserve them for eternity. However, despite the fact that archived websites are stable, reliable, public and non-commercial digital primary sources, they are difficult to study with digital and computational methods.

In recent years, a ‘computational turn’ has been advocated as a paradigmatic shift in the digital humanities. Arguably, such justification seems unnecessary for the web, for there is no need for a ‘turn’ if the web is a computational medium to begin with. On its face, many of the computational techniques already in use by digital humanists and web researchers (such as methods for text and image analysis, geo-mapping or network analyses) can be easily used with archived web materials.
The question remains, if web archives are conceived, curated and distributed in digital form, why are they so difficult to study computationally?

One of the reasons accounting for the difficulty in using computational methods on the archived web is the distinctive ontological status of archived websites, compared with other digital materials. According to Niels Brügger, web archives differ from other digital material, whether they are analogue documents that have been digitized (for example, scans of historical newspapers, letters or books), or born-digital documents (such as copies of contemporary newspapers, letters, books and other documents that were originally produced in digital form). Termed by Brügger (2016) as ‘re-born digital materials’, the archived web is a static representation of an ephemeral medium, which would not have existed prior to its archiving. Brügger outlines further challenges regarding the unique characteristics of the archived web, including, among others, the fact that differences in the technical settings and collection policies of web archiving institutions may result in different archived versions of the same websites. In addition, there are difficulties in representing the temporal aspects of the archived snapshots, and in deciding which archived version of a given website should be used, where there is either scarcity of versions or a multiplicity of nearly identical versions from similar time stamps. Put differently, web archives are not just another repository of digital documents, but call for particular methods that would take into account their unique characteristics.

In this chapter, we argue that the use of computational methods for web archive research is possible – and even desired – provided that the methods, tools and techniques are adapted to the specific characteristics of web archives, and the challenges involved in studying them. To support this argument, we begin with an overview of the theoretical justifications for the use of computational methods in computational social sciences and digital humanities. Thereafter, we discuss specific methodological challenges put forward by web archives, and how computational methods may be helpful in overcoming some of them. We further outline four computational techniques – drawn from our previous research projects – that illustrate ways in which computational tools have enabled us to use the Internet Archive as a primary source, and to answer web historical questions. Finally, we discuss the limits of the computational methods approach.

THEORETICAL BACKGROUND

Broadly defined, the term ‘computational methods’ relates to the application of analytical methods and techniques – originally developed by computer scientists – to answer research questions in other disciplines such as the humanities and the social sciences. It is fair to say that these mathematical techniques, involving data processing, numerical analysis, simulation and modeling, algorithms, visualization, artificial intelligence and other forms of computation, are agnostic to disciplinary boundaries, as long as the data is digital. For example, a computer program designed to identify topics in a large corpus of texts can be applied to analyze a corpus of poems, of historical manuscripts or of corporate emails.

However, the introduction of computational methods in the humanities and social sciences has been far from trivial. David Berry (2011) describes a ‘computational turn’ in digital humanities as a result of three paradigmatic waves: in the first, computational techniques were primarily a means of digitizing texts and building infrastructures for digital archives, which, in turn, were analyzed by humanists as they would analyze any other non-digitized text; the second wave looked at the digital historian’s toolkit as one that can be applied not only to digitized texts, but also to any other ‘born-digital’ data.
Finally, the third wave turned its gaze to computation itself – the mediation of computer code and the digitality of the objects it works with – as producing new ontologies and effecting epistemic changes.

Indeed, in the past decade, advances in computational technologies have opened new possibilities for storing and processing data at a scale which was unimaginable only a few years ago. The analytical possibilities that accompany new computational analyses promise new kinds of knowledge that were impossible before. Big data analysis and computational methods are termed the new historians’ ‘macroscope’ (Graham et al., 2015), alluding to the revolutionary impact the invention of optical instruments such as the microscope and telescope had on the natural sciences in the seventeenth century. This promise is epitomized in two papers published in the journal Science. In the first, Michel et al. (2011) put forward the notion of ‘culturomics’, involving computational analyses of large volumes of literary texts to quantitatively investigate cultural trends. Resonating with Moretti’s distinction between close and distant readings (2007), the researchers computationally analyzed a corpus of digitized texts containing about 4% of all books ever printed from 1800 to 2000, and demonstrated that this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship and historical epidemiology. The second paper discusses the application of a computational approach in the social sciences (Lazer et al., 2009). Here too, the researchers argue that the increasing datafication of almost every realm of human activity paves the road to new research opportunities that can identify patterns, trends and proximities in huge and diverse datasets, and draw new insights in many fields in the social sciences.

Going back to Berry’s definitions of three waves of digital humanities, it can be argued that the study of the web and its history can be situated between the second and the third – by bringing together the humanist’s and the social scientist’s research questions and toolkits, and by studying born-digital texts, as well as data about almost any aspect of human activity. On the face of it, arguments for adopting a computational approach for web history do not require further justification, as they draw from both the digital humanities and the computational social sciences. As Graham et al. note:

    While big data is often explicitly framed as a problem of the future, it has already presented fruitful opportunities for the past. The most obvious place where this is true is archived copies of the publicly accessible Internet. The advent of the World Wide Web in 1991 has had revolutionary effects on human communication and organization, and its archiving presents a tremendous body of non-commercialized public speech. There is a lot of it, however, and large methodologies will be needed to explore it. It is this problem that we believe makes the adoption of digital methodologies for history especially important. (2015: 27)

The ability to perform new computational analyses at scale seems to be a unified justification for the adoption of computational and digital methods for history, social sciences and web research. Nevertheless, web historical research – with the archived Internet as its primary source – still posits somewhat different justifications for the use of computational methods, since the web is a computational medium to begin with. If the ‘computational turn’ in the digital humanities and the social sciences began with the digitization of printed texts on one hand, and with the ‘datafication’ of social phenomena on the other, for the web there has never been a need for such a computational turn. The question then remains: if there is such a structural fit between the web as a computational medium and computational methods for studying its history, why are computational methods not widely used in web historical research?

The answer to these questions may be found in Brügger’s alternative periodization of digital humanities, which focuses on the
evolution of digital material, rather than on research practices (2016). Compared with digitized and born-digital materials, Brügger argues that web archives should be considered re-born digital materials, which grants them a distinct ontological status and calls for a methodological treatment that takes their unique characteristics into account. Therefore, the apparent structural fit between the ‘digitality’ of the archived web and digital methods designed for studying born-digital material, such as the live web, is misleading.

In the next section, we follow Brügger’s argument about the distinctiveness of the archived web as re-born digital material, by tracing the historical roots of the use of computational methods in web history, and by outlining specific challenges that hinder the adoption of computational methods in web archival research. We then offer possible computational solutions to some of these challenges.

Roots of Computational Methods in Web History

The establishment of the Internet Archive in 1996 marked the emergence of web archiving practices, now shared by many institutions, organizations and researchers around the world. Five years later, snapshots of archived websites became available for browsing and viewing through an interface developed by the Internet Archive and aptly labeled the ‘Wayback Machine’. In many ways, the introduction of the Wayback Machine has paved the road for both qualitative and quantitative analyses of the archived web.

Historically, however, computational and quantitative methods for studying web archives were the realm of information scientists, and not historians, media scholars or social researchers. The early information science literature on web archives offered computational tools for improving retrieval of archived web content, for example by using metadata (Rauber and Bina, 1999; Rauber and Merkl, 1999). Researchers have also started publishing results from longitudinal studies analyzing the structure and persistence of web pages, based on archived web data (Koehler, 2002). After the turn of the millennium, studies in information retrieval created large temporal web collections for the development and evaluation of retrieval techniques for the temporal web (Baeza-Yates et al., 2004; Bordino et al., 2008; Chen and Roy, 2009; Chung et al., 2009). Historical analysis of the dynamic evolution of websites’ markup style was picked up by information scientists much later, less as a means for historical research and more as a method for evaluating the retrieval effectiveness of the modern web (Gyllstrom et al., 2012).

Meanwhile, social science researchers – mainly political scientists, and later on web historians and other media scholars – started using web archives for studying social and political phenomena on the web. The early methods used for the social studies of archived web materials were either manual, or only semi-automated. Researchers in the Netherlands, for example, started archiving websites of Dutch political parties and electoral campaigns (Voerman, 1998). In a series of publications, Schneider and Foot put forward the method of web sphere analysis, as ‘a framework for web studies that enables analysis of communicative actions and relations between web producers and users developmentally over time’ (Schneider and Foot, 2005: 2). The method involves dynamically selecting and archiving a set of web pages around a theme or an event, web pages which are subsequently analyzed by triangulating hyperlink, content and qualitative analyses (Foot and Schneider, 2010; Foot et al., 2003; Schneider and Foot, 2004).

Gradually, along with the widespread establishment of web archiving initiatives at national libraries, as well as with developments in the ways the web was structured and organized (i.e. through search engines) and with methodological developments in the fields of Internet research (such as automated
hyperlink analysis and web scraping), digital and computational methods for the study of web archives began to emerge (Rogers, 2015).

For social scientists and historians who engage with Internet histories, web archives enabled very specific types of research based on a single archived web page as unit of analysis. Rogers describes one such research scenario as website histories – performed through screencast documentaries. Rogers claims that such a research scenario is native to the medium, as it ‘makes explicit what the Wayback Machine implies, with its invitation to tell the history of a website and through it the history of the web’ (Rogers, 2017: 90).

Following the idea of repurposing the web’s natively digital objects and organizing devices to conduct ‘natively digital’ web research, Rogers and researchers from the Digital Methods Initiative also developed tools that output a list of direct links to all archived snapshots of a given URL[1], and which extract a historical hyperlink network from archived snapshots of a set of URLs[2]. These tools have been used by Weltevrede and Helmond (2012) to study the evolution of the Dutch blogosphere.

It is only around the time of the emergence of the ‘computational turn’ in digital humanities and computational social sciences that large-scale historical analyses of web archives became more widespread. This has been primarily facilitated by emerging collaborations between academic researchers, digital libraries and web archives, such as the Big UK Domain Data for the Arts and Humanities (BUDDAH) project (Winters, 2017), the Netherlands Web Archive Retrieval Tools (WebART) project (Huurdeman et al., 2013) and the Danish probing of a national web project (Brügger, 2017). These projects – all initiated around 2013 – aim to highlight the value of web archives as a source for arts and humanities researchers, develop access tools and methods for research and provide research scenarios – as well as research findings – of historical studies of a national web archive as a (huge) single unit of analysis of several terabytes, which is highly dependent on computational analyses as well as on infrastructures suitable for big data analysis. In line with these developments are recent calls to think of computational methods as the new toolkit of the web historian (Milligan, 2012, 2016). Despite this, it should be noted that, to date, the ability to access and process an entire national web archive as a unit of analysis is reserved to a handful of researchers who collaborate with national libraries or with the Internet Archive, while other members of the research community need to resort to available resources such as the Wayback Machine, which, as previously noted, is designed for viewing the history of a single website, rather than for performing large-scale data analyses. The following section further elaborates on the challenges that hinder the widespread application of computational methods to archived web materials.

CHALLENGES IN APPLYING COMPUTATIONAL METHODS FOR WEB HISTORY

Though the application of computational methods to study web history has become more widespread in recent years, most of the published literature on the use of web archives for historical research focuses on technical and theoretical challenges limiting the application of computational and digital methods for web history (Brügger, 2013; Dougherty et al., 2010; Milligan, 2016). These challenges can be organized analytically around four topics: issues of access, the limits of current interfaces to web archives, and problems of contextualization and completeness of archived web materials. All four challenges affect the historian’s ability to make sense of the primary materials. Below we briefly discuss these challenges that hinder the application of computational approaches, or ‘distant reading’, to web archives (Lin et al., 2014).

Access

User statistics of web archives hosted by national libraries show relatively low access rates. To name two examples, both the French and Danish web archives keep full domain harvests of the .fr and .dk domains, respectively, under legal deposit law, and access is restricted to researchers. In 2012, the National Library of France reported 30–50 monthly consultations of the web archive, whereas in Denmark only 20 researchers received access between 2007 and 2012 (Schostag & Fønss-Jørgensen, 2012; Stirling et al., 2012).

There is a growing divergence between the technological ability to preserve and archive the web, and the legal barriers related to copyright and privacy that hinder online access to archived web materials. With the exception of the Internet Archive and the national web archives of Japan and Portugal, most web archiving initiatives held by national libraries can only allow offline access, and only to those physically located in the libraries’ reading rooms. As a consequence, only a handful of researchers who actively collaborate with national web archives or with the Internet Archive are able to conduct large-scale computational analyses of the archived web[3].

Interfaces

The problem of legal access to archived web materials is coupled with a problem of existing access tools and interfaces to web archives. The design of the Internet Archive’s Wayback Machine – currently the dominant access interface to most institutional web archives around the world – reflects the perception that the unit of analysis is a single website, and that archives are best consulted, or browsed, one website at a time (Brügger, 2012a; Rogers, 2013, 2017). The single-site approach hinders researchers from increasing their analytical scope from one page to a collection of websites, or even the entire archive (Ben-David and Huurdeman, 2014; Milligan, 2016).

User studies indicate that most users (whether they are journalists, litigators, lay people or academic researchers) expect to consult web archives in the same fashion they consult the live web: through search (Ball, 2010; Costa and Silva, 2009, 2011; Jatowt et al., 2008; Meyer et al., 2011; Ras and van Bussel, 2007). Although several web archives are already developing full text search interfaces, some of which were designed specifically for data analysis purposes (Holzmann et al., 2017), these have still not been widely implemented.

Indeed, current web archiving infrastructures are not well-suited for computational research. The technical infrastructure of most web archives – the Internet Archive among them – was built before the ‘computational turn’. Arguably, web archiving infrastructures are also mirrors through which the history of the web can be studied. The Wayback Machine’s slogan, ‘surf the web as it was’, denotes that it was conceived at a period in the web’s history when ‘browsing’ and ‘surfing’ were the common mode of engaging with the web, before the takeover of the search paradigm (Ben-David and Huurdeman, 2014). From an infrastructural point of view, modern technologies designed to process petabytes of data, such as Hadoop Distributed File System (HDFS) and Hadoop MapReduce, are underutilized. The lack of those contemporary big data technologies as infrastructure for web archive preservation and access makes it difficult for researchers to perform ‘distant reading’ and macro analytics on web archives (Lin et al., 2014).

Contextualization

Another critique often made about the archived web as a primary source for historical research is a problem of contextualization. As Featherstone (2006) argued, digital storage allows for expanding the boundaries of the archive, as well as the boundaries of what is considered worthy of archiving. Yet
this poses tremendous difficulties for the traditional practices of source appraisal and provenance, which are central principles in archival sciences. Web archives may include millions of web pages, yet the archived snapshots lack significant contextual information about the wider media ecology in which the website operated when it was archived, and lack significant provenance information (Ben-David and Amram, 2018; Dougherty et al., 2010; Milligan, 2016; Rogers, 2013). While web archives keep the seed lists and crawl logs of the archiving process, this important metadata is usually not made available to researchers. As a result, little is known about the circumstances of archiving a specific website at a particular point in time, about the specific archiving method, or whether or not a specific website has been archived deliberately (as part of a seed list), or serendipitously (as a linked website to a seed list) (Huurdeman et al., 2015)[4]. Even though the lack of sufficient contextual information and metadata limits the ability to draw significant findings from large-scale analyses of archived websites, as we soon demonstrate, a ‘distant reading’ of web archives may be fruitful in re-introducing some of the missing context to archived web materials.

Completeness

Completeness is another issue web historians need to address when working with web archives as primary sources, and where computational methods may be of crucial relevance. Niels Brügger (2012b) poignantly reminds us that web archives are both incomplete – since many pages are not archived – and too complete – since there may be many duplicates and archived versions of the same page. Making sense of archived web materials is therefore aided by assessments of the degree of their completeness, or incompleteness. Here, completeness can be measured as an evaluation of the archiving process itself (what has been preserved and what has been lost), or of the outcome of the archiving process (which sources are missing from a given corpus of archived websites, and whether or not the archiving of websites is complete). For example, a study by Thelwall and Vaughan (2004) shows significant differences in the archival coverage of national webs on the Internet Archive. Research by Ainsworth et al. (2011) shows that an estimated 35%–90% of websites have at least one archived copy; however, these figures are challenged by Hale et al. (2017), who found that the archival coverage of the popular website TripAdvisor is only 24%; Alkwai et al. (2015) estimated the completeness of archived snapshots of Arabic websites on the Internet Archive by comparing them with a list of URLs that were found in web directories. Interestingly, Huurdeman et al. (2015) put forward the notion of the ‘aura’ of a web archive – representations of unarchived content that are represented by hyperlinks and anchor text data from archived URLs of a national web archive.

While such assessments of archival coverage and completeness are crucial for contextualizing web archive materials for historical research, they can only be performed computationally, since it is necessary to assess the whole volume of a given archive (and, as is the case with the studies referenced above, estimate the size of the entire web) in order to indicate or infer what parts of it may be incomplete or missing.

To summarize this section, the relatively low usage rate of web archives for historical research, and the limited adoption of computational methods for this type of study, can be explained as a chicken and egg problem. On the one hand, there is growing scholarly interest in accessing and using archived web data for various types of analysis. While there are emerging initiatives to develop frameworks for allowing access, data extraction and analysis of web archives[5], limited access to most existing web archives, along with the limits of interfaces and processing infrastructures, and
lack of sufficient contextualization information about appraisal, provenance and completeness, drive many historians away from using web archives for web history, and especially from applying various data analysis methods that would be native to the medium.

In the following section of this chapter, we attempt to evoke the web historians’ interest in applying simple computational methods when using web archives for historical research. This is not proposed as a solution to the existing challenges described above; rather, such simple computational methods can be seen as a means to allow for distant readings of archived web materials, by stretching the limits of existing access channels, interfaces and infrastructures of web archives.

The techniques described in the following section are based on simple scripts specifically designed for researchers who may not be well versed in computational methods, and who do not have institutional access to web archives as big data corpora. This is provided so as to assist more researchers in attaining as many analytical benefits as possible, when using the Internet Archive’s Wayback Machine as their entry point for conducting research on the web’s pasts.

COMPUTATIONAL SOLUTIONS TO THE CHALLENGES OF ACCESS, INTERFACE, CONTEXTUALIZATION, APPRAISAL AND COMPLETENESS

In the previous sections of the chapter, we reviewed some of the literature that applied a computational approach to studying web archives from diverse disciplines and perspectives, ranging from assessments of the archives’ coverage, to the characterization of national webs. In this section, we report on findings from our own work, by outlining specific and simple computational techniques that we developed and applied to web historical research. These techniques are designed to overcome specific challenges related to archived web materials and their current interfaces, as outlined above. Furthermore, they significantly increase the scope of analysis enabled by the Wayback Machine. While these methods are tailored to specific questions, they serve here as a proof of concept, which can then be adapted and used for other research purposes. Above all, they are listed here as a means to lower the threshold preventing web historians from engaging in critical, born-digital, web historical research: research that can critique the historical, technical and infrastructural history of archived websites, and that can further contextualize them as primary sources for historiography.

Our previous work has primarily focused on the Internet Archive and the Wayback Machine as a primary source. The research questions – related to the history of national webs – were not computational in nature; rather, we used computational methods where a methodological challenge hindered us from increasing the scope of the analysis or from gaining wider contextual knowledge about our corpus. Below, we outline four research scenarios where the use of relatively simple Python scripts has helped us enhance the utility and scope of historical research with the Wayback Machine.

1. Increasing the Scope: Finding more URLs

As previously mentioned, the Wayback Machine allows for viewing and browsing archived snapshots, but does not lend itself easily to analyses that wish to extend their scope beyond the single page as the unit of analysis. Furthermore, in order to use the Wayback Machine, users are expected to know the URL they are looking for[6]. But what if one does not know which URL to look for? And what if it is impossible to know the URL address of historical websites that can no longer be found on the live web?
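
The dependence on a known URL can be illustrated with the Wayback Machine's public CDX query interface, which returns the list of captures held for a given address. The following is a minimal Python sketch and not the script used in the research described in this chapter: the function names, the selected fields and the year filters are illustrative assumptions, and the last function requires network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public CDX endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, year_from=None, year_to=None):
    """Build a CDX query for all captures of one known URL.

    The query is keyed on `url`: without a known address there is
    nothing to ask for. The field selection is an illustrative choice.
    """
    params = {"url": url, "output": "json",
              "fl": "timestamp,original,statuscode"}
    if year_from is not None:
        params["from"] = str(year_from)
    if year_to is not None:
        params["to"] = str(year_to)
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_links(rows):
    """Turn CDX JSON rows (first row holds the field names) into replay links."""
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    links = []
    for record in records:
        fields = dict(zip(header, record))
        links.append(
            "https://web.archive.org/web/{timestamp}/{original}".format(**fields)
        )
    return links


def list_snapshots(url, year_from=None, year_to=None):
    """Fetch and parse the capture list for `url` (needs network access)."""
    query = build_cdx_query(url, year_from, year_to)
    with urlopen(query, timeout=30) as response:
        body = response.read().decode("utf-8")
    return snapshot_links(json.loads(body) if body.strip() else [])
```

A lookup of this kind presupposes that at least one address is already known; it offers no starting point when the addresses of historical websites can no longer be found on the live web.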

This is the case of the .yu domain, the historical country code top-level domain of the former Yugoslavia, which was entirely removed from the Internet’s Domain Name System (DNS) in 2010. DNS is the hierarchical universal system responsible for resolving web addresses to IP numbers. As such, if the records associated with a given domain are removed from the DNS, its web addresses cannot be resolved, even if the websites are still hosted on a server connected to the Internet.

Our research question was relatively broad: what does the web ‘remember’ of its deleted past? Is it possible to reconstruct from the Wayback Machine a deleted national web, if the live web cannot disclose any .yu URL as a starting point to the archive? (Ben-David, 2016). To answer this question, we used the hyperlinked structure of archived websites to our advantage. Using lists of seed URLs which were obtained from an expert, we used a Python script that fetched all the seeds from the Wayback Machine, and snowballed their outlinks, searching for more .yu URLs.⁷ We iterated the method several times until no new .yu URLs were found. This outlink extraction method enabled the scope of the analysis to be increased from a seed list of about 4,000 hosts, to 17,460 unique websites in the .yu domain, estimated to comprise 53% of the registered addresses in the .yu domain at its peak. The partial reconstruction does not necessarily indicate that the undiscovered portions of the domain represent a limitation of the method; instead, it may indicate that almost half of the domain was not archived by the Internet Archive, as we will soon explain.

With the extracted dataset, we performed historical network analysis (a method outlined in Chapter 10 of this Handbook; see also Weltevrede and Helmond (2012) and Hale et al. (2014)) to examine the strength of ties within the .yu domain, as well as among national domains of former-Yugoslavia republics. Furthermore, the data we extracted from the Internet Archive allowed us to ask further questions about the history of the .yu domain, as well as about the appraisal of the Internet Archive as a primary source for web history, as outlined below.

2. Source Critique: Comparing Web Cultures of Seed Lists

As noted by Rogers (2017), archived websites face scrutiny as a primary source for historiography, and for legal evidentiary purposes. Such scrutiny is realized through traditional appraisal of sources in historiography, such as determining the originality of a document, the authenticity of authorship and the source’s reputation. While web archives complicate all of the above (Brügger, 2012a; Rogers, 2017), the question is which techniques can be used to appraise archived websites as sources? While the screencast documentary of a single website is one possible avenue for source criticism, we put forward a technique for appraising lists of URLs, as starting points or seeds for archival web research.

Historical analysis of archived web materials that extends beyond the single website as its unit of analysis greatly depends on a seed list of initial URLs from which the corpus is created. Different URL lists may lead to very different corpora, even though they may be seen as covering the same topic. For example, a study by Mataly (2013) compared corpora created about the former UK Prime Minister Margaret Thatcher using three born-digital source lists: curated website collections from the UK Web Archive, temporal search on Google (with query results set to past years) and a domain harvest of the .uk domain. The differences between the generated lists were rather striking: the corpus created by searching the UK Web Archive’s collections was primarily governmental; the sources referring to Thatcher that appeared on Google with time stamps from the exact same period were
primarily commercial (large newspaper and e-commerce platforms) and the corpus of the domain harvest was more diverse, containing a variety of types of sources.

This example shows that seed lists are embedded in specific web cultures, which may have internal preferences or biases. To understand these cultures and potential biases, in our case study of the reconstruction of the .yu domain from the Internet Archive, we used a fairly simple technique for comparing the seed lists that generated the reconstruction. This simple technique, initially introduced by the Digital Methods Initiative as a triangulation tool,⁸ compared the source lists from which our reconstruction began (Google search results, Wikipedia, a computer magazine, and an ISP). First, we removed duplicates from each list, and then we identified the URLs shared among source types, and the URLs unique to each source type. The ties between the shared and unique sources can be read as a table, or visualized as a network graph. In the case of the .yu domain, we found that, relative to the other source lists, Wikipedia was the most diverse source to be used as a seed list for reconstructing topical corpora from the Internet Archive.

3. Assessing Incompleteness

Earlier in this chapter, we argued that an assessment of the (in)completeness of web archives is necessary for contextualizing historical research done with archived web materials. It should be noted that assessments of incompleteness greatly depend on metadata that is often unavailable or inaccessible for researchers, such as information about the collection policy, or access to the archiving crawler’s logs.

Despite these limits, computational methods may give researchers the ability to assess the whole by comparing the archived dataset against different datasets and data sources, as was previously done by Thelwall and Vaughan (2004), Ainsworth et al. (2011), Alkwai et al. (2015) and Hale et al. (2017). In the case of the .yu domain, we assessed the number of accessible archived snapshots of .yu websites on the Internet Archive, compared with the number of .yu URLs recovered by our outlink extraction method. Here, we queried the Internet Archive’s Wayback Machine CDX server API⁹ and documented the response code for each URL – indicating whether it can or cannot be accessed on the Wayback Machine. Using a Python script and the urllib2 module, we sent GET queries to the REST API of the Internet Archive CDX server to look up captures of our known Yugoslav URLs. With the help of filtering and collapsing features of the CDX server, we requested only successful snapshots and selected only one for each year. Using this method, we found that the level of archival coverage of the historical domain is a function of temporal proximity to the live web: the shorter the temporal gap between a live website and its archived snapshot, the better its archival coverage on the Internet Archive.

4. Culturomics: Histories of Non-Textual Elements

At the outset of this chapter we argued that some computational methods for textual analysis can be applied to any type of text, independently of their affiliation to various academic disciplines. This also applies to the archived web, where, for example, methods such as Topic Modeling have already been used to analyze the evolution of the textual content of historical websites over time (Milligan, 2016). At the same time, we also argued that the history of websites not only comprises text, but also other elements documenting the history of the medium, and the socio-technical ‘ecology’ in which a given website operated at the time it was archived (Berry, 2012; Fuller, 2005). We also mentioned the ‘culturomic’ approach (Michel et al., 2011), which calls for analyzing
cultural patterns in large volumes of data. Although the archiving process does not and cannot capture the full ecosystem in which the live website operated, it still captures a wealth of data that can be of significant importance to web historians – especially those interested in technological and cultural histories. Apart from text, web archives contain images, video and various document formats such as MS Word, PDF or PowerPoint presentations; each HTML page also contains code, within which one can find historical ‘treasures’ such as embedded advertisements and cookies. Is it possible to conduct large-scale culturomics analysis of non-textual elements found in the Wayback Machine?

Inspired by Lev Manovich’s work on cultural analytics (2009, 2012), we also argue that large-scale analyses of the non-textual elements of web archives may open up a variety of new historical analyses. To this end, we developed a computational tool for analyzing the evolution of color in non-photographic images of the reconstructed .yu domain, focusing on archived snapshots dated between 1997 and 2000 (Ben-David et al., 2018). We used a technique based on machine learning to summarize the three dominant colors of each of the non-photographic images in the reconstructed domain, in order to study the history of the ties between web design, cultural digital practices and preferences, and nationalism. Specifically, we compared the three-color histograms summarizing the images of the .yu domain with the colors of the Yugoslav flag, and calculated the overall ‘distance’ of the entire domain from the colors of the flag during the Kosovo war.¹⁰

To summarize this section, it is evident that none of the computational techniques that we used for studying the history of the .yu domain are new. Rather, we adapted existing methods for web research and data analysis, and tailored them to meet the challenges of performing large-scale analysis of data with the Internet Archive’s Wayback Machine. The following section concludes this chapter by discussing the limits of the computational approach to web historical research.

CONCLUSIONS

Although a computational approach brings with it a revolutionary promise of new paradigms of knowledge, and of writing and reading history, most of the advocates of this approach argue that it should be adopted judiciously – not necessarily as a replacement for existing humanist and social scientific methods, but rather as an important aid, a necessary instrument for gaining knowledge in a datafied world (Berry, 2012; Graham et al., 2015). In this chapter, we charted some of the new computational methods-based analysis strategies and approaches to studying archived websites accessed from the Wayback Machine. We have shown that despite the limits of the access interface – which is designed for viewing single archived pages rather than a distant reading of millions of them – Python scripts can increase the analytical scope of historical web research, as well as answer important questions that improve the contextualization of archived web data (such as evaluation of sources and of archival coverage). We also demonstrated that computational methods can turn archived websites into a fascinating playground for identifying cultural trends by using the wealth of multimodal information they harbor.

Despite the benefits of the computational approach outlined above in stretching the analytical possibilities of historical research using the Wayback Machine, there are also several limits to consider. Chief among them is the production of analytical artifacts and their misinterpretation as historical facts. As previously mentioned, awareness of the cultures of sources and devices from which web archive research begins is crucial to the interpretation of the archival corpora and of the
historical networks that each may produce. A possible way of overcoming the problem of analytical artifacts is triangulation of sources.

In addition, analyses aided by computational methods, tools and techniques are able to answer specific quantifiable questions. They are good at ‘probing web archives’ (Brügger, 2017) when dealing with huge corpora of archived websites, and when there is no alternative way of ‘knowing’ the archive as a whole. Nevertheless, this knowing is limited to quantifiable information, and macro analyses of millions of documents, or of petabytes of data, do not always suffice for answering deeper questions about historical processes (Winters, 2017). Further critical thinking and reflexive analysis are required to place the outputs of computational tools in context (Schafer et al., 2016).

6  Since 2016, the new version of the Wayback Machine allows for basic keyword search on phrases from the URL, yet full site search is not yet available. See https://blog.archive.org/2016/10/24/faqs-for-some-new-features-available-in-the-beta-wayback-machine/ (visited 01.02.18).
7  Open Media and Information Lab, The Open University of Israel (2017). Internet Archive Link Extractor. https://github.com/omilab/internet-archive-link-extractor (visited 01.02.18).
8  Digital Methods Initiative (2008). Triangulation. https://wiki.digitalmethods.net/Dmi/ToolTriangulation (visited 01.02.18).
9  The Wayback CDX server API is a standalone HTTP servlet that serves the index that the Wayback Machine uses to look up captures. https://github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md (visited 01.02.18). For further research scenarios using the CDX file, see Milligan (2016).
10  Open Media and Information Lab, the Open University of Israel (2016). Image Color Analysis. https://github.com/omilab/image-color-analysis (visited 22 September 2017).
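
The CDX lookups described under ‘Assessing Incompleteness’ can be illustrated with a minimal sketch. It is not the script used in the study: it assumes Python 3 and the standard urllib.request module in place of the urllib2 module named in the text, and it assumes the public CDX endpoint with the filter, collapse, fl and output parameters documented in the README cited in note 9. The function names and the example host are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint of the Wayback CDX server (see note 9).
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_query(url):
    """Build a CDX query for successful captures of `url`, one per year."""
    params = {
        "url": url,
        "output": "json",
        "fl": "timestamp,original,statuscode",
        "filter": "statuscode:200",   # keep only successful snapshots
        "collapse": "timestamp:4",    # first 4 timestamp digits = year
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def years_from_cdx_rows(rows):
    """Return the sorted list of years found in a CDX JSON response.

    With output=json the first row holds the field names and each
    following row describes one capture.
    """
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    ts = header.index("timestamp")
    return sorted({record[ts][:4] for record in records})


def fetch_capture_years(url, timeout=30):
    """Query the CDX server (network access required) and list capture years."""
    with urlopen(build_cdx_query(url), timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return years_from_cdx_rows(json.loads(body) if body.strip() else [])
```

Calling fetch_capture_years on each recovered URL and counting the URLs that return an empty list would give a rough measure of how much of a reconstructed domain has no accessible snapshot. Keeping the query construction and the parsing in separate functions lets both be checked without network access.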

Notes
1  Digital Methods Initiative, Wayback Machine Link Ripper, https://wiki.digitalmethods.net/Dmi/ToolInternetArchiveWaybackMachineLinkRipper (visited 01.02.18).
2  Digital Methods Initiative, Internet Archive Wayback Machine Network per Year, https://wiki.digitalmethods.net/Dmi/ToolInternetArchiveWaybackMachineToNetwork (visited 01.02.18).
3  There are several exceptions to this claim. The UK web archive is freely accessible online for the collections made before 2014, whereas the ccTLD archivings of the .uk domain (from 2014 onwards) can only be accessed onsite at one of five national libraries. As for the Danish web archive, researchers who receive permission can browse the web archive remotely, while MA students are allowed access to the web archive only in the reading rooms of the national library.
4  Since 2016, this problem is slightly mitigated in the new interface of the Wayback Machine, which now provides information about the collection of web captures associated with the specific web crawl the capture came from. See https://blog.archive.org/2016/10/24/faqs-for-some-new-features-available-in-the-beta-wayback-machine/ (visited 01.02.18).
5  See, for example, the Archives Unleashed Toolkit (Lin et al., 2017) and the ArchiveSpark project (Holzmann et al., 2016).

REFERENCES

Ainsworth, S.G., AlSum, A., SalahEldeen, H., Weigle, M.C., and Nelson, M. (2011) ‘How much of the web is archived?’ In: Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, pp. 133–136.
Alkwai, L.M., Nelson, M.L., and Weigle, M.C. (2015) ‘How well are Arabic websites archived?’ In: Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, 2015, pp. 223–232.
Baeza-Yates, R., Lalanne, F., Castillo, C., and Dupret, G. (2004) ‘Comparing the Characteristics of the Korean and the Chilean Web’ Korea-Chile IT Cooperation Center ITCC. Technical Report. Available at: http://chato.cl/papers/baeza_04_comparing_chilean_web_korean_web.pdf
Ball, A. (2010) DCC state of the art report: Web archiving. Available at: http://www.dcc.ac.uk/sites/default/files/documents/reports/sarwa-v1.1.pdf
Ben-David, A. (2016) ‘What does the web remember of its deleted past? An archival reconstruction of the former Yugoslav
top-level domain’, New Media & Society, 18(7): 1103–1119.
Ben-David, A. and Amram, A. (2018) ‘The Internet Archive and the socio-technical construction of historical facts’, Internet Histories, 2(1–2): 179–201.
Ben-David, A., Amram, A., and Bekkerman, R. (2018) ‘The colors of the national web: Visual data analysis of the historical yugoslav web domain’, International Journal on Digital Libraries, 19(1): 95–106.
Ben-David, A. and Huurdeman, H.C. (2014) ‘Web archive search as research: Methodological and theoretical implications’, Alexandria, 25(1–2): 93–111.
Berry, D. (2011) ‘The computational turn: Thinking about the digital humanities’, Culture Machine, 12: n.p. Available at: http://www.culturemachine.net/index.php/cm/article/viewarticle/440
Berry, D. (2012) Life in Code and Software: Mediated Life in a Complex Computational Ecology. Open Humanities Press.
Bordino, I., Boldi, P., Donato, D., Santini, M., and Vigna, S. (2008) ‘Temporal evolution of the UK Web’, In: Proceedings of the IEEE International Conference on Data Mining Workshops, ICDM Workshops, 2008, pp. 909–918.
Brügger, N. (2012a) ‘Web history and the Web as a historical source’, Zeithistorische Forschungen, 9(2): 316–325.
Brügger, N. (2012b) ‘When the present Web is later than the past: Web historiography, digital history, and Internet Studies’, Historical Social Research/Historische Sozialforschung: 102–117. Available at: https://www.ssoar.info/soar/bitstream/handle/document/38378/ssoar-hsr-2012-4-brugger-When_the_present_web_is.pdf?sequence=1 (accessed 20 October 2015).
Brügger, N. (2013) ‘Web historiography and Internet Studies: Challenges and perspectives’, New Media & Society, 15(5): 752–764.
Brügger, N. (2016) ‘Digital humanities in the 21st century: Digital material as a driving force’, DHQ: Digital Humanities Quarterly: 10(3). Available at: http://www.digitalhumanities.org/dhq/vol/10/3/000256/000256.html
Brügger, N. (2017) ‘Probing a nation’s web domain: A new approach to web history and a new kind of historical source’, In: Gerard Goggin and Mark McLelland (eds), The Routledge Companion to Global Internet Histories. New York: Routledge. pp. 61–73.
Chen, L. and Roy, A. (2009) ‘Event detection from flickr data through wavelet-based spatial analysis’, In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009, pp. 523–532.
Chung, Y.-J., Toyoda, M. and Kitsuregawa, M. (2009) ‘A study of link farm distribution and evolution using a time series of web snapshots’, In: Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web, 2009, pp. 9–16.
Costa, M. and Silva, M.J. (2009) ‘Towards information retrieval evaluation over web archives’, In: Proceedings of the SIGIR 2009 Workshop on the Future of IR Evaluation, 2009, pp. 37–38.
Costa, M. and Silva, M.J. (2011) ‘Characterizing search behavior in Web archives’, In: TWAW, pp. 33–40.
Dougherty, M., Meyer, E.T., McCarthy Madsen, C., van den Heuvel, C., Thomas, A., and Wyatt, S. (2010) ‘Researcher engagement with web archives: State of the art’. Joint Information Systems Committee Report.
Featherstone, M. (2006) ‘Archive’, Theory, Culture & Society, 23(2–3): 591–596. DOI: 10.1177/0263276406023002106.
Foot, K. and Schneider, S. (2010) ‘Object-oriented web historiography’, In: Niels Brügger (ed.), Web History. New York: Peter Lang. pp. 61–80. Available at: http://faculty.washington.edu/kfoot/Publications/Foot_Schneider.pdf (accessed 22 September 2017).
Foot, K., Schneider, S.M., Dougherty, M., Xenos, M., and Larsen, E. (2003) ‘Analyzing linking practices: Candidate sites in the 2002 US electoral web sphere’, Journal of Computer-Mediated Communication 8(4). Available at: https://doi.org/10.1111/j.1083-6101.2003.tb00220.x.
Fuller, M. (2005) Media Ecologies: Materialist Energies in Art and Technoculture. London: MIT Press.
Graham, S., Milligan, I., and Weingart, S. (2015) Exploring Big Historical Data: The Historian’s Macroscope. London: Imperial College Press.
Gyllstrom, K., Eickhoff, C., de Vries, A.P., and Moens, M.F. (2012) ‘The downside of markup: Examining the harmful effects of css and javascript on indexing today’s web’, In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012, pp. 1990–1994.
Hale, S.A., Blank, G., and Alexander, V.D. (2017) ‘Live versus archive: Comparing a web archive to a population of web pages’, In: Niels Brügger and Ralph Schroeder (eds), Web as History: Using Web Archives to Understand the Past and the Present. London: UCL Press. pp. 45–61. Available at: http://www.jstor.org/stable/j.ctt1mtz55k.8.
Hale, S.A., Yasseri, T., Cowls, J., Meyer, E.T., Schroeder, R., and Margetts, H. (2014) ‘Mapping the UK webspace: Fifteen years of British universities on the web’, In: Proceedings of the 2014 ACM Conference on Web Science, 2014, pp. 62–70.
Holzmann, H., Goel, V., and Anand, A. (2016) ‘Archivespark: Efficient web archive access, extraction and derivation’, In: Digital Libraries (JCDL), IEEE/ACM Joint Conference on, 2016, pp. 83–92.
Holzmann, H., Nejdl, W., and Anand, A. (2017) ‘Exploring Web archives through temporal anchor texts’, In: Proceedings of the 2017 ACM on Web Science Conference, 2017, pp. 289–298.
Huurdeman, H.C., Ben-David, A., and Samar, T. (2013) ‘Sprint methods for web archive research’, In: Proceedings of the 5th Annual ACM Web Science Conference, 2013, pp. 182–190.
Huurdeman, H.C., Kamps, J., Samar, T., de Vries, A.P., Ben-David, A., and Rogers, R. (2015) ‘Lost but not forgotten: Finding pages on the unarchived web’, International Journal on Digital Libraries 16(3–4): 247–265. DOI: 10.1007/s00799-015-0153-3.
Jatowt, A., Kawai, Y., Ohshima, H., and Katsumi, T. (2008) ‘What can history tell us?: Towards different models of interaction with document histories’, In: Proceedings of the 19th ACM Conference on Hypertext and Hypermedia, 2008, pp. 5–14.
Koehler, W. (2002) ‘Web page change and persistence – A four-year longitudinal study’, Journal of the Association for Information Science and Technology, 53(2): 162–171.
Lazer, D., Pentland, A. (Sandy), Adamic, L., Aral, S., Barabasi, A.L., Brewer, D., Christakis, M., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., and Van Alstyne, M. (2009) ‘Life in the network: The coming age of computational social science’, Science 323(5915): 721–723. DOI: 10.1126/science.1167742. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/pmc2745217/ (accessed 22 September 2017).
Lin, J., Kraus, K., and Punzalan, R. (2014) ‘Supporting “distant reading” for web archives’, Proceedings of Digital Humanities: 239–241.
Lin, J., Milligan, I., Wiebe, J., and Zhou, A. (2017) ‘Warcbase: Scalable analytics infrastructure for exploring Web archives’, Journal on Computing and Cultural Heritage, 10(4): 1–30.
Mataly, J. (2013) ‘The Three Truths of Margaret Thatcher: Creating and Analyzing Archival Artefacts’. MA dissertation, University of Amsterdam.
Manovich, L. (2009) ‘The practice of everyday (media) life: From mass consumption to mass cultural production?’, Critical Inquiry, 35(2): 319–331.
Manovich, L. (2012) ‘How to compare one million images?’ In: David M. Berry (ed.), Understanding Digital Humanities. London: Palgrave Macmillan. pp. 249–278.
Meyer, E.T., Thomas, A., and Schroeder, R. (2011) ‘Web archives: The future(s)’. Available at SSRN: https://ssrn.com/abstract=1830025 or http://dx.doi.org/10.2139/ssrn.1830025.
Michel, J.-B., Shen, Y.K., Presser Aiden, A., Veres, A., Gray, M.K., The Google Books Team, Pickett, J.P., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., Pinker, S., Nowak, M.A., and Liberman Aiden, E. (2011) ‘Quantitative analysis of culture using millions of digitized books’, Science 331(6014): 176–182. Available at: http://science.sciencemag.org/content/331/6014/176.short (accessed 22 September 2017).
Milligan, I. (2012) ‘Mining the “Internet graveyard”: Rethinking the historians’ toolkit’,
Journal of the Canadian Historical Association, 23(2): 21–64.
Milligan, I. (2016) ‘Lost in the infinite archive: The promise and pitfalls of web archives’, International Journal of Humanities and Arts Computing 10(1): 78–94.
Moretti, F. (2007) Graphs, Maps, Trees: Abstract Models for Literary History. New York: Verso.
Ras, M. and van Bussel, S. (2007) ‘Web archiving user survey’. National Library of the Netherlands (Koninklijke Bibliotheek). Available at: https://www.kb.nl/sites/default/files/docs/kb_usersurvey_webarchive_en.pdf (accessed 22 September 2017).
Rauber, A. and Bina, H. (1999) ‘A metaphor graphics based representation of digital libraries on the world wide web: Using the libviewer to make metadata visible’, In: Proceedings. Tenth International Workshop on Database and Expert Systems Applications. IEEE, pp. 286–290.
Rauber, A. and Merkl, D. (1999) ‘Mining text archives: Creating readable maps to structure and describe document collections’, In: Jan M. Żytkow and Jan Rauch (eds), Principles of Data Mining and Knowledge Discovery: Third European Conference, PKDD’99 Prague, Czech Republic. Berlin: Springer. pp. 524–529.
Rogers, R. (2013) Digital Methods. Cambridge, MA: MIT Press.
Rogers, R. (2015) ‘Digital methods for Web research’, In: Robert A. Scott and Marlis C. Buchmann (eds), Emerging Trends in the Social and Behavioral Sciences. DOI: 10.1002/9781118900772.
Rogers, R. (2017) ‘Doing Web history with the Internet Archive: Screencast documentaries’, Internet Histories, 1: 160–172.
Schafer, V., Musiani, F., and Borelli, M. (2016) ‘Web archiving, governance and STS’. French Journal of Media Research, 6: 1–23.
Schneider, S.M. and Foot, K.A. (2004) ‘Web campaigning by US presidential primary candidates in 2000 and 2004’, In: John C. Tedesco and Andrew Paul Williams (eds), The Internet Election: Perspectives on the Web in Campaign. Lanham: Rowman & Littlefield Publishers. pp. 21–36.
Schneider, S.M. and Foot, K.A. (2005) ‘Web sphere analysis: An approach to studying online action’, In: Christine Hine (ed.), Virtual Methods: Issues in Social Research on the Internet. New York: Berg. pp. 157–170.
Schostag, S. and Fønss-Jørgensen, E. (2012) ‘Webarchiving: Legal deposit of internet in Denmark. A curatorial perspective’, Microform & Digitization Review, 41(3–4): 110–120.
Stirling, P., Chevallier, P., and Illien, G. (2012) ‘Web archives for researchers: Representations, expectations and potential uses’, D-Lib 18(3/4). Available at: http://www.dlib.org/dlib/march12/stirling/03stirling.html
Thelwall, M. and Vaughan, L. (2004) ‘A fair history of the Web? Examining country balance in the Internet Archive’, Library & Information Science Research, 26(2): 162–176.
Voerman, G. (1998) ‘Dutch political parties on the Internet’, ECPR News, 10(1): 8–9.
Weltevrede, E. and Helmond, A. (2012) ‘Where do bloggers blog? Platform transitions within the historical Dutch blogosphere’, First Monday 17(2). Available at: http://firstmonday.org/ojs/index.php/fm/article/view/3775/3142 (accessed 23 December 2015).
Winters, J. (2017) ‘Breaking in to the mainstream: Demonstrating the value of internet (and web) histories’, Internet Histories 1(1–2): 173–179. DOI: 10.1080/24701475.2017.1305713.
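
As a supplement to the seed-list comparison described under ‘Source Critique’ in the preceding chapter, the two steps named there (removing duplicates from each list, then identifying the URLs shared among sources and the URLs unique to each source) can be sketched as follows. This is a hedged illustration, not the Digital Methods Initiative tool cited in note 8: the function name, the source labels and the lower-casing used as normalization are assumptions made for the example.

```python
from collections import defaultdict


def triangulate(source_lists):
    """Compare seed lists given as a mapping of source name -> list of URLs.

    Returns (shared, unique): `shared` maps each URL found in more than one
    source to the sorted names of those sources; `unique` maps each source
    to the sorted URLs that appear in no other source.
    """
    membership = defaultdict(set)
    for source, urls in source_lists.items():
        for url in urls:
            # Assumed normalization: trim whitespace and lowercase.
            # Using a set per URL also removes duplicates within a list.
            membership[url.strip().lower()].add(source)

    shared = {url: sorted(srcs) for url, srcs in membership.items() if len(srcs) > 1}

    unique = {source: [] for source in source_lists}
    for url, srcs in membership.items():
        if len(srcs) == 1:
            unique[next(iter(srcs))].append(url)
    return shared, {source: sorted(urls) for source, urls in unique.items()}
```

The `shared` mapping can be read directly as the table mentioned in the text, or converted into an edge list (URL, source) for a network graph; the share of unique URLs per source gives a rough indication of how much each seed list contributes on its own.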
13
Visualizing Historical Web Data
Justin Joque

The emerging field of web history offers unique opportunities to understand media at scales and levels of complexity never before possible. The expansion of various web archives provides massive and complex datasets that can reveal important insights into culture and our practices of computation. As Milligan (2012) argues, the availability of these massive and rich data sources opens up the space for a third wave of digital history, working directly on these already digitized, hyperlinked and temporally differentiated texts. To navigate, analyze and comment upon these documents requires a set of tools and methods that are quickly being developed in history and other humanities fields. While many of these methods and techniques draw heavily from other fields, such as computer science, information retrieval, data mining, etc., their use and translation into the humanities require additional considerations. One of the key methods in this third wave of digital history is the use of data visualization techniques, especially for representing and exploring large datasets.

While visualization techniques offer a powerful set of methods for dealing with the diverse types of data that can be extracted from historical web data, it is not always clear how to best display this data. The amount of data, skewed distributions, complex interactions and the difficulty of displaying change over time all combine to make visualization a difficult task. In light of these challenges, this chapter attempts to outline some critical issues and suggestions for visualizing historical web data, drawing heavily on methodologies developed in data visualization and related fields, such as network analysis, text mining, etc. The chapter explores general data visualization techniques, network visualization, text analysis visualization and concludes with a discussion of visualizing change over time.

It should be noted that data visualization is as much an artistic process concerned with
questions of design and legibility as it is a scientific practice for representing data. There is no simple formula or single process that can represent web data or any other type of data immediately and ‘correctly’. Effective data visualization requires experimentation and knowledge of one’s data. Furthermore, the audience and purpose of data visualization have immense impacts on best practices.

Data visualization generally serves three different purposes: monitoring data to track ongoing processes, exploring data to understand it and communicating data to explain what we have discovered to others. When working with historical data, researchers will most likely have the latter two goals in mind. In differentiating these purposes, it is important to remember who the audience of a given visualization will be. While one may use complex visualizations with little explanation as part of one’s own research, if the intention of a visualization is to communicate some result it is of the utmost importance that what is being displayed is comprehensible to the audience and suited to the media through which it will be displayed (e.g. it would be imprudent to create a complex high-resolution graph for a slideshow that the audience will only see for a few seconds). With these considerations in mind, it is possible to outline a number of issues and possible solutions that arise in working with historical web data.

VISUALIZATION IN GENERAL

Websites offer diverse types of metadata and data that lend themselves to well-known types of graphs. Exploring and presenting data in such straightforward ways can often be more compelling than complex visualizations whose ultimate aim is not entirely clear. For example, we may be interested in the relationship between the number of links on a page and the amount of text; in this case a standard scatterplot, which simply places a dot where data lies along two axes, would serve us well. Histograms do well for showing distributions of single variables, by creating ‘bins’ for the data (e.g. 0–5, 5–10) and representing the count of data points by vertical bars; bar charts excel at showing the composition of a dataset broken into categories, etc. It would be imprudent to attempt to provide an exhaustive list of such graphs here, but a few general principles will be helpful as one explores datasets, graphs and software for visualization.

Any single static two-dimensional data visualization can show at most approximately eight variables. This definitely does not mean one should always strive to show eight variables at a time, as even that limited number can quickly make seeing relationships, patterns or structure difficult. In his 1967 book Semiology of Graphics, Bertin divides these eight variables into two positional variables (e.g. x and y or ϕ and r in polar coordinates) and six ‘retinal’ variables: color, size, shape, value (i.e. light to dark), orientation and texture. He further suggests that some of these variables are better at showing different types of data (e.g. shapes are well suited for categorical data). I will refrain from recounting Bertin’s entire semiology here, but one should consider which visual variables can best represent the type of data that is being visualized. Furthermore, spending some time with Bertin’s book should be high on any list of ways to become skilled in data visualization.

Tufte (1983: 93) also provides a number of important insights that are helpful in constructing any visualization. He sums up one of his major contributions to the field of data visualization by stating, ‘A large share of ink on a graphic should present data-information, the ink changing as the data change. Data-ink is the non-erasable core of a graphic, the non-redundant ink arranged in response to variation in the numbers represented’. Building on this he further defines the data-ink ratio as the ratio of ink used to display data to the total amount of ink used in a visualization. We could of course update this from ink to non-background colored pixels; but
the principle remains the same. Good visualizations should tend to maximize this ratio as much as possible, eliminating superfluous marks on charts.

Wainer (1984: 139) summarizes the conclusion from Tufte’s metric by stating, ‘Although arguments can be made that high data density does not imply that a graphic will be good, nor one with low density bad, it does reflect on the efficiency of the transmission of information. Obviously, if we hold clarity and accuracy constant, more information is better than less’. In short, visualization is most effective when the viewer is presented with as much information as possible, and as little ink as possible, provided that neither of these stands in the way of clarity.1 Thinking in terms of Bertin’s list of visual variables, this does not mean that one should always attempt to use all eight visual variables, as that is often an easy way to destroy the clarity of any graph. Rather, one should be judicious in their use, providing only the information that is relevant but presenting it as directly as possible.

This raises a related consideration, especially for comparing aggregate information: when representing small amounts of data, tables often outperform charts. Tables, or even sentences describing data, can maximize the data-ink ratio for small amounts of data and have the added benefit of presenting the reader with the raw information from which they can easily see relationships. With these general considerations in mind, let us move on to specific types of graphs (and their challenges) that often arise in working with historical web data.

NETWORKS

Some of the most interesting and readily computable data available from websites and web archives are the link structures. Arguably, for the first time ever, historians, literary scholars and others can easily explore massive networks of citation, reference and intertextuality. It is common practice to represent these structures as network diagrams. While the exponential growth of networks as a means of visual representation and as a metaphor for understanding everything from capitalism (Castells, 2011) to science and society (Latour, 1993, 2005) to computers and beyond (Barabási, 2014) may make network diagrams seem instantly obvious, there is a risk that this ubiquity actually serves to obfuscate their nuance and function. Thus, taking a moment to reiterate exactly what network diagrams represent can be fruitful in preparation for deciding the most efficacious way to represent data.

Networks consist of two distinct elements: nodes and edges. Nodes generally represent things and edges represent some sort of relationship between things. For example, we could construct a network out of webpages in which the edges represent links between pages. Such a network would likely be represented as directed, since a link goes from one page to another. Other networks, such as a co-author network, may be undirected since writing a paper with someone implies that they wrote a paper with you. Additionally, a network can either be made up of one type of thing (e.g. authors) or multiple types of things (e.g. authors and articles) and often it is possible to convert the latter into the former (e.g. by directly linking co-authors rather than linking them to their articles). This immediately suggests one of the first important decisions that can drastically alter network visualizations: what the nodes and edges will be. As both Foot (2006) and Brügger (2009) argue, we should think carefully about exactly what the object of study is when we are researching ‘the web’ and its history. Creating a network out of domains or subdomains rather than individual pages can create networks that provide different information and look substantially different. In addition, it should be noted that multiple connections between two nodes (e.g. multiple links from a single page to another page) can either be represented by multiple individual
lines or a single line that is given a weight, which can then be mapped to a visual property of the line if desired (e.g. line thickness, color, etc.).
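
The decisions described above (what counts as a node, whether edges are directed, and whether repeated links become a weight) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the URLs, the `links` list and the two helper functions are hypothetical and do not come from the chapter's data.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical page-level links as (source page, target page) pairs.
links = [
    ("http://a.example/blog/1", "http://b.example/about"),
    ("http://a.example/blog/1", "http://b.example/about"),  # a repeated link
    ("http://a.example/blog/2", "http://b.example/news"),
    ("http://b.example/about", "http://a.example/blog/1"),
]

def weighted_edges(pairs):
    """Collapse repeated (source, target) pairs into one directed, weighted edge."""
    return Counter(pairs)

def to_domains(pairs):
    """Re-key each link by host name, so that nodes are domains rather than pages."""
    return [(urlparse(src).netloc, urlparse(dst).netloc) for src, dst in pairs]

page_edges = weighted_edges(links)                # nodes are individual pages
domain_edges = weighted_edges(to_domains(links))  # nodes are whole domains

for (src, dst), weight in domain_edges.items():
    print(src, "->", dst, "weight", weight)
```

The same four links yield three page-level edges but only two domain-level edges, which is the sense in which the choice of node changes what the network shows.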
When executed well, network visualizations can provide a striking representation of the complex structure of data. They excel at showing local clusters (e.g. communities of websites that are in conversation with each other), disconnected components and the general density of networked relationships. They can also be extremely powerful for showing the difference between two related networks, for example two different blogs, or for showing how a single network has changed by visualizing two different moments in time.

Figure 13.1 The network of all pages from http://ai.umich.edu, a newer unit at the University of Michigan focused on academic innovation in the digital age. Each circle (node) around the outside represents a single page and each line (edge) between them represents a link from one page to another. The nodes are sorted around the circle based on the next directory in the URL. It is already evident that the left is the highly connected component while the right is very sparsely connected. This and all of the other network diagrams made by the author (Figures 13.1 through 13.6) were made using Cytoscape (Shannon et al., 2003).

While networks can be powerful visual representations, one of the most challenging aspects of representing them is that large tightly interwoven networks tend to produce uninterpretable masses of nodes and connections that look more like hairballs than a meaningful representation of data (see Figures 13.1 and 13.2). Various automatic layout algorithms can have a considerable impact on the readability of a network, especially if nodes can be clustered in such a way that interesting patterns emerge in the density of connections between clusters (e.g. separating subcommunities, subdomains or types of pages can often be fruitful) (see Figure 13.3). Furthermore, in general, networks that have clear clusters tend to be easier to visualize than networks with a single tightly connected component. Luke (2015) discusses a number of automatic layout algorithms and some of the ways they can be modified, variations of which can be found in most network visualization software.

Furthermore, since many of the most often used automatic layouts (e.g. variations of force-directed layouts) simulate physical systems in order to distribute nodes in space to most clearly present network structures, adjusting the parameters of these simulations can have significant impact on the readability of the resultant graph. For example, one of the earliest but still often used algorithms – Fruchterman and Reingold (1991) – makes each node repel every other node and treats each edge as a spring (that has a default length and a given strength that pulls nodes towards that default length). It then updates positions based on the various forces until it settles into a low-energy state that keeps related nodes close together and at the same time spreads nodes out as much as possible, making the resultant visualization readable. Adjusting the spring length, the spring strength and the repellent force between nodes can significantly change the output of the simulation. Most network visualization software will allow these parameters to be adjusted and taking a little time to do so can often be of great benefit.

Figure 13.2 The force-directed visualization of the same network from Figure 13.1 is difficult to read due to its density. Perhaps the only structural element that is noticeable is the relatively large group of nodes in the lower left that appear to be exclusively linked from a single node.

Figure 13.3 Looking just at links between the top-level directories (e.g. http://ai.umich.edu/blog and http://ai.umich.edu/about-ai) gives a smaller network that is slightly more manageable but still difficult to learn very much from.

While layout of networks can greatly increase the aesthetic quality and readability of a network, many networks (especially link networks extracted from websites) have such a high density of edges and number of nodes that no level of finesse with layout alone is able to extract a meaningful visual representation, unless of course one merely desires to communicate the sheer density of a network. Luckily, as the use of network visualization has grown in many fields alongside the ease of creating large network datasets, a number of techniques have recently been developed for making such networks more visually understandable.

Edge Pruning and Edge Bundling

Rarely does an excessive number of nodes cause a network to be too crowded, since nodes can take up a very small amount of visual space. Rather, it is usually the number of edges connecting these nodes that tends to overwhelm attempts at a network visualization, hiding any existing structure behind an impenetrable mass of lines. One possible solution to this problem is edge pruning, wherein edges are removed based either on some value in the data itself or on the network structure. The former is the most straightforward: for example, if one is looking at links between blogs, any connection with fewer than, say, 25 links can be removed, leaving only connections where there is a continual habit of linking rather than one or two incidental links. Edge pruning is doubly helpful for force-directed layouts as it both removes a mass of lines and also decreases the number of springs that hold the network tightly together, thus allowing it to spread out more (see Figure 13.4). The visualizations and analysis in Adamic and Glance (2005) offer a number of examples of the effect of removing data in order to make visuals more understandable. For example, in one visualization they first remove edges that represent fewer than five links and then remove those
links that represent fewer than 25 links – each step makes the underlying structure of connection clearer.

Another option for edge pruning, especially when there are no clear variables that could help discern the importance of connections (e.g. unweighted friendship networks, blogrolls, co-authorship, etc.), is to use the structure of the network itself to prune edges. By generating various metrics that describe the importance of edges to the network, it is possible to maintain and even clarify the general structure while simultaneously minimizing the overall number of visualized edges. Zhou et al. (2010) and Dianati (2016) offer procedures for removing edges while maintaining the connectivity of a network. Hennessey et al. (2008) even provide an algorithm for network simplification by node removal. Such a process more aggressively changes the visualization of the network into an abstraction of its structure, but for especially dense networks such measures may be called for.

Edge bundling provides a slightly different approach for minimizing excessive amounts of ink spent displaying edges. Edges that trace similar paths can be bent so that they run together. Like edge pruning, bundling can either be done based on information in the data (e.g. hierarchical relationships can be bundled together, such as if one were visualizing links both between sites and between individual pages) or using the structure of the layout itself; Holten (2006; Holten and Van Wijk, 2009) offers algorithms for both, the latter using a force-directed algorithm (similar in concept to one discussed previously for graph layout) to bend nearby edges together (see Figures 13.5 and 13.6). A number of network visualization software packages now have built-in options for edge bundling.

Meyer et al. (2017) include a slightly different example of bundling that effectively includes edge and node bundling. Their graph shows the size of the node based on the length of the bar in the outside circle and the size of connections as the width of the edges. The graph presents a compelling display of the size of connections between domains (see Figure 13.7).

Figure 13.4 The same set of pages from Figure 13.1, but only showing edges where a page links to another page five or more times. This most likely shows structural links such as those that appear in headers and footers along with a few places on a page rather than single mentions in the body of the text. The main page in the middle is the About page.

Figure 13.5 Bundling the edges from Figure 13.1 provides a clearer visualization. Edge bundling can be powerful for circle layouts as one can see the general direction of movement.
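
Pruning on a value in the data, as described above, amounts to a simple filter on edge weights. The sketch below assumes hypothetical blog names and link counts; the function `prune_edges` and the thresholds are illustrative only and are not a reconstruction of the procedure in any of the studies cited.

```python
def prune_edges(weighted_edges, min_weight):
    """Keep only edges whose weight (e.g. number of links) is at least min_weight."""
    return {edge: w for edge, w in weighted_edges.items() if w >= min_weight}

# Hypothetical blog-to-blog link counts.
blog_links = {
    ("blog_a", "blog_b"): 40,
    ("blog_b", "blog_a"): 27,
    ("blog_a", "blog_c"): 6,
    ("blog_c", "blog_d"): 1,
}

# Raising the threshold step by step shows which connections are habitual.
for threshold in (1, 5, 25):
    kept = prune_edges(blog_links, threshold)
    print(threshold, sorted(kept))
```

Because the original weights are kept alongside the surviving edges, they can still be mapped to line thickness or color after pruning.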

Figure 13.6 This final network visualization shows the same network from Figure 13.3 with the first-level directories, but only showing edges with ten or more links. The edges are also bundled and labels are scaled based on the node’s betweenness centrality, which measures how important a node is for connecting the network.

Hive Plots

A modification on the basic layout of a network, known as a hive plot, offers another possibly elucidating option (Krzywinski et al., 2012). A hive plot most often involves three radially arranged axes that contain all nodes in a network and edges are drawn between the three axes.2 Such a layout then provides a number of options for emphasizing various aspects of the data. The three axes can be used to categorically differentiate nodes; for instance, if one were visualizing connections between different groups of online communities they could be divided between the axes more clearly showing links that connect across communities. Likewise, numeric data, either from the network structure or elsewhere, can be mapped to distance along the axis. For example, nodes can be placed further out depending on the number of incoming links or the amount of text on a given page, clearly delineating the most linked or longest pages (see Figures 13.8 and 13.9).

Hive plots allow one to choose what are the most interesting aspects of a network dataset and stress those in the visualization. As such they can be very effective in representing extremely complex networks. While the edge segments all end up being of nearly similar length, they all move in the same direction – rather than crossing each other as in a force-directed network – and so improve the display
of the density of connections between parts of the network. Furthermore, hive plots can be subjected to the same modifications as other networks, e.g. edge or node pruning, bundling, etc., to further clarify the network structure.

Another benefit to a hive plot is that, unlike a force-directed layout whose final layout is dependent on initial starting locations, the layout is deterministic and hence always reproducible given the same data and decisions about what values to map to the axes. Furthermore, hive plots have the added advantage of being robust in the face of minor changes. Various other layouts can be drastically affected by minor modifications, whereas most hive plots will merely delete or add data as it is changed, leaving other nodes and edges exactly where they are.

Figure 13.7 An example of edge and node bundling showing connections between high-level domains (Meyer et al., 2017).

Heatmaps

While standard node and edge diagrams tend to instantly evoke our sense of ‘networkness’, aesthetically appearing as what many of us think of when we think of networks, other representations can also reveal important properties of a given network
(and sometimes more cleanly and clearly than network diagrams). One common option is a heatmap, which displays connections as a colored node–node matrix. Each node is listed once along both the x and y axes and where they meet is then colored based on whether (or how much) they are connected (see Figures 13.10 and 13.11).

Figure 13.8 A hive plot of the pages that make up three of the first-level directories. The distance from the center represents the degree (number of links coming in and going out) and the color represents the number of nodes at that position on the axes. Note that this diagram does not show links within a directory. The hive plots were produced with Jhive.

Figure 13.9 The same plot as Figure 13.8, but with the axes expanded to show inter-directory links. Each page is thus shown twice, once on the main axis and once on the repeated axis.

The main challenge that arises with using heatmaps for web visualization is that while these networks are dense they are often significantly sparser than many other networks (i.e. there are lots of pages that do not link to each other). Especially if a heatmap is automatically clustered, one can end up with large swaths of white space since sinks (websites that are linked to but do not link out to anything) will all cluster together. Incredibly sparse heatmaps can be difficult to read and are not the most compelling visualization. Because edges add no more visual complexity, but extra nodes significantly increase the size of a map, heatmaps are especially strong options for datasets with high edge density in networks with a relatively small number of nodes. As in other network visualizations, the nodes can be organized and grouped by attribute or data structure (e.g. clusters can be placed together). Especially when heatmaps are organized by attribute, they can be powerful tools for comparing either different networks or the same network over time, since edge density is easily perceptible.

Heer and Kandel (2012) provide an excellent example of the power of heatmaps for displaying network structure, especially as a tool for diagnosing data problems. They extract a network dataset from friendship relations on Facebook and plot them first as a standard force-directed layout; a large central cluster is visible along with a few smaller clusters, as we would expect from a social network such as this. They then plot the same data as a heatmap, reordering the data in order to group tightly connected nodes close to each other. Finally, they plot the dataset in ID order and a large blank space becomes apparent, suggesting there is a gap in the dataset. Indeed, it turns out that the Facebook API is rate limited and so failed to return the entire dataset, offering no warning that the data was truncated. While one hopes to avoid such situations, the example speaks to the clarity that heatmaps can provide in understanding networks.

Figure 13.10 A heatmap showing a subset of first-level directories from http://ai.umich.edu. A gray square represents a link between pages in those directories. The directory on the left is the directory the link originates from and the directory at the top is the destination directory. It should be noted that directed graphs do not produce symmetrical heatmaps. These heatmaps were made using the statistical software R (R Core Team, 2013).

Metrics

Finally, we should remember that various metrics can describe the abstract structures of networks and these metrics can be visually compared. For instance, if one were researching subcommunities in an online community, understanding how tightly clustered link networks are within these various subcommunities may be more important than the visual display of the networks themselves. In such a case, we could calculate the global clustering coefficient (Luce and Perry, 1949), which measures the ratio of connected triplets that are closed to all connected triplets (i.e. if a website links to two other websites, how often do those two websites link to each other). Graphing the clustering coefficients of the various subcommunities – possibly against another metric such as the size of the network – can in many circumstances be more informative than the network graphs themselves.

It can also be compelling to graph network metrics against other data that describes the pages one is working with. Cowls and Bright (2017) include a nice example, graphing the number of hyperlinks to each high-level country domain (e.g. .it websites for Italy) against the number of mentions of that country during the study period. The resultant graph is quite compelling and straightforward (see Figure 13.12).

Any number of network metrics can be elucidating, and the vast majority of structural characteristics can be quantified. There is not space here to list and detail all of the possible
metrics one could use, but most network visualization software will automatically calculate a fair number of these. Any introductory text to social network analysis should detail the vast majority of these and explain how to utilize them. Furthermore, spending time with such a text and learning more about network analysis can offer great insights into the process of visualization as well.

Figure 13.11 The same heatmap from Figure 13.10, but colored based on the number of links.

TEXT

Another obviously important aspect of web data is the actual text of websites. Such large-scale corpora of text lend themselves well to a variety of text analysis methodologies, from simply counting the occurrences of a few key words to complex algorithms for topic modeling. One of the main challenges of visually representing the result of text mining is that many of these algorithms function in very high-dimensional spaces (e.g. they treat each unique word as a dimension and run computations in this space) so that any visualization of the results requires some form of dimensionality reduction, such as projecting the data onto an algorithmically selected two-dimensional plane. While projections of the three-dimensional earth onto two-dimensional maps are generally understandable, one merely has to look at the complicated mathematical and political discussions involved in this process to get a sense of how complex projections from this three-dimensional space to a two-dimensional space can be (let alone a 10,000-dimensional space).

Simple Text Visualizations

Relatively simple text analyses, such as counting words, can often be incredibly
insightful and such analyses lend themselves well to compelling data visualizations (e.g. the continued use of Google N-grams despite methodological concerns that have been raised (Pechenick et al., 2015)). Even lists of sentences including a given word in a corpus can be elucidating (this is known as a concordance). For relatively small datasets with longer texts, dispersion plots can quickly show the distribution of words throughout a corpus. A dispersion plot shows where terms occur by word position in a document, or in a series of documents that have been ordered in some way (see Figure 13.13).3

Figure 13.12 A scatterplot showing the relationship between links to country high-level domains and mentions on the BBC website (Cowls and Bright, 2017).

Similarly, cumulative frequency plots, which show the combined number of occurrences of the most frequent words (e.g. the number of times the most common word occurs; then the number of times the two most common words occur, etc.), are informative. By showing these plots together for subsections of a corpus, one can display both the most common terms and information about the lexical diversity of a set of documents. In this case, it may be desirable to remove stop words (i.e. common words that tell us little about the documents in question, such as ‘the’, ‘a’, etc.); or depending on exactly what is interesting about a set of websites, one may decide to leave stop words in the count.

In general, it should be remembered that visualization need not show everything about a corpus. It is likely that the only way to know every element of a corpus is to read every document in it; the purpose of data mining and visualization is to find interesting occurrences or structures. It is not to recreate the corpus, like the imaginary empire in Borges’ short story that created a map the exact size of the territory.4 In short, if there are interesting aspects of a dataset one should seek to show only those elements. This is especially true when working with text mining. While complex algorithms and analyses can be elucidating, the most striking visualizations tend to be those that are easily understood and
represent elements of the text or metadata that could be counted directly if one had the time.

Figure 13.13 A dispersion plot showing the frequency of the terms ‘digital’, ‘learning’ and ‘div’ from http://ai.umich.edu/about-ai. The text includes all of the HTML and it is readily apparent how much greater the proportion of code is to the human-readable text. This graph was made using the Natural Language Toolkit (NLTK), a package for Python (Bird et al., 2009b).

Algorithmic Approaches

Beyond such direct ways of extracting data from text and visualizing it, many data-mining algorithms, especially those that reduce the final dimensionality of the output data to two or three dimensions, lend themselves to various visualizations. Principal component analysis allows one to visualize two or three derived components as axes with each website as a point on a scatterplot. Latent Dirichlet Allocation (LDA, one of the more popular algorithms for topic modeling in the humanities) is often presented simply as lists of words that determine topics, but since LDA assigns a percentage of each document to each topic, multiple visualization opportunities arise. One could again use a heatmap, to show each document along one axis and each topic along the other, coloring each square based on the percentage of the document that is assigned to the topic.5

One possible, though relatively infrequently used, option with great potential is the use of self-organizing maps (SOMs) as a means to display related and divergent topics in a large corpus. SOMs use artificial neural networks to map high-dimensional spaces to a two-dimensional plane, trying to maintain the proximity of related items. A third dimension can then be used to display how similar or dissimilar nearby items are. Skupin (2002; Skupin et al., 2013) has used SOMs to great effect to visualize concepts in large text corpora, placing central terms throughout the map and making ‘valleys’ of related terms (see Figure 13.14). One of the major advantages of SOMs over other visualizations of algorithmic text analysis is that they replicate a type of visualization, a contour map, that viewers are often familiar with. Related concepts are proximate, and the addition of topography supplements this with an additional sense of how related nearby terms are.

Of course, if a specific algorithm is necessary for an analysis, it will be worth the time and space to describe the functioning of that algorithm and any visual output it may produce. But if one is agnostic about the specific
algorithm to be used or an algorithmic treatment of the text is merely being used to provide a sense of the data, it can be beneficial to select an algorithm whose output is sensible without significant additional explanation of the algorithm. Furthermore, some algorithms, such as LDA, or even simple analysis, such as calculating terms that most clearly differentiate between two corpora or two times, lend themselves well to simply listing words with scores or counts. Again, the communicative power of tables and lists should not be underestimated.

Figure 13.14 Self-organizing map based on Wikipedia featured article data. Closer items are more similar. The ‘mountains’ are edges between clusters and the straight lines are links between articles. (Denoir, 2013 - Creative Commons Attribution-ShareAlike 3.0 License, available: https://en.wikipedia.org/wiki/File:Self_oraganizing_map_cartography.jpg).

CHANGE OVER TIME

While the discussion up to this point has focused largely on displaying various aspects of web data, one of the most important elements of history has been left to the side, namely displaying change over time. It has been left to the end because it is simultaneously one of the most difficult and most simple things to deal with in terms of visualization. There is a growing interest and number of tools for animating visualizations or creating interactive visualizations that allow a user to move through time. While such animations are a sensible way to deal with time and are easy for the viewer as they map time to time (though usually scaled so one can see a number of years in a few minutes), they have some drawbacks. First, animations or interactive visualizations are not as easily distributable as static visualizations; they often require certain software to run or host online. Furthermore, some of the technology that is used for these visualizations is not easily archived.

Second, and more fundamentally, one of the strengths of data visualization is the ability to take large amounts of information
and distribute it spatially, allowing the eye to quickly discern similarities, differences, patterns, etc. Animations and interactions require the viewer to compare not what is seen but what is remembered. Thus, in a sense, animation works against the greatest strengths of data visualization. In his book, Bertin (2011) explicitly excises animation from the discussion of data visualization, suggesting that its understanding lies more with a discussion of cinematic time. While animation and interaction are much more easily achieved now than they were at the time of Bertin’s writing, his statement on this matter should be understood philosophically rather than merely technologically.

In this light, one efficacious technique for displaying change over time is the use of small multiples. Small multiples consist of a series of small graphs with the same design, showing how the data change over time (or across another variable). Using them to display time requires periodizing one’s data (i.e. each graph represents a month, year, decade, etc.), but if this is done well it can be used to great effect. Tufte (1983: 169) says of small multiples (a term and technique that he helped to popularize): ‘Well-designed small multiples are inevitably comparative, deftly multivariate, shrunken, high-density graphics, usually based on a large data matrix drawn almost entirely with data-ink, efficient in interpretation, and often narrative in content’. Meyer et al. (2017) include an exceptional example of the use of small multiples for web data, depicting the strength of hyperlinked connections between universities in the UK over time (see Figure 13.15).

Figure 13.15 Small multiples showing the strength of hyperlinked connections between UK universities under study for 2000, 2005, 2010 (Meyer et al., 2017).

Boyandin et al. (2012: 1005) even offer a user study comparing animated geographical visualizations with small multiples. They conclude, ‘We observed that with animation the subjects tended to make more findings concerning geographically local events and changes between subsequent years. With small-multiples more findings concerning longer time periods were made’. While animations may allow one to follow specific areas in a graph, understanding large-scale structural changes is likely significantly aided by displaying data over space rather than time.

In addition to small multiples, time can be treated as a variable in any graph. For example, one could graph any network metric, such as clustering coefficient or average degree, over time. Furthermore, the change in a distribution, such as betweenness centrality, which is a node-level metric, can be represented as a series of small multiples, with a histogram for each time point. This is not to say that animations should not be used: these can be very effective for showing the exponential growth in size or complexity of data over time and interaction excels at allowing users to explore subsets that may be of

particular interest to them. Still, the use and necessity of either should be carefully considered and chosen for what it can explicitly add.

CONCLUSION

Wainer (1984: 145–6) concludes his article, which attempts to teach visualization by offering advice on displaying data badly, with a final suggestion: if something has been done well in the past, figure out a new and different way to do it. Of course, our goal is to display data well, and so the opposite suggestion should be taken; many of the visualization problems that one runs into have already been addressed by others either using similar data or even unrelated data (many of the advances in network visualization have been made dealing with social networks, gene expression networks, etc.).

While this chapter has attempted to address some of the most common issues that arise in visualizing historical web data, it is impossible that all of the potential issues could have been dealt with. There are many challenges that are beyond the scope of this chapter, but it is still highly likely that someone else has confronted a related set of issues and written about them. Even a bad example can be instructive in suggesting against possible approaches.

Finally, it is good practice to look at examples with an agnosticism towards specific software and technologies. Much of the software available today prematurely limits a user’s options for making charts and graphs; exploring visualization outside of the limits of a particular piece of software will allow one to make the best decisions about how to represent data. You can then choose the best software for transforming your data into a visualization.

Websites, and especially web archives, offer a multitude of rich data types that have much to tell us about information, communication, writing and culture. Researchers working with such data do not lack information that can be quantified, analyzed and visualized. The challenge in working with this type of data, and especially in visualizing it, is to exclude the information that distracts from the most interesting structures and stories contained within the data. Spending time exploring your data and seeing what is most visually compelling (and distracting) can pay dividends in the quality and communicative power of data visualization.

Notes

1  It should also be noted that axes and scales can have a considerable impact on clarity. Especially when dealing with heavily skewed data (such as the number of links pointing to a page and other web data that often presents a Zipfian distribution), logarithmic scales can be of great use.
2  It is also possible to double the axes, such that axis 1 and axis 1’ show connections within the group.
3  For both concordances and dispersion plots see NLTK chapter 1 part 3 (Bird et al., 2009a).
4  Borges’ one paragraph short story (1975), “On Exactitude in Science”, recounts, “The Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast map was Useless”.
5  For more examples of visualizing LDA see Riddell (2015).

REFERENCES

Adamic, L.A., and Glance, N. (2005) ‘The political blogosphere and the 2004 US election: Divided they blog’, in Proceedings of the 3rd international workshop on Link discovery: 36–43.
Barabási, A.-L., and Frangos, J. (2014) Linked: The new Science of Networks. New York: Basic Books.
Bertin, J. [1964(2011)] Semiology of Graphics: Diagrams, Networks, Maps, trans. W Berg. Redlands: ESRI Press.

Bird, S., Klein, E., and Loper, E. (2009a) NLTK Book. Available: http://www.nltk.org/book/ (accessed: November 20, 2016)
Bird, S., Loper, E., and Klein, E. (2009b) Natural Language Processing with Python. Sebastopol, CA: O’Reilly Media Inc.
Borges, J.L. (1975) ‘On Exactitude in Science’, in A Universal History of Infamy, trans. Norman Thomas de Giovanni. London: Penguin Books: 131.
Boyandin, I., Bertini, E., and Lalanne, D. (2012) ‘A qualitative study on the exploration of temporal changes in flow maps with animation and small-multiples’, Computer Graphics Forum, 31(3) pt 2: 1005–1014.
Brügger, N. (2009) ‘Website history and the website as an object of study’, New Media & Society, 11(1–2): 115–132.
Castells, M. (2011) The Rise of the Network Society: The Information Age: Economy, Society, and Culture (Vol. 1). New York: John Wiley & Sons.
Cowls, J., and Bright, J. (2017) ‘International hyperlinks in online news media’, in Brügger, N. and Schroeder, R. (Eds), The Web as History. London: University College London Press: 101–116.
Dianati, N. (2016) ‘Unwinding the hairball graph: Pruning algorithms for weighted complex networks’, Physical Review E, 93(1): 012304.
Foot, K. (2006) ‘Web sphere analysis and cyberculture studies’, in Silver, D. and Massanari, A., (Eds), Critical Cyberculture Studies. New York: NYU Press: 88–96.
Fruchterman, T.M., and Reingold, E.M. (1991) ‘Graph drawing by force-directed placement’, Software: Practice and Experience, 21(11): 1129–1164.
Heer, J., and Kandel, S. (2012) ‘Interactive analysis of big data’, XRDS: Crossroads, The ACM Magazine for Students, 19(1): 50–54.
Hennessey, D., Brooks, D., Fridman, A., and Breen, D. (2008) ‘A simplification algorithm for visualizing the structure of complex graphs’, in 2008 12th International Conference Information Visualisation. July, 2008: 616–625.
Holten, D. (2006) ‘Hierarchical edge bundles: Visualization of adjacency relations in hierarchical data’, IEEE Transactions on Visualization and Computer Graphics, 12(5): 741–748.
Holten, D., and Van Wijk, J.J. (2009) ‘Force-directed edge bundling for graph visualization’, Computer Graphics Forum, 28(3): 983–990.
Krzywinski, M., Birol, I., Jones, S.J., and Marra, M.A. (2012) ‘Hive plots – rational approach to visualizing networks’, Briefings in Bioinformatics, 13(5): 627–644.
Latour, B. (1993) The Pasteurization of France. Cambridge: Harvard University Press.
Latour, B. (2005) Reassembling the Social: An Introduction to Actor-Network-Theory. Oxford: Oxford University Press.
Luce, R.D., and Perry, A.D. (1949) ‘A method of matrix analysis of group structure’, Psychometrika, 14(2): 95–116.
Luke, D. (2015) A User’s Guide to Network Analysis in R. Berlin: Springer.
Meyer, E., Yasseri, T., Hale, S., Cowls, J., Schroeder, R., and Margetts, H. (2017) ‘Analysing the UK web domain and exploring 15 years of UK universities on the web’, in Brügger, N. and Schroeder, R. (Eds), The Web as History. London: University College London Press: 23–44.
Milligan, I. (2012) ‘Mining the “Internet graveyard”: Rethinking the historians’ toolkit’, Journal of the Canadian Historical Association, 23(2): 21–64.
Pechenick, E.A., Danforth, C.M., and Dodds, P.S. (2015) ‘Characterizing the Google Books corpus: Strong limits to inferences of socio-cultural and linguistic evolution’, PloS One, 10(10): e0137041.
R Core Team (2013) R: A language and environment for statistical computing. R Foundation for Statistical Computing: Vienna, Austria.
Riddell, A. (2015) Text Analysis with Topic Models for the Humanities and Social Sciences. Available: https://de.dariah.eu/tatom/ (accessed November 15, 2016)
Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B., and Ideker, T. (2003) ‘Cytoscape: A software environment for integrated models of biomolecular interaction networks’, Genome Research, 13(11): 2498–2504.

Skupin, A. (2002) ‘A cartographic approach to visualizing conference abstracts’, IEEE Computer Graphics and Applications, 22(1): 50–58.
Skupin, A., Biberstine, J.R., and Börner, K. (2013) ‘Visualizing the topical structure of the medical sciences: A self-organizing map approach’, PloS One, 8(3): e58779.
Tufte, E. (1983) The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
Wainer, H. (1984) ‘How to display data badly’, The American Statistician, 38(2): 137–147.
Zhou, F., Malher, S., and Toivonen, H. (2010) ‘Network simplification with minimal loss of connectivity’, in 2010 IEEE International Conference on Data Mining: 659–668.
PART III

Technical and Structural Dimensions of Web History
14
Adding the Dimension of Time to HTTP
Michael L. Nelson and Herbert Van de Sompel

INTRODUCTION

While the web is distributed, most web archives are centralized silos that do not cooperate with each other. This is partially because the technology that is necessary to replay archived content and keep it from being influenced by material on the live web also makes it difficult for web archives to cooperate. The Memento Protocol (which we played a central role in defining) addresses this problem by defining an extension to the Hypertext Transfer Protocol (HTTP) that allows for standardized, machine-readable integration of both the past web and the present web. The Memento Protocol extends the concept of HTTP content negotiation to include not only well-known dimensions such as Multipurpose Internet Mail Extensions (MIME) types (e.g., JPEG vs. PNG) and file encodings (e.g., gzip vs. compress), but also the dimension of Coordinated Universal Time (UTC) as a universal versioning system. The protocol can be supported by all systems that hold temporal resource versions, including conventional web archives, as well as resource versioning systems such as wikis.

The Memento Protocol introduces some standard terminology with which to discuss web archiving, the most fundamental of which are: original resource (the resource on the live web), Memento (an archived version of an Original Resource, frozen in time), TimeGate (a resource capable of datetime content negotiation to discover a temporally appropriate Memento), and TimeMap (a machine-readable list of all Mementos for an Original Resource). Furthermore, the Memento Protocol is the first web archiving API, enabling aggregation of access to disparate web archives. Web archiving has been dominated by the Internet Archive’s Wayback Machine, but via the Memento Protocol it is possible to leverage the more than a dozen publicly accessible web archives throughout the world for increased completeness, consistency, verifiability, resilience, and availability.

This chapter begins by introducing the history of Unix and HTTP and how they continue to influence the design of web archives today, then reviews the Memento terminology and concepts that allow for standardized discussion of the basic mechanics of web archiving. Next, we review how different web archives can be aggregated and how user agents (i.e., web browsers) can interact with Memento-enabled archives, and then conclude with a brief review of our activities in areas of open research that result when there is an integrated, distributed network of web archives.

HISTORY OF HTTP AND UNIX

HTTP is the protocol (Fielding et al., 1999) that underpins what we know colloquially as ‘the web’. A simple version, HTTP version 0.9, was documented in 1991 (Berners-Lee, 1991), and in 1996 a version similar to the version still in use today was defined (Berners-Lee et al., 1996). Unix, in various vendor-specific formats, was the predominant operating system during this timeframe for workstations and mainframes, so the development of HTTP, while technically independent of Unix, is ultimately deeply intertwined with the operating system that made it possible. As such, the history of HTTP is one of incrementally implementing the original vision of a fully featured, distributed filesystem. Initially conceived with read-write and versioning capabilities, early implementations fell short of the original vision due in part to the tight integration with the Unix filesystem. Advances in encryption and authentication have made the read-write capability more widespread, but it was not until the development of the Memento Protocol that the versioning capability was added for HTTP.

Despite versioning (described in terms of generic vs. specific resources) being part of an early design document for the Web (Berners-Lee, 1996), the historical coupling of HTTP and Unix filesystems stood in the way of embracing a time dimension for the web. Since most of the early HTTP servers were implemented on Unix workstations, the Unix filesystem and HTTP were co-deployed in almost all cases. Metadata about files in the Unix filesystem is stored in ‘inodes’ (a contraction of ‘index node’), and the original description of the Unix filesystem defined three notions of time to be stored in an inode: file creation, last use, and last modification (Ritchie and Thompson, 1974). However, at some early point the storage of the file creation time in the inode was replaced with the last modification time of the inode itself. The result was that the last modification and access times of a file are defined, but the creation time, a crucial part of establishing provenance, cannot be stored in a standard Unix filesystem. Thus, files stored on a Unix filesystem, which are subsequently served through HTTP, inherit the semantic limitations of the filesystem, and as a result HTTP has only two notions of time in server responses: the ‘Date’ header, which provides the date of the response, and the ‘Last-Modified’ header, which is inherited from the inode.

To better understand how the Memento Protocol integrates with HTTP, many of the figures listed below provide raw, actual HTTP requests and responses. The web browsers we use every day (e.g., Firefox, Chrome) hide these detailed and verbose HTTP requests and responses from us, but in this chapter we choose to surface these details because they are integral to understanding the web archiving infrastructure that the Memento Protocol enables. We recognize the HTTP responses can be intimidating to those not used to reading them, but with a short primer they quickly become invaluable. We will walk through how to read the HTTP session in Figure 14.1. First, for every HTTP response, we include the curl request, with all the appropriate arguments, that generated the response. ‘Curl’ is a command line web browser that does not render web pages (like Firefox, Chrome, etc.), but rather shows the raw HTTP response from the web server. The ‘$’ symbol in the

figures is the command line prompt, and what follows is what the user types. If you are in a terminal program, you should be able to copy and paste everything after the ‘$’ and get a similar response (keep in mind that responses will change over time). Not all curl options and other command line arguments are fully explained in this chapter, but in the interest of reproducibility they are provided as executed. For example, the following line uses curl to return just the metadata (via the ‘-I’ argument) for an image at lanl.gov:

$ curl -I http://www.lanl.gov/_assets/images/lanl-logo-footer.png

The next line is the response from the server and has three components: ‘HTTP/1.1’ is an acknowledgment from the server that it is supporting the 1.1 version of HTTP (new and older versions exist, but 1.1 is still currently the most commonly deployed). Next, ‘200’ is the numerical code that provides the semantics of the response; in this case ‘200’ means the request was understood, processed, and there were no errors. The next portion is a human-readable phrase, in this case ‘OK’, which is explanatory for the ‘200’ response code. The phrase is for human readability and web browsers only process the numeric code. In Figure 14.1, the full response is:

HTTP/1.1 200 OK

There are many other HTTP response codes defined, but in this chapter we will focus on ‘200’ and ‘302’. The following 302 responses are equivalent, since the text phrase is for humans and only the code ‘302’ is processed by browsers:

HTTP/1.1 302 Found

Or:

HTTP/1.1 302 Moved Temporarily

The 302 response is a redirection from the requested URI to another URI. The response provides metadata, but also instructs the browser to issue a new request for a different URI (which in turn may also issue a redirection, until the process stops with a ‘200’ response). Redirections frequently occur, but regular web browsers hide them and users are typically unaware of this fundamental HTTP event. The argument ‘-L’ will cause curl to automatically follow the redirection.

The final part of an HTTP response is a series of lines of metadata (which can appear in any order), arranged in a ‘key: value’ format, followed by a carriage return. For example, these four lines indicate the date of the response (2017-02-07), when the resource was last modified (2014-10-28), that the returned representation is an image in ‘PNG’ format, and that the image is 8,719 bytes long:

Date: Tue, 07 Feb 2017 00:08:10 GMT
Last-Modified: Tue, 28 Oct 2014 22:12:02 GMT
Content-Length: 8719
Content-Type: image/png

Now we can examine the HTTP response in Figure 14.1 and see that the Los Alamos National Laboratory (LANL) logo PNG file was last modified in 2014, over two years ago from the request in 2017. Thus, if the file were stored in a cache after the Last-Modified date, it does not need to be downloaded again. On the other hand, pages that are dynamically generated typically do not set the ‘Last-Modified’ header, in part because it is not stored by default in the filesystem and thus is extra work to track and compute. The result is that the web is losing expressiveness about time, not gaining it. For example:

$ curl -I http://www.lanl.gov/_assets/images/lanl-logo-footer.png
HTTP/1.1 200 OK
Date: Tue, 07 Feb 2017 00:08:10 GMT
Last-Modified: Tue, 28 Oct 2014 22:12:02 GMT
Content-Length: 8719
Content-Type: image/png

Figure 14.1 The Last-Modified response header often exists for images, pdfs, and other typically static files.
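The anatomy of the response in Figure 14.1 (a status line with three components, then ‘key: value’ metadata lines) can also be read programmatically. The following is a minimal sketch in Python, not part of the original chapter: the function name parse_head and the variable names are illustrative assumptions, and the raw text is copied from Figure 14.1 rather than fetched from the live server.

```python
from email.utils import parsedate_to_datetime

# Raw response head copied from Figure 14.1 (not fetched live).
RAW_HEAD = """HTTP/1.1 200 OK
Date: Tue, 07 Feb 2017 00:08:10 GMT
Last-Modified: Tue, 28 Oct 2014 22:12:02 GMT
Content-Length: 8719
Content-Type: image/png
"""


def parse_head(raw):
    """Split a response head into (version, code, phrase, headers).

    parse_head is a hypothetical helper for illustration only.
    """
    lines = raw.strip().splitlines()
    # Status line: protocol version, numeric code, human-readable phrase.
    version, code, phrase = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Header names are case-insensitive, so normalize to lower case.
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()
    return version, int(code), phrase, headers


version, code, phrase, headers = parse_head(RAW_HEAD)

# The two notions of time HTTP offers: response date and last modification.
response_date = parsedate_to_datetime(headers["date"])
last_modified = parsedate_to_datetime(headers["last-modified"])
age = response_date - last_modified

print(version, code, phrase)
print("Days since last modification:", age.days)
```

Only the numeric code is used for any decision here; the phrase is kept purely for display, mirroring how browsers treat it.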

Figure 14.2 is for the HTML home page that embeds (among other resources) the PNG from Figure 14.1. The HTML home page has surely changed from 2014, but it might not have changed from yesterday or even last week. Unfortunately, without the Last-Modified header we cannot be sure and we are likely to have unnecessary downloads if we visit the page often.

A careful inspection of Figure 14.2 shows the ‘Vary’ response header, indicating that the server can perform content negotiation on this resource; in this case the server is capable of transmitting a compressed version of this HTML page for quicker download times, and the users’ browser will uncompress it on their behalf. Note that while the ‘Vary’ header indicates that content negotiation is possible, it is the presence of the ‘Content-Encoding’ response header that indicates that compression has actually occurred (Figure 14.3).

$ curl -I http://www.lanl.gov/
HTTP/1.1 200 OK
Date: Tue, 07 Feb 2017 00:08:46 GMT
Vary: Accept-Encoding
Content-Type: text/html; charset=UTF-8

Figure 14.2 The Last-Modified response header is typically absent from resources with dynamically constructed representations (i.e., almost all HTML files).

$ curl -I -H "Accept-Encoding: gzip" http://www.lanl.gov/
HTTP/1.1 200 OK
Date: Tue, 07 Feb 2017 05:05:31 GMT
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 20
Content-Type: text/html; charset=UTF-8

Figure 14.3 Based on the ‘Accept-Encoding’ request header, the server responds with a gzipped HTML page, as declared in the ‘Content-Encoding’ response header.

Tim Berners-Lee’s original design document about ‘generic’ and ‘specific’ resources that formed the basis for content negotiation did not anticipate character sets or encodings (Berners-Lee, 1996), but it did describe ‘target mediums’ (or ‘features’), which have now been replaced with cascading style sheets (CSS) functionality, and time, which the Memento Protocol made possible nearly 20 years later.

MEMENTO TERMINOLOGY AND CONCEPTS

To better understand the Memento Protocol (Van de Sompel et al., 2009, 2010, 2013), we must clarify some terminology that is often used imprecisely even in technical writing. Figure 14.4 is from the seminal Web Architecture document (Jacobs and Walsh, 2004), and shows: 1) Uniform Resource Identifiers (URI), 2) Resources, and 3) Representations.

URIs, which are a superset of the more commonly known Uniform Resource Locators (URLs), identify resources. At any given moment, resources exist in a certain state, and that state can be serialized into a representation of the resource. It is this representation that is transmitted via HTTP, rendered by the browser, etc. – the resource itself is not transferred, and indeed can be a real-world object (i.e., non-digital). When a URI is dereferenced (most commonly with HTTP, though many other URI schemes (i.e., protocols) are defined), the representation of that resource is returned. As shown in Figures 14.2 and 14.3, the representation can vary based on input from the client (e.g., compressed or not), which means there can simultaneously be different representations that capture the current state of the resource.

The Memento Protocol specifies how to retrieve a representation of a prior, not current, state of a resource by allowing a client to express the datetime of the prior state it is interested in. Building on the Web Architecture, the Memento Protocol introduces standard terminology with which to discuss the mechanics of versioning on the web (Van de Sompel et al., 2013):

Figure 14.4 URIs, resources, and representations (Jacobs and Walsh, 2004).

• Original Resource: an Original Resource (identified by URI-R) is a resource that exists or used to exist, and for which access to one of its prior states may be required.
• Memento: a Memento (identified by URI-M) for an Original Resource is a resource that encapsulates a prior state of the Original Resource. A Memento for an Original Resource as it existed at time T is a resource that encapsulates the state the Original Resource had at time T.
• TimeGate: a TimeGate (identified by URI-G) for an Original Resource is a resource that is capable of datetime negotiation to support access to prior states of the Original Resource.
• TimeMap: a TimeMap (identified by URI-T) for an Original Resource is a resource from which a list of URIs of Mementos of the Original Resource is available.

When an archive crawls an Original Resource, it saves the state of the resource at the time it was crawled. This state now becomes its own frozen resource, a Memento, and the time at which it was crawled is stored in the resource’s Memento-Datetime response header. Note that the value for Memento-Datetime can be different from the Last-Modified header sent from the archive, since the archives can continually update banners, archive logos, and other extra information injected into the Memento when it is replayed (Nelson, 2011). For example, the banner at the top of Figure 14.18 is supplied by the web archive itself and the contents of the banner are updated over time, necessitating updated values for Last-Modified even though the Memento-Datetime value does not change.

For the Original Resource http://www.lanl.gov/, the first known Memento is in the Internet Archive’s Wayback Machine (the first (operational in 1996) and by far the largest public web archive (Negulescu, 2010)) with a URI-M of http://web.archive.org/web/19961221031231/http://lanl.gov/. We can examine its HTTP response in Figure 14.5.

$ curl -I http://web.archive.org/web/19961221031231/http://lanl.gov/
HTTP/1.1 200 OK
Server: Tengine/2.1.0
Date: Wed, 08 Feb 2017 02:26:53 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 9237
Memento-Datetime: Sat, 21 Dec 1996 03:12:31 GMT
Link: <http://lanl.gov/>; rel="original",
  <http://web.archive.org/web/timemap/link/http://lanl.gov/>; rel="timemap"; type="application/link-format",
  <http://web.archive.org/web/http://lanl.gov/>; rel="timegate",
  <http://web.archive.org/web/19961221031231/http://lanl.gov/>; rel="first memento"; datetime="Sat, 21 Dec 1996 03:12:31 GMT",
  <http://web.archive.org/web/19981212015212/http://lanl.gov/>; rel="next memento"; datetime="Sat, 12 Dec 1998 01:52:12 GMT",
  <http://web.archive.org/web/20170201114455/http://lanl.gov/>; rel="last memento"; datetime="Wed, 01 Feb 2017 11:44:55 GMT"

Figure 14.5 HTTP response for a Memento from the Internet Archive.

In this case, the Original Resource and Memento-Datetime are extractable from the URI-M as www.lanl.gov and 19961221031231, respectively. However, not all archives follow this convention and extracting strings from URIs violates the practice of URI opacity as recommended by the Web Architecture (Jacobs and Walsh, 2004). In short, URI opacity means that one should not rely on extracting semantics from substrings in a URI. For example, the string ‘19961221031231’ might not always indicate the Memento-Datetime that it appears to indicate. Furthermore, not all archives use URIs with apparent semantics. For example, the WebCite web archive has archived www.lanl.gov on 2011-06-28, and that page is available at both of these URIs, neither of which indicates the URI of the Original Resource or the Memento-Datetime:

http://webcitation.org/query?id=1309246026894208
http://webcitation.org/5zm0eNcVU

The standardized, machine-readable method is to return the Memento-Datetime in the response header, with the datetime in a format borrowed from Email (Crocker, 1982), as with the Date and Last-Modified headers:

Memento-Datetime: Sat, 21 Dec 1996 03:12:31 GMT

Inside the Link response header, although the syntax is challenging, a careful reading will reveal the unambiguous statements for the Original URI, first, last, and next Mementos, and the TimeGate and TimeMap (note that even though the line wraps, it is a single logical ‘key: value’ line, where the value has multiple, comma-separated sub-values):

Link: <http://lanl.gov/>; rel="original",
  <http://web.archive.org/web/timemap/link/http://lanl.gov/>; rel="timemap"; type="application/link-format",
  <http://web.archive.org/web/http://lanl.gov/>; rel="timegate",
  <http://web.archive.org/web/19961221031231/http://lanl.gov/>; rel="first memento"; datetime="Sat, 21 Dec 1996 03:12:31 GMT",
  <http://web.archive.org/web/19981212015212/http://lanl.gov/>; rel="next memento"; datetime="Sat, 12 Dec 1998 01:52:12 GMT",
  <http://web.archive.org/web/20170201114455/http://lanl.gov/>; rel="last memento"; datetime="Wed, 01 Feb 2017 11:44:55 GMT"

In Figure 14.6 we see the HTTP response for the URI-M http://archive.is/OYfTd, and from just this URI-M we cannot extract the URI-R and Memento-Datetime (similar to the WebCite example above); thus the Link

and Memento-Datetime response headers are crucial.

$ curl -I http://archive.is/OYfTd
HTTP/1.1 200 OK
Date: Wed, 08 Feb 2017 02:52:25 GMT
Content-Type: text/html;charset=utf-8
Memento-Datetime: Wed, 17 Dec 2014 21:16:53 GMT
Link: <http://www.lanl.gov/>; rel="original",
  <http://archive.is/timegate/http://www.lanl.gov/>; rel="timegate",
  <http://archive.is/timemap/http://www.lanl.gov/>; rel="timemap"; type="application/link-format"; from="Sat, 15 Oct 2011 08:20:59 GMT"; until="Wed, 17 Dec 2014 21:16:53 GMT",
  <http://archive.is/20141106023554/http://www.lanl.gov/>; rel="prev memento"; datetime="Thu, 06 Nov 2014 02:35:54 GMT",
  <http://archive.is/20111015082059/http://www.lanl.gov/>; rel="first memento"; datetime="Sat, 15 Oct 2011 08:20:59 GMT",
  <http://archive.is/20141217211653/http://www.lanl.gov/>; rel="last memento"; datetime="Wed, 17 Dec 2014 21:16:53 GMT"

Figure 14.6 HTTP response for a Memento from archive.is.

We established that the Memento in Figure 14.5 is the first Memento available for http://www.lanl.gov/. There are two ways to verify this with the Internet Archive. The first is to download the TimeMap for http://www.lanl.gov/ and check the first value. Figure 14.7 only shows the first few entries since the TimeMap is quite large, with over 1,800 Mementos.

Some TimeMaps are already larger than 100,000 Mementos, so even though they are sorted by datetime, it can still be a lot to download and parse. To address this need, TimeGates perform datetime content negotiation and will issue an HTTP redirect to the most appropriate Memento. For conventional web archives that are unaware of the change rate of the Original Resource, the TimeGate simply chooses the closest Memento to the requested datetime (as expressed in the Accept-Datetime request header). This algorithm is known as ‘mindist’ for ‘Minimum Distance’. Figure 14.8 shows the client asking the Internet Archive for a Memento closest to October 16, 2013.
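The ‘mindist’ selection described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code any archive actually runs: the function name mindist and the list name are assumptions, and the three Memento-Datetime values are copied from the Link header shown in Figure 14.8.

```python
from email.utils import parsedate_to_datetime


def mindist(memento_datetimes, accept_datetime):
    """Return the Memento-Datetime string closest to accept_datetime.

    Both arguments use the HTTP date format shown in the figures.
    Distance is measured in either direction (past or future).
    """
    requested = parsedate_to_datetime(accept_datetime)
    return min(
        memento_datetimes,
        key=lambda value: abs(parsedate_to_datetime(value) - requested),
    )


# Values taken from the Link header in Figure 14.8 (first, prev, selected).
MEMENTO_DATETIMES = [
    "Sat, 21 Dec 1996 03:12:31 GMT",
    "Mon, 14 Oct 2013 07:21:06 GMT",
    "Sat, 19 Oct 2013 03:04:42 GMT",
]

closest = mindist(MEMENTO_DATETIMES, "Wed, 16 Oct 2013 22:59:48 GMT")
print(closest)
```

With these inputs the sketch picks the 19 October 2013 Memento, which is slightly closer to the requested datetime than the 14 October one and agrees with the Location header in Figure 14.8.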

$ curl --silent http://web.archive.org/web/timemap/link/http://lanl.gov/ | head -10
<http://lanl.gov/>; rel="original",
<http://web.archive.org/web/timemap/link/http://lanl.gov/>; rel="self"; type="application/link-format"; from="Sat, 21 Dec 1996 03:12:31 GMT"; until="Wed, 01 Feb 2017 11:44:55 GMT",
<http://web.archive.org/web/http://lanl.gov/>; rel="timegate",
<http://web.archive.org/web/19961221031231/http://lanl.gov/>; rel="first memento"; datetime="Sat, 21 Dec 1996 03:12:31 GMT",
<http://web.archive.org/web/19981206235030/http://lanl.gov/>; rel="memento"; datetime="Sun, 06 Dec 1998 23:50:30 GMT",
<http://web.archive.org/web/19981212015212/http://lanl.gov/>; rel="memento"; datetime="Sat, 12 Dec 1998 01:52:12 GMT",
<http://web.archive.org/web/19981212030449/http://www.lanl.gov/>; rel="memento"; datetime="Sat, 12 Dec 1998 03:04:49 GMT",
<http://web.archive.org/web/19990117014439/http://lanl.gov/>; rel="memento"; datetime="Sun, 17 Jan 1999 01:44:39 GMT",
<http://web.archive.org/web/19990117083819/http://lanl.gov/>; rel="memento"; datetime="Sun, 17 Jan 1999 08:38:19 GMT",
<http://web.archive.org/web/19990125090547/http://lanl.gov/>; rel="memento"; datetime="Mon, 25 Jan 1999 09:05:47 GMT",

Figure 14.7 The first ten lines of the TimeMap for http://www.lanl.gov/.
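The TimeMap entries in Figure 14.7 use the same syntax as the Link header in Figures 14.5 and 14.6, so one small parser can read both. The sketch below is an illustration under stated assumptions, not a complete link-format implementation: it assumes every parameter value is double-quoted (as in the figures), and the names parse_links, LINK_VALUE, and PARAM are hypothetical. Splitting naively on commas would fail because the datetime values themselves contain commas, which is why the sketch matches whole `<uri>; key="value"` groups instead.

```python
import re

# One link-value: <uri> followed by zero or more ; key="value" parameters.
# Assumption: all parameter values are double-quoted, as in the figures.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
PARAM = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def parse_links(text):
    """Return a list of (uri, attributes) pairs from link-format text."""
    links = []
    for uri, params in LINK_VALUE.findall(text):
        attrs = {key.lower(): value for key, value in PARAM.findall(params)}
        # rel may hold several relation types, e.g. "first memento".
        attrs["rel"] = attrs.get("rel", "").split()
        links.append((uri, attrs))
    return links


# First five entries copied from Figure 14.7.
TIMEMAP = '''<http://lanl.gov/>; rel="original",
<http://web.archive.org/web/timemap/link/http://lanl.gov/>; rel="self"; type="application/link-format"; from="Sat, 21 Dec 1996 03:12:31 GMT"; until="Wed, 01 Feb 2017 11:44:55 GMT",
<http://web.archive.org/web/http://lanl.gov/>; rel="timegate",
<http://web.archive.org/web/19961221031231/http://lanl.gov/>; rel="first memento"; datetime="Sat, 21 Dec 1996 03:12:31 GMT",
<http://web.archive.org/web/19981206235030/http://lanl.gov/>; rel="memento"; datetime="Sun, 06 Dec 1998 23:50:30 GMT",
'''

links = parse_links(TIMEMAP)

# Keep only entries whose relation types include "memento".
mementos = [
    (uri, attrs["datetime"]) for uri, attrs in links if "memento" in attrs["rel"]
]

for uri, memento_datetime in mementos:
    print(memento_datetime, uri)
```

Reading the URI-R, TimeGate, and Memento-Datetime values from these relation types, rather than from substrings of the URI-M, keeps a client consistent with the URI opacity practice discussed earlier.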

$ curl -I -H "Accept-Datetime: Wed, 16 Oct 2013 22:59:48 GMT" http://web.archive.org/web/http://lanl.gov/
HTTP/1.1 302 Found
Server: Tengine/2.1.0
Date: Wed, 08 Feb 2017 03:38:20 GMT
Content-Type: text/html
Content-Length: 0
Location: /web/20131019030442/http://www.lanl.gov/
Vary: accept-datetime
Link: <http://lanl.gov/>; rel="original",
  <http://web.archive.org/web/timemap/link/http://lanl.gov/>; rel="timemap"; type="application/link-format",
  <http://web.archive.org/web/19961221031231/http://lanl.gov/>; rel="first memento"; datetime="Sat, 21 Dec 1996 03:12:31 GMT",
  <http://web.archive.org/web/20131014072106/http://lanl.gov/>; rel="prev memento"; datetime="Mon, 14 Oct 2013 07:21:06 GMT",
  <http://web.archive.org/web/20131019030442/http://lanl.gov/>; rel="memento"; datetime="Sat, 19 Oct 2013 03:04:42 GMT",
  <http://web.archive.org/web/20131020082626/http://lanl.gov/>; rel="next memento"; datetime="Sun, 20 Oct 2013 08:26:26 GMT",
  <http://web.archive.org/web/20170201114455/http://lanl.gov/>; rel="last memento"; datetime="Wed, 01 Feb 2017 11:44:55 GMT"

Figure 14.8 Negotiating with a TimeGate for a Memento of http://www.lanl.gov/ close to October 16, 2013.

In Figure 14.8, the TimeGate issues a redirection to URI-M http://web.archive.org/web/20131019030442/http://lanl.gov/, which, although off by three days (October 19, 2013), is the closest available Memento to October 16, 2013 that the Internet Archive has.

The Memento Protocol also applies to transactional archives, such as LANL’s SiteStory (Brunelle et al., 2013), and content management systems that act as their own archives, such as wikis. Figure 14.9 shows the HTTP response from the World Wide Web Consortium (W3C, a standards body for the web) wiki, which implements our Memento for MediaWiki extension (Jones et al., 2014), and the server is specifying its own TimeGate for the URI-R https://www.w3.org/wiki/SpecProd/Restyle. One could consult the Internet Archive to discover Mementos for this URI-R, but since this is a MediaWiki, the wiki is its own archive and is authoritative on which versions existed at different points in time (see Jones et al., 2016a for an analysis of missed updates and redundant Mementos when web archives interact with wikis). Figure 14.10 shows datetime negotiation requesting a Memento one second before the Memento-Datetime for the last Memento in the wiki. If this were a conventional web archive that did not know the complete version history, it would use mindist and choose the Memento that was one second into the future. But since this is a content management system with a complete version history, it selects the closest Memento with a Memento-Datetime value less than or equal to the Accept-Datetime value, because that is the version that was the current one at the requested datetime. This algorithm is known as ‘minpast’ for ‘Minimum Past’, and in Figure 14.10 it results in the redirection to a Memento from October 2014, roughly a year and a half in the past, instead of one second in the future, which the mindist algorithm would have chosen.

The entire Memento framework comes together as shown in Figure 14.11, where ideally the Original Resource links to the TimeGate (URI-G); if this link is not provided, the user’s browser can be configured to know the location of a suitable TimeGate. It is the TimeGate’s job to handle the datetime negotiation and redirect the user to the closest available Memento. The Mementos in the web archive provide machine-readable links to each other, as well as back to the Original Resource, allowing for seamless navigation between the current and past webs. Because the Memento Protocol is an

extension of HTTP and content negotiation, it retains HTTP’s compliance with the architectural principles of Representational State Transfer (REST) and Hypermedia as the Engine of Application State (HATEOAS) (Fielding, 2000). In its simplest form, this means that clients interact with the state of resources using only the methods defined in HTTP (i.e., REST) and ‘follow their nose’, interacting with self-describing resources to discover navigable links to related, typed resources, such as other Mementos, TimeGates, and TimeMaps (i.e., HATEOAS). In summary, the Memento Protocol is not a service separate and apart from a web archive; it is embedded within the normal HTTP operations of a web archive.

$ curl -I https://www.w3.org/wiki/SpecProd/Restyle
HTTP/1.1 200 OK
Link: <https://www.w3.org/wiki/SpecProd/Restyle>; rel="original latest-version",
  <https://www.w3.org/wiki/Special:TimeGate/SpecProd/Restyle>; rel="timegate",
  <https://www.w3.org/wiki/Special:TimeMap/SpecProd/Restyle>; rel="timemap"; type="application/link-format"; from="Thu, 08 Dec 2011 20:09:41 GMT"; until="Fri, 11 Mar 2016 16:02:35 GMT",
  <https://www.w3.org/wiki/index.php?title=SpecProd/Restyle&oldid=55833>; rel="first memento"; datetime="Thu, 08 Dec 2011 20:09:41 GMT",
  <https://www.w3.org/wiki/index.php?title=SpecProd/Restyle&oldid=97718>; rel="last memento"; datetime="Fri, 11 Mar 2016 16:02:35 GMT"
Content-language: en
Last-Modified: Mon, 13 Feb 2017 09:44:38 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 22688
Date: Mon, 13 Feb 2017 17:08:03 GMT

Figure 14.9 HTTP response with Memento headers from the W3C MediaWiki.

MEMENTO AND AGGREGATING MULTIPLE ARCHIVES

In Figure 14.8, the Internet Archive did not have a Memento for the exact date of October 16, 2013 and redirected to one of October 19 instead. For many applications, this three-day difference is inconsequential, but what if it were essential to view the state of that URI-R on October 16, 2013? Users could query different web archives separately, but keeping track of the publicly accessible web archives that natively support the Memento Protocol would require more knowledge than most users possess. For example, the earliest Memento for the Smithsonian Institution’s home page (http://www.si.edu/) is in the Portuguese Web Archive and not the Internet Archive (October 1996 vs. May 1997) (Fuhrig, 2014). Too much time has passed to say definitively why the Portuguese Web Archive has the earliest page, but the Memento Protocol facilitated the discovery of the Memento in an archive that many people in the United States may not have known about.

Fortunately, the Memento Protocol makes it easy to combine various web archives into aggregators, each of which provides a single TimeGate and a single TimeMap for all the web archives that it aggregates (Sanderson, 2012). For example, Los Alamos National Laboratory runs an aggregator available at: http://timetravel.mementoweb.org. To revisit the query in Figure 14.8, finding a Memento of http://www.lanl.gov/ close to October 16, 2013, Figure 14.12 shows the response from the TimeTravel aggregator.

Since the aggregator at timetravel.mementoweb.org currently knows of approximately 30 publicly accessible web archives, it can redirect the client to exactly the desired Memento for http://www.lanl.gov/ on October 16, 2013. These examples are from the publicly accessible aggregator at timetravel.mementoweb.org, but there is no requirement to use this aggregator. Old Dominion University (ODU) runs a

$ curl -I -L -H "Accept-Datetime: Fri, 11 Mar 2016 16:02:34 GMT" https://www.w3.org/wiki/Special:TimeGate/SpecProd/Restyle
HTTP/1.1 302 Found
Vary: Accept-Encoding,Accept-Datetime
Location: https://www.w3.org/wiki/index.php?title=SpecProd/
Restyle&oldid=77827
Link: <https://www.w3.org/wiki/Special:TimeMap/SpecProd/Restyle>;
rel=”timemap”; type=”application/link-format”; from=”Thu, 08 Dec 2011
20:09:41 GMT”; until=”Fri, 11 Mar 2016 16:02:35 GMT”,<https://www.w3.org/
wiki/index.php?title=SpecProd/Restyle&oldid=55833>; rel=”first memento”;
datetime=”Thu, 08 Dec 2011 20:09:41 GMT”,<https://www.w3.org/wiki/index.
php?title=SpecProd/Restyle&oldid=97718>; rel=”last memento”; datetime=”Fri,
11 Mar 2016 16:02:35 GMT”,<https://www.w3.org/wiki/SpecProd/Restyle>;
rel=”original latest-version”
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Mon, 13 Feb 2017 17:17:07 GMT

HTTP/1.1 200 OK
Memento-Datetime: Fri, 10 Oct 2014 04:07:37 GMT
Link: <https://www.w3.org/wiki/SpecProd/Restyle>; rel=”original latest-
version”,<https://www.w3.org/wiki/Special:TimeGate/SpecProd/Restyle>;
rel=”timegate”,<https://www.w3.org/wiki/Special:TimeMap/SpecProd/Restyle>;
rel=”timemap”; type=”application/link-format”; from=”Thu, 08 Dec 2011
20:09:41 GMT”; until=”Fri, 11 Mar 2016 16:02:35 GMT”,<https://www.w3.org/
wiki/index.php?title=SpecProd/Restyle&oldid=55833>; rel=”first memento”;
datetime=”Thu, 08 Dec 2011 20:09:41 GMT”,<https://www.w3.org/wiki/index.
php?title=SpecProd/Restyle&oldid=97718>; rel=”last memento”; datetime=”Fri,
11 Mar 2016 16:02:35 GMT”
Content-language: en
Vary: Accept-Encoding,Cookie
Content-Type: text/html; charset=UTF-8
Content-Length: 21235
Date: Mon, 13 Feb 2017 17:17:07 GMT

Figure 14.10 Datetime negotiation with a MediaWiki TimeGate for one second
before the latest Memento; MediaWiki uses the minpast algorithm
instead of mindist.
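
The difference between the two selection rules can be made concrete with a short Python sketch. It is illustrative only and is not the MediaWiki extension's code: the function names are assumptions, the version list contains just the three datetimes visible in Figure 14.10, and the fallback to the earliest version when nothing precedes the request is an assumption of this sketch.

```python
from email.utils import parsedate_to_datetime


def mindist(requested, versions):
    """Closest version in either direction (what a conventional archive would use)."""
    return min(versions, key=lambda v: abs(v - requested))


def minpast(requested, versions):
    """Latest version at or before the requested datetime."""
    past = [v for v in versions if v <= requested]
    # Assumption: if the request predates every version, return the earliest one.
    return max(past) if past else min(versions)


# Datetimes visible in Figure 14.10: first Memento, selected Memento, last Memento.
versions = [
    parsedate_to_datetime(value)
    for value in (
        "Thu, 08 Dec 2011 20:09:41 GMT",
        "Fri, 10 Oct 2014 04:07:37 GMT",
        "Fri, 11 Mar 2016 16:02:35 GMT",
    )
]
requested = parsedate_to_datetime("Fri, 11 Mar 2016 16:02:34 GMT")

print("minpast:", minpast(requested, versions).isoformat())
print("mindist:", mindist(requested, versions).isoformat())
```

With these inputs, minpast returns the 10 October 2014 version, matching the Memento-Datetime in Figure 14.10, while mindist returns the 11 March 2016 version that lies one second in the future.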

Figure 14.11 Architectural overview of how the Memento framework allows a representation of a prior state of a resource to be accessed.

separate aggregator at memgator.cs.odu.edu, and a query to this aggregator shows that Mementos for http://www.lanl.gov/ can be found in at least eight publicly accessible web archives with native Memento Protocol support (Figure 14.13). While the Internet Archive (archive.org) clearly has the most with 1,810 Mementos and Archive-It (a separate subscription service of the Internet Archive) is second with 311, there are still more than 250 Mementos in other web archives.

$ curl -I -H "Accept-Datetime: Wed, 16 Oct 2013 22:59:48 GMT" http://timetravel.mementoweb.org/timegate/http://www.lanl.gov/
HTTP/1.1 302 Moved Temporarily
Server: nginx/1.10.1
Date: Thu, 09 Feb 2017 17:50:37 GMT
Content-Type: text/plain; charset=iso-8859-1
Content-Length: 0
Location: http://archive.is/20131016225948/http://www.lanl.gov/
Vary: Accept-Datetime
Last-Modified: Wed, 08 Feb 2017 04:27:12 GMT
Link: <http://www.lanl.gov/>;rel="original",
  <http://timetravel.mementoweb.org/timemap/link/http://www.lanl.gov/>;rel="timemap"; type="application/link-format",
  <http://archive.is/20131016225948/http://www.lanl.gov/>;rel="memento"; datetime="Wed, 16 Oct 2013 22:59:48 GMT",
  <http://web.archive.bibalex.org:80/web/19961221031231/http://lanl.gov/>;rel="memento first"; datetime="Sat, 21 Dec 1996 03:12:31 GMT",
  <http://web.archive.org/web/20170201114455/http://lanl.gov/>;rel="memento last"; datetime="Wed, 01 Feb 2017 11:44:55 GMT"

Figure 14.12 A response from an aggregated TimeGate, redirecting to http://archive.is/20131016225948/http://www.lanl.gov/.

$ curl --silent http://memgator.cs.odu.edu/timemap/link/http://www.lanl.gov/ | grep datetime | awk '{print $1}' | awk -v FS=/ '{print $3}' | sort | uniq -c
  36 archive.is
   3 arquivo.pt
   6 swap.stanford.edu
 311 wayback.archive-it.org
   1 wayback.vefsafn.is
1810 web.archive.org
 228 webarchive.loc.gov
   1 webarchive.parliament.uk

Figure 14.13 The processed TimeMap showing the hostnames of the eight public web archives with Mementos for http://www.lanl.gov/ and their respective Memento counts.

The MemGator software is available for download and local installation (Alam and Nelson, 2016), making it possible to set up a Memento aggregator with custom sets of web archives, suitable for local or private deployments.

RIGHT-CLICK TO THE PAST

To this point, we have discussed the Memento Protocol in terms of HTTP interactions, using the curl command line user-agent and examining raw HTTP responses. While this is necessary to understand the mechanics of the Memento Protocol, there is a range of more user-friendly clients and services using the Memento Protocol that provide a more seamless integration of the current and past web.

The easiest way to get started is to visit http://timetravel.mementoweb.org/ and input the desired URI-R and datetime. In Figure 14.14, we use http://www.lanl.gov/ and 2013-10-16, respectively; this is effectively the user-friendly version of the curl request shown in Figure 14.12. In Figure 14.15, we see the results, sorted by web archives with a Memento closest to the desired datetime,
Figure 14.14 A request to the TimeTravel service with URI-R = http://www.lanl.gov/
and datetime=2013-10-16.

Figure 14.15 The response to the request shown in Figure 14.14, with seven archives holding Mementos for this URI-R (available at: http://timetravel.mementoweb.org/list/20131016000000/http://www.lanl.gov/).
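
The per-archive counts in Figure 14.13 come from a shell pipeline over an aggregated TimeMap. As a rough Python equivalent, the sketch below counts Mementos per archive hostname; the function name `count_mementos_by_host` is an assumption, and the sample input is a hand-assembled fragment built from URI-Ms that appear in the earlier figures, not a real aggregator response.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def count_mementos_by_host(timemap_text):
    """Count TimeMap entries that carry a datetime, grouped by archive hostname."""
    counts = Counter()
    for line in timemap_text.splitlines():
        if "datetime" not in line:  # same filter as `grep datetime`
            continue
        match = re.match(r"\s*<([^>]+)>", line)
        if match:
            counts[urlsplit(match.group(1)).hostname] += 1
    return counts


# Hand-assembled fragment for illustration only.
sample = '''<http://www.lanl.gov/>; rel="original",
<http://web.archive.org/web/19961221031231/http://lanl.gov/>; rel="first memento"; datetime="Sat, 21 Dec 1996 03:12:31 GMT",
<http://archive.is/20131016225948/http://www.lanl.gov/>; rel="memento"; datetime="Wed, 16 Oct 2013 22:59:48 GMT",
<http://web.archive.org/web/20131019030442/http://lanl.gov/>; rel="memento"; datetime="Sat, 19 Oct 2013 03:04:42 GMT",
'''

counts = count_mementos_by_host(sample)
for host, n in sorted(counts.items()):
    print(n, host)
```

Unlike the awk version, `urlsplit(...).hostname` drops an explicit port, so a host written with `:80` is counted together with the same host written without it.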

with the archive.is Memento at the top of the list, followed by the Internet Archive, and a total of seven different archives (visible when the page is scrolled to the bottom). The archives differ slightly from the ODU aggregator result shown in Figure 14.13, in part because of a prediction interface employed by the LANL Time Travel service, which is discussed below.

While the Time Travel service is an attractive and easy-to-use interface, it is still a destination separate from the live web, meaning that users need to know of its location and choose to navigate there to explore the past web. While this is suitable for extended sessions of interacting with the past web, such sessions do not reflect how most humans use web archives. In a study of accesses to the Internet Archive, AlNoamany found that 82% of human sessions begin with referrals from live web pages (the majority of which come from Wikipedia), and 86% of those links no longer exist on the live web (AlNoamany, 2014). In short, web archives are primarily used as a versioning system to compensate for failures in the live web, so we should have user agents that support this modality.

There are several clients for providing Memento-based access, including the original MementoFox (now deprecated) (Sanderson et al., 2011), Mink (which also facilitates archiving of live web pages) (Kelly et al., 2014b), and Memento for iOS (Tweedy et al., 2013). We will focus on the ‘Memento for Chrome’ extension, which provides Chrome users the capability to ‘right-click to the past’ (Nelson, 2013). Regular clicks provide the expected navigation (i.e., staying on the live web), but users can right-click on a link, or just in the middle of the page for the current URI, to seamlessly access the same functionality shown in Figures 14.12, 14.14, and 14.15. Figure 14.16 shows the user setting the desired datetime (October 16, 2013) with a calendar widget (the extension will use this value in the Accept-Datetime request header). Figure 14.17 shows the user right-clicking in the middle of the page, indicating the desired URI-R is ‘this’ page, i.e., http://www.lanl.gov/. There are several datetime options, but here the user is selecting the datetime as set in Figure 14.16. After selecting the link in Figure 14.17, the client then communicates with the Time Travel aggregator (effectively issuing the request shown in Figure 14.12), and the result is that the client is directed to the URI-M http://archive.is/20131016225948/http://www.lanl.gov/ (Figure 14.18). To navigate back to the live web, the user right-clicks in the middle of the page again and selects ‘get at current date’ (Figure 14.19). The described interactions are also possible for links in pages and embedded resources such as images.

ONGOING RESEARCH

The Memento Protocol has opened up entirely new areas of research regarding inter-archive access and collaboration. Though not an exhaustive list, we provide an overview of some of our recent research activities which the Memento Protocol has made possible and, in some cases, necessary.

Routing URI Requests to the Proper Web Archives

When there was only one archive (i.e., the Internet Archive), access was simple: the archive either had the desired Memento or it did not. The addition of multiple web archives, which began in earnest in the mid to late 2000s (Bailey et al., 2013; Gomes et al., 2011), increases the complexity significantly. Either the user has to navigate to different web archives and check for existence, which is limited by the user’s time and knowledge of the archives, or a service has to aggregate access to these archives (cf. the Time Travel service in Figures 14.14 and 14.15). When the number of web archives an aggregator has to select from is small, it is easy to broadcast the URI lookup to all

Figure 14.16 Setting the datetime to October 16, 2013 for http://www.lanl.gov/.

known web archives and then synthesize the results. As the number of web archives grows this becomes untenable, because the combined response must wait for the slowest archive to respond, while all along most of the web archives do not hold the desired Memento (i.e., a 404 HTTP response). This problem is well known in the distributed search community as the query routing problem (Callan, 2002), even though URI lookups are not exactly like full-text queries in that the archives either have the Memento or they do not, and there is no ranked list of potentially relevant hits. For web archives, we want to predict where to send URI lookup requests such that we only query archives which are likely to have Mementos for the requested Original URI. This is not as obvious as it first seems: crawling scope is notoriously difficult and imprecise, and archives often capture more than they intended.

In our earliest research in URI routing (AlSum et al., 2013, 2014), we found that 52% of the time we can still produce a complete TimeMap querying only the top three (of a possible 12 available at that time) web archives even if we exclude the Internet Archive. Querying the top six web archives (excluding the Internet Archive) produces a complete TimeMap 77% of the time. In summary, despite a growing number of web archives, we need only query a few of them to get the complete list of all available Mementos, and even if the Internet Archive were to disappear, over one-half of the Original URIs would be unaffected.
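
To make the routing idea concrete, the Python sketch below ranks archives for a URI lookup using a toy profile of domain suffixes. Everything in it is hypothetical: the archive names, the `PROFILES` table, the function `candidate_archives`, and the suffix-matching score are invented for illustration and do not reproduce the profiling methods of the cited studies.

```python
from urllib.parse import urlsplit

# Hypothetical profiles: archive name -> domain suffixes it is believed to hold.
PROFILES = {
    "archive-a.example": {"gov", "lanl.gov"},
    "archive-b.example": {"pt"},
    "archive-c.example": {"gov", "edu"},
}


def candidate_archives(uri_r, profiles):
    """Return archives whose profile matches the URI's host, most specific match first."""
    host = urlsplit(uri_r).hostname or ""
    labels = host.split(".")
    # For www.lanl.gov this yields {"www.lanl.gov", "lanl.gov", "gov"}.
    suffixes = {".".join(labels[i:]) for i in range(len(labels))}
    scored = []
    for archive, held in profiles.items():
        matches = suffixes & held
        if matches:
            # Score by the longest matching suffix, counted in labels.
            scored.append((max(len(s.split(".")) for s in matches), archive))
    # Highest score first; the archive name breaks ties deterministically.
    return [archive for _, archive in sorted(scored, key=lambda t: (-t[0], t[1]))]


print(candidate_archives("http://www.lanl.gov/", PROFILES))
# prints ['archive-a.example', 'archive-c.example']
```

An aggregator built along these lines would send the lookup only to the returned archives rather than broadcasting it, accepting the risk of missing a Memento held by an archive whose profile is incomplete.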

Figure 14.17 Right-clicking in the middle of the page to expose datetime negotiation
options for http://www.lanl.gov/.

Of course, the question is: how do we know where to send the URI lookups? We have explored a variety of methods, including building web archive profiles using their CDX files (produced for playback in the Wayback Machine archival playback software) and HTTP access logs of web archives (Alam et al., 2015; Alam et al., 2016b), machine learning techniques based on training data from responses of web archives for a fixed set of URI lookups (Bornand et al., 2016) (this method is in production in the Time Travel service shown in Figures 14.14 and 14.15), and, for web archives that have a full-text index, extracting terms from archived documents and using those as queries back into the archive (Alam et al., 2016a).

All of these methods have trade-offs. Using CDX files and HTTP access logs requires cooperation from the participating archives to make those files available for processing into a profile. Profiles learned from responses to lookups are sensitive to the URIs used in the training phase and thus will not reveal Original URIs that you do not know to ask for. Using keywords extracted from documents assumes a full-text search interface is available (which is often the case only in smaller archives) and requires a large number of full-text queries to extract enough documents to profile an archive’s holdings. Our current research in this area involves developing approaches that are hybrids of the methods described above, as well as addressing

optimum approaches for updates and reevaluating baseline profiles.

Figure 14.18 The user is now at the Memento http://archive.is/20131016225948/http://www.lanl.gov/.

How to Use Multiple Archives

Figures 14.12, 14.14, and 14.15 show an aggregator consulting multiple web archives and then redirecting the requesting client to the web archive that contains the closest available Memento. But once the client follows that redirection, all the requests for embedded resources are relative to the archive holding the root HTML page (in the case of Figures 14.12, 14.14, and 14.15, all requests are relative to http://archive.is/). This means that if there are links, images, or other embedded resources missing, the client is unable to take advantage of the other web archives that might hold those resources.

Some embedded resources, links, or even entire sites might not be present in the Internet Archive. One reason an entire site might not be present is that the site, for any number of reasons, is using a ‘robots.txt’ file on the live web to limit access by the Internet Archive, for both future crawls and any content it currently holds (Rossi, 2016). Figure 14.20 shows the Internet Archive’s response for the URI-R https://www.quora.com/, and Figure 14.21 shows that there are several hundred Mementos in other web archives that do not implement the Internet Archive’s robots.txt policy.

Figure 14.19 Right-clicking in the middle of the Memento to go back to the live web
(i.e., from http://archive.is/20131016225948/http://www.lanl.gov/ back to http://www.
lanl.gov/).

Memento Quality and Temporal Coherence

Although the gold standard for assessing web archiving quality is still human interaction with a Memento to ensure all embedded resources, links, and functionality are preserved, this level of assessment is clearly not scalable. We have been involved with a range of automated evaluations of the web archiving process, including the Archival Acid Test (Kelly et al., 2014a), which evaluates the capabilities of crawling and playback technology stacks (e.g., the Heritrix crawler (Mohr et al., 2004) and the Wayback Machine playback engine), and assessing Memento damage (Brunelle et al., 2014, 2015), which assigns weights to missing embedded resources based on heuristics for determining if the missing resource was ‘important’.

Of particular note is our work on ‘Temporal Violations’, which arise when root HTML pages and embedded resources (e.g., images, CSS, JavaScript) are combined during archival replay to produce a composite that never existed on the live web (Ainsworth et al., 2014; 2015). This can happen when, for example, the embedded images are modified between the time the root HTML page was crawled and the time the images themselves were crawled.

Figure 14.20 The Internet Archive may hold Mementos for https://www.quora.com/
but is blocking them due to the directives found in https://www.quora.com/robots.txt.

$ curl --silent -i http://memgator.cs.odu.edu/timemap/link/https://www.quora.com/ | grep datetime | awk '{print $1}' | awk -v FS=/ '{print $3}' | sort | uniq -c
  44 archive.is
   3 arquivo.pt
 219 wayback.archive-it.org
  77 wayback.vefsafn.is
 182 webarchive.loc.gov
   4 webarchive.nationalarchives.gov.uk
   6 webarchive.parliament.uk

Figure 14.21 https://www.quora.com/ is not in the Internet Archive but is archived 500+ times in seven other archives.

We defined four categories for discussing the temporal coherence of composite Mementos (root HTML plus embedded resources):

1 Prima Facie Coherent – the combination of embedded resources and a root HTML page can be shown to have existed as presented at the time of crawling (i.e., Memento-Datetime).
2 Prima Facie Violative – the embedded resources can be shown to have been modified since the Memento-Datetime of the root HTML page.
3 Possibly Coherent – the embedded resources have Memento-Datetimes earlier than their root HTML page, and although they cannot be shown to be coherent, they are possibly coherent because embedded resources are typically static.
4 Probably Violative – the embedded resources have Memento-Datetimes later than their root HTML page, and although they cannot be shown to be violative, they are probably violative because root HTML pages are often dynamically generated and modified at a faster rate than embedded resources.

$ curl -I http://web.archive.org/web/20141113140512im_/http://www.lanl.gov/_assets/images/lanl-logo-footer.png
HTTP/1.1 200 OK
Server: Tengine/2.1.0
Date: Mon, 13 Feb 2017 05:46:48 GMT
Content-Type: image/png
Content-Length: 8719
Connection: keep-alive
Memento-Datetime: Thu, 13 Nov 2014 14:05:12 GMT
Link: <http://www.lanl.gov/_assets/images/lanl-logo-footer.png>; rel=”original”,
<http://web.archive.org/web/timemap/link/http://www.lanl.gov/_assets/images/
lanl-logo-footer.png>; rel=”timemap”; type=”application/link-format”,
<http://web.archive.org/web/http://www.lanl.gov/_assets/images/lanl-logo-
footer.png>; rel=”timegate”, <http://web.archive.org/web/20120912040310/
http://www.lanl.gov/_assets/images/lanl-logo-footer.png>; rel=”first
memento”; datetime=”Wed, 12 Sep 2012 04:03:10 GMT”, <http://web.archive.
org/web/20141009211301/http://www.lanl.gov/_assets/images/lanl-logo-footer.
png>; rel=”prev memento”; datetime=”Thu, 09 Oct 2014 21:13:01 GMT”, <http://
web.archive.org/web/20141113140512/http://www.lanl.gov/_assets/images/lanl-
logo-footer.png>; rel=”memento”; datetime=”Thu, 13 Nov 2014 14:05:12 GMT”,
<http://web.archive.org/web/20141122060334/http://www.lanl.gov/_assets/
images/lanl-logo-footer.png>; rel=”next memento”; datetime=”Sat, 22 Nov 2014
06:03:34 GMT”, <http://web.archive.org/web/20170204070650/http://www.lanl.
gov/_assets/images/lanl-logo-footer.png>; rel=”last memento”; datetime=”Sat,
04 Feb 2017 07:06:50 GMT”
X-Archive-Orig-last-modified: Tue, 28 Oct 2014 22:12:02 GMT
X-Archive-Orig-content-type: image/png
X-Archive-Orig-date: Thu, 13 Nov 2014 14:03:55 GMT
X-Archive-Orig-content-length: 8719

Figure 14.22 The Memento-Datetime and X-Archive-Orig-last-modified headers establish a range of temporal validity.

The combination of Last-Modified and Memento-Datetime headers is critical for determining a range of temporal validity for an embedded resource with respect to a root HTML page in which it appears. Embedded resources, such as images, CSS, and JavaScript, tend to be static files with Last-Modified response headers available (cf. Figure 14.1), and most HTML files are dynamically generated and thus rarely have the Last-Modified response header (cf. Figure 14.2). For example, in the Memento http://web.archive.org/web/20141031175656/https://www.lanl.gov/, with a Memento-Datetime of ‘Fri, 31 Oct 2014 17:56:56 GMT’, the logo at the bottom of the page (http://web.archive.org/web/20141113140512im_/http://www.lanl.gov/_assets/images/lanl-logo-footer.png) has a Memento-Datetime of ‘Thu, 13 Nov 2014 14:05:12 GMT’, which means it was crawled after the root HTML page in which it appears. However, as shown in Figure 14.22, the logo’s Last-Modified value at the time it was crawled is echoed in the ‘X-Archive-Orig-last-modified’ response header (indeed, many of the original HTTP response headers are echoed by the web archive with the prefix ‘X-Archive-Orig-’). Because that value (October 28, 2014) precedes the Memento-Datetime of the root page, the logo can be shown to have been unchanged from before the root page was crawled until the logo itself was crawled.

In Figure 14.23, we see a NOAA page with a Memento-Datetime of January 29, 1999 with an embedded image as the primary content. However, when we dereference the URI-M for the image, we see that the Last-Modified and Memento-Datetime headers (Figure 14.24) have values of April 10, 2003. This combination of HTML page and embedded JPEG never existed on the live web.
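
The four categories and the two examples above can be expressed as a small decision function. The Python sketch below is one reading of the definitions, not the classifier used in the cited study: the function name `classify`, the treatment of identical Memento-Datetimes as coherent, and the use of the 14-digit timestamp in the NOAA URI-M as the root page's Memento-Datetime are all assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def classify(root_md, embedded_md, embedded_lm=None):
    """Classify one embedded resource relative to its root HTML page.

    root_md / embedded_md are Memento-Datetimes; embedded_lm is the embedded
    resource's Last-Modified at crawl time, or None when the archive omits it.
    """
    if embedded_md == root_md:  # assumption: captured at the same instant
        return "Prima Facie Coherent"
    if embedded_lm is not None:
        if embedded_lm > root_md:  # changed after the root page was captured
            return "Prima Facie Violative"
        if embedded_md > root_md:  # unchanged across the root capture time
            return "Prima Facie Coherent"
    if embedded_md < root_md:
        return "Possibly Coherent"
    return "Probably Violative"


# LANL logo example (Figure 14.22).
lanl = classify(
    parsedate_to_datetime("Fri, 31 Oct 2014 17:56:56 GMT"),
    parsedate_to_datetime("Thu, 13 Nov 2014 14:05:12 GMT"),
    parsedate_to_datetime("Tue, 28 Oct 2014 22:12:02 GMT"),
)

# NOAA image example (Figures 14.23 and 14.24); root datetime taken from the URI-M.
noaa_root = datetime.strptime("19990129040356", "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
noaa = classify(
    noaa_root,
    parsedate_to_datetime("Thu, 10 Apr 2003 20:31:38 GMT"),
    parsedate_to_datetime("Thu, 10 Apr 2003 20:05:23 GMT"),
)

print(lanl)  # prints Prima Facie Coherent
print(noaa)  # prints Prima Facie Violative
```

When the Last-Modified value is missing, the function can only fall back on the ordering of the two Memento-Datetimes, which is why archives that do not echo that header push results toward the weaker ‘Possibly’ and ‘Probably’ categories.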

Figure 14.23 Memento http://web.archive.org/web/19990129040356/http://www.goes.noaa.gov/browsh2.html.

In our study of temporal violations (Ainsworth et al., 2015), we found that approximately 76% of composite Mementos were complete (i.e., missing no embedded Mementos), and utilizing additional Memento-enabled web archives could raise that number to 80% complete. More concerning is that 6% of composite Mementos are Prima Facie Violative and 2.5% are Probably Violative. When multiple archives are used, the share of Probably Violative composite Mementos actually rises to 5%, in part because not all web archives provide the X-Archive-Orig-last-modified response header (Ainsworth, 2015). So, while multiple archives increase the completeness, they can potentially decrease the temporal coherence. When taken all together, only 18% of composite Mementos are both complete and Prima Facie Coherent. In summary, web archives probably provide a faithful rendering of the past web only about one time in five.

Linking to Archives or the Live Web?

The existence of multiple web archives introduces the question, similar to the ‘appropriate copy’ problem of reference linking (Caplan and Arms, 1999), of which version to link to when writing HTML: the live web version, or an archived version, and, if an archived version, in which archive? Increased

$ curl -IL http://web.archive.org/web/19990129040356/http://www.goes.noaa.gov/GIFS/HUVS.JPG
HTTP/1.1 302 Found
Server: Tengine/2.1.0
Date: Tue, 14 Feb 2017 04:02:10 GMT
Content-Type: image/jpeg
Content-Length: 0
Location: /web/20030410203138/http://www.goes.noaa.gov/GIFS/HUVS.JPG
Link: <http://www.goes.noaa.gov:80/GIFS/HUVS.JPG>; rel=”original”

HTTP/1.1 200 OK
Server: Tengine/2.1.0
Date: Tue, 14 Feb 2017 04:02:11 GMT
Content-Type: image/jpeg
Content-Length: 141380
Memento-Datetime: Thu, 10 Apr 2003 20:31:38 GMT
Link: <http://www.goes.noaa.gov/GIFS/HUVS.JPG>; rel=”original”, <http://
web.archive.org/web/timemap/link/http://www.goes.noaa.gov/GIFS/HUVS.JPG>;
rel=”timemap”; type=”application/link-format”, <http://web.archive.org/
web/http://www.goes.noaa.gov/GIFS/HUVS.JPG>; rel=”timegate”, <http://web.
archive.org/web/20030410203138/http://www.goes.noaa.gov/GIFS/HUVS.JPG>;
rel=”first memento”; datetime=”Thu, 10 Apr 2003 20:31:38 GMT”, <http://web.
archive.org/web/20030602014329/http://www.goes.noaa.gov/GIFS/HUVS.JPG>;
rel=”next memento”; datetime=”Mon, 02 Jun 2003 01:43:29 GMT”, <http://web.
archive.org/web/20170201134641/http://www.goes.noaa.gov/GIFS/HUVS.JPG>;
rel=”last memento”; datetime=”Wed, 01 Feb 2017 13:46:41 GMT”
X-Archive-Orig-last-modified: Thu, 10 Apr 2003 20:05:23 GMT
X-Archive-Orig-content-type: image/jpeg
X-Archive-Orig-date: Thu, 10 Apr 2003 20:31:33 GMT
X-Archive-Orig-content-length: 141380
X-Archive-Orig-server: Apache/2.0.45 (Unix)

Figure 14.24 Prima Facie Violative: the embedded JPEG from Figure 14.23 was actually
modified and archived in 2003, not 1999.

interest in link rot (‘soft 404s’ (Bar-Yossef et al., 2004) or actual 404 HTTP responses) and content drift (the page persists, but the content is no longer relevant to the author’s original intent when creating the link) in our scholarly, legal, and other corpora has raised interest in linking directly to Mementos instead of Original URIs (Jones et al., 2016b; Klein et al., 2014; Zittrain et al., 2014). The current practice in sites like Wikipedia is to replace broken live-web links with links that point directly to a URI-M in the Internet Archive (AlNoamany, 2014). However, this fails to take advantage of the other public web archives that might also have copies. One could link to an aggregator, but there is a better way that allows providing both URI-R and URI-M values, along with a preferred datetime for use in archives the author of the HTML might not have knowledge of. This approach increases the chances of being able to revisit originally linked content by including information that can be used to look up Mementos in any archive (URI-R and archival datetime) in addition to the key that can only be used in a single archive (URI-M). This fallback mechanism is relevant because the brief history of web archives provides plenty of illustrations that their accessibility can be hindered by a wide range of challenges including technical failures, financial woes, take-down requests, and geopolitically induced censorship.

The Robust Links specification takes advantage of the HTML5 ‘data-’ attribute for extensible metadata fields not otherwise
defined in HTML5 (Van de Sompel et al., 2015). For example, if we wanted to link to the URI-R with an alternate link to a URI-M, we could write the HTML shown in Figure 14.25. The URI-R and value in the ‘data-versiondate’ attribute can be combined for URI lookups in any web archive, for example, using the Memento Protocol.

<a href="http://www.lanl.gov/"
   data-versionurl="http://archive.is/3IEj0"
   data-versiondate="2013-10-16">my robust link to the LANL home page</a>

Figure 14.25 Primary link is to URI-R, alternate link to URI-M, and a preferred datetime.

Similarly, it is possible to reverse the primary and alternate link. Figure 14.26 shows the primary link to an aggregator service that will dynamically determine and redirect the client to the closest available Memento; the URI-R is the alternate link, and the datetime is also provided. If that aggregator is no longer online, the URI-R and datetime values can be combined for discovery in other web archives or aggregators.

Variations of the examples in Figures 14.25 and 14.26 are possible, but all require the desired datetime to be expressed either in the ‘data-versiondate’ link attribute or in the datePublished or dateModified attributes for the HTML meta tag if the datetime is applicable to the entire page. A demonstration of Robust Links in action can be found in the links of one of our recent papers (Van de Sompel and Nelson, 2015).

<a href="http://timetravel.mementoweb.org/memento/20131016000000/http://www.lanl.gov/"
   data-originalurl="http://www.lanl.gov/"
   data-versiondate="2013-10-16">my robust link to the LANL home page</a>

Figure 14.26 Primary link is to an aggregator, alternate link to URI-R, and a preferred datetime.

CONCLUSIONS

The Memento Protocol is the de facto standard for integrating the now dozens of publicly accessible web archives, and it works equally well in intranets and private archives. For example, the browsers and tools we at LANL and ODU use to access the Internet Archive and other public web archives are the same tools we use to access our research group’s private wiki. The Memento Protocol is an implementation of datetime negotiation in HTTP, fulfilling one of the original design goals of Tim Berners-Lee that was deferred in part because of the lack of temporal semantics in the Unix filesystem. The Memento Protocol provides a standardized, machine-readable mechanism for bi-directional linkage between the current and past webs, where before there had just been an ad hoc set of conventions and archive-specific heuristics for naming and accessing the past web. TimeMaps provide lists of Mementos for an Original Resource, and TimeGates provide the datetime negotiation to the closest available Memento.

In 2006, Julien Masanes expressed a vision of an interconnected, global grid of web archives:

Such a grid should link Web archives so that they together form one global navigation space like the live Web itself. This is only possible if they are structured in a way close enough to the original Web and if they are openly accessible. (Masanes, 2006, p. 21)

The Memento Protocol achieves this goal by changing web archives from strictly destinations into infrastructure that supports the live web by providing a standardized and integrated versioning for the web indexed by global datetime. In the future when the web transitions from dozens to 100s or even 1,000s of web archives, public and private, the faithful and complete rendering of past web pages will depend on the ability to identify, query, and access the appropriate subset of web archives.
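As a minimal sketch of the fallback described for Figures 14.25 and 14.26, the following Python reads a Robust Link and derives the values a client could fall back on if the linked URI-M or aggregator is unavailable. The class name, function name, and dictionary keys are illustrative assumptions, not part of the Robust Links specification. The aggregator prefix is the one shown in Figure 14.26, the date is assumed to be a plain YYYY-MM-DD value as in both figures, and the Accept-Datetime value uses the HTTP date format that the Memento Protocol expects for datetime negotiation.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from html.parser import HTMLParser

# Aggregator prefix copied from Figure 14.26; treated here as an example endpoint.
AGGREGATOR_PREFIX = "http://timetravel.mementoweb.org/memento/"


class RobustLinkParser(HTMLParser):
    """Collects the attributes of each <a> element that has href and data-versiondate."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and "href" in attr_map and "data-versiondate" in attr_map:
            self.links.append(attr_map)


def lookup_keys(link):
    """Return the values needed to look up a Memento in any archive (hypothetical helper)."""
    # The URI-R is data-originalurl when the primary link points at an archive or
    # aggregator (Figure 14.26); otherwise it is the href itself (Figure 14.25).
    uri_r = link.get("data-originalurl", link["href"])
    # Assumption: the date has no time component, so midnight UTC is used.
    when = datetime.strptime(link["data-versiondate"], "%Y-%m-%d").replace(
        tzinfo=timezone.utc
    )
    return {
        "uri_r": uri_r,
        # Only present when the link carries an explicit alternate URI-M.
        "uri_m": link.get("data-versionurl"),
        # Header value for a TimeGate request using datetime negotiation.
        "accept_datetime": format_datetime(when, usegmt=True),
        # URI in the aggregator pattern shown in Figure 14.26.
        "aggregator_uri": AGGREGATOR_PREFIX + when.strftime("%Y%m%d%H%M%S") + "/" + uri_r,
    }


if __name__ == "__main__":
    parser = RobustLinkParser()
    parser.feed(
        '<a href="http://www.lanl.gov/" '
        'data-versionurl="http://archive.is/3IEj0" '
        'data-versiondate="2013-10-16">my robust link to the LANL home page</a>'
    )
    for name, value in lookup_keys(parser.links[0]).items():
        print(name, value)
```

The sketch deliberately stops short of making network requests: it only derives the URI-R, the datetime, and a candidate aggregator URI, which are the pieces the text identifies as reusable across archives.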
The history of the early twenty-first century cannot be told without significant evidence from web archives. The websites and their contents are fleeting, and are too culturally important to be left to the care of a single web archive. The web is distributed and largely uncoordinated, with interoperability emerging from simple protocols such as HTTP. So too must be our web archives: distributed and largely uncoordinated, with interoperability made possible via time semantics that are part of HTTP and finally transcend the limitations of the Unix operating system.

ACKNOWLEDGMENTS

The Memento Protocol was originally supported by grants from the Library of Congress. Additional research was supported in part by NSF IIS 1009392, NSF IIS 1526700, and the Andrew Mellon Foundation. Numerous people supported the Memento Protocol through software development and installation, many of whom appear below as our co-authors. Figures 14.16 through 14.19 are courtesy of the Los Alamos National Laboratory.

REFERENCES

Ainsworth, S.G. (2015) Original header replay considered coherent. [blog] Web Science and Digital Library Research Group. Available at: http://ws-dl.blogspot.com/2015/08/2015-08-28-original-header-replay.html [Accessed 1 Mar. 2018].
Ainsworth, S.G., Nelson, M.L., and Van de Sompel, H. (2015) ‘Only one out of five archived web pages existed as presented’, in: HT ‘15, Proceedings of the 26th ACM Conference on Hypertext and Hypermedia, [online] Guzelyurt: ACM, pp. 257–266. Available at: https://doi.org/10.1145/2700171.2791044 [Accessed 1 Mar. 2018].
Ainsworth, S.G., Nelson, M.L., and Van de Sompel, H. (2014) A framework for evaluation of composite memento temporal coherence. [online] Available at: http://arxiv.org/abs/1402.0928 [Accessed 1 Mar. 2018].
Alam, S., and Nelson, M.L. (2016) ‘Memgator – a portable concurrent memento aggregator: Cross-platform CLI and server binaries in Go’, in: JCDL ‘16, Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries, [online] Newark: ACM, pp. 243–244. Available at: https://doi.org/10.1145/2910896.2925452 [Accessed 1 Mar. 2018].
Alam, S., Nelson, M.L., Van de Sompel, H., Balakireva, L., Shankar, H., and Rosenthal, D.S.H. (2015) ‘Web archive profiling through CDX summarization’, in: TPDL ‘15, Proceedings of Theory and Practice of Digital Libraries, [online] Poznań: Springer, pp. 3–14. Available at: https://doi.org/10.1007/978-3-319-24592-8_1 [Accessed 1 Mar. 2018].
Alam, S., Nelson, M.L., Van de Sompel, H., and Rosenthal, D.S.H. (2016a) ‘Web archive profiling through fulltext search’, in: TPDL ‘16, Proceedings of Theory and Practice of Digital Libraries, [online] Hannover: Springer, pp. 121–132. Available at: https://doi.org/10.1007/978-3-319-43997-6_10 [Accessed 1 Mar. 2018].
Alam, S., Nelson, M.L., Van de Sompel, H., Balakireva, L.L., Shankar, H., and Rosenthal, D.S.H. (2016b) ‘Web archive profiling through CDX summarization’, International Journal on Digital Libraries, [online] 17(3):223–228. Available at: https://doi.org/10.1007/s0079 [Accessed 1 Mar. 2018].
AlNoamany, Y. (2014) ‘Using Web Archives to Enrich the Live Web Experience Through Storytelling’. PhD. Old Dominion University, Department of Computer Science.
AlSum, A., Weigle, M.C., Nelson, M.L., and Van de Sompel, H. (2013) ‘Profiling web archive coverage for top-level domain and content language’, in: TPDL ‘13, Proceedings of Theory and Practice of Digital Libraries, [online] Valletta: Springer, pp. 60–71. Available at: https://doi.org/10.1007/978-3-642-40501-3_7 [Accessed 1 Mar. 2018].
AlSum, A., Weigle, M.C., Nelson, M.L., and Van de Sompel, H. (2014) ‘Profiling web archive coverage for top-level domain and content language’, International Journal on Digital Libraries, [online] 14(3):149–166.
Available at: https://doi.org/10.1007/s00799-014-0118-y [Accessed 1 Mar. 2018].
Bailey, J., Grotke, A., Hanna, K., Hartman, C., McCain, E., Moffatt, C., and Taylor, N. (2013) Web archiving in the United States: A 2013 survey. [online] Available at: https://blogs.loc.gov/thesignal/2014/10/results-from-the-2013-ndsa-u-s-web-archiving-survey/ [Accessed 1 Mar. 2018].
Bar-Yossef, Z., Broder, A.Z., Kumar, R., and Tomkins, A. (2004) ‘Sic transit gloria telae: Towards an understanding of the web’s decay’, in: WWW ‘04, Proceedings of the 13th International Conference on World Wide Web, [online] New York: ACM, pp. 328–337. Available at: https://doi.org/10.1145/988672.988716 [Accessed 1 Mar. 2018].
Berners-Lee, T. (1991) The original http as defined in 1991. [online] Available at: https://www.w3.org/Protocols/HTTP/AsImplemented.html [Accessed 1 Mar. 2018].
Berners-Lee, T. (1996) Web architecture: Generic resources. [online] Available at: http://www.w3.org/DesignIssues/Generic.html [Accessed 1 Mar. 2018].
Berners-Lee, T., Fielding, R., and Frystyk, H. (1996) Hypertext Transfer Protocol – HTTP/1.0, Internet RFC 1945. [online] Available at: https://tools.ietf.org/html/rfc1945 [Accessed 1 Mar. 2018].
Bornand, N.J., Balakireva, L., and Van de Sompel, H. (2016) ‘Routing memento requests using binary classifiers’, in: JCDL ‘16, Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries, [online] Newark: ACM, pp. 63–72. Available at: https://doi.org/10.1145/2910896.2910899 [Accessed 1 Mar. 2018].
Brunelle, J.F., Kelly, M., SalahEldeen, H., Weigle, M.C., and Nelson, M.L. (2014) ‘Not all mementos are created equal: Measuring the impact of missing resources’, in: JCDL ‘14, Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries, [online] London: ACM, pp. 321–330. Available at: https://doi.org/10.1109/JCDL.2014.6970187 [Accessed 1 Mar. 2018].
Brunelle, J.F., Kelly, M., SalahEldeen, H., Weigle, M.C., and Nelson, M.L. (2015) ‘Not all mementos are created equal: Measuring the impact of missing resources’, International Journal on Digital Libraries, [online] 16(3–4):283–301. Available at: https://doi.org/10.1007/s00799-015-0150-6 [Accessed 1 Mar. 2018].
Brunelle, J.F., Nelson, M.L., Balakireva, L., Sanderson, R., and Van de Sompel, H. (2013) ‘Evaluating sitestory with the ApacheBench tool’, in: TPDL ‘13, International Conference on Theory and Practice of Digital Libraries, [online] Valletta: Springer, pp. 204–215. Available at: https://doi.org/10.1007/978-3-642-40501-3_20 [Accessed 1 Mar. 2018].
Callan, J. (2002) ‘Distributed Information Retrieval’ in: Croft W.B. (ed) Advances in Information Retrieval. The Information Retrieval Series, vol 7. Springer, Boston, MA, [online]. Available at: https://doi.org/10.1007/0-306-47019-5_5 [Accessed 1 Mar. 2018].
Caplan, P., and Arms, W.Y. (1999) ‘Reference linking for journal articles’, D-Lib Magazine, [online] 5(7/8). Available at: https://doi.org/10.1045/july99-caplan [Accessed 1 Mar. 2018].
Crocker, D.H. (1982) Standard for the Format of ARPA Internet Text Messages, Internet RFC-822. [online] Available at: https://tools.ietf.org/html/rfc822 [Accessed 1 Mar. 2018].
Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and Berners-Lee, T. (1999) Hypertext Transfer Protocol – HTTP/1.1, Internet RFC-2616. [online] Available at: https://tools.ietf.org/html/rfc2616 [Accessed 1 Mar. 2018].
Fielding, R.T. (2000) ‘Architectural Styles and the Design of Network-based Software Architectures’. PhD. University of California, Irvine, Department of Computer Science.
Fuhrig, L.S. (2014) Tracking down the elusive ‘treasure house for learning’. [blog] Smithsonian Institution Archives blog. Available at: https://siarchives.si.edu/blog/tracking-down-elusive-’treasure-house-learning’ [Accessed 1 Mar. 2018].
Gomes, D., Miranda, J., and Costa, M. (2011) ‘A survey on web archiving initiatives’, in: TPDL ‘11, Proceedings of Theory and Practice of Digital Libraries, [online] Berlin: Springer, pp. 408–420. Available at: https://doi.org/10.1007/978-3-642-24469-8_41 [Accessed 1 Mar. 2018].
Jacobs, I., and Walsh, N. (2004) Architecture of the world wide web, volume one. Technical Report W3C Recommendation 15 December
2004, W3C. [online] Available at: https://www.w3.org/TR/webarch/ [Accessed 1 Mar. 2018].
Jones, S.M., Nelson, M.L., Shankar, H., and Van de Sompel, H. (2014) Bringing web time travel to mediawiki: An assessment of the memento mediawiki extension. [online] Available at: http://arxiv.org/abs/1405.2330 [Accessed 1 Mar. 2018].
Jones, S.M., Van de Sompel, H., and Nelson, M.L. (2016a) ‘Avoiding spoilers: Wiki time travel with Sheldon Cooper’, International Journal on Digital Libraries, [online] 19(1):77–93. Available at: https://doi.org/10.1007/s00799-016-0200-8 [Accessed 1 Mar. 2018].
Jones, S.M., Van de Sompel, H., Shankar, H., Klein, M., Tobin, R., and Grover, C. (2016b) ‘Scholarly context adrift: Three out of four uri references lead to changed content’, PloS One, [online] 11(12):e0167475. Available at: https://doi.org/10.1371/journal.pone.0171057 [Accessed 1 Mar. 2018].
Kelly, M., Nelson, M.L., and Weigle, M.C. (2014a) ‘The archival acid test: Evaluating archive performance on advanced HTML and JavaScript’, in: JCDL ‘14, Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries, [online] London: ACM, pp. 25–28. Available at: https://doi.org/10.1109/JCDL.2014.6970146 [Accessed 1 Mar. 2018].
Kelly, M., Nelson, M.L., and Weigle, M.C. (2014b) ‘Mink: Integrating the live and archived web viewing experience using web browsers and Memento’, in: JCDL ‘14, Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries, [online] London: ACM, pp. 267–276. Available at: https://doi.org/10.1109/JCDL.2014.6970229 [Accessed 1 Mar. 2018].
Klein, M., Van de Sompel, H., Sanderson, R., Shankar, H., Balakireva, L., Zhou, K., and Tobin, R. (2014) ‘Scholarly context not found: One in five articles suffers from reference rot’, PloS One, [online] 9(12):e115253. Available at: https://doi.org/10.1371/journal.pone.0115253 [Accessed 1 Mar. 2018].
Masanes, J. (2006) Web Archiving. Springer, Berlin, Heidelberg.
Mohr, G., Kimpton, M., Stack, M., and Ranitovic, I. (2004) ‘Introduction to heritrix, an archival quality web crawler’, in: IWAW ‘04, 4th International Web Archiving Workshop, [online] Bath. Available at: https://web.archive.org/web/20170809135759/http://iwaw.europarchive.org/04/Mohr.pdf [Accessed 1 Mar. 2018].
Negulescu, K.C. (2010) Web archiving @ the internet Archive; presentation at the 2010 Digital Preservation Partners Meeting. [online] Available at: http://www.digitalpreservation.gov/meetings/documents/ndiipp10/NDIIPP072110FinalIA.ppt [Accessed 1 Mar. 2018].
Nelson, M.L. (2011) Memento-Datetime is not Last-Modified. [blog] Web Science and Digital Library Research Group. Available at: http://ws-dl.blogspot.com/2010/11/2010-11-05-memento-datetime-is-not-last.html [Accessed 1 Mar. 2018].
Nelson, M.L. (2013) Right-click to the past – Memento for Chrome. [blog] Web Science and Digital Library Research Group. Available at: http://ws-dl.blogspot.com/2013/10/2013-10-14-right-click-to-past-memento.html [Accessed 1 Mar. 2018].
Ritchie, D., and Thompson, K. (1974) ‘The UNIX time-sharing system’, Communications of the ACM, [online] 17(7):365–375. Available at: https://doi.org/10.1145/361011.361061 [Accessed 1 Mar. 2018].
Rossi, A. (2016) Robots.txt Files and Archiving .gov and .mil Websites. [blog] Internet Archive Blogs. Available at: https://blog.archive.org/2016/12/17/robots-txt-gov-mil-websites/ [Accessed 1 Mar. 2018].
Sanderson, R. (2012) ‘Global web archive integration with Memento’, in: JCDL ‘06, Proceedings of the 12th ACM/IEEE Joint Conference on Digital Libraries, [online] Washington: ACM, pp. 379–380. Available at: https://doi.org/10.1145/2232817.2232900 [Accessed 1 Mar. 2018].
Sanderson, R., Shankar, H., Ainsworth, S., McCown, F., and Adams, S. (2011) ‘Implementing time travel for the web’, Code4Lib Journal, [online] 13. Available at: http://journal.code4lib.org/articles/4979 [Accessed 1 Mar. 2018].
Tweedy, H., McCown, F., and Nelson, M.L. (2013) ‘A Memento web browser for iOS’, in: JCDL ‘013, Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries, [online] Indianapolis: ACM, pp. 371–372. Available at: https://doi.org/10.1145/2467696.2467764 [Accessed 1 Mar. 2018].
Van de Sompel, H., and Nelson, M.L. (2015) ‘Reminiscing about 15 years of interoperability efforts’, D-Lib Magazine, [online] 21(11/22). Available at: https://doi.org/10.1045/november2015-vandesompel [Accessed 1 Mar. 2018].
Van de Sompel, H., Nelson, M.L., and Sanderson, R. (2013) HTTP framework for time-based access to resource states – Memento, Internet RFC 7089. [online] Available at: http://tools.ietf.org/html/rfc7089 [Accessed 1 Mar. 2018].
Van de Sompel, H., Nelson, M.L., Sanderson, R., Balakireva, L.L., Ainsworth, S., and Shankar, H. (2009) Memento: Time Travel for the Web. [online] Available at: http://arxiv.org/abs/0911.1112 [Accessed 1 Mar. 2018].
Van de Sompel, H., Sanderson, R., Nelson, M.L., Balakireva, L.L., Shankar, H., and Ainsworth, S. (2010) ‘An HTTP-based versioning mechanism for linked data’, in: LDOW ‘010, Proceedings of the Linked Data on the Web Workshop, [online] Raleigh: CEUR 628. Available at: http://ceur-ws.org/Vol-628/ldow2010_paper13.pdf [Accessed 1 Mar. 2018].
Van de Sompel, H., Shankar, H., Wincewicz, R., and Nelson, M.L. (2015) Robust links – link decoration. [online] Available at: http://robustlinks.mementoweb.org/spec/ [Accessed 1 Mar. 2018].
Zittrain, J., Albert, K., and Lessig, L. (2014) ‘Perma: Scoping and addressing the problem of link and reference rot in legal citations’, Legal Information Management, [online] 14(02):88–99. Available at: https://doi.org/10.1017/S1472669614000255 [Accessed 1 Mar. 2018].
15
Hypertext before the Web – or,
What the Web Could Have Been
Belinda Barnet

Historiography is always guided by specific tropes; it is ‘infected by what it touches as the past’ (Demeulenaere, 2003). Much has been written about the history of hypertext over the last 20 years, and I’ve contributed to that literature; in the process I’ve been infected by the vision behind the early systems from the 60s and 70s. This might be because I’ve got to know some of the inventors – I conducted 22 interviews with inventors for my book on pre-web systems Memory Machines (Barnet, 2013) – and drank the same Kool-Aid. The first system was built in counterculture-soaked 1960s California, though, so a bit of dreaming is appropriate, along with the occasional Yoga Workstation. This was an era when people had grand visions for their pre-web hypertext systems, when they believed that the solution to the world’s problems might lie in finding a way to organize the mess of human knowledge: to represent its true interconnections.

Now for the important bit: this story stops before the Web. More accurately, it stops before hypertext became synonymous with the Web; as Ted Nelson¹, who coined the term ‘hypertext’ among other things, put it to me in 1999, ‘People saw the Web and they thought, “Oh, that’s hypertext, that’s how it’s meant to look”’. But hypertext is not the Web; the Web is one particular implementation of hypertext. In this chapter we will use Ted Nelson’s definition of hypertext: branching and responding text read at a computer screen (Nelson, 1992). The Web is without a doubt the most successful and prevalent version of hypertext, but it is also an arguably limited one. Hypertext existed well before the Web – the systems here were imagined (and in some cases built and obsolete) well before Google and Facebook. Insofar as the Web is a hypertext system, it continues an already established line of evolution, and it is important to understand this pre-history to fully understand the Web.

The first hypertext systems were deep and richly connected, and in some respects more powerful than the Web. These early systems
were not, however, connected to hundreds of millions of other users. You could not reach out through FRESS and read a page hosted in Thailand or Libya. The early systems worked on their own set of documents in their own unique environments. Although Nelson certainly envisioned that Xanadu would have the domestic penetration of the Web, and NLS had nifty collaborative tools and chalk-passing protocols, none of the early ‘built’ systems we look at either briefly or in depth in this chapter – NLS, HES and FRESS – was designed to accommodate literally billions of users. That’s something only the Web can do. As Jay Bolter put it in our interview,

What the World Wide Web did was two things. One is that it compromised as it were on the ‘vision’ of hypertext. It said, ‘This is the kind of linkage it’s always going to be, it’s always going to work in this way’, [but] more importantly it said that the really interesting things happen when your links can cross from one computer to another… So global hypertext – which is what the Web is – turned out to be the way that you could really engage, well, ultimately hundreds of millions of users. (Bolter, 2011)

The goal of this chapter is to explore the visions of the early hypertext pioneers, starting over 60 years ago, and in the process, to broaden our conception of what hypertext could be. The chapter begins by exploring NLS, then Xanadu and on to HES.

DOUG ENGELBART’S ON-LINE SYSTEM (NLS)

Dr Douglas Carl Engelbart, who died in 2013, was a softly spoken man. His voice was low yet persuasive, as though ‘his words have been attenuated by layers of meditation’, his friend Nilo Lindgren wrote in 1971 (Rheingold, 2000: 178). I struggled to hear him in our interview, being deaf myself, but that didn’t matter; he had been describing the same vision in great detail to journalists, historians and engineers for over 60 years. Engelbart wanted to improve the model of the human, to ‘boost our capacity to deal with complexity’ as a species (Engelbart, 1999).

Here’s a human… He’s got all these capabilities within his skin we can make use of, a lot of mental capabilities we know of, and some of it he’s even conscious of. Those are marvellous machines there – motor machinery to actuate the outside world, and sensor and perceptual machinery to get the idea of what’s going on… (Engelbart, 1998: 213)

For Engelbart, the most important element of the ‘human system’ was language; language is a powerful machine. Engelbart would seek to harness its nonlinear relationships with a computer system and externalize its ‘networked’ structure.

Engelbart still remembers reading about Vannevar Bush’s Memex (hypothetical proto-hypertext system that Bush described in his 1945 The Atlantic Monthly article ‘As We May Think’), and the moment when he became ‘infected’ with the idea of building a means to extend and navigate this great pool of human knowledge. He was a military radar technician out in the Philippines when he first picked up a reprint of Bush’s article, in the summer of 1945, and wandered into a Red Cross library that was built up on stilts to read it. News of Hiroshima was devastatingly fresh; Engelbart was 20 years old and deep in thought (Barnet, 2013). He was wondering how he could ‘maximize [his] contribution to humankind’ as an engineer (Engelbart, 1999).

In this article Bush looked towards the postwar world as an engineer and predicted an exponential increase in human knowledge, especially scientific and technological knowledge. How are we to keep track of it all? How are we to prevent great ideas from being lost?

Some ideas are like seeds. ‘Or viruses. If they are in the air at the right time, they will infect exactly those people who are most susceptible to putting their lives in the idea’s service’ (Rheingold, 2000: 176). Although he didn’t think about the article again for many years, the ideas in it infected Engelbart.

But five years after he had read ‘As We May Think’ in the Philippines, Engelbart
claims, ‘I formulated [my] goal on…human intellectual effectiveness’ (Engelbart, 1962a: 236). These three ‘flashes’ were to become the framework he worked from for the rest of his career:

1 FLASH-1: The difficulty of mankind’s problems was increasing at a greater rate than our ability to cope.
2 FLASH-2: Boosting mankind’s ability to cope with complex, urgent problems would be an attractive candidate as an arena in which a young person might ‘make the most difference’.
3 FLASH-3: Ahah – graphic vision surges forth of me sitting at a large CRT console, working in ways that are rapidly evolving in front of my eyes (beginning from memories of the radar-screen consoles I used to service). (Engelbart, 1988: 189)

This vision of a radar-screen console attached to a computer is important. Engelbart transferred this technology from the radars he was servicing in the Philippines to computers as he had learnt about them in engineering school (CRT phosphors came into common use around WWII). This image of a human sitting at a screen is the image from which all future work would depart. By the time Engelbart started work on NLS ten years later, progress had been made on ‘presenting computer-stored information to the human… by which a cathode-ray-tube (of which the television picture tube is a familiar example) can be made to present symbols on their screens of quite good brightness, clarity, and with considerable freedom as to the form of the symbol’ (Engelbart, 1962b). In 1951, however, Engelbart had to mentally extrapolate from radar screens. As he told me in 1999,

I put together what I knew about computers and what I knew about radar circuitry etc. to picture working interactively, and it just grew from there (Engelbart, 1999).

This was a radical idea in the 1950s. At that time, computers were large electronic devices stored in air-conditioned rooms, at many degrees of separation from the ‘user’. They were attended to by technicians and fed their information out to these technicians on punch cards and printouts. The idea that a screen might be attached to a computer, and that humans might interact directly via this surface, was far left field.

The computer, screen and mouse would become Engelbart’s parallel to Memex’s microfilm storage desk, tablet display and stylus. With these technologies, Engelbart would ‘update’ an image of potentiality from a different era and bring it to digital computing. But his ideas did not garner the peer support Engelbart was seeking.

After I’d given a talk at Stanford, [three angry guys] got me later outside at a table. They said, ‘All you’re talking about is information retrieval.’ I said no. They said, ‘YES, it is, we’re professionals and we know, so we’re telling you, you don’t know enough so stay out of it, ‘cause goddamit, you’re bolloxing it all up. You’re in engineering, not information retrieval.’ (Engelbart, 1999)

Computers, in large part, were still seen as number crunchers, and computer engineers had no business talking about the human beings who used these machines. Fortunately, one of the few people who had the disciplinary background to be able to understand the new conceptual framework was moving through the ranks at the Advanced Research Projects Agency (ARPA). This man was J.C.R. Licklider, a psychologist from MIT. ‘The hope is that, in not too many years, human brains and computing machines will be coupled together very tightly, and the resulting partnership will think as no human brain has ever thought’, wrote Licklider in a 1960 paper called ‘Man-Computer Symbiosis’ (Licklider, 1960: 131).

Licklider began financing projects that developed thought-amplifying technologies. ARPA support began in 1963, ‘at varying levels – during 1965, about eighty thousand dollars’ (Bardini, 2000: 23). But as Bardini points out (in his book and in a personal communication), it was Bob Taylor, initially working as a psychologist at NASA, who mustered the strongest support for the project. Initially he put in 85,000 dollars from NASA
218 THE SAGE HANDBOOK OF WEB HISTORY

mid 1964 to mid 1965 (Bardini, 2000: 23). anywhere (‘unless there was a specific reason
Then when he moved from NASA to ARPA not to do so’ (Duvall, 2011)). This approach
in 1964 he told Engelbart that IPTO ‘was to information was entirely new – an approach
prepared to contribute a million dollars ini- that assumed from the outset that connectiv-
tially to provide one of the new time-sharing ity is important, that the relationship between
systems, and about half a million dollars a and among ideas was just as important as the
year to support the augmentation research’ unit of information (or ‘statement’) itself. As
(Rheingold, 2000: 86). That is the equivalent Duvall remembers,
of around 14 million in today’s dollars.
Around this time Engelbart asked a bright The thing that I would say distinguished NLS from
young SRI engineer named Bill English if he’d a lot of other development projects was that it was
sort of the first – I’m not sure what the right word
present a paper for him. English said yes, and is – ‘holistic’ is almost a word that comes to mind –
joined Engelbart’s project shortly thereafter. project that tried to use computers to deal with
As English told me in 2011, ‘I saw what he documents in a two-dimensional fashion rather
was doing and I was interested in it. So I joined than in a one-dimensional fashion. (Duvall, 2011)
him and that’s how it all began’ (English,
2011). He became Engelbart’s chief engineer Content insertion and navigation involved
in 1964, and began work on some of the basic four basic commands: Insert, Delete, Move
ideas we’ll explore in the next section. and Copy. The mouse served as a pointer to
Bill Duvall joined SRI in 1966, but didn’t indicate where content was to be inserted
start work on the NLS project until 1968. He or deleted in existing text. Most important,
asked to be moved onto Engelbart’s team, but first he had to meet with the head of engineering at SRI.

[He] was a very traditional engineer. I was a 24-year-old kid, or 23 years old, and he had this big corner office that had bookshelves to the ceiling and the little ladder that goes around, and a big desk. He sat on the other side of his desk, and I sat in a chair and he looked at me, and he said, ‘You don’t really think what they’re doing up there is science, do you?’ I think that reflected a lot of the official attitude towards what Doug was doing. (Duvall, 2011)

The freshly outfitted laboratory, the Augmentation Research Center (ARC), began its work in 1965. It started with a series of experiments focused on the way people select objects on a computer screen. In the context of screen-based interactivity, Engelbart and English’s ‘mouse’ consistently beat other devices for fast, accurate selection in a series of controlled tests. Yet it took over 20 years to enter the commercial market, a time period that Engelbart considers strikingly long (Engelbart, 1988: 196).

At the heart of NLS was a basic philosophy that you should be able to link to anything,

however, the Link function allowed cross-referencing to another statement – and the user could define cross-references at any level in the hierarchy. Links were character strings within a statement ‘indicating a relationship’ (Engelbart, 1999) to another statement, and they could be made in the same file or between different files.

The NLS design changed as it was created, evolving around the technical activities of the project team itself as they documented their efforts. In the process of designing the software, the team generated a number of technical reports, source code versions, communications, release notes, problems and associated solutions. These were linked together and tracked by date and version. Consequently, between 1969 and 1971 (Bardini, 2000) NLS was changed to include an electronic filing arrangement that served as a linked archive of the development team’s efforts. This eventually cross-referenced over 100,000 items (Engelbart, 1988). It was called the software Journal, the most explicit model of a hypertextual environment with embedded associative links to surface in the digital environment in the 1960s.
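The cross-referencing described for NLS, in which a link is a character string inside one statement that points to another statement at any level of the hierarchy, in the same file or a different one, can be illustrated with a small sketch. This is not NLS code or NLS notation: the `<file#statement>` link syntax, the `Statement` class and the `index` and `resolve_links` helpers are all invented here for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical link notation, invented for this sketch: <file#statement>
LINK = re.compile(r"<([\w.-]+)#([\w.]+)>")


@dataclass
class Statement:
    sid: str                      # statement identifier within its file
    text: str                     # statement text, possibly containing link strings
    children: list = field(default_factory=list)


def index(root):
    """Map every statement id in a hierarchy to its statement, at any depth."""
    found = {root.sid: root}
    for child in root.children:
        found.update(index(child))
    return found


def resolve_links(stmt, files):
    """Return the statements that the link strings embedded in stmt point to."""
    return [files[name][sid] for name, sid in LINK.findall(stmt.text)]


# Two small "files", each a hierarchy of statements.
design = Statement("1", "Design notes", [
    Statement("1a", "Keyset layout, see <journal#2b> and <design#1b>"),
    Statement("1b", "Mouse tests"),
])
journal = Statement("2", "Journal", [
    Statement("2a", "Release notes"),
    Statement("2b", "Problem report"),
])
files = {"design": index(design), "journal": index(journal)}

targets = resolve_links(design.children[0], files)
print([t.text for t in targets])  # ['Problem report', 'Mouse tests']
```

The first link crosses into another file and the second stays within the same file; both resolve through the same lookup, which is the property the text attributes to the Link function.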

As mentioned previously, the entire NLS system was set up for perfect storage and recall. Everything was tracked and identifiable. Every object in the document was intrinsically addressable, and most important, these addresses never disappeared (Engelbart, 1999). They were permanent, attached to the object itself, which meant that they followed the object wherever it was stored, so links could be made at any stage to any object in the system. As Duvall remembers,

[In NLS] we had it that every object in the document was intrinsically addressable, right from the word ‘go’. It didn’t matter what date a document’s development was, you could give somebody a link right into anything, so you could actually have things that point right to a character or a word or something. (Duvall, 2011)

The NLS team called this a ‘frozen state’ addressing scheme (Engelbart, 1997), which is in contrast to the World Wide Web, where the finest level of intrinsic addressability is the URL (Uniform Resource Locator, a character string that identifies an Internet resource, invented by Tim Berners-Lee in 1994; see note 2). Unlike the NLS object-specific address, the URL is simply a location on a server; it is not attached to the object itself. That said, NLS was working in its own little environment, on its own documents – not with billions of users.

By 1968 NLS had matured into a massive database and set of paths through this database, the first digital hypertext system. It was time to take NLS out of the Petri dish and set it to work in front of the engineering community. Engelbart took an immense risk and applied for a special session at the ACM/IEEE-CS Fall Joint Computer Conference in San Francisco in December 1968. ‘The nice people at ARPA and NASA, who were funding us, effectively had to say “Don’t tell me!”, because if this had flopped, we would have gotten in trouble’ (Engelbart, 1988: 203).

The conference was set up using a video projector pointing at a huge, 20-foot screen that was linked to a host computer and piped back to the group at SRI via a temporary microwave antenna. The setup was very expensive, and although no special system capabilities were employed (NLS was run just as it was used back at the lab), the organizational and presentational machinery used almost all the remaining research funds for the year.

Engelbart sat up on the stage beneath the projection screen, a mouse in one hand and his other hand playing a special one-handed keyset. He manipulated the audience’s attention by controlling their view of the information being explored; he drilled down through the data structure and presented it in multiple different views, each piece connected to the last by a link. The screen was divided into neat windows containing explanatory text or graphics about NLS, and also about the presentation itself in the hypertext Journal. It was dubbed ‘The Mother of all Demos’ by Andries van Dam. A video of this demo is now available on the Web.

This was the first public appearance of the mouse and the first public appearance of hypertext, screen splitting, computer-supported associative linking, computer conferencing and a mixed text/graphics interface. It proceeded without a hitch and received a standing ovation (Rheingold, 2000).

As a result of the NLS demo, many of the user interface technologies from NLS also migrated into computing over the next few years, the mouse in particular. Ted Nelson eventually incorporated the mouse into his Xanadu design (Barnet, 2013). NLS also allowed consideration of the modern windows-icon-menu-pointing-device (WIMP) interface.

Engelbart was working from a vision he had as a young engineer. This vision changed the world. Technical visions have no essence; there is no transcendent design, no Platonic form we are striving towards. There is, however, a recurrent dream – an elusive ‘blessed break’ – and this dream is an ancient one. It comes from long-standing cultural desires and anxieties about the ephemerality of human memory and knowledge.
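The contrast drawn earlier between an address that is attached to the object itself and an address that is merely a location on a server can be made concrete with a minimal sketch. It assumes nothing about how NLS or any web server is actually implemented: `ObjectStore`, `LocationStore`, the integer identifiers and the example paths are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # source of permanent identifiers for this sketch


@dataclass
class Obj:
    text: str
    oid: int = field(default_factory=lambda: next(_next_id))  # assigned once, never reused


class ObjectStore:
    """Object-attached addressing: the address travels with the object."""

    def __init__(self):
        self.by_id = {}      # permanent id -> object
        self.location = {}   # permanent id -> where the object is currently filed

    def add(self, obj, path):
        self.by_id[obj.oid] = obj
        self.location[obj.oid] = path
        return obj.oid       # a link is just the permanent id

    def move(self, oid, new_path):
        self.location[oid] = new_path  # refiling does not change the id

    def resolve(self, oid):
        return self.by_id[oid]


class LocationStore:
    """Location-based addressing: the address is where the content sits."""

    def __init__(self):
        self.by_path = {}

    def add(self, text, path):
        self.by_path[path] = text
        return path          # a link is the location itself

    def move(self, old_path, new_path):
        self.by_path[new_path] = self.by_path.pop(old_path)

    def resolve(self, path):
        return self.by_path.get(path)  # None stands in for a broken link


attached = ObjectStore()
link = attached.add(Obj("statement about links"), "journal/1969/a")
attached.move(link, "journal/1971/b")

located = LocationStore()
url = located.add("statement about links", "/docs/a.html")
located.move("/docs/a.html", "/archive/a.html")

print(attached.resolve(link).text)  # statement about links
print(located.resolve(url))         # None
```

After both stores move the same content, the link held against the object still resolves, while the link held against the old location does not. The sketch ignores scale entirely, which is the caveat the text itself raises about NLS working only on its own documents.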

The dream is to create a perfect archive for human knowledge, a machine that might ‘extend, through replication, human mental experience’ (Nyce and Kahn, 1991: 124) and capture the interconnected structure of knowledge itself. Most important, this would be a machine that we can control, whose workings are transparent to us and whose trails do not fade. Which brings us to the next system – my favorite system and recurring vision: the Magical Place of Literary Memory: Xanadu®.

THE MAGICAL PLACE OF LITERARY MEMORY: XANADU

What I thought would be called Xanadu® is called the World Wide Web and works differently, but has the same penetration. (Nelson, 1999)

It was a vision in a dream. A computer filing system that would store and deliver the great body of human literature, in all its historical versions and with all its messy interconnections, acknowledging authorship, ownership, quotation and linkage. Like the Web, but much better: no links would ever be broken, no documents would ever be lost, and copyright and ownership would be scrupulously preserved. The Magical Place of Literary Memory: Xanadu. In this place, users would be able to mark and annotate any document, see and intercompare versions of documents side by side, follow visible hyperlinks from both ends (‘two-way links’) and reuse content pieces that stay connected to their original source document. There would be multiple ways to view all this on a computer screen, but the canonical view would be side-by-side parallel strips with visible connections. Just imagine. This vision – which is older than the Web, and aspects of it are older than personal computing – belongs to hypertext pioneer Theodor Holm Nelson, who dubbed the project Xanadu in October 1966 (see note 3).

The name comes from the famous poem by Samuel Taylor Coleridge, ‘Kubla Khan’. In his tale of the poem’s origin, Coleridge claimed to have woken from a laudanum-laced reverie with ‘two or three hundred’ lines of poetry in his head. He had noted down but a few lines when he was interrupted by a visitor, and when he returned to his work later he found that the memories had blurred irretrievably. His mythical landscape, this vision of Xanadu, had passed away ‘like the images on the surface of a stream into which a stone had been cast’ (Coleridge, cited in Nelson, 1987: 142).

One of Nelson’s most cherished memories is based on a vision of gently moving water. When he was about four or five he trailed his hand in the water as his grandfather rowed a boat. He studied the different ‘places’ in the water as they passed through his fingers, the ‘places that at one instant were next to each other, then separated as my finger passed. They rejoined, but no longer in the same way’ (Nelson, 2010: 35). These connections were infinite in number and in complexity, and they changed as the water moved. It was a religious experience, a vision that has stayed with him for 70 years. At that moment he started thinking about what he calls ‘profuse connection’, the interconnections that permeate life and thought: How can one manage all the changing relationships? How can one represent profuse connection? Xanadu was proposed as a vast digital network to house this corpus of ideas and evidential materials, facilitated by a special linking system.

The story of Xanadu is the greatest image of potentiality in the evolution of hypertext. Nelson invented a new vocabulary to describe his vision, much of which has become integrated into contemporary hypermedia theory and practice – for instance, the words ‘hypertext’ and ‘hypermedia’. As he told me in 1999, ‘I think I’ve put more words in the dictionary than Lewis Carroll. Every significant change in an idea means a new term’. Nelson came up with many significant changes, and consequently many new terms, some of which I discuss below. He also recruited or inspired some of the most visionary programmers and developers in the history of computing, many
of whom went on to develop the first hypertext products (although this doesn’t impress him: ‘the problem with inspiring people is that they then try to credit you with things you don’t like’ (Nelson, 2010)).

Much has been written about Project Xanadu over the years (the ones I cite throughout this chapter are Landow, 1992; Rheingold, 2000). Nelson himself doesn’t have the time to keep up with it all – and even when he does, as he put it to me in 2010, ‘anything people write about me will be insufficiently praising, and so it’s very hard to read it’ (his comments on my own work were prolific). The remarkable thing about Xanadu is that, despite countless setbacks, it refuses to die. Its logo is, appropriately enough, the Eternal Flaming X. Paisley and Butler (cited in Smith, 1991: 262) have noted that ‘scientists and technologists are guided by “images of potentiality” – the untested theories, unanswered questions and unbuilt devices that they view as their agenda for five years, ten years, and longer’. Although Nelson is often accused of hand waving and lucid dreaming, Xanadu has nonetheless become the most important vision in the history of computing.

Nelson wears a lanyard across his neck and shoulder with three pens attached to it (when I first met him in 1999 it had sticky notes, at another point a stapler – the ‘system is evolving’ as he put it in a personal communication, 2012). He has been wearing it since 1997. The belt is filled with tools to connect things with, tools to deal with a world of paper. Like Bush, Nelson is painfully aware that ideas are easily lost in conventional indexing systems, that they are disconnected from each other, and that ‘serious writing or research’ demands connecting ideas together. Frustrated by the lack of a global, real-world system that might do this for him, and ‘outraged’ by the confines of paper (Nelson, 1998: 1–2), he feels the need to do this manually.

Nelson’s struggle against the paradigm of paper led him to design an alternative. In 1960, at Harvard University, he took a computer course for the humanities, ‘and my world exploded’ (Nelson, 2010: 99). He understood immediately that computers were all-purpose machines that could be put in the service of information handling. Like Engelbart, he did not believe that computers were mathematical tools for engineers. He saw a solution to his problem of information handling, and he also saw a future where paper might be eliminated. ‘The prison of paper, enforcing sequence and rectangularity, had been the enemy of authors and editors for thousands of years; now at last we could break free’ (Nelson, 2010: 120).

One of the first ideas was based on his own ‘terrible problem’ keeping notes on file cards. The problem was that his cards really needed to be in several different places at once; new projects were built on earlier ideas, new documents were built on earlier ideas, so these items should be reusable by reference. Perhaps the computer might solve that problem; it could link them together. This idea would later prove important in the design of Xanadu.

Many of my file cards belonged in several places at once – several different sequences or projects. Each card – call it now an entry or an item – should be stored only once. Then each project or sequence would be a list of those items. (Nelson, 2010: 103)

Imagine if the same item could appear in multiple places. You could connect each item to its original by an addressing method and retrieve them on a computer screen. Because this would be reuse by reference rather than by copying, you could trace each item back to its original source. This idea – that you should be able to see all the contexts of reuse, and that you should be able to trace items back to their original source – would ‘drive my work to the edge of madness’ (Nelson, 2010: 104). It would later become the kernel of Nelson’s most innovative idea: transclusion. Transclusion, and also the ability to visually compare prior or alternative versions of the same document on-screen (‘intercomparison’) were integral to the design of Xanadu. Over the next 40 years, Nelson would hone
these ideas and experiment with them in various incarnations.

In 1960 Nelson announced his term project: a writing system for the IBM 7090, the only computer at Harvard at the time, stored in a big, air-conditioned room at the Smithsonian Observatory. In the 1960s computers were ‘possessed only by huge organizations to be used for corporate tasks or intricate scientific calculations’ (Nelson, 1965: 135). The idea that expensive processing time might be wasted on writing, of all things, was deemed crazy by the engineering community. Nelson ignored this. He proposed a machine-language program to store documents in the computer, change them on-screen with various editorial operations and print them out. But this was no mere word processor (which in any case didn’t exist at the time); Nelson envisioned the user would be able to compare alternative and prior versions of the same document on-screen.

The second part of Nelson’s idea took shape in the early 1960s, when there was ‘a lot of talk around Cambridge [Massachusetts, where Harvard is located] about Computer-Assisted Instruction, for which there was a lot of money’ (Nelson, 1992: 1/26). It was not so much a design at this point, Nelson stressed in response to my request for paper, ‘it was an idea that may have been on only one file card’ (Nelson, 2010). At this time he conceived of what he called ‘the thousand theories program’, an explorable computer-assisted instruction program that would allow the user to study different theories and subjects by taking different trajectories through a network of information.

This led to another idea, which Nelson drafted as an academic paper while teaching sociology at Vassar College in 1965. He wants to stress at this point that there was no single eureka moment as ‘the ever-changing designs had been swirling in my head for five years’ (Nelson, 2012). The concept was clear enough, however, to put it down on paper. This revised design combined two key ideas: side-by-side intercomparison and the reuse of elements (transclusion). He thought about the architecture of the system and decided to have sequences of information that could be linked sideways. As with his first design, this would all occur on a computer screen, visually, in real time. He called this system ‘Zippered Lists’.

Zippered Lists permitted linking between documents: like the teeth in a zipper, items in one sequence could become part of another (‘EXCEPT’, Nelson wrote in response to this chapter, ‘the two sides of the zipper don’t have to be in the same order’). Versions of a document could be intercompared; an item could be an important heading in one sequence and a trivial point in another, and all items could be written or retrieved in a nonsequential fashion. Links could be made between large sections, small sections or single paragraphs. Writers could trace the evolution of an idea.

Crucially, the design also got him published. Nelson’s first paper explaining the term ‘hypertext’ was presented at the ACM 20th National Conference in August 1965. It was not the first time that the word ‘hypertext’ had appeared in print, though. That was in an invitation to a talk Nelson gave at Vassar College, ‘Computers, Creativity and the Nature of the Written Word’, on 5 January 1965. (Actually, it’s the word ‘hypertexts’. A copy of this invitation appears in Possiplex.)

One of the first people who thought they might try to build part of Nelson’s design was Andries van Dam (I stress ‘part’ here because van Dam had ideas of his own that he wanted to explore at the same time, such as print text editing). We explore this in the next section.

Van Dam would be the first person to attempt to build part of Nelson’s vision.

SEEING AND MAKING CONNECTIONS: HES AND FRESS

We will now trace the development of two important hypertext systems built at Brown: the Hypertext Editing System (HES),
co-designed by Ted Nelson and van Dam and developed by van Dam’s students, and the File Retrieval and Editing System (FRESS), designed by van Dam and his students. Brown University has played a major role in the development of hypertext systems and humanities computing since 1967, due in no small part to van Dam’s work in this area, and the Institute for Research in Information and Scholarship (IRIS) he helped establish there with William Shipp and Norman Meyrowitz in 1983.

Ted Nelson was co-designer of HES, and his ideas about hypertext inspired the HES project in the first place; van Dam credited Nelson for this contribution both in our interviews (1999; 2011) and in his public talks and published work. Nelson still feels, however, that he has been ‘given no more credit than his [van Dam’s] undergraduate students’ (Nelson, 2012). Unfortunately, Nelson fell out with the HES team during its design and implementation in 1967, and he has stated in several interviews with the author that he was unhappy with the result. He is bitter about the experience, largely because he feels his vision was sidelined in favor of print text editing (Nelson, 1999; 2011). We explore this in more detail shortly.

Along with Engelbart’s landmark NLS system, HES and FRESS constitute the first generation of hypertext systems. These are not dusty old antiques from the dawn of computing science, however; in many respects, FRESS by 1968 was more interactive than present-day HTML. As working prototype systems, HES and FRESS had technical chutzpah.

The Design of HES

Van Dam bumped into Nelson at the 1967 Spring Joint Computer Conference in Atlantic City. Passionate and eloquent, Nelson told van Dam about what he’d been doing since he left Swarthmore: hypertext. ‘He had nothing to show for this idea, no prototypes or work in the sense that computer scientists talk about work – i.e. software, algorithms, things that are concrete’, recalled van Dam (1999). What Nelson did have was a vision of what hypertext should look like, and an infectious enthusiasm for the idea. Nelson had in mind an entirely new genre for literature, and he had a new word to describe this vision: hypertext.

Nelson’s vision seduced me. I really loved his way of thinking about writing, editing and annotating as a scholarly activity, and putting tools together to support that… He talked me into working on the world’s first hypertext system and that sounded cool. (van Dam, 1999)

Van Dam gathered a team together at Brown and began work later that year, with the objective of trying out this hypertext concept. He stressed in his communications to me that the idea was never to ‘realize’ Xanadu in its entirety. The intention was much smaller and more circumspect: to ‘implement a part of his vision. We were not able to implement or even understand the breadth and scope of that [larger] vision’ (van Dam, 2011). Nelson, however, was under the impression that his designs would be honored.

Nelson was initially very excited. He went up there ‘at his own expense’ (van Dam, 1988: 87) to consult in the development of HES, but found the experience frustrating. As observed previously, van Dam and his team wanted to explore the hypertext concept, but they also had their own plans for print text editing (which Nelson strenuously opposed). The team set out to design a dual-purpose system for authoring, editing and printing documents such as papers, proposals and course notes, which could also be used to browse and query written materials nonsequentially. From Nelson’s perspective, this was a compromise on his design: it was simulating paper, and hypertext was a mere footnote to the system.

This sentiment is understandable; the reader knows by now that Nelson feels his vision was sidelined. But the HES team were trying to convince the world that the whole
concept of handling text on computers was not a waste of time and processing power. The world knew text handling as a paper-based thing. ‘Not only were we selling hypertext, but at the same time document processing, interaction. Many people were still computing with cards’, recalls van Dam (1999).

HES was set up on an IBM 360/50 with a 2250 display, and ran in a 128k partition of the operating system that controlled the 512k of main memory available (there was a complete time-sharing system operating in another partition). The user sat facing a 12-inch-by-12-inch screen, browsing through portions of arbitrarily sized texts. Original text was entered directly via a keyboard, and the system itself was controlled by pressing function keys and by pointing at the text with a light pen or with the keyboard (Carmody et al., 1969: 4). The activities of the user corresponded directly to the operations normally performed upon text by writers and editors. The user was able to manipulate pieces of text as though they were physical items: correcting, cutting, pasting, copying, moving and filing drafts.

The HES team did not wish to store text in numerical pages or divisions known to users, except as they might deliberately divide text, create links or number headings. Rather than filing by page number or formal code name, HES stored text as arbitrary-length fragments or ‘strings’ and allowed for edits with arbitrary-length scope (for example, insert, delete, move, copy). This approach differed from NLS, which imposed a hierarchical tree structure of fixed-length lines or statements upon all content; Engelbart used 4,000-character limits on his statements to create a tighter, more controlled environment. HES was deliberately made to embody a freewheeling character, as nonstructured as possible.

The system itself comprised text ‘areas’ that were of any length, expanding and contracting automatically to accommodate material. These areas were connected in two ways: by links and by branches. A link went from a point of departure in one area (signified by an asterisk) to an entrance point in another, or the same, area.

The HES team used Ted Nelson’s concept of a hypertext link (though from Nelson’s perspective they ‘flattened’ this by making the jumps one-way), as Doug Engelbart was incorporating the same idea into NLS independently, unbeknownst to van Dam, who wishes he had been aware of this work. ‘I hadn’t heard of Engelbart. I hadn’t heard of Bush and Memex. That came quite a bit later’, van Dam recalls (1999). Links were intended to be optional paths within a body of text – from one place to another. They represented a relationship between two ideas or points: an intuitive concept. Branches were inserted at decision points to allow users to choose ‘next places’ to go.

In early 1968 HES did the rounds of a number of large customers for IBM equipment, for example, Time/Life and The New York Times. All these customers based their business on the printed word, but HES was too far out for them. Writing was not something you did at a computer screen. They had seen programs that set type, and maybe some programs for managing advertisements, but the concept of sitting in front of a computer and writing or navigating text was foreign to them.

The best I ever got was from people like Time-Life and the New York Times who said this is terrific technology, but we’re not going to get journalists typing on computer keyboards for the foreseeable future. (van Dam, 1999)

As we now know, however, in less than a decade journalists (and executives) would be typing on computer keyboards.

In late 1968 van Dam finally met Doug Engelbart and attended a demonstration of NLS at the Fall Joint Computer Conference. As we explored, this presentation was a landmark in the history of computing, and the audience, comprising several thousand engineers and scientists, witnessed innovations such as the use of hypertext, the computer ‘mouse’ and screen and telecollaboration on shared files via videoconferencing for the first time.

For van Dam this system set another, and entirely different, technical precedent.
The line or context editor was old technology. NLS was the prototype for creating, navigating and storing information behind a tube and for having a multiuser, multiterminal, cost-effective system. He went on to design the File Retrieval and Editing System (FRESS) at Brown with his team.

Nelson is adamant that the legacy of HES is modern word processing, and that it also led to today’s web browser.

The design of HES became the design of FRESS… then Intermedia, then imitated by Notecards and then by the World Wide Web. (Nelson, 2012)

Van Dam, for his part, thinks this is overreaching. ‘I think Nelson, when he feels that the “bad example” that HES set had reverberations in the bad design of the Web or browsers, is giving our humble little effort an order of magnitude more credit than it deserves’ (van Dam, 2011).

We do not have space here to go into van Dam’s next system (FRESS): interested readers can find it in my book. I shall stop here, 40 years before the Web, and conclude.

CONCLUSIONS

I hope that this chapter has presented you with some earlier models of the hypertext concept, and in the process, demonstrated that every model has its benefits and its shortcomings. Hypertext is not the Web; the Web is but one particular implementation of hypertext. It’s the best we’ve come up with insofar as it actually works, most of the time – and it has stayed the course for 22 years. It is not, however, the only way hypertext can be done – as the systems described in this chapter show.

We have also inherited a vision from these projects: a device that ‘enables associative connections that attempt to partially reflect the “intricate web of trails carried by the cells of the brain”’ (Wardrip-Fruin, 2003: 35). More precisely, a tool for thought – a tool that might both organize and ‘permanize’ (to use Nelson’s term again) the mass of deeply tangled data that surrounds us. For the world grows more and more complex every day, and the information we are expected to keep track of proliferates at every click. How are we to keep track of it all?

The problem Nelson, Engelbart and van Dam identified is just as urgent today. The Web has *not* solved this problem for us – arguably it has highlighted it, in razor-sharp text and 16.7 million colors. The Web is without doubt a world-changing and world-opening technology; as van Dam put it to me in 1999, ‘the fact that I can reach out and touch stuff in Ethiopia, as it were, is still a surprise to me’ (van Dam, 1999). Unlike any of the hypertext systems we have looked at here, the Web stretches between countries and engages literally billions of people. But there are things we could improve on.

Imagine a system whose trails did not fade. Imagine if documents and objects could be stored permanently, with their own unique address that never vanished, and retrieved at will. Imagine if any version of these documents could be visually intercompared side by side, like the teeth in a zipper – and the quotes or ideas in those documents could be traced back to their original source with a click. Imagine if we could separate the linking structure from the content, and that content could consequently be reused in a million different formats. Imagine if we could capture the deeply tangled structure of knowledge itself, but make it better, make it permanent.

Notes

1  Note that his full name is Theodor Holm Nelson.
2  An anchor tag is not an intrinsic address.
3  While working at Harcourt.

REFERENCES

Bardini, T. (2000) Bootstrapping: Douglas Engelbart, Coevolution, and the Origins of Personal Computing. Stanford, CA: Stanford University Press.
Barnet, B. (2013) Memory Machines: The Evolution of Hypertext. London & New York, NY: Anthem Press.
Carmody, S., Gross, W., Nelson, T.H., Rice, D. and van Dam, A. (1969) ‘A Hypertext Editing System for the /360’. In M. Faiman and J. Nievergelt (eds), Pertinent Concepts in Computer Graphics, 291–330. Urbana: University of Illinois Press.
Demeulenaere, A. (2003) ‘An Uncanny Thinker: Michel de Certeau’ in Image & Narrative, 5.
Engelbart, D. (1962a) ‘Letter to Vannevar Bush and Program on Human Effectiveness’. In J. Nyce and P. Kahn (eds), From Memex to Hypertext: Vannevar Bush and the Mind’s Machine, 1991, 235–44. London: Academic Press.
Engelbart, D. (1962b) ‘Augmenting Human Intellect: A Conceptual Framework’. Report to the Director of Information Sciences, Air Force Office of Scientific Research, Menlo Park, CA, Stanford Research Institute.
Engelbart, D. (1997) Interview with David Bennehum. Online: http://memex.org/meme3-01.html
Engelbart, D. (1988) ‘The Augmented Knowledge Workshop’. In A. Goldberg (ed.), A History of Personal Workstations, 185–249. New York: ACM Press.
Landow, G. (1992) Hypertext. London: Johns Hopkins University Press.
Licklider, J.C.R. (1960) ‘Man-Computer Symbiosis’. In A. Goldberg (ed.), A History of Personal Workstations, 1988, 131–40. New York: ACM Press.
Nelson, T. (1965) ‘A File Structure for the Complex, the Changing and the Indeterminate’. In Proceedings of the ACM 20th National Conference. New York: ACM Press.
Nelson, T. (1987) Computer Lib/Dream Machines. Redmond, WA: Microsoft Press.
Nelson, T. (1992) Literary Machines. Sausalito, CA: Mindful Press.
Nelson, T. (1998) What’s on My Mind. Online: http://www.xanadu.net/xybrap.html
Nelson, T. (2010) Possiplex: Movies, Intellect, Creative Control, My Computer Life and the Fight for Civilization. Available at Lulu.com http://www.lulu.com/spotlight/tandm [March 2012].
Nyce, J. and Kahn, P. (eds) (1991) From Memex to Hypertext: Vannevar Bush and the Mind’s Machine. London: Academic Press.
Rheingold, H. (2000) Tools for Thought: The History and Future of Mind-Expanding Technology (2nd Edition). Cambridge, MA: MIT Press.
Smith, L.C. (1991) ‘Memex as an Image of Potentiality Revisited’. In J. Nyce and P. Kahn (eds), From Memex to Hypertext: Vannevar Bush and the Mind’s Machine, 261–86. London: Academic Press.
Tofts, D. (1998) Memory Trade: A Prehistory of Cyberculture. North Ryde, NSW: Interface Press.
Van Dam, A. (1988) ‘Hypertext ‘87 Keynote Address’ in Communications of the ACM Volume 31 Issue 7, July, 887–95.
Wardrip-Fruin, Noah (2003) ‘Introduction: As We May Think’. In Noah Wardrip-Fruin and Nick Montfort (eds), 2003, 35.
Wardrip-Fruin, Noah and Montfort, Nick (eds) (2003) The New Media Reader. Cambridge, MA: MIT Press.

PERSONAL COMMUNICATIONS

Bolter, Jay (2011) Interview with the author.
Duvall, Bill (2011) Interview with the author (recorded via phone, Melbourne).
Engelbart, Doug (1999) Interview with the author, Baltimore.
English, Bill (2011) Interview with the author (recorded via phone, Melbourne).
Nelson, Ted (1999) Interview with the author at Kyoto University Japan.
Nelson, Ted (2010) Interview with the author at Swinburne University.
Nelson, Ted (2012) Comments on the manuscript for Memory Machines.
Van Dam, Andries (1999) Interview with the author at Brown University.
Van Dam, Andries (2011) Interview with the author via phone.
16
A Historiography of the Hyperlink:
Periodizing the Web through the
Changing Role of the Hyperlink
Anne Helmond

INTRODUCTION: PERIODIZING THE WEB THROUGH THE HYPERLINK

In this chapter I provide a historiography of one of the core elements of the web, the hypertext link. I do so with the specific purpose of tracing the various roles of this central web object as a way to understand social, technical, and commercial transformations of the web. That is, the hyperlink is positioned as a way to historicize larger web developments and as an alternative way to periodize the web.

The dominant way of web periodization is through what Allen calls a ‘discourse of versions’ (2013), established by Tim O’Reilly who labeled the period after the dot-com crash ‘Web 2.0’ (2005), a marketing term he used to claim this period as new and upgraded, whilst at the same time promising a sense of continuity (Allen, 2013: 5). In doing so, O’Reilly invented the idea of Web 1.0 in retrospect and created a perception of the web before, Web 1.0, and after, Web 2.0, the dot-com crash (Allen, 2013). Whilst O’Reilly stresses the continuities between the two periods and is careful to not position the shift as a technological upgrade, he nonetheless creates a juxtaposition by characterizing Web 1.0 as a web for ‘publishing’ and Web 2.0 as a web for ‘participation’ (2005). This supposed shift from Web 1.0 as the ‘read-only’ static publishing web to Web 2.0 as the ‘read-write’ participatory social web has since become the web’s dominant historical narrative (Song, 2010; Allen, 2013; Ankerson, 2015; Helmond, 2015; Stevenson, 2016).

Following Allen’s interpretation and critique of this narrative, Megan Ankerson proposes to ‘put aside the discourse of versions and approach the web and web historiography as a site of ongoing configuration’ by employing Lucy Suchman’s notion of ‘configuration’ as ‘a device for studying how material artifacts and cultural imaginaries are joined together’ and how these are connected to specific production practices (Ankerson, 2015: 3). Like Ankerson’s consideration of web historiography through forms
of alignment and configuration, I approach my periodization of the web through one of the core infrastructural elements of the web, the hyperlink, by tracing its production and use practices within a view of the web as a complex social, techno-commercial configuration. Such a focus does not aim to merely describe the technological progress of the hyperlink as a material artifact, but rather, to situate its development within the diverse practices of end-users, webmasters, bloggers, and developers within the context of the software, platforms, search engines, and apps that are involved in the creation, distribution, and consumption of hyperlinks. At the same time, I consider the hyperlink’s imagined capabilities by the actors involved in its production and distribution. Which social, political, or economic purpose does the hyperlink serve for them? I argue that there is an ongoing – and sometimes conflicting – interpretation of what the hyperlink is or should be, and that there is a constant negotiation between specific production and use practices as well as an unfolding alignment with social and economic incentives.

In what follows, I argue that the hyperlink is not a timeless object and that it may be employed to analyze changes in the social, technical, and commercial configuration of the web. I do so by providing a periodization of the hyperlink based on its historically diverse roles at six key moments (or episodes) that are characterized by distinct devices. I provide a history that discusses these six episodes individually whilst demonstrating their interconnections, thus highlighting how these episodes may run in parallel. First, the proto-hyperlink as envisioned in early hypertext systems; second, the hypertext link as fabric of the web and navigational object in the early pre-search web; third, the hyperlink as the currency of the web in the heyday of the search engine era; fourth, the role of hyperlinks in building the blogosphere and the introduction of new link types; fifth, the effects of platformization on the hyperlink by social media platforms turning the link into an analytical device for data-harvesting; and sixth, the disappearance of the ‘traditional’ hypertext link as the prime connection mechanism with the rise of mobile apps.

TRACING THE HYPERLINK

The history of the hyperlink as a reference and navigational object is often traced back to the footnote (De Maeyer, 2013), which functions as a way to explicitly link to additional information and source material within a text (Grafton, 1997), and to even older forms of intertextual connections (Halavais, 2008; Brügger, 2017). In his genealogy of the hyperlink, Niels Brügger positions the hyperlink within the longer and broader history of links as mechanisms for interconnecting pieces of text (2017). He identifies three phases in his history of intertextual linking and observes how each period is characterized by a specific type of media: first, links in non-digital media such as clay tablets, printed books, and mechanical systems; second, links in stand-alone digital computers; and third, links in networked digital computers (2017). The history of the link presented here mainly focuses on the third period, which is characterized by the advent of interconnected computer systems, digital hypertext systems, and the rise of the World Wide Web. In particular I focus on the role of search engines, blog software, social media platforms, and apps in the evolution of the role of the hyperlink on the web and the mobile space.

On the web hyperlinks have evolved beyond citations to serve a wide range of social and technical functions (Halavais, 2008). Thus, instead of thinking about hypertext and hyperlinks as ‘remediations’ of older media types and linkages (cf. Bolter, 1990), I follow Elmer’s call that if we wish to understand links and linking on the web ‘we must also come to grips with the specific architectural logic of the web’ and consider the hyperlink within its web-native environment (2001). That is, I
consider the medium-specificity of the hyperlink as a web-native object (Helmond, 2013; Rogers, 2013) by focusing on the infrastructure of the web and the actors involved in the production and distribution of hyperlinks. In what follows, I discuss the changing purpose, use, and function of the hyperlink over time in six episodes, each characterized by a number of human and non-human protagonists that have played a major role in the history of the hyperlink.

EPISODE 1. THE PROTO-HYPERLINK IN EARLY HYPERTEXT SYSTEMS

Most commonly, the lineage of the hypertext link is traced back to pre-web hypertext systems (Barnet, in this volume; Elmer, 2001), as a way to interconnect and make accessible our ephemeral human knowledge (Barnet, 2014). In her history of hypertext, Barnet (in this volume; 2014) describes different visions, models, and implementations of hypertext systems before the web in order to emphasize that the web is but one famous realization of a hypertext system. It is beyond the scope of this chapter to describe the development of these systems at length. The history of hypertext has been extensively discussed elsewhere by scholars from literature studies, game studies, information science, and new media studies (see Bolter, 1990; Landow, 1991; Aarseth, 1997; Elmer, 2001; Kirschenbaum, 2001; Wardrip-Fruin, 2004; Krapp, 2006; Halavais, 2008; Barnet, in this volume; Barnet, 2014; Wright, 2014a; Brügger, 2017).

However, it is important to briefly consider a number of proto-hypertext systems and the types of interconnections they conceived to understand links as part of technical infrastructures for organizing and distributing information. One of the earliest proto-hypertext systems was Paul Otlet’s Mondothèque, a device for researchers, providing one of the many visions on how to organize the world’s information (Rayward, 1994; Wright, 2014a; Wright, 2014b). Otlet aimed ‘to promote universal access to human knowledge through a global information network that he dubbed the “Mundaneum”’ (Wright, 2014a: 5). Knowledge was stored in and connected to the universal library of the Mundaneum via a personalized multimedia workstation called the ‘Mondothèque’ (van den Heuvel & Rayward, 2011; Wright, 2014a). The Mondothèque, which remained an unrealized prototype, should be understood as ‘more than just a platform for consuming information; it was an active tool for knowledge production’ (Wright, 2014a: 235) enabling users to create links between various types of media documents and pieces of knowledge. Wright sees Otlet’s Mondothèque as the forgotten European precursor to Vannevar Bush’s Memex, a proto-hypertext system in which (hyper)links were conceptualized as associational markers to establish ‘trails’ between documents (Bush, 1945). The Memex aimed to organize, sort, and structure the world’s knowledge via associative indexing. However, what distinguishes Otlet’s vision from Bush’s is that ‘Otlet saw the network as essential to his vision of a worldwide platform for knowledge sharing; Bush envisioned the Memex as a stand-alone machine’ (Wright, 2014b). Otlet’s and Bush’s devices were analog machines that relied on microfilm for the storage of data, but both remained unbuilt prototypes. The advent of the (networked) computer introduced new ideas about organizing and connecting the world’s information. Ted Nelson and Doug Engelbart were two pioneers working on early hypertext systems before the advent of the web.

The word ‘hyperlink’ has been attributed to Ted Nelson, who worked on the first hypertext system, Project Xanadu, in the 1960s. The system was never entirely built but a number of partially functioning Xanadu prototypes have been released since the late 1990s (Nelson, 2016). Nelson coined the word ‘hypertext’ (Nelson, 1965) in a paper outlining an evolutionary file structure for an
automatic personal filing system, in which he quotes Bush’s ideas about the Memex at length. Nelson used the word hypertext to describe how text within the system could be accessed and retrieved in a nonlinear fashion whereby readers could follow hypertext links to read additional information. A noteworthy core element of Nelson’s Xanadu was the idea of unbreakable two-way links as a way to implement ‘transclusion’, a type of interlinking of ‘the same thing knowably and visibly in more than one place’ (Nelson, 1999: 1). Around the same time as Nelson was working on Xanadu, Doug Engelbart was creating the oN-Line System (NLS), inspired by Bush’s Memex. NLS was a system for cross-referencing, based on the basic premise that you could link to anything and to anywhere through the Link function (Barnet, in this volume). The NLS system had fine-grained linking capabilities and links within the system represented a relationship (Barnet, in this volume).¹ Both Xanadu and NLS were never adopted on a wide scale and it wasn’t until the late 1980s, when Apple started selling computers with a pre-installed version of HyperCard, a piece of hypermedia software, that hypertext became available to the larger public and became ‘de facto the archetype of what hypertext – and hyperlinking – was’ (Barnet, 2014: xxii; Brügger, 2017). With the advent of the World Wide Web (the web), the ideas of hypertext and hyperlinking would be further exposed to a larger public.

EPISODE 2. THE HYPERLINK AS FABRIC OF THE WEB

In his book Weaving the Web on the history of the web, Tim Berners-Lee acknowledges the ideas of Bush, Nelson, and Engelbart in developing a hypertext system (2000: 5–6). He describes how his creation was also inspired by the organizing structures of the mind in terms of making and storing connections through (random) associations (2000: 10).

Berners-Lee describes his task as being to ‘marry’ the development of the internet with early ideas of hypertext and hypertext links (2000: 6) to form ‘a web of information’ (2000: 4). His vision of the web was inspired by the following idea: ‘Suppose all the information stored on computers everywhere were linked […]. Suppose I could program my computer to create a space in which anything could be linked to anything. […]’ (2000: 4).

To create such an interlinked information space, Berners-Lee created the Universal Resource Identifier (URI) to identify any object on the web, the Hypertext Transfer Protocol (HTTP) for transferring hypertext, and the Hypertext Markup Language (HTML) as a common language or ‘basic lingua franca’ for writing and representing hypertext pages (2000: 36–40). Berners-Lee considered the URI the ‘most fundamental innovation of the web’ because the hypertext link contains the destination URI telling the browser where to find the document (2000: 37). HTML in its turn functions as ‘the fabric’ (Berners-Lee et al., 1994) or ‘connective tissue’ (Berners-Lee, 1998) of the web. One of the central elements of HTML is the hypertext link as ‘the basic hypertext construct’ enabling connections between different resources on the web (W3C, n.d.).

Hypertext links, commonly referred to as hyperlinks, web links, or just simply links, allow the creation of interconnections between websites, web pages, and other web objects. As such, they bear the capacity to weave disparate entities of the web together into ‘a single, global information space’, which Berners-Lee called the World Wide Web (2000: 4). Universality was key to the design of the web as ‘[a] hypertext link must be able to link to anything’ (107) and anyone should be able to create links. Berners-Lee considered ‘the right to link’ as ‘the very basic building unit for the whole Web’ (139). Similar to, but distinct from, Nelson’s implementation of the two-way link, Berners-Lee contemplated designing the web with bidirectional links (Berners-Lee, n.d.) but these
were never implemented as part of an architectural decision because mono-directional links enabled the web to scale quickly (Berners-Lee, 1998). Another important aspect in the design of the link was that whilst Berners-Lee saw that it could be used as ‘a way of transmitting judgements of quality’, he made clear that ‘the intention in the design of the web was that normal links should simply be references, with no implied meaning’ (Berners-Lee, 1997) and that the hyperlink does not imply any type of endorsement (2000: 139).

Key to the initial popularization of hyperlinks was the introduction of graphical browsers such as Mosaic, which indicated clickable hyperlinks in blue-colored underlined text, which became the de facto standard for displaying links on the web (Weinreich et al., 2001). Whilst anyone could potentially create hyperlinks by learning HTML, in these early days of the web this act was reserved for webmasters who could make and run their own websites. With the advent of HTML editors, content management systems, blog software, and free website services such as Tripod and Geocities, it became easier to create websites and to link to other sites. This period of mass amateur website building, which Olia Lialina refers to as the ‘vernacular web’, was driven by a do-it-yourself ethos with very few design rules – except for valid HTML – and led to an explosion of websites with an amateur aesthetic (Lialina, 2009: 27). According to Lialina this web was ‘fascinated by the power of links’ and ‘people felt it was their responsibility to configure the environment and build the infrastructure’ by linking to other pages and by listing external link lists on their websites (27). In addition to building the web’s infrastructure, people also built communities through links by engaging in ‘web rings’, a way of interlinking websites with similar topics (Elmer, 1999; Lialina, 2009: 27). These web rings also contributed to the discoverability of new websites and offered a subject-based index of a set of websites (Elmer, 1999).

With the growth of the web it became more and more difficult to find interesting and relevant websites. This led to the creation of new services that aimed to index the web (Halavais, in this volume), with search engines like Aliweb, Excite, and AltaVista, expert directories of links² like DMOZ and the Yahoo! Directory, and portals like MyNetscape and My Yahoo!, which positioned themselves as entry points to the web by providing professionally curated links. These mostly new commercial actors made money off hyperlinks through the strategic placement of links or ‘featured links’ on their websites (Hargittai, 2000; Rogers, 2002). Thus, hyperlinks have given rise to an industry of new commercial actors on the web such as content aggregators, web portals, and search engines (cf. Dellarocas et al., 2013: 2360). A new company, Google, would turn hyperlinks and the interlinked infrastructure of the web into a very successful business model.

EPISODE 3. THE HYPERLINK AS CURRENCY OF THE WEB

An important moment in the history of both the web and the hyperlink has been the introduction and rise of search engines. Here, I focus on the role of Google as the search engine that ‘built its empire from an appreciation of the Net’s underlying architecture’, as it conceived of a ‘web/info-centric business model that was built upon the harvesting and crawling of hyperlinks’ (Elmer, 2006: 9). What distinguished Google from previous search engines was that it employed hyperlinks as a way to calculate the relevance and reputation of a site, by considering – amongst other factors – who links to whom and how often a site is linked to (Brin and Page, 1998). Google’s PageRank algorithm calculates the relevance of a website based on the quality or importance of other websites linking to it (Brin and Page, 1998) and as a result
not all hyperlinks have equal value for Google. Whilst Google currently takes over 200 signals into account, PageRank is still an important factor in determining a site’s relevance (Search Console Help, n.d.).

Google’s ‘industrialization’ of the hyperlink (Turow, 2008: 3) has created the so-called ‘link economy’ (Rogers, 2002; Walker, 2002) in which search engines determine the value of links and where links are exchanged, bought, or sold by webmasters and spammers. Google turned the link from a navigational object into ‘the currency of the web’ by interpreting links as ‘objective, democratic and machine-readable signs of value’ (Walker, 2002: 72). Google’s business model is built on the ‘open web’, a web built on open standards and where links can be followed from one page to another and indexed by search engines. The foregrounding of the link as a sign of value and medium of exchange raises questions about the actors involved in the production, indexation, and distribution of links. In this sense, the hyperlink provides a way to understand the early commercialization of the web. Accounts of this commercialization have traditionally focused on businesses ‘rushing’ to the web to build websites as a form of presence (cf. Rogers, 2002); the account given here focuses instead on the political economy of the link. Such a perspective considers its role as a currency (Walker, 2002) and draws attention to the rise of new web-specific services that monetize the hyperlink as the core feature of the web, and the rise of new linking practices and strategies. With the new role of the link as the currency of the web, actors involved in creating hyperlinks became aware of their strategic value, leading to a whole new industry of manipulation around search engines to achieve higher rankings (Halavais, 2008: 49–50). Next, I elaborate on the political economy of linking and the constant negotiation between those creating links and those indexing and distributing links by focusing on the role of links and the practices of linking in the blogosphere and the creation of new types of links.

EPISODE 4. BUILDING THE BLOGOSPHERE AND THE INTRODUCTION OF NEW LINK TYPES

Besides directories, web portals, web rings, and search engines trying to organize the web in the late 1990s, there were also very important individual actors such as webmasters and bloggers who would point other users to interesting websites and pages through link lists on their websites and blogs. Blogs are often seen as a specific genre of websites (Siles, in this volume),³ defined by their form as ‘frequently updated, reverse-chronological entries on a single Web page’ (Blood, 2004: 53). These entries, called ‘posts’, constitute the main units or building blocks of the blog, in contrast to the ‘page’ unit of the website. In order to be able to link to a single post, instead of a whole page, blogger Jason Kottke started implementing permanent URLs for each entry on his blog (Kottke, 2000). This idea was quickly taken up by developers from the blog software company Blogger, who implemented ‘permalinks’ into their software to give ‘each blog entry a permanent location – a distinct URL – at which it could be referenced’ (Blood, 2004: 54). This new link type was quickly adopted by other bloggers and blog software developers and the permalink became ‘a canonical component of the standard Weblog entry’ (Blood, 2004: 54). Early blogger Rebecca Blood argues that the ‘original weblogs were link-driven sites’ (2000) and that the prototypical blog focuses on linking to other blogs to provide commentary on interesting blog posts (Blood, 2000; Herring et al., 2005). But links did not only play an important role within a blog entry; bloggers also often created ‘blogrolls’ to link to other interesting blogs in the sidebar of their blogs. All these links between blogs create an interconnected network of blogs called ‘the blogosphere’, although the degree of its interconnectedness might be overestimated (Herring et al., 2005). The ritual practices of linking stabilized the blog as a web
format and created a sense of community as bloggers would start referring to themselves as the ‘weblog community’ (Ammann, 2009; Siles, 2011: 745).

Before the advent of blog software, which originated and gained popularity between 1997 and 1999, bloggers would manually create hyperlinks to link to other blogs (Blood, 2000; Helmond, 2008). The practice of blogging and linking was made easier with the implementation of linking features into blog software and What You See Is What You Get (WYSIWYG) editors (Blood, 2004: 54). Hereafter, bloggers would only have to insert a URL and the blog software would automatically generate the corresponding HTML code for the hyperlink. Blog software also enabled automated interlinking between blogs with the creation of two new link types that were built on top of the permalink: the trackback and pingback. Blog software can ‘ping’ or notify other blog software of an incoming link by automatically ‘placing a reciprocal link – a trackback – in the entry they have just referenced’ (Blood, 2004: 55). Trackbacks are a semi-manual type of notification and interlinking system since bloggers have to manually send the trackback notification from within the software’s interface. Receiving trackbacks could be automated since blog software can be configured to automatically receive and place trackbacks underneath a post. Unfortunately, trackbacks turned out to be very prone to spam since the receiving software did not verify the incoming link and incoming links could therefore very easily be faked. Spammers used trackbacks to artificially boost the ranking of their sites in search engines. Pingbacks were developed to partially solve the problems of trackbacks. Pingbacks automatically send notifications to other blogs and the receiving blog software verifies the incoming link. Both trackbacks and pingbacks (semi-)automatically interlink blogs and render the links between blogs visible on both ends, resembling the idea of a two-way link. Trackbacks and pingbacks draw attention to the changing configuration of actors involved in the production and distribution of links. Blog software introduced and standardized new types of links such as the permalink, the trackback, and pingback, and became an active agent in creating links between blogs and the interlinking of blogs into a blogosphere (Helmond, 2008; Weltevrede and Helmond, 2012; Helmond, 2013).

Another important blog feature that highlights the constant negotiations taking place between various actors involved in creating, distributing, and exploiting links is the blog comment. Whilst previously the act of linking was mainly reserved for webmasters and bloggers,⁴ blog comments underneath blog posts opened up the blog to links from blog visitors. This form of user-generated linking expanded the participatory ideals of blogging as a way ‘to democratize publishing’ (Blood, 2004: 55) by also giving blog readers a voice. However, opening up the comment space made blogs even more prone to spam. Spammers would not only target the trackback mechanism but also the comment space with links to boost their own rankings. This time, a solution to combat link spam was developed by Google, the key actor in the link economy, which suffered from spam in its search engine results as a result of the new linking capabilities. On January 18, 2005, Google announced its measure to prevent comment spam, a new hyperlink attribute called ‘nofollow’. In HTML, elements can have parameters called attributes: the <a> element defines the hypertext link, the ‘href’ attribute defines the destination of a link, and the ‘rel’ attribute specifies ‘the relationship between the document containing the hyperlink and the destination resource’ (W3C, 2014). Google introduced the ‘nofollow’ value as part of the hyperlink’s ‘rel’ attribute to indicate a particular type of relationship between the source and destination link: ‘From now on, when Google sees the attribute (rel=“nofollow”) on hyperlinks, those links won’t get any credit when we rank websites in our search results’
(Cutts and Shellen, 2005). No credit meant no value for a website’s PageRank, severely diminishing the intended goal of the spammers. The nofollow attribute was introduced in collaboration with a number of partners, including major blog software developers such as Blogger and WordPress, who implemented it into their software so that every link in a blog comment would automatically receive the nofollow attribute. The effect of nofollow is that, in Google’s world, not all links have equal value, since links in comments with the nofollow attribute do not pass on value for a site’s PageRank. The role of ‘nofollow’ in preventing spam is still a pressing issue today, as Google regularly warns webmasters to follow their linking guidelines and to use ‘nofollow’ where appropriate or else they may see their sites’ rankings penalized (Google Webspam Team, 2017).

EPISODE 5. THE EFFECTS OF PLATFORMIZATION ON THE HYPERLINK

Whilst the hyperlink has technically not changed since its inception, in the previous episodes I have demonstrated how search engines and blog software have appropriated or added features to the hyperlink for their own purposes. This next episode discusses the implications for the hyperlink of the ‘platformization’ of the web (Helmond, 2015), a term used to refer to the rise of social media platforms and the consequences of their extension into the web. It provides a way to conceptualize how social media platforms have imposed their own infrastructural and economic model onto the web by repurposing web-native objects, such as the hyperlink, for their own gain. I discuss the consequences of the rise of social media for the hyperlink by addressing three issues: first, the further automation of linking through social buttons; second, the reconfiguration of the hyperlink through social buttons to fit the underlying business models of the associated social media platforms; and third, the increasing invisibility of the hyperlink in social media.

With the introduction of social bookmarking icons, now commonly known as social buttons, social media platforms have developed ways to facilitate easy link sharing across platforms (Helmond, 2013).⁵ Sharing a website article using a social button such as Facebook’s Share button automatically creates a status update displaying a preview of the link’s content: the article’s title, subtitle, image, domain name, and author.⁶ The status update with the link preview does not show the linked URL but instead is reduced to the URL’s domain name, which then links to the individual article. Sharing an article directly on Facebook is handled slightly differently since posting a link both shows the full URL in the status update and creates a link preview underneath the post. The automation of link sharing via social buttons creates a two-way link:

Facebook also employs linking mechanisms that closely resemble the bidirectional links proposed by Bush, Nelson, and so on. Whereas a typical Web hyperlink usually goes in one direction – from source document to target hyperlink – links on Facebook typically work both ways; a comment or ‘like’ interaction will show up in multiple threads; and a user may follow a link back to see the original commenter’s page. (Wright, 2014a: 290)

However, Facebook does not support the traditional linking practice of creating a hyperlink by linking a word in a Facebook post to a specific URL destination on the web. The only ‘links’ you can create are to internal Facebook items such as users and pages (Berners-Lee, 2010). Software developer Dave Winer is a strong advocate of implementing hyperlinking into Facebook. The current implementation, he argues, is not a form of linking and is ‘really hurting the rest of the web’ by not allowing users to freely link to other places on the web (Winer, 2016). Tim Berners-Lee has also expressed his concerns in regard to websites and web services that develop their own linking
mechanisms that do not interoperate with the web’s open standards and formats (Berners-Lee, 2010). He argues that they are undermining the web’s principle of universality meaning that you can link to anything, where ‘the URI is the key to universality’ (2010). He explains how social media platforms are becoming silos by walling off their information: ‘The isolation occurs because each piece of information does not have a URI. Connections among data exist only within a site’. Berners-Lee warns that ‘[t]he more this kind of architecture gains widespread use, the more the Web becomes fragmented, and the less we enjoy a single, universal information space’ (2010: 82). Links are essential in weaving the web into a single place and social media platforms are prohibiting a specific type of linking practice, the traditional hyperlink, using instead their own forms of interlinking. In addition, many social media platforms also reconfigure links posted to their platforms so that they fit the underlying platform architectures and business models. They do so by turning links shared to their platforms into shortened URLs, a new type of link which gained popularity on Twitter. On Twitter, users can only use up to 140 characters for their tweets, which means that posting a link will take up a significant number of valuable characters. Users turned to URL shorteners such as TinyURL to shorten their long links into shortened links, e.g. http://tinyurl.com/newlink. In June 2011 Twitter built its own URL shortener t.co into its platform architecture and from that moment onwards every link posted to Twitter is automatically shortened into a t.co URL.[7] Twitter hides this transformation of the hyperlink from its users, by displaying an abbreviated version of the original long link in the front-end, the platform’s user interface as visible in the browser, whilst using the actual shortened link for a variety of different purposes in the back-end, the platform’s data infrastructure.[8] By routing all links through its platform, Twitter can not only detect and filter out spam links but it can also collect valuable link data. Twitter can now easily track how many times a link has been shared, clicked on, by whom, and when, on its platform. But it can also gather many of these statistics when the link is distributed outside of its platform boundaries. As I have argued elsewhere, platforms have turned the link into an analytical device by repurposing it into a data-rich shortened URL:

    By automatically wrapping links in tweets with a t.co URL, Twitter makes this shared data on its platform ‘algorithm ready’ by reconfiguring the hyperlink to fit the platform. The automatic processing of the hyperlink and its reconfiguration into an analytical device in order to become part of an algorithmic system is what I refer to as the algorithmization of the hyperlink. (Helmond 2013)

On social media platforms, the link no longer just functions as a navigational device where a link in a tweet points to another location on the web, but also provides valuable data such as link statistics. This data serves as input for the platform’s various algorithms that determine relevant and trending content but is also used by Twitter’s marketing partners for their own commercial purposes such as advertising (Helmond, 2013). Platforms have appropriated the link as an object to measure web activities, attention, and other forms of engagement to fit their own economic agendas.

Next, I briefly turn to the current state of web development to reflect on the decline of the hyperlink and the introduction of new link types such as mobile deep links and app links with the rise of mobile apps.

EPISODE 6. BEYOND THE WEB: MOBILE APPS AND DEEP LINKING

The previous sections have argued that in order to understand the evolution of the hyperlink and associated linking practices we should consider the actors involved in the production and reconfiguration of the hyperlink. In the changing political economy of linking a variety of actors has been involved
over time: from webmasters, bloggers, and social media users to search engines, blog software, and social media platforms, to only name a few. I have argued that in order to understand what the hyperlink is, can do, or is made to do, we should consider its role within the transforming socio-techno-commercial environment of the web. When considering the present role of the hyperlink, its evolution and implementation should also be considered beyond the web. That is, with the introduction of mobile phones and the increasing ubiquity of mobile network data, in recent years we have seen the development of the mobile web (Goggin, in this volume) with pages specifically built for or adjusted to smaller screens and lower bandwidth. However, instead of building mobile web pages or services many companies have built dedicated mobile apps. For many, apps have become the main entry point to the web and internet-based platforms (comScore, 2016). Whilst the mobile web is based on the same standards as the web, and links between pages are created using hyperlinks, apps employ proprietary linking mechanisms. They use so-called mobile or app ‘deep links’ to enable app interlinking but these deep links are not based on open standards. In his defense of the open web, Berners-Lee describes how deep linking in apps works and the consequences of this type of linking: ‘Apple’s iTunes system, for example, identifies songs and videos using URIs that are open. But instead of “http:” the addresses begin with “itunes:”, which is proprietary. You can access an “itunes:” link only using Apple’s proprietary iTunes program’ (2010: 83). Apps do not employ open link standards, which, similar to the practices of social media platforms, turns them into stand-alone objects or little unlinkable islands that often function separate from the web, thereby turning them into ‘closed worlds’ or centralized ‘walled gardens’ (Berners-Lee, 2010: 83).

To create interconnections between apps and the web, companies such as Apple, Google, and Facebook have introduced new link types for linking between apps and for linking to specific content within apps. Facebook launched App Links at its F8 Developer Conference in 2014 as ‘an open cross platform solution for deep linking to content in your mobile app’ (Facebook for Developers, n.d.). For Facebook, these App Links also provide a way to gather app analytics such as ‘traffic and usage information’ on who is using your app (Facebook for Developers, n.d.), thereby positioning App Links as both navigational and analytical devices. In May 2015 Google announced its own ‘App links, along with App Indexing for Google search’ (Eason, 2015). A month later, during the WWDC, Apple also announced its own version, entitled ‘Universal Links’ (Apple Developer, n.d.-a), seemingly referring to the universality principle of hyperlinks on the web. Similar to websites and web pages, which are linked or ‘woven’ together by hyperlinks to create the World Wide Web, Apple explicates how ‘an app exists as part of an ecosystem’ of other apps (Apple Developer, n.d.-b). Universal Links are Apple’s way of linking apps into an ecosystem similar to but separate from the web. Whilst these current developments may aim to address the current barriers of app-interlinking, they are not providing a universal solution.[9] With the introduction of deep links, Facebook’s App Links, Google’s app links, and Apple’s Universal Links these companies have contributed their own – mostly proprietary – solutions for app linking (and indexing) and have created a proliferation of new app link standards outside of the scope of W3C, which oversees open web standards such as HTML and the hypertext link. In addition, deep links provide not only an app-native way of linking, but also an app-native way to track users, and to collect valuable information for advertisers. At the same time this new link type points to the diminishing role of the hyperlink in the app space as a universal interconnector.
CONCLUSION

The hyperlink as one of the core infrastructural elements of the web has a long and rich history – not only if we consider its potential precursors of intertextual links and its early employment in proto-hypertext systems, but also, and especially, its implementation on the web. In this chapter I have provided a historiography of the hyperlink by narrating its role in six episodes in the history of the web. These episodes demonstrate how the hyperlink has evolved from a feature to interlink information within technical knowledge infrastructures such as the Mondothèque, the Memex, Xanadu, and the World Wide Web, to a feature that may be used to collect valuable information about the interlinked structure of the web as well as about the people clicking and viewing those links. That is, the hyperlink’s long-established role as a navigational tool has been supplemented with its new function as an analytical device for data-harvesting. As such, the evolution of the hyperlink provides an important entry point into understanding the increasing commercialization of the web. The way in which the hyperlink has been employed socially by users and economically by search engines, social media platforms, and apps has changed significantly. By tracing how these actors have handled and appropriated the hyperlink over time we gain insights into its novel additional functions on the web and in the app space.

The six episodes focused on the relation between the main actors involved in the production, distribution, and valorization of hyperlinks. These protagonists included human actors, such as webmasters, bloggers, blog software developers, and technology entrepreneurs, as well as non-human actors such as blog software, search engines, social media platforms, apps, and the companies behind them such as Google, Facebook, Twitter, and Apple. Each of these actors comes with its own very specific ideas, agendas, or economic incentives of what the hyperlink is, could, or should be. That is, there is a constant interplay between the material object of the hyperlink and the various actors involved in its production, distribution, and valorization. Instead of providing a lineage or teleological account of the hyperlink, the six episodes focused on the changing constellations of actors and how they employ the hyperlink for their own social, technical, and economic incentives. The episodes demonstrate a number of continuities and discontinuities in the role of the hyperlink. First, the hyperlink is still a core element of the web and a central mechanism for creating and navigating connections between web pages and sites. On the one hand links are created manually by webmasters, bloggers, and other types of web users, and, on the other hand, their production has become increasingly (semi-)automated with the introduction of blog software and social buttons. As a consequence, the way in which the web becomes interconnected has changed. Second, social media platforms and companies involved in app development have created their own interconnection and interlinking mechanisms, their own types of links, which function similarly to, but are different from, the traditional hyperlink. As a result, social media platforms and in particular apps are seen as walled gardens which operate on their own logic of connection, largely disconnected from the World Wide Web at large. Custom linking mechanisms are not only developed for creating connections, but especially for tracing and tracking the movement of users and their data through social media platforms and apps.

By focusing on the political economy of linking over time, I have drawn attention to the increasing commercialization of the hyperlink. The hyperlink has seemingly lost its innocence since it is no longer just a way of navigating the web but has also become an object that can be monetized and that can be employed to track users. Linking is no longer just ‘weaving the web’ but contributing to an infrastructure of valuation.
Notes

1  See Barnet in this volume for more detail on the role of links and types of linking in Doug Engelbart’s oN-Line System (NLS) and Ted Nelson’s Xanadu.
2  Berners-Lee’s first website at CERN also contained two directories of websites, one organized by subject and the other by type of service, see: http://info.cern.ch/hypertext/DataSources/Top.html (Accessed: 8 March 2017).
3  See danah boyd’s (2006) critique of defining blogging as a form or genre – she argues that, instead, it should be seen as both a medium and a practice.
4  As well as users from web fora, message boards, and other early forms of web-based communication groups.
5  See Helmond (2013) on the history of social buttons and the technical details of link automation.
6  Which elements are visible in the preview depends on whether or not the website with the Share button has optimized itself for Facebook’s crawler by employing Open Graph tags, Facebook’s markup language.
7  See Helmond (2013) on the history of link shorteners, the technical details behind link shortening, and the role of shortened URLs in social media.
8  Technically this means that the browser displays the long URL, whilst the short URL is coded into the underlying HTML.
9  For example, Apple’s Universal Links only works with iOS 9 or higher and not with Android apps, thereby only interlinking Apple apps into an ecosystem.

ACKNOWLEDGMENTS

The author wishes to thank Fernando van der Vlist, the reviewer, and the editors for their valuable comments on this chapter. This work is part of the research program Innovational Research Incentives Scheme Veni with project number 275-45-009, which is (partly) financed by the Netherlands Organisation for Scientific Research (NWO).

REFERENCES

Aarseth, E.J. (1997) Cybertext: Perspectives on Ergodic Literature. Baltimore: Johns Hopkins University Press.
Allen, M. (2013) ‘What was Web 2.0? Versions as the dominant mode of internet history’, New Media & Society, 15(2): 260–275. doi: 10.1177/1461444812451567.
Ammann, R. (2009) ‘Blogosphere 1998: Analysis’, Tawawa, 5 November. Available at: http://tawawa.org/ark/2009/11/5/blogosphere-1998-analysis.html (Accessed: 13 January 2013).
Ankerson, M.S. (2015) ‘Social media and the “read-only” Web: Reconfiguring social logics and historical boundaries’, Social Media + Society, 1(2): 1–12. doi: 10.1177/2056305115621935.
Apple Developer (n.d.-a) Seamless Linking to Your App, WWDC 2015 – Videos. Available at: https://developer.apple.com/videos/play/wwdc2015/509/ (Accessed: 4 March 2017).
Apple Developer (n.d.-b) Universal Links for Developers, Apple Developer. Available at: https://developer.apple.com/ios/universal-links/ (Accessed: 4 March 2017).
Barnet, B. (2014) Memory Machines: The Evolution of Hypertext. London: Anthem Press.
Berners-Lee, T., Cailliau, R., Luotonen, A., Nielsen, H.F., and Secret, A. (1994) ‘The World-Wide Web’, Communications of the ACM, 37(8): 76–82. doi: 10.1145/179606.179671.
Berners-Lee, T. (1997) ‘Links and law’, Commentary on Web Architecture, April. Available at: http://www.w3.org/DesignIssues/LinkLaw.html (Accessed: 29 October 2012).
Berners-Lee, T. (1998) Web Architecture from 50,000 feet, W3C. Available at: https://www.w3.org/DesignIssues/Architecture.html (Accessed: 17 February 2017).
Berners-Lee, T. (2000) Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web. New York: Harper Business.
Berners-Lee, T. (2010) ‘Long live the Web: A call for continued open standards and neutrality’, Scientific American, December. Available at: http://www.scientificamerican.com/article/long-live-the-web/ (Accessed: 24 March 2014).
Berners-Lee, T. (n.d.) Topology, W3 Archive 1990. Available at: https://www.w3.org/DesignIssues/Topology.html (Accessed: 17 February 2017).
Blood, R. (2000) ‘Weblogs: A history and perspective’, Rebecca’s Pocket, 7 September. Available at: http://www.rebeccablood.net/essays/weblog_history.html (Accessed: 14 March 2014).
Blood, R. (2004) ‘How blogging software reshapes the online community’, Communications of the ACM, 47(12): 53–55.
Bolter, J.D. (1990) Writing Space: The Computer, Hypertext, and the History of Writing. Hillsdale: Routledge.
boyd, danah (2006) ‘A blogger’s blog: Exploring the definition of a medium’, Reconstruction, 6(4). Available at: http://www.danah.org/papers/ABloggersBlog.pdf (Accessed: 27 September 2012).
Brin, S. and Page, L. (1998) ‘The anatomy of a large-scale hypertextual Web search engine’, Computer Networks and ISDN Systems, 30(1): 107–117.
Brügger, N. (2017) ‘Connecting textual segments: A brief history of the web hyperlink’, in Brügger, N. (ed.) Web 25: Histories from the first 25 Years of the World Wide Web. New York: Peter Lang Publishing, pp. 3–28.
Bush, V. (1945) ‘As we may think’, The Atlantic, July. Available at: http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/ (Accessed: 29 October 2012).
comScore (2016) The 2016 U.S. Mobile App Report. Available at: http://www.comscore.com/Insights/Presentations-and-Whitepapers/2016/The-2016-US-Mobile-App-Report (Accessed: 5 March 2017).
Cutts, M. and Shellen, J. (2005) Preventing comment spam, Official Google Blog. Available at: http://googleblog.blogspot.nl/2005/01/preventing-comment-spam.html (Accessed: 29 October 2012).
Dellarocas, C., Katona, Z., and Rand, W. (2013) ‘Media, aggregators, and the link economy: Strategic hyperlink formation in content networks’, Management Science, 59(10): 2360–2379.
De Maeyer, J. (2013) ‘Towards a hyperlinked society: A critical review of link studies’, New Media & Society, 15(5): 737–751. doi: 10.1177/1461444812462851.
Eason, J. (2015) ‘Android M Developer Preview & Tools’, Android Developers Blog, 28 May. Available at: https://android-developers.googleblog.com/2015/05/android-m-developer-preview-tools.html (Accessed: 4 March 2017).
Elmer, G. (1999) ‘Web rings as computer-mediated communication’, CMC Magazine, January. Available at: http://www.december.com/cmc/mag/1999/jan/elmer.html (Accessed: 27 February 2017).
Elmer, G. (2001) ‘Hypertext on the Web: The beginnings and ends of Web Path-ology’, Space and Culture, 10: 1–14.
Elmer, G. (2006) ‘Re-tooling the network: Parsing the links and codes of the Web world’, Convergence: The International Journal of Research into New Media Technologies, 12(1): 9–19. doi: 10.1177/1354856506061549.
Facebook for Developers (n.d.) App Links – Documentation, Facebook for Developers. Available at: https://developers.facebook.com/docs/applinks/analytics (Accessed: 14 February 2017).
Google Webspam Team (2017) ‘A reminder about links in large-scale article campaigns’, Official Google Webmaster Central Blog, 25 May. Available at: https://webmasters.googleblog.com/2017/05/a-reminder-about-links-in-large-scale.html (Accessed: 26 May 2017).
Grafton, A. (1997) The Footnote: A Curious History. Cambridge: Harvard University Press.
Halavais, A. (2008) ‘The hyperlink as organizing principle’, in Turow, J. and Tsui, L. (eds) The Hyperlinked Society. Ann Arbor: The University of Michigan Press, pp. 39–55.
Hargittai, E. (2000) ‘Open portals or closed gates? Channeling content on the World Wide Web’, Poetics, 27(4): 233–253.
Helmond, A. (2008) Blogging for engines. Blogs under the influence of software-engine relations. University of Amsterdam. Available at: http://www.annehelmond.nl/2008/09/23/blogging-for-engines-blogs-under-the-influence-of-software-engine-relations/ (Accessed: 2 March 2017).
Helmond, A. (2013) ‘The algorithmization of the hyperlink’, Computational Culture, (3). Available at: http://computationalculture.net/article/the-algorithmization-of-the-hyperlink (Accessed: 4 February 2017).
Helmond, A. (2015) ‘The platformization of the Web: Making Web data platform ready’, Social Media + Society, 1(2): 1–11. doi: 10.1177/2056305115603080.
Herring, S.C. et al. (2005) ‘Conversations in the blogosphere: An analysis “from the bottom up”’, in System Sciences, 2005. HICSS’05. Proceedings of the 38th Annual Hawaii International Conference on. IEEE, pp. 1–11. Available at: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1385453 (Accessed: 24 May 2015).
van den Heuvel, C. and Rayward, W.B. (2011) ‘Mondothèque. A multimedia desk in a global internet’, Poster for the 7th Iteration on ‘Science Maps as Visual Interfaces to Digital Libraries’ of the Places & Spaces Mapping Science Exhibition. Available at: http://www.narcis.nl/publication/RecordID/oai:pure.knaw.nl:publications%2F2c037a94-f0c7-4829-8d86-a7e2341991eb (Accessed: 25 February 2017).
Kirschenbaum, M. (2001) ‘Materiality and matter and stuff: What electronic texts are made of’, Electronic Book Review, 3. Available at: http://www.altx.com/ebr/riposte/rip12/rip12kir.htm (Accessed: 15 December 2014).
Kottke, J. (2000) Finally. Did you notice the, kottke.org. Available at: http://kottke.org/00/03/finally-did-you-notice-the (Accessed: 3 March 2017).
Krapp, P. (2006) ‘Hypertext avant la lettre’, in Chun, W. and Keenan, T. (eds) New Media, Old Media: A History and Theory Reader. New York: Routledge, pp. 359–374.
Landow, P.G.P. (ed.) (1991) Hypertext: The Convergence of Contemporary Critical Theory and Technology. Baltimore: The Johns Hopkins University Press.
Lialina, O. (2009) ‘A vernacular Web’, in Lialina, O. and Espenschied, D. (eds) Digital Folklore. Stuttgart: Merz Akademie, pp. 19–35.
Nelson, T.H. (1965) ‘Complex Information Processing: A File Structure for the Complex, the Changing and the Indeterminate’, in Proceedings of the 1965 20th National Conference. New York, NY, USA: ACM (ACM ‘65), pp. 84–100. doi: 10.1145/800197.806036.
Nelson, T.H. (1999) ‘Xanalogical structure, needed now more than ever: Parallel documents, deep links to content, deep versioning, and deep re-use’, ACM Computing Surveys, 31(4es). doi: 10.1145/345966.346033.
Nelson, T.H. (2016) THE XANADU® PARALLEL UNIVERSE. Available at: xanadu.com/xUniverse-D6 (Accessed: 2 March 2017).
O’Reilly, T. (2005) What is Web 2.0: Design patterns and business models for the next generation of software, O’Reilly. Available at: http://oreilly.com/web2/archive/what-is-web-20.html (Accessed: 4 November 2012).
Rayward, W.B. (1994) ‘Visions of Xanadu: Paul Otlet (1868–1944) and hypertext’, Journal of the American Society for Information Science (1986–1998), 45(4): 235.
Rogers, R. (2002) ‘Operating issue networks on the Web’, Science as Culture, 11(2): 191–213.
Rogers, R. (2013) Digital Methods. Cambridge: The MIT Press.
Search Console Help (n.d.) How Google Search Works, Google. Available at: https://support.google.com/webmasters/answer/70897?hl=en (Accessed: 26 May 2017).
Siles, I. (2011) ‘From online filter to web format: Articulating materiality and meaning in the early history of blogs’, Social Studies of Science, 41(5): 737–758.
Song, F.W. (2010) ‘Theorizing web 2.0: A cultural perspective’, Information, Communication & Society, 13(2): 249–275.
Stevenson, M. (2016) ‘Rethinking the participatory web: A history of HotWired’s “new publishing paradigm,” 1994–1997’, New Media & Society, 18(7): 1331–1346. doi: 10.1177/1461444814555950.
Turow, J. (2008) ‘Introduction: On not taking the hyperlink for granted’, in Turow, J. and Tsui, L. (eds) The Hyperlinked Society: Questioning Connections in the Digital Age. Ann Arbor: University of Michigan Press, pp. 1–18.
Twitter (2011) ‘Link sharing made simple’, Twitter Blog, 7 June. Available at: https://blog.twitter.com/2011/link-sharing-made-simple (Accessed: 11 July 2013).
W3C (2014) 4.2.4 The link element, W3C Recommendation. Available at: https://www.w3.org/TR/html5/document-metadata.html (Accessed: 2 March 2017).
W3C (n.d.) Links, W3C Recommendation. Available at: https://www.w3.org/TR/html401/struct/links.html (Accessed: 2 March 2017).
Walker, J. (2002) ‘Links and power: The political economy of linking on the Web’, in Proceedings of the Thirteenth ACM Conference on Hypertext and Hypermedia. ACM, pp. 72–73. Available at: http://dl.acm.org/citation.cfm?id=513358.
Wardrip-Fruin, N. (2004) ‘What hypertext is’, in Proceedings of the Fifteenth ACM Conference on Hypertext and Hypermedia. ACM, pp. 126–127.
Weinreich, H., Obendorf, H., and Lamersdorf, W. (2001) ‘The look of the link – concepts for the user interface of extended hyperlinks’, in Proceedings of the 12th ACM Conference on Hypertext and Hypermedia. New York, NY, USA: ACM (HYPERTEXT ‘01), pp. 19–28. doi: 10.1145/504216.504225.
Weltevrede, E. and Helmond, A. (2012) ‘Where do bloggers blog? Platform transitions within the historical Dutch blogosphere’, First Monday, 17(2). doi: 10.5210/fm.v17i2.3775.
Winer, D. (2016) ‘Facebook and linking is a big deal’, Scripting News, 3 January. Available at: http://scripting.com/liveblog/users/davewiner/2016/01/03/0783.html (Accessed: 4 March 2017).
Wright, A. (2014a) Cataloging the World: Paul Otlet and the Birth of the Information Age. Oxford: Oxford University Press.
Wright, A. (2014b) ‘The secret history of hypertext’, The Atlantic, 22 May. Available at: https://www.theatlantic.com/technology/archive/2014/05/in-search-of-the-proto-memex/371385/ (Accessed: 18 February 2017).

17
How Search Shaped and Was Shaped by the Web
Alexander Halavais

Search engines have long been seen as a set of services somehow ‘bolted on’ to the larger web experience. In this view, search engine technology allowed for an ever-more complete indexing of a rapidly growing new World Wide Web, and as the card catalog provided an index to the library stacks, the search engine was seen as somehow separate from the rest of the web. This is more than a metaphor: the technologies that make up web search trace their lineage directly to library cataloging, among other sources. But for many reasons, the relationship of search engines to the larger web is far messier. The architecture of the web has co-evolved with the development of search engine technologies, and the biases of those technologies have shaped – and continue to shape – the modern web.

This chapter traces the large-scale shift from web surfing to web searching, and what this has meant for the organization of the web over time. The massive reorganization from intentionally chaotic distributed hypertext to a largely indexed and searchable web marks one part of that shift. This has been driven not only by the technology of search, but by the commodification of attention online. As the search engine came to be the main gatekeeper of online attention, search became the economic engine and came to be closely tied to online advertising and marketing.

But this is far from unidirectional. While it is true that the technologies of search have had wide-ranging effects on the organization of information, those search engines did not emerge from a vacuum. In many cases, search evolved to meet specific needs brought about by new uses of the growing web. The development of search engines and the development of the web were deeply intertwined and co-evolutionary.

This chapter begins by tracing the changes in search over time, from the period before the emergence of the web to the present. It then examines the question of how new markets of attention and online socialization have both affected and been affected by the structures and biases of search engines.
By understanding search, we are better able to understand how the web as a whole works, and how it has changed over time.

THE LONG HISTORY OF THE SEARCH ENGINE

The World Wide Web changed the face of the internet, in many ways subsuming it. We still use a number of applications outside of the web, from email to chat to gaming, and video content takes up roughly three-quarters of total internet traffic (Cisco, 2016), but the World Wide Web was the application that took the internet to the masses. Within several years of the first web browser being introduced, there were at least a couple dozen search engines that sought to make sense of the rapidly expanding web (Chu and Rosenthal, 1996). Many of these engines owed their existence to pre-web search technologies. In each of these cases, search answered a need. The information structure was not efficiently browsable; a technology was needed that would make it more effective for the user.

It is difficult to identify a clear starting point for search engines. The process of indexing large collections of digital text generally comes under the aegis of ‘information retrieval’. While that field has received far more attention and engagement since the early 1990s, it certainly did not begin with the internet – it grew up with digital computing. Some draw a connection to Vannevar Bush’s prophetic ‘As We May Think’ (1945), which described an imagined device (the ‘Memex’) that allowed the researcher to easily query a large collection of text and retrieve the appropriate document. He and his team built prototypes that did this with microfilm in the 1930s, and others similarly indexed systems for punch cards (Sanderson and Croft, 2012). The first examples of purely electronic machines that could do something similar soon followed. By the 1960s the digital computer’s ability to not just crunch numbers but store data was becoming apparent, and it was during this period that many of the models for representing documents and algorithms for searching were developed, especially as part of Gerard Salton’s research groups at Harvard and then Cornell. His measure of document similarity (as the coefficient of the cosine between vectors described by the totality of the terms found in each document) and other ideas developed during the period became the core of early search engine functionality (Salton, 1975).

Early mechanical and digital computing is not the only starting point. The systems eventually used to index digital data had been used to catalog printed and written information for as long as those have existed. For very modest collections, such an index could be held in the head of the owner or librarian. Especially as the collection increased in size, creating an externalized index became an essential task, and the power initially vested in the librarian moved to the technical embodiment of that person: the index (Kaser, 1962). That index, and the metadata that accompanies it, has been a part of what we traditionally think of as libraries for thousands of years.

By the 1980s, traditional printed card catalogs were largely being replicated digitally and displaced by online indexes (Borgman, 1986). And with the growth of the internet during the same decade, a new and rapidly growing source of digital data emerged. With it came the need to search. Applications intended to search through the files of a single computer were expanded to include indexes of other computers on the network. New forms of searching also emerged. The Unix command ‘finger’, for example, provided information about a particular user, including when that user last logged on, and often some personal contact information. Its creator, Les Earnest, designed ‘finger’ in 1971 to aid in social networking at the Stanford Artificial Intelligence Lab (quoted in Shah, 2000):
People generally worked long hours there, often with unpredictable schedules. When you wanted to meet with some group, it was important to know who was there and when the others would likely reappear. It also was important to be able to locate potential volleyball players when you wanted to play, Chinese food freaks when you wanted to eat, and antisocial computer users when it appeared that something strange was happening on the system.

When computers were networked via the internet, it became possible to ‘finger’ individuals from across the country or the world, to find out more about them, and, in the days before the web, finger was among the more widely used protocols for searching for information about individuals online.

The development of the File Transfer Protocol (FTP) led to even more disconnected collections of files. Often public FTP servers allowed individuals to upload or download files anonymously. Since a file name was rarely descriptive enough to be of particular use, these servers often had an index file, updated by hand, that listed the available files and briefly described their contents. This quickly became unsustainable, and as a result, arguably the first search engine appeared in 1990 (Deutsch, 2000). Archie provided a way of searching across FTP servers to find a particular file. After it launched, it grew quickly: ‘From 30 hits a day, we soon went to 30 an hour, then 30 a minute’. Archie’s approach to gathering information from distributed servers and then providing it as a search service became the model for the search engines that followed. This included Veronica, a search engine for Gopher, an intermediate step toward the World Wide Web that made it possible to browse files available on the internet via a menu structure. Like Archie, it searched through the titles of files and indexed them by crawling through the menus of ‘gopherspace’ (Parker, 1994).

In 1989, when Tim Berners-Lee prototyped the ‘WorldWideWeb’, a project that provided distributed access to hypertexts, he already could draw on several years of development around hypermedia. The ACM Hypertext conferences had started two years earlier, and had already examined the question of searching hypertext when the amount of information grew too large to be made sense of through browsing (Frisse, 1987). Indeed, Berners-Lee himself saw the project as a melding of information retrieval and browsable hypertexts. In one of the early announcements of the project (1991) he notes:

The WWW world consists of documents, and links. Indexes are special documents which, rather than being read, may be searched. The result of such a search is another (‘virtual’) document containing links to the documents found. A simple protocol (‘HTTP’) is used to allow a browser program to request a keyword search by a remote information server.

Much of the growth of the web can be attributed to how open ended it was. Adding to the growing web was as simple as making a link. Exploring this new web of information was as exciting as it was intimidating. But as the web grew exponentially, and particularly as it came to be used commercially, the need to be able to rapidly find what you were looking for became acute. The web needed to be searchable as well as surfable.

WEB SEARCH BEFORE GOOGLE

For many people, Google represents the way one accesses the content of the internet. It can be easy to forget that there were a number of popular web search engines during the early popularization of the web, and even after Google came to dominate. The earliest search engines had a single challenge: effectively accessing and indexing the rapidly growing web. But as search engines did a better job keeping up with the web, they faced a new challenge: there was a growing effort across the web to become more easily noticed by the search engines.

One of the earliest web search engines was Wandex, developed by Matthew Gray at the Massachusetts Institute of Technology. Gray initially created a crawler, the World
HOW SEARCH SHAPED AND WAS SHAPED BY THE WEB 245

Wide Web Wanderer, in an effort to measure the size of the growing web, but by the end of 1993 the resulting index made the web searchable. Even after initial search engines were made available, the discovery of useful websites was often human-curated through the distribution of ‘what’s new’ emails, shared collections of bookmarks, collaborative ‘webrings’ that linked together like sites (Elmer, 1999), or web catalogs like Yahoo! or DMOZ (Callery, 1996). Carefully organized ‘bookmark files’ with the URLs of interesting or oft-revisited sites (Abrams et al., 1998) could easily be published as pages with web-ready versions generated by some of the early web browsers. As a result, many personal home pages included a list of favored sites, a practice that eventually dovetailed with ‘blogrolls’ once blogging became popular. Reports of new and interesting sites could be found in a number of places, from magazines like Wired to emailed bulletins.

While bookmark files certainly helped users to ‘refind’ sites – something search engines are now frequently relied on to provide – they were of less use in finding specific information. Especially early in the evolution of the web, this is hardly surprising. Today, there is the general expectation that if something (an idea, a company, a person) exists, there is some indicator of it somewhere on the web; it is simply a matter of finding it. In the case of the early web, it was far more likely to be a surprise when something you were interested in had some sort of presence on the web. As a result, the early web was ripe for exploration and discovery. Searching the web only made sense as its size increased.

That size increased quickly. The web consisted of just over 600 sites by the end of 1993. That number was over 10,000 by the end of 1994 and crossed the million mark in the first half of 1997 (Zakon, 2017). While surfing through several hundred sites, even at the slow speeds of the early 1993 internet, was entirely possible, visiting a million – and eventually a billion – certainly was not. And with commercialization, the ways in which people used the web were changing as well. Brian Pinkerton (1994) identifies this issue as central to the development of one of the first web search engines, WebCrawler:

Imagine trying to find a book in a library without a card catalog. It is not an easy task! Finding specific documents in a distributed hypertext system like the World-Wide Web can be just as difficult. To get from one document to another users follow links that identify the documents. A user who wants to find a specific document in such a system must choose among the links, each time following links that may take her further from her desired goal. In a small, unchanging environment, it might be possible to design documents that did not have these problems. But the World-Wide Web is decentralized, dynamic, and diverse; navigation is difficult, and finding information can be a challenge. This problem is called the resource discovery problem.

Wandex largely replicated the functionality of Veronica, with a crawler that was able to find and follow links. But like that predecessor, it also indexed only the titles of pages, not the content. Brian Pinkerton’s WebCrawler and the Repository-Based Software Engineering (RBSE) spider and indexer each extended this to indexing the textual content of the page. By the end of 1994, the WebCrawler had received its millionth query, and had been joined by more than a half-dozen other early search engines.

In terms of overall design, these shared very similar architectures, each crawling the web with robots, constructing an index, and providing a front end to handle queries and generate a list of search engine results. Competition was fierce, but largely revolved around coverage: how broadly their robots crawled and how often. As Bharat and Broder noted in 1998, a cottage industry had emerged around comparing the most popular search engines, with scores of articles and a dedicated website. They estimated the coverage and overlap of four of the most popular search engines of the time: HotBot, AltaVista, Excite, and Infoseek. The search engines had different sizes of coverage, but the authors note that their most startling discovery was how very little overlap the search
engines had: ‘less than 1.4% of the total coverage, or about 2.2 million pages appeared to be indexed by all four engines’. Dogpile and other meta-search engines could help a bit here, by aggregating the results across search engines, but there was concern that much of the web was undiscoverable.

Moreover, a clear financial model to provide these resources had yet to emerge. Serving an advertisement for AT&T in 1994, HotWired was a pioneer in selling web banner ads, which helped to make their HotBot search engine a profitable venture. They claim (Singel, 2010) that this was the spark that led to the ‘portal war’ and eventually the dot-com bubble. While there is no doubt that search engines played an important part in the commercial development of the web, that may apportion too much of the credit – and blame – to the role of advertising online. Nonetheless, the ad-supported model would make search one of the few profitable industries on the early web, and drive out other financial models for search, including paid submission, paid inclusion, and paid placement. Directly paying to manipulate search results undermines the value of an engine, at least in a competitive market (see Henshaw, 2001), and eventually it drew the scrutiny of regulators as well. Many search engines, especially a young Google, also provided customized search and enterprise solutions that helped to fund their efforts. Others shifted in this direction as well, including Inktomi, which provided search results for several engines, and Northern Light, which launched a search engine in 1997 that included links to results in proprietary databases, and eventually closed down their public search engine to focus on enterprise search.

Because search engines always yielded more potential results than an individual could make use of, all of them provided some form of prioritization; as Schwartz (1998) wryly notes: ‘Although experience with search engines sometimes makes this hard to believe, search results are usually ranked by relevance…’. The disbelief was natural, since it was often difficult to discern the reasoning behind the ranking. First, the specific algorithm that determined where in the list a site fell was almost always a closely held secret. Second, without any semantic data, there were limited ways of determining the most ‘important’ pages. These included measuring the frequency and proximity of the query terms on the page, as well as whether pages had been clicked through on earlier searches, or whether they were part of a human-reviewed list of sites (sometimes taken from category-based sites like DMOZ). Third was an influence that gradually changed the dynamic between search engines and the searchable web: site owners sought any advantage they could in rising in the ranks of relevance.

The formal restrictions on commercial activity on the internet were lifted just as the web was taking off. By the late 1990s the ‘search engine wars’ largely focused on ways of ranking search results, while those who had things to sell online were particularly interested in influencing those rankings. The basic problem of assembling and keeping current an index had been superseded by the question of ‘relevance’ (Introna and Nissenbaum, 2000). For some – including Tim Berners-Lee (1996), one of the architects of the web – for it to be useful, the web would need to be coded with semantic data, allowing computers to make sense of it in a more comprehensive fashion:

To date, the principle [sic] machine analysis of material on the web has been its textual indexing by search engines. Search engines have proven remarkably useful, in that large indexes can be searched very rapidly, and obscure documents found. They have proved to be remarkably useless, in that their searches generally take only vocabulary of documents into account, and have little or no concept of document quality, and so produce a lot of junk. (Berners-Lee, 1996)

While a number of efforts were made toward the inclusion of semantic data for search, the web at large remained – and remains today – wildly unstructured. Moreover, the cat-and-mouse game between search engines and
marketers continued unabated. Into this fray came a new weapon that would significantly disrupt that ongoing conflict: PageRank.

THE GOOGLE REVOLUTION

In early 1996, Stanford University doctoral students Larry Page and Sergey Brin started working on BackRub, which would eventually come to be called Google. The initial advantage of Google over the existing search engines was embodied in the PageRank algorithm, which took a page from academic citation analysis and used the information from hyperlinks as a kind of peer review. Those pages that received the most links from popular pages floated to the top of the ‘relevant’ results. Page and Brin described it not in terms of citation, but, from an intuitive sense, from the perspective of a ‘random surfer’ (Page et al., 1999). A user who randomly surfs links on the web would be more likely to wind up on a page with a high PageRank.

PageRank is far from the only reason for Google’s long-term rise to search engine dominance, but there can be little doubt that the algorithm changed the balance of power between search engines and the marketers hoping to gain an advantage through them – at least temporarily. For the first time, web authors had to think not only about how to make the site appealing to the search engine robots that visited it, but how to make its hyperlinked ‘location’ in the web ecosystem more appealing. Much like retailers in the physical world have to think about neighborhoods, foot traffic, and visibility, suddenly it was more important than ever before that others linked to your site.

Naturally, having a large number of inbound hyperlinks (or ‘backlinks’) was always important to website owners. After all, it was not just the theoretical surfer envisioned by PageRank who followed hyperlinks: that is how many people found their way to your website, especially before the turn of the millennium. If you wanted to be found, you had to be linked. And if you were linked by a particularly influential site – especially some of the group blogs and news filters like Slashdot and its descendants – you could see your web traffic spike to an extreme degree. These flash crowds were at one point called the ‘Slashdot Effect’, but could be seen as a result of sites that followed the group news approach Slashdot pioneered: Kuro5hin, Fark, Digg, Reddit, and – although it is not an exact analog – now Facebook and Twitter. So, even if backlinks were not important to the process of search, they would be of interest to the web author. Naturally, these two functions went hand-in-hand, though. Once a site had received traffic from a flash mob, it was likely to receive more links, and higher search rankings – a cumulative process that is common among a number of networked environments (Price, 1976).

While the initial work on what would become Google web search began in 1996, and the search engine was launched by 1998, Google really came into its own at the turn of the millennium, and Search Engine Land (Sullivan, 2010) called what followed ‘The Google Decade’. Much of this had to do with Google’s seemingly incessant march into everything from ecommerce to social networking. In particular, the basis for Google’s most significant source of income – the AdWords advertising network – was launched in 2000, providing the basis for its steady rise to one of the most profitable businesses on the planet.

Starting with the acquisition of an archive of Usenet messages called Dejanews in 2001, much of Google’s expansion occurred via acquisitions that became products for advertising, blogging, mapping, cloud-based services, artificial intelligence, online sales and coupons, video sharing, ebooks, robotics, social networking, facial recognition, health, telephony, mapping, natural language systems, customer relationship management, and mobile software, among many more. These helped Google to produce a popular email service, a mobile phone operating system,
and a range of other services that seemingly have little to do with search. Just as PageRank shifted the focus to the web ecosystem, Google used the data collected from these services to create a more effective search engine and a more effective advertising network.

If the period before Google represented a war of attrition and attention among search engines seeking to discover a new market, the period since Google’s inception has been marked by consolidation. Certainly, Google continues to have significant challenges from rivals. In particular, the Chinese search engine Baidu is often the first choice for those seeking materials from China or in Chinese, though it may not have the broader global coverage of Google (Jiang, 2014). And Microsoft’s Bing search engine continues to attract a segment of searchers. And a handful of search engines with names familiar to long-time denizens of the web continue to operate, including Yahoo!, AOL, Excite, and Ask.com (though in several cases these sites deliver search results provided by Google or Bing). Nonetheless, by most measures, of every five people searching the web, roughly four are likely to be googling (NetMarketShare, 2017). Though the basic technologies that make up the web as a whole had not changed, the architecture had, moving from a browsable to a searchable space.

MONETIZING ATTENTION

By 2006, the Oxford English Dictionary had added ‘to google’ as a synonym for searching the internet. The web had come to be a mainstay in the media diet for a good portion of the world, and as audiences left traditional mass media, advertisers needed to make sense of a new and challenging medium. They naturally focused on the search engine as a technology that acted as a gatekeeper and provided a clear space for influencing users.

Google was in a prime position for addressing these new interests. As Siva Vaidhyanathan (2011) persuasively writes, Google is more than merely an application for searching the web, or an extraordinarily successful internet company. The process of ‘googlization’ reaches into nearly every corner of our information ecosystem, coloring our social interactions, our political processes, and the ways in which we come to know the world. Of course, media have always shaped our experience of the world, setting the agenda of our political debates, or influencing what we see as risks and threats. In some ways, the internet should have reversed the ways in which global media attenuated our sources of information. After all, with broad access to the web, everyone now had their own ‘printing press’.

While the early web might have looked like a distributed conversation, where information could only be found by ‘surfing’ from page to page, search engines changed that, making pages increasingly findable, but less discoverable. Search engines became a point of control, focusing attention on some parts of the web while ignoring others. On the early web, it was perfectly reasonable to publish something and hope to be ‘discovered’ by people who happened by and were willing to recommend your site to others. When the recommendations came from a search engine instead, influencing those engines became more important. And as Google gradually became synonymous with search, it also became the central switch for information and knowledge on the internet.

It initially wrested some control over attention that online marketing and search engine optimization (SEO) had begun to accrue, evening out the playing field somewhat. By 1995, the ‘Multi-Media Marketing Group’ had been founded, which produced a popular newsletter with tips for influencing search engines (Knowles, 2017). The SEO industry grew continuously until 2016, when salaries and demand for SEO expertise abated slightly. Especially in the early days, the term ‘spamdexing’ was far more popular (Torok, 1996). Initially, the focus of
SEO practitioners was on ‘keyword stuffing’: determining which keywords were worth targeting in searches and then including as many of them as high on the page as possible. Early on, these might have appeared in the metadata tags for the page, including those that specifically indicated keywords chosen by the author, but once search engines began ignoring such metadata for the purposes of ranking, authors turned to new approaches. These included repeating targeted keywords throughout the text of the page, sometimes with text with the same color as the page background, or otherwise hidden from all but the search engine. This could result in visitors using a search engine and arriving at a page that had absolutely nothing to do with their search terms.

This was particularly true for pornography. During the 1990s, there was significant concern over the availability of pornography online, especially to children. Part of this was because, as Susanna Paasonen describes elsewhere in this volume, pornography was one of the earliest commercially successful ventures on the web. Users often searched for pornographic content, and providers sought buyers who could make an instant purchase. In many ways search engines served as a natural intermediary between those seeking out pornography – often with little interest in paying – and those seeking to provide it at a price. Online pornography providers innovated in a range of areas, from advertising networks to pop-ups to the use of online video and safe online payments. One of those areas of innovation was affiliate networks: developing automated systems that provided rewards for bringing paying customers to a site. Those who created these affiliate sites were single-mindedly interested in drawing searchers’ attention, and one of the most effective ways of doing this was to attract not just their legitimate searches for adult content, but searches on just about any topic.

Naturally, this reduced the effectiveness of the search engine for the user, who was seeking ‘relevant’ results. Those searching AltaVista or HotBot would often have to wade through a number of pornographic results – including, at times, those illegal in their jurisdiction – regardless of the information sought. During a period in which search engines were in direct competition, those that could avoid misdirecting users had an advantage. Obviously, as the number of pages on the web ballooned, and as the web became even more commercial, the question of not just returning results, but returning the most ‘important’ results became a priority. Search engines responded to spamdexing as quickly as they could, but it rapidly became a game of cat-and-mouse.

It took a bit longer for spamdexers to figure their way around PageRank, and many celebrated Google for bringing some measure of balance back to the web. While spamdexers had developed a repertoire of techniques for manipulating their own websites to achieve high rankings in search engine results pages, they had not needed to think about the broader web ecosystem, and the effect of linkages, before Google came along. As Page et al. (1999) noted, the low ranks of these pages were likely ‘because people do not want to link to pornographic sites from their own webpages’. Google rapidly gained in popularity against the other search engines thanks to its reputation of presenting a view of the web that had been manipulated far less and provided what were sometimes considered more useful results.

But Google was certainly not a one-trick pony. Soon after it gained significant popularity, website owners targeted PageRank. Rather than making changes to their own site, they would create rings of sites that linked to one another, or seek to have links made from other sites. And it was not just the link that mattered: the context of the link could affect how Google interpreted a site. This effect was noted and exploited by some, before it became common knowledge. In early 2001, Farhad Manjoo noted a strange effect: searching for ‘dumb motherfucker’ on Google presented you with a link for merchandise relating to the US president at the time.
He hypothesized some potential reasons for this, but in the coming years it became clear that including keywords in links to a page could affect how Google’s search engine indexed it. The technique, called a ‘Google bomb’, was used for several years to collectively shape what Google indexed.

While Google bombing represented, largely, an amusing diversion, it also provided a small peek into the ways in which Google continuously changed its technology to prevent manipulation by SEO practitioners and spamdexers. The rise of the Google bomb occurred around the time of the rise of the blogosphere. Thanks to heavy linking and frequently updated pages, blogs came to exert a significant influence on Google. When a collection of bloggers decided to, they could collect this power to create a Google bomb (Kahn and Kellner, 2004). But more broadly, blogs had natural advantages when it came to achieving high search rankings, and some complained that the blogosphere was taking over Google’s results pages. Others used this to their advantage, either creating their own blog networks or spamming comments on blogs from around the web to provide links to the page they were promoting. (Eventually, the latter was dampened through the ‘nofollow’ link attribute, which excluded links in comments from providing the ‘googlejuice’ that adds to a site’s prominence in results.)

Today, Google uses dozens of ‘signals’ beyond PageRank in order to provide results and counter attempted manipulation. Nonetheless, it relies heavily on reading the environment around a site, and attempts to manipulate those links in order to gain ranking continue to be used, and are one of the few reasons for Google to execute a ‘manual action’, which is sometimes referred to as a ‘Google death sentence’ (Malaga, 2010). There have been a number of examples of Google’s removal of a site from its index resulting in substantial financial repercussions for the violating site. Even larger companies that have been penalized by Google, including American retailer JC Penney (Segal, 2011), and BMW’s site in Germany (Blakely and McCormack, 2006), have felt the sting of being shunned. A US court recently decided that Google has the absolute right to delist companies if it so chooses (e-ventures v. Google, 2017). Google’s ability to effectively silence voices on the web – not by removing them, but by making them unfindable – is remarkable.

Google advises the creators of websites to simply make their sites usable for human visitors, and they will do the rest. Nonetheless, producers of websites have continuously tried to gain an advantage and Google has continuously adjusted their algorithms to counter this. This process has been called by some the ‘Google Dance’, as pages climb and fall according to new criteria for relevance. There have also been large-scale changes that significantly reorganize the results pages and determine who the new winners and losers are. Sometimes these are explained by the search engine (though rarely in detail), and other times they are noted by keen observers in the SEO community. Sometimes these changes can be quite substantial, with major elements of the ranking or interface systems changed. For example, in late August of 2013, Google deployed ‘Hummingbird’, which provided for more conversational queries. These changes are often tested with a subset of searches made by unsuspecting visitors before they are used more broadly.

The idea that search is a neutral conduit, a switchboard that allows you to reach your desired page, is part of the mythology of the search engine. It occupies an important space both in terms of the distribution of knowledge, and the flow of commerce. As such, it is difficult to imagine that it would not be subject to substantial efforts to influence its functionality. The primary way this has been achieved is by changing the way web pages appear, and how the web itself is linked together. Despite a design that was intended to be ‘bottom up’, the web has evolved in an effort to reflect the biases of search engines, and particularly the biases of Google.
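
The link rings described earlier in this section work because PageRank passes score along hyperlinks. The following is a minimal sketch of that mechanism, assuming a toy seven-page web and the commonly cited damping factor of 0.85. It is a textbook-style power iteration over the ‘random surfer’ model, not Google’s production algorithm, and the page names, the `pagerank` function, and its parameters are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank by power iteration.

    links maps each page name to the list of pages it links to.
    Returns a dict of page name -> score; the scores sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline: the surfer sometimes jumps at random.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Toy web; all page names are made up for illustration.
honest_web = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "shop": ["hub"],   # nobody links to the shop yet
    "r1": ["r2"],
    "r2": ["r3"],
    "r3": ["r1"],
}

# Same pages, but the three ring sites now also link to the shop.
ring_web = dict(
    honest_web,
    r1=["r2", "shop"],
    r2=["r3", "shop"],
    r3=["r1", "shop"],
)

before = pagerank(honest_web)["shop"]
after = pagerank(ring_web)["shop"]
print(f"shop score without ring links: {before:.4f}")
print(f"shop score with ring links:    {after:.4f}")
```

In this toy graph the shop’s score rises once the ring sites point at it, even though nothing on the shop’s own page has changed, which is the kind of off-page manipulation that the penalties described above are meant to discourage.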
THE BIASING ENGINE

Critics of search often seek to determine whether Google is ‘biased’. This is the wrong question to ask: search is inherently a biasing process, favoring some results over others. This does not mean that Google wishes to present a particular position or idea, necessarily, but rather that any system that acts as a filter must also introduce some form of bias. As one early newspaper article (Pegoraro, 1999) had it, Google ‘sees the Web as a popularity contest’. In that respect, Google has become more biased over time, as it aims to provide better results. But just as the word ‘biased’ is problematic, so is the word ‘better’. The natural question is: better for whom? Given that Google’s product is its users’ attention, which it sells to advertisers in order to produce a profit, Google’s aim is to create a bias that attracts users back to Google. Because of this, ‘better’ is often better for the user.

But a number of criticisms have suggested that what the user gets from Google is biased in ways that are not necessarily better for society, and some of these critiques find parallels with those of journalism and mass media more broadly. There has long been a concern that Google presents a bias toward, broadly, its advertisers. Others suggest that it commodifies knowledge, presenting it in brief, easily digestible chunks that draw us away from more considered forms of learning and make us collectively more stupid (Carr, 2008). Another concern – especially with search results that since 2009 are influenced by our past searches, our activity on the web, or what our friends look for – is that we may find information that satiates us by playing to our preconceptions (Pariser, 2011). A search engine created in 2008 as an alternative to this kind of ‘filter bubble’, called DuckDuckGo, has enjoyed some modest success by not tracking its users’ searches, and therefore allowing them to escape from self-reinforcing echo chambers (Wauters, 2011).

One of the more longstanding challenges to what Google considers ‘important’ is that it comes from a particular US-centric (or, alternatively, a globalized) view that might miss important national or cultural nuance. Baidu played directly to that difference in advertising its search engine. An article in Computerworld (Lemon, 2007: 26) described an early ad for the search engine:

‘I get it’, the Western man says, speaking heavily accented Chinese. Surrounded by beautiful Chinese women in the video advertisement, he grins with self-satisfaction. Nearby, a suave Chinese man dressed in scholar’s robes laughs. ‘You don’t necessarily get it’, he says. As the ad unfolds, the Chinese scholar proceeds to humiliate the Westerner, mocking his poor Chinese-language skills. In the end, the women flock to the scholar’s side, and the Westerner is left confused, alone and humiliated.

In order to protect its cultural capital – as well as some degree of political control – China has supported the development of alternative search engines like Baidu, through both direct funding and policy that has been at times antagonistic toward Google.

And China is not alone in this regard. In his 2007 book, Jean-Noël Jeanneney suggested, for example, that a Google search is likely to lead to an Anglo-centric view of the French revolution; not necessarily ignoring French sources, but favoring those from the perspective of English observers. The title of his book – Quand Google défie l’Europe – also hints at issues of national control. Driven in part by the concern regarding such control (Abelson et al., 2008: 159), France and Germany partnered in an attempt to create an alternative to Google with the Quaero project (2007–13). While other national search engines, including the Swiss Search.ch, continue to attract visitors, none of these have reached anything approaching the traffic of Baidu, let alone Google.

At least for the time being, Google remains a central gateway for ideas and money on the web. While nations can attempt to influence that filter through policy or the courts, the most frequent attempt to change what Google delivers is by reshaping the web. Search
engines began as an attempt to inflict order on the chaotic and dynamic web of interconnected pages that made up the growing World Wide Web. Today, web authors are just as likely to use their own pages in an attempt to inflict a new order on the global search engine. But that centrality is facing a new set of challenges.

EVOLVING WITH THE SOCIAL WEB

The term ‘social search’ is a bit of a misnomer; how could search be anything but social? However, over the last decade a significant amount of online interaction has moved to platforms that support social participation. Naturally, there has been a question about how search engines might shift as internet users’ attention and focus has shifted. One version of the question relates to how search engines might be used to index social media platforms and provide search for and with them. These systems represent a significant challenge in terms of the volume and velocity of change: making Twitter and Facebook searchable is no small problem, and requires a more direct connection than is available via the web interface. So it makes sense, for example, that Google might partner directly with Twitter to more easily index their 9,000 tweets per second (Patel, 2015). Likewise, Facebook, Google, Microsoft, and Amazon, among others, have partnered to research how artificial intelligence might make their systems for filtering and finding more effective.

At the same time, these platforms have direct access to their own data and are creating their own, internal search engines to help make sense of it. In 2012, Facebook saw more than a billion queries a day to its search engine, and by 2016 that number was up to two billion and still growing (Constantine, 2016). If Facebook considered itself a search engine, that would make it the second most popular search engine in the world, and the fastest growing. And like Google, Facebook sees effective search as the gateway to selling advertisements.

Even before the rise of the social media platforms, search engines focused on their forerunners: blogs. Though the blogosphere had its own search engines, the largest of which was Technorati, Google focused on blogs not just because of the currency and interest of their topics, but because their link structure was so essential to finding the ‘important’ sites on the web. It was a bit surprising then that Google initially showed little interest in using signals from social media platforms to aid their search process. Microsoft’s Bing experimented with ways of directly indicating which results your friends found interesting, and Google did the same for a short time. There is a great deal of speculation as to how important social signals are to prioritizing results on these two search engines today, and while Google continues to indicate that it does not use social signals (Schwartz, 2015), there is some consensus that results rankings correlate to social media attention.

The real impact of ‘social search’ has yet to be felt. Over the last few years news organizations have noticed a trend in the way people end up on their sites. Many still go directly to the websites of trusted news sources, but a significant number arrive not thanks to a search engine results page, but rather via a shared message on Facebook, Twitter, or another social media platform. Google, which for a time was so ubiquitous it was coming to be seen as the internet itself, is gradually being supplanted by Facebook, and especially by Facebook on mobile devices, a shift that has accelerated rapidly during the 2010s.

This shift away from the search engine interface is something that Google predicted in the earliest days of the company, but it remains unclear what search without visible search engines will become. Certainly, the collaborative filtering function of Facebook holds part of the key, as do increasingly sophisticated systems for analyzing both the content of the web and the meaning of online messages. The future will continue to require
HOW SEARCH SHAPED AND WAS SHAPED BY THE WEB 253

search, but we may no longer see the search process as clearly, or find the need to identify things called ‘search engines’.

CO-EVOLVING SEARCH

Understanding the history of the web requires not just an archive of the pages that were created, but a broad understanding of the context in which audiences encountered those materials. The web, as a broad environment, is far more than its pages, their content, and a collection of hyperlinks. Much of how individuals experience the web has to do with how they find and encounter the information on it. That alone would make it important to understand how search engines have developed over time. But perhaps more importantly, it is impossible to cleanly divide search engines from the larger web; the web makes little sense outside of the context of the search engine. It is not just an essential ‘feature’ of the web, it represents a technology for organizing our social experiences and our collective knowledge. And just as the content and experience of the web is not entirely in the hands of the authors of websites, the nature of search engines is only partially determined by their engineers. The search engine has evolved to meet the needs of users and web authors, and these groups have each contributed significantly to the evolution of web search. That back and forth is likely to continue, and so understanding the search engine requires an understanding of how the web has changed, and any hope of understanding the nature of the web relies on a thorough understanding of the search engine.

REFERENCES

Abelson, H., Ledeen, K., and Lewis, H.R. (2008) Blown to Bits: Your Life, Liberty, and Happiness After the Digital Explosion. Upper Saddle River, New Jersey: Addison-Wesley.
Abrams, D., Baecker, R., and Chignell, M. (1998) ‘Information Archiving with Bookmarks: Personal Web Space Construction and Organization’, CHI ‘98 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Los Angeles, California, April 18–23, New York: ACM, pp. 41–8.
Berners-Lee, T. (1991) WorldWideWeb: Summary, Usenet: alt.hypertext, 9 August 1991. Available at: https://groups.google.com/forum/#!msg/comp.archives/CfsHlSNYPUI/DTs60INnuzcJ [Accessed 15 June 2018].
Berners-Lee, T. (1996) The World Wide Web: Past, Present and Future. Available at: https://www.w3.org/People/Berners-Lee/1996/ppf.html [Accessed 15 June 2018].
Bharat, K., and Broder, A. (1998) ‘A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines’, Computer Networks and ISDN Systems, 30(1): 379–88.
Blakely, R., and McCormack, H. (2006) ‘Google’s “Death Penalty” for BMW’, The Times, February 6. Available at: https://www.thetimes.co.uk/article/googles-death-penalty-for-bmw-wpbtz992h8v [Accessed 15 June 2018].
Borgman, C.L. (1986) ‘The User’s Mental Model of an Information Retrieval System: An Experiment on a Prototype Online Catalog’, International Journal of Man-Machine Studies, 24(1): 47–64.
Bush, V. (1945) ‘As We May Think’, Atlantic Monthly, 176: 101–8.
Callery, A. (1996) ‘Yahoo! Cataloging the Web’, Untangling the Web: Proceedings of the Conference Sponsored by the Librarians Association of the University of California, Santa Barbara, and Friends of the UCSB Library. Available at: http://files.eric.ed.gov/fulltext/ED403886.pdf [Accessed 15 June 2018].
Carr, N. (2008) ‘Is Google Making Us Stupid? What the Internet Is Doing to Our Brains’, The Atlantic, July/August. Available at: https://www.theatlantic.com/magazine/archive/2008/07/is-google-making-us-stupid/306868/ [Accessed 15 June 2018].
Chu, H., and Rosenthal, M. (1996) ‘Search Engines for the World Wide Web: A Comparative Study and Evaluation Methodology’, Proceedings of the Annual Meeting – American Society for Information Science, 33: 127–35.

Cisco (2016) White Paper: Cisco VNI Forecast and Methodology, 2015–2020. Available at: http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/complete-white-paper-c11-481360.html.
Constantine, J. (2016) ‘Facebook Sees 2 Billion Searches per Day, But It’s Attacking Twitter, Not Google’, TechCrunch, July 27. Available at: https://techcrunch.com/2016/07/27/facebook-will-make-you-talk/
Deutsch, P. (2000) ‘Archie: A Darwinian Development Process’, IEEE Internet Computing, 4(1): 69–71.
Elmer, G. (1999) ‘Web Rings as Computer-Mediated Communication’, CMC Magazine, January. Available at: http://www.december.com/cmc/mag/1999/jan/elmer.html
e-ventures v. Google (2017) Memorandum and Order, US District Court, Middle District of Florida, Fort Myers Division. Available at: http://digitalcommons.law.scu.edu/cgi/viewcontent.cgi?article=2410&context=historical.
Frisse, M.E. (1987) ‘Searching for Information in a Hypertext Medical Handbook’, in Stephen Weiss and Mayer Schwartz (eds.), Proceedings of ACM Hypertext 87 Conference, November 13–15, 1987, Chapel Hill, North Carolina, pp. 57–66.
Henshaw, R. (2001) ‘What Next for Internet Journals? Implications of the Trend towards Paid Placement in Search Engines’, First Monday, 6(9). Available at: http://firstmonday.org/ojs/index.php/fm/article/view/884/793 [Accessed 15 June 2018].
Introna, L.D., and Nissenbaum, H. (2000) ‘Shaping the Web: Why the Politics of Search Engines Matter’, The Information Society, 16: 169–85.
Jeanneney, J.-N. (2007) Google and the Myth of Universal Knowledge, Teresa Lavender Fagen (trans.). Chicago: University of Chicago Press.
Jiang, M. (2014) ‘The Business and Politics of Search Engines: A Comparative Study of Baidu and Google’s Search Results of Internet Events in China’, New Media & Society, 16(2): 212–33.
Kahn, R., and Kellner, D. (2004) ‘New Media and Internet Activism: From the “Battle of Seattle” to Blogging’, New Media & Society, 6(1): 87–95.
Kaser, D. (1962) ‘In principium Erat Verbum’, Peabody Journal of Education, 39(5): 258–63.
Knowles, M. (2017) The History of SEO. Available at: http://www.thehistoryofseo.com [Accessed 15 June 2018].
Lemon, S. (2007) ‘Out-Googling Google: Chinese Search Giant Baidu is Beating Google at Its Own Game in China, But It’s Playing by Different Rules’, Computerworld, April 30, p. 26.
Malaga, R.A. (2010) ‘Search Engine Optimization: Black and White Hat Approaches’, in Marvin V. Zelkowitz (ed.), Advances in Computers: Improving the Web, London: Academic Press, pp. 1–41.
Manjoo, F. (2001) ‘Google Link is Bush League’, Wired News, January 25. Available at: http://archive.wired.com/science/discoveries/news/2001/01/41401 [Accessed 15 June 2018].
NetMarketShare (2017) ‘Desktop Search Engine Market Share, January 2017’. Available at: http://www.netmarketshare.com/search-engine-market-share.aspx [Accessed 15 June 2018].
Page, L., Brin, S., Motwani, R., and Winograd, T. (1999) ‘The PageRank Citation Ranking: Bringing Order to the Web’, Stanford Infolab, 422. Available at: http://ilpubs.stanford.edu:8090/422/ [Accessed 15 June 2018].
Pariser, E. (2011) The Filter Bubble: What the Internet Is Hiding from You. New York: Penguin.
Parker, G. (1994) Internet Guide: Veronica. Available at: http://web.archive.org/web/20040808093422/http://www.lib.umich.edu/govdocs/godort/archive/elec/intveron.txt.old [Accessed 15 June 2018].
Patel, N. (2015) ‘Everything You Need to Know about the Google-Twitter Partnership’, Search Engine Land, March 20. Available at: http://searchengineland.com/everything-need-know-google-twitter-partnership-216892 [Accessed 15 June 2018].
Pegoraro, R. (1999) ‘Googly Eyes’, The Washington Post, January 22, p. N62.
Pinkerton, B. (1994) ‘Finding What People Want: Experiences with WebCrawler’, presented at the Second World Wide Web Conference, Chicago, October 17–19.
Price, D. De S. (1976) ‘A General Theory of Bibliometric and Other Cumulative Advantage Processes’, Journal of the American Society for Information Science, 27(5): 292–306.
Salton, G. (1975) A Theory of Indexing. Philadelphia: Society for Industrial and Applied Mathematics.

Sanderson, M., and Croft, W.B. (2012) ‘The History of Information Retrieval Research’, Proceedings of the IEEE, 100 (Special Centennial Issue): 1444–51.
Schwartz, B. (2015) ‘Google: Again, Social Signals Do Not Influence Your Ranking’, Search Engine Roundtable. Available at: https://www.seroundtable.com/google-social-signals-ranking-20803.html [Accessed 15 June 2018].
Schwartz, C. (1998) ‘Web Search Engines’, Journal of the American Society for Information Science, 49(11): 973–82.
Segal, D. (2011) ‘The Dirty Little Secrets of Search’, The New York Times, February 12. Available at: http://www.nytimes.com/2011/02/13/business/13search.html [Accessed 15 June 2018].
Shah, R. (2000) History of the Finger Protocol. Available at: http://www.rajivshah.com/Case_Studies/Finger/Finger.htm [Accessed 15 June 2018].
Singel, R. (2010) ‘Oct. 27, 1994: Web Gives Birth to Banner Ads’, Wired.com. Available at: https://www.wired.com/2010/10/1027hotwired-banner-ads/ [Accessed 15 June 2018].
Sullivan, D. (2010) ‘The Google Decade: Search in Review, 2000 to 2009’, Search Engine Land, February 1. Available at: http://searchengineland.com/the-google-decade-search-in-review-2000-to-2009-34830 [Accessed 15 June 2018].
Torok, A.G. (1996) ‘Internet Search Engines: Are Users Ready?’ in Ahmed H. Helal and Joachim W. Weiss (eds.), Towards a Worldwide Library: A Ten Year Forecast, 19th International Essen Symposium, September 23–26, 1996, Essen: Publications of Essen University Library, 21: 241–53.
Vaidhyanathan, S. (2011) The Googlization of Everything (And Why We Should Worry). Berkeley: University of California Press.
Wauters, R. (2011) ‘DuckDuckGo to Google, Bing Users: Escape Them Filter Bubbles!’ TechCrunch, June 20. Available at: https://techcrunch.com/2011/06/20/duckduckgo-to-google-bing-users-escape-them-filter-bubbles/ [Accessed 15 June 2018].
Zakon, R.H. (2017) Hobbes’ Internet Timeline 24. Available at: https://www.zakon.org/robert/internet/timeline/ [Accessed 15 June 2018].
18
Making the Web Meaningful:
A History of Web Semantics
Lindsay Poirier

INTRODUCTION

While working at CERN as a contract programmer in 1989, Tim Berners-Lee submitted a proposal to his boss Mike Sendall for an information system that would help organize information distributed across the multinational laboratory without requiring researchers to agree on a standard technology or information model (Berners-Lee, 1999). This system evolved into what we today call the World Wide Web (WWW).

Today, the WWW is understood as an information space where documents, or Web pages, can be referenced and linked to from other documents using hypertext links. However, the system Berners-Lee described in his proposal aimed to do more than link documents; it aimed to link data, or specific nodes on Web pages. To illustrate the system in the proposal, Berners-Lee described a complex information system diagramed with a series of nodes and arrows. Nodes represent people, software models, groups of people, concepts, documents, etc., and arrows describe the relationships between nodes – for instance, that node A depends on node B or that node A is a part of node B. In designing a system that could link nodes and describe their relationships, it would not only be possible to organize information in more intuitive ways, it would also be possible to perform data analysis on the system, determining, for instance, when groups of people had few ties or when certain software had several dependencies. Importantly, Berners-Lee notes, ‘Ideally, [each node] represents or describes one particular person or object’ (Berners-Lee, 1989).

However, the Web did not evolve this way. After the WWW was released to the public in 1991, it began to proliferate rapidly and evolved towards a document-based Web. Rather than hypertext linking data points, the document-based Web linked Web documents that each described many objects and people. This made it difficult to perform data analysis on Web data; computers could not distinguish between

data points within Web documents and how they related to documents from which they were linked. Thus, in his keynote speech at the first international WWW conference in 1994, Berners-Lee lamented that Web documents were ‘flat’ and ‘devoid of meaning’ when in fact they ‘describe real objects and imaginary concepts’. He called for adding ‘semantics to the web’ so that information within documents could become machine-readable, and relationships could be described between information. This was the first public reference to what today is called the Semantic Web.

As the WWW has developed into one of the world’s largest information repositories, there have been notable efforts to structure Web data with semantics – to design the technologies and standards to enable a Semantic Web. In the Semantic Web, data points on Web pages have unique identifiers and metadata and can be linked with other data points on other Web pages. Using terms defined in openly available schemas to describe the links between data points helps the computer interpret meaning from Web data. Early efforts to enable a Semantic Web aimed to improve Web search capacities and support automated agents in making sense of data. More recent efforts have aimed to build out knowledge graphs that link together relevant data from all over the Web.

Notably, standardizing the protocols needed to enable a Semantic Web involved bringing together stakeholders with different understandings of where meaning comes from and how best to represent it to a computer. Thus, many arguments emerged in the undertaking. I begin by showing how these arguments are, to a certain extent, rooted in conflicts in the history of artificial intelligence (AI), and more specifically, knowledge representation. I then begin drawing on oral history interviews that I conducted with members of the Semantic Web community, along with archival research that I conducted on the World Wide Web Consortium (W3C)’s public forums (data collected as part of a research project documenting the histories of information infrastructures) to show how different styles of thought that guided knowledge representation work in the 1970s and 1980s have become interwoven in the design of the Semantic Web. Finally, I argue that the lessons learned in attempting to structure Web data for machine consumption have pushed researchers to reevaluate how they think about data structures, semantics, and logic.

Strategies in knowledge representation shape how information is organized, contextualized, presented, and accessed. Understanding how knowledge representation on the Web has progressed gives us insight into how the Web organizes cultural, social, and political life. Sociological literature examining the Semantic Web has shown how its affordances have gone on to shape how knowledge is produced and understood (Halford et al., 2013; Waller, 2016; McCarthy, 2017). Philosophical literature has examined how digital objects and the way they are related through code have come to define meaning (Veltman, 2006; Hui, 2016). In focusing on the history of Semantic Web design in this chapter, I aim to demonstrate how history, politics, and diverse epistemologies are reflected in the design and use of information architectures.

KNOWLEDGE REPRESENTATION AND ONTOLOGIES

For decades, knowledge representation has been a central concept in AI research. Research in knowledge representation questions: how can we model the mind? How do we get machines to mimic the mind’s recognition and understanding of everyday objects? How do we get machines to understand information about a complex world? In the 1970s, debates about how to address these questions and challenges began to divide the knowledge representation community. Over the past 50 years, there have been calls to merge elements of research across the divisions (Minsky, 1991), and yet, within the Semantic Web community, and particularly

in conversations that emerged in the development of the Web Ontology Language (OWL) in the early 2000s, classic debates continued to divide the way that researchers approached knowledge representation on the Web. In what follows, I describe early work in knowledge representation, tracing how divisions came to structure the way that Semantic Web researchers approached their work.

Minsky’s Frames

In ‘A Framework for Representing Knowledge’, Marvin Minsky (1974), often considered alongside John McCarthy to be a father of AI, introduced the frame concept. For Minsky, a frame is ‘a data structure for representing stereotyped information’:

Here is the essence of the theory: When one encounters a new situation…one selects from memory a structure called a Frame. This is a remembered framework to be adapted to fit reality by changing details as necessary. (1974: 1)

The ‘terminals’ or slots of a frame have certain requirements that the values assigned to them must meet. In order to recognize and understand a situation, one must match the values of the situation to the terminals in various frames.

Applying this to AI involved creating ‘frame languages’ based on descriptions of objects (rather than algorithms for how data should be manipulated). As a machine is exposed to a new object, it compares and tries to match the object to the description for a frame it holds in memory. If the object’s values can’t be assigned to the terminals of a frame, the machine has to select other frames from memory or ‘de-bug’ existing frames to create new ones. Frame languages provided an alternative to reasoning with first-order predicate calculus by offering structures that could approach new and mundane situations and make sense of them based on a prior understanding of the world.

These ideas challenged earlier approaches to knowledge modeling in AI and came at a time when many AI researchers were confronting how ‘smart’ systems with intricate algorithms (such as chess-playing computers) were unable to perform more common-sense tasks. Prior to proposing frames as a framework for representing knowledge, Minsky himself, along with Seymour Papert, an AI researcher who worked closely with Minsky at MIT, had attempted to tackle the challenges of teaching computers common-sense knowledge through the construction of ‘micro-worlds’ – or bounded environments with little complexity where computers could learn. Upon mastering knowledge in a micro-world, computers could slowly be introduced to more complex thinking.

Yet Minsky critiqued the micro-world approach to knowledge representation as he introduced the frame concept. Minsky (1974: 74) argued that, while modeling logic in a micro-world often produced favorable results, ‘…as we approach reality the obstacles become overwhelming’. Minsky also lamented that ‘logicist’ approaches to AI focused intently on producing ‘consistency and completeness’¹:

I cannot state strongly enough my conviction that the preoccupation with Consistency, so valuable for Mathematical Logic, has been incredibly destructive to those working on models of mind. At the popular level it has produced a weird conception of the potential capabilities of machines in general. At the ‘logical’ level it has blocked efforts to represent ordinary knowledge, by presenting an unreachable image of a corpus of context-free ‘truths’ that can stand separately by themselves. This obsession has kept us from seeing that thinking begins with defective networks that are slowly (if ever) refined and updated. (1974: 78, italics in original)

Frames, on the other hand, focused not on the internal structure of knowledge, but on how the mind came to recognize and structure external situations. While using first-order logic to model knowledge assumed that concepts completely, consistently, and rationally followed the same set of rules, the architecture of the frame assumed that we can’t possibly model this way since every situation is marked with a new set of components or conditions. Minsky argued that our assessments and

expectations of any given situation can never be more than imperfect approximations; we can only adapt a concept or situation from frames we already possess – to what we have already been exposed to. They will thus never be complete or consistent.

Divides

Minsky’s approach to modeling knowledge, along with others that were emerging in the early 1970s, such as Roger Schank and Robert Abelson’s (1975) scripts, represented a more ‘procedural’ approach to knowledge representation than ‘logicist’ or ‘declarativist’ approaches, which had been championed by prominent AI researchers like John McCarthy and Patrick Hayes (who would go on to make important contributions to the Semantic Web). This binary marked a schism in the knowledge representation community that emerged in the early 1970s. Logicists/declarativists tended to apply first-order logic to knowledge representation problems, whereas proceduralists tended to model knowledge representation problems by defining structures and then manipulating them with programing procedures. Further debates over the value of consistency and the aesthetic of knowledge representation solutions marked another schism in the community – one that Roger Schank, an AI theorist and computer science professor at Yale University, coined as the ‘neats vs. the scruffies’ in the early 1970s (Crevier, 1994). Neats sought clean and consistent solutions to knowledge representation problems, whereas scruffies tended to employ hacks, testing out different solutions to see what would work versus what wouldn’t.

These divisions were made more precarious as researchers on either side of the debate addressed the shortcomings of the approaches. John McCarthy and Patrick Hayes (1969), who may be characterized as the quintessential neats and logicists, described one such shortcoming to logic-based knowledge representation in the late 1960s. In what is now a classic AI paper, they suggested the difficulty of using logic to model dynamic facts – or, in other words, facts that change over time. This problem led some logicists, like Drew McDermott, a computer science professor at Yale University, to question the enduring value of logical approaches:

At some point one has to ask, Why bother? When all the other boys and girls are out playing with their computers, why must the logicists stay indoors and practice finger exercises? Can we really believe that the insights gained will eventually allow logic to leapfrog other approaches to inference? It seems far more likely that logic will trail behind, struggling to stuff all sorts of inference patterns into its own view of the world, whether they fit or not. (McDermott, 2014: 116)

Terry Winograd (1980), a computer scientist at MIT who later moved to Stanford University, also inclined towards a proceduralist and scruffy approach to knowledge representation in the mid to late 1970s after coming to terms with the inability of his ‘micro-worlds’ system SHRDLU to scale to real-world contexts. Describing the ‘controversy’ between procedural and declarative epistemologies, Winograd (1975) argued that approaches to knowledge representation based solely in formal logic tended to separate ‘facts’ from ‘processes’ and ultimately went on to model facts as discrete units. Proceduralist approaches, on the other hand, were based more on ‘interactions’ between data and subroutines that manipulated them. Terry Winograd and Daniel Bobrow’s Knowledge Representation Language called KRL – ‘organized around conceptual entities with associated descriptions and procedures’ – is considered one of the first knowledge representation languages to be based on Minsky’s frames (Bobrow and Winograd, 1977: 2). Distinguishing this language from logic-based approaches, Winograd described, ‘There is a fundamental philosophical and mathematical difference between truth-based systems of logic, and process-based systems like KRL’ (1980: 220).

Yet Hayes (1977) went on to critique the distinctions researchers like Winograd had

drawn between proceduralist and declarativist approaches, arguing that the controversy was based in a false dichotomy. Logic, he argued, was not a programing language or even a style of programing. It was simply a set of ideas for justifying inferences (conclusions drawn in formal reasoning). Proceduralists had been advocating for particular data structures and styles of programing, whereas the role of logic was to enrich these data structures and processes with interpretations of their meaning in the world. Thus, he went on to assert that representation languages that purported to adhere to a proceduralist model, such as KRL, incorporated elements of logic in their design (Hayes, 1980).

In a similar vein to Hayes’s critique, in the late 1970s and early 1980s more researchers began to position frames and other structures for modeling knowledge (such as semantic networks (see Quillian, 1968)) as providing a syntax, or structure, for knowledge representation but lacking a semantics that would enable machines to interpret its meaning. This led to calls for systems that would blur the lines between declarativist and proceduralist trends in AI. For instance, Eugene Charniak, a computer science professor at Brown University, introduced Frame-based Artificial Intelligence Language (FRAIL) that ‘use[d] both predicate calculus and frames’ (1981: 1083). Drawing on the neat/scruffy distinction, Charniak (1986) went on to propose a ‘neat’ theory for language interpretation.

It was in this spirit of merging descriptive features of object-oriented representation languages and inferential features of logic-based knowledge representation that description logics (DL) – one of the most influential frameworks for modeling knowledge on the Semantic Web – emerged.

Genealogies of Description Logics

Interested in building systems that could automate intelligent behavior, Ronald Brachman, a PhD candidate at Harvard University who would go on to be considered the ‘Father of description logics’, began conceptualizing a new system for knowledge representation in his 1978 dissertation. Brachman outlined the shortcomings of semantic networks – frameworks for representing the relationships between concepts. He argued that semantic networks lacked consistency and precision in semantics, or, in other words, an explicit ‘epistemology’ – defined by Brachman as ‘a set of primitive structures for encoding knowledge… and rules for combining those structures into well-formed representations of individuals and classes of individuals’ (Brachman, 1978: 44). He argued that existing frame languages, such as KRL, while offering notable models for describing concepts, could not explain how sub-units composed wholes: how concepts were built up from representational units that related to each other. From this, Brachman introduced the structural inheritance network – a model for knowledge representation that would not only characterize the properties of objects, but would also establish what could be called a ‘neat’ semantics for formalizing relationships between objects. Perhaps most notably, Brachman introduced the concept of ‘subsumption’, or the ability to categorize nodes so that a reasoner can eventually infer (or draw conclusions about) the relationships between concepts.

Other research emerging in the early 1980s was concerned with making object-oriented representation languages, such as frames, more ‘functional’ (Levesque, 1984). In other words, this research aimed to design systems that could do more than simply ‘describe’ knowledge through neat semantic structures; it also aimed to design systems that could make ‘assertions’, or statements arrived at through coding procedures, based on these structures. In 1982, Brachman and Levesque suggested that ‘competence’ in knowledge representation required both ‘terminological adequacy’ – where the structures of relations between terms could be characterized to a machine – and ‘assertional adequacy’ –

where the machine could interpret these structures in order to make assertions about what is known and what is not known in a knowledge base. However, frames did not make a distinction between these two forms of adequacy; according to Brachman et al. (1983), in many frame-based representation systems, it wasn’t clear whether the frame was describing a concept or asserting the existence of a concept. They thus began to conceptualize systems that would differentiate between descriptive and assertional components of the language; the descriptive components were based on frames, while the assertional components were based on elements of first-order logic. Both subsumption and the distinction between descriptive and assertional components oriented the design of KL-ONE, a frame-based knowledge representation system that would become the primary model for description logics (DL) in the early 1990s (Brachman and Schmolze, 1985).

Systems like KL-ONE posed tricky tradeoffs. Brachman and colleagues aimed to build systems that could at once provide rich, or ‘expressive’, descriptions of concepts, but also make inferences about the relationships between concepts within a finite number of computational steps. However, the more expressive knowledge representation languages are, or the more complexity they attempt to model, the more challenging it is to guarantee the system will be sound, complete, and ‘decidable’ (capable of producing a conclusion) (Patel-Schneider, 1985; Levesque and Brachman, 1987). Because of this, KL-ONE-like systems often restricted the number of constructors, or ‘primitives’, of the language to a small set. They offered a substantially limited logic. Thus, throughout the 1980s, a great deal of attention was devoted to ‘cleaning’ semantics in object-oriented systems, advancing the inferential capabilities of these systems, and theorizing the computational limits of the work. In the late 1980s, this research merged into the DL² framework.

Throughout the 1990s, researchers went on to experiment with how far they could push the tradeoffs between expressivity and decidability in DL-based systems. At one end of the spectrum, systems like CLASSIC (Borgida et al., 1989) considerably limited the expressivity of the language in order to ensure the system could produce inferences in a reasonable amount of time, while offering advice on how to work with and extend the limited language for practical representation and reasoning (Brachman et al., 1991). At the other end, systems like LOOM ‘conced[ed] that it might be acceptable to deliberately build an incomplete system’ (MacGregor, 1991b: 90). Robert MacGregor, who served as a Senior Project Leader of the University of Southern California’s Information Sciences Institute, argued that knowledge representation systems may need to sacrifice completeness and offer users more expressivity in order to prevent systems from becoming too ‘scruffy’:

A small, elegant KRS embedded in a larger, incomplete (i.e. scruffy) application yields a scruffy application. … Our experience with NIKL and LOOM suggests that when application programmers need to represent something, and the [knowledge representation system] can’t help them, they invent their own representation, and the result is nearly always inferior to what a skilled knowledge representation developer could produce. (MacGregor, 1991a: 396)

Later work by Ian Horrocks (1998) and Peter Patel-Schneider (1998), who would both come to be important figures in the development of the Web Ontology Language (OWL), sidestepped the expressivity/decidability tradeoff by implementing powerful reasoners that could draw inferences from expressive knowledge bases in reasonable timeframes. Deborah McGuinness (2001) suggests that it was working through these tensions in the 1990s that prepared DLs to ‘emerge from ivory towers’ into WWW usage.

KNOWLEDGE REPRESENTATION ON THE WEB

When Ora Lassila, a computer scientist and research fellow at Nokia Labs, came to MIT
262 THE SAGE HANDBOOK OF WEB HISTORY

as a visiting scholar in 1996, Tim Berners-Lee asked him what he believed was wrong with the Web. Lassila replied that he would like the Web to be able to do more things without human intervention; Berners-Lee agreed.

The origins of the Semantic Web are often traced back to the publication of the Scientific American article ‘The Semantic Web’ in 2001 (Berners-Lee et al., 2001), but the work to formalize standards and protocols for applying knowledge representation on the Web had begun in the mid-1990s, with the development of the Resource Description Framework (RDF). RDF offers syntax for enabling computers to interpret Web content. Based on RDF, Web data gets organized into ‘triples’, consisting of a subject, a predicate, and an object. The subject is a piece of data on the Web that has a universal identifier; the object can either be another piece of data on the Web that has a universal identifier or a reference to something that exists outside of the Web³. The predicate describes the relationship between the subject and the object. For instance, one triple may be John Doe (http://johndoe) is a Person. Here John Doe would be a piece of data on the Web; ‘person’ would refer to an entity that exists outside of the Web; and ‘is a’ would describe the relationship between the two. Another triple may be John Doe (http://johndoe) is married to Jane Doe (http://janedoe).

RDF emerged from a lineage of work that aimed to develop standard formats for making metadata, or descriptions of data, machine-readable. An early predecessor to RDF was the Platform for Internet Content Selection (PICS) – a system that allowed users to associate metadata with Web content, designed initially to help parents control what children could access. Perhaps the most notable precursor to RDF was the Meta-Content Framework (MCF). Ramanathan Guha began developing MCF at Apple in the mid-1990s. Guha had long been involved in knowledge representation work, playing a key role in the Cyc project – an ambitious (and often described as ‘scruffy’ (Matuszek et al., 2006)) effort to build a machine-readable corpus of common-sense knowledge. MCF aimed to structure metadata so that information could flow between software products with different information models and data structures. When Guha left Apple for Netscape in 1997, he met Tim Bray, who had developed the first version of the W3C’s XML specification – a standard for storing and transporting data on the Web. Together, they adapted MCF using XML, and this project became the basis for the Resource Description Framework (RDF)⁴.

Work to formalize RDF into a Web standard through the W3C began in 1997, and the standard was formalized into a recommendation in 1998. The group of practitioners that participated in the formalization of the W3C RDF standard had diverse backgrounds and agendas. Some represented major search engines, aiming to build tools that could enable machines to better interpret the content of a Web page to improve search. Some came from industry, seeing an economic benefit to enabling smarter browsing. Another group of practitioners came from the knowledge representation community, excited by the idea of building a worldwide knowledge representation system.

On the working group, many argued that RDF had to be simple – like a basic programming language with which more complicated programs could be written. Others did not believe RDF should be treated like a programming language – that knowledge representation was not necessarily reducible to simpler forms. The recommendation emerged as a compromise between these viewpoints.

However, many individuals in the knowledge representation community still considered RDF too messy and inconsistent to enable knowledge representation. In fact, without provocation or prior introduction, Pat Hayes, who was still revered in the knowledge representation community, emailed the working group chairs to complain about the recommendation’s sloppiness. He was later asked to join the working group. Reflecting earlier concerns surrounding the limitations of frame-based systems based solely
in procedural methodologies, RDF provided syntax for knowledge representation, but lacked a neat semantics (Lassila and McGuinness, 2001). This played an important role in the development of OWL.

Web Ontology Language

In the late 1990s, the US Defense Advanced Research Projects Agency (DARPA) was becoming interested in designing systems that could control and coordinate autonomous software agents. One vein of this research involved designing languages that could identify, understand, and integrate information sources across distributed agents through ‘semantics’. The DARPA Agent Markup Language (DAML) Program was introduced in 2000, and James Hendler, then a computer science professor at the University of Maryland, was appointed as the program manager.

At the time of his appointment, Hendler, along with colleagues Jeff Heflin and Sean Luke, had been working to develop a system that could ‘mark up’ Web pages with semantics. SHOE (Simple HTML Ontology Extensions) enabled website designers to reference an ontology where a series of terms were defined and then annotate data on their HTML documents with these terms. The system would enable search engines to ‘look up’ the meaning of annotated content on a Web page by referencing the ontology.

Placing this project in the context of research being conducted in the field of knowledge representation (citing systems like KL-ONE, CLASSIC, and LOOM), Heflin, Hendler, and Luke argued that applying a knowledge representation language on the Web would require considerably rethinking what knowledge bases could enable:

Web systems simply cannot assume that all of the information has been entered solely under a knowledge engineer’s watchful eye, and is therefore correct and consistent. As authority on the Internet is distributed, it cannot and does not make any such promise. This lack of central control leads to a number of serious problems. Since there is often no editorial review or quality control of Web information, each page’s reliability must be questioned. Since a web page that was useful one day can disappear the next, there is no guarantee on the availability of information. Since there are no integrity constraints on the Web, information from different sources can be in disagreement, leading to inconsistency. Some inconsistencies may be due to error, others due to philosophical differences. In addition, there are quite a number of well-known “web hoaxes” where information was published on the Web with the intent to amuse or mislead – the computational agent typically cannot tell the difference! (Heflin et al., 1999: 2–3, italics in original)

Thus, rather than aiming to build a complete, consistent, or neat knowledge base, the guiding design principle for SHOE became ‘a little semantics goes a long way’.

When Hendler got to DARPA, he was given funding to distribute to three laboratories that would build out proofs of concept for DAML. One funded group was the Knowledge Systems Laboratory at Stanford University – where McGuinness had moved after being quite active in the description logic community throughout the 1990s, both designing DL languages and co-building large DL applications, including a family of industrial-strength configurator applications (McGuinness and Wright, 1998). Another funded group was the Massachusetts Institute of Technology Laboratory for Computer Science and W3C, where Berners-Lee and others were advancing research towards what they were calling the ‘Semantic Web’. Several people who had long been involved in the knowledge representation community also became involved in the project, including Drew McDermott, Pat Hayes, and Peter Patel-Schneider. Together, these groups and other collaborators formalized DAML into a language in October 2000. Soon after, they joined efforts with other researchers who had been designing the Ontology Inference Layer (OIL) in the European Union. DAML+OIL was released in January 2001.

The W3C working group for the Web Ontology Language (OWL) began in 2001,
tasked with shaping DAML into a Web standard. With over 40 participants, the committee was very large in comparison to other W3C committees. Because it consisted primarily of academic researchers, there were not so much different stakeholder positions on the working group as schisms in what members believed the ontology should do and how it should be structured. In an interview I conducted with Pat Hayes, he described:

The RDF working group … had sort of a collegiate atmosphere and eventually … after some initial fights, we got each other calibrated, and were able to sort of work together. Occasionally people would get exasperated with one another, but that sort of thing happens. The OWL working group was much more of a real genuine schism. It had an on-going fight, which was still ongoing, between two very different points of view about what OWL should be.

Some of the early controversies in formalizing the mark-up language into a Web standard were identified in a ‘Trends and Controversies’ section of the March/April 2002 issue of the IEEE Intelligent Systems journal (van Harmelen et al., 2002). In the section, eight participants on the working group – each from different backgrounds – were asked to comment on the process of defining a standard ontology language for the Web. Frank van Harmelen, a knowledge representation and reasoning professor at Vrije Universiteit Amsterdam, suggested that the styles of frame-based modeling that he had seen used in both academia and industry ‘directly conflict with the DL [description logic] style of DAML+OIL’ (van Harmelen et al., 2002: 2–3). He called on the working group to reduce the complexity of the ontology’s formal semantics. Guus Schreiber, also a computer science professor at Vrije Universiteit Amsterdam and co-chair of the OWL working group, wrote:

The debate about what a Web ontology language should look like is reminiscent of past neat-scruffy struggles. Knowledge modelers want expressiveness, logicians stress decidability. The main difference is that the Semantic Web actually forces us to make some choices: there is a strong need for real-world knowledge representation. (van Harmelen et al., 2002: 78)

Indeed, many of the arguments that arose in the working group reflected earlier neat–scruffy struggles. As James Hendler, chair of the working group of OWL, noted in one email to the group:

In fact, if one goes back to a famous talk given by Roger Schank…he referred to the neats and the scruffies and at that time, this particular debate we’re still having today was one of the examples used as a differentiator! …the issue being discussed here gets so much to the heart of things and the differences in how frame and DL folks see the world that it’s hard to even know where to begin.

The issues that arose in the conversations to formalize the ontology had to do with the syntax, the semantics, and the usability of the language. Syntactically, there was considerable concern held by some from the DL community that designing first-order logic on top of RDF syntax would prevent the ontology from producing decidable results.⁵ Pat Hayes, who described in his interview with me that (despite his historical positioning) he played the role of a scruffy in the OWL working group, went on to propose a proof that this was not a paradox as long as the logic gave up decidability.

Another issue that arose in the working group concerned the type of logic on which the ontology should be based. Should the ontology be based on DLs – first-order logics with considerably limited expressivity that could ensure the system could produce results in a reasonable amount of time – or should the ontology be based on full logic where any RDF statements could be combined in any way without guarantees that there would be soundness and completeness?⁶ Finally, there was concern about whether the system would be usable by everyday webmasters.

While early efforts attempted to sort out these controversies, eventually the working group decided to propose three versions of OWL – one emphasizing simplicity and usability (OWL-Lite), one emphasizing decidability (OWL-DL), and one emphasizing
expressivity (OWL-Full) (Horrocks et al., 2003). Ora Lassila told me in an interview, ‘I think the distinction between these two [OWL-DL and OWL-Full] represents the rift that exists between the knowledge representation community, or it did at the time anyway’. In an email thread with many working group members throwing around ideas for naming each version, Frank van Harmelen offered one possibility:

OWL Light:  ?              wimpy
OWL fast:   OWL/FOL-style  Neat
OWL large:  OWL/RDF-style  Scruffy⁷

Recommending three versions of OWL signified that overcoming the epistemological and technological binds of knowledge representation with a one-size-fits-all solution was likely impossible – particularly in a space as messy as the WWW.

Web 3.0 Meets Web 2.0

As the Semantic Web progressed, there was growing concern that its vocabularies and ontologies were being developed to tackle specific tasks and could not scale out for structuring large-scale, heterogeneous, and dynamic Web data (Shadbolt et al., 2006). Particularly as the emergence of social Web applications (such as blogging and social networking sites) made it easier for users to produce their own Web data, some Semantic Web practitioners began considering the role that individual tagging of Web data could play in building out a Semantic Web. Around this time many in the community began emphasizing building capacity for ‘linked data’ over reasoning with Web data (Bizer et al., 2009). Aiming specifically to make structured data publicly available on the Web, the ‘linked data movement’ was less concerned with getting ontologies sound and complete and more concerned with enabling diverse users to publish data on the Web with linking protocols (Berners-Lee, 2009). For Wendy Hall, a computer scientist at the University of Southampton whose work on hypertext and hypermedia had influenced the design of the Semantic Web, this was exactly what was needed to get the Semantic Web out of an ‘AI rat-hole’. As she described to me in an interview: ‘We needed to get data out there that could be linked so that people could see what the power of [the Semantic Web] was. The argument being that if you did the theory first, you’d never achieve it’.

At around the same time, several of the researchers who had contributed to the OWL working group focused their attention on fixing inconsistencies in OWL 1. OWL 2 was largely designed as an attempt to extend and revise the original, with an explicit aim to improve the ontology’s soundness and expressivity (Grau et al., 2008). Yet, several researchers that I have interviewed argued that creating yet another version of OWL only served to further confuse Web users. Further, with more everyday webmasters producing their own Web data, there were far more instances of OWL being used in ways that did not adhere to the strict specifications outlined in the ontology (Halpin et al., 2010). This trend posed the threat of breaking many of the inferential features championed by the DL community.

Thus, in the early 2010s, ‘microformats’, such as schema.org and hCard, began popping up as simplified alternatives to RDF and OWL. Much like RDF’s predecessors, these tools involved marking up data within HTML documents with vetted schemas. Popular discourse seems to suggest that these tools, while incredibly scruffy, inconsistent, and unlikely ever to be able to serve as a base for sound inferences, are ‘winning out’ in the Semantic Web domain (Norvig, 2016). Figure 18.1 demonstrates how these technologies are part of a lineage of knowledge representation systems.

Figure 18.1 This figure depicts a timeline of systems, languages, and frameworks that have
been advanced in the field of knowledge representation since the 1960s. It shows how the
field has toggled between neat and scruffy approaches to knowledge representation. Many
of the Semantic Web technologies introduced in the 2000s and 2010s can be said to derive
from earlier systems – sharing common creators, design directives, and worldviews.
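
As a rough illustration of the subject, predicate, object structure that the chapter describes for RDF, the statements from the John Doe example can be held as plain tuples and queried by pattern. This is only a minimal sketch in Python: it is not RDF syntax, the names `Triple`, `TRIPLES`, and `match` are hypothetical, and the third statement (that Jane Doe is a Person) is an assumption added for illustration.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: a subject, a predicate, and an object."""
    subject: str
    predicate: str
    obj: str


# The first two statements follow the chapter's John Doe example.
# The third (Jane Doe is a Person) is an assumption added for illustration.
TRIPLES: List[Triple] = [
    Triple("http://johndoe", "is a", "Person"),
    Triple("http://johndoe", "is married to", "http://janedoe"),
    Triple("http://janedoe", "is a", "Person"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every field given; None is a wildcard."""
    return [
        t
        for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Every subject asserted to be a Person.
people = [t.subject for t in match(TRIPLES, predicate="is a", obj="Person")]

# Whoever John Doe is married to.
spouses = [
    t.obj
    for t in match(TRIPLES, subject="http://johndoe", predicate="is married to")
]
```

A lookup like this only retrieves statements that were explicitly entered. It performs none of the inference (such as subsumption) that the description logic community sought to guarantee, which is the gap between linking data and reasoning over it that the section describes.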

CONCLUSION

In his commentary ‘The Myth of the One True Logic’, DL pioneer Ronald Brachman (1987: 169) acknowledged the turn for the ‘neat’ that was emerging in the knowledge representation community in the mid to late 1980s. He lamented the emergence of what he called ‘TOTL[(the one true logic)]-imperialism’, noting how an emphasis on capital-L logicism had rendered other, less neat, forms of logic illegitimate in the community. He concluded that what was needed in the community was more ‘give-and-take’ between logicist and procedural approaches.

In the mid-2010s, after years of firmly staking ground over how data structures, semantics, and logic should formalize and organize Web data, members of the Semantic Web community are turning again to call for a ‘give-and-take’ approach to knowledge representation on the Web. Responding to a long email thread considering how greater consistency could be brought to schema.org, Mike Bergman, a specialist in knowledge representation and active contributor to schema.org community discussions, replied, ‘Let’s not revisit the tiresome RDF v. OWL wars in this forum’. He went on to suggest, ‘The trick is to be consistent in the face of pragmatic realities’.⁸

The history of knowledge representation on the Web can be read as an effort to navigate, argue over, and eventually creatively endure these binds. On the one hand, there is considerable consensus that the Web could benefit from ‘smarter data’ – that adding meaning to Web data can add context to data, make data easier to discover, and enable us to draw disparate data together in novel ways. On the other hand, the Web itself is quite scruffy, full of inconsistencies and paradoxes (see Barnet, this volume, for how HTTP embodies this). Making this space meaningful is a very hard enterprise. Yet, as schema.org and other linked data technologies that tolerate inconsistencies and paradoxes ‘win out’ on the Web, the field of Web semantics seems to be taking a ‘turn for the scruffy’ (Poirier, 2017) – a turn perpetuated by persistent reminders that the worldviews and concepts diverse people use to describe their world often don’t fit so ‘neatly’ into tighter systems of representation.

Notes

1  In mathematical theory, ‘completeness’ suggests a mathematical system where ‘all true statements can be proven’, and ‘consistency’ suggests a mathematical system where ‘no false statements can be proven’.
2  Unlike when referencing Web documents, when URIs are used to reference real-world objects, the real-world object cannot be retrieved through the computer. Agreeing on a solution to this problem generated a great deal of debate in the Semantic Web community (see Hayes and Halpin, 2008). The current W3C guidelines around this problem are summarized at: https://www.w3.org/TR/2008/NOTE-cooluris-20081203/.
3  Both MCF and RDF went on to influence the design of RSS – another Web arena where the ‘neat/scruffy’ divides played out (see Robbins, 2006: 108–9).
4  https://lists.w3.org/Archives/Public/www-webont-wg/2002Jan/0170.html
5  Peter Patel-Schneider, a particularly strong proponent for moving away from RDF syntax, acknowledged that scruffier approaches could overcome this issue, but ‘it would be a gigantic mess, at best’. https://lists.w3.org/Archives/Public/www-webont-wg/2001Dec/0081.html
6  Jim Hendler described to me that this is an issue about whether infinity is permitted in the logic. ‘Logic’, he argued, ‘loves infinity’. However, infinity is not decidable and will break a DL reasoner.
7  https://lists.w3.org/Archives/Public/www-webont-wg/2002Oct/0130.html
8  https://lists.w3.org/Archives/Public/public-vocabs/2014Jan/0047.html

REFERENCES

Berners-Lee, T. (1989) Information Management: A Proposal (https://www.w3.org/History/1989/proposal.html) Accessed February 16, 2016.
Berners-Lee, T. (1999) Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor. San Francisco: Harper Business.
Berners-Lee, T. (2009) The next Web. TED2009 (https://www.ted.com/talks/tim_berners_lee_on_the_next_web?language=en) Accessed June 8, 2016.
Berners-Lee, T., Hendler, J., and Lassila, O. (2001) ‘The Semantic Web’, Scientific American, 284(5): 29–37.
Bizer, C., Heath, T., and Berners-Lee, T. (2009) ‘Linked Data: The Story So Far’, in Sheth Amit (ed.), International Journal on Semantic Web and Information Systems, 5(3): 1–22. (doi:10.4018/jswis.2009081901)
Bobrow, D.G., and Winograd, T. (1977) ‘An Overview of KRL, a Knowledge Representation Language’, Cognitive Science, 1(1): 3–46. (doi:10.1207/s15516709cog0101_2)
Borgida, A., Brachman, R.J., McGuinness, D.L., and Resnick, L.A. (1989) ‘CLASSIC: A Structural Data Model for Objects’, in Proceedings of SIGMOD International Conference on Management of Data. New York: ACM, 58–67. (doi:10.1145/67544.66932)
Brachman, R.J. (1978) ‘A Structural Paradigm for Representing Knowledge’. PhD dissertation, Harvard University.
Brachman, R.J. (1987) ‘The Myth of the One True Logic’, Computational Intelligence, 3(1): 168–72. (doi:10.1111/j.1467-8640.1987.tb00187.x)
Brachman, R.J., Fikes, R.E., and Levesque, H.J. (1983) ‘Krypton: A Functional Approach to Knowledge Representation’, Computer, 16(10): 67–73. (doi:10.1109/MC.1983.1654200)
Brachman, R.J., and Levesque, H.J. (1982) ‘Competence in Knowledge Representation’, in Proceedings of Second National Conference on Artificial Intelligence. Philadelphia: Morgan Kaufmann Publishers Inc., 189–92. (http://www.aaai.org/Papers/AAAI/1982/AAAI82-045.pdf) Accessed February 13, 2017.
Brachman, R., McGuinness, D., Patel-Schneider, P., Resnick, L.A., and Borgida, A. (1991) ‘Living with CLASSIC: When and How to Use a KL-ONE-like Language’, in John Sowa (ed.), Principles of Semantic Networks: Explorations in the Representation of Knowledge. San Mateo, CA: Morgan Kaufmann, 401–56.
Brachman, R.J., and Schmolze, J.G. (1985) ‘An Overview of the KL-ONE Knowledge Representation System’, Cognitive Science, 9(2): 171–216. (doi:10.1207/s15516709cog0902_1)
Charniak, E. (1981) ‘A Common Representation for Problem-Solving and Language-Comprehension Information’, Artificial Intelligence, 16(3): 225–55. (doi:10.1016/0004-3702(81)90001-1)
Charniak, E. (1986) ‘A Neat Theory of Marker Passing’, in Proceedings of the Fifth AAAI National Conference on Artificial Intelligence. Philadelphia, Pennsylvania: AAAI Press, 584–88. (http://dl.acm.org/citation.cfm?id=2887770.2887869) Accessed February 21, 2017.
Crevier, D. (1994) AI: The Tumultuous History of the Search for Artificial Intelligence. New York: BasicBooks.
Grau, B.C., Horrocks, I., Motik, B., Parsia, B., Patel-Schneider, P., and Sattler, U. (2008) ‘OWL 2: The next Step for OWL’, Web Semantics: Science, Services and Agents on the World Wide Web, Semantic Web Challenge 2006/2007, 6(4): 309–22. (doi:10.1016/j.websem.2008.05.001)
Halford, S., Pope, C., and Weal, M. (2013) ‘Digital Futures? Sociological Challenges and Opportunities in the Emergent Semantic Web’, Sociology, 47(1): 173–89. (doi:10.1177/0038038512453798)
Halpin, H., Hayes, P.J., McCusker, J.P., McGuinness, D.L., and Thompson, H.S. (2010) ‘When owl:sameAs Isn’t the Same: An Analysis of Identity in Linked Data’, in Peter F. Patel-Schneider, Yue Pan, Pascal Hitzler, Peter Mika, Lei Zhang, Jeff Z. Pan, Ian Horrocks, and Birte Glimm (eds.), The Semantic Web – ISWC 2010. Lecture Notes in Computer Science 6496. Berlin Heidelberg: Springer, 305–20. (doi:10.1007/978-3-642-17746-0_20)
Harmelen, F. van, Horrocks, I., Clark, P., Patel-Schneider, P.F., Uschold, M., Rousset, M.-C., Hendler, J., and Schreiber, G. (2002), ‘Ontologies’ KISSES in Standardization’, IEEE Intelligent Systems, 17(2): 70–9. (doi:10.1109/MIS.2002.999223)
Hayes, P.J. (1977) ‘In Defense of Logic’, in 5th International Joint Conference on Artificial Intelligence – Volume 1. San Francisco: Morgan Kaufmann Publishers Inc, 559–65. (http://dl.acm.org/citation.cfm?id=1624435.1624559) Accessed February 14, 2017.
Hayes, P.J. (1980) ‘The Logic of Frames’, in Dieter Metzing (ed.), Frame Conceptions and Text Understanding. New York: Walter de Gruyter, 46–61.
Hayes, P.J., and Halpin, H. (2008) ‘In Defense of Ambiguity’, International Journal on Semantic Web and Information Systems, 4(2): 1–18. (doi:10.4018/jswis.2008040101)
Heflin, J., Hendler, J., and Luke, S. (1999) SHOE: A Knowledge Representation Language for Internet Applications (http://drum.lib.umd.edu/handle/1903/1044) Accessed February 17, 2017.
Horrocks, I. (1998) ‘Using an Expressive Description Logic: FaCT or Fiction?’, in Proceedings of Knowledge Representation. San Mateo, CA: Morgan Kaufmann Publishers Inc., 636–47.
Horrocks, I., Patel-Schneider, P.F., and Harmelen, F. van. (2003) ‘From SHIQ and RDF to OWL: The Making of a Web Ontology Language’, Web Semantics: Science, Services and Agents on the World Wide Web, 1(1): 7–26. (doi:10.1016/j.websem.2003.07.001)
Hui, Y. (2016) On the Existence of Digital Objects. Minneapolis: University of Minnesota Press.
Lassila, O., and McGuinness, D. (2001) ‘The Role of Frame-Based Representation on the Semantic Web’, Knowledge Systems Laboratory Report KSL-01-02. Stanford University.
Levesque, H.J. (1984) ‘Foundations of a Functional Approach to Knowledge Representation’, Artificial Intelligence, 23(2): 155–212. (doi:10.1016/0004-3702(84)90009-2)
Levesque, H.J., and Brachman, R.J. (1987) ‘Expressiveness and Tractability in Knowledge Representation and Reasoning’, Computational Intelligence, 3(1): 78–93. (doi:10.1111/j.1467-8640.1987.tb00176.x)
MacGregor, R. (1991a) ‘The Evolving Technology of Classification-Based Knowledge Representation Systems’, in John Sowa (ed.), Principles of Semantic Networks: Explorations in the Representation of Knowledge. San Mateo, CA: Morgan Kaufmann Publishers, Inc, 385–400.
MacGregor, R. (1991b) ‘Inside the LOOM Description Classifier’, SIGART Bulletin, 2(3): 88–92. (doi:10.1145/122296.122309)
Matuszek, C., Cabral, J., Witbrock, M., and Deoliveira, J. (2006) ‘An Introduction to the Syntax and Content of Cyc’, in Proceedings of the 2006 AAAI Spring Symposium on Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering. Menlo Park, CA: AAAI Press, 44–9.
McCarthy, J., and Hayes, P.J. (1969) ‘Some Philosophical Problems from the Standpoint of Artificial Intelligence’, in B. Meltzer and D. Michie (eds.), Machine Intelligence 4. Edinburgh: Edinburgh University Press, 463–502.
McCarthy, M. (2017) ‘The Semantic Web and Its Entanglements’, Science Technology, and Society, 22(1): 21–37. (doi:10.1177/0971721816682796)
McDermott, D. (2014) ‘AI, Logic, and the Frame Problem’, in Frank M. Brown (ed.), The Frame Problem in Artificial Intelligence: Proceedings of the 1987 Workshop. San Francisco: Morgan Kaufmann, 105–18.
McGuinness, D.L. (2001) ‘Description Logics Emerge from Ivory Towers’, in Proceedings of the International Workshop on Description Logics. Stanford, CA, 201–3.
McGuinness, D.L., and Wright, J.R. (1998) ‘An Industrial-Strength Description Logic-Based Configurator Platform’, IEEE Intelligent Systems and Their Applications, 13(4): 69–77. (doi:10.1109/5254.708435)
Minsky, M. (1974) ‘A Framework for Representing Knowledge’, Massachusetts Institute of Technology A.I. Laboratory, Memo No. 306. (http://hdl.handle.net/1721.1/6089) Accessed October 15, 2015.
Minsky, M. (1991) ‘Logical Versus Analogical or Symbolic Versus Connectionist or Neat Versus Scruffy’, AI Magazine, 12(2): 34–51. (doi:10.1609/aimag.v12i2.894)
Norvig, P. (2016) ‘The Semantic Web and the Semantics of the Web: Where Does Meaning Come From?’, Keynote presented at the www2016, Montreal, Quebec.
Patel-Schneider, P.F. (1985) ‘A Decidable First-Order Logic for Knowledge Representation’, in Proceedings of the 9th International Joint Conference on Artificial Intelligence – Volume 1. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc, 455–58. (http://dl.acm.org/citation.cfm?id=1625135.1625224) Accessed February 17, 2017.
Patel-Schneider, P.F. (1998) ‘DLP System Description’, in Collected Papers from the International Description Logics Workshop, 87–9.
Poirier, L. (2017) ‘A Turn for the Scruffy: An Ethnographic Study of Semantic Web Architecture’, in Proceedings of WebSci’17. Troy, NY: ACM, 359–67. (doi:https://doi.org/10.1145/3091478.3091505)
Quillian, M.R. (1968) ‘Semantic Memory’, in Marvin Minsky (ed.), Semantic Information Processing. Cambridge, MA: MIT Press, 27–70.
Robbins, J.N. (2006) Web Design in a Nutshell: A Desktop Quick Reference. Sebastopol, CA: O’Reilly Media, Inc.
Schank, R.C., and Abelson, R.P. (1975) ‘Scripts, Plans, and Knowledge’, in Proceedings of the 4th International Joint Conference on Artificial Intelligence – Volume 1. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc, 151–7. (http://dl.acm.org/citation.cfm?id=1624626.1624649)
Shadbolt, N., Berners-Lee, T., and Hall, W. (2006) ‘The Semantic Web Revisited’, IEEE Intelligent Systems, 21(3): 96–101. (doi:10.1109/MIS.2006.62)
Veltman, K. (2006) ‘Towards a Semantic Web for Culture’, Journal of Digital Information, 4(4). (https://journals.tdl.org/jodi/index.php/jodi/article/view/113 Accessed February 16, 2016.)
Waller, V. (2016) ‘Making Knowledge Machine-Processable: Some Implications of General Semantic Search’, Behaviour & Information Technology, 35(10): 784–95. (doi:10.1080/0144929X.2016.1183710)
Winograd, T. (1975) ‘Frame Representations and the Declarative/Procedural Controversy’, in Daniel Bobrow and Allan Collins (eds.), Representation and Understanding: Studies in Cognitive Science. New York: Academic Press, Inc, 185–210.
Winograd, T. (1980) ‘What Does It Mean to Understand Language?’, Cognitive Science, 4(3): 209–41. (doi:10.1207/s15516709cog0403_1)
19
Browsers and Browser Wars
Marc Weber

Browsers are the part of the Web that users actually see and touch with a mouse or finger. As such, they’ve long gotten more attention than their counterparts: the equally important Web server software off somewhere at the end of a wire or radio link. But in the Web’s first years, browsers (Figure 19.1) also served as the main battleground for control of its future. The features they emphasized, or left out, would shape the new medium at the most fundamental levels – how you would find information, pay online, use multimedia, even whether you could easily contribute knowledge or simply absorb it. While the Web is an open standard, the creators of the dominant browser could hope to become its de facto masters.

This chapter traces that history, starting with the Web’s background and invention, and the loss of control by its creators, whose vision for the Web’s features was radically different from what it became. It moves on to the two great ‘browser wars’, and how those were intertwined with the existential question of the early Web – how to make it pay for itself. Where older online systems charged by the minute or hour (Bourne and Hahn, 2003; Gillies and Cailliau, 2000; Hafner, 2001; Mailland and Driscoll, 2017), the Web was an open standard running over an (initially) academic network. The ad-driven and ‘freemium’ models that finally succeeded have fundamentally shaped not only the economy of the online world, but its character and its content1.

The chapter finishes by showing how the battleground has gradually shifted from browser vs. browser to mobile vs. personal computers, and to apps vs. the Web itself. The competition has also moved steadily server side. Not to Web server software itself, but to the massively profitable content it serves up from giant banks of hard disks: social media, keyword search, shopping, and more.

DAVID VS. GOLIATH

At the start of the 1980s it was hard to imagine that the Advanced Research Projects Agency (ARPA)’s Internet protocols would

Figure 19.1 Early Web browsers, family tree.


Source: Computer History Museum

become the ‘one ring to rule them all’, with dominion over the earth’s wires and switches from national payment systems to smart refrigerators. They were just one of several experimental standards for how to tie different networks together at the lower plumbing levels, a process known as internetting.

In fact, as that decade’s bitter standards wars unfolded around which internetting standard should prevail, the Internet we use today was a scrappy but research-oriented David facing several Goliaths: Open Systems Interconnect (OSI), the lumbering official favorite of industry and standards bodies, and two proprietary systems from computing giants DEC and IBM.

But then David began taking steroids – in the form of US government cash. With infusions from the military, the National Science Foundation, and other agencies, while building on its loyal base of open-source hackers, the Internet started bulking up (Pelkey, 2009–2018).

It didn’t hurt that the Internet had working hardware and software when its most serious rival, the Europe-based OSI, was still mostly vaporware tied down in endless meetings. As an official international standard, OSI had to painstakingly heed input from all major stakeholders (Russell, 2014).

The Internet community, by contrast, was a seeming contradiction: a captive democracy of brilliant hackers presided over by philosopher-geeks with absolute power. Those leaders – the directors of the computing research program at ARPA – generally let the iconoclastic volunteers self-organize in a kind of online meritocracy, deciding standards through a mutual respect for ‘…rough consensus and running code’2. But the leaders could also step in to settle a standards debate by fiat, or use their funding muscle to force a grant recipient – UC Berkeley – to build Internet protocols into UNIX3.

It also didn’t hurt that the Internet was backed by technology-obsessed young senator Al Gore, whose eponymous bill put nearly a billion dollars into Internet protocols for education4. Those protocols began to spread like wildfire, even though OSI remained the official future of networking for the US government, among others. Looking back, it’s clear that by the end of the 1980s the Internet had already won, even if many insiders didn’t realize it at the time (Pelkey, 2009–2018; Segal, 1995).

But because the Internet was still a non-commercial net used by geeks, nobody had bothered to write slick, easy-to-use online systems to run over it, such as the mass-market Minitel, or CompuServe, or PLATO, or AOL, or even LexisNexis (Campbell-Kelly and Garcia-Swartz, 2013; Carey and Elton, 2009; CHM, 2011; Dear, 2017; Gillies and Cailliau, 2000; Jones and Latzko-Toth, 2017; Mailland and Driscoll, 2017; Schafer and Thierry, 2012; Marchand, 1987; Mounier-Kuhn, 2002). Those systems were proprietary, and mostly ran over their own networks. The geeks who traditionally used research networks like the Internet got by with an assembly of fussy tools only a power user could love (Hafner and Lyon, 1996; Salus, 1995). To grow beyond those core users, something had to change.

Who would put an online system on the Internet?

Back in the 1960s ARPA itself had kick-started some of the world’s very first such systems, from timesharing to Doug Engelbart’s eponymous NLS (oNLine System). But the ARPA community’s champions of these upper ‘user’ layers – including J.C.R. Licklider, Engelbart, and Bob Taylor – had long since moved on (Bardini, 2000; Barnet, 2014, and this volume; Markoff, 2005; Nielsen, 1995; Waldrop, 2001).

So the vacuum above the Internet began to get slowly filled by a rag-tag collection of small-time online systems. They were written by lone wolf volunteers and open-source collaborations, some from within the freewheeling Internet community itself. There was Gopher, a barebones document navigation system from the University of Minnesota, and WAIS, a mildly commercial navigation system similar to a search engine. Usenet, whose sprawling discussion boards had long hosted lively discussions on everything from sadomasochism to particle physics, was the biggest existing system to get adapted to the Internet (CHM, 2011; Gillies and Cailliau, 2000; Hauben and Hauben, 1997; Krol, 1993).

Another set of online systems featured hypertext, the clickable links so familiar today. These included Viola (Figure 19.2), built around Java-like applets by brilliant and bored Berkeley student Pei Wei; Hyper-G, a slickly packaged and very complete system by Austrian researcher Hermann Maurer (Nielsen, 1995); Lynx by Michael Grobe and Lou Montulli at the University of Kansas, and several others (Berners-Lee and Fischetti, 1999; CHM, 2011; Gillies and Cailliau, 2000)5.

WorldWideWeb

One of the more obscure of the late 1980s attempts to create an online system for the

Figure 19.2 Viola hypertext system, 1989. Viola was a powerful hypertext system by student Pei Wei, based around Java-like applets. He later turned it into an early Web browser. Courtesy of Pei Wei.

Internet came out of CERN, the huge physics laboratory in Geneva, Switzerland. The tiny project had a comically ambitious name: ‘WorldWideWeb’. Its inventor, English physicist turned programmer Tim Berners-Lee, created the Web as a partly underground effort – with help from colleague Robert Cailliau and their students and assistants, and with limited official support from CERN.

I’ve noticed that many of the people strongly attracted to hypertext share personality traits that could be labeled Attention Deficit Disorder: distractible, absent-minded, and creative6. To these folks, traditional hierarchical categories can seem numbingly predictable, the cyber equivalent of chloroform. But a clickable hyperlink might lead anywhere. It offers an intoxicating glimpse of what it’s like to make tangible your own fleeting thoughts and associations; to pin the butterfly of insight. In fact, the man who originally coined the word ‘hypertext’, Ted Nelson, may have co-invented the medium as a crutch to compensate for his own troubles focusing7.

With his habit of talking in a rapid stream of half-finished insights and wry asides, Tim Berners-Lee has been described as ‘...a hand grenade with ears’8. The son of early computer professionals, he had come up with his own hypertext system (ENQUIRE) nearly a decade before the Web. The idea stayed with him through a series of contracts and a startup venture, and by the late 1980s had turned into an obsession. He mercilessly pestered his managers to send him to an emerging series of hypertext conferences in the late 1980s.

Most creators of online systems had started with a blank slate, without worrying too much about compatibility. From Engelbart’s NLS in the 1960s to Hyper-G in the late 1980s, they had assumed their adopters would put in some effort just to get the system going; converting existing information to the new format, perhaps even buying custom equipment.

Berners-Lee could assume no such thing. The 1980s were a Babel of conflicting standards for both online systems and the networks that underpin them. But if there was one place that lived that chaos fully, it was CERN. Because it is funded by over a dozen countries, the institution not only had all the many competing standards of the era, but also obscure national ones thrown into the mix, plus a bevy of home-grown contenders developed just for physics (Berners-Lee and Fischetti, 1999; Gillies and Cailliau, 2000; Segal, 1995).

That meant Berners-Lee’s hypertext-based WorldWideWeb had to work within the mix of existing systems – instantly. It needed to be something a network administrator could install without licenses, or purchase orders, or days of training. It would be best if it could connect to current databases and documents.

In terms of features, his system was rudimentary compared with the elegant visions of earlier pioneers like Nelson and Engelbart – something those pioneers would not be shy about later pointing out (Gillies and Cailliau, 2000; Nelson, 2008; Weber and Hughes, 1997). But his goal was less to create a self-contained online realm than bridges between existing ones.

Some of that compatibility was based on a new role for hypertext. Berners-Lee had realized that hypertext links were not just a useful way to navigate knowledge in general, or to get around within a particular information system like HyperCard or NLS. They might also be the practical breakthrough he needed to unify information across a rat’s nest of those different information systems:

To follow a link, a reader clicks with a mouse (or types in a number if he or she has no mouse). To search an index, a reader gives keywords (or other search criteria). These are the only operations necessary to access the entire world of data.

– Tim Berners-Lee, The W3 Book, early 1990s

You would search, and click. Behind the scenes those actions could be translated into tortured queries for a database, or arcane passwords and credentials, or the trigger for intricate commands – perhaps even for a whole virtual page to be created on the fly.

But as a user you need neither know nor care about such details. You would be surfing.

Berners-Lee submitted a funding proposal for what would become the Web in early 1989. He re-submitted it in mid 1990 with Robert Cailliau, who had been planning a networked hypertext system of his own. The response from upper management was silence.

Berners-Lee’s boss Mike Sendall had pushed him to go ahead and create prototypes for the main elements of the Web we know today. He set to work over two months in the fall of 1990. He tried hard to farm out the browser-editor, though – he and Cailliau begged hypertext vendors including Microcosm and OWL to adapt their existing client software to work with the Web. There were no takers.

By Christmas Berners-Lee had URLs for addresses, HTML for pages, HTTP for links, and his own Web browser. Unlike those we use today, the prototype browser was also an editor – you could author Web pages as easily as in a word processor.

Editing was common to other hypertext systems. But it was an especially pivotal part of his vision for the Web; not only to be able to read information anywhere in the world, but to be able to contribute to it, and make references to anything, anywhere, in your own personal notes and to-do lists as well as shared documents. His hope was that a Web of knowledge – you might even call it a ‘world brain’ – would gradually assemble itself from the millions of links made by users in the course of their everyday lives.

How could they make all those links? Very simply – clicking on the top-level Links menu in the browser-editor would let a user instantly link to either existing places or a new, blank page. Making links would become

Figure 19.3 WorldWideWeb browser-editor on the NeXT computer. Courtesy of Jean-François Groff.

as ordinary as typing text. Users wouldn’t have to deal with any complex codes unless they wanted to. With the built-in editing, there was no reason to edit HTML by hand. URLs were also hidden by default; the idea was to navigate with simple links and keyword searches.

The Web was born.

The next year was perhaps the richest creative period in the Web’s early development as Berners-Lee, student Nicola Pellow, Robert Cailliau, programmer Jean-François Groff, mentor Ben Segal, and a growing circle of students and colleagues fleshed out a vision, which included a number of features not implemented in the Web today9: typed links, Web-to-print scripts that would translate links into paper cross-references10, Next and Previous buttons to follow a “trail” through content, and so on.

The Web’s design was also fumbling its way toward another goal. From his ENQUIRE system a decade before, Berners-Lee had been interested in hypertext links as not only a convenient navigation aid, but a way to map the real-world relationships between people, projects, ideas, and things. A few years later, he would flesh out these loose ideas into his vision for a ‘Semantic Web’, covered in other chapters in this Handbook.

But back in 1991 there was an elephant in the room. He had written his browser-editor on a powerful but rare computer built by Steve Jobs’ NeXT Inc. and known for its rapid prototyping features. The same work on a more conventional machine might have taken over a year. To demo the Web on other platforms he’d had student Nicola Pellow create a simple text-only browser that could run on anything from a mainframe terminal to a Psion handheld computer. (Many early users would mistakenly believe this was the only browser the Web team had created11.)

But for the Web to grow, Graphical User Interface (GUI) browsers were now needed for other platforms, like PCs, Macs, and the UNIX workstations common in computer science. CERN’s upper management declined to fund that development. For an organization whose real job is smashing the building blocks of the

Figure 19.4 Screenshot, CERN line-mode browser. © CERN.



universe to see what makes them tick, the Web seemed like a stretch. The project was stuck.

So Berners-Lee and Cailliau took a leap of faith that was both desperate and hopeful. They had programmer Jean-François Groff create a library of ready-to-use Web code, like a roll-your-own browser kit, along with a standardized server. They put out an appeal: could volunteers from the budding Web development community use that library to write the needed browsers?

Looking back, this is where the Web’s history split off in a very different direction from other online systems. From Minitel to LexisNexis to PLATO and CompuServe, most such systems provided the equivalent of a ‘browser’ in the form of their own client or physical terminal. But this was simply a tightly coupled – often proprietary – tool for viewing content on the system’s servers. The same model would apply to the Web’s direct competitors like Gopher and Hyper-G. Had CERN funded versions of the NeXT browser-editor for popular platforms, the Web, too, would have had its own client tightly coupled to the system as a whole. Because they didn’t, a whole new set of players entered the scene.

A FISTFUL OF BROWSERS

The response to the Web team’s cry for help was fast, and heartening. Pei Wei of UC Berkeley converted his Viola hypertext system (Figure 19.2) into the first Web browser beyond CERN (Figure 19.5). Viola and the Midas browser (Figure 19.6) by Tony Johnson of SLAC laid out the familiar features of a browser we still use today. A computer science class in Helsinki created the Erwise browser as a student project (Figure 19.1). Law professor Tom Bruce wrote Cello (a riff on Viola), the first browser for the PC (Gillies and Cailliau, 2000)12. And so on (Figure 19.1).

For the Web team at CERN, it felt like a barn-raising; brilliant volunteers from all around the globe pitching in and meeting and chatting on the www-talk discussion group. But however beautifully conceived, these were one-man or student efforts; unpolished side projects that frequently crashed and could take even an experienced programmer part of a day to successfully install.

The next volunteer browser changed all that. It was called Mosaic (Figure 19.7), and it was written in early 1993 by brilliant student Marc Andreessen and UNIX guru Eric Bina at the National Center for Supercomputing Applications (NCSA). At first it seemed little more than a me-too browser on the model of Viola and Midas. But NCSA, a major site in the 1980s expansion of the Internet, had created and distributed the most popular program to run over the Internet so far, NCSA Telnet.

Recognizing the Web’s potential, NCSA software manager Joseph Hardin quickly assembled formal teams for UNIX, Mac, and PC browsers as well as a server, and he and NCSA director Larry Smarr turned the ignition key on the institution’s formidable support and PR machines.

The result was the first Web browser that was properly tested, supported, and easy for non-geeks to install. Like the Viola and Midas browsers it was modeled on, Mosaic left out editing; you could browse Web pages but not change them. But Berners-Lee was confident that could soon be added back (Weber, 1995 unpublished)13.

Few readers today can imagine just how small and incestuous the Web community was in 1993, or how deeply the results of early events shape what we see and use now. As in those creation myths where, say, a hair from the eyebrow of one of the gods gives birth to all trees, a seemingly minor technical decision in 1993 could spawn or erase whole future industries14.

The first real technical battle of the Web started off as a footnote in an e-mail from Marc Andreessen. It ended with Andreessen politely, but clearly, defying Tim Berners-Lee. What was the struggle about? On its

Figure 19.5 Viola browser, screenshot from later version 1993. Courtesy of Pei Wei.

face, how to best integrate images, sounds, and other media into a still text-heavy Web. But the real issue, of course, was who would shape its future.

Since Berners-Lee’s first Web proposal in 1989, support for integrating a rich variety of multimedia had been a major goal. But nobody on the Web team felt much sense of urgency when it came to actually implementing that support. It was one of many features that could be worried about when the Web was better established. The NeXT browser-editor, Midas, Cello, and others were full GUI (Graphical User Interface) browsers in the sense that they could display text in different sizes and fonts (Figure 19.1). Yet images

Figure 19.6 Midas browser, screenshot from later version 2.1. Courtesy of Tony Johnson.

and other media could only be displayed in a separate window, not mixed with the text.

Multimedia over networks was a complicated and fast-evolving technical issue at the time, with a major standard (MIME) just being defined. Berners-Lee had spent part of the summer of 1992 at Xerox’s famed Palo Alto Research Center (PARC), partly to study different options. He wanted to create a single HTML tag that could not just handle different media, but display them in different ways – combined with text on a page, in a separate browser window, collapsible as in an outline view, or in a helper app15.

But Andreessen had a crucial insight: images, at least, weren’t something that could wait until after the Web was established. Without them, the Web might never get there at all.

In late February of 1993 Andreessen wrote to the www-talk mailing list announcing that Mosaic would incorporate a new HTML tag specifically to display bit-mapped images, called IMG. Berners-Lee responded with his recommendation for a future, more general INCLUDE tag that would be the single gateway for integrating all media, not just images. INCLUDE was meant to be incorporated into standard links, and offer flexibility about how media were shown (embedded in the page, in a separate window, in an ‘outline’ view the user could toggle on or off).

The debate became a heated topic on www-talk, with nearly all the main folks weighing in. Many agreed with Berners-Lee that IMG could set a bad precedent. Would there need to be special-case tags for every new class of media? IMG could also make it harder to implement the kind of scalability and format negotiation they felt were important. Tony Johnson was considering adding media to the Midas browser, with a simple tag like IMG coupled to a general one like INCLUDE. Though he didn’t mention it on the list, Pei Wei was already experimenting with multimedia in Viola.

In the end, Berners-Lee decided to put an end to the back and forth with a simple ruling: ‘Let the IMG tag be INCLUDE…’16. After a silence, Andreessen announced that he and Eric Bina would go ahead with IMG despite the objections17, saying that the complexities of a more general tag like INCLUDE could be worked out later.

It was the first time somebody had seriously challenged Berners-Lee’s authority over the Web. It was also the first real rift in a friendly, idealistic Web development community which so far had just been thrilled to find others with like interests.

But users loved the pictures! It’s probably no coincidence that the version of Mosaic that seized the imagination of the tech press was the one that let you mix text and graphics. The year before, the number of computers connected to the Internet had passed a million. The time was ripe for something like the Web to go world-wide.

FAME

Mosaic took the technology press by storm in the spring of 1993, aided by a strong PR campaign from NCSA as well as years of pent-up dreams of an ‘information superhighway’.

For the still-clubby little Web community it was mostly a period of thrill after thrill, as everyone rode a delicious wave of success together. The tensions of the IMG debate receded into the past. All the late-night dreams of lonely years suddenly seemed not just possible, but likely, whether your personal vision of cyber-utopia was an infinitely linked library, or a world brain, or a global marketplace.

It was perhaps like the excitement at the dawn of the auto or radio industries, but now on a time scale compressed from years to months. Serious new Web sites were appearing, from the Louvre to Kevin Hughes’ pioneering multimedia Hawaii site. For the Web team, there was an especially warm feeling whenever former rivals jumped on the bandwagon – the Lynx hypertext system joined Viola in transforming itself into a Web browser, and Austrian hypertext pioneer Hermann Maurer added Web support to his sophisticated Hyper-G.

Then, in a stroke of luck that would seem contrived in fiction, the system which had become the Web’s mortal rival began to almost magically melt away. Gopher ran over the Internet like the Web, but was larger and was growing faster (Figure 19.8). Suddenly, Gopher’s home institution – the University of Minnesota – announced in early 1993 that it would begin to charge licensing fees for Gopher servers. That same spring Tim Berners-Lee finally convinced CERN to

Figure 19.7 NCSA Mosaic browser, 1993. Mosaic brought the Web to ordinary users. NCSA’s ‘What’s New’ page effectively became a home page for the entire early Web. Credit: © Board of Trustees of the University of Illinois.

release the Web into the public domain. The Web would be free – forever. Gopher servers withered like snowmen in a spring rain. As we’ll see, Vice President Al Gore would help deliver the coup de grâce.

In the summer, technical publisher O’Reilly teamed up with Viola author Pei Wei to launch Global Network Navigator, or GNN. This was the first portal on the Web, with travel, news, shopping, and – perhaps an even more important first – advertising (Figure 19.9).

With the Web’s first success came things to fight over.

To the world, NCSA was having the ride of its life, with journalists virtually camping outside the software development lab that housed the Mosaic teams, and data center staff scrambling to add machines and bandwidth fast enough to meet the exploding demand. Company after company licensed their code (some not realizing they could build their own browser and server from the Web team’s public code library).

The ‘What’s New’ page updated by Marc Andreessen and Eric Bina had become both the front door and the front page to the infant Web. It was the launching pad for several services still familiar today.

But behind the scenes there was growing bitterness between the Mosaic programming team and their managers, Joseph Hardin and Larry Smarr. Each side felt the other was expendable, while their own efforts were the crux of Mosaic’s success.

Figure 19.8 Gopher t-shirt in the style of hot-rod artist Big Daddy Roth, ca. 1994. Gopher
was the Web’s most serious competitor. It was developed by Mark McCahill, Paul Lindner,
and Farhad Anklesaria at the University of Minnesota. © Mark Richards.

Mosaic’s success also created tensions over credit and control between NCSA and the CERN Web team. In fact, much of the world came to know the Web not as itself, but under an alias, as Mosaic. NCSA called its generic Web server a Mosaic server, and its marketing materials rarely mentioned the W-word (Figure 19.10).

The CERN Web team chafed at the slight. Much of the code in Mosaic came out of the WWW code library they had provided. Mosaic’s features were modeled on the Viola and Midas Web browsers. But they chafed privately. While NCSA was chatting up top journalists from around the world, Berners-Lee’s and Cailliau’s passive silence was only matched by the spectacular inaction on the part of the CERN press office. Mosaic – and the United States – won the day.

But Mosaic’s rise did accelerate Berners-Lee and Cailliau’s activities in another direction: trying to form non-profit Web bodies

Figure 19.9 Web portal site GNN pioneered Web advertising in 1993, with embedded ads
similar to this example from 1995. GNN evolved from a bookstore kiosk version of ‘The
Whole Internet User’s Guide’ based on the early Viola browser. Courtesy of O’Reilly Media.

Figure 19.10 Mosaic marketing materials. Credit: © Board of Trustees of the University of Illinois.

that could help assure its future. Cailliau made contacts at the European Union, and Berners-Lee began to seriously consider overtures from MIT to host the US part of an organization there.

THE GREAT MIGRATION WESTWARD

By 1994 companies, institutions, advertisers, and news media were jumping on the Web bandwagon in geometrically increasing numbers. There was also the first serious interest from three harder-to-get players: the highest levels of government, the publishing industry, and perhaps most important of all, Silicon Valley.

Progress happened fastest in the United States. The combination of CERN’s (and Geneva’s) indifference to the Web with Silicon Valley’s growing interest pushed along a general move westward: from Europe to the United States, and from everywhere else to Silicon Valley (Figure 19.1).

At the end of 1993 the tensions within the NCSA Mosaic team had reached toxic levels. Marc Andreessen quit NCSA and took a job at pioneering Internet company Enterprise Integration Technologies (EIT) in Silicon Valley, which would later launch the CommerceNet consortium.

But Andreessen was soon recruited by the larger-than-life founder of Silicon Graphics, Jim Clark, to help start a new Web company. Andreessen suggested creating a new browser and server, codenamed ‘Mozilla’ – a mashup of the idea of Godzilla and what they hoped would be a Mosaic-killing browser18. They threw down the gauntlet by poaching around half the Mosaic team from NCSA, including Mosaic coauthor Eric Bina, and the whole group founded Mosaic Communications in early 1994.

NCSA took them to court over their reuse of the Mosaic name and, potentially, code. The terms of the settlement that followed were secret. But one result was that the new company changed its name to that of its

Figure 19.11 White House site, 1994. Public domain.



new browser, Netscape, which then became Netscape Navigator (Figure 19.12).

NCSA had assigned commercial rights for Mosaic to its corporate partner, Spyglass, and Browser War I was on. Spyglass made some good sales, including to one pivotal player – Microsoft. But to little avail. Netscape’s Navigator seemed to support its programmers’ claims that they had rewritten the new browser from the ground up: it was slick, reliable, and faster than Mosaic. It also offered full support to large paying customers, a prerequisite for many corporations to adopt it. NCSA Mosaic was dead within a year. It would be Navigator that first showed the Web – and the online world – to the rest of us.

The Web’s westward movement got help from an unexpected quarter – the US White House. Vice President Al Gore personally demoed the Web to cabinet heads, and encouraged them to put up Web sites for their departments. The White House’s own site (whitehouse.gov, Figure 19.11) led the way (CHM, 2011)¹⁹, a final knock for one-time Web rival Gopher, which had early use by Congress.

As a major center for both computing and the 1960s counterculture, the San Francisco Bay Area as a whole had long been a mixing ground for technology with vivid, sometimes utopian, dreams for its future.

Figure 19.12 Screenshot from Netscape Navigator. Credit: ©AOL.



The ideas for VRML, or Virtual Reality Markup Language, were hammered out against a background of technical inspiration, LSD and hot tub parties, and espresso. Mark Pesce and Tony Parisi’s vision was of a world where most browsers supported virtual reality, and the most basic operations on the Web took place in a virtual world²⁰.

That meant not just games or virtual online communities like the later Second Life, but navigating your way from site to site, and browsing information once you got there. A site’s designer might choose to have you arrive in a virtual room where you could talk with someone else’s avatar or choose a book or a video from a shelf, as well as more ‘conventional’ uses of VR like a virtual walk-through of a building, collaborative engineering projects²¹, or visualization of molecules.

VRML was a surprise hit at the vastly oversubscribed first international Web conference at CERN, an event Web programmer Jean-François Groff called ‘the Woodstock of the Web’. Held in the summer of 1994, it was a follow-up to the Wizard’s Workshop hosted by publisher O’Reilly and Associates the summer before. The participants at both events shared a palpable sense that they were going to change the world in wonderful ways, that they were on the cusp of something Big.

But the westward pull continued. Only months later, Tim Berners-Lee announced that he would run the new World Wide Web Consortium (W3C) from MIT, in an attempt to combat his eroding control over the Web and its potential fragmentation as a standard. This left Robert Cailliau and others at CERN in a diminished role. CERN would host merely the European office, not the headquarters.

Mosaic’s eclipse by Netscape was no great surprise, given the sheer power and momentum of Silicon Valley. Nor was Tim Berners-Lee’s shift to the United States. But the complete end of Web development at CERN was sudden. In December of 1994, CERN management decided to hand the Web project off to the French national computing research institute. Part of the impetus was that the lab had just gotten funding approval from its member states to build a major new accelerator, the Large Hadron Collider (LHC), to hunt for the elusive Higgs boson. CERN had always been ambivalent about supporting the Web, given its physics mission. But the shift would effectively scatter what little center of gravity was left to Web development in Europe and in the Geneva region²². When Europe fully woke up to the Web, it would be as an American import.

Yet despite its rocketing popularity, the Web still didn’t have a way to pay for itself. The next section explores how that changed.

MAKING THE WEB SAFE FOR BUSINESS

If it wasn’t for Netscape, you’d be calling the Web Microsoft Network or AOL by now.
– Lou Montulli, Netscape founding programmer, coauthor of the Lynx browser

To a savvy business person in the mid 1990s, the Web looked like a waste of time. There were two big reasons. First, it was an open standard, with open access. That meant no obvious way to charge by the minute, which had been the bread and butter of commercial online systems from 1960s timesharing to Minitel to CompuServe – and of telephone companies since before living memory. How would you make money?

Second, the Web ran over the Internet. Not only was that government-funded network based on another open standard, but until mid 1995 it had actually forbidden commercial use. It didn’t help that many of the better-known spokespeople for both the Web and Internet were hippie hackers, academics, civil liberties advocates, or otherwise seen as less than enthusiastic about conventional business. For many of these folks the Internet felt like the antidote to commercial online services like CompuServe and AOL, a shared

Figure 19.13 CommerceNet Consortium page, 1994. Courtesy of Kevin Hughes.

commons on which crass commercialization would be as appetizing as blaring TV ads in a public library.

Of course, we all know the Web and Internet went commercial in the end. But it took a number of kick-starts to get them there. Their open, non-commercial roots were a big contrast with much of the history of automated information systems. The long history of electrically enhanced business leads from nineteenth-century telegraphy and Western Union money transfers through 1930s Telex, and then the flowering of computerized transaction systems – ERMA, SWIFT, ATMs, etc. – from the 1950s onward. In fact, the first dedicated e-commerce device may have been the telegraph-era ticker tape machine for stock quotes (Figure 19.14).

Many pre-Web online systems had been very friendly to business indeed. For instance, France Telecom generated billions in annual sales on Minitel (Mailland and Driscoll, 2017). Even iconoclastic, alternative online community The Well had a rock-solid business model with its hourly connection charges.

As the Web took off, a few pioneers tried to prove that the once anti-commercial Internet could indeed support business. By 1993, publisher O’Reilly’s pioneering commercial portal GNN (Global Network Navigator) was running online ads, soon joined by Wired Magazine’s online venture HotWired. The mid 1990s was also the first flowering of would-be digital currencies, from anonymous, cypherpunk-inspired DigiCash to the more conventional CyberCash. In 1994, Enterprise

Integration Technologies (EIT) – the same firm which had hired Marc Andreessen after he left NCSA – founded the influential CommerceNet consortium to develop Web commerce (Figure 19.13), with members including Wells Fargo, Netscape, and Visa.

Figure 19.14 Universal 3-A stock ticker, ca. 1870–80. Among the first dedicated e-commerce devices, ticker tape machines printed stock prices in real time. They were named for their ticking sound. © Mark Richards.

While mainstream businesses were still trying to figure out how to monetize clicks, pornography and gambling sites were quietly starting to earn serious profits and pioneering the nuts and bolts of Web transactions along the way. The first online lottery started in Liechtenstein in 1995, and soon there was a wildly growing new industry loosely centered around London; a freewheeling world of online casinos, offshore shell companies, and wild Caribbean parties, all threaded through legal loopholes with the skill of a master tailor²³. At the same time, sex and pornography were evolving from earlier models on bulletin board systems (BBSs), Minitel, and Usenet to a new and highly profitable kind of Web industry.

But it was Netscape’s spectacular 1995 IPO that kicked off the dot-com boom, and the success of online firms like Amazon, Yahoo!, and eBay that finally convinced mainstream business to follow the pioneers into Web commerce. Netscape’s innovative business model – free to individuals, commercial licenses to companies – began to answer the skeptic’s question of how an open standard could pay.

In terms of tech cred, it didn’t hurt that Sun Microsystems had picked up the radical dream that net-based applications could make operating systems irrelevant (Gillies and Cailliau, 2000)²⁴. Java applets would give your browser all the functions you needed, and eventually eliminate the need for Microsoft or Apple. Sun’s slogan was: ‘The Network is the Computer’, and Netscape among others was happy to come along for the ride.

CLOSING THE FRONTIER

By the mid 1990s there were working models for some of the main pillars of Web commerce: ‘freemium’ software and services with Netscape, ad-supported information portals like GNN and Yahoo!, and direct sales of goods as with 1-800-FLOWERS, Amazon’s virtual mega-bookstore, and eBay’s unique auction model.

The frontier was closing fast. The next story would be one of settlement: not how to make the Web, but who would control it.

The Web fully joined the mainstream in the late summer of 1995. The giddy expansion of the last couple of years had finally produced a critical mass of users, developers, journalists, companies, and government support. Equally important, Microsoft got involved.

Netscape had been riding high, particularly when its August IPO made it the first famous ‘dot-com’. Then the sleeping giant in Seattle finally woke up, and the result was the long, cold Browser War II. As in the real Cold War, many other, smaller conflicts slowed down, and what followed were three years of locked combat in which the World Wide Web reached the rest of us.

But the software giant’s entry into Web development was a reluctant one, since it meant scrapping plans for its own competing Microsoft Network (MSN), a standalone environment with its own networking protocols (Figure 19.15). The surviving ‘walled gardens’ of the era – CompuServe, AOL, Minitel in France – were already beginning to fade out or become Web access points. But the tens of millions of copies of Windows 95 that Microsoft expected to sell in the first year came ready to plug right in to MSN. Even if just a small percentage subscribed, MSN would have instantly become a walled garden bigger than CompuServe and AOL.

In a single document, Bill Gates’ ‘Internet Tidal Wave’ memo dictated a complete change of direction for the huge firm, akin to turning a battleship around a buoy. Microsoft became a Web company. It was a recognition that the Web – and the business models that Netscape and others had pioneered for making an open standard pay – had gotten too big to simply crush.

But Microsoft could still crush Netscape. Gates decided to use his firm’s near-monopoly over the desktop to take on the browser market. Microsoft bundled its own licensed version of Mosaic, which it named Internet Explorer, with every later copy of Windows 95 and its successors. MSN became a Web portal as Microsoft challenged Netscape’s Navigator browser head on. At its peak, the Explorer team reportedly grew to a thousand strong (Sink, 2003).

Browser War II was fought on largely the same grounds as Browser War I: Mosaic clone vs. Mosaic clone. Neither company seriously revisited the early Web or other hypertext systems to expand the basic footprint of what a browser could do for the user. Netscape Gold was the only major try at a browser-editor: painfully slow, it made editing and browsing two separate modes as in very early word processors. Microsoft did try to integrate the Web and connectivity into a variety of products, from Word to Windows, with varying degrees of success. But Internet Explorer kept mostly to the tried and true. This meant it was easy for users to switch browsers, since they all did roughly the same things.

By the late 1990s the design of Web pages was professionalizing fast. While HTML had originally been aimed at ordinary users with word-processor-level graphics needs (and skills), the lack of easy editing pushed content production to professionals. The kinds of crude pages done by techie webmasters were giving way to serious graphic design, aided by various changes to HTML²⁵.

The world at large was just in the first throes of Web mania. But to those most involved in the Web’s launch, the wild ride of the last few years seemed to be palpably slowing down. By the end of 1995, the results of the struggles you’ve been reading about would produce the first durable balance of power for the major players in the Web’s technical development.

For nearly four years – an eternity in those early days when a ‘Web year’ was jokingly defined as around three months²⁶ – Microsoft and Netscape were locked in a slow war of attrition, with the World Wide Web Consortium (W3C) in the middle. Started by Tim Berners-Lee, W3C was the Web’s standards body, with over 100 corporate members. It also served as a field of combat, as Microsoft and Netscape proposed their own additions to HTML, HTTP, and so on. Netscape tended to be the challenger, adding new tags that infuriated HTML architects like Dave Raggett, while Microsoft often found it a useful competitive strategy to assiduously follow W3C rules.

Browser War II was over by 1999. Netscape Navigator’s share of the browser market had faded to an echo of Internet Explorer’s, and Netscape was bought by AOL. But in 1998, some of Netscape’s core programmers had started the open-source Mozilla project (later the Mozilla Foundation). The Firefox browser was one of the main fruits of that effort, and in a sense continued the battle against Internet Explorer for another 14 years until Explorer was retired by Microsoft.

Figure 19.15 Windows 95 box with Microsoft Network (MSN) logo. Windows 95 came ready
to connect to this initially proprietary network and online service, the last major challenger
to the Web and Internet. MSN soon switched to providing internet access. Source: Computer
History Museum.

There has been no Browser War III. Why? The main battleground for ‘mindshare’ on the Web has shifted away from browsers to mostly server-based (aka ‘cloud’) services – search, shopping, and user-generated content from Wikipedia to social media. It has also shifted form factor. As we’ll see shortly, the battle lines have increasingly become

smartphone vs. computer, and mobile browser vs. smartphone app.

Tim Berners-Lee’s NeXT browser-editor, with its idea of integrated browsing and editing as extensions of word processing and our daily to-do lists, address books, notes, etc., was never followed up by any major product²⁷.

The most concrete example today might be the aging Amaya reference browser-editor you can still download from the W3C site. But in 2018 you can pretty much only edit pages that live on your own machine, or your own server if you’re a geek. The whole infrastructure that the CERN Web team envisioned for handling authoring permissions – ‘yes’, for instance, for your own pages at work but ‘no’ for editing your colleague’s – was never built. Instead, what comparatively little editing we do in 2018 is within the confines of particular ‘Web 2.0’ sites, like blogs, wikis, and social networking sites. It is enabled by tools that run mostly on the server side, rather than being an integral feature of the browser itself.

WEB IN YOUR POCKET

The kind of take-no-prisoners competition that once happened between browsers is still going on – but in our pockets, and between our pockets and our desks (or laps)!

There are more functional differences between Safari on an iPhone and Safari on a MacBook than between nearly any two desktop Web browsers ever written. Ditto for Chrome on Android vs. Chrome on a Google Chromebook. Those differences aren’t just about user interface issues, like where the bookmarks menu is or how to enter URLs. They’re about how you use the browser – where, for what, and for what kind of sites.

Then there is the rivalry between mobile Web browsers and Internet apps. At a technical level, a mobile Web browser itself is just an especially well-established Internet ‘app’. But in societal terms the Web embodies a global, open-source standard designed to promote interoperability and cross-linking. Most apps, by contrast, are proprietary and often fleeting. They aren’t generally designed to share data with other apps, or with most Web sites. In fact, apps offer another way for Web-based ‘walled gardens’ like Facebook to keep users within their manicured confines.

Pre-Web mobile ‘browsing’ has a long and fascinating history that goes back at least to Alan Kay’s 1968 Dynabook concept (CHM, 2011; Kay, 1972), if not Nikola Tesla’s fantastic 1905 ‘World System’ ideas (Carlson, 2013), and up through the brilliant but failed efforts of General Magic, the startup that tried to bring us something like the mobile Web – and beyond – in the early 1990s. But that lies beyond the scope of this chapter, as does a detailed history of the mobile Web²⁸.

In broad outline, mobile devices tried to incorporate Web browsing as soon as the Web got popular. Nokia’s 1996 Communicator was likely the first smartphone to come with a browser built in. The powerful Newt’s Cape (play on Netscape) browser for the Apple Newton handheld computer appeared in 1995, although truly wireless browsing required a pricey radio modem as an accessory²⁹.

But there was a problem. The still-emerging mobile data networks of the time weren’t up to delivering full-sized Web pages at reasonable speeds, especially without the kind of bandwidth-sensitive format negotiation the Web team had originally favored.

Mobile makers tried a series of clever but ultimately unsatisfying workarounds. For instance, the Palm VII organizer optimized caching of data on servers to permit browsing of a sort over low-speed pager networks³⁰. On many ‘dumb’ phones, Wireless Application Protocol (WAP) let users access selected Web content. But the text-only ‘browser’ could only connect to sites which had paid to be included on a central WAP server.

Others tried to make the basic language of Web pages, HTML, more compact. In 1996 HTML senior architect Dave Raggett roughed out a version of HTML optimized

Figure 19.16 Kinokuniya bookstore, i-mode site. Credit: © Kinokuniya.

Figure 19.17 Cybird’s mobile map, i-mode site. Credit: © Cybird.

for speed³¹. But the first such effort to reach consumers was in Japan.

So in 1999, while American dot-com CEOs were famously dancing on tables in pleather pants (Helmore, 2001; Paternot, 2001)³² at the peak of the dot-com boom, Tomihisa Kamada of ACCESS and other Japanese pioneers hired by telephone operator NTT DoCoMo were deploying the mass mobile Web – eight long years before the iPhone and Android brought it to the rest of us.

i-mode sites (Figures 19.16, 19.17) used a compact version of the HTML language, called – appropriately enough – Compact HTML (Matsunaga, 2001; Wallace et al., 2002)³³. By 2002, over 34 million Japanese subscribers were using i-mode phones for Web access, e-mail, banking, live maps, streaming video, news, and pretty much everything else we do with smartphones today (Wallace et al., 2002)³⁴.

Early devices looked like conventional ‘dumb’ phones, but pushing the ‘i’ (information) mode button opened the i-mode browser, and a whole new set of options. The phones even served as mobile electronic wallets, for buying anything from a soda to a train ticket. NTT DoCoMo handled billing and shared the revenues with official i-mode sites (Mallon, 2013; Matsunaga, 2001; Wallace et al., 2002)³⁵. Domestic competitors to i-mode soon followed, though attempts to spread beyond Japan fizzled.

Smartphones finally started to get global traction around 2001: Symbian phones from Europe, BlackBerrys from Canada, Palm/Handspring Treos from Silicon Valley. All had browsers, as did some handheld computers for use with a landline modem. Opera and ACCESS were two of the major mobile browser makers. But small screens, limited support for plugins, and still-poky speeds on full-sized Web pages limited user interest.

iPhone didn’t start as a phone. Galvanized by the success of its iPod, Apple began developing a minimalist tablet – all screen, no keyboard. But when the skunkworks tablet team stumbled across a multi-touch pointing

device, management got excited. Could using your fingers to pinch, scroll, and zoom be the basis for a new kind of user interface – one that might let a phone-sized screen easily browse documents or a full Web page?

The rest is history. In combination with networks that were finally fast enough to deliver full-sized Web pages wirelessly, the iPhone – and its Android imitators – took the mass Web mobile. At least the passive browsing part. The finger interface was heavily optimized for displaying content, not creating it.

The original iPhone had no third-party apps. With the exception of included apps like iTunes, the only connection to the net was through the built-in Safari browser. But in 2008, Apple opened up the iPhone to third-party developers, who quickly produced thousands of apps³⁶. Apps could, of course, only run on iPhones. But more remarkably, they could only be accessed through Apple’s highly profitable (and ‘curated’, including censorship) App Store³⁷, an update on the wild success of the iTunes Store for the iPod. Android apps followed suit, though with less of a proprietary lock.

An ironic result was to partly freeze Web browsers. Rather than push the limits of what could be done within a browser, or in the shared environment of the Web, developers began to put their most innovative features into native iOS or Android apps. You may be walking down the street as you pinch and scroll with your fingers rather than clicking on a mouse, but the main features of mobile browsers in 2018 have changed little from Viola, Midas, and Mosaic a quarter century before.

Looking back, another reason browsers have become less of a focus is that the server side has finally caught up. The centers of gravity for the Web today are the masters of great server banks³⁸ – and even greater user bases – like Facebook, Weibo, Wikipedia, Amazon, Alibaba, Google, and Yandex. As we’ve seen, the gradual oozing of power to the server side was accelerated by the fact that browsers lost two key battles.

Modern browsers have neither become de facto operating systems as in the Java dreams of the later 1990s³⁹, nor our personal word processors, to-do lists, hypertext authoring tools, etc. as Tim Berners-Lee originally envisioned (Berners-Lee and Fischetti, 1999). Either of these roles could have kept particular browsers ‘sticky’ for users, less interchangeable than they are today. What authoring and PIM (Personal Information Management) functions we do now are mostly on the server side, through social media, online calendars and notes, Google Docs, blogs, or wikis. This keeps our loyalties with the owners of the servers, not the browsers we use to access them.

Whether the Web’s drift toward the server-side dominance of older commercial online systems like Minitel, CompuServe, etc. is permanent remains to be seen. In the last decade there have been attempts to put more functionality in the browser, including Google’s Chromebook with its browser-as-operating-system, and the Mozilla Foundation’s similar Firefox OS for smartphones (Mims, 2013). Neither has exactly taken the world by storm. But unlike older online systems, with their tightly coupled clients and servers, the client (browser) side of the Web is a fulcrum from which powerful players once shaped our online future. It remains a wild card.

ACKNOWLEDGMENTS

The author thanks Niels Brügger and Ian Milligan for their help and advice in the preparation of this chapter, Hansen Hsu for exploring the history of Apple browsers together, and his agent Laurie Fox for helping him think through some of the material that led to its contents. He thanks Kirsten Tashev, Len Shustek, Dag Spicer, Chris Garcia, Paula Jabloner, Karen Kroslowitz, Al Kossow, Jon Plutte, and many, many others at CHM for educating him about interpretation and preservation in a museum context. He thanks Ben Segal, Jean-François Groff,

Tim Berners-Lee, Stevan Keane, Dave and Jenny Raggett, and Kevin Hughes for getting him hooked on the topic in the first place.

Notes

1  ‘Fake news’ is one highly discussed consequence of the ad-driven model, and the decline of traditional subscription models for media may be related to the rise of ‘freemium’.
2  Attributed to Internet pioneer Dave Clark, IETF plenary presentation 1992; also quoted in Resnick (1992).
3  Personal confirmation, 2017, Marshall Kirk McKusick, member of Berkeley UNIX team at the time. According to McKusick, ARPA made the inclusion of support for TCP/IP into Berkeley UNIX a prerequisite for further ARPA funding.
4  Then-Senator Al Gore’s High-Performance Computing and Communication Act of 1991 created the National Information Infrastructure, which promoted and funded over $600 million of various networking initiatives. Gore called this the ‘information superhighway’ (Gore, 1991).
5  Also based on my own interviews with creators of Viola in 1996, Hyper-G in 1996, Lynx in 1996, and Montulli of Lynx for a second time in 2006.
6  This observation is also supported for Berners-Lee by his statements about himself in my interviews with him in 1995–7, by Nelson about himself in my interview with him in 2013, and in the case of Engelbart by Markoff, 2005, among other biographical accounts.
7  Weber, interview with Ted Nelson, 2013.
8  Weber, interviews with Tom Bruce, 1996.
9  Weber, interviews with Jean-François Groff, 1995, 1996, 2010; interviews with Tim Berners-Lee, 1995–6; interviews with Robert Cailliau, 1995–6.
10  Weber, interviews with Jean-François Groff, 1995, 1996, 2010.
11  Weber, interviews with Jean-François Groff, 1995, 1996, 2010; interviews with Tim Berners-Lee, 1995–6; interviews with Robert Cailliau, 1995–6.
12  Cello was the only very early browser not based on the CERN code library; Bruce rewrote all the needed code himself. Interviews with Tim Berners-Lee, 1995–7; Tom Bruce, 1996.
13  Weber, interviews with Tim Berners-Lee, 1995 and 1996.
14  For instance, the decision by most browser creators to leave out authoring flipped the Web as a whole to a read-only medium, and much later gave rise to server-side editing (social media, blogs, Google Docs, etc.). The de facto standard created by NCSA’s IMG tag accelerated the adoption of simple images (and possibly of the Web as a whole!) while likely delaying common standards for other media, and taking multiple link display modes as proposed in Berners-Lee’s INCLUDE tag off of the table.
15  The following is based on Weber, interviews with Tim Berners-Lee, 1995 and 1996, Weber, 1995 unpublished, and interviews with Jean-François Groff, 1995–6 and 2010, and Dave Raggett, 1996–7. Berners-Lee’s eventual goal was to have multimedia automatically adjust to the capabilities of the browser and connection being used. This was part of a larger goal for the Web to be scalable to different browsers and devices. For instance, text tagged as ‘Heading 2’ in HTML might show up as bolded, with extra space above and below, and in a larger font size on a full GUI browser like Viola or the original NeXT browser-editor. On a text-only browser it might be bolded, if the browser handled that, or separated by an extra line and marked by asterisks at either end. On a special browser for the blind, the narrating voice might simply pronounce those words with more volume and emphasis. Similarly, lines of text would automatically ‘wrap’ to fit the width of a browser window, rather than having a fixed width that could dangle beyond the right margin.
    How would that principle apply to multiple media? As an example, a video clip could appear in its original, full resolution on a fast graphical browser connected with broadband. But with a slow connection, that might get knocked down to a lower resolution video, or even an image. On a text-only browser, a caption might simply describe what the video was about and note that it couldn’t be shown. All this would happen through format negotiation – the browser would automatically send the server a list of the formats it could handle, and the server would then send back appropriate choices.
16  www-talk mailing list for 1993, preserved at webhistory.org (http://1997.webhistory.org/lists/lists.html), among other sites.
17  Eric Bina initially had opposed adding graphics at all because he was concerned they would waste bandwidth. From Weber and Hughes, interview with Eric Bina and Marianne Winslett, 1996.
18  Jamie Zawinski 1994–6, ‘The Netscape Dorm’, https://www.jwz.org/gruntle/nscpdorm.html (accessed February 2018).
19  Also from Weber, interviews with Jock Gill, Phillip Hallam-Baker, John Mallery, 1996–7.
20  Weber, interviews with Tony Parisi, 1996, Mark Pesce, 1996.

21  VENUS project at CERN led by Silvano de Genn- browser for the Apple Newton. It was also a
aro, for prototyping the Large Hadron Collider in multimedia e-book creation tool using New-
virtual reality; http://venus.web.cern.ch/VENUS/ tonScript, http://newtonglossary.com/terms/
(accessed February 2018). newts-cape (accessed February 2018).
22  In early 1995 CERN held a last-minute Web event, at once sad and funny, for the previously overlooked European tech press; a combined coming-out party and wake. Many attending journalists, myself included, were turned on to the Web for the first time just as it was being kicked out of the nest. The unanswered questions of that event helped pique my own interest in researching the history of the Web.
23  Interview with Adriaan Brink, 2009. Interview with anonymous source, 2009.
24  Pei Wei’s 1989 Viola hypertext system was based around downloadable Java-like applets and was intended to be part of a more general environment. Even the browser (later adapted into the ViolaWWW Web browser) was written from them. The idea was that functionality could be downloaded as needed over the net. Gillies and Cailliau (2000); Weber, interview with Pei Wei, 1996, various personal communications. A precursor to this vision was the MUPID hypertext videotex system by Hermann Maurer, who later developed Web competitor Hyper-G. MUPID had downloadable applets and the intelligent terminal could also function as a standalone PC. Also Weber, interview with Hermann Maurer, 1996, personal communications, 2007.
25  Weber, interviews with Kevin Hughes, 1996–7, 2010.
26  Quotes from 1996 Tim Berners-Lee interview in World Wide Web journal: http://whatis.techtarget.com/definition/Web-year (accessed February 2018).
27  Netscape Gold was an attempt to combine browsing and editing features in the same product. But instead of being able to edit text ‘live’ as in a modern word processor or Berners-Lee’s original Web browser-editor, the user had to switch into a separate editing mode, then back to browse mode to see how their changes looked. The Mac version of Netscape Gold was also slow and memory intensive. See Wodaski (1996), and a tutorial page on a late version of Netscape Gold: http://www.zisman.ca/netgold/ (accessed February 2018).
28  I plan to briefly summarize this history in an upcoming book with Thomas Dunne Books/St. Martin’s Press.
29  Weber and Hansen Hsu, interview with Greg Christie, 2017. Newt’s Cape (http://www.howdesign.com/web-design-resources-technology/graceful-degradation-on-a-newton-pda/, accessed February 2018) was the most capable
30  Weber, interview with Jeff Hawkins, 2010.
31  The working name was EZWeb. Tom Greene of W3C and I came up with a set of proposals for a mobile device with a browser-editor based on EZWeb or a similarly compact version of HTML, called WebGirl as a joke on GameBoy. There were doubtless others thinking along similar lines.
32  TheGlobe.com co-founder Stephan Paternot. See Paternot (2001).
33  ‘Compact HTML for Small Information Appliances’, W3C NOTE 09-Feb-1998. Boston: World Wide Web Consortium. https://www.w3.org/TR/1998/NOTE-compactHTML-19980209/ (accessed February 2018).
34  In contrast with WAP’s rigid centralization or the iOS App Store’s lock on native apps, i-mode was a semi-open ecosystem – a bit like Minitel, or the later Android Play Store with its ‘curated’ contents. Since CHTML was an official W3C standard as a variant of HTML, anybody could set up a CHTML server. The result was a mix of official i-mode sites listed and registered by NTT DoCoMo, and renegade ‘black’ or ‘grey’ sites. i-mode browsers could also connect to normal Web sites, albeit slowly (Batista, 2000; Weber, interview with Tomihisu Kamada, 2008).
35  A later example of this kind of revenue sharing is the Apple App Store; an earlier one is the Minitel Kiosk system (Mailland and Driscoll, 2017).
36  Weber and Hansen Hsu, interview with Richard Williamson and Ken Kocienda, 2017.
37  With its apps and App Store, Apple had not only taken complete control – and a 30% cut – of the software it would allow to be used on its new handheld computing platform. It had in effect created a proprietary hardware terminal for an open standard, the Internet.
38  Note that this is different from the evolution of Web server software itself. Following the first Web servers from CERN and then NCSA in the early to mid 1990s, much server software was supplied by Apache and other open-source packages for decades, with Sun and later Microsoft servers in a more minor role. The graph on this page from Netcraft.com shows the changes over time: http://news.netcraft.com/archives/2016/07/19/july-2016-web-server-survey.html (accessed February 2018).
39  Recent attempts to put more operating systems on the client side include Google Chrome, and Mozilla’s short-lived Firefox OS.
BROWSERS AND BROWSER WARS 295

REFERENCES

Bardini, T. (2000) Bootstrapping: Douglas Engelbart, Coevolution, and the Origins of Personal Computing. Stanford, CA: Stanford University Press.
Barnet, B. (2014) Memory Machines: The Evolution of Hypertext. London: Anthem Press.
Barnet, B. (2018) ‘Hypertext Before the Web – or, What the Web Could Have Been’, in N. Brügger and I. Milligan (Eds), The SAGE Handbook of Web History (n.p.). London: Sage.
Batista, E. (2000) ‘WAP or I-Mode: Which Is Better?’ Wired Magazine, 8/30/00.
Berners-Lee, T., and Fischetti, M. (1999) Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by its Inventor. San Francisco: Harper.
Bourne, C.P., and Bellardo Hahn, T. (2003) A History of Online Information Services, 1963–1976. Cambridge, MA: The MIT Press.
Campbell-Kelly, M., and Garcia-Swartz, D.D. (2013) ‘The history of the internet: The missing narratives’, Journal of Information Technology, 28(1): 18–33.
Carey, J., and Elton, T.J. (2009) ‘The other path to the web: The forgotten role of videotex and other early online services’, New Media and Society, 11(1–2): 241–260.
Carlson, W.B. (2013) Tesla: Inventor of the Electrical Age. Princeton, NJ: Princeton University Press.
Computer History Museum (CHM) (2011) ‘Revolution: The First 2000 Years of Computing’, Web version. Mountain View, California, http://www.computerhistory.org/revolution.
Dear, B. (2017) The Friendly Orange Glow: The Untold Story of the PLATO System and the Dawn of Cyberculture. New York: Pantheon.
Gillies, J., and Cailliau, R. (2000) How the Web Was Born: The Story of the World Wide Web. New York: Oxford University Press.
Gore, A. (1991) ‘High-Performance Computing Act of 1991’, 102d Congress, Public Law, Government Printing Office, https://www.gpo.gov/fdsys/pkg/STATUTE-105/pdf/STATUTE-105-Pg1594.pdf (accessed February 2018).
Hafner, K., and Lyon, M. (1996) Where Wizards Stay Up Late: The Origins of the Internet. New York: Simon and Schuster.
Hafner, K. (2001) The Well: A Story of Love, Death, and Real Life in the Seminal Online Community. New York: Carroll & Graf.
Hauben, M., and Hauben, R. (1997) Netizens: On the History and Impact of Usenet and the Internet. Los Alamitos, CA: IEEE Computer Society Press.
Helmore, E. (2001) ‘So who’s crying over spilt milk?’, The Guardian, May 10, 2001, https://www.theguardian.com/technology/2001/may/10/internet.onlinesupplement (accessed February 2018).
Jones, S., and Latzko-Toth, G. (2017) ‘Out from the PLATO cave: Uncovering the pre-Internet history of social computing’, Internet Histories, 1(1–2): 60–69.
Kay, A. (1972) ‘A personal computer for children of all ages’, Boston, MA: Proceedings of the ACM National Conference.
Krol, E. (1993) The Whole Internet: User’s Guide and Catalog. Cambridge, MA: O’Reilly and Associates.
Mailland, J., and Driscoll, K. (2017) Minitel: Welcome to the Internet. Cambridge, MA: The MIT Press.
Mallon, D. (2013) ‘Where Are They Now? i-mode’ Ubertech, ZDnet.com, http://www.zdnet.com/article/where-are-they-now-i-mode/ (accessed February 2018).
Marchand, M. (1987) Minitel (La grande aventure). Paris: Larousse.
Markoff, J. (2005) What the Dormouse Said: How the Sixties Counter Culture Shaped the Personal Computer Industry. New York: Penguin USA.
Matsunaga, M. (2001) The Birth of i-mode: An Analogue Account of the Mobile Internet. Singapore: Chuang Yi Publishing Pte Ltd.
Mims, C. (2013) ‘With a new web-based mobile phone, Mozilla is out-Googling Google’, Quartz, July 1, 2013, https://qz.com/99505/with-a-new-web-based-mobile-phone-mozilla-is-out-googling-google/ (accessed February 2018).
Mounier-Kuhn, P.-E. (2002) ‘Les premiers réseaux informatiques en France’, Entreprises et Histoire, 29: 10–20.
Nelson, T. (2008) Geeks Bearing Gifts: How the Computer World Got This Way. Sausalito, CA: Mindful Press.
Nielsen, J. (1995) Multimedia and Hypertext: The Internet and Beyond. Chestnut Hill, MA: Academic Press.
296 THE SAGE HANDBOOK OF WEB HISTORY

Paternot, S. (2001) A Very Public Offering: A Rebel’s Story of Business Excess, Success, and Reckoning. New York: Wiley.
Pelkey, J. (2009–2018) A History of Computer Communications: 1968–1988, http://www.historyofcomputercommunications.info/ (accessed February 2018).
Resnick, P. (1992) ‘On Consensus and Humming in the IETF’, IETF (Internet Engineering Task Force) RFC (Request for Comments) 7282, also in IETF plenary presentation 1992.
Russell, A. (2014) Open Standards and the Digital Age: History, Ideology, and Networks. Cambridge, UK: Cambridge University Press.
Salus, P.H. (1995) Casting the Net: From Arpanet to Internet and Beyond. Reading, MA: Addison-Wesley Professional.
Schafer, V., and Thierry, B.-G. (2012) Le Minitel, l’Enfance Numérique de la France. Paris: Nuvis.
Segal, B. (1995) ‘A Short History of Internet Protocols at CERN’, Internet Society, History pages: https://www.internetsociety.org/internet/history-internet/, then http://ben.home.cern.ch/ben/TCPHIST.html (accessed February 2018).
Sink, E. (2003) ‘Memoirs from the Browser Wars’, https://ericsink.com/Browser_Wars.html (accessed February 2018).
Waldrop, M.M. (2001) The Dream Machine: J.C.R. Licklider and the Revolution That Made Computing Personal. New York: Viking.
Wallace, P., Hoffman, P., Blut, Z., Barrow, K., Scuka, D. (2002) i-Mode Developer’s Guide. Boston: Addison-Wesley Professional.
Weber, M. (1995, unpublished) ‘The Untold History of the World Wide Web (and Why We’re Only using Half its Power)’, feature article for Wired Magazine UK, fact-checked by Tim Berners-Lee, Robert Cailliau, Jean-François Groff, Ben Segal, and Mike Sendall. Unpublished due to demise of first version of Wired UK as joint venture with The Guardian.
Weber, M., and Hughes, K. (1997) Web History Events and Exhibit program, 1997 International World Wide Web Conference, http://1997.webhistory.org/historyday/ (accessed February 2018).
Wodaski, J. (1996) Creating Cool Navigator Gold Web Pages. New York: John Wiley & Sons Inc.
20
Emergence of the Mobile Web
Gerard Goggin

INTRODUCTION

When the World Wide Web Consortium celebrated the 25th anniversary of the Web in 2014, inventor Sir Tim Berners-Lee issued a call for a crowd-sourced Magna Carta. Noting that 40 per cent of the world’s population were Web users, Berners-Lee opened up discussion on ‘what could we do to get the other 60 per cent on board [to the Web] as quickly as possible’. ‘Obviously’, Berners-Lee noted, ‘it’s going to be around mobile’ (Berners-Lee, 2014).

As we approach the 2020s, there is a very different role for the Web, compared with the early 2000s. Circa 2000 the Web was often seen as the Internet. Now, many of the world’s users access the Internet, including the Web, via mobile technologies (Donner, 2015; Herman et al., 2015). The mobile Web is still integral to the public Internet, but the mobile Web is also – and often rather challengingly – woven into much of the fabric of the private, enclosed, ‘walled gardens’ of the commercial and institutionally controlled Internet.

Here lies the challenge for how we, as researchers, historians, users, and non-users, understand the mobile Web. True, browsing the Web (old school!) on a mobile device is common. Yet what now represents the common or definitive Internet and Web experience comprises diverse encounters with a range of other mobile technologies. Many of the new and emergent Internet technologies have evolved or been ‘born-mobile’ – in terms of their socio-technical characteristics, diffusion, take-up, and meanings (Goggin, 2014; Ito et al., 2005). It is often difficult to strictly declare what kinds of mobile Internet, media, and data technologies and cultures are authentically ‘mobile Web’, and which don’t really engage the Web at all.

So, in this chapter, I offer a perspective on how one might define and approach mobile Web, as a distinctive area of the general Web and its histories. There are various ways of framing and telling the story of the development of mobile Web. Much of the literature, especially concerning the early history of
mobile Web, has focussed on convergence, standardization, global and local development, commercialization, and so on. Here Indrek Ibrus’s work has been outstanding, in its sophisticated examination of the tensions between the forces for autonomy of the mobile Web versus convergence with the general Web; and, more recently, the continued push of infrastructure companies for ‘maximum convergence of all versions of web access platforms’, contrasted with the ‘measured divergence’, cross-platform strategies of content and service providers (Ibrus, 2016). The story I wish to tell in the chapter revolves around the interplay between the shifting interests of various factions of industry (mobile Web platform companies, telecommunications companies, handset vendors, software providers, content and service providers) with users and the contexts of use, especially internationally. The chapter covers four key aspects of the mobile Web:

1 The emergence of the mobile Web, especially focussing on the Wireless Application Protocol (WAP) as the first dedicated standards initiative;
2 The invention of the ‘classic’ mobile Web browser, Opera;
3 Internet standards-focussed mobile Web development and the brief flowering of mobile Web 2.0;
4 Smartphones and the dramatic change of the mobile Web, in the face of apps, social media, and the internationalization of mobile media.

As I proceed, I hope to highlight the key issues in grasping and theorizing the changing nature of mobile Webs, and especially to draw attention to the importance in recent times of attending to how different actors and groups have imagined, shaped, and adapted these technologies. As much as anything, as I shall outline, mobile Web histories represent a set of limit-cases for thinking about what the Web and Internet have been, are, and will be.

BECOMING MOBILE WEB: MOBILE BROWSING AND THE STRUGGLES OF WAP

Defining the mobile Web can be tricky. On a narrow view, the definition appears obvious: it is the Web or Web content as these exist, or are adapted, for mobile devices. For the World Wide Web Consortium (W3C) on its ‘Mobile Web Initiative’ home page, the tagline is simple: ‘Combining the power of the Web with the strengths of mobile devices’ (W3C, 2014). As we shall see, however, the dividing line between ‘mobile Web’ (= Web + mobile) versus non-mobile Web can be quite blurry.

As I have already noted, for a brief moment the Web, and indeed Web studies, promised to be nearly co-extensive with the Internet itself, a moment captured by David Gauntlett’s collection Web.Studies (Gauntlett, 2000). This was roughly the same time that we also see the emergence of the mobile Web. From the early 1990s onwards, an increasing number of mobile phones and wireless devices were designed to support Web browsing. The Nokia Communicator 9000 in 1996 (Nokia, 1996) is often seen as the first successful case of a dedicated mobile Web device (Ibrus, 2010: 35). The Communicator was a cellular mobile phone with a hinged keyboard, initially marketed as a top-end product for business executives keen for email and Internet access. It was advertised as a glamour accessory, for instance in the 1997 movie The Saint (Nokia, 1997). As the Communicator developed through more advanced models, the device was increasingly used for mobile Web access (cf. Figure 20.1).

It is fair to say that the ways the mobile Web was imagined in its first decade were fairly limited (Hansmann et al., 2003), when viewed from a historical vantage point some two decades onwards. An important advance occurred in December 1997, when Nokia, Ericsson, Motorola, and Phone.com formed the WAP Forum, creating the Wireless Application Protocol (WAP) to unify the already diverse kinds of mobile and wireless technologies and networks that were available. WAP was invented before the smartphone, to run on the mobile phones available from the late 1990s to the mid-2000s, in particular. WAP is a protocol that allowed mobile phones to display webpages. To make this possible, webpages needed to be written in a particular

Figure 20.1 Nokia Communicator 9300 Model displaying Wikipedia home page.
Source: Wikimedia, https://upload.wikimedia.org/wikipedia/commons/5/5e/Nokia9300.png

Web ‘language’: Wireless Markup Language (WML). WML was different from the default language used for websites at the time – Hypertext Markup Language (or HTML) – and the requirement to learn WML was an additional barrier to website providers customizing websites for WAP. On the positive side, WAP had quite minimal processing and software requirements, including a very small or micro browser. Over the next decade, a slowly growing number of website and Web content providers fleshed out WAP mobile Web offerings. Most early WAP services involved transactions conceived in a narrow sense, such as finding basic information on services or direction, making simple purchases, booking tickets or checking schedules, sport scores, news and weather, stock quotes, and so on:

The vast majority of WAP content is text-based, so any kind of graphical content is rare to find. Don’t expect to find a WAP-ready YouTube, Google Video, or anything of that sort. (NTE, 2002)

For the first few years of its life WAP was not widely used, and was often derided as a ‘failure’ (Teo and Pok, 2003). Commentators suggested that its promoters had failed to appreciate the capabilities and design issues of mobiles, dazzled instead by the Internet and tech boom of the late 1990s (Jenson, 2005: 307). An obstacle to the take-up of WAP was that it required mastery of the new and specialized WML, rather than using HTML. Surmounting such criticisms, the WAP Forum developed WAP 2.0 based on directions in mainstream Internet and thereafter WAP websites grew steadily.

At the same time as WAP was developing, so was an alternative technology: Japan’s i-mode. I-mode was not just another competitor; it represented another way of imagining the mobile and the mobile Web. In 1998, Digital Phone Tokyo Group launched a simple text browsing service called SkyWeb, which did not include graphic functions (Haas, 2006: 123). With fast-developing competition, this quickly evolved into J-Sky, offered under the brand J-Phone. The prime competitor was i-mode, introduced in February 1999 by dominant Japanese telecommunications carrier NTT DoCoMo. I-mode was an ecosystem of mobile Internet, mobile Web, content, and services, and, crucially, an integrated billing system that included ‘the monthly subscription fee, the packet transmission fee, and the i-mode information fee [paid to information or content service providers]’ (Ishii, 2004: 46). In technical terms, i-mode was underpinned by a version of the HTML Web standard, called c-HTML. I-mode was tightly controlled by DoCoMo, allowing
subscribers to access a wide range of mobile data services (forerunners to mobile premium services, as well as mobile apps), offered by approved third-party providers. As well as the most popular services such as search, transportation information and maps, news, and weather (2001 survey of accessed websites via PC and mobile telephone quoted in Ishii, 2004: 50), popular i-mode services included music, ringtones, and games. As one contemporaneous study notes: ‘Web pages that can be accessed via I-mode phones are either official sites registered on the Imenu, or unofficial sites’ (Jonason and Eliasson, 2001: 343). I-mode rapidly attracted users, numbering an estimated 14 million subscribers by October 2001 (Natsuno, 2003: 22), then 33 million just three years after launch (Ishii, 2004: 44). Due to i-mode, in particular, Japan became the celebrated and widely debated case internationally for mobile Web and mobile Internet use. In particular, in the WAP vs. i-mode contest, we see the cross-currents of ongoing tensions in the mobile Web, parsed in the distinctions drawn between ‘closed’ and ‘open’ visions of the Internet. In no small part, the achievement of i-mode is due to strong operator control (Tee and Gawer, 2009). Further still, the consumer ease of use of i-mode and its quick take-up relied upon a carefully constructed and tightly controlled ecosystem, something that was quite a contrast with WAP, as various researchers noted (e.g. Wallace et al., 2002: 150). Various efforts were made to export i-mode to other markets, but it never attracted a following. One issue was timing: the marketing of i-mode in foreign markets from 2002 onwards, including the Netherlands, Taiwan, and Australia (Goggin, 2006: 168; Tee and Gawer, 2009), coincided with the rise of mobile premium services, and eventually the advent of the smartphone and mobile apps (Goggin and Spurgeon, 2007).

Woven into these dynamics of competing mobile Web systems are cultural factors. An insight that resonates across the history of the mobile Web is offered by Japanese scholar Kenichi Ishii, who notes that the mobile Internet in that country ‘evolved from mobile phones and pagers … rather than from PCs’, and that the ‘Japanese experience after 1995 demonstrates that user needs have brought about the high penetration rate and unique usage patterns’ (Ishii, 2004: 56, 57). Accordingly, the lesson to be drawn from i-mode, Ishii argues, is ‘the mobile Internet may develop in a diverse manner throughout the world, depending on local culture and customs’ (Ishii, 2004: 57). The deep cultural, social, political, economic, and infrastructural underpinning of i-mode and the mobile Internet in Japan is something that is richly established in the pioneering 2005 volume on mobiles in Japanese life, Personal, Portable, Pedestrian (Ito et al., 2005). Internet historians have offered revisionary accounts of the development of the Japanese Internet, noting, among other factors, the challenges of language scripts on Japanese computer keyboards, compared with innovative solutions offered by mobile devices (McLelland et al., 2018). Such an emphasis on cultural, social, and linguistic dynamics in relation to the mobile Web and Internet applies even more so in the decade and a half after these debates – when WAP has played a crucial role in extending mobile Internet access globally, especially in countries of the ‘global south’ (Ling and Horst, 2011).

Early adoption of the mobile Web in emerging economies is documented in a range of research. Among others, the key work of Jonathan Donner on the evolving mobile nature of the Internet has shown users seeking to access a wide range of particular mobile messaging, chat, music, file-sharing, information, and other technologies and apps, but then often having recourse to WAP as a way to ‘venture out onto the broader Internet’ (Donner et al., 2011: 577). Much mobile Web usage happens over Wi-Fi networks rather than mobile cellular networks (Donner, 2015: 34, 129). A 2009 study of digital media usage among low-income urban school students in South Africa found that ‘different kinds of media downloaded from WAP mobile
phone portals are often used side by side with locally produced digital media’ (Kreutzer, 2009: 53). It also noted ‘the absence of walled garden portals among students’ [mobile] web usage’ (Kreutzer, 2009: 62).

Yet another picture of the complex emergence of WAP is offered by the Chinese case. In comparison to Japan (Harwit, 2012), in China the PC remained the only interface until 2000 when China Mobile started its WAP business. WAP was initially slow to be taken up in China (Yan, 2003), with only 50,000–60,000 users by the end of 2000, or some 2 per cent of the total market at that time (Wang & Cheng, 2012: 5). However, the 2003 launch of QQ mobile (a version of the highly popular QQ messaging software) played an important role in popularizing WAP phones, such as the Nokia 7110 (McLelland et al., 2018). For some users, such as migrant women, the advent of this mobile Web represented ‘necessary convergence’, as ‘many migrants’ only means of going online (outside of Internet cafés) is with Internet-enabled mobile phones’ (Wallis, 2013: 107). Reflecting on this study, Cara Wallis argues that for these users, such mobile Web is ‘important in alleviating digital inequality’, but that many would prefer the more unconstrained ways to access QQ and other programs via a computer (Wallis, 2013: 107). This is an argument that various others develop in other directions, in relation to other jurisdictions, such as Philip Napoli and Jonathan Obar’s critique of mobile Internet access as raising the spectre of an emerging underclass (Napoli and Obar, 2014). Such trends across a range of settings are taken up and systematically considered in Donner’s 2015 book After Access, notable for its argument that the ‘shift to a more mobile Internet may be accelerating this departure away from the Web ideal – and in so doing, altering the relationship between mobile-only users and the Internet itself’ (Donner, 2015: 156). For its part, WAP still exists and thrives today across many international settings – but it still very much flies under the radar of dominant industry and research trends alike.

OPERA: THE CLASSIC MOBILE WEB BROWSER

An instructive case study of the emergence of the mobile Web can be found in the browser synonymous with it: Opera. Like WAP, Opera was invented years before the smartphone redefined what most consumers thought of the mobile phone.

Opera was developed by two computer programmers, Jon Stephenson von Tetzchner and Geir Ivarsøy. They worked in Televerkets Forskninginstitutt (TF), the celebrated research group in the Norwegian telecommunications incumbent Telenor that also included Finn Trosby, a central figure in the invention of SMS, and mobile communication scholar Rich Ling (Ling, 2015: 443), among others. Von Tetzchner and Ivarsøy began coding their own Web browser in December 1994 (Opera, 2009), and at the end of 1995 left Telenor to found Opera Software ASA (Manes, 1998; Opera, 2017). In 1996, the Opera browser was offered for sale to the public, competing with the free Microsoft Internet Explorer and Netscape Navigator (Manes, 1998). Opera soon attracted praise from users and technology writers (Manes, 1998). Fitting on one floppy disk, Opera also included a magnification tool that allowed zooming in or out, leading a reviewer in The Guardian to note that it ‘can be used by those with impaired vision’ (Jennings, 1998). Opera had some limitations, as it could not yet handle certain kinds of content, such as Java and ActiveX; did not come with its own plug-ins (though it could accept Real Audio, and some other multimedia plug-ins); had limitations with handling email addresses (Manes, 1998); and had a barrier to use with its ‘try-before-you-buy retail model’ (as the premium version cost £25). As The New York Times journalist Stephen Manes memorably put it: ‘Opera’s story is a Norse saga of smaller size, clever features, reasonable speed and occasional annoyances’ (Manes, 1998).

Opera’s reputation for innovation was the reason that a third former Telenor colleague, Håkon Wium Lie, joined the company in 1999
as Chief Technology Officer (Lie and Bos, 2005: 351), a position he held until 2016.¹ Lie was impressed that Opera was only the third browser to implement Cascading Style Sheets (CSS), a feature of the Web he had created while working at CERN in 1994, when he proposed the concept to Berners-Lee and Robert Cailliau (Lindberg, 2016).² Lie joined Opera after a four-year stint at W3C, so it is no surprise that he was a flag bearer for the pivotal role of standards in mobile Web technology development and the future of the market.

By the early 2000s, Opera had gained an enthusiastic, although still minority, following, with its price seen as a continuing issue as well as its difficulties in keeping up with the developments in Web technology and design (Taylor, 2002). Making a virtue of its lightness and speed, Opera chose to stake its reputation on being the best browser for mobile devices (The Guardian, 2002). In 2005, Opera launched the Opera Mini, a cut-down version of its mobile browser, designed for fast-loading content (see Figure 20.2 and Figure 20.3).

Opera also featured as a fully fledged combatant in the ‘browser wars’ among the different companies vying to rule the Web. In 2007, it lodged a complaint with the European Commission against Microsoft tying Internet Explorer to the Windows Operating System, and ‘hindering interoperability by not following accepted Web standards’ (Opera, 2007). In 2009, the EC rendered legally binding Microsoft’s commitments to offer its OS users a choice of browsers, and eventually fined the company €561 million for a 2011–12 breach of this commitment (EC, 2013). Otherwise, in this heyday period, Opera was regularly announcing partnerships with a wide range of mobile phone vendors and carriers. It also sought to capitalize on its pole position as a mobile Web pioneer, by trailblazing the emerging mobile markets internationally, including various countries in Asia and Africa. Opera is an excellent example of the way in which the market became ‘attuned to the needs of bandwidth-constrained mobile users’, resulting in important innovations in mobile Web browsing technology (Donner, 2015: 131).

Opera continued development, and managed to forge on with modest success in the smartphone era, reaching 71 million active users for its mobile browser in 2010 (Sengupta, 2010), rising to 281 million mobile users in 2016 (Dredge, 2016), due to ‘pockets of strength’ as a leading browser in Africa, India, Indonesia, and China (Tsang

Figure 20.2 Opera Mini advertising, Opera.com website, 31 December 2005.


Source: Internet Archive, https://web.archive.org/web/20051231034658/http://www.opera.com:80/products/mobile/operamini

Figure 20.3 Mobile website browsing on Opera Mini, Opera.com website, 31 December 2005.

and Mozur, 2016). However, tensions were revealed when von Tetzchner departed the company in 2010, criticizing Opera’s new directions in a blaze of acrimonious interviews (Orlowski, 2014). Von Tetzchner set up a new company, Vivaldi (vivaldi.com), revolving around a new, fully customizable, premium browser, ‘redefining what a browser is’ (Schofield, 2016; Toulas, 2016).

In 2016 Opera’s consumer businesses (Mobile Advertising, Consumer, and Tech Licencing) were the subject of a US$1.2 billion takeover bid by Chinese investors. The offer was accepted by Opera SA but foundered on the shoals of US regulatory approvals, allegedly due to concerns about privacy and security of user data. In November 2016, a consortium of Chinese firms quietly acquired the Opera consumer businesses for US$575 million (Opera, 2016). In new hands, the Opera browser limped on; however, its fate underscored the profound changes unfolding in the mobile Web.

WEB CUSTODIANS AND MOBILE WEB 2.0 VISIONARIES

The trend of mobile manufacturers designing devices that were optimized for the Web deepened with the parallel developments in Internet and Web technologies themselves (W3C, 2015). As WAP emerged, there were parallel developments in the mainstream world of Internet standards and protocols responding to the urgent need for the Internet to cope with mobility of connections (something evident in the creation of the IPv6 protocol). The two efforts from the mobile industry and Internet sector formally came together in 1998 when the World Wide Web Consortium (W3C) and the WAP Forum released a white paper which identified areas of future cooperation, expressing a desire to achieve ‘the seamless integration of mobile devices into the Web’ (W3C, 1998).

Of course, the reality was that the future turned out to be dauntingly seamful. As W3C explained in a 2015 overview document: ‘Mobile devices not only differ widely from traditional computers, but they also have a lot of variations among themselves, in term of screen size, resolution, type of keyboard, and media recording capabilities’ (W3C, 2015). Indeed, the mobile Web presents many challenges for Web design and implementation:

Mobile devices have a lot of power compared with the desktop computer of 10 years ago, but they
also have severe limitations that don’t have to be dealt with when developing Web sites solely for the desktop … Mobile devices force Web developers to think about things they have never had to think about before. (Zakas, 2013)

REIMAGINING THE MOBILE WEB IN THE SMARTPHONE AND SOCIAL MEDIA MOMENT

The incubation of the mobile Web that we
The development of mobile phones as com- have tracked in the 1997–2006 period entered
puter devices, as data processing and net- a fateful and largely unforeseen situation
working devices, for use with different with the arrival of smartphones and tablet
kinds of software applications – and, into computers, which quickly became regarded
the mix, with different kinds of Internet and as the dominant mobile phone technology
Web technologies, is a process that takes (Frith, 2015; Goggin, 2011; Helmond, 2015;
some years to unfold. As convergence gath- Vincent and Haddon, 2018).
ered pace, W3C, as the official custodians While there is some complexity to the
of the Web, picked up the pace on their ‘smartphone moment’, what is evident is
standards development activities. In May that many kinds of services previously con-
2005, the W3C launched its mobile Web sumed via the mobile Web are now accessed
initiative, to make ‘browsing the web from via apps (Morris and Murray, 2018). Apps
mobile devices a reality’ (W3C, 2005). are a kind of mobile software designed and
‘Mobile access to the Web has been a optimized to work with particular mobile
second-class experience for far too long’, devices, operating systems, and mobile net-
according to Berners-Lee, who enjoined works (see, for instance, FTC, 2017). Mobile
developers to make ‘the mobile device as a apps also have the advantage of operating
first-class participant, and … produce mate- within an apps ecosystem that allows the kind
rials to help developers make the mobile of billing, access to third-party providers, and
Web experience worthwhile’ (W3C, 2005). ease of customer experience (often fraught,
Vaulting over this worthy initiative was a however) previously associated with i-mode.
dramatic, even faintly millenarian, movement Apps are tightly integrated with the user’s
to reimagine the future of media by conjur- mobile device, allowing unprecedented
ing up ‘Mobile Web 2.0’, as it was dubbed by tracking of user behaviour and information.
developers Ajit Jaokar and Tony Fish (Jaokar Smartphones also include a range of sen-
and Fish, 2006). This vision of a multime- sor, gestural, haptic, and location technology
dia, collaborative, participatory digital media (Frith, 2015), which adds new dimensions to
had a dramatically expanded mobile Web at the affordances of technology coupled with
its heart, keen to untap the innovation possi- the apps. Since 2007, and increasingly, much
bilities in mobile data technology (Jaokar and consumer experience of online communica-
Fish, 2004). Even scholars saw mobile Web tion has occurred inside and via apps, espe-
2.0 as significantly altering the possibilities cially apps for social media and mobile chat
of technology, for instance, in urban life (viz. platforms. However, there are distinct dif-
Bilandzic and Foth, 2009: 62). The discourse ferences between apps and the mobile Web
of mobile Web 2.0 was relatively shortlived, that have been widely noticed and debated
compared with its much more widely taken- (Brügger, 2018).
up counterpart, Web 2.0. What happened next First and foremost, the mobile Web can be
was a major detour in the career of the mobile accessed from any browser on a smartphone,
Web – the arrival of smartphones. The furi- without the user needing to install or pay for
ous development of smartphones, and their an app. Second, in a context of rising con-
associated operating systems and software cerns concerning data privacy and surveil-
(especially apps), ushered in a new phase for lance, while the user is surfing the mobile
the mobile Web. Web from their browser, their behaviour and
data cannot be tracked as effectively and comprehensively as it can with an app. Third, with the
mobile Web any provider can potentially offer a website, without having to get permission from,
and abide by the terms and conditions of, the particular app store or marketplace. Fourth, the
mobile Web browser also supports the possibility to jump from one website to another, whereas a
user cannot jump easily from one app to another, as ‘an app cannot link out of itself directly to
other apps’ (Brügger, 2016: 1064). Against these advantages, it can be contended that, even if
vastly improved with the advent of the HTML standard, the mobile Web user experience is much more
variable than is the case with apps. A key issue here is the lack of integration between website,
device, carrier, website service or content provider, and so on, elements that are typically
tightly integrated in apps.

In brief, over the 2006–18 period, we see a world of difference between apps and the mobile Web.
In response to such issues, we can also observe many efforts to join up apps and the mobile Web –
and, partly as a result of such bridging efforts, important interdependencies between the two
technologies. These close relationships have led various commentators, such as Computerworld’s
Ira Brodsky, to predict that ‘in the end, you won’t be able to tell them apart’ (Brodsky, 2015):

The mobile Web and mobile apps are not mutually exclusive. In fact, the two solutions often work
together … Neither the mobile Web nor mobile apps are going to die. They are going to converge.
(Brodsky, 2015)

To understand this better, consider the common experience of consuming news from a mobile device.
A user might get her news from Buzzfeed, the online news outlet, via an app. However, reading a
story on the app might lead to a link, which takes the user to a story on the Buzzfeed website,
or another website. This would then be viewed via a mobile browser, which would launch on the
device. So-called ‘native apps’ (that is, running on the device, inside the operating system) can
offer better support for sharing content, viewing videos, and downloading content for later,
off-line consumption, as well as offering advantages for advertising. Complicating the picture
are Web apps. These are software applications that run inside a Web browser. Web apps have an
important advantage, because they are in the control of the website provider, rather than the
entity controlling the app store.3 A good example of the respective merits of ‘native’ apps
versus Web apps can be found in the case of the Financial Times. A global brand thriving in the
age of digital disruption of news, the Financial Times switched away from its iOS app in 2011.
The apparent reason was to avoid paying Apple a 30 per cent cut of subscription revenue, as well
as to avoid allowing Apple to collect payment information and other data directly from customers
when they purchase the app and thus subscribe (Marshall, 2017; Owen, 2017). Instead, Financial
Times subscribers viewed content via a Web app. In August 2017, the Financial Times reverted to
an iOS app, hoping to ‘boost subscriber engagement with its content and in turn increase the
revenue it is able to extract from its customers over the long term’ (Marshall, 2017). Like other
media brands, such as Spotify, the Financial Times softened the blow by allowing only existing
subscribers to download the iOS app (Marshall, 2017). As a result, consumers have to subscribe
via the Financial Times website, and cannot purchase a subscription within the app itself – thus
denying Apple access to the subscriber data (Marshall, 2017).

As this example indicates, with the majority of their traffic coming from mobile consumption,
news and entertainment companies have a strong investment in such intricate strategies for
navigating the mobile Web. From a mobile Web designer’s perspective, mobile (i.e. ‘native’) apps
are crucial, depending on the business. However, there still exists the general imperative of
grasping the nettle of ‘the subtler issues of mobile web’, as one textbook author puts it,
arguing in ‘favour of providing a mobile-friendly website’ (Esposito, 2016: 381). These kinds
of discourses and contests over the mobile Web, apps, and mobile technology can be better
understood through the case of HTML5 standardization.

As Andrew Schrock notes, since the ‘demise of Flash on mobile, HTML5 is an increasingly essential
infrastructure of the mobile web’ (Schrock, 2014: 827). He explains that the mobile Web ‘connects
web pages developed in HTML5 with various web browsers’, and the ‘distinctions between apps and
the mobile web are starting to blur due to HTML5 increasingly being used for cross-platform app
development’ (Schrock, 2014: 829). While HTML5 is an example of an ‘open’ standard, Schrock shows
the various ways that companies ‘work open-source standards to their advantage by introducing
features into the W3C, while ignoring or downplaying features introduced by others’ (Schrock,
2014: 830), which he aptly characterizes as ‘a milder version of the infamous “browser wars” in
the mid-1990s’ (Schrock, 2014: 830). A key challenge is that W3C is hamstrung by its lack of
enforcement powers, so is left to find other ways to marshal support and collective agreement.
For instance, in late 2015, the two-year EU-funded HTML5 Apps project concluded, which hoped to
accelerate ‘the development of standard Web technologies required to make HTML5 apps competitive
with native apps, specifically in the areas of Web payments and rich mobile Web APIs’ (ERCIM,
2015). It aimed to close the gap between native and HTML5 apps, through the ‘standardization of
missing HTML5 functionalities’ (ERCIM, 2015). Such initiatives deserve closer scrutiny by
historians and scholars, especially as buried in such detail lie key coordinates of how the
Internet is being shaped for the future (Ibrus, 2016; Schrock, 2014: 830).

CONCLUSION

Apart from a few studies cited here, notably the work of Ibrus as well as others such as Donner
and Schrock, there is little systematic scholarly research on the emergence of the mobile Web,
and its implications for how we understand the Web, the Internet, and contemporary media.
Invaluable in their own right, such mobile Web histories – another kind of ‘missing net
histories’ (Driscoll and Paloque-Berges, 2017) – can also contribute to more accurate,
comprehensive, and nuanced understandings of Internet histories. Added to which, mobile Web
histories can shed important light on how the development of technology, standards, uses,
cultural and media forms, social meanings, and imaginaries plays out now and into the future. In
concluding, I wish to outline key items for the research agenda.

First, while there is growing recognition of the archival and methodological challenges of Web
histories, most rigorously put in the work of Niels Brügger (e.g. Brügger, 2011), and registered
in a range of emergent work pertaining to Internet histories, as yet there is little specific
discussion of these issues when it comes to mobile Web histories. There is a threshold
ontological issue as the mobile Web is a different entity or object from the Web itself (cf.
Brügger, 2009). Much of the content may be similar, and there is a broad similarity among mobile
Web, the Web, and other markup languages. However, it is unclear to what extent we might be able
to reconstruct a particular mobile Web artefact or have sufficient documentation to analyse
mobile Web practices and experiences. How and where are mobile Web artefacts and texts archived?
And how might they be retrieved, for what kinds of purposes? (Davis, 2014; McCown et al., 2015;
Schneider and McCown, 2013).

A key issue for the mobile Web arises with stand-alone, proprietary social media systems. How
does one gain access, for instance, to Facebook or Twitter or Weibo or Instagram (as perhaps four
of the most harvested and researched platforms at present)? Typically, it involves either:
purchasing access from the company; purchasing access via a third-party provider (e.g. via a
‘fire hose’); or gathering
and scraping whatever data from the platform is available via the publicly accessible Web and
Internet. Because of their nature, as we have seen from reviewing their histories, mobile Webs
involve a significant component of technology, software, and data that is owned by mobile
companies.

The issues for the mobile Web unfolding with the advent of mobile apps are an important case in
point here. Apps are key to the Internet, when accessed via tablets and mobile phones, yet
writing a history of apps is a considerable challenge – because it is difficult to gain reliable
information on what apps are available and how they are developed and used, especially due to the
reluctance of app store owners to allow access to information. There exist potential
‘workarounds’ – creative techniques to gain a picture and details of apps, for instance, via Web
and Internet archives (which will include websites with details of apps), a strategy adopted by
Anne Helmond in her critical history of apps (Helmond, 2017). However, in apps and elsewhere, the
basic foundational work in scrutinizing the archives, historical remnants, traces, and
infrastructures of the mobile Web remains to be done.

Moving from the foundational questions of archives, materials, and methods, there are various
areas of mobile Web histories that deserve attention. Much existing scholarly research on the
mobile Web focusses on key periods of the early mobile Web, WAP, i-mode, or mobile Web work by
W3C. Future historical research needs to much more systematically cover and revise this terrain,
from different perspectives that better incorporate the different industries, players, and
institutions they entail.

Consider, for instance, that we know very little empirically of actual mobile websites
themselves. There is little recuperative or analytical work, or wider scholarly discussion on
websites, Web culture, or Web design (for example, in retrospective scholarly conferences) when
it comes to the mobile Web. Mobile browsers have received little or no attention.

Finally, a key area for future research is to grapple with the implications for Web historians
of mobile Internet being developed and appropriated differently depending on local cultures,
economics, infrastructures, and so on. Here researchers face the imperative of acknowledging,
documenting, and exploring the very different mobile Webs, mobile Web cultures of use, social
practices and functions, providers, and industries to be found across the whole international
range of settings, cultures, and languages. Such research trajectories are now on the agenda in
relation to global Internet histories (Goggin and McLelland, 2017), and promise to be especially
rich in relation to the babel of international mobile Webs – the journey through which will
surely help us assemble a much richer and more adequate set of Web, Internet, and other
histories.

Notes

1  Håkon Wium Lie had an illustrious career in technology and activism, and was a founding member
of the Norwegian Pirate Party (see his website at http://www.wiumlie.no/en). Interestingly, he
also edited a 1993 special issue of Telenor’s in-house journal Telektronikk on ‘Cyberspace’
(https://www.w3.org/People/howcome/p/telektronikk-4-93/).
2  While at Opera, Håkon Wium Lie wrote his PhD thesis on Cascading Style Sheets, which included
a chapter on CSS for small screens (Lie, 2005: chapter 8).
3  As well as ‘native’ mobile OS apps and Web apps, there are also hybrid apps: ‘Hybrid apps are
built with web technologies and mobile web implementations and run inside a native container on a
mobile device’ (Vakintis and Panagiotakis, 2016: 233).

REFERENCES

Berners-Lee, T. (2014) ‘A Magna Carta for the Web’. TED Talks. Vancouver, 19 March. Retrieved from https://www.ted.com/talks/tim_berners_lee_a_magna_carta_for_the_web. [Accessed 14 July 2018]
Bilandzic, M., and Foth, M. (2009) ‘Social navigation and local folksonomies: Technical and design considerations for a mobile information system’, in S. Hatzipanagos and S. Warburton (Eds), Handbook of Research on Social Software and Developing Community Ontologies. Hershey, PA: IGI Global. pp. 52–66.
Brodsky, I. (2015) ‘Deathmatch: The mobile web vs. mobile apps’, Computerworld, 21 December. Retrieved from https://www.computerworld.com/article/3016736/mobile-wireless/the-mobile-web-vs-mobile-app-death-match.html. [Accessed 14 July 2018]
Brügger, N. (2018) ‘Web history and social media’, in J. Burgess, A. E. Marwick, and T. Poell (Eds), The SAGE Handbook of Social Media. London: Sage. pp. 196–212.
Brügger, N. (2016) ‘Introduction: The Web’s first 25 years’, New Media & Society, 18(7): 1059–1065.
Brügger, N. (2011) ‘Web archiving – between past, present, and future’, in M. Consalvo and C. Ess (Eds), The Handbook of Internet Studies. Oxford: Blackwell. pp. 24–42.
Brügger, N. (2009) ‘Website history and the website as an object of study’, New Media & Society, 11(1–2): 115–132.
Davis, C. (2014) ‘Archiving the Web: A case study from the University of Victoria’, Code{4}Lib, 26. Retrieved from http://journal.code4lib.org/articles/10015. [Accessed 14 July 2018]
Donner, J. (2015) After Access: Inclusion, Development, and a More Mobile Internet. Cambridge, MA: MIT Press.
Donner, J., Gitau, S., and Marsden, G. (2011) ‘Exploring mobile-only Internet use: Results of a training study in urban South Africa’, International Journal of Communication, 5: 574–597.
Dredge, S. (2016) ‘Browser maker Opera in line for $1.2bn acquisition by Chinese consortium’, The Guardian, 11 February. Retrieved from https://www.theguardian.com/technology/2016/feb/11/browser-maker-opera-acquisition-chinese. [Accessed 14 July 2018]
Driscoll, K., and Paloque-Berges, C. (2017) ‘Searching for “missing” net histories’, Internet Histories, 1(1): 47–59.
Esposito, D. (2016) Modern Web Development: Understanding Domains, Technologies, and User Experience. Redmond, WA: Microsoft Press.
European Commission (EC). (2013) ‘Antitrust: Commission fines Microsoft for non-compliance with browser choice commitments’, media release, 6 March. Retrieved from http://europa.eu/rapid/press-release_IP-13-196_en.htm. [Accessed 14 July 2018]
European Research Consortium for Informatics and Mathematics (ERCIM). (2015) ‘HTML5Apps paves the way to future W3C payment standards and advances mobile Web standardization roadmap’, media release, 29 September. Retrieved from https://www.ercim.eu/news/396-html5apps-paves-the-way-to-future-w3c-payment-standards-and-advances-mobile-web-standardization-roadmap. [Accessed 14 July 2018]
Federal Trade Commission (FTC). (2017) ‘Understanding mobile apps’. Retrieved from https://www.consumer.ftc.gov/articles/0018-understanding-mobile-apps. [Accessed 14 July 2018]
Frith, J. (2015) Smartphones as Locative Media. Hoboken, NJ: Wiley.
Gauntlett, D. (Ed.) (2000) Web.Studies: Rewiring Media Studies for the Digital Age. London: Arnold/Hodder and Oxford University Press.
Goggin, G. (2014) ‘Facebook’s mobile career’, New Media & Society, 16(7): 1068–1086.
Goggin, G. (2011) Global Mobile Media. London and New York: Routledge.
Goggin, G. (2006) Cell Phone Culture: Mobile Technology in Everyday Life. London and New York: Routledge.
Goggin, G., and McLelland, M. (Eds) (2017) Routledge Companion to Global Internet Histories. New York: Routledge.
Goggin, G., and Spurgeon, C. (2007) ‘Premium rate culture: The new business of mobile interactivity’, New Media & Society, 9(4): 753–770.
The Guardian. (2002) ‘Opera is staging a coup’, The Guardian, 24 October, p. 7.
Haas, M. (2006) Management of Innovation in Network Industries: The Mobile Internet in Japan and Europe. Wiesbaden: Deutscher Universitäts-Verlag.
Hadlaw, A. Herman, A., and Swiss, T. (Eds) (2015) Theories of the Mobile Internet: Materialities and Imaginaries. New York: Routledge.
Hansmann, U., Merk, L., Nicklous, M.S., and Stober, T. (2003) Pervasive Computing: The Mobile World. 2nd ed. Berlin: Springer.
Harwit, E. (2012) ‘Comparative development of the mobile Internet in China and Japan’, in R. W.-C. Chu, L. Fortunati, P.L. Law, and S. Yang (Eds), Mobile Communication and Greater China. New York: Routledge. pp. 80–95.
Helmond, A. (2017) ‘NWO Veni grant for App ecosystems: A critical history of apps’, Blog post, 28 July. Retrieved from http://www.annehelmond.nl/2017/07/28/nwo-veni-grant-for-app-ecosystems-a-critical-history-of-apps/ [Accessed 14 July 2018]
Helmond, A. (2015) ‘The platformization of the Web: Making Web data platform ready’, Social Media + Society, 1(2). Retrieved from https://doi.org/10.1177/2056305115603080. [Accessed 14 July 2018]
Ibrus, I. (2016) ‘Web and mobile convergence: Continuities created by re-enactment of selected histories’, Convergence, 22(2): 147–161.
Ibrus, I. (2010). Evolutionary Dynamics of New Media Forms: The case of the Open Mobile Web. PhD thesis, London School of Economics.
Ishii, K. (2004) ‘Internet use via mobile phone in Japan’, Telecommunications Policy, 28(1): 43–58.
Ito, M., Okabe, D., and Matsuda, M. (Eds) (2005) Personal, Portable, Pedestrian: Mobile Phones in Japanese Life. Cambridge, MA: MIT Press.
Jaokar, A., and Fish, T. (2006) Mobile Web 2.0. London: Futuretext.
Jaokar, A., and Fish, T. (2004) Opengardens: The Innovator’s Guide to the Mobile Data Industry. London: Futuretext.
Jennings, C. (1998) ‘To buy or not to buy – Say boo to bloatware’, The Guardian, 17 September, p. 14.
Jenson, S. (2005) ‘Default thinking: Why consumer products fail’, in R. Harper, L. Palen, and A. Taylor (Eds), The Inside Text: Social, Cultural and Design Perspectives on SMS. Dordrecht: Springer, pp. 305–324.
Jonason, A., and Eliasson, G. (2001) ‘Mobile Internet revenues: An empirical study of the I-mode portal’, Internet Research, 11(4): 341–348.
Kreutzer, T. (2009) Generation Mobile: Online and Digital Media Usage on Mobile Phones among Low-Income Urban Youth in South Africa. Report. Centre for Film and Media Studies, University of Cape Town. Retrieved from tinokreutzer.org/mobile [Accessed 14 July 2018]
Lie, H.W. (2005) Cascading Style Sheets. PhD thesis, University of Oslo. Retrieved from http://www.wiumlie.no/2006/phd/ [Accessed 14 July 2018]
Lie, H.W., and Bos, B. (2005) Cascading Style Sheets: Designing for the Web. 3rd edition. Upper Saddle River, NJ: Addison-Wesley.
Lindberg, O. (2016) ‘Interview with Håkon Wium Lie’, Net Magazine, 625. 11 April. Retrieved from https://medium.com/net-magazine/interview-with-h%C3%A5kon-wium-lie-f3328aeca8ed [Accessed 14 July 2018]
Ling, R. (2015) ‘Rich Ling: An intellectual autobiography’, in Z. Yan (Ed.), Encyclopedia of Mobile Phone Behaviour. Vol. 1. Hershey, PA: IGI Global. pp. 442–452.
Ling, R., and Horst, H. (2011) ‘Mobile communication in the global south’, New Media & Society, 13(3): 363–374.
Manes, S. (1998) ‘For specialty users, browser price may be right’, The New York Times, 28 April, p. 3.
Marshall, J. (2017) ‘Financial Times returns to Apple’s app store after a six-year hiatus’, Wall Street Journal, 7 August. Retrieved from https://www.wsj.com/articles/financial-times-returns-to-apples-app-store-after-six-year-hiatus-1502092855 [Accessed 14 July 2018]
McCown, F., Yarbrough, M., and Enlow, K. (2015) ‘Tools for discovering and archiving the mobile Web’, D-Lib Magazine, 21: 3–4.
McLelland, M., Yu, H., and Goggin, G. (2018) ‘Alternative histories of social media in Japan and China’, in J. Burgess, A.E. Marwick, and T. Poell (Eds), SAGE Handbook of Social Media. Los Angeles, CA: Sage. pp. 53–68.
Morris, J.W., and Murray, S. (Eds) (2018) Appified. Ann Arbor, MI: University of Michigan Press.
Napoli, P.M., and Obar, J.A. (2014) ‘The emerging mobile Internet underclass: Critique of mobile Internet access’, The Information Society, 30(5): 323–334.
Natsuno, T. (2003) i-Mode Strategy. Chichester, England: John Wiley.
Nokia. (1997) ‘Nokia 9000 Communicator makes a visible appearance in The Saint’, media release, 9 April. Retrieved from http://www.nokia.com/en_int/news/releases/1997/04/09/nokia-9000-communicator-makes-a-visible-appearance-in-the-saint [Accessed 14 July 2018]
Nokia. (1996) ‘First GSM-based communicator product hits the market’, media release, 15 August. Retrieved from http://www.nokia.com/en_int/news/releases/1996/08/15/first-gsm-based-communicator-product-hits-the-market-nokia-starts-sales-of-the-nokia-9000-communicator [Accessed 14 July 2018]
NTE. (2002) ‘What is WAP Mobile Web’, LovetoKnow. Retrieved from http://cellphones.lovetoknow.com/What_is_WAP_Mobile_Web [Accessed 14 July 2018]
Opera. (2017) ‘FAQS: When was Opera founded?’. Retrieved from http://www.operasoftware.com/press/faq [Accessed 14 July 2018]
Opera. (2016) Annual Report. Retrieved from http://www.operasoftware.com/company/investors [Accessed 14 July 2018]
Opera. (2009) ‘Celebrating the 15th anniversary of the Opera browser’s origin’, media release, 28 April. Retrieved from http://www.operasoftware.com/press/releases/general/1994-the-year-that-started-it-all [Accessed 14 July 2018]
Opera. (2007) ‘Opera files antitrust complaint with the EU’, media release, 13 December, http://www.operasoftware.com/press/releases/general/opera-files-antitrust-complaint-with-the-eu.
Orlowski, A. (2014) ‘Opera founder von Tetzchner: It’s all gone to crap since I quit’, The Register, 7 February. Retrieved from https://www.theregister.co.uk/2014/02/07/opera_founder_its_all_gone_to_crap/ [Accessed 14 July 2018]
Owen, L.H. (2017) ‘Six years later, the Financial Times is back in the App Store’, Nieman Lab, 7 August. Retrieved from http://www.niemanlab.org/2017/08/six-years-later-the-financial-times-is-back-in-the-app-store-apple-still-wont-get-a-cut-of-subscriptions/ [Accessed 14 July 2018]
Schneider, R., and McCown, F. (2013) ‘First steps in archiving the mobile web: Automated discovery of mobile websites’, JCDL ’13: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries. Indianapolis, IN: ACM. pp. 53–56.
Schofield, P. (2016) ‘Vivaldi – the new web browser for power users’, The Guardian, 6 April. Retrieved from https://www.theguardian.com/technology/2016/apr/06/vivaldi-the-new-web-browser-for-power-users [Accessed 14 July 2018]
Schrock, A.R. (2014) ‘HTML5 and openness in mobile platforms’, Continuum, 28(6): 820–834.
Sengupta, D. (2010) ‘Opera is the oldest browser, and it is still surviving: Jon Tetzchner, founder, Opera software’, Economic Times, 9 December. Retrieved from http://economictimes.indiatimes.com/opinion/qna/opera-is-the-oldest-browser-and-it-is-still-surviving-jon-tetzchner-founder-opera-software/articleshow/7068303.cms [Accessed 14 July 2018]
Taylor, P. (2002) ‘Faster and smarter’, Financial Times, 13 October.
Tee, R., and Gawer, A. (2009) ‘Industry architecture as a determinant of the successful platform strategies: A case study of the i-mode mobile Internet service’, European Management Review, 6(1): 217–232.
Teo, T.S.H., and Pok, S.H. (2003) ‘Adoption of the Internet and WAP-enabled phones in Singapore’, Behaviour & Information Technology, 22(4): 281–289.
Toulas, B. (2016) ‘Vivaldi browser: Interview with Jon Stephenson von Tetzchner’, Utappia blog, 21 September. Retrieved from https://utappia.org/2016/09/21/vivaldi-browser-interview-with-jon-stephenson-von-tetzchner/ [Accessed 14 July 2018]
Tsang, A., and Mozur, P. (2016) ‘Chinese group bids $1.2 billion for company behind Opera Web browser’, The New York Times, 10 February. Retrieved from https://www.nytimes.com/2016/02/11/business/dealbook/china-opera-kunlun-qihoo-golden-brick.html [Accessed 14 July 2018]
Vakintis, I., and Panagiotakis, S. (2016) ‘Middleware platform for mobile crowd-sensing applications using HTML5 APIs and web technologies’, in C.X. Mavromoustakis, G.
Mastorakis, and J.M. Batalla (Eds), Internet of Things (IoT) in 5G mobile technologies. Cham: Springer. pp. 231–274.
Vincent, J., and Haddon, L. (Eds) (2018) Smartphone Cultures. London and New York: Routledge.
Wallace, P., with Hoffman, A., Scuka, D., Blut, Z., and Barrow, K. (2002) i-Mode developer’s guide. Indianapolis, IN: Addison-Wesley.
Wallis, C. (2013) Technomobility in China: Young Migrant Women and Mobile Phones. New York and London: NYU Press.
Wang, J., and Cheng, C.-T. (2012) ‘History of the mobile phone in China’, in R.W.-C. Chu, L. Fortunati, P.L. Law, and S. Yang (Eds), Mobile Communication and Greater China. New York: Routledge. pp. 64–79.
WAP Forum. (2000) ‘Wireless Application Protocol: White paper’. Retrieved from https://www.wapforum.org/what/WAP_white_pages.pdf [Accessed 14 July 2018]
W3C. (2014) ‘The Web and mobile devices’. Retrieved from https://www.w3.org/Mobile/ [Accessed 14 July 2018]
W3C. (2005) W3C launches ‘Mobile web initiative’. Retrieved from https://www.w3.org/2005/05/mwi-pressrelease [Accessed 14 July 2018]
W3C. (1998) WAP Forum – W3C Cooperation White Paper. Retrieved from https://www.w3.org/TR/NOTE-WAP [Accessed 14 July 2018]
Yan, X. (2003) ‘Mobile data communications in China’, Communications of the ACM, 46(12): 80–85.
Zakas, N.C. (2013) ‘The evolution of web development for mobile devices’, Communications of the ACM, 55(4): 42–48.
PART IV

Platforms on the Web


21
Wikipedia
Andy Famiglietti

INTRODUCTION

Today Wikipedia is arguably one of the most important information sources on the planet.
Wikipedia is the fifth most visited website worldwide (Alexa Top 500 Global Sites, n.d.).
Wikipedia information has been cited in a wide variety of high-stakes real-world contexts,
including peer-reviewed medical literature (Bould et al., 2014) and judicial opinions (Peoples,
2009). During the 2008 presidential race, the McCain campaign was suspected of attempting to
‘clean-up’ the Wikipedia article of their vice-presidential nominee, prior to the announcement of
Sarah Palin’s nomination (All Things Considered, 2008). The struggle over the student use of
Wikipedia in classroom assignments has been so extensive that when Middlebury College’s history
department banned the use of Wikipedia as a source for student essays it made national news
(Cohen, 2007).

However, for a significant cohort of early twenty-first-century new media scholars, Wikipedia’s
significance extends far beyond its visibility as an information source. For these scholars, who
focused on the economics of digital information production, Wikipedia quickly became the defining
example of the vast creative power they believed was being unleashed by the internet. Yochai
Benkler, in his influential account of the new media economy, Wealth of Networks, refers to
Wikipedia, along with several other examples, as evidence that ‘the networked environment makes
possible a new modality of organizing production: radically decentralized, collaborative, and
nonproprietary; based on sharing resources and outputs among widely distributed, loosely
connected individuals who cooperate with each other without relying on either market signals or
managerial commands’ (Benkler, 2006: 60). Internet scholar and public intellectual Clay Shirky
was even more emphatic in his use of Wikipedia as an exemplar of new media potential. In his short
piece ‘Gin, Television, and Social Surplus’, Shirky argues that the internet will unleash
a ‘cognitive surplus’ by allowing individuals to make productive use of time they
previously wasted watching television. He measures the productive potential of this
surplus in terms of Wikipedia, claiming that if ‘the internet-connected population’
devoted even 1% of their TV-watching time to productive volunteer online activity, that
would amount to ‘one hundred Wikipedia projects per year worth of participation’
(Shirky, 2012: 240).

For Shirky, then, Wikipedia was not merely an example of the potential of new media, it
was a general-purpose unit for measuring this potential. The application of Wikipedia as
a template for what could be expected of a ‘free’ and ‘open’ internet went far beyond
Shirky’s literal use of Wikipedia as a unit of measure. Many scholars (Ghosh, 2006;
Lessig, 2004; Tapscott and Williams, 2010; Zittrain, 2008) argued that Wikipedia’s
example could be generalized, and could provide a model for understanding the economics
of internet-based information production more broadly. These authors, whose work was
widely cited in debates surrounding copyright, net neutrality, and other policy issues,
argued that the Wikipedia model showed that free, open internet-based collaboration
could create a fairer, freer, more democratic information economy.

From the vantage point of 2018, however, this belief that Wikipedia would provide a
model for a new, better, more democratic information environment seems misplaced.
Instead of ‘hundreds’ of Wikipedia-like, Wikipedia-scale projects, our contemporary
internet is dominated by a handful of massive social media platforms. Wikipedia, and
other collaborative internet projects, are understood not to be idealized spaces for
democratic participation, but spaces in which historical forms of power and privilege
operate, sometimes in new and even more virulent forms. This chapter seeks to correct
our understanding of Wikipedia, by moving away from understanding the project as a
general-purpose template for new media and towards embracing Wikipedia’s existence as
a historically contingent and specific project. To that end, I examine the early
Wikipedia and show how the project grew out of a specific historical heritage: that of
the free and open source software movement. This movement, which developed in response
to historically contingent dotcom era concerns over top-down control of information,
shaped Wikipedia in important ways during its first years of existence. Wikipedia
inherited both important resources from this movement, and significant anxieties about
information control. These resources and anxieties have, in turn, shaped Wikipedia’s own
record of historical events. I explore this influence using the Wikipedia article
documenting the 2008 Gaza War as a case study.

WIKIPEDIA AND THE FREE/OPEN SOURCE SOFTWARE MOVEMENT

Free and open source software (or F/OSS) encompasses a diverse set of software projects.
The source code of these projects is released so that anyone with the requisite
knowledge can modify or redistribute the program. It is easy to forget that F/OSS is a
relatively recent phenomenon. The Google n-gram chart in Figure 21.1 below is one way to
visualize the sudden emergence of the larger F/OSS movement. Note how ‘open source’
grows rapidly, while once common terms for earlier experiments with software that users
could obtain at no cost decline.

The story of this emergence has been told many times. It has been told by practitioners
within the F/OSS movement (Raymond, 2000; Stallman, 2002; Torvalds, 2001), by
anthropologists embedded within the movement (Coleman, 2012; Kelty, 2008), and by legal
scholars, economists, and others interested in the implications of F/OSS for their
fields (Benkler, 2006; Ekstrand et al., 2013; Lessig, c.1999; Weber, 2004). In this
section, I draw from these sources to construct a summary of the history of F/OSS that
makes salient the conceptual and organizational resources F/OSS provided the early
Wikipedia project. I want to show how these resources come out of a very particular
history. Namely, the resources inherited by Wikipedia from F/OSS are shaped by
historically specific concerns about centralized information control stemming from the
rise and fall of the ‘dotcom bubble’ at the close of the twentieth century.

Figure 21.1: The rise of ‘open source’.

Many within F/OSS stress its continuity with earlier practices. For example, Richard
Stallman, founder of the Free Software Foundation (FSF) and the GNU¹ project, describes
his project’s origin as reaching back to practices established in mid-twentieth-century
computer science departments. When he was at the MIT AI lab in the 1970s, he writes, ‘we
did not call our software, “free software”, because that term did not yet exist; but
that is what it was. Whenever people from another university or a company wanted to port
and use a program, we gladly let them’ (Stallman, 2002: 15). While the acceptance of
software as something to be ‘shared’ in the 1970s was not as universal as Stallman’s
quote here suggests, larger patterns of source-sharing in early academic computing have
been documented, especially in academic contexts (Kelty, 2008; Levy, 1984; Turner, 2006;
Weber, 2004).

While the FSF and GNU project would go on to make important technical and legal
innovations (including the first free software license, the GNU GPL), the larger F/OSS
movement’s push against information being controlled as property would not happen for
nearly another 15 years. Chris Kelty’s account of the F/OSS movement in Two Bits helps
us to understand why. Kelty establishes how F/OSS’s emergence into the public eye was
propelled by the mid-1990s’ move away from ‘free software’ and towards ‘open source’.
This move, which, Kelty argues, serves to shear the practices of non-property-based
software production from Stallman’s ethics of ‘freedom’, was driven in no small part by
the peculiar economics of the dotcom era. As Kelty puts it, ‘the Internet giveaway was a
conflict of propriety: hackers and geeks who had built the software that made it all
work, under the sign of making it free for all, were seeing that software generate
untold wealth for people who had not built it’ (Kelty, 2008: 110). Thus, even as the
shift from ‘free’ to ‘open source’ served to make F/OSS more ‘business friendly’, this
shift also represented a strategic move designed to ensure those producing information
were not alienated from the fruits of their labor.

This commitment to using free licenses to prevent information producers from being
expropriated by information owners was part of Wikipedia’s F/OSS inheritance. Wikipedia
was initially released under the GFDL, a documentation license developed
by the Free Software Foundation. As we will see, this license was more than just a legal
document establishing the copyright status of Wikipedia contributions; it was also a
rhetorical tool that helped the early Wikipedia organize and grow.

HOW F/OSS INHERITANCE SHAPES THE EARLY WIKIPEDIA

In this section, I examine the F/OSS resources inherited by the early Wikipedia. In
particular, I am interested in the state of Wikipedia during the project’s first two
years of development, an era when the success of Wikipedia was hardly assured, and the
project built important policies and practices. In this analysis of the early Wikipedia,
I draw on several archival resources: the archives of the Wikipedia-L and Intlwiki-L
mailing lists, and an Internet Archive Wayback Machine snapshot of the state of
Wikipedia.com in April of 2001 (because Wikipedia switched the wiki software it used to
maintain the site early in its existence, the archives of site activity maintained on
the current Wikipedia via ‘history’ pages are unreliable during the very early period of
the site’s existence). The archives of the first two years of Wikipedia’s existence
contain ample evidence of how the site inherited both a motivated user base, and
anxieties over information control from F/OSS. Furthermore, they allow us to see how
this inheritance would shape how the project would work in a lasting way.

It is no secret Wikipedia drew on the idea of ‘openness’ as articulated in F/OSS.
However, the concept of ‘openness’ wasn’t all that Wikipedia borrowed; it also borrowed
something much more tangible: a population of developers already exposed to F/OSS-style
organization. Some of these developers entered the project after Wikipedia was written
up favorably on blogs like Slashdot and Kuro5hin, early in 2001 (Nupedia and Project
Gutenberg Directors Answer – Slashdot, n.d.). There are signs that Wales and Sanger
actively courted attention from these sites, known for their ability to direct internet
traffic. In a post to Wikipedia-L in March 2001, Sanger writes, ‘JimboWales is being
interviewed by Slashdot […]. He mentioned Wikipedia in the interview, and we may
experience a substantial surge in traffic’ (Sanger, 2001d).

Others were recruited by Wales from a rival encyclopedia, GNUpedia², started by GNU
volunteers. I have recounted this story in detail elsewhere (Famiglietti, 2011), but
significant to my account of Wikipedia’s history here is the way in which Wikipedia’s
license serves Wales as a rhetorical tool, allowing him to persuade GNUpedia volunteers
to join Wikipedia. In one series of GNUpedia mailing list posts, Wales characterizes
GNUpedia as a ‘fork’ (duplicate) of Nupedia (Wikipedia is barely a week old at this
time) and writes, ‘But since the license is compatible (the same) there is probably no
reason to have two projects…’ (Wales, 2001a). The fact that the projects use the same
license becomes a way for Wales to convince GNUpedia volunteers that their effort is a
wasteful duplication.

The successful recruitment of F/OSS hackers was an important resource for the early
Wikipedia, since these volunteer workers were accustomed to the F/OSS philosophy and
coordination style. However, it also helped sow the seeds of what would later be
understood as an important problem: the gender skew of Wikipedia’s editor community. As
late as 2007, the Wikipedia editor community was overwhelmingly (over 90%) male. This
gender imbalance stems in part from the gender imbalance in the F/OSS community that
Wikipedia relied on to build its initial editor base. F/OSS project contributors are
only 2% female (Ghosh et al., 2002). As early as January 2002, Wikipedia editors had
started to notice the community’s gender imbalance. One female editor, posting to the
list about an unrelated issue, asked, ‘Are there no other women taking part in this? I
feel a bit like I’ve snuck into the boys room’
(T., 2002). List members acknowledged the project was heavily male, but also pointed to
a number of high-profile women at work on Wikipedia. This focus on a few high-profile
women may have blinded them to the ongoing gender disparity. While early editors may not
have been particularly conscious of Wikipedia’s gender skew, it has since been widely
recognized (Reagle, 2012) that unequal gender participation in Wikipedia is a
significant complication of the site’s claim to be a democratic community where ‘anyone
can edit’.

In addition to developers, Wikipedia inherited the F/OSS movement’s anxieties over
information control. For many early Wikipedia users, the dotcom bust (which early in
2001 was still in the process of happening) showed that information control was a
failed, but still dangerous, method of doing business online. While early Wikipedians
were mostly confident that the GFDL limited their vulnerability to outside corporate
actors capturing and controlling their content, they had some concerns about the
possibility of untoward actions by Wales’s company Bomis, which hosted Wikipedia. These
concerns would ultimately help to propel Wikipedia’s shift to a non-profit business
model. Just as important, they would help to shape some key Wikipedia content policies.

Copyright was one of the issues tightly tied to anxieties over information control. Like
F/OSS coders before them, early Wikipedia editors were keenly aware of copyright law and
perceived it as a possible stumbling block. During 2001 and 2002, Wikipedia-L would
frequently discuss copyright issues, ranging from whether an editor could retain some
copyright protections for his work while simultaneously releasing ‘excerpts’ under the
GFDL (Rybo, 2001), to concerns about Wikipedia users’ ability to re-use content under
another content license (Sanger, 2001b), to questions about the copyright protections
afforded to ‘a list of all the winners of major sporting events’ (Owen, 2001).

While Wikipedia editors were united in the process of trying to negotiate building a
resource using an open content license in a world governed by copyright, they had a
range of interpretations of copyright law’s purpose and legitimacy. This can perhaps
most clearly be seen in a March 2001 mailing list conversation between Sanger and
Wikipedia editor Lee Crocker. Crocker criticizes the current warning against including
copyrighted material in Wikipedia, which chastised editors, ‘DO NOT STEAL’. Crocker
writes that this language is ‘personally offensive to me and other dedicated, moral,
hard-working, law-abiding, intelligent people working to abolish copyright law as it is
today’ (Crocker, 2001). Sanger responds, defending his language, and the broader notion
of intellectual property, saying, ‘There are excellent reasons why intellectual property
rights exist and have existed in one form or another in so many places and for so long;
probably the most important is that, without them, artists and inventors lose an
extremely important incentive’ (Sanger, 2001c). Wales, for his part, responds by
attempting to limit the debate on the mailing list, writing ‘O.k., well, I’m thinking we
shouldn’t argue too much about this here on the Wikipedia list’, and saying he has made
a change to the notice that he hopes will address Crocker’s concerns (Wales, 2001d).
Clearly, ideological positions on copyright varied within the early Wikipedia.

Perhaps the most interesting opinion on copyright expressed in the debate above is
Wales’s pragmatic centrist approach. Perhaps the reason Wales was not terribly
interested in engaging in a debate over the ideology of copyright in spring 2001 is that
the collapse of the dotcom economy had convinced him traditional copyright protections
were, regardless of their philosophical foundations, a practical liability for doing
business on the internet. Hints of such a position can be found in another March 2001
mailing list post by Wales, in which he shares a Yahoo news article reporting the
problems the traditional
encyclopedia Britannica was having building online content. Wales writes, ‘Britannica’s
woes will only deepen. It’s hard to compete against free volunteer projects’ (Wales,
2001b). For Wales, long debates over copyright’s legitimacy are simply irrelevant in an
economy where ‘free’ projects will naturally outcompete ‘non-free’ competitors.

While Wales was confident the economics of the internet favored free projects like
Wikipedia over traditional content like Britannica, he elsewhere deploys concern over
unpopular corporate actors capturing Wikipedia content for their own use as a rhetorical
tool when debating Wikipedia policy. One example of this use occurred in an October 2001
Wikipedia-L thread where Wales debated Wikipedia’s interpretation of the GFDL, the
GNU-derived ‘free documentation license’ that the project then used. At the time
Wikipedia was making somewhat creative use of a clause in the GFDL that allowed authors
to specify an ‘invariant section’ re-users had to copy verbatim when redistributing the
work. While this clause had originally been intended to allow authors of software
documentation to include a stable page of author acknowledgments and other information,
Wikipedia in mid-2001 was using the ‘invariant section’ clause to ask anyone re-using
Wikipedia content to include an HTML table of links back to the original page on
Wikipedia itself. For some Wikipedia editors, this use of the GFDL represented an
over-reach. One objected that ‘the requirement is most likely in violation of the terms
of the [GFDL]’, and speculated, ‘requiring them to be in HTML seems to be violative of
the FDL as well. What if I wanted to do my website in some other markup language’
(Kissane, 2001).

Wales responds, pushing back on the interpretation of the exact permissions granted by
the GFDL (Wales, 2001c), and then posts a broader defense of requiring attribution, in
which he uses the threat of Microsoft capturing Wikipedia content and altering it as a
reason to maintain the invariant section (Wales, 2001e). However, editors are not
entirely convinced. Instead, many users argue that Wikipedia would be better off making
its method of attribution more adaptable to the needs of re-users. Another prominent
Wikipedia user at that time writes, ‘if we really want large websites to adopt Wikipedia
[…], there is absolutely no way that we can hope to dictate layout decisions to them.
Their site designers will laugh us out the door’ (Boldt, 2001). Many Wikipedia editors
are keen to give re-users more freedom and are unconcerned with corporate re-use.
Instead, they suggest that attribution is less important than the wide spread of
Wikipedia’s content to many different users and hosts.

This desire for Wikipedia content to spread to a variety of places is perhaps a
reflection of the fact that the chief source of anxiety over information control for
early Wikipedians was the Bomis company itself. This anxiety frequently surfaced over
the issue of ‘forking’, a F/OSS term for splitting a project. Forking emerges as a
concern within the F/OSS community precisely because of the way source code is treated
as a shared resource. Since no one entity ‘owns’ F/OSS code, any project member is
potentially free to take that code and use it to create their own version of the
project. Forking is seen by F/OSS community members both as a source of freedom, since
anyone unhappy with the direction of a project is free to leave and create something to
their own liking, and of danger, since splitting projects can split the pool of labor
available to accomplish tasks and lead to duplicate work and incompatible results
(Weber, 2004).

Posts to the Wikipedia-L list show early Wikipedia editors also see forking as a source
of both freedom and danger. These ideas surface frequently in conversations about the
potential for a future fork of Wikipedia. For example, in August 2001 one Wikipedia
editor posts concerns about Bomis either failing or attempting to force users to pay for
access in the future, writing, ‘I feel that while wikipedia is doing so great I’m
anxious what will it be in the future. How can we be sure that
they won’t shut down the server or make it payable [sic]. This is a commercial company
after all, not the FSF’ (Jasiutowicz, 2001). Another editor replies, reassuring, ‘the
GFDL allows you to download everything and start your own server’ (Hidders, 2001). In
another reply, an editor writes he believes that, in order to fully comply with the
GFDL, Wikipedia needs to be easier for a user to copy. He argues Wikipedia is not in
compliance with the GFDL requirement for a ‘transparent copy to be easily available’
simply because ‘spidering’ Wikipedia (using a script to follow each link on the site and
automatically download the linked page) is a possibility (Bihlmeyer, 2001).

This exchange touches off a conversation that would go on via the mailing list for weeks
and involve multiple Wikipedia editors, along with Sanger and Wales. Wales is generally
supportive of the idea of making Wikipedia’s content easier to replicate, but he hopes
those providing copies of Wikipedia will ‘do so in a “read only” way’ (Wales, 2001f).
Editor Bryce Harrington, one of the editors who was persuaded by Wales to leave the
GNUpedia project and join Wikipedia, writes ‘The reason many people got involved (at the
very least, *me*) was the willingness to hold the content under the GFDL. As I see it,
what folks are asking is simply to deliver on the promises made at the outset’
(Harrington, 2001a). In a later post, he demonstrates concern that a business failure
for Bomis could result in a loss of the Wikipedia project if forking isn’t technically
easy:

    while none of us have any wish to see Bromis [sic] go out of business, we must admit
    that these days this is not unheard of. If Bromis were to go under, and we did not
    have a backup of the site that the community could resurrect quickly and easily,
    then there is a question whether wikipedia would exist if Bromis did not
    (Harrington, 2001b).

Here, Harrington’s offhand remark that the failure of Internet companies like Bomis is
something that ‘these days […] is not unheard of’ ties his anxieties over information
control to the historical moment of the dotcom crash. In an era where companies were
frequently failing, some believed it dangerous to give even a trusted company unilateral
control over important information.

Anxiety over Bomis’s control over Wikipedia did ultimately lead to a project fork. The
2002 ‘Spanish Fork’ of Wikipedia occurred after Larry Sanger announced Bomis might start
running ads on Wikipedia to help restore his salary (which had been cut due to declining
revenue at Bomis). This fork, which is still well remembered in the Wikipedia community,
has been documented in detail elsewhere (Tkacz, 2011). However, a brief exploration of
the Spanish Fork helps demonstrate the particular anxieties over information control
that led to an actual fork of Wikipedia. Furthermore, while the Spanish Fork was not
permanent, it did influence Wikipedia’s policies in the longer term.

The unhappiness of Spanish Fork participants was based on a sense that Bomis was taking
control of information they had provided, against their wishes. Their concerns were
informed by ideas of national identity and autonomy. An interview with Enyedy conducted
in 2011 demonstrates how national identity informed his decision. In a discussion with
Nathaniel Tkacz, Enyedy discusses how tension had existed between the Spanish- and
English-language Wikipedia (which Enyedy refers to as the ‘American Wikipedia’) from the
beginning. He says that the fact that the initial software and policy pages were all in
English cast an ‘American shadow’ over the international Wikipedia projects. He further
elaborates on this tension between the ‘American’ and Spanish projects when asked about
the relationship between the Spanish Wikipedia and Larry Sanger, saying, ‘The American
Wikipedia might have seen [Sanger] as a “facilitator”, but we regarded Sanger more like
an obstacle. […] I have to admit that he brought some good ideas to us, but the American
Wikipedia was too caught up in the interests of Bomis Inc’ (Tkacz and Enyedy, 2011).
Here, Enyedy explicitly equates Sanger, ‘Bomis Inc’, and the American Wikipedia,
suggesting that allowing the ‘American’ dotcom company to control the Spanish Wikipedia
project was unacceptable. For Enyedy, then, Wikipedia is not a general, universal
example of information production, but a specific, American project.

How Wikipedia was shaped by the specific history of the American dotcom crash is further
demonstrated by the larger impact of the Spanish Fork. Ultimately, the Spanish Fork
would help to ensure Sanger’s exit from the Wikipedia project, as the ad revenue that
would have restored his salary never materialized. This happened after a larger
discussion about ads on Wikipedia. This discussion demonstrates how at least some early
Wikipedians thought about online business models in the wake of the dotcom crash. One
site of discussion was a page at meta.wikipedia.com entitled ‘Making Wikipedia
Profitable’, where a variety of editors expressed their concerns about the ad-based plan
to raise revenue for Wikipedia. One writes, ‘I just wrote a page on the dotcom death.
The only sort of financing that I believe in for a site like this is a fat, one-time
donation from someone who has too much money. Don’t tax the users with subscription fees
or advertisements. These plans are doomed’. Bryce Harrington, whom Wales had recruited
from GNUpedia, writes an extensive plea for a non-profit model for the site, in which he
argues, ‘First, I think we could generally say that if Wikipedia were to start making a
profit, some authors would feel cheated, or that the site had “sold out” and was no
longer sufficiently community-controlled’. Another editor prefaced his opposition to ads
by writing:

    I am a recent slashdot inductee, and I have to say first of all that this site seems
    to me to be a fulfillment of the real promise of the internet, which is not to make
    megabucks for corporations, but to facilitate communication and sharing of ideas,
    and to make the world a better place in the process (‘Making Wikipedia profitable’,
    2002).

All of these editors voiced the notion that adding advertisements would be unable to
generate revenue to pay for content creation and would alienate critical unpaid content
creators, a lesson they seem to have learned from watching the failed business grabs of
the dotcom era.

QUALITY ANXIETY AND WIKIPEDIA POLICY

Without ad revenue to pay his salary, Sanger departed the Wikipedia project. This solves
one problem for Wikipedia by resolving volunteer editors’ anxieties over misuse of the
information they were producing. However, it exacerbates another of the project’s
anxieties, namely concern over the ability of a ‘free’ project to create high-quality
information. Retracing how this anxiety shaped Wikipedia policy helps us to understand
how the specific history that led to Sanger’s departure had a profound effect on the
larger Wikipedia environment.

Even very early on, Wikipedians were aware of quality concerns surrounding their
project. For example, as of April 2001 the ‘Welcome Newcomers!’ page, intended to
introduce new users to the project, included the following language:

    Maybe you think that Wikipedia would end up being a rather low-quality product,
    since it’s open to everyone. But perhaps it’s the fact that it is open to everyone
    that makes a lot of these articles pretty good, and ever-improving. To alter a
    now-famous catchphrase: ‘Given enough eyeballs, all errors are shallow.’ We tend to
    cater to the highest common denominator – ‘lower denominators’ tend politely not to
    touch articles they know nothing about! There are a lot of Ph.D.’s and graduate
    students and other very smart and knowledgeable people at work here – but everyone
    is welcome. (Wikipedia: Welcome, newcomers, 2001)

The ‘Welcome Newcomers’ page displays a curious tension, attempting to at once reassure
users Wikipedia is a quality product, one that ‘cater[s] to the highest common
denominator’, while at the same time reassuring them ‘everyone is welcome’ to
contribute.

One of the biggest tools allowing Wikipedia to navigate this tension is its
‘Verifiability’ policy. This policy, which grows out of language contained in the
Neutral Point of View policy, is summarized in its current version: ‘in Wikipedia,
verifiability means that other people using the encyclopedia can check that the
information comes from a reliable source. Wikipedia does not publish original research.
Its content is determined by previously published information rather than the beliefs or
experiences of its editors’ (Wikipedia, 2017). This policy allows Wikipedia to accept
contributions from a wide range of participants, as its F/OSS-informed model demands,
without allowing those participants total authority to decide on truth for the
encyclopedia. As we will see in the next section, while the policy allows Wikipedia to
function in the face of its quality anxiety, it also influences what sorts of truth
claims have validity within the project.

THE GAZA WAR ARTICLE AS A CASE STUDY IN THE INFLUENCE OF ‘VERIFIABILITY’

Wikipedia’s Verifiability policy (Wikipedia shorthand WP:V) continues to play an
important role in the process of producing information as practiced on Wikipedia. Since
WP:V flows from Wikipedia’s attempt to resolve the problem of creating a reliable
resource without centralized control, the way WP:V shapes Wikipedia’s process can be
seen as a part of the continuing influence of its particular F/OSS-inflected history.

To better understand how WP:V shapes Wikipedia, we can examine how it was deployed in
the case of the article documenting the 2008 Gaza War. This article provides a useful
case study for investigating the policy environment of the mature Wikipedia project. In
particular, the fraught environment for the Gaza War article demonstrates the
compromises and negotiations made by Wikipedia editors dealing with contested subject
matter. The section below will demonstrate how use of WP:V allows editors to defer
questions of truth onto relatively simpler and more resolvable questions about existing
accounts in secondary sources, while also reinforcing existing power inequalities within
Wikipedia. In particular, the case study demonstrates how Wikipedia finds compromises
when dealing with real-world conflicts, but also how ‘Western’ sources of information
are privileged by WP:V.

One example demonstrating how WP:V allows editors to reach compromises was the
incredibly fraught debate over the listing of ‘civilian’ casualties. In the case of this
article, the problem of defining who was and was not a ‘civilian’ casualty of the
conflict was a seemingly irreconcilable issue, and one where Wikipedia was vulnerable to
accusations of inaccuracy. Questions over defining civilian casualties first surface in
a post to the article’s talk page dated December 29, 2008. In this post one editor
challenges the way civilian casualties of the conflict are then being counted:

    It seems to me like somebody is trying to put a spin on things by stating that there
    are ‘29+ civilians’ dead among the 287 on the basis of an ABC article stating that
    among those dead, at least 20 were children and 9 were women. That is quite biased,
    and should be changed ASAP. (Talk:Gaza War/Archive 1 – Wikipedia, the free
    encyclopedia, 2009).

We can see in this quote two related questions editors of the Gaza War article would
return to frequently. First there is a methodological question: how should civilians be
counted? Can civilian casualty counts be inferred by counting some other demographic
category, such as women or children, or is some other method needed? Second, and perhaps
more difficult, is what could be called a question of ontology. Given the complicated
political status of Gaza, the question of where the line between civilian and combatant
should be drawn in this conflict is one
Wikipedia editors find themselves unable to resolve. Two arguments made by editors early in the article demonstrate just how difficult it would be for Wikipedia editors to attempt to reconcile definitions of ‘civilian’ based on ‘real-world’ ethical and legal considerations. One editor, Wikifan12345, argues that Hamas’s decentralized nature blurs the line between civilian and military casualties. He writes, ‘Hamas runs like a terrorist organization, in the sense that support comes from many locales, homes, libraries, things that we consider ordinary is often used there to conceal weapons/soldiers/etc.’ (Talk:Gaza War/Archive 4 – Wikipedia, the free encyclopedia, 2009), and thus he feels the counts of ‘civilians’ being made by NGOs and Palestinian sources have been unfairly inflated. In contrast, another editor, NonZionist, argues, ‘Almost ALL Palestinians are civilians. If my home is invaded and I attempt to defend my family, do I lose my civilian status? International law recognizes the right of people under occupation to RESIST’ (Talk:Gaza War/Archive 4 – Wikipedia, the free encyclopedia, 2009). Other editors quickly point out that these arguments are irrelevant in the face of WP:V, yet heated debate over them grows to involve several other editors and many more threads of conversation. Despite repeated pleas, neither Wikifan12345 nor NonZionist has very much luck in swaying their fellow editors to define ‘civilians’ in the way that they see best. In one particularly telling moment, editor Nableezy tells NonZionist:

NonZionist, I completely agree with you, but for the purposes of this article, unless we can find a reliable source, and probably people would want multiple sources, that make this point, there is not really any way of doing this. But as a philosophical discussion, I do agree that in an occupying force killing somebody, regardless of the situation, would best be described as killing a victim of occupation. (Talk:Gaza War/Archive 15 – Wikipedia, the free encyclopedia, 2009)

Here we can see how Nableezy articulates a separate moral space for the Wikipedia community, distinct from one’s moral sense of real-world conflict. His or her own moral values say, ‘an occupying force killing somebody, regardless of the situation, would best be described as killing a victim of occupation’, but Nableezy stresses the need for information included on Wikipedia to meet the shared values of the community, especially the value placed on reliable sources. Unlike Nableezy’s articulation of shared values, NonZionist and Wikifan12345 continue to advocate on behalf of their own values, to little avail. Both eventually stop trying to influence the course of this article. The cases of both of these editors speak to the apparent difficulty of being an advocate for a particular set of moral values on Wikipedia, without taking into consideration the shared values of the Wikipedia community itself, in particular the shared values embedded in WP:V. This policy also serves to project matters of values onto sources. These sources, however, become a source of values and a contested terrain for editors in and of themselves.

RELIABLE SOURCES AS CONTESTED TERRAIN

This deferral of the difficult questions of truth onto sources, while clearly a useful way for Wikipedia editors to build effective compromises on controversial matters, also raises difficult questions of its own. Namely, the question of what does, and does not, constitute a ‘reliable’ source. Editors challenged the reliability of sources cited by others on a regular basis. In addition to the debate over the PCHR (Palestinian Center for Human Rights) and IDF (Israel Defense Forces) sources used for casualty count numbers discussed in the previous section, heated discussion took place over the use of such sources as a statement by the official spokesperson of the Palestinian Popular Resistance Committees (Talk:Gaza War/Archive 1 – Wikipedia, the free encyclopedia, 2009), the American Center for Law
WIKIPEDIA 325

and Justice (Talk:Gaza War/Archive 6 – Wikipedia, the free encyclopedia, 2009), the Pakistani newspaper website Dawn.com (Talk:Gaza War/Archive 6 – Wikipedia, the free encyclopedia, 2009), and the activist website antiwar.com (Talk:Gaza War/Archive 16 – Wikipedia, the free encyclopedia, 2009).

Over the course of these exchanges a pattern emerges. Both Palestinian and Israeli advocates argue against sources from the opposite side that they feel are too extreme to be reliable. For example, Wikifan12345 objects to complaints that the article cites too many Western sources by arguing, ‘Arab and Palestinian media is a sham at best, and I’m being generous here’ (Talk:Gaza War/Archive 4 – Wikipedia, the free encyclopedia, 2009). In another instance, an editor objects to the fact that, as he or she sees it, a section on ‘Palestinian Militant Activity’ is slanted, since ‘almost every piece of information here can be traced back to IDF sources’, and argues that ‘we need to use more neutral sources’ (Talk:Gaza War/Archive 29 – Wikipedia, the free encyclopedia, 2009). A third group of editors positions itself as being between the two sides, and advocates for pluralism and inclusiveness in Wikipedia sourcing. One such editor articulates his understanding of reliable sources as:

A reliable source can be partisan and non-neutral […] The more controversy around a topic, the more need for verifiability. Hence, there needs to be more sources and more variety of POV in sources. (Talk:Gaza War/Archive 1 – Wikipedia, the free encyclopedia, 2009)

Given these three positions, what is the actual outcome of the contest over ‘reliable sources’ for the Gaza War article? Because the lines of conflict here are closely tied to national identity, as well as to the larger conflicts involving the supra-national formation of ‘the Western world’, investigating which countries are represented by sources cited in the Gaza War article might help to establish the relative success of the pro-Israel, pro-Palestine, and pro-plurality factions in shaping the article. Figure 21.2 displays the countries of origin of these sources in graphic form. As the graph shows quite clearly, the overwhelming majority of the sources for this article were attributable to Israel, the US, and the UK. Sources from occupied Palestine itself were very rare.

There are several factors that may account for this disparity between ‘Western’ and

Figure 21.2 Citations in Gaza War article by country of origin.


Source: Wikipedia, the free encyclopedia (2010)

‘non-Western’ sources in the composition of this article. The first is that the English Wikipedia’s Verifiability policy favors sources in English, as this renders them easier for the majority of English Wikipedia editors to check up on. The second is the advantage Western sources enjoy in material conditions, internet access, and distribution, which was noted by Wikipedia editors themselves during their discussion of sources on several occasions. Finally, in the face of consistent and often very hostile opposition to non-Western sources as demonstrated by the actions of Wikifan12345 and others, described above, editors may have chosen to cite Western sources for facts they wished to include, rather than go through the trouble of making the case for inclusion of non-Western media.

In any event, this lopsided inclusion of Western sources undermines the expansive, pluralist position advocated by some editors for Wikipedia sources. It suggests strongly that, in this way at least, Wikipedia’s content policies³, policies shaped by Wikipedia’s need to manage anxieties over the quality of ‘free’ information, make the site somewhat inherently conservative, bound to reflect historical inequalities of access and media power as it defers difficult questions of ‘truth’ to its sources.

CONCLUSION: A HISTORICALLY SPECIFIC AND CONTINGENT WIKIPEDIA

While the above is hardly an exhaustive survey of the historical conditions that shaped the creation of Wikipedia, the evidence presented establishes Wikipedia’s historically specific and contingent status. Rather than an example of a general trend in the production of information in our internet-connected society, Wikipedia is the outcome of a specific confluence of ideas and labor coming from the F/OSS community in the era following the dotcom crash.

In particular, I have shown how the anxiety over centralized information control in the post-dotcom moment gave Wikipedia’s early volunteer editors tremendous leverage in shaping the early Wikipedia. The early editor community was able to use this leverage, via the Spanish Fork, to resist the imposition of advertising on their work. While this editor autonomy allowed Wikipedia to become a non-profit information source, it also meant that the initial, heavily male, editor population was able to ‘self-govern’ in ways that may have continued the ‘locker room’ atmosphere reported by some early female Wikipedians, and that may still be encountered on the site today.

Furthermore, the resistance to adding advertisements to Wikipedia also meant removing Larry Sanger from his paid role as chief editor of the project. This removal exacerbated already existing anxieties about the quality of information Wikipedia could produce. As we saw in the case of the 2008 Gaza War article, while Wikipedia is able to successfully handle these anxieties via policies like Verifiability, these same policies tend to privilege ‘Western’ information sources within the project.

Thus, we can see how Wikipedia’s F/OSS inheritance creates the particular encyclopedia project we see today. A better understanding of the complex and contingent history of Wikipedia should allow us to better understand the still living and evolving community of Wikipedia going forward.

Notes

1 GNU, a recursive acronym that stands for ‘Gnu’s Not Unix’, is the name given to Stallman’s attempt to build a free clone of the Unix operating system.

2 The GNUpedia project was a short-lived attempt to build a free encyclopedia. While it’s impossible to know what form this project would have ultimately taken, had it survived, it was understood by its volunteers as fundamentally an encyclopedia project.

3 These content policies include WP:V, which we have been discussing, but also the Neutral Point of View (NPOV) and No Original Research (NOR) policies, along with other policies and guidelines designed to shape Wikipedia’s content.

REFERENCES

Alexa Top 500 Global Sites (n.d.) Available at: http://www.alexa.com/topsites (accessed 13 August 2017).
All Things Considered (2008) Palin’s Wikipedia Entry Gets Overhaul. NPR. Available at: http://www.npr.org/templates/story/story.php?storyId=94118849.
Benkler Y. (2006) The Wealth of Networks: How Social Production Transforms Markets and Freedom. New Haven, CT: Yale University Press.
Bihlmeyer R. (2001) [Wikipedia-l] Wikipedia teamwork. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-August/000316.html (accessed 13 August 2017).
Boldt A. (2001) [Wikipedia-l] GFDL and Wikipedia, II. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-October/000644.html (accessed 13 August 2017).
Bould M.D., Hladkowicz E.S., Pigford A.-A.E., et al. (2014) References that anyone can edit: Review of Wikipedia citations in peer reviewed health science literature. BMJ 348: g1585. DOI: 10.1136/bmj.g1585.
Cohen N. (2007) A History Department Bans Citing Wikipedia as a Research Source. The New York Times, 21 February. Available at: https://www.nytimes.com/2007/02/21/education/21wikipedia.html (accessed 13 August 2017).
Coleman E.G. (2012) Coding Freedom: The Ethics and Aesthetics of Hacking. Princeton, NJ: Princeton University Press.
Crocker L. (2001) [Wikipedia-l] ‘DO NOT STEAL’. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-March/000038.html (accessed 13 August 2017).
Ekstrand V.S., Famiglietti A., and Nicole C. (2013) The Intensification of Copyright: Critical Legal Activism in the Age of Digital Copyright. IDEA 53: 291.
Enyedy E. (n.d.) [Intlwiki-l] Good luck with your wikiPAIDia. In: Intlwiki-l. Available at: http://git.net/ml/science.linguistics.wikipedia.international/2002-02/msg00038.html (accessed 24 August 2017).
Famiglietti A. (2011) The right to fork: A historical survey of de/centralization in Wikipedia. In: Lovink G. and Tkacz N. eds. Critical Point of View: A Wikipedia Reader. Amsterdam: Institute of Network Cultures, pp. 296–308.
Ghosh R. (ed.) (2006) CODE: Collaborative Ownership and the Digital Economy. Cambridge, MA: The MIT Press.
Ghosh R.A., Glott R., Krieger B., et al. (2002) Free/libre and open source software: Survey and study. Part iv: ‘Survey of developers’. Available at: http://www.infonomics.nl/FLOSS/report/FLOSS_Final4.pdf. Available at: http://www.math.unipd.it/~bellio/FLOSS%20Final%20Report%20-%20Part%204%20-%20Survey%20of%20Developers.pdf.
Harrington B. (2001a) [Wikipedia-l] Wikipedia teamwork. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-August/000325.html (accessed 13 August 2017).
Harrington B. (2001b) [Wikipedia-l] Wikipedia teamwork. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-August/000326.html (accessed 13 August 2017).
Hidders J. (2001) [Wikipedia-l] Wikipedia teamwork. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-August/000312.html (accessed 13 August 2017).
Jasiutowicz K.P. (2001) [Wikipedia-l] Wikipedia teamwork. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-August/000306.html (accessed 13 August 2017).
Kelty C. (2008) Two Bits: The Cultural Significance of Free Software. Durham, NC: Duke University Press.
Kissane S. (2001) [Wikipedia-l] GNU FDL & HTML Table Requirement. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-October/000627.html (accessed 13 August 2017).
Lessig L. (c.1999) Code: and other laws of cyberspace. New York, NY: Basic Books.
Lessig L. (2004) Free Culture: How Big Media Uses Technology and the Law to Lock down Culture and Control Creativity. New York: Penguin Press.
Levy S. (1984) Hackers: Heroes of the computer revolution. 1st ed. Garden City, NY: Anchor Press/Doubleday.
Making Wikipedia profitable: encyclopedia article from Wikipedia. (2002, March 2). Available at: http://web.archive.org/web/20020302113625/meta.wikipedia.com/wiki.phtml?title=Making+Wikipedia+profitable (accessed 13 August 2017).
Nupedia and Project Gutenberg Directors Answer – Slashdot (n.d.) Available at: https://news.slashdot.org/story/01/03/02/1422244/nupedia-and-project-gutenberg-directors-answer (accessed 18 August 2017).
Owen G. (2001) [Wikipedia-l] Copyright question. In: Wikipedia-L. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-June/000227.html (accessed 13 August 2017).
Peoples L.F. (2009) The citation of Wikipedia in judicial opinions. Yale JL & Tech 12: 1.
Raymond E.S. (2000) The Cathedral and the Bazaar. Available at: http://catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ (accessed 25 June 2010).
Reagle J. (2012) ‘Free as in sexist?’ Free culture and the gender gap. First Monday 18(1). DOI: 10.5210/fm.v18i1.4291.
Rybo S. (2001) [Wikipedia-l] Copyright forking. In: Wikipedia-L. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-June/000171.html (accessed 13 August 2017).
Sanger L. (2001a) [Nupedia-l] Let’s make a wiki. In: Nupedia-l. Available at: http://web.archive.org/web/20030414014355/http://www.nupedia.com:80/pipermail/nupedia-l/2001-January/000676.html (accessed 13 August 2017).
Sanger L. (2001b) [Wikipedia-l] License question. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-June/000207.html (accessed 13 August 2017).
Sanger L. (2001c) [Wikipedia-l] Response. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-March/000043.html (accessed 13 August 2017).
Sanger L. (2001d) [Wikipedia-l] Slashdotted? Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-March/000025.html (accessed 13 August 2017).
Shirky C. (2012) Gin, Television and Social Surplus. In: Mandiberg M. ed. The Social Media Reader. New York: New York University Press.
Stallman R.M. (2002) Free Software, Free Society: Selected Essays of Richard M. Stallman. First Printing, First Edition. Gay J. (ed.). Free Software Foundation.
T. L. (2002) [Wikipedia-l] Another copyright issue. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2002-January/001087.html (accessed 13 August 2017).
Talk:Gaza War/Archive 1 – Wikipedia, the free encyclopedia (2009) Available at: http://en.wikipedia.org/wiki/Talk:Gaza_War/Archive_1 (accessed 15 July 2010).
Talk:Gaza War/Archive 4 – Wikipedia, the free encyclopedia (2009) Available at: http://en.wikipedia.org/wiki/Talk:Gaza_War/Archive_4 (accessed 15 July 2010).
Talk:Gaza War/Archive 6 – Wikipedia, the free encyclopedia (2009) Available at: http://en.wikipedia.org/wiki/Talk:Gaza_War/Archive_6 (accessed 15 July 2010).
Talk:Gaza War/Archive 15 – Wikipedia, the free encyclopedia (2009) Available at: http://en.wikipedia.org/wiki/Talk:Gaza_War/Archive_15 (accessed 15 July 2010).
Talk:Gaza War/Archive 16 – Wikipedia, the free encyclopedia (2009) Available at: http://en.wikipedia.org/wiki/Talk:Gaza_War/Archive_16 (accessed 15 July 2010).
Talk:Gaza War/Archive 29 – Wikipedia, the free encyclopedia (2009) Available at: http://en.wikipedia.org/wiki/Talk:Gaza_War/Archive_29 (accessed 15 July 2010).
Tapscott D. and Williams A.D. (2010) Wikinomics: How Mass Collaboration Changes Everything. Expanded edition. New York, NY: Portfolio.
Tkacz N. (2011) The Politics of Forking Paths. In: Lovink G. and Tkacz N. eds. Critical Point of View: A Wikipedia Reader. Amsterdam: Institute of Network Cultures, pp. 94–109.
Tkacz N. and Enyedy E. (2011) ‘Good Luck with your wikiPAIDia’: Reflections on the 2002 Fork of the Spanish Wikipedia. An interview with Edgar Enyedy. In: Lovink G. and Tkacz N. eds. Critical Point of View: A Wikipedia Reader. Amsterdam: Institute of Network Cultures, pp. 94–109.
Torvalds L. (2001) Just for Fun: The Story of an Accidental Revolutionary. 1st ed. Diamond D. ed. New York, NY: HarperBusiness.
Turner F. (2006) From Counterculture to Cyberculture: Stewart Brand, the Whole Earth Network, and the Rise of Digital Utopianism. Chicago, IL: University of Chicago Press.
Wales J. (2001a) Re: [Bug-gnupedia] Nupedia. In: Bug-Gnupedia. Available at: http://lists.gnu.org/archive/html/bug-gne/2001-01/msg00108.html (accessed 7 June 2010).
Wales J. (2001b) [Wikipedia-l] Britannica news. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-March/000044.html (accessed 13 August 2017).
Wales J. (2001c) [Wikipedia-l] GNU FDL & HTML Table Requirement. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-October/000629.html (accessed 13 August 2017).
Wales J. (2001d) [Wikipedia-l] Response. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-March/000041.html (accessed 13 August 2017).
Wales J. (2001e) [Wikipedia-l] Why an attribution requirement? Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-October/000630.html (accessed 13 August 2017).
Wales J. (2001f) [Wikipedia-l] Wikipedia teamwork. Available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-August/000357.html (accessed 13 August 2017).
Weber S. (2004) The Success of Open Source. Cambridge, MA: Harvard University Press.
Wikipedia (2017) Wikipedia:Verifiability. Available at: https://en.wikipedia.org/w/index.php?title=Wikipedia:Verifiability&oldid=795820786 (accessed 13 August 2017).
Wikipedia, the free encyclopedia (2010) Gaza War (2008-2009). Available at: https://en.wikipedia.org/w/index.php?title=Gaza_War_(2008%E2%80%9309)&oldid=376497374 (accessed 15 July 2010).
Wikipedia: Welcome, newcomers (2001) Available at: https://web.archive.org/web/20010406105416/http://www.wikipedia.com:80/wiki/Welcome,_newcomers (accessed 13 August 2017).
Zittrain J. (2008) The Future of the Internet – And How to Stop It. New Haven, CT: Yale University Press.
22
A Critical Political Economy of
Web Advertising History
Matthew Crain

The rise of web advertising over the past two and a half decades has been meteoric. Global online ad spending has risen steadily since the World Wide Web’s creation, proving resilient in the face of two financial crises and generally tepid economic growth. Consulting firm McKinsey & Company (2015) predicts that ‘digital media’, which includes the web and mobile platforms, will account for more than 50% of worldwide advertising spending by 2019. The web has been the primary carrier of digital advertising and its expansion has been accompanied by a great build-up of consumer data collection capacity. For a majority of users, pervasive advertising and monitoring are now default components of web engagement.

As internet access continues its uneven proliferation, web advertising seems to grow apace. But why is this so? The web was created as an information retrieval tool and released into the public domain in the hope that it might become a ‘universal medium for sharing information’ (Berners-Lee and Fischetti, 1999: 84). It was designed to be open-ended, but was hardly optimized to serve the marketing needs of business. Support for advertising was not a standard feature of web technology, nor was it particularly welcome within early web cultures. Yet today the web is saturated in commercial messaging and significant efforts are ongoing to enhance and extend digital advertising capabilities. Is it simply the natural state of affairs that the web’s diffusion entails integration within advertising systems? A look at the origins of web advertising in the United States suggests otherwise. The capacities for advertising and consumer monitoring had to be constructed along technical, but also political economic, lines.

Using the United States as a case study, this chapter outlines the history of web advertising from a critical political economy of media (CPE) approach. While web advertising does not have a sole country of origin, US companies, in partnership with the federal government, were among the first to
A CRITICAL POLITICAL ECONOMY OF WEB ADVERTISING HISTORY 331

bring advertising to ‘cyberspace’. As part of the broader privatization of the internet, policy-makers sought to position American businesses at the forefront of the web’s global commercial expansion. The US government embraced a hands-off regulatory approach to web advertising, hoping to bolster the industry’s early growth. Throughout the 1990s, US companies developed leading technologies, standards, and practices that brought fringe web advertising markets into the mainstream. At the time of this writing, US-based transnational corporations dominate the global digital advertising sector. At the forefront are Google and Facebook, both of which have come to stand among the world’s most valuable companies by pushing the technical and political boundaries of web advertising and consumer surveillance. In so doing, they have helped to solidify advertising’s place at the center of the digital media economy. CPE provides valuable insights into the historical roots of this state of affairs.

A CRITICAL POLITICAL ECONOMY APPROACH

One of the primary aims of CPE is to clarify how media and communications systems work in relation to larger structures of political and economic power (Hardy, 2014; Mosco, 2009; Wasko et al., 2011; Winseck and Jin, 2011). Historical analysis is foundational to this effort because it denaturalizes prevailing institutional arrangements and social relations, showing the structural forces and human agency at work in the construction of media systems. In her classic study of social construction of technology, Carolyn Marvin (1988) unpacks the history of electronic communication by taking readers back to the moments When Old Technologies Were New. In a complementary fashion, CPE posits that it is also necessary to consider what is ‘old’ about new technologies.

Foremost among what is old about web advertising is that it grew out of the century-in-the-making interdependency of media and marketing within capitalism. To adapt a term from John Sinclair (2016), a ‘marketing/media complex’ emerged in the late nineteenth century as manufacturers, retailers, advertising agencies, and commercial media outlets found common interest in building national consumer markets. These entities grew symbiotic as markets matured and advertising became a cornerstone of corporate strategy. In increasingly prevalent oligopoly scenarios, advertising functioned as a barrier to would-be competitors and a means of brand maintenance. Advertising expenditures grew rather quickly to account for around 2% of US GDP and have remained relatively stable ever since. Large swaths of the media sector became reliant on advertising revenues and, on the whole, business was good. Media empires were forged as advertising became a ‘leading edge of global consumerism’ (Schiller, 1969: 13), serving the ideological and market-building needs of an astonishingly productive corporate industrial economy.

In a word, advertising became integral to industrial capitalism and evolved in relation to its overarching political economic currents. A rich CPE literature chronicles these developments, unearthing the contested processes whereby marketing imperatives came to govern the structure and content of successive media systems and highlighting attendant social problems including constraints on journalism, class bias of media fare, and deepening commercialism (Baldasty, 1992; McChesney, 1993; Ohmann, 1996). From this perspective, the history of mass media is intertwined with the history of creating large markets for consumer goods and services. Advertising took a variety of forms, but mass marketing became the prevailing strategy in alignment with the affordances of industrial printing and broadcasting technologies, i.e. mass communication. As new information communication technologies developed, particularly computers and advanced

telecommunications networks, the marketing/media complex responded with renewed dynamism, seeking to exploit emerging business opportunities and evade destructive competition. This lineage is the starting point for a CPE analysis of web advertising history.

Public policy is a focal point for CPE, which emphasizes the central role of politics in media development. Successive media and advertising systems have been heavily shaped by the formative policy decisions that Paul Starr (2004) calls ‘constitutive choices’. For example, legislation, regulation, and government subsidy were foundational to the establishment of commercial broadcasting in the United States, particularly in the form of the Radio Act of 1927 and Communications Act of 1934. It was the Federal Radio Commission/Federal Communications Commission, at the behest of Congress and with executive branch backing, that ‘cleared the dial’ of many public and non-profit broadcasters to give exclusive licenses (for free) to a commercial broadcast oligopoly owned by some of the nation’s most powerful technology companies. Early policies often have structuring ‘path dependence’ effects on subsequent system development. Television’s brisk subsumption by commercial radio broadcasters is one example, though CPE scholars point out that early commercial broadcasting was highly controversial, as evidenced not only by organized citizen opposition, but also by the decisions of peer nations like Great Britain to reject advertising and establish alternative public broadcasting models (McChesney, 1993). CPE attends to such complexities by looking for moments of contestation and putting policy-making to questions of ‘for whom and for what’ (Schiller, 1978).

There are numerous examples of the US government’s historical stewardship of the marketing/media complex, from subsidizing basic communication technology research to chameleon-like public interest regulations to a tax code that allows companies to write off advertising as a business expense. The takeaway here is that public policy has always been fundamental to media system development and that, despite strong structural pressures towards commercialization, there are real political choices to be made, especially during a platform’s formative years (McChesney, 2007; Starr, 2004). The World Wide Web is no exception. Web advertising’s history is in many ways the story of the internet’s assimilation into the capitalist political economy. At the same time, important elements of web advertising’s construction can be attributed to variously contested policy choices, rather than inevitable technological advance or market predestination. The balance of this chapter demonstrates the CPE approach by highlighting how public policy-making, financial investment, and the structural imperatives of capitalism shaped the web’s formative moments, drove the rapid build-up of online advertising, and propelled consumer monitoring.

FOUR ‘STAGES’ OF WEB ADVERTISING

The history of web advertising in the United States can be mapped into four cascading stages of development: electronic billboards, ad networks, search advertising, and surveillance advertising. Of course, reality has a habit of being too complex to fit neatly into distinct categories. There was and remains a great deal of experimentation, investment, and conflict within the institutions, technologies, and practices of web advertising. A new historical ‘stage’ does not come along and simply replace its predecessor. Instead, it is useful to think of these categories as trajectories, progressing in varying degrees of overlap, sometimes in opposition, but generally in an additive fashion. Tim Berners-Lee (1999) noted that the web’s technical protocols were established by means of ‘accretion’. The same is true of web advertising in that contemporary practices reflect an amalgamation of prior developments.

The first three stages come from the United States’ pre-broadband era, roughly the mid-1990s to the mid-2000s, during which the economy was overtaken by a massive boom and bust of speculative investment that centered on information and communication technologies: the dotcom bubble. The web advertising trends of this period have since converged around the collection and exchange of consumer information for application to a wide range of digital marketing activities. Terms like ‘one-to-one marketing’ and ‘big data’ have been used to describe such practices, which signify the fourth and current stage of development. I use the term surveillance advertising to emphasize that targeted messaging and consumer profiling are now at the core of digital advertising. Surveillance also suggests a power imbalance between the watched and the watchers that reflects a troubling disparity of control over contemporary advertising data practices.

ELECTRONIC BILLBOARDS AND CORPORATE HOME PAGES

By the time the web came on the scene in the early 1990s, the multi-faceted privatization of the larger internet was well underway (Abbate, 1999; Greenstein, 2017). In the midst of a recession, policy-makers at the highest levels of government sought to catalyze economic growth through privatizing and deregulating finance and telecommunications. There was bi-partisan support among policy-makers for the commercial development of what was often called the ‘information superhighway’. President Bill Clinton’s administration, taking power in 1993, made private-sector investment and control the cornerstone of federal internet policy, which enabled web advertising to flourish. Major technology and media companies were afforded high-level access to policy-making processes and were ultimately given broad leeway to develop the web as they saw fit. This extended to the advertising industry, members of which quite literally wrote a laissez-faire approach to web advertising into the administration’s 1997 internet policy manifesto, A Framework for Global Electronic Commerce. Notwithstanding exceptions concerning encryption and regulation of ‘indecent’ content, the government made good on its promise to ‘let the private sector lead’. This established a baseline of so-called ‘self-regulation’ for the web advertising industry, a regulatory approach Des Freedman (2014) describes as ‘negative policy’, a form of non-intervention where the private sector charts its own course relatively free from public oversight. These measures fell under the presiding logic of what scholars have variously described as ‘marketization’ (Hesmondhalgh, 2013), ‘corporate libertarianism’ (Pickard, 2015), and, more broadly, ‘neoliberalism’ (Harvey, 2005).

Despite a favorable policy environment, online advertising did not advance smoothly. The first web advertisement is usually attributed to the online tech magazine HotWired.com in the fall of 1994; however, marketers had been experimenting with older ‘interactive services’ for at least a decade (Mosco, 1982). Though limited in scope, commercial messaging appeared on early data transmission systems like teletext and videotex, bulletin board services like Usenet, and to a greater extent on commercial online services such as CompuServe, Prodigy, and America Online (AOL). For our purposes, all of these efforts fall under the electronic billboard stage, whereby primarily static ad messages were placed in front of audiences as they navigated through content. The most prevalent format was the banner ad, known in the industry as ‘display advertising’ because it mixed text and graphical elements in a manner similar to print and outdoor advertising. But web banners went beyond existing forms by adding layers of interactivity, the most notable of which was the click-through function. HotWired.com’s famous first banner was a partnership with AT&T that read: ‘Have you
334 THE SAGE HANDBOOK OF WEB HISTORY

ever clicked your mouse right HERE? You will’. Users who clicked were transported to AT&T’s website, which, along with sparse information about long-distance telephone services, featured hyperlinks to a handful of websites created by fine art museums.

The unpolished and scattershot nature of early banners and corporate sites reflected the medium’s unfamiliarity, but also exposed the ambivalence among marketers regarding the web’s utility as an ad channel. The broader ‘information superhighway’ was still shaking out and it was by no means certain that the web would prevail over competing systems such as AOL’s ‘walled garden’ online service or the cable industry’s pilot programs for ‘interactive television’. As a result, very few marketers spent any money on the web in the mid 1990s and those that did only carved out a fraction of their ad budgets to test the waters (Turow, 2006). A handful of ‘digital ad agencies’ cropped up to help marketers experiment on the web, but many traditional agencies remained cautious about the new interactive landscape. In 1995 web ad spending barely registered on the scale compared with more established media, but rapid growth was just around the corner.

AD NETWORKS AND THE DOTCOM BUBBLE

By 1996 it was clear that the web would emerge as the winning interactive platform for popular use, due in no small part to Netscape’s ‘killer app’, the graphical web browser, and to competitive internet service provision markets. Commercial online services were compelled to open their walled gardens, giving millions of users new access to the open web. The dotcom financial bubble funded a host of web start-ups seeking to draw users to their sites, which generated the first big wave of demand for web advertising (Crain, 2014). But the young industry was plagued by logistical problems. There was an absence of standard business practices to grease the wheels of ad sales. Web publishers lacked sales staff and technical expertise to implement banner campaigns. For marketers, it was difficult to reach users at scale and to measure the impact of advertising outlays. Despite attempts from television ratings companies like Nielsen to establish audience metrics systems on the new medium, it proved difficult to build a consensus about how ads should be bought, sold, and evaluated.

Finding opportunity in this disorder, a new breed of advertising company emerged: the ad network. Blending well-established practices of ad sales outsourcing with the web’s capacity for multi-directional communication, ad networks positioned themselves as intermediaries between web publishers looking to sell ad inventory and marketers seeking sizable audiences. The ‘third-party’ ad network strategy relied on centralized ad serving systems to manage banner delivery across bundles of disparate websites, an innovation enabled by the distributed nature of the web’s communication protocols. Web publishers could use their own servers to host content, while ad networks hosted and delivered the ads from afar. By building their own distribution infrastructure, ad networks offered publishers fully outsourced advertising services, easing the burdens of labor and technical expertise and effectively lowering barriers to participation in the web advertising market. Various iterations of outsourced ad services proliferated and were utilized by most major publishers, from start-ups like Yahoo to established media companies like NBC and the Wall Street Journal. Leading ad networks such as DoubleClick and MatchLogic were able to aggregate far more users than any single publisher and thereby brought the first iteration of large-scale advertising to the web.

These logistical improvements were significant, but the industry had still other equally vexing problems. As banners spread, their novelty quickly wore thin. It was something of an open secret among publishers and ad networks that the vast majority of
A CRITICAL POLITICAL ECONOMY OF WEB ADVERTISING HISTORY 335

users never clicked on ads. This was especially troublesome because much of the hype surrounding the web’s commercialization hinged upon its interactivity, which was supposed to enable marketers to engage consumers directly rather than simply shout in their general direction. Without interactivity the low-bandwidth web seemed a poor substitute for existing branding platforms like television. As marketers began to complain about dismal click-through rates, a flurry of activity centered on ways to move ‘beyond the banner’. There were attempts to jazz up ads with ‘rich media’ experiences and pop-up formats that were harder to ignore, but the idea that gained the most traction was that ads simply needed to be more ‘relevant’ to consumers. Through much trial and error, greater personalization of messaging was positioned as a solution for making advertising work on the web. Of course, these efforts required increased knowledge about web users, which dovetailed with emerging needs for data collection and user identification in the nascent online retailing and banking sectors.

The web’s broader commercialization impelled its transformation from an anonymous to an identifying platform. Without delving too deeply into the technical details, the web’s data protocols had originally been designed to facilitate series of discrete communications, rather than persistent connections. This made web browsing anonymous, but limited the scope of applications, especially those of a commercial nature. For example, in order for online shopping to function, websites had to recognize that a given series of actions (like putting items into a virtual shopping cart) was connected to a single user. The commercial web needed the ability to collect and store user data. It needed a memory. Netscape developed an elegant solution in the HTTP cookie, which gave web browsers a unique identifier and enabled a new frontier of data collection practices. Released as an open technical standard, cookies were rapidly adopted by major browser makers and websites.

The web advertising industry seized upon cookies as a means to gather information about consumers to inform ad targeting. Though disquiet persisted regarding whether clicks or impressions were the most appropriate metrics, it quickly became standard practice to pair tracking cookies with banner ad delivery. Ad networks led this charge as they sought to leverage the scale of their distribution networks to offer new forms of targeted banner advertising across their partner sites. To achieve these goals ad networks developed proprietary ad serving technologies that used databases and algorithms to store, combine, and deploy consumer data for targeted advertising. As early as 1997, DoubleClick’s DART (Dynamic Advertising, Reporting, and Targeting) system could serve targeted ads in near real-time by cross-referencing its profile databases with information collected on the fly. The company’s tagline during this period spoke of delivering the ‘right message to the right person at the right time’. It is important to note that the data collection and ad targeting practices implemented in the 1990s were rudimentary by today’s standards. Information gathering was largely limited to standard browser meta-data like IP addresses and time stamps, which could be strung together to create records of browsing history, but were bounded by a range of technical and organizational factors.

Nonetheless, the ad network stage represented web advertising’s first generational leap. Early ad networks solved basic logistical problems and pioneered not only targeted advertising, but targeted advertising at scale in which every ad served was also an opportunity to gather consumer information. Moreover, since third-party tracking was implemented behind the scenes, most web users remained oblivious. These components would become important building blocks of the contemporary surveillance advertising model, which integrates targeting and profiling across the gamut of advertising practices. As consumer data increasingly occupied the center of the web advertising economy, the
brunt of the industry’s technical, organizational, and, as we shall see, political efforts went towards deepening and expanding web surveillance. By 2000, a cadre of top-tier ad networks were serving billions of ads per day across thousands of popular websites and building large profile databases to improve their targeting capacities. Though much of this activity was based in the United States, DoubleClick in particular worked to globalize its reach, creating sales offices and operating partnerships in some 30 countries.

The US government’s stewarding of the dotcom investment bubble was a key policy program that impacted this stage of web advertising’s development, funneling large amounts of capital to both the supply and demand sides of the nascent industry. Most concretely, ad networks like DoubleClick used venture capital and sky-high stock valuations to pursue aggressive growth strategies, roll out new services, acquire competitors, and invest in infrastructure, all while operating at losses. On the demand side, start-ups were among the web’s biggest ad spenders. Venture capitalists, eager to maximize returns on dotcom investments, used their managerial power to direct resources to ad campaigns in order to build market share and ‘get big fast’, increasing valuations before public stock offerings and buyouts. These activities accelerated the construction of web advertising markets and legitimized the medium at a time when many traditional marketers were still ambivalent about the web’s prospects as a sales channel.

SEARCH ADVERTISING

Search advertising developed in parallel to the ad network model. As the number of web users and websites increased, portals and search engines emerged to organize and curate the online experience. Companies like Yahoo, AltaVista, and Lycos experimented with banner advertising to monetize their growing user bases, often partnering with ad networks to get their start. As portals gave way to the more user-directed and comprehensive search engine model, search engines like Infoseek, GoTo (later called Overture), and Google developed paid search advertising as an alternative to the ad network banner model.

Search ads, like web advertising more broadly, exhibited many variations but coalesced in the early 2000s around the approach advanced by Google, far and away the sector’s most successful company. Like the ad network approach, search advertising utilized sophisticated software and hardware and hinged upon the promise of making ads relevant to consumers. But instead of targeting ads based on inferences made from stores of consumer data, search advertising used the search terms keyed in by users. For example, a person using Google’s search engine to research a trip to Yellowstone National Park might see ads for nearby hotels or campsites alongside their search results. While the ad network model was growing increasingly complex and multivariate, search advertising emphasized simplicity and speed. Google heavily monitored the format and quality of its ads, limiting them to text only and weeding out misleading and poorly executed appeals. Google also demarcated paid advertising from so-called ‘organic’ results, helping to build user trust. Importantly, search advertising also introduced major changes in the ways that web ads were bought and sold. Banners were generally peddled on a cost-per-impression basis at a negotiated rate, so marketers paid for every ad delivered regardless of whether users clicked or not. Search ads came to be sold via auctions on a cost-per-click basis, meaning marketers bid on the rights to display ads in conjunction with search terms of their choosing and only paid when an ad was clicked.

Many marketers were enticed by search advertising’s contextual approach to targeting and the cost-per-click pricing scheme in particular. A group of national marketers
led by Procter & Gamble had already been pushing for cost-per-click pricing since web advertising’s early days. Responding to these demands, contextual search ads moved away from impression-based pricing and placed greater emphasis on measurable results. Finding early success with paid search ads on its own sites, in 2003 Google took a page from the ad network playbook and created a program called AdSense that enabled any web publisher to host Google contextual ads, broadening its reach considerably. Search advertising exploded in the early 2000s, quickly growing to account for 40% of all web advertising expenditures, while banner advertising began to level off (Pricewaterhouse Coopers, 2005). Google rapidly became web advertising’s most dominant company, capturing not only the lion’s share of the search advertising market but a significant chunk of the entire online advertising sector. Google’s incredible success in the early 2000s seemed to suggest that web advertising could work without relying on consumer surveillance.

SURVEILLANCE ADVERTISING: PROFILES, PLATFORMS, AND DATA FUSION

After a brief but dramatic stall in the wake of the dotcom stock market crash, web advertising resumed strong growth, outpacing all other US media sectors. By the mid 2000s the two major thrusts in web advertising were paid search, grounded in contextual placement, and targeted display, which relied upon consumer monitoring. Together these formats accounted for three-quarters of industry revenues (Pricewaterhouse Coopers, 2005). The archetypes were Google and the ad network DoubleClick, which emerged from the dotcom stock crash considerably leaner, but newly profitable. Each company relied on scale to achieve ‘relevance’ in ad targeting, but took different approaches to consumer data collection. While Google analyzed user data to improve its search engine and other services, it did not monitor and profile users for advertising purposes as DoubleClick did.

In the second half of the 2000s the distinctions between targeted display and search advertising fell away, most literally when Google acquired DoubleClick in 2007. After a bidding war with Microsoft (which was rapidly advancing into web advertising), Google bought DoubleClick at a $1 billion premium over its estimated valuation. No doubt the search giant wanted to move into the display advertising market, but also up for grabs were DoubleClick’s massive trove of consumer data and surveillance infrastructure. Soon after the acquisition, Google reversed its policy on collecting consumer information for advertising purposes and in the years since has integrated surveillance into the core of its operations, including its flagship search advertising products. Google’s buyout of DoubleClick was a high-profile marker for web advertising’s industry-wide embrace of consumer surveillance.

Surveillance advertising gathered momentum along various fronts in the second half of the 2000s. Google’s acquisition of DoubleClick paralleled a number of similar mergers, with Microsoft, AOL, Yahoo, and the advertising holding giant WPP all buying major ad networks with core competencies in consumer monitoring. Again, policy impacted these institutional changes. The largest of these mergers raised anti-trust concerns, triggering reviews and subsequent approvals by the Federal Trade Commission. The continued diffusion of broadband internet service, which reached over 50% of US households in 2007, enabled bandwidth-intensive applications like video streaming to flourish (Organization for Economic Co-operation and Development, 2011). User-generated video sites like YouTube achieved popularity, as did hubs for commercial content like Hulu, a joint partnership between major television networks. Video presented opportunities for
marketers to bring familiar TV ad formats online, which were then augmented by the surveillance-based targeting methods of banner and search advertising. For example, after purchasing YouTube in 2006, Google began to integrate targeted advertising services into the video platform, building out new capabilities over time. Today marketers can target YouTube ads based on Google’s profiles of individual users, which include information like web search histories, demographics, and interest categories.

Social networking services exemplified by sites like MySpace, Facebook, and Twitter also factored heavily into the development and normalization of surveillance advertising. Immensely popular with web users, social networks amassed vast stores of personal information that could be deployed to inform advertising campaigns, including data on demographics, attitudes, and social connections – what Facebook CEO Mark Zuckerberg called the ‘social graph’. The core ‘value-added’ from social networks stemmed from their arguably superior capacities to collect and deploy consumer data, and the explosive growth of Facebook in particular, which amassed over one billion worldwide users in its first decade, put strong competitive pressure on the entire web advertising industry to ramp up data collection efforts.

As digital media moved from the fringes to the center of the ‘marketing mix’, the industry pursued several threads that had been percolating since the 1990s, but had not achieved widespread implementation. Many of the biggest players adopted a ‘platform approach’, brokering a broadening array of advertising transactions among publishers, marketers, and ad agencies, all grounded in the collection and exchange of increasingly detailed consumer information. Google, Facebook, and their competitors prioritized ease of use, emphasizing simple set-ups, low-budget options, automation, and customization. The recent trend of ‘programmatic’ advertising aims to automate much of the ad buying process while giving campaign managers the tools to make highly specific adjustments as needed. These efforts have lowered barriers to participation in surveillance advertising, effectively turning the collection and monetization of consumer data into an ‘app’ accessible to anyone on the web.

The trajectory of surveillance not only broadens, but also deepens as companies collect new forms and greater quantities of data. Moving beyond HTTP cookies, the industry has developed myriad new types of ‘digital fingerprinting’ methods to monitor web users, embedding surveillance into technical architectures of web communication such as the Flash video format. Another major trend is what might be called data fusion, whereby various entities collaborate to merge disparate consumer information for marketing purposes. The biggest development along these lines has been the combination of online and offline data, including personally identifiable information such as names and addresses, to ‘close the loop’ between advertising campaigns and consumer behaviors like retail transactions and movement through physical space. For example, Facebook partners with third-party data brokers to help marketers link their ad campaigns to product purchases. One way this is accomplished is by tracking the movements of users who have downloaded a Facebook-owned application to their mobile device and cross-referencing this data with ad campaign metrics. Here web advertising becomes increasingly indistinguishable from activities like credit reporting and consumer information reselling, business sectors that took hold decades before the web’s creation, but have accelerated in recent years.

‘Negative policy’ (Freedman, 2014) has been instrumental in enabling surveillance practices to flourish. As data collection became more prevalent, civil liberties groups and journalists began to put public pressure on web advertising companies to address mounting privacy concerns. Privacy policy has been at the forefront of web advertising’s political agenda ever since. An early backlash against the combination of offline and online
data in the late 1990s produced a potential crisis for the industry as an emergent advocacy community, led by groups like the Electronic Frontier Foundation and Center for Media Education, pressured Congress to consider ‘opt-in’ legislation mandating that companies obtain prior consent from users before collecting their data. Seeing affirmative consent as a threat to the developing surveillance business model, a coalition of marketing trade associations and newly formed online ad industry groups successfully lobbied to install a regime of advertising ‘self-regulation’. Privacy concerns have remained and periodically resurface when particularly egregious abuses come to light, but industry lobbies have been largely successful in maintaining self-regulation, cementing a policy framework based on principles of ‘notice and choice’. The implementations of this ‘consumer empowerment’ approach are deeply flawed, primarily relying on unintelligible privacy policies and tepid opt-out mechanisms (Crain, 2018). With little access to the levers of political power, web users have become resigned to commercial surveillance, believing it ‘futile to [attempt to] manage what companies can learn about them’ (Turow et al., 2015: 3).

The point is not to overstate the cohesion and sophistication of surveillance advertising practices, but rather to highlight the major trends of the web advertising industry that are discernible from the strategies of market leaders. By the end of the 2000s, the five most powerful US internet advertising companies – Google, Facebook, Microsoft, AOL, and Yahoo (the latter two now owned by Verizon) – all served profile-based targeted advertising and collected consumer data across expansive networks that included their own web properties and millions of other sites and applications. Numerous studies have shown that the web’s most popular sites and services not only overwhelmingly monitor their users, but share user data with third parties, often by giving them direct access to collect user information via their platforms (Nordrum, 2016). On a global scale, from banners to search to video, surveillance has been embedded into advertising on the web to a greater extent than any other marketing channel in history.

WHY SURVEILLANCE?

Technology looms large in scholarly and popular understandings of the web for self-evident reasons. Few would deny that the character and speed of technology change during the web’s formative decades have been remarkable. The stages of web advertising presented above might be read as functions of various technological innovations: the centralized ad server, HTTP cookie, search term auction, profile database, targeting algorithm, and so on. Without a doubt these technologies have played a central role in shaping the particulars of web advertising. An important thrust of media history scholarship has been to interrogate and unpack technological forms and practices, as evidenced by a flowering of research approaches including science and technology studies, infrastructure studies, and media archaeology. At the risk of oversimplification, what is collectively useful about these various approaches is their attempt to weave together technology’s determinative effects and social construction, to bring specificity to complex questions about the composition and consequences of the social-material assemblages we call ‘technology’.

Critical political economy of media brings an important ‘decentering’ dimension to this research program, situating media and communications technologies within a historical context that foregrounds the structural dynamics and differential power relations that characterize capitalism. This is not to deny that technologies can exhibit significant biases or affordances, but to emphasize how and why specific technologies and elements thereof have been elevated or suppressed as media
systems congeal around capitalist imperatives. Specifically, CPE draws attention to how web advertising has been constructed by human beings making decisions within organizational and political economic bounds that exert what Raymond Williams [2008 (1971)] referred to as ‘pressures and limits’. In other words, research in CPE puts front and center the notion that, as Jonathan Hardy (2014: 112) succinctly put it, ‘capitalism influenced the internet more than vice versa’.

Situating web advertising within the broad currents of capitalism helps to answer the question: why surveillance? Media business relations began to shift around 1970 as US capitalism in particular faced a crisis of profitability (Brenner, 2002) that spurred a host of political economic activity around information and communication technology (ICT) development. It is no coincidence that this is the period when packet-switched networks and computerization began to kick off major changes in the composition of global capitalism. Nor is it a coincidence that the ideology of neoliberalism and its policies of privatization, deregulation, and ‘free trade’ would soon achieve mainstream political orthodoxy. Dan Schiller (1999, 2007) has shown that while commodification of information has always been involved in capital accumulation, the last 50 years have seen ICTs become a foundational pole of growth for an emergent ‘digital capitalism’. Web advertising is part and parcel of this broader political economic project.

Capitalist investment, innovation, and appropriation of ICTs induced significant changes not only in production, but also consumption, and, most importantly for our purposes, the production of consumption otherwise known as advertising. Audience fragmentation, shifting demographics, and profit squeezes put national marketing under growing strain. In 1965, a marketer could reach 80% of 18- to 49-year-old women by purchasing just three television commercials; three decades later it required nearly 100 prime-time spots to achieve the same reach (Narisetti, 1998). These dynamics manifested in the marketing/media complex in significant ways, catalyzing and expanding advertising practices and technologies related to what Philip Napoli (2011) calls the ‘rationalization of audience understanding’. Such rationalization boils down to efforts to enhance the comprehension, predictability, and control of consumer behavior (Pridmore and Zwick, 2011). Advertising began to recompose around an increasingly segmented system. Just as inventory was tracked across transnational commodity chains, pressure mounted to track audiences as they moved from activity to activity, both nationally and internationally (Schiller, 2014). A succession of new media technologies was incorporated into these functions, with the web and surveillance advertising forming a center of gravity in the 1990s and beyond.

The web presented a range of prospects for ‘one-to-one’ marketing, a chance to improve return-on-investment by separating ‘targets’ from ‘waste’ (Turow, 2011), to perhaps solve once and for all the legendary problem posed by department store magnate John Wanamaker: ‘Half the money I spend on advertising is wasted; the problem is I don’t know what half’. Evaluative methods such as A/B testing proliferated, offering improved ad campaign optimization and increasingly granular measurements of outcomes. As Joseph Turow (2006) put it, the web became a ‘test bed’, a prototype for a mode of advertising that found its purchase in distributing data gathering capacities, connecting heretofore disparate data silos, building out what Julia Angwin (2014) calls a surveillance ‘dragnet’, and creating a dispersed but integrated digital enclosure movement (Andrejevic, 2007) to power increasingly intensive information commodification.

This does not mean that web advertising developed smoothly or without episodes of contestation, dysfunction, or resistance. Competition and the struggle to overcome it are definitional to capitalism and drive its dynamism. Disparate entities within the marketing/media complex worked in conjunction
and at odds to construct a social-material infrastructure for online advertising. The web emerged in the 1990s, simultaneously a threat and opportunity, at once conceivable as a platform for individual empowerment, commercial media’s mortal wound, and potential horn of plenty for consumer data gathering. It was an unknown quantity, threatening to further splinter audiences and provide individual consumers with new degrees of autonomy, perhaps even the power to excise media advertising altogether. Marketers risked the loss of control over a media system that had long been dictated by their interests. As the CEO of marketing giant Procter & Gamble famously put it: the ad industry needed to ‘grab technology change in its teeth’ or chance obsolescence in the digital future (Artzt, 1994). Workaday rivalries aside, a broad range of companies maintained a common interest in bringing advertising to as many areas of social life as possible and sought to redefine the web accordingly. As the stages presented above highlight, policy-making was a preferred venue for action, as has been the case throughout US history.

This sketch of a critical political economy approach contributes to a web advertising historiography that denaturalizes technology, accounts for continuity and change, foregrounds policy-making, and situates marketing and media within the dynamics of the global capitalist political economy. As Robert McChesney (2008: 12) notes, ‘assessing policies, structures, and institutions cannot answer all of the important questions surrounding media, but [political economists] believe their contributions are indispensible to the comprehensive study of media’. Calling out the undemocratic history of US media policy-making and web advertising in particular, CPE continues to articulate politics as a necessary site of intervention into the structural composition of media. Historical work in this tradition provides valuable lessons about future prospects (see Dolber, 2017; Dunbar-Hester, 2014; Gillespie, 2007; McChesney, 1993; Niesen, 2012; Pickard, 2015; Stole, 2006). Comparative and international studies represent a particularly important area for further research. The European Union, for example, has proven much more willing to constrain online commercial surveillance, prompting major regulatory conflicts and legal challenges from the transnational web advertising sector.

Another area in need of further study is the continuing role of finance capital, which has remained a potent driver of web advertising and consolidator of market power. Google and Facebook have relied on finance capital to expand, building up powerful barriers to competition. In the United States, three-quarters of digital ad revenues are divided among just ten companies, while European and Asian markets also exhibit high degrees of concentration (Pricewaterhouse Coopers, 2016). Google alone claims its ads can reach over 90% of global internet users. One estimate put the number of ads Google serves on a daily basis close to 30 billion, roughly ten times the number of people on the planet with internet access (Koetsier, 2012). This kind of market dominance raises important concerns about the bottlenecking of surveillance and influence capacities, especially when digital advertising intermingles freely between the ostensibly separate domains of commerce and politics. At the time of this writing, Facebook and other purveyors of surveillance advertising face mounting scrutiny over their roles in political manipulation and what, if any, civic responsibilities fall on their shoulders. One thing is certain. If the surveillance status quo is to be confronted, political activism and public policy must play fundamental roles.

REFERENCES

Abbate, J. (1999) Inventing the Internet. Cambridge, MA: MIT Press.
Andrejevic, M. (2007) iSpy: Surveillance and Power in the Interactive Era. Lawrence, KS: University Press of Kansas.
342 THE SAGE HANDBOOK OF WEB HISTORY

Angwin, J. (2014) Dragnet Nation: A Quest for Privacy, Security, and Freedom in a World of Relentless Surveillance. New York: Times Books.
Artzt, E. (1994, May 12) P&G’s Artzt: TV Advertising In Danger Remedy Is To Embrace Technology And Return To Program Ownership. Advertising Age. Available at: http://adage.com/article/news/p-g-s-artzt-tv-advertising-danger-remedy-embrace-technology-return-program-ownership/87052/ (accessed 5 March 2017).
Baldasty, G. (1992) Commercialization of News in the Nineteenth Century. Madison, WI: University of Wisconsin Press.
Berners-Lee, T. and Fischetti, M. (1999) Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by its Inventor. New York: HarperCollins.
Brenner, R. (2002) The Boom and the Bubble. New York: Verso.
Crain, M. (2014) ‘Financial markets and online advertising demand: Reevaluating the dotcom investment bubble’, Information, Communication & Society, 17(3): 371–394.
Crain, M. (2018) ‘The limits of transparency: Data brokers and commodification’, New Media & Society, 20(1): 88–104.
Dolber, B. (2017) Media and Culture in the U.S. Jewish Labor Movement: Sweating for Democracy in the Interwar Era. New York: Palgrave.
Dunbar-Hester, C. (2014) Low Power to the People. Cambridge, MA: MIT Press.
Freedman, D. (2014) The Contradictions of Media Power. London: Bloomsbury.
Gillespie, T. (2007) Wired Shut: Copyright and the Shape of Digital Culture. Cambridge, MA: MIT Press.
Greenstein, S. (2017) How the Internet Became Commercial. Princeton, NJ: Princeton University Press.
Hardy, J. (2014) Critical Political Economy of the Media: An Introduction. New York: Routledge.
Harvey, D. (2005) A Brief History of Neoliberalism. New York: Oxford University Press.
Hesmondhalgh, D. (2013) The Cultural Industries. 3rd edn. London: Sage.
Koetsier, J. (2012, October 25) 30 Billion Times a Day, Google Runs an Ad. Venture Beat. Available at: https://venturebeat.com/2012/10/25/30-billion-times-a-day-google-runs-an-ad-13-million-times-it-works/ (accessed 15 February 2018).
Marvin, C. (1988) When Old Technologies Were New: Thinking About Electric Communication in the Late Nineteenth Century. New York: Oxford University Press.
McChesney, R. (1993) Telecommunications, Mass Media, and Democracy: The Battle for the Control of U.S. Broadcasting, 1928–1935. New York: Oxford University Press.
McChesney, R. (2007) Communication Revolution: Critical Junctures and the Future of Media. New York: New Press.
McChesney, R. (2008) The Political Economy of Media: Enduring Issues, Emerging Dilemmas. New York: Monthly Review Press.
McKinsey & Company (2015, September) Global Media Report: Global Industry Overview.
Mosco, V. (1982) Pushbutton Fantasies. Norwood, NJ: Ablex.
Mosco, V. (2009) The Political Economy of Communication. 2nd edn. Thousand Oaks, CA: Sage.
Napoli, P. (2011) Audience Evolution: New Technologies and the Transformation of Media Audiences. New York: Columbia University Press.
Narisetti, R. (1998, November 16) New and Improved: Ad experts talk about how their business will be transformed by technology. Wall Street Journal, 33.
Niesen, M. (2012) ‘The little old lady has teeth: The U.S. Federal trade commission and the advertising industry, 1970–1973’, Advertising & Society Review, 12(4). Available at: https://muse.jhu.edu/article/468049 (accessed 4 Sept 2018).
Nordrum, A. (2016, August 23) You’re being tracked (and tracked and tracked) on the web. IEEE Spectrum. Available at: https://spectrum.ieee.org/tech-talk/telecom/internet/youre-being-tracked-and-tracked-and-tracked (accessed 15 February 2018).
Ohmann, R. (1996) Selling Culture: Magazines, Markets, and Class at the Turn of the Century. New York: Verso.
Organization for Economic Co-operation and Development (2011) Households with Broadband Access. OECD Broadband Portal (https://www.oecd.org/sti/broadband/oecdbroadbandportal.htm).
A CRITICAL POLITICAL ECONOMY OF WEB ADVERTISING HISTORY 343

Pickard, V. (2015) America’s Battle for Media Democracy. New York: Cambridge University Press.
Pricewaterhouse Coopers (2005) IAB Internet Advertising Revenue Report.
Pricewaterhouse Coopers (2016) IAB Internet Advertising Revenue Report.
Pridmore, J. and Zwick, D. (2011) ‘Marketing and the rise of commercial consumer surveillance’, Surveillance & Society, 8(3): 269–277.
Schiller, D. (1999) Digital Capitalism. Cambridge, MA: MIT Press.
Schiller, D. (2007) How to Think about Information. Urbana, IL: University of Illinois Press.
Schiller, D. (2014) Digital Depression. Urbana, IL: University of Illinois Press.
Schiller, H. I. (1969) Mass Communications and American Empire. Boulder, CO: Westview Press.
Schiller, H. I. (1978) ‘Computer systems: Power for whom and for what?’, Journal of Communication, 28(4): 184–193.
Sinclair, J. (2016) ‘Advertising and media in the age of the algorithm’, International Journal of Communication, 10: 3522–3535.
Starr, P. (2004) The Creation of the Media: Political Origins of Modern Communications. New York: Basic Books.
Stole, I. (2006) Advertising on Trial: Consumer Activism and Corporate Public Relations. Urbana, IL: University of Illinois Press.
Turow, J. (2006) Niche Envy: Marketing Discrimination in the Digital Age. Cambridge, MA: MIT Press.
Turow, J. (2011) The Daily You: How the New Advertising Industry is Defining Your Identity and Your Worth. New Haven, CT: Yale University Press.
Turow, J., Hennessy, M., and Draper, N. (2015) The Tradeoff Fallacy: How Marketers Are Misrepresenting American Consumers and Opening Them Up To Exploitation. Annenberg School for Communication, University of Pennsylvania, Pennsylvania PA.
Wasko, J., Murdock, G. and Sousa, H. (eds) (2011) The Handbook of Political Economy of Communications. New York: Wiley.
Williams, R. (2008) Television: Technology and Cultural Form. New York: Routledge (1st edn, 1971).
Winseck, D. and Jin, D. Y. (2011) The Political Economies of Media: The Transformation of The Global Media Industries. New York: Bloomsbury.
23
Exploring Web Archives in the Age
of Abundance: A Social History
Case Study of GeoCities
Ian Milligan

INTRODUCTION

When I think of the challenges – technical, ethical, historiographical – facing historians seeking to explore the social history of web archives, a particular archive comes to mind: GeoCities. GeoCities.com, founded in 1994 and closed in 2009 by Yahoo! corporation, was a place where many people had their first home on the World Wide Web. One simply needed to open their web browser, enter GeoCities.com into the browser bar, provide their e-mail address, and they would receive their free megabyte to create a website. These could be on any topic: their love of Buffy the Vampire Slayer, their family tree, a lamentation or celebration of a favorite sports team, an online diary, or even a child’s tribute to their love of Winnie the Pooh. People took to GeoCities.com with a passion, and today the GeoCities.com web archive contains the sites of seven million people, about 186 million URLs in total. While there were many other ways to create web pages, services like GeoCities helped to democratize the Web: no need to know the inner workings of FTP, or servers, or beyond; web design could be as easy as word processing.

While the actual definition of ‘Big Data’ is debatable – computer scientists might dismiss the 4TB that comprise a GeoCities web archive as small compared with the petabytes of data generated by the Large Hadron Collider – for historians something like GeoCities represents the new scope of historical abundance as they enter the web age of historiography (Graham et al., 2015). In this chapter, I begin by exploring the state of historical abundance in this field. I then use GeoCities as an example to show what researchers can do with web archives at scale using tools that historians and computer scientists have developed. In particular, I explore how historians can fruitfully analyze links, text, and images to begin to reconstruct traces of the past for analysis. Ultimately, I argue that in archives like GeoCities scholars have the potential to create more democratic,

accessible histories but that doing so will require substantial rethinking of the historian’s craft.

FROM SCARCITY TO ABUNDANCE

In 2003, the late great American historian Roy Rosenzweig wrote ‘Scarcity or Abundance? Preserving the Past in a Digital Era’ in the American Historical Review. This prescient piece foresaw the shift from ‘a culture of scarcity to a culture of abundance’, which has now occurred on a scale that may have defied even what Rosenzweig could have imagined a decade and a half ago (Rosenzweig, 2003). One way I like to underscore this is to compare the sorts of sources that historians have traditionally had with the resources that they might face in this web age of abundance.

Ordinary people did not generally leave behind historical records. When a historian tried to reconstruct an earlier time period, they were often forced to dig for the scarce records that individuals left behind: a birth or death notice, for example, or a record of a marriage or an inclusion in a national census. For insight into the rich lives of everyday people, historians have often had to read against the grain in sources such as criminal transcripts or fire insurance records. One of the best sources of information about lives in modern England was the Old Bailey, the central London criminal court which produced court transcripts in part to sate a salacious public appetite. Between 1674 and 1913, the Old Bailey produced and preserved 197,745 trial transcripts. Today’s Old Bailey website correctly describes its holdings as the ‘largest body of records documenting the lives of non-elite people ever published’ (Old Bailey Proceedings Online, 2013). For 239 years, the 197,745 trials are indeed the gold standard of historical documentation. Compare that, however, with GeoCities today – in the 15 years between 1994 and 2009, seven million users created some 186 million documents. Even creating a list of Uniform Resource Locators (URLs) within the collection leads to a 7GB text file and can take several hours even on a powerful server. This is what abundance looks like.

The Web was originally framed along the lines of a ‘memory machine’, as the title of Belinda Barnet’s book puts it (Barnet, 2013). Vannevar Bush’s Memex and Ted Nelson’s Project Xanadu both aimed to improve the memory of the individual user – transcending the problem of the ‘absent-minded professor’. In so doing, hypertext and the Web ended up improving the collective memory of our society. Researchers now have far more information about everyday people than ever before. Traditionally, elites shaped our historical record; increasingly, all of us act in ways and leave behind traces that shape the historical record for the next generation. James Gleick explains this well in his masterful The Information: A History, A Theory, A Flood: ‘the information produced and consumed by humankind used to vanish – that was the norm, the default. The sights, the sounds, the songs, the spoken word just melted away. Marks on stone, parchment, and paper were the special case’ (Gleick, 2012).

GeoCities is just part of the broader constellation of web archived material being retained. In their Uncharted: Big Data as a Lens on Human Culture, Erez Aiden and Jean-Baptiste Michel note that a literary scholar studying Edgar Allan Poe has 422 letters to explore. Think of your own digital record – chances are somebody studying you could find more than 422 letters! As they note, ‘this material comprises an astonishingly detailed record of the lives of billions of people – a record that did not exist at all mere decades ago. It has no precedent in human history’ (Aiden and Michel, 2013). Billions of new documents are created every day, with millions being saved for future access.

GeoCities helps us remember, however, that it is not just scale – preserving millions of documents where we used to only save

thousands, for example – but scope. People who never before would have been part of the historical record are now suddenly part of it. Teenagers who write about their love of video games, a retired Canadian vacationing in Florida who posts their pictures on a personal website, a young girl who posts poetry on her page under the watchful eyes of a parent. In the past historians might have only had fortunate glimpses: in the extremely rare event that a diary was made available to a historian, or if they appeared in the ‘letters to the editor’ section of a magazine or were interviewed in a community newspaper or fan magazine, or if they had been caught fleetingly in a radio or television interview. These insights into the thoughts and activities of everyday people have been invaluable to social historians, in part because they are so rare within the collections of formal libraries and archives. With GeoCities, what was rare is now common, if a researcher knows how to find it. This is both a challenge and an opportunity.

A BRIEF HISTORY OF GEOCITIES: FROM RISING STAR TO DEAD WEBSITE

GeoCities.com, like many startups today, had humble beginnings. In November 1994, in Beverly Hills, California, a new service – Beverly Hills Internet – launched. It was founded by David Bohnett, a software industry veteran who had just lost his companion, and its aim was to bring Internet users together. As Bohnett later recalled, ‘We all have something to share with each other, which enriches both their lives and ours as well’ (Ocamb, 2012). While Beverly Hills Internet was part of a broader trend of site providers including Tripod (1994) and Angelfire (1996), it had a unique focus on community (which I have argued in a separate piece). The name Beverly Hills Internet itself spoke to the geographic metaphors which became integral to the site, a trend which was solidified in its 1995 renaming as GeoCities.

That GeoCities would come to emphasize spatial metaphors was part of a broader trend on the Web of the 1990s, that of the ‘frontier’, as Fred Turner has argued in his 2008 From Counterculture to Cyberculture. While 1960s counterculture adherents had been suspicious of or even hostile to technology – burning punch cards at Berkeley, for example – by the 1980s and 1990s their intellectual successors saw the power of utopian social transformation in technology (Turner, 2008). Notably, this current expressed the belief that the Web, unlike broadcast mediums like the television, could ‘put you [the user] in command again’ (Turner, 2008). Geographic metaphors additionally helped anchor the web – think of the Electronic Frontier Foundation, for example, an early and still significant digital rights group – and GeoCities quickly became promoted as an ever-expanding geographic space.

GeoCities had several unique features, many of which were anchored to its conception of geographic space. Spatial metaphors reigned in the site’s very being. The most apparent was the ‘neighborhood’ structure which was the site’s main organizing feature. Rather than a ‘vanity address’ containing the user’s name or preferred word (think of a URL such as http://geocities.com/~historycave), users selected a neighborhood for their site to belong in. These took many shapes and sizes. Some were broad in scope, such as the ‘Heartland’ neighborhood focusing on ‘families, pets, hometown values’. Golf aficionados would go to Augusta; political wonks to CapitolHill; philosophers, teachers, and book lovers to Athens; LGBT topics were found in WestHollywood; and discussions of technology in ResearchTriangle or SiliconValley. URLs would take the form of http://geocities.com/Heartland/1005, where the site number was a value between 1,000 and 9,999. As GeoCities grew, those site numbers would be exhausted, forcing the

creation of suburbs. Heartland, as the largest and most expansive neighborhood, would eventually have 41 suburbs (such as Heartland/Hills or Heartland/Plains).

These neighborhoods played a critical role in the life of the community, as I have argued elsewhere (Milligan, 2017). Each area had its own community leaders, volunteers who enforced community standards and taught people the ropes of HTML and site creation, or who ran community newspapers. Webrings (a series of websites that join together in a ‘ring’ of links so that readers can move between them – e.g. websites on Cairn Terriers that join together so that a visitor can surf between many like-minded pages), awards, and other connecting tendrils also connected these areas as discussed below. In short, the neighborhoods were what held GeoCities together and kept it from just being a collection of parked websites. They were an attempt to cluster users based on pre-existing interests, and to facilitate greater traffic within and throughout the community.

It quickly became successful. A press release in 1995 noted that it had received over 600,000 hits within its first five weeks, and by summer 1995 there were 1,400 websites on board. A succession of press releases underscored the dramatic growth: the first 100,000 by August 1996, and the first million by October 1997. By mid 1998, GeoCities was one of the top ten sites in terms of traffic and was growing by over 18,000 users a day (Motavalli, 2004). In its popularity lay the seeds of its eventual acquisition and downfall. The media promoted GeoCities as a place for people who wanted to join the Web. ‘What if you want to do more than just look at live images from Hollywood? What if you want to live there? Now you can’, noted The Independent newspaper in 1996 (Ridey, 1996). It articulated a vision of the Web that would not just be consumed by most web users, but a place where people could actively contribute.

With users came money. In August 1998, GeoCities went public with an initial offer of $17 a share, which soon rose to $40 a share. Yahoo! purchased GeoCities in January 1999 for $4.6 billion, or a valuation of $117 a share. It was then the third most visited site on the Web, with 55 million page views a day, behind only Yahoo! and AOL (Motavalli, 2004). Yahoo! GeoCities, as the new site was branded, saw rapid change. The neighborhood structure was scrapped for new users, and for this and other reasons the site began to rapidly decline.

We can see how quickly the zeitgeist moved on by looking at media discussions of GeoCities. In 1998 and 1999, the Lexis|Nexis database had 208 and 247 articles respectively on GeoCities; by 2000 there were 20, and by 2003 only seven articles contained that keyword. The Web was quickly moving on from the garish, custom-designed websites of GeoCities. In 2009, Yahoo! decided to shutter GeoCities and delete all user content (save for GeoCities Japan, which continues today). While notice was given, e-mails went to accounts set up in the 1990s. More glaringly, Yahoo! provided no export tool, and users who wanted to save their content had to go page-by-page and preserve each page using their browser’s ‘save file as’ function. Fortunately, the Internet Archive had been preserving this material since 1996 as part of its general archiving of the Web, and it carried out an end-of-life comprehensive crawl of GeoCities. Simultaneously, a team of guerrilla web archivists known as Archive Team carried out a collective effort to download this material. If it had not been for their efforts, there would be no record of GeoCities today.

ACCESSING A WEB ARCHIVE TODAY

A scholar interested in studying a collection of sites like GeoCities has a number of different ways to approach the problem: from the Wayback Machine, to the end-of-life collection from Archive Team,

to more boutique arrangements with the Internet Archive itself. In this, we can see the particularities of GeoCities but also the issues of doing web history more generally.

As Rogers notes in this volume, for most scholars the starting point to working with GeoCities will be the Internet Archive’s Wayback Machine (http://archive.org/web/). It provides two ways of working with web archived material. First, if a user knows the exact URL of a resource, they can ‘go back in time’ to that date. For GeoCities, then, a user would enter ‘GeoCities.com’, find a date that they are interested in, and select it. Figure 23.1 shows the earliest collected crawl of the GeoCities.com homepage on 22 October 1996. They would then click on links on the page to browse around the site as it was in October 1996 – as they select a link, the browser will go to the nearest available timestamp for the page requested. Slowly, the user can find the content they are looking for. The website is not fully functional – search functions, for example, will not work nor will any dynamic content. But one can begin to find websites. In the above example, the user could select ‘neighborhoods’, find the neighborhood of interest, and then begin to look at the pages within one-by-one.

The second option is to use the full-text search option, which explores home pages. Unfortunately, geocities.com is treated as one home page – so searching can only find the splash page. As I will note below, indexing the entirety of GeoCities.com would be challenging so this is an understandable decision taken by the Internet Archive.

The Wayback Machine is invaluable – the catch is actually locating the URLs of pages of interest. These can come from a variety of places: from browsing page-by-page through the Wayback Machine, to finding websites listed in old books or newspapers, or finding sites by other possible means.

Some of these other means of finding websites can come from working with the archive itself, outside of the confines of the Wayback Machine. Because this was a high-profile digital deletion, multiple groups saved GeoCities when Yahoo! announced its 2009 erasure. Archive Team was ready to help save GeoCities, as it had been formed in the wake of the 2008 deletion of AOL Hometown, another early collection of personal websites. Jason Scott, a digital archivist, had called for teamwork to grab files. His December 2008 battle cry is worth quoting at length:

    Fuck the EULAs and the clickthroughs. This is history, you bastards. We’re coming in, a team of multiples, and we will utilize Tor [a way to anonymously surf the Web by concealing the origin and destination of web traffic] and scripting and all manner of chicanery and we will dupe the hell out of your dying, destroyed, losing-the-big-battle website and save it for the people who were dumb enough to think you’d last. (Scott, 2009)

In April 2009, when news spread about GeoCities’ deletion, Scott and Archive Team sprang into action. Yahoo! limited each downloading computer to 15 megabytes an hour: a limit that a regular user might not encounter, but if you were trying to preserve this material, it meant that many people would have to act in concert to save GeoCities. To make sure that the workload was distributed evenly across hundreds of people, Archive Team had volunteers download and run virtual machines on their own computers – these pre-configured machines could run pre-configured programs, acting in concert with the hundreds of other virtual machines. Archive Team found themselves in a crash course in GeoCities history: figuring out the ‘neighborhood’ system.

They succeeded in grabbing much of GeoCities before it went down (it is impossible to know just how much). Archive Team, in the end, was not alone in saving this material: a testament to the growing interest in digital preservation and in GeoCities itself. GeoCities was also mirrored by ReoCities, the Internet Archive carried out one last big download, and another website, Internet Archeology, also collected a subset. A year

Figure 23.1 GeoCities.com from 22 October 1996, via Wayback Machine.
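
A capture like the one in Figure 23.1 is addressed by a timestamp plus the original URL. The following is a minimal sketch, assuming the `http://web.archive.org/web/<14-digit timestamp>/<original URL>` pattern visible in the Wayback Machine links cited in this chapter. The helper names `wayback_url` and `parse_wayback_url` are hypothetical and not part of any Internet Archive library, and the requested timestamp need not match a capture exactly, since the Wayback Machine moves to the nearest available one.

```python
from datetime import datetime
from typing import NamedTuple

# Assumed prefix, taken from the archived links quoted in this chapter.
WAYBACK_PREFIX = "http://web.archive.org/web/"
STAMP_FORMAT = "%Y%m%d%H%M%S"  # year, month, day, hour, minute, second


class Capture(NamedTuple):
    timestamp: datetime
    original_url: str


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback-style address for a page as of a requested moment."""
    return f"{WAYBACK_PREFIX}{when.strftime(STAMP_FORMAT)}/{original_url}"


def parse_wayback_url(url: str) -> Capture:
    """Split a Wayback-style address into its timestamp and original URL."""
    if not url.startswith(WAYBACK_PREFIX):
        raise ValueError("not a Wayback-style URL")
    stamp, _, original = url[len(WAYBACK_PREFIX):].partition("/")
    return Capture(datetime.strptime(stamp, STAMP_FORMAT), original)


if __name__ == "__main__":
    # Ask for the GeoCities homepage as of the date shown in Figure 23.1.
    print(wayback_url("http://www.geocities.com/", datetime(1996, 10, 22)))
```

Parsing in the other direction is useful when a citation already contains an archived address and a researcher wants to recover when the page was captured.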

later, the end-of-life GeoCities crawl was released as a Torrent; at the time, the largest Torrent on the Web, some 643GB of user-generated content. As Scott noted on his blog, this was ‘a collection for historians, for researchers, for developers. For those who want to do study on the heritage on something so soon gone and yet so

much of a part of how we got here’ (Scott, 2011). Today, you can download the 634GB of GeoCities either through the Torrent or via the Internet Archive at https://archive.org/details/2009-archiveteam-geocities-part1. Working with the Torrent is somewhat challenging, as the files are the result of a wget download (wget is a command line tool that can be used to download or mirror websites and other files). It requires boutique sets of scripts or expertise with the command line to wrangle into a useful format.

For many web historians, files in the WARC format are ideal. The Internet Archive also preserved GeoCities at the end of its life, and through a research agreement with them our project team at the University of Waterloo was able to get the WARC files. The WARC file format, which is standardized by the International Organization for Standardization (ISO), preserves web archived information in a concatenated form. This is useful because if you consider all the material that goes into making a large website possible – the thousands of images, PDFs, Word documents, HTML files, CSS stylesheets, etc. – you would rather have them all in one container, with metadata to help find the information that you need.

The rise of an ISO standard has fortunately enabled the development of an ecosystem of tools to study web archives through WARC files. One such tool that our research team at the University of Waterloo and York University has developed is the Archives Unleashed Toolkit (archivesunleashed.org), or AUT. AUT provides in part for scalable analytics of web archives, and has the goal of being a usable tool for humanists and social scientists without formal computer science training. It allows researchers to do the following four steps: filtering, analyzing, aggregating, and visualizing, or what we term the FAAV cycle. Filtering allows a user to find a particular portion of the web archive – pages within the ‘Enchanted Forest’, for example, or political party pages that link to a certain domain and contain a particular keyword. Analyzing allows a scholar to extract information, say the hyperlinks from the pages that they have filtered, or find commonly recurring person names. Aggregating allows one to summarize the output of the analysis from the previous step. And, finally, visualizing allows them to see results in a table, network graph, or some other form of custom visualization (Lin et al., 2017). This approach can allow the fruitful exploitation of a web archive.

USING PAGERANK TO FIND SITES OF INTEREST

With so many files, the difficulty is finding which actual pages to explore. There are several different ways to find sites of relevance: using network analysis, specifically by leveraging the power of hyperlinks throughout GeoCities (which I assume to be a deliberate practice), extracting text to explore the various ‘topics’ that one can find in the collection, and finally by establishing a search engine to find topics of considerable interest. As a starting point for historical inquiry, however, network analysis is the most fruitful (Brügger, 2013).

A useful starting point when doing web history at scale is to find the pages that users tended to link to and visit. Many users wanted others to come to their site, as seen in their guestbooks, hit counters, and beyond. While today web content creators often rely on search engines to drive traffic to pages, links were critical in the 1990s.

Given the frequency of links around GeoCities, I used AUT to extract links from the entire site. The ensuing results were too large, so the first step was to filter (the first step of the FAAV cycle) the results by ‘neighborhood’. In this case, I decided to focus on the child-focused Enchanted Forest. I then analyzed the links, aggregated how often pages linked to each other, and visualized them in Table 23.1 below.
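
The filter, analyze, aggregate, and visualize steps described above can be sketched in a few lines. This is an illustrative sketch only, written with the Python standard library rather than AUT, whose actual interface is not shown here. The function names (`neighborhood`, `site_root`, `link_counts`) and the tiny in-memory sample are assumptions made for the example, and collapsing a page URL to its numbered site is one possible reading of how pages might be grouped.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def neighborhood(url):
    """First path segment of a GeoCities URL, e.g. 'EnchantedForest'."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return parts[0] if parts else None


def site_root(url):
    """Collapse a page URL to its site: the path up to the numeric site number."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    kept = []
    for segment in (p for p in parsed.path.split("/") if p):
        kept.append(segment)
        if segment.isdigit():
            break
    return host + "/" + "/".join(kept)


def link_counts(pages, keep_neighborhood):
    """pages: iterable of (url, html). Returns Counter of (origin, destination)."""
    counts = Counter()
    for url, html in pages:
        if neighborhood(url) != keep_neighborhood:  # filter
            continue
        parser = LinkExtractor()
        parser.feed(html)  # analyze: pull out the hyperlinks
        origin = site_root(url)
        for href in parser.hrefs:
            destination = site_root(urljoin(url, href))
            if destination != origin:  # ignore links within the same site
                counts[(origin, destination)] += 1  # aggregate
    return counts


if __name__ == "__main__":
    # Hypothetical sample pages standing in for records read from an archive.
    sample = [
        ("http://geocities.com/EnchantedForest/Meadow/1134/index.html",
         '<a href="http://www.geocities.com/EnchantedForest/1004/">Center</a>'),
    ]
    # visualize: a plain table of origin, destination, and link count
    for (origin, destination), n in link_counts(sample, "EnchantedForest").most_common():
        print(origin, destination, n, sep="\t")
```

Normalizing away the `www.` prefix matters here because origins and destinations in the archive use both forms of the hostname.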

Table 23.1 Origin and destination links in GeoCities

Origin                                              Destination                                             Number of links
http://geocities.com/EnchantedForest/Meadow/1134    http://www.geocities.com/EnchantedForest/1004           83
http://geocities.com/Area51/Stargate/1357           http://www.geocities.com/Area51/EnchantedForest/4213    33
http://geocities.com/Eureka/1309                    http://www.geocities.com/EnchantedForest/Tower/7555     27

What does Table 23.1 show? It means that on all of the pages in EnchantedForest/Meadow/1134 – the Enchanted Forest Meadow Community Center (including numerous subpages such as the Meadow newsletter, the webring, the community leader register, etc.), there were 83 links to the EnchantedForest/1004 site – the main community center. Eighty-three links between two community centers is not surprising, given the constant references and affinities.

If this is scaled across the entirety of GeoCities, as a researcher I can begin to see what particular websites had more incoming links and outgoing links, and by using an algorithm called PageRank, we can begin to see which sites would most likely be found during random user adventures around GeoCities. The latter is particularly important because if we begin to rank websites based on these metrics, just relying on the sheer numbers of incoming links as a metric of popularity can be skewed. Link farms, or pages that had lots of outgoing links and little content, existed to ‘game’ search engines and raise certain sites up in the rankings.

PageRank, the core of which powers the Google search engine today, can help us find useful sites. In brief, PageRank is based upon links to other sites, each of which can be considered a ‘vote of confidence’ in that site. However, these votes are weighted according to the PageRank of the site that is issuing the link, helping us get around shenanigans (Brin and Page, 1998; Kamvar et al., 2003; Rogers, n.d.). For example, a link from The New York Times should be worth more than a link from Bob’s Random website, and PageRank values the former more than the latter – not due to intrinsically knowing the trustworthiness of The New York Times, but because lots of other sites that in turn are linked from highly valued sites link to it.

Let’s use one particular example now to outline how I can use link analysis to see these patterns. Consider Figure 23.2, which shows the link structure of one neighborhood: the Enchanted Forest.

In this figure, the bigger the dot – or node – the higher the PageRank value. By tracing the bigger nodes in the visualization (or the spreadsheet that powers it), users can begin to see the sites that had the highest votes of confidence throughout the site. They would be the most likely to be stumbled upon by users.

By exploring them, we see great variety in the highly ranked pages. Some were simply pages by children which managed to achieve a particular impact; more of them, however, were aimed at building community and had themselves baked into the GeoCities infrastructure. Indeed, the commonality across the highly ranked pages was that they had thrown themselves into GeoCities: they received ‘awards’ from other users, they shared images with other people, they had become ‘featured’ by the GeoCities administration, and in general they were part of a vibrant ecosystem of pages. The highest-ranked site can be seen in Figure 23.3.

This page was the Enchanted Forest Awards Page (http://web.archive.org/web/20010721183753/http://www.geocities.com/EnchantedForest/Glade/3891/), which gave out the main awards, such as the Enchanted Forest Award of Excellence. A user would go there and apply for an award, and then
352 THE SAGE HANDBOOK OF WEB HISTORY

Figure 23.2 The link structure of one GeoCities neighborhood, the Enchanted Forest.

Figure 23.3 EnchantedForest/Glade/3891 — the highest–ranked site.


EXPLORING WEB ARCHIVES IN THE AGE OF ABUNDANCE 353

community leaders would visit and review the user’s page, and if it met a long list of characteristics the user would receive the award. Other pages that I can find using PageRank include news websites, community centers, and ‘cyberpet’ images that people would put on their site.

The best use of this PageRank method, however, is to find the individual web pages of children and avid users themselves. The following pages I explore were found by finding highly ranked pages, which had many links to them from other highly linked-to pages. These pages were all written by children who threw themselves into the community fabric of GeoCities, joining webrings, accumulating awards, and having especially active guestbooks. Webrings were one of the most consistent features throughout the Enchanted Forest; our survey of 430 sites found that fully one-quarter of websites were a member of at least one (107 hits, or 24.8%). Other interesting pages from my PageRank exploration run the gamut from those aimed at children, those that built community either through services or award provision, to – of course – those written by children themselves. Let’s take one well-connected example. Brandon’s page, at Dell/4543, is worth exploring in detail. Like many other websites, it was active until some point in 2001, when it ceased to be updated any further (Brandon was ten years old in April 2001, and his age remained unchanged on the home page until the closure in 2009). His home page was an introduction to himself:

Hi! My name is Brandon. I am 10 years old. As you can tell from the music, I am a big fan of Star Wars. My hobbies include baseball (I hate that my season is over), football, watching nascar races (my favorite driver is Ernie Ervin), riding my bike, and playing my games. My favorite game on Virtual Boy is Mario Tennis, Sonic and Knuckles on Sega Genesis, Willy Beamish on Sega CD, Teenage Mutant turtles on Super Ninetendo [sic], and Star wars on 32x. When I grow up, I want to be a baseball player. We live in S.C. I would like to hear from you! (https://web.archive.org/web/20010404081732/http://www.geocities.com/EnchantedForest/Dell/4543)

Elsewhere on his page, he had an aquarium page – ‘my mom won’t let me and my sister have an aquarium … so this is what we came up with’ – which was a watery-background page with dozens of animated fish GIFs. He also had other pages, such as one focusing on Batman and another on Spiderman, mostly consisting of pictures and frames from the movies. An extensive Star Wars site makes for interesting viewing as well, connected to many other sites through two webrings that Brandon maintained. More crucially, Brandon also received 36 awards from other users, stretching back to his first award on 31 July 1997: most of these were from other kids on the Web, although some were from adults or others (‘Clara’s Special Children’s Award’, which offered ‘a prayer for you [and] blessings all year through’). Brandon was part of this ecosystem too, offering his own ‘Batman’ award that other people could have: they would need to fill out a form to apply for the award, consisting of their name, e-mail address, home page, what it was about, and whether or not they had signed the guestbook. Some 25 people received his award: other kids, ‘fellow treasure hunters’, Power Rangers or Will Smith fan sites, Peter Pan, and so forth.

Brandon’s site was just one of tens of thousands. For many, the sites reflect childhood interests and pastimes, or, in some cases, both. Kim’s Pooh place, for example, combined an interest in Winnie the Pooh with online games, prizes, and connectivity. Others just had lots of content: pages by fans of the Backstreet Boys, pages expressing excitement at finishing elementary school and beginning high school, guidelines on how to do magic tricks, stories about their favorite country or historical figures (South Africa and Nelson Mandela in one case, Anne Frank in another), and beyond.

This is what we can learn from exploring networks. But we can also use the text contained within GeoCities to garner substantial insights from the historical material that we are exploring.
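The weighted ‘vote of confidence’ idea behind PageRank, described earlier in this section, can be illustrated with a minimal sketch. It is not the code used for the GeoCities analysis: the page names, the damping factor of 0.85, and the fixed number of iterations are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a small link graph.

    links maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally likely to be visited.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own rank among the pages it links to,
                # so a vote from a highly ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: "hub" is linked to by three pages, "obscure" by none.
# "x" and "y" each receive exactly one incoming link.
toy_links = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "hub": ["x"],
    "obscure": ["y"],
}

ranks = pagerank(toy_links)
for page, value in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page:8s} {value:.4f}")
```

In this toy graph, "x" and "y" have the same number of incoming links, but "x" ends up ranked higher because its single link comes from the well-connected "hub". A raw count of incoming links would miss that difference.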

TOPIC MODELING: FINDING CLUSTERS OF WORDS IN GEOCITIES

In a recent chapter for the UCL Press anthology The Web as History, I explored how we could use various methods to find community within GeoCities. One approach that I used there was topic modeling. This is an approach that finds clusters of words that appear frequently together, or topics (Blei et al., 2003). For example, when we write about our families we use words such as husband, wife, kids, pets, and home. Or when we write about work we use words such as productivity, office, commute, pain, and boss (Jockers, 2011). Latent Dirichlet allocation, a widely used topic-modeling method, uses a sophisticated mathematical algorithm to go through documents and put the words back into the baskets from which they came. A researcher reading e-mails in the future might then see two bags of words – ‘husband, wife, kids’ and ‘office, commute, pain’ – and call them home and work, respectively. Without reading individual e-mails, researchers can gain a sense of what the user wrote about.

We can do something similar for GeoCities neighborhoods. In Table 23.2, I explore the top two topics for a specific subset of neighborhoods.

In general, we are seeing that topics appeared in the neighborhoods that they should have. The data demonstrates that such correlation was not universal, however. The Enchanted Forest remained child focused, due in part to the efforts of engaged community leaders in a context of fears around online child exploitation. The Pentagon expanded beyond its initial aim of connecting widely deployed and constantly moving military members: it became a forum for military history and for activism and political discussion (as seen in the two topics highlighted in the table below). Heartland, a significant GeoCities hub, advanced a particular vision of ‘family’: focused on the Christian faith, domestic issues, and – significantly – genealogy. Topic modeling can only tell us so much, however. Many users want a search engine.

SEARCHING THE HAYSTACKS: A SEARCH ENGINE ON GEOCITIES

One additional experiment that we did was playing around with search engines. On a

Table 23.2 Topics in GeoCities

Neighborhood (with its GeoCities description), followed by the top two topics in each neighborhood

Athens: ‘… based on education, teaching, reading, writing, and philosophy’.
  Topic 1: people things time person sense life man work world human good mind soul make nature body case made point
  Topic 2: part parts goddess witch healing incense witchcraft love energy pagan shaman witches sun spirit protection light circle earth religion

EnchantedForest: ‘A place for and about kids. Games, stories, educational sites, and homepages created by kids themselves’.
  Topic 1: blue page school home day kids clues fun time year room birthday family mom jordan play great party friends
  Topic 2: jq battalion show st jonny horse battery armored lt artillery camp sailor army field col pingu war area quest

Heartland: ‘A family-oriented neighborhood that represents Main Street in cyberspace. This is the place to find parenting, pets, and home town values’.
  Topic 1: people time children book years child information year work make life school person system state world books government good
  Topic 2: family county church home years information st city born state war school mrs history birth records great cemetery death

Pentagon: Military men and women.
  Topic 1: war people president government american world states power state united general military public soviet political clinton america make army
  Topic 2: fort war civil island iran world adams army british history badge rhode german french american forts walther cap newport
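As a rough illustration of how topic lists like those in Table 23.2 can be produced, the following is a minimal sketch of latent Dirichlet allocation fitted by collapsed Gibbs sampling. It is not the tooling used for the GeoCities analysis: the function names, the toy documents (built from the family and work words mentioned in the text), and the values of alpha, beta, and the iteration count are all assumptions for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a tiny LDA model with collapsed Gibbs sampling.

    docs is a list of token lists. Returns (topic_word, doc_topic) count tables.
    alpha, beta, and iterations are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]          # words per topic in each document
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics                        # words assigned to each topic
    assignments = []

    # Start by assigning every word occurrence to a random topic.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            topic = rng.randrange(num_topics)
            doc_assignments.append(topic)
            doc_topic[d][topic] += 1
            topic_word[topic][word] += 1
            topic_total[topic] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the current assignment of this word occurrence.
                topic = assignments[d][i]
                doc_topic[d][topic] -= 1
                topic_word[topic][word] -= 1
                topic_total[topic] -= 1

                # Favor topics already common in this document and
                # topics in which this word is already common.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                topic = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = topic
                doc_topic[d][topic] += 1
                topic_word[topic][word] += 1
                topic_total[topic] += 1

    return topic_word, doc_topic


def top_words(topic_word, n=5):
    """Return the n most frequently assigned words for each topic."""
    return [
        [word for word, count in sorted(counts.items(), key=lambda item: -item[1])[:n]]
        for counts in topic_word
    ]


# Hypothetical toy corpus echoing the family and work example in the text.
toy_docs = [
    "husband wife kids pets home kids home".split(),
    "wife husband home pets kids".split(),
    "productivity office commute pain boss office".split(),
    "boss office commute productivity pain".split(),
]

topic_word, doc_topic = lda_gibbs(toy_docs, num_topics=2)
for number, words in enumerate(top_words(topic_word), start=1):
    print(f"Topic {number}: {' '.join(words)}")
```

The sampler only sees which words co-occur in which documents. Naming a topic ‘home’ or ‘work’ is an interpretive step left to the researcher.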

corpus of this size, keyword searching is hit and miss. We decided that a search engine based on TF-IDF would help users find pages of particular interest. TF-IDF, or term frequency–inverse document frequency, finds what words make a given document (or in this case, page) special. As a quick example, consider a collection of trial transcripts: the word ‘trial’ or even ‘murder’ might not make a document special, but the names of the victim, offender, or location would be things that were unique to it. Accordingly, TF-IDF controls for the fact that some words naturally appear more than others. Our TF-IDF search engine can be seen in Figure 23.4.

Developed by researchers at the University of Waterloo, this search engine allowed us to type in keywords and find the pages that seemed most relevant. Figure 23.4 shows results for a search for ‘Canada’. Each page leads to a thumbnail that can be clicked on to bring you right to the page itself. Searches for ‘high school’ lead to high schools; ‘high school Canada’ leads to Canadian educational sites, and beyond.

I found that this search engine was a useful complement to the above. By finding

Figure 23.4 The TF-IDF search engine.



topics and hubs of interest in the network analysis as well as topic modeling, this led to search topics: ‘awards’ led to a list of awards pages, ‘community leaders’ helped find the community hubs, or ‘Star Wars’ could find those fan pages. But just by starting with the search page, we would not know if it would be worth searching for awards or community hubs. The recommendation I would give would be to begin with network analysis, get a sense of what was in a community, and then to begin to search down by using the search engine itself.

FROM DISTANT TO CLOSE READING

In some respects, what we have seen above is simply steps to get the historian to the right place for them to do their work. The scale of GeoCities can elude meaningful analysis – you cannot read every page, and even if you could, by the time you finished reading them all it would be difficult to make sense of it all – unless it is filtered down to a subset of information for traditional reading. All of the above examples ultimately help us get a sense of the context surrounding individual pages, which are read closely to tell the actual narratives that a historian wants. Learning the topics and overall link contours helps reveal a ‘distant reading’ of a web archive, in the sense articulated by literary scholar Franco Moretti. This form of distant reading involves pulling one’s gaze back to consider the ‘collective system, that should be grasped as such, as a whole’, as a result making the individual pages intelligible for the historian (Moretti, 2007: 3–4). Reading one or two pages could be misleading; reading thousands to see the overall shape of the archive can give us insights into the entire system as well as the individual documents themselves.

At the end of the process, one needs to ensure that they are able to reach the sites themselves: that a historian can read the actual web page (close reading) but also that they understand how they would find it. The methods described above all facilitate reproducible and understandable research methods: using hyperlink analysis to find pages of interest with open-source platforms and well-documented algorithms; exploring the text of pages using well-traveled text-analysis algorithms like TF-IDF; or finding particular facets of the pages such as images in the case of GifCities.

With the 186 million pages within GeoCities, one could find anything to prove any narrative that they wanted to: pages about cats, dogs, particular fashion trends, or developments in web archives. Traditional archives had a selection process, and by virtue of something being kept in a national library or special collections there was a significance test passed; the same is not true with web archives, as pages like GeoCities contain millions of documents that never would have been preserved under traditional appraisal criteria. Indeed, archival theory is an uneasy bedfellow with web archives (Bailey, 2013). In this lies the opportunity to explore the lives of everyday people and their records, but also the challenge of abundance. It also makes it all the more important to understand the context of the documents that a historian is looking at.

Just because it is in the GeoCities web archive does not mean that it is important, as it must be contextualized. Is a website one of a kind? Is it representative? Is it influential? Is it marginal? All of the above questions matter when reading the document. They matter when working with traditional archival documents too, of course, but at the scale of a web archive – and the fact that more information was kept without formal selection criteria – they are even more important. Yet just because it is in GeoCities does not mean, as this chapter has sought to show, that it is not important.
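The TF-IDF weighting that underpins the search engine discussed above can be sketched in a few lines. This is a minimal illustration and not the Waterloo search engine itself: the three toy ‘trial transcripts’, the names in them, and the simple scoring scheme (term frequency multiplied by the logarithm of inverse document frequency) are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score every term in every document.

    docs maps a document name to its list of tokens. A term's score is its
    relative frequency in the document multiplied by log(N / document frequency),
    one common TF-IDF variant (an assumption, not the engine's actual formula).
    """
    total_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document

    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        scores[name] = {
            term: (count / len(tokens)) * math.log(total_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


def search(query, docs):
    """Rank documents by the summed TF-IDF score of the query terms."""
    scores = tf_idf(docs)
    terms = query.lower().split()
    ranked = [
        (sum(doc_scores.get(term, 0.0) for term in terms), name)
        for name, doc_scores in scores.items()
    ]
    return [name for score, name in sorted(ranked, reverse=True) if score > 0]


# Hypothetical transcripts: "trial" appears everywhere, the names do not.
transcripts = {
    "case_a": "the trial of smith for murder in london".split(),
    "case_b": "the trial of jones for murder in york".split(),
    "case_c": "the trial of brown for theft in bristol".split(),
}

weights = tf_idf(transcripts)
print(weights["case_a"]["trial"])   # appears in every transcript, so it carries no weight
print(weights["case_a"]["smith"])   # unique to one transcript, so it stands out
print(search("smith", transcripts))
```

Because ‘trial’ occurs in every transcript, its inverse document frequency is log(1), which is zero, so it contributes nothing to a ranking. A name that occurs in a single transcript is what pulls that transcript to the top.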

CONCLUSIONS

Everyday people, in general, did not leave behind information about their everyday lives. While epistemologically true in the age of web archives, the changing scale of preservation of information about interactions on the Web has changed this in absolute numbers. We are now confronted with millions of sources produced by everyday people in the course of their days: pages about their hobbies, their childhoods, their schools, their favorite sports teams, even the minutiae of daily life. Yet what seems so liberating about this shift – historical abundance – also presents challenges as we begin to experience the problems of abundance.

GeoCities.com bears this out. This chapter has demonstrated that, through sites like GeoCities, we have the ability to find more democratic histories of the past: histories that include the voices of everyday people who just happened to have a website in the late 1990s, and what that means for the historian’s craft. Using GeoCities as a case study, it has also demonstrated one computational approach to studying the past: from leveraging hyperlinks, to exploring topics in text, to text-ranking algorithms like TF-IDF to find the information that a historian might want. By moving between the levels of close and distant reading, this helps us understand the tools and approaches that a historian might need to take. There are many other questions that a historian might ask of GeoCities, and this has just scratched the surface. But it hopefully suggests one path forward.

REFERENCES

Aiden, E., and Michel, J.B. (2013) Uncharted: Big Data as a Lens on Human Culture. New York: Riverhead.
Bailey, J. (2013) ‘Disrespect des Fonds: Rethinking Arrangement and Description in Born-Digital Archives’, Archive Journal (http://www.archivejournal.net/essays/disrespect-des-fonds-rethinking-arrangement-and-description-in-born-digital-archives/).
Barnet, B. (2013) Memory Machines: The Evolution of Hypertext. London: Anthem Press.
Blei, D.M., Ng, A.Y., and Jordan, M.I. (2003) ‘Latent Dirichlet Allocation’, Journal of Machine Learning Research, 3: 993–1022.
Brin, S., and Page, L. (1998) ‘The Anatomy of a Large-Scale Hypertextual Web Search Engine’, paper presented at the Seventh International World-Wide Web Conference, Brisbane, Australia.
Brügger, N. (2013) ‘Historical Network Analysis of the Web’, Social Science Computer Review, 31(3): 306–321.
Gleick, J. (2012) The Information: A History, A Theory, A Flood. New York: Vintage.
Graham, S., Milligan, I., and Weingart, S. (2015) Exploring Big Historical Data: The Historian’s Macroscope. London: Imperial College Press.
Jockers, M.L. (2011) The LDA Buffet is Now Open; or, Latent Dirichlet Allocation for English Majors (http://www.matthewjockers.net/2011/09/29/the-lda-buffet-is-now-open-or-latent-dirichlet-allocation-for-english-majors/). Accessed 18 July 2013.
Kamvar, S., Haveliwala, T., Manning, C., and Golub, G. (2003) ‘Exploiting the Block Structure of the Web for Computing PageRank’ (http://ilpubs.stanford.edu:8090/579/). Accessed 22 June 2018.
Lin, J., Milligan, I., Wiebe, J., and Zhou, A. (2017) ‘Warcbase: Scalable Analytics Infrastructure for Exploring Web Archives’, ACM Journal of Computing and Cultural Heritage, 10(4): 22:2–22:29.
Milligan, I. (2017) ‘Welcome to the Web: The Online Community of GeoCities and the Early Years of the World Wide Web’, in N. Brügger and R. Schroeder (Eds.), The Web as History. London: UCL Press. pp. 137–158.
Moretti, F. (2007) Graphs, Maps, Trees: Abstract Models for Literary History. New York: Verso.
Motavalli, J. (2004) Bamboozled at the Revolution: How Big Media Lost Billions in the Battle for the Internet. New York: Penguin.
Ocamb, K. (2012) ‘David Bohnett: Social Change Through Community Commitment’, Frontiers, 18.
Old Bailey Proceedings Online. (2013) Old Bailey Online – The Proceedings of the Old Bailey, 1674–1913 (http://www.oldbaileyonline.org/). Accessed 22 June 2018.
Ridey, R. (1996) ‘Roger Widey travels under the volcano, and also discovers a Web full of creepy-crawlies’, The Independent, 15.
Rogers, I. (n.d.) ‘Pagerank Explained Correctly with Examples’ (http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm). Accessed 20 April 2016.
Rosenzweig, R. (2003) ‘Scarcity or Abundance? Preserving the Past in a Digital Era’, American Historical Review, 108(3): 735–762.
Scott, J. (2011) ‘The Geocities Torrent: Patched and Posted’ (http://ascii.textfiles.com/archives/3046). Accessed 22 June 2018.
Scott, J. (2009) ‘Datapocalypso’ (http://ascii.textfiles.com/archives/1649). Accessed 13 June 2014.
Turner, F. (2008) From Counterculture to Cyberculture: Stewart Brand, the Whole Earth Network, and the Rise of Digital Utopianism. Chicago: University of Chicago Press.
24
Blogs
Ignacio Siles

The emergence of the blog (originally known as ‘weblog’) constitutes one of the most important developments in the history of the Web. Perhaps more than any other practice, blogging embodied widespread ideas about the potential of the Web for self-expression at the turn of the twenty-first century. By the mid 2000s, blogs were the icon of the ‘Web 2.0’ discourse, the key example that revealed the dynamic nature of the Web, the new kind of business models it entailed, and the challenge the Web posed to social institutions such as the mainstream media (O’Reilly, 2005). More recently, blogs have played a crucial role in helping to conceptualize the emergence and development of other Web technologies. The term ‘microblogging’, typically employed to label technologies such as Tumblr and Twitter, suggests that the relationship between blogs and other Web artifacts is of continuity and refinement. In this sense, the history of the blog encapsulates the history of the Web as a technology for self-performance.

Scholars have historicized blogging in several ways. Some studies have situated blogs within the wide history of practices and technologies that predate them. The main insight from this strand of research is that the blog’s most defining features and use practices can be traced back to multiple sources (Herring et al., 2005; Miller and Shepherd, 2004). Another line of work has investigated how blogs acquired a recognizable set of characteristics at the end of the 1990s (Blood, 2002; Lovink, 2011; Siles, 2012a). Ammann (2009) thus studied the role played by Jorn Barger in the rise of an early group of users. Other scholars have investigated how, after its origins in a close community of practitioners, blogging was widely adopted in a variety of fields. For example, Rosenberg (2009) discussed how blogging developed into political and commercial phenomena. Finally, there has been a growing interest in documenting the specific configurations that the development of blogging has acquired in national settings (Locatelli, 2014; Moe, 2011;

Russell and Echchaibi, 2009; Weltevrede and Helmond, 2012).

This chapter contributes to this body of work by tracing how blogs emerged, stabilized, and developed in the United States from the mid 1990s to the present day. I focus on the United States not only because blogging emerged there but also because of the significance that the country has played in shaping an imaginary around blogging that has been influential in other contexts.1 Although negotiated in important ways, the evolution of blogging in the United States provided Web users and developers in other parts of the world with a framework for making sense of the cultural meaning of this practice (Siles, 2017). This is not to suggest that the international uptake of blogging has been uniform. As Russell and Echchaibi (2009) remind us, blogging ‘is being conceptualized differently in distinct cultural contexts. A blog can be more things that we are presently imagining, a vehicle of democratic expression, yes, but also a means to revive tradition, to explore identity, [or] to conduct public relations’ (2009: 8). The role of blogging in non-liberal democracies or during major political events attests to the importance of this remark.

To account for this historical process, I draw on the theory of ‘articulation’, or the notion that meanings are partially established through the connection between elements with no necessary relation, such as ‘values, feelings, beliefs, practices, structures, organizations, [and] ideologies’ (Slack, 2006: 225). I argue that the identities of blogs in the United States developed as Web users and software developers established links between certain kinds of websites, metaphors, and practices of content creation. This approach allows me to depart from a ‘heroic innovators approach’, that is, the idea that ‘behind every successful innovation in human endeavor is likely a champion, an articulate visionary, an inventor perhaps at the margins of the social institutions of the day’ (Neuman, 2010: 8). Thus, rather than focusing on events and figures, I make my focus the historical processes that these important cases illustrate.

I begin by examining how blogs emerged in the second half of the 1990s. I look at how blogs absorbed the identities of other existing websites and content-creation practices on the Web. The result was the stabilization of blogs as a malleable ‘format’ for sharing an expansive variety of content types.2 Second, I analyze the evolution of blogs in the early years of the new millennium. It was during this period that practitioners in the United States and abroad adopted blogs in a variety of ways and began speaking of sub-types of blogs, as opposed to the generic format that characterized the early days. I also illustrate how users adopted blogs internationally through the example of France. Third, I discuss how technologies created in the second half of the 2000s that sought to replace or extend blogging came to be associated with the notion of ‘microblogging’. The final remarks bring this chapter to a close by discussing recent evolutions in the history of blogs. I argue that blogs are ‘paradigmatic’ in the sense that they shaped the terms for understanding the history of the Web as a technology for self-performance.

My historical analysis draws on a mixed-methods research design conducted in two countries: the United States and France.3 The study on which this chapter is based integrated findings from 105 interviews with Web users, software developers, investors, entrepreneurs, commentators, and analysts, among others (conducted between 2009 and 2014); traditional archival research and Web archival techniques; both content and material analyses of a sample of websites; ethnographic participation in meetings of Web users and developers; and visits to numerous software companies and organizations that have appropriated blogs.

CREATING A FORMAT FOR SELF-PERFORMANCE ON THE WEB

In December 1997, Jorn Barger started Robot Wisdom, a website devoted to sharing
BLOGS 361

annotated hyperlinks to other sites on the Web. This computer programmer referred to his new project as a ‘weblog’ or ‘a daily running log of the best webpages I visit’, as he described it in an online forum one week after its launch (Barger, 1997b). In his first post, Barger reflected on information about gangs in Chicago and linked to another site where his readers could discover ‘a ton of details – names, symbols, alliances – you never see anywhere else’ (Barger, 1997a). Barger was not shy about making predictions about the impact his new online venture would have and invited other Web users to join the inescapable expansion of weblogs. In his words, ‘I suspect that in a year there’ll be hundreds of people maintaining pages like this, and that this will allow good URLs to spread much more quickly’ (Barger, 1997b).

In the following months, several users fulfilled Barger’s prophecy. Throughout 1998, many Web users created their own weblogs, sharing comments and hyperlinks to other online sources. By 1999, blogs caught the attention of the mainstream media. Journalist Scott Rosenberg (1999) maintained that ‘[the] phenomenon known as the weblog is one of the fastest-growing and most fertile creative areas on the Web today’. Press articles like this one illustrate the growth of this online activity in a short period of time. By 1999, users and journalists recognized the blog as a specific type of website with a defining set of features.

How did the blog gain a relatively stable identity as a specific kind of website? To make this happen, users began by creating patterns of similarity between certain existing websites. Throughout 1997 and 1998, several Web users identified sites that seemed to share various characteristics. Male individuals related to the technology development field, such as software producers, computer programmers, and Web designers, created most of these websites. These sites were characterized by features such as relatively short comments on recent news about technology, the Internet, and Web design. They were also full of hyperlinks that pointed to other sources of information online. Bill Humphries, a programmer and early user, recalls, ‘The early mindset of [users] was to find interesting things and link to them. It was description or comment and link’ (interview with author, 2009).

Users posted the most recent information in reverse chronological order to make content easier to read for others. According to Wesley Felter, a software developer and early weblog user, ‘The idea was that someone was coming to read the weblog every day, and they wanted to see what’s new since yesterday, so [we would] put the more interesting or more important links at the top’ (interview with author, 2009). These sites also included a menu that linked the most recent entries to other sections of the website, options for allowing readers to configure the site’s color scheme, automated search functions of keywords, archives of older posts, and mechanisms for sharing a link with the creator of the site.

To conceptualize what they considered to be similar kinds of websites, users employed a variety of metaphors. For example, Jorn Barger highlighted their role in the distillation of the Web’s content: ‘We vacuum the Net for stories that the major outlets haven’t noticed yet, and pass along our sources so we can all get more and more efficient at this vacuuming’ (Barger, 1998). Other users also employed the notion of ‘pre-surfing’ the Web for readers (Graham, 1998).

Another dynamic of articulation that shaped the emergence of blogs was the adoption of a name to identify websites that shared this common set of traits. Throughout 1998, a set of competing names circulated among users. A small group of users adopted Barger’s weblog concept. But this term was only one among other possibilities. Some users, for the most part associated with a community of developers that coalesced around a software named Frontier, had been naming these sites ‘news pages’ since at least 1997. This term described websites created with Frontier, organized in reverse chronological order, and devoted to sharing news, comments,

and hyperlinks to other Web sources mostly about technology issues. Another common name at the time was ‘microportals’, defined by one of its advocates as ‘indy [sic] sites that change all the time [usually] run by one person, or a small group. Most of them belong to presurfers, or people who find links and share the best with others. Other sites are newsfeeds’ (Wallace, cited in Siles, 2017). Finally, other names included ‘filters’ – inspired by an influential website created by a Web developer named Michael Sippey that included a section called The Filter (later named Filtered for Purity) – or simply ‘personal websites’ or ‘homepages’.

The weblog became the standard name mostly because influential users (such as Barger and others) employed it as an articulatory concept to link existing approaches to Web publishing. By the end of 1998 and beginning of 1999, many users began referring to their websites as weblogs. In February 1999, Pete Prodoehl (1999), an early Frontier user, wrote: ‘I didn’t even know this site was a “weblog,” but now I do ….’ Similarly, in May 1999, John SJ Anderson (1999), a molecular biologist who created a site in 1998, announced: ‘I’ve decided: GeneHack is now a web log, as opposed to the mess o’ links that I previously thought of it as’. In addition to a common set of traits, this group of websites was now also linked by a common name. This name got shortened after one user, Peter Merholz, announced early in 1999 his decision to pronounce it as ‘wee-blog’ or ‘blog’.

Throughout 1999, users implemented different means to communicate with other practitioners. Early references to a small set of ‘sites like mine’ evolved into recurrent allusions to a ‘weblog community’. Certain users played a crucial role in the formation of this sense of community. According to Ammann (2009), Jorn Barger ‘shaped the concept of the weblog community […] by setting a prolific and inspiring example in his Robot Wisdom Weblog and by meticulously crediting his fellow practitioners in its pages’ (2009: 284). Online forums were also a key dynamic of community building. In July 1999, Web developer Matt Haughey launched MetaFilter, a ‘community weblog’, that is, a blog updated by multiple users. According to Haughey, MetaFilter sought to crystallize the sense of community arising from interactions between early weblog users (interview with author, 2011). By early 2000, this site had become a central site of discussion for many users. In August 1999, Barger also created a mailing list for weblog users that was established as an important mechanism to build a sense of community between them.

The constitution of this community involved a parallel process of identity formation by which members of this group conceived of themselves as representative figures of a particular type of Internet user, expressed by concepts such as the Web’s ‘pre-surfer’ or the ‘weblogger’ (Siles, 2012b). Establishing a distinction between ‘webloggers’ and ‘online diarists’ allowed the former to further define themselves as a community. As participants of the technology and Internet development fields, early bloggers knew a variety of programming languages and technical skills. Users envisioned these skills as a marker of identity and a source of differentiation with respect to other users who coded their sites in HTML (most notably online diarists). This sense of community led to various efforts to meet in person. A memorable gathering of early bloggers took place in March 2000, at the South by Southwest conference. During this event, many users met each other in person for the first time and discussed in roundtables and informal conversations the possibilities and limitations of blogs.

In the final years of the 1990s, more users began appropriating blogs for a variety of purposes (Siles, 2011). A key factor in making this possible was the emergence of automated software. Unlike Frontier, which was ‘an integrated development environment for building and managing […] websites [devised for] a webmaster or web developer with experience in system-level scripting’
(Winer, 1996), these new tools were specifically designed for a wide audience interested in automating the process of publishing a website. Andrew Smales released software for blogging called Pitas in July 1999 and, four months later, Diaryland for online diarists. Paul Kedrosky, a business school professor, launched Groksoup in August 1999. In San Francisco, a company named Pyra launched an automated Web application called Blogger in August 1999.

As users kept creating more and different content and software programs standardized these sites’ features, a notion gained traction that defined blogs as a ‘format’ suited ‘for publishing all kinds of information on the Web’, as Evan Williams, one of the developers behind Blogger, described it (from an interview in Turnbull, 2002: 83). Users and software developers utilized the notion of ‘format’ to suggest that blogs were a content-agnostic medium (Siles, 2011). Meg Hourihan (2002), a co-creator of Blogger, suggested, ‘What we write about does not define us as bloggers; it’s how we write about it (frequently, ad nauseam, peppered with links)’. In this way, users shifted the focus of blogging from a singular end to the necessary means for creating, storing, and sharing various content online. ‘Format’ also tied the blog to the notion of genre, a particular arrangement of technological features crystallized in standardized software templates. Finally, ‘format’ designated the size of the blog as a publication outlet, which users and software developers described as being in opposition to the ‘page paradigm’. Blogs, they argued, revolved around the ‘post’, a unique, smaller unit for producing meaning. In this way, by the end of the 1990s, blogs had gained a new identity as a malleable means for publishing on the Web.

THE PROLIFERATION OF BLOGGING

In the early years of the new millennium, the identities of blogs proliferated, that is, they multiplied quantitatively and transformed qualitatively. The generic concept of the blog format that characterized the early days was gradually replaced by the notion that blogging had a multiplicity of sub-types: the ‘news blog’, the ‘political blog’, the ‘video blog’, the ‘gadget blog’, the ‘fashion blog’… The identity of blogging was thus re-articulated: practitioners established a set of different connections between the blog format and other practices, metaphors, and meanings. Several factors worked as conditions of possibility for these re-articulations: the availability of software programs that afforded new, specific content-creation practices; transformations in the daily life of actors; and larger economic, political, and cultural processes. In particular, neoliberalization infused these re-articulations as it provided a framework for making sense of the purpose and nature of blogging. Users and software developers thus re-articulated blogs as a response to these factors.4

By making these factors visible, it becomes easier to understand how and why some of the most popular kinds of blogging developed. Thus, although in this chapter I focus particularly on the examples of ‘news blogging’ and ‘political blogging’, the goal is to provide an account of the process through which multiple blogging identities emerged. I concentrate on these cases because of how they captured public imagination in the early years of the 2000s in the United States. I also discuss briefly the case of France as an example of how the evolution of blogging acquired an international flavor.

The proliferation of blogging in the first years of the new millennium built on the availability of automated software that envisioned blogs as a ‘blank canvas’, as software developer Meg Hourihan described them (interview with author, 2009). In contrast to early popular applications, such as Blogger, these new programs required an installation on the user’s server. Noah Grey thus described the creation of a software program named Greymatter in late 2000: ‘I simply wanted to
give myself more control and organization. […] I couldn’t find another tool to do what I wanted to do, so I wrote my own’ (interview with author, 2012). In a similar manner, developer Mena Trott (n.d.) narrated the design of a popular software program named Movable Type as a tale of experimentation.

For users, the availability of novel software programs created fertile ground for new content-creation practices that significantly shaped the identities of blogs. For example, these tools naturalized the post with several paragraphs as an important part of blogging. According to Tom Coates, a key figure in the development of blogging in the UK:

When it started, [Blogger] was a box you could type stuff into and press the button. And then Movable Type came on. And the first implication […] was that people felt much more compelled to write a lot of things. It’s like, ‘I’ve got to write; I’ve got a whole page just sitting there!’ I certainly felt the pressure to write more and more intelligently. (interview with author, 2011)

In addition to the transformation of software, the context of blogging changed at the turn of the century. The early days of blogging had been a fruitful period for self-exploration (Siles, 2011). But the political moment changed in the early years of the new millennium in ways that made it necessary for users to share their voice online. A key example of this was the appropriation of blogs as a mechanism to intervene in public life and make the voices of citizens heard in society (a practice originally known as ‘warblogging’ and then ‘political blogging’) (Welch, 2002). The context of this phenomenon was the aftermath of 9/11 and the US invasion of Afghanistan. To give an opinion about these events became pressing for people who felt they had no place to go other than the Web. According to Jerome Armstrong, an activist who created a website named MyDD (My Due Diligence) in 2001, and Markos (‘Kos’) Moulitsas Zúniga, a journalist and political scientist who built his own site (Daily Kos) in 2002, ‘Both of us started our blogs because we wanted a voice in our nation’s politics’ (2006: vi–vii). Seen in this way, blogging was a response to significant events in the life of actors and larger cultural, political, and social processes.

According to these users, the political and economic climate of the early 2000s made it imperative to transform the public sphere outside of established power centers, notably the mainstream media. This required a capacity to translate the analysis of current events into an opinion and perspective that could contribute to democratic life (Walton, 2004). The privileged notion to conceptualize the product of these practices was the ‘blogosphere’. It is possible to trace this notion back to 1999, when Brad Graham, an avid early user, deployed it half-jokingly to characterize the sense of community that was forming among blogging practitioners. In the early years of the 2000s, it was used in all seriousness to suggest that blogging embodied a possibility to reach the Habermasian ideal of the public sphere, that is, a symbolic space where citizens could meet to deliberate rationally about public affairs (Habermas, 1989). In the case of blogs, this was interpreted as the formation of a network of websites exchanging opinions and news to monitor the state and change politics from the bottom up.

Against the background of a crisis of legitimacy experienced by traditional institutions and organizations such as the state, the mainstream media, political parties, and large enterprises, actors from these organizations began exploring the world of blogging in the hope of reconnecting with citizens. Already in 1999, James Romenesko, a journalist who covered technology news at the St. Paul Pioneer Press, had created a website to aggregate news called Media Gossip. ‘My goal’, he recalls, ‘was to try to find things, journalism items, that most readers would not find on their own, and that meant going to some more obscure publications and media criticisms’ (interview with author, 2009). In a similar manner, technology journalist Dan Gillmor started a column in October 1999 as
a weblog at the San Jose Mercury News. He appropriated blogs to expand the content of his newspaper columns. Based on the experiences of these precursors, early in the new century newspapers began blogging as they sought to bring transparency to their inner workings and thus provide a solution to the crisis of legitimacy (Nielsen, 2012). These efforts came to be known as ‘news blogs’. By the mid 2000s, news organizations had gradually institutionalized these practices and expectations.

The invention of ‘news blogs’ and ‘political blogs’ posed a singular challenge: to generate a steady income to turn blogging into a full-time activity. During the early years of the new millennium, users began experimenting with ways to generate this income, notably by inviting readers to donate to their sites and implementing various advertising regimes. Underlying these efforts was a transformation of the identity of blogging. Rather than being hailed as the ‘soapbox’ of the ‘John and Jane Does of the Net’ (Yahoo! Internet Life, 2002: 57–8), as an outlet described them at the turn of the century, blogs were re-articulated as a commodity. This, in turn, required re-imagining the blogger as the digital avatar of the neoliberal entrepreneur.

Henry Copeland, a journalist and consultant, emphasized the need for a new economic model to highlight the entrepreneurial dimension of blogging. He defined it as ‘blogonomics’: ‘a model of economic and informational collaboration through cross linking that creates ad hoc networks and communities’ (interview with author, 2012). Thus, according to Copeland (2002), bloggers were better described as ‘idea entrepreneurs, living in a clickocracy, risking their time and passion on writing’. Such a conception would ‘enable hundreds of thousands of new idea entrepreneurs to carve out local, ideological or conceptual niches and make a living’. Projects to turn blogs into profitable ventures and enterprises, such as Gawker (2003), The Huffington Post (2005), and Weblogs, Inc. (2003), were outcomes of this larger process of commodification. In this way, blogs became ‘enterprise solutions […] [and] empires of nano-publishing’, as UK blogger Tom Coates (2003) put it.

To be sure, the proliferation of blogging was not exclusive to the United States (Russell and Echchaibi, 2009). The case of France illustrates how certain notions about blogging coalesced into an imaginary that acquired particular flavors worldwide (Siles, 2017). As in the United States, the blog absorbed the identities of early websites in France, most notably the page perso (short for personal page), the quintessential embodiment of self-performance on the 1990s French Web. This was a generic term used to describe websites where users discussed their passions and interests. By mid-2001, some users were debating intensively about the success of blogs in the United States and how it could be replicated in France. A computer scientist named Stéphane Gigandet thus reflected: ‘Weblogs may be a triumph in the United States, but in the French-speaking world, they are in their infancy’ (Gigandet, 2001a). The following day, Gigandet (2001b) published an essay to argue for the need to Frenchify blogs: in his view, they represented an opportunity to revive self-performance practices that had given life to sites like pages persos.5

It is difficult to overstate the excitement that the discovery of blogs caused among Web users in France over the following years. This was not lost on the international press. The New York Times noted in 2006, ‘Already famed for angry labor strikes and philosophical debates in smoke-filled cafés, the French have now brought these passions online to become some of the world’s most intensive bloggers. The French distinguish themselves, both statistically and anecdotally, ahead of Germans, Britons and even Americans in their obsession with blogs’ (Crampton, 2006).

By the mid 2000s, ideas about the political potential of blogging, largely advanced in the United States, gained wide notoriety
in France. These ideas emphasized the possibilities afforded by blogging to shape the public sphere outside of traditional power centers. Early users appropriated the American imaginary around blogging as a solution to a very French problem: the proximity between journalists in the mainstream media and the political elite. ‘Just like the 2004 campaign in the United States’, wrote Pô and Vanbremeersch (2007), ‘the 2007 campaign in France signal[ed] the rise and structuration of [a] new political space’ (2007: 155). This new political space, users argued, could now be occupied by ‘ordinary’ citizens. As in the United States, although at a slower pace, the appropriation of blogs in France was also characterized by commodification and neoliberalization dynamics. As the decade progressed, actors in France argued more explicitly for defining the blogger as an entrepreneur and defining the blogosphere as a marketplace of attention (Le Meur and Beauvais, 2005).

Rosenberg (2009) argues that, since the early 2000s, ‘blogging has fragmented […]: there are not just craft bloggers, but whole cadres of knitting bloggers and weaving bloggers; not just foodie bloggers, but beer bloggers and sushi bloggers. […] Each subculture has its own norms of behavior’ (2009: 263–4). It is these blogging ‘subcultures’ that have captured public attention since the mid 2000s. For example, referring to one specific ‘fashion blog’, a writer in The New York Times stated in 2017: ‘It paved the way for the fast fashion news cycle, creating an appetite for trade sites […] [and] catwalk images […]. For better or for worse, it was instrumental in the democratization of fashion as we know it’ (Taylor, 2017).

Underlying this evolution of blogging into a matter of ‘subcultures’, ‘niches’, or ‘communities’ are the factors outlined above: the availability of software to enact specific blog sub-types; transformations in the daily life of actors; and larger economic, political, and cultural processes. The promise of combining the cultural force of blogging and an entrepreneurial spirit – borrowing the expression from Boltanski and Chiapello (1999) – is the drive behind most sub-types of blogging that have emerged since the mid 2000s: ‘mommy blogs’, ‘video blogs’, and ‘fashion blogs’, to name a few. These phenomena need to be situated within a context marked by the prevalence of a cultural discourse – or a subject position – that compels users to think of themselves as neoliberal entrepreneurs who forge market transactions with others online (Duffy and Hund, 2015).

THE INVENTION OF ‘MICROBLOGGING’

By the mid 2000s, developers built a set of technologies that came to be defined as ‘social network sites’ and ‘microblogging’ tools. Although instances of these kinds of applications can be found from the initial days of Web-based automated software, it wasn’t until the mid to late 2000s that these services (and notions) stabilized.

These tools sought to both capture early blogging practices and enable new conceptions of what it meant to be in public online. In the months that followed the creation of tools such as Twitter and Tumblr, a specific concept emerged to refer to these applications and link (or articulate) them together: ‘microblogging’ (Waters and Nuttall, 2007). Before 2007, the term ‘microblog’ was occasionally used as a synonym of ‘niche blog’, a site devoted to discussing a specific topic (Rowan, 2003). Commentators also employed the notion of ‘miniblog’ to describe blogs about specific topics that were incorporated within larger websites (Faler, 2005). But as artifacts such as tumblelogs, Jaiku, and Twitter further stabilized by early 2007, the term ‘microblogging’ acquired a specific definition.

The notion of ‘microblogging’ played an articulatory role in several important ways. It enabled its proponents to bring together a
group of technologies that had no necessary relation.6 The possibility of publishing short-form content functioned as the glue that tied these technologies together. As a concept, ‘microblogging’ also allowed advocates to link the blogging imaginary and this group of technologies. In this sense, the term ‘microblogging’ worked to suggest that blogging had been ‘remediated’ (Bolter and Grusin, 1999). David Karp, the creator of Tumblr, defined both Tumblr and Twitter as ‘blogging that favors short-form data’ (from an interview in Gwinn, 2007). Evan Williams, a co-developer of both Blogger and Twitter, argued that these tools were linked by an ontological property of the Web. He claimed: ‘[Twitter is] like we’ll take blogging, we’ll take out all these features and we’ll limit the size of the post and that will be a whole thing. […] [It] is like a molecule that is everywhere’ (cited in Moggridge, 2010: 278). In this view, ‘microblogging’ technologies allowed blogging to be re-imagined in new ways.

Underlying these assertions was the belief that blogging required a major transformation to maintain its relevance in a shifting Web ecology. A post called ‘The Huffington Post is Not a Blog’, written by Jorn Barger – the programmer who coined the term ‘weblog’ in 1997 – aptly captures the dissatisfaction of some users with the commodification of blogs. Barger (2005) maintained: ‘[I] was distressed to discover that the original intent of the expression “web logging” (to log your websurfing with public annotations) has gone entirely by the boards’. Specifically, Barger criticized the notions of ‘niche blog’ and ‘blog aggregators’ illustrated by blog networks such as The Huffington Post.

For other users, blogs had standardized in unproductive ways and, at the center of this standardization, were unnecessarily long posts and certain technological features made to enable such a form of writing. In the words of Tumblr’s creator:

The blogosphere […] had matured to a place where it was really designed for editorial publishing. Everything after Movable Type had a title, several paragraphs of text, a footer, a permalink page with comments […] but something that I really wanted out of my website, out of my blog, was much more freeform. (From an interview in Schonfeld, 2011)

In his view, the solution to this problem came in the form of the short post.

The invention of ‘microblogging’ thus rested on the belief in the need to simplify self-performance on the Web. This drive towards simplicity must be interpreted as an expression of a wider cultural concern with complexity (Maeda, 2006). By simplicity, software developers typically referred to the size of posts, the speed at which they could be made available to others, and minimalism in technological design. In short, simplicity meant reducing the number of technological features to the minimum required to enable social interaction (Siles, 2013). According to Evan Henshaw-Plath, one of the original developers of Twitter, during the creation of the tool ‘there was a discussion about what you could strip away. […] What [was] the simplest thing that [could] possibly work as opposed to what [was] featured complete?’ (interview with author, 2011). These conceptions of simplicity found cultural resonance. In the media, ‘microblogging’ technologies were typically associated with notions of speed, simplicity, originality, purity, brevity, and minimalism (Glaser, 2007; Gwinn, 2007).

As the 2000s came to an end, the ‘microblogging’ notion reached the status of a keyword. For Williams (1983), keywords ‘are significant, indicative words in certain forms of thought [that] bound together certain ways of seeing culture and society’ (1983: 15). As such, the concept functioned to fill what seemed like a semantic void created by the availability of software programs that extended the history of blogs through the prism of simplicity. However, the ‘microblogging’ keyword was not accepted unanimously. Some blatantly rejected it and the assumptions on which the term relied. These critics contended that ‘microblogging’ was a vague and imprecise
word with no major conceptual value that tended to freeze what were fluid and dynamic processes of technological design and appropriation (Selvitelle, 2008).

A HISTORY OF NOW

The rise of ‘microblogging’ and ‘social network sites’ technologies has led to a reassessment of the place of blogging in the contemporary Web ecology. Far from a linear evolution, this can be conceptualized as a site of struggle. From this perspective, reconsiderations of blogging can be situated within a spectrum of positions that range from those who construe the rise of ‘microblogging’ as a threat to those who envision it instead as a spring of new possibilities for blogging and the Web.

For some, the rise of ‘microblogging’ technologies has signaled the imminence of the blog’s demise. In this view, the gradual commodification and standardization of blogging killed the merits of this practice. Nicholas Carr (2008), for example, argued that blogs had ‘become mainstream’ and thus ‘lost much of their original personality’. Instead, these commentators emphasize the virtues of simplicity and the technologies that enable it. Wired’s Boutin (2008) thus asked: ‘[W]hy bother [blogging]? The time it takes to craft sharp, witty blog prose is better spent expressing yourself on Flickr, Facebook, or Twitter’.

An alternative position in the struggle for establishing the blog’s contemporary meaning comes from users and developers, who argue for hybrids between blogging and ‘microblogging’. In their view, the materiality of software programs makes this combination come to fruition in productive ways. Thus, blogging is not about to perish because of ‘microblogging’ but can be reinvigorated instead by integrating ‘microblogging’ into its defining set of technologies and practices.

Finally, some actors justify the importance of blogging by emphasizing its renewed singularity in the Web ecology. This view posits fundamental differences between blogging and ‘microblogging’ – as both technologies and practices – but argues for reconciling them rather than keeping them apart. Blogging and ‘microblogging’ are defined neither in terms of competition nor hybridity, but rather as forming a symbiotic relationship: they supplement each other through their alternative strengths. According to WordPress’s co-creator, Matt Mullenweg:

Blogging […] is the natural evolution of the lighter publishing methods – at some point you’ll have more to say than fits in 140 characters, is too important to put in Facebook’s generic chrome, or you’ve matured to the point you want more flexibility and control around your words and ideas. […] You don’t stop using the lighter method, you just complement it – different mediums afford different messages. (Mullenweg, 2011)

For Mullenweg, then, to write long posts remains fundamental for self-elaboration dynamics in that it allows the short post to be expanded in more substantial ways. Blogging, as it were, is back (or never went away).

The processes and trajectories discussed in this chapter reveal the centrality of the blog in defining the history of the Web as a technology for self-performance. Put differently, the history of the blog is the history of how and why we have used the Web repeatedly to make ourselves ‘public’ to others. From this perspective, blogs may be defined as a paradigmatic technology in Agamben’s (2009) sense, that is, they are the case that makes intelligible singular aspects of a whole set of artifacts (such as sites and tools for self-elaboration online). According to Agamben, ‘paradigms establish a broader problematic context that they both constitute and make intelligible’ (2009: 17). As a paradigmatic technology, the history of the blog is also the history of how its most defining features have become standard affordances of the new media ecology. Rather than technological
necessities, these features are the result of specific historical contexts that need to be carefully interrogated.

Notes

1  In their analysis of the international uptake of blogging, Russell and Echchaibi (2009) also speak of an ‘American blogging model’. Referring to the case of Italy, Locatelli (2014: 50–1) argues that ‘the U.S. model remains a point of reference’ and suggests that the ‘imitation of the American experience’ was a key factor driving the early appropriation of blogs. See also Siles (2017) for a comparison of the trajectories of blogging in the United States and France.
2  In the vocabulary of Science and Technology Studies (STS), stabilization refers to the process through which groups of interest actors negotiate the identity and meaning of an artifact (Bijker et al., 1987).
3  The project from which this chapter stems began as doctoral dissertation research.
4  Drawing on both genre and medium theories, Miller and Shepherd (2009) provide a similar explanation. As an alternative, and inspired mostly by STS literature, I conceptualize the history of blogging as a mutual shaping process in which the technology of blogging acquired certain configurations (depending on the specific sub-type of blogs that users enacted to respond to their changing contexts), and cultural expectations have found recurrently in blogging a material expression.
5  In a similar manner, Locatelli (2014) refers to the ‘cultural adaptation’ to which blogs were subjected in Italy. Siles (2007–2008) also shows that blogs absorbed the identities of previous sites in Costa Rica and that localizing them (through names that referred to typical Costa Rican expressions) was crucial in their early development.
6  In this sense, the ‘microblogging’ notion functioned as a crucial link between what Flichy (2007: 82) calls a frame of functioning – ‘the body of knowledge and know-how mobilized or mobilizable’ in the design of media technologies such as tumblelogs and Twitter – and a frame of use – the ‘social activities proposed by the technology, the integrated routines of daily life, sets of social practices, kinds of people, places and situations connected to the technical artifact’ (Flichy, 2007: 83), in this case, blogs. Flichy argues that it is precisely through the articulation of these two frames that technologies acquire social meaning.

REFERENCES

Agamben, G. (2009) The Signature of all Things: On Method. New York: Zone Books.
Ammann, R. (2009) ‘Jorn Barger, the NewsPage network, and the emergence of the weblog community’, paper presented at the 20th Association for Computing Machinery Conference on Hypertext and Hypermedia, Italy.
Anderson, J.S. (1999) No title, Genehack (http://genehack.org/1999/05/). Accessed June 18, 2018.
Armstrong, J., and Moulitsas Zúniga, M. (2006) Crashing the Gate: Netroots, Grassroots, and the Rise of People-Powered Politics. White River Junction, VT: Chelsea Green Publishing.
Barger, J. (1997a) ‘Robot Wisdom Weblog’, RobotWisdom (http://web.archive.org/web/20000817183237/www.robotwisdom.com/log1997m12.html). Accessed June 18, 2018.
Barger, J. (1997b) ‘“Weblogs” are the best format for hotlists’, comp.infosystems.www.announce (http://groups.google.com/group/comp.infosystems.www.announce/browse_frm/thread/7de977b747c34d3e/4af6ac27dad974fd?pli=1). Accessed June 18, 2018.
Barger, J. (1998) ‘My weblog press release’, rec.arts.books (http://groups.google.com/group/rec.arts.books/browse_thread/thread/a01e262d08703942). Accessed June 18, 2018.
Barger, J. (2005) ‘The Huffington Post is not a blog’, Robot Wisdom Auxiliary (http://robotwisdom2.blogspot.com/2005/12/huffington-post-is-not-blog.html). Accessed June 18, 2018.
Bijker, W.E., Hughes, T.P., and Pinch, T.J. (eds.) (1987) The Social Construction of Technological Systems: New Directions in the Sociology and History of Technology. Cambridge, MA: MIT Press.
Blood, R. (2002) ‘Weblogs: A history and perspective’, in J. Rodzvilla (ed.), We’ve Got Blog: How Weblogs are Changing our Culture. Cambridge, MA: Perseus. pp. 7–16.
Boltanski, L., and Chiapello, È. (1999) Le Nouvel Esprit du Capitalisme. Paris: Gallimard.
Bolter, J.D., and Grusin, R. (1999) Remediation. Cambridge, MA: MIT Press.
Boutin, P. (2008) ‘Twitter, Flickr, Facebook make blogs look so 2004’, Wired (http://www.wired.com/entertainment/theweb/magazine/16-11/st_essay). Accessed June 18, 2018.
Carr, N. (2008) ‘Who killed the blogosphere?’, Roughtype.com (http://www.roughtype.com/?s=who+killed). Accessed June 18, 2018.
Coates, T. (2003) ‘(Weblogs and) the mass amateurisation of (nearly) everything…’, PlasticBag (http://www.plasticbag.org/archives/2003/09/weblogs_and_the_mass_amateurisation_of_nearly_everything/). Accessed June 18, 2018.
Copeland, H. (2002) ‘Blogonomics: Making a living from blogging’, Pressflex.com (http://web.archive.org/web/20020802074241/http://www.pressflex.com/news/fullstory.php/aid/54/Blogonomics:_making_a_living_from_blogging.html). Accessed June 18, 2018.
Crampton, T. (2006, July 27) ‘France’s mysterious embrace of blogs’, NYTimes.com (http://www.nytimes.com/2006/07/27/technology/27iht-blogs.2314926.html). Accessed June 18, 2018.
Duffy, B.E., and Hund, E. (2015) ‘“Having it all” on social media: Entrepreneurial femininity and self-branding among fashion bloggers’, Social Media + Society, July-December: 1–11.
Faler, B. (2005) ‘A Capitol Hill presence in the blogosphere’, The Washington Post. p. A15.
Flichy, P. (2007) Understanding Technological Innovation: A Socio-Technical Approach. Cheltenham, UK: Edward Elgar.
Gigandet, S. (2001a) ‘Le triomphe des weblogs’, C-est-tout.com (http://web.archive.org/web/20010716062349/http://c-est-tout.com/infos/info_883.shtml). Accessed June 18, 2018.
Gigandet, S. (2001b) ‘Les balbutiements des jouebs’, C-est-tout.com (http://web.archive.org/web/20010803181835/http://c-est-tout.com/infos/info_886.shtml). Accessed June 18, 2018.
http://www.bradlands.com:80/archive/arc_120198.html). Accessed June 18, 2018.
Gwinn, E. (2007) ‘World gets 21st century totem poles’, Chicago Tribune (http://articles.chicagotribune.com/2007-03-20/features/0703200287_1_blog-twitter-messaging)
Habermas, J. (1989) The Structural Transformation of the Public Sphere: An Inquiry into a Category of Bourgeois Society. Cambridge: Polity Press.
Herring, S.C., Scheidt, L.A., Wright, E., and Bonus, S. (2005) ‘Weblogs as a bridging genre’, Information, Technology & People, 18(2): 142–71.
Hourihan, M. (2002) ‘What we’re doing when we blog’, O’Reilly Networks’ Web DevCenter (http://www.oreillynet.com/pub/a/javascript/2002/06/13/megnut.html). Accessed June 18, 2018.
Le Meur, L., and Beauvais, L. (2005) Blogs pour les Pros. Paris: Dunod.
Locatelli, E. (2014) The Blog Up! Storia Sociale del Blog in Italia. Milano: FrancoAngeli.
Lovink, G. (2011) My First Recession: Critical Internet Culture in Transition. Amsterdam: Institute of Network Cultures.
Maeda, J. (2006) The Laws of Simplicity (Simplicity: Design, Technology, Business, Life). Cambridge, MA: MIT Press.
Miller, C.R., and Shepherd, D. (2004) ‘Blogging as social action: A genre analysis of the weblog’, in L. Gurak, S. Antonijevic, L.A. Johnson, C. Ratliff, and J. Reyman (eds.), Into the Blogosphere: Rhetoric, Community, and Culture of Weblogs (http://blog.lib.umn.edu/blogosphere/blogging_as_social_action_a_genre_analysis_of_the_weblog.html)
Miller, C.R., and Shepherd, D. (2009) ‘Questions for genre theory from the blogosphere’, in J. Giltrow and D. Stein (eds.), Genres in the Internet: Issues in the Theory of Genre. Amsterdam: John Benjamins. pp. 263–90.
Moe, H. (2011) ‘Mapping the Norwegian blogosphere: Methodological challenges in
Glaser, M. (2007) ‘Twitter founders thrive on internationalizing Internet research’, Social
micro-blogging constraints’, PBS (http:// Science Computer Review, 29(3): 313–26.
mediashift.org/2007/05/twitter-founders- Moggridge, B. (2010) Designing Media. Cam-
thrive-on-micro-blogging-constraints137/). bridge, MA: MIT Press.
Accessed June 18, 2018. Mullenweg, M. (2011) ‘Blogging drift’, Ma.tt
Graham, B. (1998) No title, BradLands (https:// (http://ma.tt/2011/02/blogging-drift/).
web.archive.org/web/19991018212708/ Accessed June 18, 2018.
Neuman, W.R. (2010) ‘Theories of media evolution’, in W. Russell Neuman (ed.), Media, Technology, and Society: Theories of Media Evolution. Ann Arbor: University of Michigan Press. pp. 1–21.
Nielsen, R.K. (2012) ‘How newspapers began to blog’, Information, Communication & Society, 15(6): 959–78.
O’Reilly, T. (2005) ‘What is Web 2.0. Design patterns and business models for the next generation of software’ (http://oreilly.com/web2/archive/what-is-web-20.html). Accessed June 18, 2018.
Pô, J.-D., and Vanbremeersch, N. (2007) ‘La campagne électorale de 2007 et le débat politique en ligne’, Commentaire, 30(117): 147–55.
Prodoehl, P. (1999) No title, Rasterweb (http://rasterweb.net/raster/199902.html)
Rosenberg, S. (1999) ‘Fear of links’, Salon (https://www.salon.com/1999/05/28/weblogs/). Accessed June 18, 2018.
Rosenberg, S. (2009) Say Everything: How Blogging Began, What It’s Becoming, and Why It Matters. New York: Crown.
Rowan, D. (2003) ‘Technobabble’, The Times. p. 21.
Russell, A., and Echchaibi, N. (2009) International Blogging: Identity, Politics, and Networked Publics. New York: Peter Lang.
Schonfeld, E. (2011) ‘Why David Karp started Tumblr: Blogs don’t work for most people’, TechCrunch.com (http://techcrunch.com/2011/02/21/founder-stories-why-david-karp-started-tumblr-blogs-dont-work-for-most-people/). Accessed June 18, 2018.
Selvitelle, B. (2008) ‘Why I loathe the word “microblogging”’, life.i.think (http://lukewarmtapioca.wordpress.com/2008/06/09/why-i-loathe-the-word-microblogging/). Accessed June 18, 2018.
Siles, I. (2007–2008) ‘“Blogueando” a la tica: Una mirada al uso de los blogs en Costa Rica’, Anuario de Estudios Centroamericanos, 33–34: 325–57.
Siles, I. (2011) ‘From online filter to Web format: Articulating materiality and meaning in the early history of blogs’, Social Studies of Science, 41(5): 737–58.
Siles, I. (2012a) ‘The rise of blogging: Articulation as a dynamic of technological stabilization’, New Media & Society, 14(5): 781–97.
Siles, I. (2012b) ‘Web technologies of the self: The arising of the “blogger” identity’, Journal of Computer-Mediated Communication, 17(4): 408–21.
Siles, I. (2013) ‘Inventing Twitter: An iterative approach to new media development’, International Journal of Communication, 7: 2105–27.
Siles, I. (2017) Networked Selves: Trajectories of Blogging in the United States and France. New York: Peter Lang.
Slack, J.D. (2006) ‘Communication as articulation’, in G.J. Shepherd, J. St. John, and T. Striphas (eds.), Communication as…: Perspectives on Theory. London: Sage. pp. 223–31.
Taylor, T. (2017) ‘Where fashion blogging began’, The New York Times (https://www.nytimes.com/2017/02/01/fashion/fashin-where-fashion-blogging-began.html?_r=0)
Trott, M. (n.d.) ‘The beginning’, Six Apart/About (http://archive.li/7bw4C). Accessed June 18, 2018.
Turnbull, G. (2002) ‘The state of the blog part 2: Blogger present’, in J. Rodzvilla (ed.), We’ve got blog: How weblogs are changing our culture. Cambridge, MA: Perseus. pp. 81–5.
Walton, M. (2004) ‘Bloggers get convention credentials’, CNN.com (http://edition.cnn.com/2004/TECH/internet/07/23/convention-bloggers/). Accessed June 18, 2018.
Waters, R., and Nuttall, C. (2007) ‘Mini-blog is the talk of Silicon Valley’, Financial Times (http://www.ft.com/cms/s/2/d0ccbc46-daf7-11db-ba4d-000b5df10621.html-ixzz -1I04qzm6t). Accessed June 18, 2018.
Welch, M. (2002) ‘Don’t miss the Lileks response!’, MattWelch (https://web.archive.org/web/20020405100057/http://mattwelch.com:80/warblog.html). Accessed June 18, 2018.
Weltevrede, E., and Helmond, A. (2012) ‘Where do bloggers blog? Platform transitions within the historical Dutch blogosphere’, First Monday, 17(2) (http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/3775/3142). Accessed June 18, 2018.
Williams, R. (1983) Keywords: A Vocabulary of Culture and Society. New York: Oxford University Press.
Winer, D. (1996) ‘What is Frontier?’, Scripting News (http://scripting.com/frontier/beginning/whatFrontierIs.html). Accessed June 18, 2018.
Yahoo! Internet Life (2002) Top of the Net 2001, 8: 56–8.
25
The History of Online Social Media
Christina Ortner, Philip Sinner and Tanja Jadin

INTRODUCTION

In our changing world, terms such as globalization, individualization, digitalization or information society have become popular buzzwords. All these concepts refer to a transformation process where technological innovations, changing lifestyles, new work and life patterns and emerging needs are inextricably linked (Carpentier et al., 2014). In communication and media studies, this fundamental change is primarily discussed as the ‘meta process’ (Krotz, 2014: 137, emphasis in original) of mediatization (for discussion see Livingstone, 2009; Lunt and Livingstone, 2016: 462), which is shaped by both technological and social change (see Hepp, 2012).

In recent decades, one of the driving factors of mediatization was the rise of social media – a phenomenon that appeared in the late 1990s, passed through a period of rapid proliferation in the 2000s and has since become an important part of online culture. Today, we face a wide range of different social media platforms, with new ones emerging every day. As they strongly influence the way people interact with each other, they have developed into an integral part of our societies. Against this background, the present chapter sets out to reconstruct the history of social media in the wider context of web history. Hence, it will tell how this specific form of web-based services evolved in relation to the overall progress of the web.

Subsequent to a clarification of terms, the chapter will deal with different phases in this development: 1) the first period before the rise of modern social media when online interaction was mainly realized via discussion boards, newsgroups, mailing lists, chats or instant messengers, 2) the second phase when new web-based services such as wikis, weblogs, podcasts or social bookmarking became popular, 3) the third phase which is characterized by a diversification of social media services offering new possibilities for social interaction, and 4) finally, the recent
time, when apps for mobile devices allow us to interact via social media anytime and anywhere.

As always in history, different developments may exist at the same time and new patterns do not necessarily replace older structures. Therefore, these four phases cannot be seen separately. They are overlapping and strongly interwoven. Nevertheless, we argue that they mark major changes in the evolution of social media. Moreover, a brief history like this is necessarily far from complete. Thus, the aim lies in drawing the big picture rather than in giving detailed insights into single technologies or platforms.

THE CONCEPT OF SOCIAL MEDIA

According to Bercovici (2010), the term social media appeared in the early 1990s for the first time, yet it took until the mid 2000s for it to become established in scientific and public discourse. Since then, social media have been commonly perceived as a specific category of Internet services. ‘However, what exactly is meant by social media is a matter of debate’ (Hunsinger and Senft, 2014: 1).

According to Treem et al. (2016), there is some consensus that social media are services based on the web which enable social interaction between human beings. Yet what kind of social interaction this could be differs from author to author (see Fuchs, 2017). Other approaches focus on the type of content provided by social media and identify user-generated content (UGC) as their main characteristic (e.g. Kaplan and Haenlein, 2010, 2012; Kietzmann et al., 2011). As UGC has already been around since the late 1970s, Kaplan and Haenlein further limit social media to applications ‘that build on the ideological and technological foundations of Web 2.0’ (2010: 61). However, as we will see later in this chapter, these foundations are quite ambiguous.

Given the difficulties of a single definition, many authors tend to use social media as an umbrella term for services with similar features. Yet what types of platforms it involves is also open to discussion. Blogs, microblogs, social networking sites, wikis and multimedia platforms are included by most authors even if they use slightly different names. Some add podcasts and videocasts as a new form of personal publishing similar to blogging (e.g. Schmidt, 2011). Whether messengers or even e-mails are part of social media is still debated (Sajithra and Patil, 2013: 70), as is Kaplan and Haenlein’s (2010, 2012) suggestion to include virtual worlds or multiplayer game worlds.

There are several reasons why it is so difficult to find a common understanding. First of all, social media are associated with a wide variety of diverse applications using different technologies, providing different functions and satisfying different needs. Second, these applications tend to incorporate features of already existing web-based services, which makes it hard to draw the line at related tools. Last but not least, ‘social media platforms, rather than being finished products, are dynamic objects’ (Van Dijck, 2013: 7), permanently adapting to new developments. Thus, the field of social media is in a constant state of flux (Lomborg, 2016: 6).

However, it is exactly their diversity and volatility that makes social media so fascinating. Therefore, we agree with Treem et al. that ‘the ambiguity around social media should not be viewed (…) as a source of frustration’ (2016: 770) but rather as an integral principle of the phenomenon we investigate.

In this chapter, we therefore understand social media in a broad sense, trying to meet their diverse and dynamic nature. In the following sections, we will trace the development of web-based tools that allow users to create, publish and share their own content easily and that support social interaction, communication and collaboration. On the one hand, the focus is on services that do not provide such functions as additional features but are
primarily designed for these purposes. On the other hand, we mainly consider applications that allow public or semi-public communication. Moreover, we take into account services with similar functions developed in the pre-web era in the 1970s, 1980s and 1990s that can be seen as precursors of modern social media.

In doing so, we choose an approach that puts concrete services with specific communicative features at the core of the definition. These services depend on the availability of respective technologies, on actors who develop and offer social media tools and on people who make use of them. Hence, we address the evolution of social media as a process that is shaped by technological change, by important actors and companies and by the communicative practices of their users.

THE PRECURSORS OF MODERN SOCIAL MEDIA

Quite often, social media are seen as a ground-breaking novelty. Yet, although they actually change the ways we interact online, the basic idea behind social media is anything but new (see Brügger, 2015; Kaplan and Haenlein, 2010; Sajithra and Patil, 2013; Treem et al., 2016). It is rather based on the need for sociality, which ‘is an integral part of human life’ (Brügger, 2015) and ‘existed long before the creation of digital social media platforms’ (Treem et al., 2016: 773). Therefore, some authors trace the origins of social media back to the first postal services (Guru et al., 2016) or the invention of smoke signals (Adams, 2011).

Even if we do not go that far, but focus on social interaction via computer, we have to start in the pre-web era, namely in the 1960s. In these years, the first tests with local e-mails took place (Clark, 2003: 175; Peter, 2004) and researchers such as Licklider and his colleagues at the Advanced Research Projects Agency (ARPA) experimented with possibilities for collaboration (Allen, 2004). The creation of the ARPANET, which from 1969 on linked universities all over the United States, increased the potential of computers as a means of social interaction, e.g. via the Internet e-mail service available from 1971 on (see Clark, 2003). In the same period, early learning management systems (such as PLATO in 1960), the first bulletin boards (such as Community Memory in 1973) and collaborative software (such as EIES in 1978) were developed (Allen, 2004; Jones, 2003; Senft, 2003a).

Building on some of these ideas, new network services for social interaction entered the stage in the late 1970s, most notably bulletin board systems (BBSs), Usenet and mailing lists. A bulletin board system is an early type of forum software where users can post or read news, exchange messages and share files. The first BBS – the Computerized Bulletin Board System (CBBS) – was developed by Suess and Christensen in 1979 (Senft, 2003a). Only one year later, Truscott and Ellis created a similar service called Usenet (Chen, 2003), which provides a discussion platform organized in the form of newsgroups. Its communicative structure is quite similar to mailing lists, another instrument for group communication that came up at the end of the 1970s.

Throughout the 1980s, innumerable BBSs emerged all over the world functioning as platforms for various communities. In parallel, Usenet developed into a worldwide discussion system with newsgroups on all sorts of topics. At the beginning, both BBSs and Usenet operated within their own networks; however, later they were connected with each other as well as with the Internet. In the same period, mailing lists became widespread, especially after the launch of an automation program called LISTSERV by Thomas in 1986 (Featherly, 2003).

Yet another important service for online interaction began to spread in the late 1980s, namely the Internet Relay Chat (IRC)
developed by Oikarinen (Senft, 2003b). It established the communicative genre chat and is seen as one of the ancestors of instant messaging (Larson, 2003). IRC is a text-based chat system that allows multiple users to send messages in real time. Communication is mainly organized through public channels, although IRC also allows one-to-one communication.

The idea of synchronous online communication was not new at that time. Oikarinen was inspired by an already existing technology used within BITNET called Bitnet Relay (Senft, 2003b). Moreover, chat software had been integrated into earlier systems since the 1970s (e.g. Talkomatic in PLATO or the CB Simulator in CompuServe). However, none of these were as widespread as IRC. Within only a few years, IRC evolved into a worldwide phenomenon. ‘People in over 120 countries and territories have used IRC, and one can easily find conversations flourishing in English, German, Japanese, French, Finnish and other languages there’ (Senft, 2003b: 256).

By providing means of interaction among larger groups, BBSs, Usenet and IRC – together with e-mail and mailing lists – shaped the way people shared ideas via computers for a long time. However, from the mid 1990s on they lost their significance. The reason lies in the rise of the World Wide Web (WWW) as a universal service, which today for many represents the Internet itself. After its launch in 1991 by Berners-Lee, the web was yet another Internet service next to various existing systems, all designed for a specific purpose: IRC for chat, Usenet for newsgroups and so on (Brügger, 2015). At that time, it merely concentrated on hypertextuality. Yet, step by step it ‘began to absorb the functions that each of the other software systems possessed’ (Brügger, 2015). As a result, BBSs, Usenet and IRC – like many others – were slowly replaced by web-based services. Even if they are still used by specific technical communities, today they are rather a marginal phenomenon. Nevertheless, they had an important impact on online communication and web culture.

From that time on, the web was the main driver for online sociality. In the 1990s, its dominant role was only challenged by instant messengers like ICQ or AOL Messenger (Guru et al., 2016), which operated within the Internet but outside the web. These services existed next to numerous web forums and chat groups, where people met to socialize, exchange ideas or share common interests. In addition, the web opened up new possibilities for self-expression through personal websites (Sajithra and Patil, 2013: 72). Even if people had to code their own sites and therefore needed programming skills in the beginning, a vibrant culture of web publishing emerged. It shaped the early days of the web and can be seen as the roots of the blogosphere (see later in this chapter).

As a reaction to these practices, automated web publishing software was developed. GeoCities (since 1994) was the first service that provided free web hosting combined with tools for website creation (see Milligan, chapter 23). ‘For the first time, users could create their own web pages without having to worry about the intimidating acronym soup of FTP, HTML, and the like’ (Milligan, 2017: 137). This led to ‘a popularity surge in homepages, whereby the Average Joe could share information’ (Kaplan and Haenlein, 2010: 60). Another prominent example is Blogger, the first weblog publishing system, which was developed in 1999 to support the blogging scene.

These automated software tools enabled users without specific skills to publish online and therefore can be seen as a significant step towards social media. In the second half of the 1990s, we can moreover observe the first versions of wikis such as WikiWikiWeb (in 1995) or social networking sites such as Classmates (in 1995) or SixDegrees (in 1997). However, it took several more years – in the case of social networking sites even longer – until these types of media gained the importance they have today.
THE RISE OF SOCIAL MEDIA IN THE WAKE OF WEB 2.0

The first social media services that became widespread among larger groups of Internet users were wikis, weblogs and social bookmarking sites. Their success story is closely linked to a bundle of trends in the development of the web which were later discussed under the heading Web 2.0 and are seen as a major point in web history. After the dotcom bubble burst in 2000, the Internet industry was in an economic crisis. Yet, at the same time, the Internet hit the mainstream as larger groups of the population got access. Moreover, several technological innovations led to the emergence of new tools following principles quite different from those of earlier web-based services.

As a reaction, O’Reilly Media and CMP Technology planned a conference in 2004 to discuss new trends and business models in the Internet industry. When trying to find a possible title, Dougherty suggested the term Web 2.0, not knowing that this would be the buzzword of a new web era (Musser and O’Reilly, 2006). Although the wording recalls a software update, Web 2.0 does not refer to a new technical version of the web. It is rather ‘a set of economic, social, and technology trends that collectively form the basis for the next generation of the Internet’ (2006: 4). The concept was meant to express the feeling that the web had reached a new level and hence should help attract novel capital investments (Fuchs, 2017: 35).

In a well-known blog article, O’Reilly (2005) describes Web 2.0 by mapping typical applications (flickr, del.icio.us, BitTorrent, Napster, Wikipedia, blogs), typical features (Google AdSense, eBay reputation, Amazon reviews), typical practices (tagging, folksonomy) and new technologies (RSS, AJAX). Based on this map he tries to identify a set of principles that characterize the new phenomenon. These include the tendency to use the web as a platform, to apply lightweight programming models, to deliver software as a service, to optimize it for different devices, to build applications around data, to enrich user experience and to harness the power of the crowd (see also Anderson, 2007).

With regard to social media, the last principle is the most important as it refers to the core role of users. In Web 2.0 applications, content is no longer published by single authors but provided by multiple end users in a collective manner. Therefore, the value of such services depends on the participation of individuals who exchange ideas, share resources, connect with each other and work together. According to Van Dijck (2009), this provoked the notion of a participation shift. However, the core idea of user participation is anything but new. Content provided by end users already formed the backbone of early network services, and the web was meant to foster worldwide collaboration right from the beginning – as its founder Berners-Lee stated (Anderson, 2007).

Moreover, Van Dijck (2009) argues that the participation shift is an ideal rather than a real phenomenon. Although the barriers are lowered, only a limited number of Internet users actively make use of this potential. Still, educational as well as socio-economic backgrounds remain the main factors for Internet usage patterns (e.g. Hargittai and Walejko, 2008; Van Deursen and Van Dijck, 2014). Finally, online activities do not merely result in benefits for the users. By creating major parts of the content, they provide unpaid creative work. Moreover, they leave personal data, on which web companies build their business models (Van Dijck, 2009: 46).

Nevertheless, there is some evidence that Web 2.0 applications actually did foster participatory web cultures in the early 2000s (e.g. Dooley et al., 2012). This applies particularly to blogs and wikis: while blogs allow people to publicly express their opinion, the strength of wikis lies in collaborative content production. A wiki is a system of interlinked web pages that can be edited by ‘any user with a forms-capable web browser client’ (Leuf and Cunningham, 2001: 14). The history function
of wikis enables users to follow the changes made to a document and easily switch to an older version of the text.

The first wiki, called WikiWikiWeb, was launched in 1995 by Cunningham and gave this genre its name (Wagner, 2004: 269). However, it was the launch of Wikipedia in 2001 that made wikis really popular. This free multilingual online encyclopaedia run by the Wikimedia Foundation is by far the most prominent representative of this genre (see Famiglietti, chapter 21). Today, it consists of more than five million articles stored on approximately 41 million pages that are maintained by more than 100,000 active users (Wikipedia, 2017). It is one of the main free information resources online and demonstrates the enormous potential of wikis for collaborative writing.

Blogs are another flagship of participatory web culture (see Siles, chapter 24). In 1997 Barger (www.robotwisdom.com) coined the term weblog for a type of website that had been around for a while (see Siles, in this volume). The original idea of these sites was to collect and comment on other web sources. Therefore, Barger defined weblogs as web pages where ‘Web loggers’ note all sorts of other sites they find interesting (Blood, 2004: 54). Later the meaning of the term slightly changed, focusing less on the type of content and more on the way it is presented.

Although the practice of blogging has existed since the mid 1990s, it only became widespread after automated weblog publishing systems such as Blogger or WordPress emerged around the year 2000. Since that time a wide range of more or less professional blogs on all sorts of topics have been published all over the world. While in the beginning prominent weblogs were mainly of a private nature, nowadays the best-known blogs are integral parts of bigger websites designed by professionals such as Mashable, Huffington Post or TechCrunch (eBizMBA Guide, 2017).

Right from the beginning, bloggers had a strong sense of community building. By commenting on or linking other blogs, they established relations between different sites. Step by step a network of blogs with some focal points and a large number of peripheral members developed – the so-called blogosphere. An important technology in this context is RSS (Really Simple Syndication or Rich Site Summary). This innovation made it possible for site owners to integrate content from other websites into their own sites. In addition, a so-called RSS feed reader allowed users to subscribe to blogs or web pages and automatically follow their updates. Another technique that was, and still is, used to establish connections between different blogs is tagging. By enhancing information with specific keywords – so-called tags – websites with similar content can be identified more easily.

The technique of tagging is closely linked to social bookmarking, which appeared around 2003 and took the original idea of blogging – namely sharing interesting web sources – to the next level. Social bookmarking platforms allow people to collect and organize links just as they do in their web browser. Yet, as these bookmarks are not stored locally but on web-based platforms, they can easily be shared with and commented on by others. The most prominent representative of social bookmarking is Schachter’s website del.icio.us, which triggered the whole phenomenon (Anderson, 2007).

At around the same time, the idea of blogging was applied to audio and later video content. A new form of publishing – called podcasting – began to spread. Podcasts are regularly updated audio or video blogs that can be published via blogging software and subscribed to via an RSS feed reader. Their appearance is an indicator of the shift from text-based publishing to multimedia services. This trend became apparent with the advent of the first photo-sharing tools like flickr (in 2004) or video-sharing sites such as vimeo (in 2004) or YouTube (in 2005). In addition, social networking sites – most notably Friendster (in 2001) and myspace (in 2003) – tried to gain ground, yet still with limited success. In these years, we can also see the first
tendencies towards commercialization – a trend that directly leads us to the next period in the history of social media when this tendency intensified.

DIVERSIFICATION, PROLIFERATION AND COMMERCIALIZATION

In the mid 2000s, the field of social media entered a new phase characterized by a diversification of services, a rapid proliferation and the trend towards commercialization. User figures rose to hitherto undreamt-of heights all over the world, turning social media into a global mass phenomenon. This rapid increase in active users led to rising costs and the need to professionalize the business.

Around this time, we can observe the advent of several new types of social media. Most apparent is the rise of social networking sites (SNS), driven by the success of Facebook. Early versions of SNS such as SixDegrees (in 1997), Friendster (in 2001) or myspace (in 2003) had been around for years (boyd and Ellison, 2007: 214). Yet it was Facebook that made social networking so popular. Launched for students of Harvard University in 2004, it opened its registration to almost everyone in the world in 2006. By September 2016, Facebook had 1.79 billion monthly active users and was available in 140 languages (Facebook, 2017).

Other social networking sites went online in this period but were not able to sustain themselves. Myspace may be the most prominent example of what happens if a service fails to live up to its users’ changing demands (Gillette, 2011). A lack of technological innovations and an overloaded, old-fashioned design are named as reasons for its decline, which began in 2008. Even Google’s networking site Google+ is struggling and has so far not got beyond the status of a niche product (see Denning, 2015).

Facebook’s rapid growth and the acquisition of additional platforms such as Instagram or WhatsApp are key factors for its success. Moreover, Brügger (2015) argues that the service is so attractive because it provides an empty digital space that structures user interactions but can be used for all sorts of purposes. Like many other social media organizations, Facebook started as a low-cost-oriented start-up with a promising idea but sooner or later became a highly commercialized company. It is therefore a good example of a development we can observe in the overall field of social media: step by step it turned into an internationalized market.

To survive in this highly competitive field, it is necessary to push innovation, introduce new features or occupy new niches. As a result, we can observe the emergence of several new types of social media in the second half of the 2000s. One of these new genres is microblogging. Although it has its roots in blogging (see Siles, in this volume), microblogging was designed for other purposes and its communicative dynamics differ strongly from those of traditional blogging (Van Dijck, 2011: 335). The most famous representative of microblogging is Twitter. It was launched in 2006, had more than 300 million active users by June 2016 (Twitter, 2017) and is extensively used by politicians, journalists, organizations and companies.

New types of social media also emerged when specialized services for different types of content came up. This had already started in the period before but gained momentum as time went on. Several image-sharing sites such as flickr (in 2004), Pinterest (in 2010) and Instagram (in 2010) or multimedia blogging tools like Tumblr (in 2007) entered the stage. YouTube, the dinosaur among video-sharing sites, was launched in 2005 and rapidly spread around the world. In 2006 it was bought by Google and turned into a highly commercial platform (see also Burgess and Green, 2013). It is now a prosperous venture with around one billion users in more than 88 countries (YouTube, 2017). Similar services such as the short-form video-hosting service Vine (in 2012) were less successful. In the
THE HISTORY OF ONLINE SOCIAL MEDIA 379

field of radio, music and audio, SoundCloud (since 2007) – a music database owned by CBS-Interactive Inc. – has established itself as the biggest player.

In the course of time, many of these services expanded their features and turned into multimedia platforms. As a reaction to new applications, even big players like Twitter, YouTube or Facebook widened their areas of interest to be able to sustain their position (see Brügger, 2015). However, diversification was not only driven by innovative features but likewise by specialization on particular groups with common identities or areas of interest. Hidden champions in different spheres are highlighting their wide range and diverse character. We see offerings for LGBTQ+ people such as the SNS PlanetRomeo (since 2002) or Grindr (launched in 2009), which have had great influence on social life and the economy (see Lemke et al., 2015: 2), and millions of people are gathering in certain networks to share their activities. Fishbrain (since 2010) for fishing, Strava (since 2009) to track and share athletic activities or Ravelry (since 2007) for fiber arts are only a few examples.

Moreover, business and employment-oriented platforms such as LinkedIn (2002) and XING (2003) or networks for special sectors such as Academia.edu and ResearchGate for scientists became popular (Thelwall and Kousha, 2014, 2015). Digital curation is now also closely linked to business-related services. Offers such as TripAdvisor (since 2000), Yelp (since 2004) or later Foursquare (since 2009) bring together companies with the aim of presenting their products and users with their desire for participation (see Jenkins, 2009: 7).

Finally, we should bear in mind that – though many of the best-known social media companies are located in the United States – social media are as diverse as the world itself. Especially in large linguistic areas, regional services became the market leaders. While some of them had to face substantial decline or even shut down – such as Orkut in Brazil, India and the United States or StudiVZ, SchülerVZ and meinVZ in the German-speaking area (Hasebrink and Rohde, 2011: 104f.) – domestic platforms are still the major players in other countries.

The Russian market, for example, is dominated by VKontakte and Odnoklassniki (Classmates), both regional services, and China – the largest linguistic area – is a world unto itself. Due to restrictive legislation and censorship, most Western social media are not available there (see also King et al., 2013). Therefore, domestic offerings such as the social networking sites QZone, RenRen and Kaixin001, the microblogging services Tencent Weibo, SINA Weibo and Fanfou, the video-sharing sites YOUKU, 56.com and iQIYI, or the messenger services QQ and WeChat share the market (see Millward, 2016).

To sum up, from the mid 2000s on, social media entered into a highly dynamic phase characterized by rapid proliferation and enormous diversification. Next to economic and social factors, the main drivers were technological developments. Therefore, it is not surprising that it was the rise of yet another technological invention which brought about major changes in the field of social media by the end of the 2000s, namely the smartphone.

THE SHIFT TOWARDS SOCIAL MEDIA APPS AND MOBILE USAGE

On 9 January 2007, Steve Jobs presented Apple’s new mobile phone with the words: ‘This is one device … and we are calling it iPhone’ (quoted from Noriega, 2011). In the years to come, this product would change the way we use mobile Internet in general and social media in particular. Although the iPhone was neither the first smartphone nor perfect or flawless, it has influenced the way such products look up to the present day. With regard to handling, all-over multi-touch displays replaced both small displays and tiny QWERTY keyboards.
380 THE SAGE HANDBOOK OF WEB HISTORY

In 2008 Apple opened the iPhone for external developers and launched the first app store via iTunes. In such online marketplaces developers can offer their software tools – so-called apps (Jansen and Bloemendal, 2013: 195). In the same year, Google published the Google Play Store (Butler, 2011: 4) and other companies such as Microsoft, BlackBerry and Amazon followed (Hyrynsalmi et al., 2012: 64). In subsequent years, smartphones have virtually replaced older versions of mobile phones in many regions of the world.

This success story had tremendous influence on the development of social media. The whole Internet became accessible via smartphones and people could now chat, read, watch, vote or post any content at any time and any place (e.g. Humphreys et al., 2013; Vorderer et al., 2016). The more intensively people made use of these possibilities, the more social media developed into an integrated part of their daily lives and the more people found themselves in a state of permanent connection.

Soon, social media companies realized the potential of mobile usage and started publishing mobile versions of their offerings. In addition, more and more new services were released as independent ‘social networking apps’ (Johnson, 2015) for mobile use only. Many of them built their functionalities on the possibility to track users’ whereabouts. These location-based services such as Foursquare, Traces and Recho (for hidden notes and records), Bubbly (a voice-oriented network), or Findery (a location-related database) have rapidly gained importance over recent years. In addition, mobile dating apps, such as the above-mentioned Grindr or picture-based Tinder (launched in 2012) started to make use of positioning services to optimize results.

Another recent trend is live video streaming – very often tied to certain locations, too. Google Hangouts was the first to offer this feature in 2013, followed by Facebook in 2015 in the United States and one year later in the rest of the world. In 2015, two important mobile apps specializing in live video were launched, namely Meerkat and Periscope. While Meerkat stopped all its services in 2016, Periscope is still on the market and is owned by Twitter. The app is mainly designed for broadcasting live videos to a wider public. Nevertheless, users can limit access to selected persons within their own network. The latest development in the field of mobile live streaming is Houseparty. It was launched in 2017 by the Meerkat team and specializes in sharing video streams among smaller groups.

Although all these innovative services are important for the further development of social media, until now they have not reached the mainstream. Quite contrary to this, the genre of mobile messaging rapidly developed into a mass phenomenon. The most popular representatives are WhatsApp and Snapchat. WhatsApp was launched in 2009, bought by Facebook in 2014 and is by now well established among all age groups.

Snapchat is extremely popular among young people, mainly because of its numerous functions and periodically updated (geo)filters. As its most distinguishing feature, photo- and video-snaps disappear after a few seconds. This supposed transience contrasts with the idea of the Internet as an everlasting memory. Although Snapchat has been online since 2011, the company still has a promising future: in 2017 Snap Inc. went public successfully, and at the moment competitors are busy adopting more and more of Snapchat’s functions.

At this point the history of social media turns into a vivid present that points to an interesting future. The success of picture- and video-based networks and the acceptance of augmented reality give a hint as to which direction the game may be heading. However, in such a highly dynamic field, future developments are hard to predict.

CONCLUSIONS

To conclude, online social media have a long and dynamic history, which is closely related

to web history. Although their roots go back to the pre-web period when services such as bulletin board systems, Usenet or IRC introduced basic functions for online interaction, the emergence of modern social media depended on the success of the web as a universal service. In the mid and late 1990s the web became the main driver for online sociality: people met in various web forums and chat rooms, a culture of web publishing supported by automated software developed, and the first versions of modern social media such as wikis, weblogs and social networking sites came up. At that time, these offers were only marginal phenomena. Nonetheless, they indicated the major opportunities of the Internet, later labeled as Web 2.0.

In the wake of this transformation, weblogs, podcasts, wikis and social bookmarking sites gained significance. They offered new ways for sharing information and represented the new participatory character of the web. From the mid 2000s on, we can observe rapid proliferation, a diversification of services, the advent of new types of social media and a trend towards commercialization. The enormous success of smartphones around 2010 triggered further changes in the field of social media. Nowadays, nearly all social media platforms offer apps for mobile devices and new tools for mobile use only are emerging. As a consequence, social media usage is continuously shifting from web-based services towards mobile social media apps. Trends like location-based services, video live streaming or augmented reality programs point to a vivid future.

Looking back, one can observe some overall strands that shaped the history of social media throughout all these periods. These are diversification, internationalization, professionalization and commercialization. Since the late 1990s, when the first blogs, wikis and SNS appeared, the number and types of social media have continuously increased. Today we face a wide range of different platforms focusing on all sorts of features, content, interests, groups, branches, regions and areas of life. As new tools are emerging every day, trying to find yet another niche to operate in, the diversification of services is still ongoing. Right from the beginning, this process went hand in hand with a trend towards internationalization. Although most innovations in this field had their starting point in the United States, they quickly spread around the world. It did not take long for special services for specific countries or regions to emerge – some of them dominating the market in large linguistic areas today.

Finally, the history of social media is shaped by professionalization and commercialization. In the beginning, the ‘empty structure’ (Brügger, 2015) of social media offers was mainly filled by end users. Today, social media content is produced by journalists, newspapers, TV channels, political parties, NGOs and all sorts of firms, associations and organizations. (Semi-)professional YouTubers, bloggers or Instagrammers make (part of) their living via their social media activities, and social media companies are facing the challenge of finding models for monetization. Altogether, these developments have resulted in a growing relevance of social media for societies and economies all over the world – a trend that has not yet come to an end.

REFERENCES

Adams, D. (2011) The History of Social Media (http://www.instantshift.com/2011/10/20/the-history-of-social-media/) Date Accessed: 17 June 2018.
Allen, C. (2004) Tracing the Evolution of Social Software (http://www.lifewithalacrity.com/2004/10/tracing_the_evo.html) Date Accessed: 17 June 2018.
Anderson, P. (2007) ‘What is Web 2.0? Ideas, Technologies and Implications for Education’, JISC Technology and Standards Watch, 1(1): 1–64.
Bercovici, J. (2010) Who Coined ‘Social Media’? Web Pioneers Compete for Credit

(http://blogs.forbes.com/jeffbercovici/2010/12/09/who-coined-social-media-web-pioneers-compete-for-credit) Date Accessed: 17 June 2018.
Blood, R. (2004) ‘How Blogging Software Reshapes the Online Community’, Communications of the ACM, 47(12): 53–55.
boyd, D.M. and Ellison, N.B. (2007) ‘Social Network Sites: Definition, History, and Scholarship’, Journal of Computer-Mediated Communication, 13(1): 210–230.
Brügger, N. (2015) ‘A Brief History of Facebook as a Media Text: The Development of an Empty Structure’, First Monday, 20(5) (http://firstmonday.org/ojs/index.php/fm/article/view/5423/4466) Date Accessed: 17 June 2018.
Burgess, J. and Green, J. (2013) YouTube: Online Video and Participatory Culture. Hoboken: John Wiley & Sons.
Butler, M. (2011) ‘Android: Changing the Mobile Landscape’, IEEE Pervasive Computing, 10(1): 4–7.
Carpentier, N., Schrøder, K.C. and Hallett, L. (2014) ‘Audience – Society Transformations’, in N. Carpentier, K.C. Schrøder and L. Hallett (eds.), Audience Transformations. Shifting Audience Positions in Late Modernity. New York: Routledge. pp. 1–12.
Chen, S.-L.S. (2003) ‘USENET’, in S. Jones (ed.), Encyclopedia of New Media: An Essential Reference to Communication and Technology. Thousand Oaks, London, New Delhi: Sage. pp. 457–459.
Clark, N. (2003) ‘E-Mail’, in S. Jones (ed.), Encyclopedia of New Media: An Essential Reference to Communication and Technology. Thousand Oaks, London, New Delhi: Sage. pp. 175–177.
Denning, S. (2015) ‘Has Google+ Really Died?’ Forbes, 23 April 2015 (http://www.forbes.com/sites/stevedenning/2015/04/23/has-google-really-died/#6374b0d916e9) Date Accessed: 17 June 2018.
Dooley, J.A., Jones, S.C. and Iverson, D. (2012) ‘Web 2.0 Adoption and User Characteristics’, Web Journal of Mass Communication Research, 42 (June) (http://ro.uow.edu.au/cgi/viewcontent.cgi?article=1027&context=sspapers) Date Accessed: 17 June 2018.
eBizMBA Guide (2017) The 15 Most Popular Blogs. The eBizMBA Guide (http://www.ebizmba.com/articles/blogs) Date Accessed: 17 June 2018.
Facebook (2017) Newsroom. Company Info (http://newsroom.fb.com/company-info) Date Accessed: 17 June 2018.
Featherly, K. (2003) ‘LISTSERV’, in S. Jones (ed.), Encyclopedia of New Media: An Essential Reference to Communication and Technology. Thousand Oaks, London, New Delhi: Sage. pp. 293–294.
Fuchs, C. (2017) Social Media: A Critical Introduction. 2nd edn. Thousand Oaks: Sage.
Gillette, F. (2011) ‘The Rise and Inglorious Fall of Myspace’, Bloomberg Businessweek, 23 June 2011 (https://www.bloomberg.com/news/articles/2011-06-22/the-rise-and-inglorious-fall-of-myspace) Date Accessed: 17 June 2018.
Guru, M.C., Motaghem, S., Kumar, D. and Devanoor, G. (2016) ‘History of Social Media’, International Journal of English Language, Literature and Humanities, 4(2): 294–303.
Hargittai, E. and Walejko, G. (2008) ‘The Participation Divide: Content Creation and Sharing in the Digital Age’, Information, Community and Society, 11(2): 239–256.
Hasebrink, U. and Rohde, W. (2011) ‘Die Social Web-Nutzung Jugendlicher und junger Erwachsener: Nutzungsmuster, Vorlieben und Einstellungen’ [Social Web Usage of Adolescents and Young Adults: Patterns, Preferences and Attitudes], in J.-H. Schmidt, I. Paus-Hasebrink and U. Hasebrink (eds.), Heranwachsen mit dem Social Web [Growing up with the Social Web], together with T. Brüssel. Berlin: Vistas. pp. 83–120.
Hepp, A. (2012) Cultures of Mediatization. Cambridge: Polity Press.
Humphreys, L., Pape, T. von and Karnowski, V. (2013) ‘Evolving Mobile Media: Uses and Conceptualizations of the Mobile Internet’, Journal of Computer-Mediated Communication, 18(4): 491–507.
Hunsinger, J. and Senft, T.M. (2014) ‘Introduction’, in J. Hunsinger and T.M. Senft (eds.), The Social Media Handbook. New York: Routledge. pp. 1–5.
Hyrynsalmi, S., Mäkilä, T., Järvi, A., Suominen, A., Seppänen, M. and Knuutila, T. (2012) ‘App Store, Marketplace, Play! An Analysis of Multi-Homing in Mobile Software

Ecosystems’, in S. Jansen, J. Bosch and C. Alves (eds.), Proceedings of the Fourth International Workshops on Software Ecosystems, CEUR Workshop Proceedings 879. pp. 59–72 (https://ssrn.com/abstract=2281670) Date Accessed: 17 June 2018.
Jansen, S. and Bloemendal, E. (2013) ‘Defining App Stores: The Role of Curated Marketplaces in Software Ecosystems’, in G. Herzwurm and T. Margaria (eds.), Software Business. From Physical Products to Software Services and Solutions. ICSOB 2013. Lecture Notes in Business Information Processing 150. Berlin, Heidelberg: Springer. pp. 195–206.
Jenkins, H. (2009) Confronting the Challenges of Participatory Culture: Media Education for the 21st Century, together with K. Clinton, R. Purushotma, A.J. Robison and M. Weigel. Chicago: The MacArthur Foundation (https://www.macfound.org/media/article_pdfs/JENKINS_WHITE_PAPER.PDF) Date Accessed: 17 June 2018.
Johnson, M. (2015) History of Social Media Part III (http://www.booksaresocial.com/history-of-social-media-part-iiI/) Date Accessed: 17 June 2018.
Jones, S. (2003) ‘PLATO’, in S. Jones (ed.), Encyclopedia of New Media: An Essential Reference to Communication and Technology. Thousand Oaks, London, New Delhi: Sage. pp. 375–377.
Kaplan, A.M. and Haenlein, M. (2010) ‘Users of the World, Unite! The Challenges and Opportunities of Social Media’, Business Horizons, 53(1): 59–68.
Kaplan, A.M. and Haenlein, M. (2012) ‘Social Media: Back to the Roots and Back to the Future’, Journal of Systems and Information Technology, 14(2): 101–104.
Kietzmann, J.H., Hermkens, K., McCarthy, I.P. and Silvestre, B.S. (2011) ‘Social Media? Get Serious! Understanding the Functional Building Blocks of Social Media’, Business Horizons, 54(3): 241–251.
King, G., Pan, J. and Roberts, M.E. (2013) ‘How Censorship in China Allows Government Criticism but Silences Collective Expression’, American Political Science Review, 107(2): 326–343.
Krotz, F. (2014) ‘Mediatization as a Mover in Modernity: Social and Cultural Change in the Context of Media Change’, in K. Lundby (ed.), Mediatization of Communication. Berlin: de Gruyter. pp. 131–162.
Larson, G.W. (2003) ‘Instant Messaging’, in S. Jones (ed.), Encyclopedia of New Media: An Essential Reference to Communication and Technology. Thousand Oaks, London, New Delhi: Sage. pp. 236–237.
Lemke, R., Tornow, T. and PlanetRomeo.com (2015) Gay Happiness Monitor – Results overview from a global survey on perceived gay-related public opinion and gay well-being. Mainz: Johannes Gutenberg University (https://www.planetromeo.com/wp-content/uploads/2015/05/GAY_HAPPINESS_MONITOR_2015.pdf) Date Accessed: 17 June 2018.
Leuf, B. and Cunningham, W. (2001) The Wiki Way. Quick Collaboration on the Web. Boston: Addison-Wesley.
Livingstone, S. (2009) ‘On the Mediation of Everything. ICA Presidential Address’ 2008, Journal of Communication, 59(1): 1–18.
Lomborg, S. (2016) ‘A State of Flux: Histories of Social Media Research’, European Journal of Communication, 32(1): 6–15.
Lunt, P. and Livingstone, S. (2016) ‘Is “Mediatization” the New Paradigm for our Field? A Commentary on Deacon and Stanyer (2014, 2015) and Hepp, Hjarvard, and Lundby (2015)’, Media, Culture and Society, 38(3): 462–470.
Milligan, I. (2017) ‘Welcome to the Web: The Online Community of GeoCities During the Early Years of the World Wide Web’, in N. Brügger and R. Schroeder (eds.), The Web as History. Using Web Archives to Understand the Past and the Present. London: UCL Press. pp. 137–158.
Millward, S. (2016) WeChat’s Global Expansion Has Been a Disaster (https://www.techinasia.com/wechat-global-expansion-fail) Date Accessed: 17 June 2018.
Musser, J. and O’Reilly, T. (2006) Web 2.0. Principles and Best Practices [excerpt] (http://cursa.ihmc.us/rid=1211300618980_550838356_10465/web20_report_excerpt.pdf) Date Accessed: 17 June 2018.
Noriega, M. (2011) iPhone Keynote 2007 Complete (https://www.youtube.com/watch?v=t4OEsI0Sc_s) Date Accessed: 17 June 2018.
O’Reilly, T. (2005) What Is Web 2.0? Design Patterns and Business Models for the Next

Generation of Software (http://www.oreilly.com/lpt/a/1) Date Accessed: 17 June 2018.
Peter, I. (2004) The History of Email (http://www.nethistory.info/History%20of%20the%20Internet/email.html) Date Accessed: 17 June 2018.
Sajithra, K. and Patil, R. (2013) ‘Social Media – History and Components’, IOSR Journal of Business and Management, 7(1): 69–74.
Schmidt, J.-H. (2011) Das neue Netz. Merkmale, Praktiken und Folgen des Web 2.0. [The New Net. Characteristics, Practices and Consequences of Web 2.0]. Konstanz: UVK.
Senft, T.M. (2003a) ‘Bulletin-Board Systems’, in S. Jones (ed.), Encyclopedia of New Media: An Essential Reference to Communication and Technology. Thousand Oaks, London, New Delhi: Sage. pp. 45–48.
Senft, T.M. (2003b) ‘Internet Relay Chat’, in S. Jones (ed.), Encyclopedia of New Media: An Essential Reference to Communication and Technology. Thousand Oaks, London, New Delhi: Sage. pp. 256–258.
Thelwall, M.A. and Kousha, K. (2014) ‘Academia.edu: Social Network or Academic Network?’ Journal of the Association for Information Science and Technology, 65(4): 721–731.
Thelwall, M.A. and Kousha, K. (2015) ‘ResearchGate: Disseminating, Communicating and Measuring Scholarship?’ Journal of the Association for Information Science and Technology, 66(5): 876–889.
Treem, J.W., Dailey, S.L., Pierce, C.S. and Biffl, D. (2016) ‘What We Are Talking About When We Talk About Social Media: A Framework for Study’, Sociology Compass, 10(9): 768–784.
Twitter (2017) Twitter Usage – Company Facts (https://about.twitter.com/company) Date Accessed: 17 June 2018.
Van Deursen, A.J. and Van Dijck, J. (2014) ‘The Digital Divide Shifts to Differences in Usage’, New Media & Society, 16(3): 507–526.
Van Dijck, J. (2009) ‘Users Like You? Theorizing Agency in User-Generated Content’, Media, Culture & Society, 31(1): 41–58.
Van Dijck, J. (2011) ‘Tracing Twitter: The Rise of a Microblogging Platform’, International Journal of Media and Cultural Politics, 7(3): 333–348.
Van Dijck, J. (2013) The Culture of Connectivity: A Critical History of Social Media. New York: Oxford University Press.
Vorderer, P., Krömer, N. and Schneider, F.M. (2016) ‘Permanently Online – Permanently Connected: Explorations into University Students’ Use of Social Media and Mobile Smart Devices’, Computers in Human Behavior, 63(2016): 694–703.
Wagner, C. (2004) ‘Wiki: A Technology for Conversational Knowledge Management and Group Collaboration’, Communications of the Association for Information Systems, 13(1): 264–290.
Wikipedia (2017) Wikipedia: About (https://en.wikipedia.org/wiki/Wikipedia:About) Date Accessed: 17 June 2018.
YouTube (2017) Statistics (https://www.youtube.com/yt/press/statistics.html) Date Accessed: 17 June 2018.
PART V

Web History and Users, some Case Studies
26
Cultural Historiography of the ‘Homepage’
Madhavi Mallapragada

INTRODUCTION

Technically speaking, the homepage refers to the designated main page or starting page of a website that is located in the site’s root directory (root in the Domain Name System’s hierarchical structure refers to the top-level directory and is the starting point of the file system) (W3C Technical Architecture Group, 2004). The URL (Uniform Resource Locator) or Web address for a homepage is typically just the domain name of the site (for example, www.utexas.edu). The homepage was and remains a fundamental unit of the Web’s architecture, although the specific practices and politics shaping the design, usability, access, and experience of homepages are not the same across time (from the 1990s to the present) and space (cultural and social contexts).

As websites proliferated by the mid 1990s, a key area of scholarly attention was the Web’s hypertextuality, namely the non-linear intertextual nature of a website wherein HTML links on a given page would lead to another page (within the site or possibly outside it) with its own set of hyperlinks that, in turn, were pathways to other pages and links (Chun and Keenan, 2006; Landow, 2006). Theoretically, then, the website is a decentered text, made up of nodes and linkages; this in turn helped (partly) shape the language of travel, mobility, fluidity, and dispersal that is evoked in historical and contemporary theorizations about user experience of the Web as well as the Web’s discursive politics vis-à-vis ideologies of time-space, place, territory, identity, and the ‘real/virtual’ (see note for examples).1 It is in this broader context of discrete but interconnected links and a globally dispersed network of websites that the spatial–temporal unit of the homepage gains prominence beyond (but intertwined with) its technical significance as the root document.

I argue here and elsewhere (Mallapragada, 2014a) that the ‘homepage’, so central to Web culture, must be understood as a cultural metaphor. At the core of the category

of ‘homepage’ is an idea that has enduring power as a material, affective, and political idea: the idea of home. The homepage, rising to prominence in the early-to-mid 1990s, must be understood in the broader context of the destabilization of cultural narratives that historically privileged rootedness, authenticity that was tied to place and tradition, and a time–space framework that was shaped by a nationalist optic (Clifford, 1994; Brah, 1996; Appadurai, 1997; Ong, 1999). Drawing on my research on websites targeting Indian immigrants in the United States since the 1990s (Mallapragada, 2000, 2006a, 2006b, 2010, 2014a, 2014b), I argue that the homepage is a metaphor that anchored ideas of belonging at the turn of the twenty-first century as the world was transformed by digital capitalism, new media communicative practices, unprecedented global mobility and transnational lives and work (Castells, 1996; Cohen, 1997; Schiller, 1999; Ang, 2001; Karim, 2003; Lee and Wong, 2003; Grewal, 2005; Nakamura, 2008). While the need for belonging is arguably a universal one, imaginaries of belonging as well as how one experiences belonging or the lack thereof (being ‘at home’/feeling out of place) are both historically and socially contingent (Douglas, 1991; Massey, 1994; Naficy, 1999; Morley, 2000).

My research on the online cultures of Indian immigrants revealed that the Web, and its spatial trope the homepage, were crucial to the construction and negotiation of imaginaries of belonging by a diverse set of players – including individual creators of personal homepages, commercial entities offering community-oriented websites, non-profit sites for community building and activism or predominantly e-commerce sites. It is worth pausing here and clarifying why I include different genres of sites – personal, community, commercial, organizational – within this present discussion of the ‘homepage’. While there are clear differences in the institutional and ideological cultures of these types of sites, my frame for understanding them is how they all differently, but in the same socio-historical period of the mid 1990s to the mid 2000s, anchored ‘the ideals and ideologies of belonging online in relation to two dominant imaginaries associated with the time-space of the home – namely, the domestic, familial household and the public, national homeland’ (Mallapragada, 2014a: 1).

The historical context of the 1990s as it pertains to Indian immigration to the United States is another key factor that anchors my study of websites. While the migration of Indians to American shores dates back to the mid-to-late nineteenth century, it wasn’t until the passage of the US Immigration Family Reunification Act in 1965 that Indian immigration to the United States took on a steady pattern (Takaki, 1989). That said, the 1990s were a particularly significant decade for Indian immigrant culture for a few reasons: all-time high levels of Indian immigration to the United States; a growing recognition of diversity within the Indian immigrant and Indian American community at large (generational, political, cultural differences); strategic overtures by the Indian state to embrace its diaspora within its national fold; and more visible efforts by immigrant subjects to enact and negotiate their sense of cultural citizenship through media and popular culture, especially the Web (Mallapragada, 2014a: 1–19).

Scholars writing at the intersections of Web studies and immigrant/diasporic/migrant/exile studies (sometimes categorized as studies of ‘digital diaspora’) have called attention to the homology between the condition of transnational migration and the condition of virtual communication via the Web (see for instance, Lee and Wong, 2003; Georgiou, 2006; Gajjala and Gajjala, 2008; Everett, 2009; Alonso and Oiarzabal, 2010). Both involve a pattern of deterritorialization and reterritorialization. Migrants leave a national territory and relocate to a new country. Websites transcend physical geography and take ‘root’ (as homepages, servers, country code domain names) in digital, virtual spaces (Shklovski and Struthers,

2010). Relatedly, ideas of real and virtual are also invoked in this homologous relationship because migrants on the one hand could be said to experience their symbolic real (their home country) virtually through memory, community formation, and cultural reproduction, while websites on the other hand allow one to virtually experience our sense of time, space, and place (Landzelius, 2006). Or, another way to think about it is that both the immigrant condition and the condition of Web communication fundamentally disrupt traditional place-based orientation and sense of time and open up the space for renewed imaginations about temporality, presence, and connection.

CHAPTER OVERVIEW, THEORETICAL AND METHODOLOGICAL FRAMEWORK

This chapter offers a cultural historiography of the homepage by drawing on my research on Indian immigrants and their online cultures in the United States. I do not offer a comprehensive set of historical facts about specific websites, nor do I cover the Web from its emergence to the present. Instead, I offer details from homepages, focusing on the period from the mid 1990s to the mid 2000s, as a form of historical evidence to make a larger argument about the role and politics of the homepage in Web culture. As a researcher working in the media and cultural studies tradition, issues of epistemology (ways of knowing the subject of our research), methodology (how we study it), and theoretical agenda (what issues we seek to engage and why we consider them relevant), are inter-related aspects of my research that co-exist and shape each other. In my book Virtual Homelands (2014a), which I am drawing on for this chapter, I employed a combination of textual, institutional, and discursive analyses. I situated the textual and institutional details of the homepages I discussed within a broader socio-cultural context to demonstrate why and how ‘the homepage’ needed to be understood as a cultural form. My intellectual framework was and is shaped by the fields of diaspora studies, cultural studies of media, postcolonial studies, critical race, and feminist studies – fields that have routinely put home and homeland and their nexus at the locus of their disciplinary interrogations. Such an intellectual framing allowed me to engage the question of how power shapes the production of normative understandings of ‘home’ and ‘homepage’.

Normative understandings of home (private household and public homeland) and homepages rely on a hierarchy of social axes of identity and difference such as national affiliation, race, gender, class, sexuality, and language (Morley, 2000). By examining the politics of homepage in an immigrant context, my research questions the a priori nature of the category of the homepage – in other words, the implicit sense of its always already ‘home-ness’. By demonstrating how belonging was actively constructed and negotiated by US immigrant-centric homepages specifically by employing the tropes of immigrant household and virtual homeland, my research makes a case for interrogating how normative understandings of belonging (in domestic spaces, in community spaces, within the nation) are embedded within homepages that are not necessarily explicitly targeting any specific community or group but are more broadly imagined around a general, mainstream audience/users. Although utopian narratives about the Web, especially around issues relating to access and its ahistorical ‘newness’, have been vigorously dismantled and new conceptual and methodological frames have been offered (for example, see Bolter and Grusin, 2000; Norris, 2001; Brügger, 2010), there still remains, as Gerard Goggin and Mark McLelland (2008) note, a Euro-American bias in how knowledge about the Web and its trajectories is produced. Most often, such a bias reveals itself in universal
accounts of aspects or features of the Web, often framed as technological histories. While the implication is that technological histories transcend cultural context, in reality they are mutually implicated. In this vein, I note that there is no universal history of the homepage, only micro histories (and that a cultural historiography of Indian immigrant homepages reveals how the technological and institutional aspects of the homepage (hyperlinking, domain name symbolism, advertising and sponsorship, discussion forums, aesthetic features, entrepreneurial capital, and corporate identity) are shaped by and, in turn, shape the latter’s cultural politics). In the next sections, I will draw on some key and diverse examples to establish my larger argument about understanding homepages as a cultural metaphor for the recasting of belonging in the age of new media and migration.

THE PERSONAL HOMEPAGE: REPRESENTING THE TRANSNATIONAL SELF

In my research, I found that many of the personal pages were maintained by Indian graduate students at US universities or those who were employed at US firms, many of the latter being IT and software companies. While many of these pages hosted by Geocities, Angelfire, or Tripod are no longer active, one could potentially access at least some of them through online archives such as Reocities, The Internet Archive’s Geocities Special Collection 2009, Oocities, and GeoCities.ws. Writing about them in the current is an act of memory, limited archive and noticeable absence of ‘evidence’, and as such makes visible a research conundrum faced by Web scholars and historians who might have been witnesses but not necessarily diligent recorders of that which they witnessed. Beyond that, as a way of acknowledging my blind spots in the act of writing about homepage culture in the Indian immigrant context, I recall Michel Trouillot’s insight in Silencing the Past (1995), that in the production of historiography may also be embedded moments of erasure – when source materials are not recorded, archives are not preserved, or narratives are not told.

Some of the personal pages I examined were in the tradition of a personal diary or biography with links to ‘about me’, ‘my interests’, ‘my work’, and ‘my family’. In these examples, the focus of the page was on self-expression of identity and personal details. As late-1990s and early-2000s scholarship on personal homepages has discussed, the textual and aesthetic choices made by personal homepage creators sometimes included appropriating features of physical rooms (a wall poster for instance) or homes (doorways), but more crucially involved the strategic presentation of self articulated as a hypertextual map of individual interests and interpersonal connections (Wakeford, 1997; Chandler and Roberts-Young, 1998; Papacharissi, 2002; Cheung, 2004). A common feature I found in these pages was details about Indian ‘origins’ and current location in the US homeland. As if answering the perennial question, ‘where are you (really) from?’ asked of immigrants in a new locale, a question eloquently deconstructed by media and postcolonial theorist Ien Ang (2001: 10–11), the authors of the homepages in most cases posted details about their past and present residences in the quintessential immigrant frame of, ‘I am originally from ___ in India and now I live in ___ in the United States’.2 The personal homepage also bears testimony to how ‘internet surfing’ emerged in the 1990s as a legitimate hobby alongside the usual suspects – reading, traveling, listening to music, etc. Graduate students, many of them from engineering and computer science backgrounds, maintained their pages with a mix of purpose, it appeared. In some cases, the pages were an attempt to speak to their dual homeland affiliations – so, for example, links to news outlets and sports teams from India were arranged alongside links to their
favorite US national sports teams and media outlets. The placement of images of the Indian and American flags as a top banner on pages was a common visual strategy to represent the emergent transnational imaginary around cultural belonging and citizenship.3 In recent years, studies of homepages maintained by transnational subjects such as international students and immigrants have highlighted how ideas of original homeland (country of citizenship, the ‘there’) and present homespaces (country of residence, the ‘here and now’) are intertwined with ideas of self and belonging in the transnational space (see for instance, Collins, 2009). In a few instances, the personal homepage was little more than an elaborate CV with a strategic performance of the professional self for future employers. Here, in addition to personal details, the page contained links to the latest news in their area of study.

While these different examples offer a glimpse of how different subjects personalized the homepage to either narrate their selves online or reproduce a strategic version to negotiate their continued belonging in the United States (students wanting to impress future employers, get a work permit, and thereby extend their stay in the United States), there were a few other examples that show how the personal homepage had hardly anything to do with private self or personal homes and everything to do with public selves and original homeland. Examples include pages which essentially functioned as a virtual library of links to Indian newspapers and cultural magazines, pages that paid tribute to their homeland of India by creating a database of website links to cultural and religious tradition, iconic places and linguistic cultures, and pages that offered a culturally nationalist take (in most cases, a Hindu-centric one) on the greatness of past and present India.4 These types of personal homepages exemplified how some users, no doubt responding to both their cultural and nationalist identification as well as the challenge to such identification brought about by migration to a new country, were more invested in recreating an ideology of the homeland on the Web than an individually oriented sense of self. This is an important point to note because most studies of personal homepages have been focused on the latter (self) and relatedly on noting the analogous relationship between the brick and mortar spaces of real homes with the virtual spaces produced on homepages, rather than on the relationship between personal homepages and homelands (Chandler and Roberts-Young, 1998; Dominick, 1999; Döring, 2002; Cheung, 2004). The personal homepages of immigrant subjects articulating an overlapping investment in private selves and public identities thereby complicate the already blurry boundaries between private and public in online spaces (Marwick and Boyd, 2014) but do so by mapping the contours of the private and public in a transnational context. Personal homepages that, for instance, functioned as a database of news, sports, entertainment, and information from India and Indian America were essentially drawing on media stories or links from the public square but repurposing them to cater to the ‘private’ interests and needs of themselves and others like them. Private in this context reveals that there is no universal sense of individual self, only one marked by social relations such as, in this instance, one’s identity as an immigrant in the United States, an ethnic minority whose taste and interests in popular and public culture do not get catered to by mainstream American media.

HOMELAND IDEOLOGIES AND THE INSTITUTIONAL CONTEXTS OF THE WEB

Community-oriented homepages targeting Indian immigrants in the United States in their earliest iterations (1994) recognized transnational subjectivity as the locus of the imagined user. This did not mean, however, that ethnocentrism or inward-looking
nationalistic discourses did not appear on some of these sites (Lal, 2003). Relatedly, although website names like indiainfoonline.com, indolink.com, nriol.com, indiaworld.com are obvious examples of marking the URL as a nation-centric space (Shklovski and Struthers, 2010), when we examine the content and discursive politics of immigrant sites, we note, as Emily Ignacio (2005) and Kyra Landzelius (2006) among others have discussed, that the act of immigrants making themselves ‘at home’ in an online space fundamentally complicates the relationship between identity and place and problematizes the idea of a stable national discourse in diasporic settings.

Some of the websites targeting Indian immigrants in the United States in the late 1990s also contended with a related issue of a website’s country of origin and its potential implications for audience engagement. For example, between 1996 and 2001, India-based sites like samachar.com (1998; news and information), khoj.com (1998; job search portal), indiaworld.com (entertainment, sports, news), shaadi.com (1997; matrimonial), and rediff.com (1996; news and popular culture) were very popular among US users. Many of these sites were hosted by formidable media and information technology players in India and the United States and were clearly targeting Indian immigrants (Rajghatta, 2001). The Indian Internet marketplace was robust on the production side but facing obstacles on the consumption/usage end due to infrastructural, bandwidth, and cost-related factors (Aguiar, 1999). During this same period, Indian immigration to the United States was at an all-time high and the diverse needs of the community along generational, class, social identity, and ethnic–racial politics were becoming more pronounced and visible. In addition to providing news about India and Indians in the United States, a key way in which these sites targeted Indian immigrants was through their advertising, notably, international calling cards, money remittance services, matrimonial ads, shopping for consumer products, and insurance (for US homes, travel to India and medical insurance for visiting relatives from India).

When US-based sites like indolink.com (1998), nriol.com (1997) (NRIOL standing for Non Resident Indians Online), Namaste.com (1998), and indianmatches.com (2003) entered the fray, the company narrative expressed through its ‘About us’ or ‘Mission’ sections and press releases and interviews with the Indian American press (popular publications included India Currents, India Abroad, India West) almost always called attention to the immigrant identity of the company founders as well as to the fact that the website was headquartered in a US city. In some cases, such as indolink.com, the company first started in Santa Clara, CA and then established satellite offices in New York and Mumbai. To be fair, in many cases it was hard to really see a big difference in the cultural ideologies espoused by India-based sites and US-based ones. For example, samachar.com and indolink.com both offered a fairly similar version of a Hindu-centric culturally nationalist India; the difference was more in the minor details – while samachar.com presented its role as showcasing India on the net, indolink.com, like many US sites of the time, presented its role as linking and connecting Indians worldwide. In reality, India-based sites did not necessarily lose out on their popularity because of the advent of US competitors. In fact, over the course of the late 1990s to early 2000s, we see a complicated mesh of Indian and US players shape the Web space. For instance, AT&T and Western Union frequently advertised on Indian sites and Indian players like ICICI Bank and Citibank advertised on US sites (as did American companies such as Western Union).5

While, in many instances, the immigrant sensibility of the company’s founders and, by extension, the website was mentioned in a matter-of-fact way, in some cases it was presented more vigorously as a key factor in the appeal of the site. A case in point is webindia.com, a California-based Web portal for
businesses based in India, founded in 1994, around the same time as Yahoo and Netscape Navigator made their entry on the Internet. In its marketing promos, the company framed itself as ‘an opportunity for businesses in […] motherland to compete in the global market…’ and ‘erase the advantages of foreign companies’ (Internet Archive). No ambiguity here when it comes to identifying the site’s homeland affiliation. On the other hand, sites like bayareaindian.com (1999) and indianmatches.com (2003) used their US home location to make a larger argument about their cultural appeal. While bayareaindian.com’s USP was its ability to mobilize local networks and cater to the local needs of the immigrant community (for example, a list of Indian nannies in the area), indiamatches.com, which was essentially a set of twin sites, indiandating.com and indianmatrimonials.com, sought to appeal to Indian American youth by including a dating service separate from conventional matchmaking services and set itself apart from the marriage-focused services offered by the India-based leading matrimonial site shaadi.com (the Hindi word for marriage). In the manner in which the institutional identity of a site was managed or presented, there was either a flickering or, in some instances, a direct invocation of the site’s institutional home as a key signifier of the politics of belonging advanced by the site at large.

A unique and temporary phenomenon is observed in the case of Rediff.com, which emerged in the late nineties as a leading Indian site and portal that had a large following of US users. The story goes that when Rediff.com launched its website, it received one million hits in the first month, but almost all of them were from Indians based in the United States (Aguiar, 1999). In 2000, the site launched a US version of the site, and over the next few years acquired three Indian American media businesses. They included an online portal (thinkindia.com), an Internet phone company (Value Communications), and the most high-profile of its acquisitions, the longest-running (since 1970) and most prominent Indian American print newspaper, India Abroad (Financial News, 2000; Business Wire, 2001a, 2001b). Rediff.com’s business strategies over the next few years revealed its efforts to market itself as a transnational company and mobilize an expanded sense of its cultural citizenship that aimed to cater to Indian Americans as well as Indians in India. The company redefined itself as an Indian American company (both in the sense of the hyphenated cultural identity of immigrants as well as a company with businesses in India and the United States). A highlight of Rediff.com’s US storyline included its successful alliance with another Indian entity, ICICI Bank, in the early 2000s, which foregrounded how crucial the construct of the ‘home’ was to enable these Indian players to recast themselves as part of the immigrant family (see Mallapragada, 2014a: 103–12). ICICI Bank, the business leader for Web-based banking in India and a dynamic player in global banking, was looking to create a new online campaign for its online services that targeted Non-Resident Indians (NRIs) in the United States. Among the services were transnational banking (dual currencies being valid), money remittance to India, and home loans for properties in Indian locations. ICICI frequently placed ads on Rediff.com, sponsored its finance channel, and, much like Rediff.com, articulated a transnational space – a virtual Indian American network – as its authentic space of belonging. In a one-time promotion campaign orchestrated to coincide with the Indian state’s announcement of August 6 as the ‘Day of the Diaspora’, icicibank.com placed sponsored messages that read ‘High Achieving NRIs, we have a lot in common…It’s not that lonely at the top when we’re right there with you’ in the newspaper India Abroad, which since 2001 has been rebranded as a Rediff.com publication (ICICI Bank, 2003). The messaging was the most direct attempt at establishing an analogy between Indian immigrants’ experience of border crossings, leaving ‘old’
homes and establishing ‘new’ ones, and the bank’s journey of leaving behind Indian and physical boundaries and reemerging as a virtual bank. Although Rediff.com as of 2016 shifted its attention back to the Indian market, delisted from Nasdaq and sold India Abroad to another Indian American company, icicibank.com continues to capitalize on its virtualization of home services, be they personal finance, getting access to virtual tours of Indian home properties, or sending money to family back home; the only difference since the early 2000s is that it is now using social media platforms such as Facebook and Twitter to expand its reach.

DOMESTIC AGENDAS, FAMILY TROPES, AND EXCLUDED COMMUNITIES

He was born in Kansas and raised in Delaware County, but the place Vivek Srivastava feels most American is not on U.S. soil. It’s in space. Cyberspace. Each day, Srivastava clicks onto his computer server from his house in Media, Pa., and, for at least two hours, mingles with thousands of other Indian Americans on the Internet’s World Wide Web. (Sudarshan, 1999: 73)

In this section, I offer a few prominent and diverse examples to highlight this argument. As I have argued earlier, the distinction between commercial and community websites is untenable precisely because one of the key selling points of the Web was its ability to transgress boundaries. Hence my discussion here moves freely between commercial websites and commercially sponsored community websites.

A very common template that many of these sites followed was to have the homepage feature a gallery of topics that appeared as channels or links on the main page, with sub-links (in some instances) and advertising alongside them. What might seem to be a cluttering of the main page was in fact a gesture to accommodate a strategically constructed image of Indian American identity, cultural affiliations, and transnational investments. Among commercially sponsored community sites (indolink.com (1998), rediff.com (1996), sulekha.com (1998), indusladies.com (2004), for example), commonly appearing topics were news, entertainment, religion, lifestyle, parenting, immigration, stock markets, education, sports, jobs, matrimony, and discussion forums. In essence, the spatial arrangement of these topics on the main page of these sites, many of which often described themselves as ‘ethnic portals’ in their press and marketing reports, was an attempt to capture the temporal as well as the socio-cultural context of the immigrant condition. By constructing memory and foregrounding core areas (community, finance, shopping, culture, religion), in essence the page was inviting the user to relate to these links and see them as emblematic of the aspects of home and homeland that need preservation, connection, and representation. For example, on a site like indolink.com, while the ‘Immigration’ channel would offer guidance on navigating the US visa system, the ‘Parenting’ channel would give advice and solicit discussion on the challenges of and resources for first-generation immigrant parents raising Indian American children.

Who is the transnational subject? How is community articulated to location and mobility? In my research, I found that the contours of community formation were mapped across diverse variables. For gopio.net (1999), the home of the Global Organization of People of Indian Origin, it was an Indian-origin story with global spin-offs that was at the heart of its community identity. Early sites like bayareaindian.com and pittsburghindian.com were more invested in functioning as a community center and bulletin board and engendering a local, intimate sense of neighborhood rather than a vast transnational space of virtual belonging.

As I have discussed elsewhere (Mallapragada, 2014a), there is plenty of evidence that the homepage was crucial to the maintenance of hegemonic and conservative
ideologies around nation, gender, class, and caste in the Indian immigrant community. For instance, a community site like TANA.org, which represents the Telugu-speaking linguistic sub-group of the immigrant community, also reveals a bias towards specific caste groups (Reddy and Kamma) that are politically and financially highly influential in India and its diasporic circles. While hindunet.com (1994) promised access to an ‘interactive Hindu universe’, its institutional backers had strong ties to conservative Hindu nationalist groups in the United States and India. Namaste.com, an online shopping site specializing in Indian consumer products, strategically advertised images of traditionally dressed ‘Indian’ immigrant women to drive home the point that, ‘no matter where you live everything you love about India, movies, snacks, music, health and beauty products, is just a click away’ (Namaste.com, 2000). The metaphorical association between Indian immigrant women (placed centrally in the ad) and the ‘India’ in the site’s tagline, ‘Bring India home!’, is crucial because a site like namaste.com is digitally remastering an older imaginary of Indian cultural nationalism (middle-class Hindu women as bearers of cultural authenticity and tradition) (Bhattacharjee, 1992) to position itself as an authentic brand in the e-commerce domain. The associative linking of women’s social roles within the family (as wives and mothers) with the consumption and virtualization of Indian culture was frequently made on shopping sites or links that sold products relating to the domestic and familial space of the immigrant home. For example, images of young women (representing brides) on matrimonial sites and older women (representing mothers or ageing parents) for services around international phone cards or money remittances were very common.6

The homepage was, however, not limited to the consolidation of hegemonic and conservative narratives about gendered social roles or cultural nationalism. Sawnet.org (1995), home to a feminist South Asian collective of female users, often addressed the challenges to belonging within the nation as it is clothed in patriarchal and regressive cultural ideologies around gender roles, women’s spaces, and cultural gatekeeping in the diaspora (see Gajjala, 2014). On a community site, indusladies.com, a US-centric discussion forum, ‘H4 Indus Ladies’ (2005) exemplifies how immigrant women feeling trapped and experiencing a sense of un-belonging in the United States as a result of restrictions placed on them because of their H4 visa status (most notably, being barred from being employed while on the visa) critically evaluate the patriarchal politics of imagining the domestic home space as a pleasant retreat from the busy world.7 In their conversations, women frequently call attention to how the home can be a site of displeasure, anxiety, and boredom. Furthermore, they re-route the discussion about belonging to the virtual ‘home’ of the discussion forum, noting that for some marginalized voices such as theirs, home territories, to recall David Morley’s (2000) term, are dynamic, transnational zones of online agency and connectivity.

CONCLUSION

Despite recent news reports wondering if ‘homepages are dead’, both homepages and the idea of ‘home’ continue to be central to the imagination and organization of Web spaces (Kamdar, 2015). In the present media environment – where social media has emerged as a key driver of traffic to websites, where Facebook’s profile and personal pages can be considered the twenty-first-century version of the 1990s’ ‘free personal homepages’ (with a whole new dynamic with regard to commodification, surveillance, privacy, and interactivity), and where mobile viewing of online content has changed the design and experience of websites – it is important to recognize that while today’s homepages catering to Indian immigrants in
the United States are still articulated to imaginations of home and belonging, the specific practices they employ to engender belonging and, relatedly, the visual and cultural economy of webpage design and flow are different from the 1990s–2000s iteration. Most remarkably, the main pages of websites are increasingly incorporating social media icons as a way to direct users to the site’s social media presence. And, relatedly, the social media pages of commercial and community websites are using the platform to guide users back to the website. Arguably, what is unfolding in the current moment is an intensification of the idea of home and belonging as mobile, affective, interactive spaces that can be accessed by touch or voice recognition.

In different historical and cultural contexts, the homepage appears very differently to its different users. When we are forced to contend with the homepage as a historical phenomenon, then we are also led to interrogate what the ‘home’ in the homepage means across different historical and cultural contexts. For instance, while ideologies of nostalgia woven around India as the original homeland were central to many of the early websites targeting Indian immigrants in the mid 1990s, by the mid 2000s (with social media and participatory culture) there were more instances of websites trying to articulate belonging by interrogating how Indian immigrants were impacted by ideologies of race, gendered immigration policy, and inter-racial political alliances.8 Two things are important to note here. One, recalling Raymond Williams’ (1975) insight that technology and society exist in a symbiotic relationship, one can understand the shifts in how homepages articulated belonging as emblematic of evolving ideas about identity and politics within the Indian immigrant community. Two, the status and experience of the homepage in online culture in the late 2000s is different from that of the 1990s. For instance, when the key hyperlinks on the homepage of a website lead to social media platforms like YouTube, Facebook, or Twitter, both the ‘look’ of the page and the practices by which one could interact with the homepage and engender a sense of belonging are remarkably different from that of a homepage from the 1990s, where no such access existed. How belonging is invited and enacted is different in both instances; nevertheless, my larger point is that imaginaries of belonging are central to understanding the work homepages ‘do’ in our Web-based culture.

Metaphors are integral to Web aesthetics and app design. Icons such as the shopping cart in online stores, a pen and paper for the Pages app, a camera for Instagram, and a house for the home screen all visually index the digital and the virtual through the physical and the familiar. The ubiquity of such metaphors in online media, coupled with the fact that the Web is more than a quarter century old, makes it especially challenging to separate the elements of architecture and technical design from those of metaphors and cultural imaginaries. Yet, to assess the significance of a given medium, we need to not just know how it functions as a technology but, more importantly, examine that technology as a cultural form. In other words, it is to understand how the institutions, practices, features, and histories of a technology shape everyday life, culture, and meaning making in a social context.

Finally, I also take note of the fact that at the time of this writing (March 2018) advertising for products such as Amazon Alexa and Google Home is increasingly linking the everyday domestic, familial, and house-centric activities to the technological capacity and cultural significance of these smart home technologies (for example, creating a grocery shopping list, looking up a recipe while in the act of cooking) (O’Shea, 2018). In other words, ‘home’ as both a real place and imagined space of belonging has been, and continues to be, a very potent idea to produce and manage the material and discursive power and value of networking technologies, be they websites, smart home devices, or social
media apps (Facebook’s Home, for example, while not successful, still reveals the efforts to articulate ideas of home to digital social networking) (Turner, 2013). Granted, industry narratives are tied to investments in the consumer Web, but it is worth noting that, historically as well as in contemporary times, the constructs of the ‘house’ and the ‘home’ have shaped and continue to shape the financial, institutional, and social imaginaries of websites in particular and the Web at large.

Notes

1  For example, Sherry Turkle (1995: 14) in her discussion of the Windows interface (and how one can have multiple applications simultaneously open on a computer screen) has argued that ‘windows’ are a metaphor for the fluidity and multiplicity of self and identity. Ella Shohat (1999) has thoughtfully critiqued the utopian narratives of cybertravel and electronic frontiers in 1990s popular and public discourse and in turn argued that the Web is shaped by a tension between mobility and fixity, between transcending borders and creating boundaries.
2  http://members.tripod.com/~abhayj/whoami.html, URL when active, print-out from 10/26/02.
3  http://www.public.iastate.edu/~sach/ URL when active, print-out from 11/4/99.
4  http://members.home.com/draj99/india.html, URL when active, print-out from 11/4/99.
5  See, for example, Indolink.com’s list of advertisers at http://www.indolink.com/advertise.html. As of 2018, Indolink.com is an inactive site.
6  Examples of sites where these images appeared include matrimonial sites such as shaadi.com and H1Bmatrimonials.com, as well as the Facebook page of ICICI Bank’s NRI banking services for Indians in the United States.
8  See chapters 3 and 4 in Mallapragada (2014a) for a more detailed discussion of these themes.

REFERENCES

Aguiar, A.A. (1999) ‘Indian Internet Users Multiplying Fast’, News India-Times, October 22: 36.
Alonso, A., and Oiarzabal, P.J. (Eds) (2010) Diasporas in the New Media Age: Identity, Politics and Community. Reno, NV: University of Nevada Press.
Ang, I. (2001) On Not Speaking Chinese: Living Between Asia and the West. London: Routledge.
Appadurai, A. (1997) Modernity at Large: Cultural Dimensions of Globalization. Minneapolis, MN: University of Minnesota Press.
Bhattacharjee, A. (1992) ‘The Habit of Ex-Nomination: Nation, Woman and the Indian Immigrant Bourgeoisie’, Public Culture, 5(1): 19–44.
Bolter, D.J., and Grusin, R. (2000) Remediation: Understanding New Media. Cambridge: MIT Press.
Brah, A. (1996) Cartographies of Diaspora: Contesting Identities. London: Routledge.
Brügger, N. (Ed.) (2010) Web History. New York: Peter Lang.
Business Wire (2001a) ‘Rediff.com Completes Acquisition of Value Communications Corporation: New Entity To Leverage Strengths and Offer Expanded Services to Indian Americans in the U.S’, Business Wire, April 12.
Business Wire (2001b) ‘Rediff.com Re-Launches India Abroad Weekly Publication’, Business Wire, August 1.
Castells, M. (1996) The Rise of the Network
7  The H4 visa, also known as the dependent visa, is Society. Malden: Blackwell Publishing.
given to the spouse and child(ren) of the H1B visa Chandler, D., and Roberts-Young, D. (1998)
holder. The H1B visa is given to foreign workers ‘The Construction of Identity in the Personal
in a ‘specialty occupation’ who perform ‘special- Homepages of Adolescents’, Nov, 1998. vis-
ized and complex’ duties, such as those in the ualmemory.co.uk/daniel/Documents/short/
information technology sector. US Citizenship strasbourg.html
and Immigration Services, ‘H-1B Specialty Occu-
Cheung, C. (2004) ‘Identity construction and
pations, DOD Cooperative Research and Devel-
opment Project Workers, and Fashion Models’.
self-presentation on personal homepages:
https://www.uscis.gov/working-united-states/ Emancipatory potentials and reality con-
temporary-workers/h-1b-specialty-occupations- straints’, in David Gauntlett and Ross Horsley
dod-cooperative-research-and-development- (Eds), Web.Studies. London: Arnold.
project-workers-and-fashion-models pp. 53–68.
398 THE SAGE HANDBOOK OF WEB HISTORY

Chun, W.H.K., and Keenan, T. (Eds) (2006) Old Media, New Media: A History and Theory Reader. New York: Routledge.
Clifford, J. (1994) ‘Diasporas’, Cultural Anthropology, 9(3): 302–338.
Cohen, R. (1997) Global Diasporas: An Introduction. Seattle: University of Seattle Press.
Collins, F.L. (2009) ‘Connecting “Home” with “Here”: Personal Homepages in Everyday Transnational Lives’, Journal of Ethnic and Migration Studies, 35(6): 839–859.
Dominick, J.R. (1999) ‘Who Do You Think You Are? Personal Home Pages and Self-Presentation on the World Wide Web’, Journalism & Mass Communication Quarterly, 76(4): 646–658.
Döring, N. (2002) ‘Personal Home Pages on the Web: A Review of Research’, Journal of Computer-Mediated Communication, 7(3) http://www.ascusc.org/jcmc/vol7/issue3/doering.html
Douglas, M. (1991) ‘The Idea of Home: A Kind of Space’, Social Research, 58(1): 287–307.
Everett, A. (2009) Digital Diaspora: A Race for Cyberspace. Albany, NY: State University of New York Press.
Financial News (2000) ‘Silicon Valley Portal to Merge with Rediff.com to Create Rediff USA: Merger will extend the reach of Rediff.com’s content, services and marketplace offerings to the Indian-American community in the U.S’, Financial News, October 27.
Gajjala, R., and Gajjala, V. (Eds) (2008) South Asian Technoscapes. New York: Peter Lang.
Gajjala, R. (2014) Cyberselves: Feminist Ethnographies of South Asian Women. Walnut Creek: AltaMira Press.
Georgiou, M. (2006) Diaspora, Identity and the Media: Diasporic Transnationalism and Mediated Spatialities. Cresskill, NJ: Hampton Press.
Goggin, G., and McLelland, M. (Eds) (2008) Internationalizing Internet Studies: Beyond Anglophone Paradigms. New York: Routledge.
Grewal, I. (2005) Transnational America: Feminisms, Diasporas, NeoLiberalisms. Chapel Hill, NC: Duke University Press.
ICICI Bank (2003) ‘The Day of the Diaspora’, India Abroad, January 10 sec. S: 1–4.
Ignacio, E.N. (2005) Building Diaspora: Filipino Cultural Community Formation on the Internet. New Brunswick, NJ: Rutgers University Press.
Internet Archive, ‘About us,’ www.webindia.com. https://archive.org/
Kamdar, S. (2015) ‘Is the Homepage Dead?’ Forbes, December 27. https://www.forbes.com/sites/sachinkamdar/2015/12/27/is-the-homepage-dead/#16e189872344
Karim, H. (Ed.) (2003) The Media of Diaspora: Mapping the Globe. New York: Routledge.
Landow, G.P. (2006) Hypertext 3.0: Critical Theory and New Media in an Era of Globalization. 3rd edn. Baltimore, MD: Johns Hopkins University Press.
Landzelius, K. (Ed.) (2006) Native on the Net: Indigenous and Diasporic Peoples in the Virtual Age. London: Routledge.
Lal, V. (2003) ‘North American Hindus, the Sense of History, and the Politics of Internet Diasporism’, in Rachel Lee and Sau-Ling Cynthia Wong (Eds), Asian America.Net: Ethnicity, Nationalism and Cyberspace. New York: Routledge. pp. 98–138.
Lee, R., and Wong, S.-l.C. (Eds) (2003) Asian America.Net: Ethnicity, Nationalism and Cyberspace. New York: Routledge.
Mallapragada, M. (2000) ‘The Indian Diaspora in the USA and around the Web’, in David Gauntlett (Ed.), Web.Studies: Rewiring Media Studies for the Digital Age. London: Arnold & OUP. pp. 179–185.
Mallapragada, M. (2006a) ‘Home, Homeland, Homepage: Belonging and the Indian-American Web’, New Media and Society, 8(2): 207–227.
Mallapragada, M. (2006b) ‘An Interdisciplinary Approach to the Study of Cybercultures’, in David Silver and Adrienne Massanari (Eds), Critical Cyberculture Studies: Current Terrains, Future Directions. New York: New York University Press. pp. 194–204.
Mallapragada, M. (2010) ‘Desktop Deities: Hindu Temples, Online Cultures and the Politics of Remediation’, South Asian Popular Culture, 8(2): 109–121.
Mallapragada, M. (2014a) Virtual Homelands: Indian Immigrants and Online Cultures in the United States. Urbana-Champaign, IL: University of Illinois Press.
Mallapragada, M. (2014b) ‘Rethinking Desi: Race, Class and Online Activism of South
Asian Immigrants in the United States’, Television and New Media, 15(7): 664–678.
Marwick, A., and Boyd, d. (2014) ‘Networked Privacy: How Teenagers Negotiate Context in Social Media’, New Media and Society, 16(7): 1051–1067.
Massey, D. (1994) Space, Place and Gender. Minneapolis, MN: University of Minnesota Press.
Morley, D. (2000) Home Territories: Media, Mobility, Identity. London: Routledge.
Naficy, H. (Ed.) (1999) Home, Exile, Homeland: Film, Media and the Politics of Place. New York: Routledge.
Nakamura, L. (2008) Digitizing Race: Visual Cultures of the Internet. Minneapolis, MN: University of Minnesota Press.
Namaste.com (2000) Advertisement in Silicon India, 4(7): 69.
Norris, P. (2001) Digital Divide: Civic Engagement, Information Poverty and the Internet Worldwide. Cambridge, UK: Cambridge University Press.
Ong, A. (1999) Flexible Citizenship: The Cultural Logics of Transnationality. Durham, NC: Duke University Press.
O’Shea, D. (2018) ‘Amazon’s Alexa Being Integrated into Smart Homes’, Retail Dive, February 22, https://www.retaildive.com/news/amazons-alexa-being-integrated-into-smart-homes/517591/
Papacharissi, Z. (2002) ‘The Self Online: The Utility of Personal Home Pages’, Journal of Broadcasting & Electronic Media, 46(3): 346–368.
Rajghatta, C. (2001) The Horse that Flew: How India’s Silicon Gurus Spread Their Wings. New Delhi: HarperCollins.
Schiller, D. (1999) Digital Capitalism: Networking the Global Market System. Cambridge, MA: MIT Press.
Shklovski, I., and Struthers, D.M. (2010) ‘Of States and Borders on the Internet: The Role of Domain Name Extensions in Expressions of Nationalism Online in Kazakhstan’, in Policy & Internet, 2(4), http://www.psocommons.org/policyandinternet/vol2/iss4/art5/
Shohat, E. (1999) ‘By the Bitstream of Babylon: Cyberfrontiers and Diasporic Vistas’, in Hamid Naficy (Ed.), Home, Exile, Homeland: Film, Media and the Politics of Place. New York: Routledge. pp. 226–227.
Sudarshan, R. (1999) ‘Ethnic Roots in Cyberspace; Isolated in America, Immigrants are Uniting Online To Cherish Their Culture’, Little India, January 31, 9(73).
Takaki, R. (1989) Strangers from a Different Shore: A History of Asian Americans. Boston: Little, Brown and Company.
Trouillot, M. (1995) Silencing the Past. Boston: Beacon Press.
Turkle, S. (1995) Life on the Screen: Identity in the Age of the Internet. New York: Touchstone.
Turner, A. (2013) ‘A Place to Call Home: A Social Media Giant Has Reinvented the Smartphone Interface’, The Age, April 28, http://www.theage.com.au/digital-life/mobiles/a-place-to-call-home-20130425-2ifn3.html
W3C Technical Architecture Group (2004) ‘Architecture of the World Wide Web Volume One’, The World Wide Web Consortium, https://www.w3.org/TR/webarch/
Wakeford, N. (1997) ‘Networking Women and Grrls with Information/Communication Technology’, in Jennifer Terry and Melodie Calvert (Eds), Processed Lives: Gender and Technology in Everyday Life. London: Routledge. pp. 35–46.
Williams, R. (1975) Television: Technology and Cultural Form. New York: Schocken Books.
27
Consumers, News, and a
History of Change
Allie Kosterich and Matthew Weber

INTRODUCTION

As we near the third decade of the Web, attention is needed on the integral impact of the Web’s historical development on a range of social, cultural, economic, and political facets of society. The influence of the history of the Web is particularly significant with regard to the ways that humans connect, share, and communicate. In this context, the impact has been deeply evident in the transformation of consumption within the news industry. In essence, a history of news media has quickly become a history of the Web. As such, this chapter provides a comprehensive review of Web history as related specifically to the news media industry. The chapter focuses on Web history and the context of news media in the United States from 1990 through 2015. Specifically, the scope covers a history of change on the Web and related impacts with regards to the ways in which consumers engage with the news media ecosystem. Emphasis is placed on production, distribution, and consumption patterns related to print news media and key legacy broadcast news media. In doing so, this chapter examines the ways that consumers access news and information by mapping out the changes in consumer behavior with regard to Web history and access to news. A key point throughout this chapter is the notion that many present-day changes in news media consumption can be traced back to historical changes in news media content on the Web, thus providing a window into best practices for future adaptation.

The means by which consumers seek and obtain information and news evolved significantly over the past several decades. In 1990, 67 percent of households in the United States reported buying a newspaper; by 2000 that number dropped to 53 percent (Editor and Publisher Yearbook, 2003). Conversely, while online news was slow to find an audience in the 1990s, today 38 percent of Americans report getting their news from online sources (Mitchell et al., 2016b).
CONSUMERS, NEWS, AND A HISTORY OF CHANGE 401

The general importance of the news media industry, paired with the marked change resulting from engagement with Web technology, has led to significant research focused on the transformation of the news industry since the introduction of the Web. Despite this large body of relevant work, a comparatively small body of work has focused on changes on the Web from the perspective of the consumer and consumption of Web-based news media content. And yet the consumer perspective of Web history is a key driver of change in the news ecosystem (Hamilton, 2004). To that point, a 2015 Reuters Institute Digital News Report found that 25 percent of consumers across the globe use a smartphone as the main device for news consumption; in the United States alone, 44 percent use a smartphone to access the news (Newman et al., 2015). Dependence on the smartphone for news is remarkable given that the Apple iPhone, the most popular smartphone in the United States, launched in 2007. Consumer dependence on Web access via smartphone grew at an alarming rate. Yet legacy news organizations have struggled to adjust delivery systems to a Web-based digital environment, and they have also struggled to adjust to the emergence of Web-based digital-first intermediaries such as Facebook, Twitter, and Google through which many of today’s consumers obtain their news. To wit, the 2015 Reuters Institute Digital News Report also found that 41 percent of the consumers sampled in their study use Facebook for news each week.

Thus, the history of the Web and associated technological change underlying the growth of the Web continues to impact and explain change in the news industry as consumers shift to a digital environment. This point is particularly salient as mobile and social technologies continue to occupy a more central role in the industry. In many ways, the transformation today mirrors historical trends from the history of the Web. In recent decades, the news industry has experienced three significant disruptions related to consumption trends: the transition from traditional printed newspaper to digital newspapers, the rise of social media, and the influence of mobile technology (Bell et al., 2017). This chapter thus focuses on Web history through the lens of the evolution of consumer interaction with news media by reviewing the core technologies and key media outlets over the past two decades from the consumer perspective. To that end, we focus specifically on the rise of early Web technology (e.g. webpages as a means of distribution), the growth of social media as a platform for news consumption, and, most recently, the dominance of mobile technology as a new means for accessing and consuming news media from the Web. In the closing sections, recent shifts to a mobile consumer environment are addressed, with a focus on the impact of the development of mobile applications on the consumer news environment.

A RECENT HISTORY OF US NEWS CONSUMPTION

In order to better understand Web history and related changing patterns of news consumption, this chapter reviews the past 25 years in US news consumption. As noted, the focus herein is placed on the transition to digital news, the increasing prominence of social media, and the introduction of mobile technology and mobile news apps. In general, the story of US news consumption since the advent of the Web is punctuated by the advent of new technologies and the increasing ability of publishers to deliver rich and interactive content via digital technology. And yet, as de Sola Pool (1983) noted, advances in electronic technology often result in a reversal, whereby audiences are increasingly fragmented and afforded more choices for consuming news. The consumer perspective is thus particularly important to this narrative, given the central role of the consumer in driving trends with regards to the patterns of
production in response to consumption of news on the Web.

Newspapers on the Web

Few would argue against the narrative that the 1990s and early 2000s proved fiscally disastrous for the majority of the print newspaper industry. In general, the narrative of news media adapting to Web technology is one of responding to changes in consumer demand. The early period of the Web (e.g. the 1980s and 1990s) is one notable exception, as US newspapers responded to the advent of new technology in an attempt to anticipate changes in consumer demand. For instance, in the 1980s newspapers experimented with Telex technology as a way to send news digitally, although it was clear there was not a significant consumer market for the product (Khaw, 1983). Similarly, in 1992 Prodigy and America Online, two early online Internet Service Providers, experimented with bulletin board system formats for distributing digital news, despite relatively small user bases (Boczkowski, 2004). Early Web technology, as used by news media organizations, can be viewed as a means of experimentation, rather than as a means of responding to direct consumer demand.

The transition to news on the Web, however, is better understood as two periods – the movement of newspapers on to the Web, and the introduction of digital-only newspapers. Indeed, one of the most significant changes in the history of the Web and the evolution of news consumption is related to the advent of digitally native news organizations. Digitally native news refers to news content that is developed exclusively for distribution via online mediums (Nee, 2013). By definition, digitally native news media organizations emerge and develop entirely online, differentiating themselves from traditional news media organizations in a variety of ways. These digitally native news organizations run the gamut from popular national and international companies such as The Huffington Post, Buzzfeed, and Vice to smaller companies targeting niche needs and audiences such as Baristanet, New Brunswick Today, and Syria Deeply.

Broad trends in Web news consumption
Undoubtedly, print and Web-based news are two inherently different products. There are innate differences in the products: print news articles are generally presented on one to two pages, and the reader is easily able to peruse the content. Web-based content is presented online and read via a desktop computer, laptop computer, tablet, or mobile device. The consumer reading Web-based content will often see a headline and brief abstract, and will then have to click to open the article, and to then read further. Often, Web-based news articles are spread across multiple webpages (Mitchelstein and Boczkowski, 2009). In addition, the distinctly different nature of the products, as well as the ecosystems (printed products vs. digital products), contributes directly to the distinct nature of the business models in print and online (Ahlers, 2006).

Despite the unique nature of Web-based news content, recent research suggests that, beyond the experience of consumption, there are no major differences with regards to recall of news read online versus news read in print (d’Haenens et al., 2004). Consumption of printed news has continued to decline at a remarkable rate. A Reuters Institute report on digital news noted that from 2012 to 2016, consumption of print news declined from approximately 37 percent to 25 percent based on sources of news in a given week (Cornia et al., 2016). Early shifts to online news consumption raised questions regarding the impact of online news on civic engagement. Many feared that as audiences moved online, civic participation would decline. Research on the impact of online consumption on civic participation is mixed, as some studies have found little
to no connection between online news consumption and civic participation (Bimber, 2001; Margolis and Resnick, 2000), whereas others have found evidence that there is a positive relationship (Amadeo, 2007; Norris, 2003). Continued research is needed with regards to the impact on civic participation, as ongoing research continues to point to mixed findings (de Zuniga et al., 2014).

As for digitally native news organizations, exact numbers are hard to come by since there is no single tracking data source. Pew Research notes that the audience for news in this category is growing significantly, and that nearly 50 percent of the digitally native news organizations that they track reported double-digit growth in audience in 2015 (Mitchell et al., 2016b). Interestingly, the rate of growth has slowed somewhat; in 2009, growth rates for top online news websites averaged 27 percent year-over-year (Center, 2008), but by 2015 growth slowed to just above 10 percent (Mitchell et al., 2016b).

The historical impact of news on the Web as a disruption within the evolution of news consumption behaviors is connected to the development of Internet capabilities, which helps to feed consumer interest in online news. As of 2016, the percentage of US adults reportedly consuming news from print newspapers fell to 20 percent (down from 27 percent in 2013), compared with 38 percent consuming news online via websites, apps, or social media (Mitchell et al., 2016a). From the perspective of the news producer, this change is further reflected through an examination of Web traffic.

History of Web news consumption
In order to contextualize technological change during this period, it is helpful to remember how nascent Web technology was in the 1990s. Early in the history of newspapers on the Web, change was slow because consumers were still adapting to the technology. In 1990, only 42 percent of the population had even tried using a computer. The first website went online in December of 1990 and it remained the only one for some time as the Web source code wasn’t released until 1991 (CERN, n.d.).

Just a few years later, in 1994, the number of Americans with online access had skyrocketed to 11 million. In 1995, that number was 18 million, but only 3 percent of Americans had accessed the World Wide Web. Just three years later, in 1998, 20 percent of Americans were getting their news online at least once a week (Pew Research Center, 2014). Early experimentation shifted to a period of accelerated innovation in response to consumer demand as traditional print newspapers quickly ramped up models for online news production. In 1997, an estimated 745 newspapers had a website, and that number jumped to 1,749 in 1998 (Editor and Publisher Yearbook, 2003). As of 2000, it was clear that online news delivered via websites was likely to be a permanent component of the news industry.

As the industry evolved, this period in the history of news on the Web reflects an era of marked change and replacement. The evolution was slow at first, but in the later years, from the late 1990s onwards, the pace of change increased notably. During this time, many print newspapers found themselves losing circulation, staff, and prestige. Indeed, Gilbert (2005) notes that, in most cases, successful innovation in the transition to Web content required newspapers to create new business units or new divisions in order to foster substantial innovation.

The focus of many print newspapers during this time was a shift to multi-platform production, although the transition was slow to occur (Doyle, 2013). For print newspapers, a multi-platform model of news production recognizes the business model for print has eroded, and, in turn, focuses on repurposing print content across multiple technological platforms (e.g. print newspaper, social media, websites). During this historical evolution, the main focus was on transferring print content to the Internet with the overriding strategy of print newspapers being to horizontally
translate content to the Web, rather than reinventing content for the Web (Rothmann and Koch, 2014).

Following the early rise of newspapers on the Web, increased access to Internet technologies fostered a landscape for both the production and consumption of news media anytime and anywhere. Early on, however, consumption was often driven by key events. For example, a 2002 analysis found that the majority of online news Web traffic was driven by consumption of news pertaining to key events such as the Clinton–Lewinsky scandal (Wu and Bechtel, 2002), or by immediate need-based consumption such as weather-related news. As technology proliferated, and Internet access speeds increased, usage became more ubiquitous. According to Pew, for example, 2010 marked the first time surveyed respondents said the Internet was the platform of choice for news; 65 percent of participants said they got most of their news from the Internet as opposed to television, newspapers, or radio (Rosenstiel and Mitchell, 2011). As such, there was a rapid acceleration of new news organizations on the Web.

Larger legacy organizations such as The New York Times, CNN, and the BBC migrated toward a stronger and more dominant online presence. The presence of organizations such as CNN and the BBC on the Web points to a continued pattern of change; consumer demand was and is not specific to a single medium – as a result, news media online is often an agglomeration of diverse media channels. Notably, this period also saw an increase in startup organizations born on the Web such as Propublica, Business Insider, and Quartz (Mitchell et al., 2014). An examination of the digitally native landscape indicates that many of these organizations are focused specifically on content that has been negatively impacted by the industry’s economic struggles, including local news, investigative journalism, and international content (Jurkowitz, 2014). Today, online consumption of news is a given, but the patterns of consumption are far more nuanced. Patterns of user distribution have developed online in ways that often mirror what was once seen in print; users with mainstream political views congregate on mainstream news websites, but there is a diverse and rich population of niche websites to serve other points of view (Fletcher and Park, 2017). On the other hand, online news audiences engage and interact with news in a far more meaningful way than in the days of print-dominated news. Online news allows consumers to engage and interact with news sources, providing commentary, asking questions, and interacting with other readers (Ksiazek et al., 2016).

Tensions between Web news consumers and producers
The advent of online news and digitally native news leaves news producers to contend with the response and management of technological and associated behavioral changes as they grapple with new ways to produce and distribute news. In some ways, digitally native news producers are at an advantage with regards to managing changes in news consumption. As digitally native news producers, these news organizations are inherently capable of benefitting from social media as a contributor to content creation, distribution, and engagement (Wu, 2016). This is crucial for the survival of news media organizations as an increasing number of users consume news through social media platforms, a trend discussed in the next section of this chapter. Similar benefits exist due to the inherent ability of digital news media organizations to integrate mobile into organizational routines. Indeed, digitally native news media organizations lead mobile news advertising revenue (Wu, 2016), which proves important as 25 percent of respondents use a smartphone as the main device for news consumption (Newman et al., 2015). In addition, digitally native news media organizations strive to innovate by creating new styles and forms of storytelling (Mitchell and Page, 2014) facilitated by the difference in
distribution of costs from traditional news media companies (Thompson, 2014). As a whole, digitally native news media organizations continue to expand in terms of growth and profit (Wu, 2016).

Implications for the news ecosystem
This discussion of digitally native news is inextricably linked to an understanding of the news media resource space and represents an evolution that can only be told by understanding the history of news consumption on the Web. In response to the development of digitally native news, legacy organizations must diversify and engage with these new entrants, and competition over resources such as funding and audience is unavoidable. For example, a 2014 report from the Pew Research Center noted that in the preceding year there had been ‘a dramatic and conspicuous migration of high-profile journalists to digital news ventures’: specifically, digitally native news ventures such as Vice employed 1,100 staff, the Huffington Post employed more than 500 staff, and Politico had more than 100 staff on payroll at the time (Jurkowitz, 2014). The shift in employment has not been one-to-one, and hand-in-hand with changes in consumption patterns there has been an accompanying change in the underlying business model that has resulted in a decrease in the overall employment level in the news media industry (Picard, 2014). In order to remain competitive, both legacy and digitally native news organizations must continue to adapt to the accelerated changes evident in news consumption behaviors.

Social Media Intermediaries

Digital intermediaries such as platforms and search engines, as well as algorithms that are used to sort and display news, play an increasingly significant role in shaping and structuring digital life – especially evident via changes in news consumption, which are increasingly mediated, delivered, and curated by social media companies. Within the context of this chapter, digital intermediaries refers to online content platforms, news aggregators, search engines, and social media sites that serve as third-party hosts and distributors, bringing news content from creators and publishers to consumers (Braun and Gillespie, 2011; Foster, 2012). According to Bell et al. (2017), the role of these media platforms and technology companies within the news ecosystem is an even greater disruption than the movement from print to digital.

In examining the history of news on the Web and the rise of social media, it is important to understand that many of these intermediaries utilize mechanisms such as software or algorithms to achieve these processes. Broadly speaking, algorithms are ‘encoded procedures for transforming input data into a desired output, based on specified calculations’ (Gillespie, 2014: 167). They are designed by humans, influenced by human behavior, but still ‘the result of sequential decisions made by a machine’ (Usher, 2015: 1); as such, algorithms are technically complex with potentially biased and complex operating logics (Diakopoulos, 2014). As a mechanism utilized by digital intermediaries, they facilitate the discovery and dissemination of news in their responsive design as reflective of individual users (Webster, 2010). Indeed, these algorithms hold substantial independent authority to affect the flow of news (Napoli, 2014) to consumers.

Broad trends in social media and news consumption
Social media and related technology represent the cusp of evolution in the news media industry. The advent of social media is a significant disruption in the transformation of news consumption on the Web. Cornia et al. (2016) found that the number of consumers surveyed who reported using social media as an outlet for news consumption each week increased from 25 percent in 2013
to 46 percent in 2015. The report also found that 41 percent of the sample use Facebook for news each week. Echoing the same trends, Pew’s recent study found that half of Web-using adults in the United States get political news from Facebook (Mitchell and Page, 2015). The prior trends identified in this chapter suggest that these changing patterns of consumption are likely to accelerate moving forward.

History of social news consumption

Social media emerged from the growth of Web 2.0 technologies in the late 1990s with social blogs such as Slashdot (founded in 1997) and Metafilter (founded in 1999) that list links to posts by users. Algorithmic distribution of social news, however, did not develop as a medium for news distribution until the late 2000s. To wit, Facebook launched in 2004 as a Harvard-only social network with the news feed component launching in 2006 – the year that also brought the launch of Twitter. Four years later came the introduction of Instagram (2010), followed by Snapchat and Apple Newsstand (2011), and Google Newsstand (in 2013). Notably, where there was a decades-long gap between the advent of the Internet and the rise of the World Wide Web, the development of social media as a news platform has occurred in a significantly shorter time period. Beginning in 2014, the frequency of social media and news-related developments began to greatly accelerate, including, but not limited to, developments such as Snapchat Discover, Facebook Instant Articles, Google AMP, and Instagram Stories (Bell et al., 2017).

The case of the Web history of Facebook, in particular, helps shed light on the role of social media and digital intermediaries in news consumption. Since its inception in 2004, Facebook evolved to play a significant role in social life and specifically in relation to the news media ecosystem, as it appears to play an important role in increasing traffic to individual news stories (Alpert, 2015; Mitchell and Page, 2014). Indeed, Facebook is responsible for about 20 percent of referrals to news publishers (Persio, 2015). Facebook’s news feed algorithm always occupied some type of news editorial role in that its programing decisions select the types of news stories featured in user feeds (Bell, 2015).

More recently, in 2015 Facebook announced its Instant Articles partnership with several news organizations, which allows for hosting of exclusive content within Facebook’s own website instead of linking out to the news organization. Ostensibly, the incentive is to minimize the slower loading speed from linking out to the news organization website and to increase revenue opportunity (Constine, 2015); however, the broader implication of such a reality is increasingly centralized news from one powerful platform where news stories are sorted and delivered via a proprietary algorithm.

Tensions between social news consumption and production

The increasingly powerful role of social media and digital intermediaries in the consumption of news raises concerns for a number of reasons. Echoing the attitude of other technology companies, Facebook, for example, does not see itself as a publisher. Rather, as Facebook’s head of news partnerships Andy Mitchell explains, Facebook is ‘first and foremost, a platform where people connect with people’ (Menichini, 2015). When asked if Facebook felt any responsibility for the integrity of its news feed, Mitchell evaded acceptance of the role of news publisher and responded only by saying ‘that the company cared about improving user experience’ (Bell, 2015). Indeed, Facebook is very involved in what content is distributed; in fact, users only see about 6 percent of their friends’ posts (Timm, 2015). Perhaps more importantly, most users aren’t even aware of these processes. A recent study from Eslami et al. (2015) found that 62.5 percent of sampled users were not aware that Facebook’s news feed algorithm

hid stories; instead, they thought every single story from their followed pages and friends appeared in the feed.

The role of social media and technology companies has become increasingly integrated into the relationship between news producers and consumers; indeed, this transition is accelerating in a short time frame. In response to such significant shifts in the ways that audiences find and consume news, news producers themselves are beginning to incorporate these digital intermediaries into their own formal business processes. Legacy news organizations and digital news organizations alike now have teams dedicated to creating content solely for social media platforms and messaging applications. According to Bell et al. (2017), this speed of transformation may lead to a reduction or even an elimination of publishing, distributing, hosting, and monetizing content from the main business processes of news organizations.

Implications for the news ecosystem

The global news ecosystem is changing in fundamental ways as audiences increasingly consume news through social media platforms and digital intermediaries – and the modern state of the news media ecosystem is a significant evolution from the early history of news on the Web. Third parties such as Google, Facebook, Twitter, and Apple (along with their proprietary software and algorithms) are mediating news access and consumption and, as such, raising concerns regarding public interest and media diversity.

Third-party organizations frequently use algorithms, software, platforms, and other mechanisms to bring news content to consumers, acting neither as ‘neutral pipes, nor full media companies’ but instead as ‘gatekeepers, controlling information flows, selecting, sorting, and then distributing information’ (Foster, 2012: 6). As such, the use of algorithms and other technology is directly impacting consumption patterns, and has long-term implications for the ways in which the news media ecosystem will develop over time – although the exact implications are unknown at present.

Intermediaries such as Google and Apple are commercial companies with values not necessarily aligned with media diversity and the public interest, which are traditional foundations of the Western notion of free press. The absence of public knowledge regarding what happens from the creation of news to how it is consumed raises broader issues for democracy (Bell, 2015). In fact, the means by which these digital intermediaries operate are economically sensitive, proprietary, and generally opaque. Such contradiction between the stated values of these companies and the significance of the role they play within the news ecosystem ‘raises the question of if or how the normative dimensions of their governance frameworks reflect the realities of their function and significance’ (Napoli, 2015: 2).

Indeed, the selective presentation of information by any intermediary is potentially detrimental to public interest ideals when the service personalizes what it presents and uses a ‘proprietary recipe to draw from boundless ingredients’ to deliver the news (Zittrain, 2014). The difference, however, is that while traditional news organizations are indeed built around principles of editorial oversight, such behavior is historically paired with industry-wide standards, guidelines, or codes of conduct that provide some level of transparency into the process. These kinds of self-regulatory instruments are not yet a component of digital intermediaries such as Facebook, Twitter, Apple, and Google in their involvement with the news ecosystem. In actuality, many of these companies are still publicly distancing themselves from identifying as news organizations and corresponding editorial responsibilities (Napoli, 2015).

As discussed, in response to changes in news consumption behaviors that reflect an increase in the use of social media intermediaries, news producers continue to push content to these technology companies without

any guarantee of a consistent benefit to their business models. Such a scenario has created a critical dilemma for the greater news media ecosystem:

Should they continue the costly business of maintaining their own publishing infrastructure, with smaller audiences but complete control over revenue, brand, and audience data? Or, should they cede control over user data and advertising in exchange for the significant audience growth offered by Facebook or other platforms? (Bell et al., 2017: 50)

Mobile News Apps

One of the most significant disruptions in the evolution of consumer news behaviors is related to the introduction and growth of mobile news, a term used here to signify the multiple ways in which consumers engage with news via a mobile device. Indeed, mobile news consumption involves a variety of different access points of engagement including customized news alerts, mobile websites of news organizations, and mobile news applications (apps) (Westlund, 2013). Mobile news is also used here to represent consumer engagement with news produced specifically for third-party platforms such as Snapchat and Instagram, as well as with news stories on social feeds like Facebook and Twitter.

Broad trends in mobile news consumption

The disruption of mobile news with regards to the evolution of news consumption on the Web is inextricably linked to the diffusion of smartphones and improved access to mobile Internet, which have helped to feed consumer interest in mobile services such as mobile news (Nel and Westlund, 2012). Over the past five years, US adult smartphone ownership has increased from 46 percent to 82 percent (Knight Foundation, 2016). Broadly speaking, news has always been an integral component of public life; indeed, more than 70 percent of current US adults closely follow the news (Mitchell et al., 2016a). Furthermore, a growing portion of news consumption now occurs via mobile intake, with 89 percent of the US mobile population (144 million users) accessing news and information on mobile devices (Knight Foundation, 2016). A recent Pew study estimates that news consumption specifically on mobile devices has increased from 54 percent in 2013 to 72 percent in 2016 (Mitchell et al., 2016a). These broad trends speak to the significance of mobile news as the most recent disruption in the historical evolution of news consumption on the Web.

History of mobile news consumption

Mobile devices are deeply embedded into everyday life. What began as an innovation of the telecommunication industry evolved to become a media-rich platform (Wei, 2008) encompassing a variety of audio, video, graphical, and textual communication processes (Westlund, 2008). Early research found a hesitation toward mobile news consumption, behaviors restricted to breaks in a daily schedule (Dimmick et al., 2011). This evolved quickly, however, with research that found people accessed mobile news both between and during scheduled activity (Westlund et al., 2011).

This turning point of mobile news as a significant disruptor of consumption patterns can be traced back to 2011 when half of the US adult population owned a smartphone or tablet and nearly one-third of them got news on a mobile device at least once per week, and by 2012 over half (51 percent) of smartphone owners used mobile devices for news (Rosenstiel and Mitchell, 2012). This increased slightly to 54 percent in 2013, when 21 percent also reported that mobile news consumption is ‘frequent’ (Mitchell and Page, 2014). In 2015, Pew’s State of the News Media report recognized a ‘mobile majority’ as mobile traffic was greater than desktop traffic for 39 of the top 50 digital news websites analyzed (Mitchell and Page,

2015: 4), and the number of news sites with greater mobile traffic than desktop traffic continued to increase in 2016 (Mitchell et al., 2016b). In a similar vein, a Reuters Institute report found that the percentage of the US population that said they used a smartphone to access the news in the last week increased from 30 percent in 2013 to 48 percent in 2016 (Cornia et al., 2016).

Tensions between mobile news consumers and producers

Consumption of news on the Web evolved quite rapidly, with the disruption of mobile news and associated technological developments leaving news organizations to contend with managing changes in the creation and distribution of news. For instance, evolving consumer demand for mobile news could impact the amount of resources spent on other mediums (Nel and Westlund, 2012) or it could complement consumer choices regarding other media (Newell et al., 2008). News organizations grapple with the shift to mobile news in terms of attracting audiences, formatting and delivering content, and the influence of technological aspects such as screen size and loading speed (Dunaway, 2016). Whether an opportunity or a threat, the disruption of mobile news for consumers causes tensions for the news producers as they grapple with new production- and distribution-related activities.

Historically, legacy news organizations endeavor toward cross-media news production. In fact, news organizations first experimented with mobile news as early as the 1990s, when some organizations published news for the pager (Westlund, 2013). The turn of the millennium brought the development of pushed news alerts (Fidalgo, 2009) and customized mobile news websites, experiments implemented with the goal of meeting evolving user demands of better exposure and usability (Westlund, 2013). In 2007, Apple’s iPhone launched in the United States and spurred the growth of a new mobile ecosystem, a formative turning point in the evolution of news production and consumption on the Web. The mobile-focused efforts of news producers were two-fold: first, to ensure news was accessible on mobile Web browsers and, second, to develop their own native apps (Westlund, 2013). The late 2010s saw the development of an increasing number of emerging mobile platforms, such as Snapchat and Instagram, each of which adds to the tension between news producers and consumers as they navigate the ever-evolving field.

As we reach present day, the longstanding patterns of this historical shift lead to mobile news, which continues to afford consumers with access to news at any place and any time. On the flipside, news producers continue to struggle with where best to invest resources and development efforts. While some news organizations adopted a ‘mobile first’ strategy, others tend to focus on a ‘more platform-agnostic approach’ (Westlund, 2013: 13). While some news organizations see value in developing native mobile news apps, others are focusing on building out mobile-responsive websites (Knight Foundation, 2016). No matter the strategic organizational approach, changes in news consumption on the Web continue to cause tension for news producers responding to the disruption of mobile news.

Implications for the news ecosystem

Indeed, the disruption of mobile news and associated changes in consumption behaviors is having a profound impact on the greater news ecosystem. The emergence of mobile news enables consumption at any time and any place, but, furthermore, it enables new forms of personalization, curation, and aggregation of distributed news (Sheller, 2015). This affects both the production and consumption of news on the Web as we know it.

Almost all adults with mobile devices consume news on their devices, further enabling news organizations to develop and target specific audiences (Knight Foundation, 2016). Early research suggested the mobile

news consumers followed the news more frequently, but with specific user patterns and preferences (Chan-Olmsted et al., 2012). Two-thirds of all online activity is expected to take place on mobile devices by 2020; as a result, more time is spent consuming the news than was historically possible (Dunaway, 2016). In the shift to a mobile news landscape, news organizations must learn from the lessons of the history of the Web and better understand the longstanding patterns of change with regards to the consumption of news.

CONCLUSION

This chapter constructs a narrative that traces changes in news media consumption from the early rise of Web pages, to the growth of social media, to the proliferation of mobile platforms. At each stage, the trajectory of the history of the Web is central to understanding the next stage in the evolution of news media consumption and corresponding production. In sum, news consumption has indeed evolved since the advent of the Web, but changes in consumption are interwoven with the history of the Web. The introduction of the Web first brought the rise of online news; however, several decades later further significant changes have impacted the ways that humans communicate and connect. This chapter depicts such transformation in the context of the news industry by highlighting the evolution of news consumption behavior over the last two decades. In tracing the news industry’s movement to digital, the rise of social, and the influence of mobile, this work sheds light on the history of news evolution with specific regard to the consumer perspective.

REFERENCES

Ahlers, D. (2006) ‘News consumption and the new electronic media’, Harvard International Journal of Press/Politics 11(1): 29–52.
Alpert, L.I. (2015) ‘Publishers warily embed with Facebook: Social network’s power to steer readers could grow with “Instant Articles” initiative’, May 11, The Wall Street Journal.
Amadeo, J.A. (2007) Patterns of Internet Use and Political Engagement Among Youth. In: Dahlgren P (ed) Young Citizens and New Media: Learning for Democratic Participation. New York: Routledge. pp. 125–148.
Bell, E. (2015) ‘Google and Facebook are our frenemy. Beware’, Columbia Journalism Review. Available at: https://www.cjr.org/analysis/google_facebook_frenemy.php
Bell, E., Owen, T., Brown, P., Hauka, C. and Rashidian, N. (2017) ‘The platform press: How Silicon Valley reengineered journalism’, Tow Center for Digital Journalism at Columbia University.
Bimber, B. (2001) ‘Information and political engagement in America: The search for effects of information technology at the individual level’, Political Research Quarterly 54(1): 53–67.
Boczkowski, P.J. (2004) Digitizing the News: Innovation in Online Newspapers, Cambridge, USA: MIT Press.
Braun, J. and Gillespie, T. (2011) ‘Hosting the public discourse, hosting the public: When online news and social media converge’, Journalism Practice 5(4): 383–398.
Center, P.R. (2008) ‘Internet overtakes newspapers as news outlet’, Pew Research Center.
CERN. (n.d.) The Birth of the Web. (https://home.cern/topics/birth-web)
Chan-Olmsted, S., Rim, H. and Zerba, A. (2012) ‘Mobile news adoption among young adults: Examining the roles of perceptions, news consumption, and media usage’, Journalism & Mass Communication Quarterly 90(1): 126–147.
Constine, J. (2015) ‘Facebook starts hosting publishers: Instant Articles’, TechCrunch. Available at: https://techcrunch.com/2015/05/12/facebook-instant-articles/
Cornia, A., Sehl, A. and Nielsen, R.K. (2016) ‘Private sector media and digital news’, Reuters Institute for the Study of Journalism.
d’Haenens, L., Jankowski, N. and Heuvelman, A. (2004) ‘News in online and print newspapers: Differences in reader consumption and recall’, New Media & Society 6(3): 363–382.

de Sola Pool, I. (1983) Technologies of Freedom, Cambridge, USA: Harvard University Press.
de Zuniga, H.G., Copeland, L. and Bimber, B. (2014) ‘Political consumerism: Civic engagement and the social media connection’, New Media & Society 16(3): 488–506.
Diakopoulos, N. (2014) Algorithmic accountability reporting: On the investigation of black boxes, Tow Center for Digital Journalism.
Dimmick, J., Feaster, J.C. and Hoplamazian, G.J. (2011) ‘News in the interstices: The niches of mobile media in space and time’, New Media and Society 13(1): 23–39.
Doyle, G. (2013) ‘Re-invention and survival: Newspapers in the era of digital multiplatform delivery’, Journal of Media Business Studies 10(4): 1–20.
Dunaway, J. (2016) ‘Mobile vs. computer: Implications for news audiences and outlets’, Harvard Kennedy School Shorenstein Center on Media, Politics and Public Policy.
Editor and Publisher Yearbook. (2003) Editor and Publisher Yearbook Online Data, 1940–2003.
Eslami, M., Rickman, A., Vaccaro, K., et al. (2015) ‘I always assumed that I wasn’t really that close to [her]: Reasoning about invisible algorithms in news feeds’, CHI 2015. Seoul, Korea.
Fidalgo, A. (2009) ‘Pushed news: When the news comes to the cellphone’, Brazilian Journalism Research 5(2): 113–124.
Fletcher, R. and Park, S. (2017) ‘The impact of trust in the news media on online news consumption and participation’, Digital Journalism 5(10): 1281–1299.
Foster, R. (2012) ‘News plurality in a digital world’, Reuters Institute for the Study of Journalism.
Gilbert, C.G. (2005) ‘Unbundling the structure of inertia: Resource versus routine rigidity’, Academy of Management Journal 48(5): 741–763.
Gillespie, T. (2014) The Relevance of Algorithms. In: Gillespie T, Boczkowski P and Foot K (eds) Media Technologies. Cambridge: MIT Press. pp. 167–194.
Hamilton, J. (2004) All the News that’s Fit to Sell: How the Market Transforms Information into News, Princeton, USA: Princeton University Press.
Jurkowitz, M. (2014) ‘The growth in digital reporting’, Pew Research Center.
Khaw, A. (1983) ‘Changing old habits with new technology in newspapers’, Media Asia 10(3): 168–169.
Knight Foundation. (2016) ‘Mobile-first news: How people use smartphones to access information’, Knight Foundation.
Ksiazek, T.B., Peer, L. and Lessard, K. (2016) ‘User engagement with online news: Conceptualizing interactivity and exploring the relationship between online news videos and user comments’, New Media & Society 18(3): 502–520.
Margolis, M. and Resnick, D. (2000) Politics as Usual: The Cyberspace Revolution, Thousand Oaks, USA: Sage.
Menichini, R. (2015) ‘Facebook: More collaboration, not competition, with media’, La Repubblica.
Mitchell, A., Gottfried, J., Barthel, M., et al. (2016a) ‘The modern news consumer: News attitudes and practices in the digital era’, Pew Research Center.
Mitchell, A., Holcomb, J. and Weisel, R. (2016b) ‘State of the news media 2016’, Pew Research Center.
Mitchell, A., Jurkowitz, M., Anderson, M., et al. (2014) ‘State of the news media 2014’, In: Mitchell A (ed) State of the News Media. Washington DC: Pew Research Center. Available at: http://assets.pewresearch.org/wp-content/uploads/sites/13/2017/05/30142556/state-of-the-news-media-report-2014-final.pdf
Mitchell, A. and Page, D. (2014) ‘State of the news media 2014: Growth in digital reporting: What it means for journalism and news consumers’, Pew Research Center. (http://www.journalism.org/files/2014/03/Shifts-in-Reporting_For-uploading.pdf)
Mitchell, A. and Page, D. (2015) ‘State of the news media 2015’, Pew Research Center.
Mitchelstein, E. and Boczkowski, P.J. (2009) ‘Between tradition and change: A review of recent research on online news production’, Journalism 10: 562–586.
Napoli, P.M. (2014) ‘Automated media: An institutional theory perspective on algorithmic media production and consumption’, Communication Theory 24(3): 340–360.
Napoli, P.M. (2015) ‘Social media and the public interest: Governance of news platforms in the realm of individual and algorithmic gatekeepers’, Telecommunications Policy 39(9): 751–760.
Nee, C.R. (2013) ‘Creative destruction: An exploratory study of how digitally native news nonprofits are innovating online journalism practices’, International Journal on Media Management 15(1): 3–22.
Nel, F. and Westlund, O. (2012) ‘The 4C’s of mobile news: Channels, conversation, content, and commerce’, Journalism Practice 6: 744–753.
Newell, J., Pilotta, J.J. and Thomas, J.C. (2008) ‘Mass media displacement and saturation’, International Journal on Media Management 10(4): 131–138.
Newman, N., Levy, D.A.L. and Nielsen, R.K. (2015) ‘Digital news report 2015: Tracking the future of news’, Reuters Institute for the Study of Journalism.
Norris, P. (2003) ‘Tuned out voters? Media impact on campaign learning’, Ethical Perspectives 9(3): 200–221.
Persio, S.L. (2015) ‘Facebook’s plans to feed us news’, International Journalism Festival 2015 WebMagazine.
Pew Research Center. (2014) ‘World wide web timeline’, Pew Research Center.
Picard, R.G. (2014) ‘Twilight or new dawn of journalism? Evidence from the changing news ecosystem’, Digital Journalism 2(3): 273–283.
Rosenstiel, T. and Mitchell, A. (2011) ‘The state of the news media 2011’, Pew Research Center.
Rosenstiel, T. and Mitchell, A. (2012) ‘The future of mobile news: The explosion in mobile audiences and a close look at what it means for news’, Pew Research Center.
Rothmann, W. and Koch, J. (2014) ‘Creativity in strategic lock-ins: The newspaper industry and the digital revolution’, Technological Forecasting and Social Change 83: 66–83.
Sheller, M. (2015) ‘News now’, Journalism Studies 16(1): 12–26.
Thompson, B. (2014) ‘Is Buzzfeed a tech company?’, Stratechery.
Timm, T. (2015) ‘The most concerning element of Facebook’s potential new power’, Columbia Journalism Review. Available at: https://www.cjr.org/criticism/facebook_news_censorship.php
Usher, N. (2015) ‘Who’s afraid of the big bad algorithm?’, Columbia Journalism Review. Available at: https://www.cjr.org/analysis/whos_afraid_of_a_big_bad_algorithm.php
Webster, J.G. (2010) ‘User information regimes: How social media shape patterns of consumption’, Northwestern University Law Review 104(2): 593–612.
Wei, R. (2008) ‘Motivations for using the mobile phone for mass communications and entertainment’, Telematics and Informatics 25(1): 36–46.
Westlund, O. (2008) ‘From mobile phone to mobile device: News consumption on the go’, Canadian Journal of Communication 33(3): 443–463.
Westlund, O. (2013) ‘Mobile News’, Digital Journalism 1(1): 6–26.
Westlund, O., Gomez-Barroso, J.-L., Compano, R., et al. (2011) ‘Exploring the logic of mobile search’, Behaviour & Information Technology 30(5): 691–703.
Wu, H.D. and Bechtel, A. (2002) ‘Web site use and news topic and type’, Journalism & Mass Communication Quarterly 79(1): 73–86.
Wu, L. (2016) ‘Did you get the buzz? Are digital native media becoming mainstream?’, #ISOJ 6(1). Available at: https://isojjournal.wordpress.com/2016/04/14/did-you-get-the-buzz-are-digital-native-media-becoming-mainstream/
Zittrain, J. (2014) ‘Engineering an election’, Harvard Law Review Forum 127: 335–341.
28
Historical Studies of
National Web Domains
Niels Brügger and Ditte Laursen

This chapter sets out to investigate national web domains, and it argues that research on how national web domains have developed is a cornerstone in web history studies at large. However, studying a national web domain over time is challenging in a number of ways, including how to conceptualize a national web, let alone the technical challenges of studying large amounts of web data. The chapter starts by presenting some of the arguments for why the history of web domains is a relevant field of study. Then a brief research history of studies of national webs is outlined. In the following section some of the major research themes and challenges are discussed, including which sources to use, how to delimit a nation on the web, and how to use the archived web as a source. Finally, there follows an overview of case studies of national webs, and then the chapter is concluded.

WHY STUDY THE HISTORY OF NATIONAL WEB DOMAINS?

At first sight it may seem counterintuitive to use the term ‘national web’ in the same sentence as the World Wide Web. The web is undoubtedly ‘world wide’ in the sense that no matter where web users are located they can make content available to anyone located anywhere, and they can access content made available by others, no matter their location. However, there is evidence that web users to a large extent use a ‘national web’. In a study based on an analysis of 4,000 websites, already in 2000 Halavais argues that the web is to some degree part of a national web domain (Halavais, 2000). In the 2009 survey The media menus of Danish internet users 2009, transnational web use was examined, and the conclusion was that the web is largely used as a national (or even

local) medium: ‘national horizon is strong However, web historical studies of


except for “Searching” info about and national webs do not only provide relevant
“Buying goods”’ (Finnemann et al., 2012: historical knowledge. If national web stud-
31–5). In the monograph Social Theory after ies are based on the archived web as a source,
the Internet, it is debated whether the web is they can also help develop and test adequate
global or not when it comes to its use, and methods and analytical tools for Big Data
after evaluating various ways of measuring studies of this important part of the digital
this it is concluded that ‘Whatever the most cultural heritage. Since many national web
important factors may turn out to be, the archiving institutions now hold large col-
web is not becoming a single whole, but lections of archived web historical studies,
rather a series of clusters: linguistic plus these large quantities of archived material
those that develop due to the policies of can help develop novel methods that can
states and sites promoting shared interests be used on a smaller scale when studying
such as economic development strategies’ other web entities. In this sense, historical
(Schroeder, 2018: 121–2). Finally, if look- studies of national webs can be understood
ing at content providers, a comparative as extreme cases that can bring web archive
study of news websites in nine different studies to a head, as is the case with the
countries showed that online news is strongly Danish research project ‘Probing a Nation’s
nation-centred (Curran et al., 2013: 887–91). web domain’, in which both authors of this
Thus, the web may potentially be global, but chapter have participated, and which will be
when it comes to users and content it is actu- used to illustrate some of the points through-
ally in many cases national. However, out the chapter; hence the many Danish
although the national web seems to play a examples (cf. Brügger, 2017). In addition,
role for web users, the rest of this chapter historical studies of national webs are also
will show that from a web researcher’s per- important as drivers for the establishment
spective it can be a challenge to determine of national – and even transnational –
exactly how ‘the national web’ should be research infrastructures. On a national level
approached, in particular when it comes to this includes High Performance Computer
the history of a national web. facilities as well as procedures and techni-
But still there are good arguments for con- cal solutions for extracting large amounts
sidering the national web an analytical entity of data from national web archives, and on
of relevance to web historians. First of all, it a transnational scale could include simple
is important to study a national web and its look-up facilities to help researchers find
historical development, because it constitutes national material in other countries’ web
the backdrop for other web historical stud- archives, and eventually the possibility of
ies of analytical entities that are smaller than including such material in an analysis (for
the national web. For instance, when study- instance, based on a derived dataset such as
ing the history of one website or of a cluster a link graph). Finally, the playfulness that
of interrelated websites about an event such is a part of studying developments in such
as elections, sports events, or natural disas- large quantities of archived web should not
ters it is important to contextualize such a be neglected, since the iterative process of
study, and one of the contexts of the websites gaining experience with working with large
is the entire national web in which they are amounts of archived web can help research-
embedded. Therefore, it is relevant to study a ers develop and test research questions that
national web as an object of study in its own would probably not have occurred with
right, and to include it as a contextualizing smaller amounts of data, but that can now
element in other web studies. be asked, and in many cases also answered.
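The derived dataset mentioned above, a link graph, can be illustrated with a minimal sketch. It assumes that (source, target) URL pairs have already been extracted from archived pages; the host names, the sample links, and the helper names `host_of`, `in_cctld`, and `domain_link_graph` are hypothetical and not taken from any web archive's tooling.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_of(url):
    """Return the lower-cased host name of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def in_cctld(host, cctld):
    """True if the host name ends in the given ccTLD label, e.g. 'dk'."""
    return host.endswith("." + cctld)


def domain_link_graph(page_links, cctld):
    """Aggregate page-level links into host-to-host edge counts.

    Only links whose source host lies on the ccTLD are kept; the target
    may lie anywhere, so links leaving the national domain stay visible.
    """
    edges = Counter()
    for source_url, target_url in page_links:
        source, target = host_of(source_url), host_of(target_url)
        if source and target and source != target and in_cctld(source, cctld):
            edges[(source, target)] += 1
    return edges


# Hypothetical (source, target) pairs, as might be extracted from archived pages.
links = [
    ("http://news.example.dk/a", "http://library.example.dk/"),
    ("http://news.example.dk/b", "http://library.example.dk/about"),
    ("http://news.example.dk/", "http://media.example.co.uk/"),
    ("http://portal.example.com/", "http://news.example.dk/"),
]
graph = domain_link_graph(links, "dk")
```

Aggregating to host level keeps the dataset small enough to share between archives, at the cost of discarding page-level detail; whether that trade-off is acceptable depends on the research question.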
HISTORICAL STUDIES OF NATIONAL WEB DOMAINS 415

A BRIEF RESEARCH HISTORY

Studies of national web domains can be divided into three waves, each with a different focus.

National web domains have been studied since the early 2000s (cf. Brügger, 2017: 62), and include studies of a number of national web domains such as the Austrian, Brazilian, and Portuguese web domains, among others, with a focus on language, page size, page age, and hyperlinks (cf. the overview in Baeza-Yates et al., 2007: 3–5). With the exception of Halavais’ study from 2000 of the relation between 4,000 websites and national borders (Halavais, 2000) these early studies mainly focused on the technical content of the web, and their aim was by and large to improve either the web (browsers, hyperlink structure, etc.) or the process of web archiving. These studies rarely adopted a historical perspective.

In contrast, the second wave sets out to study national webs as a source in their own right that can be used to inform research projects within the social sciences and the humanities. These studies emerge out of the research environment Digital Methods Initiative at the University of Amsterdam, and they are by and large based on the idea that the online web’s ways of working (e.g. the use of search engines) are repurposed as research tools (Rogers, 2013). This also means that focus is on the online web and not on the archived web, as it has been archived by a web archive in the past, and these studies do not aim to study the historical development of national domains. Some of the studies emerging out of the second wave are studies of the development of the Iranian web (Rogers et al., 2013) and a study of the Arabic webspace in Israel (Ben-David, 2014).

The third wave starts to emerge almost in parallel with the second wave, although a little later, but in contrast to the latter the research projects of the third wave have a clear historical approach, and in most cases they include the archived web as a source, although not as the only source in all cases. And in some cases the insights that characterized the second wave have been supplemented with historical studies of the archived web, such as Ben-David’s study of the history of the web domain of the former Yugoslavia (Ben-David, 2016, cf. also ‘A disappeared nation’ section later in this chapter). An early project is the Australian ‘Internet History in Australia and the Asia-Pacific’ (2010–13), which is investigating the development of the internet in Australasia; however, this project is focused not only on web history, but on internet history. Then follows ‘Big Data: Demonstrating the Value of the UK Web Domain Dataset for Social Science Research’, which makes the first digs into trends in the development of the UK web domain (UK, 2012–14; cf. Hale et al., 2014, and also section ‘A nation’s sub-domains – the case of the UK web’ later in this chapter), ‘WebART: Enabling Scholarly Research in the KB Web Archive’, which investigates the Dutch web (the Netherlands, 2012–16), ‘Probing a Nation’s Web Sphere – the Historical Development of the Danish Web’, which analyses file types, link structures, and web content on the Danish web, and how this has developed (Denmark, 2013–; cf. Brügger, 2017), ‘Big UK Domain Data for the Arts and Humanities (BUDDAH)’, in which the development of the UK web domain is studied in more depth through a number of case studies (UK, 2014–15; cf. Cowls, 2017 for an overview), ‘A Longitudinal Analysis of the Canadian World Wide Web as a Historical Resource’, which focuses on the development of the Canadian web (Canada, 2014–15), and, finally, ‘Web90 – Patrimoine, Mémoires et Histoire du Web dans les années 1990’, which is a broader research project in which the development of the French web in the 1990s is placed in a wider cultural context and in a longer time span going back to the 1970s (France, 2014–17; cf. Schafer and Thierry, 2016; Schafer, 2018, and also section ‘A national web domain’
416 THE SAGE HANDBOOK OF WEB HISTORY

later in this chapter). Also in this period the first edited volumes about the web in different national settings were published (Brügger and Schroeder, 2017; Brügger and Laursen, 2018).

Thus, national webs have been studied since the early 2000s, in the beginning mainly with a technical aim, which from the second wave is supplemented with social science and humanities studies, and in the beginning without a historical approach, which does not emerge until the third wave, when the archived web in web archives also starts to be used as one of the primary source types. In addition, the studies tend to move from broad mappings towards more detailed studies, including reflections on how a national web domain can be delimited at all. Finally, it is worth mentioning that in parallel with these research projects more and more national web archives are established, which in many cases is a condition as well as an incentive to undertake these research projects, and more of the projects are made in close collaboration with web archives.

RESEARCH THEMES AND CHALLENGES

Studies of national web domains are facing a number of challenges, such as determining which sources to base the study on, how to delimit a national web domain on the web, and, in particular, to what extent the archived web can be used as a source.

A Broad Range of Sources

When introducing studies of national webs above it has been taken for granted that such studies are based on the archived web. However, this need not be the case. As with any other historical study a variety of source types can – and should – be used. This could include user statistics, reports on government initiatives, on web use and similar, government policies, articles/features in the printed or audio-visual media, internal documents from web companies, and oral history interviews with relevant individuals, to mention just a few. In addition, sources such as ‘How to…’ books published from the early 1990s onwards, like ‘How to use the web’ and similar, also constitute an important source type since they are the most comprehensive presentation of what the web looked like and what a user could find on the web of the time, including the national web; in many cases these types of books are the only sources to help show the actual web content, primarily regarding the old web. In a similar vein, the archived internet or web can be used, but not as the object of study as such, but rather indirectly; for instance, the web may have been mentioned in newsgroups, just as national search engines/portals may give an overview of the categories of content that were available, including links to that content. Finally, lists of the domain names on a given Country Code Top-Level Domain name (ccTLD), such as .uk and .fr for the UK and France, respectively, constitute an invaluable source.

An example of a study of a national web that is based on a broad range of source types is the above-mentioned French research project ‘Web90 – Patrimoine, Mémoires et Histoire du Web dans les années 1990’. In addition to the archived web, this project also studied press and audio-visual archives (mainstream media and specialist press dedicated to the web and the internet), reports, newsgroups, and interviews with web professionals (Schafer and Thierry, 2016: 1145; Schafer, 2018).

The challenges related to these types of sources are in general not particular to web history studies or to studies of national webs, and in brief they mainly relate to the finding of the material. The printed press and audio-visual media have usually been collected and preserved, but online media such as newsgroups and search engines/portals can be harder to find, since they may not have been
archived. Regarding statistical material about web use this is often not available for entire national webs, and in particular for the early years not much statistical information of this kind exists. Lists of web domains on ccTLDs come with challenges of their own, and are discussed below.

Delimiting a Nation – on the Web

Defining how a nation can be delimited is in itself a challenge, and with the web this challenge does not become smaller. In general, one can delimit the boundaries of a nation either by focusing on the concrete geo-physical space, that is, where the nation’s territory starts and ends, or one can focus on the cultural space, that is, how the inhabitants within a certain territory construct a cultural national community and its boundaries. The latter approach is put forward by Benedict Anderson when he argues that ‘nation-ness, as well as nationalism, are cultural artefacts’ (Anderson, 1996: 4). But to Anderson nations are not only cultural artefacts, they are also imagined, since ‘the members of even the smallest nation will never know most of their fellow-members […] yet in the minds of each lives the image of their communion’ (Anderson, 1996: 6).1 Therefore, Anderson argues, one of the main conditions for the emerging nation-ness in the late eighteenth century is the spread of print-capitalism, that is, the advent of the printed book as a commodity (Anderson, 1996: 22–46). The printed book, and later newspapers, offers a shared space that enables the inhabitants of a given geo-physical area to imagine a nation populated by fellow members, and to do this across physical distance and over time.

Inspired by Anderson’s idea of this close relation between print media and the establishment of an imagined nation-ness, one could argue that, in general, the territory and the cultural artefact of the nation are interwoven through media, since the imagined community of the nation does not exist ex nihilo – on the contrary, the imagination of fellow members of a nation is conveyed by and with media that are distributed within a certain geo-physical territory. Although newspapers could be posted globally, by and large their distribution network (postal system (routes, stamps, etc.), street vendors, bicycles) has stayed within the territory of the nation. As for analogue electronic media such as radio and television, global broadcasting is possible, for instance with short-wave radio and satellite television, but the majority of broadcasters have usually limited their broadcast activities to the national territory, in the main based on a national terrestrial broadcasting network. Thus, although national print media and analogue electronic media could have a global reach, their actual distribution networks tend to stay within the geo-physical boundaries of the nation, which implies that in the pre-web era there was a high degree of concordance between the national territory and the media that carry the imagined community.

With the web this close mutual interdependency between territory and media distribution network is loosened to such an extent that the limits of the territory do not constitute the natural boundaries for the web as carrier of the imagined national community. However, as mentioned at the beginning of this chapter, this does not imply that no national borders can be identified on the web, but due to the absence of a close and unequivocal material relation between territory and media distribution the boundaries of the nation are not as obvious as before the web. In contrast, these boundaries have to be established by reconstructing them in the vast material of the global web.

The most obvious way of delimiting a national web is to use the web’s own institutionalized national delimitations, as expressed in the above-mentioned Domain Name System where a given Country Code Top-Level Domain name is allocated to a given nation, such as .uk for the UK, .fr
for France, and .nl for the Netherlands. Delimiting a nation’s web based on ccTLD is the most obvious way forward if the archived web is used as a source. The big advantage of this approach is that it is easy to automate, and it scales to large quantities of web material of which the reconstructed national web is composed. However, this way of delimiting a nation on the web also comes with a number of challenges.

The first cluster of challenges revolves around the extent to which the ccTLD mirrors the online activities related to a given nation. Although the ccTLD may cover some of these activities, in most cases it does not cover all of them on a 1:1 scale since a certain portion of a nation’s web can be located on other top-level domain names such as .com, .edu, etc. (a so-called generic Top-Level Domain (gTLD)), or on transnational domains such as .eu. Also, there may be a certain degree of randomness related to the link between a given ccTLD and the content’s relation to the nation in question; for instance, the ccTLD .tv for the island of Tuvalu is used by a large number of broadcasters and streaming services, and it would not tell much about that island nation, just like the domain name ‘youtu.be’ does not relate to Belgium. In some countries the ccTLD plays an important role on the national web (like .uk, .fr), whereas in other countries the ccTLD is not that well used by the nation’s citizens (like .us, .ca; cf. Milligan and Smyth, 2018 regarding .ca). In addition, cities within a nation may have their own TLD, like .amsterdam (Teszelszky, 2018). The challenge regarding material on gTLDs and similar is that it has to be tracked manually, by researchers or web archive curators.2 Finally, in some nations there exists more than one ccTLD on the same territory, such as in the Netherlands where the .nl exists along with the .frl domain, the ccTLD of Friesland, the region of the Frisian people (Teszelszky, 2018).

The second challenge relates to finding and accessing information about a given ccTLD. It is preferable to have the entire list of a given ccTLD from the investigated period because it constitutes the authoritative list of what was included in the nation’s web domain. But a list of a given country’s ccTLD is normally not accessible, either because it does not exist any longer (the national ccTLD handler may not have preserved it), or it is not freely available (because of privacy issues, or other). Therefore, if this list is not available one has to reconstruct it by searching existing web archives for the relevant ccTLD name, and then perform a hyperlink analysis of the found material to discover new domain names on the ccTLD (an iterative process that can be repeated). For instance, this is the method developed and used in a study of the Yugoslav web and the ccTLD .yu (Ben-David, 2016, cf. also section ‘A disappeared nation’ later in this chapter).

A third challenge relates to the ccTLDs themselves, because even if one has access to the authoritative list or has reconstructed (parts of) it, a ccTLD’s domain names tend to be dynamic in the sense that they come and go, and this happens rapidly. An illustrative example of this is a historical study of the Danish ccTLD .dk where the authoritative ccTLD list was compared with the holdings of the US-based Internet Archive. The study showed that the Internet Archive had archived more web domains than the ones on the initial ccTLD list that the Danish national web archive Netarkivet used when starting a web archiving of the entire Danish web domain (Brügger et al., 2018). The reason for this is that the archiving of a national web domain takes time, often several months, and during this time the ccTLD’s domain names are changing. Therefore, this temporal discrepancy between a ccTLD list and the changes during the time it takes to archive a national web domain implies that the ccTLD list itself does not mirror the national web as it may look during the archiving, and it may then be necessary to consult other web archives.

Another way of delimiting a nation on the web would be to look for language. However, this is only useful in very few cases where a
given language is spoken in only one country (as for instance Danish in Denmark). In many other cases searching for a nation’s main language would not be useful because the language is widely spoken (like English or Spanish).

In summary, delimiting a nation on the web is challenging. In many cases the ccTLD can be a distinctive element, but it usually has to be supplemented, whereas in other cases the ccTLD is not much help.

The Archived Web as a Source

As mentioned above, the archived web may not be the only useful historical source in a study of a national web, but nevertheless it is still an important source, and it is therefore necessary to reflect on some of the challenges related to its use, because it is distinct from other source types (written, print, electronic, digitized, as well as online sources). The online web can be archived in a number of ways (Brügger, 2018a, 2018b), but for historical studies of a nation’s web domain the most obvious place to look for relevant source material is national or transnational web archives.3 In the main these web archives have collected and preserved the online web by the use of web crawling. With web crawling a list of web domains to be crawled is set up in the crawling software, then the web crawler contacts the web server where the web domain is hosted, retrieves the web pages, finds all the hyperlinks on these pages, and follows these hyperlinks and retrieves the web pages to which they point. And this process can be repeated in as many iterations as the web archiving institution specifies in the scope of the web crawler. The result of web crawling is a number of files (HTML and other) that are usually preserved in a compressed file format such as ARC or WARC. In addition, web crawling also produces supplementary material such as the seed list (the list of web domains used by the web crawler) and the crawl log, which is a log file that records everything that happened during the web crawl.4 General knowledge about web crawling is important to have in mind to better understand the challenges related to this type of material.

In general, the main challenge when basing studies of national web domains on the archived web in web archives is how the holdings of the web archives can be used to reconstruct ‘the national web domain’. As will be shown in more detail below, in some countries this is a question of having more material than is needed by the web historian to cover ‘the national web domain’, and therefore selections have to be made, whereas in other cases the reconstruction of the national web domain is a matter of the web historian not having enough material, and therefore the material at hand may have to be combined. In brief, the challenge is how one can reconstruct a corpus of archived web that can be maintained to cover a national web domain.

Before looking more closely at these two cases of establishing a corpus, it is important to remind oneself that when using archived web in national web archives it is not only a matter of reconstructing the national web based on archival holdings – these holdings are themselves (re)constructions of the online web (Brügger, 2018a, 2018b). For various reasons web archiving, including web crawling, cannot collect and preserve the online web on a 1:1 scale, so when reconstructing a national web in a web archive this reconstruction is taking place on top of the initial construction that took place during the archiving. In that sense what is analysed is the web as archived and not the online web as it was in the past.

When reconstructing a corpus that can be claimed to constitute ‘a national web’, the researcher is dependent on how the web archive in question has collected and preserved the online web, and how it is made available to researchers. There are great differences regarding this between web archives, and in each case the choices impact the research. It is worth mentioning that
corpus building is not the same as crawling the web with a view to establishing a collection. Most web archives have collected more material than is needed by the web historian, and therefore they cannot be dealt with as a single unit. Thus, selections have to be made to establish a corpus.

In some countries more material has been archived than is needed by the web historian to cover ‘the national web domain’, in some cases deliberately, in other cases accidentally, as an effect of web crawling as an archiving method. And in both cases selections have to be made to establish a corpus that is as accurate as possible. The Danish case can illustrate these points (Brügger et al., 2018). The Danish web is archived by Netarkivet, and the entire Danish ccTLD .dk is crawled two to four times per year (including some material on gTLDs).5 Thus, the first choice is to choose between these crawls. When one of these has been selected, then comes a more complicated round of selections. The challenge is that the web crawling process has implied that more versions of the same web page are likely to have been collected during the two to four months it takes to archive .dk. The reason for this is that hyperlinks may point to HTML pages that were already archived (or that are later being archived), and these web pages will then occur in the material more than one time. However, since this is not a systematic bias (it is based on the accidental presence of link targets), the researcher has to make selections as to which of these versions should be included in the corpus.6 Therefore, when this type of web archive collection is used, ‘the national web’ has to be established in a two-step selection procedure: first, one chooses which of the annual crawls to be used; and, second, versions of the same web page should be identified and selections have to be made, and neither identification nor selection is trivial to perform.

That there is too much archived web to choose from is not necessarily only a matter of having too much material in the same web archive, as in the Danish case. It can also be a matter of having too many web archives that each collect parts of the same national web, but with different archiving strategies and aims. This is, for instance, the case in the UK, where the British Library performs a broad crawl of the entire ccTLD .uk and some other non-.uk sites as well (since 2013), but in addition to this the library holds selective archivings from before 2013, and parts of the UK web domain are also archived by the UK Government Web Archive at The National Archives as well as by the UK Parliament Web Archive (Winters, 2018). Thus, the researcher who wants to establish a corpus of the UK web domain may start with the British Library’s ccTLD crawl, but the other collections may be included as well, because they can possibly supplement the broad crawl. However, combining web collections across web archives is in itself challenging because of differences as to archiving strategies and software, quality of the material, terms of searching and accessing the archive, archiving format, etc. (Brügger, 2012: 316–18).

In other countries the challenge is not having too much material, but rather the opposite: there is not enough archived web for the web historian to reconstruct a corpus that is close to the national web. This lack of material can come in different forms, ranging from cases where something but not all is available to cases where nothing is available at all. In the Netherlands, for instance, there exists a national web archive at the National Library of the Netherlands, but it does not make crawls of the entire .nl ccTLD (only a small number of selected web domains are archived, not based on legal deposit), but as was the case in the UK there also exist a few smaller web archiving initiatives such as the Archipol web archiving project at the Documentation Centre for Dutch Political Parties (Teszelszky, 2018). Thus, in this case one cannot base the reconstruction of the Dutch web on a ccTLD crawl, supplemented with material from other web collections; one can only try to combine as many bits and pieces as possible from the national
selective web archive with material in other Dutch web archives to get as close as possible to the national web. And if it has not been possible to provide an authoritative list of all existing web domain names on the Dutch ccTLD it can be difficult to know exactly how (in)complete the constructed national web is (as mentioned above, a hyperlink analysis of what is already included in the corpus may help remedy this).7

In yet other countries there does not even exist a national web collection, but only smaller collections. This is the case in Belgium, where a number of web archiving initiatives exist, for instance at Felix Archive in Antwerp, the AMSAB Institute of Social History in Ghent, and Ghent University Library (Chambers and Mechant, 2018), as well as in Italy where there exists only a small collection of the Italian ccTLD .it from 2006 and a collection of PhD theses from Italian universities (Nanni, 2017).8

Finally, some countries did not manage to establish a national web archive before they disappeared as countries, which is the case with Yugoslavia, and in such cases one has to reconstruct the national web based on material in other web archives (cf. Ben-David, 2016, and section ‘A disappeared nation’ below).

It is worth mentioning that a national web may also be constructed on the basis of what can be found in a transnational web archive such as the Internet Archive. But the Internet Archive’s collection of material from a given ccTLD is not necessarily as complete as it would have been if the collection had been initiated on the basis of a complete list of the domain names of the ccTLD. Thus, the Internet Archive’s collections are a great help either to supplement existing national collections or to be used as the core of a national corpus, but one should not expect to find a complete copy of the national web domain.9

In summary, when constructing a national web it is a great help to have access to an authoritative list of all web domains on the nation’s ccTLD (although the list is dynamic), and to have access to broad crawls of the relevant ccTLD, which may then eventually be supplemented with material from other collections. In cases where neither a list of the ccTLD nor a broad crawl is accessible, one may have to reconstruct the list of domain names of the ccTLD as well as combine parts of the national web as it can be found in the collections at hand. And, in any case, there are challenges to having too much as well as too little material, but the challenges are different.

HISTORICAL CASE STUDIES

As mentioned in the brief research history above, since the mid 2010s a number of case studies have been made of national web domains. This section will briefly introduce some examples of studies with a view to illustrating how national web studies can be made, and how the general challenges outlined above have been dealt with, in particular the question of delimiting the nation on the web.

A National Web Domain

The French research project ‘Web90’, dedicated to the French heritage, memories, and history of the web in the 1990s led by Valérie Schafer (France, 2014–17) outlines the multiple challenges of defining the research field of a national web domain. In investigating how the French web of the nineties can be mapped out, the project works on several scales – macro and micro, collective and individual – and bases itself on a variety of heterogeneous sources, using multi-method approaches, i.e. qualitative and quantitative methods, close and distant readings, and science and technology studies approaches.

In delimiting their object of study, the researchers do not begin with the introduction
of the .fr domain in 1987 nor with the first websites created in the 1990s. In contrast, they follow a thread which leads back to the first French contacts with the internet and the first expressions of interest among the research community in the internet and the web, in the 1970s. Another part of the study centres around the Minitel, which for most French people was their first online experience of messaging, directories, and commercial services (cf. also Schafer and Thierry, 2012). Yet another line of study revolves around the cyber cafes as spaces for learning, socialization, and communication (Schafer, 2018). These studies are good illustrations of the challenges and perspectives in the study of where a nation’s web begins.

The boundaries of the nation’s web are also explored in the project’s investigation of the French web domain, i.e. the set of digital resources spanning multiple websites relevant or related to the French nation, not only websites with the Country Code Top-Level Domain name .fr (cf. Schneider and Foot, 2005). The project analyses a case where a request to the Internet Archive for all top-level domains generically or geographically relating to France only produced one single .fr domain featured among the largest 200 domains (Schafer, 2018). This was wanadoo.fr, while the largest domains included infospace.com and amazon.com, as well as yahoo.com and geocities.com. A further example is the story of the pioneering site in the museum field, Weblouvre, developed by a student, Nicolas Pioch, from 1994, which won a Best of the Web Award. When the Louvre found that Nicolas Pioch was using its name, it decided to recover the domain and launch its own website, and Nicolas Pioch transferred his site. Although no longer hosted in France and although entirely in English, this site has a significant place in the early history of digital culture and museum sites in France, and also, despite its subsequent migration, in the history of domain name allocation.

A Nation’s Sub-Domains – the Case of the UK Web

In the Domain Name System hierarchy, a second-level domain (SLD) is a domain that is directly below a top-level domain (TLD). Second-level domains often refer to the organization that registered the domain name with a domain name registrar. Some domain name registries introduce a second-level hierarchy to a TLD that indicates the type of entity intended to register an SLD under it. For example, in the .uk name space a college or other academic institution would register under the .ac.uk ccSLD, while companies would register under .co.uk. This allows for studies on the relative size of second-level domains and the role of different sectors on the web. One such study is Hale et al. (2014), which investigates the growth of the entire .uk domain between 1996 and 2010, broken down into its four largest constituent parts, .co.uk, .org.uk, .gov.uk, and .ac.uk, showing, for instance, that .co.uk is the predominant second-level domain throughout the entire period. A particular second-level domain, the academic sub-domain .ac.uk, is analysed in relation to linking practices compared with institutional affiliation, league table ranking, and geographic location, showing clear patterns in terms of higher inlinks to the highest-status academic institutions and stronger connections between geographically closer institutions.

Studying second-level domains may have some limitations, as in the above case where several British companies and organizations may operate domains in generic top-level domains (.com, .org, etc.). Another approach to studying sub-domains is taken by Musso and Merletti (2016). They propose a methodology based on the usage of historical web directories to access and map past sub-domains. Web directories were lists that could be found on web portals such as Yahoo!, where readers could get an overview with hyperlinks to websites related to different topics, such as ‘Culture’, ‘Sport’, and
HISTORICAL STUDIES OF NATIONAL WEB DOMAINS 423

‘Business’. In the study, the directories preserved in the Internet Archive are used as the entry door to the archived .uk web. The directories allow the quantitative reconstruction of specific webspaces because they conserve the memory of many hyperlinks that have not been archived by any archive but that existed at the time in which the directory was on the live web. As mapped through several business directories, Musso and Merletti reconstruct the early approach of UK business to the web between 1996 and 2001.

Trans-National Web Domains

While some Country Code Top-Level Domains are allocated to a given nation, such as .uk for the United Kingdom, .fr for France, and mirror some of its national online activities, a certain portion of a nation’s web can be located on other top-level domain names such as .com, .edu, or .eu. The challenge regarding national material on these top-level domains is that it has to be tracked manually or semi-automatically (cf. above), and that – in their entirety – these top-level domains do not naturally fall under any national memory institution’s remit since they cover multiple nations.

The .eu is a curious case as it is region specific, created to promote the European single market on the internet (Hockx-Yu et al., 2018). With 3.7 million registered domains, the .eu domain is also one of the biggest top-level domains in Europe (in comparison France has 2.5 million registered domains (DomainTyper, The .fr domain)), and it is the sixth most popular ccTLD in Europe (DomainTyper, The .eu domain). However, a comprehensive web archive for .eu does not exist, because no institution has taken formal, ongoing responsibility for whole-domain archiving of .eu. The only comprehensive effort to date that focused on archiving the entire .eu domain was conducted by the Portuguese web archive, Arquivo.pt, between 2014 and 2017 in the context of RESAW, a European network that aims to create a research infrastructure for the study of archived web materials, and the collections in such a research infrastructure will be incomplete without content on .eu. For historical studies, however, the researcher is dependent on what may or may not have been harvested in different web archives. In 2014, a study showed that while 3.7 million .eu domains were registered that year, the national web archives in Denmark, Portugal, and the UK had captured only a few thousand (Gomes et al., 2014). Hockx-Yu et al. (2018) suggest aggregating content from different archives and bringing it together to serve as a .eu web archive. Even if it is incomplete, it would still provide some insight into the development of the .eu web domain for future generations’ understanding of European nations on the web. The study also proposes and discusses a number of options towards sustainable, long-term archiving arrangements for .eu, such as a group or an organization taking on the long-term responsibility of collecting, preserving, and providing access to the archived European web, possibly with direct funding or involvement from the European Commission.

Borderland Web Domains – the Northern Irish Web Domain

While every national web domain includes very substantial amounts of content that resides within the various geographically non-specific domains, little is known about what factors determine the choices made by webmasters as to the domain registrations they choose. This may be examined in different ways, and one exceptional case is the cross-border study of the Northern Irish web domain by Webster (2018a). On the island of Ireland two political units (the UK and the Republic of Ireland) and two ccTLDs (.uk and .ie) share a land border. Webster’s study takes the case of the Christian churches in Ireland (north and south) as a case study in
424 THE SAGE HANDBOOK OF WEB HISTORY

the mapping of the differences between the nation and the top-level domain. The churches in Ireland are organized on an all-Ireland basis: a reflection of their origins that pre-date the partition of Ireland into north and south. Using link graph data provided by the British Library, the study recreates the networks of links between individual church congregations on both sides of the border, and the national infrastructure of the churches into which local congregations fit. In examining the regional, national, and cross-border relationships, the study argues that even though there are indeed clear geographic concentrations in these link graphs, these map only very loosely indeed to the UK Country Code Top-Level Domain. For instance, it seems that Roman Catholic congregations are likely to register domains outside the UK; however, a particular prioritization of registration within the UK in relation to the Protestant churches is not borne out. The study affirms that the Country Code Top-Level Domain may be a weak proxy for the national web, including or even more so in cross-border cases such as this.

A Disappeared Nation and a Nation Without a Web Domain

Nation states are not always stable and sovereign entities, whose borders and boundaries are recognized and undisputed. The history of the web has seen several deletions and re-delegations of ccTLDs of countries in transition. Some examples are the retired ccTLD of Czechoslovakia, .cs, which was replaced by the new ccTLDs of the Czech Republic (.cz) and Slovakia (.sk), and the former domain of Zaire, .zr, re-delegated as .cd after the country changed its name to the Democratic Republic of Congo (Ben-David, 2016).

Ben-David has researched the .yu, the ccTLD of Yugoslavia. The .yu was deleted from the internet’s Domain Name System in 2010, after it had lost its legitimacy, since the country to which it was delegated no longer existed. When researching the domain on the live web in 2014, Ben-David found little evidence that it ever existed. With little evidence of historical addresses on the live web, the Internet Archive’s Wayback Machine was of limited use since one has to know the specific URL in order to retrieve archived snapshots. In order to reconstruct a portion of the web that no longer exists and whose archived locations are unknown, Ben-David turned to external sources to reconstruct a seed list, and then a hyperlink discovery method was used to find more .yu pages in the Internet Archive. With the method applied, the networked history of the .yu ccTLD was partially reconstructed.

In another study, Ben-David (2018) challenges the notion of the national web even further, by examining the web history of Kosovo – a disputed country that struggles for statehood, yet does not have a ccTLD. The fact that Kosovo is a national web that is not supported by the Domain Name System reveals a variety of entangled ties between geopolitics, standards, and technologies that stand behind national webs. Finally, the role of different political and regulatory institutions in determining national web borders is discussed, where recognition by the United Nations is a preliminary condition for a country’s web presence, and it is compared to the process that led to the successful delegation of the Palestinian ccTLD, .ps.

CONCLUSION

The national web as an object worthy of study has gained growing attention among internet researchers and web historians since the early 2000s. In this chapter we have argued that the research history has developed in three waves, the first with a technical aim, the second augmented by social sciences and humanities, and the third with a clear historical and national approach, and in many cases based on the archived web.

Today, national projects are well underway in Denmark, in France, and in the UK, among others. These national web studies may constitute the backdrop for web studies of other analytical entities.

The chapter has also outlined a number of methodological challenges that studies of national webs face. One perplexing question is: where is the national web? And what does it mean to think of the web in spatial and quasi-geographic terms? A way of delimiting a national web is to use the ccTLD, but national webs can also be found on generic TLDs, and ccTLDs are deleted and re-delegated over time when countries transform. Hence, national projects are put in dialogue with investigations of the boundaries between national domains, and interactions across those boundaries. For historical web studies, the archived web can be used as a source; however, some countries have no web archive at all, or only small collections. Researchers must then turn to other countries’ archives or to transnational web archives such as the Internet Archive, which may or may not have archived relevant parts. Moreover, web archives often do not have full-text search, so secondary sources for identification of relevant web addresses are crucial to retrieve the potentially archived objects. Additionally, when reconstructing the nation’s past web, this reconstruction is taking place on top of the initial construction that took place during the archiving.

National web archives delimit the borderless flow of information on the web with national barriers. In addition, there are no easy ways of accessing or processing the large amounts of data, either on a national level or on an international level, as web archives are not prepared for Big Data studies. This chapter has introduced some examples of how national web studies can be carried out and how some of the above-mentioned challenges have been dealt with. These studies shed light on the historical development of specific national webs and suggest many future possibilities for research on a national and transnational level on this increasingly important digital cultural heritage.

Notes

1  Anderson mentions three other characteristics: nations are limited, they are sovereign, and they are a community (1996: 7).
2  It can be partly automated, cf. Zierau et al. (2015).
3  For an overview of web archiving initiatives see List of Web archiving initiatives (n.d.); IIPC Members (n.d.); Webster (2018b). Historical overviews can be found in Brügger (2011: 29–32); Webster (2017).
4  See Brügger (2018a, 2018b) for an expanded discussion of the challenges related to web crawling as well as how web archives based on web crawling are made available for researcher use.
5  For more information about Netarkivet, see Schostag and Fønss-Jørgensen (2012).
6  These issues are discussed in more detail in Brügger et al. (2018).
7  A number of other countries do not make broad crawls of the entire ccTLD (e.g. Australia), and, like the Netherlands, Canada also only archives selective parts of the national Canadian web, due to the lack of a legal deposit law (cf. Milligan and Smyth, 2018).
8  A pilot project, PROMISE, for the establishment of a national Belgian web archive was launched in 2017 (Chambers and Mechant, 2018).
9  The coverage of the Internet Archive compared with the Danish Netarkivet is discussed in Brügger et al. (2018).

REFERENCES

Anderson, B. (1996) Imagined communities: Reflections on the origin and spread of nationalism. London: Verso.
Baeza-Yates, R., Castillo, C., and Efthimiadis, E.N. (2007) ‘Characterization of national Web domains’, ACM Transactions on Internet Technology (TOIT), 7(2). DOI 10.1145/1239971.1239973.
Ben-David, A. (2014) ‘Mapping minority webspaces: The case of the Arabic webspace in Israel’, in D. Caspi and N. Elias (Eds.), Ethnic minorities and media in the Holy Land. London: Vallentine-Mitchell Academic. pp. 137–157.

Ben-David, A. (2016) ‘What does the web remember of its deleted past? An archival reconstruction of the former Yugoslav top-level domain’, New Media & Society, 18(7): 1103–1119.
Ben-David, A. (2018, forthcoming) ‘National web histories at the fringe of the web: Palestine, Kosovo and the quest for online self-determination’, in N. Brügger and D. Laursen (Eds.), The historical web and Digital Humanities: The case of national web domains. Abingdon: Routledge.
Brügger, N. (2011) ‘Web archiving – between past, present, and future’, in M. Consalvo and C. Ess (Eds.), The Handbook of Internet Studies. Oxford: Wiley-Blackwell. pp. 24–42.
Brügger, N. (2012) ‘Historical network analysis of the web’, Social Science Computer Review, 31(3): 306–321.
Brügger, N. (2017) ‘Probing a nation’s web domain: A new approach to web history and a new kind of historical source’, in G. Goggin and M. McLelland (Eds.), The Routledge companion to global internet histories. New York: Routledge. pp. 61–73.
Brügger, N. (2018a) The archived web: Doing web history in the digital age. Cambridge, MA: MIT Press.
Brügger, N. (2018b) ‘Understanding the Archived Web as a Historical Source’, in N. Brügger and I. Milligan (Eds.), The SAGE Handbook of web history. London: Sage. pp. 16–29.
Brügger, N., and Laursen, D. (Eds.) (2018, forthcoming) The historical web and Digital Humanities: The case of national web domains. Abingdon: Routledge.
Brügger, N., Laursen, D., and Nielsen, J. (2018) ‘Establishing a corpus of the archived web: The case of the Danish web from 2005 to 2015’, in N. Brügger and D. Laursen (Eds.), The historical web and Digital Humanities: The case of national web domains. Abingdon: Routledge.
Brügger, N., and Schroeder, R. (Eds.) (2017) The web as history: Using web archives to understand the past and the present. London: UCL Press.
Chambers, S., and Mechant, P. (2018, forthcoming) ‘Towards a national web in a federated country: A Belgian case study’, in N. Brügger and D. Laursen (Eds.), The historical web and Digital Humanities: The case of national web domains. Abingdon: Routledge.
Cowls, J. (2017) ‘Cultures of the UK web’, in N. Brügger and R. Schroeder (Eds.), The web as history: Using web archives to understand the past and the present. London: UCL Press. pp. 220–237.
Curran, J., Coen, S., Aalberg, T., Hayashi, K., Jones, P.K., Splendore, S., Papathanassopoulos, S., Rowe, D., and Tiffen, R. (2013) ‘Internet revolution revisited: A comparative study of online news’, Media, Culture & Society, 35(7): 880–897.
DomainTyper, The .eu domain, https://domaintyper.com/domain-names/top-level-domains/ccTLD/eu-domain (accessed 16 March 2018).
DomainTyper, The .fr domain, https://domaintyper.com/domain-names/top-level-domains/ccTLD/fr-domain (accessed 16 March 2018).
Finnemann, N.O., Jauert, P., Jensen, J.L., Povlsen, K.K., and Sørensen, A.S. (2012) The media menus of Danish internet users 2009. Aarhus: The Research Project ‘Changing Borderlines – Mediatization and Cultural Citizenship’. https://research.ku.dk/search/?pure=en/publications/the-media-menus-of-danish-internet-users-2009(7ac0e756-6479-41c3-a80c-f911b565f230).html (accessed 30 June 2018).
Gomes, D., Hockx-Yu, H., and Laursen, D. (2014) Archiving .eu – pilot project. RESAW Seminar, London, UK.
Halavais, A. (2000) ‘National Borders on the World Wide Web’, New Media & Society, 2(1): 7–28.
Hale, S.A., Yasseri, T., Cowls, J., Meyer, E.T., Schroeder, R., and Margetts, H. (2014) ‘Mapping the UK Webspace: Fifteen Years of British Universities on the Web’, WebSci ’14 Proceedings of the 2014 ACM conference on Web science, Bloomington, Indiana, June.
Hockx-Yu, H., Laursen, D., and Gomes, D. (2018, forthcoming) ‘The curious case of archiving .eu’, in N. Brügger and D. Laursen (Eds.), The historical web and Digital Humanities: The case of national web domains. Abingdon: Routledge. pp. xx–xx.
IIPC Members (n.d.) http://netpreserve.org/about-us/members
List of Web archiving initiatives (n.d.) https://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives

Milligan, I., and Smyth, T.J. (2018, forthcoming) ‘Studying the Web in the shadow of Uncle Sam: The case of the .ca domain’, in N. Brügger and D. Laursen (Eds.), The historical web and Digital Humanities: The case of national web domains. Abingdon: Routledge.
Musso, M., and Merletti, F. (2016) ‘This is the future: A reconstruction of the UK business web space (1996–2001)’, New Media & Society, 18(7): 1120–1142.
Nanni, F. (2017) ‘Reconstructing a website’s lost past methodological issues concerning the history of Unibo.it’, Digital Humanities Quarterly, 11(2). http://digitalhumanities.org/dhq/vol/11/2/000292/000292.html (accessed 19 Sep 2018).
Rogers, R. (2013) Digital methods. Cambridge, MA: MIT Press.
Rogers, R., Weltevrede, E., Borra, E., and Niederer, S. (2013) ‘National web studies: The case of Iran online’, in J. Hartley, J. Burgess and A. Bruns (Eds.), A companion to new media dynamics. Oxford: Blackwell. pp. 142–166.
Schafer, V. (2018, forthcoming) ‘Exploring the “French Web” of the 1990s’, in N. Brügger and D. Laursen (Eds.), The historical web and Digital Humanities: The case of national web domains. Abingdon: Routledge.
Schafer, V., and Thierry, B. (2012) Le Minitel: L’enfance numérique de la France. Paris: Nuvis, Cigref.
Schafer, V., and Thierry, B. (2016) ‘The “Web of pros” in the 1990s: The professional acclimation of the World Wide Web in France’, New Media & Society, 18(7): 1143–1158.
Schneider, S.M., and Foot, K.A. (2005) ‘Web sphere analysis: An approach to studying online action’, in C. Hine (Ed.), Virtual methods: Issues in social research on the Internet. Oxford: Berg Publishers. pp. 157–170.
Schostag, S., and Fønss-Jørgensen, E. (2012) ‘Webarchiving: Legal deposit of internet in Denmark, a curatorial perspective’, Microform & Digitization Review, 41: 110–120.
Schroeder, R. (2018) Social theory after the internet. London: UCL Press.
Teszelszky, K. (2018, forthcoming) ‘The historic context of web archiving and the web archive: Reconstructing and saving the Dutch national web using historical methods’, in N. Brügger and D. Laursen (Eds.), The historical web and Digital Humanities: The case of national web domains. Abingdon: Routledge.
Webster, P. (2017) ‘Users, technologies, organisations: Towards a cultural history of world web archiving’, in N. Brügger (Ed.), Web 25: Histories from the first 25 years of the World Wide Web. New York: Peter Lang Publishing. pp. 175–190.
Webster, P. (2018a) ‘Understanding the limitations of the ccTLD as a proxy for the national web: Lessons from cross-border religion in the Northern Irish web sphere’, in N. Brügger and D. Laursen (Eds.), The historical web and Digital Humanities: The case of national web domains. Abingdon: Routledge.
Webster, P. (2018b) ‘Existing web archives’, in N. Brügger and I. Milligan (Eds.), The SAGE Handbook of web history. London: Sage. pp. 30–41.
Winters, J. (2018, forthcoming) ‘Negotiating the archives of UK web space’, in N. Brügger and D. Laursen (Eds.), The historical web and Digital Humanities: The case of national web domains. Abingdon: Routledge.
Zierau, E., Brügger, N., and Moesgaard, J. (2015) ‘Defining a National Web Sphere over time from the Perspectives of Collection, Technology and Scholarship’. Paper presented at the RESAW conference, Aarhus, June 2015.
29
The Origins of Electronic Literature as Net/Web Art
James O’Sullivan and Dene Grigar

Readers of this chapter might benefit from the knowledge that ‘electronic literature’ (or e-lit) refers to born-digital works which embrace the creative, as opposed to just the disseminative, potential of computation. As defined by Hayles: ‘Electronic literature, generally considered to exclude print literature that has been digitized, is by contrast “digital born,” a first-generation digital object created on a computer and (usually) meant to be read on a computer’ (2008: 3). A useful primer for those looking to develop some prerequisite understanding might be ‘Electronic Literature: Contexts & Poetics’, wherein Heckman and O’Sullivan argue that e-lit can take many forms, but ‘could only exist in that space for which it was developed/written/coded’ (2018). Electronic literature did not start with the Web – works of art merging the literary with the digital had found their way into circulation long before the Net had permeated public consciousness and consumptive habits to the extent that we know it in contemporary contexts. It is difficult to point with any certainty to the first work of electronic literature, though many commentators attribute this accolade to Christopher Strachey, who, in 1952, designed an algorithm on the University of Manchester’s Ferranti Mark I capable of generating love letters (Wardrip-Fruin, 2008: 163). Literary experiments like Strachey’s predate what we now consider ‘the Internet’ by some 30 years, and indeed predate even conceptual precursors like Ted Nelson’s Project Xanadu, introduced in 1960, and computer poetry like Alison Knowles and James Tenney’s The House of Dust, produced on a Siemens 4004 computer with Fortran in 1967, by the better part of a decade. Judy Malloy’s Uncle Roger, largely considered to be one of the first commercial works of electronic literature in the United States, was published between 1987 and 1988 and set the stage for stand-alone software that marks the early period of e-lit development (Grigar and Moulthrop, 2015). While the circulation of electronic literature is now largely confined

to Web-based platforms, the early history of this domain is one of floppies, diskettes, and compact discs.

We can see the reality of this history in the material emphasis of resources like the Electronic Literature Lab (ELL) at Washington State University Vancouver, where scholars and practitioners can get a sense of the field’s earliest incarnations through interaction with those pieces. The central mission of this initiative is to preserve pieces in their release state, so that the literary experience is not modified by migration to more current technologies or emulation on contemporary software. How does one capture the experience of first-generation e-lit works as objects, replicating the smell of the plastic, the computer’s loading screen, the learning curve attached to the traversal method? The electronic literary experience is bound to the quirks of intended platforms, the peculiarities of the technology, and the variety of glitches that such configurations offer. The ELL, then, resembles something of a Mac museum,1 where vintage machines are used as reading devices for outmoded formats. It is difficult to provide any precise periodisation of the field – electronic literature is inherently literary, and as with any literary movement, genres and modes are comprised of further genres and modes. It might be useful, however, to think of ‘first-generation’ e-lit as largely hypertextual, that is to say, screen-based works whose digital aesthetics were largely drawn out of the idea of links, and how narrative can be constructed and disrupted in a fragmentary manner. Second generation might be considered to be more algorithmic or computationally ‘sophisticated’, as is the case with generative literature. The most contemporary of electronic literary works might be seen as inherently ludo-literary, availing of the affordances of technologies made popular by the video game industry, though (usually) privileging language over play.

But while the history of electronic literature is not exclusively Web-based, the significance of the Net as a means of sharing – which is what publishing, an activity so central to literature, is all about – needs to be recognised, as does the influence of hypertextuality, which again, while arguably predating the Internet as a literary concept, certainly became more pronounced once authors recognised the potential syntheses between the world of writing and the World Wide Web. In treating the history of electronic literature in the context of the history of the Web, this chapter will detail the form as both a pre- and post-Web practice, drawing particular attention to the Net art movement and contemporary e-lit situation, while arguing for a more comprehensive electronic literary history.

Many of the first-generation hypertextual works that dominated the field in its earliest days were published by Eastgate Systems, Inc., founded in Watertown, Massachusetts by Mark Bernstein. With some notable exceptions, like Sarah Smith’s King of Space (1991), written in HyperGate, Deena Larsen’s Marble Springs (1993), created in HyperCard, and M. D. Coverley’s Califia (2000), produced with ToolBook, works for stand-alone systems were mostly written with Storyspace, an intuitive hypertext authoring system created by Jay David Bolter, Michael Joyce, and John B. Smith (Landow, 1992: 40) that is now maintained by Bernstein under his imprint. ‘The Eastgate School’, or as Hayles calls it, the ‘Storyspace school’ (2008: 6), is synonymous with the origins of the movement, incorporating many of the electronic hypertexts to first garner significant critical attention: Michael Joyce’s afternoon, a story (1990),2 Stuart Moulthrop’s Victory Garden (1992), Jane Yellowlees Douglas’ ‘I Have Said Nothing’ (1994), and Shelley Jackson’s Patchwork Girl (1995).

The earliest works of electronic literature were disseminated as floppy disks (Fig 29.1), and later, as CDs – as noted, e-lit had a life before the Web. But the influence of the Web is not entirely absent from the origins of the field. As mentioned, one of the earliest examples of commercially available electronic literature, Malloy’s Uncle Roger, first emerged in 1986 as a serialised novel on The WELL, an

Figure 29.1 Stuart Moulthrop’s Victory Garden (1992), published by Eastgate Systems.

online community started in 1985 by Stewart Brand and Larry Brilliant.3 Research by the seminal Pathfinders4 project shows an advert from a 1988 Art Com catalogue (Fig 29.2), advertising copies of Uncle Roger packaged as floppy disks (Grigar and Moulthrop, 2015).

Figure 29.2 Advert for Judy Malloy’s ‘Uncle Roger: A Party in Woodside’, from Pathfinders.

There are six digital versions of Uncle Roger; the first began, as already noted, as a serial novel for the net published on The WELL’s Art Com Electronic Network (ACEN). The second version was an interactive narrative also published by ACEN. The third version, created as a database narrative on Malloy’s own authoring software, Narrabase, however, constitutes the 1987–8 commercial packaging on three 5.25-inch floppy disks. Often described as a trilogy, the segregation between discs was more a consequence of storage constraints than of any narrative construction. Version 4.0 was produced in 1988 for galleries and other venues that used PCs instead of Macintosh computers. These four early versions were produced almost simultaneously, with Malloy starting one version and developing another alongside it. The fifth version – the one produced for the Web – came seven years later in 1995 and recompiles the work into one environment (Grigar, 2015). This version, according to Malloy, is the authorised version and reflects Malloy’s awareness of the changing audience, from the intellectually elite one of The WELL to the more mainstream one of the Web (Moulthrop and Grigar, 2017: 93–106). The final digital version is the emulation of Version 4 with edited content of Version 5, produced for the DOSBox Emulator. It recreates, to a certain extent, some of the unique features of the pre-Web experience. This practice of re-releasing first-generation Web-based e-lit, as Eastgate Systems, Inc. has done with afternoon: a story and Patchwork Girl, both on flash drive, may become a trend as interest in early digital literary work continues to grow. Thus, as we have pointed out, not all e-lit started on the Web, but like most cultural artefacts influenced by the migration to cloud technology, this is where it is ending up, and that has profound repercussions for the ways in which renovated pieces are received, and indeed, how the movement is conceived.

A CASE FOR ELECTRONIC LITERARY HISTORY

Lev Manovich traces the origins of digital art to the post-WWII era, citing co-occurring shifts in technology and artistry as the motivating factors: ‘In the last few decades of the twentieth century, modern computing and network technology materialized certain key projects of modern art developed approximately at the same time’ (2003: 15). For Manovich, the creative potential offered by emerging technologies ‘actualized the ideas behind projects by artists, they have also extended them much further than the artists originally imagined’ (2003: 15). Manovich, like many other theorists and critics operating in this domain, recognises the difficulty in isolating, with any certainty, the exact origins of the field.5 In many respects, this is to be expected, as most artistic movements do not result from a Big Bang-like reaction, instantly materialising out of a collision between aesthetic forces. Artistic and literary movements are the product of slow, and sometimes unintentional, exchanges with the changeable affordances of form and content – the same fluctuations are usually happening in multiple places at once, situated in varying contexts, but all contributing to the emergence of something other than what has gone before.

But there is a second, more explicit reason as to why it is difficult to identify the origins of electronic literature – the lack of a deliberate historical account of the field:

…future theorists and historians of computer media will be left with not much more than the equivalents of the newspaper reports and film programs from cinema’s first decades. They will find that analytical texts from our era recognize the significance of the computer’s take-over of culture yet, by and large, contain speculations about the future rather than a record and theory of the present. (Manovich, 2001: 6–7)

The paucity of deliberate historical accounts of this field has been alleviated in recent

years,6 but considering the long history of the field, and its explosion since the introduction of the browser, more historical accounts are needed. Projects like Pathfinders, which provides a concerted effort to develop the methodology for a detailed documentation of works that strives to capture their multimedia and interactive features, follow in the footsteps of other digital literary histories in contributing to the future of electronic literature by documenting its past. Literature has many pasts, and the past that Grigar and Moulthrop are documenting in Pathfinders focuses on the digital literary experiments that emerged in the late eighties to mid nineties, and continued to grow and develop into what we now call ‘e-lit’. Pathfinders takes an unusual approach to its research methodology in that it asks authors of early digital literature to perform their work on a computer and with software with which the work was originally intended to be read at the time of its publication. Grigar and Moulthrop videotape this performance, as well as record an interview with the author. They also bring in two additional readers to interact with and share the experiences of the work, which is also recorded. Much has been gleaned from these interactions, the materials ensuring that future generations will be able to appreciate the origins of the field, achieving a sense of the historical and cultural framework of which one must be aware if they are to critique a work within the contexts of its precursors. While Pathfinders has undertaken in-depth research into a select few works, the fact remains that there are countless more which are yet to be documented and written about, and as a result, preserved. Without this work, knowledge about these precursors, and the expression they sought in the digital, would be lost.

The vital work being accomplished by initiatives like Pathfinders should serve as a warning to the current generation, who have mistaken the ease by which the digital can be disseminated for protection against obsolescence. Even when one cannot experience a work, we should at least know that it existed – the sad reality of this field is that many of those literary experiments that contributed to its emergence have undoubtedly been lost, and it is the fault of the academy that they cannot be recovered. The contemporary situation is worsening in a marketplace where authors are distributing their work on proprietary cloud-based architectures maintained by profit-motivated organisations like Google and Apple. It may well be that the issue to which Manovich points will only amplify, to the point where we will only ever have an awareness of popular – and thus preserved – first-generation and contemporary electronic literature, with nothing in between.

This has ramifications beyond this particular community. Take issues around gender, for example. Malloy programmed and coded Uncle Roger by hand with UNIX Shell Scripts (Version 2), AppleSoft BASIC (Version 3), GW-BASIC (Version 4), and HTML (Version 5). As notable was the authoring software she created for e-lit, Narrabase, developed at approximately the same time as Storyspace and George Landow’s Intermedia. These achievements are significant from a feminist perspective in that they show that women were among the pioneers of this creative and technological movement, a reality that is often neglected (Walker Rettberg, 2012a). The history of electronic literature is not just a history of computational art; it is a social and cultural history that has relevance far beyond the literary. Such a lack of visibility may well be one of the chief contributors to the ambiguity that surrounds our discipline. Electronic literature is increasingly piquing the interest of literary scholars, and indeed, finding its way into the third-level curriculum, but, in many respects, it remains esoteric. This may well be due to the inherent computational element, which puts its critique beyond the expertise, and perhaps interest, of many scholars, but it may also be owing to this historical blind spot.
THE RISE OF E-LIT AS NET ART

A definite milestone in the advent of digital literature as something more than merely hypertextual was the publication of the ELO’s first Electronic Literature Collection in October 2006 (Fig. 29.3), described as ‘the first major anthology of contemporary digital writing’ (Funkhouser, 2007b). Edited by Hayles, Nick Montfort, Scott Rettberg, and Stephanie Strickland, the collection marks the progression towards increasingly multimodal forms. But the anthology was more than just a demonstration of ‘the perpetual metamorphosis of electronic literature’ (Marino, 2008); it also heralded a new era for e-lit – an era in which literatures of the Web would come to prominence in an increasingly paperless world.

Figure 29.3 Electronic Literature Collection: Volume 1, from collection.eliterature.org.

Manovich is acutely aware of the symbiosis that exists between computation and culture, noting that their effects are commutative in essence – the computer is being shaped by culture, as much as culture is being transformed by the computer:

To use another concept from new media, we can say that they are being composited together. The result of this composite is a new computer culture – a blend of human and computer meanings, of traditional ways in which human culture modeled the world and the computer’s own means of representing it. (2001: 46)

This bi-directional exchange between culture and computation is key to the evolution of electronic literature. The Web transformed the ways in which artists looked at composition – quite simply, authors recognised where their audiences were going, and started writing for the Web. Such a conscious decision had profound repercussions for the history of the form, as it determined the technologies


and literary techniques available to e-lit practitioners. To say that the Web had a profound influence on the field might seem like an obvious statement – the Internet was a game changer, there is nothing revelatory in this – but the extent of this influence far outweighs that felt by e-lit’s counterparts. The broader publishing industry, for example, recognised the disseminative potential of e-books and sought to remediate print literature in a bid to capture a share of the emerging e-reader market. But print has retained its dominance of this industry, and while the Web has undoubtedly transformed the ways in which publishers operate, the codex is thriving. The same cannot be said of electronic literature, which has undergone more complex transformations in the age of the Web, with most e-lit now, as already noted, written for, not just disseminated via, Web technologies – editions circulated as hardware are now seen as collectors’ items, or intentional acts of artistry and preservation.

When exploring the history of the Web in the context of artistry, there is a danger of conflating channel with constituent. Historicising digital art, Manovich asserts that the ‘greatest hypertext is the Web itself’, contending that it is even ‘more complex, unpredictable and dynamic than any novel that could have been written by a single human writer, even James Joyce’ (2003: 15). This is an arbitrary comparison in many respects; a fresh example of the old dichotomy: the Internet as hypertext is an act of communication, whereas writing in the context of art is an act of literature – one transaction is about the clear exchange of information, the other about unrestrained expression. The latter may well have connections to the former, but they are not interchangeable in a sense that one criticism can be readily applied to any technology intended for broader social utility. The Web is undoubtedly the zenith of hypertextuality, but it is a social hypertext, socially complex and irresolute; a vessel, not an artwork in itself. Perhaps Manovich is right to treat the Web as one great work of art, as a great anthology of fact and fiction, open to uninhibited human intervention, the hallmark of creativity. But if we are to trace the emergence of electronic literature in the context of Net art in a manner that is useful to this volume’s readership, we must do so in a slightly more conservative manner, primarily so that we avoid confusing technological innovation with literary developments – they are related, but not quite an amalgam.

On October 27th, 2016, Rhizome premiered its Net Art Anthology,7 an iterative online exhibition that is seeking to retell the history of Net art from the 1980s through the present day, restaging 100 artworks across four pre-determined periods: 1984–98, 1999–2005, 2006–11, and 2012–18. The series aims to ‘take on the complex task of identifying, preserving, and presenting exemplary works in a field characterized by broad participation, diverse practices, promiscuous collaboration, and rapidly shifting formal and aesthetic standards, sketching a possible net art canon’ (Connor, 2016). The Anthology also includes a useful definition of Net art ‘as an expansive, hybrid set of artistic practices that overlap with many media and disciplines’ (Connor, 2016), but our examination revolves largely around works which might be considered literary in the sense that, while incorporating the ludic,8 the visual, and the aural, they privilege language.9 While unfinished at the time of writing, the 1984–98 section of the Rhizome collection is live, and thus represents an ideal snapshot of the era in which the Web emerged from its academic beginnings to become a greater part of the public sphere. Entries in this period reveal how early interactions with the Web reshaped key literary notions around language, space, and ideology.

The advent of the Web heralded deep sociocultural progression, but, in a more pragmatic sense, it was also the realisation of HTML as the principal standard through which text was to be semantically structured – authors who wished to create text-based Net art would be writing in HTML. The relationship between
form and content is such that the structural semantics imposed by HTML would have forced authors to think about their writing in different ways, adapting their creative processes to a pre-defined system of elements and sub-elements, embedding, and selective rendering. It is in this sense interesting that we use the term ‘Net’ art, when the Net was a largely linear underlying technical structure that would go on to facilitate the ‘Web’ as a more omnidirectional means of ordering information as a constellation – the net influenced the dissemination of art, true, but it was the Web, the idea of hypertextuality, that arguably brought about the greater cultural and artistic repercussions. One could quite readily argue that electronic literature’s origins are not as Net art, but as Web art. Furthermore, authors had to re-think how to transact with the readers – HTML structures literary content, but its appearance to an audience is determined by Web technologies like CSS and JavaScript. All writing is subject to constraint, but authorial media selections are about the recognition of affordance; what it is that a particular medium can do in the service of an author’s aesthetic vision. The Web’s earliest authors traded the constraints of the page for those of the screen, and that meant writing differently, adapting their processes to allow for such constraint so as to avail of the potentialities.

Authors had already recognised the aesthetic possibilities of hypertextual narratives by the time artists began writing for the Web – there is little structural difference between the work of the Eastgate School and that of Net art’s earliest writers. Nonetheless, the rise of the Net had a profound effect, drawing authors towards a public space composed of a language designed to facilitate multimodality – the transaction between creator and audience had fundamentally changed, and would continue to change as the Web advanced towards more participatory principles, collapsing, in some cases, the distinction between producer and consumer. HTML is a textual system, and so it is unsurprising that the 1990s were dominated by literary Net art that privileged the word. But the decade also witnessed considerable advances across other screen media, particularly cinema and video gaming, and so many of these works were situated within a progressively visual culture. We see the foundations of this convergence between language, the visual, and the ludic in the work of authors like Olia Lialina. Rhizome’s Net Art Anthology describes Lialina’s ‘My Boyfriend Came Back from the War’ (1996) as a piece which ‘highlights the parallels and divergences between cinema and the web as artistic and mass mediums, and explores the then-emerging language of the net’ (2016). Lialina herself remarked: ‘If something is in the net, it should speak in net.language’ (Lialina, 2016), accentuating the demarcation between the Web as a platform for dissemination, and the Web as an authorial instrument.

Beyond the multimodality of HTML, artists saw the Net as a space for the renewed politicisation of art, a space in which they could better immerse their audiences within ideological frameworks. Art had always been political, but now it could be truly public, and somewhat participatory. Martine Neddam used her fictional ‘Mouchette’ (1997) to tackle the issue of suicide, inviting participants to contribute to the topic, thus rendering her adolescent persona ‘a character who doubles as a platform of exchange’ (Neddam, 2016). While The File Room (1994) existed as an installation piece at various points throughout its existence, Antoni Muntadas undertook the project knowing that it could never be completed (Muntadas, 2016). The publication of these artworks came alongside the rise of Web 2.0, which heralded the shift towards Net platforms as dynamic systems composed through mass participation. It was an era in which the transaction between producer and consumer became an act of co-creation. Burgess and Green discuss this in the context of popular sites like YouTube, and how it transitioned ‘from the idea of the website as a personal storage facility for
video content to a platform for public self-expression’, a transition that amounts to a ‘user-led revolution’ (2013: 4).

For all the rhetoric of democratisation that accompanies participatory concepts, the convergence between art and the affordances of the Web increases the potential for corporate forces to exert control over public expression. As noted by Lanier, users of the Web, however participatory it may be at present, always ‘lose control of their own personal content’ (2013: 207). The utopia of Web 2.0 belies something of a dystopic reality, wherein the crowd, and in this context, the authors and artists, are simply doing the creation for the producers, but sharing in few of the spoils, and having no say in the governance. But the same could be said of most institutions which support the creation and dissemination of artistic practice, and if one is to subvert a system, it is best to do so through direct engagement. It is in such a context that we can appreciate Net art as being political in itself, not merely a form open to the encapsulation of political content. Shulgin’s ‘Form Art’ (1997) was published some 20 years after the earliest literary games, titles like Will Crowther’s Adventure (1976), and Zork (1977/80), but it is nonetheless historically significant, transforming ‘the most bureaucratic, functional, and unloved aspects of the web into aesthetic, ludic elements’ (Shulgin, 2016). Drawing readers through a series of inane Web forms, ‘Form Art’ challenged its audience to traverse the maelstrom of forms as if they were playing a game that they could win or lose. In this respect, Shulgin was one of the first artists to really question the pervasiveness of emerging Web technologies and the ways in which the interface has dictated our use of such innovations. Shulgin describes ‘Form Art’ as ‘a declaration of the fact that a computer interface is not a “transparent” invisible layer to be taken for granted, but something that defines the way we are forced to work and even think’ (2017). By creating a competition that solicited other submissions of form art – a critique of the Prix Ars Electronica – Shulgin used participatory mechanisms to reinforce the ideology of his piece.

Art which interacts with capitalist tools and processes is always in danger of advancing such instruments, a tension which Shulgin expressed in relation to his artistic movement:

In general, now I am having mixed feelings about early net art because I see how the strategies developed by net artists are now used by corporations and in politics. But that’s the destiny of avant-garde art – developing communication and aesthetic tools for the future capitalists and politicians. (2017)

His prediction proved accurate in some respects: many artists and critics recognised the power of the participatory, of the role of the receiver in the creative process, long before major commercial entities realised that survival in the Web 2.0 era was dependent on their capacity to embrace ‘the power of the web to harness collective intelligence’ (O’Reilly, 2010: 230). But in many respects, the converse of what Shulgin anticipated has turned out to be true, wherein artists now tend to reuse commercially driven tools for creative purposes. Artistic appropriation of capitalist media technologies is a disruptive act which serves to subvert the profit motives driving the development of such tools. Flash, Twitter, Unity – these are all examples of proprietary systems and spaces which artists have distorted and occupied in the service of some artistic purpose. In this respect, the history of art and literature and the Net is one of subversion.

Distinct communities have been treated in tandem here in the sense that practitioners within the Net art community do not always identify with those from the e-lit movement. Grigar explores this issue in some detail:

For the most part, digital media theory has been dominated by scholars and critics trained in formalistic theories of cinema and visual art. Lev Manovich uses Russian formalism, for example, as his lens for formulating views of ‘new’ media, while Oliver Grau focuses his attention on Italian
Futurism. What chance does an emergent form with literature in its name have when faced with such a strong art history perspective? Likewise, Stephen Wilson devotes little attention in his 900+ page book Information Arts to early hypertext work with no mention of more contemporary elit pieces. That ‘net art’ became the name of choice for some working in the area of web-based elit should come as no surprise under these circumstances since the term ‘literature’ in the name of elit may have limited its inclusion in media art festivals, exhibitions, and art scholarship. (2008)

There are, of course, figures who have transcended often arbitrary distinctions like ‘community’ and ‘movement’, such as Mez Breeze, an artist and writer whose work has appeared in both the Net Art Anthology (2016) and the third volume of the ELO Collection (Campbell and Breeze, 2012), but, typically, artistic groups relevant to this space have operated as considerably independent cohorts. Mark Amerika figures largely as a cross-disciplinary artist who straddles media art gallery installations and e-lit performance space.

Furthermore, this treatment is largely Anglocentric in its focus, and for every artist, artwork, or artistic group mentioned, many more have been omitted. But the purpose in drawing reference to these few examples has not been to privilege certain works over others, but rather to trace some of the key trends that are of relevance to the history of electronic literature as net art, and indeed, emphasise the one consistent element of all of these works – language. Whatever term you think most appropriate to such works, be it ‘net art’ or ‘e-lit’, the word as text is the dominant force throughout. The history of electronic literature as net art is a history of artists – be they authors or otherwise – responding to the affordances and constraints of the Web as a space for writing. The Web was not the first digital technology which writers adopted, but it is undoubtedly one of the most significant. This significance is represented in the reality that those artistic trends that emerged throughout the 1990s are still dominant today in the prospering modal exchange between literature, sound and visual art, video games, and an increasingly diverse range of realities.

Figure 29.4 All the Delicate Duplicates, by Mez Breeze and Andy Campbell (2013).

THE CONTEMPORARY SITUATION

While electronic literature had an existence before the Web, its beginnings and evolution
can be tied to the cultural diffusion of the Internet age – networking changed the way authors thought about writing, its processes, and dissemination. Hayles describes electronic literature as that which is usually meant to be read on a screen, but the field’s recent history is one in which most works have been designed to be read in a browser. Perusing the Electronic Literature Organisation’s collections, one encounters a high volume of works which rely on Web technologies. The influence of the Net is waning in contemporary contexts – as the fields of electronic literature and video gaming continue to converge, the Web is gradually reverting to its disseminative role. Where authors once predominantly wrote in the languages of the Web, electronic literature’s localised traditions are seeing something of a resurgence with the rise of literary games. The critical discourse surrounding electronic literature and video games draws from a large pool of terms – interactive fiction, digital literary, literary games – but all of these concepts point to works with similar essential traits: they combine the ‘ludic (from the Latin ludus: game or play) and literary (from Latin littera: alphabetic letter, or plural litterae: piece of writing) elements’ (Ensslin, 2014: 1). We can think of electronic literature with ludic traits quite simply as ‘creative media that has both readerly and playerly characteristics’ (Ensslin, 2014: 1).

The creators of contemporary electronic literature are availing of increasingly sophisticated technologies to develop complex narrative spaces, such as those typified by the productions of The Chinese Room, or the work of Mez Breeze and Andy Campbell. The latter’s recent piece, All the Delicate Duplicates (Fig. 29.4), exemplifies a new wave of electronic literature wherein the literary is juxtaposed with the technologically immersive to forge game-like literary spaces that are shared, but not played, online.

Electronic literature is utterly reflective of the mechanics of its era, and as technology continues to progress at an exponential rate, so too will our notions of what it means to be contemporary in this domain. Where authors once wrote in HTML, they now code in JavaScript; where they once animated in Flash, they now build worlds in Unity; where they once displayed their work on screens, they now immerse their audiences in augmented and virtual realities. As it stands, most of the form’s emerging trends are independent of internetworking, shifting from the Web back towards localised systems, and indeed, experiences driven by physical peripherals like headsets. The reality of the practice is that most authors have neither the resources nor the desire to develop such literary spaces as participatory. Contemporary electronic literature, certainly at the level of critical acclaim enjoyed by the likes of Breeze and Campbell (O’Sullivan, 2017), is about downloading and installing – the floppies have made a resurgence. The Net has had its moment, and the Web, while still visible in the aesthetics of electronic literature, is slowly being relegated to a means of distribution – but this should not diminish the legacy of either, which will undoubtedly rise again.

Notes
1 Many early works of electronic literature were produced for Macintosh computers. It wasn’t until the Windows operating system was available for PCs that e-lit works for that platform became common.
2 afternoon, a story was not published by Eastgate until 1990, though an earlier incarnation had been released three years prior, at the 1987 Association for Computing Machinery conference.
3 For more on The WELL, or Whole Earth ‘Lectronic Link, see http://www.well.com/aboutwell.html.
4 Details on the Pathfinders project can be found at http://dtc-wsuv.org/wp/pathfinders/, while the project’s open-access book, published in Scalar, is at http://scalar.usc.edu/works/pathfinders/index.
5 By ‘origins’, we do not refer to works that might be considered the first of many – there is little to be gained from extensive exploration of what came when, as all that this tells us is the order in which particular pieces gained publicity; rather, we see origins as being concerned with the complexities of the entire network of practice that
was operating throughout the era in which e-lit was formed. Origin stories are less about chronology than they are contribution.
6 Readers interested in further exploring the literary history of this field might consider one of many detailed accounts: Funkhouser (2007a, 2007b); Kac (2007); di Rosario (2011); Walker Rettberg (2012b); Emerson (2014); Pawlicka (2014); Rettberg et al. (2015); Flores.
7 Unfinished at the time of writing, Rhizome’s Net Art Anthology can be found at https://anthology.rhizome.org/.
8 By ludic, we mean relating to ‘play’.
9 For more detailed disambiguation of the different forms of digital art, and the means by which language plays a role in this process, readers should see Ensslin’s literary-ludic spectrum (2014: 43–5).

REFERENCES

Breeze, M. (2016) ‘Mezangelle’, in Connor, M. (Ed.), Net Art Anthology. Rhizome. (https://anthology.rhizome.org/mez-breeze) Accessed May 2, 2017.
Burgess, J., and Green, J. (2013) YouTube: Online Video and Participatory Culture. Cambridge & Malden, Massachusetts: Polity Press.
Campbell, A., and Breeze, M. (2012) ‘The Dead Tower’, in Boluk, S., Flores, L., Garbe, J., and Salter, A. (Eds.), Electronic Literature Collection: Volume 3. Cambridge, Massachusetts: Electronic Literature Organization. (http://collection.eliterature.org/3/work.html?work=dead-tower) Accessed May 2, 2017.
Connor, M. (Ed.) (2016) Net Art Anthology. Rhizome. (http://anthology.rhizome.org/) Accessed April 10, 2017.
Emerson, L. (2014) Reading Writing Interfaces: From the Digital to the Bookbound. Minneapolis, Minnesota: University of Minnesota Press.
Ensslin, A. (2014) Literary Gaming. Cambridge, Massachusetts: The MIT Press.
Flores, L. I ♥ E-Poetry. I ♥ E-Poetry. (http://iloveepoetry.com/) Accessed May 2, 2017.
Funkhouser, C. T. (2007a) Prehistoric Digital Poetry: An Archaeology of Forms, 1959–1995. Tuscaloosa, Alabama: The University of Alabama Press.
Funkhouser, C. (2007b) Electronic Literature circa WWW (and Before). Electronic Book Review. (http://www.electronicbookreview.com/thread/electropoetics/collected) Accessed March 16, 2017.
Grigar, D. (2008) Electronic Literature: Where Is It? Electronic Book Review. (http://www.electronicbookreview.com/thread/technocapitalism/invigorating) Accessed March 16, 2017.
Grigar, D. (2015) ‘The Structure of Uncle Roger’, in Pathfinders: Documenting the Experience of Early Digital Literature. Vancouver, Washington: Nouspace Publications. (http://scalar.usc.edu/works/pathfinders/the-structure-of-uncle-roger-by-dene-grigar?path=malloys-critical-essays) Accessed April 9, 2017.
Grigar, D., and Moulthrop, S. (2015) ‘History of Judy Malloy’s Uncle Roger’, Pathfinders: Documenting the Experience of Early Digital Literature. Vancouver, Washington: Nouspace Publications. (http://scalar.usc.edu/works/pathfinders/history-of-judy-malloys-uncle-roger?path=judy-malloy) Accessed April 9, 2017.
Hayles, N. K. (2008) Electronic Literature: New Horizons for the Literary. Notre Dame, Indiana: University of Notre Dame Press.
Heckman, D., and O’Sullivan, J. (2018) ‘Electronic Literature: Contexts and Poetics’, in Price, K. M. and Siemens, R. (Eds.), Literary Studies in a Digital Age. New York: Modern Language Association. (https://dlsanthology.mla.hcommons.org/electronic-literature-contexts-and-poetics/) Accessed Feb 1, 2018.
Kac, E. (2007) Media Poetry: An International Anthology. Bristol and Chicago: Intellect Books.
Landow, G. P. (1992) Hypertext: The Convergence of Contemporary Critical Theory and Technology. Baltimore and London: The Johns Hopkins University Press.
Lanier, J. (2013) Who Owns the Future? London: Allen Lane, Penguin.
Lialina, O. (2016) ‘My Boyfriend Came Back from the War’, in Connor, M. (Ed.), Net Art Anthology. Rhizome. (https://anthology.rhizome.org/my-boyfriend-came-back-from-the-war)
Manovich, L. (2001) The Language of New Media. Cambridge, Massachusetts: The MIT Press.
Manovich, L. (2003) ‘New Media from Borges to HTML’, in Wardrip-Fruin, N. and Montfort, N. (Eds.), The New Media Reader. Cambridge, Massachusetts: The MIT Press. pp. 13–25.
Marino, M. C. (2008) The Electronic Literature Collection Volume I: A New Media Primer. Digital Humanities Quarterly, 2(1). (http://digitalhumanities.org/dhq/vol/2/1/000017/000017.html) Accessed March 1, 2017.
Moulthrop, S., and Grigar, D. (2017) Traversals: The Use of Preservation for Early Electronic Writing. Cambridge, Massachusetts: The MIT Press.
Muntadas, A. (2016) ‘The File Room’, in Connor, M. (Ed.), Net Art Anthology. Rhizome. (https://anthology.rhizome.org/the-file-room) Accessed May 3, 2017.
Neddam, M. (2016) ‘Mouchette’, in Connor, M. (Ed.), Net Art Anthology. Rhizome. (https://anthology.rhizome.org/mouchette) Accessed May 4, 2017.
O’Reilly, T. (2010) ‘What Is Web 2.0? Design Patterns and Business Models for the next Generation of Software’, in Donelan, H. M., Kear, K., and Ramage, M. (Eds.), Online Communication and Collaboration: A Reader. Abingdon, Oxon: Routledge. pp. 225–235.
O’Sullivan, J. (2017) ‘Electronic Literature’s Contemporary Moment: Breeze and Campbell’s All the Delicate Duplicates’. Los Angeles Review of Books. (https://lareviewofbooks.org/article/electronic-literature-turns-a-new-page-breeze-and-campbells-all-the-delicate-duplicates/) Accessed Nov 11, 2017.
Pawlicka, U. (2014) Towards a History of Electronic Literature. CLCWeb: Comparative Literature and Culture, 16(5). (http://docs.lib.purdue.edu/clcweb/vol16/iss5/2) Accessed March 10, 2017.
Rettberg, S., Tomaszek, P., and Baldwin, S. (2015) Electronic Literature Communities. Morgantown, West Virginia: West Virginia University Press.
di Rosario, G. (2011) Electronic Poetry: Understanding Poetry in the Digital Environment. (https://jyx.jyu.fi/dspace/bitstream/handle/123456789/27117/9789513943356.pdf) Accessed October 10, 2013.
Shulgin, A. (2016) ‘Form Art’, in Connor, M. (Ed.), Net Art Anthology. Rhizome. (https://anthology.rhizome.org/form-art) Accessed April 13, 2017.
Shulgin, A. (2017) A Net Artist Named Google. (http://rhizome.org/editorial/2017/jan/12/a-net-artist-named-google-1/) Accessed April 13, 2017.
Walker Rettberg, J. (2012a) ‘Electronic Literature Seen from a Distance: The Beginnings of a Field’. Dichtung Digital: A Journal of Art and Culture in Digital Media (41). (http://www.dichtung-digital.de/en/journal/archiv/?postID=278) Accessed Jan 23, 2018.
Walker Rettberg, J. (2012b) The History of the Term ‘Electronic Literature’. jill/txt. (http://jilltxt.net/?p=2665) Accessed Jan 23, 2018.
Wardrip-Fruin, N. (2008) ‘Reading Digital Literature: Surface, Data, Interaction, and Expressive Processing’, in Schreibman, S. and Siemens, R. (Eds.), A Companion to Digital Literary Studies. Oxford: Blackwell. pp. 163–182.
30
Exploring the Memory of the First World War Using Web Archives: Web Graphs Seen from Different Angles
Valérie Beaudouin, Zeynep Pehlivan and Peter Stirling

INTRODUCTION

Web archives are a rich source for performing hyperlink network studies within the social sciences, but their use creates methodological questions in applying network analysis to archived web material. This chapter explores these general questions about the construction and analysis of web archives: how to define and archive a collection of websites? How to extract and visualise hyperlink information from the archives? How to interpret the network of websites?

This chapter is based on a specific case: an analysis of the network of websites related to the First World War with the aim of understanding how the collective memory of the war is constructed online. The methodology presented for representing a network of websites using a web archive collection was developed as part of a project, ‘The future of digitised heritage online: the example of the Great War’, which aimed to study the circulation of digitised documents online. This particular part of the project was dedicated to creating a map of websites related to the war. Because websites have a purpose (here, history and memory), it is important to understand what activities they support, and how, and which communities use them in order to interpret their structure and their interrelations. Our exploratory analysis aims to understand how the memory of the war is socially organised on the web using hyperlink networks, and in particular to identify the social and publishing activity of amateurs. The approach chosen allowed us to study how this web space is organised and in particular the place of a discussion forum which was identified as having an important role for WWI history (Amar and Chevallier, 2013).

The chapter aims to provide, first, an in-depth look at different methodological challenges for network analysis of large web archives, including the importance of decisions regarding collection construction; second, an overview of the relative strengths of different strategies for reducing ‘noise’
within large collections and creating a network of websites that support the researcher’s work; and, finally, a discussion of the importance of qualitative work in addition to quantitative network analysis.

The analysis was based on the internet legal deposit collections of the Bibliothèque nationale de France (BnF), and specifically a collection of websites selected for a BnF crawl on the centenary of WWI. This collection was defined by BnF librarians and partners independently of the research project, allowing the researchers to analyse a rationally constructed collection with a collection policy aimed at representing all aspects of the WWI web space, with each site classified by type of producer. The use of the BnF web archives therefore allowed the creation of a stable, well-documented corpus, the long-term preservation of which is assured (Stirling et al., 2012a). For researchers studying the web archives, it is necessary to understand how the collection was constituted, to ensure that the analysis takes into account the nature of this kind of source. Web archiving implies fixing content that is in constant movement and evolution, by collecting the different elements of which websites are made, in such a way as to recreate the site as it was at the time of crawling. To achieve this end, a whole set of choices and processes are involved, concerning both technical and collection aspects, and determining the content of the archives that the researcher will use. It is also necessary to consider technical limits, as web archiving technologies frequently encounter difficulties due to the rapid evolution of the technologies used on the web, and indeed technologies specifically intended to block web crawlers. Finally, the legal deposit framework also determines the form of the collection. This was the first use of data mining approaches on these collections.

This chapter is organised in three sections. First, the creation of the collection is studied, as well as the impact that the legal context, the selection policy and the technical aspects of the collection have on the corpus analysed by the researchers. The second section discusses how the metadata from the corpus was used to generate graphs, and in particular the three different strategies that can be combined to produce networks that best support the work of interpretation. Finally, the last section explores the network of WWI websites to analyse the online collective memory of WWI, shows how the interpretation of the network of websites was enriched by qualitative analysis (content analysis and interviews of contributors) and underlines the value of working on web collections prepared and archived by libraries.

SOURCES: CREATING THE COLLECTION

Using web archives as a source for network analysis of the web brings many advantages: it was decided to base the analysis on the internet legal deposit collections of the BnF in order to benefit from a stable, well-documented corpus that will be preserved over the long term, allowing the possibility for other researchers to reproduce, complement or challenge the analysis that has been performed. However, the nature of web archives must be taken into account in developing a research methodology. The web archive is an artefact constructed from the live web, and the content and nature of the collection is therefore in large part determined by different aspects: the legal framework governing its construction, the technical characteristics used in its creation and the collection policy choices involved. This section explores how these aspects determine the content of the corpus analysed and the elements that researchers must take into account in developing their methodology.

Web Archiving at the BnF: The Legal and Technical Framework

Legal deposit is used in France, as in many other countries, to ensure the preservation of
web content (Stirling et al., 2012b); legal deposit of the internet is entrusted to the National Audiovisual Institute (INA) for websites of television and radio stations and websites mainly dedicated to them, and the Bibliothèque nationale de France for the rest of the French domain. The law ensures that web content is conserved in the same way as publications on other media, but it is necessary to adapt the principles of legal deposit to the nature of the web.

First of all, the idea of territoriality is central to legal deposit. Three criteria are used to determine whether web content is French, each one being sufficient for the content to be covered by legal deposit: if the content is on the top-level domain (TLD) .fr; if it is produced by a person or organisation resident in France; or if it is produced in France. This covers a large quantity of sites and web content, but, as we will see, the legal scope can mean ‘breaking’ links between French sites and those in other countries which could interest a researcher studying a corpus.

Second, legal deposit concerns that which is published, or ‘communicated to the public by electronic means’, in the expression used in the law. Information transmitted via the internet but considered as private correspondence does not come into the scope of legal deposit. This includes private emails but also any information on social networks accessible only to those that are logged in and with the authorisation of the person who has put the information online. It remains the case, however, that the border between what is public and what is private on the internet is unclear, including for internet users themselves; sensitive information, in particular that of a personal nature, could therefore be found in internet legal deposit collections. The role of the researcher and of the analysis that is performed can therefore raise ethical questions such as the anonymisation of sources.

Finally, it is important to note that legal deposit in all its forms aims in principle to be exhaustive. As regards the internet, this becomes difficult even to imagine, due not only to the volume of data but also the dynamic nature of the web, where it becomes impossible to define a stable ‘publication’. Legal deposit of the internet, rather than being exhaustive, aims to be representative regarding the French web. This approach is necessary because it is impossible to judge in advance what might interest a researcher, and legal deposit in any case does not make a judgement regarding the interest or value of a given site. It is necessary therefore to allow researchers the possibility to define corpora within the collections.

The BnF uses the open-source software Heritrix for web archiving (https://webarchive.jira.com/wiki/display/Heritrix/Heritrix). This software, generally called a robot or crawler, starts from a list of seed URLs and identifies links within pages to collect all webpages and associated files (such as images and stylesheets), within the limits of the technical settings that have been defined. Some settings apply to all crawls, and specific settings are defined for each seed URL, depending both on the technical characteristics of the site and the target of the crawl defined by the collection policy (Le Follic et al., 2012).

These settings concern in particular the depth of crawl: the robot can collect a whole site (domain), a sub-site or a blog hosted on a platform (host), or a specific section of a site (path, page + 2 clicks). A budget in terms of numbers of files collected is also defined, mainly in relation to the size of the site, and a crawl frequency is set depending on the nature of the crawl. Multiple crawls allow the study of the evolution of sites over time, but imply multiple occurrences within the archives. It is necessary for a researcher to have sufficient information to be able to interpret these occurrences.

These technical settings determine the content of the archives, which can only be a representation of the site as it existed online: content may be missing and, at best, web archives cannot recreate the entire network
of links in which a site is placed, or the constant flow of information and updates. Web archives are thus a new kind of object for researchers to study; while indispensable for studying the web, they are an artefact created from the web and not the web itself, their form largely determined by the technology used to create them. In this they can be compared to other sources used by researchers, which have their own characteristics in terms of what is missing, and the choices and limitations in their creation. Part of the work of a researcher is to define and explain the nature of the source and to take this nature into account in any interpretation that is made. This is why it is vital for researchers to have an understanding of the technologies used to archive the web.

Collection Policy and the WWI Collection

It is equally important for researchers to have information about the choices that have been made regarding the content of the web archives, which also determine their form. The BnF has put in place a system combining broad and focussed crawls (Bonnel and Oury, 2014) to fulfil its legal deposit mission. Broad crawls take place once a year and concern a very large number of sites, based on lists of domain names registered in France, while focussed crawls collect more frequently and/or in greater depth a smaller number of sites that have been selected by BnF librarians or by partners. In these focussed crawls, the selection of seed URLs and the technical settings of the crawl are defined in relation to a collection policy.

‘The Great War on the Web’ is a focussed crawl, which therefore follows a collection policy intended to ensure that there are no gaps or incoherencies in the collection. The crawl was created by the Philosophy, History and Human Science Department at the BnF, in collaboration with the Maps Department, and the group of selectors was progressively broadened to include partner institutions with an interest in the subject, including regional libraries in areas particularly affected by the First World War: the Bibliothèque de documentation internationale contemporaine (BDIC), the Ministry of Defence section for Memory, Heritage and Archives (DMPA), the Bibliothèque nationale et universitaire de Strasbourg (BNUS) and Amiens Library.

The first step was to define the scope of the collection. While it was originally planned to concentrate on sites covering the centenary and the commemoration of the First World War, it was decided to broaden the scope to any content relating to WWI. This was partly due to the fact that the distinction between ‘commemoration’ and other approaches is difficult to define, and sometimes controversial. In addition, the use of the crawl by the project ‘The future of digitised heritage online: the example of the Great War’ was already being considered, and this wider scope would allow a study of the evolution of the sites over a longer period, with crawls planned two to three times a year over the period 2013 to 2019.

As this scope is potentially very wide, different criteria were applied to ensure that the collection remained coherent. Some general criteria are applied to all focussed crawls at the BnF: that sites selected should be active, and that they should provide original content, to avoid sites that merely duplicate content published elsewhere. Other guidelines were specific to this crawl: to include all official partners of the commemoration; to avoid concentrating on certain geographical areas or types of media, etc. How much of a site should be collected, in relation to the ‘depth’ setting described above, also depends both on the specific collection and the overall legal deposit framework. In general, a site or blog is collected as a whole (that is, using the depth settings ‘domain’ or ‘host’), but in cases where only part of the site is relevant the specific sections can be collected using the depth settings ‘path’ or ‘page + 2 clicks’, depending on size and amount of relevant
content. Even in the latter case, from a legal deposit point of view it is useful to widen this scope, as other content may interest a researcher; for example, when bloggers talk about WWI, what else do they have on their blog? Different settings are thus applied to different sites in the collection; however, each receives the same budget. The definition of ‘entities’ within the collection is therefore a combination of expert judgement (the selection of a seed URL) and technical factors (the settings applied) – to which can be added the interpretation made by the researcher.

One of the main ways to structure the collection was that a typology of sites was established to guide the selection and ensure its coherence. This typology was based on the producer, that is, the organisation or individuals responsible for the site. This approach has already been used for other crawls at the BnF, in particular those covering electoral campaigns, and ensures that the coverage and balance of the collection is as complete and equitable as possible, while also providing a means of dividing up the work of selection. It can also be used in analysing the collection.

During the initial sampling the most common categories of sites were identified, using both information in sections called ‘About this site’, ‘Who are we?’, etc., and additional information from authority notices in the BnF catalogue. These categories included institutions (political, scientific, heritage), educational sites (online resources, teachers’ blogs), research sites (researchers, laboratories), associations (regional, genealogical, veterans), amateurs, media, and memorial tourism. These categories were used to create a typology similar to that used for previous collections:

• Official
• National
• Territorial
• European
• Public
• Heritage
• Educational
• Scientific
• Personal
• Association
• Media

While this kind of typology is very important, its use did create some problems. Sites and blogs can often be a mix of categories and require a closer analysis to decide where they should be placed in the typology; for example, the blog of a researcher may mention the institution without being under its responsibility. The distinction between ‘Official’ and ‘Public’ was intended to distinguish between government sites and those produced by public bodies but which do not have the same official role; this distinction was not always easy for the librarians to maintain, and also created problems for the researchers. Similarly, the ‘Association’ category was not limited to those organisations with the official status of an association under French law, creating potential overlap with ‘Personal’ sites. Finally, the ‘Media’ category was intended to cover news sites with sections specifically dedicated to WWI, but not all sites have such a section. In this case the sites are generally already included as part of a separate dedicated crawl of news sites by the BnF. This means that articles from sites not selected for this crawl should still be in the web archives of the BnF, but not as part of the ‘Great War on the Web’ collection, meaning researchers will have to decide whether to include them in their analysis. The ‘Media’ category was also broadened to include other sites of a commercial nature, such as video games.

The selection was created progressively and for the first capture, held in late 2013 to coincide with the start of the official events relating to the centenary, it was decided to ensure that as a minimum the official sites and other sites or sections specifically created for the occasion should be included; this corresponds largely with the ‘Official’ and ‘Public’ categories. This was then widened to include the other categories, starting with any site that could be related to or commented on the centenary events, and from the third crawl
onwards to cover the overall scope outlined above of sites about the First World War in general. The collection continued to grow, but more gradually than over these first few captures (see Table 30.1). As noted above, this is another aspect that researchers must be aware of in analysing the evolution of the collection over time, to distinguish between changes in the sites themselves and changes due to the addition of sites.

Table 30.1 Evolution of the collection ‘Great War on the Web’

Date            Size of collection   Number of URLs collected   Volume
November 2013   41 sites             1,198,732                  35.75 Gb
March 2014      99 sites             1,689,613                  47.29 Gb
August 2014     482 sites            7,323,201                  402.83 Gb
November 2014   555 sites            9,698,633                  313.85 Gb

Other methods were also used in the process of searching for and selecting sites. A list of keywords was established both to guide searches for relevant material and to describe sites selected in the selection tool BnF Collecte du Web (BCWeb) (Le Follic et al., 2012). This list was based on existing research on the First World War (the main sources used were: Cervoni and Cervoni, 2005; Loez, 2013; Meyniel, 2010), to avoid any gaps due to a lack of knowledge of a given theme: battles, diplomacy, life at the front, etc.

However, this vocabulary proved to be of only limited use in web searching, due not only to the fact that terms frequently produced too many or hardly any sites, but also that sites were often poorly referenced in search engines. Two main ways of identifying sites were thus used:

• Using links from sites already identified; official sites frequently have lists of related sites, blogrolls indicate other sites on related subjects, etc.
• Searching by type of site: when one site has been selected, other similar sites are identified, for example all sites of local archives, all tourist sites for regions affected by battles, etc.

This strategy proved effective; the cartography produced by the project did not identify any obvious gaps in the selection, within the criteria and the legal framework described above.

METADATA: GENERATING THE GRAPHS

This section explores the challenges that must be overcome in order to build a network of websites from a web archive. In particular, it studies the choices made to increase the relevance and the readability of the networks.

Network analysis offers an opportunity to analyse the social organisation of a community of interest. In our case study, the aim is to explore the social structure of online memory of the First World War: who is involved? How are institutions and amateurs connected? This interpretative work, described in greater detail in the section on the results below, must rely on an interpretable and relevant hyperlink network.

The approach chosen was to generate a directed network graph based on the BnF web archives to study the relations, materialised by hyperlinks, between the websites in the WWI collection. Rather than using the archived files directly, the analysis used both metadata extracted from the collections (including hyperlinks, which are considered as metadata representing the relationship between sites) and additional descriptive metadata from the selection tool. This section describes the different methods of generating graphs from these metadata that were tested in order to produce results that allow the researchers to study the research questions.
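
As a rough illustration of the directed graph described above, the following Python sketch turns link metadata into directed edges. The record layout (`url`, `outgoing_links`), the `.example` addresses and the decision to drop self-links are assumptions made for this example; they are not the format or the rules used by the project.

```python
from collections import Counter


def build_directed_edges(records):
    """Count directed links (source URL -> target URL) across metadata records.

    Each record is a hypothetical dict with a "url" and a list of
    "outgoing_links"; self-links are ignored (an assumption of this sketch).
    """
    edges = Counter()
    for record in records:
        source = record["url"]
        for target in record["outgoing_links"]:
            if target != source:
                edges[(source, target)] += 1
    return edges


# Two invented pages that link to each other; the second also links to itself.
records = [
    {"url": "http://a.example/",
     "outgoing_links": ["http://b.example/", "http://c.example/"]},
    {"url": "http://b.example/",
     "outgoing_links": ["http://a.example/", "http://b.example/"]},
]
edges = build_directed_edges(records)
for (source, target), count in sorted(edges.items()):
    print(source, "->", target, count)
```

Because the edges are directed, a link from a.example to b.example and the reverse link are stored as two separate entries, which is what allows incoming and outgoing links to be told apart later on.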

Figure 30.1 Framework.

The World Wide Web is a vast network: webpages can be represented as nodes in a graph, where the links between the nodes, known as edges, are the URL hyperlinks connecting the pages. The analysis of these web graphs is important for a number of purposes in different domains such as crawler optimisation, information retrieval or community detection. It has also become common practice to analyse web archives using web graphs; however, it is not always clear how best to generate and interpret them. Brügger (2013) divides the challenges of network analysis on web archives into two categories: constitutive challenges and practical challenges. Practical challenges such as access to the archive are beyond the scope of this chapter, but we will study some of the constitutive challenges and propose some solutions.

Researchers using web archives work, most of the time, with topic-centric collections (focussed crawls) which are dedicated to specific subjects such as elections, WWI, etc. Crawling procedures have an impact on the interpretation of graphs (Achlioptas et al., 2009; Meusel et al., 2014; Serrano et al., 2007), and this impact is more marked for topic-centric collections, as the frequency and/or the depth of crawl of different sites can vary. Their heterogeneity should be taken into account during graph analysis. Another important issue is content, which can be considered ‘off topic’ for the purposes of the analysis, and is introduced into the collection due to crawler settings or by outgoing links (whether or not these are crawled).

From Metadata to Web Graph

The web graphs are generated from metadata files in WAT format¹; these files contain an extraction from the web archive files, with in particular the URL of each file collected as well as an extraction of all the outgoing hyperlinks and their properties.

The elements listed below are extracted from the WAT files by using adapted Apache Pig scripts from the Internet Archive:²

• URL
• Status: HTTP response status code (e.g. 404, page not found)
• MIME type: a two-part identifier for file formats and format contents transmitted on the internet (e.g. text/html or application/pdf)
• Outgoing links: list of hyperlinks found on the page
• Types of outgoing links, such as anchor tags (A), image tags (IMG)
• Date: crawled date of the file

These links are then aggregated according to the specified aggregation level and a graph file is generated in GEXF format, as shown in Figure 30.1.

Ensuring Quantity and Quality

While generating network graphs from web archives, there are two main issues: quantity and quality. The main problem with analysis and visualisation of graphs is their readability. Beyond a certain density, the analysis algorithms cannot be executed on an ‘average’ server or their results become uninterpretable (cf. Figure 30.4). Thus, the data should be filtered to decrease the quantity for reasons of readability and interpretability.

As described earlier, the crawler begins with one or more URLs that constitute a seed list, and crawls each URL according to the configuration criteria, such as frequency of crawl, depth of links to follow, etc. The seed
list is reliable because it is created by curators and by experts in the domain, but different factors can introduce ‘interference’ in the collection. The crawling procedure can archive pages not relevant to the subject of the analysis because it simply follows links on the page. This is particularly the case for websites where a sub-section is dedicated to the topic. As the structure of the website is based on hypertext links, a part of these links may not be related to the topic. These links, whether archived or not, appear in the list of outgoing links on a page and thus potentially in the graph.

Let us take the example of the Le Monde website, which offers a sub-section dedicated to the Great War: http://www.lemonde.fr/Centenaire14-18. This seed URL is reliable, but contains links to other sub-sections unrelated to WWI, such as Economy, Sport, etc., as well as links to external sites. The crawler that follows the links to build the corpus, or the outgoing links on pages that are archived, can potentially introduce a large number of URLs which are not relevant to the subject. In order to build a correct graph, it is necessary to eliminate this interference.

We present three different strategies that can be used to solve these issues, separately or in combination: Aggregation, Filtering and Scope.

Aggregation

To make large-scale graphs readable, the usual method is the ‘aggregation of nodes’. This consists of grouping nodes that share the same attributes and presenting them as a single node in the graph. Three aggregation methods can be distinguished (see Meusel et al., 2015):

• Page-Level Aggregation: The nodes are aggregated based on page, which is defined as a single page within the web crawl, uniquely identified by its URL. For example, a page with URL http://pages14-18.mesdiscussions.net/pages1418/forum-pages-histoire/liste_sujet-1.htm is represented as a node.
• Host Aggregation: The nodes are aggregated based on the host of a URL, which is defined in RFC 1738 (Berners-Lee et al., 1994): informally, everything between the protocol’s double slash and the following slash, excluding port and authentication information. The same example is aggregated to the node ‘pages14-18.mesdiscussions.net’.
• Pay-Level Domain (PLD) Aggregation: The nodes are aggregated based on pay-level domain, which is a sub-level domain of a top-level domain like .com, .tr, etc. The example is aggregated to the node ‘mesdiscussions.net’ in this case.

We add a new method for generating graphs from web archives: seed-URL aggregation, where the nodes are aggregated based on the seed URL, that is the URL chosen as a seed, or starting point, to archive this source. This information is drawn from the selection tool BCWeb. For the example above, the URL is aggregated on ‘pages14-18.mesdiscussions.net’, which is the URL chosen by the librarians.

Filtering

Filtering techniques allow the reduction or elimination of interference in the collection. This can be done by several methods, from using advanced webpage classification algorithms (Markkandeyan and Devi, 2015) to manual selection. Classification approaches can be divided into binary and multi-class. Binary classification categorises instances into two classes; in our case these can be ‘relevant’ and ‘non-relevant’. To classify each page, there is a need to work with its content. In web archives, as we mentioned, depending on crawler configuration, not every outgoing link is archived, thus we do not have the contents of all pages to run a classifier. In addition, classification is often taken as a supervised machine learning problem where a set of annotated data is needed to train a classifier and we do not have annotated data for our corpus. On the other hand, the manual annotation at page level of the entire archive content on this scale is often infeasible.
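
The aggregation levels described in this section can be sketched in Python as follows. This is a minimal illustration, not the project's implementation: the function names are hypothetical, the pay-level domain is approximated by the last two labels of the host (a real implementation would consult the Public Suffix List, since this shortcut fails for suffixes such as .co.uk), and matching a URL to a seed by host and path prefix is an assumption.

```python
from urllib.parse import urlsplit


def page_node(url):
    """Page-level aggregation: every distinct URL is its own node."""
    return url


def host_node(url):
    """Host aggregation: host name without port or authentication details."""
    return urlsplit(url).hostname


def pld_node(url):
    """Pay-level domain, approximated by the last two labels of the host."""
    return ".".join(host_node(url).split(".")[-2:])


def seed_node(url, seeds):
    """Seed-URL aggregation: the longest seed whose host and path prefix the URL.

    Returns None when the URL falls under no seed (assumed matching rule).
    """
    parts = urlsplit(url)
    key = (parts.hostname or "") + parts.path
    best, best_length = None, -1
    for seed in seeds:
        # Seeds written without a scheme are parsed as network locations.
        seed_parts = urlsplit(seed if "//" in seed else "//" + seed)
        prefix = (seed_parts.hostname or "") + seed_parts.path.rstrip("/")
        matches = key == prefix or key.startswith(prefix + "/")
        if matches and len(prefix) > best_length:
            best, best_length = seed, len(prefix)
    return best


url = ("http://pages14-18.mesdiscussions.net/pages1418/"
       "forum-pages-histoire/liste_sujet-1.htm")
seeds = ["pages14-18.mesdiscussions.net",
         "http://www.lemonde.fr/Centenaire14-18"]

print(host_node(url))
print(pld_node(url))
print(seed_node(url, seeds))
```

For the forum page used as an example in the text, the three functions give the host, the pay-level domain and the seed named above, while a Le Monde page outside the Centenaire14-18 sub-section matches no seed at all.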

The URLs of the archived webpages can contain the necessary information and can offer an efficient way to classify pages as relevant or not. It has been shown that URL tokens are sufficient to classify the pages (Hernández et al., 2012). A recent work (Souza et al., 2015) showed that the URLs of the archived web documents can contain semantic information and can offer an efficient way to obtain initial semantic annotations for the archived documents. We tested this solution to eliminate ‘off topic’ pages from our corpus. A set of keywords (war, 1418, 1914, etc.) was defined by studying the seed URLs selected by curators. Links that do not contain at least one of these keywords are eliminated. The result is very dependent on the choice of the keywords: while we can obtain a corpus without interference, much relevant content can also be excluded.

Scope

The corpus can be restricted to the scope of collection, in other words by using the seed list and only keeping pages whose URLs/hosts match those of seed URLs. The alternative is that all pages are kept, including outgoing links not captured by the crawler. Restricting the corpus based on the seed URLs is reliant on the hypothesis that the curators are able to find the most important sources to archive and the quality of the archive is assured. Without this restriction, the link analysis can also be used to discover new pages to enrich the archive.

Qualitative Analysis

We can thus generate graphs more adapted to visualisation by improving the quality of the information and decreasing its quantity based on three methods: aggregation, filtering and scope. In this section we will compare the combined effects of the three strategies presented in order to find the most useful combinations. To the best of our knowledge, there are no measures to compare these different strategies: thus, only a qualitative assessment is used.

Table 30.2 shows for each combination of strategies the number of nodes and links for the capture from August 2014. This collection was chosen to compare the strategies based on its medium size (7,323,201 URLs) that facilitates the experiments and the variety of seed URLs.

Table 30.2 Comparison of strategies for generating the graph

Aggregation level   URL filtering   Scope    # Nodes   # Links
Host                Filtered        Corpus   456       3,356
Host                Filtered        All      15,148    27,968
Host                Not filtered    Corpus   483       6,603
Host                Not filtered    All      252,207   521,414
Seed URL            Filtered        Corpus   419       2,274
Seed URL            Filtered        All      15,310    28,910
Seed URL            Not filtered    Corpus   462       3,469
Seed URL            Not filtered    All      252,399   525,460

Aggregation

In this section, we compare the visualisation of the host graph to the seed-URL graph. In both graphs, we keep only the URLs that match the seed list, in other words, we apply ‘scope’ strategies to better observe the effects of different aggregation levels. To be able to visualise the differences between these two graphs, the nodes have been filtered with respect to their degree, that is, the sum of the number of outgoing and incoming links: the nodes whose degrees are less than 80 have been excluded from visualisation. It should be noted that degree filtering does not have
the same impact on both graphs: aggregation by host will group more nodes into one and therefore the distribution of degrees is different.

In Figure 30.2, corresponding to aggregation by host, we note that the central place is occupied by nodes like youtube.com and facebook.com, which can be considered as main poles, whereas these social network nodes disappear in the second graph (cf. Figure 30.3), with aggregation by seed URL. We take the example of facebook.com to illustrate the comparison. In the seed list, there are two URLs chosen by the experts:

https://www.facebook.com/pages/Bicyclette-Pliante-Gérard/254241421345914
https://www.facebook.com/pages/Centenaire-de-la-Première-guerre-mondiale-en-Haute-Saône/400310596705578

These two URLs represent two different nodes in the seed-URL graph, which are not visible in Figure 30.3 as they do not have a degree greater than our threshold (80). On the other hand, in Figure 30.2, they are aggregated in the facebook.com node with all the links that have facebook.com as host. In addition, many pages contain outgoing links to social networks (in our example Facebook) to promote the sharing of articles or the related social media accounts. Here are some examples of URLs that are located in the facebook.com node:

http://www.facebook.com/pages/Ministère-de-la-défense/11091233228
http://www.facebook.com/sharer.php?u=http://centenaire.org/fr/agenda?nid=2015

This increases the degree of the Facebook node. When we examine the URLs aggregated to the Facebook node, we realise that a very small part of the links concerns sites chosen by curators; indeed, only those from centenaire.org point to a Facebook page chosen in the seed list. This example illustrates how the interpretation of the network visualisation can change based on the choices made during the generation of the graphs.

Figure 30.2 Hyperlink network visualisation with host aggregation (scope Corpus).
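
The difference between the two aggregation levels can be illustrated with a minimal sketch. It is not the project's actual treatment chain: the prefix-matching rule in seed_of, the helper names and the link sources are assumptions made for illustration, as is the choice of centenaire.org as a seed. Only the Facebook URLs themselves are taken from the text.

```python
from collections import Counter
from urllib.parse import urlsplit

# Toy data. The Facebook URLs are quoted in the chapter; the link sources and
# the presence of centenaire.org in the seed list are illustrative assumptions.
SEEDS = [
    "https://www.facebook.com/pages/Bicyclette-Pliante-Gérard/254241421345914",
    "https://www.facebook.com/pages/Centenaire-de-la-Première-guerre-mondiale-en-Haute-Saône/400310596705578",
    "http://centenaire.org/",
]
LINKS = [
    ("http://centenaire.org/fr/agenda?nid=2015",
     "http://www.facebook.com/sharer.php?u=http://centenaire.org/fr/agenda?nid=2015"),
    ("http://centenaire.org/fr/agenda",
     "https://www.facebook.com/pages/Centenaire-de-la-Première-guerre-mondiale-en-Haute-Saône/400310596705578"),
    ("http://centenaire.org/fr",
     "http://www.facebook.com/pages/Ministère-de-la-défense/11091233228"),
]


def host_of(url):
    """Return the lower-cased host of a URL, e.g. 'www.facebook.com'."""
    return urlsplit(url).netloc.lower()


def key_of(url):
    """Scheme-less form (host + path), so http and https variants compare equal."""
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path


def seed_of(url, seeds=SEEDS):
    """Map a URL to the longest seed URL that is a prefix of it, else None."""
    key = key_of(url)
    matches = [s for s in seeds if key.startswith(key_of(s))]
    return max(matches, key=len) if matches else None


SEED_HOSTS = {host_of(s) for s in SEEDS}


def seed_host_of(url):
    """Map a URL to its host if that host appears in the seed list, else None."""
    host = host_of(url)
    return host if host in SEED_HOSTS else None


def aggregate(links, node_of):
    """Count links between aggregated nodes, dropping out-of-scope ends."""
    weights = Counter()
    for src, dst in links:
        a, b = node_of(src), node_of(dst)
        if a is None or b is None or a == b:
            continue  # out of scope, or a self-loop after aggregation
        weights[(a, b)] += 1
    return weights


by_host = aggregate(LINKS, seed_host_of)
by_seed = aggregate(LINKS, seed_of)
```

With this toy data, all three links end up on the single facebook.com node under host aggregation, whereas only the link to the Haute-Saône page survives under seed-URL aggregation, which mirrors the inflation of the Facebook node described above.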

Figure 30.3 Hyperlink network visualisation with aggregation by seed URL (scope Corpus).
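
The degree filter used for these two figures can be sketched as follows. This is an illustration only: the function names are hypothetical, and computing degrees once on the full graph (rather than recomputing them after each removal) is an assumption, since the chapter does not specify it. The threshold of 80 is replaced by a small value suited to the toy graph.

```python
from collections import Counter


def degrees(edges):
    """Degree of each node: number of incoming plus outgoing links."""
    deg = Counter()
    for src, dst in edges:
        deg[src] += 1
        deg[dst] += 1
    return deg


def filter_by_degree(edges, threshold):
    """Keep only edges whose two ends both have degree >= threshold."""
    deg = degrees(edges)  # computed once, on the unfiltered graph
    keep = {node for node, d in deg.items() if d >= threshold}
    return [(s, d) for s, d in edges if s in keep and d in keep]


# Toy graph: 'h' plays the role of a hub such as an aggregated host node.
toy_edges = [("a", "h"), ("b", "h"), ("c", "h"), ("a", "b")]
visible = filter_by_degree(toy_edges, threshold=2)
```

Because aggregation by host merges many pages into one node, the same threshold removes different parts of the two graphs, which is why the filter does not have the same impact on both.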

Filtering

In this section, we compare the effect of filtering by keywords in URLs on the graph aggregated by host. The use of filtering increases the readability of the graph and ensures that all pages that do not concern WWI are eliminated. Figure 30.4 shows a graph without filtering, which is very difficult to open and manipulate with Gephi. After filtering the graph, we obtain Figure 30.5, which is much easier to manipulate and allows an interpretation by the researcher. Take the example of a seed URL, http://www.lemonde.fr/centenaire-14-18/8.html, that has outgoing links to different URLs:

1 http://television.telerama.fr/tele/chaine-tv/france-3,80.php
2 http://www.lemonde.fr/centenaire-14-18-livres/
3 http://www.lemonde.fr/idees/article/2013/10/08/l-europe-ce-n-est-pas-la-paix-c-est-la-consequence-de-la-paix_3492063_3232.html

By using filtering, URLs 1 and 3, which are off topic, are eliminated and only URL 2, which contains the terms ‘centenary’ and ‘14-18’, is retained. This method is very dependent on the choice of the keywords: an overly selective choice of keywords can lead to the elimination of pages that are nevertheless relevant. The August 2014 capture contains approximately 240 million URLs, but after filtering only 22 million URLs remain. Thus, roughly 90% of the corpus is considered ‘off topic’, a level that seems excessive.

Figure 30.4 Hyperlink network visualisation without filtering, with host aggregation (scope: all).

Scope

In this section, we compare the graphs generated by staying in the scope of the corpus as archived, that is, including only those URLs collected based on the seed list, or by including all the outgoing URLs cited whether or not they form part of the corpus. In both cases the nodes are aggregated by host. The second graph, very difficult to manipulate, corresponds to Figure 30.4. Figure 30.6 represents the graph produced by remaining in the scope of the seed list.

Figure 30.5 Hyperlink network visualisation after filtering, with host aggregation (scope: all).
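
The keyword filter discussed in the Filtering section can be sketched in a few lines. The keyword list below is an assumption: the chapter names the terms ‘centenary’ and ‘14-18’, and the French form ‘centenaire’ is used here because that is what appears in the URLs. The three outgoing links are the ones quoted in the text.

```python
# Assumed keyword list; the project's actual list is not given in the chapter.
KEYWORDS = ("centenaire", "14-18")


def on_topic(url, keywords=KEYWORDS):
    """True if the URL contains at least one keyword (case-insensitive)."""
    lowered = url.lower()
    return any(keyword in lowered for keyword in keywords)


OUTLINKS = [
    "http://television.telerama.fr/tele/chaine-tv/france-3,80.php",
    "http://www.lemonde.fr/centenaire-14-18-livres/",
    "http://www.lemonde.fr/idees/article/2013/10/08/"
    "l-europe-ce-n-est-pas-la-paix-c-est-la-consequence-de-la-paix_3492063_3232.html",
]
kept = [url for url in OUTLINKS if on_topic(url)]

# Share of the August 2014 capture removed by filtering, using the
# approximate figures from the text (240 million URLs, 22 million kept).
removed_share = 1 - 22_000_000 / 240_000_000
```

Only the second URL is kept, and the removed share comes out at about 0.91, consistent with the figure of roughly 90% given in the text. A substring test of this kind is deliberately crude, which is exactly why the result depends so heavily on the choice of keywords.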

Figure 30.6 Hyperlink network visualisation without filtering, aggregated by host, remaining in the scope of the seed list.
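
The two scope options can be contrasted with a short sketch. The seed prefixes, the prefix-matching rule and the third link are assumptions made for illustration; the first two links reuse URLs quoted in the Filtering section.

```python
# Hypothetical seed prefixes standing in for the curators' seed list.
SEED_PREFIXES = (
    "http://centenaire.org/",
    "http://www.lemonde.fr/centenaire-14-18",
)

SOURCE = "http://www.lemonde.fr/centenaire-14-18/8.html"
PAGE_LINKS = [
    (SOURCE, "http://television.telerama.fr/tele/chaine-tv/france-3,80.php"),
    (SOURCE, "http://www.lemonde.fr/centenaire-14-18-livres/"),
    (SOURCE, "http://centenaire.org/fr/agenda"),  # assumed link, for illustration
]


def in_scope(url, prefixes=SEED_PREFIXES):
    """True if the URL starts with one of the seed prefixes."""
    return any(url.startswith(prefix) for prefix in prefixes)


def restrict(links, scope="corpus"):
    """scope='corpus' keeps links with both ends in the seed list; 'all' keeps every link."""
    if scope == "all":
        return list(links)
    return [(s, d) for s, d in links if in_scope(s) and in_scope(d)]


def nodes_of(links):
    """Set of URLs appearing at either end of a link."""
    return {url for link in links for url in link}


corpus_links = restrict(PAGE_LINKS, "corpus")
all_links = restrict(PAGE_LINKS, "all")
```

On this toy input the corpus scope keeps two links and three nodes, while the ‘all’ scope keeps three links and four nodes. At the scale of a real capture the same rule explains the gap between the ‘Corpus’ and ‘All’ rows of Table 30.2.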

By staying within the scope of the collection, the seed list plays the role of a filter, regardless of whether or not URL keyword filtering is applied. If we do not restrict the dataset to this scope, it is necessary to use filtering to have a readable and interpretable graph, eliminating the ‘interference’ that comes from outgoing links.

For the aggregation of the nodes, the comparison above led us to choose to aggregate by seed URL rather than by host, which gives too much importance to sites that are only partially dedicated to the Great War (examples: websites of social networks, media, etc.). The graphs presented in the following sections respect the scope of the collection with nodes aggregated by seed URL and without any keyword filtering.

Dealing with Dynamic Graphs

One of the interests of working on web archives is to be able to retrace the evolution of documents and their relationships. In our case, the various captures should make it possible to evaluate how the structure of the network of sites devoted to the Great War changed over time.

Dynamic (temporal) graphs, where nodes and links appear and disappear over time, have been the subject of an increasing interest in computer science as a challenging topic due to the complexity introduced by this temporal dimension. Various techniques have been proposed to respond to the challenges related to integrating time into the visualisation of graphs (Albano et al., 2015; Heymann and Le Grand, 2014). The most common way
in practice consists of studying the series of graphs representing the state of the corpus at different times. It is also important to ensure stability in the layout in order to preserve the user’s mental map, which is essential for readability, memorability and interpretability of dynamic graphs (Archambault and Purchase, 2013). Two kinds of analysis can be done by using dynamic graphs: studying the general evolution of the overall graph over time or focussing on a specific node (dynamic egocentric graph) and studying only its evolution over time (Shi et al., 2015).

For homogeneous web archive collections, which are constructed by following the same seed URLs and same crawler configuration (depth, frequency, etc.), all these challenges remain the same. On the other hand, while working with non-homogeneous web archives where the seed URLs and/or the crawler configuration change over time, we have additional challenges such as how to distinguish the evolution of the overall graph from the evolution of the collection. We would like to underline that these issues are independent of the aggregation, filtering and scope decisions. On the other hand, generating a seed-URL graph raises another issue: which version of the seed list to use. This is one reason why it is crucial for researchers to understand how the web archive collections are created.

We generated four graphs for four different captures made between November 2013 and November 2014. For each capture, the seed list is enriched with new sources and is updated (changes to URLs or crawler configuration for sources already in the collection). The appearance of a new source in the seed list does not necessarily mean that it is a new site on the web: the site could already exist but not have been spotted or have been considered to be irrelevant by curators before. Thus, we face a great deal of uncertainty: is the intensification of relationships from one capture to another due to better identification of sites in the archiving process or rather to changes in activity in terms of the number of sites online and links between them? If the metadata in WAT files could indicate the date of publication of the archived documents, it would be possible to detect if it is one phenomenon or the other, but this is not the case. On the other hand, we will never be certain because, for web documents in general, no trustworthy timestamp for publication is available. This is due to the web’s decentralised nature and the lack of standards for time and date; only the date of crawl can be considered as truly reliable.

In the timescale of this project, we were not in a position to make a reliable temporal analysis, although the context of events relating to the centenary encourages us to think that network densification is indeed an effect of increased online activity. Thus, we have taken the option of working on a collection that reflects the state of the web related to the First World War in November 2014 as defined by curators. As the growth of the collection since this date has been more gradual, it would be interesting to perform a temporal analysis on the subsequent captures, which represent a more stable corpus.

RESULTS: INTERPRETING THE NETWORK OF WWI WEBSITES

This section aims to show the interpretive work that can be carried out from a network of sites built from a web archive. It shows how the analysis can be enriched by content analysis and by interviews, and the interest of working on web archives.

Websites are usually built for one or several purposes; they are designed and used by specific populations with such purposes in mind. These functional aspects deeply inform the content and structure of websites, as well as their relations with other websites and resources. That is why, in this case as in any case of data analysis, the background knowledge of the domain is essential for the analyst to see what is relevant in the statistics
and maps, to be able to tell the wood from the trees.

The aim of our larger research, to study how the memory of WWI is constructed on the web, provided us some such background. WWI, of course, corresponds to an important moment for the collective memory: it concerned a whole generation and the nations involved developed a significant activity around the commemoration of the dead before the end of the conflict (Olick and Robbins, 1998). It is notable that this memory activism, still alive one century afterward, was reinforced by the fact that since the 1980s we have been facing a profusion of references to memory, ‘the boom mémoriel’ as it is called in French. Pierre Nora, by introducing the concept of ‘lieu de mémoire’ (or ‘place of memory’), proposes an interpretation of memory transformation, focussing on the externalisation of memory: ‘Lieux de mémoire originate with the sense that there is no spontaneous memory, that we must deliberately create archives, maintain anniversaries, organize celebrations, pronounce eulogies, and notarize bills because such activities no longer occur naturally’ (Nora, 1989). WWI websites participate in that ecology. But Nora’s seven-volume book was published before the growth of the internet as a medium. What is the equivalent of the lieux de mémoire on the web? A century after the events, who are the main actors involved in constructing the history and memory of WWI through websites (amateurs, experts, associations, institutions…)? What kind of links are established between these websites? What is the role of online heritage collections in this territory? To understand this ecology, our approach combines the sociology of memory (Gensburger and Lavabre, 2005), in the Halbwachs (1994, 1997) tradition, enriched by Nora’s (1989), Rousso’s (2016) and Ricoeur’s (2003) reflections on memory with research on new media.

As our aim is to describe the digital organisation of WWI memory on the web, we focus on a specific aspect of this collection as seen earlier in this chapter: the relations between websites through hyperlinks. The idea is to establish the network of websites, thereby giving a vision of social or intellectual distance between actors and providing information on the popularity and notoriety of different actors (Cardon et al., 2014; Severo and Venturini, 2016).

After presenting the main results that emerge thanks to the WWI website network, we focus on the role of qualitative analysis, content analysis and interviews for enhancing the interpretation. We will then examine which conditions are necessary in using a corpus initially constituted for a heritage purpose for a research one.

Observations from the Network of WWI Websites

The archived collection of WWI (captured in November 2014) has almost 500 websites: 35% are institutional websites (see note 3), and 11% are websites from media that propose specific sections dedicated to the world conflict. But more than half are non-professional websites published by individuals (37%) or by associations (17%).

To visualise the graph generated by the treatment chain presented above, a force-directed layout, ForceAtlas2, is used to spatialise the network: ‘Nodes repulse each other like magnets, while edges attract their nodes, like springs. These forces create a movement that converges to a balanced state’ (Jacomy et al., 2014). Figure 30.7 gives a 2D representation of the WWI web network. The absolute position of the nodes is arbitrary, while the relative positions are correlated to the interrelationships. Figure 30.8 retains the nodes with a degree greater than 30.

The first remark is that media are quite invisible in the map, certainly because they are less connected to the other sites but also because there is not always a clear identification of the section dedicated to WWI.

Figure 30.7 Network visualisation of websites dedicated to WW1.
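
The principle behind a force-directed layout (nodes repel each other, edges pull their ends together) can be illustrated with a small self-contained sketch. It is not ForceAtlas2 itself: it is a generic spring embedder in the style of Fruchterman and Reingold, written here only to show why densely linked groups end up close together on the map. The function names, parameters and toy graph are all assumptions.

```python
import math
import random
from itertools import combinations


def force_layout(nodes, edges, iterations=300, seed=0):
    """Generic spring-embedder layout (not ForceAtlas2): returns {node: (x, y)}."""
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = math.sqrt(1.0 / len(nodes))  # ideal distance between linked nodes
    for step in range(iterations):
        temperature = 0.1 * (1.0 - step / iterations)  # maximum move, cooling down
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes ("like magnets").
        for a, b in combinations(nodes, 2):
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            push = k * k / dist
            disp[a][0] += dx / dist * push
            disp[a][1] += dy / dist * push
            disp[b][0] -= dx / dist * push
            disp[b][1] -= dy / dist * push
        # Attraction along edges ("like springs").
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull
        # Move each node along its net force, capped by the temperature.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            move = min(length, temperature)
            pos[n][0] += dx / length * move
            pos[n][1] += dy / length * move
    return {n: (x, y) for n, (x, y) in pos.items()}


def mean_distance(pos, pairs):
    """Average Euclidean distance over a list of node pairs."""
    pairs = list(pairs)
    return sum(math.dist(pos[a], pos[b]) for a, b in pairs) / len(pairs)


# Toy graph: two fully linked groups of four sites joined by a single link.
group_a = ["a0", "a1", "a2", "a3"]
group_b = ["b0", "b1", "b2", "b3"]
within = list(combinations(group_a, 2)) + list(combinations(group_b, 2))
toy_edges = within + [("a0", "b0")]

pos = force_layout(group_a + group_b, toy_edges)
intra = mean_distance(pos, within)
inter = mean_distance(pos, [(a, b) for a in group_a for b in group_b])
```

In this sketch the average distance within a group (intra) should come out smaller than the average distance between the two groups (inter). Absolute coordinates depend on the random starting positions, which is the sense in which the absolute position of nodes is arbitrary while relative positions reflect the links.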

Although there are no truly independent clusters, we can notice a clear distinction between a zone dominated by institutional websites (in red) and another one where personal websites (in blue) are the most frequent. We interpret the latter as the amateur sphere. The main actor, with the highest degree (see earlier in chapter), i.e. the most connected to other websites, in the institutional zone is centenaire.org, the official website dedicated to the centenary.

In the blue zone, the main node is an amateur forum, surrounded by a lot of blogs or personal websites. The density of the connections between the websites is highest in this part of the graph. It is important to notice that not all the non-professional websites are located in this lower part of the map. Some of them are clearly integrated in the institutional zone, such as sourcesdelagrandeguerre.fr or troupesdemarines.org, due to their specific network of connections: they are more connected to institutional websites and do not interact with the amateur sphere.

Finally, we can notice that some institutional websites, like Gallica or Mémoire des hommes, are close to the personal websites on the map because they are frequently cited through hyperlinks.

Figure 30.8 Network visualisation of websites dedicated to WW1 (degree>30).

Improving Interpretation

The exploration of the network map reveals some elements of the structure of the territory of WWI memory, especially the distinction between the institutional and the amateur zone. In order to improve the understanding of the communities engaged in WWI memory, the network analysis should be combined with qualitative approaches to understand the content of nodes and the nature of interactions between stakeholders. We would like to underline the interest of 1) exploring the content of nodes: a) content analysis when the website is focussed on publication, b) content analysis of the interactions between contributors when the website, like a forum, is dedicated to interactions and 2) conducting interviews with the actors of this interaction space to capture their own perspective.

Analysing the contents of websites and interactions

To better understand the social structure behind a network of websites, it is helpful to analyse the content of webpages behind the nodes of the network. The websites are of two very different natures. Some are primarily places of publication, while others are places of interaction, such as forums, although, on the web, the activity of publication is closely intertwined with the activity of interaction. Both require specific approaches, one focussing on content analysis and the other on analysis of interactions.

This content exploration helps us to understand the social organisation of the group, the presence of a common culture, the existence
of norms, statuses and roles that are reflected in the network structure. It usually implies an observation of the milieu in the tradition of virtual ethnography (Hine, 2000), which means participating for an extended period of time, reading what is said and published, observing what happens, exploring hyperlinks, collecting conversations and documents. This kind of approach was conducted, for example, by Nancy Baym in the early 1990s while studying a group of fans of soap opera (Baym, 1993) in a forum or by Beaudouin and Velkovska (2000) for understanding a peer-community of technical assistance, which implied interactions in a forum, self-publishing of websites, exchanges by mailing list, etc.

In our case of WWI memory, this exploration shows a real difference in the way sites are cited. In the institutional sphere, links are mainly based on the content; in the amateur sphere the meaning of the hyperlinks is twofold: highlighting both the content and the social relation. We can better understand this point by comparing the main nodes of each zone. Centenaire.org gathers all the initiatives around the centenary of WWI. It is like a portal, referencing every exhibition, conference, demonstration, publication, educational action… and proposing archives of photographs, maps, etc. There is no interaction in this website; it is a single window addressed to a large audience.

The nature of the forum is totally different: this space is principally dedicated to interactions between participants. The sections and threads help to distinguish different topics of conversation. The forum gathers a group of people working together to build the memory of the war: they are connected because they share information and data, because they build knowledge together but also because friendship helps in organising relations between them. In this space, where there are no professional obligations, the involvement is also sustained by the social relations between the members (Beaudouin, 2016).

Around the forum, and connected to it, we noticed the network of personal websites. The ethnographic exploration shows that these websites are dedicated to specific aspects of the war, based on the knowledge and documents (public and personal documents and photographs) that their authors have been accumulating. Many of these websites concern infantry regiments and reconstitute the memory of each soldier, battle and place occupied. Most of the authors of websites in the environment of the forum are or were also contributors to the forum. Signs of friendship are visible on the websites: most of the sections offering external links are called ‘Blogs et sites amis’ (‘friends’ blogs and sites’), and in the presentation of each blog the interactions with other amateurs are often mentioned. We have shown on the map that the density of links was higher in the amateur zone, and the ethnographic analysis demonstrates that the network of links also has a different meaning reflecting belonging to a community of practice (Lave and Wenger, 1991) engaged in a work of collective memory (Halbwachs, 1994, 1997), in the sense that the way of talking about history is constructed by the community.

Conducting interviews

The quality of the interpretation of a network of websites can be improved by conducting interviews with the actors involved. It of course presupposes that the research is conducted shortly after the archive process, when participants are still active.

We performed 11 interviews (nine face-to-face, two by telephone) with active members of the forum and authors of websites from the amateur sphere involved in WWI memory. Interviews lasted on average 1h 50 (between 50 minutes and three hours; see note 4). We presented the web graph to the various interviewees at the end of the interaction. They were able to react to this representation of a universe that sounds familiar to them, especially in the amateur zone. Conversely, the institutional zone is for them a space that is less known, less useful, less frequented.

These interviews were very helpful for understanding the specific position of heritage websites in the map. These document warehouses, which offer digitised collections to readers, are clearly institutional websites. This is true for Mémoire des hommes, a huge database from the Ministry of Defence, which contains all the data concerning the soldiers killed during the war (‘Morts pour la France’) and the digitised war diaries of each regiment. It is also the case for Gallica, the digital library of Bibliothèque nationale de France. Despite their institutional character, they are located close to the amateurs’ sphere. These non-professional historians or genealogists show an intensive use of the digitised documents, confirmed during the interviews, by frequently mentioning these sources in their messages, blogs and websites. In that sense these amateurs bring life to heritage websites that are institutional by definition.

Finally, the interviews had an unexpected use: the exploration of the map by the actors made it possible to identify the limits of the network of WWI websites. Some noted the absence of fundamental sites for their research, such as the site of the Red Cross which lists all prisoners of war, or dedicated sites in countries other than France… Others were surprised by the small size of the MemorialGenWeb site, which is in a sense the amateur ancestor of Mémoire des hommes. And more generally, the low presence of genealogy sites was often noted. This points to three different problems. First, the archiving of the web is part of a legal deposit mission and does not concern foreign sites, which is obviously regrettable in the study of a world conflict. Second, the limited presence of MemorialGenWeb is due to the fact that at the time the archive being studied was created the site was in the process of leaving its original hosting service provider, so that the address of the site was in transition. Finally, archiving does not currently cover commercial sites that charge fees, which is the case for most genealogy sites. Presenting the map to the actors makes possible the work of co-construction of the interpretation between actors and researchers and also allows us to identify the limits of the map. It also offers a potential loop of interactions with the constitution of the collection.

Methodological Reflections

After explaining the strategy used in this study to interpret and control the quality of the representation, this final section discusses two aspects of the methodology chosen for this research: doing research on a corpus constituted for a heritage purpose; and working with web archives instead of working with the live web.

Delegating the establishment of the corpus

Can a corpus constituted for a heritage purpose be useful for a research one? At the end of this experience, the answer is evidently yes, but at the starting point it was not evident at all. Usually, the researcher considers that the constitution of the corpus is part of his/her research work and will not volunteer to delegate this task. The constitution of the corpus corresponds to the moment where the researcher discovers the field and is when hypotheses emerge. For example, with Navicrawler, a tool used by many scholars to perform mapping of websites (Boullier and Lévy, 2016; Diminescu and Pasquier, 2010), at each step as a new website is proposed by the tool, the researcher takes the decision to keep it or to exclude it from the collection. Evidently, this method can also introduce some bias in the sense that the principle of choice may change over time.

What are the conditions for working on a collection that researchers have not constituted? The main condition is the quality of description of the process of selection. In our case, as the collection was constituted by librarians and experts, in a professional context where rules are clearly defined and
written, we were presented with a corpus that was clearly documented with the seed URL, the categorisation of the author, the keywords, the person responsible for the selection… and also the method of crawling applied. This set of metadata is an exceptional resource because it allows the researcher to check the scope of the collection, to define sub-corpora, to exclude some websites if needed.

Evidently, it is not only a question of documenting the process but also a question of interaction: the essential element is the possibility of interacting with the experts in charge of the collection, for understanding certain choices, for suggesting missing websites, for signalling changes in a URL… This feedback gives researchers a role in creating the archive, at the risk of creating a bias in the collection; it is rare for researchers to create primary archive sources, even if it is more common for readers to suggest books for library collections. The loop of interactions between the team in charge of the collection and the team that will do research on a corpus extracted from the collection is a condition of quality.

Working on web archives

Since the web came into existence, and since research on the internet has emerged, the necessity of working on archives was so evident that nobody really sought to explain it. Many tools, such as httrack, were available for crawling and making local copies of websites. For researchers in social sciences it was obvious that due to the ephemeral character of the web it was necessary to archive corpora. The initiative of the Internet Archive and the building of the Wayback Machine arise from the same observation: as the web is in constant transformation, it is vital to preserve the different states. Researchers have developed archives, without imagining that they would prove to be so ephemeral: the disk cannot be read any more, the copy is partially damaged, the browser cannot support old versions… Digital documents are particularly fragile compared with other media, such as books. Archiving the web demands a high level of expertise and it is essential to delegate this task to specific institutions or organisations.

Many researchers have underlined that the archive cannot correspond exactly to the ‘real’ websites (Brügger, 2013), because the process of archiving cannot guarantee an identical copy of the documents, but rather a mere representation of the websites. Nonetheless, working on web archives represents a great improvement for the research community: it allows the possibility to verify the results by returning to the raw data and it ensures that the research can be reproduced. As Halbwachs (1997) mentioned, the memory is the collective reconstruction of the past. We know that the memory of the war that we study during the centenary is specific to our time and that it differs from the memory just after the war and of what it will be for the bicentenary of the war. For historians, it is essential to keep the memory of those different periods.

CONCLUSION

This chapter gives an account of a research methodology for analysing web uses based on the archives of the web, as preserved by the French National Library (BnF). It is based on a specific research question: the memory of the First World War on the web. This case study demonstrates how the three stakeholders involved, librarians, computer scientists and social scientists, cooperate to produce quality work with frequent interaction and feedback between the constitution of the collection, the data extraction and the analysis and interpretation, the latter being fed by the exploration of the archived documents and by the interviews with the actors. The treatment chain has been designed to be used on other topics and by other researchers.

As noted above, web archives are a new kind of source for many researchers, and can
be difficult to interpret if the researchers do not understand the way they have been created. For librarians, the work with the researchers on this project was very useful in helping to define what information needs to be provided on the way sites have been selected and collected. The different publications on the project are one way of providing this kind of information for future researchers.

Working on web archives is a major opportunity for social science research: it allows the use of documented and stabilised collections to constitute corpora. This opens the possibility of a reproducibility of the research and allows the testing of new methodologies on the same corpus, which opens the possibility of both confrontation and enrichment of the analyses.

What are the challenges for developing the use of web archives in social science research? The first one relates to the availability of archives, which is limited by law: in the digital era, having to be physically on-site at the BnF to consult the archives is an obvious limitation. The second is the flexibility of the archiving process, which needs to adapt to the evolutions of the web. If it is still relevant to archive websites, it becomes essential to archive the social media platforms on which a large part of the activity is deployed. In our case study, social media were partially integrated, knowing that in parallel another researcher was involved in archiving WWI on Twitter (Clavert, 2016). Finally, on subjects of international dimension, such as ours, interoperability between the national archives will have to be envisaged in order to allow the creation of transnational corpora. An International Internet Preservation Consortium (IIPC) initiative around the First World War is underway.

Through this experience, we have been able to show how, at the time of the first centenary, an original memory work was built on the web around the First World War, which involves institutions as well as a network of amateurs. These results constitute a reference base for further studies.

ACKNOWLEDGEMENTS

‘The future of online digitised heritage: the example of the Great War’ is a research project conducted by the BnF, the BDIC and Télécom ParisTech as part of the Cluster of Excellence, Pasts in the present, Investissements d’avenir, réf. ANR-11-LABX-0026-01 (Nicolas Auray, Valérie Beaudouin, Philippe Chevallier, Lionel Maurel, Josselin Morvan, Zeynep Pehlivan, Peter Stirling).

Notes

1 https://webarchive.jira.com/wiki/display/Iresearch/Web+Archive+Transformation+(WAT)+Specification,+Utilities,+and+Usage+Overview (visited 6 June 2017).
2 https://webarchive.jira.com/wiki/display/Iresearch/Web+Archive+Analysis+Workshop (visited 6 June 2017).
3 We used the librarian’s categories, with only marginal changes: we merged ‘official’ and ‘public’ websites into a global category ‘institutional websites’ and changed the categorisation of a very small number of websites.
4 Three of them were conducted by Philippe Chevallier, eight by Valérie Beaudouin.

BIBLIOGRAPHY

Achlioptas, D., Clauset, A., Kempe, D., and Moore, C. (2009) ‘On the bias of traceroute sampling: or, power-law degree distributions in regular graphs’, Journal of the ACM, 56(4): 1–28.
Albano, A., Guillaume, J.-L., Heymann, S., and Le Grand, B. (2015) ‘Studying Graph Dynamics Through Intrinsic Time Based Diffusion Analysis’, in Kazienko, P. and Chawla, N. (Eds), Applications of Social Media and Social Network Analysis. Springer International Publishing (Lecture Notes in Social Networks). pp. 103–124.
Amar, M., and Chevallier, P. (2013) ‘Rapport d’étude sur les usages des corpus numérisés de Gallica sur la Grande Guerre’, Technical Report. Paris: BnF.
462 THE SAGE HANDBOOK OF WEB HISTORY

Archambault, D., and Purchase, H.C. (2013) ‘The “Map” in the mental map: Experimental results in dynamic graph drawing’, International Journal of Human-Computer Studies, 71(11): 1044–1055.
Baym, N.K. (1993) ‘Interpreting soap operas and creating community: Inside a computer-mediated fan culture’, Journal of Folklore Research, 30: 143–176.
Beaudouin, V., and Velkovska, J. (2000) ‘Structuring a communication space on the Internet’, Réseaux. The French Journal of Communication, 7(2): 183–222.
Beaudouin V. (2016) ‘Forums en ligne: des espaces de co-production de la connaissance et du lien social’, in Martin, O. and Dagiral, E. (Eds), L’ordinaire d’internet. Paris: Armand Colin. pp. 164–182.
Berners-Lee, T., Masinter, L., and McCahill, M. (1994) ‘RFC 1738: Uniform Resource Locators (URL)’, https://www.ietf.org/rfc/rfc1738.txt.
Bonnel, S., and Oury, C. (2014) ‘Selecting websites in an encyclopaedic national library: A shared collection policy for internet legal deposit at the BnF’, in IFLA WLIC 2014.
Boullier, D., and Lévy, J. (2016) ‘Topographies/topologies. Langages spatiaux, spatialités, espaces’, Réseaux, 34(195): 9–162.
Brügger, N. (2013) ‘Historical network analysis of the Web’, Social Science Computer Review, 31(3): 306–321.
Cardon, D., Fouetillou, G., and Roth, C. (2014) ‘Topographie de la renommée en ligne’, Réseaux, 6(188): 85–120.
Cervoni, J.-R., and Cervoni, M.-F. (2005) Petit dictionnaire de la Grande guerre. Bastia: Anima Corsa.
Clavert, F. (2016) ‘#ww1. The Great War on Twitter’, in Digital Humanities 2016: Conference Abstracts. Jagiellonian University & Pedagogical University, Kraków. pp. 461–462.
Diminescu, D., and Pasquier, D. (Eds) (2010) Les migrants connectés – TIC, mobilités et migrations. Réseaux, 1(159): 196.
Gensburger, S., and Lavabre, M. (2005) ‘Entre “devoir de mémoire” et “abus de mémoire”: la sociologie de la mémoire comme tierce position’, in Müller, B. (Ed.), Histoire, mémoire et épistémologie. A propos de Paul Ricoeur. Paris: Payot. pp. 76–95.
Halbwachs, M. (1994) Les cadres sociaux de la mémoire. Paris: Albin Michel (First edition, 1925, Librairie Alcan).
Halbwachs, M. (1997) La mémoire collective. Paris: Albin Michel (First edition, 1950).
Hernández, I., Rivero, C.R., Ruiz, D., and Arjona, J.L. (2012) ‘An Experiment to Test URL Features for Web Page Classification’, in Rodríguez, J.M.C., Pérez, J.B., Golinska, P., Giroux, S., and Corchuelo, R. (Eds), Trends in Practical Applications of Agents and Multiagent Systems. Berlin and Heidelberg: Springer (Advances in Intelligent and Soft Computing). pp. 109–116.
Heymann, S., and Le Grand, B. (2014) ‘Exploratory network analysis: Visualization and interaction’, Complex Networks and their Applications, in Cherifi, H (Ed.), Complex Networks and their Applications. Cambridge: Cambridge Scholars Publishing. pp. 174–211.
Hine, C. (2000) Virtual ethnography. London: Sage.
Jacomy, M., Venturini, T., Heymann, S., and Bastian, M. (2014) ‘ForceAtlas2, A continuous graph layout algorithm for handy network visualization’, PLoS ONE, 9(6): 1–22.
Lave, J., and Wenger, E. (1991) Situated learning: Legitimate peripheral participation. Cambridge: Cambridge University Press.
Le Follic, A., Stirling, P., and Wendland, B. (2012) ‘Putting it all together: Creating a unified web harvesting workflow at the Bibliothèque nationale de France’. International Internet Preservation Consortium (IIPC).
Loez, A. (2013) Les 100 mots de la Grande guerre. Paris: Presses universitaires de France.
Markkandeyan, S., and Devi, M.I. (2015) ‘Efficient machine learning technique for web page classification’, Arabian Journal for Science and Engineering, 40(12): 3555–3566.
Meusel, R., Vigna, S., Lehmberg, O., and Bizer, C. (2014) ‘Graph Structure in the Web – Revisited: A Trick of the Heavy Tail’, in ACM (WWW ‘14 Companion), pp. 427–432.
Meusel, R., Vigna, S., Lehmberg, O., and Bizer, C. (2015) ‘The graph structure in the web – analyzed on different aggregation levels’, The Journal of Web Science, 1(1): 33–47.
Meyniel, J. (2010) Vocabulaire illustré de la Grande guerre: 1914–1918. Fontaine: EP.
EXPLORING THE MEMORY OF THE FIRST WORLD WAR USING WEB ARCHIVES 463

Nora, P. (1989) ‘Between memory and history: Les lieux de mémoire’, Representations, 0(26): 7–24.
Olick, J.K., and Robbins, J. (1998) ‘Social memory studies: From “collective memory” to the historical sociology of mnemonic practices’, Annual Review of Sociology, 24(1): 105–140.
Ricoeur, P. (2003) La mémoire, l’histoire, l’oubli. Paris: Seuil.
Rousso, H. (2016) Face au passé. Essais sur la mémoire contemporaine. Paris: Belin.
Serrano, M.Á., Maguitman, A., Boguñá, M., Fortunato, S., and Vespignani, A. (2007) ‘Decoding the structure of the WWW: A comparative analysis of web crawls’, ACM Trans. Web, 1(2), http://dx.doi.org/10.1145/1255438.1255442
Severo, M., and Venturini, T. (2016) ‘Enjeux topologiques et topographiques de la cartographie du Web’, Réseaux, 34(195): 84–105.
Shi, L., Wang, C., Wen, Z., Qu, H., Lin, C., and Liao, Q. (2015) ‘1.5D egocentric dynamic network visualization’, IEEE Transactions on Visualization and Computer Graphics, 21(5): 624–637.
Souza, T., Demidova, E., Risse, T., Holzmann, H., Gossen, G., and Szymanski, J. (2015) ‘Semantic URL Analytics to Support Efficient Annotation of Large Scale Web Archives’, in Cardoso, J., Guerra, F., Houben, G.-J., Pinto, A.M., and Velegrakis, Y. (Eds), Semantic Keyword-based Search on Structured Data Sources. Cham: Springer International Publishing (Lecture Notes in Computer Science). pp. 153–166.
Stirling, P., Chevallier, P., and Illien, G. (2012a) ‘Web archives for researchers: Representations, expectations and potential uses’, D-Lib Magazine, 18(3–4), http://www.dlib.org/dlib/march12/stirling/03stirling.html.
Stirling, P., Illien, G., Sanz, P., and Sepetjan, S. (2012b) ‘The state of e-legal deposit in France: Looking back at five years of putting new legislation into practice and envisioning the future’, IFLA Journal, 38(1): 5–24.
31
A History with Web Archives, Not a History of Web Archives: A History of the British Measles–Mumps–Rubella Vaccine Crisis, 1998–2004
Gareth Millward

The media storm over the measles–mumps–rubella vaccine (MMR) is often attributed to a now-retracted Lancet article from February 1998 (Wakefield et al., 1998). Allegations that the vaccine might be linked to rising autism diagnoses in the UK caused significant problems for public health authorities. Vaccination rates dropped as parents received conflicting information from the medical profession, the press and, increasingly, the World Wide Web. In England, 92 per cent of children under the age of two received at least one dose of MMR in 1996. In 2004 this had dropped to 80 per cent (Health and Social Care Information Centre, 2005). Although the uptake of MMR then began to recover, it was not until 2012 that rates returned to pre-crisis levels. Conveniently, the Internet Archive begins in 1996, and covers this entire period. Given that the web was often credited as an influence during the crisis, this is a critical resource in writing the history of MMR. But accessing the archive is no simple task. There is far too much information for any one human to read in detail. Moreover, most historians – even those dealing with the more recent past – are not trained to use, or have experience using, web archives. Many do not have the resources or the inclination to build their projects around ‘the web’ as their main object of study. With so much information available, how can historians use the archived web as one source among many? How can they select and read documents? And how problematic is this for our understanding of the 1990s and 2000s?

This chapter is a review of one historian’s research involving the Internet Archive. Through a case study, it shows historians without experience of web archives what is possible – as well as the methodological pitfalls. In doing so, it also emphasizes to digital historians and web archivists how ‘traditional’ historians with little experience of web archives might approach this source base. This may help to explain what historians will want from digital repositories and what training they may need to
A HISTORY WITH WEB ARCHIVES, NOT A HISTORY OF WEB ARCHIVES 465

physically and practically access them in a meaningful way.

This chapter begins with an outline of the project. Next, it explores the methodological challenges of using the Internet Archive, especially for historians who are using close textual reading of these sources as part of a wider study. After explaining the historical context of MMR and British internet usage, it then outlines three different websites. These give not just examples of the empirical data gleaned from the archive, but also the methods for identifying and using that material. As a whole, it shows that the Internet Archive can be used as one source among many in larger history projects that span a wider time frame than that in which the web has existed. However, it is still a flawed and problematic process. Historians will be confronted with these problems more regularly over the next few years, and it is important that we begin to tackle them sooner rather than later.

A HISTORY OF POLICY

I am a historian of medicine, specializing primarily in the history of government policy. Between 2014 and 2017 I wrote a monograph on the history of British vaccination policy from the 1940s through to MMR in the twenty-first century.1 The main source base comprised official records (such as reports, internal governmental procedural documents and parliamentary proceedings), contemporary publications and newspapers. Web archives were an important addition to this body of literature. There were websites which were in themselves official government documents. Similarly, websites created by other organizations and individuals inform analysis of how and why the British public may have supported or rejected official government policy. Not all these sites were still on the live web, or at least not in the form they would have taken during the period under discussion. I used the Internet Archive’s (2017) Wayback Machine in conjunction with the SHINE interface – a faceted search tool developed on a comprehensive index of the .uk domain by the British Library (2017). These tools allow historians to inspect historical snapshots and contextualize them with other primary sources.

Yet web archives were not (and could not be) central to the overall analysis. The period they cover was limited to the final book chapter on MMR. Any large-scale analysis of hyperlink networks or corpus linguistics was impractical. I lacked the training or expertise to embark on such an endeavour, and in practical terms I lacked the time to do so. It would also have created an unbalanced project, with completely different historical techniques and research questions employed for around one-fifth of the monograph compared with the other time periods. The aim, therefore, was to integrate archived web sources, albeit with a critical eye on how born-digital sources must be inspected within a different context to other archived materials (Brügger, 2008). This was not a history of MMR on the web. Such a project would undoubtedly shed light on the impact of the web just as contemporary sociologists and media studies academics have been invaluable in assessing the impact of the media on public discourse (Hargreaves et al., 2003; Speers and Lewis, 2004). Ackland and Evans (2017) give us an example of how this might be done with another contentious debate of medical and social significance, abortion in Australia. Instead, this was a history of MMR which included the web.

While the work of digital historians has been invaluable in opening up these sources to wider historical analysis, it is important that ‘traditional’ historians are able to work with as well as on internet archives. Not all research questions will be designed with web archives as the central source. Wider social and political studies will also need to analyse data which are neither born-digital nor originally published on the internet. At the same time, any history of the 1990s and 2000s

has to engage with the fact that increasing numbers of Britons were using the web in their day-to-day lives. In order to write such histories, researchers need to be aware of the power as well as the limits of available depositories. The Big UK Domain Data for the Arts and Humanities (BUDDAH, 2015) project with the Institute of Historical Research and British Library (Winters, 2017) revealed that web archival sources are not the same as fully catalogued ‘physical’ collections such as those at The National Archives or local history centres. Similarly, they are not as accessible, coherent or well-curated as digitized depositories such as the Old Bailey records or newspaper collections.2

As part of BUDDAH, I had done some work on the Internet Archive. I was therefore not a complete novice, though I would also lay no claims to be an accomplished digital historian. The experience from this and from my colleagues showed that naïve search methods could lead to large, unwieldy corpora that were difficult to analyse (see esp. Deswarte, 2015; Millward, 2015). With MMR, I decided to focus on websites which were cited in my other primary sources. Mostly, these were URLs mentioned in correspondence and news reports in the leading medical journals The Lancet and the British Medical Journal. This produced a concentrated group of documents for which there was evidence of impact in the historical record. Martin Gorsky (2015) employed a similar methodology in his investigation of Primary Care Trust websites from the 1990s and 2000s. Web archives were crucial in these endeavours. The Wayback Machine allowed me to examine captures of websites from the time that they were cited by other publications. The well-known problem of ‘link rot’ in academic journals (Zittrain et al., 2014) meant that this was also, in many cases, the only way to access any version of URLs long since removed from the live web. This form of triangulation provided richer insight from the primary sources.

THE MMR CRISIS IN THE CONTEXT OF BRITISH WEB HISTORY FROM THE 2000s

MMR became part of the routine vaccination programme for children in 1988. Immunization rates against the three diseases had been sub-optimal for public health authorities, and this trivalent vaccine was more effective and convenient for parents (Department of Health and Social Security, 1987; Badenoch, 1988). Uptake of the vaccine was good. By 1996 over 90 per cent of children under the age of two had received MMR (Health and Social Care Information Centre, 2005). But confidence in the vaccine dropped significantly in the late 1990s. In 1998, a paper was published in The Lancet (Wakefield et al., 1998) which appeared to show a link between MMR and a new type of autism. The paper itself was inconclusive. However, in a press conference called to launch the paper, the lead author, Andrew Wakefield, claimed that it would be safer for parents to give their children separate measles, mumps and rubella vaccines. Despite repeated assertions that there was no evidence of a link between MMR and autism – including in the same edition of The Lancet as Wakefield et al.’s paper (Chen and DeStefano, 1998) – the press continued to report that the medical community was divided on the issue (Speers and Lewis, 2004).

The crisis hit its peak in late 2001 and early 2002 (Speers and Lewis, 2004). Prime Minister Tony Blair refused to confirm or deny whether his own young son Leo had received MMR, citing his right to privacy. Then, a BBC documentary series Panorama reported on Wakefield and John O’Leary’s work on the MMR–autism link, casting new doubt on the vaccine’s safety (Dobson, 2002). The government was forced into a defensive publicity campaign. The Department of Health eventually regained some control over the MMR story. The weight of scientific

evidence in MMR’s favour meant that the Legal Services Commission (2003) withdrew funding for a case against the Department brought by the parents of autistic children who had received MMR. Then, in the following year, the investigative journalist Brian Deer (2004) published a series of exposés on Wakefield’s research practices in the Sunday Times. Ten of the 12 co-authors of Wakefield et al. (1998) formally withdrew their support for the 1998 paper (Murch et al., 2004), and Deer’s documentary for the Channel 4 series Dispatches (Berger, 2004) further damaged Wakefield’s reputation and reaffirmed the scientific basis for the MMR programme. The General Medical Council (2010) began proceedings against Wakefield, and he was eventually struck off the medical register for serious professional misconduct.

There are two main reasons for studying the MMR crisis. For public health professionals and researchers, the period was considered a ‘crisis’ because the vaccination rate among children for MMR dropped significantly, leading to several measles outbreaks, notably in 2008–09 and 2012–13 (Eaton, 2009; McCartney, 2013; Keenan et al., 2017). They study MMR in its own right or as a comparison with similar events in other countries and time periods to learn lessons and avoid such events in the future. For historians and media studies scholars, the crisis is of interest as an example of a public debate played out in the popular press, doctors’ surgeries and the internet (Fitzpatrick, 2004; Horton, 2004; Speers and Lewis, 2004). The media coverage of MMR was bound up in the political and cultural context of its time, and tells us much about public attitudes towards science, medicine, infectious disease and state authority. Any research questions around these topics must involve the internet. There is a temporal overlap. Although placing exact dates on the MMR crisis is difficult, the most concentrated period of activity is from the publication of Wakefield et al. (1998) to the Deer (2004) exposés. A wider view may take in the decline in MMR uptake – 1996

Figure 31.1 Internet users (per 100 people).


Source: UNdata (2017).

(Health and Social Care Information Centre, 2005) – and present-day measles outbreaks in Europe and North America (Larson et al., 2016). However defined, this is contemporaneous with the holdings in the Internet Archive. Moreover, the internet was continually cited as an influence on the debate.

Exactly who, and how many people, read about MMR on the internet is difficult to know with any precision. These questions are currently asked as part of surveys on parents’ attitudes and the trend has been towards increased usage, but they were not discussed in the peak period of the crisis (Ramsay et al., 2002; Campbell et al., 2017). In a 2003 study (Hargreaves et al., 2003), only 5 per cent of respondents said that they consumed their science news primarily from the internet. Undoubtedly, however, the general trend has been towards increased internet usage and coverage in Britain. This period of greatest growth coincided with the intensification of the MMR crisis (UNdata, 2017). It is therefore understandable that commentators at the time, at least in part, attributed the spread of rumour and negative publicity over the vaccine to ‘the internet’ (Selway, 1998; Horton, 2004; Speers and Lewis, 2004).

Web archives can certainly help to uncover these trends, but the definition of ‘the internet’ in this context has to be challenged. It can refer to the technology itself, the information contained on it or the cultures around how it was used and the meanings attributed to them (Abbate, 2017; Turner, 2017). Moreover, the meaning and use of the internet have not been static; nor should we succumb to the temptation of writing either Whiggish narratives of internet ‘progression’ or uncritical polemics against the ‘dangers’ of the democratization of knowledge (Russell, 2017). However, this is not a history of the web. Instead it is one that acknowledges that internet usage and web content at the turn of the millennium were part of the cultural milieu that influenced public discourse about MMR. The focus was on content – on closer reading of the information contained in select webpages – rather than a fully theorized and evidenced analysis of how the internet was used by different groups. What information was accessible to the general public? How did the government or other authors make use of this emerging technology? And if ‘the internet’ was to blame, what specific websites or resources on that internet were particularly worrisome or empowering to different communities involved in the crisis? This has necessarily limited the conclusions that one can draw from the material, but has also made the documents easier to integrate into my existing primary source analysis. Historians need to be conscious of their choices and the politics inherent in the definition of ‘the internet’ that they use (Abbate, 2017).

With these caveats, this chapter now presents three examples of how the Internet Archive might be used. The first concerns a government website called MMR The Facts, hosted on the nhs.uk domain. It sought to educate parents about MMR, borrowing from sociological research on risk communication. The government used the web because of the internet’s growing importance (perceived or real) and because it could provide as little or as much information as parents or health professionals wished to consume. The second example is a report written by vaccine-sceptic activist Alan Phillips in 1997. This document was hosted as a text file on an American university’s server. While the actual impact of the document is difficult to measure, it was quoted by a health worker in the British Medical Journal who claimed many parents in his/her circle had read and shared it widely. The final example concerns the website of the Society for the Autistically Handicapped. This was discovered through SHINE. It shows how potentially ‘lost’ documents can be traced within the archive, albeit with limitations in what historians can then do with this information.

MMRTHEFACTS.NHS.UK

The British government attempted to combat negative publicity about MMR from the day

of the infamous Wakefield press conference. In the face of increased doubt and declining vaccination rates, it renewed its campaign in 2001 (Ramsay, 2001). This failed to reverse the trend. In 2002, the Department of Health launched a new public-facing website called MMR The Facts (Figure 31.2), which it believed would educate parents and make use of an increasingly important technology (Department of Health, 2002d). It borrowed from techniques outlined in the growing field of risk communication, amid criticism from sociologists that the Department’s strategy had hitherto been too proscriptive and inflexible (Ramsay, 2002; Alaszewski and Horlick-Jones, 2003). The site’s URL appeared in contemporary news reports, particularly in the British Medical Journal (Muminovic, 2002).

The Internet Archive provided an opportunity to analyse what information the Department felt was important and to see whether and how these messages changed over time. The news section, for example, provided regular updates on stories that reaffirmed the government’s MMR message (Department of Health, 2002c). A world map showed which countries used MMR and the varying rates of infectious disease between them. The Department showed that most high-income nations used the vaccine, emphasizing its safety record – especially in relation to the outbreaks of diseases such as measles in low-income countries. Statistical models showed what could happen if vaccination rates dropped too far. An interactive map allowed users to scan the globe, though unfortunately the Wayback Machine has not preserved this Flash application (Department of Health, 2002e). A ‘myths and truths’ section refuted the main claims made against MMR, especially the Wakefield et al. (1998) article. This ‘quick reference’ section was supported by a much longer page about MMR, why the government preferred it over other public health measures and the

Figure 31.2 ‘MMR The Facts’ – front page captured 8 September 2002.
Source: Department of Health (2002d).

importance of achieving a high uptake among the entire population (Department of Health, 2002b). Other than the ‘news’ section, these pages appear to have remained static for the lifespan of the website. A more dynamic and interactive page came in the form of ‘your questions answered’ (Figure 31.3). If parents felt the website did not answer their questions sufficiently, they could complete a webform with a specific question. This was then forwarded to an ‘expert’ within the Department of Health who would provide an answer. The Wayback Machine has archived 40 of these questions, and they covered a range of topics from specific enquiries about allergies to requests for more empirical data (Department of Health, 2002f).

Related to this website was a specific page dedicated to health professionals. Written in a more technical format, it provided a comprehensive summary of the sort of questions parents were likely to ask and the reasons why the Department believed MMR was still the best course of action for children. While MMR The Facts could perform this role too, this concise, detailed page was held on the Department of Health’s (2002a) own domain. The Department felt this necessary because it was clear that health workers were members of the public too. They had also been affected by constant media coverage and were themselves unsure about the precise reasons behind the MMR programme. The official ‘Green Book’ on vaccination was updated once every few years, but was only available in hard copy and could not adapt quickly to the changing discourse around MMR and vaccine technology.3 General practitioners and nurses were known to have the greatest impact in educating and changing parents’ behaviour. Therefore, it was important that information was made available through the

Figure 31.3 ‘MMR The Facts – Your Questions Answered’ – Captured 3 December 2002.
Source: Department of Health (2002b).

most up-to-date and timely medium (Petrovic et al., 2001; British Medical Journal, 2002).

It is interesting that the sites appear to have been maintained for a relatively short amount of time. After 2004, the copyright information on MMR The Facts was no longer updated; and when the Department of Health’s domain changed from ‘doh.gov.uk’ to ‘dh.gov.uk’, the MMR briefing page did not migrate with it. It was replaced with a generic ‘immunisation’ section, with a link to MMR The Facts for anyone who wanted further information (Department of Health, 2004). Triangulating this with other sources, we can begin to answer why this might be so. The core period of press activity over MMR began in 1998 and ended in 2004. By the time of the domain switch, the Department was confident that it had convinced the majority of parents to vaccinate their children. MMR The Facts had served a useful purpose and would continue to do so. However, the remarkable drop-off in articles on MMR in The Lancet, the British Medical Journal and the general press after 2004 (Speers and Lewis, 2004) shows that the government no longer felt the need to publicize MMR’s benefits as forcefully to health care professionals. It is through all these sources of information that these archived websites help us to tell the story of the government’s education practices during the MMR crisis.

UNC.EDU/~APHILLIP/WWW/VACCINE/DVM.TXT

While MMR The Facts was the most directly useful website for analysing government policy and educational strategies in the 2000s, the Internet Archive also retained documents that gave a sense of the vaccine-sceptic material circulating during the crisis. A correspondent to the British Medical Journal (Selway, 1998) drew readers’ attention to an American report on the dangers of vaccination (Phillips, 1999).4 Hosted on a university server and written like an official medical journal article, it had been shared around Selway’s non-medical friends. Its impact on the general public is unclear. We only have the author’s own claims to its importance from his present-day personal website (Phillips, 2017).5 However, as an individual document it can tell us more about anti-vaccination campaigning and its place within the wider history of the movement. Historians of the nineteenth and twentieth centuries have shown that there were multiple objections to vaccination in Britain and other countries (Durbach, 2005; Colgrove, 2006). Even today, parents have a wide range of beliefs, which are flexible depending on the vaccine in question, attitudes towards infectious disease and trust in medical and state authorities (Larson et al., 2014). By employing the techniques of contemporary scientific literature, Phillips’ report shares many characteristics with the pamphleteers of the previous century. Whether or not Phillips himself was influential, this was a good example of the sort of literature that public health workers believed was circulating among parents. There are limits to how far we can generalize about all anti-vaccination literature at this time, but it certainly fits within what else we know about the longer history of the movement.

Through triangulation with other sources, Phillips’ report also becomes useful as a window onto the attitudes of the general public. Contemporary newspaper reports in the UK may not have cited it, but they amplified similar concerns – such as the claim that safety testing was inadequate, the diseases being prevented were not particularly harmful or that the authorities might be hiding key evidence of harm (Speers and Lewis, 2004). Many of these were then refuted in the ‘myths’ section of MMR The Facts (Department of Health, 2002b) (Figure 31.4). Attitudinal surveys from the early 2000s show that these concerns were present, even if not all parents fully believed them (Evans et al., 2001; Hargreaves et al., 2003). Other

Figure 31.4 ‘MMR The Facts – Myths and Truths’ – Captured 19 October 2002.
Source: Department of Health (2002f).

health crises from the late twentieth century had reduced faith in medical authorities, meaning that these concerns about MMR had become more believable. The most prominent of these was the bovine spongiform encephalopathy (BSE) scandal (Speers and Lewis, 2004), but there was also adverse publicity over heart surgery in Bristol (Kennedy, 2001), the removal of organs from deceased children without parental permission in Liverpool (Redfern, 2001) and the tainted blood scandal in which haemophiliacs were infected with blood-borne diseases (Dyer, 2001). When Phillips questioned whether ‘public health officials always place health above other concerns’ in his report (Phillips, 1999), there were reasons why this might resonate with a British audience.

We can also place the author in context because he has remained an active vaccine sceptic and has maintained his presence on the World Wide Web. The page on the University of North Carolina (UNC) server has long gone. Yet his personal site (Phillips, 2017) carries a curriculum vitae. We learn through this and the report itself that Phillips’ own child had become injured after receiving vaccination. He had worked at UNC as an IT technician, before studying law and becoming an attorney specializing in securing exemptions for parents with conscientious objections to vaccination. This helps to explain why the author was motivated to write and why he had the skills to write an official-sounding report and build a convincing argument. This made him similar to other charismatic campaigners for vaccination reform in the UK, such as Rosemary Fox and Jackie Fletcher, and helps historians to put these debates into a wider narrative of continuity and change over time.6 These transnational comparisons between Britain and

the United States are fruitful for those trying to understand how lay scientific information spread across borders during the internet age. Thus, while the report is not a central part of the crisis in its own right, it becomes a jumping-off point for asking further questions about the social and cultural history of vaccination. This is possible through using the documents retrieved in the Internet Archive, subjecting them to close reading and placing them in the context of other primary and secondary sources.

RMPLC.CO.UK

The final example is a useful reminder of how information can be lost. Web archives and search tools can be employed forensically to track down documents and information about events that might otherwise be obscured. Soon after Wakefield’s press conference, a health worker named Rouse (1998) wrote to The Lancet to question Wakefield’s interests. He noted that ‘a simple internet search … quickly found the Society for the Autistically Handicapped’, and that this site contained a report from lawyers building a case against the Department of Health that clearly showed Wakefield’s involvement. This evidence appeared to show that funding for Wakefield’s research had come partly through Legal Aid Board (later the Legal Services Commission) money which had been used to build the case on behalf of parents of autistic children. Brian Deer (2004) would later widely publicize such conflicts of interest, and they formed part of the judgment handed down by the General Medical Council (2010) that stripped Wakefield of his licence to practice. That this information seemed to be readily available in 1998 raises questions. If this was well known, why was it apparently ignored by the majority of the media criticizing government MMR policy? Similarly, while the medical press was quick to dismantle Wakefield’s scientific claims (Chen and DeStefano, 1998), they had been mostly quiet about this and other potential interests (such as Wakefield’s ownership of a patent on an alternative measles vaccine).

The URL that Rouse provided was not in the Wayback Machine. At the time of my study, searching was not possible – and even a subsequent search on the new interface did not produce any results. The domain – ‘mplc.co.uk’ – was owned by a web services company (e-crew, 2004). To see if this was an error, I searched ‘The Society for the Autistically Handicapped’ (in quotation marks) with the British Library’s SHINE interface. The Society appeared to be British, and since the quoted URL was on the .uk top-level domain I hoped that the site would be indexed. Limiting the results to 1999 (the earliest year with any results for the search string) returned the site. It was hosted on ‘rmplc.co.uk’. A simple spelling mistake had left the site invisible. Without a search engine like SHINE, it would have remained so. The report (Society for the Autistically Handicapped, 2000) could only be found on a successor website on a different domain. Nonetheless, it bore out Rouse’s claims. Written by lawyers known to have been involved in the case (including Richard Barr: Barr, 2002; Deer, 2004), Wakefield appears as one of the experts to whom parents are encouraged to listen. The lawyers openly claim to be working with Wakefield, citing his research and the court case.

This page is not crucial to the MMR story in itself, nor was it essential in the proceedings against Wakefield. Rouse’s claims were subsequently made elsewhere in the printed and digital press, and there was enough evidence for Wakefield to be struck off the medical register. Yet, for social historians wishing to dig deeper into the ways in which voluntary organizations fought against the government, these sorts of pages are useful sources. This page – and presumably dozens like it – would have been lost without search tools. The use of printed URLs as a method of selecting documents is not sufficient in

itself. Broken links may require further, manual searching before the cited information can be discovered.

CONCLUSIONS

This review of the MMR crisis presents a partial story. The use of printed URLs in other primary sources necessarily skewed which websites were discoverable and, in turn, which websites were included in this history of MMR. Inevitably, the views of the Department of Health received far more attention than those of other actors. As with any documentary history, evidence bases are biased towards not just those who had the power to create information, but those who have been able to maintain and archive that information for future scholars.

This survival bias is exacerbated with internet sources because of link rot and the archival practices of different organizations. Historians are acutely aware that information will be lost at the same time as so much more data are being stored than ever before (Rosenzweig, 2003). But we should not think that because documents from ‘official’ sources are more likely to survive, this guarantees access to them. Many of the materials from the General Medical Council, for example, are no longer available on the live web. Even for government agencies such as the Legal Services Commission, the press releases around MMR are not available through The National Archives’ collection of government websites. Instead, they were found through the comprehensive online archive maintained by Brian Deer (2017).7 It is not difficult to see why this is problematic. While Deer has taken care to declare the provenance of his documents, the material he presents is undeniably biased towards his particular investigation into Wakefield. There is plenty of other material on MMR that will not and cannot be included in such an archive, and so the research questions we can ask of it will be necessarily limited. We do not have extensive reactions from parents or other members of the public. Anti-government material (such as that written by Wakefield and his colleagues), is only included as it pertains to Deer’s investigations. Without better archives with better discovery tools – both in terms of the digital technology and methodological techniques for storing and searching data – our histories will continue to skew towards those sources we find easiest to get our hands on (Hitchcock, 2013).

Out of the thousands of scholars who consider themselves historians, very few work with web archives. Historians are often conservative about how close to the present they can work and the value of ‘historical distance’. In Britain, the end date for projects has usually coincided with the ‘30-year rule’ (referencing the fact that most official procedural documents are released in The National Archives 30 years after creation). The rise in Freedom of Information, digitized official reports and accessible web archives, however, is giving us much more material to work with from the near-past. Combine this with the passage of time, and it will not be long before there will be a sizable cohort of historians of 1990s and 2000s Britain. How will they make use of internet archives? Undoubtedly, my research was aided by my experiences with BUDDAH and attendance at conferences on web archiving. Yet this is the sum total of my ‘training’ on the subject; a product both of muddling through by getting my hands dirty with the material and making copious mistakes. Before I started with MMR, I was aware of the limits of what I would be able to read, of what conclusions could be drawn from the way the archiving process works, and what sorts of questions would be answerable. This has resulted in a limited foray into the archive, but not an unsubstantial one. Without even this basic awareness, would other social or policy historians be able to access such material and use it in a productive way alongside their other sources? Are historians even aware that web archives are

a potential trove of information? It could be that I am isolated in my particular networks, but conversations with colleagues in the field of contemporary British history suggest that there is an acute lack of awareness. As events around the turn of the millennium will become more and more relevant to historical inquiry, it is essential that those working with web archives pass on their experiences.

I am aware there are limits to what this project can say about the UK’s MMR crisis. There is much more that could be done with the Internet Archive to shed light on other voices, especially in conjunction with deeper readings of other media and oral history techniques. As an adjunct to my main project, however, the archive has proved particularly fruitful. It helped me to answer my specific research questions on government policy. While I knew of the existence of MMR The Facts, it was only through reading and analysing the material contained within it that the Department of Health’s educational tactics became more visible. For historians who still work with ‘small’ rather than ‘big’ data, Alan Phillips’ and the Society for the Autistically Handicapped’s pages give us insights into how individuals and voluntary organizations embraced the new technology of the World Wide Web to communicate their positions. It will be up to future projects and historians to get the most out of this and the other material still waiting to be analysed in the archive. One potential criticism of close reading approaches to this material is that we may place too much emphasis on the documents that we can find at the expense of the wider picture.8 Again, this is where triangulation can help. While it is difficult to measure the precise audience or the impact that any one document had, the fact that these pages were referenced in leading medical journals gives us some indication that certain publics had begun to take notice. Further work with link analysis and linguistic analyses of larger corpora may be able to ask different questions of this material and give us wider insights into the crisis.

Notes

1  This work was conducted at the London School of Hygiene & Tropical Medicine through the Placing the Public in Public Health Wellcome Investigator award project led by Alex Mold (grant number WT-100586-Z-12-Z). This chapter was originally prepared as a paper for the RESAW conference at the University of London, held in July 2017. The author would like to thank all those who attended and provided valuable feedback.
2  Web archives often contain duplicates of the same document, or files with such small changes they are effectively the same. Similarly, archives can only collect what crawlers are able to access and store, meaning large sections of the web have almost certainly not been catalogued. We have no way of telling what or how much. This is just the beginning. That said, even when these are well-curated they can cause significant problems that historians do not always confront. Research questions are often biased to what information is available and how easy it is to work with. As a result, Hitchcock (2013) worries that there may be an over-reliance on the criminal records of the Old Bailey in London, depriving us of important potential projects and conclusions about other parts of the United Kingdom.
3  For the latest version of this – which is now available online – see Public Health England (2013).
4  Although the document states that it was last revised in 1997, the first available copy through the Wayback Machine of the report on Phillips’ archived site is from May 1999.
5  Phillips (2017) claims that the report was ‘published around the world and translated into Russian, Chinese, and several European languages in the late 1990’s [sic]. It has been used in medical school classrooms in three different countries, appears on websites around the world, and has appeared in publications including an Australian grassroots newsletter, Indian homeopathic journals, the Hindustan Times, and American and European magazines’.
6  Rosemary Fox (2006) led a campaign for changes in vaccination compensation law in Britain in the 1970s. Although not a professionally trained campaigner or academic, her daughter Helen had been diagnosed with brain damage following a polio vaccination in the 1960s. Jackie Fletcher led a group called Justice Awareness and Basic Support (2001) during the MMR crisis, and campaigned for a court case against the Department of Health and more informed choice for parents. Her son had developed epilepsy after receiving MMR.
7  The Legal Services Commission (LSC) withdrew legal aid funding for the case against the Department of Health in October 2003. The first instance of the LSC’s website in The National Archives is December 2003, and the press release is not accessible. Brian Deer holds a copy, which can be corroborated through the Wayback Machine (Legal Services Commission, 2003, 2004; Deer, 2017).
8  Although it would be simplistic to suggest that this is not an inherent problem with any source base. Selection is inevitable in any historical investigation, and subject to the biases of the researcher. It is only through being open about our methodologies that we can put historical claims into appropriate context.

REFERENCES

Abbate, J. (2017) ‘What and where is the Internet? (Re)defining Internet histories’, Internet Histories, 1(1–2): 8–14.
Ackland, R., and Evans, A. (2017) ‘Using the web to examine the evolution of the abortion debate in Australia, 2005–2015’, in Niels Brügger and Ralph Schroeder (eds), The Web as History. London: UCL Press. pp. 159–89.
Alaszewski, A., and Horlick-Jones, T. (2003) ‘How can doctors communicate information about risk more effectively?’, British Medical Journal, 326(7417): 728–31.
Badenoch, J. (1988) ‘Big bang for vaccination’, British Medical Journal, 297(6651): 750–1.
Barr, R. (2002) ‘Autism, bowel inflammation, and measles’, The Lancet, 359(9323): 2112–3.
Berger, A. (2004) ‘Dispatches. MMR: What they didn’t tell you’, British Medical Journal, 329: 1293.
Big UK Domain Data for the Arts and Humanities (BUDDAH) (2015) Bursaries. (https://buddah.projects.history.ac.uk/news/bursaries/). Retrieved 29 July 2017.
British Medical Journal (2002) ‘Letters: Health professionals’ attitudes to MMR vaccine’, British Medical Journal, 322(7294): 1120–1.
British Library (2017) UK Web Archive, SHINE (https://www.webarchive.org.uk/shine). Retrieved 24 July 2017.
Brügger, N. (2008) ‘The archived website and website philology: A new type of historical document?’, Nordicom Review, 29(2): 155–75.
Campbell, H., Edwards, A., Letley, L., Bedford, H., Ramsay, M., and Yarwood, J. (2017) ‘Changing attitudes to childhood immunisation in English parents’, Vaccine, 35(22): 2979–85.
Chen, R.T., and DeStefano, F. (1998) ‘Vaccine adverse events: Causal or coincidental?’, The Lancet, 351(9103): 611–2.
Colgrove, J. (2006) State of Immunity: The Politics of Vaccination in Twentieth-Century America. Berkeley: University of California Press.
Deer, B. (2004) ‘MMR: The truth behind the crisis’, Sunday Times, 22 February: 12.
Deer, B. (2017) Taxpayer cash for MMR action is stopped after £15m that stoked fear was spent (http://briandeer.com/mmr/lancet-lsc.htm). Retrieved 28 July 2017.
Department of Health (2002a) Measles, Mumps and Rubella Vaccine (MMR) (http://web.archive.org/web/20021213092753/http:/www.doh.gov.uk/mmr/). Retrieved 28 July 2017.
Department of Health (2002b) MMR: Myths and Truths (https://web.archive.org/web/20021019073221/http:/www.mmrthefacts.nhs.uk:80/basics/truths.php). Retrieved 28 July 2017.
Department of Health (2002c) MMR News (https://web.archive.org/web/20021202234730/http://www.mmrthefacts.nhs.uk:80/news/). Retrieved 28 July 2017.
Department of Health (2002d) MMR The Facts (https://web.archive.org/web/20020908120649/http://www.mmrthefacts.nhs.uk/). Retrieved 28 July 2017.
Department of Health (2002e) MMR World Map (https://web.archive.org/web/20021214014604/http://www.mmrthefacts.nhs.uk:80/worldmap/). Retrieved 28 July 2017.
Department of Health (2002f) Your Questions Answered (https://web.archive.org/web/20021203001027/http://www.mmrthefacts.nhs.uk:80/questions/). Retrieved 28 July 2017.
Department of Health (2004) Immunisation (http://web.archive.org/web/20040301041356/http://www.dh.gov.uk:80/PolicyAndGuidance/HealthAndSocialCareTopics/Immunisation/fs/en). Retrieved 28 July 2017.

Department of Health and Social Security (1987) Promoting Better Health: The Government’s Programme for Improving Primary Care. London: HMSO.
Deswarte, R. (2015) ‘Revealing British Euroscepticism in the UK Web Domain and Archive Case Study’. London: Institute of Historical Research (http://sas-space.sas.ac.uk/6103/#undefined).
Dobson, R. (2002) ‘Parents’ champion or loose cannon?’, British Medical Journal, 324(7334): 386.
Durbach, N. (2005) Bodily Matters: The Anti-vaccination Movement in England, 1853–1907. Durham: Duke University Press.
Dyer, C. (2001) ‘NHS told to pay £10m to patients infected with hepatitis C’, British Medical Journal, 322(7289): 751.
Eaton, L. (2009) ‘Measles cases in England and Wales rise sharply in 2008’, British Medical Journal, 338: b533.
e-crew (2004) Holding page (http://web.archive.org/web/20040330011552/http://www.mplc.co.uk:80/). Retrieved 28 July 2017.
Evans, M., Stoddart, H., Condon, L., Freeman, E., Grizzell, M., and Mullen, R. (2001) ‘Parents’ perspectives on the MMR immunisation: A focus group study’, The British Journal of General Practice, 51(472): 904–10.
Fitzpatrick, M. (2004) MMR and Autism: What Parents Need to Know. Abingdon: Routledge.
Fox, R. (2006) Helen’s Story. London: John Blake.
General Medical Council (2010) Dr Andrew Jeremy Wakefield: Determination on Serious Professional Misconduct (SPM) and Sanction. London: General Medical Council.
Gorsky, M. (2015) ‘Into the dark domain: The UK Web Archive as a source for the contemporary history of public health’, Social History of Medicine, 28(3): 596–616.
Hargreaves, I., Lewis, J., and Speers, T. (2003) Towards a Better Map: Science, the Public and the Media. London: Economic and Social Research Council.
Health and Social Care Information Centre (2005) ‘NHS immunisation statistics: England, 2004–05’. London: National Statistics.
Hitchcock, T. (2013) ‘Confronting the digital: Or how academic history writing lost the plot’, Cultural and Social History, 10(1): 9–23.
Horton, R. (2004) MMR: Science and Fiction – Exploring a Vaccine Crisis. London: Granta Books.
Internet Archive (2017) The Wayback Machine (https://archive.org/web/). Retrieved 29 July 2017.
Justice Awareness and Basic Support (2001) Fair warning (http://web.archive.org/web/20011025144125/http://www.jabs.org.uk:80/jabsinformation.htm). Retrieved 28 July 2017.
Keenan, A., Ghebrehewet, S., Vivancos, R., MacPherson, P., and Hungerford, D. (2017) ‘Measles outbreaks in the UK, is it when and where rather than if? A database cohort study of childhood population susceptibility in Liverpool, UK’, BMJ Open, 7(3): e014106.
Kennedy, I. (2001) The Report of the Public Inquiry into Children’s Heart Surgery at the Bristol Royal Infirmary 1984–1995: Learning from Bristol. London: TSO.
Larson, H., Jarrett, C., Eckersberger, E., Smith, D., and Paterson, P. (2014) ‘Understanding vaccine hesitancy around vaccines and vaccination from a global perspective: A systematic review of published literature, 2007–2012’, Vaccine, 32(19): 2150–9.
Larson, H., Figueiredo, A. de, Xiahong, Z., Schulz, W.S., Verger, P., Johnston, I.G., Crook, A.R., and Jones, N.S. (2016) ‘The state of vaccine confidence 2016: Global insights through a 67-country survey’, EBioMedicine. DOI: 10.1016/j.ebiom.2016.08.042
Legal Services Commission (2003) Decision to remove funding for MMR litigation upheld on appeal (https://web.archive.org/web/20031010112531/http://www.legalservices.gov.uk/misl/news/press/press-13-03.htm). Retrieved 28 July 2017.
Legal Services Commission (2004) Legal Services Commission (http://webarchive.nationalarchives.gov.uk/20040105011642/http://www.legalservices.gov.uk/). Retrieved 28 July 2017.
McCartney, M. (2013) ‘MMR, measles and the South Wales Evening Post’, British Medical Journal, 346: f2598.
Millward, G. (2015) ‘Digital barriers and the accessible web: Disabled people, information and the internet’. London: Institute of

Historical Research (http://sas-space.sas.ac.uk/6104/#undefined).
Muminovic, M. (2002) ‘MMR’, British Medical Journal, 325(7364): 604.
Murch, S.H., Anthony, A., Casson, D.H., Malik, M., Berelowitz, M., Dhillon, A.P., Thomson, M.A., Valentine, A., Davies, S.E., and Walker-Smith, J. (2004) ‘Retraction of an interpretation’, The Lancet, 363(9411): 750.
Petrovic, M., Roberts, R., and Ramsay, M. (2001) ‘Second dose of measles, mumps, and rubella vaccine: Questionnaire survey of health professionals’, British Medical Journal, 322(7278): 82–5.
Phillips, A. (1999) Dispelling Vaccination Myths (https://web.archive.org/web/19990503175555/http:/www.unc.edu:80/~aphillip/www/vaccine/dvm.txt). Retrieved 28 July 2017.
Phillips, A. (2017) Allan Phillips, J.D. Attorney and Counselor at Law (http://www.vaccinerights.com/attorneyphillips.html). Retrieved 28 July 2017.
Public Health England (2013) Immunisation against Infectious Disease. London: Public Health England (2nd ed.) (https://www.gov.uk/government/collections/immunisation-against-infectious-disease-the-green-book). Retrieved 21 May 2015.
Ramsay, S. (2001) ‘UK starts campaign to reassure parents about MMR-vaccine safety’, The Lancet, 357(9252): 290.
Ramsay, S. (2002) ‘UK government tries to control MMR panic’, The Lancet, 359(9306): 590.
Ramsay, M.E., Yarwood, J., Lewis, D., Campbell, H., and White, J.M. (2002) ‘Parental confidence in measles, mumps and rubella vaccine: Evidence from vaccine coverage and attitudinal surveys’, The British Journal of General Practice, 52(484): 912–6.
Redfern, M. (2001) The Royal Liverpool Children’s Inquiry Report. London: TSO.
Rosenzweig, R. (2003) ‘Scarcity or abundance? Preserving the past in a digital era’, American Historical Review, 108(3): 735–62.
Rouse, A. (1998) ‘Autism, inflammatory bowel disease and MMR vaccine’, The Lancet, 351(9112): 1356.
Russell, A.L. (2017) ‘Hagiography, revisionism & blasphemy in Internet histories’, Internet Histories, 1(1–2): 15–25.
Selway, J. (1998) ‘MMR vaccination and autism 1998’, British Medical Journal, 316(7147): 1824.
Society for the Autistically Handicapped (2000) Vaccines fact sheet (http://web.archive.org/web/20000902060843/http://www.autismuk.com:80/index1sub4.htm). Retrieved 28 July 2017.
Speers, T., and Lewis, J. (2004) ‘Journalists and jabs: Media coverage of the MMR vaccine’, Communication & Medicine, 1(2): 171–81.
Turner, F. (2017) ‘Can we write a cultural history of the Internet? If so, how?’, Internet Histories, 1(1–2): 39–46.
UNdata (2017) Internet users (per 100 people) (http://data.un.org/Data.aspx?d=WDI&f=Indicator_Code%3AIT.NET.USER.P2). Retrieved 22 June 2017.
Wakefield, A.J., Murch, S.H., Anthony, A., Linnell, J., Casson, D.M., Malik, M., Berelowitz, M., Dhillon, A.P., Thomson, M.A., Harvey, P., Valentine, A., Davies, S.E., and Walker-Smith, J.A. (1998) ‘RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children’, The Lancet, 351(9103): 637–41.
Winters, J. (2017) ‘Breaking into the mainstream: Demonstrating the value of internet (and web) histories’, Internet Histories, 1(1–2): 173–9.
Zittrain, J., Albert, K., and Lessig, L. (2014) ‘Perma: Scoping and addressing the problem of link and reference rot in legal citations’, Legal Information Management, 14(2): 88–99.
32
Religion and Web History
Peter Webster

INTRODUCTION

It was the English explorer Sir Walter Raleigh who wrote that ‘whosoever in writing a modern history, shall follow truth too near the heels, it may haply strike out his teeth’ (Raleigh, 1614: preface). Raleigh’s remark, written in 1614 and often quoted since, showed the danger for the historian in writing about the very recent past. For historians of contemporary religion, the difficulties are greater still, in that we follow close on the heels not only of the present, but of scholars in several other disciplines who study religions in something close to real time. The literature on the phenomenon of religion in computer-mediated contexts is now very large, having built up over two decades. That literature is also produced both in, and in the spaces between, more than one discipline: Internet Studies, which concerns itself with the nature of the medium (see the survey by Campbell, 2011); the sociology of religion (for example, Cheruvallil-Contractor and Shakkour, 2016); and from scholars of religious studies concerned in particular with the relationship between religion and the media in general (Beckerlegge, 2001; Mitchell and Marriage, 2003). The disciplinary labels vary between countries, but however it is named, little of this writing concerns itself directly with the kind of questions that most preoccupy historians, although in time it will itself become the raw material for historical study, much as the pioneering religious social science now is for historians of the 1960s.1

Given both the size and the methodological eclecticism of the literature to hand, this chapter can make no claim to exhaustiveness. Rather, its first half attends to some debates of particular historical and methodological note with which the emerging history of religions on the Web may fruitfully be brought into conversation. These include debates concerning both the Web itself as a technological system, and religious responses to technological change in general. It also sets out some points of contact between Web history and

three key themes in contemporary religious history: secularisation; religious radicalism; and the place of religion in civic life and the law. It also argues for a fresh integration of the Web, and the archived Web in particular, with the study of offline religion, in pursuit of an ideal state in which the archived Web is merely one of many kinds of primary sources with which historians work.

The second half then takes a fourfold schema of different aspects of religions as they may be studied, setting out an agenda for future Web history research on each aspect. The references cited throughout the chapter are indicative and exemplary of particular questions and approaches, rather than specific recommendations of this or that piece of work in its substantive claims. The chapter as a whole takes its examples predominantly from the Christian tradition, but would assert that the picture it paints has a more general applicability.

RELIGIOUS VIEWS OF THE WEB AS A SYSTEM

Both the Internet and the Web have been screens onto which all manner of cultural and social aspirations and fears, both utopian and dystopian, have been projected, both strictly relating to religion and more widely. The Internet has been feted as a great disruptor: a solvent of established privilege and the outlet for previously marginal opinions, and a liberator of suppressed creative energy, in politics, commerce and the arts. It has equally well been denounced as the harbour of criminality, the accelerator of falsehood, the destroyer of traditional industries, communities, languages and cultures (Krotoski, 2014). Whilst these twin discourses of technological utopia and dystopia have been widespread, religious thinkers have themselves also adopted versions of both (an example of the latter is Shallis, 1984). To a degree, a utopian accent was also audible in some of the early scholarship in Internet Studies in general, and on the religious Web in particular (Hoejsgaard and Warburg, 2005; Wellman, 2011: 18–9). The degree to which both crazes and moral panics are a natural accompaniment to far-reaching technological change need not detain us here. But both positive and negative discourses of the Web have been expressed in both implicit and explicit theological, or at least ethical and moral, terms. If we are to understand the engagement of religious people and institutions with the Web, we must attend to the history of the paradigms in which they conceive of the Web as a system.

This task would be relatively straightforward were it not for a significant strand of secular thinking that viewed both the Internet and the Web in quasi-mystical terms as technologies; a trend which owed something to the Christian and post-Christian culture of the West out of which it grew (see, for instance, Davis, 1998, passim). This particular phenomenon is also related to what David E. Nye dubbed the ‘technological sublime’, in which new technologies become objects of awe and wonder (Nye, 1994; see also Mosco, 2004). The term cyberspace – a spatial metaphor where none was necessarily required by the intrinsic nature of the technology – has indirectly given occasion to considerable philosophical, not to say mystical, reflection on the online as an alternative plane of existence, even as reflective of the nature of God: a fifth dimension, in a sense (Burke, 2016: 158; Cobb, 1998). These understandings are ripe for historical examination (for which, see Webster, 2018b), but here one observes simply that to understand the history of the idea of the Web must involve attending to these discourses, their origins and their mutations over time.

As well as understanding religious and quasi-religious paradigms of the nature of the Web as a system, historians have also the task of attending to what might be termed the ‘religious Web’. I define this as the sum of the activities of, and interactions between, religious institutions and those individuals who articulate or act out their own religious

faith online. (An alternative designation might be the ‘religious Web sphere’.) Historians of this religious Web need also to attend to the ethical and philosophical frameworks in which religious people have tended to view their specific activities online. To take Christians in Europe and North America as an example, some of the early dystopian visions of the computerised future were not the product of mere ignorance and fear of change but of concerns about key ethical issues. To question the economic effects of the Web on certain industries or on certain countries was to make a statement about economic justice based on much older Christian understandings of neighbourliness between citizens and between nations. To express concern about the distancing effect of computer mediation on human relationships was to make a statement about the nature of the human person and the relational nature of God himself (see, for instance, Houston, 1998; Lochhead, 1997; Webster, 2018b). Concerns about the effect of anonymity on online behaviour were intimately connected with much older concerns about the spiritually damaging effect of dissimulation on the individual: of portraying oneself as someone other than the person visible to God. The rationalisation by American evangelicals observed by Kelsy Burke – that anonymity was acceptable since God himself was witness to their conduct online – bears comparison with the thinking of Protestants under threat of persecution and death in Catholic countries in the Reformation period (Burke, 2016: 158; Dixon, 1997: 144–9; Pettegree, 1996: 85–96). These various debates about a cluster of technologies and their social and economic effects are as much a part of the history of the religious Web as individual sites and pages.

THE RELATIONSHIP OF ONLINE AND OFFLINE

Some early studies of religion on the Web were marked by a concentration of attention on the online alone: what one scholar has referred to as ‘immanent Internet analysis’ (Krüger, 2005: 17). This was unsurprising as scholars came to terms with the nature of the medium and the new methods required to study it. More recently there has been greater attention paid to the interaction between online and offline. Recent studies have attended to the relationship between the two in theory, and in particular to the direction in which influence is exerted: the degree to which offline roles and organisations are reflected online, and (conversely) the degree to which the online experience changes the expectations that religious people have of their face-to-face relationships with others (for example, Becker, 2011; Lundby, 2011).

At the same time, more empirical studies have employed an approach that uses the evidence of the Web in combination with other sources, such as direct observation and interview evidence with key actors (Burke, 2016; Krüger, 2005). These have been accompanied by calls for a multi-modal approach, or (to translate the matter into historical terms), the integrated study of many classes of primary source (Cheong et al., 2016). This is to be welcomed, and when viewed in terms of the history of the discipline of history, parallels the methodological work done to incorporate new or neglected kinds of sources, such as broadcast media, recorded music or material culture simply as particular kinds of primary source amongst many (see, for example, Harvey, 2009).

In this respect, the study of history using the archived Web is a step behind again. The history of Web archiving itself is only 20 years long, and scholarly engagement with the archives produced younger still (Webster, 2017b). Given this, historians using Web archives are themselves in the process of understanding the nature of the material with which they must deal, and consequently have been less concerned with its integration with other kinds of sources. In this sense this body of work is at the same stage of development as that of scholars of online religion,
482 THE SAGE HANDBOOK OF WEB HISTORY

perhaps 15 years ago. The (so far) small number of studies of religious history using Web archives have begun to retrieve the element of change over time that is often missing from other disciplines, but without yet an integration with other source types, in some cases explicitly so (Hofheinz, 2010: 106; Webster, 2017a: 202). Use of the historic Web which is both diachronic and multi-modal must be the aim; only at that point will the enterprise look like historical research as is commonly understood.

LONGER-RANGE HISTORICAL QUESTIONS

For religious historians there is another and rather larger task at hand: to relate the development of religion on the Web to larger and longer-range questions of religious history. This presents problems, since very few historians of religion have begun serious study of periods more recent than the late 1980s. As such, what follows is a projection forward in time of three particular historical debates currently underway regarding the 1960s, 1970s and 1980s which are germane to the more recent past. The examples given here are particularly pertinent in European and North American contexts, and historians of the non-Western world will have different sets of problems and preoccupations to which to refer, but certain similarities may be observed.

Perhaps pre-eminent is the continuing legacy of secularisation. Religious historians are still deeply engaged in understanding how European, North American and Australasian nations – once religiously homogenous with highly observant populations, socially powerful churches and legal systems which privileged Judeo-Christian moral principles – became religiously diverse societies, with secularised systems of law, increasingly non-religious populations and relatively insignificant churches. Secularisation theory in its classic form attributed great agency to economic, social and technological modernisation in the weakening of religious belief and adherence (for a summary in relation to the Web in particular, see Han, 2016: 1–11). In more recent years, greater stress has been placed not so much on economic and political change as on change in the terms of religious discourse, and also on the ways in which religious practice has not so much declined as mutated (see, inter alia, Brown, 2009, and Davie, 2000). Others have stressed the significant growth in religious practice outside the mainstream Christian denominations, and of the so-called New Religious Movements (Goodhew, 2012; Hunt, 2003). The present literature on online religion speaks both directly and obliquely to these concerns in particular contemporary contexts, but without beginning the work of understanding them in historical terms. Insofar as the Web as a technology may be understood as an agent of modernisation, how far can it be said to have been either inimical to or supportive of religious belief and practice as understood before the Web came about? Understanding to what degree the religious Web is a reflection of social changes occurring offline according to their own logic, as opposed to (or as well as) being an agent in fostering, shaping or hindering those changes, is perhaps the major challenge for contemporary religious historians of the next generation.

Secondly, the events of 9/11 may in time come to be seen as epoch-making and epoch-ending: the kind of marker with which historians divide the past into periods that may be examined and understood, such as 1789 or 1914. But the religious radicalisation of which the attacks on the United States and elsewhere in subsequent years were a symptom had a long pre-history, bound up in part with the wider ‘clash of civilisations’ identified by Samuel P. Huntington (1997). There was as well a specific history of the growth and strengthening of conservative elements in several of the world religions, intimately connected to the progress (or otherwise) of

secularisation, into which it is now time to begin placing the history of the Web (Herbert and Wolffe, 2005; McLeod, 2007: 207–12). Is it the case that the affordances of the Web have sharpened religious radicalism, as people more easily encounter those things to which they object? Alternatively, the story may be one of a pre-existing radicalism, growing according to its own logic, the communication of which is facilitated and made more visible by the Web. Only a reading of religious radicalism on the Web alongside its offline history will provide answers to these questions. (For an example of such work, see Howard, 2011.)

Thirdly, the lifetime of the Web has also coincided with a sharply renewed public debate concerning the place of religious faith, speech and practice in public life, in relation to the law in particular. In part this is a function of debates to which the growth of the Internet itself gave greater force: over censorship, freedom of speech and the idea of hate speech and its subjection to criminal or civil law penalties. But the questions are wider, taking in the right or otherwise of both religious and non-religious people to protection from religious offence; the place of religious buildings, rituals and symbolic objects in local communities; the right or otherwise to religious expression in the workplace. Each of these has its own histories in different cultures. In the Christian West these include the de-Christianisation of the secular law, which has unfolded piecemeal over several decades; the history of integration of immigrant communities and the contested urban spaces which they often inhabit; and the rise of more assertive atheist and non-religious voices, partly in reaction to the increased religious conservatism noted above (Amarasingam, 2010; Garnett and Harris, 2013; Webster, 2015a: 65–90). Just as with secularisation and the growth of conservatism, any history of religion and the Web must attend to the ways in which these debates are played out online, and the interplay of local, physically bounded events with their networked representations.

AN HISTORICAL MODEL OF RELIGIONS

The remainder of this chapter considers the recent history of religion on the Web under four heads. The first of these is doctrine and religious knowledge: the symbols and forms of words that describe the divine, the world, the human person and their interrelations. Second are religious organisations and their representatives (clerical or lay), which are mostly responsible for the codification and interpretation of doctrine and the framing and regulation of communal ritual. Third is religious practice: those communal and solitary activities of prayer, worship and other rituals through which religious people address the divine and represent their religion to each other and to others. Finally, the section on religions and the Other deals with all the various modes in which religious people and organisations encounter those outside: as potential proselytes, as discussion partners in debate about wider social and economic issues and as antagonists.

Doctrine

Most of the major world religions are textual to some significant degree. Sacred texts – some directly given by God, others created by humans acting in response to divine prompting – provide perhaps the principal way in which doctrine is fixed and transmitted. Surrounding sacred scripture is a wealth of commentary, exposition and critique, written variously by priestly figures, by those whose profession it is to think and write, and by those in the wider circle of believers. The earliest affordances of the Web were ideal for the reproduction of text; it was technically straightforward, and economic at a time when many users paid for their usage by the metering of data. Many of the earliest ventures of religious organisations onto the Web were in the reproduction of texts, both for study by the devout and in order to engage

others (to adopt the distinction made by Hutchings, 2016). In the case of the Bible, this built on pre-existing ventures in software designed for study, dating back as far as the 1950s.

When compared with the interactions between individuals (examined below under ‘Religious Practice’) these early ventures in a new form of religious publishing have attracted relatively little scholarly attention. This may have been in part because, for scholars of the Web as a technology, this represented a use of the medium of limited technical inventiveness. But opportunities abound, particularly using the archived Web, not only for historians of religion, but also of education, publishing, media and reading. From the very beginning, to what extent did the move online represent a duplication of print publishing or a replacement of it? What opposition did it arouse, and on what grounds? More recent years have seen the advent of social media accounts, notably on Twitter, publishing single verses of a sacred text at a time. Sociologists of religion have already asked questions about the effect this may have had on reading habits (Hutchings, 2016). Historians might also ask what relation such practices bear to the evangelical Christian culture of the Bible verse on a car bumper sticker, poster or bookmark (Harvey, 2008).

There are also questions, the asking of which the scale of Web archives allows. Scholars of the nineteenth century have made inventive use of corpora of digitised books and newspapers to reconstruct the flow of texts from publication to publication across great geographical distances (Beals, 2016). The archived Web now affords rich opportunities to observe the spread of religious texts from site to site, and their different framing and interpretation in each new publication context. There is an opportunity now to examine the ways in which religious organisations have sought to control the reuse of these texts, by use of copyright law or other means, and for which reasons.

As well as examining the lives sacred texts lead online, another fruitful area of inquiry is the effect the Web has had on the shaping of religious texts themselves: not so much ancient texts as theological and other writings which are still being produced. It seems likely that the content of preaching has been changed by the ready accessibility of a vast range of texts and other resources online (not least other sermons), previously unobtainable by, and perhaps also unknown to, local ministers. The time is also ripe for an historical assessment of the effect of the Web on the disciplines of theology and religious studies, both in terms of the availability of primary texts for research, but also in the field of Open Access publishing. It is likely that the present chapter could not have been written, or at least would have been very different, without being able to locate and then access scholarship on every religious tradition in many nations. One of the oldest online theological periodicals celebrated its 20th anniversary in 2013; a span of time which invites serious historical reflection (Keown and Prebish, 2013).

Organisations

A prominent theme in much of the scholarship so far has been that of the authority of religious institutions and their representatives. Studying the spirituality of Generation X in 1998, Tom Beaudoin identified a suspicion of institutions as one of its four key markers (Beaudoin, 1998: 56–60). Not unlike other institutions and industries being ‘disrupted’ by the Web, religious institutions were faced with the most fundamental of capitalist forces, a competition of ideas, and for attention and for loyalty. Intimately connected with broader narratives of the culture of the Web as ‘virtual communitarian’ and entrepreneurial (Castells, 2001: 36–63), part of the early utopian narrative was about a new religious freedom. This might be the freedom to experiment with new forms of

organisation whilst still in good standing with an existing institution, or to extend the range of activities of that institution to meet needs hitherto unmet (Burke, 2016, passim; Prebish, 2004: 145). It might equally include the freedom to found new religious movements entirely outside, and indeed possibly inimical to, existing institutions (Hoejsgaard, 2005).

Each of those aspects continues to merit investigation, but the literature has also seen a more recent turn to try to understand the organisations themselves: the degree to which organisations and people with authority within them adapted the way their authority is articulated and gains assent. Based on a case study from Chinese Buddhism, Cheong et al. (2016) outlined a phenomenon of ‘strategic arbitration’ in which authoritative figures were able to re-define their offline authority to include a role as ‘arbiters of knowledge and encounters’ online as well as offline. Heidi Campbell’s study of the blogs of American Christians stressed the degree to which bloggers were also already clergy or other authoritative figures in their local churches: here, authority offline is replicated online rather than overturned (Campbell, 2010).

Even this more nuanced examination of institutional adaptation has tended to be at the level of the individual. The growing availability of the archived Web at scale now opens up a new frontier: of investigating the online relationships within and between organisations and their evolution over time. To understand local and national organisations within the same denomination, historians have often examined the ways in which formal hierarchical relations between individual local congregations and the superstructure of their denomination have functioned, and how information, people and funds have circulated around the different parts of the structure (for an example of this kind of institutional history, see Chandler, 2006). The archived Web now offers the means to reconstruct the ‘Web sphere’ of each individual denomination: how individual congregations are connected to each other in a particular locality, within larger geographic units and with other nodes in larger national and international denominational networks that transcend locality (see Webster, 2018a; on the notion of a ‘Web sphere’, see Brügger, 2010).

A related theme in the religious history of the twentieth century amongst the Christian churches was the great hopes invested in co-operation, and indeed potential reunion, between the different denominations, reversing the great schisms of the Reformation period. Such questions were traditionally studied in terms of high-level diplomatic contact between denominations – the formulation of statements and the holding of conferences – but also in the face-to-face contact between Christians in particular localities (Power, 2007; Webster, 2015a: 21–48). The archived Web offers the opportunity to observe those ecumenical contacts within religious traditions – international, national and local – as they are replicated on or transferred to the Web.

Religious Practice

Of the four aspects of religion under examination here, perhaps the most attention from scholars of the Web has been paid to practice: what is it that religious people do when being religious online? Much of this work has been implicitly and explicitly ethnographic and indeed anthropological in approach, observing, as it were, the behaviour of an isolated tribe. This approach has its counterpart in offline research, such as the method employed by James Steven to investigate worship amongst English charismatic Christians (Steven, 2002: 42–54). Using a combination of participant observation and interviews, scholars have examined several different kinds of religious activities transferred to online spaces. Obvious continuities may be observed, such as those between the television or radio broadcasting of worship

(a feature of British life from the 1920s onwards) and live webcasting of the same. One of the earliest studies examined charismatic Christians ‘meeting’ in small groups for communal prayer in a virtual reality environment, a direct transference of a common practice into a new space (Schroeder et al., 1998). Becker (2011) examined the dynamics of Islamic rituals such as the shahada, or conversion ritual, in Dutch and German chatrooms. Particularly counter-intuitive, it might seem, has been the migration of pilgrimage online – a form of religious observance which traditionally drew its very validity from the sacrifices involved in physically travelling to a particular place. One example of cyber-pilgrimage is that to Croagh Patrick in the Republic of Ireland (in the Roman Catholic tradition), but many others could be mentioned (MacWilliams, 2004).

From the point of view of the historian, much of the discussion of the ‘success’ of such transference of offline activity to online is beside the point. Whether or not such transferred rituals fulfil the criteria for a valid and effectual religious observance must remain a question for theologians and liturgists. However, specific historical attention has turned towards changing expectations of what constitutes an ‘authentic’ religious experience (a related but different question to that of validity), and how the Christian churches in particular have responded to this perceived demand (Garnett et al., 2006). Prayer, corporate worship, pilgrimage, particular buildings and religious ‘events’ such as festivals all come into focus under this rubric, with a sharpened emphasis on the aesthetic and communal aspects of shared experience in a particular place. To what extent have both worshippers and religious organisations recognised online ritual as ‘authentic’, if the recent historical trend has been towards greater emphasis on physical presence?

As such online ritual comes into historical focus, the questions will rather be to do with how widespread these practices were, and how many believers engaged in them, as a proportion of the faith community as a whole. Did such rituals attract those who were not already habitually engaged in worship in a physical location? How did participants understand what they were doing, and how it related to their offline worship or pilgrimage? As such, a methodological shift will be required. The archived traces of the online rituals themselves offer some evidence. Whatever textual traces of comment from participants have been preserved will also be key, as will the contemporary investigations made by sociologists of religion and others. However, sustained oral history investigation will be essential to understand how and why people did so participate, and what they took from their participation. Direct engagement with site owners will also be required to secure access to statistics relating to usage. Offline published commentary and interview evidence from religious leaders and others will also be key to understanding the attitudes of religious professionals to these various migrations of practice online: were they a disruption, or an opportunity, or both?

Religions and the Other

Perhaps the most significant area in which the Internet has not simply provided spaces for the replication of existing religious activity, but allowed new kinds of activity, is in the contact between individual religious people of different faiths and with those who profess no religion. Before the Web, when examined internationally, the study of these interactions was necessarily confined to the high-level summitry between official representatives of religions, since these were by and large the only channels available for such contact (Webster, 2015b). It was also possible to study the representation of minority faiths in print and in mainstream media, but given their broadcast nature, to recover the reception of and reactions to those representations was relatively difficult. To recreate

the contact between individuals of different faiths and none was a task of very detailed local and oral history; rarely did a local encounter leave a significant documentary trace (one exception is documented by Maiden, 2015).

The Web in general, and social media in particular, have afforded the opportunity for individuals to engage directly with others, either as a means of fostering greater understanding or in a more antagonistic mode. There has been relatively little scholarly attention paid to this so far, perhaps because the contacts between individuals of opposed views are harder to identify in the mass than are communities of the like-minded. However, the work of Stephen Pihlaja on the Christian-atheist antagonism on YouTube suggests a fruitful approach to the small-scale interactions of individuals, which might be replicated using archived blog comment threads and other discussion fora (Pihlaja, 2016).

We noted above the opportunity to begin to investigate the growth and change of religious organisations as reflected in the archived Web. Another particular opportunity for scholars that the archived Web affords is understanding the way in which religious organisations, large and small (as distinct from individuals) have interacted online with other organisations. The recent study by Ackland and Evans (2017) on the debate over abortion in Australia is instructive, using data from the live Web collected in 2005 and 2015 to reconstruct the network of pro-life and pro-choice sites and to analyse the terms in which the issue was discussed. Webster (2017a) also provided a case study of how the imprint of one religious leader’s site in the link structures of the UK Web changed as a result of a single controversial public statement. Such studies point the way towards longitudinal analyses of evolving debates over social issues in which religious organisations play only a part.

As well as public online encounters between religious people and those who are not, religious historians studying the Web will need also to attend directly to anti-religious groups and the ways in which they have attempted to influence the shape of public discourse about religion. Although they are by no means identical, the phenomena of racism, xenophobia, anti-religious sentiment and far-right politics overlap, and the far-right was early to adopt the Internet as a means of internal communication and outward dissemination (Halavais, 2010). Controversies such as those over cartoons of the prophet Muhammad in Denmark in 2005 or the film The Innocence of Muslims are objects of historical study not only for the reactions from those whose sensibilities were offended, but for understanding the support amongst the anti-religious for them (al-Rawi, 2016; Khan, 2015).

CONCLUSION

The ‘Great Commission’ given by Jesus Christ was to ‘go and make disciples of all nations’, and most (although not all) of the world religious traditions have been proselytic in their aims. In this, we see both another spur to historical investigation and a reminder of the limits of Web history. Certainly, some of the materials of such evangelism have been made available online: sacred texts and expositions of them, and live broadcasts of worship and the preaching that is often embedded within it. The online text has its counterpart in the printed tract so widely distributed by Christian evangelists in earlier periods; the online sermon is prefigured by the televangelist of the 1970s, or the street corner preacher before him. However, whilst the call to conversion may be studied online, the response is much less easily recovered, which perhaps explains the relative lack of emphasis on this in the research to date. The stories of Christian conversion that have survived tend to stress the quietness and the inwardness of the process by which people come to adopt a position of faith:

processes which tend to leave no trace whatever at the time, and are often only recorded much later. (On the exceptionally well-documented but yet still complex case of C.S. Lewis, see McGrath, 2013: 131–59.) In closing, then, this chapter suggests that to write the history of proselytisation and response in the early twenty-first century requires not only the study of the Web, but also an integration of both the live and archived Web with the full range of other primary sources that are the raw material of historical writing: personal testimonies of both evangelist and convert; the experience of the local congregations into which the converted pass; changing institutional strategies and the growth and decline of membership overall. Such an investigation is complex, and the sources often incomplete, but the achievement of such an integration would mark the point at which Web history had become part of history as a whole.

Note

1  For instance, the work of the educationalist Harold Loukes (1961) on religious attitudes amongst British teenagers is now a key source for historians of the 1950s and 1960s.

REFERENCES

Ackland, R., and Evans, A. (2017) ‘Using the Web to examine the evolution of the abortion debate in Australia, 2005–2015’, in N. Brügger and R. Schroeder (Eds), The Web as History. London: UCL Press. pp. 159–89.
al-Rawi, A. (2016) ‘Facebook as a virtual mosque: The online protest against Innocence of Muslims’, Culture and Religion, 17(1): 19–34.
Amarasingam, A. (Ed.) (2010) Religion and the new atheism: A critical appraisal. Leiden: Brill.
Beals, M. (2016) ‘The role of the Sydney Gazette in the creation of Australia in the Scottish public sphere’, in J. Hinks and C. Feely (Eds), Historical Networks in the Book Trade. London: Routledge. pp. 148–70.
Beaudoin, T. (1998) Virtual faith. The irreverent spiritual quest of Generation X. San Francisco: Jossey Bass.
Becker, C. (2011) ‘Muslims on the path of the Salaf al Salih. Ritual dynamics in chat rooms and discussion forums’, Information, Communication and Society, 14(8): 1181–203.
Beckerlegge, G. (Ed.) (2001) From sacred text to internet. Farnham: Ashgate.
Brown, C.G. (2009) The death of Christian Britain: Understanding secularisation, 1800–2000. 2nd edition, London: Routledge.
Brügger, N. (2010) ‘Introduction’, in N. Brügger (Ed.), Web History. New York: Peter Lang. pp. 1–25.
Burke, K. (2016) Christians under covers: Evangelicals and sexual pleasure on the internet. Oakland, CA: University of California Press.
Campbell, H. (2010) ‘Religious authority and the blogosphere’, Journal of Computer-Mediated Communication, 15(2): 251–76.
Campbell, H. (2011) ‘Internet and religion’, in M. Consalvo and C. Ess (Eds), The Handbook of Internet Studies. Chichester: Wiley-Blackwell. pp. 232–50.
Castells, M. (2001) The Internet galaxy: Reflections on the internet, business and society. Oxford: Oxford University Press.
Chandler, A. (2006) The Church of England in the twentieth century: The Church commissioners and the politics of reform. Woodbridge: Boydell.
Cheong, P.H., Brummans, B.H.J.M., and Hwang, J.M. (2016) ‘Research authority in religious organizations from a communicative perspective: A connective online-offline approach’, in S. Cheruvallil-Contractor and S. Shakkour (Eds), Digital Methodologies in the Sociology of Religion. London: Bloomsbury. pp. 137–46.
Cheruvallil-Contractor, S., and Shakkour, S. (Eds) (2016) Digital methodologies in the sociology of religion. London: Bloomsbury.
Cobb, J. (1998) Cybergrace: The search for God in the digital world. New York: Crown.
Davie, G. (2000) Religion in Modern Europe: A memory mutates. Oxford: Oxford University Press.
Davis, E. (1998) Techgnosis: Myth, magic and mysticism in the age of information. New York: Harmony.

Dixon, P. (1997) Cyberchurch: Christianity and the Internet. Eastbourne: Kingsway.
Garnett, J., Grimley, M., Harris, A., White, W., and Williams, S. (Eds) (2006) Redefining Christian Britain. Post 1945 perspectives. London: SCM Press.
Garnett, J., and Harris, A. (Eds) (2013) Rescripting religion in the city: Migration and religious identity in the modern metropolis. Farnham: Ashgate.
Goodhew, D. (Ed.) (2012) Church growth in Britain: 1980 to the present. Farnham: Ashgate.
Halavais, A. (2010) ‘Evolution of U.S. white nationalism on the Web’, in N. Brügger (Ed.), Web History. New York: Peter Lang. pp. 83–103.
Han, S. (2016) Technologies of religion: Spheres of the sacred in a post-secular modernity. Abingdon: Routledge.
Harvey, D. (2008) ‘Seen to be remembered: Representation and recollection in contemporary British evangelicalism’, in M. Smith (Ed.), British Evangelical Identities Past and Present: volume 1. Aspects of the History and Sociology of Evangelicalism in Britain and Ireland. Carlisle: Paternoster. pp. 180–200.
Harvey, K. (Ed.) (2009) History and material culture: A student’s guide to approaching alternative sources. Abingdon: Routledge.
Herbert, D., and Wolffe, J. (2005) ‘Religion and contemporary conflict in historical perspective’, in J. Wolffe (Ed.), Religion in History: Conflict, Conversion and Co-existence. Manchester: Manchester University Press. pp. 286–320.
Hoejsgaard, M. (2005) ‘Cyber-religion: On the cutting edge between the virtual and the real’, in M.T. Hoejsgaard and M. Warburg (Eds), Religion and Cyberspace. London: Routledge. pp. 50–63.
Howard, R.G. (2011) Digital Jesus: The making of a new Christian fundamentalist community on the Internet. New York: New York University Press.
Hunt, S.J. (2003) Alternative religions: A sociological introduction. Farnham: Ashgate.
Huntington, S.P. (1997) The clash of civilizations and the remaking of world order. London: Simon and Schuster.
Hutchings, T. (2016) ‘Studying apps: Research approaches to the digital Bible’, in S. Cheruvallil-Contractor and S. Shakkour (Eds), Digital Methodologies in the Sociology of Religion. London: Bloomsbury. pp. 97–108.
Keown, D., and Prebish, C. (2013) ‘Celebrating twenty years of the Journal of Buddhist Ethics’, Journal of Buddhist Ethics, 20: 37–9, accessed on 16 December 2016 at http://blogs.dickinson.edu/buddhistethics/
Khan, R.J. (2015) Muhammad in the digital age. Austin, TX: University of Texas Press.
Krotoski, A. (2014) ‘Inventing the Internet: Scapegoat, sin eater and trickster’, in M. Graham and W.H. Dutton (Eds), Society and the Internet: How networks of information and communication are changing our lives. Oxford: Oxford University Press. pp. 23–35.
Krüger, O. (2005) ‘Discovering the invisible Internet: Methodological aspects of searching religion on the Internet’, Online: Heidelberg Journal of Religions on the Internet 1: 1, accessed at http://heiup.uni-heidelberg.de/journals/index.php/religions/article/view/385/360
Lochhead, D. (1997) Shifting realities: Information technology and the Church. Geneva: World Council of Churches.
Loukes, H. (1961) Teenage religion. London: SCM Press.
Lundby, K. (2011) ‘Patterns of belonging in online/offline interfaces of religion’, Information, Communication and Society, 14(8): 129–35.
Hoejsgaard, M.T., and Warburg, M. (2005) MacWilliams, M.W. (2004) ‘Virtual pilgrimage
‘Introduction: waves of research’, in M.T. Hoe- to Ireland’s Croagh Patrick’, in L.L. Dawson
jsgaard and M. Warburg (Eds), Religion and and D.E. Cowan (Eds), Religion Online: Find-
Cyberspace. London: Routledge. pp. 1–9. ing Faith on the Internet. London and New
Hofheinz, A. (2010) ‘A history of Allah.com’, in York: Routledge. pp. 223–37.
N. Brügger (Ed.), Web History. New York: Maiden, J. (2015) ‘“What could be more Chris-
Peter Lang. pp. 105–35. tian than to allow the Sikhs to use it?”
Houston, G. (1998) Virtual morality: Christian Church redundancy and minority religion in
ethics in the computer age. Leicester: Bedford, 1977–8’, Studies in Church History,
Apollos. 51: 399–411.
490 THE SAGE HANDBOOK OF WEB HISTORY

McGrath, A. (2013) C.S. Lewis, a life: Eccentric Steven, J. (2002) Worship in the Spirit: Charis-
genius, reluctant prophet. London: Hodder. matic worship in the Church of England.
McLeod, H. (2007) The religious crisis of the Carlisle: Paternoster.
1960s. Oxford: Oxford University Press. Webster, P. (2015a) Archbishop Ramsey: The
Mitchell, J., and Marriage, S. (Eds) (2003) Medi- shape of the Church. Farnham: Ashgate.
ating religion: Conversations in media, reli- Webster, P. (2015b) ‘Race, religion and national
gion and culture. London: T. & T. Clark. identity in Sixties Britain: Michael Ramsey,
Mosco, V. (2004) The digital sublime: Myth, archbishop of Canterbury and his encounter
power and cyberspace. Cambridge, MA: MIT with other faiths’, Studies in Church History,
Press. 51: 385–98.
Nye, D.E. (1994) American technological sub- Webster, P. (2017a) ‘Religious discourse in the
lime. Cambridge, MA: MIT Press. archived Web: Rowan Williams, archbishop
Pettegree, A. (1996) Marian Protestantism: Six of Canterbury, and the sharia law contro-
studies. Aldershot: Scholar. versy of 2008’, in N. Brügger and R.
Pihlaja, S. (2016) ‘Analysing YouTube interac- Schroeder (Eds), The Web as History. London:
tion: A discourse-centred approach’, in S. UCL Press. pp. 190–203.
Cheruvallil-Contractor and S. Shakkour (Eds), Webster, P. (2017b) ‘Users, technologies,
Digital Methodologies in the Sociology of organisations: Towards a cultural history of
Religion. London: Bloomsbury. pp. 49–58. world Web archiving’, in N. Brügger (Ed.),
Power, M. (2007) From ecumenism to commu- Web 25: Histories from the First 25 Years of
nity relations: Inter-church relationships in the World Wide Web. New York: Peter Lang.
Northern Ireland 1980–2005. Dublin: Irish pp.175–90.
Academic Press. Webster, P. (2018a, forthcoming) ‘Lessons from
Prebish, C. (2004) ‘The Cybersangha: Bud- cross-border religion in the Northern Irish
dhism on the Internet’, in L.L. Dawson and web sphere: Understanding the limitations
D.E. Cowan (Eds), Religion Online: Finding of the ccTLD as a proxy for the national
Faith on the Internet. London and New York: web’, in N. Brügger and D. Laursen (Eds),
Routledge. pp. 135–47. The Historical Web and Digital Humanities:
Raleigh, W. (1614) The history of the world. The Case of National Web domains. London:
London. Routledge.
Schroeder, R., Heather, N., and Lee, R.M. (1998) Webster, P. (2018b) ‘Technology, ethics and
‘The sacred and the virtual: Religion in multi- religious language: Early Anglophone Chris-
user virtual reality’, Journal of Computer-Medi- tian reactions to “cyberspace”’, Internet
ated Communication, 4(2). DOI: https://doi. Histories. 2(3–4).
org/10.1111/j.1083-6101.1998.tb00092.x Wellman, B. (2011) ‘Studying the Internet
Shallis, M. (1984) The silicon idol: The micro through the ages’, in M. Consalvo and C. Ess
revolution and its social implications. Oxford: (Eds), The Handbook of Internet Studies.
Oxford University Press. Chichester: Wiley-Blackwell. pp. 17–23.
33
Hearing the Past: The Sonic Web
from MIDI to Music Streaming
Jeremy Wade Morris

INTRODUCTION

In a recent round of spring cleaning, I came across an old cassette tape, one of about a dozen or so recorded cassettes and personal mixtapes that I had saved from the earlier days of my music collection. I’m not sure whether it helps or hurts my academic credibility to mention the tape was entitled ‘Smashing Pumpkins, Dusseldorf Germany’, a bootleg of a concert the titular band played in 1996. The tape is decidedly analog … it has a handmade photocopied cover image and the physical casing of the tape is degraded. But the tape is also forever digital in my mind. It was the first piece of music (and perhaps the first commodity of any kind) that I purchased ‘from the Internet’, and in 1996, it was well before standard conventions of web commerce had fully developed. There was no ‘click here to buy now’ button linked to my Amazon account, no shopping cart icon to which I could add the purchase. The tape, instead, came from a music newsgroup I had been a part of called alt.music.smashing-pumpkins (again, do these details hurt or help my story?). Another user in the forum was selling the tape and all I had to do was send $5 in the mail to somewhere in the United States and the tape would, supposedly, be mailed back to me. Being a cost-conscious Canadian college student who had never purchased anything from the Internet before, I worried this deal would go sour. But such was my reckless fandom for the band that potentially losing $5 seemed like a risk worth taking on this other user, whose only credentials were that they frequented the same newsgroup I did.

The tape arrived as promised a few weeks later and, 20 years later, it stands as a reminder of the many ways in which analog and digital collided in the mid 1990s, and the ways new technologies like the Web reconfigured social, economic and cultural practices around cultural goods like music. Even if computers and network technologies in the early 1990s were limited in their capacities to produce, distribute,
sell and play sound, a variety of communities and companies nevertheless found ways to use the Web towards sonic ends. From early websites and bulletin board sites where users traded sound-related texts like MIDI files, song lyrics and guitar tablature, to the noises of new technologies like the sound of the dial-up modem or ‘You’ve Got Mail!’, to higher-bandwidth audio practices like file sharing, mp3 downloads, streaming music and podcasts, the Web’s sounds have shaped the industrial, legal and cultural logics of the platform.

Despite this rich sonic past, histories of the Web generally focus on the visual; we track website snapshots over time with tools like the Wayback Machine, focus on different eras of web design styles like Web 2.0 or flat design and praise the Web’s visual technical innovations. What if we amplified this knowledge of the Web’s visual and technical history by listening to it as well? Accordingly, this chapter explores some of the Web’s sonic past by considering technologies that made sound playable on the Web and some of the early communities that formed around new modes of music distribution. I focus particularly on developments in North America, given that a number of these technologies come from US tech companies like Winamp, Real Player, Napster, etc. Even if the reach of many of these technologies was international, sound and its meanings are always positional, so the analysis provided here should be complemented with sounds and sonic experiences from other regions. I track the rise of commercial efforts to sell and stream sound online and the recent explosion in popularity of podcasts and other new sonic forms. Then I move on to address both the importance of sound/sound commodities as elements of the Web’s development and the difficulties posed by web sounds as objects of history. While all web histories are marked by their absences, web sounds are doubly vulnerable. Efforts and tools geared towards preserving, archiving and displaying the historic Web, such as the Internet Archive’s Wayback Machine, are visually oriented, relying on snapshots, web crawls and archived web pages as windows to the Web’s past. These techniques often treat sound as secondary or present technical challenges for capturing sounds and sonic media. Further, as with other digital formats, early sound files are often unplayable through today’s technologies, either because of the ever-quickening pace of obsolescence or because of technical barriers resulting from long and inconsistent battles over proprietary formats and digital rights management. Even more open and available sonic objects, like podcasts, are at risk of becoming inaccessible to researchers because of increasing commercialization and a lack of urgency around preserving something that’s seen as both mundane and ubiquitous.

I begin with a section called Web of Sounds, where I provide an overview of the changing sound of the Web over the past several decades, including some key sonic technologies that have shaped the Web. Then I move into a second section, Sound Histories, which focuses on the challenges sound files, formats and soundscapes present for digital historians. The Web’s rich sonic history has been muted or muffled to date, but like all web histories, the challenge is how we gather the material to tell it.

WEB OF SOUNDS

Early Sounds (1960s–1980s)

Buried in the timeline for ‘A Little History of the World Wide Web’, which The World Wide Web Consortium maintains on their website, is a ‘snapshot of the WWW Project Page as of 3 Nov 1992’ (W3C, 2000b). Although a troublesome historical document as the ‘first’ web page (Gitelman, 2006), it nevertheless offers a representation of what early web pages looked like and contained. Built as a directory structure, the page offers a link to ‘What’s out there?’ sorted by subject of interest (e.g. Mathematics, Literature, Computing, etc.) or by type of service (e.g. Network News, Gopher, Telnet). Halfway down the ‘Subjects’ page is the heading
‘Music’, which includes links to ‘MIDI Interfacing’ and ‘Song Lyrics’, immediately followed by a parenthetical note that reads ‘(apparently disabled for copyright reasons)’ (W3C, 2000a). The music links no longer resolve, nor do they appear in the Internet Archive’s Wayback Machine. While web historians might never know which MIDI files and song lyrics they’d find if these links still worked, the existence of these links is at least evidence that, even in the Web’s earliest stages, sound and music were key areas of interest for potential web users.

These links aren’t, of course, the first evidence of sound on the Internet. In fact, for many early Internet users, the very sound of connecting to the network was an iconic and instrumental part of the experience. From the acoustic couplers in the 1960s that helped send modem sounds through telephones and connect two computers (Mann, 1998) to the dial-up modem technology that emerged in the late 1970s and early 1980s from work at universities and on NSFnet (Abbate, 1999; Baym, 1994; Hauben and Hauben, 1997), sound has been a core component of network technology. In the 1970s and 1980s users would employ a modem to dial into bulletin board systems (BBSs), Usenet or similar service providers and the sounds of that connection process would play aloud. Dial-up modems made a peculiar though not entirely unfamiliar sound (for a nostalgic reminder, listen to willterminus, 2008). The dial-up sound began like a phone call, with dial tones and number tones, but turned into something more akin to a distorted fax machine: a set of high-pitched screeching tones bouncing back and forth, negotiating the terms of connection before giving way to a wash of static. Technically, the sound represented a handshake between two computers: the protocols by which further communication could take place (Madrigal, 2012). Culturally, it meant so much more:

Of all the noises that my children will not understand, the one that is nearest to my heart is not from a song or a television show or a jingle. It’s the sound of a modem connecting with another modem across the repurposed telephone infrastructure. It was the noise of being part of the beginning of the Internet. (Madrigal, 2012)

For users who sat through the process countless times, the dial-up sound meant connection; it opened up the isolated machine users were on to a network of people, places and possibilities. It was a fusion of the analog and digital: sound waves that carried data across phone lines to enable contact, communication and community.

BBSs grew in popularity in the 1980s, sparking networked discussions on everyday subjects like politics, business and music as well as other more niche subcultures and scenes (Driscoll, 2016). While primarily textual interfaces for asynchronous discussions, BBSs also had a sonic component to them. As BBS creators became proficient with the use of the ANSI text standard to create highly visual login screens, with many becoming part of the larger ‘artscene’ and ‘demoscene’ movement in computer graphics (Driscoll and Diaz, 2009), some programmers learned to add basic musical tones to their BBSs, giving them a distinct sonic as well as visual character (Scott, 2005).

The sounds of dial-up modems and BBSs in the 1980s and early 1990s were, like the soundscapes of any period, limited by the sound-producing capabilities of the primary technologies of the time. Although early mainframe computers of the 1960s and 1970s were entirely capable of processing sound, and many electronic music composers had been experimenting for decades with computer music (Manning, 2004), audio was largely secondary when ‘personal’ computers emerged in the late 1970s. The first personal computers were extensions of office tools (Venkatesh, 1996; Venkatesh and Vitalari, 1987), calculating machines to enhance productivity at work (Friedman, 2005: 102–10, 121; Kirschenbaum, 2016). Computers at the time were not designed for, or perceived as, entertainment devices (Friedman, 2005; Venkatesh, 1996), though developments in
video game sound on consoles like Atari and Intellivision in the early 1980s helped shift these conceptions. New affordances for making digital sounds and new kinds of sound communities, like the chiptune scene¹ (Driscoll and Diaz, 2009), meant that while computer audio wasn’t yet mainstream, it certainly wasn’t silent.

Multimedia Sounds (1980s–1995)

Along with advances in gaming, the multimedia ‘revolution’ of the late 1980s and early 1990s brought a number of changes to the computer’s audio and video capabilities (Friedman, 2005: 121; Venkatesh, 1996). Companies like Apple, which had developed specialty audio technologies in the late 1970s and early 1980s, began to include advanced audio as standard features on all kinds of personal computers, giving everyday users the ability to create, record, mix and master music. As a 1991 review for a Macintosh desktop computer noted:

Recently, sound on the Mac has gotten even better, suitable for general users rather than just for musicians. Stereo-sound-output ports now come standard with every Mac except the LC, which has a mono-output port. The LC and IIsi come standard with sound-input ports and microphones as well, and with third-party products, you can add sound-input capabilities to other Macs. (Gruberman and McQuillin, 1991)

Additionally, the late 1980s and early 1990s saw the introduction of the sound card: an internal electronics card or chip that gave a user’s computer amplified and more customizable sound than the standard internal speaker. For more adventuresome users, companies like AdLib and Creative Labs provided cards like SoundBlaster and GameBlaster that users could install on their machines for enhanced sonic capabilities. CD-ROM drives also served as Trojan horses for getting recorded music onto the computer. Designed originally to store data, play games, or offer encyclopedias, they were reconfigured through programs like Trantor’s Music Box (1991) that let users play audio CDs in CD-ROM drives (Grunin, 1991).

The multimedia ‘revolution’, then, was more of a disparate effort on the part of manufacturers, software developers and tech journalists to expand the market for personal computers than a cohesive movement, and sound and video played key roles in this technical and cultural re-imagining of computers and computing. Hardware like CD-ROM drives, sound cards and stereo ports, as well as audio playback software, served as building blocks for the sonic web; they were part of a series of technologies that made the computer an all-purpose sonic machine that could make, rip, store, play and burn music. They were also reminders of how foreign the concept of using computers for music and sound playback was, even with the advances of the multimedia ‘revolution’. Using the device for general music production or playback often required user effort and technical skill, like installing sound cards and the translation of a whole series of technologies and practices onto the computer (Petzold, 1991). Yet even if it was slow and uneven, and more hype than happening, the multimedia ‘revolution’ helped turn computers from office extensions to personal media devices, something more akin to the ‘small-scale technologies […] for the transformation of consciousness and community’ and tools for individual self-fashioning we consider them today (Friedman, 2005; Turner, 2005: 489).

As a variety of communities used these newly capable ‘personal’ computers to connect in BBSs, newsgroups, FTP or Gopher sites and other networks, users traded files that were sonic in intent, if not in practice. MIDI files, for example, were some of the earliest ways users swapped ‘music’ online, even if those files were more data than sound (Alderman, 2001: 28–30; Haring, 2000: 83–5). MIDI or Musical Instrument Digital Interface was a file standard that emerged in the early 1980s. It was an open standard designed to coordinate communication
between musical instruments and computers by storing and transferring data about specific signals or events (e.g. pitch, volume, velocity, panning, tempo, etc.), but it was also the perfect way for artists and fans to send musical ideas and ‘performances’ across a relatively low-bandwidth system (Théberge, 1997: 83–90). MIDI files, after all, were merely the instructions that, when combined with the synthesizers and sound cards capable of processing and sequencing MIDI, could then create sound. There were also many lyrics sites and web spaces devoted to sharing musical notation and tablatures, like the alt.guitar.tab newsgroup that became the Online Guitar Archive (OLGA), one of the largest repositories of guitar tablature on the early Web.

In the mid 1990s, as the Web became the primary interface for navigating the Internet, these sonic communities migrated to websites that allowed for more direct distribution of sound. What had been vibrant communities of users trading MIDI files, song lyrics and other textual representations of sonic objects became thriving sites for making and sharing actual sounds and songs. Sites like the Internet Underground Music Archive (IUMA) were some of the first digital hubs and most-cited places to visit on the burgeoning network (Alderman, 2001: 12–4; Haring, 2000: 36–8). A precursor to mp3.com, MySpace and other music-focused social networks, the IUMA encoded and converted music sent from artists (usually on cassettes or CDs) into compressed digital formats and provided a web space for artists to post songs and band photos and offer merchandise for sale. There and elsewhere on the Net, music enthusiasts were converting and uploading bootleg versions of concerts, b-sides and other rarities, creating websites for their favourite artists and posting MIDI files for users to download and play. Using primitive software media players like XingSound – the first commercially available real-time audio encoder launched in 1993 – or XMCD, users could rip CDs to their computers or play songs they had downloaded from the Web (Amorim, 2007; Ness, 1993).

Internet companies at this time also started thinking sonically. An online service provider in the United States and Canada called Quantum Link (Q-Link) hired a voice actor named Elwood Edwards to voice a few stock phrases and help make their Internet service stand out. When Q-Link changed its name in 1991 to AOL and began blanketing the United States with a flood of start-up CDs, millions of users heard Elwood’s ‘Welcome!’, ‘File’s done’, ‘Goodbye’ and ‘You’ve Got Mail!’ each time they started surfing the Net. Estimates from around that time suggested the sounds might have been heard more than 27 million times a day (Sittenfeld, 1998). Minor as they may have been, the sounds brought visual interfaces to life and humanized technology that had a bureaucratic and militaristic historical lineage. Like the modem sound, AOL and other service provider sounds became sonic souvenirs of the early Web’s soundscape in the United States (AOL used different voice actors in other countries). Together with sounds such as the Intel Chime, the various operating system start-up sounds and online game sounds, these noises helped form the soundscape of early web use.

Web Sounds (1995–2000s)

By the mid 1990s, there was little doubt that the Web was a thriving environment for those interested in sound and music. A variety of formats and services emerged to cater to the millions of users who were turning to the Web for all kinds of news, information and entertainment. College radio stations like WXYC in Chapel Hill, WREK out of Atlanta, and KJHK in Lawrence raced to be the ‘first’ to provide an Internet broadcast in late 1994 (Bottomley, 2016), while AudioNet brought many terrestrial radio stations online when it launched in 1995. That year also saw the launch of Progressive Networks’ RealAudio,
which brought a range of streaming audio to the Web, from music to talk to sports radio. Designed to address the frustratingly long wait times posed by downloading large media files over slow modems, RealAudio’s service worked by allowing users to listen to a file in real time. Its streaming innovations made it a pioneer in attempts to bring radio and television online (Rothenberg, 1999), even if they were ultimately disadvantaged in an online audio and digital music market that preferred the direct download for most of the 2000s (Alderman, 2001: 40).

This proliferation of new services, sites and media players brought new modes of delivery and new audio formats for creators and users alike. The mp3 format, for example, which was the result of research conducted on behalf of a broader consortium of radio and television broadcasters and the film industries (the Moving Picture Experts Group), proved itself to be a relatively convenient and efficient format for sending songs via Internet connections and to portable music players (Katz, 2004; Sterne, 2012). A host of competing formats that never gained the prominence of the mp3, such as a2b, Liquid Audio, Windows Media Audio and Ogg Vorbis, also emerged around this time and each offered different levels of proprietary control, digital rights management or openness (Morris, 2015). From the sheer number of these competing formats, it’s clear the story of sound on the Web in the late 1990s and early 2000s was contradictorily about disseminating sound further, faster and more conveniently, and about controlling this dissemination in order to profit from it. The battles over the various formats were ostensibly about which format should prevail or which company(ies) might provide the infrastructure for this burgeoning market, but these format wars also very much affected the way users could and couldn’t use digital audio in this new environment (Burkart and McCourt, 2006; Gillespie, 2007). Some formats limited the players on which you could play back the music; others only allowed for music to be played once the copy was authenticated as a legitimate purchase.

In addition to new formats, there were a host of new interfaces for finding and playing music and sound as well. Playback programs like Winamp, Windows Media Player, iTunes and others gave users new tools for building playlists, storing collections and visualizing music, while file-sharing networks such as Napster, Gnutella, LimeWire and Kazaa provided users spaces to socialize, share and discover music through alternative outlets (Morris, 2015). Some of these programs were more industrially approved (e.g. iTunes) than others (e.g. Napster), setting off a decade-long contest between entrenched music industry interests and upstart tech companies that saw music and sound as just another form of information that was meant, like other information on the Web, to be free. Although the passing of the Digital Millennium Copyright Act and the formation of the Secure Digital Music Initiative in 1998 were early warning signs of the court cases and lawsuits that followed in the 2000s (Burkart and McCourt, 2006), the struggle against piracy and the implementation of digital rights management technologies (Gillespie, 2007) defined much of the debate around music and sound on the Internet in ways that still shape music online to this day.

Complicating these legal battles was the rise, in the early and mid 2000s, of various sound- and music-focused social networks that offered the distribution of free sounds. Although there had been other social network sites before it, MySpace was the first mainstream music-focused social network site to connect a vast number of established and emerging musicians with their fans. It offered users general social networking features but MySpace stood out for the ways it allowed artists to market themselves and their music (Suhr, 2012). Every artist could create a profile and could add their music to it, all of which was offered freely, while users had more intimate access to their favorite artists and music. Subsequent social networks and
communities like imeem, GarageBand.com, Last.fm, Bandcamp and SoundCloud built on MySpace’s legacy by turning audiences into communities based around sound, music and music creation (Suhr, 2012).

Beyond music, the Web also became a prime outlet for audio blogging and podcasting in the 2000s. Although there had been experiments in personal audio broadcasting as early as 1993, with hobbyists and technologists like Carl Malamud broadcasting online with a technology called Mbone (Bottomley, 2016) or with more mainstream programs like Winamp’s SHOUTcast, the roots of podcasting more fully took hold in the audio blogging practices of the early 2000s. By the time journalist Ben Hammersley (2004) accidentally coined the term ‘podcasting’ in an article for The Guardian, personal broadcasting on the Web was already a burgeoning cultural form. Podcasting crystallized and institutionalized through the 2000s (Sterne et al., 2008), as larger players like iTunes (2005) incorporated podcasts into their software and other aggregators, like Podcast Alley, offered access to thousands of podcast episodes from amateurs, public radio stations and commercial media alike. Although critics claim we are currently experiencing a ‘golden age’ of podcasts (Blattberg, 2014; Roose, 2014), personal broadcasting online has been growing steadily since the advent of the Web itself.

So, while I remember my purchase of the Smashing Pumpkins bootleg tape as a pioneering act in the history of online music and e-commerce, there’s plenty of evidence suggesting it was simply a tiny transaction in the much longer history of the Internet as a tool for sound and for commerce.² Still, it was only in 1995 that the US National Science Foundation lifted the prohibition of commercial enterprise on the Internet, so my 1996 purchase, and the many other downloads, shares and trading of sounds described above, still stand as examples of practices around digital sonic objects that were still under negotiation with users, technology companies, regulators and media producers.

What were once gray and unsure markets are now flourishing areas for growth, with digital formats comprising half of all purchases and streaming eclipsing most other forms of consumption (IFPI, 2017). Downloadable mp3s are still of value, and valuable, to users and companies alike, but the future that companies like RealAudio and mp3Tunes bet on in the early 1990s resonates anew in music service providers like YouTube, Spotify, Apple Music, Google Play, Pandora, Tidal and others. These services co-exist uneasily with less industrially sanctioned file sharing and BitTorrent sites, as well as private tracker communities. Fans continue to trade lyrics, mp3s, MIDI files and other sonic texts in a variety of sound-focused communities. Newer technologies to connect to the Web, like smartphones, tablets and home assistants, bring new sounds, alerts and notifications that are defining the contemporary web soundscape.

Yet this partial sampling of the Web’s sonic history raises a number of issues about how researchers, archivists and anyone interested in their own sonic pasts will keep track of the Web’s current and future sounds. I turn now to consider some of the challenges web audio presents for sonically inclined historians by looking at some of the current repositories for historical web sounds. If the Web’s early decades were hard to archive, future decades may be even harder.

SOUND HISTORIES

While all web histories are marked by their absences – by what cannot be captured in a dynamic and often-changing environment of code, objects, pages, sites and spheres (Brügger, 2009) – web sounds are doubly vulnerable. First, web archiving tools are primarily built on visual metaphors and thus neglect the role of audio. Second, preserving audio formats often requires preserving the sounds themselves as well as the technologies on which to play those sounds. Like so many
498 THE SAGE HANDBOOK OF WEB HISTORY

other digital media, web audio is hard to hear not just because it is hard to find and save but because it is hard to know what to save along with it to make it playable in the future. Driscoll and Diaz’s claim that ‘the role of music, sound, and noise in computer games remains relatively under examined’ (2009:2) applies to web history more broadly and to the sonic elements of the web’s past and present.

The Sounds of the Internet Archive

The current tools for archiving and displaying older versions of the Web are highly visual. The Internet Archive’s Wayback Machine, for example, can certainly help us to see what Napster’s website looked like between 1998 and 2001, or what people were talking about on Winamp’s early forums (Morris, 2015). There’s even screengrabs of the IUMA dating back to 1996, where users can see that the IUMA was boasting that the site was ‘Now Real Audio Enriched!’, though the links return error pages when trying to play music from the artists or genres listed. Like so many of the other audio and media links in the Wayback Machine’s archives, the source audio has either moved or is no longer playable. This snapshot-based approach to web preservation and display is crucial, allowing users to time-hop through various representations of the past, but it foregrounds the visual and ensures a site’s text/layout is preserved while capturing less of the other media associated with historical sites.

This is not to say that the Internet Archive is absent of audio. The site has a robust audio archive, with millions of audio files uploaded by dedicated web users and historians. There are, for example, 45,036 results in the ‘IUMA (Internet Underground Music Archive) Collection’, which is made up of songs that were posted to the site over time. The broader audio collection has thousands of files, including podcasts, popular music, local terrestrial and web radio, audiobooks and a vast collection of live bootleg concerts. These collections are invaluable, but they are also stripped of much of their original context. The IUMA Collection, for example, provides access to sounds but doesn’t really provide a sense of how these songs and sounds appeared within the original site. If the visual snapshots from the Wayback Machine are silent, the sound collections are hard to visualize.

The Sounds of YouTube and User-Generated Repositories

In addition to songs and other popular audio, the Internet Archive also has some of the iconic computer and Internet sounds from early days of the Web. The ‘You’ve Got Mail!’ sound, for example, is preserved, as is the sound of a dial-up modem, though many others are absent. Some of these sounds have not changed since their earliest iterations (e.g. ‘You’ve Got Mail!’), while others seem to be re-creations or approximations of the sounds as users remember them. The archived dial-up modem sound, for example, comes from a YouTube video that was uploaded in 2008, along with the caption:

I was bored so I wanted to see if I could get free dial up internet so I found that NetZero still has free service so I put in the number and heard the glorious sound of the Dial-up. Remind me of years gone. Unfortunately I was not able to make a connection. (willterminus, 2008)

It’s not exactly the sound of a modem from the early 1990s, but it represents the sound close enough to satisfy close to 6,000 commenters, who chime in with responses like ‘Ah yes, the sound of my childhood’ or memories like ‘Get off the Internet, I’m on the phone’. However, responses from users from different countries (or different locations within certain countries) serve as a reminder that what’s past for some is present for many. Dial-up is only a ‘historical’ sound for those privileged enough to live in cities
HEARING THE PAST: THE SONIC WEB FROM MIDI TO MUSIC STREAMING 499

and countries where the Internet infrastructure is developed, or for users with the financial means to be able to afford faster connections. Even in the United States, some 3% of the population, or close to ten million users, still regularly connect to the Internet via dial-up (PEW, 2013). The sounds of the Web, in other words, point to the sounds of the various digital divides that have structured and continue to structure the technology.

On YouTube, videos of users trying to connect 56k modems using 2017 technology, or grainy old home videos showing users starting up their computers in the 1990s, point to a second challenge of saving the sonic web: like other media, preserving the historical sounds associated with the Web also means preserving the technologies on which to play those sounds. RealAudio, for example, was a crucial audio format in the 1990s and a significant portion of early online broadcasts were made using RealAudio and RealPlayer technologies. As with the many other proprietary formats released at the time, RealAudio requires RealPlayer to decode and play the audio, making any sound saved in that format dependent on the continued maintenance and support of RealPlayer technology (which is currently supported, but only for Windows). Other sound formats and services, such as the Yahoo Music Store or the inaccurately named Windows PlaysForSure designation are evidence of how a good portion of web-related audio can be protected and privatized to the point of inaccessibility, once the technologies for maintaining the infrastructure shut down or close (Morris, 2015).

The Sounds of Institutional Archives

There are more official and institutional sound archives online that take greater care in preserving, cataloguing and making researchable a variety of sounds and sound recordings. These archives tend to focus on history (e.g. the British Library, The Library of Congress, etc.), genre (e.g. Cornell’s Natural Sound Archive, Smithsonian Folkways World Music Collection, etc.), technology (e.g. Radio Archives, Cylinder Audio Archive, etc.) or artist (e.g. The Studs Terkel Archive, Alan Lomax Sound Archive, etc.) in their collections. They don’t, however, address the Web or many of the new noises Internet-related technologies have brought since the 1990s. More unofficial repositories, such as the Museum of Endangered Sounds³, fill in some of these gaps, but hardly in a way that is systematic or comprehensive.

Unlike analog audio objects that leave physical traces and copies of sounds that historians can listen back to in the future, digital sounds are often intimately tied to the technologies that produce them. The start-up chimes or BBS home page sounds exist within a particular machine or website for a particular period of time exclusively unless they are preserved in other formats or migrated to new hardware and software. Analog media are, of course, not immune to these challenges; cassettes need cassette players, Betamax tapes need Betamax machines and analog materials decay over time, whether it’s valuable film stock or my Smashing Pumpkins cassette. But the intertwining of format, software and hardware that comprise many web sounds complicate the process of accessing those sounds as time wears on. Trying to run a Flash-based video game or play a Liquid Audio file downloaded from a file-sharing site in the mid 1990s is a whole other matter, given the copy protection schemes and rapid pace of obsolescence of many Internet-dependent technologies. Depending on the severity of the digital rights management, or the networked links between the audio format, the software media player and the hardware itself, it may be next to impossible.

The Sounds of Streams

The shift towards streaming as a dominant form of delivery for sound (and other media)
on the Web is a prime example of the ways this assemblage of format–software–hardware complicates the ability to save and store audio. While streaming offers the ability to play music and sound from multiple devices and doesn’t take up space on devices with limited storage space (i.e. some mobile phones, tablets or ageing computers), it also leaves no – or at least no easy way to access and copy – trace of the audio in question. Despite all the outcry around digital rights management through most of the 2000s, and how the technology prevented users from sharing files with other devices and users, making multiple copies of songs or editing files, streaming is celebrated even though it upholds these same restrictions. Although most of the songs in Spotify’s library come from other sources (i.e. Spotify’s version is rarely the only copy), audio on sites like SoundCloud and other music communities are increasingly born-digital and exclusively digital. As streaming becomes a dominant mode of delivery for all kinds of web-based media, these natively digital goods may be less copy-able and save-able than their predecessors. Much of what remains from the early days of the IUMA or mp3.com remains because users downloaded and saved those sounds. Today’s streaming sites rely less on providing users their own copy of sounds, than on providing access to a vast catalogue of sounds in exchange for a subscription fee or for advertisements. Increasingly, our web-based technologies are prizing the mobile and the ubiquitous, the dynamic over the static, causing difficulties for those trying to write the histories of sound on the Web or anyone interested in listening to the past.

We know from the histories of other media, like television, film or radio, that the early years of a developing medium are invaluable for researching intersections of technology, industry, culture and power. We also know that these early years are some of the hardest to preserve, not just because of the logistics of preserving, but often because we seem to realize too late that the latest technology is neither mundane nor permanent. Huge percentages of early silent film, television and radio have been lost, destroyed or are otherwise inaccessible.⁴ Given how ubiquitous and available digital sound files are, one might assume these web artifacts would not face the same preservation risks as, say, old radio tape reels, transcription discs or celluloid film stock. But similar preservation challenges face new and old media alike. For instance, there’s no general exemption in current US copyright law for the copies that libraries and preservationists might want to make in order to keep digital formats up to date and properly preserved – updating from, say, wma files to something more current – leaving merely a complex and shifting patchwork of policies regulating digital preservation.

Ultimately, it’s not just a question of what the Internet Archive or the Library of Congress has (or doesn’t) or what other archives might or might not. Rather, it’s about whether we should be trying to save sounds of the early Web, or saving its soundscapes. Schafer’s (1977) original conception of the soundscape is primarily a sonic one, but the link to landscapes emphasizes how sound is shaped by the structures, bodies and objects that make up any given scene. On the Web, it’s often not enough to save the sound of a particular song, technology or web object; the contextual material that helps create that sound must be preserved as well. Preserving the soundscape means preserving much more than just the sounds; it requires a multi-sensorial effort to document the Web in all its multimediated complexity. If ‘digital sources necessitate a rethinking of the historian’s toolkit’ (Milligan, 2012: 23), they also require rethinking how the very practices of archiving and historiography take place.

CONCLUSION

As I was finishing this article, there was a flurry of articles about the death of the
mp3 (e.g. Beadon, 2017; Flanagan, 2017). The articles were in response to an announcement from the organization that created the mp3, the Fraunhofer Institute, noting that the licensing program for some mp3 patents and software was ending. Of course, as critics were quick to point out, this didn’t mean that the worldwide stashes and stores of mp3s would suddenly vanish. Rather, it recognized the more dull and pragmatic fact that Fraunhofer could no longer collect licenses on technology like media players, audio editors, etc. since its patents were expiring. Instead, Fraunhofer was shifting its efforts to more profitable formats, like Advanced Audio Coding (AAC) or other efficient compression codecs that are used by many of the current streaming services.

Media historians are trained to recognize that the death of formats or media is rarely as final or certain as the label ‘death’ implies. Media and their formats persist over time and interact with the media formats that come after it (Bolter and Grusin, 1999; Gitelman, 2006). Our music libraries didn’t simply shed the vinyl or cassettes or CDs once digital files came along; they likely remained in some liminal state made up of multiple formats from multiple eras, each taking up varying amounts of physical and emotional real estate in our collections. But Fraunhofer’s announcement does signal that the download model that has driven much of the innovation in web music since the mid 1990s is giving way to streaming, and that the mp3, long the favored format of file sharers and legal downloaders alike, will have less institutional and industrial support moving forward. Obsolescence is rarely a moment; it’s a process (Parks, 2007; Sterne, 2007). Like modem sounds and AOL start-up discs, the mp3 will persist and be put to novel uses even as it becomes less ubiquitous, common and used than it has been in the past.

All history is marked by its absences. As Sterne notes, the very endeavor of doing media historiography would not be possible without absence: ‘It is the absence of the past, the impossibility of finding direct access to it, that makes possible the writing, reading, and contemplation of history. History’s condition of impossibility – the irreducible distance of finitude – is thus its condition of possibility’ (Sterne, 2010: 80). We can recover sounds from the Web’s past but even the most complete archive (like a download of all the IUMA files) is still merely an archive of fragments and gaps. They are, as Sterne argues, ‘already not the history they described’ and therefore it is up to historians to ‘find linkages across documents, registers, genres, and problems to give history meaning and intelligibility for ourselves and our readers’ (Sterne, 2010: 86).

The sounds of the Web, from MIDI to streaming, are an integral part of its history; a part of its history that is often ignored through tools like the Internet Archive’s Wayback Machine. They are often the absence in highly visual representations of the Web’s past. These sounds are not only nostalgic reminders of technologies and experiences past; they contain important clues as to how machines were used, how particular websites or technologies worked, and what experiences early web users might have had. Sound researchers can always cobble together an archive through unofficial or accidental archives like YouTube or through more institutional efforts, but only by repacking our toolkit so that it also includes a focus on web sounds can we better preserve those sounds moving forward. What today’s podcasters, web musicians, computer sound engineers, etc. are producing today will have value in the future, if not for its content, but for what it tells us about audio’s longer history, about who has the right to communicate and by what means. If we’re not making efforts to preserve these sounds now, we’ll likely find ourselves in the same sonic conundrum many radio historians now find themselves in: writing, researching and thinking about a past they can’t fully hear.
Notes

1 ‘The strictest definition of chiptune refers to a song composed exclusively for performance by a microchip capable of synthesizing sound’ but the chiptune culture that emerged from the Sound Interface Device era in the 1980s, the tracking scene, and the Gameboy musicians in the 1990s took ‘the term and aesthetics far beyond that simple definition’ (Driscoll and Diaz, 2009).
2 Apparently, there’s evidence of the sale of marijuana between university students via ARPAnet in the early 1970s (Markoff, 2006) and, certainly, videotex services like Minitel and CompuServe in the 1980s offered a number of e-commerce options (Carey and Elton, 2009).
3 The Museum is supposedly the project of uber-nerd and sound archivist Brendan Chilcutt, but in a story line built for Internet historians, Chilcutt is actually a fictional creation of three former advertising students at Virginia Commonwealth University (Pardes, 2016). The trio created Chilcutt as a spoof, but the sounds they preserved have driven serious interest and attention to the site.
4 With many thanks to Eric Hoyt for helping me formulate these ideas on preservation more clearly and effectively.

REFERENCES

Abbate, J. (1999) Inventing the Internet. Cambridge, Mass: MIT Press.
Alderman, J. (2001) Sonic Boom: Napster, MP3, and the New Pioneers of Music. Cambridge, Mass: Perseus Pub.
Amorim, R. (2007) ‘XingSound MP2 Player’ [Online]. Really Rare Wares. Available: http://web.archive.org/web/20070214091726/http://www.rjamorim.com/rrw/xingsound.html [Accessed 8 December 2007].
Beadon, L. (2017) ‘The MP3 Is Dead, Long Live the MP3’ [Online]. Hypebot. Available: http://www.hypebot.com/hypebot/2017/05/the-mp3-as-dead-as-pepe-the-frog.html [Accessed 18 May 2017].
Blattberg, E. (2014) ‘The Podcast Enters a New Golden Age’. [Webpage]. Digiday Nielsen. Available: http://digiday.com/publishers/nielsenes-rise-podcast/ [Accessed 12 February 2016].
Bolter, J. D. & Grusin, R. A. (1999) Remediation: Understanding New Media. Cambridge, Mass: MIT Press.
Bottomley, A. (2016) Internet Radio: A History of a Medium in Transition. PhD Dissertation, University of Wisconsin-Madison, Department of Communication Arts.
Brügger, N. (2009) ‘Website History and the Website as an Object of Study’. New Media & Society, 11(1&2): 115–32.
Burkart, P. & McCourt, T. (2006) Digital Music Wars: Ownership and Control of the Celestial Jukebox. Lanham, Maryland: Rowman & Littlefield Publishers.
Carey, J. & Elton, M. C. J. (2009) ‘The Other Path to the Web: The Forgotten Role of Videotex and Other Early Online Services’. New Media & Society, 11(1–2): 241–60.
Driscoll, K. (2016) ‘Social media’s dial-up roots’. IEEE Spectrum, 53(11): 54–60.
Driscoll, K. & Diaz, J. (2009) ‘Endless Loop: A Brief History of Chip Tunes’, Transformative Works and Culture, 2(1). Available: http://journal.transformativeworks.org/index.php/twc/article/view/96/94 [Accessed 21 June 2017].
Flanagan, A. (2017) ‘The MP3 Is Officially Dead, According To Its Creators’ [Online]. NPR. Available: http://www.npr.org/sections/therecord/2017/05/11/527829909/the-mp3-is-officially-dead-according-to-its-creators [Accessed 13 May 2017].
Friedman, T. (2005) Electric Dreams: Computers in American Culture. New York, NY: New York University Press.
Gillespie, T. (2007) Wired Shut: Copyright and the Shape of Digital Culture. Cambridge, Mass: MIT Press.
Gitelman, L. (2006) Always Already New: Media, History And The Data Of Culture. Cambridge, Mass: MIT Press.
Gruberman, K. & McQuillin, K. (1991) ‘Multimedia and Audio’. MacUser. Ziff-Davis.
Grunin, L. (1991) ‘Utility Coaxes Cd-Audio Out Of Your Rom Drive’. PC Magazine. Ziff Davis.
Hammersley, B. (2004) ‘Audible Revolution’. The Guardian, February 12.
Haring, B. (2000) Beyond the Charts: MP3 and the Digital Music Revolution. Los Angeles, Calif.: JM Norther Media LLC.
Hauben, M. & Hauben, R. (1997) Netizens: On the History and Impact of Usenet and the Internet. Los Alamitos, Calif.: IEEE Computer Society Press.
IFPI (2017) IFPI Global Music Report 2017. London: International Federation of the Phonographic Industry.
Katz, M. (2004) Capturing Sound: How Technology Has Changed Music. Berkeley, Calif.: University of California Press.
Kirschenbaum, M. G. (2016) Track Changes: A Literary History of Word Processing. Cambridge, Mass: The Belknap Press of Harvard University Press.
Madrigal, A. C. (2012) The Mechanics and Meaning of That Ol’ Dial-Up Modem Sound. The Atlantic [Online]. Available: https://www.theatlantic.com/technology/archive/2012/06/the-mechanics-and-meaning-of-that-ol-dial-up-modem-sound/257816/ [Accessed 22 September 2017].
Mann, Hugh. (1998) ‘The Internet from Payphones’ [Online]. Web Page. Available: http://www.wrybread.com/WryRoad/gadgets/coupler.htm. [Accessed 16 February 2017]
Manning, Peter. (2004) Electronic and Computer Music. Oxford; New York: Oxford University Press.
Markoff, J. (2006) What the Dormouse Said: How the Sixties Counterculture Shaped the Personal Computer Industry. New York, NY: Penguin Books.
Milligan, I. (2012) ‘Mining the “Internet Graveyard”: Rethinking the Historians’ Toolkit’. Journal of the Canadian Historical Association, 23(2): 21–64.
Morris, J. W. (2015) Selling Digital Music, Formatting Culture. Berkeley, Calif.: University of California Press.
Museum of Endangered Sounds [Online]. Available: http://savethesounds.info [Accessed 20 January 2017].
Ness, L. (1993) ‘Xing Technology Ships XingSound MPEG Audio Compression Software’ [Online]. Business Wire 12 October.
Pardes, A. (2016) ‘Where the Sounds of Your Childhood Go to Rest’ [Online]. Medium – Mel Magazine. Available: https://melmagazine.com/where-the-sounds-of-your-childhood-go-to-rest-28e0812bbe52 [Accessed 15 May 2017].
Parks, L. (2007) ‘Falling Apart: Electronics Salvaging and the Global Media Economy’. In: Acland, C. R. (ed.) Residual Media. Minneapolis, Minn: University of Minnesota Press: 32–47.
Petzold, C. (1991) ‘Putting Sound on PCs: An Introduction to Waveform Audio’. PC Magazine. Ziff Davis.
PEW. (2013) ‘3% of Americans Use Dial-Up at Home’ [Online]. PEW Research Center. Available: http://www.pewresearch.org/fact-tank/2013/08/21/3-of-americans-use-dial-up-at-home/ [Accessed 15 May 2017].
Roose, K. (2014) ‘What’s Behind the Great Podcast Renaissance?’ New York Magazine, 30 October.
Rothenberg, R. (1999). ‘Rob Glaser, Moving Target’. Wired, 7(8): 126–33.
Schafer, R. M. (1977) The Tuning of the World. New York, NY: Knopf.
Scott, J. (Director) (2005) BBS: The Documentary.
Sittenfeld, C. (1998) ‘He’s the Voice of the Net Generation’ [Online]. Fast Company. Available: https://www.fastcompany.com/35450/hes-voice-net-generation%7D [Accessed 20 January 2017].
Sterne, J. (2007) ‘Out With The Trash: On the Future of New Media’. In: Acland, C. R. (ed.) Residual Media. Minneapolis, Minn: University of Minnesota Press: 16–31.
Sterne, J. (2010) ‘Rearranging the Files: On Interpretation in Media History’. The Communication Review, 13(1): 75–87.
Sterne, J. (2012) MP3: The Meaning of a Format. Durham, NC: Duke University Press.
Sterne, J., Morris, J. W., Baker, M. & Moscote Freire, A. (2008) ‘The Politics of Podcasting’. Fibreculture, (13). Available: http://thirteen.fibreculturejournal.org/fcj-087-the-politics-of-podcasting/ [Accessed 13 December 2008].
Suhr, C. (2012) Social Media and Music: The Digital Field of Cultural Production, New York, NY: Peter Lang.
Théberge, P. (1997) Any Sound You Can Imagine: Making Music/Consuming Technology. Hanover, Conn: Wesleyan University Press.
Turner, F. (2005) ‘Where the Counterculture Met the New Economy: The Well and the Origins of Virtual Community’. Technology and Culture, 46: 485–512.
Venkatesh, A. (1996) ‘Computers and Other Interactive Technologies for the Home’. Communications of the ACM, 39(12): 47–54.
Venkatesh, A. & Vitalari, N. (1987) ‘A Post-Adoption Analysis of Computing in the Home’. Journal of Economic Psychology, 8: 161–80.
W3C. (2000a) ‘Information by Subject’ [Online]. World Wide Web Consortium. Available: https://www.w3.org/History/19921103-hypertext/hypertext/DataSources/bySubject/Overview.html [Accessed 20 January 2017].
W3C. (2000b) ‘A Little History of the World Wide Web’ [Online]. World Wide Web Consortium. Available: https://www.w3.org/History.html [Accessed 20 January 2017].
willterminus. (2008, 9 November). ‘The Sound of Dial-Up Internet’. YouTube. Available: https://www.youtube.com/watch?v=gsNaR6FRuO0 [Accessed 9 January 2018].
34
Memes
Jim McGrath

INTRODUCTION

In digital contexts, the word meme has been used to describe a wide range of media: text, photographs, video clips, audio files, image macros, GIFs. Broadly speaking, a digital object is often labeled an ‘internet meme’ once it begins to spread (often quickly, or ‘virally’, to embrace a popular metaphor) across various social media networks and digital publication platforms, especially when the object is seen moving far beyond the network of the individual who first circulated it. In recent years it has become more conventional to see ‘internet memes’ more generally referred to as ‘memes’ in informal discussions, media commentary, and academic scholarship. The erasure of ‘internet’ in these contexts suggests that the term may feel redundant or unnecessary; it seems taken for granted that discussions of memes inevitably refer to ‘the internet’. Echoing scholars of web history who ask us to consider the political dimensions of where, how, and why the internet is described and defined (Abbate, 2017), this chapter is interested in where, how, and why the metaphor of the meme is used to describe particular modes of expression in digital contexts with a specific focus on the web.

Memes (as they were explicitly called by certain users) began circulating on the newsgroups of Usenet in the early 1990s. The meme metaphor continued to be invoked in the 2000s, mainly to describe popular jokes and references circulated by commenters on forums like Something Awful and blogs like Metafilter. But memes were a concept that didn’t often travel far beyond these smaller communities and coteries. The first English-language Wikipedia article for the term ‘internet meme’ was created in March 2005; until December 2006 the page was simply a stub that redirected readers to the article for ‘internet phenomena’ (CesarB, 2005). The words ‘meme’ and ‘internet meme’ are absent from Time magazine’s 2006 ‘Person of the Year’ cover story on ‘Web 2.0’ and its
users (Grossman, 2006). Their ubiquity on (and beyond) the web in the second decade of the twenty-first century is a fairly recent development.

The absence of memes from these texts and contexts is a useful reminder of how rapidly the web has changed over a short period of time. Whereas in the mid 1990s we saw a ‘desire among users to situate themselves on the web’ via creating personal web sites on GeoCities and elsewhere (Milligan, 2017: 138), many twenty-first-century users instead gravitate towards the networks and interfaces of social media networks. Ian Milligan highlights the ‘geographical community metaphor’ governing GeoCities and similar uses of spatial language to describe and ‘map’ the terrain of the web during this period (2017: 138). While online community formation remains a key interest of many web users, these particular manifestations of spatial and geographic resonances are arguably muted in the constructions of later communities through social media, communities tied to particular mobile apps, and other kinds of group and identity formations on the twenty-first-century web. Memes have been read as essential components in these more recent efforts of community formation, in that their arguments, materials, and uses have been shown to construct, critique, and reimagine ideas of normative and subversive online behavior (Gal et al., 2016).

Memes are seemingly everywhere: they have come to dominate the online discourse of social media networks, they receive extensive press coverage on news and entertainment web sites and blogs, they have been appropriated by advertising and marketing firms across the globe, they are the subjects of serious critical inquiry by scholars and archivists, and they have even warranted direct commentary (and even re-use) by politicians and heads of state. Generally speaking, memes seem to have particularly flourished during what Niels Brügger calls the ‘third wave’ in the development of ‘the nexus between the Web, social media, and mobile media’, a period ‘characterized by the emergence of new social media that are born on and with mobile media, and that are app-based only, most notably Instagram, but also Snapchat and Tinder’ (2016: 1064). Memes are arguably the digital objects we most frequently associate with online behavior during this particular period. How did we come to inhabit what Ryan Milner (2016) calls ‘the world made meme?’.

To answer this question, this chapter will consider the long history of memes, focusing particularly on the earliest forms of memes to circulate on the web. It argues that an understanding of the aesthetics, uses, and reception of memes in the twenty-first century benefits from examining the earliest examples of memes. But it also suggests that highlighting these dimensions of earlier memes might also improve our understanding of web history, by highlighting both recurring traits in online behavior as well as changes in behavior that reflect the transformation of how we access, inhabit, and imagine the web in our contemporary moment. At our current moment in the twenty-first century, we see memes that frequently travel back and forth with relative ease between and across the web, social media, and mobile media. A closer look at what memes are being created and circulated and where they travel can teach us much about the ways the web is utilized, populated, and imagined by particular individuals at specific historical and cultural moments in time.

In other words, while memes might often be described as freely or ‘virally’ circulating rapidly across and beyond the web, we might also see in them traces of constraints, conventions, and communities from the web’s past. Some of these traces materialize in the form of particular points of reference from these earlier moments: images and texts popular among smaller online communities that are now familiar to much larger audiences. For example, ‘Godwin’s Law’, a meme created in 1991 by Usenet user Mike Godwin, continues to circulate on news web sites, tweets, and Facebook comments in the twenty-first
century. Why does ‘Godwin’s Law’ continue to resonate on the web for particular users? Does the meme have value for particular demographics of users, and do these demographics change over time? When considering why particular memes hold the attention of various web users at various points in time, we would do well to remember who does and does not have access to particular technologies, cultural references, and other dimensions and conventions of online discourse. How do the earlier communities of Usenet compare to the communities of users sharing ideas across the web, social media, and mobile apps in later contexts?

Looking at earlier memes can also highlight the ways that the material dimensions of knowledge production have changed over the history of the web. ‘Godwin’s Law’, which we will discuss in greater detail later, is a meme created in static text and then distributed through the particular network of Usenet. The material and networked dimensions of Godwin’s Law are visible in its successful reception beyond Usenet on the web at large and in other contexts like print: this particular meme can be physically replicated and remediated with relative ease. Other meme formats, like GIFs, image macros, or short videos circulated on the social media app Vine, become popular among certain users in part because of limitations in technology: these formats came to be preferred by users because they were easy to create and circulate on the parts of the web and social media that they inhabited at various points in time. And we also see memes shaping the material affordances of particular online spaces when they travel across networks: for example, in recent years Twitter, Facebook, and other social media networks have made it easier for users to share, upload, and display memes in GIF and video formats to meet demands among users related to memes.

This chapter brings recent examples of internet memes on Twitter and Facebook into conversation with earlier investments in the meme metaphor in digital contexts like Usenet and internet message boards. It draws parallels and identifies points of comparison between early ideas and uses of memes and more recent iterations, but it also calls attention to the differences in the particular material and historical conditions, rhetorical strategies, and social uses of these particular networks and the different kinds of memes created and circulated there. The history of memes is a history of technology, in the sense that the creation and dissemination of memes over the last three decades is greatly informed by the digital tools, sites of publication, and devices enabling users to read, create, share, and remix these materials. The speed at which devices, editing software, and publication platforms have become more readily available and less expensive has resulted in a significant increase in producers and consumers of internet memes. A decelerated, deliberately detail-oriented approach to the reading and curation of memes might help present and future web historians attend to what is getting lost in these rapid exchanges of cultural information.

WHAT IS A MEME?

In The Selfish Gene, Richard Dawkins searched for a more precise way to meet ‘the formidable challenge of explaining culture, cultural evolution, and the immense differences between human cultures around the world’ (1989: 191). Analogies ‘between cultural and genetic evolution’ favored by other scientists and sociologists seemed, to Dawkins, to rely too much on an interpretation of Darwin that went ‘looking for “biological advantages”’ in their justifications of the longevity of certain cultural practices (1976: 191). Dawkins believed that replication and imitation in cultural behavior (like the frequency of the belief in the existence of God) were not necessarily due to ‘genetic advantages’ in the brains capable of creating and subscribing to those ideas (1976: 191). He searched for ‘a name for the new replicator, a
new noun that conveys the idea of a unit of cultural transmission, or a unit of imitation’, a term that might rhyme with ‘gene’ to playfully acknowledge the influence of Darwinian thinking but might also afford a kind of critical inquiry that is not dominated by its reasoning (1976: 192). He settled on ‘meme’.

With the idea of the ‘meme’ as a framing device, Dawkins invites us to use it to think about the shared traits in disparate things like ‘tunes, ideas, catch-phrases, clothes fashions, ways of making pots or of building arches’ (1976: 192). Dawkins unites these texts, objects, and referents under the rhetorical framework of the meme by noting the ways they are all continuously ‘leaping from brain to brain via a process which, in the broad sense, can be called imitation’ (1976: 192). He classifies memes as possessing three qualities: ‘longevity, fecundity, and copying-fidelity’ (1976: 194). While he provides this list of traits to his readers, Dawkins sees them more as a ‘speculative’ framework than a set of descriptive mandates (1976: 199). He is reluctant to explicitly designate ‘what a single unit-meme’ is (or is not): an entire song, a lyric, a brief section of music, a particular performance. He is more interested in the circumstances that lead a particular meme to ‘dominate the attention of a human brain’ (1976: 197). Milner argues that ‘it may be better to diminish the connections between Dawkins’ argument and participatory media’, given the differences between the idea of the meme outlined in The Selfish Gene and the connotations this descriptive lens has generated in digital contexts (2016: Loc. 3968). On the other hand, Dawkins himself has noted that the meaning of the word ‘meme’ in the particular context of the internet ‘is not that far away from the original’ (Solon, 2013). In fact, Dawkins seems to have embraced these connotations: for example, he performed in a ‘theatrical piece’ about memes organized by the ad agency Saatchi and Saatchi at the 2013 Cannes Advertising Festival (Solon, 2013).

Most book-length projects with ‘meme’ in the title have focused on memetics in a general sense and are primarily intended for STEM audiences, though this research has at times considered and intersected with the more particular topic of internet memes as well as academic investments in humanistic research. For example, Susan Blackmore’s The Meme Machine has a chapter titled ‘Into the Internet’, in which she considers the impact of new forms of ‘replication machinery’ within the larger narrative of ‘the evolution of meme replication’ (1999: 205). In the same way that ‘memes took a great step forward when they got into books’ with the advent of the printing press, the internet and its comparatively inexpensive publishing mechanisms are situated as key factors in enabling more forms and modes of reception, duplication, and distribution of memes (1999: 209–10). On the other hand, Blackmore seems more interested in ‘traditional’ forms of replication that aspire to exact or precise copies in digital publication and communication platforms. For example, she describes the benefits of ‘bots’ designed with artificial intelligence as ‘small and simple units that together do clever things’ (1999: 217). These programs might be designed to resemble ‘insects laying chemical trails’ and put to work on ‘error-correction tasks or censorship duties’ (1999: 217). Replicating mechanisms are becoming more prevalent in online discourse. For example, we have seen YouTube, Facebook, and other sites automate processes to regulate the circulation of content in ways that address concerns about copyright protection, and we find traces of bots dedicated to mundane formatting tasks in Wikipedia’s edit histories. But generally speaking, bots are frequently viewed as distinct from ‘memes’ in terms of their perception and social uses.

GODWIN’S LAW

Despite the productivity of researchers in the emerging field of memetics, one of the most memorable experiments in ‘memetic
engineering’ did not take place in a university lab or require research funding (Godwin, 1994). In the early 1990s, lawyer Mike Godwin was a frequent contributor to Usenet, an online hub of various ‘newsgroups’ dedicated to text-centric discussions about a host of topics. Godwin had grown tired of the prevalence of what he called the ‘Nazi-comparison meme’, a rhetorical move that Godwin believed ‘trivialized the horror of the Holocaust and the social pathology of the Nazis’ in its frequent invocation on various newsgroups (1994). He channels these frustrations into the creation of ‘Godwin’s Law’, a ‘counter-meme’ situated as a call to arms, a critique that its author hopes will compel ‘net.dwellers [sic] to make a conscious effort to control the kinds of memes they create or circulate’ (1994).

In an earlier version of the law posted to the rec.arts.sf-lovers newsgroup on August 18, 1991, ‘Godwin’s Rule of Nazi Analogies’ states that ‘As a Usenet discussion grows longer, the probability of a comparison involving Hitler or Nazis approaches one’ (Godwin, 1991). The rule became a ‘law’ due to the frequency with which the Usenet community seemed to inevitably replicate this behavior, whether it was in a discussion of the allegedly homophobic views of science fiction author Orson Scott Card (as it was in this instance) or in a 1999 thread about the benefits of bicycling in pdx.general, a Portland (Oregon) newsgroup, to take one example at random from the Usenet archives. Beyond his sense that these comparisons to Hitler or the Nazis do a disservice to the historical specificity of these atrocities when they are invoked to discuss comparatively mundane subjects, Godwin desires a level of intellectual and rhetorical rigor that meets his image of a ‘Net’ that is ‘filled with diverse critical thinkers’ (1994).

Godwin explicitly situates his law within the context of memetics in a 1994 essay published in Wired magazine. He embraces the ‘viral’ resonances of the metaphor: while he commends a meme’s ability to ‘crystallize whole schools of thought’ and create a shared vocabulary and shorthand for conversation, Godwin is more immediately concerned with the limits of consensus-oriented online discourse. ‘[V]iral memes are capable of doing lasting damage’, he warns, and he wonders if there is a ‘moral imperative’ or ‘obligation to improve our information environment’ that might drive the creation and circulation of additional ‘counter-memes’ on the web (1994). Godwin suggests that memes might also be sites of critique and resistance, spaces where both familiar topics and the conventions of discourse might be subverted or productively critiqued.

Originality often has less to do with a meme’s successful replication than its authors or inventors would like us to believe. In fact, several factors likely contributed to the popularity of Godwin’s Law on Usenet and beyond this original site of publication. The frequency of the Nazi comparison had been flagged by other users before Godwin: a list of ‘FAQs’ (Frequently Asked Questions) about Godwin’s Law highlights a comment by Richard Sexton on October 16th, 1989, noting that ‘You can tell when a USENET discussion is getting old when one of the participents [sic] drags out Hitler and the Nazis’ (Skirwin, 1999). Godwin’s framing calls attention to itself, is part playful and part challenge, inviting readers to test its veracity, whereas Sexton’s (misspelled) remark is delivered in an offhand manner and quickly forgotten. Many successful memes adopt rhetorical strategies that have proven successful in other digital and non-digital contexts: for example, the framing of Godwin’s Law might be productively compared to philosophical or legal language as well as the branding strategies of advertisers. And, like a successful advertising campaign, ‘Godwin’s Law’ likely spreads in part because its author is both committed to circulating its message across various Usenet newsgroups and in possession of the time, technology, and additional resources needed to maintain a presence on this digital communication network.
In Godwin’s recollection of the rise of ‘Godwin’s Law’, he notes that he ‘seeded’ his observation ‘in any newsgroup or topic where I saw a gratuitous Nazi reference’ and then quickly found that other people picked up on it, even occasionally forming ‘corollaries’ about Nazi-related discussions or trends (1994). In his version of the story, Godwin invents the meme, introduces it to particular discussion threads, it is accepted and embraced by other users, and it is quickly adopted and redistributed. Godwin narrates a story about control and invention: he is the conductor of an experiment, the originator of an idea, the designer and orchestrator of a rhetorical campaign. His history is one that places his own authority and agency at the center, a retelling of events that renders the success of his meme predictable, a logical outcome that confirms his perceptive interpretation of the ways information circulates on Usenet. While he seems to accept that its spread was ‘far beyond my control’, Godwin does seem to enjoy the ability to dictate the terms of the meme’s narrative in the pages of Wired and elsewhere (2008).

Based on the attention it received beyond Usenet in North American news media, Godwin’s Law is a useful context for understanding how the language of memetics became prevalent in descriptions of modes of digital behavior. That being said, there are other ways to begin a ‘pre-history’ of memes. For example, Patrick Davison suggests that the introduction of the emoticon marker ‘:-)’ to Carnegie Mellon’s bulletin boards in 1982 might be a generative point of origin for the history of memes, given their popularity on social media networks and in text messaging (2012: 124). ‘Emoticons are a meme’, Davison argues, and they are a particularly popular and malleable kind of meme (2012: 124). Read through the lens of memetics, and the early emoticon certainly seems like a meme, given the frequency with which it can be (and still is) quickly composed and replicated on various devices. Davison notes how this particular method of ‘contextualizing one’s written messages with an emoticon to indicate emotional intent’ has become a staple of online communication, most visibly in the ubiquity of emojis and emoji keyboards on mobile devices (2012: 124). The uses and forms of emoticons have changed with the introduction of newer digital publication platforms and networks, and other digital material has served to accentuate the emotional or rhetorical weight of digital content.

Since ‘Godwin’s Law’, we have seen similar claims that attempt to combat and critique conventions of online behavior by calling them out as clichéd, illogical, or otherwise unwanted. The idea of ‘jumping the shark’ – the moment when a once-popular serialized work (generally fiction) takes a turn that vocal parts of its audience find absurd or illogical – originated in a University of Michigan dorm room conversation about one of Arthur Fonzarelli’s waterskiing mishaps in an episode of Happy Days (Fox Jr., 2010); it spread beyond these walls in 1997 when Jon Hein, one of the undergrads participating in this conversation, created jumptheshark.com, a popular web site dedicated to documenting these moments in the history of television (Fox Jr., 2010). Similarly, ‘The Bechdel Test’, a checklist of film tropes designed to expose how frequently the stories of women in films were dominated by the male figures in their lives, began as a conversation between two characters in a 1985 installment of Alison Bechdel’s Dykes To Watch Out For comic serial; it eventually found a wider audience when ‘lefty blogs’ began referencing it, leading to its reappearance and redistribution in digital form via a scan posted to the artist’s blog (Bechdel and Cathy, 2005). Bechdel and Hein do not call their observations ‘memes’, but they resonate in many ways with Godwin’s sense of the ‘counter-meme’ (and have been classified as memes by the popular Know Your Meme web site). These phrases and their contexts quickly spread and reappear across blogs and message boards, in print publications, on
television programs dedicated to popular culture, in offline conversation.

Early internet memes like Godwin’s Law, the Bechdel Test, and ‘jumping the shark’ are all text-based, enabling them to easily and quickly circulate across various discussion boards and web sites. Text-centric memes circulate on social media networks and are sometimes transformed by the technical affordances available there: Instagram and Twitter users may turn static memes into hyperlinked hashtags, for instance.

Some scholars have noted the ways in which memes seem to function as ‘signatures of topics and events [that] propagate and diffuse over the web’ (Leskovec et al., 2009). Memes like ‘jumping the shark’ quickly became ‘networked’ in digital contexts, in that they could be hyperlinked in certain sites of publication to sites providing additional context, or users could often run searches to find additional use-cases. But the ubiquity of hashtags on platforms like Twitter makes the networked dimensions and the potential for replication more visible and explicit, and the aesthetic dimensions of digital interfaces inviting users to create, recirculate, or click on these networked memes has certainly shaped their reception and popularity.

DANCING BABIES

In 1996, LucasArts employee Ron Lussier was able to ‘fix up’ a 3D animation of a dancing baby found on Character Studio, a digital animation tool (Lussier, n.d.). After showing his results to some co-workers, Lussier got some requests to email the ‘Dancing Baby’ file, and he soon found that ‘people…had received it back again from people outside the company, across the country’ (Lussier, n.d.). The animation originally circulated without a soundtrack, though a version in which the baby appears to be dancing to the song ‘Hooked on a Feeling’ as performed by the band Blue Swede in 1974 was arguably the most popular dancing baby of them all, serving as the inspiration for the ‘Dancing Baby’ that appeared as a hallucination in several episodes of the FOX sitcom Ally McBeal in 1998. Dancing babies would go on to appear in advertisements, as GIFs, and in other digital and commercial contexts, though Lussier was only directly involved in and compensated for a small number of these appearances.

The ‘Dancing Baby’ meme primarily circulated via email, and while it could be consumed, it was difficult to revise: alterations required a technical skill level and a familiarity with both digital animation and distribution that limited who could participate in the creation of meme variations. Lussier created a GIF from his ‘original’ dancing baby in 1996 to enable the smaller file to more widely and freely circulate via email and on web sites, but it would have been difficult for many internet users to remix even this file. Lussier clearly had conflicted feelings about the forms of distribution and uses of his ‘Dancing Baby’ files. A page concerning ‘Copyright Info’ on his company’s web site notes that ‘the original net-baby…has become a “loose property” (not a legal definition!!), by way of being many peoples [sic] work, and because it was a great promotional phenomenon’ (Lussier, n.d.). Despite his own admitted unfamiliarity with legal terms and protections concerning copyright in digital contexts, Lussier warns against ‘reformatting, cropping, using, or otherwise modifying someone else’s work without written permission’ and claims that his files can only be distributed ‘for private use’ (Lussier, n.d.). ‘Only stills of the dancing baby may be displayed on a private, non-commercial web page’, he cautions, and these snapshots must be republished alongside the web address of Lussier’s company, Burning Pixel Productions (Lussier, n.d.). Newer incarnations and variations of the ‘Dancing Baby’ created by Lussier also feature large reproductions of his URL in their JPEG and GIF versions.
The solutions Lussier comes up with to address perceived failures to properly, ethically, or even legally compensate him for his work on the ‘Dancing Baby’ meme may seem inelegant or even incorrect to some contemporary readers, but they do suggest the ways in which claims for copyright or ownership of materials can impact the creation, distribution, and study of internet memes. For example, while Twitter’s Terms of Service note that the social media network ‘will respond to notices of alleged copyright infringement’, it also notes, on a page concerning its interpretation of ‘Fair Use’ and similar global concepts governing ‘uses of copyrighted material [that] may not require the copyright owner’s permission’, that ‘there is no clear formula’ for determining strict guidelines informed by this concept (2016). Other social media networks are more proactive in policing users who may willingly or unwittingly engage in copyright violation: for example, many users of Facebook’s ‘Facebook Live’ video streaming tool have discovered that the network will disrupt or refuse to publicly archive material checked ‘against files in [its] Rights Manager Reference library’ (Brogan, 2016). While it is easy for many users to download, screenshot, edit, and re-upload memes to social media on mobile and desktop devices, networks that choose to enforce copyright through automated modes of surveillance or compliance checks in the name of intellectual property may limit the abilities of users to create memes or content that may one day be re-used or appropriated itself for the purposes of meme creation.

ALL YOUR BASE ARE BELONG TO US

As it became comparatively easier to create, edit, and redistribute a wider range of media on web sites, beyond text, message boards, and forums, internet memes began to take on increasingly multimodal dimensions in the late 1990s. Message boards became areas online where contributors could embed or link to media files found elsewhere on the web, sometimes even images of their own invention. While Usenet survived as a communications platform into the early years of the twenty-first century, other digital hubs for conversation were emerging, creating their own communities, modes of online discourse, and, of course, memes. In the late 1990s and early 2000s, forums hosted on sites like Fark and Something Awful became hotbeds of activity where global networks of users would distribute and dissect various forms of popular culture, solicit and provide advice on personal matters, tell jokes, and seek out kindred spirits. While these digital spaces often looked ‘unprofessional’ and lo-fi compared with the sleeker output of commercial organizations, marketers, news media outlets, and scholars of popular culture paid attention to what was happening in spaces like Something Awful when they noticed the popularity of content created there. It is for this reason that institutions like the US Library of Congress have taken an interest in archiving and curating the records of these conversations, which they view as forms of ‘digital folklore’ essential to collections that ‘reflect contemporary traditional culture on the web and beyond’ (Saylor, 2014). In many respects, the aesthetic and stylistic conventions and formalist dimensions of internet memes as they came to be known in the twenty-first century began in these discussion boards.

When the phrase ‘All Your Base Are Belong To Us’ began circulating in various online and offline contexts at the start of the twenty-first century, journalists quickly discovered that the origins of the phrase could be found within these newer iterations of internet message boards. As recounted by Wired reporter Jeffrey Benner, ‘All Your Base Are Belong To Us’ is a phrase from a 1992 English translation of a ‘cut scene’ describing the world of a mostly forgotten 1989 Toaplan video game named Zero Wing (2001). These cut scenes
were extracted and compiled in an Adobe Flash animation file, which then began circulating on forums like Metafilter, Something Awful, and memepool, among many others (Benner, 2001). One popular video compilation anticipated the spread of ‘All Your Base Are Belong To Us’ beyond digital contexts, mixing the robotic voice from Zero Wing into club-ready techno mixes, photoshopping the text onto street signs, fortune cookies, comic strips, even the mugshot file photo of O.J. Simpson (Bad-CRC, 2008). Benner meets the popularity of the phrase’s ‘mutations and gossip’ across internet chat rooms with some skepticism, but he also acknowledges that the meme has outperformed ‘[a]rmies of marketers toiling for years’ and finds its success ultimately indicative of ‘a medium that refuses to be tamed into predictability’ (Benner, 2001).

THE GROWING FIELD OF MEME STUDIES

In the years following the initial spread and popularity of ‘All Your Base Are Belong To Us’, memes appear to have become a familiar and recognizable medium to marketers and many other content creators on the web. Scholars interested in new media, sociology, internet culture, and other fields and research interests have begun to describe their now-predictable conventions, their common revisions, their reception histories, and their various capacities for redistribution. For example, MIT Press has published two book-length surveys of internet memes in recent years: Limor Shifman’s Memes in Digital Culture (2012) and Ryan Milner’s The World Made Meme: Public Conversations and Participatory Media (2016). Bradley Wiggins and Bret Bowers also attempt a ‘tenable genre development of Internet memes’ (2015: 1903). These projects are perceptive surveys of patterns in meme creation, consumption, and distribution. In these studies, memes are frequently evaluated and even venerated for their role in forming communities and social bonds in digital spaces, and for their status as important artifacts in identity formation. For example, Shifman argues that memes are ‘rooted in economic, social, and cultural logics of participation’, forces that draw our attention ‘not only on the texts’ that comprise individual units of memes but also ‘the cultural practices surrounding them’ (2012: 33–4). Social networks may invite performances of ‘accelerated individualism’, but these performances ‘[demonstrate] an enduring human longing for communality’ that pre-dates the web (2012: 33). Shifman further argues that memes play a vital role in fostering more varied forms of community formation; to her, the most interesting memes reveal ‘a new arena of bottom-up expression’ on the web, a world wide web whose occupants ‘can blend pop culture, politics, and participation in interesting ways’ (2012: 4).

Milner similarly highlights the variety of ways memes are created and deployed to meet a wide range of ‘communicative ends’; they can circulate in networks and contexts that can be ‘vastly public, communally social, or intimately interpersonal’ (2016: Loc. 510). For Milner, memes are the coin of the realm in popular spaces of ‘participatory media’ like social networks: these are digital environments where ‘it’s relatively cheap and easy to make a statement, remix a text, or spread an idea’ (2016: Loc. 429). Milner argues that all memes are ‘made by collective practices more substantial than any individual text’, and his book applies a particular grammar and inventorying mechanism to memes in an attempt to draw connections between the range of motives, identities, and communities of ‘cultural participants’ involved in making memes, as well as the varied multimodal dimensions, methods of appropriation (and, as Milner puts it, ‘reappropriation’), and distribution networks involved in their spread and reception (2016: Loc. 970).

Shifman and Milner invent terms and methods of classification that may prove
useful to scholars interested in these forms of digital media. They are more critical lenses than comprehensive surveys of the field, invitations to apply observations about general methods of creation, reception, and distribution to specific memes and their attendant communities of reception and socio-historical contexts. Both surveys are geographically and temporally limited: Milner notes, for instance, his book ‘is almost entirely situated in Western contexts, most specifically English-language contexts within the United States’ (2016: Loc. 366). General surveys of internet memes also tend to be preoccupied with the material conditions and forms of memes that are popular in their historical moment of composition. Wiggins and Bowers remind us that ‘memes are a developing genre of communication’ (2015: 1892), but in doing so they also suggest that case studies of internet memes in their particular historical, cultural, and digital contexts might yield more compelling insights than attempts to unify and classify memes in such a generalized fashion.

The particular memes selected as examples or case studies by authors in scholarly editions or monographs are inevitably dated. Shifman’s book begins with memes related to ‘Gangnam Style’, a 2012 song by Korean pop artist Psy with an accompanying music video that, at the time of Shifman’s writing, ‘was the first [YouTube video] to surpass the one-billion-view mark’ (2012: 1). Milner starts his discussion with memes related to American hip-hop artist Kanye West’s interruption of fellow celebrity musician Taylor Swift at MTV’s 2009 Video Music Awards, a televised moment that led to ‘a flurry of “Kanye Interrupts” remixes’ across social media and entertainment news sites once it was re-broadcast and discussed by news, entertainment, and social media outlets (2016: Loc. 486). These memes both rely on content distributed by corporate-driven media channels (MTV and YouTube) and feature individuals familiar to consumers of traditional outlets of global news media. They are both memes that arguably oversaturated the press and social media outlets they circulated in: their impact was transformed by the speed with which they were championed and then discarded. Encyclopedic indexes and aggregators of memes (like Know Your Meme or BuzzFeed) that are primarily circulated as web sites can avoid being fixed in amber by publication deadlines and the material limitations of print media. On the other hand, these digital publications can still quickly become outdated themselves if they fail to secure audiences, contributors, web designers, and revenue streams, among other factors.

Some scholars have tended to disassociate internet memes from their particular temporal and material dimensions in favor of highlighting the ‘spread’ of the meme across time and space. Jenkins et al. eschew periodization in favor of a ‘top down to bottom up’ or ‘grassroots to commercial’ narrative of viral media like internet memes (2015: 1). ‘If it doesn’t spread, it’s dead’ is the shorthand Jenkins uses for determining ‘value and meaning’ in twenty-first-century digital contexts. Nooney and Portwood-Stacer argue that ‘In meme culture, flow takes primacy over origin, as the creator of an object and even the conditions in which it was made often remain unknown to the legions of users who remix it and pass it on’ (2014: 2). There are certainly instances where the ‘flow’ or ‘spread’ of an internet meme remediates it in surprising or troubling ways beyond its original context, and creators of memes (or the material appropriated for the creation of a meme) often lack a degree of agency or control over where and how their content is re-used. The materiality of memes varies greatly: memes can be snippets of text, multimodal works, photos, screenshots, digitally edited images, cartoons, found footage. And it is true that what we know about their authors and audiences can also vary greatly: some internet memes have citational information, while others circulate anonymously or travel across networks populated by pseudonyms, and some are even rendered unrecognizable to their original authors.
MEMES 515

But origins and conditions in which internet memes are first created are not always so hidden
or obscured, thanks to archivists and scholars invested in highlighting, collecting, and
preserving these materials.

What are the ethical dimensions worth considering when individuals who are not public figures
find themselves starring in memes? Moya Bailey has discussed the benefits and challenges of a
methodology of ‘collaborative consent’ in surveys of digital culture, a perspective that allows
subjects of analysis the right to refuse to be discussed or named in scholarship (2015). While
these refusals may create gaps or silences in a scholar’s discussion of social media activity,
Bailey argues that ‘citation itself may be the thing that creates the harm to the community’
(2015). Debates surrounding the presence, absence, redaction, and remediation of citational data
in memes are indicative of the particular challenges facing scholars interested in contemporary
digital media. Milner claims that in his own survey of memes, ‘finding their creator and site of
origin is largely impossible, and arguably inconsequential when considering how they resonate’
(2016: Loc. 531). While it is true that it is at times difficult to identify points of origin
and creators of memes, knowledge of the original sites of publication and distribution of memes,
even if some of that information is incomplete, can yield important insights about online
discourse, its various communities and coteries, and where these groups overlap and intersect.

Additionally, the visibility and presence of a diverse and varied set of perspectives within the
global web can be muted by generalized discussions of memes and their uses. For example, Laur M.
Jackson observes that ‘in the tenuous trade of meme profiteering, the actual authors remain
largely absent from the monetary benefits of their own creations’ (2016). She calls for a more
careful and precise consideration of the ways ‘memes in their emergence, development,
transformation, and resurgence are imbued with a semantically Black mode of improvisation and
revitalization’ (2016). Contextual details about the political, racial, economic, and social
dimensions of digital communication platforms offer important information about why and how
particular memes are generated and distributed. Lisa Nakamura argues that ‘media scholarship
needs to explore the genealogy, distribution, aesthetics, and visual history of memetic culture,
so much of which is racist, sexist, and comes to us from circuitous and pseudonymous paths’
(2014: 260). Gabriella Coleman notes, for instance, that the aesthetic and performative
dimensions of what she calls ‘the lulz’ are ‘lighthearted jokes…enjoyed by many internet nerds
around the world’, but she also wants us to remember the ‘cruel’ and troll-like dimensions of
this form of comedy in its point of origin on 4chan and other message boards (Coleman, 2015).
These earlier contexts continue to haunt this mode of online expression as memes travel beyond
these initial sites of publication.

Jackson, Nakamura, Coleman, and other scholars are arguing for more site-specific, user- and
context-oriented approaches to the reading and making of memes. Recent developments suggest that
the futures of memes and their analysis will frequently highlight the creators and users who
star in and circulate them. For example, in July 2017, media attention focused on President
Donald Trump’s redistribution of a meme on his personal Twitter page that imagined him in a
wrestling match with a CNN logo (Kaczynski, 2017). The attention paid to Trump’s retweet, to the
meme’s initial popularity on Reddit, and to the Reddit user who first posted the meme (and the
critiques of CNN’s decision not to reveal the identity of the user to its reader) suggest that
claims for the ‘subtle poetic ambiguity’ of memes (Goldsmith, 2011) can be productively
challenged at times when additional context about particular meme creators, distributors, and
networks reveals more explicit political and social dimensions informing their creation and
reception.

THE FUTURES OF MEMES

Scholars interested in the present state of memes can learn a lot from excavating the digital
objects and networks of the recent past, as well as their attendant contextual and material
dimensions. More recent developments impacting the creation and spread of memes include the rise
of meme aggregators like Giphy, which make memes searchable and more easily transferable to
social media networks (as well as professional spaces like Slack) but which also rely upon
user-generated metadata and third-party support to index, preserve, and keep this content in
circulation. The increasing reliance on mobile platforms and networks has resulted in new
variations on memes that reflect the ease of smart phone screencapping: instead of image macros,
many users have taken to remediating content from other platforms like text message
conversations, Twitter exchanges, and Facebook comment threads. These new iterations of memes
create new challenges for archivists, given that screenshots are frequently lower-quality
versions of media that had already been compressed in the first place. Users distributing these
memes beyond spaces that encourage the use of tags to improve visibility and accessibility
frequently care little about citation or best practices for linking content to contextual
information, and ‘content farms’ devoted to the economics of memes will frequently (and often
willfully) obscure contextual details in their campaigns for attention.

This lack of context becomes more concerning if (or when) you wake up one morning and find that
you or something you’ve created has become a meme. Some subjects of memes have leveraged their
visibility into endorsement deals and other employment opportunities: for example, in 2015 Delta
Airlines employed several ‘stars’ and visuals affiliated with then-popular internet memes in a
safety video that aired as part of the boarding procedures used on its planes (Grayson, 2015).
Others have taken steps to have images removed or digital distributors of memes legally
punished: a 2015 Washington Post article called ‘How Copyright is killing your favorite memes’
directs readers to a page where the encyclopedic Know Your Meme site has recorded the various
Digital Millennium Copyright Act notices it has received over the years (Dewey, 2015).

Jon Ronson’s So You’ve Been Publicly Shamed profiles individuals who have lost jobs and
relationships due to the negative attention they received from ‘starring’ in internet memes
critical of their actions or language (2015). More recently, we have seen the ‘Pepe The Frog’
meme – centered on a cartoon frog who originally appeared in a series of print and digital
comics in 2006 by Matt Furie – move from the alternative comics community to 4chan message
boards to Tumblr to aggregated content on BuzzFeed to Donald Trump propaganda and
anti-propaganda. An image once used in parodies of ‘slacker’ millennial behavior is now tied to
images classified as hate speech by the Anti-Defamation League. While initially pleased with the
spread of the image before its embrace by members of the ‘alt-right’ political spectrum in the
United States, Furie has since responded to the controversy surrounding the lifespan of these
memes in comics (featuring Pepe having nightmares about his transformation), in a #SavePepe
hashtag campaign designed to reject these recent affiliations (Furie, 2016), and ultimately in
the act of ‘killing’ his creation in a 2017 comic book (Fortin, 2017). ‘I understand that it’s
out of my control’, Furie notes in an editorial for Time magazine published after the
Anti-Defamation League controversy, ‘but in the end, Pepe is whatever you say he is, and he and
I, the creator, say that he is love’ (Furie, 2016). Despite this acknowledgment of a content
creator’s inevitable loss of control in digital spaces populated by millions of users with
little regard for copyright law’s dictates regarding re-use and redistribution, Furie’s
frustrations are clear in his repeated attempts to contextualize Pepe and regain some semblance
of control.

Furie’s stated wishes (and the national press attention they received) may influence some users
of social media and internet meme creators to avoid uploading and remixing images of Pepe, but
the announcement of a meme’s ‘death’ is little more than ceremonial in this particular instance.

On Twitter, Facebook, Instagram, and other social media networks, we see meme producers and
‘stars’ enjoying the benefits of celebrity or the negative consequences of public shaming, we
watch marketing firms mimic, purchase, or outright steal viral content and aesthetic
conventions, and we see journalistic outlets and content farm publishers discover new ways to
cover, curate, and monetize the attention paid to popular memes. But the economics of attention
are complicated, and these networks are making decisions that may improve their bottom line
while simultaneously alienating or disrupting creators of internet memes. For example, Brian
Feldman has noted how Tumblr’s status as a hub of ‘internet culture’ and the site of origin for
numerous internet memes has not translated into a successful business model, and that Twitter
‘still has trouble turning a profit’ in 2017 (2017). These observations raise questions about
the future of internet memes and whether that future will be encouraging or restrictive; for
example, Feldman argues that recent innovations like Facebook and Instagram’s ‘algorithmic
timelines’ are ‘terrible for internet culture’ (2017). The instability of these networks
suggests that archivists may need to act sooner rather than later to document and preserve
internet memes and important contextual materials that might help future generations understand
their popularity and perceived value to earlier generations of the web.

REFERENCES

Abbate, J. (2017) ‘What and Where is the Internet? (Re)defining Internet Histories’, Internet Histories, 1(1–2): 8–14.
Bad-CRC. (2008) ‘All Your Base Are Belong To Us (Original Version)’, YouTube (https://youtu.be/8fvTxv46ano) Accessed August 1, 2018.
Bailey, M. (2015) ‘#transform(ing)DH Writing and Research: An Autoethnography of Digital Humanities and Feminist Ethics’, Digital Humanities Quarterly, 9(2), (http://www.digitalhumanities.org/dhq/vol/9/2/000209/000209.html) Accessed August 1, 2017.
Bechdel, A. and Cathy. (2005) ‘The Rule’, Alison Bechdel (http://dykestowatchoutfor.com/the-rule) Accessed August 1, 2017.
Benner, J. (2001) ‘When Gamer Humor Attacks’, WIRED (https://www.wired.com/2001/02/when-gamer-humor-attacks/) Accessed August 1, 2017.
Blackmore, S. (1999) The Meme Machine. New York: Oxford University Press.
Brogan, J. (2016) ‘Facebook Live’s Biggest Problem Isn’t Porn. It’s Copyright’, Future Tense (Slate) (http://www.slate.com/blogs/future_tense/2016/04/12/facebook_live_video_has_a_problem_with_copyright_not_porn_despite_rights.html) Accessed August 1, 2017.
Brügger, N. (2016) ‘Introduction: The Web’s First 25 Years’, New Media and Society, 18(7): 1059–65.
Cesar B. (2005) ‘Internet Meme’, Wikipedia (https://en.wikipedia.org/w/index.php?title=Internet_meme&oldid=17287330) Accessed August 1, 2017.
Coleman, G. (2015) Hacker, Hoaxer, Whistleblower, Spy: The Many Faces of Anonymous. New York: Verso Books.
Davison, P. (2012) ‘The Language of Internet Memes’, in Michael Mandiberg (ed.), The Social Media Reader. New York: NYU Press. pp. 120–34.
Dawkins, R. (1989) The Selfish Gene (new edition). New York: Oxford University Press.
Dewey, C. (2015) ‘How Copyright Is Killing Your Favorite Memes’, The Washington Post (https://www.washingtonpost.com/news/the-intersect/wp/2015/09/08/how-copyright-is-killing-your-favorite-memes/?utm_term=.684b32e941f3) Accessed August 1, 2017.
Feldman, B. (2017) ‘Tumblr’s Unclear Future Shows That There’s No Money In Internet Culture’, Select All (New York Magazine) (http://nymag.com/selectall/2017/06/theres-no-money-in-internet-culture.html) Accessed August 1, 2017.
Fortin, J. (2017) ‘Pepe The Frog is Dead, Or So His Creator Hopes’, New York Times (https://www.nytimes.com/2017/05/08/us/pepe-the-frog-comic.html) Accessed August 15, 2017.
Fox Jr., F. (2010) ‘First Person: In Defense of Happy Days’ “Jump the Shark” Episode’, Los Angeles Times (http://articles.latimes.com/2010/sep/03/entertainment/la-et-jump-the-shark-20100903) Accessed August 1, 2017.
Furie, M. (2016) ‘Pepe The Frog’s Creator: I’m Reclaiming Him. He Was Never About Hate’, Time (http://time.com/4530128/pepe-the-frog-creator-hate-symbol/) Accessed August 15, 2017.
Gal, N., Shifman, L., and Kampf, Z. (2016) ‘“It Gets Better”: Internet Memes and the Construction of Collective Identity’, New Media and Society, 18(8): 1698–714.
Godwin, M. (1991) ‘Untitled’, Rec.arts.sf-lovers (http://groups.google.com/groups?selm=1991Aug18.215029.19421%40eff.org) Accessed August 1, 2017.
Godwin, M. (1994) ‘Meme, Counter-Meme’, WIRED (https://www.wired.com/1994/10/godwin-if-2/) Accessed August 1, 2017.
Godwin, M. (2008) ‘I Seem To Be A Verb: 18 Years of Godwin’s Law’, Jewcy (http://jewcy.com/jewish-arts-and-culture/i_seem_be_verb_18_years_godwins_law) Accessed August 1, 2017.
Goldsmith, K. (2011) ‘The Meme Museum’, Harriet (The Poetry Foundation) (https://www.poetryfoundation.org/harriet/2011/04/the-meme-museum) Accessed August 1, 2017.
Grayson, N. (2015) ‘Delta Airlines’ Meme Safety Video Is Garbage’, Kotaku (https://kotaku.com/delta-airlines-meme-safety-video-is-garbage-1705916797) Accessed August 1, 2017.
Grossman, L. (2006) ‘You – Yes, You – Are Time’s Person Of The Year’, Time (http://content.time.com/time/magazine/article/0,9171,1570810,00.html) Accessed August 1, 2017.
Jackson, L.M. (2016) ‘The Blackness of Meme Movement’, Model View Culture (https://modelviewculture.com/pieces/the-blackness-of-meme-movement) Accessed August 1, 2017.
Jenkins, H., Ford, S., and Green, J. (2016) Spreadable Media: Creating Value and Meaning in a Networked Culture. New York: NYU Books.
Kaczynski, A. (2017) ‘How CNN found the Reddit user behind the Trump wrestling GIF’, CNN (https://www.cnn.com/2017/07/04/politics/kfile-reddit-user-trump-tweet/index.html) Accessed August 1, 2017.
Leskovec, J., Backstrom, L., and Kleinberg, J. (2009) ‘Meme-tracking and the Dynamics of the News Cycle’, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM KDD).
Lussier, R. (n.d.) ‘Dancing Baby FAQ’, Burning Pixel Productions (http://www.burningpixel.com/Baby/BabyFAQ.htm) Accessed August 1, 2017.
Milligan, I. (2017) ‘Welcome To The Web: The Online Community of GeoCities during the Early Years of the World Wide Web’, in Niels Brügger and Ralph Schroeder (eds), The Web As History. London: University College London Press. pp. 137–58.
Milner, R. (2016) The World Made Meme: Public Conversations and Participatory Media. Cambridge (MA): MIT Press. Kindle Edition.
Nakamura, L. (2014) ‘“I WILL DO EVERYthing That Am Asked”: Scam-Baiting, Digital Show-Space, and the Racial Violence of Social Media’, Journal of Visual Culture, 13(3): 257–74.
Nooney, L. and Portwood-Stacer, L. (2014) ‘One Does Not Simply: An Introduction to the Special Issue on Internet Memes’, Journal of Visual Culture, 13(3): 248–52.
Ronson, J. (2015) So You’ve Been Publicly Shamed. New York: Riverhead Books.
Saylor, N. (2014) ‘Getting Serious About Collecting And Preserving Digital Culture’, Folklife Today (Library of Congress) (https://blogs.loc.gov/folklife/2014/06/getting-serious-about-collecting-and-preserving-digital-culture/) Accessed August 1, 2017.
Shifman, L. (2012) Memes in Digital Culture. Cambridge (MA): MIT Press.
Skirwin, T. (1999) ‘How to post about Nazis and get away with it – the Godwin’s Law FAQ’, alt.usenet.kooks,alt.usenet.legends,alt.answers,news.answers. (http://wiki.killfile.org/projects/usenet/faqs/godwin/) Accessed August 1, 2017.
Solon, O. (2013) ‘Richard Dawkins on the internet’s hijacking of the word “meme”’, WIRED (UK) (http://www.wired.co.uk/article/richard-dawkins-memes) Accessed August 1, 2017.
Twitter. (2016) ‘Terms of Service’, Twitter (https://twitter.com/en/tos) Accessed December 15, 2016.
Wiggins, B. and Bowers, G.B. (2015) ‘Memes as Genre: A Structurational Analysis of the Memescape’, New Media and Society, 17(1): 1886–906.
35
Years of the Internet: Vernacular
Creativity before, on and after the
Chinese Web
Gabriele de Seta

INTRODUCTION: YEAR OF THE INTERNET

In the hype-ridden People’s Republic of China, 1996 was the ‘Year of the Internet.’ No matter
that, by the highest estimates, only 150,000 Chinese people – barely 1 in 10,000 – are actually
wired. (Barmé and Sang, 1997)

Technically speaking, the Internet arrived in Mainland China¹ on August 26th, 1986, through an
X.25 connection established between Karlsruhe University and the Chinese Institute of Computer
Applications (ICA) in Beijing (Hauben, 2005). Its early years in the country were characterized
by academic frustration, technical pessimism, bureaucratic impediments and international
mistrust (Zheng, 1994: 236). Offered to the country’s general public in 1996, ten years after
its first infrastructural link, the Internet has become, in the span of little more than two
decades, a fundamental part of the everyday life of the majority of Chinese citizens. In the
words of a local literature professor, going online in the late 1990s through the precarious
configuration of a 386 personal computer, a telephone set, a CNY 500 modem and CNY 200 worth of
phone connection fees offered the experience of a ‘speed never seen in history before’ (Yan,
1997: 3). Similarly, in macro-economic and infrastructural terms, the development of networked
communications in China is commonly described as ‘break-neck’, ‘momentous’, ‘accelerated’ and
‘compressed’, adjectives that give a sense of how Chinese authorities and local IT industries
have managed to leapfrog technological advancements and catch up with an imagination of the
Internet largely shaped by the Silicon Valley model.

This chapter offers an episodic chronology of Internet use in China by singling out six landmark
years and highlighting how creative practices have been shaping the adoption of the technology
before, during and after the popularization of the World Wide Web as the
YEARS OF THE INTERNET 521

primary information space for hundreds of millions of users. One peculiar consequence of the
accelerated and compressed development of ICTs in China has been the conflation of the Internet
(as a network of networks) and the Web (as an information-sharing protocol) in the experience of
a large majority of users. The first web server in China was installed in 1994 (Zhou, 2006:
136), when only a minimal fraction of the population had access to the Internet, so that the
users coming online during and after the ‘Year of the Internet’ of 1996, along exponential
growth rates propelling the Chinese user population to 2.1 million by the end of 1998 (Du, 1999:
405), could already experience the Internet not only through e-mail and bulletin boards, but
also through web browsers and link directories. As of today, the Mandarin Chinese terms
hulianwang [literally ‘interconnection net’] and wangluo [literally ‘network’] are routinely
used to refer to both the Internet and the Web, while more technical translations of World Wide
Web such as wanweiwang and huanqiuwang remain outside of popular parlance. The wang character,
present in all these terms to translate both ‘web’ and ‘net’, exemplifies this blurring of
technology and protocol: checking if something is wang shang [‘on the network’] could mean
either verifying a device’s wireless access to the Internet, the online presence of a friend on
a messaging application, the existence of information on a webpage, or the successful upload of
data to a cloud server. As in other local contexts, the vernacular conflation of Internet and
Web testifies to the centrality of the protocol to the experience of the underlying technology.

Recent calls for histories of both the Internet and the Web (Brügger, 2011; Wellman, 2011) have
urged scholars to move beyond the amply chronicled pasts of networked communications in the
United States and Europe (Abbate, 1999; Roy, 1998). Sensitive to these concerns, a growing body
of academic literature on Chinese media history has been charting the three decades of the
country’s Internet development from different angles and perspectives (Qiu, 2004; Tai, 2006;
Zhou, 2006). Yet, as is increasingly often recognized, these inquiries have disproportionately
focused on the political implications and democratizing potential of ICTs (Herold and de Seta,
2015), leading many to ignore ‘the myriad ways in which the Internet might potentially affect
Chinese life’ (Kluver and Yang, 2005: 306). As more recently noted, the study of usage, users
and user experiences is fundamental to understanding the relationship between the Internet, the
Web and society (Schroeder and Brügger, 2017: 2). Perhaps as a result of the terminological
conflation highlighted above, much of the scholarship on the Chinese wangluo has focused on the
Internet rather than the Web, tending to privilege infrastructural concerns over user practices.
Yet the existence of a Chinese Web cluster is hardly disputed, and recent studies attribute its
accretion more to language and content preferences of local users than to the much-discussed
censorship mechanisms developed by the authorities (Taneja and Wu, 2014).

In this chapter, I historicize the modest advent, massive popularity and creeping enclosure of
the Web in China by linking it to the continuities in the vernacular creativity of Internet
users. After introducing the early years of local Internet development, explaining the situated
experience of both the Internet and World Wide Web, and arguing for a historical perspective
that highlights how the creative practices of users have accompanied networked communications in
the country throughout three decades, I chronicle six historical turning points exemplified by
different forms of vernacular creativity, each connected to the rise and fall of communication
protocols, coding standards, content-management systems, messaging software and social media
platforms. Starting from the pre-Web examples of e-mail and BBSs, I follow everyday creative
usage through the boom of amateur homepages, community portals and blogs, towards the looming
enclosure of the Web hidden behind

contemporary social media platforms. In the conclusion, I argue that vernacular creativity
provides a productive thread through which the advent, popularity and disappearance of the Web
can be followed from the point of view of local populations of Internet users.

A HISTORY OF VERNACULAR CREATIVITY

The idea of the ‘vernacular’ was introduced as early as 1960 by anthropologist Margaret Lantis
to describe forms of everyday culture and speech that are different ‘from the literary language
or from the language of straight news reporting’ (1960: 202–3). This attention to everyday forms
of speech has been extended, most famously by the work of Michel de Certeau, to the larger
‘lexicon of users’ practices’ (1984: 31) resulting from the consumers’ tactical engagement with
popular culture and media. The category has been more recently discussed by various authors in
linguistics, media and cultural studies, including Henry Jenkins’ use of ‘vernacular culture’ to
define the experimental productions created by amateurs (2006: 132), David Barton’s idea of
‘vernacular writing on the Web’ (Barton, 2010: 109) to discuss literacy practices rooted in
locality and everyday life, and Jean Burgess’ definition of ‘vernacular creativity’ as:

a productive articulation of consumer practices and knowledges (of, say, television genre codes)
with older popular traditions and communicative practices (storytelling, family photography,
scrapbooking, collecting). (Burgess, 2007: 207)

With the popularization of Internet access and digital media platforms, these communicative
practices become increasingly central in everyday life, and the rift between active producers
and passive consumers shrinks into a permeable boundary. The emergence of ASCII compositions out
of the limited affordances of e-mail writing (Danet, 2001) and of ANSI artworks on Bulletin
Boards (Scott Sadofsky, 2005) are widely documented as early examples of vernacular creativity
on the Internet², but it is with the popularization of the World Wide Web in the mid 1990s that
vernacular creativity is embraced by larger populations of amateur users (Lialina, 2009a,
2009b). In this chapter, I adopt Burgess’ definition of vernacular creativity as ‘both an ideal
and a heuristic device’ (2007: 206) in order to distill a chronology of the Web in China from
the point of view of the creative practices of its users. Grounded on ethnographic research on
Internet use in the country, my periodization playfully appropriates the media hype rhetorics
that recurrently describe China as going through ‘the Year of’ whatever emerging media
technologies happen to achieve widespread adoption at any specific point in time. From e-mail
greeting cards to amateur web design, from personal blogs to smartphone video clips, my
chronology singles out significant years for the uptake of specific media forms, protocols and
platforms, and connects them to the corresponding practices of vernacular creativity through
which users experiment with the communicative possibilities of each medium.

1987: YEAR OF THE E-MAIL

Ueber die Grosse Mauer erreichen wir alle Ecken der Welt

Across the Great Wall we can reach every corner in the world

So read the first e-mail sent on September 20th, 1987 from the Institute of Computer
Applications (ICA) in Beijing, China to research partners at the University of Karlsruhe,
Germany, one year after the Chinese Academic Network (CANet) was

established in cooperation between the two institutions (Hauben, 2005). For five years, after
these first steps achieved through the painstaking efforts of joint teams of Chinese and foreign
computer scientists, expensive and precarious e-mail links were ‘the only network connectivity
that could reach the rest of the world [from China]’ (Hauben, 2005). During this period of time,
academic researchers and an increasing number of university students were the first Chinese
users to register e-mail addresses and experience the possibilities of corresponding at a
distance with other users inside the country and abroad. Given the lack of international
encoding standards for hanzi [‘Chinese characters’], most of these exchanges happened in foreign
languages and through alphabetic scripts. A second well-documented e-mail message symbolizes the
adoption of the Internet in China by its early users. It was sent on April 10, 1995 from Peking
University, and it read:

Hi,
This is Peking University in China, a place of those dreams of freedom and democracy. However, a
young, 21-year old student has become very sick and is dying. The illness is very rare. Though
they have tried, doctors at the best hospitals in Beijing cannot cure her; many do not even know
what illness it is. So now we are asking the world – can somebody help us?

[…]
This is the first time that Chinese try to find help from Internet, please send back E-mail to
us. (Fung, 1995)

Classmates of Zhu Ling, a Tsinghua University sophomore who fell mysteriously ill, posted this
message on various mailing lists seeking medical help from abroad, and received a flood of
helpful responses. Geremie Barmé and Ye Sang recount how 84 of these responses were in fact
correct diagnoses of thallium poisoning: ‘Zhu Ling was treated and eventually began a slow
recovery; the Chinese public was enthralled. A television miniseries is reportedly in the works’
(Barmé and Sang, 1997). Another account of what was arguably the first large-scale example of
user-led crowdsourcing information in China correlates it with the country’s booming
informatization: ‘she [Zhu Ling] was really lucky that this happened right during the year in
which the Internet began to penetrate massively in China’ (Yan, 1997: 109).

For roughly a decade, before being partly substituted by discussion boards, Web-based services
and messaging applications, e-mail was the central medium through which Chinese Internet users
could experiment with the creative possibilities of computerized textual communication. The
ASCII graphic compositions collected by folklorist Seana Kozar throughout the early 1990s,
circulated by Chinese students as ‘electronic greeting cards’ during Christmas or Spring
Festival holidays (Figure 35.1), are a striking example of the practices of vernacular
creativity through which early Internet users figured out ways of reproducing traditional
symbols and playful motifs in spite of the limitations of character encoding and textual
formatting of the time (Kozar, 1995). For the few hundred thousand Chinese users who experienced
a pre-Web Internet, vernacular creativity offered ways of rendering the visuality of the Chinese
script and the lack of consistent encoding standards (McLelland, 537–550), foreshadowing a
desire for localization that kept driving many of the practices described in the following
sections.

1995: YEAR OF THE BBS

According to the detailed chronology provided by Zhou Yongming, the first BBS was set up in
China in 1991. It was only in 1994, though, that the first local Internet-based BBS (the Dawn
BBS) was opened to the public, beginning a two-decade-long history of bulletin board use in the
country.

Figure 35.1 A ‘good fortune lantern’ ASCII graphic sent via e-mail by a Chinese student
during the 1992 New Year celebrations (Kozar, 1995).

Local BBSs quickly overtook the use of e-mail newsgroups such as alt.chinese.text.big5, which
required additional shareware software in order to input, encode and display Chinese characters
on the user’s terminal in the various standards available at the time, including zw-DOS, GB
(guobiao ‘national standard’) for simplified characters, and Big5 for traditional ones (Kozar,
1995). The example of the Tsinghua University Shuimu Tsinghua BBS (Figure 35.2), which
implemented Web-based access in 1995 after a few months of TCP/IP-only service (Xinlang Keji,
2008), highlights how the arrival of the Web was rapidly embraced by users and admins alike. As
a result of this rapid shift to Web-based bulletin board access – and in contrast with the
specific meaning of the acronym in English-language usage – the term ‘BBS’ was adopted in
Chinese parlance to indicate all sorts of online forums, from older bulletin board systems to
newer, Web-based forums and image boards. This local usage of the ‘BBS’ acronym persists until
today for the description of both massive online community platforms with multimedia content
affordances such as Baidu Tieba as well as government-backed forums like the People’s Daily
Online Qiangguo Luntan [‘Strong Nation Forum’]. Besides supporting the formation of new
collective identities and facilitating

Figure 35.2 Log-in page screenshot of the Tsinghua University Shuimu Tsinghua BBS, originally
set up on a 386 computer running Linux, using the same PalmBBS behind the National Taiwan
University Coconut Trees BBS (Xinlang Keji, 2008).

large-scale community interaction on a national geographical scale (Damm, 2007: 288), the
expanded textual affordances, the possibilities of account personalization and the increasingly
varied multimedia affordances of BBS platforms have rapidly sustained their users’
experimentation with various forms of vernacular creativity. Early user creativity was largely
linguistic, pushing lexical and rhetorical choices into the contentious realms of flaming and
personal attacks (Huang, 1999). The development of neologisms, in-jokes and slang terms
understood only by community members was seen as a form of resistance against the official
language of public discourse (Clark, 2012: 162), and contributed to the emergence of a first
wave of online political satire during the late 1990s and early 2000s (Luqiu, 2017: 124).

As communities diversified beyond university campuses, and as enthusiastic early Internet
adopters were overtaken by new users flocking towards commercial and government-sponsored
platforms – 2006 was widely acclaimed as ‘the year of online communities’, as Pang (2008: 60)
notes – creative uses of BBSs flourished around political extremization, particularly in the
realms of both right-wing nationalism and leftist revival (Hu, 2007). Community-based
vernaculars moved beyond the textual, including personalized emoticons, satirical images and
even amateur video productions, as the repertoires of egao [bad taste humor, literally ‘evil
making’] content became increasingly deployed in the practices of confrontation accurately
described in Han (2015). The rapid shift from TCP/IP bulletin boards to Web accessibility, and
the following popularization of Web-native discussion forums and large-scale online communities
proves how the history of the various platforms that in China are all called BBS is tightly
linked to the World Wide Web and its protocols.

1998: YEAR OF THE HOMEPAGE

In May 1994, the CAS Institute of High Energy Physics marked another milestone for Internet
development in China by setting up
the first www server in the country and developing the first homepage hosted within its national borders (Zhou, 2006: 136). As noted by McLelland (537–550), the popularization of Internet connectivity in China coincided with a historical moment in which the World Wide Web was becoming the predominant way through which users around the world created and navigated online content. It is thus not surprising that, along with e-mail interactions and BBS-based communities, creating websites, following hyperlinks and exploring web directories were some of the chief forms of Internet usage in late-1990s China (Yan, 1997: 100). It is also not surprising that the majority of early Chinese webmasters were academics and college students: according to a 2002 survey, one in eight students had started using the Internet before 1998 (Clark, 2012: 161), acquiring the literacy in HTML and web design necessary to create simple yet highly personal homepages hosted on university servers or free hosting service providers like GeoCities, Tripod or Angelfire. The diffusion of encoding standards with increasingly comprehensive character sets contributed to lowering the linguistic barriers for users to navigate and design an increasingly rich Chinese Web.

Emerging in parallel to the local commercial and governmental developments of the World Wide Web, the ‘vernacular web’ (Lialina, 2009a: 22) of Chinese homepage-making amateur culture wasn’t that different from its American or European counterparts, and remains a precious testimony of how early users latched onto the creative possibilities offered by HTML and created their own online homes as bricolages of starry backgrounds and navigation icons, personal beliefs and biographical details, low-resolution photographs and animated GIFs, hyperlinks and guestbooks (Chandler, 1998). While structurally similar to their global counterparts, Chinese amateur websites also had their defining features. National pride and cultural heritage contributed to a clearly local look of many Chinese amateur websites (Gevorgyan and Manucharova, 2009: 396); as evidenced by an analysis of Chinese GeoCities webpages created between 1996 and 1998, a substantial part of them was dedicated to introductions to the country, tea-savoring instructions, coin divination guides, classical literature and other explicitly ‘cultural’ topics (Figure 35.3). Language was also a defining feature of the Chinese vernacular Web: even if hanzi characters were introduced in Unicode in 1992, most web designers kept encoding their pages in either the national standard GB2312 for simplified characters, or the Big5 standard for traditional characters, a choice mostly guided by personal tastes and regional preferences³. Many websites provided multiple versions of their pages in different encodings to avoid visualization problems, and some webmasters offered bilingual contents to cater to both local and international visitors.

Zhai Zhenming, who is today a Professor of Philosophy at Sun Yat-sen University, Guangzhou, was one of the first Chinese users of GeoCities, where he set up his personal homepage in 1997 with the intent to promote his academic output and share his poetry compositions. He remembers his experience as an amateur web designer:

    I did very little coding with HTML, I basically just used the templates provided by GeoCities, plus some additional marking adjustment with simple HTML tags. I remember I also linked to my musical clips on the webpage, it’s me playing the Chinese instrument erhu… I don’t know if you can still find those recordings online. (Interview with the author, July 2015)

Another amateur web designer, Song Gang, whose website is a colorful potpourri of borrowed graphic elements and painstakingly nested HTML tables (Song, 1998), subsequently went on to establish one of China’s earliest grassroots environmental protection organizations. He believes his successful career was helped by his early tinkering with web design, and he takes pride in his dedication to the craft:

    My personal website was made in 1998 using HTML – design and coding are all done by me. I taught myself how to do it because I was interested

Figure 35.3 Bilingual coin divination page from one of the earliest Chinese websites hosted on GeoCities. Screenshot by the author, courtesy of the GeoCities Research Institute archive.

    in it, I bought some books in Zhongguancun [a technology hub in Haidian district, Beijing] to study, and I had many exchanges with colleagues who were specializing in computing. (Interview with the author, October 2014)

Before GeoCities and similar free hosting providers were either blocked in China or fell out of fashion, and before hosting websites inside the country was subjected to increasingly strict regulations and controls, amateur web designers explored the potentialities of the Web by cobbling together personal homepages to introduce themselves and their country to the world.

2001: YEAR OF QQ

In February 1999, a few months after being founded, Chinese Internet company Tencent Inc. released a proprietary messaging application called OICQ, an acronym standing for ‘Open ICQ’. Following a lawsuit filed by AOL, which at the time owned the increasingly popular instant messenger software ICQ, Tencent had to rename its application QQ. Struggling during its first two years of operation, Tencent QQ eventually managed to achieve a breakthrough in popularity in 2001, quickly moving from one million to 50 million users, and started experimenting with sustainable business models (CIW Team, 2014). In the early 2000s, the stylized penguin with a red scarf that QQ used as an icon and mascot became instantly recognizable among Chinese Internet users. For many, registering a QQ number (a unique identifier used for both login and for contact exchange across Tencent services, both Web-based and proprietary) is the earliest memory of
accessing the Internet, even before setting up an e-mail account or a BBS profile (Wallis, 2011: 414).

Besides the instant messaging capabilities of the software, having a personal QQ number introduced Internet users to a growing number of Web-based Tencent services such as QZone (a social networking and content-sharing platform) and QQ Mail (an e-mail inbox), as well as to other software-based services such as QQ Music (a music streaming and download application) or QQ Pinyin (an input method for Chinese characters alternative to the Microsoft Pinyin offered by Windows operating systems). As QQ moved away from its beginnings as an ICQ clone through the addition of features such as virtual currency (QCoin), in-software videogames (QQ Games) and paid memberships (QQ Vip), it quickly became an essential application for the lives of Chinese Internet users, younger ones in particular. A large majority of my interviewees, who were children or teens in the early 2000s, still used their first QQ number and QQ Mail address after more than a decade, and connect the Tencent media ecology to affective moments and intimate memories: some opened their account in elementary school and used QZone to keep in touch with classmates and family members, experiencing first-hand the plights of context collapse when posting something too racy or vulgar; others registered their QQ numbers in high school, and sometimes got two or more accounts to divide personal and work connections (de Seta, 2015: 124).

I registered my own QQ number in the mid 2000s, as it was evidently a necessary medium to keep contact and interact with Chinese friends and acquaintances. Over the years, my contact list has grown to include hundreds of names and dozens of chat groups discussing the most varied topics (Figure 35.4), and my QQ account has played a central role

Figure 35.4 A QQ group chat window, including a common message feed, a textbox with
multimedia uploading options, different toolbars to access additional services and a list of
the 118 group members. Screenshot by the author.
in both my private life and my professional research projects, offering the convenience of logging into numerous local web services and mobile apps through its verified user credentials. Similarly, for more than 15 years, QQ has provided an indispensable communicational space and an array of interactional tools to hundreds of millions of Chinese users, expanding the social possibilities offered by mailing lists and BBSs into a more immediate and personalizable experience that moves (sometimes clumsily⁴) between Web-based services and proprietary software. It is through QQ chat windows that slang terms circulate beyond the boundaries of online communities, coalescing in an ever-growing, nation-wide repertoire of wangluo yuyan [‘Internet language’]; similarly, it is because of the possibility of importing and editing images of multiple formats as emoticons ready to be exchanged among users that genres of visual content like egao achieve wide popularity online; finally, it is thanks to the convenience of managing groups, organizing activities, exchanging files and sharing screenshots that communities of practice (such as subtitling and translation groups) as well as entire companies have adopted QQ as a fundamental software tool for professional productivity.

While the QQ messenger itself is not a Web-based service but rather instant messaging software operating through a proprietary Tencent protocol (originally released as a computer application and today also widely used in its mobile app version), it is undeniable that its usage has from the beginning been closely tied to the experience of Web content: on launch, its starting screen offers a crammed collage of news headlines, images and video previews linking to the QQ News section of the Tencent portal; hyperlinks and screengrabs of Web content are routinely shared and commented upon across private conversations and group discussions; a simple click on the small icons in the main window of the application opens the QQ Mail or QZone Web interfaces in the system browser. Moreover, in late 2009 Tencent launched a Web-based version of the QQ messenger, which offers a comprehensive, desktop-like user experience accessible on any computer through a simple scanning of an in-browser QR code. From its ungainly beginnings as a knockoff messaging application in the early 2000s, Tencent QQ has developed into a sprawling media ecology of software and platforms that are as self-enclosing as they are intricately woven with the Chinese Web.

2005: YEAR OF THE BLOG

In the mid 2000s, the World Wide Web of static homepages was shifting towards the dynamic content and interoperable protocols promised under the banner of Web 2.0 (O’Reilly, 2005). Along with social networking websites and media streaming services, blogs emerged as one of the most popular kinds of platform, and China was no exception. For many of the Internet users who encountered the Web without any coding experience, blogging was the first opportunity to construct a personal online space (Sima and Pugsley, 2010: 292). Blogging websites had existed in China since 2002, but the turning point for this media form happened in 2005 – widely recognized as the ‘Year of the Blog’ (Yu, 2007: 425) – when the posts of 4.3 million active bloggers proved to have a resounding social and cultural impact. In late 2006, search engine provider Baidu launched its own blog platform alongside other companies like Sina, NetEase and Sohu, and by 2008 the country had a population of 105 million active bloggers (Wang and Hong, 2010: 67). Due to their low participation threshold, their social networking features and their strong authorial identity, blogs supported the expansion of online interactions beyond BBSs and QQ group chats, providing a space to growing networked audiences engaging in celebrity fandom and large-scale debates following

Figure 35.5 The Sina Blog of Zhou Xiaoping, a 1981-born blogger popular for his nationalist and anti-American views, as of September 2010. Screenshot by the author.

popular or controversial blog posts (Wallis, 2011: 414).

As Wu (2012) notes, it was precisely the booming of blogging practices in China that catapulted online debate culture – with its vernacular slang and rhetorical positions – into the realm of public opinion and everyday life at large. What was once limited to mailing lists and university discussion boards became discussed throughout comment sections and blog posts, while popular bloggers such as race car driver Han Han or nationalist writer Zhou Xiaoping rose to the status of widely followed celebrities and opinion leaders (Figure 35.5). The blogging boom of the late 2000s led many journalists and academics to identify a ‘Chinese blogosphere’ that could function as an incubator for a local civil society mediating between individuals and the state (Tang, 2009). Blog posts displaying a skillful use of ambiguous language and clever egao humor were widely regarded as a forceful pushback against authoritarian control and censorship: ‘Chinese are speaking truth to each other, and by doing so in a widely accessible manner, are speaking truth to power’ (Esarey and Xiao, 2008: 753).

In the wake of the July 2009 Ürümqi riots and the subsequent shutdown of the first Chinese micro-blogging service Fanfou (along with the blockage of foreign websites like Facebook, Twitter and YouTube), the Sina Corporation launched Sina Weibo, a micro-blogging website through which users could publish 140-character multimedia posts, follow accounts and comment or re-blog each other’s content. In the span of a few months, competing blogging providers such as NetEase, Sohu and Tencent all released their own Weibo platforms, trying to attract users to their own micro-blogging service. The intensification of censorship practices commonly summarized through the ‘Great Firewall’ metaphor (Tsui, 2007), combined with the opportunistic proliferation of local blogging and micro-blogging platforms, contributed to enclosing the activities of hundreds of millions of users in increasingly walled
social media gardens amenable to state control and delegated censorship.

Yet, despite the number of analyses that frame blogs and micro-blogs as either safety valves for dissent or pressure cookers for activism (Hassid, 2012), most users of these platforms didn’t engage with politics or social issues, but largely blogged to keep a public record of their daily lives, to share and discuss their hobbies and passions, or to participate in larger communities of interest or fandom (Clark, 2012: 129). Even when channeled towards the discussion of political topics, the vernacular creativity of Chinese bloggers did not necessarily fit the bill of an emerging public sphere: on the one hand, nationalist, conservative and even reactionary blogs sprang up alongside their more liberal and pro-democracy counterparts (Leibold, 2011: 1034); on the other, the activities of amateur bloggers and online celebrities were accompanied by ‘Red Blogs’ set up by party organs and official accounts managed by Chinese authorities with the goal of promoting political stability (Esarey, 2015).

2011: YEAR OF WECHAT

A decade after launching QQ – by then an extremely successful application used across personal computers, smartphones and tablets – Tencent released a mobile-oriented messaging app called Weixin [literally meaning ‘micro-message’], better known in English as WeChat. Profiting from an already existing population of hundreds of millions of QQ accounts, and helped by a societal shift towards mobile devices and wireless connectivity, WeChat quickly accrued a massive userbase. Tencent learned from its previous experience with messaging applications, and included social networking functions in WeChat through a pengyouquan [‘friend circle’] content feed, the possibility of following official accounts managed by brands, organizations, news outlets and celebrities, as well as a Web-based version of the app, accessible in any browser through the simple scanning of a QR code. Other functions added to the platform over the years include WeChat Pay, a digital wallet enabling users to perform mobile payments and send money to each other; City Services, a booking system for different kinds of public and private services in urban areas; and even WeChat Search, a proprietary search engine. WeChat epitomizes a major shift that characterized Chinese Internet development in the early 2010s: the drastic growth of mobile Internet access and the broad uptake of smartphones and tablets throughout the country. As the Internet is predominantly accessed through app interfaces and proprietary protocols, the Web seems to disappear behind fluttering messaging windows and endlessly scrolling social media feeds.

The affordances offered by smartphones and tablets – wireless Internet access, front and rear camera imaging, audiovisual playback and recording – played a central role in the kinds of vernacular creativity practiced on WeChat and shaped the larger ecology of mobile apps and online platforms coalescing around it. Besides exchanging textual messages and voice snippets, users of WeChat could communicate through emoticons, in-app stickers, funny images, short video ‘sights’ shot through the app itself, longer videos imported from other platforms, locational data, and even hongbao [‘red envelopes’] used to transfer small amounts of money between accounts (Figure 35.6). The multimedia messaging possibilities provided by WeChat also directly contributed to the emergence of new genres of digital content. Alongside the emoticons and stickers offered by the platform, users imported and exchanged vibrant repertoires of personalized biaoqing [literally ‘expressions’], giving a new lease of life to the egao humor popularized by BBSs and micro-blogging platforms. Dividing their time between chat conversations and pengyouquan feeds, users chronicled their daily activities through edited zipai [‘selfies’] and short videos ready to be liked and commented on. Exposed to the activities

Figure 35.6 In-app stickers, photos, personalized biaoqing images, short videos, hongbao red envelopes and emoticons used in interactions across three WeChat group chats. Screenshots by the author, 2017.

of official accounts and the discussions taking place in massive chat groups, they took advantage of their devices’ affordances to engage first-hand in activities like coordinating nationalist campaigns (Han, 2015), marketing and reselling branded products (Zhang, 2015) or participating in everyday practices of citizen activism (Pan, 2017).

WeChat’s success story, grounded in Tencent’s experience with the development of QQ and its ecology of services, and culminating in its overtaking of sister software QQ with almost 900 million users and ten million official accounts by Q4 2016 (Penguin Intelligence, 2017), also marks an important year for the Chinese Web, which users increasingly experience through the selective curation of social media feeds, and which seems to slowly disappear as browsers become just one of the many ways of interfacing with the Internet among the many app icons pinned to mobile device screens. Yet proclaiming the death of the Chinese Web might be exaggerated: while it is true that social media platforms push for app-based experiences and for the enclosure of both proprietary and user-generated content, it is also true that much of the users’ creative activity on WeChat still passes through the Web, or draws the Web back into the app, in different shapes and forms. WeChat’s proprietary browser is the default way of opening hyperlinks shared in WeChat conversations, but two extra taps in the menu allow users to launch their own default browser; WeChat posts shared by public accounts are actually HTML webpages designed through a Web-based editor and hosted on the WeChat Official Account website; WeChat stickers, stories and screenshots are also collected by individual users and uploaded to Web-based repositories to facilitate circulation and preservation. As social media platforms seek to enclose user activity and content, vernacular creativity pulls the Web back into their app-based ecologies from unexpected sociotechnical backdoors.

2020: YEAR OF THE FUTURE

A few years from now, perhaps 2015 will be remembered as China’s ‘Year of e-payments’, 2016 as the ‘Year of the gig economy’, 2017
as the ‘Year of livestreaming’, and 2018 as the ‘Year of mobile gaming’, while predictions for the tech hype of 2019 are still open. What is less disputable is that the development of digital media in the country will not grind to a halt anytime soon: both the National Informatization Plan (2006–20) and the 13th Five-Year Plan (2016–20) identify the year 2020 as a symbolic yet pressing deadline for completing the transformation of the country into a ‘world-class information society’ (Austin, 2014: 3). Along with the rather general goals outlined by policy documents, it is expected that the near future of the Internet in China will see the emergence of new networked devices and online platforms, new kinds of services and modes of participation, as well as new forms of vernacular creativity introduced by users through their everyday interaction with different media. Alongside these innovations, it is likely that some of the existing media forms, protocols and practices will subside, while others will remain current, or even see unexpected resurgences.

The history sketched in this chapter has no pretense of completeness, and relies on necessary elisions: the six significant years I chose (1987, 1995, 1998, 2001, 2005, 2011) to describe the various kinds of protocols and platforms (e-mail, BBSs, websites, QQ, blogs, WeChat), their relationship to the modest advent, massive popularity and creeping enclosure of the Chinese Web, and the forms of vernacular creativity practiced by their users are simply one of the many possible ways of chronicling the history of the Internet in China. My choices prioritize radical shifts in protocol adoption, platform design and device usage, and in doing so they necessarily conceal the variety of media under analysis. Similar discussions could be dedicated to the rise of social networking websites like Kaixin001, Renren or Douban, the fortune of online marketplaces like JD or Taobao, the adoption of foreign platforms like MySpace, Facebook or Twitter, or the recent uptake of livestreaming apps. Similarly, the decision to structure this history as a year-based chronology doesn’t imply that usage of the media described in each section remained limited to its breakout year – conversely, each of those protocols and platforms has a peculiar history that is best investigated ethnographically: some have become widely used throughout the world at large, some have remained popular only locally, and others have slowly shrunk or quickly disappeared.

Twenty years after China’s ‘Year of the Internet’ described by Geremie Barmé and Ye Sang (1997), the online interactions of hundreds of millions of users have surely come a long way from the struggles with composing Spring Festival greeting cards through a limited set of ASCII characters to be carefully copy-pasted across e-mails and BBSs. After experiencing a wangluo environment already Web-centric since its early years of popular adoption, most of the Chinese Internet users of today spend a large part of their online time in the enclosed playgrounds of platforms like WeChat, Baidu, Taobao, Bilibili or Zhihu, and rarely need to tinker with HTML tags, character encoding or FTP uploads in order to create, share and circulate a large amount of multimedia content. Yet the practices of vernacular creativity through which users make themselves at home in the media ecologies shifting around them remain a constant presence, pulling back the Web into the walled gardens of social media platforms. Vernacular creativity emerges as a shaping force of political economy through which users open up unpredicted spaces of maneuver in the media at their disposal (Li, 2016), and offers a productive category through which to rethink where the boundaries of the Web lie today. Regardless of what the next ‘Year of the Internet’ will turn out to be about, users will rewrite its history and contribute to shaping the future of the Chinese Web.

Notes

1  Mainland China, or Chinese mainland, are terms commonly used to define the geographical area
under the jurisdiction of the People’s Republic of China (PRC), excluding the Special Administrative Regions (SAR) of Hong Kong and Macau. Given that the development of the Internet and the Web in the Chinese mainland have been intimately shaped by the peculiar geopolitical and legal configuration of the PRC, the terms ‘China’ and ‘Chinese’ are used throughout this chapter as shorthands to refer to this area.
2  ASCII (American Standard Code for Information Interchange) is a character encoding standard published in 1963 used to represent text by encoding 128 characters into seven-bit integers. The ANSI (American National Standards Institute) is a private non-profit organization that facilitates the development of national consensus standards. ASCII characters have been arranged in visual compositions to create emoticons and pictures since the introduction of the standard, and its expansion through extended character tables and ANSI color encoding have resulted in the creative practice termed ANSI art.
3  The Big5 encoding was prevalently used in Hong Kong, Taiwan, Singapore and other Chinese-speaking regions, including diasporic communities around the world. Yet some Mainland Chinese webmasters preferred encoding their pages in the traditional characters mapped by Big5 over the simplified ones offered by the GB national standards – some for their geographic or biographic closeness to Hong Kong or Taiwan, others for aesthetic or literary preference.
4  Attributing virtually any problem encountered during the use of a computer to the bloated interfaces of Tencent software was a common diagnosis I repeatedly heard throughout my fieldwork: many informants saw QQ-branded applications as unreplaceable but aggressive software that took unwanted liberties with their machine. The 2010 ‘Qihoo 360 vs. Tencent’ dispute over unfair competition and software bundling practices highlights how the QQ messenger had become much more than a single piece of proprietary software.

REFERENCES

Abbate, J. (1999) Inventing the Internet. Inside Technology. Cambridge, MA: MIT Press.
Austin, G. (2014) Cyber policy in China. China Today. Cambridge, United Kingdom: Polity Press.
Barmé, G.R., and Sang, Y. (1997) ‘The Great Firewall of China’. Wired, June. Available at: http://archive.wired.com/wired/archive/5.06/china_pr.html (accessed 12 January 2015).
Barton, D. (2010) ‘Vernacular writing on the Web’, in: Barton D and Papen U (Eds) The anthropology of writing: Understanding textually-mediated worlds. London, United Kingdom: Continuum. pp. 109–125.
Brügger, N. (2011) ‘Web archiving – Between past, present, and future’, in: Consalvo M and Ess C (Eds) The handbook of Internet studies. Handbooks in Communication and Media. Malden, MA: Wiley-Blackwell. pp. 24–42.
Burgess, J. (2007) Vernacular creativity and new media. PhD Thesis. Queensland University of Technology, Brisbane, Australia. Available at: eprints.qut.edu.au/16378/1/Jean_Burgess_Thesis.pdf (accessed 18 February 2015).
Chandler, D. (1998) Personal home pages and the construction of identities on the Web. Available at: http://visual-memory.co.uk/daniel/Documents/short/webident.html (accessed 19 September 2014).
CIW Team (2014) The story of the rise of Tencent Empire. Available at: https://www.chinainternetwatch.com/6031/tencent-rising-of-penguin-empire/ (accessed 20 May 2017).
Clark, P. (2012) Youth culture in China: From Red Guards to netizens. New York, NY: Cambridge University Press.
Damm, J. (2007) ‘The Internet and fragmentation of Chinese society’, Critical Asian Studies, 39(2): 273–294. DOI: 10.1080/14672710701339485.
Danet, B. (2001) Cyberpl@y: Communicating online. Oxford, United Kingdom: Berg.
de Certeau, M. (1984) The practice of everyday life. Berkeley, CA: University of California Press.
de Seta, G. (2015) Dajiangyou: Media practices of vernacular creativity in postdigital China. PhD Thesis. The Hong Kong Polytechnic University, Hong Kong, China.
Du, X. (1999) ‘Internet diffusion and usage in China’, Prometheus, 17(4): 405–420. DOI: 10.1080/08109029908632119.
Esarey, A. (2015) ‘Winning hearts and minds? Cadres as microbloggers in China’, Journal of Current Chinese Affairs, 44(2): 69–103.
Esarey, A., and Xiao, Q. (2008) ‘Political expres- celebrations on the Internet’, Journal of
sion in the Chinese blogosphere: Below the Computer-Mediated Communication, 1(2).
radar’, Asian Survey, 48(5): 752–772. DOI: DOI: 10.1111/j.1083-6101.1995.tb00329.x.
AS.2008.48.5.752. Lantis, M. (1960) ‘Vernacular culture’, Ameri-
Fung, M. (1995) Help needed in Beijing. Avail- can Anthropologist, 62(2): 202–216.
able at: http://www.bio.net/bionet/mm/ Leibold, J. (2011) ‘Blogging alone: China, the
bioforum/1995-April/014054.html?utm_ Internet, and the democratic illusion?’, The
(accessed 26 March 2015). Journal of Asian Studies, 70(4): 1023–1041.
Gevorgyan, G., and Manucharova, N. (2009) DOI: 10.1017/S0021911811001550.
‘Does culturally adapted online communica- Li, L.N. (2016) ‘Rethinking the Chinese Inter-
tion work? A study of American and Chinese net: Social history, cultural forms, and indus-
Internet users’ attitudes and preferences trial formation’, Television & New Media,
toward culturally customized Web design 18(5): 393–409. DOI: 10.1177/
elements’, Journal of Computer-Mediated 1527476416667548.
Communication, 14(2): 393–413. DOI: Lialina, O. (2009a) ‘A vernacular web’, in: Lial-
10.1111/j.1083-6101.2009.01446.x. ina O and Espenschied D (Eds) Digital folk-
Han, R. (2015) ‘Defending the authoritarian lore. Stuttgart, Germany: Merz & Solitude.
regime online: China’s “voluntary fifty-cent pp. 19–33.
army”’, The China Quarterly, 224: 1006– Lialina, O. (2009b) ‘A vernacular web 2’, in:
1025. DOI: 10.1017/S0305741015001216. Lialina O and Espenschied D (Eds) Digital
Hassid, J. (2012) ‘Safety valve or pressure folklore. Stuttgart, Germany: Merz & Soli-
cooker? Blogs in Chinese political life’, Jour- tude. pp. 58–69.
nal of Communication, 62(2): 212–230. DOI: Luqiu, L.R. (2017) ‘The cost of humour:
10.1111/j.1460-2466.2012.01634.x. Political satire on social media and
Hauben, J. (2005) Across the Great Wall – The censorship in China’, Global Media and
China-Germany email connection 1987– Communication, 13(2): 123–138. DOI:
1994. Available at: http://www.columbia. 10.1177/1742766517704471.
edu/~hauben/china-email.doc (accessed 12 O’Reilly, T. (2005) What is Web 2.0: Design pat-
November 2013). terns and business models for the next gen-
Herold, D.K., and de Seta, G. (2015) eration of software. Available at: http://
‘Through the looking glass: Twenty years www.oreilly.com/pub/a/web2/archive/what-
of Chinese Internet research’, The Informa- is-web-20.html (accessed 21 May 2017).
tion Society, 31(1): 68–82. DOI: Pan, W. (2017) ‘Under the Dome: Un-
10.1080/01972243.2014.976688. engineering digital capture in China’s smog’,
Hu, A.Y. (2007) ‘The revival of Chinese Asiascape: Digital Asia, 4(1–2): 13–32. DOI:
leftism online’, Global Media and Communi- 10.1163/22142312-12340066.
cation, 3(2): 233–238. DOI: 10.1177/ Pang, C. (2008) ‘Self-censorship and the rise of
1742766507078420. cyber collectives: An anthropological study
Huang, E. (1999) ‘Flying freely but in the cage of a Chinese online community’, Intercul-
– An empirical study of using Internet for the tural Communication Studies, 17(3): 57–76.
democratic development in China’, Informa- Penguin Intelligence (2017) 2017 Weixin
tion Technology for Development, 8(3): 145– yonghu & shengtai yanjiu baogao [2017
162. DOI: 10.1080/02681102.1999.9525303. research report on WeChat users & ecology].
Jenkins, H. (2006) Convergence culture: Where Available at: http://tech.qq.com/a/20170424/
old and new media collide. New York, NY: 004233.htm#p=1 (accessed 28 August
New York University Press. 2017).
Kluver, R., and Yang, C. (2005) ‘The Internet in Qiu, J.L. (2004) ‘The Internet in China: Tech-
China: A meta-review of research’, The Infor- nologies of freedom in a statist society’, in:
mation Society, 21(4): 301–308. DOI: Castells M (Ed.) The network society: A
10.1080/01972240591007616. cross-cultural perspective. Cheltenham,
Kozar, S. (1995) ‘Enduring traditions, ethereal United Kingdom: Edward Elgar Publishing
transmissions: Recreating Chinese New Year Limited. pp. 99–124.
536 THE SAGE HANDBOOK OF WEB HISTORY

Roy, R. (1998) ‘Wizards, bureaucrats, warriors, Wang, S.S., and Hong, J. (2010) ‘Discourse
and hackers: Writing the history of the Inter- behind the Forbidden Realm: Internet
net’, The American Historical Review, 103(5): surveillance and its implications on
1530–1552. DOI: 10.1086/ahr/103.5.1530. China’s blogosphere’, Telematics and Infor-
Schroeder, R., and Brügger, N. (2017) ‘Intro- matics, 27(1): 67–78. DOI: 10.1016/j.
duction: The web as history’, in: Brügger N tele.2009.03.004.
and Schroeder R (Eds) The web as history: Wellman, B. (2011) ‘Studying the Internet
Using web archives to understand the through the ages’, in: Consalvo M and Ess C
past and the present. London, United King- (Eds) The handbook of Internet studies.
dom: UCL Press. pp. 1–19. DOI: Handbooks in Communication and Media.
10.14324/111.9781911307563. Malden, MA: Wiley-Blackwell. pp. 17–23.
Scott Sadofsky, J. (2005) BBS: The documen- Wu, A.X. (2012) ‘Hail the independent thinker:
tary. Documentary. Bovine Ignition Systems. The emergence of public debate culture on
Sima, Y., and Pugsley, P.C. (2010) ‘The rise of a the Chinese Internet’, International Journal
“me culture” in postsocialist China: Youth, of Communication, 6: 2220–2244.
individualism and identity creation in Xinlang Keji (2008) 1995 nian Qinghua Shuimu
the blogosphere’, International Communica- Qinghua BBS chengli [In 1995, Qinghua Uni-
tion Gazette, 72(3): 287–306. DOI: versity sets up the Shuimu Qinghua BBS].
10.1177/1748048509356952. Available at: http://tech.sina.com.cn/i/2008-
Song, G. (1998) Song Gang’s Cyberspace Villa. 11-12/18262575131.shtml (accessed 19
Available at: http://www.grchina.com/qiang/ May 2017).
cybervilla.htm (accessed 22 May 2017). Yan, F. (1997) ‘Shenghuo zai wangluo zhong
Tai, Z. (2006) The Internet in China: Cyberspace – Shangpian [Living in the Internet – First
and civil society. Abingdon, United King- part]’, in: Shenghuo zai wangluo zhong
dom: Routledge. [Living in the Internet]. Wangluo wenhua
Taneja, H., and Wu, A.X. (2014) ‘Does the congshu. Beijing, China: Zhongguo Renmin
Great Firewall really isolate the Chinese? Daxue Chubanshe. pp. 3–114.
Integrating access blockage with cultural Yu, H. (2007) ‘Blogging everyday life in Chinese
factors to explain Web user behavior’, The Internet culture’, Asian Studies Review,
Information Society, 30(5): 297–309. DOI: 31(4): 423–433.
10.1080/01972243.2014.944728. Zhang, L. (2015) ‘Fashioning the feminine self
Tang, H. (2009) ‘Blogging in China: Freedom of in “prosumer capitalism”: Women’s work
expression vs political censorship in sexual and the transnational reselling of Western
and satirical blogs’, Networking Knowledge, luxury online’, Journal of Consumer Culture.
2(1): 1–17. DOI: 10.1177/1469540515572239.
Tsui, L (2007) ‘An inadequate metaphor: The Zheng, C. (1994) ‘Opening the digital door:
Great Firewall and Chinese Internet censor- Computer networking in China’, Telecom-
ship’, Global Dialogue, 9(1–2): 60–68. munications Policy, 18(3): 236–242.
Wallis, C. (2011) ‘New media practices in Zhou, Y. (2006) Historicizing online politics:
China: Youth patterns, processes, and poli- Telegraphy, the Internet, and political partici-
tics’, International Journal of Communica- pation in China. Stanford, CA: Stanford
tion, 5: 406–436. University Press.
36
Cultural, Political and Technical
Factors Influencing Early Web
Uptake in North America and East
Asia
Mark McLelland

INTRODUCTION

Today internet histories is a developing field with more attention being paid to regional histories (Goggin and McLelland, 2017) as well as pre-internet developments in computer mediated communication (CMC). As Carey and Elton (2010) argue, there is a growing acknowledgement of the many alternate developments in online services that prefigured and prepared the ground for the internet applications that became ubiquitous with the roll-out of the World Wide Web in the mid-to-late 1990s. For instance, they refer to teletext in the United Kingdom and videotext in the United States in the 1970s, which provided news and other information in a text format that could be displayed on a television screen and navigated with the remote control (2010: 221–3). Although uptake of the services was low, they did get people ‘used to the idea of information retrieval and display’ (2010: 226). Numerous database and computer conferencing systems also prefigured aspects of the Web that we take for granted today. However, there were technical limitations that made the use of these services outside of North America and Europe problematic due to the lack of international protocols surrounding language and script input and retrieval. As Nishigaki (1998) has noted, the complexity of some of the world’s writing systems has ‘often hinder[ed] the rapid development of text processing technology’. In this chapter I look at early developments in CMC across four of the currently most densely networked societies in East Asia: Taiwan, mainland China, South Korea and Japan, to assess how both common and specific challenges faced by these societies influenced their uptake of Web-based applications in the mid 1990s.

It is important to recall that in Euro-American societies the roll-out of public internet access, which was accelerated by the introduction of the Web and the first Web browsers in 1994, took place in a context where large segments of the population already had some prior
538 THE SAGE HANDBOOK OF WEB HISTORY

experience of personal computing (Fouser, 2001: 274). The fact that most high-school graduates already had a familiarity with the QWERTY keyboard, originally popularised by the Remington typewriter in 1873, meant that prior to the development of the mouse and the graphic interface, the use of the keyboard to interface with a computer screen was easily intelligible. Despite the fact that the QWERTY layout, originally devised to prevent the jamming of commonly occurring letters in English words, is not the most efficient for computer input, the familiarity established by the typewriter has made the keyboard difficult to change (Castillo, 2011: 613). The centrality of the QWERTY keyboard to early computer input and navigation systems meant that speakers of East Asian languages needed to develop familiarity with this counter-intuitive system in order to process their own languages.

Given the complex input and display issues associated with non-Roman scripts such as Chinese, Japanese and Korean, which made interoperability between different encoding systems challenging, the introduction of Web-based platforms was less straightforward in East Asia than in North America and Europe. Although the ascendancy of Web-based applications may seem inevitable if we look at the development of the internet from a Euro-American perspective, a review of the history of Web uptake in East Asia reveals a number of culturally embedded issues with the technology that first had to be resolved before the Web could be embraced by a majority of internet users.

This chapter reviews aspects of pre-Web internet communications systems in East Asia, focusing specifically on the BBS cultures where the majority of early adopters were first introduced to native script processing. It then considers how the affordances of these early systems and the cultures that developed around them paved the way for the different pace and scope of Web uptake in the mid-to-late 1990s and explains the persistence of non-Web-based applications in some contexts in East Asia today.

THE ROLE OF ENGLISH IN THE DEVELOPMENT OF COMPUTER NETWORKS

The dominance of English-speaking countries in the development of computing in the middle of the last century led to a situation where the Roman alphabet and the English language were the default script and language used both for the construction of computer code and commands and for discussion concerning research and development (Breen and Tokita, 2004: 1; UNESCO, 2005). The 7-bit code that was established by the US standards agency in the early 1970s, generally known as the American Standard Code for Information Interchange (ASCII), allowed for 128 basic characters including lower- and uppercase letters from a–z, numerals 0–9 and punctuation marks. With the extension to 8-bits, a further 128 characters were made available that included accented letters and additional punctuation marks (Breen and Tokita, 2004: 1; UNESCO, 2005: 71–3). Most countries using European languages were thus able to deploy the standard ASCII set while using the extra character capacity to configure diacritic, punctuation and other marks to suit the local writing system. Non-Roman alphabets such as Greek or Cyrillic were able to use the extended code space to configure their own alphabets. However, languages such as Chinese and Japanese, which use several thousand distinct Chinese characters, could not be written by such a restricted number of options.

In North America publicly available networked computing emerged on a small scale in 1978 with the founding of the Computerized Bulletin Board System (CBBS), a dial-up technology that enabled a limited number of users to access message boards housed on central computers. The establishment of ASCII at this time as the standard input and display code across applications in North America enabled a range of text-based networks such as FidoNet,
CompuServe and later AOL to begin to offer services nationally and later overseas. One of the key features that allowed networked communication was the development in 1978 (and release onto ARPANET in 1983) of the TCP/IP set of protocols that allowed computers with different operating systems to communicate with each other via telephone lines (Carey and Elton, 2010: 217). However, an important caveat is needed here – these protocols were also based on the alphabet as well as numerals and punctuation marks used in English, thus assuming familiarity with a specific script (not to mention keyboard layout), making it difficult to use the TCP/IP protocols to send and receive data in scripts other than Roman.

Subsequent to the implementation of TCP/IP, the next development that increased the capacity for information to be sent and received via the internet was the World Wide Web, made available to the public in 1993. The Web had numerous advantages when compared with earlier applications, especially its capacity for handling formatted text, embedded graphics and sound and visual media. However, Web programming and mark-up languages carried over existing biases in terms of their reliance on Roman script. Initially languages using scripts other than Roman needed to upload their Web pages as image files, resulting in slow download speeds and lack of searchability. Hence, as Pargman and Palme, in their analysis of ‘ASCII imperialism’, concluded, ‘there does exist a bias among the organisations, institutions, and individuals’ involved in setting the standards for computer communication ‘that works in favour of English-speaking Internet users’ (2009: 197).

By 1994, the development of Web browsers able to launch applications that displayed content using earlier internet protocols meant that much information embedded in previous systems was still available. This fact, added to the ability to search for content across different platforms, enabled the Web to become the most popular internet application across many jurisdictions within a few years of its development. In North America the transition from early internet applications such as Usenet and various Bulletin Board Systems (BBS) was largely seamless since the Web included many features already made familiar in these contexts and it simply enhanced the user experience, making the retrieval and upload of information more straightforward for those with less computer literacy. These factors resulted in a large imbalance in the amount of material available in English vis-a-vis other (especially non-European) languages in the late 1990s. For instance, one of the earliest surveys of language presence on the internet conducted by the Babel Project discovered that as of June 1997, 84 percent of Web content was in English, followed by 4.5 percent in German. At this time only 3.1 percent was in Japanese and Chinese usage had yet to register (cited in Gerrand, 2007). This massive imbalance in the amount of information available in English led some Korean and Japanese commentators to complain at the time of an American ‘hegemony over Internet culture’ (Auh, 1998; see also Nishigaki, 1998).

CHALLENGES TO COMPUTING WITH CHARACTER-BASED SCRIPTS

As mentioned earlier, many non-European-language users, particularly those in East Asia whose written languages were not alphabet based but depended to some extent upon the reproduction of complex characters originating in China (such as Chinese, Japanese and, to an extent, Korean), were not familiar with the typewriter. Although typewriters for Asian scripts did exist, they were cumbersome and could only be used by highly trained operatives, making it impossible to reproduce the typing pools that had developed across Western businesses and government departments (Choi, 2013: 42; Gottlieb, 2000: 136–7).
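
The capacity limits of the 7-bit and 8-bit codes discussed above can be made concrete with a short sketch. It is an illustration only, written in Python, and its details are assumptions made for the example rather than anything taken from the sources cited in this chapter: the sample words are arbitrary, and the Latin-1 codec merely stands in for one possible 8-bit extension of ASCII.

```python
# Illustrative sketch only; sample strings and codec choices are assumptions.

SEVEN_BIT_SLOTS = 2 ** 7   # 128 values available in a 7-bit code
EIGHT_BIT_SLOTS = 2 ** 8   # 256 values once an eighth bit is used


def fits(text: str, codec: str) -> bool:
    """Return True if every character in text can be stored in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


samples = {
    "English": "Web",
    "French": "café",   # needs an accented letter from the 8-bit range
    "Chinese": "网",     # one of several thousand characters
}

for label, text in samples.items():
    print(label,
          "| 7-bit ASCII:", fits(text, "ascii"),
          "| 8-bit Latin-1:", fits(text, "latin-1"))
```

The English sample fits both codes, the French sample needs the eighth bit, and the Chinese sample fits neither, which is the restriction that the double-byte schemes were later designed to overcome.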
The possibilities for easier text input, display and retrieval afforded by computers were therefore of great interest to East Asian businesses and governments and from the 1970s onward various schemes and protocols were explored across the region. However, unlike the close cooperation between North American and European researchers and manufacturers at the time, there was little attempt at achieving standardisation between countries in the region (Contreras, 2014). One major problem was the restricted memory available in early computers, making it difficult to develop code to deal with characters beyond the 26 letters of the Roman alphabet, Arabic numerals and a small number of key punctuation symbols associated with written English. The fact that the internet and CMC more generally were pioneered in the United States, a largely English-speaking jurisdiction, meant that there had been little inclination to invest in technology to expand this character range. Hence, as Breen (2007: 1) points out, at the time of these early developments in computing, it made little sense to speak of ‘computing in English’ – since the use of the English language was embedded in the very architecture of computer programs and English was the default language for international communication on the early internet (Breen and Tokita, 2004: 1; Pargman and Palme, 2009: 184; Weber, 1997: 16). However, ‘computing in Japanese’ (or Chinese or Korean) raised a whole set of issues particular to these languages and there have been a number of studies dedicated to the specific computing problems raised by the use of East Asian scripts (see, for example, Gottlieb, 2000; Lunde, 1999; Unger, 1987).

During the 1970s separate attempts were made in countries of the East Asian region by computer manufacturing companies and other bodies to develop ‘double byte character sets’ that could cope with a non-phonetic ideograph-style writing system, but unlike in North America and Europe where the American system had become standard, there was no unifying system in place for the input and conversion of Chinese characters (used across all three languages) nor for the local hiragana and katakana syllabaries used in Japan or for the hangul alphabet in Korea. Only during the 1980s did the need for texts to move across national boundaries as well as for the use of multiple scripts in the same document become apparent (Breen and Tokita, 2004: 2).

It was not until 1993 that Unicode, a system that integrated aspects of the ISO 10646 code that the International Organization for Standardization, an independent NGO based in Geneva, had developed, was publicly released. Unicode was a 16-bit system allowing a potential 65,536 characters that finally allowed for the inputting and displaying of ‘unified CJK [Chinese, Japanese and Korean] ideographs’ across platforms and applications. However, this code could not itself address some fundamental biases concerning script built into the architecture of the internet. For instance, Japanese and Chinese text is traditionally written from right to left in a vertical manner from top to bottom of the page. Also, unlike European languages, which separate words on the page and screen, other than paragraph spaces, Chinese and Japanese are written without word breaks (Unger, 1987: 29–31). This distinctive writing style requires conversion software to recognise not just individual words but also lexemes.

Another fundamental bias concerning script is the reliance on the Roman alphabet plus numerals and common punctuation devices in the coding of internet protocols and Web domain addresses. In 1998 the Multilingual Internet Names Consortium, an NGO that had emerged from a research project based in Singapore, began to address this issue and in 2003 a standard conversion algorithm was developed and approved by the Internet Engineering Task Force. This allowed for a domain name to be written in a local script familiar to end users (deploying Unicode) and then converted into the
international format familiar to Web browsers that utilised the limited ASCII character set (Breen and Tokita, 2004: 3). It thus took a decade after the introduction of the Web before users in Japan and China were able to use their character-based scripts in the designation of Web addresses.

EARLY ONLINE NETWORKING: BULLETIN BOARD SYSTEMS

For the majority of early adopters across East Asia it was in the context of various Bulletin Board Systems (BBS) that they became accustomed to CMC, especially in regard to the input and display of native scripts. BBS are among the earliest forms of social networking and information-sharing systems that introduced users to many of the online activities that we take for granted today. These included message boards used for news sharing and the posting of ads and personal requests, file sharing, online gaming, self-advertisement, chat and debate. However, despite the fact that these systems pioneered some of these familiar applications, technical limitations meant that they offered a communications environment that lacked the immediacy of the modern internet. Lack of memory and slow data speeds have already been discussed regarding script input and display, but these issues also affected the kind of material that could be exchanged online, limiting it mostly to text. In addition, the number of phone lines that were able to link to a mainframe at any one time was limited, leading to connectivity issues. It was also not unusual for phone companies to charge for connectivity by the minute, encouraging users to avoid peak times and take advantage of lower off-peak rates late at night.

BBS have generally been supplanted by newer, Web-based applications such as online forums and commercial sites like Facebook. From 1995 onward, news sharing, an important early function, was replaced by portal sites such as yahoo.com, and chat functions were replaced by instant messaging tools such as those supported by MSN. Yet for over a decade prior to the launch of the Web, BBS played an important role in the development of internet culture globally. BBS were supported by a set of internet protocols known as Telnet that in the days of limited bandwidth and computer memory enabled a no-frills display of textual information. The display was in plain text and without the use of images, or sound and video files. There was no mouse to navigate the screen and commands needed to be issued by hotkey combinations such as Ctrl + R for replying to a message. Thus, a familiarity with the QWERTY keyboard was necessary for these early users, but once the commands had been memorised the navigation of services was fairly straightforward. It was also not too difficult to set up a BBS of one’s own and many individuals and small groups dedicated to niche interests and pastimes took advantage of the technology to network.

The introduction of the Web in the mid 1990s did not immediately supplant the use of BBS. Instead BBS sites were reconfigured to allow access from a Web browser in addition to a Telnet connection and support site navigation via a mouse, not key commands. Increased bandwidth also granted enhanced functionality, particularly support for multimedia formats. However, as more sophisticated Web-based services were developed many users simply migrated to these new and more powerful applications, preferring to search for information via Google or chat and exchange opinions on MSN. As in North America and Europe, BBS played an important role in the initial development of computer literacy and communication across East Asia, where Web-based BBS have remained influential in Japan and mainland China, and earlier Telnet versions are still in use in Taiwan, as outlined below.
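
Before turning to the individual cases, two of the mechanisms described in the preceding sections can be sketched briefly: the 16-bit code space, and the conversion of a local-script domain name into an ASCII-only form. The sketch is a hedged illustration rather than anything drawn from the sources cited in this chapter. It relies on Python's built-in `idna` codec, which follows the 2003 IETF specification, and the sample label 中国 ("China") is an assumption chosen purely for the example.

```python
# Illustrative sketch only; the sample label is an assumption for the example.

SIXTEEN_BIT_SLOTS = 2 ** 16  # 65,536 possible values in a 16-bit code

label = "中国"

# Each character has a numeric code point; these fall inside the 16-bit range.
for ch in label:
    print(ch, hex(ord(ch)), ord(ch) < SIXTEEN_BIT_SLOTS)

# Convert the local-script label to the ASCII-only form used on the network.
ascii_form = label.encode("idna")
print(ascii_form)

# The conversion is reversible, so software can display the original script.
print(ascii_form.decode("idna"))
```

The ASCII-only form begins with the prefix `xn--`, which marks a converted label, so the underlying network continues to handle only the restricted character set while users read and type their own script.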
THE PERSISTENCE OF BBS IN TAIWAN

As a small and relatively resource-poor island that supports a large population of over 23 million, keeping a competitive edge in exports has been vital for Taiwan’s development. Hence, in the early 1980s there was growing awareness among government agencies that Taiwan needed to do more to keep up with developed nations in terms of computerisation and office automation. Yet, as Wang points out, given language and other differences, ‘computerization and automation [could not] be accomplished by just purchasing the hardware’ (1984: 15) since a number of changes needed to take place, including the abolition of martial law in 1987 and the subsequent deregulation of the telecommunications industry.

Internationally networked communication began in Taiwan in 1988 when an existing network of local Taiwanese computer hobbyists established a connection to FidoNet, which allowed previously independent and isolated BBS to be linked together in a network spanning international borders (Li et al., 2017: 183). However, at this time the bandwidth restrictions of dial up and problems with handling of Chinese script limited the uptake of the technology. In 1990 the Taiwanese government took the lead in internet development, funding internet centres across three national universities, providing subsidised access to all staff and students. This initial network soon spread to other universities where it became common to provide internet access portals to students living in campus dormitories. At this time students were introduced to a number of CMC features including BBS; however, unlike the previous costly and limited dial-up system deployed by the original BBS innovators, the Taiwanese system used a TCP/IP model providing enhanced bandwidth that also supported the use of Chinese script (Li et al., 2017: 184–5).

Universities generally embraced student use of BBS, viewing the technology as a means to communicate efficiently with the student body. However, the systems were often set up and run by student volunteers, causing tension between the institutional owners of the system and the everyday administrators. The need to negotiate with institutions about technical innovations frustrated many students, who looked for ways to develop their own private networks using the infrastructure provided by their universities. Student innovators developed a number of plug-in packages that enabled individuals to easily set up their own BBS on existing university networks, leading to a boom where up to 400 large Telnet BBS were in operation between 1996 and 2000, a significant number for the island’s population of only 21 million at the time (Li et al., 2017: 188).

During the late 1990s, even though other Web-based networking forums were now available, there was a strong preference among Taiwanese students for continued interaction via BBS. This was partly to do with language – there was simply much more content in Chinese on existing BBS forums than on the Web at this time, but there was also the perception that Web pages were somewhat ‘static’, difficult to update in real time and separated from the needs and interests of their users. Campus-based BBS on the other hand seemed to speak much more directly to their student audiences in a language all could use and deploying technology developed by the students themselves. It is for these reasons that Li et al. (2017) point to the continued salience of BBS technology in Taiwan at a time when Web-based systems were proving much more popular in Euro-American contexts. As one of their informants pointed out, ‘Back in the old days, when other countries were crazy for Web sites, for several years college students in the whole [of] Taiwan were mad about developing and improving BBS systems’ (2017: 188). Indeed, BBS have maintained their popularity among Taiwanese students today, especially the PTT network originally founded
by a National Taiwan University student in 1995, which still uses a Telnet protocol offering a primarily text-based environment.

The continuing popularity of BBS use among students in particular can be put down to the strong sense of community and ownership as requisite technical skills and administrative duties have been handed down from generation to generation of student users. BBS in Taiwan maintained their critical edge and autonomy at a time when Web-based networks were becoming increasingly commercialised and dominated by advertising. PTT is now the most popular BBS in Taiwan, with more than 20,000 discussion boards. Indeed, it has been such an important part of the life of generations of students that in 2012 it was the topic of a major movie supposedly based on ‘true events’ taking place on the discussion boards (Taiwan Today, 2010). Like Japan’s 2-channel, which also utilises a BBS format (albeit Web based), PTT has developed its own online lingo and art and become a major part of the country’s media landscape, accessed by both current and former students. The continuing interest in and access to BBS networks across the population in Taiwan is distinct from the situation in mainland China (discussed below) where, due to legislation introduced in 2005 that required all college BBS to ‘transform into exclusive, on-campus communication platforms based on real name registration’, off-campus users and alumni are barred from participating (Jin, 2008: 22–3).

MAINLAND CHINA: FROM BBS TO THE WEB

Early computer networking in mainland China between 1986 and 1992 was limited to email applications shared between university computer research labs. China, spurred on by the Clinton administration’s strong support for internet infrastructure, was first linked to the internet via a dedicated line connecting to the United States in 1994 (Jin, 2008), at a time when the Web was already becoming the most popular means of creating and navigating internet content. However, the costs of computer purchase plus connecting to the network as well as a lack of computer literacy were initial barriers to widespread adoption.

China’s first TCP/IP-based BBS, known as the Dawn BBS, was established soon after China’s connection with the World Wide Web in April 1994, and hosted by the National Research Centre for Intelligent Computing Systems. It was the first public internet-based BBS in China, offering services such as news updates, online forums and chat rooms. Universities, among the first institutions to be given internet access, allowed students easy access to the new communications technology, initially via BBS. Telnet-based commercial BBS such as the ‘sports salon’ affiliated with Sitong Lifang Information Technology Company also gained mass audiences at this time. These early BBS communities were established on CFidoNet, a branch of the international FidoNet system, with a user base of around 10,000 at its peak in 1997. These early adopters were well-educated and tech-savvy individuals from the economically developed south-east of the country and tended to be liberal in political outlook, supporting a wide variety of discussion topics (Li, 2010: 67). Yet, unlike in Taiwan where private student-run BBS were able to grow outside of government and university control, on the mainland government interference, including the closing of popular sites, soon tempered the liberal outlook of these first forums (Li, 2010: 69).

1997 saw the roll-out of popular commercial portal websites such as Sina.com. In 1998 Sitong was incorporated into Sina, which became the largest Web portal in the Chinese-speaking world (Yang, 2017: 372), inaugurating the era of Web-based BBS. Other commercial developments encouraged Web uptake at this time, with major news
outlets setting up their Web presence and internet startups such as Netease.com offering free
space for users to build their own Web pages (Yang, 2017: 372). Web-based BBS still remain
popular, focusing as they do on topics and issues of social concern, as opposed to other more
people-centric social networks (Inside China, 2012).

EARLY KOREAN APPLICATIONS ANTICIPATE THE WEB

In Korea the development of internet cultures can be traced back to 1982, when the government
introduced a research-oriented computer communication network using TCP/IP protocols. As Jo
comments, although there had been some use of computers in Korea dating from the late 1970s,
‘the effective implementation of a Korean character code and the search for an optimal
standard were key factors necessary before new information and communication technologies
could become part of popular and mass culture’ (2017: 199). Although Korean script has also
traditionally employed hanja (the Korean equivalent of Chinese characters), these have mostly
fallen out of use, albeit they are still studied at school and widely comprehended. Hence, the
development of code for the processing of the native hangul alphabet does not pose the same
challenges as character-based scripts such as Chinese and Japanese, given that hangul consists
of a finite set of 19 consonants and 21 vowels. The main problem was one of consistency: until
the KS C 5601 character set was adopted as the national standard in 1987, there were numerous
competing and incompatible software programs for processing the hangul script.

The Data Communication Company of Korea (DACOM) was established by the government in 1982 and
was intended to be the main artery for data flow. It was established on a top-down broadcast
model with limited interaction from users. Developments included a video text service known as
Chollian, inaugurated in 1987, with terminals set up in airports, major hotels and event
venues. This limited access meant that the service was not widely used, and it was relaunched
in 1988 with an expanded number of services, including home shopping, but still failed to
catch on with a broad public.

Jo (2017) argues that the first internet cultures in Korea did not develop around users of the
Chollian terminals but instead emerged from early adopters of email communication. In 1984
DACOM established an email service that was connected to the United States and made available
to research institutions and major corporations, but the system was only formatted to handle
text in English and did not prove popular with individual users. At the time an alternative
system that could receive and display messages in the hangul script, known as H-mail, was
already being used among DACOM employees and was gradually rolled out to its corporate clients
and, from 1987, it was made available to the general public. Uptake was not rapid, however,
for a number of reasons, mainly due to low PC penetration rates and the need for expensive
hardware such as modems.

Despite the relatively few users, H-mail saw some important developments that prefigured in
many ways the kinds of collaborative cultures that were to come into their own with the
roll-out of Web-based social media platforms in the early 2000s. For instance, affordances of
the system such as BBS news groups and chat-like electronic conferencing spaces proved
extremely popular. Unlike the Chollian system, where users accessed predetermined information,
the information on H-mail was mostly provided by and shared among the users themselves. As on
early internet networks in North America, and paralleling the experience in Taiwan at this
time, H-mail boards provided the perfect environment for networking among computer nerds and
hackers who were interested in providing an enhanced user experience, often
against the directives of the network owners. These networks led to the establishment of the
first BBS systems in 1988, which, like in Taiwan at the time, became extremely popular among
university students. However, unlike in Taiwan where one board, PTT, developed rapidly on the
back of government support for university networks, the early BBS systems in Korea were
‘skunkworks’-style developments set up by individuals or small teams since, as Jo points out,
at the time private usage of a modem and unauthorised communications systems like the BBS were
regarded as potential threats to national security (2017: 204–5). It was on these early BBS
that donghohoe or popular ‘friendship associations’ developed, challenging the information
retrieval model of internet communication anticipated by the authorities (Jo, 2017: 205).
Hence, when major internet portals began to offer Web-based internet café and
communication-style services in the late 1990s, they were tapping into cultures of use and
expectation that had already been established in the BBS era, albeit extending these services
to users beyond their original tech-savvy and student-based cohorts.

JAPAN AND THE EARLY ADOPTION OF THE MOBILE INTERNET

The difficulty of inputting Japanese text and the lack of familiarity with computing
technology among the general public meant that computer literacy was very low in Japan in the
early 1980s (McLelland, 2017). There were also industry and policy factors that slowed the
development even of a hobbyist cohort of computer pioneers (Contreras, 2014: 919). Due to
restrictions on the use of Japanese telephone lines, prior to industry reform in 1985 those
hobbyists who wanted to access computer networks had to do so via overseas systems such as
CompuServe in the United States, requiring expensive international dial-up fees and some
fluency in English (Shapard, 1993: 260). Even after the establishment of a growing number of
local networks, Japan’s overpriced leased lines and expensive metered call charges made it
difficult for individuals to engage with these new opportunities.

Hence, by the early 1990s, unlike the situation in North America, there was ‘no strong
tradition of grassroots approaches in info-communications and no extensive experience in the
use of electronic community networks via CATV or computer networks’ in Japan (Latzer, 1995:
524). The computing environment was largely defined by mainframes and proprietary systems
complete with bundled software, and the price of personal computers remained high (Parker
Smith, 1997: 26). Established business practices prioritising face-to-face meetings and the
exchange of hand-written documents via tele-fax also delayed the take-up of email exchange.
The exchange of email, which encourages lateral connections between employees, was also a
cause for concern in many of Japan’s major businesses used to top-down communication
structures (Parker Smith, 1997: 29). The uptake of personal computers was also low compared
with other developed nations. Research in 1994, for instance, showed a PC penetration rate of
15.8 percent for the United States and only 5.7 percent for Japan. Significantly, of these
American computers, 52 percent were linked to local area networks compared with only 8.6
percent of Japanese computers that were similarly linked (Latzer, 1995: 524). Japanese
people’s relative lack of familiarity with computers is also seen in a 1997 survey comparing
Japanese with American and Korean computer literacy. It was found that the Japanese had least
familiarity with a keyboard, only 23.7 percent of Japanese indicating that they could ‘type
fast’ compared with 31.6 percent of Koreans and 54.4 percent of Americans. In Japan 50.8
percent of respondents indicated they had never used a computer, in comparison to only 21.8
percent of American respondents (Fouser, 2001: 274). This situation meant that
the discourse about Japan and the internet in journalistic venues such as Wired was framed in
terms of bureaucratic incompetence and cultural entrenchment, stressing a need for ‘catch-up’
based on American models (see, for example, Abate, 1996; Johnstone, 1994).

As in Taiwan, mainland China and Korea, issues of interoperability raised by different
encoding systems needed to be overcome before the widespread adoption of CMC was possible. The
situation in Japan was even more complex than in these other nations since in addition to
2,000 or so basic kanji (Chinese characters), Japanese also uses two native syllabaries,
hiragana and katakana, which function like alphabets. Prior to 1995 there had been little
attempt to develop a unified approach to language input and display problems, leading to
Japan’s major computer manufacturers developing their own proprietary systems (Seo, 2013:
186). This meant that although there were several BBS-style computer networks, including NEC’s
PC-Van and Fujitsu’s Nifty-Serve, each with about half a million subscribers by 1994, the
networks could not link with each other (McLelland, 2017).

These companies pioneered what is known in Japanese as pasokon tsūshin or ‘personal computer
communication’ – initially based on a BBS that enabled users to seek out information from
various news feeds as well as participate in online discussions and send emails to other
users – albeit, in the early years at least, only with those on the same system (McLelland,
2017). The interoperability problem was finally overcome, not by Japanese computer companies,
but by an outside intervention from IBM, which licensed its DOS/V software in 1995, allowing
computer manufacturers to market open systems that allowed anyone to install Japanese fonts
onto their PCs (Parker Smith, 1997: 28). The launch of a Japanese version of Windows 95 at the
end of 1995 made available a host of new applications and was a major boost to PC sales among
the general public, with smaller and lighter laptops proving the most popular.

This early familiarity with BBS-style communication in Japan helps to explain the phenomenal
rise of the 2-channel Web-based BBS site established in 1999, which at its peak in the
mid-2000s was receiving 2.5 million posts a day. Like the PTT BBS system in Taiwan discussed
earlier, 2-channel has played a major role in Japan’s internet culture, becoming an
alternative news outlet and developing its own distinct language and art forms (Katayama,
2007).

However, the major boom in internet connectivity in Japan did not eventuate as a result of
gradually increasing PC penetration, but as a result of the early adoption of the mobile
internet accessed via the keitai denwa (handheld phone). In the late 1990s in Japan internet
access was expensive, requiring investment in a computer, modem and line rental and, as
telecoms were still charging for dial-up calls by the minute, Web browsing could rack up large
bills. In 1999 NTT’s ‘i-mode’ system allowed access to specially configured sites using a cell
phone and charged only for downloads, not time spent browsing. I-mode did not deploy WAP
(Wireless Application Protocol), which was developed by an industry-wide consortium in Europe
and North America and failed to connect with consumers at the time, largely due to the
impoverished Web environment that could be provided on a screen capable of displaying only a
few lines of text (Vincent, 2001: 16). Instead, the Japanese system used a simplified form of
HTML that required sites to be specifically designed or converted for the system, meaning that
i-mode was only available in Japan and able to be accessed via the specially configured keitai
handsets. As a result, early advertising for the medium avoided discussion of specifications
and even avoided mention of the internet or the Web, stressing instead the convenience of the
various applications offered (such as ticket reservations, email and music downloads)
(Natsuno, 2003: 77).

The sudden boom of interest in i-mode can be understood in the context of earlier
developments that had introduced consumers to the convenience of handheld communication
devices. The popularity of pagers among Japanese teens, for example, has been noted,
especially the manner in which schoolgirls used the pagers’ numerical display capacity to send
coded messages (Kohiyama, 2005: 64). Matsuda argues that early adopters among the nation’s
youth simply carried over this message-exchange function when they moved to mobile phones,
pointing out that ‘the keitai Internet is substantially different from that accessed by
personal computers’ (2005: 124). The relative absence of a widespread PC-based internet
culture in Japan, alongside the functionality of the specially designed i-mode sites and
handsets containing several short-cut buttons, made mobile internet access seem efficient and
attractive.

The keitai also saw an innovation in input method for the Japanese language, using a limited
numerical keypad where each number on the pad is associated with a sequence of kana
syllables – such as ka, ki, ku, ke, ko – with the desired syllable being selected according to
the number of button presses. Once the phonetic spelling of the desired term is visible on
screen, conversion to the required kanji can be achieved by the use of an arrow button,
allowing the entire process to be navigated just by using one thumb – giving rise to the
moniker yubi-zoku or ‘thumb tribe’ to describe young people who soon became proficient in this
input style.

It is subsequent to the advent of mobile internet access via the keitai that the history of
the internet in Japan can no longer be framed as a matter of ‘catch up’. Indeed, by the end of
May 2001, 40 million Japanese were able to access a version of the internet on their mobiles,
while in August 2000 only four million North Americans had such access (Miyata et al., 2005:
145). Yet the i-mode system allowed access to a limited internet environment, isolated from
the World Wide Web, where sponsored commercial sites allowed users to purchase information,
items and services and have the cost added to their monthly phone bill, thus obviating the
need for credit card transactions. One particular success of this model has been the licensed
download of music, ring tones and e-books. For over a decade i-mode was a distinct ecology
within the wider internet world, complete with its own navigation and custom-designed
streamlined services that were both inexpensive and light on battery use. It is only in recent
years that i-mode’s appeal has begun to be challenged by smart phones offering a much richer
Web experience and the full range of apps expected by modern consumers (Akimoto, 2011).

CONCLUSION

This chapter has looked at language, cultural and political issues on the early internet,
investigating how they affected the take-up of internet applications in North America and East
Asia prior to the advent of the Web. I have argued that in the case of North America and
Europe, the move from early text-based systems to the multimedia graphics interface offered by
the Web was largely seamless due to the internet and the Web already sharing basic protocols,
most notably reliance on ASCII, TCP/IP and the QWERTY keyboard as the main human/computer
interface. Bulletin Board Systems were shown to be important precursors to Web applications in
that they already anticipated many of the affordances that the Web made more widely available.
While the move from BBS to Web-based applications was fairly seamless in a Euro-American
context, I noted how specific cultural, language and political issues complicated the uptake
of the Web to an extent in an East Asian context.

For the first few years after the roll-out of the Web, English was undoubtedly still the
global language of the internet, leading students in Taiwan to remain embedded in their local
Chinese-speaking BBS systems, while Japanese users developed a separate
internet ecology based on i-mode mobile applications. Mainland China, due to its government’s
longstanding desire for information sovereignty (Qiu, 1999: 2), developed its own proprietary
Web-based platforms.

However, in no sense has this chapter been arguing for cultural essentialism. It did not take
long for users across East Asia to become enthusiastic adopters of Web-based platforms once
issues of script input, display and interoperability had been overcome. Indeed, by 2002, 6
percent of Web content was estimated to be in Japanese (ahead of Spanish and French), and by
2005 Chinese speakers had emerged as the second-largest bloc of internet users after speakers
of English (cited in Gerrand, 2007). Instead, I have argued for the need for an enhanced
scrutiny of hidden biases in internet infrastructure, including the manner in which the Web
extended and amplified these biases, in order to understand the complex and distinct regional
negotiations that needed to take place across the globe before Web use could become the
seemingly natural and default mode of internet access that it has become today.

REFERENCES

Abate, T. (1996) ‘The Midnight Hour: Japan Ventures onto the Net in the Dark of Night’, Scientific American, January, 37.
Akimoto, A. (2011) ‘In the Battle with Smart Phones is I-Mode Dead?’ Japan Times, April 20 (http://www.japantimes.co.jp/life/2011/04/20/digital/in-the-battle-with-smart-phones-is-i-mode-dead/#.V_2l7DVH5W0), accessed 18 June 2018.
Auh, T.-S. (1998) ‘Promoting Multiculturalism on the Internet: Korean Experience’, paper, Graduate School of Journalism and Mass Communication, Korea University, Republic of Korea (http://www.unesco.org/Webworld/infoethics_2/eng/papers/paper_8.htm), accessed 17 March 2017.
Breen, J., and Tokita, A. (2004) ‘The WWW in Japan: A Threat to Cultural Identity or a Domesticated System?’ First International Conference on Cultures and Technologies in Asia, Mumbai, India, Feb 2004 (http://www.edrdg.org/~jwb/paperdir/jwww.html), accessed 18 June 2018.
Breen, J. (2007) ‘Computing in Japanese: What Are the Frontiers Now?’ Workshop on Computational Japanese Studies, University of Tokyo (http://www.csse.monash.edu.au/~jwb/cj_abstract.html), accessed 17 March 2017.
Carey, J., and Elton, M. (2010) When Media Are New: Understanding the Dynamics of New Media Adoption and Use. Ann Arbor: University of Michigan Press.
Castillo, M. (2011) ‘QWERTY, @, &, #’, American Journal of Neuroradiology, 32: 613–14.
Choi, Y.B. (2013) ‘Path Dependence on the Korean Keyboard’, Journal of Economic Behaviour and Organization, 88: 37–46.
Contreras, J. (2014) ‘Divergent Patterns of Engagement in Internet Standardization: Japan, Korea and China’, Telecommunications Policy, 38: 914–32.
Fouser, R. (2001) ‘“Culture”, Computer Literacy and the Media in Creating Public Attitudes to CMC in Japan and Korea’, in Charles Ess (ed.), Culture, Technology, Communication: Towards an Intercultural Global Village, Albany: SUNY Press, pp. 261–78.
Gerrand, P. (2007) ‘Estimating Linguistic Diversity on the Internet: A Taxonomy to Avoid Pitfalls and Paradoxes’, Journal of Computer Mediated Communication, 12(4): 1298–321.
Goggin, G., and McLelland, M. (2017) ‘Global Coordinates of Internet Histories’, in Gerard Goggin and Mark McLelland (eds), The Routledge Companion to Global Internet Histories, New York: Routledge, pp. 1–20.
Gottlieb, N. (2000) Word Processing Technology in Japan: Kanji and the Keyboard. Richmond: Curzon.
Inside China. (2012) ‘Chinese BBS – The Social Activity that Never Grows Old’ (http://thinkingchinese.com/chinese-bbs-the-social-activity-that-never-grows-old)
Jin, L. (2008) ‘Chinese Online BBS Sphere: What BBS Has Brought to China’, Master’s thesis, Massachusetts Institute of Technology (http://pdf.textfiles.com/academics/liwen-jin2008.pdf), accessed 18 June 2018.
Jo, D. (2017) ‘H-mail and the Early Configuration of Online User Culture in Korea’, in Gerard Goggin and Mark McLelland (eds), The Routledge Companion to Global Internet Histories, New York: Routledge, pp. 197–208.
Johnstone, B. (1994) ‘Wiring Japan: A Bitter Cultural Clash Has Reduced Japan to a Third-Rate Power in Networking’, Wired, 1 February (http://archive.wired.com/wired/archive/2.02/wiring.japan_pr.html)
Katayama, L. (2007) ‘2-Channel Gives Japan’s Famously Quiet People a Mighty Voice’, Wired, 19 April (http://archive.wired.com/culture/lifestyle/news/2007/04/2channel), accessed 18 June 2018.
Kohiyama, K. (2005) ‘A Decade in the Development of Mobile Communications in Japan (1993–2002)’, in Mizuko Ito, Daisuke Okabe, and Misa Matsuda (eds), Personal, Portable, Pedestrian: Mobile Phones in Japanese Life, Cambridge: MIT Press, pp. 61–70.
Latzer, M. (1995) ‘Japanese Information Infrastructure Initiatives’, Telecommunications Policy, 19(7): 515–29.
Li, S. (2010) ‘The Online Public Space and Popular Ethos in China’, Media, Culture and Society, 32(1): 63–83.
Li, S.L., Lin, Y.-R., and Huang, A.H.-M. (2017) ‘Brief History of Taiwanese Internet: The BBS Culture’, in Gerard Goggin and Mark McLelland (eds), The Routledge Companion to Global Internet Histories, New York: Routledge, pp. 182–96.
Lunde, K. (1999) CJKV Information Processing. Sebastopol, CA: O’Reilly & Associates.
Matsuda, M. (2005) ‘Mobile Communication and Selective Sociality’, in Mizuko Ito, Daisuke Okabe, and Misa Matsuda (eds), Personal, Portable, Pedestrian: Mobile Phones in Japanese Life, Cambridge: MIT Press, pp. 123–42.
McLelland, M. (2017) ‘Early Computer Networks in Japan 1984–1994’, in Gerard Goggin and Mark McLelland (eds), The Routledge Companion to Global Internet Histories, New York: Routledge, pp. 171–80.
Miyata, K., Boase, J., Wellman, B., and Ikeda, K. (2005) ‘The Mobile-izing Japanese: Connecting to the Internet by PC and Webphone in Yamanashi’, in Mizuko Ito, Daisuke Okabe, and Misa Matsuda (eds), Personal, Portable, Pedestrian: Mobile Phones in Japanese Life, Cambridge: MIT Press, pp. 143–63.
Natsuno, T. (2003) The i-mode Wireless Ecosystem. Chichester: John Wiley & Sons.
Nishigaki, T. (1998) ‘Multilingualism on the Net’, Paper presented at UNESCO INFOethics ’98 (http://www.unesco.org/Webworld/infoethics_2/eng/papers/paper_5.htm), accessed 17 March 2017.
Pargman, D., and Palme, J. (2009) ‘ASCII Imperialism’, in Martha Lampland and Susan Leigh Star (eds), Standards and their Stories: How Quantifying, Classifying and Formalizing Practices Shape Everyday Life, Ithaca: Cornell University Press, pp. 177–99.
Parker Smith, N. (1997) ‘Computing in Japan: From Cocoon to Competition’, Computing, March, 26–33.
Qiu, J.L. (1999) ‘Virtual Censorship in China: Keeping the Gate between Cyberspaces’, International Journal of Law and Communications Policy, 4: 1–25.
Seo, D. (2013) Evolution and Standardization of Mobile Communications Technology. Hershey, PA: Information Science Reference.
Shapard, J. (1993) ‘Islands in the (Data)Stream: Language, Character Codes, and Electronic Isolation in Japan’, in Linda M. Harasim (ed.), Global Networks: Computers and International Communication, Cambridge, MA: MIT Press, pp. 255–70.
Taiwan Today. (2010) ‘BBS-themed Movie Stirs Up Campuses’, June 8 (http://taiwantoday.tw/ct.asp?xitem=106214&ctnode=1730&mp=9), accessed 18 June 2018.
UNESCO. (2005) ‘Measuring Linguistic Diversity on the Internet’, UNESCO, Paris (http://www.unesco.org/new/en/communication-and-information/resources/publications-and-communication-materials/publications/full-list/measuring-linguistic-diversity-on-the-internet/), accessed 18 June 2018.
Unger, J.M. (1987) The Fifth Generation Fallacy: Why Japan Is Betting its Future on Artificial Intelligence. New York: Oxford University Press.
Vincent, G. (2001) ‘Learning from I-Mode [Packet-Based Mobile Network]’, IEE Review, 47(6): 13–18.
Wang, G. (1984) ‘Information Revolution in Taiwan: Economic Concerns and Beyond’, in AMIC-ISEAS-EWC Workshop on Information Revolution in Asia-Pacific, Singapore, Dec 10–12, 1984, Singapore: Asian Mass Communication Research & Information Centre (https://dr.ntu.edu.sg/bitstream/handle/10220/311/AMIC_DEC10-12_1984_10.pdf?sequence=1), accessed 18 June 2018.
Weber, G. (1997) ‘Top Languages: The World’s Top 10 Languages’, Language Today, 2: 22–8.
Yang, L. (2017) ‘Platforms, Practices, and Politics: A Snapshot of Networked Fan Communities in China’, in Gerard Goggin and Mark McLelland (eds), The Routledge Companion to Global Internet Histories, New York: Routledge, pp. 370–84.
37
Online Pornography
Susanna Paasonen

Pornography has played a crucial, albeit often neglected role in the development of Web
solutions and Web economy since their very earliest days. The enterprises of online gaming and
shopping began to pick up towards the end of the 1990s, whereas pornography remained,
virtually from the launch of the first graphic Web browsers, one of the few forms of content
that users were consistently willing to pay for (Lane, 2001: xiii; Perdue, 2002). Safe credit
card processing systems, streaming video technologies, hosting services, and promotional
design practices such as banner advertisements, mouse-trapping, and pop-ups were first
developed for and applied on porn sites (Bennett, 2001; Lane, 2001: 70; Johnson, 2010; McNair,
2013: 27–9). Pornography has often been heralded as a ‘killer app’, a form of content that
quickly migrates to new media platforms with commercial success: this was certainly the case
with the Web in the 1990s.

Despite both online pornography’s remarkable perennial popularity among consumers and the
vocal public concerns that it tends to evoke, academic studies concerning it – and
particularly ones focusing on commercial platforms – remained few up until the 2010s. The rare
in-depth studies that did exist focused almost exclusively on US contexts (Lane, 2001), on
alternative and independent pornographies (Jacobs, 2007), or both (Magnet, 2007). It is fair
to state that Web porn long remained one of the most understudied areas in Internet research.
Significant knowledge gaps continue to exist when it comes to the production, distribution,
and consumption of Web pornography in a historical perspective.

More porn is available on the Web than ever before, and massively popular video aggregator
sites modelled after YouTube (est. 2005) sport billions of annual visits. This development
seems to resonate with the broad diagnoses on the pornification of media culture, according to
which pornographic aesthetics have grown ubiquitous enough to infiltrate diverse visual media
practices from
advertising to the circulation of nude selfies. At the same time, pornography’s role and
position are currently crucially different from those in the Web cultures of the 1990s.
Despite its perennial popularity, the role of porn as a driving force of dot.com enterprise
and technical innovation has clearly passed, as has the period when one could create quick
profits by simply setting up an adult site. Pornographic content is actively weeded out from
the targeted advertising and linked content on social media sites such as Facebook, Instagram,
and Pinterest. As tech journalist Cade Metz notes,

with the rising power of companies like Apple and Google and Facebook, the adult industry
doesn’t drive new technology. In many respects, it doesn’t even have access to new technology.
The big tech companies behind the big platforms control not only the gateway services (the
iPhone app store, Google Search, the Facebook social network) but the gateway devices (the
iPhone, Android phones, Google Chromecast, the Amazon Fire TV, the Oculus Rift virtual reality
headset). And for the most part, they’ve shut porn out. (Metz, 2015)

All this results in a complex nexus where the abundant accessibility and diversity of Web porn
meets its limited visibility on social media and app markets, insufficient knowledge of the
working practices and economies of pornography, and public discourses of concern on the
pornification of culture. Exploring these connections and disconnections, this chapter first
maps out the development of Web pornography from the home-grown enterprises of the early 1990s
to the increasing visibility of sexual subcultures and the presence of established companies
on online platforms. It then examines the shifts that have occurred in porn production with
the ubiquity of proam (a.k.a. professional amateur), reality, and amateur productions, as well
as the centralization of porn distribution on video aggregator sites. The chapter explains how
Web technologies and the centrality of search functions and metadata in particular have
affected the development and uses of pornographic content, what kinds of sexual taste cultures
have emerged, how the public visibility of pornography has been altered in the course of its
online distribution, and how this connects to the policing of online content in national-level
media regulation and policy, as well as in terms of the moderation carried out by online
platforms. All this necessitates understanding the production of Web pornography, as well as
the notion of the porn industry that it connects to, as characterized by inner distinctions
and constant fragmentation on the one hand, and by the increasing centralization of ownership
and distribution, on the other.

PORN AND THE WEB: A PERFECT MATCH?

Graphic Web browsers such as Mosaic (1993), Netscape Navigator (1994), and Microsoft Internet
Explorer (1995) made it possible to embed image files and, gradually, animated GIFs and video
clips into the interface. Multimodal interface design possibilities were understandably
lucrative for the distribution of pornography, which relies heavily on visual and audiovisual
material, despite the continuing popularity of literary porn, especially in its user-generated
forms (Lane, 2001: 69–70; Paasonen, 2010a). At the time, the market of porn was dominated by
VHS, DVD, and magazine releases sold in special shops and newsagents, as well as through mail
order.

Pornography was readily and plentifully available on Usenet and bulletin board systems (BBSs)
alike as binary files, and it soon migrated to Web platforms. Initially, the most visible and
successful Web porn enterprises were small-scale ventures that occasionally had the same
people performing in front of the camera as writing the HTML. Some – such as the amateur star
‘Wifey’ of Wifey’s World – had established their reputation on Usenet, while others – such as
Danni Ashe, the owner and star of Danni.com – also had modelling and stripping experience
before
starting to run their own sites (Perdue, 2002: 27, 156–8; Mash, 2004; Paasonen, 2011: 35, 93–5). Large, established video production studios and print publishers, from Playboy to Penthouse, Hustler, Vivid, and Private, did not similarly pioneer in establishing their Web presence, coin innovative online services, or by any means invest extensively in them (Perdue, 2002: 63). These companies advertised their productions online but were relatively slow in extending their business models to Web distribution. The dwindling markets of DVD and magazine retail later considerably hurt such established brands. Playboy temporarily dropped sexually explicit images in 2016 in response to feedback from its younger readers, while both Hustler and Penthouse have struggled to keep up their print production.

Code remained simple and easily manageable during the first years of the Web. Before the development of Web professions and design solutions such as cascading style sheets (CSS) and JavaScript, it took relatively little skill to set up and run a site. Since the professional standards of Web design were yet to be established (Kotamraju, 1999), the playground was notably level for both aspiring independent pornographers and multi-million-dollar companies trading in porn superstars. Consequently, 1990s Web porn was heterogeneous in terms of its agents, economies, aesthetics, and agendas. Porn was made available in limited ‘free tour’ sections of pay sites, as pirated content, as thumbnail teasers on link sites, as free content, and as materials protected by pay walls. US products dominated the markets of Web porn, yet low distribution costs, coupled with the increasing affordability of digital cameras and the gradual demographic diversification of Internet users in terms of nationality, age, and ethnicity, broadened the base of porn production. Combined with the increased circulation and accumulation of pre-digital pornographies on online platforms and the rise in the distribution of amateur content, this resulted in the clear expansion of available pornographies, from sites specializing in erotica writing to queer experimental porn and glossy commercial pornography targeted at straight male audiences.

While Web hosting was never exactly free, especially when involving larger user bases and audiovisual content necessitating more bandwidth, online distribution was particularly tempting for independent entrepreneurs who had little or no resources to engage in DVD or print production. Online distribution was an attractive option for porn producers on a number of levels. First of all, it afforded a potentially global audience unlimited by restrictions in local regulation, store hours, and access to retail opportunities. Content not allowed for distribution in one country could easily be placed on a server outside its borders, which eroded the viability of established systems of media regulation based on pre-screening and classification developed for the distribution of film, video, and print materials. Furthermore, by doing away with theatres, mail-order companies, and stores that have traditionally gained considerably from selling porn, Web platforms gradually redefined the ways in which porn producers reach consumers. Meanwhile, other middlemen profiting from the traffic, such as credit card companies, continued to thrive.

For consumers, online platforms drastically changed the accessibility and uses of pornography. Since porn use no longer necessitated the acquisition of material commodities, it was unnecessary to visit a specialty shop or indeed to move anywhere from one’s computer. Despite the factual tracking capacities inbuilt in IP addresses and cookies, online porn consumption allowed for an unprecedented impression of anonymity. The range of available Web pornographies very soon grew, allowing for a degree of choice unavailable even in the largest of porn retail stores. Given the volume of freely accessible teasers and pirated files, it was not even necessary to pay for the content downloaded beyond the connectivity charges themselves. The costs of dial-up connections were
generally calculated either by the second or by the bytes downloaded, and access to free pornography was therefore not entirely free.

AMATEURS, PROAMS, AND INDIE PRODUCERS

The spread of digital cameras, camcorders, and smart phones provided both amateurs and semi-professional producers with inexpensive tools for making their own pornography. While amateurs have produced their own explicit content in virtually all media known to man, it was not easy to circulate it on VHS tapes or as printed matter beyond one’s immediate social circles (Esch and Mayer, 2007: 101). People shared their own digital content in newsgroups and through IRC (Internet Relay Chat), which allowed for exchanges between people with similar sexual palates (Slater, 1998; Dery, 2007). Amateur Web porn was successful throughout the 1990s (e.g. Lane, 2001: 209–12), yet its visibility and popularity peaked in the mid 2000s as porn distribution began to shift to platforms emulating the operating principles of social media sites. The rise in amateur porn therefore runs parallel to the overall rise in user-generated content crucial to the operating principles and business models of Web 2.0 and social media (Dijck, 2013; Marwick, 2013: 21–66; Jarrett, 2015: 7–10).

The central promise and attraction of amateur porn involves its unpolished look of realness and authenticity, which is seen to contrast with the stylized scenarios and trimmed bodies of commercial productions (e.g. Hardy, 2009; Hilderbrand, 2009: 66–7; Paasonen, 2010b). Uploaded on both general-interest video aggregator sites and platforms specialized in amateur content, user-generated clips have been shared for free as well as in return for gift vouchers and fixed fees. The performers of the most popular – and hence also the most visible – amateur videos have generally conformed to the body norms of (female) youth, whiteness, and thinness. The acts, gestures, and positions performed have similarly followed the choreographies of commercial porn, which are used as templates in making amateur clips look like pornography (Doorn, 2010). At the same time, videos need to have a domestic feel in order to come across as amateur productions. Amateur porn production therefore involves particular forms of gendered domestic labour even as it is presented as no work at all (Hofer, 2014: 335, 343–4). Their aesthetic of homey intimacy is connected to the assumed ethics of production, according to which amateur content is voluntarily produced for the sake of pleasure rather than for profit, and it therefore remains detached from the oppressive work conditions of the porn industry (Paasonen, 2010b: 1302–4). Images and videos leaked, stolen, and otherwise distributed without permission on revenge porn sites have a different sort of appeal, one geared towards slut shaming (Stroud, 2014). For their part, celebrity sex tapes have been both leaked and knowingly produced as the means of building one’s star image (e.g. Esch and Mayer, 2007; Hilderbrand, 2009: 68–71; Cruz, 2011).

The rise in amateur pornography has been a key trend of Web porn for over a decade. The appeal of immediacy and realness is crucial to the genre of pornography, which has promised to convey sexual acts through the lens of the camera as they unfold ever since the advent of film (Williams, 1989). Gonzo porn rose to popularity in the 1990s with its seemingly improvised scenes, non-professional performers, and subjective camera shots (Maina and Zecca, 2016; Stella, 2016). On Web platforms, the concept of gonzo gave way to reality porn, such as the sites run by the Miami-based Reality Kings. Shot on streets and in cars, motels, and private residences, reality porn makes use of proam performers and the constant, renewable stream of new young talent. Inexpensive to produce, reality porn balances amateur codes of authenticity with repetitive thematic and narrative patterns: the standard
plot of Backroom Casting Couch, for example, involves ingénue actresses performing sexual favours in the hope of landing a part, whereas, in Bait Bus, young women are assumedly lured to have sex with the promise of money they are never to receive.

Parallel to the staged and rehearsed authenticity of reality sites and the avalanche of amateur porn, Web porn has afforded a broader public visibility to a range of sexual niches and subcultures while also commodifying them in different degrees. Be it a question of preferences concerning body size, age, ethnicity, hirsuteness or the lack thereof; tastes in role-play, bondage or discipline; fetishes involving uniforms, hiking boots, balloons, or fake fur; interest in the sexual frolicking of cartoon characters or Hollywood stars, Web platforms cater to virtually any fantasy scenarios and participatory possibilities. This has contributed to the ever clearer articulation of sexual taste cultures in the realm of pornography.

Established in 2001, Suicide Girls became the best-known soft-core alt porn site, with its emphasis on female sexual agency and subcultural body styles. The models featured tattoos, piercings, and punk and Goth coiffures, and users were invited as members to read their blogs and engage with them: in this sense, Suicide Girls framed membership as a lifestyle choice (Attwood, 2007; Magnet, 2007). The adult performer and producer Joanna Angel established Burning Angel in 2002 as the more sexually explicit alt porn site. In their applications of subcultural capital and sexual titillation, alt porn became, in Feona Attwood’s (2007: 449–50) phrasing, representative of ‘new sex taste cultures’, which defined ‘themselves through a variety of oppositions to mainstream culture – and especially mainstream porn – as creative, vibrant, classy, intelligent, glamorous, erotic, radical, varied, original, unique, exceptional and sincere’.

While alt and indie porn have been argued to challenge the porn industry in terms of their ethics of production and their community feel, they have, seemingly paradoxically, equally been identified as ‘the research and development arm of the porn industry’ (Cramer and Home, 2007: 165). In order to evoke the interest of users with novelties and specialities, so-called mainstream porn companies have long drawn on sexual subcultures and niche pornographies for inspiration, hence also familiarizing them in the process (McNair, 2002: 206; Dery, 2007). All kinds of porn sites have turned towards alt porn when seeking out new audiences, diverse content, and novel principles of operation (Attwood, 2007: 452). Tattooed and pierced female models have long ceased to be exceptional in so-called mainstream productions – quite the contrary – while community features have grown increasingly central to how users are invited to engage with sites, comment on content, grade it, and upload material themselves.

Some scholars have identified the transformations fuelled by digital production and online distribution as entailing a rupture in the history of pornography. In the mid 2000s, these perspectives were united under the rubric of netporn, defined as pornographies particular to online platforms and networks (Jacobs, 2007; Jacobs et al., 2007). Netporn referred to the blurred boundaries of porn producers and consumers, and the rise in alternative body aesthetics and amateur, sex activist, and art projects, as well as the slipperiness of the very notion of porn caused by the proliferation of queer, independent, and alternative content. It was contrasted with ‘porn on the Net’, namely the recycling of the same old pornographic images and videos online (Shah, 2007). The notion of netporn, as outlined in the two Netporn conferences held in Amsterdam in 2005 and 2007, a listserv, and an edited reader (2007), was premised on technological, aesthetic, ethical, political, and economic particularity. It helped to mark out the increasing visibility of non-normative sexual palates and minoritarian sexual cultures, yet its binary premises and dynamics were of less assistance in
mapping out the transformations that the production, distribution, consumption, and range of pornography were undergoing (Paasonen, 2010b). Furthermore, it was the Web, and not the Net more generally, that had become the norm in porn distribution on a global scale. While scholarly interest towards alternative and independent pornographies remains, the vocabulary of netporn is currently in scarce use.

METADATA, SEARCHABILITY, AND PORN TAXONOMIES

Pornographic images distributed in newsgroups, BBSs, IRC, or Gopher were indexed through file names and categories. In contrast, the searchability of visual and audiovisual Web content necessitated a much more clear and complex textual marking out of the subcategories, terms, names, titles, acts, and preferences (Chun, 2006: 106). Such contextual metadata was necessary for the functionality of search engines but equally for the search functions of porn sites themselves, as they grew in size to host hundreds of thousands of files. Porn images and videos are indexed on the basis of factors such as date, file size and length, subgenre, popularity of content, number of views, body types, national origins, performers, production studios, and the acts performed. Metadata remains crucial to the diverse categorizations and tagging functions of aggregator sites appropriating the participatory models of social media. Dictated by the necessities of information architecture, metadata has helped to render the inner distinctions within the genre of pornography more manifest than ever.

Print, VHS, and DVD porn were all broadly categorized through parameters such as straight or gay, as being focused on specific acts or scenarios (e.g. anal sex, BDSM), through their stars, and production styles such as amateur and gonzo pornography. Such distinctions provided users with broad options to navigate in between. The range and plethora of tagging practices marks the latest stage in the development that has, throughout the history of Web porn, rendered the variety of body styles, roles and positions, niches, and fetishes increasingly articulate and therefore also more recognizable. This has also involved the familiarization – if not precisely the domestication – of fringe pornographies that were previously deemed too marginal for mainstream consumption.

This has perhaps most obviously been the case with Japanese pornographic anime, hentai, which routinely depicts penetrative sex between humans, demons, and monsters, notably often in scenarios of non-consensual domination and submission. Hentai remained a specialty niche too bizarre for DVD appeal distribution in the 1990s, only to enter the menus of all kinds of porn Web sites during the following decade (Dahlqvist and Vigilant, 2004). The popularity of hentai has since only increased in connection with both the rise of cartoon porn and Japanese game porn fandom. Another equally visible example involves transgender porn targeted at heterosexual male consumers. While so-called ‘shemale’ and ‘chicks with dicks’ pornography – a field distinct from transgender porn produced and consumed within queer communities – had long been produced in Brazil in particular, the subgenre was more thoroughly commodified and modified on Web platforms as it grew into a staple element of the palette of online porn in the early 2000s (Paasonen, 2011: 147–50). The relatively early entry of both transgender porn and hentai into the interfaces of so-called mainstream porn speaks of the fragmentation and diversification of online pornography in ways that call into question the veracity of the very notion of the mainstream.

The promise of the Web was, since the mid 1990s, one of abundant and diverse pornography that only waited to be found. In practice, finding it was not, however, always easy. Content remained fragmented across the
Web, occasionally brought together as Web rings pointing users from one affiliated site to another, and as listed on directory sites that were the default means of finding content before search engines were in common use. Persian Kitty’s Adult Links was one of the best-known link sites facilitating access to free porn images. Established by Beth Mansfield, ‘a Tacoma, Washington homemaker and mother of two’ (Lane, 2001: 89) in 1995, Persian Kitty was a simple directory page with links to featured sites that paid for advertising space and an alphabetical listing of sites with information on the number of images that each of them contained. By 1998, Persian Kitty attracted over half a million visitors per day and made tens of thousands in monthly profit. During a period when finding pornography, and free porn in particular, was cumbersome, Persian Kitty grew into one of the prime portals for accessing it (Lane, 2001: 190–2).

The earliest version of Persian Kitty available through the Internet Archive (archive.org) is from November 26, 1996. Its featured content includes the site’s sponsor, Danni’s Hard Drive, as well as Amateur Hardcore with ‘A quarter-million fast downloading hardcore amateur pix and totally raw movie clips, in AVI, Quicktime & Mpeg. Unlimited downloads! Instant Access! And it’s all Keyword-searchable! Thumbnails and descriptions, too! Raunchy! Nasty! Extreme Hardcore!’ The description of the alphabetically listed link sites remained much more straightforward, as this excerpt illustrates:

HENTAI MANGA ANIME PAGE – 50 hentai pix
HERMITAGE – 20 Asian pix
HEUY’S PAGE – 12 pix
HOLLYWOOD HILLS HOOCHIES – 25 babe pix, no preview
HONNEAMISE ASIANS – 20 pix, no preview
HORNY TOADS – 9 babe pix, 1 avi
HOT CHICKS – 30 babe pix, no preview
HOT PEPPERS – 15 pix, no preview
HOTTEST BABE ON THE WEB – 20 pix a week, vote for your favorite
THE HOTZONE – 18 pix, no preview

Some of the sites listed offered as few as five or six images, while only two promised more than 1,000 images. The contextual metadata describing the style, genre, and content of the images remained thin or even absent. Since thumbnail preview was by no means always in use, it was necessary to first download the images in order to see what they were. Given the speed of dial-up connections, this would have been a slow enterprise. Dial-up modems had the maximum download speed of 56 kilobits per second, which severely constrained the use of images and video clips. Larger files easily took minutes to download and small thumbnails were in broad use for giving the user some idea of the awaiting visual content. Rough black-and-white bitmap (BMP) raster images were also used for similar purposes, gradually fading away from the background as the desired JPG or GIF file of the same width and height grew visible. Indeed, according to an early Web design rule of thumb, individual pages should not exceed 100K in size, were they to download smoothly.

Web directories were not necessarily maintained frequently enough to keep their links fully up to date. Meanwhile, porn sites tried to knowingly derail users to click on lucrative links, independent of what the users may have been specifically looking for. The optimization of clicks emerged early as a strategy of profit generation, while other solutions, including mouse-trapping and pop-ups, were designed to force users to stay on the site, or to become acquainted with content against their will (Perdue, 2002). For porn users, the landscape of Web porn was therefore one of endless, optimistic yet frustrating waiting, searching and clicking, getting stuck, and regularly ending up with different content than that advertised (Patterson, 2004).

AltaVista, the leading search engine preceding the reign of Google, was launched in 1995, the year that Yahoo! started out as a Web directory and a year after the equally popular search tools Lycos, Infoseek, and WebCrawler were introduced. The increasing use of search
engines marked a gradual shift from link sites to keyword searches in porn use. Pornography was remarkably popular with early searches: in 1997, almost 17 per cent of search queries were connected to pornography and sexuality. In contrast, in 2005, porn comprised less than 4 per cent of all searches (Spink et al., 2006). This proportional drop does not speak of a decrease in the popularity of pornography – quite the contrary, both the volume and use of Web porn continued to grow, and they continue to do so to this date. It rather speaks of an expansive increase in all Web content and in the diversification of its user base: to mention only a few examples, during this period, public broadcasting companies started making their content available through online archives and streaming video services; banks shifted their services online; online learning platforms emerged; online shopping had grown steeply with giants such as Amazon and eBay leading the way; and social media platforms were about to make a breakthrough. In 1995, there were an estimated 23,500 websites (that is, unique hostnames): the number grew to over three million by 1999. Out of these, an estimated 30,000 were focused on pornography (Lane, 2001: 135). By 2005, the total number of sites was close to 65 million and by 2015, over 860 million (Internet Live Stats, 2017). As the volume of Web content rapidly grew in number and variety, the proportional volume of porn sites decreased. The same applied to search queries.

Directories, link and click sites continue to live on in a range of forms, from personally curated fan selections to metasites that routinely guide users in different directions than those of their own choosing – and even Persian Kitty is still up and running. The rhythms and temporalities of porn browsing have nevertheless undergone clear transformations since the 1990s when one needed to constantly wait for servers to respond and for materials to download. The gradual spread of broadband connections around the new millennium allowed for image resolutions to increase to the point that high-definition became a profitable novelty while streaming video, and the format of the video clip in particular, began to dominate as the default format of porn consumption.

VIDEO AGGREGATOR SITES AND THE IMPLICATIONS OF CENTRALIZATION

Porn sites adopted streaming video technologies early on since many users found applications requiring plug-ins, such as Real Player and Windows Media Player, too cumbersome (Perdue, 2002: 140–3). In 2002, pornographic content was estimated to take 30 to 70 per cent of all bandwidth, largely for the reason that other forms of video content remained much less popular (Perdue, 2002: 179–84). The success of porn video aggregator sites such as YouPorn (est. 2006), RedTube (2006), Xtube (2006), XVideos (2007), xHamster (2007), and Pornhub (2007) means that pornography continues to occupy server space and user attention, yet with platforms such as YouTube, Netflix, and Hulu in the mix, the landscape of streaming video looks clearly different than it did in the early 2000s. In 2016, an impressive 92 billion videos were watched on Pornhub, the world’s second most popular porn site, yet YouTube’s number was 20 times higher, with almost five billion daily views.

Aggregator sites marked a departure from the free and centralized accessibility of video clips. As more and more video became available online as torrents shared in P2P networks and as tube content shared by studios for marketing purposes, uploaded by users with little regard for copyright, and produced by the users themselves, DVD sales began to steeply and permanently drop (Moye, 2013; Rosen, 2013). This drop equalled the transformations that had occurred in the music industry since the launch of peer-to-peer sharing app Napster in 1999 (Leyshon et al., 2005), yet it took place significantly later, around the financial crisis of 2008.
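As a rough cross-check of the proportions quoted in this section, the short Python sketch below works through two of them: the share of websites focused on pornography in 1999, and the ratio between YouTube's and Pornhub's viewing figures for 2016. The constant and function names are illustrative and are not taken from any of the cited sources. The inputs are the rounded estimates given in the text, with 'over three million' treated as 3,000,000 and 'almost five billion' as 5,000,000,000, and with the daily figure annualized over the 366 days of 2016, so the results are approximations only.

```python
# Rounded estimates as quoted in the text; all names are illustrative assumptions.
WEBSITES_1999 = 3_000_000      # "over three million" unique hostnames
PORN_SITES_1999 = 30_000       # estimated sites focused on pornography
PORNHUB_VIDEOS_2016 = 92e9     # videos watched on Pornhub during 2016
YOUTUBE_DAILY_VIEWS = 5e9      # "almost five billion" daily views
DAYS_IN_2016 = 366             # 2016 was a leap year


def share(part: float, whole: float) -> float:
    """Return part as a fraction of whole."""
    return part / whole


def annualized_ratio(daily: float, annual: float, days: int) -> float:
    """Scale a daily figure up to a year and compare it with an annual figure."""
    return daily * days / annual


porn_share = share(PORN_SITES_1999, WEBSITES_1999)
view_ratio = annualized_ratio(YOUTUBE_DAILY_VIEWS, PORNHUB_VIDEOS_2016, DAYS_IN_2016)

print(f"Porn sites as a share of all sites, 1999: {porn_share:.1%}")
print(f"YouTube views relative to Pornhub, 2016: about {view_ratio:.1f}x")
```

On these inputs, porn sites come out at roughly one per cent of all sites in 1999, and YouTube's annualized views at just under 20 times Pornhub's total. Because both inputs for the second figure are rounded upper bounds ('almost', 'impressive'), the ratio should be read as an order-of-magnitude comparison rather than a precise measurement.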
Pornography was long considered ‘recession-proof’ in the sense that its profits tended to steadily increase despite any oscillations in the national or international economy. More pornography is being produced than ever to date, there is factually more pornography available, and the number of visits to porn sites continues to increase, yet the profits of the porn industry have, as a whole, decreased. Users have grown unwilling to pay subscription fees even for specialized and premium content. Kink.com (est. 1997), one of the success stories of Web porn, specializing in rough scenarios of domination and control, announced in 2016 that its revenues had recently been cut in half due to a drastic drop in membership subscriptions. Webcams, with their live nature and interaction possibilities with porn stars, amateurs, and proams alike, remain virtually the only form of adult content that users are willing to broadly pay for. Pornographic webcams have been popular for as long as the technology has been available (Lane, 2001: 249–58; Senft, 2008; Hillis, 2010), yet webcamming has also increasingly centralized around hubs. At the time of writing, the largest of these were the Hungarian-born site LiveJasmin (est. 2001), the California-based Chaturbate (est. 2011), and the Cyprus-based BongaCams (est. 2012).

The money flows of Web porn have not suddenly dried up, but they are following different routes than during the previous decades. As studios focused on DVD production and pay content have suffered, many of them have been bought up. The originally Montreal-based MindGeek (formerly known as Manwin) alone has bought major brand names such as Men.com, Brazzers, Reality Kings, and Digital Playground. This has resulted in an unprecedented centralization of ownership in both porn production and distribution, given that MindGeek also runs the most popular video aggregator sites (with the exception of XVideos and xHamster). Analysing MindGeek’s dominant role in porn, technology writer David Auerbach (2014) notes that its ‘producers make porn films mostly for the sake of being uploaded on to MindGeek’s free tube sites, with lower returns for the producers but higher returns for MindGeek, which makes money off of the tube ads that does not go to anyone involved in the production side’. In other words, the profits of online pornography revolve around distribution, while the income streams have grown considerably thinner in its production.

Production studios have moved from establishing long exclusive contracts with models to a system where producers hire the necessary crew and cast for each title. As Heather Berg (2016: 161) points out, this ‘gig economy’ builds on a reserve army of labour that is ‘willing to perform in porn even when pay and conditions are poor’ and the workers of which are placed ‘in shifting positions as entrepreneurs, independent contractors, employees, contracted and freelance managers, and producers’. On the one hand, the economy of Web porn has grown increasingly corporate and centralized, especially in North America. On the other hand, the labour of porn has grown increasingly precarious as a source of income for its makers.

Just as the traffic of views, links, and clicks connected to news items, memes, and video clips is driven through globally leading social media hubs, the traffic of online porn is increasingly centralized and organized through aggregator sites. If, throughout the 1990s and beyond, porn consumption was characterized by endless searching, in the 2010s, tube sites promise to host all possible content within one interface: users need merely to browse through the available categories and conduct key term searches within the site. This centralization follows similar patterns to those followed by developments connected to corporations like Google, Microsoft, and Facebook, which buy up smaller enterprises, aggressively expand their operations within the online economy, and collect massive volumes of data on user activities, preferences, and trends while doing so. This means that select companies have
considerable power to modulate the accessibility of content and forms of usage while not needing to be transparent as to the parameters of operation. Most Web porn companies are not listed, and the financial specificities are therefore not subject to public knowledge. It then follows that there is much opaqueness to their flows of money and labour.

The work of video aggregator sites largely involves tasks such as running servers, the management of data, and the tweaking of algorithms. As is the case with any social media platform or online service, the successful operation of porn tubes necessitates large-scale investments into software engineering and programming. The careers available at MindGeek – including PHP and Web development, project management, customer service, sales, support, video editing, website optimization, mobile design, legal counsel, financial, data, security, Web analysis, marketing, sales, and PR – differ notably little from those in other tech companies. Alongside this partial redefinition of porn labour as tech work runs the mainstreaming of companies such as MindGeek as brands with more general cultural visibility. Adult entertainment companies have a broad social media presence, yet, with the exceptions of Twitter and Tumblr, social media terms of service generally ban gratuitous displays of nudity and sex with the explicit aim of blocking pornographic content from platforms such as Facebook, Pinterest, and Instagram. Since this sets automated limits to the circulation of sexually explicit content, porn companies have resorted to a range of publicity stunts.

In the case of Pornhub, such efforts have ranged from their globally circulated annual infographics detailing search trends, lengths, and the volume of visits within the calendar year (since 2013) to campaigns against breast cancer (since 2012) and for the protection of sperm whales (in 2016) and media stunts such as the Pornhub theme song (2014) and advert contests (2014). These campaigns afford broad social media publicity for virtually no expense (see Paasonen et al., forthcoming). In an Adweek interview, Pornhub’s vice president, Corey Price, explained how the company wants to let ‘people know that watching porn shouldn’t be an underground activity that’s to be seen as shameful. Everyone does it, why not just bring that out in the open?’ (Monnlos, 2014). The overall aim of the campaigns is to build an entertainment brand with mainstream recognizability of the kind previously gained by Playboy or Hustler.

PORNIFICATION AND FILTERING

The increased public visibility of Web pornography and its key brands, together with the more general flirtation with sexual representation across the fields of media, have given rise to a myriad of diagnoses on the pornification of culture (McNair, 2002; Paasonen et al., 2007; Attwood, 2009; Smith, 2010; Mulholland, 2013; Paasonen, 2016). While these diagnoses remain notably diverse in their examples and premises, they share a focus on how pornography has grown mundane in its abundant availability and how its codes and conventions travel across media platforms. The mainstreaming of pornography is mapped in terms of its sheer popularity, as well as the cultural presence of pornography in the guise of porn stars turned mainstream media celebrities, and in its aesthetics circulated in films, television shows, journalistic overviews, and art projects (Attwood, 2009: xiv). It is nevertheless online pornography that has become the key symptom of and symbol for such developments.

In public discourse, online porn is frequently posed as a problem in its exaggerated displays of gendered and racialized relations of control and in its arguably addictive qualities. The access of minors to online porn has similarly been a key concern since the mid 1990s. A range of filtering software has been developed for blocking adolescents’ access
to porn and Google introduced SafeSearch, namely the automatic filtering of pornographic content, in 2009 (Paasonen, 2011: 32, 43–5). Concerns over the exposure of minors to sexually explicit content remain vocal, especially given the increasing ubiquity of smart phones, which also allow minors to generate and share their own content. Incidents concerning sexting have given rise to bullying as well as legal action: in the United States, adolescents taking naked selfies have been identified as sex offenders for producing and distributing child pornography (Zhang, 2010). Child pornography remains a topic of great public concern, especially in connection with the distribution facilities allowed by networked communications. Since child pornography is illegal and heavily policed internationally, it is primarily distributed in the so-called deep Web rather than on openly accessible platforms. It is also a form of content actively filtered out from virtually any website.

Parallel to the concern about the public accessibility of all kinds of online smut runs a range of filtering and censorship practices ranging from governmental firewalls in countries such as China and Saudi Arabia to the filtering of hentai in Australia and the application of obscenity laws to niche pornographies in the UK. Few politicians or tech-sector professionals are willing to defend pornography with other arguments than possibly those concerning the freedom of speech: the work or content of porn would not be included in most considerations of creative labour or innovation. On the contrary, the well-acknowledged presence and ubiquity of smut online and the harmful resonances it is seen to involve have been rationale for a broad range of content policing practices from traffic filtering to nation-specific acts of blocking, tracking, and banning. Such governance practices are by no means limited to porn, yet they gain political and public support through attempts to fight it. Pornography is, in sum, despite its public visibility and perennial popularity among consumers, considered to be the quintessentially ‘bad’ form of online content that would be best weeded out.

The history of Web pornography is one of simultaneous and possibly paradoxical fragmentation, diversification, and centralization. Despite the recent increase in academic interest and in examinations of the webcam sector in particular, many white areas remain in studies of Web porn, its economies, and labour practices in perspectives both historical and contemporary. Porn may sit awkwardly in the overall palette of Web content, not least in terms of the content allowed on most social media platforms, yet there is no doubt as to the central role that it has played in technological and economic developments throughout the history of the Web. Rather than continuing to routinely fence off pornography as a special concern, a social problem, or a marginal field of activity, it is crucial to acknowledge the role it has played in online content production and distribution, and to account for it in analyses thereof. The tenacious exclusion of pornography from studies of e-commerce, site design, Web technology, user cultures, and their affective entanglements can ultimately only obscure the overall understanding of how the Web has developed, and how it continues to do so. This would mean both reproducing existing knowledge bias and ignoring the insights brought forth in scholarship on online pornography.

REFERENCES

Attwood, F. (2007) ‘No money shot? Commerce, pornography and new sex taste cultures’, Sexualities, 10(4): 441–56.
Attwood, F. (2009) ‘Introduction: The sexualization of culture’, in F. Attwood (ed.), Mainstreaming Sex: The Sexualization of Western Culture. London: I.B.Tauris. pp. xiii–xxiv.
Auerbach, D. (2014) ‘Vampire porn’, Slate, 23 October. (http://www.slate.com/articles/technology/technology/2014/10/mindgeek_porn_monopoly_its_dominance_is_a_cautionary_tale_for_other_industries.html)

Bennett, D. (2001) ‘Pornography-dot-com: Eroticising privacy on the Internet’, The Review of Education/Pedagogy/Cultural Studies, 23(4): 381–91.
Berg, H. (2016) ‘“A scene is just a marketing tool”: Alternative income streams in porn’s gig economy’, Porn Studies, 3(2): 160–74.
Chun, W.H.K. (2006) Control and Freedom: Power and Paranoia in the Age of Fiber Optics. Cambridge: MIT Press.
Cramer, F. and Home, S. (2007) ‘Pornographic coding’, in K. Jacobs, M. Janssen and M. Pasquinelli (eds), C’Lick Me: A Netporn Studies Reader. Amsterdam: Institute of Network Cultures. pp. 159–71.
Cruz, A. (2011) ‘The black visual experience: Hendrix, porn, and authenticity’, Camera Obscura, 26(1): 65–93.
Dahlqvist, J.P. and Vigilant, L.G. (2004) ‘Way better than real: Manga sex to tentacle hentai’, in D.D. Waskul (ed.), Net.seXXX: Readings on Sex, Pornography, and the Internet. New York: Peter Lang. pp. 91–103.
Dery, M. (2007) ‘Paradise lust: Pornotopia meets the culture wars’, in K. Jacobs, M. Janssen and M. Pasquinelli (eds), C’Lick Me: A Netporn Studies Reader. Amsterdam: Institute of Network Cultures. pp. 125–48.
Dijck, J. van (2013) The Culture of Connectivity: A Critical History of Social Media. Oxford: Oxford University Press.
Doorn, N. van (2010) ‘Keeping it real: User-generated pornography, gender reification, and visual pleasure’, Convergence, 16(4): 411–30.
Esch, K. and Mayer, V. (2007) ‘How unprofessional: The profitable partnership of amateur porn and celebrity culture’, in S. Paasonen, K. Nikunen and L. Saarenmaa (eds), Pornification: Sex and Sexuality in Media Culture. Oxford: Berg. pp. 99–111.
Hardy, S. (2009) ‘The new pornographies: Representation or reality?’, in F. Attwood (ed.), Mainstreaming Sex: The Sexualization of Western Culture. London: I.B.Tauris. pp. 3–18.
Hilderbrand, L. (2009) Inherent Vice: Bootleg Histories of Videotape and Copyright. Durham: Duke University Press.
Hillis, K. (2010) ‘Historicizing webcam culture: The telefetish as virtual object’, in N. Brügger (ed.), Web History. New York: Peter Lang. pp. 137–54.
Hofer, K.P. (2014) ‘Pornographic domesticity: Amateur couple porn, straight subjectivities, and sexual labour’, Porn Studies, 1(4): 334–45.
Internet Live Stats (2017) ‘Total number of websites’. (http://www.internetlivestats.com/total-number-of-websites)
Jacobs, K. (2007) Netporn: DIY Web Culture and Sexual Politics. Lanham: Rowman & Littlefield.
Jacobs, K., Janssen, M. and Pasquinelli, M. (eds) (2007) C’Lick Me: A Netporn Studies Reader. Amsterdam: Institute of Network Cultures.
Jarrett, K. (2015) Feminism, Labour and Digital Media: The Digital Housewife. London: Routledge.
Johnson, J.A. (2010) ‘To catch a curious clicker: A social network analysis of the online pornography industry’, in K. Boyle (ed.), Everyday Pornography. London: Routledge. pp. 147–63.
Kotamraju, N.P. (1999) ‘The birth of Web site design skills: Making the present history’, American Behavioral Scientist, 43(3): 464–74.
Lane, F.S. III (2001) Obscene Profits: The Entrepreneurs of Pornography in the Cyber Age. New York: Routledge.
Leyshon, A., Webb, P., French, S., Thrift, N. and Crewe, L. (2005) ‘On the reproduction of the musical economy after the Internet’, Media, Culture & Society, 27(2): 177–209.
Magnet, S. (2007) ‘Feminist sexualities, race and the Internet: An investigation of suicidegirls.com’, New Media & Society, 9(4): 577–602.
Maina, G. and Zecca, F. (2016) ‘Harder than fiction: The stylistic model of gonzo pornography’, Porn Studies, 3(4): 337–50.
Marwick, A. (2013) Status Update: Celebrity, Publicity, and Branding in the Social Media Age. New Haven: Yale University Press.
Mash, T. (2004) ‘My year in smut: Inside Danni’s Hard Drive’, in D.D. Waskul (ed.), Net.seXXX: Readings on Sex, Pornography, and the Internet. New York: Peter Lang. pp. 237–58.
McNair, B. (2002) Striptease Culture: Sex, Media and the Democratization of Desire. New York: Routledge.
McNair, B. (2013) Porno? Chic! How Pornography Changed the World and Made it a Better Place. London: Routledge.

Metz, C. (2015) ‘The porn business isn’t anything like you think it is’, Wired, 15 October. (https://www.wired.com/2015/10/the-porn-business-isnt-anything-like-you-think-it-is/)
Monnlos, K. (2014) ‘Inside Pornhub’s crusade to tear down the taboos of watching sex online?’, Adweek, 18 December. (http://www.adweek.com/brand-marketing/inside-pornhubs-crusade-tear-down-taboos-watching-sex-online-161910/)
Moye, D. (2013) ‘Porn industry in decline’, The Huffington Post, 19 January. (http://www.huffingtonpost.com/2013/01/19/porn-industry-in-decline_n_2460799.html)
Mulholland, M. (2013) Young People and Pornography: Negotiating Pornification. New York: Palgrave.
Paasonen, S. (2010a) ‘Good amateurs: Erotica writing and notions of quality’, in F. Attwood (ed.), Porn.com: Making Sense of Online Pornography. New York: Peter Lang. pp. 138–54.
Paasonen, S. (2010b) ‘Labors of love: Netporn, Web 2.0, and the meanings of amateurism’, New Media & Society, 12(8): 1297–312.
Paasonen, S. (2011) Carnal Resonance: Affect and Online Pornography. Cambridge: MIT Press.
Paasonen, S. (2016) ‘Pornification and the mainstreaming of sex’, in N. Rafter (ed.), Oxford Encyclopedia of Criminology. Oxford: Oxford University Press. (http://criminology.oxfordre.com/view/10.1093/acrefore/9780190264079.001.0001/acrefore-9780190264079-e-159)
Paasonen, S., Nikunen, K. and Saarenmaa, L. (2007) ‘Pornification and the education of desire’, in S. Paasonen, K. Nikunen and L. Saarenmaa (eds), Pornification: Sex and Sexuality in Media Culture. Oxford: Berg. pp. 1–20.
Paasonen, S., Jarrett, K. and Light, B. (forthcoming) Not Safe for Work: Sex, Humor and Risk in Social Media. Cambridge: MIT Press.
Patterson, Z. (2004) ‘Going on-line: Consuming pornography in the digital era’, in L. Williams (ed.), Porn Studies. Durham: Duke University Press. pp. 104–23.
Perdue, L. (2002) Erotica Biz: How Sex Shaped the Internet. New York: Writers Club Press.
Rosen, D. (2013) ‘Is the Internet killing the porn industry?’, Salon, 30 May. (http://www.salon.com/2013/05/30/is_success_killing_the_porn_industry_partner/)
Senft, T.M. (2008) CamGirls: Celebrity & Community in the Age of Social Networks. New York: Peter Lang.
Shah, N. (2007) ‘PlayBlog: Pornography, performance and cyberspace’, in K. Jacobs, M. Janssen and M. Pasquinelli (eds), C’Lick Me: A Netporn Studies Reader. Amsterdam: Institute of Network Cultures. pp. 31–44.
Slater, D. (1998) ‘Trading sexpics on IRC: Embodiment and authenticity on the Internet’, Body & Society, 4(4): 91–117.
Smith, C. (2010) ‘Pornographication: A discourse for all seasons’, International Journal of Media and Cultural Politics, 6(1): 103–8.
Spink, A., Partridge, H. and Jansen, B.J. (2006) ‘Sexual and pornographic Web searching: Trend analysis’, First Monday, 11(9). (http://www.firstmonday.org/ojs/index.php/fm/article/view/1391/1309)
Stella, R. (2016) ‘The amateur roots of gonzo pornography’, Porn Studies, 3(4): 351–61.
Stroud, S.R. (2014) ‘The dark side of the online self: A pragmatist critique of the growing plague of revenge porn’, Journal of Mass Media Ethics, 29(3): 168–83.
Williams, L. (1989) Hard Core: Power, Pleasure, and the ‘Frenzy of the Visible’. Berkeley: University of California Press.
Zhang, X. (2010) ‘Charging children with child pornography: Using the legal system to handle the problem of “sexting”’, Computer Law & Security Review, 26(3): 251–9.
38
Spam
Finn Brunton

THE COUNTERHISTORY OF THE WEB

This chapter builds on a larger argument about the history of the Internet, and makes the case that this argument has something useful to say about the Web; and, likewise, that the Web has something useful to say about the argument, expressing an aspect of what is distinctive about the Web as a technology. The larger argument is this: spam provides another history of the Internet, a shadow history. In fact, following the history of ‘spam’, in all its different meanings and across different networks and platforms (ARPANET and Usenet, the Internet, email, the Web, user-generated content, comments, search engines, texting, and so on), lets us tell the history of the Internet itself entirely through what its architects and inhabitants sought to exclude. Identifying and describing spam, from the terminals of time-shared mainframes in the 1970s to the elaborate automated filtering systems of Gmail, meant having to talk about what the network is for, what the rules are, and who’s in charge. And the first conversation, over and over again: what exactly is ‘spam?’. Briefly looking at how this question got answered will bring us to the Web and what made it different.

Before the Web, before the formalization of the Internet, before Minitel and Prestel and America Online, there were graduate students in basements, typing on terminals that connected to remote machines somewhere out in the night (the night because computers, of course, were for big, expensive, labor-intensive projects during the day – if you, a student, could get an account for access at all it was probably for the 3 a.m. slot). Students wrote programs, created games, traded messages, and played pranks and tricks on each other. Being nerds of the sort that would stay up overnight to get a few hours of computer access, they shared a love of things like science fiction and the absurd comedy of Monty Python’s Flying Circus. Alone at the terminals, together on the network, they would volley lines from Python sketches back and forth

– the dead parrot sketch, the dirty fork sketch, the spam sketch. This last was particularly popular because most of the dialogue was just the repetition of ‘spam’, whether sung by Vikings or shouted by the waitress, and it was therefore trivial to generate. You could write a simple program that, at the right spot in the dialogue, would post ‘SPAM! SPAM! SPAM! SPAM! SPAM! SPAM! SPAM! SPAM!’ over and over, relentlessly and without pause, filling the screen, killing the discussion, and often overloading the chat platform completely, kicking people offline. Jussi Parikka and Tony Sampson (2009), in the context of their larger analysis of spam, have shown that the sketch itself is built around a communications breakdown: the point where noise on the line overwhelms any particular signal. It was annoying, but playful and mischievous rather than malign, like unexpectedly blowing a vuvuzela in the middle of a conversation. This kind of noisy, frustrating behavior was dubbed ‘spamming’.

The term came in useful in the ensuing decade-plus – though not with reference to advertising or commercial messages, which were their own category of etiquette violation. ‘Spamming’ remained the domain of noise and the indiscriminate, wasting time, attention, and bandwidth on redundant copies of messages, on overly verbose and off-topic postings, on tedious rants and cut-and-pasted slabs of text. Dave Hayes, prominent on the discussion system Usenet, wrote a manifesto in 1997 itemizing the forms of social misbehavior online – with ‘commercial self-promotion’ as a separate entry from ‘SPAM’, which meant precisely being a high-noise low-signal attention hog over a precious and expensive medium (Hayes, 1996). This definition shifted irrevocably in the spring of 1994, when two lawyers from Arizona posted a message across Usenet – that is, to computers around the world, to many thousands of users, indiscriminately – offering their services with the process of entering the US Green Card lottery. (We know this event best through the responses that quote it at the time – for instance (Larson, 1994).) The lottery was an initiative to simplify and speed up the process of getting papers to live and work in the United States as a foreign national: a subject of obviously narrow interest, which they had broadcast to computers from Singapore to Australia to the Netherlands – and, of course, throughout the United States, where the vast majority of recipients were citizens already. To enter the lottery, furthermore, only required sending in a postcard, but the message suggested that paid legal help would be needed – a sleazy commercial misrepresentation. Abuse of the network’s many-to-many tools and global reach had been combined with a moneymaking scheme.

In the process of trying to describe this event, the global community on Usenet settled on spam as the term of art for their message and their action: the American lawyers had spammed the network. The word had jumped into the domain in which we identify today … or rather, it had come closer to our current understanding, in a way that is intimately intertwined with the development of the Web.

After assembling the whole history of spam, from an anti-Vietnam War message distributed on MIT’s early time-sharing system in 1971, to a Digital Equipment Corporation ad on ARPANET in 1977, to present-day phishing, comment spam, ‘spammy’ posts and social network activity, clickbait and 419 (‘Nigerian prince’) emails, I arrived at a working definition. What is ‘spam’? Spam is the manipulation of information technology infrastructure to exploit existing aggregations of human attention. That is the meaning of ‘spam’ once all the technological particulars of search engine spamming or phishing campaigns have been worn away: following the term’s broad application across the decades – both in English and as a loanword in other languages on the Internet – includes commercial and noncommercial activities, criminal and legitimate, with many different technologies and platforms, from email to

Twitter to content production. What remains consistent, I argue, is the model: spammers identify already existing collections of human attention, and imitate and manipulate their particular properties to extract value. I want to say a few more words explaining this before focusing on the Web in particular.

Spam is an information technology phenomenon. Across their many modes and domains, spammers push the properties of information technology to their extremes: the capacity for automation, algorithmic manipulation, and scripting; the leveraging of network effects and vast economies of scale; distributed connectivity and free or very low-cost participation. So many neglected blogs and wikis and other social spaces are out there on the Web: automatic bot-posted spam comments, one after another, will fill the limits of their server space. What this means – beyond characterizing spam as an activity – is that spammers take advantage of existing infrastructure in ways that make it difficult to extirpate them without making changes for which we would pay a high price. Indeed, in Geert Lovink’s argument (2005), spam is akin to other network failures like identity theft in being inherent in the design – constitutional elements of yesterday’s network architecture. Spammers partially survive by finding places where the potential value lost and effort expended in locking-down could exceed the harm they do – which reveals those places, and their value, to us. More exactly, spammers find places where the open and exploratory infrastructure of the network hosts gatherings of humans, however indirectly, and where their attention is pooled. The use they make of this attention is exploitative not because they extract some value from it but because in doing so they devalue it for everyone else – that is, in plain language, they waste our time for their benefit. Recall the objections against the Green Card spam campaign on Usenet: it wasn’t simply that the lawyers were acting commercially but that they didn’t respect salience, barraging everyone indiscriminately with their lame message, treating the whole network as a passive audience whose time was theirs to spend. (Some of the complex distinctions inherent in spam, ‘trash’, ‘junk’, and ‘waste’ are considered in the analysis collected in Parikka and Sampson (2009), especially Galloway and Thacker’s ‘On Narcolepsy’ (2008), and Gansing (2011).)

Two consequences follow from studying spam in this light. First, we can see the history of networked computing as a thread in the history of the management and distribution of attention – Alessandro Ludovico (2005) has argued that spam is best seen as one instance in a long history, from traveling salesmen to personalized bulk postal mail to eye-catching billboards, of trying to interfere with our thoughts and provoke us into some form of consumer desire. Second, we can see in concrete terms how the nebulous shape of community online used spam to define itself. The intersection of these two topics brings us back to the distinctive history of the Web, the ways it gathered attention and formed communities, shown to us anew through spam’s crass, hustling, inventive counterhistory. I will break this counterhistory up into four sections, which describe from spam’s side how the Web became searchable, central, and social.

JUNK RESULTS

How, though, could spam have come to play a role in the Web? My brief sketch of spam’s origins, above, is all social spaces: conversation threads on Usenet, chat on time-sharing computer networks, and of course email, where ‘spam’ as a concept and a business scaled up. The Web, though, was a kind of document navigation system at first – a markup language and set of protocols for authoring and exploring knowledge through hypertext. It was a project suited to a polyglot scientific community: developed by an English computer scientist, revised by a

Belgian, with the first site built in French, hosted on a Californian machine in the walls of a Swiss research institute, a knowledge presentation and navigation tool for one of the twentieth century’s biggest capital-S Scientific communities. It’s a context, and a technology, in which you can no more imagine spam thriving than you can imagine mold growing on a space station.

But mold does in fact grow in outer space, behind panels, on gaskets and insulation, on walls, under clothes. As human attention condensed, collected, and pooled on the Web, from Erwise to ViolaWWW to NCSA Mosaic, techniques began appearing to absorb and exploit it. Take a year, from the middle of 1994 to 1995, when we can see many different factors picking up speed: the global growth of users away from computer science professionals to the general population; the end of the noncommercial restrictions on the network in the United States; the shift in power from sysadmins to lawyers and entrepreneurs as social arbiters; and events like the Green Card lottery spam on Usenet (with subsequent publicity in newspapers – the first appearance of ‘spam’ in print – and major advertisers scenting blood in the water); Mosaic’s booming download numbers; and the publication of How to Make a Fortune on the Information Superhighway – a cash-in book by those same Arizona lawyers, promising to teach readers how to market across the global network and get rich quick by exploiting the technology.

There were 20-odd websites in the fall of 1992, 10,000 by the end of summer in 1995, and millions by mid 1998. There were so many sites by then that finding what you were looking for – even knowing what was available to be found – was an enormous challenge, eloquently described in a paper published on April Fool’s Day of 1998: ‘The Anatomy of a Large-Scale Hypertextual Web Search Engine’, by Sergey Brin and Larry Page. Others had already tried to solve the problem of abundance on the Web by developing search engines. The issue was, as Brin and Page put it, that ‘some advertisers attempt to gain people’s attention by taking measures meant to mislead automated search engines. … “Junk results” often wash out any results that a user is interested in’ (2). Or, as a paper published on the very same day more bluntly put it: ‘Some authors have an interest in their page rating well for a great many types of query indeed – spamming has come to the web’ (Pringle et al., 1998: 1).

The form that spamming took reflects the unique particulars of the Web and search technology: it was designed not simply to dominate a conversation, as in chat, or flood a channel with messages, as with email and Usenet, but to make assertions of relevance and salience. We can see, through spam’s development, how generations of search engines tried to model information on the Web in terms of what it meant for a user’s query. The spiders that the first search engines sent out would go through the HTML source of a page, using the structure of the markup to assess the significance of words with greater or lesser degrees of importance and relevance to a search. A word in a URL (for uniform resource locator, the ‘address’ of the page) or in the first header tag – which is the markup for what the human reader would see as the ‘title’ of the page, as in <h1>My Homepage</h1> – was probably more important than one in the body text of a page and would be rated accordingly in the index. A set of elements called ‘meta tags’ were used in HTML specifically for the benefit of search engine spiders, with keywords listed for the page such that they would be invisible to the human reader but helpful to search indexing. Helpful in theory, anyway: though meta tag elements were popularized by early search engines such as AltaVista and Infoseek, they were so aggressively adopted by spammers that metadata was largely ignored by the turn of the century, with AltaVista abandoning the influence of meta tags on search results in 2002: ‘the high incidence of keyword repetition and spam made it an unreliable indication of site content and quality’ (Sullivan, 2002).
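
The markup-based weighting of words described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reconstruction of any actual search engine: the field weights, the `PageScanner` class, the `score` function, and the example.com pages are all invented for the purpose. The sketch counts query terms in the URL, the first header, the keywords meta tag, and the remaining text, weights each field differently, and shows why repeating a keyword in a meta tag was such a cheap way to claim relevance.

```python
import re
from html.parser import HTMLParser

# Hypothetical weights: the text only says that words in the URL or the first
# header counted for more than body text; the numbers themselves are made up.
DEFAULT_WEIGHTS = {"url": 3.0, "h1": 3.0, "meta": 2.0, "body": 1.0}


def tokenize(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PageScanner(HTMLParser):
    """Collects words from the first <h1>, the keywords meta tag, and other text."""

    def __init__(self):
        super().__init__()
        self.in_first_h1 = False
        self.seen_h1 = False
        self.terms = {"h1": [], "meta": [], "body": []}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1" and not self.seen_h1:
            self.in_first_h1 = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            # Meta keywords are never rendered, so only the spider "reads" them.
            self.terms["meta"].extend(tokenize(attrs.get("content") or ""))

    def handle_endtag(self, tag):
        if tag == "h1" and self.in_first_h1:
            self.in_first_h1 = False
            self.seen_h1 = True

    def handle_data(self, data):
        field = "h1" if self.in_first_h1 else "body"
        self.terms[field].extend(tokenize(data))


def score(url, html, query, weights=DEFAULT_WEIGHTS):
    """Sum weighted occurrences of each query term across the page's fields."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    fields = dict(scanner.terms, url=tokenize(url))
    return sum(
        weights[field] * words.count(term)
        for term in tokenize(query)
        for field, words in fields.items()
    )


honest_page = (
    '<html><head><meta name="keywords" content="toyota, ireland"></head>'
    "<body><h1>Toyota dealers in Ireland</h1><p>We sell cars.</p></body></html>"
)
stuffed_page = (
    '<html><head><meta name="keywords" content="' + "toyota " * 50 + '"></head>'
    "<body><h1>Cheap deals</h1><p>Click here.</p></body></html>"
)

honest = score("http://example.com/toyota", honest_page, "toyota")
stuffed = score("http://example.com/deals", stuffed_page, "toyota")

# Ignoring meta tags entirely removes the stuffed page's advantage.
no_meta = dict(DEFAULT_WEIGHTS, meta=0.0)
honest_no_meta = score("http://example.com/toyota", honest_page, "toyota", no_meta)
stuffed_no_meta = score("http://example.com/deals", stuffed_page, "toyota", no_meta)
```

With these invented weights the stuffed page outscores the honest one by a wide margin, and setting the meta weight to zero, loosely analogous to engines ceasing to trust meta tags, reverses the ordering.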

What precisely was the business plan of these early search spammers, and what were they putting in their web pages? Keywords were repeated in the meta tags and gathered in the page itself, hidden from the casual human reader’s eye. One of the details that HTML can specify is the color of text, so the page’s author could set the page’s background to gray and make text the same shade of gray, invisible on the human reader’s display while appearing to be normal text on the page as far as the spider was concerned. Innocuous pages with some form of spammy intent would have a mysterious gap at the bottom of the page. (Such techniques could also be used playfully or prankishly, part of the toolkit of the ‘vernacular web’ of bespoke HTML (Lialina, 2009).) The text on the page ended and there were no images, just a few inches of the gray background before the bottom. In that gap, in background-matching color and often minuscule font size, lay a magma flow of obscenity and pornography, product names, pop stars, distinctive phrases, cities and countries, odd terms seemingly plucked from Tristan Tzara’s hat, selected because they happened to get good returns at that time. The text reads as though a Céline character worked for Entertainment Tonight: toyota ireland ladyboy microsoft windows hentai pulp fiction slut nirvana.

Such blocks of text illustrate a recurring theme in the development of spam on the Web and elsewhere: a matter-of-fact distinction between humans and machines, with different strategies for dealing with each. Almost every piece of spam, whether over email or in the context of spam blogs or comment spam, became biface, capable of being read in two ways with very different messages for the algorithm and for the human. (We will return to this distinction and its consequences for the Web at the end.) Spamming the early Web exacerbated this process, with techniques like ‘cloaking’. Search engine spiders identify themselves by the way in which they request a page. This identification is part of the set of protocols that help to distinguish a normal Web browser from other platforms, like a phone, or a Braille display, making it possible to serve a compact page to the phone and text instead of images to the Braille device. This means you can serve one page to a spider, to be indexed and delivered as a search result, and an entirely different page to the user who clicks on the link. The signatures of spider requests, which trigger the cloak page, proved very difficult to disguise from spammers – which brings us back to Google’s embryonic form in 1998: ‘a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext’, created precisely to solve the problem posed by ‘junk results’, spammy web pages – and creating in turn a new way for spam to reshape the Web (Brin and Page, 1998: 1).

MUTUAL ADMIRATION SOCIETY

‘The citation (link) graph of the web is an important resource that has largely gone unused in existing web search engines’, wrote Brin and Page. ‘These maps allow rapid calculation of a web page’s “PageRank,” an objective measure of its citation importance that corresponds well with people’s subjective idea of importance’ (3). Inspired by academic citation structure, they argued for reputation, essentially treating links as a measurable expression of social value. They were not the first to do this – earlier search engine projects, trying to beat the keyword-stuffing of the Web’s first spammers, had tried to use numbers of links to roughly evaluate the meaningfulness of results. In response, spammers had started link farms, pages of nothing but links between spam sites, providing a cheap-and-easy boost to that metric. Part of Google’s brilliance lay in the flaw in this strategy: spam pages are lonely. They may link to thousands of other sites, but the only inbound links, as a rule, come from other spam pages.
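
The contrast between counting links and weighing them, and the loneliness of spam pages, can be illustrated with a toy link graph. The sketch below is a hedged illustration only: the page names and link structure are invented, the damping value of 0.85 is an assumption chosen for the example, and the `pagerank` function is a simplified textbook-style iteration rather than Google's actual system. It compares a raw inbound-link count, the metric that link farms were built to inflate, with a score in which a link passes on a share of the linking page's own standing.

```python
# Toy link graph: page -> pages it links to. All names are invented.
# Every page has at least one outbound link, which this sketch relies on.
links = {
    "directory": ["news", "blog", "shop"],
    "news": ["directory", "blog"],
    "blog": ["directory", "news", "shop"],
    "shop": ["directory"],
    # Link farm: the farm pages exist to point at "spamshop". They link out
    # to a legitimate site, but no page outside the farm links back to them.
    "farm1": ["spamshop", "directory"],
    "farm2": ["spamshop", "directory"],
    "farm3": ["spamshop", "directory"],
    "farm4": ["spamshop", "directory"],
    "spamshop": ["farm1", "farm2", "farm3", "farm4"],
}


def inbound_counts(links):
    """Raw popularity: how many pages link to each page."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            counts[target] += 1
    return counts


def pagerank(links, damping=0.85, iterations=100):
    """Simplified iterative score: each page splits its rank among its links."""
    n = len(links)
    ranks = {page: 1.0 / n for page in links}
    for _ in range(iterations):
        # Every page receives a small baseline share, standing in for a
        # reader who starts over on a page chosen at random.
        new_ranks = {page: (1.0 - damping) / n for page in links}
        for page, targets in links.items():
            share = damping * ranks[page] / len(targets)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


counts = inbound_counts(links)
ranks = pagerank(links)

for page in sorted(links, key=ranks.get, reverse=True):
    print(f"{page:10s} inbound={counts[page]}  score={ranks[page]:.3f}")
```

In this invented graph the farm's target page collects more inbound links than the legitimate shop, yet scores lower once links are weighted by who is doing the linking. The exact numbers depend entirely on the made-up graph and the assumed damping value.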

Links, in theory, carry an implicit endorsement, a vote of relevance made by a person. The spam-fighting question is: who is the person, and how much does their endorsement count for? Google’s PageRank equation answered those questions with the behavior of the ‘random surfer’, an abstracted user of the late-1990s Web. This rather depressing model of a person starts on ‘a web page at random and keeps clicking on links, never hitting “back,” but eventually gets bored and starts on another random page’ (4). The likelihood that this idle character, clicking ever forward along the link graph, will land on a given page defines PageRank. This means that other sites linking to your site matters – as does which sites link to those that link to you. It’s a reputational model that works transitively, with links weighted differently by their significance: ‘pages that have perhaps only one citation from something like the Yahoo! homepage are also generally worth looking at’ (Yahoo! being, at the time, a directory of human-curated significant links.) What were spammers to do? Building on the algorithmic inference of social data, Google could make it ‘nearly impossible to deliberately mislead the system’. The only workaround for spammers would be to build their own artificial societies.

A variety of strategies developed as Google’s market share grew and other search engines around the world developed similar models. Websites with a high PageRank were transformed into kingmakers. A link from them could move a site onto the first page or top three returns of the different search sites, boosting attention and revenue. Sites took advantage of preexisting ideas for the human-curated Web, like ‘Best of the Web’ awards, ‘Top 100 Sites’ awards, and so forth; these awards included a badge, a little image, and a snippet of code to be copied into the winning site – a snippet that included a link to the award-giving site. The human user saw a little badge image, but the search engine spider saw an outgoing link: a digital endorsement. New habits of use and etiquette appeared among ordinary users of the Web: a comment in a blog post included the commenters’ websites along with their names, to rack up another link. Posting something without including a ‘via’ link to the person you got it from – the ‘via’ being an additional outbound link as a kind of thanks for using their discovery – became increasingly rude, the sign of an uncouth person. ‘Mutual admiration societies’ arose, huge link-heavy sets of sites, each page linking to many of the others – all sites kept carefully unspammy, maintaining the pretense of legitimate use of the Web. Their business was not to produce spam sites themselves, but to charge for outbound links from the society. They were renting out their accumulated ‘votes’. But even those had a characteristic shape: heavy cross-linking within a group of sites, all with only a few inbound links (because spam pages are lonesome), creating little islands of intense self-endorsement with no outside involvement. To analytic tools, it’s a pattern as obvious as the newspaper ads taken out by vanity publishing houses for their new releases with the blurbs from friends and family – and easy for Google to discount accordingly.

In 1999, a company called Pyra Labs launched a service called Blogger. (Google would buy it in 2003.) Blogger’s goal, as of so many related systems, from Flickr to Wikipedia, was to provide people with an intuitive means for publishing their content on the Web. It was remotely hosted, so you did not have to own a website domain name or pay for hosting; many of its processes were automated, so you did not have to design it or do any coding behind the scenes; and it had a useful and increasingly sophisticated Application Programming Interface (API) for connecting with other Web applications and automating processes. With the boom in weblog popularity and the peculiar chronological publishing model of blogs, came another three-letter acronym, RSS, ‘Really Simple Syndication’, which makes new posts or other changes on a site available in forms that are easy to use. (For the sake
of Web history completeness: RSS originally stood for ‘RDF Site Summary’, which highlights its relationship with the history of Web document formatting – but was retroactively changed to the more straightforward meaning.) Feed readers can gather the latest entries from RSS-enabled sites (which blogs soon were by default), material can be forwarded to mobile devices, and a page can feature the headlines or recent posts from other sites.

What does this have to do with Web spam? Consider the toolkit laid out by these developments: a content publishing system that can be easily automated (new accounts, posts on a prearranged schedule, modified settings), without the detailed work and paper trail of registering domain names and paying for Web hosting, and – with RSS – a faucet of other people’s words, content that looked real and human because it actually was, unlike a lot of spam production. Hooked together with the right software tools, you can generate a new kind of mutual admiration and endorsement society with a network of spam blogs – or ‘splogs’.

A splog production system will pull in RSS feeds from other blogs and news sources, chop them up and remix them, insert relevant links, and post the resulting material, hour after hour and day after day, with minimal human supervision. You can turn the machine on and leave the room while it makes money for you. With contextual advertising (including ads as a launching point for browser malware) you can make money through pageviews and the occasional click by running ‘excerpt model’ splogs, with fragments taken from other people’s posts that are polling particularly well in Google’s keyword metrics. A more ambitious system is ‘full content’ splogs, cross-linking in their hundreds and thousands to distort the shape of the Web. Each splog is assigned a set of keywords and feeds from which to pull related text, and in turn links to other splogs, which link to still more, forming an insular community on a huge range of sites – a kind of PageRank greenhouse that is not in itself meant to be read by people, but solely by search engine spiders. The splogs only work from a distance, appearing to be groups of people, the language and links functioning in aggregate. Taken in statistical total and algorithmic analysis, splogs resemble the patterns of a thriving community. Their posts are pitched at precisely the level of complexity the spider requires to accept their input as human, and they adapt human text for other machines to read and act on; affecting humans happens only indirectly – boosting the search ranking of a spammy appliance review site, for instance, that makes money through ads and affiliate links, or leading a human searcher into a fraudulent destination, whether a simple rip-off with ads and no meaningful content, or a site that might middleman a transaction, tacking on an additional fee or trying to force-download some adware.

This section opened with ‘Google’, a new-minted concept for a ranking algorithm, and ends with Google, a massively successful advertising company that runs a search engine. One question raised by this chronicle of spam’s relationship to the Web and Web search is how complicit, or symbiotic, Google is with its own antagonist. Consider splogs: built and hosted on a platform Google owns, using text for content drawn from other sites hosted by Google, optimized to best fit Google’s search engine algorithms, to boost the results for Google searches for sites that make money by hosting ads served through Google’s affiliate advertising program (which, of course, also makes money for Google). Search engine spammers running their vast stables of spam blogs and sites are not anomalous, quantitatively or qualitatively; splogs now account for more than half of the total number of all blogs (Fetterly et al., 2004). They are the optimal users, from Google’s perspective, constructing a system in which all the extraneous matter of people and conversation has been pruned away in favor of the automation of content production, search results, clicks, and ads served. This system in turn puts Google in the
contradictory position of having to analyze and expel many of their most dedicated customers: those who overexploit, and accidentally overexpose, the financial and attention economies and technologies that underlie the contemporary Web. Google is hardly alone in this problem, as we will see.

THE LANDING PAGE

Another shift in the dynamics of spam was developing, meanwhile, which reflects the Web’s role as what Christian Sandvig (2013) calls an ‘emerging essential’. It had been something that ran on infrastructure – running on top of the Internet, which ran in turn on top of the infrastructure of telephony – which became infrastructure itself, in the negative sense defined by Paul Edwards (2003: 187), ‘those systems without which contemporary societies cannot function’. The Web became a key component for banking, health care, work and administration, and content creation and consumption, with browser standards and shared protocols becoming matters of urgent negotiation, monopolistic strategy, and even public safety (as in the push to standards like HTTPS) – with international implications from the top-level domains issued within the United States to legal decisions on hate speech in the EU (Goldsmith and Wu, 2008). In other words, it was not simply an aggregation of human attention around documents and content, navigated through search and hyperlinks, but a portal into many vulnerable and intimate parts of our working and personal lives. To understand how spam shifted accordingly, it helps to briefly look back for the last time to that Green Card lottery message in 1994.

What the lawyer-spammers were offering was, technically, an actual service (a misleading, borderline fraudulent one, but let that pass), whereas much of the spam we receive now has a very different agenda. You could call the real, working telephone number and schedule an appointment with the lawyers, just as, with much of the spam in the years following 1994, you could actually purchase the quack weight-loss pills, the deadstock toys, the counterfeit watches. Spam – spam of this era and this meaning – was loathed and despised, but it was also still somewhat legitimate, if only by accident. It thrived in the regulatory shadow of direct mail marketing, a powerful and moneyed interest that didn’t want a legal precedent set that could close off a future advertising venue, and in the novelty of the increasingly popular and commercial Web, growing faster than legislation and defensive software could keep up. International guides to legal redress for spamming sprang up, their constantly updated confusion of potential laws – from CAN-SPAM in the United States to economic crime units in Norway to Canada’s Department of Justice task force on pyramid schemes – highlighting the problem of figuring out where criminal lines were crossed (for example, Hollis, 2005). Many spammers were able to present themselves as brashly inventive promoters, with postal addresses and registered trademarks, seeking recognition in the classic tradition of entrepreneurial hustlers. What they produced is still what many people think of when they think of spam: the enthusiastic pitches full of mangled grammar and implausible stock photography, in the service of a recognizable, even traditional, class of dubious pleasures from timeshares and self-help books to diets and pornography.

But as the Web became an infrastructural system – and spam, for reasons too complex to go into here, became a progressively more embattled industry with less easy money – a new predatory spam technique took shape, to trick humans rather than machines. As far back as the mid 1990s, hackers and spammers alike had been finding ways to fool people into giving up their login information. Initially, the goal was to send spam from trustworthy-looking accounts, or within closed networks (whose members were often more naïve and easier to exploit).
In early 1996, in the Usenet newsgroup for the hacker magazine 2600, the term for this technique makes its first appearance: ‘phishing’ (‘mk590’, 1996).

A representative spam business in the mid 1990s – who spelled it ‘fishing’ – used a simple ASCII picture, ‘<><’, to note AOL accounts they’d captured to deluge people within the AOL network with spam ads (Brunton, 2013: 76). That was small change, though, compared with the uses to which phishing would be put. Targeted phishing messages used the same biface aspect of HTML – one side of the markup visible to people, the other for machines – to send email purporting to be from a bank, a credit card company, an email provider, or an employer, with a link whose innocuous text (‘To resolve this block on your credit card, click here’) disguises a suspicious URL (mastercard.l337haxx0r.ru, or whatever). Following the link reveals a careful – or sometimes not-so-careful – counterfeit of the original ‘landing page’ for the legitimate site. Careful study of spammer landing pages reveals how much HTML and CSS – the markup and styling vocabulary of early Web design – could convey the ‘realness’ of a particular online destination. Some crudely copy-and-pasted the HTML available through ‘view source’ commands for the sites they were pirating; others, faced with better-protected sites, reverse-engineered the design of their counterfeit, replicating color schemes, trying to duplicate the placement of images, and lifting or in some cases amusingly improvising the text.

Phishing sites are now often hosted on the compromised computers or servers that make up ‘botnets’, networks of many thousands of machines under the remote control of the spammer. These botnets generate the great bulk of the spam that we encounter (and much that we don’t – spam can be upwards of 85% of all email on the Internet at peak times, most of it stopped by filters well before a person receives it (Messaging Anti-Abuse Working Group, 2011)). That verb, ‘generate’, is carefully chosen here: the botnet machines can run semi-autonomously, receiving command-and-control instructions for new spam campaigns and then spewing out messages in the millions, adjusting each one individually (‘per-message polymorphism’) and shifting strategy if they receive an unusually high number of rejections (Kreibich et al., 2008). In fact, a unique, targeted ‘spear-phishing’ attack, like the one that got John Podesta’s Gmail login and disrupted the 2016 American election, is a flashback to a more artisanal, personal time in the Web. It had a carefully designed trick URL (‘myaccount.google.com-securitysettingpage.tk’) and a beautiful HTML email and login landing page, both mimicking Google’s style to the pixel. (Similar care was taken with attempts to get internal email from the campaign of Emmanuel Macron in France – a tactic now so common that his staff prepared for it in advance.) That kind of attack is now the exception rather than the rule, the human touch for high-value targets. When we encounter a spam comment, a spam blog, a spam email, a message on Twitter @’d to you by a bikini avatar with a high-entropy name, we are very likely the first people to have ever seen it. It is the product of layers of wholly computational work, for which the humans merely set the parameters, assembled and passed around the world on a chain of mechanical writers and readers. This brings us to the last chapter of the Web’s counterhistory, the closing act of those linkfarms and mutual admiration societies: the rise of the post-human social Web.

WE STILL BELIEVE THERE IS HUMAN INVOLVEMENT

The Web had always been social, of course, alight with cultures of linking, authoring, sharing, and citing, with forums, boards, comments, and ‘virtual communities’. But as a matter of terminology, the Web became ‘social’ after it became searchable and
increasingly central, when Friendster, MySpace, Facebook, Orkut, LinkedIn, Bebo, Sina Weibo, Twitter, the Marie Celeste ghost ship that is Google+, and a million other social networks came to dominate much of the experience and use of the Web. This was an aggregation, a pooling, of human attention on a scale beyond a spammer’s wildest dreams.

Furthermore, it was a model for aggregating human attention to which spamming came quite naturally. It was – and is – a terrain dominated by clickbait and linkbait, by eyeball-grabbing fake news (a technique pioneered by spam emails with links that launched malware downloads) and you’ll-never-guess headlines bannered over the thinnest content, by the endlessly refilled candy bowl of meme culture, and advertisements indistinguishable from old-school spam come-ons (weight loss, penile enlargement, predatory home-loan scams, and other dragnets for dim fish). Even the legitimate human users developed spam-like approaches to their activity. Merlin Mann, a bemused witness to the dot-com scene, dubbed this activity on Twitter personality spamming, the work of arrogating attention for oneself, using social media to build an audience – often a very carefully quantified audience of ‘followers’ and ‘rebloggers’ – rather than a network of friends. It is the socially acceptable but aggressively eyeball-hungry work of those who would be, or act like, celebrities, ‘influencers’, or ‘thought leaders’. From the Web 2.0 status culture analyzed by Alice Marwick (2013) to Whitney Phillips’s media-savvy trolls (2015), to Limor Shifman’s circulating memes (2013) and Sarah Jeong’s ‘Internet of Garbage’ (2015), studies of the social Web capture how difficult it can be to distinguish it from what its own users call spam.

Spam was a natural fit for this set of platforms and practices – so much so that it produced a similar paradox to that faced by Google, for which its most optimal customers were spammers. Spammers jumped into creating fake Twitter accounts and Facebook pages and YouTube accounts, serving up porn links and browser exploits and selling armies of Twitter followers, blocks of tens of thousands of Facebook ‘likes’ and YouTube views and upvotes. A huge portion of human time on the Web became devoted to interacting with and producing content that sought the illusion of salience through popularity – and if there was one thing spammers were good at, it was producing exactly that illusion. In a sweatshop model that recalls the early days of email spamming, employees of ‘likefarming’ firms will ‘like’ a particular brand or product for a fee. The going rate is a few US dollars for 1,000 likes (Schneider, 2014). Performed in narrowly focused bursts of activity devoted to liking one thing or one family of things, from accounts that do little else, this tactic is easy to spot, so they have to generate the appearance of casual use. They do this by liking pages recently added to the feed of Page Suggestions, which Facebook promotes according to its model of the user’s interests – they behave, in other words, as ideal Facebook citizens, heavy users constantly clicking the thumbs-up and endorsing whatever Facebook’s recommendation algorithm thinks they will endorse.

Twitter, likewise, has an enormous bot problem. It must regularly conduct sweeps to purge the bot accounts from its ranks, but the bots follow paying users in packs of thousands to make them look important and popular, as well as random humans to create the illusion of normalcy for their other activities. A too-successful purge is rewarded with outrage as users see their follower numbers plummet (and Twitter as a company sees its value drop, likewise, as the pool of active users shrinks). The same is true of buying views for your YouTube video, listens for your song on Soundcloud, or clicks on your ads. In a Web 2.0 version of Goodhart’s Law (‘When a measure becomes a target, it ceases to be a good measure’, or, ‘What gets evaluated, gets gamed’), any metric meant to describe human interest, esteem, or attention more generally will spur the development
of customized code to take it over for pay (Goodhart, 1981: 116). All of these vast social Web platforms have created models in which the spammers boost the metrics in exactly the way they’re supposed to be boosted – just not as legitimate human users. (What these models say about the expectations for the humans, those hoof-clicking attention cattle, is left to the reader.)

I would like to close with a final problem, a ubiquitous mark on the structure of the Web left by spam, one that points towards the future: the humble CAPTCHA. The CAPTCHA system – the deformed letters on weird backgrounds that only humans can read, in theory, to verify their non-bot status – is meant to block automated posting, commenting, and account-creation tools, key components in the contemporary spammer arsenal. CAPTCHAs make it harder to start new Blogger blogs or open more free email accounts, and spammers have been working assiduously on different fronts to overcome them. In May 2008, the security company Websense documented a series of attacks on the account-creation process of email services. Many requests for accounts kept hitting the CAPTCHA stage, and most, but not all, failed (Whoriskey, 2008). The pace (replies in six seconds) and the failure rate (nine to one) suggested that computers were doing the solving. ‘We still believe there is human involvement’, said the company’s statement. Botnet attacks on text recognition have improved enough since then that new forms of CAPTCHAs rely on identifying somewhat ambiguous visual information, like picking storefronts out of a set of pictures of buildings. To solve this, spammers have turned to automating humans with services like Captcha King, which retrieves the CAPTCHA images from things like the Twitter account-creation process for manual entry. An outsourced staff sits there all day banging out CAPTCHAs, with a guaranteed ‘success rate of 95% with a response time of less than 90 seconds’ (Krebs, 2012; Motoyama et al., 2010). Those poor souls, whose work makes regular data entry look exceedingly pleasant by comparison, are essentially being paid to be human – to exhibit a theoretically solely human characteristic. In that labor, and in the statement ‘We still believe there is human involvement’, we can see a Web increasingly and finally dominated by the activity and content not of humans but of software, and humans directly responding to and directed by software. As Ben Light (2016) points out, human agency was always the centerpiece of Web 2.0 – but, observed more closely, it’s clear we miss the real story of the contemporary Web if we fail to account for all the nonhuman agency and activity on it. This is an appropriately grim and paradoxical note for the close of this counterhistory of the Web: with Twitter accounts made by humans solving problems for machines, to provide other humans with the illusion of social activity, of ‘Web presence’.

We have followed the work of drawing the line defining spam and non-spam through the history of the Web, from links and sites to blogs and search to the exquisite fakes that mislead users of the Web’s infrastructure. Throughout, I have argued that the act of calling something ‘spam’ tells as much about what is being excluded as what is being identified: junk results and salient searches, personality spamming and meaningful social content, fake-out landing pages and the real thing, abusive advertising and legitimate applications. I hope I have also conveyed how blurry those categories can be. As we follow the movement of the word and the systems to which it is applied, spam exposes how vague and tricky the distinctions can be, and will continue to be. Spam exposes the failures and holes in models: how relevance is calculated, what a valuable collection of interlinked Web pages looks like, how people understand the pages they see and how their machines construe them, what constitutes approval, interaction, relationships, society, even humanness. In some of the cases I’ve described here, spam indicts the very systems it exploits; it produces optimal users, pushing
business models to their logical extreme. Following all these developments provides a different history of the Web – searchable, central, and social – through the ways each development created communities, attention, and the possibility of their own failure.

Which brings us back to the present and future of the Web, seen from spam’s point of view: in which the humans involved have never been less important, mere fodder for content production and analytic stats, under the watchful eye of the platforms. The end of the line.

REFERENCES

Brin, S., and Page, L. (1998) ‘The Anatomy of a Large-Scale Hypertextual Web Search Engine’, Computer Networks & ISDN Systems 30(1–7): 107–117.
Brunton, F. (2013) Spam: A Shadow History of the Internet. Cambridge, MA: MIT Press.
Edwards, P. (2003) ‘Infrastructure and Modernity: Force, Time, and Social Organization in the History of Sociotechnical Systems’, in Thomas Misa, Philip Brey, and Andrew Feenberg (Eds.), Modernity and Technology (pp. 185–225). Cambridge: MIT Press.
Fetterly, D., Manasse, M., and Najork, M. (2004) ‘Spam, Damn Spam, and Statistics: Using Statistical Analysis to Locate Spam Web Pages’, Proceedings of the 7th International Workshop on the Web and Databases 67 (2004): 1–6.
Galloway, A.R., and Thacker, E. (2008) ‘On Narcolepsy’, in Jussi Parikka and Tony D. Sampson (Eds.), The Spam Book: On Viruses, Porn, and Other Anomalies from the Dark Side of Digital Culture (pp. 251–263). Cresskill, NJ: Hampton Press.
Gansing, K. (2011) ‘Spamculture: The Informational Politics of Functional Trash’, in Miyase Christensen, André Jansson, and Christian Christensen (Eds.), Online Territories: Globalization, Mediated Practice and Social Space (pp. 89–109). New York: Peter Lang.
Goldsmith, J., and Wu, T. (2008) Who Controls the Internet?: Illusions of a Borderless World. Oxford: Oxford University Press.
Goodhart, C. (1981) ‘Problems of Monetary Management: The U.K. Experience’, in Anthony S. Courakis (Ed.), Inflation, Depression, and Economic Policy in the West (pp. 111–146). Lanham, MD: Rowman & Littlefield.
Hayes, D. (1996) ‘An Alternative Primer on Net Abuse, Free Speech, and Usenet’ (http://www.jetcafe.org/dave/usenet/freedom.html).
Hollis, K. (2005) ‘alt.spam FAQ or “Figuring out Fake E-Mail & Posts”’ (Rev. 20050130, 30 January 2005) (http://digital.net/~gandalf/spamfaq.html).
Jeong, S. (2015) The Internet of Garbage. New York: Forbes Media.
Krebs, B. (2012) ‘Virtual Sweatshops Defeat Bot-or-Not Tests’ (Krebs on Security, 9 January 2012) (http://krebsonsecurity.com/2012/01/virtual-sweatshops-defeat-bot-or-not-tests/).
Kreibich, C., Kanich, C., Levchenko, K., Enright, B., Voelker, G., Paxson, V., and Savage, S. (2008) ‘On the Spam Campaign Trail’, Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats (http://cseweb.ucsd.edu/~savage/papers/LEETStormspam08.pdf).
Larson, W.L. (1994) ‘Re: Green Card Lottery – Final One?’ in news.admin.policy, 12 April 1994.
Lialina, O. (2009) ‘A Vernacular Web 2’, in Olia Lialina and Dragan Espenschied (Eds.), Digital Folklore Reader (pp. 58–69). Stuttgart: Merz Akademie.
Light, B. (2016) ‘The Rise of Speculative Devices: Hooking Up with the Bots of Ashley Madison’, First Monday, 21(6) (http://firstmonday.org/ojs/index.php/fm/article/view/6426/5525).
Lovink, G. (2005) ‘The Principle of Notworking (Concepts in Critical Internet Culture)’ (lecture, Hogeschool van Amsterdam), 24 February 2005.
Ludovico, A. (2005) ‘Spam, the Economy of Desire’ (Neural.it, 1 December 2005) (http://www.neural.it/art/2005/12/spam_the_economy_of_desire.phtml).
Marwick, A. (2013) Status Update: Celebrity, Publicity, and Branding in the Social Media Age. New Haven: Yale University Press.
Messaging Anti-Abuse Working Group (2011) ‘Email Metrics Program: The Network Operators’ Perspective. Report #15 – First, Second and Third Quarter 2011’ (http://www.maawg.org/sites/maawg/files/news/MAAWG_2011_Q1Q2Q3_Metrics_Report_15.pdf).
‘mk590’ (1996) ‘AOL for Free?’ in alt.2600, 28 January 1996.
Motoyama, M., Levchenko, K., Kanich, C., McCoy, D., Voelker, G., and Savage, S. (2010) ‘Re:CAPTCHAs – Understanding CAPTCHA-Solving Services in an Economic Context’, Proceedings of the USENIX Security Symposium (August 2010): 435–452.
Parikka, J., and Sampson, T.D. (2009) ‘On Anomalous Objects of Digital Culture: An Introduction’, in Jussi Parikka and Tony D. Sampson (Eds.), The Spam Book: On Viruses, Porn, and Other Anomalies from the Dark Side of Digital Culture (pp. 1–18). Cresskill, NJ: Hampton Press.
Phillips, W. (2015) This Is Why We Can’t Have Nice Things: Mapping the Relationship between Online Trolling and Mainstream Culture. Cambridge, MA: MIT Press.
Pringle, G., Allison, L., and Dowe, D. (1998) ‘What Is a Tall Poppy among Web Pages?’, Computer Networks & ISDN Systems 30(1–7): 369–377.
Sandvig, C. (2013) ‘The Internet as Infrastructure’, in William Dutton (Ed.), The Oxford Handbook of Internet Studies. Oxford: Oxford University Press.
Schneider, J. (2014) ‘Likes or Lies? How Perfectly Honest Business can be Overrun by Facebook Spammers’ (TheNextWeb, 23 January 2014) (http://thenextweb.com/facebook/2014/01/23/likes-lies-perfectly-honest-businesses-can-overrun-facebook-spammers/).
Shifman, L. (2013) Memes in Digital Culture. Cambridge, MA: MIT Press.
Sullivan, D. (2002) ‘Death of a Meta Tag’ (Search Engine Watch, 30 September 2002) (http://searchenginewatch.com/article/2066825/Death-Of-A-Meta-Tag).
Whoriskey, P. (2008) ‘Digital Deception’, Washington Post, 1 May 2008.
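
As an illustration of the random-surfer model that this chapter uses to explain PageRank, the following minimal Python sketch estimates how likely an idle, forward-clicking reader is to land on each page of a tiny link graph. It is not Google's implementation. The function name `pagerank`, the toy graph, the damping value of 0.85 (a commonly quoted figure for how often the surfer keeps clicking instead of jumping to a random page), and the treatment of pages without outgoing links are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Estimate the chance that a random surfer is on each page.

    `links` maps each page name to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # A "bored" surfer restarts on a random page with probability 1 - damping.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Otherwise the surfer follows one outgoing link, chosen evenly.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links sends the surfer anywhere at random.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy Web: "featured" has a single inbound link, but from a
# heavily linked directory; "obscure" has two inbound links from minor pages.
toy_web = {
    "directory": ["featured"],
    "featured": [],
    "obscure": [],
    "a": ["directory"],
    "b": ["directory"],
    "c": ["directory", "obscure"],
    "d": ["directory", "obscure"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:10s} {score:.3f}")
```

In this toy graph the page with one link from the well-connected directory ends up ranked above the page with two links from little-known pages, which is the transitive weighting the quoted passage about the Yahoo! homepage describes.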
39
Trolls and Trolling History: From Subculture to Mainstream Practices
Michael Nycyk

INTRODUCTION

The internet and its component, the World Wide Web (web), bring many benefits to humankind, such as communication and information exchanges, yet humans have struggled with the freedoms it offers. Trolling, cyberbullying, hacking and other practices have confronted internet users, governments and lawmakers with the challenges of how to manage them and what measures should be introduced to punish trolls. This chapter documents the history of trolls and trolling, informed by the idea that technological developments and increased access to the internet have supported trolling, and that, once a relatively contained activity, trolling has spread to become a widely practised one. While other reasons account for it, such as changes in the civility and conduct of humans in societies, the chapter’s thesis is driven by this underlying concept. This historical account documents this view.

This account’s time period addresses trolling from pre-web (before 1989), Web 1.0 (1989–2005) and Web 2.0 (2004 to present). It begins by defining trolls, trolling and the technologies where it takes place. Stories of trolling discussed demonstrate its spread, development and sophistication, seen especially in the organized trolling groups that have caused harm to internet users. The concentration on pre-web Usenet groups in this account is important because it demonstrated the first accounts of the severity of trolling and its consequences. The discussion then moves to the way in which trolling is supported by the extended reach and ease of use of internet and web platforms, such as social media, and its prevalence as a mainstream practice is identified. The chapter then concludes with suggestions of how the future of trolling may look.

TROLLS, TROLLING AND TECHNOLOGY: DEFINITIONS AND CONTEXTS

Trolls and trolling, as they are used in internet contexts, are associated with stories of
ugly, supernatural creatures in Scandinavian and Norse (Middle Age Icelandic and Norwegian languages) folklore. Lindow (2014: 5–6) claims the origin of the word ‘troll’ is still unknown, but in Norse language they were so called because they were troublesome, shifting, changing and hard to pin down. This analogy seems appropriate when applying it to trolls on the internet. Over time the troll figure from this period of history has been associated with those who use the internet for trolling; hence the original meaning of troll has morphed into being used to describe humans who do mischief using electronic communication.

Bishop’s (2013: 28) definition of trolls encompasses qualities we have come to expect from trolling behaviours: they post messages and image content via electronic networks intended to be provocative, offensive, menacing and disruptive. Bergstrom (2011) vividly compares an internet troll to a mythical Norse troll that has malicious intent to do harm. Many trolls consider themselves comedians doing it for the Lulz (laughs) and think their work is harmless, but scholars often report them as being antagonistic for the sake of amusement (Hardaker, 2013: 58). Another common term for trolls is keyboard warriors.

Bishop (2014a: 8–9) describes the historical origins of trolls and trolling as originating from the US military in the 1960s, where fighter pilots’ strategies were fishing and ‘reeling in’ one’s opposition in a form of dog-fighting. He argues that trolling has been understood since 2001 as a form of abuse (Bishop, 2014b: 9). Further, it is argued that it is also a form of humour and a type of transgressive disruptive behaviour that became associated with the hacker group Anonymous and the bulletin board group 4chan, where members made trolling a source of enjoyment (Bishop, 2014b: 9).

Trolling is practised across many types of electronic communication and software, mainly through text, image and video. The strategy is to bait people into being hurt or shocked by the content, but it is the disruption to the internet user’s experience and offline life, when using any online space, that is of concern. Trolling that causes hurt or disruption includes (but is not limited to): posting sexist, racist and homophobic comments; body shaming; targeting physical and mental disability; publishing private details (called doxxing); sexting and revenge porn publication; fake news reports; false accusations of paedophilia; attacking people’s political and religious beliefs; and desecrating human and animal online memorial sites.

Trolls can act alone but are also highly organized into co-operative groups. Trolling behaviours have been attributed to what Suler (2004: 321) states is a form of disinhibition where people do not feel any consequences for posting harmful content about others. Suler’s work has been frequently cited as offering causes for such behaviours. Other terms that scholars have applied to trolls include sadism, Machiavellianism (named after an Italian Renaissance politician), self-interested, psychopathic, sociopathic and deceitful. From these it is evident that trolls are usually portrayed in books and the mass media as causing harm and as criminals.

Trolling is practised in electronic communication environments that mostly require password access and some mastery of software to operate. The increased ease of access and use of software, as exemplified in social media like Facebook and Twitter, has played a part in trolling, which became more common as the software became easier to use. A clear divide can be delineated, with technology becoming more accessible and easy to use. Pre-web and Web 1.0 applications were limited to those with expertise or willingness to learn them, whereas Web 2.0 represented new and easier ways to take up trolling. However, Ankerson (2015: 1) cautions against forming neat categories of technology development, especially the distinction between Web 1.0 and Web 2.0. Though there are differences, trolls used many pre-web and Web 1.0 software such as newsreaders or Internet Relay
TROLLS AND TROLLING HISTORY: FROM SUBCULTURE TO MAINSTREAM PRACTICES 579

Chat well past 2004 when Web 2.0 was said to have begun.

Therefore, a brief description of the context of internet and web technologies is given to show where trolls practise trolling. The development of the web, attributed to Tim Berners-Lee and launched in 1989, is described as a document collection in an information space identified by Uniform Resource Locators (URLs) (The World Wide Web Consortium (W3C), 2004). Characterizing this was the term ‘read-only web’, where information could be found but limited opportunities to post content existed.

Web 1.0 and 2.0 had an overlap in terms of functionality; however, the advantage of increased web and internet development was a more interactive, useful and interconnecting user experience with easier posting of content (Naik and Shivalingaiah, 2008). Cormode and Krishnamurthy (2008) describe the impact of this change as a shift from passive consumer use in Web 1.0 to a participatory model where users became content creators.

For this chapter, the technological timeline for describing trolling is informed by the periods of pre-web, Web 1.0 and 2.0. However, to fully cover the changes in trolling, references are also made to other software where trolling is now abundant, such as phone applications (apps), gaming and virtual environments.

PRE-WEB AND WEB 1.0 TROLLING: A BOUNDED SUBCULTURE

Calling the pre-web and Web 1.0 a bounded subculture means that trolling took place in limited areas on the internet and was less known by the public. It occurred on the web, but also on non-web applications such as Telnet bulletin boards, Usenet, chat rooms, email, Multi-User Domains (MUDS) and virtual communities, the last one being an example of a subculture bounded by rules and moderation to protect members’ privacy. As virtual communities are online spaces where human relationships are built through support and information exchange, they can act as a form of psychotherapy (Rheingold, 1993: 4–7). Protected spaces can be infiltrated by trolls, hence the energy expended by owners of websites and internet spaces, who struggle to protect users from the negative experiences trolls provide.

Throughout internet and web studies the word ‘culture’ is used to describe activities done in cyberspace; hence the use of the word cyberculture. Scholars speak of trolling culture but settling on a precise definition is difficult. For this trolling history, the definition of culture is based on Kroeber and Kluckhohn’s (1952: 53) claim that people’s beliefs and values inform action. A subculture has beliefs at odds with a dominant culture, often challenging the main culture’s values. Therefore, trolls are a subculture because their beliefs and attitudes inform negative actions such as disruption and hurt, the antithesis of co-operative, respectful communities. If pre-web and Web 1.0 are more bounded, with trolling occurring only in limited parts, in Web 2.0 and beyond, trolling is considered mainstream. This means it is common and occurs across the internet and web, fostered by better technologies and changing societal attitudes to civility.

To demonstrate this core argument of trolling history – its move from bounded subculture to mainstream practice – the evidence is in the reporting of significant incidents. These are discussed in the next two sections, but again the pre-web, Web 1.0 and Web 2.0 are merely markers of time where improved technology and access to the internet meant trolling became more widespread.

The LambdaMOO Troll Incident

The publication of Julian Dibbell’s (1993) ‘A Rape in Cyberspace: Or how an Evil Clown, a Haitian Trickster Spirit, Two Wizards and a Cast of Thousands Turned a Database into a Society’ was significant for two reasons.
580 THE SAGE HANDBOOK OF WEB HISTORY

First, it challenged people to question trolling, especially the concept that sexual assault took place online. Was virtual rape the same as physical rape? Second, it highlighted the dilemmas faced by those who ran any internet space. Freedom to post was at odds with the censure and moderation of troll activity. Some people called for punishments such as banning, but others saw no legitimate harm in the troll’s action.

To illustrate, a short description of the incident is needed to appreciate its significance in trolling history. Dibbell was involved in a Multi-User Domain Object Orientated space called LambdaMOO, a text-based program where people chatted and interacted with each other in public and private spaces. He stated it was a harmonious environment where trust was given freely and people supported each other. Into this came a troll named Mr Bungle, described by Dibbell (1993) as an evil-looking clown in a harlequin suit with a belt buckle displaying a swear word. The troll, or perhaps trolls, as Dibbell suggests it may have been more than one person, maliciously disrupted the MOO by forcing female participants to perform sexual acts on him and each other.

Despite the troll eventually being banned, the success of the disruption was an early account of how troll acts cause problems and how managing trolling is difficult. It was not just the act that disrupted the room but the troll’s attempt to return under a new name and the division between members who did not see the trolling as a serious issue. Dibbell’s story is important for scholars, albeit critically, as debates exist over the use of the term ‘virtual rape’ as a valid act. However, as an early account of the problematic nature of troll behaviour, it is an invaluable resource for reflecting on the commonplace dilemma of controlling it.

The Usenet and Flame Wars

The bounded system of Usenet brought trolling, and its mischievous nature, to scholars and the public. Created in 1979 at the University of North Carolina, the computer-based distributed discussion system was a significant communication channel for information exchange, emotional support and sharing of interests. Newsgroups in Usenet consisted of topics and message threads within them. Messages were cross-posted, meaning they appeared in many newsgroups, which became a factor in widespread unconstrained trolling activity.

Moderation and censorship of messages, including banning of trolls, were attempted, but trolls did not adhere to such rules. Usenet users were unhappy with the constraints on language and topic use in the groups, and pressured Usenet’s owners to create a subculture called alternative (alt). Smith (1999: 201) observed that many Usenet groups seemed to be dedicated to non-co-operative interactions, with people challenging, insulting and threatening each other. Trolling became a standard part of Usenet use, with a number of documented events that came to public attention. Trolls succeeded in derailing conversations and eroding trust between people, but also involved other Usenet groups as unwilling participants drawn into trolling behaviours (Sindorf, 2013: 201).

The historical significance of Usenet trolling centred on the group alt.folklore.urban (Bartlett, 2014). Krappitz (2012: 40) claimed alt.flame, a Usenet group devoted to insulting and flaming other members, contained one of the first appearances of the word ‘troll’, as reported in this post:

Just some credentials: I am called Troll. I didn’t get the name because I’m a fun guy. I am the champion of channel +insult on irc and I have thrice defended the title before the channel went down, so I can flame with the best. Flame away if you like, but ‘I’m gonna deal it back to you in spades. ‘Cause when I’m havin’ fun ya know I can’t conceal it. Because I know you’d never cut it in my game.’ -Guns N’ Roses’ 99

This example of the word troll was applied increasingly to people who were disrupting others’ internet experiences through name-calling and making fun of people. Note in the

quote the threat that retaliation will earn the person more vitriol from the troll.

Trolls also flame people as a form of baiting others to respond to their comments. Bartlett’s (2014) account in The Dark Net: Inside the Digital Underworld comprehensively covers the Usenet troll battles that became infamous but also stimulated research enquiry into trolling practices. He wrote of the story of a Usenet member Moby who wanted to invite a female to his apartment. However, his cats were on heat, messing up his apartment. Though only posted in the Usenet group alt.tasteless, his dilemma was cross-posted to many groups, including pet groups.

The alt.tasteless members would troll other Usenet groups, posting Moby’s message to groups where explicit language was banned, such as pet discussion groups. Trolls from other Usenet groups in turn began trolling alt.tasteless with messages involving sexual acts with cats, torture and execution. This caused outrage in many groups because cross-postings would disrupt the flow of exchanges between members posting about their hobbies and interests, as well as flooding the groups with irrelevant and offensive messages. Trolls were relentless in posting content about Moby’s predicament with messages about cats that were sexual and violent, causing problems with moderating the groups, which were overwhelmed by the volume of messages.

In 1996 another well-known Usenet incident highlighted the organized nature of trolling. A group of Harvard University students found an abandoned Usenet group called alt.fan.karlmalden.nose, named after the American actor Karl Malden, who had a bulbous nose after it was broken. It was meant to be a group that only discussed campus events. Lee (2016) on the website ‘The One True History of Meow’ stated that the Harvard Usenet members decided to target another Usenet group with disruptive messages. They chose a group named alt.tv.beavis-n-butthead after two Music Television (MTV) cartoon characters.

Displaying the nature of co-operation that existed across Usenet, the beavis-n-butthead group were warned of the Harvard group’s intention of trolling. A Harvard student, Matt Bruce, urged trolls to compose messages using big words and immaculate grammar. Bruce had meant this for amusement, but it was seen as arrogant by the other group. In retaliation against the Harvard students, the beavis-n-butthead group Usenet members banded together and posted multiple offensive messages in the Karl Malden Harvard group. This trolling achieved the aim of driving many Harvard members to stop using Usenet because of the excessive and relentless number of troll messages. One Harvard member who was targeted was called C.A.T., the name inspiring the beavis-n-butthead trolls to use cat phrases in messages as jokes (Lee, 2016):

Suddenly, afk-mn, alt.college.college-bowl, and scores of other groups were flooded to the gills and beyond with hundreds upon hundreds of huge meow articles from all corners of Usenet. Cascades, ASCII cats, hundred-line ‘meow’ hello-world-type flood posts, and more were posted, reposted, munged, pureed, and regurgitated all over the servers of the world. The Harvard kids’ protests were quickly lost in the feline tidal wave. Every post by a Harvard snot would result in fifty cascade follow-ups. alt.college.college-bowl, a known regular haunt of Matt Bruce, was reduced to a smoldering crater, so inundated with meows that its regulars could no longer use it.

Bruce tried to appeal to all to stop the trolling but was further ridiculed for his attempt to do so. This trolling incident is significant as it demonstrates how a subculture’s beliefs and attitudes impacted on all users of an internet space. The Harvard students were seen as elitist, deserving the flame war waged on them. This suggested that trolls, although seen as pranksters and disruptors, had genuine beliefs about another group in society that made that group the object of derision and hate.

These incidents captured scholars’ interest in theorizing about trolls, particularly the motivations of trolls in disrupting the internet and websites using vitriol against groups in society. Mitra’s (1996) study of the Usenet group soc.cult.indian concluded that, despite the group having rules on behaviours, those

identified as coming from Pakistan openly racially attacked those living in India. This replicated the historical conflict between the countries as trolls disrupted the civil conversations between group members. Spender’s (1995) study of women’s forums illustrated what has become common trolling behaviour: using sexist and misogynistic terms to disparage women’s achievements. Jane (2012) researched trolling and flaming of women online, terming the treatment they received ‘e-bile’. Although unpleasant, Jane (2015: 66) argues that over 30 years of trolling research has done much to aid understanding of the hostile discourse trolls adopt.

The selected historical troll incidents that took place in bounded spaces online within subcultures were increasing as more people began using the internet and web. Moderators, also known as administrators and on some sites wizards, began to struggle to control trolls. As Usenet studies showed, trolls could organize themselves into highly efficient groups capable of attacking many others they had disdain for. This period laid the basis for understanding trolling behaviours, which were breaking out of the subcultures and corners of the internet and web, finding their way into the mainstream culture of the internet.

WEB 2.0 TROLLING: BECOMING MAINSTREAM

While Ankerson (2015: 1) stated the temporal move from Web 1.0 to Web 2.0 was not seamless, increased functionality of online software and growing access to the internet encouraged a growth in trolling activity. Researchers recognized trolling’s growing perverseness in moving from Usenet and virtual forums to social networks, online magazines and newspapers, blogs and web-based virtual worlds and computer games (Coles and West, 2016: 233). They sought answers for why trolling was occurring, drawing upon Suler’s (2004) disinhibition work. Research began recognizing new motivations for trolling, with Buckels et al. (2014: 97) arguing that research on trolling needed to diversify and explore motivations for it, incorporating sadism, narcissism, Machiavellianism and psychopathic factors. These explanations marked a shift in which trolling began to increasingly concern the public, with demands for severe actions to punish trolls.

Whitney Phillips, a scholar who studied trolling using ethnographic methods, described Web 2.0 as the period when internet trolling became mainstream. She argues that it reflects both the societal and technological changes that have caused trolling to become a mainstream activity (Phillips, 2015: 137):

As agents of cultural digestion, trolls are subject to and directly reflect shifting political, historical, and economic sands. It would stand to reasons, then, that changes in mainstream culture would result in corresponding changes in trolling subculture, a critical point when considering the differences between the emerging troll space of the mid-2000s, the established troll space of 2008–2011, and the scattershot troll space of 2012–2015.

Usenet required specific software and use of passwords, but new sites such as 4chan did not ask for such requirements. The web-based bulletin board of 4chan has been widely researched and attracts criticism from scholars as being a centre of organized trolling, with behaviours described as divorced from a moral hinge and joining a ‘hivemind’ where trolls could attack individuals (Manivannan, 2013: 122). However, it is only one of many online spaces where trolls now practise their pranking and disruption skills.

To illustrate the spread of trolling into the mainstream daily life of the internet, a number of important internet spaces are discussed. With these are examples of trolling that demonstrate this shift, and how trolling has become a practice that is almost taken-for-granted, only punctuated by occasional outrage that brings it back to public focus.

Internet Memes and Online Memorial Vandalism

Internet memes have become a widely used artefact to troll and prank online, but have also been used to convey messages about serious issues, beginning with the Occupy Wall Street movement of 2011. By contrast, trolls are seen by the public as villains, for example the trolling of online memorials for people and pets, referred to as vandalism. Memes are multimodal artefacts often created in response to events in popular culture for public comment (Milner, 2013: 2359) but over time may become ghoulish and attention seeking (Rintel, 2013: 266).

A popular image of trolls was created by Carlos Ramirez in 2008, called troll face. Shown as a bald head with large teeth and a grinning mouth, it was often placed over people’s faces to display humour or contempt. Burgess (2008: 9) states that video and image memes are deeply situated in the everyday use of the internet and are simply users performing mundane and creative traditions. In her study of the meme website LOLCats, Miltner (2014) argues that feline memes became a harmless participatory practice. Yet the meme can also cause hurt and offence if used with the intent to harm.

Trolls vandalizing online memorial sites provokes greater outrage among internet users and the public. The term ‘memorial vandalism’ officially emerged in 2010 to describe the provocative, insensitive and inflammatory text and photos posted on human and animal memorial websites and social media. Leaver (2013: 221) questions whether such practices are trolling, suggesting that the term trolling is being diluted and as such memorial vandalism is not a form of harm. Yet Marwick and Ellison (2012: 390–1) claim such practices are a serious form of trolling with an impact not seen before in trolling history.

Phillips (2011) argues that early memorial vandalism trolling came from 4chan, in a post making light of an attack by a killer whale on a Sea World trainer in 2010:

The date was 25 February 2010. That afternoon, Sea World trainer Dawn Brancheau was thrashed to death by Tillikum, a 12,000 pound killer whale, as the two performed in front of a live stadium audience (Martinez, 2010). Within minutes of Brancheau’s death, trolls began uploading macros onto /b/ featuring a homicidal whale (‘Killed the bitch cos she didn’t bring fishs’) as well as variations on Rule 34 (an unofficial rule of the internet declaring that whatever ‘it’ is, there is porn of it; if on the off-chance there isn’t, one is expected to promptly create or PhotoShop some).

Public outrage at such vandalism is often met with more posts from trolls making further fun of the tragedy. There have been arrests and incarceration in many countries for such behaviours, though court action is often prohibitively costly and is not a deterrent to trolls. Memes and online memorial vandalism trolling have become notorious as weapons trolls use to hurt and disrupt others on a large scale.

Social Media Trolling

Social media web-based platforms allow text, image and video posts to be shared with an unlimited audience. Although they require membership through the provision of an email address, fake accounts are abundant, and flaming, baiting and name-calling exist across all of them. Although privacy settings are offered, these are often not in bounded systems, so message content can be seen by anyone. Curran (2012: 56) argues that trolling has been caused by the fact that the internet has over time fostered a cumulative shift from values and beliefs that prioritize the collective good of the community, and of groups within it, to ones that give priority to the satisfaction of the needs, desires and aspirations of the individual.

What is significant is the move away from Donath’s (1999: 45) description of trolling being a ‘game’ about identity deception, to people exposing themselves by stating their real names. This disclosure has changed our understanding of trolls, for they are no longer

invisible, anonymous, under-the-bridge internet users. People seem willing on some social media sites to use their accounts for public trolling that can, and has, resulted in bans and defamation court cases. Their behaviours are like the Usenet trolls but now they have a face and a name.

Facebook and Twitter Trolling

Facebook and Twitter are frequently criticized as platforms for trolls. Facebook has widespread trolling on it, prevalent on pages for media news sites, celebrities, politicians and controversial human figures. Users will argue and fight with each other, leaving posts public. Pressure has been placed on Facebook to enact stricter punishments for trolling. This has proven to be difficult to consistently apply and often people who are not trolling are banned for publishing comments and photographs that were not intended to bait and disrupt others. Clearly, the decision as to what is trolling on Facebook, undertaken by several content-watching offices around the world, is subjective. Page owners have the ability to ban trolls from their sites, but it is the decision of Facebook to ban someone permanently.

Twitter was created in 2006 as a microblogging service where messages are called tweets, but its unique branding is that each tweet is limited to only 140 characters (boyd et al., 2010: 2). It has become controversial because of the prevalence of trolls and the failure of Twitter to manage them. The platform is also notorious for the number of anonymous accounts and fake profiles of personalities, as well as trolls posting about fake deaths of celebrities.

Trolling on Twitter can be sophisticated and organized, like Usenet groups. In 2016 mass trolling occurred that was even more efficiently organized and inflamed mob mentality behaviours. This behaviour is reinforced by the use of hash tags (#) and re-tweeting, where messages can be posted across accounts and, although it is a bounded system, public tweets can be seen by anyone. Examples of mass trolling during 2016 included: the United Kingdom’s vote to leave the European Union (Brexit), the 2016 Clinton–Trump US presidential campaign, continued opposition to marriage equality in the United States, and harassment of African American actor Leslie Jones, who acted in a remake of Ghostbusters, by conservative blogger Milo Yiannopoulos.

Another example was reported by Simon Hattenstone (2013) of The Guardian online newspaper site, who interviewed Caroline Criado-Perez, a feminist campaigner. She became a victim of Twitter trolls for her success in having a woman placed back on English banknotes. As she states:

Then there were the death threats. ‘One was from a really bright guy who said: “I’ve just got released from prison.”’ She shows me her phone: “I’d do a lot worse than rape you. I’ve just got out of prison and would happily do more time to see you berried [sic]. #10feetunder.” The tweet is signed Ayekayesa. There is another one, equally chilling. “I will find you, and you don’t want to know what I will do when I do. You’re pathetic. Kill yourself. Before I do. #Godie.”

She was successful in having some of the trolls jailed. Another reported Twitter trolling case was New Zealand model and television presenter Charlotte Dawson. It was reported that she was targeted because she posted on Twitter back at the trolls when they posted tweets such as ‘It’s a very good thing that you cannot breed’, and ‘please go hang yourself’ (The New Zealand Herald, 2012). Dawson’s case was unique because the trolling was considered a factor in her suicide, bringing greater awareness to the public about the consequences of trolling. The mainstream commercial media began extensively reporting and analysing trolling, often in a sensational manner.

Over time, trolling on social media has not only become more organized but is an industry in some countries, for example China and Russia. One example from Russia

involved hackers who, it was suggested, had been interfering in the country’s political system by spreading fake news on Twitter. They were investigated by Finnish journalist Jessikka Aro during 2014, resulting in her being trolled on her Twitter account. Interviewing Aro, Miller (2016: 4) describes the systematic, relentless and personal trolling Aro experienced:

Last spring someone sent her a text message pretending to be from her father – who died 20 years ago – telling her he was ‘watching her’. Another wrote a song, mocking her as a bimbo ‘James Bond’ NATO agent with a drug habit. There is even a music video online, with Aro portrayed by an actress in a leotard and wig. It would be funny if it wasn’t dripping with venom’.

Social media trolling has become mainstream because of the reporting of incidents such as Aro and Dawson. Trolls usually do prefer fake accounts but, unlike Usenet, Twitter and Facebook became significant for open trolling where full disclosure of people’s names occurred. The term trolling became generic rather than just being a dark part of the web only experienced by some users.

Video Sharing, Gaming and Mobile Computing Trolling

Trolling also occurs on video and knowledge-sharing platforms such as YouTube and Wikipedia and increasingly on smart phone apps such as dating sites. Web gaming sites and the virtual reality world Second Life, a sophisticated graphical version of Dibbell’s LambdaMOO, also experience trolling. These mainstream websites are used daily so trolling, while tolerated, does place pressure on the owners of these websites to regulate them against trolls.

YouTube, founded in 2005, shares videos posted by users and invites users to comment on them. Videos are posted by trolls showing their online exploits across the web and internet. However, YouTube’s user comments section is often unmoderated despite community guidelines that users who join agree to adhere to. What is concerning is that trolling can appear on any video unless users disable comments. Trolling comments are often used to cause arguments, although controversial issues, political figures, criminals, religion, celebrities and people’s sexuality are also frequently targeted by trolls. Trolls turn debates on YouTube into arguments that become hateful with often discriminatory remarks used to further inflame users who willingly participate in such debates.

A significant study of trolling was conducted by Shachaf and Hara (2010: 357) using Wikipedia, a collaborative encyclopedia, which illustrated the reasons why trolls practise disruptive trolling activities. The authors found that the Wikipedia trolls they interviewed were motivated by boredom and the need to find challenges, viewing the harm they did as entertainment. This study is frequently cited because the authors were able to interview a small but significant sample of trolls, giving insights into why people will troll a site, suggesting that pleasure in harming is part of the reason for trolling behaviours.

Web-based role-playing sites like Second Life, founded in 2003, have trolls that interrupt the residents’ activities. This is through sexual harassment of residents and posting text and images not allowed by the world’s owner. Krappitz (2012: 112) documented the activities of Second Life troll Ralph Pootawn, who gained fame by making videos of his trolling. Owners constantly banned him for watching residents having intimate encounters. Trolls who did this in virtual worlds came to be known as professional griefers. Second Life trolling is a more sophisticated version of Dibbell’s (1993) depiction of text-only trolling, but it is a more accepted and mainstream activity compared with the outrage that virtual world trolling previously caused.

Significant trolling now also occurs on smart phone apps. Dating sites such as Tinder, Grindr and OkCupid are especially targeted. Because of the immediacy of contact between members, people will often play with the

trolls through humorous conversations. In an Australian study of 357 Tinder users, March et al. (2017: 139) found that trolling on this app resulted from traits of psychopathy, sadism and dysfunctional impulsivity. In terms of troll history, such studies found a shift in society that needs further validation but is a reflection of changing societal values. Much trolling is associated with younger males. However, this study suggests an equalling of male and female trolling, with females’ growing access and skill in trolling activities being more frequently reported.

This section has presented the argument that trolling history has shifted from a bounded system activity to a public mainstream one. Griffiths (2014: 85) reflects that little has changed, as it appears trolling is an act of intentionally provoking and/or antagonizing users in an online environment that creates an often desirable, sometimes predictable, outcome for the troll. The difference is simply in the increase in types of, and numbers of, trolls that exist. However, added to this is how we define trolls and trolling, as these terms are applied to many forms of abuse web users experience in many areas of the web. Yet it is clear from the examination of the sample of web and internet trolling presented in this chapter that it has become a widespread issue that is still manifest, despite safeguards and policies that exist to protect web users.

CONCLUSIONS AND FUTURE SPECULATIONS ON TROLLING ACTIVITY

In this historical account of trolling I have given a chronological order of events through the pre-web, Web 1.0 and 2.0 eras: how trolling operates, what it looks like and how it has developed. Although such an account risks a biased view and leaves out many other events, as is the nature of historical accounts, it has demonstrated that trolling has developed into a sophisticated, organized and accessible phenomenon that grew out of bounded subcultures into a mainstream, taken-for-granted part of using the internet. Web 1.0 and 2.0 are representations of the shift the technology has allowed in supporting greater content creation and ease of use that has in some way encouraged trolling. However, the use of the word troll has developed a new meaning as it has morphed into those who simply like to argue and fight with each other online.

A rapid increase in research has contributed to our understanding of why trolls act as they do, from a range of perspectives. There are still calls for more research, as well as the need to develop theories of trolling. Trolling behaviours may stay consistent in intent – to inflame, upset and disrupt – yet the incidence and shape of them have changed. It is a mainstream phenomenon because the web technologies and platforms we use are integrated into our daily lives. Therefore, it is not just that technology has changed opportunities to troll; the term troll itself has become convoluted through its generic use to describe mere arguments between web users.

Speculation on the future history of web trolling lies in two possible areas: technology development and changing definitions of what trolling is. First, technology will continue to evolve. Will virtual reality be a space for trolling on a larger scale than the current virtual worlds? What if artificial intelligence develops a trolling side to it? Second, our perceptions of trolling and civility towards others will evolve over time. Will the insults of trolls have less importance as we become a more tolerant society no longer willing to let racism, sexism, homophobia and other issues cloud our interactions with others in our global community? Importantly, will trolls have further influence on our lives as opinion shapers when world events occur?

Calls to minimize and eliminate trolling will continue as governments consider harsher penalties for its practice. As we continue to refine our theoretical conceptions of trolling we must also use that knowledge to assist us

in making decisions as to how we can accomplish the elimination of it. This imperative is supported by the concerning ability for trolls to influence government and societal affairs. Trolling was once bounded within spaces, yet its flow into the mainstream has been to the web and internet user’s detriment. The internet will grow further; trolling will take on new characteristics, with users who are finding themselves the victim of trolling struggling to contain their influence. This is the price we pay for trolls coming into our mainstream culture, but with research and willingness to take action against them, it is not a battle that will necessarily be lost.

REFERENCES

Ankerson, M.S. (2015) ‘Social media and the “read-only” web: reconfiguring social logics and historical boundaries’, Social Media + Society, 1(2): 1–12.
Bartlett, J. (2014) The Dark Net: Inside the Digital Underworld. London: William Heinemann.
Bergstrom, K. (2011) ‘“Don’t feed the troll”: shutting down debate about community expectations on Reddit.com’, First Monday, 16(8) (http://firstmonday.org/article/view/3498/3029).
Bishop, J. (2013) ‘The effect of de-individuation of the internet troller on criminal procedure implementation: an interview with a hater’, International Journal of Cyber Criminology, 7(1): 28–48 (http://www.cybercrimejournal.com/Bishop2013janijcc.pdf).
Bishop, J. (2014a) ‘Dealing with internet trolling in political online communities: towards the this is why we can’t have nice things scale’, International Journal of E-Politics, 5(4): 1–20.
Bishop, J. (2014b) ‘Representations of “trolls” in mass media communication: a review of media-texts and moral panics relating to “internet trolling”’, International Journal of Web Based Communities, 10(1): 7–24.
boyd, D.M., Golder, S., and Lotan, G. (2010) ‘Tweet, tweet, retweet: conversational aspects of retweeting on Twitter’, HICSS-43, IEEE, Kauai, Hawaii (https://www.danah.org/papers/TweetTweetRetweet.pdf).
Buckels, E.E., Trapnell, P.D., and Paulhus, D.L. (2014) ‘Trolls just want to have fun’, Personality and Individual Differences, 67: 97–102.
Burgess, J.E. (2008) “‘All your chocolate rain are belong to us?” viral video, YouTube and the dynamics of participatory culture’, in G. Lovink and S. Niederer (Eds.), Video Vortex Reader: Responses to YouTube. Amsterdam: Institute of Network Cultures. pp. 101–109.
Coles, B.A., and West, M. (2016) ‘Trolling the trolls: online forum users constructions of the nature and properties of trolling’, Computers in Human Behavior, 60: 233–244.
Cormode, G., and Krishnamurthy, B. (2008) ‘Key differences between Web 1.0 and Web 2.0’, First Monday, 13(6) (http://firstmonday.org/article/view/2125/1972).
Curran, J. (2012) ‘Rethinking internet history’, in J. Curran, N. Fenton, and D. Freedman (Eds.), Misunderstanding the Internet. London: Routledge. pp. 34–65.
Dibbell, J. (1993) ‘A rape in cyberspace: or how an evil clown, a Haitian trickster spirit, two wizards and a cast of thousands turned a database into a society’, The Village Voice, 38(51): 26–42 (http://www.villagevoice.com/news/a-rape-in-cyberspace-6401665).
Donath, J.S. (1999) ‘Identity and deception in the virtual community’, in P. Kollock and M. Smith (Eds.), Communities in Cyberspace. London: Routledge. pp. 29–59.
Griffiths, M.D. (2014) ‘Adolescent trolling in online environments: a brief overview’, Education and Health, 32(3): 85–87.
Hardaker, C. (2013) ‘“Uh…..not to be nitpicky,,,,,but…the past tense of drag is dragged, not drug”: an overview of trolling strategies’, Journal of Language Aggression and Conflict, 1(1): 57–85.
Hattenstone, S. (2013) ‘Caroline Criado-Perez: “Twitter has enabled people to behave in a way they wouldn’t face to face”’. The Guardian (http://www.theguardian.com/lifeandstyle/2013/aug/04/caroline-criado-perez-twitterrape-threats).
Jane, E.A. (2012) “‘Your a ugly, whorish, slut” – understanding e-bile’, Feminist Media Studies, 14(4): 531–546.
Jane, E.A. (2015) ‘Flaming? what flaming? the pitfalls and potentials of researching online

hostility’, Ethics and Information Technology, 17(1): 65–87.
Krappitz, S. (2012) ‘Troll Culture’, Honours Thesis, Gestaltung, Kunst und Medien, Stuttgart (http://wwwwwwwww.at/downloads/troll-culture.pdf).
Kroeber, A.L., and Kluckhohn, C. (1952) Culture: A Critical Review of Concepts and Definitions. Cambridge, MA: Peabody Museum of American Archaeology and Ethnology, Harvard University.
Leaver, T. (2013) ‘Olympic trolls: mainstream memes and digital discord?’, Trolls and the Negative Space of the Internet: The Fibreculture Journal, 22: 216–233 (http://twentytwo.fibreculturejournal.org/fcj-163-olympic-trolls-mainstream-memes-and-digital-discord/).
Lee, X. (2016) ‘The one true history of meow’, Netiquette Anthropology (http://xahlee.info/Netiquette_dir/_/meow_wars.html).
Lindow, J. (2014) Trolls: An Unnatural History. London: Reaction Books.
Manivannan, V. (2013) ‘Tits or GTFO: the logics of misogyny on 4chan’s random – /b/’, Trolls and the Negative Space of the Internet: The Fibreculture Journal, 22: 108–131 (http://twentytwo.fibreculturejournal.org/fcj-158-tits-or-gtfo-the-logics-of-misogyny-on-4chans-random-b/).
March, E., Grieve, R., Marrington, J., and Jonason, P.K. (2017) ‘Trolling on Tinder® and other dating apps: examining the role of the dark tetrad and impulsivity’, Personality and Individual Differences, 110: 139–143.
Marwick, A., and Ellison, N.B. (2012) ‘“There isn’t wifi in heaven!” negotiating visibility on Facebook memorial pages’, Journal of Broadcasting and Electronic Media, 56(3): 378–400.
Miller, N. (2016) ‘Trolling the messenger in the name of propaganda’, The Sun Herald (http://www.smh.com.au/world/finnish-journalists-jessikka-aros-inquiry-into-russian-trolls-stirs-up-a-hornets-nest-20160310-gng8rk.html).
Milner, R.M. (2013) ‘Pop polyvocality: internet memes, public participation, and the Occupy Wall Street movement’, International Journal of Communication, 7: 2357–2390.
Miltner, K.M. (2014) ‘“There’s no place for lulz on LOLCats”: the role of genre, gender, and group identity in the interpretation and enjoyment of an internet meme’, First Monday, 19(8) (http://firstmonday.org/ojs/index.php/fm/article/view/5391/4103).
Mitra, A. (1996) ‘Nations and the internet: the case of a national newsgroup, “soc.cult.indian”’, Convergence: The International Journal of Research into New Media Technologies, 2(1): 44–75.
Naik, U., and Shivalingaiah, D. (2008) Comparative Study of Web 1.0, Web 2.0 and Web 3.0. Paper presented to the International CALIBER-2008 meeting University of Allahabad, Allahabad, India (http://ir.inflibnet.ac.in/handle/1944/1285).
Phillips, W. (2011) ‘LOLing at tragedy: Facebook trolls memorial pages and resistance to grief online’, First Monday, 16(12) (http://firstmonday.org/article/view/3168/3115).
Phillips, W. (2015) This Is Why We Can’t Have Nice Things: Mapping the Relationship between Online Trolling and Mainstream Culture. Cambridge, MA: The MIT Press.
Rheingold, H. (1993) The Virtual Community: Homesteading on the Electronic Frontier. New York: Harper Perennial.
Rintel, S. (2013) ‘Crisis memes: the importance of templatability to internet culture and freedom of expression’, Australasian Journal of Popular Culture, 2(2): 253–271.
Shachaf, P., and Hara, N. (2010) ‘Beyond vandalism: Wikipedia trolls’, Journal of Information Science, 36(3): 357–370.
Sindorf, S. (2013) ‘Symbolic violence in the online field: calls for “civility” in online discussion’, Trolls and the Negative Space of the Internet: The Fibreculture Journal, 22: 193–215 (http://twentytwo.fibreculturejournal.org/fcj-162-symbolic-violence-in-the-online-field-calls-for-civility-in-online-discussion/).
Smith, M.A. (1999) ‘Invisible crowds in cyberspace: mapping the social structure of the Usenet’, in P. Kollock and M. Smith (Eds.), Communities in Cyberspace. London: Routledge. pp. 195–219.
Spender, D. (1995) Nattering on the Net: Women, Power and Cyberspace. North Melbourne: Spinifex.
Suler, J. (2004) ‘The online disinhibition effect’, CyberPsychology and Behavior, 7(3): 321–326.

The New Zealand Herald. (2012) ‘Charlotte Dawson Hospitalised after Troll War’ (http://www.nzherald.co.nz/lifestyle/news/article.cfm?c_id=6&objectid=10830456).
The World Wide Web Consortium (W3C). (2004) Architecture of the World Wide Web, Volume One W3C recommendation 15 December 2004 (https://www.w3.org/TR/webarch/#acks).
PART VI

The Roads Ahead


40
Web Archives and (Digital)
History: A Troubled Past and a
Promising Future?
Jane Winters

INTRODUCTION

‘For more than four decades, the Internet has grown and spread to an extent where today it is an indispensable element in the communication and media environment of many countries, and indeed of everyday life, culture and society’ (Brügger et al., 2017: 1). So begins the introduction to the journal Internet Histories: Digital Technology, Culture and Society, launched in 2017. The World Wide Web, which unlocked the full potential of the Internet, has been with us for nearly 30 years; and in October 2016 the Internet Archive celebrated 20 years of capturing, preserving and republishing the Web (Hanamura, 2016). These are pleasingly round figures, indicating the passage of substantial time and the relative maturity both of the Web itself and the processes that have evolved to ensure that it is archived for the benefit of researchers. But those same researchers, and historians in particular, remain largely oblivious to the richness of the archived Web as a primary source for the study of the recent past, if not oblivious to the very existence of Web archives.1 This chapter will examine the reasons for historians’ relative failure to engage with the archived Web, and suggest why it is critical for contemporary, political and digital historians at least to do so. It will go on to explore the changing relationship between archivists, librarians and historians, which is beginning to break down researchers’ reluctance to work with born-digital materials and big data. Finally, it will propose an exciting future for (digital) historical research, which employs a combination of quantitative and qualitative approaches to recover the lives and voices of ordinary people.

HISTORIANS AND WEB ARCHIVES

The Web, like the newspapers that it now incorporates, contains material of interest for

every sub-discipline of history – politics, sport, finance, culture, food, fashion, conflict are all present in infinite varieties. Web archives, imperfect though they may be,2 reflect this range and diversity; there is something for everyone. But there are three (overlapping) groups for whom Web archives might be expected to hold immediate and particular interest: contemporary, political and digital historians. Why is there so little evidence that they are engaging with this new primary source, or indeed with a whole range of born-digital archives?

Contemporary History

According to Kandiah (2008), ‘the aim of contemporary history is to conceptualise, contextualise and historicise – to explain – some aspect of the recent past or to provide a historical understanding of current trends or developments’. Web archives are an invaluable lens through which to study life in the developed West in the late twentieth and early twenty-first centuries, but perhaps that past is still too recent, the digital apparently still too new. Weber (2017: 26) reports that ‘When I told people I was researching the history of the Web in early 1995, about half of them were amused: “But it’s too young to have a history!”’ The strong connection between the Web and journalism may also be a problem here. As Kandiah (2008) notes, ‘Critics of the discipline feared that contemporary history could … at best be nothing more than a form of journalism because its concerns were so closely rooted to the present’. In relation to Web archives, and the history of the Web, it is the desire to historicise that seems to be most dominant. A conference on ‘History and the Internet’ organised by History and Policy3 in December 2016, for example, included presentations on the Domesday Book as big data and parliament and print culture in the seventeenth century, but only limited treatment of the Internet itself. For historians who approach the Web in this way it is not necessary to engage with its archives, although arguably to do so would enrich their understandings of our most recent twentieth-century media revolution. It is to be hoped that this engagement will come with greater chronological distance, as the early technologies of the Internet become as unfamiliar as those of the printing press or the scriptorium.

Political History

The question of how to work with born-digital data is more pressing for political historians, some of whom, of course, would also think of themselves as working in the field of contemporary history. Governments have taken to the Web with marked enthusiasm and no little skill, as they seek to engage with, provide support for and learn more about their citizens. A 2016 United Nations report (Department of Economic and Social Affairs, 2016: 82), for example, identified the UK as world-leading in e-government, noting ‘a Whole-of-Government approach in online service delivery’. Data portals like data.gov.uk, open.canada.ca and data.gouv.fr are increasingly making the workings of government transparent, but they serve as an early warning to political historians that they will have to change how they work. In the sphere of government, the adoption of digital means of communication, both internally and externally, has been definitive and startlingly quick. Political historians will soon have little choice but to seek information from Web archives because government plays out on the Web. They ‘will need to transpose their long-established disciplinary skills and instincts into a digital register: asking the usual critical questions about their source material – how it was produced and why it has survived – and establishing a deep and rich set of contexts through which to interpret it’ (McCarthy, 2016).

For historians in both these and other fields, however, the key reason for failing

to use Web archives is the requirement to develop new skills, or to refresh old ones.4 This was an important, if not unexpected, finding of the Big UK Domain Data for the Arts and Humanities (BUDDAH) research project.5 The project case studies are revealing of the problems faced by researchers: ‘we do not have enough case studies or methodological literature to help us design this research’ (Millward, 2015: 10); ‘Keyword full-text searching as the standard methodology needs to be critically reconsidered’ (Deswarte, 2015: 9); archived Web pages bring ‘the challenge of defining the analytical object itself’ (Huc-Hepher, 2015a: 8). Some of the missing skills are technical ones – manipulating and cleaning large quantities of data is much easier if you have an understanding of Regular Expressions,6 for example – but others relate to historians’ (in)ability to work with statistics, to undertake even the most basic quantitative analysis. A lack of statistical understanding impedes both analysis at scale and the sampling that might facilitate closer reading and micro-analysis. The turn away from quantitative methods and approaches that has characterised much recent historical research and training has left historians singularly ill-equipped to deal with increasingly vast Web and other born-digital archives.7

Digital History

If a dearth of appropriate skills, and skills training, is hindering many historians from studying Web history, or from adding Web archives to their basket of primary sources, one might assume that this would not be the case for our third group: digital historians. But until very recently, the focus of even digital history has lain elsewhere. This lack of attention is both striking and, in my view, surprising. Digital humanities ‘has its origins in the research carried out … in textually focused computing’; it has diversified admirably in the second decade of the twenty-first century, but as recently as 2004 there could be no argument that ‘it remains deeply interested in text’ (Schreibman et al., 2004). Web archives contain vast quantities of text, but they are far removed from a digital scholarly edition or a corpus prepared for linguistic analysis.8 Digital history, by contrast, has no such unifying thread. It encompasses Geographic Information Systems (GIS) and approaches drawn from historical geography; significant elements of public history, as exemplified by the work of the Roy Rosenzweig Center for History and New Media; scholarly editing and textual scholarship; economic and social research, influenced by social science methodologies; prosopographical, biographical and genealogical investigation; and so on. Its proponents are interested in digital pedagogy, in scholarly communication, in big and small data, micro- and macro-analysis. All of these approaches and interests may be brought to bear on the historical Web, yet digital historians have generally displayed the same ‘tepid interest’ in Web archives and Internet histories as other humanities researchers (Weber, 2017: 27).

One explanation is that while digital history has embraced a range of historical sub-disciplines, and borrowed readily from cognate subjects like archaeology and historical geography, it has largely failed to take account of developments in two crucial areas: library, archive and information studies; and digital preservation. Libraries and archives have necessarily been at the forefront of Web archives research and practice: it is they who have been responsible for developing the tools and protocols to harvest the Web, for running Web crawls, for devising preservation tools and standards, for exploring how to document and search Web archives of varying size and scale. This is work that is discussed among the members of the International Internet Preservation Consortium (IIPC), but not among historians, digital or otherwise. Webster (2017) rightly notes that much of the debate about the

impact of ‘the transition from paper to digital in records management and archiving … is to be found in the journals of the archival profession, into which historians rarely look’. The boundaries between digital history and digital preservation are even more clearly delineated; as with the conservation of books and manuscripts, digital preservation may only be noticed when it has failed in some way.9 The first reaction of a historian on seeing an archived Web page is more likely to be ‘Why are those images missing?’ than ‘How has so much of this page been successfully preserved?’ The narrative of the ‘digital dark age’, which sometimes seems ubiquitous in the mainstream media, only persists because of a general lack of awareness of, and appreciation for, the scope of existing digital preservation work and expertise (Winters, 2017a: 45).

CHANGING TIMES?

But there are signs that this is beginning to change. In the UK, for example, the Arts and Humanities Research Council (AHRC) has funded two separate research networks which bring together historians, archivists, librarians and digital preservation specialists, among others, to discuss the challenges posed by collecting, preserving, publishing and using born-digital data of all kinds.10 The much larger European network RESAW (A Research Infrastructure for the Study of Archived Web Materials)11 similarly includes both humanities researchers, including several historians, and representatives of memory institutions with a responsibility for archiving the Web. More events are being organised which offer something to multiple sectors. There has generally been very little exchange of ideas and personnel between the ‘Digital history’ and ‘Archives and society’ seminars hosted by the Institute of Historical Research in London, for example, but in January 2017 a joint seminar was organised on ‘Sensitivity review and digital records’ (Seles, 2017). In June 2017, RESAW and the IIPC collaborated to run a conference which considered ‘Researchers, practitioners and their use of the archived Web’, highlighting the value and importance of cross-sectoral conversations. The strict separation of spheres that has obtained for so long is beginning to break down in the face of the challenges posed by born-digital data, and nowhere is this more apparent than in relation to Web archives.

MEDIATING ACCESS: HISTORIANS, LIBRARIANS, ARCHIVISTS AND THE ARCHIVED WEB

If some historians are still prone to confuse Web archives with archives of historical data which happen to have been published on the Web – ‘archive’ is not a particularly helpful term here, as Brügger (2016) has discussed – it does seem as though a turning-point has been reached. It no longer seems entirely fanciful to argue that we are moving towards the promising future of this chapter’s title; a future in which many different types of historian, not just those with an interest in contemporary politics or digital methods, can integrate Web archives into their research.

For most of those historians, early encounters with Web archives are likely to be mediated by archivists and librarians (another reason to ensure that disciplinary silos are breached). Given the scale of most Web archives,12 and the consequent limitations of keyword searching, curated special collections provide an easy and obvious route in to the data. They are also more likely to be openly available than the broad national crawls undertaken by libraries and archives on a statutory basis.

To date, the British Library has published 45 special collections around themes which have been deemed ‘useful and interesting’ by curators.13 They immediately showcase both

the chronological span of the archive and the range of human activity represented within it. One of the earliest special collections is concerned with the terrorist attacks that took place in London on 7 July 2005, which killed 52 people and injured more than 700; the most recent deals with the UK general election of 2015. A special collection capturing the 2016 EU referendum debate is in preparation, and will soon be made available to researchers (Kunze, 2016). There is plenty of material here for political historians: a series of UK general elections from 2005 to 2015; the Scottish parliamentary election of 2007 and the Scottish Independence referendum of 2014; the London mayoral election of 2008 (although not those of 2012 or 2016); the European Parliament election of 2009; the Credit Crunch, 2008–10. Other areas of strength include sport (the Commonwealth Games held in Glasgow in 2014, the London Olympic and Paralympic Games of 2012), anniversaries of national significance (the 200th anniversary of the birth of Charles Darwin, Queen Elizabeth II’s Diamond Jubilee in 2012, the centenary of the Easter Rising in 2016), health (collections dedicated to mental health, personal experiences of illness, the 2012 Health and Social Care Act and even pandemic influenza outbreaks since 2005) and religion (a general collection on religion, politics and law since 2005 and more specific ones concerned with the Quakers and the Free Churches).

These special collections are enormously rich and diverse, but that very eclecticism poses something of a problem for Web history and historians. Why, for example, are there collections relating to Cornwall and Hampshire but to no other counties in the UK? What was it about the Cambridge Network, ‘a membership organisation based in the vibrant high technology cluster of Cambridge’, which led to its being singled out in this way? Why have 26 websites concerned with e-publishing trends been given special status alongside those dealing with national politics? Clearly, certain types of predictable event generate special collections in the Web archive – elections, anniversaries, major sporting occasions; other clusters are responses to the unexpected, to natural disasters like the Indian Ocean tsunami in December 2004 or to terrorism. This latter trend is also apparent in the collections developed through Archive-It, which is described as ‘The leading web archiving service for collecting and accessing cultural heritage on the web’.14 Archive-It involves more than 400 institutions in 16 countries who between them have curated more than 4,000 special collections. Of these, 178 (4.25 per cent) are categorised as arising from ‘Spontaneous events’, and while the first collection on the list is an archive of the 100,000 Poets for Change website,15 many concern shocking and more or less unpredictable events, from the 2013 Boston Marathon bombing to Hurricane Katrina.16 Politics looms large too: for example, 412 collections (9.83 per cent) are categorised as relating to ‘Government’, and there are numerous smaller and perhaps overlapping clusters concerned with particular elections or states in the United States.17

In both of these instances, the special collections serve an important role in illuminating the wider Web archives from which they are derived, respectively those of the British Library and the Internet Archive. They act as a shop window for archives that are challenging to encounter at scale, encouraging initial browsing which might then lead on to more in-depth analysis and research. This necessarily imbues them with an importance that may not always have been considered by those responsible for their creation. A particular collection almost certainly has enormous value for the curator(s), and for the many others who will explore it in years to come, but what does it say about the shape and significance of the wider Web archive? The British Library’s remit to preserve and make available the UK’s intellectual and cultural heritage is apparent in the Web archive collections that deal with significant and/or traumatic events, but others are suggestive of

personal interest and enthusiasm or a serendipitous partnership.18 This is even more the case for Archive-It, where some collections have been curated in specific teaching contexts, for example.19 This is still an evolving landscape, and experimentation is entirely appropriate, but there is a risk that these early experiments may begin to ‘fix’ a particular view of Web archives and the kinds of historical research for which they are most suitable. This is particularly true if access to the larger archives remains restricted, for legal, technical or other reasons. The choices that are made now could resonate for decades to come, and some of the consequences might be unintended: as Schwartz and Cook (2002: 2) note, ‘Archives – as records – wield power over the shape and direction of historical scholarship, collective memory, and national identity, over how we know ourselves as individuals, groups and societies’, and archivists have the power to shape how we access those records.

THE HISTORIAN AS ARCHIVIST

If librarians and archivists will play an important role in determining not just what is included in Web archives but how that archived material is used by historians, even the kinds of questions that they will ask, there is also considerable scope for historians to take personal responsibility. Web archiving at scale is a highly technical process, requiring investment in expertise and equipment, but researchers can build their own collections. This tendency towards personal archiving has been present from the very earliest days of Web history. Brügger, for example, noted in 2010 that his personal Web archive ‘contains a substantial part of the Danish web activity in relation to the Olympic Games in 2000’ (2010: 351). There are a number of options available to researchers who would like to assume some control over the archiving process. They might, for example, choose to collaborate with archivists and programmers, as described in Milligan et al. (2016); or they might investigate the various open-source Web archiving tools that have been developed, like Warcbase and Wget.20 Milligan in particular has shown what can be achieved when historians embrace these approaches and participate in the creation as well as the analysis of Web archives (2012; 2017a). Both of these approaches to Web archiving, however, require a degree of technical expertise with which many historians are, and are likely to remain, uncomfortable. Fortunately, there are alternatives that do not involve such a steep technical learning curve, notably Webrecorder.21

Webrecorder allows anyone ‘to create high fidelity, context rich and interactive archives of the dynamic web’ (Espenschied, 2016). The resulting collections of WARC files are not just personal but personalised: the pages are captured as the researcher moves through a website, keeping a record of her chosen pathway. This personalisation even extends to the faithful recording of the ‘logged-in’ experience on online social media. A shallow hierarchical structure is present in these archives, with ‘sessions’ organised into collections, and simple descriptions can be added to aid future navigation and discoverability. All of this functionality is available in the browser, but the archived files can also be downloaded and replayed offline using a desktop app, the Webrecorder Player.22 This is a very different approach to Web archiving from the comprehensive full domain crawl undertaken by large memory institutions, one which supports a micro-level approach both to the harvesting and study of the Web. Archiving a website of any size, for example that of a major public broadcaster, would be difficult and time-consuming using this method, with all links having to be followed to ensure their successful capture. The collections that can be created in this way will remain relatively small, and focused narrowly on a researcher’s interests and experiences. In some instances they will record and reflect her personal social and research
networks online, providing some of the ethnographic context that is missing from the larger automated Web crawls. There are layers of interest and value in such archives, which can only be fully realised if they are shared, with other researchers and perhaps ultimately with libraries and archives.

CAPTURING THE VOICE OF THE INDIVIDUAL

This focus on the small, on the personal, is one possible future for Web history; and one which reflects growing interest in the value of digital storytelling (see, for example, Burgess, 2006; Coleborne and Bliss, 2011). Web archives, which record so many different voices, hold out the promise of a new golden age of history from below.23 It is not true to say that anyone can create online content which might find its way into a Web archive – certain groups are still privileged, depending on education, age, social class, geographical location, and so on24 – but there are unprecedented opportunities to self-publish, to comment on the publications of others. The sheer diversity of authors who may be represented in Web archives is highlighted by a Blogs special collection in the UK Web Archive: among the 763 blogs included, alongside those of politicians and protest groups, are ‘Alan in Belfast’, who writes about ‘cinema, books, technology, and the occasional rant about life’; ‘Nelly’s Garden’, which presents the thoughts of Nelly Culleybackey from County Antrim; and ‘Lizzy’s Literary Life’, ‘celebrating the pleasures of a 21st century bookworm’.25 Ian Milligan’s work on the GeoCities archive has begun to recover the voices of the children who were active participants in the early Web, and specifically in the Enchanted Forest ‘neighbourhood’ (Milligan, 2017a and 2017b); Megan Dougherty has shone a spotlight on the brief flourishing of the subcultural Islamic punk movement in North America from the early 2000s (Dougherty, 2017); and Saskia Huc-Hepher has explored the histories of French communities in London through their archived blogs (Huc-Hepher, 2015b). But this volume and variety poses challenges for historians. How do we make sure that we find individual voices among all the noise? How can we judge the significance of what they are saying when any and all points of view will have been captured, but we have little or no data indicative of circulation or popularity?

BIG HISTORY AND THE MACROSCOPE

Alongside this renewed emphasis on close reading, the current abundance of digital data has led to calls for a return to ‘big’ history and the longue durée (Guldi and Armitage, 2014), for the adoption of the distant reading approach first proposed by Franco Moretti for the study of digitised literary corpora (Moretti, 2013). Graham et al. (2015) argue for historians to embrace big data – one chapter is even titled ‘The joys of big data for historians’. The authors are among an increasing number of historians to apply the concept of the macroscope (see, for example, Jockers, 2013; Hitchcock, 2014), which ‘instead of allowing you to see things that are small or far away … makes it easier to grasp the incredibly large. It does so through a process of compression, by selectively reducing complexity until once-obscure patterns and relationships become clear’ (Graham et al., 2015: 15). This is a far more nuanced approach than Culturomics, which more or less completely failed to take account of humanities research practices and concerns (Michel et al., 2011). The proponents of the macro-historical are not suggesting that the micro-historical should be abandoned in the face of the data deluge, rather that there is value in considering both; and, indeed, the interactions between the two can produce new insights and interpretations.

THE SMALL AND THE LARGE: A QUESTION OF SCALE

As is so often the case, the most promising approaches for historical research in this field bring together the small and the large, ‘the general and the particular’ (Manovich, 2016). The challenge is for historians to find new ways of working without losing the emphasis on the individual that has long distinguished humanities research. Using the framework of cultural analytics, Lev Manovich proposes that ‘we may combine the concern of social science, and sciences in general, with the general and the regular, and the concern of humanities with individual and particular … analysing massive datasets to zoom in on the unique items’ (Manovich, 2016). Tim Hitchcock argues that, in contrast with ‘the categories of knowing that dominated the nineteenth and twentieth centuries’, born-digital data, and Web archives in particular, provide ‘an opportunity to re-think what is possible, and to re-think what it is we are asking; how we might ask it, and to what purpose’; but while historians must ‘be able to wield the tools of large-scale visualization … we need to do so at the same time as we preserve the values and practices that underpin traditional academic history, while going beyond the standards of scholarship we have inherited’ (Hitchcock, 2015; 2013: 20).

A PROMISING FUTURE?

These are provocative calls to action, which speak to an exciting future for historical research – one which is predicated on the ever-increasing availability of digital sources and the development, and widespread adoption, of innovative digital methods that build on the best traditions of humanistic exploration. It is, however, an exciting future that has been invoked before in the current life-cycle of digital history. A report on the impact of digital resources published in 2011, for example, considered whether ‘they simply make research easier, cheaper (to the researcher), more convenient, and less time consuming, or whether there is evidence that they open up new avenues for research’. The story here is one of enormous potential: researchers are perceived to be developing new methods and even theories, but ‘How these perceived changes result in new research questions across the humanities in the long run is still emerging’ (Meyer, 2011: 39). Perhaps it is unfair to consider 16 years as ‘the long run’, but this description of humanities research, and digital history, would not seem out of place today. The promise of transformation is still tantalisingly there, but how close is it to being realised?

Web archives, and other kinds of born-digital data, do bring the possibility of, and perhaps even necessitate, a radical reframing of digital history – through their scale, their heterogeneity, their complexity, their fragility. Some historians will undoubtedly continue to focus solely on the textual elements of the archived web, abstracting words from their rich digital context. But others will work with text, sound, and still and moving images in the round. In doing so they might engage with the history of art and design, media and communication studies, the history of technology, linguistics, film studies – and with other researchers in those fields. They might move across the boundaries of the (social) sciences, arts and humanities, learning new skills themselves or building partnerships. They might, as Marc Weber urges, turn their attention to the materiality of the Web, learning from the work of museums.26 They might do some or all of these things in combination. And they might, by combining big data approaches with humanistic understandings, at last begin to develop genuinely new research questions and generate new knowledge.

It is vitally important that historians, digital or otherwise, should carve out a space for themselves in the study of both the live and
the historical Web, especially where these are conceived first and foremost as big data. The individual, the human, risks becoming lost in the face of arguments that ‘Causality … is being knocked off its pedestal as the primary foundation of meaning. Big data turbocharges non-causal analyses, often replacing causal investigations’ (Mayer-Schönberger and Cukier, 2017: 66); or claims that ‘Culturomic results are a new type of evidence in the humanities’ (Michel et al., 2011: 181). Large-scale trends are, of course, enormously important to understand, as Mayer-Schönberger and Cukier demonstrate, but historians are very well placed to combine an appreciation of broader patterns and movements with a forensic understanding of the small-scale and the human – of ordinary lives not just data points. And the archives of the Web are a unique record of many millions of ordinary lives, alongside histories of celebrities, institutions and nations. As yet those archives remain largely unexplored, but there are many Web histories to be uncovered, many pathways to be explored and many new questions to be asked.

Notes

1  A 2010 study into the ‘state of the art’ noted the gap ‘between the potential community of researchers who have good reason to engage with creating, using, analysing and sharing web archives, and the actual (generally still small) community of researchers currently doing so’ (Dougherty et al., 2010: 5).
2  The problems of working with Web archives have been well rehearsed by, among others, Brügger (2012a and 2012b), Brügger and Finnemann (2013), Milligan (2016) and Winters (2017b).
3  History and Policy is a UK network of historians which ‘promotes better public policy through a greater understanding of history’ (http://www.historyandpolicy.org/).
4  I am unconvinced by the argument that historians are reluctant to use Web archives because they are concerned ‘that they will not be able to replicate their historical research process when using web archives, and may not find essential and authoritative records’ (Belovari, 2017: 60). This seems to me simultaneously to underestimate the variety of humanities research methodologies, the fuzziness of concepts like ‘essential’ and ‘authoritative’, and the ability of historians to navigate uncertainty and devise new approaches.
5  BUDDAH was funded by the Arts and Humanities Research Council (AHRC) as part of its Digital Transformations in the Arts and Humanities theme (https://buddah.projects.history.ac.uk/; grant reference AH/L009854/1). Cowls (2016) usefully summarises the progress made and challenges faced by the ten researchers who were awarded bursaries to use Web archives in their research.
6  ‘Regular expressions … are a way of defining patterns that can apply to sequences of things … and they are incorporated into most general programming languages’ (Knox, 2013).
7  There are many reasons for the current lack of interest in, even distrust of, quantitative methods among historians, but at least some of it may be traced back to the overblown claims made for the ‘Cliometrics Revolution’ of the 1960s and 1970s. The fall from grace experienced by Cliometrics was dramatic, and echoes of it may be seen in the heated debates that followed the publication of Michel et al. (2011), and reactions to ‘Culturomics’. For a fuller discussion of these issues, see Winters (2018).
8  Compare, for example, the order and structure of the Corpus of Global Web-Based English (Davies, 2014) with the disorder and relative chaos of the UK Web Archive.
9  In the UK, the most resonant example is still that of the BBC Domesday project. This ambitious initiative to create a comprehensive digital record of life in Britain, 900 years after the compilation of the original Domesday Book, was an early and very high-profile digital preservation failure. It cost £2.5 million, but the special BBC Micro computers required to read the 12-inch video disks rapidly became obsolete and the data was inaccessible just 15 years after it was collected (McKie and Thorpe, 2002). The contrast with its medieval – analogue – predecessor could not have been greater. What is much less remarked is that there were a number of efforts to recover the data, and indeed in 2011 the contents of the ‘Community’ disk were published online as part of Domesday Reloaded. In another failure of digital preservation, this recovered data was until recently accessible on the live web at http://www.bbc.co.uk/history/domesday, but at the time of writing it is only partially available via the Internet Archive.
10  The two interdisciplinary networks are: Born-digital Big Data and Approaches for History and the Humanities (https://borndigitaldata.blogs.sas.ac.uk/; grant reference AH/N006178/1) and Record DNA (https://recorddna.wordpress.com/; grant reference AH/P006868/1).
11  RESAW (http://resaw.eu/).
12  There are some exceptions to this. At the time of writing, the UK Parliament Web Archive (http://webarchive.parliament.uk/) only includes around 37 websites. Individually, some of these might be very large (the Hansard Archive from 1803 is one example), but the collection as a whole remains manageable and susceptible to qualitative analysis. By contrast, the full domain crawl of the .uk country code Top Level Domain (ccTLD) undertaken by the British Library in 2014 captured 2.5 billion web pages and other assets (Hartmann, 2015).
13  UK Web Archive: Browse by Special Collection (https://www.webarchive.org.uk/ukwa/collection/).
14  Archive-It (https://archive-it.org/).
15  100,000 Poets (https://archive-it.org/collections/2845).
16  There are, in fact, three special collections dealing with the Boston Marathon bombing: Blasts in Boston Marathon (https://archive-it.org/collections/3752); 2013 Boston Marathon Bombing (https://archive-it.org/collections/3649); and Boston Marathon Bombing: Twitter and RSS Feeds (https://archive-it.org/collections/3645). Hurricane Katrina warrants two: Hurricane Katrina (https://archive-it.org/collections/174); and Hurricane Katrina Blogs Web Collection (https://archive-it.org/collections/7625). This multiplicity of collections relating to single events is an additional problem for the historian, as the relationship between them is unclear. In the Boston example, two have been curated by the Virginia Tech: Crisis, Tragedy, and Recovery Network, so there is presumably no overlap, but what of the collection curated by Internet Archive Global Events? Detailed comparison is required to establish the ways in which these three collections differ from or are similar to each other.
17  Government special collection (https://archive-it.org/explore?show=Collections&fc=meta_Subject%3AGovernment). Examples of smaller collections on a similar theme are Government – US Federal (https://archive-it.org/explore?show=Collections&fc=meta_Subject%3AGovernment-usfederal) and SF [San Francisco] Government (https://archive-it.org/explore?show=Collections&fc=meta_Subject%3ASF+Government).
18  This is explicitly the case with the Live Art collection, which has been produced in partnership with the Live Art Development Agency in London (https://www.webarchive.org.uk/ukwa/collection/26312782/page/1).
19  See, for example, the K-12 Web Archiving Program in the United States, which at the time of writing has generated more than 300 student collections (https://archive-it.org/k12/). I owe this reference to Ian Milligan.
20  Warcbase (https://github.com/lintool/warcbase) has been developed by Jimmy Lin at the University of Waterloo; Wget (https://www.gnu.org/software/wget/) is part of the GNU Operating System.
21  Webrecorder (https://webrecorder.io/). Webrecorder has been developed by Ilya Kreymer at Rhizome, with funding from the Andrew W. Mellon Foundation.
22  Webrecorder Player 1.0.5 (https://github.com/webrecorder/webrecorderplayer-electron/releases/tag/v1.0.5).
23  See Taylor (1997), for a discussion of history from below and modern British social history.
24  Just in the UK, a 2017 survey of digital skills published by the Tech Partnership, in association with Lloyds Banking Group, found that ‘21% (11.5m) of the UK are classified as not having Basic Digital Skills’. A heatmap of digital exclusion derived from the survey data reveals significant geographical variation (http://heatmap.thetechpartnership.com/). Even where relatively advanced digital skills are present, there are a range of other factors influencing whether or not people choose to participate in the creation of online content (see, e.g., Hargittai and Walejko (2008); Blank (2013)).
25  Alan in Belfast (https://www.webarchive.org.uk/ukwa/target/18710548/collection/100698/source/collection); Nelly’s Garden (https://www.webarchive.org.uk/ukwa/target/7176221/collection/100698/source/collection); Lizzy’s Literary Life (https://www.webarchive.org.uk/ukwa/target/65208425/collection/100698/source/collection).
26  Weber sees ‘the history of the online world as not just about the Web itself, or networks like the Internet, or computers, but as all of these within the long tradition of tools we have created for sharing and refining information: books and clay tablets and talking drums and more’ (Weber, 2017: 34). Web90 – Patrimoine, mémoires et histoire du Web dans les années 1990, at L’Institut des sciences de la communication du CNRS, is an exemplary project in this respect (http://web90.hypotheses.org/).

REFERENCES

Belovari, S. (2017) ‘Historians and Web archives’, Archivaria, 83: 59–79.
Blank, G. (2013) ‘Who creates content? Stratification and content creation on the Internet’, Information, Communication and Society, 16(4): 590–612. DOI: 10.1080/1369118X.2013.777758.
Brügger, N. (2016) ‘Webraries and Web archives – the Web between public and private’, in D. Baker and W. Evans (eds), The End of Wisdom? The Future of Libraries in a Digital Age. Oxford: Chandos Publishing. pp. 185–90.
Brügger, N. (2012a) ‘Web history and the Web as a historical source’, Zeithistorische Forschungen, 9(2): 316–25.
Brügger, N. (2012b) ‘When the present Web is the later past: Web historiography, digital history, and Internet studies’, Historical Social Research/Historische Sozialforschung, 37(4): 102–17.
Brügger, N. (ed) (2010) Web History. Bern, Switzerland: Peter Lang US.
Brügger, N. and Finnemann, N. O. (2013) ‘The Web and digital humanities: theoretical and methodological concerns’, Journal of Broadcasting and Electronic Media, 57(1): 66–80. DOI: 10.1080/08838151.2012.761699.
Brügger, N., Goggin, G., Milligan, I. and Schafer, V. (2017) ‘Introduction: Internet histories’, Internet Histories: Digital Technology, Culture and Society, 1(1–2): 1–7. DOI: 10.1080/24701475.2017.1317128.
Burgess, J. (2006) ‘Hearing ordinary voices: cultural studies, vernacular creativity and digital storytelling’, Continuum: Journal of Media and Cultural Studies, 20(2): 201–14. DOI: 10.1080/10304310600641737.
Coleborne, C. and Bliss, E. (2011) ‘Emotions, digital tools and public histories: digital storytelling using Windows Movie Maker in the history tertiary classroom’, History Compass, 9(9): 674–85. DOI: 10.1111/j.1478-0542.2011.00797.x.
Cowls, J. (2016) ‘Cultures of the UK Web’, in N. Brügger and R. Schroeder (eds), The Web as History: Using Web Archives to Understand the Past and the Present. London: UCL Press. pp. 220–37.
Davies, M. (2014) Corpus of Global Web-Based English (GloWbE) (https://corpus.byu.edu/glowbe/).
Department of Economic and Social Affairs (2016) United Nations E-Government Survey 2016: E-Government in Support of Sustainable Development. New York: United Nations (http://workspace.unpan.org/sites/Internet/Documents/UNPAN97453.pdf).
Deswarte, R. (2015) Revealing British Euroscepticism in the UK Web domain and archive. London: School of Advanced Study. pp. 1–10 (http://sas-space.sas.ac.uk/6103/1/Deswarte%20case%20study.pdf).
Dougherty, M. (2017) ‘“Taqwacore is dead. Long live Taqwacore” or punk’s not dead? Studying the online evolution of the Islamic punk scene’, in N. Brügger and R. Schroeder (eds), The Web as History: Using Web Archives to Understand the Past and the Present. London: UCL Press. pp. 204–19.
Dougherty, M., Meyer, E. T., Madsen, C. M., van den Heuvel, C., Thomas, A. and Wyatt, S. (2010) Researcher Engagement with Web Archives: State of the Art. Jisc Report (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1714997).
Espenschied, D. (2016) Introduction to Webrecorder (https://www.youtube.com/watch?v=n3SqusABXEk&feature=youtu.be).
Graham, S., Milligan, I. and Weingart, S. (2015) Exploring Big Historical Data: The Historian’s Macroscope. London: Imperial College Press (http://www.themacroscope.org/2.0/).
Guldi, J. and Armitage, D. (2014) The History Manifesto. Cambridge: Cambridge University Press (https://www.cambridge.org/core/books/the-history-manifesto/AC1A1EC711AE91A4F9004E7582D79AFD#).
Hanamura, W. (2016) The Internet Archive turns 20! Internet Archive Blogs (https://blog.archive.org/2016/09/19/the-internet-archive-turns-20/).
Hargittai, E. and Walejko, G. (2008) ‘The participation divide: content creation and sharing in the digital age’, Information, Communication and Society, 11(2): 239–56. DOI: 10.1080/13691180801946150.
Hartmann, S. (2015) 2015 UK domain crawl has started. UK Web Archive blog (http://blogs.bl.uk/webarchive/2015/09/2015-uk-domain-crawl-has-started.html).
Hitchcock, T. (2015) The UK Web Archive, born-digital sources, and rethinking the future of research. Web Archives for Historians (https://webarchivehistorians.org/tag/tim-hitchcock/).
Hitchcock, T. (2014) Big data, small data and meaning. Historyonics blog (http://historyonics.blogspot.co.uk/2014/11/big-data-small-data-and-meaning_9.html).
Hitchcock, T. (2013) ‘Confronting the digital: or how academic history writing lost the plot’, Cultural and Social History, 10(1): 9–23. DOI: 10.2752/147800413X13515292098070.
Huc-Hepher, S. (2015a) Searching for home in the historic Web: an ethnosemiotic study of the London-French habitus as displayed in blogs. London: School of Advanced Study. pp. 1–27. (http://sas-space.sas.ac.uk/6252/1/Huc-Hepher%20case%20study.pdf).
Huc-Hepher, S. (2015b) ‘Big Web data, small focus: an ethnosemiotic approach to culturally themed selective Web archiving’, Big Data & Society, 2(2) (http://journals.sagepub.com/doi/abs/10.1177/2053951715595823).
Jockers, M. L. (2013) Macroanalysis: Digital Methods and Literary History. Urbana-Champaign: University of Illinois Press.
Kandiah, M. D. (2008) Contemporary history. Making History (http://www.history.ac.uk/makinghistory/resources/articles/contemporary_history.html).
Knox, D. (2013) Understanding Regular Expressions. The Programming Historian (https://programminghistorian.org/lessons/understanding-regular-expressions).
Kunze, S. (2016) Capturing and preserving the EU Referendum debate (Brexit). UK Web Archive blog (http://blogs.bl.uk/webarchive/2016/06/capturing-and-preserving-the-eu-referendum-debate-brexit.html).
McCarthy, H. (2016) Political history in the digital age: the challenges of archiving and analysing born digital sources. Impact of Social Sciences – LSE Blogs (http://eprints.lse.ac.uk/66690/).
McKie, R. and Thorpe, V. (2002) Digital Domesday Book lasts 15 years not 1000. The Guardian (https://www.theguardian.com/uk/2002/mar/03/research.elearning).
Manovich, L. (2016) ‘The science of culture? Social computing, digital humanities and cultural analytics’, Journal of Cultural Analytics. DOI: 10.22148/16.004 (http://culturalanalytics.org/2016/05/the-science-of-culture-social-computing-digital-humanities-and-cultural-analytics/).
Mayer-Schönberger, V. and Cukier, K. (2017) Big Data: The Essential Guide to Work, Life and Learning in the Age of Insight. London: John Murray (Publishers).
Meyer, E. T. (2011) Splashes and Ripples: Synthesizing the Evidence on the Impacts of Digital Resources. Jisc Report (https://ssrn.com/abstract=1846535).
Michel, J.-B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., The Google Books Team, Pickett, J. P., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., Pinker, S., Nowak, M. A. and Aiden, E. L. (2011) ‘Quantitative analysis of culture using millions of digitized books’, Science, 331(6014): 176–82. DOI: 10.1126/science.1199644.
Milligan, I. (2017a) ‘Welcome to the Web: the online community of GeoCities during the early years of the World Wide Web’, in N. Brügger and R. Schroeder (eds), The Web as History: Using Web Archives to Understand the Past and the Present. London: UCL Press. pp. 137–58.
Milligan, I. (2017b) ‘Pages by kids, for kids’: unlocking childhood and youth history through the GeoCities Web archive. Researchers, practitioners and their use of the archived Web, International Internet Preservation Consortium/RESAW Conference, London.
Milligan, I. (2016) ‘Lost in the infinite archive: the promise and pitfalls of web archives’, International Journal of Humanities and Arts Computing, 10(1): 78–94. DOI: 10.3366/ijhac.2016.0161.
Milligan, I. (2012) ‘Mining the “Internet graveyard”: rethinking the historians’ toolkit’, Journal of the Canadian Historical Association/Revue de la Société historique du Canada, 23(2): 21–64. DOI: 10.7202/1015788ar.
Milligan, I., Ruest, N. and St. Onge, A. (2016) ‘The great WARC adventure: using SIPS, AIPS, and DIPS to document SLAPPS’, Digital Studies/Le champ numérique, 2015–16 open issue (http://www.digitalstudies.org/ojs/index.php/digital_studies/article/view/325/412).
Millward, G. (2015) Digital barriers and the accessible web: disabled people, information and the internet. London: School of Advanced Study. pp. 1–12 (http://sas-space.sas.ac.uk/6104/1/Millward%20case%20study.pdf).
Moretti, F. (2013) Distant Reading. London: Verso.
Schreibman, S., Siemens, R. and Unsworth, J. (2004) ‘The digital humanities and humanities computing: an introduction’, in S. Schreibman, R. Siemens and J. Unsworth (eds), A Companion to Digital Humanities. Oxford: Blackwell (http://www.digitalhumanities.org/companion/).
Schwartz, J. M. and Cook, T. (2002) ‘Archives, records, and power: the making of modern memory’, Archival Science, 2(1–2): 1–19. DOI: 10.1007/BF02435628.
Seles, M. (2017) ‘It always seems impossible until it is done’: sensitivity review and digital records. Archives & Society and Digital History seminars (https://www.youtube.com/watch?v=ncdbLdshheI).
Taylor, M. (1997) ‘The beginnings of modern social British history?’, History Workshop Journal, 43: 155–76.
Tech Partnership (2017) Basic digital skills UK 2017: summary of findings (https://www.thetechpartnership.com/globalassets/pdfs/basic-digital-skills-standards/basicdigital-skills2016_findingssummary.pdf).
Weber, M. (2017) ‘A common language’, Internet Histories: Digital Technology, Culture and Society, 1(1–2): 26–38. DOI: 10.1080/24701475.2017.1317118.
Webster, P. (2017) Book review: The Silence of the Archive by David Thomas, Simon Fowler and Valerie Johnson. Review of Books – LSE Blogs (http://blogs.lse.ac.uk/lsereviewofbooks/2017/08/11/book-review-the-silence-of-the-archive-by-david-thomas-simon-fowler-and-valerie-johnson/).
Winters, J. (2018) ‘Digital history’, in P. Burke and M. Tamm (eds), Debating New Approaches in History. London: Bloomsbury Publishing. pp. 277–300.
Winters, J. (2017a) ‘Will history survive the digital age?’, BBC History Magazine, 3: 41–5.
Winters, J. (2017b) ‘Coda: Web archives for humanities research – some reflections’, in N. Brügger and R. Schroeder (eds), The Web as History: Using Web Archives to Understand the Past and the Present. London: UCL Press. pp. 238–48.
Index
Note: Page numbers in italics indicate tables and figures. Page numbers followed by n indicate end-of-chapter notes.

#iranelection, 48

2-channel, Japan, 543, 546
4chan, 578, 582, 583
9/11, 364, 482
9/11 web collection, 44, 51
30-year rule, 474
56.com, 379
404 file not found, 44, 53n
1990, historical context of, 392–3

Aarhus University, 54n
A/B testing, 340
Abbate, Janet, 73, 79, 80
abortion, 487
absence, 492, 497, 501
abundance, 113, 115, 118–21, 345–6
Academia.edu, 379
Accept-Datetime, 106, 195, 201
Ackland, R., 487
actor-network theory (ANT), 75, 78
ad networks, 334–6
ad servers, 339
Adamic, L.A., 172–3
AdSense, 337
Advanced Audio Coding (AAC), 501
Advanced Research Projects Agency (ARPA), 217, 218, 270, 272
advertising, 330–1, 332–3
  ad network and dotcom bubble, 334–6
  CommerceNet Consortium, 286
  digital news organizations, 404
  display advertising, 333
  electronic billboards and corporate home pages, 333–4
  Facebook, 252
  Google, 247–8, 251
  HotWired, 246
  Mosaic, 282
  programmatic advertising, 338
  search advertising, 336–7
  splogs, 570
  surveillance advertising, 335–6, 337–41
AdWords, 247
affiliate networks, 249
Agamben, G., 368
agency, 75, 77, 574
aggregation, 170, 448, 449–50
  host aggregation, 448, 450, 452, 453
  meme aggregators, 516
  multiple archives, 197–9, 204
  news aggregators, 405
  page-level aggregation, 448
  pay-level-domain (PLD) aggregation, 448
  seed-URL aggregation, 448, 451
aggregator sites, 558–60
Aiden, Erez, 345
Ainsworth, S.G., 159
AJAX, 376
Alexa Internet, 31, 44, 131
Alexa toolbar, 44, 51, 53n
Alexandria, 44
algorithmic governmentality, 81
algorithms, 405, 406, 407
Alkwai, L.M., 159
All the Delicate Duplicates (Breeze and Campbell, 2013), 437, 438
‘All Your Base Are Belong To Us’, 512–13
alphabet, 538, 540
AltaVista, 231, 245, 249, 336, 557
alt-right, 516
amateur pornography, 554
amateurs, 445, 458–9
Amazon, 93, 252, 376, 380, 491
America Online (AOL)
  AOL Messenger, 375
  bulletin board systems (BBSs), 402, 539
  electronic billboards, 333
  messenger software see QQ messaging app
  multimedia sounds, 495
  phishing, 572
  surveillance advertising, 337, 339
  Usenet, 66
American Standard Code for Information Interchange (ASCII), 534n, 538–9
  ASCII graphics, 523, 524
American University of Cairo, 49
American Wikipedia, 321–2
Ammann, Rudolf, 359, 362
Anderegg, W.R., 92
Andersen, Hans Christian, 46
Anderson, Benedict, 417
Anderson, Chris, 62
Andreessen, Marc, 276, 279, 283
Android, 290, 291, 292
animated visualization, 181–2, 182–3
Ankerson, M. S., 64, 227
anonymity, 481
anonymization, 109
Anonymous (hacker group), 578
ANSI, 493
AOL (America Online)
  AOL Messenger, 375
  bulletin board systems (BBSs), 402, 539
  electronic billboards, 333
  messenger software see QQ messaging app
  multimedia sounds, 495
  phishing, 572
  surveillance advertising, 337, 339
  Usenet, 66
Apache Solr, 34
API (application programming interface), 47, 48, 49
app collections, 47
App Links, Facebook, 236
app links, Google, 236
Apple
  app store, 294n, 380
  audio technologies, 494
  Financial Times, 305
  HyperCard, 230
  iPhone, 291–2, 379–80, 401, 409
  iTunes, 236
  iWatch, 47
  news consumption, 407
  Universal Links, 236
Apple Newsstand, 406
apps
  and deep linking, 235–6
  health apps, 47
  hybrid apps, 307n
  versus mobile Web, 290–2, 305–6
  news apps, 408–10
  QQ messaging app, 527–9
  and smartphone, 304
  social media apps, 379–80
  trolling, 585–6
Archie, 244
Archival Acid Test, 205
Archive Team, 32, 348
archived snapshots, 154, 157, 159, 162, 163
archived web see web archives
archive.is, 194, 195, 199, 201, 204, 205, 206, 210
Archive-It, 35, 49, 53, 53–4n, 119, 597, 598
  Mementos, 199
archives, 595–6
  see also web archives
Archives Unleashed Toolkit (AUT), 350
archivists, 598–9
ARPA (Advanced Research Projects Agency), 217, 218, 374
ARPANET, 374
art see electronic literature (e-lit)
Art Com Electronic Network (ACEN), 431
articulation, 360
artificial intelligence (AI) research, 257–61
Arts and Humanities Research Council (AHRC), UK, 596
ASCII (American Standard Code for Information Interchange), 534n, 538–9
  ASCII graphics, 523, 524
Asia see East Asia
Asian Tsunami, 2004, 45
assemblages, 74, 75
assertional adequacy, 260–1
Association of Internet Researchers, 101
Asynchronous JavaScript and XML (AJAX) approach, 142
AT&T, 333–4, 392
Atari, 494
attention, 93–4, 242, 248, 249, 251, 252
audience duplication, 94
audiences, 407, 409–10
audio blogging, 497
audio formats, 496
audio streaming, 496, 499–500
audio technologies see sonic web
AudioNet, 495
audio-visual archives, 36
Australia, 31, 36, 90, 92
authorship, 599
autobiographical history, 46–9

backlinks, 247
Backroom Casting Couch, 556
BackRub, 247
Badouard, R., 75, 79
Baidu Baike, 90, 92, 248, 251, 529
Bailey, Moya, 515
Bait Bus, 556
Bangemann report, 67
Bangkok, 47
banking services, 393–4
banner ads, 246, 333–4, 334–5, 336
bar charts, 169
Barger, Jorn, 360–1, 362, 367, 377
Barmé, Geremie, 523
Barnet, B., 229
Bartlett, J., 581
Barton, David, 522
bayareaindian.com, 393
BBC, 404
BBC Domesday project, 601n
beavis-n-butthead group, 581
Bechdel Test, 510
Belgium, 421
Bell, E., 406, 407, 408
Belmont Report, 101
belonging, 388, 389, 396
Ben-David, Anat, 7–8, 37, 134, 424
Benkler, Yochai, 315
Benner, Jeffrey, 512–13
Berlin, 47
Berners-Lee, Tim
  deep linking, 236
  hyperlinks, 128, 230–1, 234–5
INDEX 609

hypertext, 61–2 Bobrow, Daniel, 259


search, 244, 246 Bohnett, David, 346
technical protocols, 332 Bomis, 319, 320–2
W3C, 64, 65, 78–9 bookmark files, 245
World Wide Web, 256–7, 272–6 bookmarking, 377
Berry, David, 154 born-digital heritage, 76
Bertin, J., 169, 182 born-digital material, 17, 153, 161–2, 600
Beverly Hills Internet, 346 Ukraine Orange Revolution case study, 119–21
see also GeoCities University of Bologna case study, 116–18
bias, 35, 323–6, 389–90, 474, 540 see also electronic literature (e-lit); web archives
visual, 492, 497, 498 born-digital turn, 114–16
Bible, 484 botnets, 572, 574
bibliographical coupling, 128 boundaries and spatial contextualisation, 64–6
Bibliothèque nationale de France (BnF) boundary objects, 75–7
(National Library of France), 49, 442–6 Bowers, Bret, 513, 514
Big Data, 9, 333, 344 Bowker, Geoffrey, 69
Big Data analysis, 155 Boyandin, I., 182
‘Big Data: Demonstrating the Value of the UK Web Brachman, Ronald, 260–1, 266
Domain Dataset for Social Science Research’, 415 branches, 224
Big UK Domain Data for the Arts and Humanities Braudel, Fernand, 7
(BUDDAH), 9–10, 10–11, 415, 466, 595 Brazil, 47, 94, 379
Big5 encoding, 526, 534n Breeze, Mez, 437, 438
Bina, Eric, 276, 279, 280, 283, 293n Brewster, Kahle, 43, 44
binary classification, 448 Bright, J., 177
Bing, 248, 252 Brin, Sergey, 247, 567, 568
biographical (single-site) history, 43–4, 49 British Library, 10, 17, 38, 420, 596–8
bitmap (BMP), 557 Brodsky, Ira, 305
Bitnet Relay, 375 Brown University, 222, 223
BitTorrent, 376 browser-editors, 274, 288, 290
Björneborn, L., 140 browsers, 276–83, 334, 335, 552
black boxes, 80 browser wars, 284, 287–8
BlackBerry, 291, 380 early history, 270–2
Blackmore, Susan, 508 mobile browsers vs. apps, 290–2. see also
Blair, Tony, 466 mobile Web
blockmodels, 126–7 for virtual reality, 285
blog software, 233, 234 World Wide Web, 272–6
Blogger, 375, 569 see also browser-editors; Cello; Firefox; i-mode;
blogonomics, 365 Internet Explorer; line-mode; Lynx; Midas;
blogosphere, 50, 51, 232–4, 252, 364, 377 Mosaic; Mozilla; Netscape Navigator; Opera;
blogrolls, 232, 245 Viola; WAP (Wireless Access Protocol)
blogs, 359–69 Bruce, Matt, 581
audio, 497 Bruce, Tom, 276, 293n
boundary objects, 77 Brügger, Niels
China, 529–31 archive vs. webrary, 38
in contemporary Web ecology, 368–9 Digital Humanities, 155–6
Eaton Web, 50 historiography of technology, 61
emergence, 360–3 hyperlink, 228
Italy, 369n national web domains, 7
microblogging, 359, 366–8, 378, 530–1 reborn-digital material, 115
political blogs, 264–6, 363, 364–6 web archives, 133, 154, 159
proliferation, 363–6 web archiving, 63
Red Blogs, 531 Bubbly, 380
religious, 485 BUDDAH (Big UK Domain Data for the Arts and
search, 250 Humanities), 9–10, 10–11, 415, 466, 595
user participation, 376, 377 bulletin board systems (BBSs), 374, 375
see also weblog community Computerized Bulletin Board System (CBBS),
Blood, Rebecca, 232 374, 538
East Asia, 541 Charniak, Eugene, 260


China, 523–5, 543–4 chat, 375
Japan, 546 chat rooms, 381
Korea, 544–5 Cheong, P.H., 485
Taiwan, 542–3 child pornography, 552
pornography, 552 Chinese Academic Network (CANet), 522–3
sonic component, 493 Chinese Internet, 531
Burgess, Jean, 522 Chinese language, 539, 540
Bush, V., 216, 229, 243 Chinese Web, 520–33
business archives, 36 ASCII graphics, 524
business community, 285–7 Baidu Baike, 90, 92, 248, 251, 529
Business Insider, 404 blogs, 529–31
business models, 402, 403–5, 408 bulletin board systems (BBSs), 523–5, 543–4
censorship, 91
.ca, 418 emails, 522–3
Cailliau, Robert, 61, 273, 274, 275, 276, 281–2, future of, 532–3
283, 285 homepages, 525–7
Callon, Michel, 76 Internet Archive, 82
Cambridge Analytica, xxvi QQ messaging app, 527–9
Campbell, A., 438 social media platforms, 379
Campbell, Heidi, 485 web visibility, 93, 94
Canada WeChat, 379, 531–2
Dale Askey Archive, 32 Wireless Access Protocol (WAP), 301
historiography, 3, 5–6 chiptunes, 494, 502n
Internet Archive, 82 Chollian, 544
‘Longitudinal Analysis of the Canadian World Christian-atheist antagonism, 487
Wide Web as a Historical Resource’, 415 Christianity, 481, 483, 485, 486, 487–8
National Library of Canada (NLC), 31 Chrome, ‘Memento for Chrome’ extension, 201, 202,
national web archive, 133 203, 204
national web domain, 418 chronological perspective, 60–4
Toronto Political Parties and Political Interest citation analysis, 128
Groups collection, 10, 11 citation graph, 568–9
capitalism citizenship, 391, 393
digital, 331, 339, 340 civic life, religion in, 483
industrial, 331 Clark, Jim, 283
print-capitalism, 417 classification, 448–9
CAPTCHA (Completely Automated Public Turing test Classmates, 375
to tell Computers and Humans Apart), 574 clickbait, 565, 573
CAS Institute of High Energy Physics, 525–6 client-side archiving, 142
Cascading Style Sheets (CSS), 572 Clinton, Bill, 333
cassettes, 491, 499, 501 Clinton, Hillary, 48
cataloging, 242, 243, 245 cloaking, 568
catalogues, 38 close reading, 7, 8–9, 356, 599
CB Simulator, 375 CNN, 404
ccTLD see country code top level domain (ccTLD) Coates, Tom, 364, 365
CD-ROM drives, 494 co-citation analysis, 128
CDs, 494, 495, 501 co-construction, 80–1, 82
CDX, 37, 38 Coleman, Gabriella, 515
Cello, 276, 277, 293n Coleridge, Samuel Taylor, 220
censorship, 91, 530–1, 552 collaborative consent, 515
centenaire.org, 458 collaborative software, 374
Center for Media Education, 339 collection creation see corpus creation
CERN, 64, 78, 273, 275–6, 279, 281, 285 collection types, 25–7
CERN line-mode browser, 275 collective and individual experiences, 66–9
CFidoNet, 543 College radio stations, 495
change over time, 181–3 Columbia University Human Rights Web Archive, 32
Charlie Archivist, 53n .com, 418, 422, 423
comment spam, 233–4 Copenhagen science festival, 53n


CommerceNet Consortium, 286, 287 copyright, 39, 319–22, 496, 500, 511–12
commercialization, 232, 237, 378–9 corporate home pages, 333–4
see also web advertising corpus see web archive corpus
commodification, 242 corpus creation, 25, 26, 27, 161–2, 419–20
commodities, 491 see also First World War research project: collection construction
Common Crawl Foundation, 31, 35, 38, 145
communication, 87 counterhistory of the Web, 564–75
communication and media studies, 140 phishing, 571–2
Communications Act 1934, US, 332 post-human social Web, 572–5
community, 271–2 spam development, 566–8
digital history community, 114, 121 spam links, 568–71
virtual communities, 579 Counterpart (TV drama), xxv
weblog community, 233, 362, 377 country code second level domain (ccSLD), 422–3
community identity, 394–5 country code top level domain (ccTLD), 35–6, 161,
Community Memory, 374 417–19, 443
community-oriented homepages, 391–5 Northern Ireland, 423–4
completeness, 35, 159–60, 258, 267n Yugoslavia see .yu
see also incompleteness see also national web domains; second-level
Completely Automated Public Turing test to tell domains (SLD); transnational web domains
Computers and Humans Apart (CAPTCHA), 574 Cowls, J., 177
comprehensiveness, 31–2, 35 crawled web collection, 23–4, 26–7
CompuServe, 333, 375, 539, 545 see also web crawling
computational medium, 153, 155 creativity see vernacular creativity
computational methods, 153–64 credit reporting, 338
challenges, 157–60 Criado-Perez, 584
techniques and solutions, 160–3 criminality see phishing
theoretical background, 154–7 critical political economy (CPE), 331–2, 339–40, 341
computational social sciences, 154, 155, 157 Crocker, Lee, 319
computational turn, 113–14, 153, 154–5 CSS (Cascading Style Sheets), 572
computer mediated communication (CMC), East Asia cultural capital, 251
character-based script, 539–41 cultural digital practices, 163
China, 543–4 cultural heritage, 46
Japan, 545–7 cultural metaphors, 387–8
Korea, 544–5 culture, 579
Taiwan, 542–3 culturomics, 155, 162–3
Computerized Bulletin Board System (CBBS), 374, 538 cumulative frequency plots, 179
computers, 491–2, 493–5, 499, 501 curl, 190–2, 194, 195, 196, 197, 198, 199, 206,
confidentiality, 109 207, 209
configuration, 227 cyberculture, 579
consent, 103, 108–9, 515 cyber-pilgrimage, 486
constitutive choices, 332 cyberspace, 480
consumer data, 337, 338 Czech internet, 45
contemporary history, 594
‘Content-Encoding’ response header, 192 Dale Askey Archive, 32
contextual integrity, 105–6 ‘Dancing Baby’ meme, 511–12
contextualisation of web history, 59–70, 158–9 Danica, 46
boundaries and spatial contextualisation, 64–6 Dark Net, The (Bartlett, 2014), 581
collective and individual experiences, 66–9 DARPA Agent Markup Language (DAML), 263–4
diachronic and chronological perspective, 60–4 DART (Dynamic Advertising, Reporting, and
see also source critique Targeting), 335
controversies, 79, 82 data
controversy mapping, 130 ontological status, 101–3
conversion, 486, 487–8 see also Big Data; metadata; personal data; social
co-occurrence analysis, 141 network data; user data
cookies (internet), 335 data aggregation see aggregation
Copeland, Henry, 365 data brokers, 338
data collection, 335 digitality, 17–18, 21, 24–5


see also aggregation; surveillance advertising digitalization, 372
Data Communication Company of Korea (DACOM), 544 digitally native news, 402, 403, 404–5
data fusion, 338 digitised material, 17
data visualization techniques see visualization digitised newspapers, 18
techniques direct citations, 128
data-ink ratio, 169–70 disciplinarity, 87–9
dating sites, 105–6, 393 discussion boards, 372
Davison, Patrick, 510 dispersion plots, 179, 180
Dawkins, Richard, 507–8 display advertising, 333
Dawn BBS, 543 dispositif (device), 75
Dawson, Charlotte, 584 distance principle, 106–7
de Certeau, Michel, 522 distant reading, 7, 9, 10–11, 356, 599
declarativists, 259–60 distribution, 169, 179, 182
deep linking, 235–6 .dk, 418, 420
Deer, Brian, 467, 473, 474 DMOZ, 245, 246
Dejanews, 247 doctrine, 483–4
del.icio.us, 376, 377 Domain Name System (DNS), 161
Denmark domains see national web domains; transnational
country code top level domain (ccTLD), 418 web domains
heritage, 46 Donner, Jonathan, 300, 301
legal deposit, 32, 39 dotcom bubble, 246, 317, 334, 336
national web archive, 35, 49, 53n, 54n, 107, dotcom crash, 319, 321, 322
164n, 420 dotcom era, 316, 317
national web domain, 418, 429 DoubleClick, 334, 335, 336, 337
Netarkivet, 46, 418, 420 downloading, 19
deregulation, 333 DuckDuckGo, 251
derived data, 38 Dutch culture, 50
Dertouzos, Michael L., 65, 78 Duvall, Bill, 218, 219
description logics (DL), 260–1 Dykes to Watch Out For, 510
device see dispositif (device) dynamic graphs, 453–4
diachronic perspective, 60–4 dynamics of updating, 20–1
dial-up, 493, 498–9, 557 dystopia, 480, 481
Diaryland, 363
Dibbell, Julian, 579–80 East Asia
Digital Archive of Chinese Studies (DACHS), 37 computer mediated communication (CMC),
digital archives, 99, 154 541–8
see also web archives script, 539–41
digital art see electronic literature (e-lit) see also China; Japan
digital capitalism, 331, 339, 340 Eastgate Systems, Inc., 429
digital curation, 596–8, 602n Eaton Web, 50, 51
digital diaspora, 388 eBay, 376
digital divides, 91, 92, 93 e-bile, 582
digital fingerprinting, 338 e-books, 434
digital folklore, 512 economics, 67–8
digital history, 12, 50, 595–6 edge bundling, 173, 174, 175
digital history community, 114, 121 edge pruning, 172–3
digital humanists, 153 EdgeRank, 129
digital humanities, 12, 154–6, 595 edges, 170
digital intermediaries, 405 Edwards, Elwood, 495
digital libraries, 157 egao, 525, 529, 530, 531
digital media, 87, 89–90 e-government, 594
digital methods, 49 Egyptian Revolution, 46
Digital Methods Initiative, 48, 133, 157, 162 EIES, 374
Digital Millennium Copyright Act 1996, 496 Elastic Search, 34
digital preservation, 595–6 Elbaz, Gil, 31
digital special collections, 596–8, 599 electronic billboards, 333–4
Electronic Frontier Foundation, 339 dominance, 93


Electronic Literature Collection, 433 EdgeRank algorithm, 129
electronic literature (e-lit), 428–31 finance capital, 341
contemporary situation, 437–8 heatmaps, 176
as net art, 433–7 informed consent, 108
origins, 431–2 live video streaming, 380
Electronic Literature Lab (ELL), 429 memes, 516, 517
Electronic Literature Organisation (ELO), 438 network analysis, 130–1, 450
emails, 374, 522–3, 528 news consumption, 401, 406–7
emerging essentials, 571 pornography, 552
emoticons, 510 search engines, 252
employment, 405 Share button, 234
Enchanted Forest (GeoCities), 350–3, 354 social graph, 129, 130–1
encoding, 523, 526, 533, 534n spam, 573
see also American Standard Code for Information success, 378
Interchange (ASCII) surveillance advertising, 338, 339
Engelbart, Douglas C., 216–17, 218, 219, 224, 229, 230 trolling, 584
English, Bill, 218 WhatsApp, 380
English language, 538–9, 540 Fanfou, 379
entity linking, 120 fashion blogs, 366
Enyedy, Edgar, 321–2 Federal Communications Commission, 332
epistemology, 260 Federal Trade Commission, 337
Ernst, Wolfgang, 43 Feldman, Brian, 517
Erwise, 276 feminism, 584
ethics, 100–10 FidoNet, 542, 543
informed consent, 103, 108–9 File Retrieval and Editing System (FRESS), 225
memes, 515 File Room, The (Muntadas, 1994), 435
private versus public data, 104–9 File Transfer Protocol (FTP), 244
and religion, 480, 481 file-sharing networks, 496
risk assessment and protection from harm, 103–4 filtering, 448–9, 451, 452, 551–2
‘text versus human’ debate, 101–3 filters, 362, 380
ethnography, 458 finance capital, 341
.eu, 418, 423 Financial Times, 305
EU Data Protection Directive, 102–3 Findery, 380
EU elections, 53n finger protocol, 243–4
Europe, 79, 423 Finnemann, Niels Ole, 63
European Union, 341 Firefox, 288
European Web Archive, 35 First World War research project, 441–60
evangelism, 487 collection construction, 442–6
Evans, A., 487 metadata, 446–54
event-based history, 44–5 dynamic graphs, 453–4
event-harvesting, 119 qualitative analysis, 449–53
events collections, 118–21 quantity and quality, 447–9
everyday life, 89–92 results, 454–60
excerpt model splogs, 570 Fishbrain, 379
Excite, 245, 248 flame wars, 580–2
experiences, collective and individual, 66–9 Flash, 338, 436, 499
EZWeb, 294n Fletcher, Jackie, 475n
flickr, 376, 377, 378
Facebook focus groups, 90
App Links, 236 folksonomy, 376
archived content, 5, 82 Fonzarelli, Arthur, 510
autobiographical archiving, 46, 47, 49 Foot, K., 132, 133
Cambridge Analytica, xxvi Foot, K.A., 9, 45
copyright, 512 footnotes, 228
crawlers, 48 ForceAtlas2, 455
crawling, 54n forking, 320–2
‘Form Art’ (Shulgin, 1997), 436 origin and destination links, 351
formats, 363, 492, 495–6, 497–8, 499, 500, 501 PageRank, 350–6
forum software, 374 search engines, 354–6
F/OSS (Free and Open Software), 316–22 topic modeling, 354
Foursquare, 379, 380 web archive access, 347–50
Fox, Rosemary, 475n web publishing software, 375
.fr, 416, 418, 422, 423 geographical community metaphor, 506
fragmentation, 18 (geo)filters, 380
frame languages, 258, 260–1 Gerlitz, Caroline, 74
Frame-based Artificial Intelligence Language German language, 539
(FRAIL), 260 Germany, 47, 53, 94, 251, 379
‘Framework for Representing Knowledge, A’ Ghostery, 51
(Minsky, 1974), 258–9 GIFs, 507, 511
France gifs, 62
audio-visual archives, 36 Gigandet, Stéphane, 365
blogging, 365–6 Gillmor, Dan, 364–5
internet access, 66 ‘Gin, Television, and Social Surplus’ (Shirky, 2012),
legal deposit, 39 315–16
Minitel, 67 Giphy, 516
National Library (Bibliothèque nationale de Glance, N., 172–3
France), 49, 442–6 Gleick, James, 345
national web archive, 46, 158 Global Framework for Electronic Commerce, A
national web domain, 35, 416, 418, 422, 423 (1997), 333
Quaero project, 251 Global Network Navigator, 280, 286
visibility, 94 Global Organization of People of Indian Origin, 394
Web90 - Patrimoine, Mémoires et Histoire du globalization, 372
Web dans les années 1990, 416, 421–2 GNU Free Documentation License (GFDL), 317–18,
see also First World War research project 319, 320, 321
Fraunhofer Institute, 501 GNUpedia, 318, 326n
Free and Open Software (F/OSS), 316–22 Gnutella, 496
Freedman, Des, 333 Godwin’s Law, 506–7, 508–11
frequency plots, 179 Goldsmith, G.R., 92
FRESS (File Retrieval and Editing System), 225 gonzo porn, 554
Friendster, 377, 378 Google
Frontier, 361 app links, 236
Fruchterman-Reingold algorithm, 171 bias, 251–2
full content splogs, 570 blogosphere, 252
full-text search, 38 dominance, 92, 93
Furie, Matt, 516–17 expansion, 247–8
finance capital, 341
gadget blogs, 363 Google Hangouts, 380
gaming, 585 Google Image Search, 50
Gangnam Style, 514 Google Newsstand, 406
Garfield, Eugene, 128 Google Play, 497
Gaza War 2008, 323–4 Google Play Store, 380
gender, 318–19, 432, 586 Google Scholar, 49
see also Indian immigrant women Google+, 378
genealogies, 60–2 GoogleAdSense, 376
General Medical Council, 467, 473 news consumption, 407
Generation X, 484 PageRank, 129, 231–2, 247, 350–6, 568, 569
GeoCities, 5, 32, 64, 231, 344–57, 599 removal of sites, 250
China, 526, 527 search advertising, 336–7
distance and close reading, 356 search engine optimization (SEO), 249–50
Enchanted Forest/Glade/3891, 352 social media platforms, 252
history of, 346–7 spam, 233–4, 568–9, 570–1
link structure Enchanted Forest, 352 surveillance advertising, 337–8, 339, 341
memes, 506 ‘Google and the Politics of Tabs’, 49–50
Google bomb, 250 Hein, Jon, 510


Google N-grams, 179 Hektor, A., 88
googlejuice, 250 Helmond, A., 68, 134
googlization, 248 Hendler, James, 263
Gopher, 64, 70n, 244, 272, 279, 280 Hennessey, D., 173
Gopher t-shirt, 281 Henshaw-Plath, Evan, 367
gopio.net, 394 hentai, 552, 556
Gore, Al, 67, 271, 280, 284, 293n Heritrix, 34, 443
Gorsky, Martin, 466 HES (Hypertext Editing System), 222–5
GoTo, 336 higher education, history of, 116
governance, 64–6, 77–82 hindunet.com, 395
government see e-government hiragana, 540, 546
government archives, 33, 36 histograms, 169
see also national web archives historians
governmentality, 81 as archivists, 598–9
‘Grab them all’ add-on, 54n engagement with web archives, 593–8
Graham, S., 155 historical domain, 162
graph theory, 126 historical method, 112–22
graphics see non-textual elements; visualization techniques Ukraine Orange Revolution case study, 119–21
University of Bologna case study, 116–18
graphs see web graphs see also historiography
Gray, Matthew, 244–5 historical network analysis, 131–5, 161
Great Britain, 332 historiography
‘Great Firewall’ metaphor, 530 event-based, 44–5
‘Great War on the Web, The’ collection see First World importance of web archives, xxix, 3–13
War research project characteristics of web archives, 21–4
Green Card lottery, 565, 566, 571 computational access to web archives, 9–12
Grey, Noah, 363–4 scholarly use, 7–8, 24–7, 49–51
Greymatter, 363–4 third wave of digital history, 12
Grigar, D., 432, 436–7 Wayback Machine, 8–9
Grindr, 379, 380 national, 45–6
Groksoup, 363 sources, 161–2
Guha, Ramanathan, 262 see also historical method; web archiving:
history of
H4 visa, 395, 397n history from below, 599, 602n
Habermas, Jürgen, 364 history of higher education, 116
hackers, 578 Hitchcock, Tim, 81
Hale, S.A., 134, 159, 422 hive plots, 174–5, 176
Hall, Wendy, 265 H-mail, 544
hangul, 540, 544 Hockx, M., 91
Happy Days, 510 home, idea of, 388–9
Hara, N., 585 homeland, 388, 390–1, 394
Hardin, Joseph, 276, 280 virtual, 389
Hardy, Jonathan, 340 homeland ideologies, 391–4, 396
Hargittai, E., 90 homepages, 140, 362, 387–97
harm, 103–4 China, 525–7
Harrington, Bryce, 321, 322 community-oriented homepages, 391–5
Harvard University, 581 corporate, 333–4
Haughey, Matt, 362 as a cultural form, 389–90
Hayes, Dave, 565 personal homepages, 389–91. see also personal
Hayes, Patrick, 259–60, 264 websites
Hayles, N.K., 428, 438 popularity, 375
health apps, 47 host aggregation, 448, 450, 452, 453
heatmaps, 175–6, 177, 178 HotBot, 245, 246, 249
Heer, J., 176 HotWired, 246, 333–4
Heflin, Jeff, 263 Hourihan, Meg, 363
Heilman, J., 92 household, 388, 389
Houseparty, 380 Hypertext Markup Language (HTML), 230, 233


HTML (Hypertext Markup Language), 230, 233 blogs, 362
blogs, 362 China, 526–7, 532
China, 526–7, 532 compact HTML, 291
compact HTML, 291 EZWeb, 290, 294n
EZWeb, 290, 294n ‘Heading 2’, 293n
‘Heading 2’, 293n HTML files, 18, 19, 22
HTML5, 306 HTML5, 306
Japan, 546 Japan, 546
phishing, 572 phishing, 572
Simple HTML Ontology Extension (SHOE), 263 Simple HTML Ontology Extension (SHOE), 263
spam, 568, 576 spam, 568, 576
HTML files, 18, 19, 33 Hypertext Transfer Protocol (HTTP), 190–2, 230, 244,
HTML5, 306 274, 335
HTTP (Hypertext Transfer Protocol), 190–2, 230, 244, see also Memento Protocol
274, 335 hypertextuality, 387, 429, 435
see also Memento Protocol
HTTrack, 33 IBM, 546
Huffington Post, 365, 377, 402, 405 IBM 360, 224
Huffington Post is Not a Blog, The (Barger, 2005), 367 Ibrus, Indrek, 298
Hulu, 337 Iceland, 35
human subjects, ‘text versus human’ debate, 101–3 Icelandic National Library, 45
Hustler, 553 ICICI Bank, 393–4
Huurdeman, H.C., 159 ICQ, 375
hybrid apps, 307n identity, 321–2, 362, 389, 394–5, 396, 583–4
HyperCard, 230, 273 see also transnational self
Hyper-G, 272 .ie, 423
hyperlink analysis, 43, 45, 46, 50, 53, 130 image files, 19
hyperlinks, 18–19 images, 33, 38
archived web, 22 IMG tag, 279
blogs, 361, 362 immanent Internet analysis, 481
change in meaning, 68 immigrants, 388–9
historical studies of, 79 community-oriented homepages, 391–7
Hypertext Editing System (HES), 224 personal homepages, 390–1
periodization of, 227–38 i-mode, 291, 294n, 299–300, 546–7
blogosphere and new link types, 232–4 INCLUDE tag, 279
effect of platformization on the hyperlink, incompleteness, 22, 162
234–5 see also completeness
hyperlink as currency of the web, 231–2 indexing, 34, 243, 244, 556
hyperlink as fabric of the web, 230–1 India, 94, 379
mobile apps and deep linking, 235–6 India Abroad, 392, 393, 394
proto-hyperlink in early hypertext systems, indiamatches.com, 393
229–30 Indian immigrant women, 395
social network analysis, 128 Indian immigrants, 388
web crawling, 19–20 community-oriented homepages, 391–7
World Wide Web, 273 personal homepages, 390–1
Hypermedia as the Engine of Application State (HATEOAS), 197 individual and collective experiences, 66–9
indolink.com, 392, 394
hypertext, 61–2, 345 indolink.com, 392, 394
history of, 215–25 indusladies.com, 395
File Retrieval and Editing System (FRESS), industrial capitalism, 331
225 information, 87
Hypertext Editing System (HES), 222–5 Information, The (Gleick, 2012), 345
on-line system (NLS), 216–20, 224–5, 230 information and communication technologies (ICTs), 74
Xanadu, 220–2, 229–30 information control, 319
Viola hypertext system, 272 information infrastructure, 94–5
Hypertext Editing System (HES), 222–5 Information Processing Techniques Office (IPTO), 218
information retrieval, 119–20, 128, 130, 243 Internet broadcast, 495


see also search engines Internet community, 271–2
information science, 86 Internet Engineering Task Force (IETF), 65
information society, 372 Internet Explorer, 288, 302
information superhighway, 333–4 Internet governance, 77, 81–2
information-seeking see also web governance
and disciplinarity, 87–9 ‘Internet History in Australia and the Asia-Pacific’, 415
in everyday life, 89–92 internet literature, 91
web visibility, 93–4 see also electronic literature (e-lit)
Wikipedia, 92–3 internet memes see memes
informed consent, 103, 108–9 Internet Memory Research, 35
Infoseek, 245, 336 Internet Relay Chat (IRC), 374–5
infrastructure, 75, 77, 80, 81, 229, 331, 335, 571 internet research ethics, 101–10
Ingwersen, P., 140 private versus public data, 104–9
Inktomi, 246 risk assessment and protection from harm, 103–4
inodes, 190 ‘text versus human’ debate, 101–3
input systems, 528 Internet Studies, 479
INRIA, 64, 65 internet surfing see web surfing
Instagram, 47, 378, 381, 406, 552 Internet Underground Music Archive (IUMA), 495, 498
memes, 511, 517 internet users, 467
instant messengers, 375 see also user participation
Institute of Computer Applications (ICA), Beijing, 520, 522 internetting, 271
institutional archives, 499 interpretive flexibility, 76
Intel Chime, 495 interviews, 458–9
intellectual property, 319 iPhone, 291–2, 379–80, 401, 409
Intellivision, 494 IPTO (Information Processing Techniques Office), 218
interactive visualization, 181–2, 182–3 iQIYI, 379
interdisciplinarity, 596, 600, 601–2n #iranelection, 48
interface methods, 74 Iraq War, 2003-2011, 53
interfaces, 158 Ireland, 423–4
International Internet Preservation Consortium (IIPC), Ishii, Kenichi, 300
32, 37, 53n Islamic rituals, 486
international web history, 64–5 Issue Crawler, 130
internet, 468 issue network analysis, 130
internet access, 66 Italy, 369n, 421
Internet Archive, 4, 5, 31, 53n, 156 iTunes, 236, 380
access, 39, 107 podcasts, 497
Archive-It, 35, 49, 53, 53–4n, 119, 597, 598 Ivarsøy, Geir, 301
Mementos, 199 iWatch, 47
audio archives, 497–500
bias, 35 Jackson, Laur M., 515
Canada, 82 Jaiku, 366
China, 82 Japan, 94, 290–1, 300, 543, 545–7, 556
crawling, 45 Japanese language, 539, 540, 546, 547
Eaton Web, 51 Jeanneney, Jean-Noël, 251
full-text search, 38 Jenkins, H., 514, 522
GeoCities, 350 journalism, xxvii, 364–5, 594
Mementos, 193, 194, 195, 199 ‘jumping the shark’, 510
mission, 131 junk results, 566–8
national web archives, 421
political events collections, 119 Kahle, Brewster, 31, 82, 131
scope, 35, 159 Kaixin001, 379
single-site histories, 43–4 Kamada, Tomihisa, 291
sound and music, 498 Kandel, S., 176
subject-based archives, 37 kanji, 546, 547
University of Bologna, 117 Karlsruhe University, 520, 522
see also Wayback Machine Karp, David, 367
katakana, 540, 546 LinkedIn, 48, 379


Kazaa, 496 links, 224, 272, 274–5, 568–71
Kedrosky, Paul, 363 L’Institut national de l’audiovisuel, France, 36
keitai, 546, 547 Liquid Audio, 496, 499
Kelty, Chris, 317 listening, 496, 499
keyboards, 538, 541 LISTSERV, 374
keyword searching, 164, 595, 596 literary games, 438
keyword stuffing, 249 literary study, 114
Kink.com, 559 literature see electronic literature (e-lit)
knowledge, 87 ‘Little History of the World Wide Web, A’ (W3C,
knowledge representation, 257–61 2000), 492–3
on the web, 261–5 live streaming, 380, 533
Knowledge Representation Language (KRL), 259 live web, 129–31, 208–10
Korea, 544–5 see also online web
Kosovo, 424 location-based services, 380
Kottke, Jason, 232 logicists, 259–60
Kozar, Seana, 523 LOLCats, 583
‘Kubla Khan’ (Coleridge), 220 ‘Longitudinal Analysis of the Canadian World Wide
Web as a Historical Resource’, 415
LAERD Statistics tutorials, 150 longue durée, 43
LambdaMoo troll incident, 579–80 Luke, Sean, 263
Lancet, 466, 473 Lussier, Ron, 511
landing page, 571–2 Lycos, 336
Langlais, Pierre-Carl, 79–80 Lynx, 272, 279
language, 94, 418–19, 526, 540
natural language processing (NLP), 148 MacGregor, Robert, 261
see also Web semantics; individual languages machine learning, 120
Lantis, Margaret, 522 Macintosh, 494
Lassila, Ora, 261–2, 265 mailing lists, 372, 374, 375
Last.FM, 497 mainstream media, 486
Last-Modified, 191–2, 207 Makati City, Philippines, 47
Latent Dirichlet Allocation (LDA), 180, 181 Malden, Karl, 581
Latour, Bruno, 66–7 Mallapragada, M., 387, 388, 393, 394
law, 483, 571 Malloy, Judy, 428, 431, 432
Le Monde, 448 Manhattan, 47
legal deposit, 32, 39, 442–3, 444 Manjoo, Farhad, 249–50
Legal Services Commission, 474, 475–6n Mann, Merlin, 573
Lerner, A., 79 Manovich, Lev, 47, 431, 433, 434
LGBTQ+, 379 Mansfield, Beth, 557
Lialina, Olia, 231, 435 marketing/media complex, 331–2, 340–1
libraries, 595–6 Marres, Noortje, 74
digital, 157 Marvin, Carolyn, 331
see also national libraries Mashable, 377
library catalogues, 38 mass marketing, 331
Library of Congress, US, 37, 45, 48, 49, 512 see also marketing/media complex
website, 2002, 146 Mataley, J., 161–2
Licklider, J.C.R., 217–18 MatchLogic, 334
Lie, Håkon Wium, 301–2, 307n material turn, 75
lieu de mémoire, 455 materiality, 75, 514
life blogging see autobiographical archiving Maurer, Hermann, 279
‘likefarming’ firms, 573 McCarthy, John, 259
Limewire, 496 McChesney, Robert, 341
line-mode, 275 McDermott, Drew, 259
link economy, 232 Measles-Mumps-Rubella (MMR) vaccine crisis, 464–75
link farms, 568 Alan Phillips report, 471–3
link rot, 466 MMR The Facts website, 468–71, 472
linked data movement, 265 Society of Autistically Handicapped, 473–4
media, 361, 364–5, 366, 367, 467 methodology see computational methods; digital
see also digital media; marketing/media complex; methods; historical method; interface methods;
print media network analysis; quantitative research methods;
media archaeology, 75 visualization techniques
media culture, 551–2 metrics, 177–8
Media Gossip, 364 Metz, Cade, 552
media studies, 74–5, 79, 140 Meyer, E., 173, 182
mediation, 75–6 Miami, 47
mediatization, 372 Michel, J.B., 155, 345
medical content, 92–3 microblogging, 359, 366–8, 378, 530–1
medium-specificity, 229 see also Twitter
MediaWiki, 196, 198 microfilm, 243
Meerkat, 380 microportals, 362
meinVZ, 379 Microsoft
meme aggregators, 516 Bing search engine, 248, 252
Meme Machine, The (Blackmore, 1999), 508 browser wars, 288
Memento Project, 53n mergers, 337
Memento Protocol, 143, 189, 190, 192–211 mobile Web, 380
aggregators of multiple archives, 197–9 Opera browser, 302
framework, 198 surveillance advertising, 339
‘Memento for Chrome’ extension, 201, 202, Windows 95, 289
203, 204 Microsoft Explorer, 552
ongoing research, 201–10 micro-worlds, 258
aggregators of multiple archives, 204 Midas, 276, 277–8, 279
archives versus live web, 208–10 MIDI (Music Instrument Digital Interface) files, 494–5
quality and temporal coherence, 205–8 migrants, 388–9
URI routing, 201–4 community-oriented homepages, 391–7
terminology and concepts, 192–7 personal homepages, 390–1
Time Travel service, 199–201 Milligan, Ian, 506
Memento-Datetime, 193, 207 Millward, Gareth, 9–10
memes, 62, 505–17 Milner, David, 513–14, 515
‘All Your Base Are Belong To Us’, 512–13 Mindgeek, 559, 560
‘Dancing Baby’ meme, 511–12 Minitel, 61, 66, 67, 421–2
future of, 516–17 Minsky, Marvin, 258–9
Godwin’s Law, 506–7, 508–11 Misa, Tom, 59
meme studies, 513–15 MIT Immersion Lab, 10
trolling, 583 MIT Laboratory for Computer Science (MIT-LCS),
Memes in Digital Culture (Shifman, 2012), 513 64–5
Memex, 216, 229, 230, 243, 345 Mitra, A., 581–2
MemGator, 199, 206 mixtapes, 491
Mémoire des hommes, 459 MMR The Facts website, 468–71, 472
memorial vandalism, 583 MMR vaccine crisis see Measles-Mumps-Rubella
memory, 455, 458, 460 (MMR) vaccine crisis
Memory Practices in the Sciences (Bowker, mobile apps, 47, 235–6, 290, 292, 304, 305–6, 379–80
2006), 69 mobile browsers, 290–2
Merletti, F., 422–3 mobile dating apps, 380
Merzeau, Louise, 63 mobile messaging, 380
messengers, 373, 375, 379 mobile news apps, 408–10
meta process, 372 mobile phones, 298, 302, 304, 379–80, 547
Meta-Content Framework (MCF), 262 mobile trolling, 585–6
metadata, 10–11, 38, 145, 446–54, 556 mobile usage, 379–80
dynamic graphs, 453–4 mobile Web, 236, 297–307
qualitative analysis, 449–53 i-mode, 299–300
quantity and quality, 447–9 Japan, 546–7
MetaFilter, 362 memes, 516
metaphors, 346, 387–8, 396, 506, 530 mobile Web 2.0, 303–4
meta-search, 246 Opera browser, 301–3
smartphones and social media, 304–6 National Library of the Netherlands, 420–1
Wireless Access Protocol (WAP), 298–9, 300–1 National Security Agency (NSA), 10
modems, 492, 493, 498, 499 national web archives, 45–6, 49, 53n, 164n, 419–21
mommy blogs, 366 Canada, 133
Mondothèque, 229 Denmark, 35, 49, 53n, 54n, 107, 164n, 420.
Montulli, Lou, 285 see also Netarkivet
Moreno, Jacob, 125–6 France, 46, 158
Moretti, Franco, 4, 7, 356 Germany, 53
Mosaic, 231, 276, 279–83, 284, 288, 552 Netherlands, 420–1
Moscow, 47 United Kingdom, 7, 33, 35, 36–7, 46, 81, 164n,
Motion Picture Experts Group, 496 420, 599
‘Mouchette’ (Neddam, 1997), 435 University of Bologna, 117–18
Moulthrop, S., 432 national web domains, 35–6
mouse, 217, 218, 219 historical studies of, 413–25
Movable Type, 364 brief research history, 415–16
Mozilla, 283 Northern Irish web domain, 423–4
mp3, 496, 497, 501 research themes and challenges, 416–21
mplc.co.uk, 473 second-level domains (SLD), 422–3
Mullenweg, Matt, 368 transnational web domains, 423
Multilingual Internet Names Consortium, 540 ‘Web90’ research project, France, 421–2
Multi-Media Marketing Group, 248 Yugoslavia, 7–8, 134, 161, 162, 163, 418, 424
multimedia platforms, 373, 379 see also country code top level domain (ccTLD)
multimedia revolution, 494 national web history, 66
multimedia sounds, 494–5 nation-ness, 417
multinational web history, 64–5 native apps, 305
Mundaneum, 61 natural language processing (NLP), 112, 118, 119, 120–1, 148
Muntadas, Antoni, 435 Navicrawler, 459
Museum of Endangered Sounds, 499, 502n ‘Nazi-comparison’ meme, 509–10
music NBC, 334
QQ Music, 528 NCSA (National Center for Supercomputing
see also sonic web Applications), 276, 279, 280–1, 283–4
Music Box (Trantor, 1991), 495 neats vs. scruffies, 259–60, 264
Musso, M., 422–3 Neddam, Martin, 435
‘My Boyfriend Came from the War’ (Lialina, 1996), 435 negative policy, 333, 338
MySpace, 338, 377, 378, 496 Nelson, M.L., 132
Nelson, Theodor Holm (Ted), 61, 215, 220–2, 223, 225,
Nakamura, Lisa, 515 229–30
namaste.com, 395 neoliberalism, 333, 340, 363, 365, 366
Napoli, Philip, 340 Net Art, 433–7
Napster, 376, 492, 496, 498 Net Art Anthology (Rhizome, 2016), 434, 435
Narrabase, 431, 432 Netarkivet, 46, 418, 420
National Archives, UK, 33 Netherlands
National Center for Supercomputing Applications culture, 50
(NCSA), 276, 279, 280–1, 283–4 deposit laws, 46
national delimitation, 417–19, 421–2 national web archive, 420–1
national history, 45–6 national web domain, 418, 420
national identity, 321–2, 389, 394–5, 396, 526 web archive users, 49, 54n
see also transnational self web sphere analysis, 156
national libraries, 31–2, 36–7, 39, 45–6, 53, 158 Netherlands Institute for Sound and Vision, 36
see also British Library; Library of Congress, Netiquette, 64
US; National Library of France (Bibliothèque Netlab, Aarhus University, 54n
nationale de France); National Library of the netporn, 555–6
Netherlands Netscape Gold, 288, 294n
National Library of Australia, 31 Netscape Navigator, 284, 287, 288, 335, 552
National Library of Canada (NLC), 31 network analysis, 13, 125–35, 350
National Library of France (Bibliothèque nationale de application in live web research, 129–31
France), 49, 442 application in web history research, 131–4, 161
criticism, 127 NTT DoCoMo, 291
First World War research project, 441–60 numerical translation, 145–8
collection construction, 442–6 Nuremberg Code, 101
metadata, 446–54 Nye, David E., 480
results, 454–60
future directions, 134–5 Occupy Wall Street, 6
historical foundations, 125–6 Odnoklassniki, 379
influence on search engines, 128–9 OICQ see QQ messaging app
influence on social media, 129, 130–1 OkCupid dating site, 105–6
key concepts, 126–8 Old Bailey, 345
network density, 126 Oldweb.today, 68
network diagrams, 126, 130, 131–2, 133 Olympic Games, 37, 53n
network structure, 129, 135n one-way analysis of variance, 149
network visualization, 170–8, 450, 451, 452 Ong, Walter, 138
edge bundling, 173, 174, 175 online attention, 93–4
edge pruning, 172–3 online banking, 393–4
heatmaps, 175–6, 177, 178 online content platforms, 405
hive plots, 174–5, 176 online forums, 362
metrics, 177–8 online gaming, 585
node bundling, 173, 175 Online Guitar Archive (OLGA), 495
nodes, 170 online indexes, 243, 244
see also web graphs online interaction, 372, 374–5, 381
New York, 47 online literature, 91
New York Times, 52, 365, 366, 404 see also electronic literature (e-lit)
New Zealand, 32 online memorial vandalism, 583
news aggregators, 405 online news, 402–5
news alerts, 408, 409 online pornography, 249, 551–61
news blogs, 363, 364–5 availability, 552–4
news consumption, 400–10 searchability and taxonomies, 556–8
mobile news apps, 408–10 types of, 554–6
newspapers on the web, 402–5 video aggregator sites, 558–60
social media, 405–8 online sociality, 375, 381
news ecosystem, 401, 402, 405, 406, 407–8, 409–10 on-line system (NLS), 230
news organizations, 401, 407, 409 online visibility, 94
news pages, 361–2 online web, 16, 18–19, 20, 21, 22, 24–5, 129–31
newsgroups, 372, 374, 375 see also live web
newspapers, 18, 364–5, 400, 402–5, 417 online-offline relationship, 481–2
see also print media oN-Line-System (NLS), 216–20, 224–5
niche blogs, 366–7 ontologies, 257–61
Nielsen, 334 open source, 317
Nifty-Serve, 546 see also Free and Open Software (F/OSS)
Nissenbaum, H., 105–6 open standards, 232
.nl, 418, 420 Open Systems Interconnect (OSI), 272
NLS (oN-Line-System), 216–20, 224–5 Open UK Web Archive, 39
nofollow, 233–4, 250 Open Wayback, 34
Nokia Communicator 900, 298 open web, 232, 236
Non-Resident Indians (NRIs), 393 Opera, 291, 301–3
non-textual elements, 162–3 Opera Mini, 302, 303
see also visualization techniques oral culture, 138
Nooney, L., 514 oral history, 5, 7, 113, 486, 487
Nora, Pierre, 455 Orange Revolution, 119–20
North Carolina, 36 O’Reilly, Tim, 62, 227, 376
Northern Ireland, 423–4 original, lack of, 22
Northern Light, 246 Original Resource, 189, 193
‘notice and choice’, 339 Orkut, 379
NPOV (Neutral Point of View), 327n Otlet, Paul, 61, 229
NRC Handelsblad, 50 outlink extraction method, 161, 162
outsourcing, 334 Pitas, 363
Overture, 336 place of memory, 455
PlanetRomeo, 379
Page, Larry, 247, 567, 568 platform approach, 338
page-level aggregation, 448 Platform for Internet Content Selection (PICS), 79, 262
PageRank, 129, 231–2, 247, 350–6, 568, 569 platformization, 234–5
pages persos, 365 PLATO, 374, 375
pageviews, 570 playback programs, 496
Palm, 290, 291 Playboy, 553
Paloque-Berges, Camille, 76 podcasts, 377, 497
Pandora (music service provider), 497 political blogs, 264–6, 363, 364–6
PANDORA project, Australia, 31 political economy
paradigmatic technology, 368 critical political economy (CPE), 331–2, 339–40, 341
Park, H.W., 130 of linking, 232–4
participation see user participation political events collections, 118–21
participatory vigilance, 80 political history, 595–6
Pasig, Philippines, 47 Politico, 405
path dependence, 332 politics, 67–8, 81, 435
Pathfinders, 430, 432, 438n Pootawn, Ralph, 585
pay-level-domain (PLD) aggregation, 448 pop-ups, 335
PC-VAN, 546 Pornhub, 558, 560
PDF files, 33 pornification culture, 551–2, 560–1
Peking University, 523 pornography see child pornography; online pornography
Pellow, Nicola, 275 portal war, 246
Pentagon, 44 Portugal, 46
Penthouse, 553 Portuguese Web Archive, 39
‘Pepe the Frog’ meme, 516–17 Portwood-Stacer, L., 514
periodisation of web history, 62–4 positional variables, 169
see also web archiving: history of post-human social Web, 572–5
Periscope, 380 posts, 232
permalinks, 232 prayer, 483, 486
Persian Kitty, 557 preservation, 34
personal computer communication, 546 International Internet Preservation Consortium
personal computers, 493–4 (IIPC), 32, 37
personal data primary sources (historical concept)
private versus public data, 104–9 born-digital material as, 116–21
protection from harm, 103–4 collection of, 113
‘text versus human’ debate, 101–3 web archives as, xxix, 3–6
see also user data print media, 434, 484, 486
personal digital archiving, 598–9 see also newspapers
personal homepages, 390–1 print news, 402
personal websites, 375 print-capitalism, 417
see also personal homepages privacy policy, 338–9
personality spamming, 573 private versus public data, 104–9, 443
Pew Internet & American Life Project, xxvi private versus public self, 391
Pew Research Center, 405 privatization, 333
Philippines, 47 proam (professional amateurs) performers, 554–5
Phillips, Alan, 471–3, 475n ‘Probing a Nation’s Web Sphere - the Historical
Phillips, Whitney, 582 Development of the Danish Web’, 415
phishing, 571–2 proceduralists, 259–60
photo-sharing, 377 Procter & Gamble, 337
Pihlaja, Stephen, 487 Prodigy, 333, 402
pilgrimage, 486 producers, 404–6, 406–7, 407–8, 409
pingbacks, 233 profile database, 335, 336, 338
Pinkerton, Brian, 245 programmatic advertising, 338
Pinterest, 48, 378, 552 Progressive Networks, 495
Pioch, Nicolas, 422 Project Xanadu, 220–2, 229–30
proprietary linking, 236 Red Blogs, 531
ProPublica, 404 Reddit, 48, 515
proselytisation, 487, 488 Rediff.com, 393–4
Prost, Antoine, 63 reference citation, 80
proto-hyperlink, 229–30 refind sites, 245
PTT network, 542–3 relational sociology, 127, 129, 131
public life, religion in, 483 reliability, 87, 90, 93
public policy, 332, 333 Wikipedia, 324–6
public sphere, 364 religion, 479–88
public versus private data, 104–9, 443 historical model of, 483–7
public versus private self, 391 doctrine, 483–4
publishing, 434 organisations, 484–5
religious texts, 484 practice, 485–6
punch cards, 243 religions and the Other, 486–7
Pyra Labs, 363, 569 historical questions, 482–3
relationship of online and offline, 481–2
QQ, 379 religious views of the web, 480–1
QQ Mail, 528 religious experience, 486
QQ messaging app, 527–9 religious freedom, 484–5
QQ mobile, 301 religious organisations, 484–5
QQ Music, 528 religious practice, 485–6
QQ Pinyin, 528 religious radicalism, 482–3
Quaero project, 251 religious studies, 479
qualitative analysis, 449–53 religious Web, 480–1
quantitative research methods, 138–51 RenRen, 379
analysis, 148–50 Repository-Based Software Engineering (RBSE)
conclusions, 150–1 spider, 245
limitations, 150 Representational State Transfer (REST), 197
literature, 139–40 RESAW (Research infrastructure for the Study of
numerical translation, 145–8 Archived Web Materials), 596
research questions, 140–2 research ethics, 100–10
web archive corpus, 142–5 private versus public data, 104–9
Quantum Link (Q-Link), 495 risk assessment and protection from harm, 103–4
see also AOL ‘text versus human’ debate, 101–3
Quartz, 404 research methodology see computational methods;
query routing problem, 201–2 digital methods; historical method; interface
QuickTime, 50 methods; network analysis; quantitative research
QWERTY keyboard, 538, 541 methods; visualization techniques
QZone, 379, 528 research questions, 140–2
ResearchGate, 379
racism, 582 Resource Description Framework (RDF), 262–3
Radio Act 1927, US, 332 ‘Rethinking the Participatory Web’ (Stevenson, 2014), 63
radio stations, 495 retinal variables, 169
Raleigh, Sir Walter, 479 Rhizome, 47, 434, 435
Ramirez, Carlos, 583 Rieder, B., 131
random surfer, 569 right-wing extremism, 50
‘Rape in Cyberspace, A’ (Dibble, 1993), 579–80 risk assessment, 103–4
rationalization of audience understanding, 340 ritual, 483, 486
Ravelry, 379 rmplc.co.uk, 473–4
RBSE (Repository-Based Software Engineering) spider, 245 Robot Wisdom, 360–1
RealAudio, 495–6, 498–9 robots.txt, 34, 204, 206
reality porn, 554–5 Robust Links, 209–10
RealPlayer, 498–9 Rogers, R., 157
reborn web, 21–4 Roman alphabet, 538, 540
reborn-digital material, 17, 115, 154, 156 Romenesko, James, 364
Recho, 380 Ronson, Jon, 516
reconstructed domain, 163 Rosenberg, Scott, 361, 366
Rosenzweig, Roy, 4, 345 pornography, 557–8
Rouse, A., 473 search engine optimization (SEO), 248–50
Royal Library of Sweden, 32 social media platforms, 252–3
RSS (Really Simple Syndication or Rich Site spam, 567–8
Summary), 377, 569–70 TF-IDF Search Engine, 355
Russell, Andrew, 65, 67 user behaviour, 88, 90, 93
Russia, 47, 94, 379 see also digital intermediaries; Google
Second Life, 585
sacred texts, 483–4 second-level domains (SLD), 422–3
Said, Khaled, 46 secularisation, 480, 483
SalahEldeen, H.M., 132 secularisation theory, 482
Salton, Gerard, 243 Secure Digital Music Initiative, 496
samachar.com, 392 seed-URL aggregation, 448, 451
Sanger, L., 318, 319, 321, 322 Segal, Ben, 275
São Paulo, 47 self-archiving, 47
sawnet.org, 395 selfies, 42, 48
scarcity, 115, 116–18, 345–6 Selfish Gene, The (Dawkins, 1989), 507–8
scatterplots, 169, 179 self-organizing maps (SOMs), 180, 181
Schafer, R.M., 500 self-performance, 360–3, 365, 367, 368
Schafer, Valérie, 421 self-publication, 599
Schank, Roger, 259, 264 self-regulation, 333, 339
schema.org, 265, 266, 267 self-selection, 89
Schiller, Dan, 340 Semantic Web, 257–67
Schneider, S.M., 9, 45 genealogies of Web semantics, 266
Schreiber, Guus, 264 knowledge representation, 257–61
Schrock, Andrew, 306 on the web, 261–5
SchülerVZ, 379 Semiology of Graphics (Bertin, 1964), 169
science and technology studies (STS), 73–83 semiotic layers, 18
hyperlink network analysis, 130 Sendall, Mike, 274
and information and communication technologies server-side web archiving, 142
(ICTs), 74 sexism, 582
and media studies, 74–5, 79 sexting, 552
and web governance, 77–82 sexual assault, 580
and web history, 75–7 Shachaf, P., 585
Science journal, 154–5 Share button, Facebook, 234
scope, 449, 451–3 Shifman, Limor, 513–14
Scott, Jason, 348, 349–50 SHINE interface, 10–11, 38
screen dumps, 26 Shirky, Clay, 315–16
screen movies, 19, 23, 25–6 SHOUTcast, 497
screencast documentaries, 43, 49–50, 53 Shulgin, A., 436
screenshots, 23, 33 Siles, Ignacio, 77, 360, 362, 363, 364, 367, 375
script, 538–9 Silicon Valley, 283
character-based, 539–41, 542, 544, 546 Simple HTML Ontology Extension (SHOE), 263
search simplicity, 367
relevance, 246, 250 SINA Weibo, 379, 530
social, 252–3 Sinclair, John, 331
search advertising, 336–7 single-site approach, 158
search engine optimization (SEO), 248–50 single-site histories, 43–4, 49
search engine results pages (SERP), 249, 250 SixDegrees, 375, 378
search engine wars, 246 SLAC (Stanford Linear Accelerator Center), 7
search engines, 231, 242–53 Slashdot, 247
bias, 251–2 Smales, Andrew, 363
co-evolving search, 253 small multiples, 182
early history, 243–4 Smarr, Larry, 276, 280
GeoCities, 354–6 smartphones, 47, 91, 291–2, 304–6, 379–80
before Google, 244–7 news consumption, 401
Google Revolution, 247–8 ownership, 408
trolling, 585–6 dial-up modems, 493
use, 409 interfaces, 496
Smithsonian Institution, 33, 37, 39n, 44 Internet broadcast, 495
Snapchat, 5, 47, 380, 406 Internet Underground Music Archive
Snowden, Edward, 10 (IUMA), 495
social bookmarking, 377 MIDI (Musical Instrument Digital Interface)
social buttons, 234 files, 494–5
social graph, Facebook, 129, 130–1 Quantum Link (Q-Link), 495
social media, 372–81 sound cards, 494
apps and mobile usage, 379–80 web archives
conceptualization, 373–4 institutional archives, 499
diversification, proliferation and commercialization, Internet Underground Music Archive
378–9 (IUMA), 498
and network analysis, 129, 130–1 streaming, 499–500
news consumption, 405–8 YouTube, 498–9
precursors of, 374–5 sound cards, 494
primary uses, 105 sound histories, 497–500
rise in the wake of Web 2.0, 376–8 SoundCloud, 379, 500
see also social media platforms; social networking soundscape, 500
sites (SNS) source critique, 161–2
social media archiving, 46–9 source types, 416–17
social media platforms South by Southwest, 362
crawling, 53–4n South Korea see Korea
effects on the hyperlink, 234–5 space, 21, 25
memes, 517 spam, 233–4, 564–75
in order of importance, 48 development, 566–8
religion, 487 links, 568–71
search engines, 252–3 phishing, 571–2
spam, 573–4 post-human social Web, 573
see also social networking sites (SNS) spamdexing, 248–9
social media trolling, 583–5 Spanish Fork, 321–2
social network analysis see network analysis spatial contextualisation of the web, 64–6
social network data, 126 spatial inconsistency, 22
social networking apps, 380 spatial metaphors, 346, 506
social networking sites (SNS), 366, 368 spear-phishing, 572
memes, 513, 516 splogs, 570
rise of, 377–9 Spotify, 4, 500
sound and music, 496–7 Spyglass, 284
surveillance advertising, 338 stabilization, 360, 369n
see also social media platforms Stallman, Richard, 317
social sciences, 156 Stanford Artificial Intelligence Lab, 243–4
computational, 154, 155, 157 Stanford Linear Accelerator Center (SLAC), 7
social search, 252–3 Stanford University Libraries, 7
Social Theory after the Internet, 414 Star, Susan Leigh, 76, 81
social Web, 572–5 Starr, Paul, 332
Society of Autistically Handicapped, 473–4 statistical significance, 149
sociology of religion, 479 statistics resources, 150
software, 362–4, 368 Steinbeck, John, xxvii
Something Awful, 512 Sterne, J., 501
Song Gang, 526–7 Stevenson, Michael, 62–3
sonic web, 491–501 stop words, 179
development Storyspace, 429
audio blogging and podcasting, 497 storytelling, 404
audio formats, 496 Strachey, Christopher, 428
audio streaming, 496 strategic arbitration, 485
bulletin board systems (BBSs), 493 Strava, 379
CD-ROM drives, 494 streaming, 380, 496, 499–500, 533
Stroke Width Transform (SWT), 145–6 TimeGates, 189, 193, 196, 199
structural equivalence, 126 TimeMaps, 189, 193, 195, 199
structural inheritance network, 260 timestamps, 132
StudiVZ, 379 Tinder, 380, 585–6
subject-based archives, 36–7 TinyURL, 235
subsumption, 260 Toaplan, 512–13
suicide, 435, 584 topic modeling, 354
Suicide Girls, 555 Toronto Political Parties and Political Interest Groups
surfing, 245, 248, 390 collection, University of Toronto, 10, 11
surveillance advertising, 335–6, 337–41 Traces, 380
Sweden, 32 trackbacks, 233
Swedish National Library, 45 trackers, 51, 53
New York Times, 52
tagging, 377 see also HTTrack
Taiwan, 542–3 tracking, 42, 43
Talkomatic, 375 TrackingExcavator, 79
TANA.org, 395 transclusion, 221, 222, 230
Taneja, H., 94 transgender porn, 556
tapes, 491, 497, 499, 500 transience, 380
TCAT (Twitter Capture and Analysis Tool), 48 translation notion, 76
TCP/IP, 539, 542, 544 transnational self, 390–1
TechCrunch, 377 transnational subjectivity, 391–2
technological sublime, 480 transnational subjects, 394
technology see science and technology studies (STS) transnational web domains, 418, 423
technology companies, 405, 406–8 transnational web history, 64–5
Technorati, 252 Trantor, 495
teletext, 537 triangulation tool, 162, 164n
televideo, 537 TripAdvisor, 159, 379
Telex technology, 402 trolls and trolling, 577–87
Telnet, 541, 542 pre-web and Web 1.0, 579–82
temporal graphs see dynamic graphs Web 2.0, 582–6
temporal inconsistency, 22, 24, 25 Trompette, P., 76
temporal validity, 207 Trouillot, Michel, 390
temporal violations, 205–8 Trump, Donald, 13, 48, 515
Tencent Inc. Tsinghua University, 524, 525
QQ messaging app, 527–9 Tufte, E., 169, 182
WeChat, 531–2 tumblelogs, 366
Tencent Weibo, 379 Tumblr, 366, 367, 378, 517
terminological adequacy, 260 Twitter
territoriality, 443 archiving, 4, 82, 142
text, 139, 146–50 copyright, 512
religious, 483–4 event-harvesting, 119
visualisation techniques, 178–81 launch, 406
see also non-textual elements memes, 511, 516, 517
text analysis, 178–81 microblogging, 366, 367, 378
text mining, 121 Net art, 436
‘text versus human’ debate, 101–3 network analysis, 130, 131
TF-IDF Search Engine, 355 news consumption, 401, 407, 408
Thailand, 47 Periscope, 380
Thatcher, Margaret, 161–2 posting links, 235
Thelwall, M., 35, 130, 140, 159 religious texts, 484
theology, 484 search engines, 252
Tidal, 497 spam, 573
time, 21, 190 surveillance advertising, 338
change over, 181–3 TCAT (Twitter Capture and Analysis Tool), 48
see also Memento Protocol; temporal inconsistency trolling, 584–5
Time Travel, 199–201 Twitter Revolution, 48
two-way links, 230, 234 selfies, 47
typewriters, 538, 539 sexting, 552
televideo, 537
.uk, 416, 417, 418, 420, 422, 423 voice search, 91
UK Legal Deposit Web Archive, 39 web advertising, 330–1, 332–3
Ukraine Orange Revolution, 119–20 ad network and dotcom bubble, 334–6
Ulman, Amalia, 47 electronic billboards and corporate home
UN Declaration of Human Rights, 101 pages, 333–4
Uncharted: Big Data as a Lens on Human Culture search advertising, 336–7
(Aiden and Michel, 2013), 345 surveillance advertising, 337–41
Uncle Roger (Malloy, 1987-88), 428, 429–31, 432 web domains, 35–6
Understanding How Computing Has Changed the web visibility, 94
World (Misa, 2007), 59 White House, 50
UNICODE, 526, 540 WhiteHouse.gov 2002, 147
United Kingdom see also American Wikipedia
Arts and Humanities Research Council (AHRC), Unity, 436
596 Universal Links, Apple, 236
Big UK Domain Data for the Arts and Humanities Universal Resource Identifiers (URIs), 192, 193, 194,
(BUDDAH), 9–10, 10–11, 415, 466, 595 201–4, 230, 235
British Library, 10, 17, 38, 420, 596–8 URI search, 37
Conservative Party, 16 Universal Resource Locators (URLs), 33, 37, 160–1,
e-government, 594 219, 274, 275
legal deposit, 32 seed-URL aggregation, 448, 451
Measles-Mumps-Rubella (MMR) vaccine crisis, TinyURL, 235
464–75 university archives, 37
Alan Phillips report, 471–3 University of Bologna, 7, 116–18
MMR The Facts website, 468–71, 472 University of North Carolina (UNC), 471–3
Society of Autistically Handicapped, 473–4 University of Karlsruhe, 520, 522
national web archive, 7, 33, 35, 36–7, 46, 81, University of Minnesota, 64
164n, 420, 599 University of Peking, 523
national web domain, 416, 417, 418, 420, 422, 423 University of Toronto, 10, 11
Open UK Web Archive, 39 Unix, 190, 243–4
second-level domains (SLD), 422–3 updating, dynamics of, 20–1
teletext, 537 URIs (Universal Resource Identifiers), 192, 193, 194,
UK Legal Deposit Web Archive, 39 201–4, 230, 235
United States URI search, 37
2016 elections, 82 URLs (Universal Resource Locators), 33, 37, 160–1,
blogs, 360–9 219, 274, 275
Congress, 339 seed-URL aggregation, 448, 451
Defense Advanced Research Projects Agency TinyURL, 235
(DARPA), 263 Usenet, 272, 374, 375, 512, 539
historiography, 3 Dejanews, 247
Indian immigrants, 388 Godwin’s Law, 509–10
community-oriented homepages, 391–7 pornography, 552
personal homepages, 390–1 spam, 564, 565, 566, 567, 572
information overload, 90 trolling, 580–2
internet access, 66 user data, 335, 337, 338, 339
Library of Congress, 37, 45, 48, 49, 512 see also personal data; surveillance advertising
website, 2002, 146 user participation, 63
National Center for Supercomputing Applications blogosphere, 233
(NCSA), 276, 279, 280–1, 283–4 e-literature, 435–6
National Information Infrastructure, 293n memes, 513
news consumption, 400–10 private versus public data, 104–9
mobile news apps, 408–10 protection from harm, 103–4
newspapers on the web, 402–5 social media, 376
social media, 405–8 ‘text versus human’ debate, 101–3
privacy, 105 Wikipedia, 80
user-generated content (UGC), 373 hive plots, 174–5, 176
utopia, 480, 484–5 metrics, 177–8
text, 178–81
vaccination crisis see Measles-Mumps-Rubella (MMR) algorithmic approaches, 180–1
vaccine crisis simple text visualization, 178–80
Vaclav Havel collection, 46 VKontakte, 379
Van Dam, Andries, 222, 223, 224, 225 voice, 364
Van Dijck, J., 376 voice search, 91
van Harmelen, Frank, 264 Von Tetzchner, Jon Stephenson, 301, 303
vandalism, 583
Vary response header, 192 WAIS, 272
Vaughan, L., 35, 140, 159 Wakefield, Andrew, 466, 467, 473
venture capital, 336 Wales, J., 318, 319–20, 321
Verifiability policy, Wikipedia (WP:V), Wall Street Journal, 334
323–4 walled gardens, 236, 237, 297, 301, 334, 530–1
Verizon, 339 Waller, V., 90
vernacular creativity, 522 wanadoo.fr, 422
ASCII graphics, 524 Wanamaker, John, 340
blogs, 529–31 Wandex, 244–5
bulletin board systems (BBSs), 523–5 wangluo, 521, 529, 533
emails, 522–3 WAP (Wireless Access Protocol), 290, 546
future of, 532–3 war blogs, 364
homepages, 525–7 see also political blogs
QQ messaging app, 527–9 WARC files, 10, 13n, 33, 38, 145, 350
WeChat, 531–2 Washington Post, 9–10
vernacular culture, 522 WAT (Web Archive Transformation), 38, 447
vernacular writing, 522 Wayback Machine, 23, 26–7, 44, 156
Veronica, 244 close reading, 8–9
version, 22 GeoCities, 348, 349
Victory Garden (Moulthrop, 1992), 430 keyword searching, 164n
video aggregator sites, 558–60 link ripper software, 54n
video blogs, 363, 366, 377 Open Wayback, 34
video files, 19, 23, 27 single-site histories, 49–50, 53n, 158
video sharing, 585 sounds of the Web, 493, 498, 501
video streaming, 380 temporal inconsistency, 24
videocasts, 373 URI search, 37
video-sharing sites, 378 visuality, 492
see also YouTube White House, 8
Vimeo, 377 Wealth of Networks (Benkler, 2006), 315
Vinck, D., 76 Weaving the Web (Berners-Lee, 2000), 230
Vine, 378 web
Viola, 272, 276, 277, 279, 294n as infrastructure, 571
virtual communities, 579 and laws, 571
virtual ethnography, 458 see also live web; online web; web archives
Virtual Homelands (Mallapragada, 2014), 389 Web 1.0, 227
virtual rape, 580 trolling, 577, 578–82
Virtual Reality Markup Language (VRML), 285 Web 2.0, 62, 227, 435, 436, 492, 573
virtuality, 66–7 blogs, 359
visibility, 93–4 human agency, 574
visual bias, 492, 497, 498 mobile, 303–4
visualisation, 38 social media, 376–8
visualization techniques, 168–83 trolling, 577, 578–9, 582–6
change over time, 181–3 see also mobile Web
networks, 170–8 Web 3.0, 265
edge bundling, 173, 174, 175 web advertising, 330–1, 332–3
edge pruning, 172–3 ad network and dotcom bubble, 334–6
heatmaps, 175–6, 177, 178 CommerceNet Consortium, 286
digital news organizations, 404 web banners see banner ads
display advertising, 333 web browsers see browsers
electronic billboards and corporate home pages, web collections
333–4 crawled, 23–4, 26–7
Facebook, 252 types, 25–7
Google, 247–8, 251 web commerce, 285–7
HotWired, 246 web cookies, 42, 43, 51, 53
Mosaic, 282 web crawlers, 6
programmatic advertising, 338 web crawling, 19–20, 21, 99, 419
search advertising, 336–7 bias, 35
splogs, 570 blocking of, 141
surveillance advertising, 335–6, 337–41 crawled web collection, 23–4, 26–7
Web apps, 305 desktop applications, 33–4
web architecture, 242, 245 First World War research project, 443, 444, 447–8
web archive corpus, 142–5, 161–2 Issue Crawler, 130
Web Archive Transformation (WAT), 38, 447 Navicrawler, 459
web archives social media platforms, 48
access and use, 37–9, 158, 160–1 see also Common Crawl Foundation; Common
completeness, 159–60, 162 Crawl project
contextualisation, 68–9, 158–9 web design, 361, 551, 552, 553, 557
creation, 33–5 web domains see national web domains; transnational
First World War research project, 441–60 web domains
collection construction, 442–6 Web Enact software, 47
metadata, 446–54 web experiences, collective and individual, 66–9
results, 454–60 web governance, 64–6, 77–82
future of research, 600–1 web graphs, 446–54
historians’ engagement with, 593–8 dynamic graphs, 453–4
as historical source, xxix, 3–13, 161–2 qualitative analysis, 449–53
characteristics, 21–4 quantity and quality, 447–9
computational access to web archives, 9–12 web history
scholarly use, 7–8, 24–7, 49–51 contextualisation, 59–70, 158–9
third wave of digital history, 12 boundaries and spatial contextualisation, 64–6
Wayback Machine, 8–9, 8 collective and individual experiences, 66–9
limitations, 475n diachronic and chronological perspective, 60–4
national web domains, 419–21 two dimensions of, xxix
network analysis, 131–4 Web of Sounds, 492–7
non-textual elements, 162–3 Web Ontology Language (OWL), 263–4
ontological status, 154 web rings, 231
scope and structure, 35–7 Web Science, 88
sonic web, 497–500 Web semantics, 257–67
web crawling, 99–100 genealogies, 266
see also GeoCities; Internet Archive; national knowledge representation, 257–61
web archives on the web, 261–5
Web Archives for Longitudinal Knowledge (WALK) web sphere, 9, 38, 44, 45, 46, 51
Project, 133 web sphere analysis, 156
web archiving, 19–21 web surfer, 44
client-side web archiving, 142 web surfing, 245, 248, 390
history of, 30–3, 42–3, 481 web uses, 86–96
autobiographical archiving, 46–9 information infrastructure, 94–5
event-based special collections, 44–5 information-seeking and disciplinarity, 87–9
national web archives, 45–6, 49, 53n information-seeking in everyday life, 89–92
single-site histories, 43–4, 49 web visibility, 93–4
as a microcosm of Internet governance, 81–2 Wikipedia, 92–3
server-side web archiving, 142 web visibility, 93–4
see also Internet Archive; web history Web90 - Patrimoine, Mémoires et Histoire du Web dans
web archiving software, 53–4n les années 1990, 416, 421–2
Web art see Net art Webarchivist project, 44–5
‘WebART: Enabling Scholarly Research in the KB Wireless Markup Language (WML), 299
Web Archive’, 415 women, 318–19, 395, 432, 586
web-based banking, 393–4 WordPress, 368, 377
webcams, 559 World Made Meme, The (Milner, 2016), 513, 514
WebCitation, 194 World Trade Center, 44
WebCrawler, 245 World War I research project, 441–60
webindia.com, 392–3 collection construction, 442–6
weblog community, 233, 362, 377 metadata, 446–54
weblogs see blogs dynamic graphs, 453–4
Weblouvre, 422 qualitative analysis, 449–53
webometrics, 130, 140 quantity and quality, 447–9
webpages, text on, 139, 146–50 results, 454–60
webrary, 38 World Wide Web Consortium (W3C), 64, 65, 67, 78–9,
Webrecorder, 33, 47, 598 196, 197, 230, 235, 285
Websense, 574 Browser War II, 288
website categories, 143 ‘Little History of the World Wide Web, A’
websites, 140, 231, 375, 403, 558 (W3C, 2000), 492–3
China, 526–7 mobile Web, 298, 303–4
typology, 445 Resource Description Framework (RDF), 262
Webster, Peter, 7, 487 Web Ontology Language (OWL), 263–4
WeChat, 379, 531–2 World Wide Web (WWW), 256–7, 272–6, 375
Wei, Pei, 272, 276, 279, 280 worship, 483, 485–6, 487
see also Viola WPP, 337
Weibo, 530 writing see script
Weixin see WeChat Wu, A.X., 94
WELL, The, 429–30, 431
Weltevrede, E., 134 Xanadu, 61, 220–2, 229–30, 345
West, A., 92 XING, 379
West, Kanye, 514 XingSound, 495
Western Union, 392 XMCD, 495
WET, 38 XML, 262
Wget, 33
‘What you can get is what can be assembled’, 23 Yahoo! 245, 248, 334, 336, 337, 569
‘What you see is what you get’, 23 GeoCities, 347, 348
WhatsApp, 378, 380 surveillance advertising, 339
White House, 8, 50 Ye Sang, 523
WhiteHouse.gov 2002, 147 YOUKU, 379
Wiggins, Bradley, 513, 514 YouTube, 4–5, 337, 377, 378
Wikipedia, 79–80, 90, 92–3, 315–26, 377 advertising, 338
Free and Open Software (F/OSS), 316–22 archiving, 48
quality anxiety, 322–3 audio archives, 498–9
reliable sources, 324–6 religion, 487
self-organizing maps (SOMs), 181 spam, 573
trolling, 585 trolling, 585
Verifiability policy (WP:V), 323–4 user participation, 435–6
wikis, 376–7 ‘You’ve Got Mail!’, 492, 495, 498
WikiWikiWeb, 375, 377 .yu, 7–8, 134, 161, 162, 163, 418, 424
Williams, Evan, 363, 367 Yugoslavia, 7–8, 37, 43, 134, 161, 162, 163, 424
Williams, Raymond, 340, 367, 396
Winamp, 492, 496, 497, 498 Zero Wing, 512–13
Windows 95, 289, 546 Zhai Zhenming, 526
Windows Media Audio, 496 Zhou Xiaoping, 530
Winer, David, 234 Zhu Ling, 523
Winner, Langdon, 81 Zimmer, M., 109
Winograd, Terry, 259 Zippered Lists, 222
Wired, 368, 509, 510, 512 Zuckerberg, Mark, 338
Wireless Access Protocol (WAP), 290, 298–9, 300–1, 546