Lecture 5
Textual analysis
Objectives
GET1030
● Count words or metadata features of books, lyrics, subtitles (place, year, genre)
● Identify trends (in the usage of words, number of works published, etc.)
● Find patterns (which words tend to be more common in certain genres)
Simple measurements
Complex classification through machine learning
Distant reading
The quest for broad patterns in literary history through empirical methods. An analysis of 7000 titles.
A concept proposed by Franco Moretti (Stanford Literary Lab).
Moretti (2009)
A title with twenty words and one with two are not the same creature, one
larger and one smaller; they are different animals altogether. Different
styles.
Macroanalysis
Jockers (2013)
Michel et al. (2011)
Lexical diversity
What is a word?
● Lemmatized words:
○ A lemma is the dictionary (base) form of a word
■ Different forms of a word share one lemma: book, books → book
● Sense disambiguation:
○ Book: verb
○ Book: noun
● Kinds of words:
○ Functional/grammatical words, “a”, “the”, “in”
■ Often used for stylometry (analysis of style)
○ Lexical/content words, “cat”, “house”, “book”
■ Often kept for the analysis of themes
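The distinctions above can be sketched in a few lines of Python. This is only an illustration: the function-word list and the lemma table below are tiny, hand-made assumptions, not resources from the lecture. A real project would rely on a full stop-word list and a dictionary-based lemmatizer.

```python
import re
from collections import Counter

# Hypothetical, hand-made resources for illustration only.
FUNCTION_WORDS = {"a", "an", "the", "in", "on", "of", "and", "to", "is"}
LEMMAS = {"books": "book", "cats": "cat", "houses": "house"}

def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def lemmatize(token):
    """Look the token up in the toy lemma table; unknown tokens are returned unchanged."""
    return LEMMAS.get(token, token)

text = "The cat is in the house. The cats read a book and the books."
lemmas = [lemmatize(t) for t in tokenize(text)]

# Functional words (useful for stylometry) versus content words (useful for themes).
function_counts = Counter(t for t in lemmas if t in FUNCTION_WORDS)
content_counts = Counter(t for t in lemmas if t not in FUNCTION_WORDS)

print(function_counts.most_common(3))
print(content_counts.most_common(3))
```

Note that the sketch does not attempt sense disambiguation: “book” as a verb and “book” as a noun are counted together.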
https://pudding.cool/2017/02/vocabulary/index.html
1853-1890
https://voyant-tools.org/?corpus=2a9aa299a95d7eca47cf68d25f0382e7
Next week we will see how to import data into Voyant, perform different types of analysis, and then export the results for reuse in Python.
Type/word ratios
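A common reading of this measure is the number of distinct word forms (types) divided by the total number of words (tokens). The sketch below assumes that definition and a very simple tokenizer; the function name and the sample sentence are made up for illustration.

```python
import re

def type_token_ratio(text):
    """Return distinct words / total words, or 0.0 for a text with no words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

sample = "the cat sat on the mat and the dog sat too"
# 8 distinct words out of 11 words in total.
print(round(type_token_ratio(sample), 3))
```

The ratio tends to fall as texts get longer, so it is safest to compare texts (or samples) of similar length.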
Word trends
https://voyant-tools.org/?corpus=2a9aa299a95d7eca47cf68d25f0382e7
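One simple way to approximate a word trend is to cut a text into consecutive segments and compute the relative frequency of a word in each one. The sketch below assumes that approach; Voyant's own calculation may differ, and the function name and sample text are hypothetical.

```python
import re

def word_trend(text, word, segments=4):
    """Relative frequency of `word` in consecutive chunks of the text.

    The text is split into chunks of equal size (the last one may be shorter),
    producing at most `segments` chunks.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    size = max(1, -(-len(tokens) // segments))  # ceiling division
    trend = []
    for start in range(0, len(tokens), size):
        chunk = tokens[start:start + size]
        trend.append(chunk.count(word.lower()) / len(chunk))
    return trend

text = "love war love peace war war peace love"
print(word_trend(text, "war", segments=4))  # [0.5, 0.0, 1.0, 0.0]
```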
Concordances
http://www.opensourceshakespeare.org/concordance/findform.php
KWIC (keyword in context)
https://voyant-tools.org/?corpus=2a9aa299a95d7eca47cf68d25f0382e7
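A keyword-in-context listing can be sketched as follows: find every occurrence of a keyword and keep a few words on either side. The function name, the `width` parameter and the sample line are assumptions for illustration, not Voyant's implementation.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

text = "To be, or not to be, that is the question"
for left, key, right in kwic(text, "be", width=2):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>12} [{key}] {right}")
```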
Collocates
Words that tend to occur within the same “window” as another word. In this case, the window is a sentence.
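With a sentence as the window, collocates can be counted by looking at every sentence that contains the target word. This is a minimal sketch under simplifying assumptions: sentences are split on ., ! and ?, and each collocate is counted once per sentence. The function name and sample text are made up.

```python
import re
from collections import Counter

def sentence_collocates(text, target):
    """Count the words that share a sentence with `target` (once per sentence)."""
    target = target.lower()
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        if target in words:
            counts.update(w for w in set(words) if w != target)
    return counts

text = "The whale swam. The captain chased the whale. The captain slept."
collocates = sentence_collocates(text, "whale")
print(collocates.most_common(2))
```

Raw counts favour very frequent words such as “the”; in practice, function words are often filtered out or an association measure is used instead.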
Collocates can also be contiguous, where we are interested in words that occur next to
each other.
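Contiguous collocates can be approximated by counting pairs of adjacent words (bigrams). The sketch below is one simple way to do this; the function name and sample text are hypothetical.

```python
import re
from collections import Counter

def bigram_counts(text):
    """Count pairs of adjacent words. Pairs may span sentence boundaries,
    because punctuation is discarded by the tokenizer."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(tokens, tokens[1:]))

text = "New York is big. I love New York."
pairs = bigram_counts(text)
print(pairs.most_common(1))  # [(('new', 'york'), 2)]
```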
Hermeneuti.ca
http://hermeneuti.ca/name-games
http://gamestudies.org/1903
Machine Learning
Supervised ML
● Classification
○ Labelled examples used to train a model
■ Training set
■ Test set
Unsupervised ML
● Clustering
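To make the supervised workflow concrete, here is a toy sketch: a model is trained on labelled examples (training set) and then checked on examples it has not seen (test set). The classifier is a small Naive Bayes written from scratch, chosen only because it fits in a few lines; it is not the method used in the studies cited in this lecture, and the texts and labels are invented.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train(examples):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            # Add-one smoothing so an unseen word does not rule out a label.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy data: labelled examples for training, held-out examples for testing.
training_set = [
    ("the detective found the murder weapon", "crime"),
    ("a murder suspect fled the police", "crime"),
    ("she kissed him under the moon", "romance"),
    ("their love grew with every kiss", "romance"),
]
test_set = [
    ("the police arrested the suspect", "crime"),
    ("a kiss full of love", "romance"),
]

model = train(training_set)
correct = sum(predict(model, text) == label for text, label in test_set)
accuracy = correct / len(test_set)
print(accuracy)
```

Keeping the test set separate from the training set is what lets the accuracy say something about texts the model has never seen.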
Supervised ML example
Representation of character
across 104,000 books
Underwood et al. (2018)
Supervised ML example
Characterization omitting
names and obvious
gender markers (he/she)
Underwood et al. (2018)
Corpus Linguistics
Underwood et al. (2018)
References
Michel, Jean-Baptiste, Yuan Kui Shen, Aviva P. Aiden, Adrian Veres, Matthew K. Gray, Joseph P. Pickett, Dale Hoiberg, et al. 2011. “Quantitative Analysis of Culture Using Millions of Digitized Books.” Science 331 (6014): 176–82. https://doi.org/10.1126/science.1199644.
Underwood, William E., David Bamman, and Sabrina Lee. 2018. “The Transformation of Gender in English-Language Fiction.” Journal of Cultural Analytics 1 (1): 11035. https://doi.org/10.22148/16.019.
Jockers, Matthew Lee. 2013. Macroanalysis. Topics in the Digital Humanities. Urbana: University of Illinois Press.
Moretti, Franco. 2009. “Style, Inc. Reflections on Seven Thousand Titles (British Novels, 1740–1850).” Critical Inquiry 36 (1): 134–58. https://doi.org/10.1086/606125.
Rockwell, Geoffrey, and Stéfan Sinclair. 2016. Hermeneutica: Computer-Assisted Interpretation in the Humanities. Cambridge: MIT Press. http://www.jstor.org/stable/j.ctt1c0gm6h.