Lecture 2
What is data?
GEK2050/GET1030
Learning Objectives
● To define data
● To consider the relationship between data and interpretation
https://www.nytimes.com/interactive/2020/12/11/opinion/culture/diversity-publishing-industry.html
https://www.kaggle.com/xvivancos/analyzing-the-lord-of-the-rings-data
Using data
Decisions
Computers and the humanities
Lecture 2.1
Defining data
What is data?
DIKW pyramid
Using data
Data → Sequence of steps → Answer
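A minimal Python sketch of the flow from data, through a sequence of steps, to an answer. The records, the question ("who speaks most often?") and the function name are hypothetical and only illustrate the idea; they are not taken from any dataset in this lecture.

```python
from collections import Counter

# Hypothetical data: (speaker, line) pairs with placeholder text.
DATA = [
    ("Frodo", "placeholder line 1"),
    ("Sam", "placeholder line 2"),
    ("Frodo", "placeholder line 3"),
    ("Gandalf", "placeholder line 4"),
]


def most_frequent_speaker(records):
    """Sequence of steps: select a field, count it, take the top entry."""
    speakers = [speaker for speaker, _ in records]  # step 1: select the speaker field
    counts = Counter(speakers)                      # step 2: count occurrences
    name, count = counts.most_common(1)[0]          # step 3: keep the most frequent
    return name, count


# The answer produced by applying the steps to the data.
answer = most_frequent_speaker(DATA)
print(answer)
```

The same data could answer a different question if the sequence of steps changed, which is why the steps are kept in a separate function here.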
Is this data?
● No
● If you read the novels in a digital format, you are treating them as information (according to our definition so far).
● What would you need to do to treat them as data?
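One possible answer to the question above, sketched in Python: break the text into units that can be counted and compared. The sample sentence and the function names are hypothetical, and word counting is only one of many ways to turn a text into data.

```python
import re
from collections import Counter


def text_to_tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Count how often each token occurs in the text."""
    return Counter(text_to_tokens(text))


# Placeholder sentence standing in for the full text of a novel.
SAMPLE = "The ring is lost. The ring is found."

tokens = text_to_tokens(SAMPLE)
freqs = word_frequencies(SAMPLE)
print(len(tokens), freqs["ring"])
```

Read as a sentence, the sample is information. Once tokenized and counted, it becomes something a sequence of steps can operate on.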
https://www.kaggle.com/xvivancos/analyzing-the-lord-of-the-rings-data
Using data
Data → Sequence of steps → Answer
Context
● Data is context-dependent
● The question should not be "what are data" but "when are data" (Borgman, 2015)
Types of data
● Free text
● Categories
● Coordinates
● Floating-point numbers
● Etc.
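As an illustration, a single record might combine several of these types. The sketch below is a hypothetical Python example: the `Record` class, its field names and all values are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Record:
    description: str               # free text
    genre: str                     # category (one of a fixed set of labels)
    location: Tuple[float, float]  # coordinates as (latitude, longitude)
    rating: float                  # floating-point number


def field_types(record):
    """Map each field name to the name of the Python type that stores it."""
    return {name: type(value).__name__ for name, value in vars(record).items()}


# Placeholder values; none of them come from a real dataset.
example = Record(
    description="A placeholder description.",
    genre="novel",
    location=(1.0, 103.0),
    rating=4.5,
)
print(field_types(example))
```

Note that the category and the free text are both stored as strings. The difference between them lies in how they are meant to be used, not in how the computer stores them.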
Types of data
Characteristics of a dataset
Different Objectives
Data elsewhere
Lecture 2.3
Problems of using data
Potential issues
A cultural example
Potential issues
Baseball [...] has statistical rigor. Its gurus have an immense data set at hand, almost all of it directly related to the performance of players in the game. Moreover, their data is highly relevant to the outcomes they are trying to predict. This may sound obvious, but as we'll see [...], the folks building Weapons of Math Destruction routinely lack data for the behaviors they're most interested in. So they substitute stand-in data, or proxies. They draw statistical correlations between a person's zip code or language patterns and her potential to pay back a loan or handle a job. These correlations are discriminatory, and some of them are illegal. Baseball models, for the most part, don't use proxies because they use pertinent inputs like balls, strikes, and hits (Cathy O’Neil, Weapons of Math Destruction, 17–18).
Lecture 2.4
Data in the humanities
Types of data
● Big data
● Smart data
Smart data
Big data
From Schöch (2013)
Problems of data
● Standardization
● Incompleteness
● Inaccuracy
What are potential problems in your group projects?
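The three problems listed above can be checked for with a small Python sketch. Everything in it is an assumption made for illustration: the records, the field names, the plausible year range, and the helper names `standardize` and `audit`. A range check only catches implausible values; it cannot confirm that a plausible value is accurate.

```python
def standardize(value):
    """Standardization: trim and lowercase so ' Novel' and 'novel' match."""
    return value.strip().lower() if isinstance(value, str) else value


def audit(records, required, valid_ranges):
    """Report missing required fields and values outside a plausible range."""
    report = {"incomplete": [], "out_of_range": []}
    for index, record in enumerate(records):
        # Incompleteness: required fields that are absent or empty.
        missing = [f for f in required if record.get(f) in (None, "")]
        if missing:
            report["incomplete"].append((index, missing))
        # Possible inaccuracy: present values outside the expected range.
        for field, (low, high) in valid_ranges.items():
            value = record.get(field)
            if value is not None and not (low <= value <= high):
                report["out_of_range"].append((index, field, value))
    return report


# Hypothetical records with one problem of each kind.
RECORDS = [
    {"title": "Book A", "genre": " Novel", "year": 1954},
    {"title": "Book B", "genre": "novel", "year": None},
    {"title": "Book C", "genre": "POEM", "year": 2954},
]

cleaned = [{**r, "genre": standardize(r["genre"])} for r in RECORDS]
report = audit(
    cleaned,
    required=["title", "genre", "year"],
    valid_ranges={"year": (1400, 2100)},
)
print(report)
```

Running a check like this early in a group project shows how much of the dataset is usable before any analysis begins.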