
Text Mining and Classification

Karianne Bergen
kbergen@stanford.edu
Institute for Computational and Mathematical Engineering,
Stanford University

Machine Learning Short Course | August 11-15, 2014


Text Classification
• Determine a characteristic of a document based on the text:
  – Author identification
  – Sentiment analysis (e.g. positive vs. negative review)
  – Subject or topic category
  – Spam filtering



Text Classification

[Figure: scam email example]
http://www.theshedonline.org.au/activities/activity/scam-email-examples



Document Features
• How do we generate a set of input features from a text document to pass to the machine learning algorithm?
  – Bag of words / term-document matrix
  – N-grams



Bag-of-Words Model
• Representation of text data in terms of frequencies of words from a dictionary
  – The grammar and ordering of words are ignored
  – Just keep the (unordered) list of words that appear and the number of times they appear
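A minimal base-R sketch of this idea for a single document (the example string and variable names are just for illustration):

# lowercase, split on whitespace, and count occurrences of each word
words <- strsplit(tolower("one fish two fish"), "\\s+")[[1]]
table(words)   # grammar and ordering are discarded; only counts remain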



Bag-of-Words Model



Term-Document Matrix
• The term-document matrix is useful for working with text data
  – Sparse matrix describing the frequency of words occurring in a collection of documents
  – Rows represent terms/words, columns represent individual documents
  – Entry (i, j) gives the number of occurrences of term i in document j



Term-Document Matrix
• Example
  – Documents:
    1. "one fish two fish"
    2. "red fish blue fish"
    3. "black fish blue fish"
    4. "old fish new fish"
  – Terms: "one", "two", "fish", "red", "blue", "black", "old", "new"



Term-Document Matrix

              Document
Term        1   2   3   4
"one"       1   0   0   0
"two"       1   0   0   0
"fish"      2   2   2   2
"red"       0   1   0   0
"blue"      0   1   1   0
"black"     0   0   1   0
"old"       0   0   0   1
"new"       0   0   0   1
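The same matrix can be built programmatically; a small sketch using the tm package (assuming it is installed) is:

library(tm)
docs <- c("one fish two fish", "red fish blue fish",
          "black fish blue fish", "old fish new fish")
tdm <- TermDocumentMatrix(VCorpus(VectorSource(docs)))
inspect(tdm)   # rows are terms, columns are documents, entries are counts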



N-gram
• N-gram: a contiguous sequence of n items (e.g. words or characters)
• Used for language modeling - features retain information related to word ordering
• e.g. "It's kind of fun to do the impossible."
       - Walt Disney
  – 3-grams: "It's kind of," "kind of fun," "of fun to," "fun to do," "to do the," "do the impossible"
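A minimal base-R sketch of word n-grams (the function and variable names are just illustrative):

word_ngrams <- function(text, n = 3) {
  # strip punctuation, lowercase, split into words
  w <- strsplit(tolower(gsub("[[:punct:]]", "", text)), "\\s+")[[1]]
  # paste each run of n consecutive words into one n-gram
  vapply(seq_len(length(w) - n + 1),
         function(i) paste(w[i:(i + n - 1)], collapse = " "), character(1))
}
word_ngrams("It's kind of fun to do the impossible.")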



Text Mining: NMF
• Unsupervised learning method for dimensionality reduction
• NMF is a type of matrix factorization
  – Original matrix and factors only contain positive or zero values
  – For dimensionality reduction and clustering
  – Non-negativity of factors makes the results easier to interpret than other factorizations



Nonnegative Matrix Factorization
• NMF factors a matrix X into the product of two non-negative matrices:
      X ≈ W H,   W ≥ 0,  H ≥ 0
• W is the "dictionary" matrix and its columns are "metafeatures"; H is the coefficient matrix
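A common way to compute the factors (standard practice, not stated on the slide) is to minimize the reconstruction error in the Frobenius norm:

      min_{W ≥ 0, H ≥ 0}  ||X − W H||_F^2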
 



NMF for Text
• X ∈ ℝ^(t×d): term-document matrix
• W ∈ ℝ^(t×k): k columns ("metafeatures"), each representing a collection of terms
• H ∈ ℝ^(k×d): coefficients
• Each document is represented as a positive combination of the k metafeatures
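In symbols (a direct consequence of the factorization above), document j, i.e. column j of X, satisfies

      x_j ≈ W h_j = Σ_{i=1}^{k} H_{ij} w_i,   with H_{ij} ≥ 0,

where w_i denotes the i-th metafeature (column of W).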



NMF for Text
• Example
  – Documents:
    1. "one fish two fish"
    2. "red fish blue fish"
    3. "old fish new fish"
    4. "some are red and some are blue"
    5. "some are old and some are new"
  – Terms: "one", "two", "fish", "red", "blue", "old", "new", "some", "are", "and"



NMF for Text:
X (term-document matrix)

              Document
Term        1   2   3   4   5
"one"       1   0   0   0   0
"two"       1   0   0   0   0
"fish"      2   2   2   0   0
"red"       0   1   0   1   0
"blue"      0   1   0   1   0
"old"       0   0   1   0   1
"new"       0   0   1   0   1
"some"      0   0   0   2   2
"are"       0   0   0   2   2
"and"       0   0   0   1   1



NMF for Text:
W (dictionary matrix)

                                  Metafeature
Term      "one"+"two"   "fish"   "red"+"blue"   "old"+"new"   "some"+"are"+0.5·"and"
"one"          1           0           0             0                  0
"two"          1           0           0             0                  0
"fish"         0           1           0             0                  0
"red"          0           0           1             0                  0
"blue"         0           0           1             0                  0
"old"          0           0           0             1                  0
"new"          0           0           0             1                  0
"some"         0           0           0             0                  1
"are"          0           0           0             0                  1
"and"          0           0           0             0                  0.5
NMF for Text:
H (coefficient matrix)

                                     Document
Metafeature                      1   2   3   4   5
"one"+"two"                      1   0   0   0   0
"fish"                           2   2   2   0   0
"red"+"blue"                     0   1   0   1   0
"old"+"new"                      0   0   1   0   1
"some"+"are"+0.5·"and"           0   0   0   2   2

• e.g. "one fish two fish"  →  "one" "fish" "two" "fish"
       = 1 × "one" + 1 × "two" + 2 × "fish"
  OR   = 1 × ("one" + "two") + 2 × "fish"
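To make the factorization concrete, a small R sketch (variable names are just for illustration) that enters these W and H matrices and checks that their product reproduces the term-document matrix X above:

terms <- c("one","two","fish","red","blue","old","new","some","are","and")
W <- matrix(0, nrow = 10, ncol = 5, dimnames = list(terms, NULL))
W[c("one","two"), 1] <- 1; W["fish", 2] <- 1
W[c("red","blue"), 3] <- 1; W[c("old","new"), 4] <- 1
W[c("some","are"), 5] <- 1; W["and", 5] <- 0.5
H <- matrix(0, nrow = 5, ncol = 5)
H[1, 1] <- 1; H[2, 1:3] <- 2; H[3, c(2, 4)] <- 1; H[4, c(3, 5)] <- 1; H[5, 4:5] <- 2
W %*% H   # reproduces X exactly for this toy example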
NMF for Text
• Metafeatures in the dictionary matrix W may reveal interesting patterns in the data
  – Positivity of metafeatures helps with interpretability
  – Groupings of words in metafeatures often occur together in the same document
    • e.g. "red" and "blue" or "old" and "new"



NMF for Text
• e.g. Text from news articles in the business section
  – 2500 articles, 50 authors
  – 948 terms after pre-processing (stemming, stop word removal, removal of infrequent terms)
  – Apply NMF factorization with k = 25
  – Metafeatures in dictionary factor W roughly correspond to topics within the text
  – Representation of text: 948 terms → 25 topics



NMF for Text

Ford Motor Co. Thursday announced sweeping
organizational changes and a major shake-up of its
senior management, replacing the head of its
global automotive operations. The moves include
combining Ford's four components divisions into a
single organization with 75,000 employees and $14
billion in revenues, and a consolidation of the
automaker's vehicle product development centers to
three from five.

→  { "ford" "motor" "thursday" "announc" "chang"
     "major" "senior" "manag" "replac" … }



NMF for Text

Metafeature 1: cargo 0.47, air 0.47, airlin 0.24, servic 0.18, kong 0.16, hong 0.16, aircraft 0.13, airport 0.13, flight 0.12

Metafeature 2: internet 0.43, comput 0.42, corp 0.30, use 0.29, system 0.20, microsoft 0.19, softwar 0.18, inc 0.16, technolog 0.16, industri 0.16, network 0.15, product 0.13, servic 0.13, busi 0.11

Metafeature 3: china 0.73, beij 0.31, chines 0.30, state 0.21, offici 0.20, said 0.19, trade 0.14, foreign 0.13, unite 0.11

Metafeature 4: plant 0.47, worker 0.35, uaw 0.24, strike 0.21, ford 0.19, part 0.17, local 0.15, auto 0.15, said 0.14, motor 0.13, truck 0.13, chrysler 0.13, work 0.13, automak 0.13, union 0.13, contract 0.11





NMF for Images

[Figure: an image approximated (≈) as a sum (+) of nonnegative basis images]



# NMF in R

# install.packages("NMF") # NMF package (provides nmf)
library(NMF)

# scale each column of the nonnegative data matrix to sum to 1
V <- scale(data, center = FALSE, scale = colSums(data))

k <- 20
res <- nmf(V, k)

W <- basis(res)      # get dictionary matrix W
H <- coef(res)       # get coefficient matrix H
V.hat <- fitted(res) # get estimate W*H
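# A possible follow-up (a hedged sketch, assuming the rows of the input
# matrix were named with the terms): list the top-weighted terms in each
# metafeature, i.e. in each column of W
top.terms <- apply(W, 2, function(w) rownames(W)[order(w, decreasing = TRUE)[1:10]])
top.terms   # one column of top terms per metafeature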



Text Classification
• Naïve Bayes
  – Simple algorithm based on Bayes' rule from statistics
  – Uses the bag-of-words model for documents
  – Has been shown to be very effective for text classification



Naïve Bayes
• NB chooses the most likely class label based on the following assumption about the data:
  – Independent feature (word) model – the presence of any word in a document is unrelated to the presence/absence of other words
• This assumption makes it easier to combine the contributions of features: there is no need to model interactions between words
• Even though this assumption rarely holds, NB still works well in practice



Naïve Bayes
• Compute Prob(Y = j | X) for each class j and choose the class with the greatest probability
• Bayesian classifiers:
      Prob(Y | X) = Prob(Y) Prob(X | Y) / Prob(X)
• For Naïve Bayes:
      Ŷ = argmax_Y  Prob(Y) ∏_{j=1}^{d} Prob(X_j | Y)
  – Prob(Y), Prob(X_j | Y) estimated using training data
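In practice (a standard implementation detail, not shown on the slide), the product is evaluated in log space to avoid numerical underflow when d is large:

      Ŷ = argmax_Y [ log Prob(Y) + Σ_{j=1}^{d} log Prob(X_j | Y) ]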



Naïve Bayes
• Advantages:
  – Does not require a large training set to obtain good performance, especially in text applications
  – Independence assumption leads to faster computations
  – Is not sensitive to irrelevant features
• Disadvantages:
  – Independence-of-features assumption
  – Good classifier, but poor probability estimates



Author identification
• Collection of poems – William Shakespeare or Robert Frost?
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;
Then took the other, as just as fair,
And having perhaps the better claim…

Shall I compare thee to a summer's day?


Thou art more lovely and more temperate.
Rough winds do shake the darling buds of May,
And summer's lease hath all too short a date.
Sometime too hot the eye of heaven shines,
And often is his gold complexion dimmed;
And every fair from fair sometime declines,
By chance, or nature's changing course, untrimmed;…



Author identification
install.packages("tm") # text mining

library(tm) # loads library

# shakespeare
s.dir = "shakespeare"
s.docs <- Corpus(DirSource(directory = s.dir,
                           encoding = "UTF-8"))

# frost
f.dir = "frost"
f.docs <- Corpus(DirSource(directory = f.dir,
                           encoding = "UTF-8"))



cleanCorpus<-function(corpus){

# apply stemming
corpus <-tm_map(corpus, stemDocument, lazy=TRUE)

# remove punctuation
corpus.tmp <- tm_map(corpus,removePunctuation)

# remove white spaces


corpus.tmp <- tm_map(corpus.tmp,stripWhitespace)

# remove stop words


corpus.tmp <-
tm_map(corpus.tmp,removeWords,stopwords("en"))

return(corpus.tmp)
}



d.docs <- c(s.docs, f.docs) # combine data sets
d.cldocs <- cleanCorpus(d.docs) # preprocessing

# forms document-term matrix
d.tdm <- DocumentTermMatrix(d.cldocs)

# removes infrequent terms
d.tdm <- removeSparseTerms(d.tdm, 0.97)

> dim(d.tdm) # [ #docs, #numterms ]
[1] 264 518

> inspect(d.tdm) # inspect entries in document-term matrix



# exploring the data
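# Note: the per-author matrices s.tdm and f.tdm used below are not built in
# the code shown above; a minimal sketch of one way to construct them,
# mirroring the combined pipeline, is:
s.tdm <- removeSparseTerms(DocumentTermMatrix(cleanCorpus(s.docs)), 0.97)
f.tdm <- removeSparseTerms(DocumentTermMatrix(cleanCorpus(f.docs)), 0.97)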

# terms appearing > 55 times in shakespeare’s poems


> findFreqTerms(s.tdm,55)
[1] "and" "but" "doth" "eye" "for" "heart" "love"
"mine" "sweet" "that" "the" "thee" "thi" "thou"
"time" "yet"

# terms appearing > 55 times in frost’s poems


> findFreqTerms(f.tdm,55)
[1] "and" "back" "but" "come" "know" "like" "look"
"make" "one" "say" "see" "that" "the" "they" "way"
"what" "with" "you"



# exploring the data

# identify associations between terms - shakespeare


> findAssocs(s.tdm, "winter", 0.2)
winter
summer 0.50
age 0.40
youth 0.34
like 0.24
old 0.23
beauti 0.21
seen 0.21



# exploring the data

# identify associations between terms - frost


> findAssocs(f.tdm, "winter", 0.5)
winter
climb 0.66
town 0.62
toward 0.57
side 0.55
black 0.53
mountain 0.52



# assign class labels to each document,
# based on the document author

class.names = c('shakespeare','frost')
d.class = c(rep(class.names[1], nrow(s.tdm)),
            rep(class.names[2], nrow(f.tdm)))

d.class = as.factor(d.class)
> levels(d.class)
[1] "frost" "shakespeare"



# separate data into training and test sets

set.seed(123) # set random seed


train_frac = 0.6 # fraction of data for training
train_idx = sample.int(nrow(d.tdm), size =
ceiling(nrow(d.tdm) * train_frac),
replace = FALSE);
train_idx <- sort(train_idx)
test_idx <- setdiff(1:nrow(d.tdm), train_idx)

d.tdm.train <- d.tdm[train_idx,]


d.tdm.test <- d.tdm[test_idx,]
d.class.train <- d.class[train_idx]
d.class.test <- d.class[test_idx]



# separate data into training and test sets
> d.tdm.train
<<DocumentTermMatrix (documents: 159, terms: 518)>>
Non-/sparse entries : 6167/76195
Sparsity : 93%
Maximal term length : 9
Weighting : term frequency (tf)

> d.tdm.test
<<DocumentTermMatrix (documents: 105, terms: 518)>>
Non-/sparse entries : 4578/49812
Sparsity : 92%
Maximal term length : 9
Weighting : term frequency (tf)



# CART

install.packages("rpart") # install cart package


library(rpart) # load library

d.frame.train <- data.frame(as.matrix(d.tdm.train));


d.frame.train$class <- as.factor(d.class.train)

treefit <- rpart(class ~., data = d.frame.train)

> summary(treefit)
Variables actually used in tree construction:
[1] doth eyes green grow let thee which



Decision  Tree  result  

plot(treefit, uniform=TRUE)
text(treefit, use.n=T)



•  William  Shakespeare  or  Robert  Frost?  

Two roads diverged in a yellow wood,


And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;
Then took the other, as just as fair,
And having perhaps the better claim…

Shall I compare thee to a summer's day?


Thou art more lovely and more temperate.
Rough winds do shake the darling buds of May,
And summer's lease hath all too short a date.
Sometime too hot the eye of heaven shines,
And often is his gold complexion dimmed;
And every fair from fair sometime declines,
By chance, or nature's changing course, untrimmed;…



# CART
Node number 1: 159 observations, complexity param=0.3947368
predicted class=shakespeare expected loss=0.4779874 P(node) =1
class counts: 76 83
probabilities: 0.478 0.522
left son=2 (120 obs) right son=3 (39 obs)
Primary splits:
thee < 0.0007022472 to the left, improve=21.14, (0 missing)
thi < 0.01323529 to the left, improve=21.14, (0 missing)
thou < 0.003511236 to the left, improve=19.58, (0 missing)
doth < 0.0007022472 to the left, improve=16.21, (0 missing)
love < 0.01906318 to the left, improve=14.89, (0 missing)
Surrogate splits:
thou < 0.003511236 to the left, agree=0.906, (0 split)
thi < 0.0007022472 to the left, agree=0.899, (0 split)
art < 0.005088523 to the left, agree=0.836, (0 split)
thine < 0.0007022472 to the left, agree=0.824,(0 split)
hast < 0.009433962 to the left, agree=0.805, (0 split)



# CART
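# Note: d.frame.test is not created in the code shown on the slides; a
# minimal sketch of the matching test data frame would be:
d.frame.test <- data.frame(as.matrix(d.tdm.test))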

predclass <- predict(treefit, d.frame.test)

colNames = colnames(predclass)
d.class.pred <- as.factor(colNames[max.col(predclass)])
tree.table <- table(d.class.pred, d.class.test,
                    dnn = list('predicted','actual'))

> tree.table
             actual
predicted     frost shakespeare
  frost          55          12
  shakespeare     1          37



# CART

errorRate<-function(table){
TP = table[1,1]; # true positives
TN = table[2,2]; # true negatives
FP = table[1,2]; # false positives
FN = table[2,1]; # false negatives
error_rate = (FP + FN)/(TP + TN + FP + FN)
return(error_rate)
}

> errorRate(tree.table)
[1] 0.1238095






COME unto these yellow sands,
And then take hands:
Court'sied when you have, and kiss'd,--
The wild waves whist,--
Foot it featly here and there;
And, sweet sprites, the burthen bear.
Hark, hark!
Bow, wow,
The watch-dogs bark:
Bow, wow.
Hark, hark! I hear
The strain of strutting chanticleer
Cry, Cock-a-diddle-dow!

True Author: Shakespeare
Predicted: Frost

How countlessly they congregate
O'er our tumultuous snow,
Which flows in shapes as tall as trees
When wintry winds do blow!--
As if with keenness for our fate,
Our faltering few steps on
To white rest, and a place of rest
Invisible at dawn,--
And yet with neither love nor hate,
Those stars like some snow-white
Minerva's snow-white marble eyes
Without the gift of sight.

True Author: Frost
Predicted: Shakespeare



# KNN
library(class)
knn_res <- knn(as.matrix(d.tdm.train), as.matrix(d.tdm.test),
               d.class.train, k = 5, prob = TRUE)
knn.table <- table(knn_res, d.class.test,
                   dnn = list('predicted','actual'))
> knn.table
             actual
predicted     frost shakespeare
  frost          56          33
  shakespeare     0          16

> errorRate(knn.table)
[1] 0.3142857



# naive bayes
library(e1071) # provides naiveBayes()

nb_classifier <- naiveBayes(as.matrix(d.tdm.train),
                            d.class.train, laplace = 1)
res <- predict(nb_classifier, as.matrix(d.tdm.test),
               type = "raw", threshold = 0.5)
> res
frost shakespeare
[1,] 2.265614e-244 1.000000e+00
[2,] 2.285289e-165 1.000000e+00
[3,] 5.696532e-67 1.000000e+00

[104,] 1.000000e+00 0.000000e+00
[105,] 1.000000e+00 0.000000e+00
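# The confusion table res.table used on a later slide is not constructed
# here; a minimal sketch (nb.pred is an illustrative name), taking the most
# probable class for each test document:
nb.pred <- as.factor(colnames(res)[max.col(res)])
res.table <- table(nb.pred, d.class.test, dnn = list('predicted','actual'))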



# naive bayes

> nb_classifier$apriori # breakdown of training data


d.class.train
frost shakespeare
77 82




> errorRate(res.table)
[1] 0.1619048



NMF for Text
• 'cargo' 'air' 'airlin' 'servic' 'kong' 'hong' 'aircraft' 'airport' 'flight'
  (0.4711  0.4696  0.2349  0.1772  0.1648  0.1583  0.1328  0.1271  0.1245)
• 'internet' 'comput' 'corp' 'use' 'system' 'microsoft' 'softwar' 'inc' 'technolog' 'industri' 'network' 'product' 'servic' 'busi'
  (0.4285  0.4165  0.2990  0.2885  0.1958  0.1883  0.1776  0.1630  0.1618  0.1565  0.1519  0.1347  0.1320  0.1146)
• 'china' 'beij' 'chines' 'state' 'offici' 'said' 'trade' 'foreign' 'unite'
  (0.7297  0.3059  0.3034  0.2089  0.2038  0.1884  0.1400  0.1337  0.1147)
• 'plant' 'worker' 'uaw' 'strike' 'ford' 'part' 'local' 'auto' 'said' 'motor' 'truck' 'chrysler' 'work' 'automak' 'union' 'contract' 'agreement' 'three' 'mich'
  (0.4729  0.3485  0.2438  0.2141  0.1877  0.1692  0.1498  0.1452  0.1382  0.1310  0.1305  0.1291  0.1281  0.1264  0.1261  0.1130  0.1044  0.1040  0.1023)



# CART
Node number 1: 159 observations, complexity param=0.3947368
predicted class=shakespeare expected loss=0.4779874 P(node) =1
class counts: 76 83
probabilities: 0.478 0.522
left son=2 (120 obs) right son=3 (39 obs)
Primary splits:
thee < 0.5 to the left, improve=21.14719, (0 missing)
thi < 0.5 to the left, improve=20.35459, (0 missing)
thou < 0.5 to the left, improve=19.57953, (0 missing)
doth < 0.5 to the left, improve=16.20745, (0 missing)
tree < 0.5 to the right, improve=13.91526, (0 missing)
Surrogate splits:
thou < 0.5 to the left, agree=0.906, adj=0.615, (0 split)
thi < 0.5 to the left, agree=0.899, adj=0.590, (0 split)
art < 0.5 to the left, agree=0.830, adj=0.308, (0 split)
thine < 0.5 to the left, agree=0.824, adj=0.282, (0 split)
hast < 0.5 to the left, agree=0.805, adj=0.205, (0 split)



Sample  R  code  
> Auto=read.table("Auto.data")
> fix(Auto)
> dim(Auto)
[1] 392 9

> names(Auto)
[1] "mpg" "cylinders " "displacement" "horsepower "
[5] "weight" "acceleration" "year" "origin"
[9] "name"

