Exercise 2
Load the Brown corpus from NLTK (nltk.corpus.brown) restricted to the fiction category (pass the category
to the loader functions). From the corpus, load both the tagged and the untagged sentences, and make sure
that the tags use the universal tagset.
To evaluate the taggers, divide the tagged sentences with a 75-25 split: 75% for training the tagging
algorithms and 25% for testing them. Report the accuracy on both the training data and the testing data.
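Both reported numbers are token-level accuracy: the share of tokens whose predicted tag matches the gold tag. The sketch below computes that score by hand in plain Python, assuming the gold and predicted data are lists of sentences of (word, tag) pairs in the same order. token_accuracy is a hypothetical helper, not an NLTK function; as far as we know, NLTK's tagger accuracy method reports the same kind of score by re-tagging the untagged gold sentences and comparing tags token by token.

```python
def token_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the gold tag (hypothetical helper)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")
    correct = total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentence lengths differ")
        # Compare tags position by position; the words themselves are ignored
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, pred_sent):
            correct += gold_tag == pred_tag
            total += 1
    return correct / total if total else 0.0


gold = [[("The", "DET"), ("dog", "NOUN"), ("ran", "VERB")], [("Go", "VERB"), (".", ".")]]
pred = [[("The", "DET"), ("dog", "VERB"), ("ran", "VERB")], [("Go", "VERB"), (".", ".")]]
print(token_accuracy(gold, pred))  # 0.8 (4 of 5 tokens correct)
```

Because every token counts equally, long sentences weigh more than short ones in this score.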
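from nltk.corpus import brown
from nltk.tag import untag

# Tagged sentences (universal tagset) and untagged sentences, fiction only
brown_fiction_tagged = brown.tagged_sents(categories='fiction', tagset='universal')
brown_fiction_sents = brown.sents(categories='fiction')

# 75-25 split: first 75% of the sentences for training, remaining 25% for testing
split = int(len(brown_fiction_tagged) * 0.75)
brown_train = brown_fiction_tagged[:split]
brown_test = brown_fiction_tagged[split:]

# Untagged version of the first test sentence
test_sent = untag(brown_test[0])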
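a. 1 iteration
from nltk.tag.perceptron import PerceptronTagger
# load=False gives an untrained tagger; build a fresh one per run so weights
# learned by an earlier train() call do not carry over into the next run
perceptron_1 = PerceptronTagger(load=False)
perceptron_1.train(brown_train, nr_iter=1)
print(perceptron_1.accuracy(brown_train), perceptron_1.accuracy(brown_test))
b. 5 iterations
perceptron_5 = PerceptronTagger(load=False)
perceptron_5.train(brown_train, nr_iter=5)
print(perceptron_5.accuracy(brown_train), perceptron_5.accuracy(brown_test))
c. 10 iterations
perceptron_10 = PerceptronTagger(load=False)
perceptron_10.train(brown_train, nr_iter=10)
print(perceptron_10.accuracy(brown_train), perceptron_10.accuracy(brown_test))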
3. Train three Conditional Random Field (CRF) taggers, each with a different custom feature function
built from the features below. Model A should use features a-c, Model B should use features a-e,
and Model C should use all the features.
a. Previous, Current, and Next Word
b. 1-3 Character Prefix
c. 1-3 Character Suffix
d. Word is capitalized
e. Word contains a number
f. Word is first in the sentence
g. Word is last in the sentence
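A minimal sketch of the three feature functions, assuming the signature NLTK's CRFTagger expects from a custom feature_func: it receives the sentence's tokens and the index of the current token, and returns a list of feature strings. The feature names (WORD_, PREV_, CAPITALIZED, and so on) and the <S>/</S> boundary markers are illustrative choices, not anything NLTK requires. Each model builds on the previous one, so Model C's features are a superset of Model B's, which are a superset of Model A's.

```python
def features_a(tokens, idx):
    """Model A: features a-c (previous/current/next word, 1-3 char prefixes and suffixes)."""
    word = tokens[idx]
    feats = [
        "WORD_" + word,
        # "<S>" and "</S>" are illustrative markers for positions outside the sentence
        "PREV_" + (tokens[idx - 1] if idx > 0 else "<S>"),
        "NEXT_" + (tokens[idx + 1] if idx < len(tokens) - 1 else "</S>"),
    ]
    for n in range(1, 4):
        if len(word) >= n:  # skip prefixes/suffixes longer than the word itself
            feats.append("PRE%d_%s" % (n, word[:n]))
            feats.append("SUF%d_%s" % (n, word[-n:]))
    return feats


def features_b(tokens, idx):
    """Model B: features a-e (adds capitalization and digit presence)."""
    feats = features_a(tokens, idx)
    word = tokens[idx]
    if word[:1].isupper():
        feats.append("CAPITALIZED")
    if any(ch.isdigit() for ch in word):
        feats.append("HAS_NUM")
    return feats


def features_c(tokens, idx):
    """Model C: all features a-g (adds first/last position in the sentence)."""
    feats = features_b(tokens, idx)
    if idx == 0:
        feats.append("FIRST")
    if idx == len(tokens) - 1:
        feats.append("LAST")
    return feats


print(features_c(["The", "3", "dogs"], 0))
```

To train Model A, the function can be passed as CRFTagger(feature_func=features_a) (imported with from nltk.tag import CRFTagger, which needs the python-crfsuite package) and trained with crf_a.train(brown_train, 'model_a.crf.tagger'), where the model file name is only a placeholder. The trained tagger's accuracy method then reports training and testing accuracy in the same way as for the perceptron.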