
Exercise 2:

Load the Brown corpus from NLTK (nltk.corpus.brown) with the fiction category (pass the category
to the loader functions). From the corpus, load the tagged and untagged sentences. Make sure
that the tags use the universal tag set.

To evaluate the taggers, divide the tagged sentences into a 75-25 split for training the tagging
algorithms and testing them. Report the accuracy on both the training data and the testing data.
from nltk.corpus import brown
brown_fiction_tagged = brown.tagged_sents(categories='fiction', tagset='universal')

# 75-25 split: the first 75% of the tagged sentences for training, the rest for testing
split = int(0.75 * len(brown_fiction_tagged))
brown_train = brown_fiction_tagged[:split]
brown_test = brown_fiction_tagged[split:]

from nltk.tag import untag

# Strip the tags from one test sentence to check the tagged/untagged forms
test_sent = untag(brown_test[0])
print("Tagged:   ", brown_test[0])
print("Untagged: ", test_sent)

from nltk import DefaultTagger

# Baseline: assign the tag 'NUM' to every token
default_tagger = DefaultTagger('NUM')
print('Accuracy on the training data: %4.1f%%' % (100.0 * default_tagger.evaluate(brown_train)))
print('Accuracy on the testing data:  %4.1f%%' % (100.0 * default_tagger.evaluate(brown_test)))

Submit the notebook that performs the tasks below.

1. Explore the performance of N-Gram taggers on the corpus.


a. Unigram Tagger
from nltk.corpus import brown
from nltk import UnigramTagger
brown_fiction_tagged = brown.tagged_sents(categories='fiction', tagset='universal')
split = int(0.75 * len(brown_fiction_tagged))
brown_train = brown_fiction_tagged[:split]
brown_test = brown_fiction_tagged[split:]
# Unigram tagger: assigns each word its most frequent tag in the training data
t0 = UnigramTagger(brown_train)
print('Accuracy: %4.1f%%' % (100.0 * t0.evaluate(brown_test)))
b. Unigram Tagger with a verb backoff
from nltk.corpus import brown
from nltk import DefaultTagger
from nltk import UnigramTagger
brown_fiction_tagged = brown.tagged_sents(categories='fiction', tagset='universal')
split = int(0.75 * len(brown_fiction_tagged))
brown_train = brown_fiction_tagged[:split]
brown_test = brown_fiction_tagged[split:]
# Back off to 'VERB' for unseen words ('VB' is not a universal tag set tag)
t0 = DefaultTagger('VERB')
t1 = UnigramTagger(brown_train, backoff=t0)
print('Accuracy: %4.1f%%' % (100.0 * t1.evaluate(brown_test)))
c. Trigram Tagger with Unigram Tagger and adjective backoff
from nltk.corpus import brown
from nltk import DefaultTagger
from nltk import UnigramTagger
from nltk import TrigramTagger
brown_fiction_tagged = brown.tagged_sents(categories='fiction', tagset='universal')
split = int(0.75 * len(brown_fiction_tagged))
brown_train = brown_fiction_tagged[:split]
brown_test = brown_fiction_tagged[split:]
# Backoff chain: trigram -> unigram -> default 'ADJ' ('JJ' is not a universal tag set tag)
t0 = DefaultTagger('ADJ')
t1 = UnigramTagger(brown_train, backoff=t0)
t2 = TrigramTagger(brown_train, backoff=t1)
print('Accuracy: %4.1f%%' % (100.0 * t2.evaluate(brown_test)))
d. Trigram Tagger with a Bigram Tagger backoff
from nltk.corpus import brown
from nltk import BigramTagger
from nltk import TrigramTagger
brown_fiction_tagged = brown.tagged_sents(categories='fiction', tagset='universal')
split = int(0.75 * len(brown_fiction_tagged))
brown_train = brown_fiction_tagged[:split]
brown_test = brown_fiction_tagged[split:]
# Trigram tagger that falls back to bigram context for unseen trigrams
t0 = BigramTagger(brown_train)
t1 = TrigramTagger(brown_train, backoff=t0)
print('Accuracy: %4.1f%%' % (100.0 * t1.evaluate(brown_test)))
2. Train an Averaged Perceptron Tagger with different numbers of iterations. Compare the
results across the different iteration counts.
a. 1 iteration
from nltk.corpus import brown
from nltk.tag.perceptron import PerceptronTagger
brown_fiction_tagged = brown.tagged_sents(categories='fiction', tagset='universal')
split = int(0.75 * len(brown_fiction_tagged))
brown_train = brown_fiction_tagged[:split]
brown_test = brown_fiction_tagged[split:]

# load=False starts from an empty model rather than NLTK's pretrained weights
perceptron_trained = PerceptronTagger(load=False)
perceptron_trained.train(brown_train, nr_iter=1)

print(perceptron_trained.evaluate(brown_train))
print(perceptron_trained.evaluate(brown_test))
b. 5 iterations
perceptron_trained = PerceptronTagger(load=False)  # fresh model so the runs are independent
perceptron_trained.train(brown_train, nr_iter=5)
print(perceptron_trained.evaluate(brown_train), perceptron_trained.evaluate(brown_test))
c. 10 iterations
perceptron_trained = PerceptronTagger(load=False)
perceptron_trained.train(brown_train, nr_iter=10)
print(perceptron_trained.evaluate(brown_train), perceptron_trained.evaluate(brown_test))
3. Train three Conditional Random Field taggers, each with a different custom feature
function. The feature functions must contain the features below: Model A should use
features a-c, Model B features a-e, and Model C all of the features. A sketch for
Model C is given after the list.
a. Previous, Current, and Next Word
b. 1-3 Character Prefix
c. 1-3 Character Suffix
d. Word is capitalized
e. Word contains a number
f. Word is first in the sentence
g. Word is last in the sentence
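
A minimal sketch of Model C built on NLTK's CRFTagger follows (this assumes the
python-crfsuite package is installed; the feature-name strings and the model file name
model_c.crf.tagger are illustrative choices, not part of the exercise). It reuses the
75-25 split from above; Models A and B follow by dropping the later feature groups.

from nltk.tag import CRFTagger

def model_c_features(tokens, idx):
    """Feature function covering features a-g; tokens is a list of word strings."""
    word = tokens[idx]
    feats = ['WORD_' + word]                      # a. current word
    if idx > 0:                                   # a. previous word
        feats.append('PREV_' + tokens[idx - 1])
    if idx < len(tokens) - 1:                     # a. next word
        feats.append('NEXT_' + tokens[idx + 1])
    for n in (1, 2, 3):                           # b./c. 1-3 character prefix and suffix
        feats.append('PREFIX_' + word[:n])
        feats.append('SUFFIX_' + word[-n:])
    if word[0].isupper():                         # d. word is capitalized
        feats.append('CAPITALIZED')
    if any(ch.isdigit() for ch in word):          # e. word contains a number
        feats.append('HAS_NUMBER')
    if idx == 0:                                  # f. word is first in the sentence
        feats.append('FIRST_WORD')
    if idx == len(tokens) - 1:                    # g. word is last in the sentence
        feats.append('LAST_WORD')
    return feats

crf_c = CRFTagger(feature_func=model_c_features)
crf_c.train(brown_train, 'model_c.crf.tagger')
print('Accuracy: %4.1f%%' % (100.0 * crf_c.evaluate(brown_test)))

For Models A and B, pass a variant of the function that returns only the a-c or a-e
features, respectively.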
