
Computational Linguistics, University of Passau

Lab session, Prof. Dr. Annette Hautli-Janisz


Summer 2022

Week 6: POS tagging

Task 1: HMM
Build the Viterbi lattice for the sentence ‘Janet will back the bill’ given the transition
probabilities and the observation probabilities from SLP (Jurafsky & Martin, Speech and
Language Processing), Chapter 8. Make sure you understand the procedure.
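A lattice built by hand can be checked against a short implementation. The following is a minimal Python sketch, assuming the HMM is given as plain dictionaries: `start_p[tag]` for transitions out of the start state, `trans_p[prev][cur]` for P(cur | prev) and `emit_p[tag][word]` for P(word | tag). These names and the two-tag toy model at the bottom are hypothetical; to check Task 1, fill the dictionaries with the values from the SLP tables instead. Each cell is computed as v_t(j) = max_i v_{t-1}(i) · a_ij · b_j(o_t), and a back-pointer records which i achieved the maximum. The sketch does not model a transition into an end state.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(
    words: Sequence[str],
    tags: Sequence[str],
    start_p: Dict[str, float],             # P(tag | <s>)
    trans_p: Dict[str, Dict[str, float]],  # trans_p[prev][cur] = P(cur | prev)
    emit_p: Dict[str, Dict[str, float]],   # emit_p[tag][word] = P(word | tag)
) -> Tuple[List[str], float]:
    """Return the most probable tag sequence for `words` and its probability."""
    # Initialisation: v_1(j) = P(j | <s>) * P(o_1 | j)
    column = {t: start_p.get(t, 0.0) * emit_p[t].get(words[0], 0.0) for t in tags}
    backpointers: List[Dict[str, str]] = []

    # Recursion: v_t(j) = max_i v_{t-1}(i) * a_ij * b_j(o_t)
    for word in words[1:]:
        new_column, pointers = {}, {}
        for cur in tags:
            best_prev = max(tags, key=lambda prev: column[prev] * trans_p[prev].get(cur, 0.0))
            best_score = column[best_prev] * trans_p[best_prev].get(cur, 0.0)
            new_column[cur] = best_score * emit_p[cur].get(word, 0.0)
            pointers[cur] = best_prev
        column = new_column
        backpointers.append(pointers)

    # Termination: take the best final cell and follow the back-pointers
    last = max(tags, key=lambda t: column[t])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, column[last]


# Hypothetical two-tag toy model (NOT the SLP numbers), just to exercise the code
TAGS = ["N", "V"]
START = {"N": 0.7, "V": 0.3}
TRANS = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT = {"N": {"fish": 0.6, "swim": 0.1}, "V": {"fish": 0.3, "swim": 0.7}}

path, prob = viterbi(["fish", "swim"], TAGS, START, TRANS, EMIT)
print(path, prob)  # best path is ['N', 'V']
```

The sketch multiplies raw probabilities so that each number matches a cell of the hand-drawn lattice; practical taggers usually add log probabilities instead to avoid underflow on long sentences.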

Task 2: Tagging errors & evaluation


a) Find one tagging error in each of the following sentences that are tagged with the
Penn Treebank tagset:

1. I/PRP need/VBP a/DT flight/NN from/IN Atlanta/NN

2. Does/VBZ this/DT flight/NN serve/VB dinner/NNS

3. I/PRP have/VB a/DT friend/NN living/VBG in/IN Denver/NNP

4. Can/VBP you/PRP list/VB the/DT nonstop/JJ afternoon/NN flights/NNS

b) Use the Penn Treebank tagset to tag each word in the following sentences from
Damon Runyon’s short stories. You may ignore punctuation. Some of these are quite
difficult; do your best.

1. It is a nice night.

2. This crap game is over a garage in Fifty-second Street.

3. Nobody ever takes the newspapers she sells.

4. He is a tall, skinny guy with a long, sad, mean-looking kisser, and a mournful
voice.

5. I am sitting in Mindy’s restaurant putting on the gefillte fish, which is a dish I am
very fond of,...

6. When a guy and a doll get to taking peeks back and forth at each other, why there
you are indeed.

Before we discuss the analysis, get together with a fellow student (or group) and compare
your tags. What is the ‘observed agreement’ between you for sentences 2 and 5, i.e., on
what percentage of the words did you assign the same tag?
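Observed agreement is the share of tokens on which two annotators chose the same tag. A minimal Python sketch, assuming both annotations are written in the word/TAG format used in a) and that both annotators tokenised the sentences identically; the function names and the example annotations are hypothetical.

```python
from typing import List, Sequence, Tuple


def parse_tagged(sentence: str) -> List[Tuple[str, str]]:
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        # split at the LAST '/', so a word such as 'and/or' keeps its slash
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def observed_agreement(annotator_a: Sequence[str], annotator_b: Sequence[str]) -> float:
    """Percentage of tokens that received the same tag, pooled over all sentences."""
    if len(annotator_a) != len(annotator_b):
        raise ValueError("both annotators must tag the same sentences")
    matches = total = 0
    for sent_a, sent_b in zip(annotator_a, annotator_b):
        pairs_a, pairs_b = parse_tagged(sent_a), parse_tagged(sent_b)
        if [w for w, _ in pairs_a] != [w for w, _ in pairs_b]:
            raise ValueError(f"tokenisation differs: {sent_a!r} vs {sent_b!r}")
        matches += sum(ta == tb for (_, ta), (_, tb) in zip(pairs_a, pairs_b))
        total += len(pairs_a)
    if total == 0:
        raise ValueError("no tokens to compare")
    return 100.0 * matches / total


# Hypothetical annotations (not the Runyon sentences)
student_1 = ["The/DT dog/NN barks/VBZ", "It/PRP sleeps/VBZ"]
student_2 = ["The/DT dog/NN barks/NNS", "It/PRP sleeps/VBZ"]
print(observed_agreement(student_1, student_2))  # 4 of 5 tokens agree
```

Observed agreement does not correct for agreement that would arise by chance; that correction is what κ in c) adds.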

c) Calculate Fleiss’ κ for more than two annotators (using data elicited impromptu from
students’ judgements in the session). If you need to catch up on the measure, have a
look at the Wikipedia page on Fleiss’ κ (https://en.wikipedia.org/wiki/Fleiss%27_kappa).
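Fleiss’ κ compares the mean observed agreement P̄ with the agreement P̄_e expected by chance from the overall category proportions: κ = (P̄ − P̄_e) / (1 − P̄_e). A minimal Python sketch of that computation, following the definition on the Wikipedia page, assuming every token was tagged by the same number of annotators; the input layout (one list of tags per token) and the function name are illustrative choices, not a fixed convention.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa; ratings[i] holds the labels all n annotators gave item i."""
    N = len(ratings)
    if N == 0:
        raise ValueError("no items to compare")
    n = len(ratings[0])
    if n < 2 or any(len(item) != n for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # n_ij: how many annotators put item i into category j
    counts: List[Counter] = [Counter(item) for item in ratings]
    categories = set().union(*counts)

    # p_j: share of all N*n assignments that went to category j
    p = [sum(c[j] for c in counts) / (N * n) for j in categories]

    # P_i: proportion of agreeing annotator pairs on item i
    P_items = [(sum(v * v for v in c.values()) - n) / (n * (n - 1)) for c in counts]

    P_bar = sum(P_items) / N        # mean observed agreement
    P_e = sum(pj * pj for pj in p)  # agreement expected by chance
    if P_e == 1.0:
        raise ValueError("kappa is undefined when every rating uses one category")
    return (P_bar - P_e) / (1 - P_e)


# Hypothetical session data: three annotators, four tokens
session = [["DT", "DT", "DT"], ["JJ", "JJ", "NN"], ["NN", "NN", "NN"], ["VBZ", "NNS", "VBZ"]]
print(round(fleiss_kappa(session), 3))
```

For the data collected in the session, each inner list holds the tags all annotators gave to one token, e.g. ["NN", "NN", "JJ"] for a token that two annotators tagged NN and one tagged JJ.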

Task 3: Stemming versus lemmatization


a) The Porter Stemmer (https://tartarus.org/martin/PorterStemmer/) is one of the
most widely used stemmers for English. See http://snowball.tartarus.org/algorithms/porter/diffs.txt
for how the vocabulary (left column) is reduced to its stemmed version (right
column). What does that tell you about the properties of stemming?
b) Compare stemming to lemmatization, for instance using the example below. What are
the differences between the two approaches?

#>       Original     Lemma     Tags
# 0      the          the       DT
# 1      bats         bat       NNS
# 2      saw          see       VVD
# 3      the          the       DT
# 4      cats         cat       NNS
# 5      with         with      IN
# 6      best         good      JJS
# 7      stripes      stripe    NNS
# 8      hanging      hang      VVG
# 9      upside       upside    RB
# 10     down         down      RB
# 11     by           by        IN
# 12     their        their     PP$
# 13     feet         foot      NNS
# 14     unplugging   unplug    VBG
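To make the contrast in b) concrete, here is a minimal Python sketch. The stemmer implements only Steps 1a and 1b of Porter's algorithm (plural -s, -eed, -ed and -ing), so it is a simplified stand-in and not the full Porter Stemmer linked in a). The lemmatizer is a hypothetical dictionary lookup built from the table rows; the function names and `LEXICON` are assumptions for illustration.

```python
from itertools import groupby

VOWELS = "aeiou"


def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: 'y' is a consonant at the start of a word or after a vowel."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Porter's m in [C](VC)^m[V], i.e. the number of vowel-consonant sequences."""
    pattern = ["c" if is_consonant(stem, i) else "v" for i in range(len(stem))]
    return "".join(key for key, _ in groupby(pattern)).count("vc")


def has_vowel(stem: str) -> bool:
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_cvc(stem: str) -> bool:
    """Porter's *o: stem ends consonant-vowel-consonant, last letter not w, x or y."""
    n = len(stem)
    return (n >= 3 and is_consonant(stem, n - 3) and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1) and stem[-1] not in "wxy")


def tidy_after_1b(stem: str) -> str:
    """Repairs after -ed/-ing removal (e.g. hopp -> hop, conflat -> conflate)."""
    if stem.endswith(("at", "bl", "iz")):
        return stem + "e"
    double = len(stem) >= 2 and stem[-1] == stem[-2] and is_consonant(stem, len(stem) - 1)
    if double and stem[-1] not in "lsz":
        return stem[:-1]
    if measure(stem) == 1 and ends_cvc(stem):
        return stem + "e"
    return stem


def porter_step1(word: str) -> str:
    """Only Steps 1a and 1b of the Porter stemmer; the later steps are omitted."""
    word = word.lower()
    # Step 1a: plural -s endings
    if word.endswith(("sses", "ies")):
        word = word[:-2]
    elif word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    # Step 1b: -eed, -ed, -ing
    if word.endswith("eed"):
        if measure(word[:-3]) > 0:
            word = word[:-1]
    else:
        for suffix in ("ed", "ing"):
            if word.endswith(suffix) and has_vowel(word[:-len(suffix)]):
                word = tidy_after_1b(word[:-len(suffix)])
                break
    return word


# Hypothetical mini-lexicon copied from the table rows; keys are
# (word form, first letter of the tag).
LEXICON = {
    ("bats", "N"): "bat", ("saw", "V"): "see", ("feet", "N"): "foot",
    ("best", "J"): "good", ("hanging", "V"): "hang", ("unplugging", "V"): "unplug",
}


def lemmatize(word: str, tag: str) -> str:
    # tag[0] maps VBD/VVD -> V, NNS -> N, JJS -> J; unknown forms are returned unchanged
    return LEXICON.get((word.lower(), tag[0]), word.lower())


for word, tag in [("bats", "NNS"), ("saw", "VVD"), ("feet", "NNS"),
                  ("best", "JJS"), ("hanging", "VVG"), ("unplugging", "VBG")]:
    print(f"{word:12} stem={porter_step1(word):10} lemma={lemmatize(word, tag)}")
```

The stemmer looks at the string alone, so it handles regular forms (hanging → hang, unplugging → unplug) but leaves saw, feet and best untouched. The lookup also uses the tag, which is what lets it map saw/VVD to see while saw as a noun stays saw.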

NLP applications
If you ever want to do an MSc thesis in NLP, have a look at the Stanford NER and POS
tagger here: https://nlp.stanford.edu/software/

Python users can make use of the spaCy library (https://spacy.io/) or of NLTK, which is
older and less accurate.
