HW #1 NLP: 1 System Overview
04/02/2016
1 System Overview
My model is implemented in Python using the sklearn-crfsuite package, as suggested by the
assignment. The features are encoded as suggested in the paper: each character of each word
has a feature set containing the sequences of up to k characters to its left and right. In
addition, a bias feature was added whose value is always 1, regardless of the input. I tried
further features to improve the model, but with little success; this is explained in more
detail in section 2.
Below is an example of a feature set for a single two-character word, "by" (using k = 1):
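As a rough illustration of the encoding described above, here is a minimal sketch. It is not the code used for this assignment: the function names char_features and word_features and the key names bias, left_n and right_n are hypothetical, and it assumes that contexts cut off by a word boundary are simply shortened or dropped.

```python
def char_features(word, i, k):
    """Feature dict for the character at position i of word (hypothetical key names)."""
    feats = {"bias": 1.0}  # constant bias feature, always 1
    for n in range(1, k + 1):
        left = word[max(0, i - n):i]   # up to n characters to the left
        right = word[i + 1:i + 1 + n]  # up to n characters to the right
        if left:  # assumption: empty contexts at word boundaries are omitted
            feats[f"left_{n}"] = left
        if right:
            feats[f"right_{n}"] = right
    return feats


def word_features(word, k):
    """One feature dict per character of the word."""
    return [char_features(word, i, k) for i in range(len(word))]


print(word_features("by", k=1))
# [{'bias': 1.0, 'right_1': 'y'}, {'bias': 1.0, 'left_1': 'b'}]
```

A list of dicts like this, one per character, is the per-sequence input shape that sklearn-crfsuite expects, so a training set would be a list of such lists with a parallel list of label sequences.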
thing is that T and F had exactly the same scores across the three K-Folds, but I don't really
know how to explain this (I ran all the validations twice to make sure that the data was right).
     K    ?   MI   CV     P      R      F1
Q    4    7   36   0.907  0.892  0.821  0.855
H    6    5    6   0.913  0.89   0.86   0.87
T    6    5   16   0.921  0.936  0.899  0.917
F    6    5   16   0.921  0.936  0.899  0.917
     ?    ?    MI   Precision  Recall  F1
Q   11  105   100   0.884      0.722   0.795
H   11  105   100   0.906      0.815   0.858
T   11  105   100   0.936      0.834   0.882
T    8  104    67   0.896      0.877   0.886
T    6  104    21   0.920      0.842   0.880
F    6  104    21   0.911      0.892   0.902