
HW #1 NLP

Niccolo Campolungo 1770766

04/02/2016

1 System Overview
My model has been implemented in Python, using the sklearn-crfsuite package, as suggested
by the assignment. The features have been encoded, again, as suggested in the paper: each
character of each word has a feature set containing the sequences of (up to) k characters to its
left and right; furthermore, a bias feature was added, whose value is set to 1 regardless
of the input. More features were tried to improve the model, but with little success; this part
is explained in more detail in section 2.
Below is an example of a featureset for a single two-character word, by (using k = 1):

[{bias: 1.0, right_<w>: 1}, {bias: 1.0, left_<w>: 1, right_b: 1},


{bias: 1.0, left_b: 1, right_y: 1}, {bias: 1.0, left_y: 1, right_</w>: 1}]
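
For concreteness, below is a minimal sketch of a feature extractor that reproduces the dicts
above, treating each gap between adjacent symbols of <w> word </w> as a tagging position;
the function name and the way longer contexts are encoded for k > 1 are my own assumptions,
not prescribed by the assignment.

def gap_features(word, k=1):
    # Build one feature dict per gap between adjacent symbols of
    # <w> word </w>, using up to k symbols of context on each side.
    symbols = ['<w>'] + list(word) + ['</w>']
    featuresets = []
    for gap in range(len(symbols)):          # gap i sits just before symbols[i]
        feats = {'bias': 1.0}
        for j in range(1, k + 1):
            if gap - j >= 0:                 # j symbols to the left of the gap
                feats['left_' + ''.join(symbols[gap - j:gap])] = 1
            if gap + j <= len(symbols):      # j symbols to the right of the gap
                feats['right_' + ''.join(symbols[gap:gap + j])] = 1
        featuresets.append(feats)
    return featuresets

print(gap_features('by'))  # yields the four dicts shown above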

2 Results and Analysis


2.0 Feature additions
Throughout the phases reported in more detail below, I tried adding new features to enhance
the performance of the model. I tried adding simple features like word length, left and right
segment lengths, and character index, aiming at an overall increase of the score, without much
luck. Some of the features decreased the model's performance considerably (by up to 0.08),
whereas others just reduced it slightly, hence I opted to remove all of them.
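
For illustration, the discarded additions were of roughly this shape (the feature names are
mine, and in this gap-based encoding the character index coincides with the left segment
length):

def extra_features(feats, word, gap):
    # Candidate features that were tried and later removed
    # (illustrative names, not necessarily the exact ones used).
    feats['word_len'] = len(word)         # length of the whole word
    feats['left_len'] = gap               # segment length left of the gap
    feats['right_len'] = len(word) - gap  # segment length right of the gap
    return feats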

2.1 First phase, tuning


The first thing to do was to tune the k parameter. I implemented a simple for loop that
iterated over 20 values of k (1..20) to find the one with the highest F1 score. Without tuning
other parameters, the highest score was achieved with k = 11 (F1 = 0.882, P = 0.936, R = 0.834).
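
That loop amounts to the following sketch, assuming the gap_features extractor above,
pre-built word lists and label sequences (train_words, dev_words, y_train, y_dev), and a
binary label scheme in which '1' marks a split point; all of these names are assumptions.

import sklearn_crfsuite
from sklearn_crfsuite import metrics

best_k, best_f1 = None, 0.0
for k in range(1, 21):  # k = 1..20
    X_train = [gap_features(w, k) for w in train_words]
    X_dev = [gap_features(w, k) for w in dev_words]
    crf = sklearn_crfsuite.CRF(algorithm='lbfgs')
    crf.fit(X_train, y_train)
    y_pred = crf.predict(X_dev)
    f1 = metrics.flat_f1_score(y_dev, y_pred, average='binary', pos_label='1')
    if f1 > best_f1:
        best_k, best_f1 = k, f1
print(best_k, best_f1)  # k = 11 scored best in the runs reported above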

2.2 Second phase, parameters tuning


The real problem was finding the best combination among all the parameters; in general, this
can be done by cross-validating on the training data and checking against the development set.
The parameters of our Grid Search were λ and MI, whereas the Grid Search type chosen was
K-Fold, with K ranging from 4 to 6. As for λ and MI, the former was set to 10^-i, i = 1..5,
while the latter ranged from 1 to 100. All of this was iterated over different values of k,
going from 3 to 15 (since values lower than 3 and higher than 14 always gave much worse scores
with some fixed good parameters).
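
A sketch of that search, under the assumption that λ maps to the crfsuite L2 coefficient c2
and MI to max_iterations (sklearn-crfsuite's CRF is scikit-learn compatible, so GridSearchCV
can drive it); f1_scorer reuses the flat F1 metric from the previous sketch.

from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
import sklearn_crfsuite
from sklearn_crfsuite import metrics

f1_scorer = make_scorer(metrics.flat_f1_score,
                        average='binary', pos_label='1')
params = {
    'c2': [10 ** -i for i in range(1, 6)],  # λ = 10^-i, i = 1..5
    'max_iterations': list(range(1, 101)),  # MI = 1..100
}
for k in range(3, 16):                      # outer loop over k
    X = [gap_features(w, k) for w in train_words]
    for folds in (4, 5, 6):                 # K-Fold, K = 4..6
        gs = GridSearchCV(sklearn_crfsuite.CRF(algorithm='lbfgs'),
                          params, cv=folds, scoring=f1_scorer)
        gs.fit(X, y_train)
        print(k, folds, gs.best_params_, gs.best_score_)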

2.3 Third phase, training set subsets


As requested in the assignment, some [4, 5, 6]-Fold Grid Searches were executed using subsets
of the training set, specifically a quarter (Q), a half (H), the full training set (T) and T plus
the crowd-sourced dataset (F). The table below reports the results of the above runs. One weird
thing is that T and F had the exact same scores across the three K-Folds, but I don't really
know how to explain this (I ran all the validations twice to make sure that the data was right).

    K   k   MI   CV     P      R      F1
Q   4   7   36   0.907  0.892  0.821  0.855
H   6   5   6    0.913  0.890  0.860  0.870
T   6   5   16   0.921  0.936  0.899  0.917
F   6   5   16   0.921  0.936  0.899  0.917
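
The subsets were obtained along these lines (a sketch; the variable names, and whether the
data was shuffled beforehand, are assumptions):

quarter = train_data[:len(train_data) // 4]  # Q
half = train_data[:len(train_data) // 2]     # H
full = train_data                            # T
plus = train_data + crowd_data               # F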

2.4 Fourth phase, crowd-sourced training


Using the crowd-sourced dataset along with the training set, I ended up re-executing the 2-
dimensional grid search (with 6-Fold Cross Validation and the same assumptions as before),
after repeating the first three phases performed on the original training set. All those steps
led to a final model that, with k = 6, MI = 21 and λ = 0.0001, could score an F1 of 0.9018 on
the development set, whereas the scores on the test set were P = 0.891, R = 0.864, F1 = 0.878.
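
Putting it together, the final configuration corresponds to this sketch (same assumed names
and parameter mapping as above; crowd_words / y_crowd stand for the crowd-sourced data):

import sklearn_crfsuite
from sklearn_crfsuite import metrics

# Final model: k = 6, MI = 21, λ = 0.0001 (as c2), trained on T + crowd data.
X_train = [gap_features(w, 6) for w in train_words + crowd_words]
crf = sklearn_crfsuite.CRF(algorithm='lbfgs', c2=0.0001, max_iterations=21)
crf.fit(X_train, y_train + y_crowd)

y_pred = crf.predict([gap_features(w, 6) for w in test_words])
print(metrics.flat_classification_report(y_test, y_pred))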

2.5 Overall scores


The following table shows more data obtained throughout the evaluation of the model, in the
steps described above.

    k   λ      MI   Precision  Recall  F1
Q   11  10^-5  100  0.884      0.722   0.795
H   11  10^-5  100  0.906      0.815   0.858
T   11  10^-5  100  0.936      0.834   0.882
T   8   10^-4  67   0.896      0.877   0.886
T   6   10^-4  21   0.920      0.842   0.880
F   6   10^-4  21   0.911      0.892   0.902
