Professional Documents
Culture Documents
Diana FB
Diana FB
What is NE?
What isnt NE?
Problems and solutions with NE task
definitions
Problems and solutions with NE task
Some applications
Why do NE Recognition?
Key part of Information Extraction system
Robust handling of proper names essential for
many applications
Pre-processing for different classification levels
Information filtering
Information linking
NE Definition
NE involves identification of proper names in texts,
and classification into a set of predefined categories of
interest.
Three universally accepted categories: person, location
and organisation
Other common tasks: recognition of date/time
expressions, measures (percent, money, weight etc),
email addresses etc.
Other domain-specific entities: names of drugs, medical
conditions, names of ships, bibliographic references etc.
What NE is NOT
documents
format NE
tokeniser gazetteer
analysis grammar
NEs
Modules
Tokeniser
segments text into tokens, e.g. words,
numbers, punctuation
Gazetteer lists
NEs, e.g. towns, names, countries, ...
key words, e.g. company designators, titles, ...
Grammar
hand-coded rules for NE recognition
JAPE
(UPPERCASE _LETTER)
(LOWERCASE_LETTER)* >
Token; orth = upperInitial; kind = word
Gazetteer
Set of lists compiled into Finite State Machines
Each list has attributes MajorType and
MinorType (and optionally, Language)
Term
Matching
Recognition of Biological
Terminology
Results:
Results: We We have
have determined
determined the the crystal
crystal
structure
structure ofof aa triacylglycerol
triacylglycerol lipase lipase from
from
Pseudomonas
Pseudomonas cepacia
cepacia (Pet)
(Pet) in
in the
the absence
absence of
of aa
bound
bound inhibitor
inhibitor using
using X-ray
X-ray crystallography.
crystallography.
The
The structure shows the lipase to
structure shows the lipase to contain
contain an
an
alpha/beta-hydrolase fold and a catalytic
alpha/beta-hydrolase fold and a catalytic triad triad
comprising
comprising of of residues
residues Ser87,
Ser87, His286His286 and
and
Asp264.
Asp264. The
The enzyme
enzyme shares
shares several
several structural
structural
features
features with
with homologous
homologous lipases
lipases from
from
Pseudomonas glumae (PgL) and Chromobacterium
Pseudomonas glumae (PgL) and Chromobacterium
viscosum
viscosum (CvL),
(CvL), including
including aa calcium-binding
calcium-binding
site.
site. The
The present
present structure
structure of of Pet
Pet reveals
reveals aa
highly
highly open
open conformation
conformation with with aa solvent-
solvent-
accessible
accessible active site. This is in contrast to
active site. This is in contrast to
the structures of PgL and Pet
the structures of PgL and Pet in which thein which the
active
active site
site isis buried
buried under
under aa closedclosed oror
partially
partially opened
opened 'lid',
'lid', respectively.
respectively.
MUMIS