Analytics (16pgm54)
What is text analytics
Text analytics is the process of converting
unstructured text data into meaningful data for
analysis. It is used to measure customer opinions,
product reviews, and feedback, and to provide search
facilities, sentiment analysis, and entity modeling
in support of fact-based decision making.
Text analysis is about deriving high-quality
structured data from unstructured text. Another
name for text analytics is text mining.
IE – information extraction
Information extraction (IE): Identification and
extraction of relevant facts and relationships from
unstructured text; the process of making structured
data from unstructured and semi-structured text.
NAMED ENTITY RECOGNITION
RELATION EXTRACTION
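As a rough illustration of turning unstructured text into structured data, the sketch below pulls one kind of relation out of plain sentences. The sentence pattern ("<person> works at <organization>"), the function name extract_employment, and the record fields are all assumptions made for this example. Real extractors rely on far richer rules or trained models.

```python
import re

# Hypothetical pattern: "<Person> works at <Organization>", where both sides
# are runs of capitalized words. This is an assumption for illustration only.
EMPLOYMENT = re.compile(
    r"([A-Z][a-z]+(?: [A-Z][a-z]+)*) works at "
    r"([A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"
)

def extract_employment(text):
    """Return one structured record per 'works at' relation found in text."""
    return [
        {
            "person": m.group(1),        # named entity: person
            "relation": "works_at",      # relation linking the two entities
            "organization": m.group(2),  # named entity: organization
        }
        for m in EMPLOYMENT.finditer(text)
    ]

if __name__ == "__main__":
    sample = "Maria Lopez works at Acme Corp. Bob works at Initech."
    for record in extract_employment(sample):
        print(record)
```

Each record pairs two named entities with the relation that links them, which is the structured form that downstream analysis can query.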
How IE works
Feature Selection takes raw text as input and identifies low-
level entities called features. Features can be things like
capitalized words, sequences of numbers, or the names of
Fortune 500 companies.
Identification uses features to build more complex entities and
the relationships among them, including sentiment and events.
For example, a common first name followed by a capitalized
word might resolve into a person’s full name.
Resolution involves cleaning up ambiguities that arise in the
output of the “Identification” step (that is, the entities,
relationships, events, and sentiment identified in the text). For
example, a document may use several different strings – first
name, full name, a pronoun – to identify the same person.
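The three stages above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how any particular IE system implements them: the FIRST_NAMES dictionary and the function names are hypothetical, only person names are handled, and pronoun resolution is left out.

```python
import re

# Hypothetical dictionary of common first names (an assumption for this sketch).
FIRST_NAMES = {"Alice", "Bob", "Maria"}

def select_features(text):
    """Feature selection: tag low-level features with their character offsets."""
    features = []
    for m in re.finditer(r"\b[A-Z][a-z]+\b", text):
        kind = "first_name" if m.group() in FIRST_NAMES else "capitalized"
        features.append((kind, m.group(), m.start(), m.end()))
    for m in re.finditer(r"\b\d+\b", text):
        features.append(("number", m.group(), m.start(), m.end()))
    return sorted(features, key=lambda f: f[2])

def identify_persons(text, features):
    """Identification: a first name directly followed by a capitalized word
    is treated as a full name; otherwise the first name stands alone."""
    persons = []
    for i, (kind, value, _start, end) in enumerate(features):
        if kind != "first_name":
            continue
        nxt = features[i + 1] if i + 1 < len(features) else None
        if nxt and nxt[0] == "capitalized" and text[end:nxt[2]] == " ":
            persons.append(f"{value} {nxt[1]}")
        else:
            persons.append(value)
    return persons

def resolve(persons):
    """Resolution: map a bare first name to the first full name seen
    that starts with the same first name."""
    full_by_first = {}
    for p in persons:
        if " " in p:
            full_by_first.setdefault(p.split()[0], p)
    return [full_by_first.get(p, p) for p in persons]

if __name__ == "__main__":
    text = "Alice Smith joined in 2019. Later Alice moved to Paris."
    feats = select_features(text)
    mentions = identify_persons(text, feats)
    print(mentions)           # mentions as written in the text
    print(resolve(mentions))  # mentions mapped to one canonical name
```

Keeping the stages as separate functions mirrors the description: each stage consumes the output of the previous one, so the rules of one stage can be refined without touching the others.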
Flow chart of how IE works
What is the SystemT web tool
The SystemT web tool is a drag-and-drop graphical
interface, available as part of IBM BigInsights.
It aids rapid iterative development, allowing users to develop
extractors, run them, immediately view results, and refine the
extractors.
It offers the benefit of text analytics, without requiring users to
write code.
It contains a rich library of pre-built extractors (general,
domain-specific, and task-specific).
BASIC VISUAL CONSTRUCTS FOR CREATING
EXTRACTORS USING THE SYSTEMT WEB TOOL
Atomic constructs
Composite constructs
Output refinement constructs
Information extraction with AQL
In SystemT, extraction programs are expressed in a
language called Annotation Query Language (AQL). In
the SystemT web tool that you have used so far, the
visual specification of the extractor is automatically
translated into an AQL program.
AQL is a declarative language: the developer declares the
semantics of the extractor in AQL in a logical way, without
specifying how the AQL program should be executed. The
SystemT Optimizer compiles the AQL program into
an execution plan. This plan is then executed on
each input document to produce the extraction
results.
Advantages of the declarative AQL language
It separates the extractor semantics from the implementation.
This means that the AQL developer must think only about
"what" to extract, enabling an Optimizer to automatically
determine "how" to implement it efficiently. Hence, you get
better runtime performance through optimization.
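The split between "what" and "how" can be illustrated with a toy example in Python. This is not AQL and not the SystemT Optimizer: the Condition class, the cost numbers, and the plan and execute functions are assumptions invented for this sketch. The specification only lists the conditions a sentence must satisfy; a separate planning step chooses the order in which to check them.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Condition:
    name: str
    cost: int                     # hypothetical relative cost estimate
    test: Callable[[str], bool]   # predicate applied to one sentence

# Declarative specification ("what"): keep sentences that contain a
# phone-like number AND the word "call". Order here carries no meaning.
SPEC = [
    Condition("has_phone", cost=10,
              test=lambda s: re.search(r"\b\d{3}-\d{4}\b", s) is not None),
    Condition("has_call", cost=1,
              test=lambda s: "call" in s.lower()),
]

def plan(spec):
    """Toy optimizer ("how"): check cheaper conditions first so that
    expensive ones run only on sentences that are still candidates."""
    return sorted(spec, key=lambda c: c.cost)

def execute(ordered_conditions, sentences):
    """Run a plan: all() short-circuits on the first failing condition."""
    return [s for s in sentences
            if all(c.test(s) for c in ordered_conditions)]

if __name__ == "__main__":
    sentences = [
        "Please call 555-1234 today.",
        "The code is 555-9876.",
        "Call me later.",
    ]
    print([c.name for c in plan(SPEC)])
    print(execute(plan(SPEC), sentences))
```

Whichever order the planner picks, the result is the same. Only the amount of work changes, which is why the specification can stay untouched while the execution strategy improves.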