
INFORMATION EXTRACTION FROM TEXT

TRU CAO
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND JOHN VON NEUMANN INSTITUTE
OUTLINE
• Named entity recognition and relation extraction

• Practical applications

• Rule-based methods

• Machine learning methods

• State of the art

NAMED ENTITY RECOGNITION
• Named entity:
• Entity that is referred to by a proper name.

• Example: Barack Obama, Ministry of Education, Saigon.

• Named entity recognition (NER):

• To recognize named entities in text and determine their categories.

• Main categories: PERSON, ORGANIZATION, LOCATION.

• Sub-categories are defined in an ontology of discourse.

RELATION EXTRACTION
• To extract the relation between named entities.
• Relations are defined in an ontology of discourse.
• Example:
• Input text:
• “In 1998, Larry Page and Sergey Brin founded Google Inc.”

• Extracted relations:
• FounderOf(Larry Page, Google Inc.)
• FounderOf(Sergey Brin, Google Inc.)
• FoundedIn(Google Inc., 1998)
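As a toy illustration of how such relation triples could be produced by a single hand-written pattern (this is not the method of the systems described later; the regular expression and function name below are hypothetical), in Python:

import re

# A toy pattern for sentences of the form "In <YEAR>, <NAME> and <NAME> founded <ORG>."
# (illustrative only; real systems combine NER with many such rules or learned models)
PATTERN = re.compile(
    r"In (?P<year>\d{4}), (?P<p1>[A-Z][\w. ]+?) and (?P<p2>[A-Z][\w. ]+?) founded (?P<org>[A-Z][\w. ]+?)\."
)

def extract_relations(text):
    """Return (relation, argument1, argument2) triples found by the pattern."""
    triples = []
    for m in PATTERN.finditer(text):
        triples.append(("FounderOf", m.group("p1"), m.group("org")))
        triples.append(("FounderOf", m.group("p2"), m.group("org")))
        triples.append(("FoundedIn", m.group("org"), m.group("year")))
    return triples

print(extract_relations("In 1998, Larry Page and Sergey Brin founded Google Inc."))
# [('FounderOf', 'Larry Page', 'Google Inc'), ('FounderOf', 'Sergey Brin', 'Google Inc'),
#  ('FoundedIn', 'Google Inc', '1998')]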

PRACTICAL APPLICATIONS
• General applications:
• Semantic analysis and text understanding
• Knowledge discovery from text

• Specific domain applications:


• Electronic medical records: to recognize problems, tests, and treatments, and to discover the relations among them (e.g., medication relations).
• Online shopping: to survey product prices on the web.
• Semantic search: “Find web pages about universities”.

Example (Vietnamese clinical text): “Bệnh_nhân siêu_âm phát_hiện sỏi thận, phẫu_thuật nội_soi nhưng không khỏi, hiện_tại đau nhiều vùng hông lưng.” (The patient had an ultrasound that detected kidney stones and underwent laparoscopic surgery, but was not cured; currently there is severe pain in the flank and lower back region.)

RULE-BASED METHODS
• 2004-2006: Vietnamese Semantic Web
• A national key project funded by the Ministry of Science and Technology.
• To recognize and annotate named entities in Vietnamese
web pages.
• To build up a knowledge base and to manage web pages
annotated with popular Vietnamese named entities (VN-KIM).

RULE-BASED METHODS
• 2007-2009: Information Extraction and Integration on the Vietnamese Semantic Web (VNUHCM)
[Architecture diagram: plug-ins (S-Search, S-Editor) on top of the VN-KIM Front-End; APIs connecting to Semantic LUCENE, VN-KIM IE, and Fuzzy SESAME; underneath, the Semantic Web, the Semantic Annotation Repository, and the Knowledge Base.]
A RULE-BASED NER METHOD
• Nguyen, V.T.T. & Cao, T.H. (2007), VN-KIM IE:
Automatic Extraction of Vietnamese Named-
Entities on the Web. Journal of New Generation
Computing, 25 (3), 277-292.

A RULE-BASED NER METHOD
• Ontology and knowledge base:
• VN-KIM KB: 370 classes and 115 properties, with over
120,000 entities (about 60% are in Vietnam and the rest in
the world).
• Lexical resource: words surrounding entity proper names.
• Examples: “Chủ tịch Hồ Chí Minh” (Chairman Ho Chi Minh), “Thành phố Hồ Chí Minh” (Ho Chi Minh City).
• Each entity class has a corresponding lexical resource.

A RULE-BASED NER METHOD
• System architecture

A RULE-BASED NER METHOD
• VN Hash Gazetteer:
• To match a name against entity aliases and abbreviations.
• To generate temporary annotations.
• Examples: “Bộ Giáo dục - Đào tạo” (Ministry of Education and Training) and its abbreviation “Bộ GD&ĐT”, … (see the sketch below).
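A rough sketch of such a lookup in Python (the alias table, entity identifier, and function name are hypothetical, only to illustrate mapping aliases and abbreviations to the same entity and producing temporary annotations):

# Hypothetical alias table mapping surface forms to (entity id, class).
ALIASES = {
    "Bộ Giáo dục - Đào tạo": ("Ministry_of_Education_and_Training", "Organization"),
    "Bộ GD&ĐT": ("Ministry_of_Education_and_Training", "Organization"),
}

def gazetteer_annotate(text):
    """Return temporary annotations (start, end, surface form, entity id, class)."""
    annotations = []
    for alias, (entity_id, entity_class) in ALIASES.items():
        start = text.find(alias)
        while start != -1:
            annotations.append((start, start + len(alias), alias, entity_id, entity_class))
            start = text.find(alias, start + 1)
    return annotations

print(gazetteer_annotate("Bộ GD&ĐT vừa công bố quy chế mới."))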

A RULE-BASED NER METHOD
• Pattern Matching

A RULE-BASED NER METHOD
• Removal of misclassified words in capitals:
• An entity name often comes with an initial uppercase character, but the reverse is not always true.
• Example: “Bình minh trên đỉnh Hàm Rồng” (“Sunrise on Ham Rong peak”), where “Bình minh” is capitalized only because it begins the sentence.

A RULE-BASED NER METHOD
• Recognition of overlapping entities:
• Sharing a common text segment.
• Example: “Giám đốc công ty FPT Trương Gia Bình” (Director of the FPT company, Trương Gia Bình).

A RULE-BASED NER METHOD
• Lexical resource-based recognition:
• Lexical resource words provide contextual and structural information for recognizing NEs that are not yet present in the knowledge base.
• Example: “Ca sĩ Minh Vương” (Singer Minh Vương), as sketched below.
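A minimal sketch of this idea in Python, assuming a tiny, hypothetical set of cue words per class (not the actual VN-KIM lexical resources):

import re

# Hypothetical cue words that typically precede proper names of each class.
LEXICAL_CUES = {
    "Person": ["Ca sĩ", "Chủ tịch", "Giám đốc"],
    "Organization": ["Công ty", "Trường đại học"],
}

def recognize_by_cue(text):
    """Guess the class of a capitalized name from the cue word right before it."""
    results = []
    for entity_class, cues in LEXICAL_CUES.items():
        for cue in cues:
            # cue followed by one or more capitalized words (simplified capitalization check)
            for m in re.finditer(re.escape(cue) + r"\s+((?:[A-ZĐ]\w*\s*)+)", text):
                results.append((m.group(1).strip(), entity_class))
    return results

print(recognize_by_cue("Ca sĩ Minh Vương biểu diễn tại Công ty Kinh Đô."))
# [('Minh Vương', 'Person'), ('Kinh Đô', 'Organization')]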

A RULE-BASED NER METHOD
• Context-based recognition:
• Contextual words such as conjunctions can also help to
identify entity classes.
• Example: “Công ty Kinh Đô và Thăng Long” (the Kinh Đô and Thăng Long companies), where the conjunction “và” (“and”) suggests that “Thăng Long” is also a company.

A RULE-BASED NER METHOD
• Removal of inconsistent annotations:
• After the previous steps, an NE can be associated with two or more annotations that are inconsistent with each other.
• Example: “Trường đại học Tôn Đức Thắng” (Ton Duc Thang University).

A RULE-BASED NER METHOD
• Performance evaluation:

RULE-BASED RELATION
EXTRACTION
• Cao, T.H. & Cao, T.D. & Tran, T.L. (2008), A Robust
Ontology-Based Method for Translating Natural
Language Queries to Conceptual Graphs. In Proc.
of the 3rd Asian Semantic Web Conference,
Springer-Verlag, 479-492.

RULE-BASED RELATION
EXTRACTION
• Conceptual graphs:
• A bipartite graph in which concept vertices alternate with (conceptual) relation vertices.
• Example: “Cognac is produced in a province in France”.
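A minimal sketch of this bipartite structure in Python; the particular concept and relation labels below are an illustrative reading of the example sentence, not the graph used in the cited work:

# Concept vertices: (type, referent); "*" marks an unspecified (generic) referent.
concepts = {
    "c1": ("Drink", "Cognac"),
    "c2": ("Produce", "*"),
    "c3": ("Province", "*"),
    "c4": ("Country", "France"),
}
# Relation vertices: (relation type, list of concept arguments).
relations = {
    "r1": ("object", ["c2", "c1"]),    # the producing has Cognac as its object
    "r2": ("location", ["c2", "c3"]),  # the producing takes place in a province
    "r3": ("in", ["c3", "c4"]),        # the province is in France
}

# Bipartite check: every edge connects a relation vertex to a concept vertex.
for rel_id, (_, args) in relations.items():
    assert all(arg in concepts for arg in args), rel_id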

RULE-BASED RELATION
EXTRACTION
• A syntax-free method:
[Diagram: a chain of entities alternating with relations, E1, R1, E2, R2, E3, R3, …]

RULE-BASED RELATION
EXTRACTION
• A syntax-free method:
• Taking entities as anchors:
“What county is Modesto, California in?” (the entities in the query are marked)
• Recognizing relations between entities:
“What county is Modesto, California in?” (the relations between those entities are to be recognized)
RULE-BASED RELATION
EXTRACTION
• A syntax-free method (9 steps):
• Recognizing specified entities
• Recognizing unspecified entities
• Extracting relational phrases
• Determining the type of queried entities
• Unifying identical entities
• Discovering implicit relations
• Determining the types of relations
• Removing improper relations
• Constructing the final conceptual graph
RULE-BASED RELATION
EXTRACTION
• A syntax-free method (steps 1-4):
• Recognizing specified entities
• What is the capital of Mongolia?
• Recognizing unspecified entities
• How many counties are in Indiana?
• Extracting relational phrases
• What state is Niagara Falls located in?
• Determining the type of queried entities
• What is WWE short for?

RULE-BASED RELATION
EXTRACTION
• A syntax-free method (steps 5-9):
• Unifying identical entities
• Who is the president of Bolivia?
• Discovering implicit relations
• What county is Modesto, California in?
• Determining the types of relations
• When was Microsoft established?
• Removing improper relations
• What city in Florida is Sea World in?
• Constructing the final conceptual graph

RULE-BASED RELATION
EXTRACTION
• Performance evaluation:
• R-error: due to GATE’s performance.
• O-error: due to lack of entity types, relation types, NEs in
KIM ontology and knowledge base.
• Q-error: due to expressiveness of simple conceptual graphs.
• M-error: due to the proposed algorithm itself.

RULE-BASED RELATION
EXTRACTION
• Performance evaluation:
Query Type | Number of Queries | Correct CGs    | R-errors | O-errors     | Q-errors      | M-errors
What       | 173               | 120            | 0        | 23           | 28            | 2
Which      | 15                | 9              | 0        | 2            | 4             | 0
Where      | 13                | 9              | 0        | 2            | 0             | 2
Who        | 57                | 36             | 0        | 9            | 11            | 1
When       | 13                | 10             | 0        | 2            | 1             | 0
How        | 56                | 4              | 0        | 2            | 50            | 0
Other      | 118               | 81             | 0        | 19           | 18            | 0
Total      | 445               | 269 (60.45%)   | 0 (0%)   | 59 (13.26%)  | 112 (25.17%)  | 5 (1.12%)

RULE-BASED METHODS FOR NER
• Advantages:
• The rules are transparent (humans can understand them).
• No training corpus is required.
• Effective if the rules are well defined.

• Disadvantages:
• The labor cost is high for manually specifying the rules.
• Coverage of the rules is limited.
• It is difficult to extend.

MACHINE LEARNING FOR NER
• Class labels = {PER, ORG, LOC, OTHER}.
• Word sequence example: “Facebook CEO Zuckerberg visited Vietnam”.
• Corresponding label sequence: ORG OTHER PER OTHER LOC


• NER: to find the most probable label sequence for
a given word sequence.
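In code, this sequence-labeling view of the example above is simply a pairing of the word sequence with a label sequence of the same length (a minimal sketch in Python):

# One label per word, drawn from {PER, ORG, LOC, OTHER}.
words  = ["Facebook", "CEO", "Zuckerberg", "visited", "Vietnam"]
labels = ["ORG", "OTHER", "PER", "OTHER", "LOC"]

# The NER task: given `words`, find the most probable `labels`
# (e.g., with an HMM, as developed in the following slides).
for word, label in zip(words, labels):
    print(word, label)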

HIDDEN MARKOV MODELS
• Introduction
• Example
• Independence assumptions
• Forward algorithm
• Viterbi algorithm
• Training
• Application to NER

HIDDEN MARKOV MODELS
• One of the most popular graphical models.
• Dynamic extension of Bayesian networks.
• Sequential extension of Naïve Bayes classifier.

HIDDEN MARKOV MODELS
• Example:
• Your possible looking prior to the exam = {tired, hungover,
scared, fine}.
• Your possible activity last night = {TV, pub, party, study}.
• Given a sequence of observations of your looking, guess
what you did in previous nights.

HIDDEN MARKOV MODELS
• Example:
• Your possible looking prior to the exam = {tired, hungover,
scared, fine}.
• Your possible activity last night = {TV, pub, party, study}.
• Given a sequence of observations of your looking, guess
what you did in previous nights.

Day:                       Fri    Sat       Sun    Mon
Your activity last night:  ?      ?         ?      ?
Your looking today:        fine   hungover  tired  scared
HIDDEN MARKOV MODELS
• Example:
• Your possible looking prior to the exam = {tired, hungover,
scared, fine}.
• Your possible activity last night = {TV, pub, party, study}.
• Given a sequence of observations of your looking, guess
what you did in previous nights.
• A model:
• Your looking depends on what you did in the night before.
• Your activity in a night depends on what you did in some
previous nights.

HIDDEN MARKOV MODELS
• A finite set of possible observations.
• A finite set of possible hidden states.
• To predict the most probable sequence of
underlying states {y1, y2, …, yT} for a given
sequence of observations {x1, x2, …, xT}.
[Diagram: the state sequence y1, …, yt-1, yt, …, yT, where each state transits to the next state and emits an observation; below it, the observation sequence x1, …, xt-1, xt, …, xT.]
HIDDEN MARKOV MODELS
[Diagram (Marsland, S. (2009), Machine Learning: An Algorithmic Perspective): the four hidden states Party, Pub, TV, and Study, with transition probabilities on the arrows between them and an emission (observation) probability table for each state:]

State   Tired  Hungover  Scared  Fine
Party   0.3    0.4       0.2     0.1
TV      0.2    0.1       0.2     0.5
Pub     0.4    0.2       0.1     0.3
Study   0.3    0.05      0.3     0.35
HIDDEN MARKOV MODELS
• Normalization: for each state yt, the outgoing transition probabilities satisfy Σy p(y | yt) = 1 and the emission (observation) probabilities satisfy Σx p(x | yt) = 1 (see the diagram and table above).
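As a quick sanity check, the emission probabilities from the table above can be written down and verified to sum to 1 for each state (a minimal sketch in Python):

# Emission (observation) probabilities p(x | y), one distribution per hidden state.
emission = {
    "Party": {"Tired": 0.3, "Hungover": 0.4,  "Scared": 0.2, "Fine": 0.1},
    "TV":    {"Tired": 0.2, "Hungover": 0.1,  "Scared": 0.2, "Fine": 0.5},
    "Pub":   {"Tired": 0.4, "Hungover": 0.2,  "Scared": 0.1, "Fine": 0.3},
    "Study": {"Tired": 0.3, "Hungover": 0.05, "Scared": 0.3, "Fine": 0.35},
}

for state, dist in emission.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9, state  # checks Σx p(x | y) = 1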
HIDDEN MARKOV MODELS
• HMM conditional independence assumptions:
• The state at time t depends only on the state at time t – 1:
p(yt | yt-1, Z) = p(yt | yt-1), where Z stands for any other variables (earlier states and observations).
• The observation at time t depends only on the state at time t:
p(xt | yt, Z) = p(xt | yt)

HIDDEN MARKOV MODELS
• Generative model:
• Joint distributions: p(Y, X)
Example: binary A, B, C
p(A, B, C), p(-A, B, C), …., p(-A, -B, -C)
• It can generate any distribution on Y and X.
Example: p(A | -B, C) = p(A, -B, C)/p(-B, C)
p(-B, C) = p(A, -B, C) + p(-A, -B, C)

HIDDEN MARKOV MODELS
• Generative model:
• Joint distributions: p(Y, X)
Example: binary A, B, C
p(A, B, C), p(-A, B, C), …., p(-A, -B, -C)
• It can generate any distribution on Y and X.
Example: p(A | -B, C) = p(A, -B, C)/p(-B, C)
p(-B, C) = p(A, -B, C) + p(-A, -B, C)

• Discriminative model:
• Conditional distributions: p(Y | X)
• It discriminates Y given X.

HIDDEN MARKOV MODELS
• HMM is a generative model:
• Joint distributions: one can prove that
p(Y, X) = p(y1, y2,…, yT, x1, x2,…, xT) = Πt=1,T p(xt | yt).p(yt | yt-1)
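A minimal sketch of this factorization in Python, assuming the HMM parameters are given as nested dictionaries (the two-state toy model at the end is purely hypothetical, only to exercise the function):

def joint_probability(states, observations, init_p, trans_p, emit_p):
    """p(Y, X) = p(y1).p(x1 | y1) * product over t >= 2 of p(yt | yt-1).p(xt | yt)."""
    assert len(states) == len(observations)
    prob = init_p[states[0]] * emit_p[states[0]][observations[0]]
    for t in range(1, len(states)):
        prob *= trans_p[states[t - 1]][states[t]] * emit_p[states[t]][observations[t]]
    return prob

# Hypothetical toy parameters (two states A/B, two observations x/y).
init_p  = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit_p  = {"A": {"x": 0.5, "y": 0.5}, "B": {"x": 0.1, "y": 0.9}}
print(joint_probability(["A", "B"], ["x", "y"], init_p, trans_p, emit_p))  # 0.6*0.5*0.3*0.9 = 0.081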

HIDDEN MARKOV MODELS
• HMM is a generative model:
• Joint distributions: one can prove that
p(Y, X) = p(y1, y2,…, yT, x1, x2,…, xT) = Πt=1,T p(xt | yt).p(yt | yt-1)

Values are given in HMM

HIDDEN MARKOV MODELS
• HMM is a generative model:
• Joint distributions:
p(Y, X) = p(y1, y2,…, yT, x1, x2,…, xT) = Πt=1,T p(xt | yt).p(yt | yt-1)
Proof:
p(y1, y2,…, yT, x1, x2,…, xT)
= p(xT | y1, y2,…, yT, x1, x2,…, xT-1).p(y1, y2,…, yT, x1, x2,…, xT-1)
= p(xT | yT).p(y1, y2,…, yT, x1, x2,…, xT-1)
= p(xT | yT).p(yT | y1, …, yT-1, x1, …, xT-1).p(y1, …, yT-1, x1, …, xT-1)
= p(xT | yT).p(yT | yT-1).p(y1, …, yT-1, x1, …, xT-1)
=…

HIDDEN MARKOV MODELS
• HMM is a generative model:
• Joint distributions:
p(Y, X) = p(y1, y2,…, yT, x1, x2,…, xT) = Πt=1,T p(xt | yt).p(yt | yt-1)
p(y1 | y0) = p(y1)

HIDDEN MARKOV MODELS
• HMM is a generative model:
• Joint distributions (transition and emission factors):
p(Y, X) = p(y1, y2,…, yT, x1, x2,…, xT) = Πt=1,T p(yt | yt-1).p(xt | yt)
p(y1 | y0) = p(y1)
• It can generate any distribution on Y and X.

HIDDEN MARKOV MODELS
• HMM is a generative model:
• Joint distributions:
p(Y, X) = p(y1, y2,…, yT, x1, x2,…, xT) = Πt=1,T p(yt | yt-1).p(xt | yt)
p(y1 | y0) = p(y1)
• It can generate any distribution on Y and X.

• In contrast to a discriminative model (e.g., CRF):


• Conditional distributions: p(Y | X)
• It discriminates Y given X.

HIDDEN MARKOV MODELS
• Forward algorithm:
• To compute the joint probability of the state at time t being yt
and the sequence of observations in the first t steps being
{x1, x2, …, xt}:
αt(yt) = p(yt, x1, x2, …, xt)

HIDDEN MARKOV MODELS
• Forward algorithm:
• To compute the joint probability of the state at time t being yt
and the sequence of observations in the first t steps being
{x1, x2, …, xt}:
αt(yt) = p(yt, x1, x2, …, xt)
• Bayes’ theorem gives:
p(yt | x1, x2, …, xt)
= p(yt, x1, x2, …, xt)/p(x1, x2, …, xt)
= αt(yt)/p(x1, x2, …, xt)

HIDDEN MARKOV MODELS
• Forward algorithm:
• To compute the joint probability of the state at time t being yt
and the sequence of observations in the first t steps being
{x1, x2, …, xt}:
αt(yt) = p(yt, x1, x2, …, xt)

• Bayes’ theorem gives:


p(yt | x1, x2, …, xt)
= p(yt, x1, x2, …, xt)/p(x1, x2, …, xt)
= αt(yt)/p(x1, x2, …, xt)

• The higher αt(yt) is, the more likely yt is, given the same {x1, x2, …, xt}.

HIDDEN MARKOV MODELS
• Forward algorithm:
• To compute the joint probability of the state at time t being yt
and the sequence of observations in the first t steps being
{x1, x2, …, xt}:
αt(yt) = p(yt, x1, x2, …, xt)
• The higher αt(yt) is, the more likely yt is, given the same {x1, x2, …, xt}.
[Diagram: days Fri, Sat, Sun, Mon; your looking today is observed as fine, hungover, tired, scared; one night's activity is marked “?”.]
HIDDEN MARKOV MODELS
• “Naïve” computation:
αt(yt)
= p(yt, x1, x2, …, xt)
= Σy1,y2, .., yt-1 p(y1, y2, …, yt-1, yt, x1, x2, …, xt)

like
p(A) = p(A, B) + p(A, -B)
p(A) = p(A, B, C) + p(A, -B, C) + p(A, B, -C) + p(A, -B, -C)

HIDDEN MARKOV MODELS
• Forward algorithm:
αt(yt)
= p(yt, x1, x2, …, xt)
= Σyt-1p(yt, yt-1, x1, x2, …, xt)

= Σyt-1p(xt | yt, yt-1, x1, x2, …, xt-1).p(yt, yt-1, x1, x2, …, xt-1)

= Σyt-1p(xt | yt).p(yt | yt-1, x1, x2, …, xt-1).p(yt-1, x1, x2, …, xt-1)

= Σyt-1p(xt | yt).p(yt | yt-1).p(yt-1, x1, x2, …, xt-1)

= p(xt | yt) Σyt-1 p(yt | yt-1).αt-1(yt-1)
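The recursion above maps directly onto code; a minimal sketch in Python, assuming the same dictionary-based parameters as in the earlier joint-probability sketch:

def forward(observations, states, init_p, trans_p, emit_p):
    """Return the list of alpha_t distributions; alpha[t-1][y] = p(yt = y, x1, ..., xt)."""
    # Base case: alpha_1(y1) = p(x1 | y1).p(y1)
    alpha = [{y: init_p[y] * emit_p[y][observations[0]] for y in states}]
    # Recursion: alpha_t(yt) = p(xt | yt) * sum over yt-1 of p(yt | yt-1).alpha_(t-1)(yt-1)
    for x in observations[1:]:
        prev = alpha[-1]
        alpha.append({
            y: emit_p[y][x] * sum(trans_p[y_prev][y] * prev[y_prev] for y_prev in states)
            for y in states
        })
    return alpha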


HIDDEN MARKOV MODELS
• Forward algorithm, with base case α1(y1) = p(y1, x1) = p(x1 | y1).p(y1):
αt(yt)
= p(yt, x1, x2, …, xt)
= Σyt-1p(yt, yt-1, x1, x2, …, xt)

= Σyt-1p(xt | yt, yt-1, x1, x2, …, xt-1).p(yt, yt-1, x1, x2, …, xt-1)

= Σyt-1p(xt | yt).p(yt | yt-1, x1, x2, …, xt-1).p(yt-1, x1, x2, …, xt-1)

= Σyt-1p(xt | yt).p(yt | yt-1).p(yt-1, x1, x2, …, xt-1)

= p(xt | yt) Σyt-1 p(yt | yt-1).αt-1(yt-1)


HIDDEN MARKOV MODELS
• Forward algorithm:
α1(y1) = p(y1, x1) = p(x1| y1). p(y1)
αt(yt) = p(xt | yt) Σyt-1 p(yt | yt-1).αt-1(yt-1)

HIDDEN MARKOV MODELS
• Forward algorithm’s complexity:
αt(yt) = p(xt | yt) Σyt-1 p(yt | yt-1).αt-1(yt-1)
Each of the T steps computes N values αt(yt), each requiring a sum over N previous states, so the total cost is O(T.N2), where N is the number of possible states.

HIDDEN MARKOV MODELS
• Viterbi algorithm:
• To compute the most probable sequence of states {y1, y2, …, yT}
given a sequence of observations {x1, x2, …, xT}:
Y* = argmaxY p(Y | X) = argmaxY [p(Y, X)/p(X)] = argmaxY p(Y, X)

HIDDEN MARKOV MODELS
• Viterbi algorithm:
• To compute the most probable sequence of states {y1, y2, …, yT}
given a sequence of observations {x1, x2, …, xT}:
Y* = argmaxY p(Y | X) = argmaxY p(Y, X)

Day:            Fri    Sat       Sun    Mon
Your activity:  ?      ?         ?      ?
Your looking:   fine   hungover  tired  scared
HIDDEN MARKOV MODELS
• Exhaustive search complexity: |Y|^T (!)
maxy1:T p(y1, y2, …, yT, x1, x2, …, xT)

HIDDEN MARKOV MODELS
• Viterbi algorithm (using maxa,b f(a, b) = maxa (maxb f(a, b))):
maxy1:T p(y1, y2, …, yT, x1, x2, …, xT)
= maxyT maxy1:T-1 p(y1, y2, …, yT-1, yT, x1, x2, …, xT)    (best for each yT)
= maxyT maxy1:T-1 (p(xT | yT).p(yT | yT-1).p(y1, …, yT-1, x1, …, xT-1))
= maxyT maxyT-1 maxy1:T-2 (p(xT | yT).p(yT | yT-1).p(y1, …, yT-1, x1, …, xT-1))
= maxyT maxyT-1 (p(xT | yT).p(yT | yT-1).maxy1:T-2 p(y1,…, yT-2, yT-1, x1,…, xT-2, xT-1))    (best for each yT-1)
HIDDEN MARKOV MODELS
• Viterbi algorithm:
maxy1:T p(y1, y2, …, yT, x1, x2, …, xT)
= maxyT maxy1:T-1 p(y1, y2, …, yT, x1, x2, …, xT)    (best for each yT)
= maxyT maxyT-1 (p(xT | yT).p(yT | yT-1).maxy1:T-2 p(y1,…, yT-2, yT-1, x1,…, xT-2, xT-1))    (best for each yT-1)

• Dynamic programming:
• Solving, storing, and reusing solutions of the sub-problems for the
current problem.

HIDDEN MARKOV MODELS
• Viterbi algorithm:
maxy1:T p(y1, y2, …, yT, x1, x2, …, xT)
= maxyT maxy1:T-1 p(y1, y2, …, yT, x1, x2, …, xT)    (best for each yT)
= maxyT maxyT-1 (p(xT | yT).p(yT | yT-1).maxy1:T-2 p(y1,…, yT-2, yT-1, x1,…, xT-2, xT-1))    (best for each yT-1)

• Dynamic programming:
• Compute:
maxy1 p(y1, x1) = maxy1 p(x1 | y1).p(y1)
• For each t from 2 to T, and for each state yt, compute:
argmaxy1:t-1 p(y1, y2, …, yt, x1, x2, …, xt)
= argmaxyt-1 (p(xt | yt).p(yt | yt-1).maxy1:t-2p(y1,…, yt-2, yt-1, x1,…, xt-2, xt-1))
• Select:
argmaxyT maxy1:T-1 p(y1, y2, …, yT, x1, x2, …, xT)
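A minimal sketch of this dynamic program in Python, with back pointers so that the best state sequence can be read off at the end (same dictionary-based parameters as the earlier sketches):

def viterbi(observations, states, init_p, trans_p, emit_p):
    """Return (most probable state sequence Y*, max over Y of p(Y, X))."""
    # delta[t][y] = max over y1..y(t-1) of p(y1, ..., y(t-1), yt = y, x1, ..., xt)
    delta = [{y: init_p[y] * emit_p[y][observations[0]] for y in states}]
    back = [{}]  # back[t][y] = best predecessor of state y at step t
    for x in observations[1:]:
        prev, cur, ptr = delta[-1], {}, {}
        for y in states:
            best_prev = max(states, key=lambda yp: prev[yp] * trans_p[yp][y])
            cur[y] = emit_p[y][x] * prev[best_prev] * trans_p[best_prev][y]
            ptr[y] = best_prev
        delta.append(cur)
        back.append(ptr)
    # Backtrack from the best final state.
    last = max(states, key=lambda y: delta[-1][y])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path)), delta[-1][last]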
HIDDEN MARKOV MODELS
• Could the results from the forward algorithm be used for the Viterbi algorithm?

HIDDEN MARKOV MODELS
• Where does an HMM come from?

HIDDEN MARKOV MODELS
• Training HMMs:
• Topology is designed beforehand.
• Parameters to be learned: emission and transition probabilities.
• Supervised or unsupervised training.

HIDDEN MARKOV MODELS
• Supervised learning:
• Training data: paired sequences of states and observations
(y1, y2, …, yT, x1, x2, …, xT)

[Training data table (not reproduced): eight paired sequences of hidden states y, with values F and B, and observations x, with values H and T.]

HIDDEN MARKOV MODELS
• Supervised learning:
• Training data: paired sequences of states and observations
(y1, y2, …, yT, x1, x2, …, xT)


• p(y1 = F) = ?

HIDDEN MARKOV MODELS
• Supervised learning:
• Training data: paired sequences of states and observations
(y1, y2, …, yT, x1, x2, …, xT)


• p(y1 = F) = the prior probability of the first hidden state in a sequence being F = 4/8

HIDDEN MARKOV MODELS
• Supervised learning:
• Training data: paired sequences of states and observations
(y1, y2, …, yT, x1, x2, …, xT)


• p(y1 = F) = 4/8
• p(y = F | y* = B) = ?

HIDDEN MARKOV MODELS
• Supervised learning:
• Training data: paired sequences of states and observations
(y1, y2, …, yT, x1, x2, …, xT)


• p(y1 = F) = 4/8
• p(y = F | y* = B) = number of (B, F) transitions / number of all transitions out of B = 10/11

HIDDEN MARKOV MODELS
• Supervised learning:
• Training data: paired sequences of states and observations
(y1, y2, …, yT, x1, x2, …, xT)


• p(y1 = F) = 4/8
• p(y = F | y* = B) = 10/11
• p(x = H | y = F) = ?

HIDDEN MARKOV MODELS
• Supervised learning:
• Training data: paired sequences of states and observations
(y1, y2, …, yT, x1, x2, …, xT)


• p(y1 = F) = 4/8
• p(y = F | y* = B) = 10/11
• p(x = H | y = F) = number of positions where state F emits H / number of all positions with state F = 17/36
HIDDEN MARKOV MODELS
• Supervised learning:
• Training data: paired sequences of states and observations
(y1, y2, …, yT, x1, x2, …, xT)
• p(y1 = y) = number of sequences starting with y / number of all sequences
• p(yt = y | yt-1 = y*) = number of (y*, y) transitions / number of all transitions out of y*
• p(xt = x | yt = y) = number of (y, x) emissions / number of all emissions from y
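These counting formulas translate directly into code; a minimal sketch in Python (the toy training pairs at the end are hypothetical, in the spirit of the F/B states and H/T observations above):

from collections import Counter

def train_hmm(sequences):
    """Estimate initial, transition, and emission probabilities by counting.

    `sequences` is a list of (state sequence, observation sequence) pairs.
    """
    start, trans, emit = Counter(), Counter(), Counter()
    trans_from, emit_from = Counter(), Counter()
    for states, observations in sequences:
        start[states[0]] += 1
        for t, (y, x) in enumerate(zip(states, observations)):
            emit[(y, x)] += 1
            emit_from[y] += 1
            if t > 0:
                trans[(states[t - 1], y)] += 1
                trans_from[states[t - 1]] += 1
    n = len(sequences)
    init_p = {y: c / n for y, c in start.items()}
    trans_p = {(y_prev, y): c / trans_from[y_prev] for (y_prev, y), c in trans.items()}
    emit_p = {(y, x): c / emit_from[y] for (y, x), c in emit.items()}
    return init_p, trans_p, emit_p

# Hypothetical toy data: two paired sequences of states (F/B) and observations (H/T).
data = [(["F", "F", "B"], ["H", "T", "H"]), (["B", "F", "F"], ["T", "H", "H"])]
print(train_hmm(data))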


HIDDEN MARKOV MODELS
• Supervised learning example:

[Diagram: a two-state HMM with hidden states F and B and observations H and T; the parameters to estimate are marked with “?”: the initial probabilities p(F) and p(B), the transition probabilities p(F | F), p(B | F), p(F | B), p(B | B), and the emission probabilities p(H | F), p(T | F), p(H | B), p(T | B).]
