Ai Unit 5
Bottom-up parsing: parsing starts with the input symbols and constructs the parse tree upward until it reaches the start symbol.
• Both top-down and bottom-up parsing can be inefficient, however, because they can end up repeating effort in areas of the search space that lead to dead ends. Consider the following two sentences:
Have the students in section 2 of Computer Science 101 take the exam.
Have the students in section 2 of Computer Science 101 taken the exam?
• Even though they share the first 10 words, these sentences have very different parses, because the first is a command and the second is a question.
• A left-to-right parsing algorithm would have to guess whether the first word is part of a command or a question, and it will not be able to tell whether the guess is correct until at least the eleventh word, take or taken.
• If the algorithm guesses wrong, it will have to backtrack all the way to the first word and reanalyze the whole sentence under the other interpretation.
DYNAMIC PROGRAMMING:
Every time we analyze a substring, we store the result so that we won't have to reanalyze it later. The results are recorded in a data structure known as a chart, and algorithms that do this are called chart parsers.
There are many types of chart parsers.
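A minimal sketch of the chart idea, assuming a CNF grammar stored in Python dictionaries (the names `BINARY`, `LEXICAL` and `make_recognizer` are illustrative, not from any library). It is a top-down recognizer in which `functools.lru_cache` plays the role of the chart: each (variable, start, end) question is answered once and afterwards only looked up, so no substring is reanalyzed. This is memoized top-down recognition, not the CYK algorithm itself; the grammar is the one used in the CYK worked example of this unit.

```python
from functools import lru_cache

# Example grammar (assumed CNF): S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
BINARY = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A")],
    "B": [("C", "C")],
    "C": [("A", "B")],
}
LEXICAL = {"A": {"a"}, "B": {"b"}, "C": {"a"}}


def make_recognizer(w):
    """Return derives(var, i, j): True if var derives w[i:j] (0-based, end-exclusive)."""

    @lru_cache(maxsize=None)  # the cache acts as the chart of solved substrings
    def derives(var, i, j):
        if j - i == 1:
            return w[i] in LEXICAL.get(var, set())
        # Try every binary rule var -> left right and every split point k.
        return any(
            derives(left, i, k) and derives(right, k, j)
            for left, right in BINARY.get(var, [])
            for k in range(i + 1, j)
        )

    return derives


derives = make_recognizer("baaba")
print(derives("S", 0, 5))  # True: the whole string is derivable from S
print(derives.cache_info().currsize)  # number of chart entries actually stored
```

Without the cache, the same sub-questions would be recomputed along every path that reaches them; with it, the work is bounded by the number of distinct (variable, i, j) triples.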
Example: CYK algorithm (John Cocke, Daniel Younger, and Tadao Kasami)
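Chomsky Normal Form:
The CYK algorithm requires the grammar to be in Chomsky Normal Form (CNF), in which every rule has one of two forms:
NT → NT NT (a nonterminal rewrites to exactly two nonterminals)
NT → T (a nonterminal rewrites to a single terminal)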
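As a sketch, here is a hypothetical checker for the two rule shapes above. It assumes rules are (left-hand side, right-hand side) pairs and treats any symbol that never appears on a left-hand side as a terminal; it also ignores the S → ε rule that some definitions of CNF allow. `is_cnf` and `GRAMMAR` are illustrative names.

```python
def is_cnf(rules):
    """Return True if every rule has the shape NT -> NT NT or NT -> T."""
    nonterminals = {lhs for lhs, _ in rules}
    for lhs, rhs in rules:
        if len(rhs) == 2 and all(sym in nonterminals for sym in rhs):
            continue  # NT -> NT NT
        if len(rhs) == 1 and rhs[0] not in nonterminals:
            continue  # NT -> T
        return False
    return True


# The grammar from the CYK worked example in this unit.
GRAMMAR = [
    ("S", ("A", "B")), ("S", ("B", "C")),
    ("A", ("B", "A")), ("A", ("a",)),
    ("B", ("C", "C")), ("B", ("b",)),
    ("C", ("A", "B")), ("C", ("a",)),
]

print(is_cnf(GRAMMAR))                             # True
print(is_cnf(GRAMMAR + [("S", ("A", "B", "C"))]))  # False: three symbols on the right
print(is_cnf(GRAMMAR + [("A", ("B",))]))           # False: unit rule NT -> NT
```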
Deciding Membership Using CYK Algorithm
To decide the membership of any given string of length n, we construct a triangular table where:
• Each row of the table corresponds to one particular length of substrings.
• The bottommost row corresponds to substrings of length 1.
• The second row from the bottom corresponds to substrings of length 2.
• The third row from the bottom corresponds to substrings of length 3.
• The topmost row corresponds to the whole given string, of length n.
Notations:
xij
xij represents the substring of x that starts at position i and has length j.
Example: x = abcd
Positions: a = 1, b = 2, c = 3, d = 4
• x11 = a, x21 = b, x31 = c, x41 = d
• x12 = ab, x22 = bc, x32 = cd
• x13 = abc, x23 = bcd
• x14 = abcd
A string of length n has n(n+1)/2 such substrings; for x = abcd, n = 4 gives 4 × (4+1) / 2 = 10.
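The indexing convention can be checked with a short sketch (the function name `substrings` is illustrative). Positions are 1-based as in the notes, so xij corresponds to the Python slice `x[i-1 : i-1+j]`.

```python
def substrings(x):
    """Map (i, j) -> xij: the substring of x starting at 1-based position i with length j."""
    n = len(x)
    return {
        (i, j): x[i - 1 : i - 1 + j]
        for j in range(1, n + 1)       # length of the substring (table row)
        for i in range(1, n - j + 2)   # start position (table column)
    }


table = substrings("abcd")
print(table[(1, 2)], table[(2, 3)], table[(1, 4)])  # ab bcd abcd
print(len(table))  # 10, i.e. 4 * (4 + 1) // 2
```

The inner range shows why the table is triangular: row j has only n - j + 1 cells.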
Vij
Vij represents the set of variables in the grammar that can derive the substring xij.
If Vij contains the start symbol S, then:
• Substring xij can be derived from the given grammar.
• Substring xij is a member of the language of the given grammar.
Example Problem:
Consider the grammar given below and check the acceptance of the string w = baaba using the CYK algorithm:
S → AB / BC
A → BA / a
B → CC / b
C → AB / a
Solution:
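The given grammar is already in Chomsky Normal Form, so no conversion is needed.
The triangular table to fill (row j holds the cells for substrings of length j; within a row, cells are listed by starting position i = 1, 2, ...):
5 | V15
4 | V14 | V24
3 | V13 | V23 | V33
2 | V12 | V22 | V32 | V42
1 | V11 | V21 | V31 | V41 | V51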
Row 1:
V11 represents the set of variables deriving X11
X11=b
V11={B}
V21 represents the set of variables deriving X21
X21=a
V21={A,C}
V31 represents the set of variables deriving X31
X31=a
V31={A,C}
V41 represents the set of variables deriving X41
X41=b
V41={B}
V51 represents the set of variables deriving X51
X51=a
V51={A,C}
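Row 1 only needs the terminal rules A → a, B → b and C → a. A minimal sketch, assuming those rules are stored as a map from each terminal to the set of variables that produce it (`TERMINAL_RULES` and `row_one` are illustrative names):

```python
# Terminal rules of the example grammar: A -> a, B -> b, C -> a.
TERMINAL_RULES = {"a": {"A", "C"}, "b": {"B"}}


def row_one(w):
    """Return {i: Vi1} for 1-based positions i: the variables deriving the single symbol w[i-1]."""
    return {i: set(TERMINAL_RULES.get(ch, set())) for i, ch in enumerate(w, start=1)}


for i, cell in row_one("baaba").items():
    print(f"V{i}1 =", sorted(cell))
```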
Row 2:
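From the 2nd row onwards, the CYK algorithm computes Vij with the formula:
$$V_{ij} = \bigcup_{k=1}^{j-1} V_{ik} \cdot V_{(i+k)(j-k)}$$
Here Vik · V(i+k)(j-k) first forms every pair YZ with Y ∈ Vik and Z ∈ V(i+k)(j-k), and then keeps each variable X that has a rule X → YZ. The split point k varies from 1 to j − 1.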
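The formula can be sketched directly in Python. This assumes the binary rules of the example grammar are indexed by their right-hand side, and that V is a dictionary keyed by (i, j) that already holds the cells of shorter length; `BINARY_RULES`, `combine` and `cell` are hypothetical names.

```python
# Binary rules of the example grammar, indexed by right-hand side: (Y, Z) -> {X | X -> Y Z}.
BINARY_RULES = {
    ("A", "B"): {"S", "C"},  # S -> AB, C -> AB
    ("B", "C"): {"S"},       # S -> BC
    ("B", "A"): {"A"},       # A -> BA
    ("C", "C"): {"B"},       # B -> CC
}


def combine(left, right):
    """Vik . V(i+k)(j-k): every X with a rule X -> Y Z, where Y is in left and Z in right."""
    result = set()
    for y in left:
        for z in right:
            result |= BINARY_RULES.get((y, z), set())
    return result


def cell(V, i, j):
    """Vij as the union over k = 1 .. j-1 of combine(Vik, V(i+k)(j-k))."""
    result = set()
    for k in range(1, j):
        result |= combine(V[(i, k)], V[(i + k, j - k)])
    return result


# Row 1 for w = baaba, as computed above.
V = {(1, 1): {"B"}, (2, 1): {"A", "C"}, (3, 1): {"A", "C"}, (4, 1): {"B"}, (5, 1): {"A", "C"}}
print(sorted(cell(V, 1, 2)))  # ['A', 'S']  (BA gives A, BC gives S)
print(sorted(cell(V, 2, 2)))  # ['B']       (CC gives B)
```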
V12
i=1,j=2,k=1
V12=V11.V21
V12={B}{A,C}
V12={BA,BC}
V12={A,S}
V22
i=2,j=2,k=1
V22=V21.V31
V22={A,C}{A,C}
V22={AA,AC,CA,CC}
Since AA, AC and CA do not appear on the right-hand side of any rule, only CC remains, and B → CC gives:
V22={B}
V32
i=3,j=2,k=1
V32=V31.V41
V32={A,C}{B}
V32={AB,CB}
Since CB does not appear on the right-hand side of any rule, only AB remains:
V32={AB}
V32={S,C}
V42
i=4,j=2,k=1
V42=V41.V51
V42={B}{A,C}
V42={BA,BC}
V42={A,S}
Row 3:
V13
i=1,j=3,k=1,2
V13 = V11.V22 ∪ V12.V31
V13 = {B}{B} ∪ {A,S}{A,C}
V13 = {BB} ∪ {AA, AC, SA, SC}
None of these pairs appears on the right-hand side of any rule, so:
V13 = ϕ
V23
i=2,j=3,k=1,2
V23 = V21.V32 ∪ V22.V41
V23 = {A,C}{S,C} ∪ {B}{B}
V23 = {AS, AC, CS, CC} ∪ {BB}
Only CC appears on a right-hand side (B → CC), so:
V23 = {B}
V33
i=3,j=3,k=1,2
V33=V31.V42 ∪ V32.V51
V33= { A , C } { A , S } ∪ { S , C } { A , C }
V33= { AA , AS , CA , CS } ∪ { SA , SC , CA , CC }
V33= ϕ ∪ { CC }
V33= ϕ ∪ { B }
V33 = { B }
Row 4:
V14
i=1,j=4,k=1,2,3
V14 = V11.V23 ∪ V12.V32 ∪ V13.V41
V14 = {B}{B} ∪ {A,S}{S,C} ∪ ϕ{B}
V14 = {BB} ∪ {AS, AC, SS, SC} ∪ ϕ
Since BB, AS, AC, SS and SC do not appear on the right-hand side of any rule (and V13 = ϕ contributes no pairs), we have:
V14 = ϕ ∪ ϕ ∪ ϕ
V14 = ϕ
V24
i=2,j=4,k=1,2,3
V24 = V21.V33 ∪ V22.V42 ∪ V23.V51
V24 = {A,C}{B} ∪ {B}{A,S} ∪ {B}{A,C}
V24 = {AB, CB} ∪ {BA, BS} ∪ {BA, BC}
Since CB and BS do not appear on the right-hand side of any rule, we have:
V24 = {AB} ∪ {BA} ∪ {BA, BC}
V24 = {S,C} ∪ {A} ∪ {A,S}
V24 = {S,C,A}
Row 5:
V15
i=1,j=5,k=1,2,3,4
V15 = V11.V24 ∪ V12.V33 ∪ V13.V42 ∪ V14.V51
V15 = {B}{S,C,A} ∪ {A,S}{B} ∪ ϕ{A,S} ∪ ϕ{A,C}
V15 = {BS, BC, BA} ∪ {AB, SB} ∪ ϕ ∪ ϕ
Since BS and SB do not appear on the right-hand side of any rule (and V13 = V14 = ϕ contribute no pairs), we have:
V15 = {BC, BA} ∪ {AB} ∪ ϕ ∪ ϕ
V15 = {S,A} ∪ {S,C} ∪ ϕ ∪ ϕ
V15 = {S,A,C}
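The completed table (row j = substring length; cells listed by starting position i = 1, 2, ...):
5 | {S,A,C}
4 | ϕ | {S,C,A}
3 | ϕ | {B} | {B}
2 | {S,A} | {B} | {S,C} | {S,A}
1 | {B} | {A,C} | {A,C} | {B} | {A,C}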
• Since V15 contains the start symbol S, the string w = baaba is accepted: it belongs to the language of the given grammar.
• In total, 4 distinct substrings are members of the language of the given grammar: ba, ab, aaba and baaba.
• This is because their cells (V12 and V42 for ba, V32 for ab, V24 for aaba, V15 for baaba) contain the start symbol S.
• The substrings that cannot be derived from any variable are baa and baab.
• This is because their cells (V13 and V14) are empty (ϕ).
• The substrings that can be derived from variable B alone are b, aa, aba and aab.
• This is because their cells (V11 and V41 for b, V22 for aa, V33 for aba, V23 for aab) contain only B.
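Putting the pieces together, the whole table can be filled bottom-up. This is a minimal CYK recognizer sketch, assuming the grammar is in CNF and encoded as (lhs, rhs) pairs; `RULES` and `cyk` are illustrative names. It reproduces the completed table for w = baaba.

```python
from itertools import product

# Example grammar in CNF: S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a.
RULES = [
    ("S", ("A", "B")), ("S", ("B", "C")),
    ("A", ("B", "A")), ("A", ("a",)),
    ("B", ("C", "C")), ("B", ("b",)),
    ("C", ("A", "B")), ("C", ("a",)),
]


def cyk(w, rules):
    """Fill the CYK table: V[(i, j)] is the set of variables deriving xij
    (1-based start position i, length j)."""
    n = len(w)
    V = {}
    for i in range(1, n + 1):  # row 1: terminal rules NT -> T
        V[(i, 1)] = {lhs for lhs, rhs in rules if rhs == (w[i - 1],)}
    for j in range(2, n + 1):  # rows 2..n: binary rules NT -> NT NT
        for i in range(1, n - j + 2):
            cell = set()
            for k in range(1, j):  # split xij into xik and x(i+k)(j-k)
                pairs = set(product(V[(i, k)], V[(i + k, j - k)]))
                cell |= {lhs for lhs, rhs in rules if rhs in pairs}
            V[(i, j)] = cell
    return V


w = "baaba"
V = cyk(w, RULES)
print("accepted:", "S" in V[(1, len(w))])  # accepted: True
# Distinct substrings whose cell contains the start symbol S
print(sorted({w[i - 1 : i - 1 + j] for (i, j), c in V.items() if "S" in c}))
# ['aaba', 'ab', 'ba', 'baaba']
```

The three nested loops over j, i and k give CYK its cubic running time in the length of the input, multiplied by a factor that depends on the size of the grammar.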
5.4 AUGMENTED GRAMMAR:
• A lexicalized PCFG is a PCFG in which the probabilities for a rule depend on the relationship between words in the parse tree.
Example: to capture the relationship between the verb “eat” and the nouns “banana” versus “bandanna,” the rule probabilities must depend on the words involved.
• However, we can't have the probability depend on every word in the tree, because we won't have enough training data to estimate all those probabilities.
• It is useful to introduce the notion of the head of a phrase: the most important word. Thus, “eat” is the head of the VP “eat a banana” and “banana” is the head of the NP “a banana.”
• We write VP(v) to denote a phrase with category VP whose head word is v.
• We say that the category VP is augmented with the head variable v.
Here is an augmented grammar that describes the verb–object relation:
VP(v) → Verb(v) NP(n) [P1(v, n)]
VP(v) → Verb(v) [P2(v)]
NP(n) → Article(a) Adjs(j) Noun(n) [P3(n, a)]
Noun(banana) → banana [pn]
Here the probability P1(v, n) depends on the head words v and n. We would set this probability to be relatively high when v is “eat” and n is “banana,” and low when n is “bandanna.”
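A toy sketch of how a head-word-dependent rule probability is used. The probability values are made up for illustration and are not estimated from any corpus; `P1`, `DEFAULT_P1`, `p1` and `vp_score` are hypothetical names. A real lexicalized PCFG would estimate P1(v, n) from a treebank and smooth it for unseen pairs.

```python
# Hypothetical values of P1(v, n) for VP(v) -> Verb(v) NP(n); made up for illustration.
P1 = {
    ("eat", "banana"): 0.7,
    ("eat", "bandanna"): 0.01,
}
DEFAULT_P1 = 0.001  # assumed fallback for (verb, noun) pairs not in the table


def p1(v, n):
    """Probability of VP(v) -> Verb(v) NP(n) given head words v and n."""
    return P1.get((v, n), DEFAULT_P1)


def vp_score(v, n, p_children=1.0):
    """Probability of a VP(v) subtree: the rule probability times the (assumed given)
    probability p_children of its Verb and NP subtrees."""
    return p1(v, n) * p_children


print(vp_score("eat", "banana") > vp_score("eat", "bandanna"))  # True
```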
1 still overgenerates: English requires subject–verb agreement in person and number between the subject and the main verb of a sentence.
Example:
I smell → Correct
I smells → Wrong
It smell → Wrong
It smells → Correct
NP(c, pn, head) has three augmentations: c is a parameter for case, pn is a parameter for
person and number, and head is a parameter for the head word of the phrase.
2 :
S(head) → NP(Sbj, pn, h) VP(pn, head) | ...
NP(c, pn, head) → Pronoun(c, pn, head) | Noun(c, pn, head) | ...
VP(pn, head) → VP(pn, head) NP(Obj, p, h) | ...
PP(head) → Prep(head) NP(Obj, pn, h)
Pronoun(Sbj, 1S, I) → I
Pronoun(Sbj, 1P, we) → we
Pronoun(Obj, 1S, me) → me
Pronoun(Obj, 3P, them) → them
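A minimal sketch of how the case and pn augmentations filter subject and verb pairs for the rule S(head) → NP(Sbj, pn, h) VP(pn, head). The pronoun entries follow the lexicon above; the readings for “it” and the verb lexicon `VERBS` are assumptions added for illustration, and `accepts` is a hypothetical name.

```python
# Pronoun lexicon: word -> list of (case, pn) readings, following Pronoun(c, pn, head) -> word.
# The readings for "it" are an added assumption (3S, subject or object case).
PRONOUNS = {
    "I": [("Sbj", "1S")],
    "we": [("Sbj", "1P")],
    "me": [("Obj", "1S")],
    "them": [("Obj", "3P")],
    "it": [("Sbj", "3S"), ("Obj", "3S")],
}

# Hypothetical verb lexicon: word -> set of pn values the verb form agrees with.
VERBS = {
    "smells": {"3S"},
    "smell": {"1S", "2S", "1P", "2P", "3P"},
}


def accepts(subject, verb):
    """Sketch of S(head) -> NP(Sbj, pn, h) VP(pn, head): the subject needs a
    subject-case reading whose pn value the verb form also carries."""
    verb_pns = VERBS.get(verb, set())
    return any(case == "Sbj" and pn in verb_pns for case, pn in PRONOUNS.get(subject, []))


for subject, verb in [("I", "smell"), ("I", "smells"), ("it", "smell"), ("it", "smells"), ("me", "smell")]:
    print(subject, verb, "->", "correct" if accepts(subject, verb) else "wrong")
```

Sharing the pn variable between the NP and the VP is what enforces agreement; the Sbj constant on the NP rules out object pronouns such as “me” in subject position.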