
SYNTACTIC ANALYSIS II

(PARSING USING CFGS)

Dr. Sukhnandan Kaur


TIET
INTRODUCTION
 Syntactic parsing is the task of recognizing a sentence and assigning a syntactic
structure to it.
 These syntactic structures (parse trees) are directly useful in applications
such as
 grammar checking in word-processing systems: a sentence that cannot be parsed may
contain grammatical errors;
 semantic analysis, for which parse trees serve as an important intermediate
representation;
 applications such as question answering and information extraction.

 For example, to answer the question

What books were written by British women authors before 1800?

we’ll need to know that the subject of the sentence is what books and that the by-
adjunct is British women authors, to help us figure out that the user wants a list of
books (and not a list of authors).
 Context-free grammars don’t specify how the parse tree for a given sentence should
be computed.
 Therefore, we’ll need to specify parsing algorithms that employ these grammars to
produce trees.
PARSING - A SEARCH PROCESS
 In syntactic parsing, the parser can be viewed as searching through
the space of possible parse trees to find the correct parse tree for a
given sentence.
 The goal of a parsing search is to find all the trees whose root is the
start symbol S and which cover exactly the words in the input.
 The following two constraints guide the search process:

1) Input:
o The first constraint comes from the words in the input sentence to be parsed.
o A valid parse must cover all the words in the sentence, and these words must
constitute the leaves of the final parse tree.
2) Grammar:
o The second kind of constraint comes from the grammar.
o The parse tree should be constructed according to the rules of the
grammar.
PARSING - A SEARCH PROCESS CONTD….
 The two constraints in the search process give rise to the two
search strategies most widely used by parsers, namely:
1) Top-down or goal-directed parsing
2) Bottom-up or data-directed parsing
TOP-DOWN PARSING
 Top-down parsing (as the name suggests) starts its search from the
root node S and works downwards towards the leaves of the tree.
 The next step is to find all the sub-trees which can start with S. This
is done by expanding the root node by using all the grammar rules
with S on their left hand side.
 Similarly, each non-terminal in the resulting sub-trees is expanded
using the grammar rules having a matching non-terminal on the left
hand side.
 The right-hand sides of the grammar rules provide the nodes to be
generated, which are then expanded recursively.
 As the expansion proceeds, a stage is reached where the bottom
of the tree consists only of part-of-speech categories.
 At this point, all trees whose leaves do not match the words in the
input sentence are rejected, leaving only trees that represent
successful parses.
TOP-DOWN PARSING EXAMPLE
 Using the top-down parsing technique, parse the sentence “Book
that flight”. Consider the following phrase structure grammar:

S → NP VP               Det → that | this | a
S → VP                  Noun → book | flight | meal | money
S → Aux NP VP           Verb → book | include | prefer
NP → Pronoun            Pronoun → I | she | me
NP → Proper-Noun        Proper-Noun → Houston | TWA
NP → Det Noun           Aux → does
VP → Verb               Preposition → from | to | on | near | through
VP → Verb NP
VP → Verb NP PP
VP → Verb PP
VP → VP PP
PP → Preposition NP
TOP-DOWN PARSING EXAMPLE

[Figure: the top-down search space for “Book that flight”, grown ply by ply from the root S]
TOP-DOWN PARSING EXAMPLE CONTD…
 The algorithm starts by assuming the input can be derived by the designated
start symbol S.
 The next step is to find the tops of all trees which can start with S, by looking
for all the grammar rules with S on the left-hand side.
 In the grammar, there are three rules that expand S, so the second ply, or level,
of the search space has three partial trees.
 We next expand the constituents in these three new trees, just as we originally
expanded S. The first tree tells us to expect an NP followed by a VP, the second
expects an Aux followed by an NP and a VP, and the third a VP by itself.
 Trees are grown downward until they eventually reach the part-of-speech
categories at the bottom of the tree.
 At this point, trees whose leaves fail to match all the words in the input can be
rejected, leaving behind those trees that represent successful parses.
 In the figure, only the fifth parse tree in the third ply (the one which has expanded the
rule VP → Verb NP) will eventually match the input sentence Book that flight.
TOP-DOWN PARSING CONTD….
 Top-down parsing can be implemented with either a depth-first or a
breadth-first search strategy.
 Depth-first search explores one possibility at a time: whenever more than
one rule could be applied at a point, we first explore one alternative
(and all of its consequences).
 Only if we fail do we consider the remaining alternative(s), following the
same strategy. So we stick to a decision as long as possible.
 In breadth-first search, we pursue all possible choices “in parallel”.
Instead of committing to one decision, we jump between all the alternatives.
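 The depth-first strategy can be sketched as a small recursive-descent
recognizer. The Python sketch below is illustrative rather than the
lecture’s own algorithm: the dictionary encoding is assumed, the grammar is
trimmed to a few rules, and the left-recursive rule VP → VP PP is
deliberately left out (see the left-recursion problem later).

    # A minimal top-down, depth-first recognizer (illustrative sketch).
    # GRAMMAR maps each non-terminal to its alternative right-hand sides;
    # LEXICON maps each part-of-speech tag to the words it covers.
    GRAMMAR = {
        "S":  [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
        "NP": [["Pronoun"], ["Proper-Noun"], ["Det", "Noun"]],
        "VP": [["Verb"], ["Verb", "NP"]],
    }
    LEXICON = {
        "Det": {"that", "this", "a"},
        "Noun": {"book", "flight", "meal", "money"},
        "Verb": {"book", "include", "prefer"},
        "Pronoun": {"I", "she", "me"},
        "Proper-Noun": {"Houston", "TWA"},
        "Aux": {"does"},
    }

    def expand(symbols, words):
        """Try to derive exactly `words` from the symbol sequence `symbols`,
        exploring one alternative at a time and backtracking on failure."""
        if not symbols:                  # all symbols consumed:
            return not words             # succeed only if no words remain
        head, rest = symbols[0], symbols[1:]
        if head in GRAMMAR:              # non-terminal: try each rule in turn
            return any(expand(rhs + rest, words) for rhs in GRAMMAR[head])
        # otherwise `head` is a part-of-speech tag: it must cover the next word
        return bool(words) and words[0] in LEXICON.get(head, set()) \
            and expand(rest, words[1:])

    print(expand(["S"], ["book", "that", "flight"]))   # True ("Book that flight")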
DERIVATION USING TOP-DOWN, DEPTH- FIRST

The depth-first derivation of “Book that flight” proceeds as follows,
backtracking whenever a choice fails:

    S ⇒ NP VP              (NP → Pronoun, NP → Proper-Noun, NP → Det Noun are
                            tried in turn; none can start with “Book”, so backtrack)
    S ⇒ VP ⇒ Verb          (Verb → book matches, but “that flight” remains
                            uncovered, so backtrack)
    S ⇒ VP ⇒ Verb NP
      ⇒ book NP            (Verb → book)
      ⇒ book Det Noun      (NP → Det Noun; Pronoun and Proper-Noun fail on “that”)
      ⇒ book that Noun     (Det → that)
      ⇒ book that flight   (Noun → flight; all input words covered, parse succeeds)
BOTTOM-UP PARSING
 A bottom-up parser starts with the words in the input sentence and
attempts to construct a parse tree in an upward direction towards the
root.
 At each step, the parser looks for rules in the grammar whose right-hand
side matches a portion of the parse trees constructed so far, and reduces
that portion to the rule's left-hand side.
 The parse is considered successful if the parser reduces the tree to the
start symbol of the grammar.
 In general, the parser extends one ply to the next by looking for places
in the parse-in-progress where the right-hand side of some rule might
fit.
 This contrasts with the earlier top-down parser, which expanded trees
by applying rules whenever their left-hand side matched an unexpanded
non-terminal.
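 One common way to realize the bottom-up strategy is a shift-reduce loop:
shift the next word onto a stack, and reduce whenever the top of the stack
matches the right-hand side of some rule. The greedy Python sketch below is
a simplification under assumed encodings; its rule set is trimmed so that
greedy reduction happens to succeed, whereas a real bottom-up parser must
search over the shift/reduce choices.

    # Greedy shift-reduce recognition (illustrative sketch, no backtracking).
    RULES = [                        # (left-hand side, right-hand side)
        ("NP", ("Det", "Noun")),
        ("VP", ("Verb", "NP")),
        ("S",  ("VP",)),
    ]
    POS = {"book": "Verb", "that": "Det", "flight": "Noun"}   # toy lexicon

    def shift_reduce(words):
        stack = []
        for word in words:
            stack.append(POS[word])            # SHIFT: push the word's tag
            reduced = True
            while reduced:                     # REDUCE while any rule fits
                reduced = False
                for lhs, rhs in RULES:
                    if tuple(stack[-len(rhs):]) == rhs:
                        del stack[-len(rhs):]  # pop the matched right-hand side
                        stack.append(lhs)      # push the left-hand side
                        reduced = True
                        break
        return stack == ["S"]                  # success: all input reduced to S

    print(shift_reduce(["book", "that", "flight"]))   # True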
BOTTOM-UP PARSING EXAMPLE
 Using the bottom-up parsing technique, parse the sentence
“Book that flight”. Consider the following phrase structure
grammar:
S → NP VP               Det → that | this | a
S → VP                  Noun → book | flight | meal | money
S → Aux NP VP           Verb → book | include | prefer
NP → Pronoun            Pronoun → I | she | me
NP → Proper-Noun        Proper-Noun → Houston | TWA
NP → Det Nominal        Aux → does
Nominal → Noun          Preposition → from | to | on | near | through
Nominal → Nominal Noun
Nominal → Nominal PP
VP → Verb
VP → Verb NP
VP → Verb NP PP
VP → Verb PP
VP → VP PP
PP → Preposition NP
BOTTOM-UP PARSING EXAMPLE

[Figure: the bottom-up search space for “Book that flight”, built upward from the input words]
BOTTOM-UP PARSING EXAMPLE
 The parser begins by looking up each input word in the lexicon and
building three partial trees with the part-of-speech for each word.
 But the word book is ambiguous; it can be a noun or a verb. Thus the
parser must consider two possible sets of trees.
 Each of the trees in the second ply is then expanded. In the parses on
the left, the Nominal → Noun rule is applied to both of the nouns (book
and flight). This same rule is also applied to the sole noun (flight) on
the right, producing the trees on the third ply.
 In the fourth ply, in the first and third parse, the sequence Det
Nominal is recognized as the right-hand side of the NP → Det
Nominal rule.
 In the fifth ply, the interpretation of book as a noun has been pruned
from the search space. This is because this parse cannot be continued:
there is no rule in the grammar with the right-hand side Nominal NP.
TOP-DOWN VS. BOTTOM-UP PARSING
 Each of these two architectures has its own advantages and
disadvantages.
 The top-down strategy never wastes time exploring trees that cannot
result in an S. In the bottom-up strategy, by contrast, trees that have no
hope of leading to an S, or of fitting in with any of their neighbors, are
generated anyway.
 While the top-down parser does not waste time on trees that do not lead
to an S, it does spend considerable effort on S trees that are not
consistent with the input.
 This weakness of top-down parsers arises from the fact that they generate
trees before ever examining the input.
 Bottom-up parsers, on the other hand, never explore a tree that does not
match the input.
TOP-DOWN VS. BOTTOM-UP PARSING CONTD...

 Another problem with (depth-first) top-down parsers is left
recursion, which can cause the search to get stuck in an infinite loop.
 The problem arises if the grammar is left-recursive, i.e., it contains a
non-terminal A which derives, in one or more steps, a string beginning
with the same non-terminal: A ⇒* A α for some string α.
 Such parsers also suffer from the problem of repeated parsing: the parser
often builds valid trees for portions of the input that it then discards
during backtracking. These have to be rebuilt during subsequent steps in
the parse.
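 A standard remedy (a textbook grammar transformation, not specific to this
lecture) rewrites the offending rules. A pair of rules of the form

    A → A α
    A → β

is replaced (at the cost of introducing an ε-rule) by

    A  → β A′
    A′ → α A′ | ε

 For example, VP → VP PP together with VP → Verb NP becomes
VP → Verb NP VP′ with VP′ → PP VP′ | ε, which a depth-first top-down
parser can expand without looping.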
AMBIGUITY
 Ambiguity is perhaps the most serious problem faced by parsers.
 The most common type of ambiguity that parsers have to deal with is
structural ambiguity.
 Structural ambiguity occurs when the grammar assigns more than one
possible parse to a sentence.
 Structural ambiguity comes in many forms. Two particularly common
kinds of ambiguity are attachment ambiguity and coordination ambiguity.
 A sentence has an attachment ambiguity if a particular constituent can be
attached to the parse tree at more than one place.
 For example, for the sentence “The girl plucked the flower with a long stick”,
there are two ways of generating the prepositional phrase “with a long stick”.
 The first parse can be generated from the verb phrase and it leads to the
interpretation that the stick is used to pluck the flower.
 The second parse can be generated from the noun phrase and it leads to the
interpretation that the flower being plucked has a long stick.
AMBIGUITY CONTD…..
PP attached to the VP (the stick is used to pluck the flower):

    (S (NP The girl)
       (VP (V plucked) (NP the flower) (PP with a long stick)))

PP attached to the NP (the flower has a long stick):

    (S (NP The girl)
       (VP (V plucked) (NP (DET the) (N flower) (PP with a long stick))))
AMBIGUITY CONTD…..
 In coordination ambiguity there are different sets of phrases
that can be joined by a conjunction like and.
 For example, the phrase old men and women can be bracketed as
[old [men and women]], referring to old men and old women, or
as [old men] and [women], in which case it is only the men who
are old.
AMBIGUITY CONTD…..
 Even if a sentence isn’t ambiguous, it can be inefficient to
parse due to local ambiguity.
 Local ambiguity occurs when some part of a sentence is
ambiguous, that is, has more than one parse, even if the whole
sentence is not ambiguous.
 For example, the sentence Book that flight is unambiguous, but
when the parser sees the first word Book, it cannot know whether it
is a verb or a noun until later. Thus it must consider both
possible parses.
DYNAMIC PROGRAMMING PARSING METHODS
 The problems that afflict standard bottom-up or top-down parsers
can be solved by a single class of algorithms called Dynamic
Programming methods.
 Dynamic programming approaches systematically fill in tables of
solutions to sub-problems.
 When complete, the tables contain the solution to all the sub-
problems needed to solve the problem as a whole.
 In the case of parsing, such tables are used to store sub-trees for
each of the various constituents in the input as they are discovered.
 These sub-trees are discovered once, stored, and then used in all
parses calling for that constituent.
 This solves the re-parsing problem (sub-trees are looked up, not
re-parsed) and partially solves the ambiguity problem.
DYNAMIC PROGRAMMING PARSING METHODS
CONTD…
 The three most widely used methods are the
 Cocke-Kasami-Younger (CKY) algorithm

 Earley algorithm

 Chart Parsing

COCKE-KASAMI-YOUNGER (CKY)
PARSING
 The major requirement of the CKY algorithm is that the CFG
be in Chomsky Normal Form (CNF).
 Grammars in CNF are restricted to rules of the form A → B C
or A → w, i.e., the right-hand side of each rule must expand
either to two non-terminals or to a single terminal.
 This single restriction gives rise to an extremely simple and
elegant table-based CKY parsing method.
CONVERSION TO CNF
 Assuming we’re dealing with an ε-free grammar, there are three situations
we need to address in any generic grammar:
 rules that mix terminals with non-terminals on the right-hand side,
 rules that have a single non-terminal on the right,
 and rules where the right-hand side’s length is greater than two.

 Rules that mix terminals with non-terminals can be converted into CNF by
simply introducing a new dummy non-terminal that covers only the original
terminal.
 For example, a rule for an infinitive verb phrase such as INF-VP → to VP
would be replaced by the two rules INF-VP → TO VP and TO → to.
 Rules with a single non-terminal on the right are called unit productions.
Unit productions are eliminated by rewriting the right-hand side of the
original rules with the right-hand side of all the non-unit production rules
that they ultimately lead to.
 More formally, if A ⇒* B by a chain of one or more unit productions and
B → u is a non-unit production, then we add the rule A → u to the grammar.
CONVERSION TO CNF
 Rules with right-hand sides longer than 2 are remedied through
the introduction of new non-terminals that spread the longer
sequences over several new productions.
 Formally, if we have a rule like A → B C γ, we replace the
leftmost pair of non-terminals with a new non-terminal and
introduce a new production, resulting in the following new rules:
X1 → B C
A → X1 γ
 The entire conversion process can be summarized as follows:

1) Copy all conforming rules to the new grammar unchanged,

2) Convert terminals within rules to dummy non-terminals,

3) Convert unit-productions,

4) Binarize all rules and add to new grammar.
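 A compact Python sketch of this process is given below. It is illustrative
under assumed encodings (rules as (lhs, rhs-tuple) pairs, fresh names X1,
X2, …), and it reorders the steps slightly, binarizing before eliminating
unit productions, which is equally valid for an ε-free grammar.

    from itertools import count

    def to_cnf(rules, terminals):
        """Convert an epsilon-free CFG to CNF (illustrative sketch).
        rules: list of (lhs, rhs) pairs, with rhs a tuple of symbols."""
        fresh = (f"X{i}" for i in count(1))
        work = []
        # Step 2: inside right-hand sides longer than one symbol, replace
        # each terminal with a dummy non-terminal covering only that terminal.
        for lhs, rhs in rules:
            if len(rhs) > 1:
                new = []
                for sym in rhs:
                    if sym in terminals:
                        dummy = next(fresh)
                        work.append((dummy, (sym,)))
                        new.append(dummy)
                    else:
                        new.append(sym)
                rhs = tuple(new)
            work.append((lhs, rhs))
        # Step 4: binarize long rules, e.g. A -> B C gamma becomes
        # X -> B C together with A -> X gamma.
        grammar = set()
        for lhs, rhs in work:
            while len(rhs) > 2:
                x = next(fresh)
                grammar.add((x, rhs[:2]))
                rhs = (x,) + rhs[2:]
            grammar.add((lhs, rhs))
        # Step 3: eliminate unit productions by computing, for each
        # non-terminal, everything reachable through chains of unit rules,
        # then copying over the non-unit rules of all reachable symbols.
        unit = lambda rhs: len(rhs) == 1 and rhs[0] not in terminals
        reach = {lhs: {lhs} for lhs, _ in grammar}
        changed = True
        while changed:                     # transitive closure of unit links
            changed = False
            for lhs, rhs in grammar:
                if unit(rhs):
                    for a, seen in reach.items():
                        if lhs in seen and rhs[0] not in seen:
                            seen.add(rhs[0])
                            changed = True
        return sorted((a, rhs) for a in reach for lhs, rhs in grammar
                      if lhs in reach[a] and not unit(rhs))

    # e.g. ("S", ("Aux", "NP", "VP")) binarizes into
    # ("X1", ("Aux", "NP")) and ("S", ("X1", "VP")).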


EXAMPLE – CFG TO CNF
Convert the following CFG to CNF:
S → NP VP               Det → that | this | a
S → VP                  Noun → book | flight | meal | money
S → Aux NP VP           Verb → book | include | prefer
NP → Pronoun            Pronoun → I | she | me
NP → Proper-Noun        Proper-Noun → Houston | TWA
NP → Det Nominal        Aux → does
Nominal → Noun          Preposition → from | to | any | some | through
Nominal → Nominal Noun
Nominal → Nominal PP
VP → Verb
VP → Verb NP
VP → Verb NP PP
VP → Verb PP
VP → VP PP
PP → Preposition NP
EXAMPLE CONTD….
S → NP VP                                VP → book | include | prefer
S → X1 VP                                VP → Verb NP
X1 → Aux NP                              VP → X2 PP
S → book | include | prefer              X2 → Verb NP
S → Verb NP                              VP → Verb PP
S → X2 PP                                VP → VP PP
X2 → Verb NP                             PP → Preposition NP
S → Verb PP                              Det → that | this | a
S → VP PP                                Noun → book | flight | meal | money
NP → I | she | me                        Verb → book | include | prefer
NP → TWA | Houston                       Pronoun → I | she | me
NP → Det Nominal                         Proper-Noun → Houston | TWA
Nominal → book | flight | meal | money   Aux → does
Nominal → Nominal Noun                   Preposition → from | to | any | some | through
Nominal → Nominal PP
CKY RECOGNITION
 Let n be the number of words in the input; the words are separated (and
bounded) by n + 1 positions, numbered 0 to n.
 More specifically, for a sentence of length n, we will be working with
the upper-triangular portion of an (n + 1) × (n + 1) matrix.
 Each cell [i, j] in this matrix contains a set of non-terminals that
represent all the constituents that span positions i through j of the input.
 The upper-triangular matrix is filled in a bottom-up fashion.

 Since our grammar is in CNF, the non-terminal entries in the table have
exactly two daughters in the parse. Therefore, for each constituent
represented by an entry [i, j] in the table there must be a position in the
input, k, where it can be split into two parts such that i < k < j.
 Given such a k, the first constituent [i, k] must lie to the left of entry [i,
j] somewhere along row i, and the second entry [k, j] must lie beneath it,
along column j.
CKY RECOGNITION CONTD…
 CKY recognition is simply a matter of filling the parse table in
the right way.
 To do this, we’ll proceed in a bottom-up fashion so that at the
point where we are filling any cell [i, j], the cells containing the
parts that could contribute to this entry have already been
filled.
 The upper-triangular matrix is filled a column at a time working
from left to right.
 Each column is then filled from bottom to top.

 This scheme guarantees that at each point in time we have all the
information we need: to the left, since all the columns to the left have
already been filled, and below, since we are filling each column from the
bottom up.
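 Expressed as code, this filling scheme looks like the Python sketch below.
The encodings are assumed, not prescribed by the algorithm: binary rules as
a dictionary from the pair (B, C) to the set of possible parents A, and the
lexicon as a dictionary from each word to its part-of-speech tags.

    def cky_recognize(words, grammar, lexicon, start="S"):
        """CKY recognition sketch for a grammar in CNF."""
        n = len(words)
        # table[i][j] holds every non-terminal spanning positions i..j
        table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for j in range(1, n + 1):                   # columns, left to right
            table[j - 1][j] |= lexicon.get(words[j - 1], set())   # A -> word
            for i in range(j - 2, -1, -1):          # rows, bottom to top
                for k in range(i + 1, j):           # every split i < k < j
                    for B in table[i][k]:
                        for C in table[k][j]:
                            table[i][j] |= grammar.get((B, C), set())
        return start in table[0][n]                 # an S spanning all the input?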
CKY RECOGNITION CONTD…
 Ordering Illustration

[Figure: the upper-triangular table is filled column by column from left to right, and each column from the bottom up]
EXAMPLE – CKY RECOGNITION
 Using the CFG converted to CNF (in the previous slides),
recognize the sentence “Book the flight through Houston”.
CKY PARSING
 The CKY recognition algorithm succeeds if it simply finds an S in cell
[0, n].
 To turn it into a parser capable of returning all possible parses for a
given input, we’ll make two simple changes to the algorithm:
1) The first change is to augment the entries in the table so that each non-
terminal is paired with pointers to the table entries from which it was
derived.
2) The second change is to permit multiple versions of the same non-
terminal to be entered into the table.
 With these changes, the completed table contains all the possible parses
for a given input.
 Returning an arbitrary single parse consists of choosing an S from cell
[0, n] and then recursively retrieving its component constituents from
the table.
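 Continuing the recognition sketch from earlier (encodings again assumed),
the two changes amount to storing back-pointers in every cell and then
reading a tree off the table recursively:

    def cky_parse(words, grammar, lexicon, start="S"):
        n = len(words)
        # each cell maps a non-terminal to a LIST of back-pointers, so that
        # multiple derivations of the same non-terminal are all kept
        table = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
        for j in range(1, n + 1):
            for A in lexicon.get(words[j - 1], ()):
                table[j - 1][j].setdefault(A, []).append(words[j - 1])
            for i in range(j - 2, -1, -1):
                for k in range(i + 1, j):
                    for B in table[i][k]:
                        for C in table[k][j]:
                            for A in grammar.get((B, C), ()):
                                table[i][j].setdefault(A, []).append((k, B, C))

        def build(A, i, j):               # follow one (arbitrary) back-pointer
            bp = table[i][j][A][0]
            if isinstance(bp, str):       # a pre-terminal: A -> word
                return (A, bp)
            k, B, C = bp
            return (A, build(B, i, k), build(C, k, j))

        return build(start, 0, n) if start in table[0][n] else None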
CKY PARSING EXAMPLE
 Use the CKY parsing algorithm to generate the correct syntactic structure(s)
for the sentence:
“A pilot likes flying planes”
Given the following CFG:
S → NP VP
VP → VBG NNS
VP → VBZ VP
VP → VBZ NP
NP → DET NN
NP → JJ NNS
DET → a
NN → pilot
VBZ → likes
VBG → flying
JJ → flying
NNS → planes
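 Encoded for the CKY sketches given earlier (the dictionary encoding is an
assumption, not part of the exercise), the grammar and the call would look
like:

    grammar = {("NP", "VP"): {"S"},   ("VBZ", "VP"): {"VP"},
               ("VBZ", "NP"): {"VP"}, ("VBG", "NNS"): {"VP"},
               ("DET", "NN"): {"NP"}, ("JJ", "NNS"): {"NP"}}
    lexicon = {"a": {"DET"}, "pilot": {"NN"}, "likes": {"VBZ"},
               "flying": {"VBG", "JJ"}, "planes": {"NNS"}}
    print(cky_parse("a pilot likes flying planes".split(), grammar, lexicon))
    # prints one of the two parses, e.g.
    # ('S', ('NP', ('DET', 'a'), ('NN', 'pilot')),
    #       ('VP', ('VBZ', 'likes'), ('VP', ('VBG', 'flying'), ('NNS', 'planes'))))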
CKY PARSING EXAMPLE CONTD…
 The given CFG is already in CNF, so it can be used directly to
build the parse table.

Positions: 0 a 1 pilot 2 likes 3 flying 4 planes 5

Row 0:  [0,1] A1: DET | [0,2] F1: NP (A1,B1) | [0,3] ----- | [0,4] ----- | [0,5] I1: S1 (F1,H1); I2: S2 (F1,H2)
Row 1:  [1,2] B1: NN | [1,3] ----- | [1,4] ----- | [1,5] -----
Row 2:  [2,3] C1: VBZ | [2,4] ----- | [2,5] H1: VP1 (C1,G1); H2: VP2 (C1,G2)
Row 3:  [3,4] D1: VBG; D2: JJ | [3,5] G1: VP (D1,E1); G2: NP (D2,E1)
Row 4:  [4,5] E1: NNS
CKY PARSING EXAMPLE CONTD…
 The first parse tree generated from the table is:

    (S1=I1 (NP=F1 (DET=A1 a) (NN=B1 pilot))
           (VP1=H1 (VBZ=C1 likes)
                   (VP=G1 (VBG=D1 flying) (NNS=E1 planes))))

 This parse indicates that the pilot likes to fly planes (as flying planes is
associated with the VP).
CKY PARSING EXAMPLE CONTD…
 The second parse tree generated from the table is:

    (S2=I2 (NP=F1 (DET=A1 a) (NN=B1 pilot))
           (VP2=H2 (VBZ=C1 likes)
                   (NP=G2 (JJ=D2 flying) (NNS=E1 planes))))

 This parse indicates that the pilot likes the planes that are flying (as
flying planes is associated with the NP).
