Syntactic Analysis II (Parsing Using CFGs): Dr. Sukhnandan Kaur, TIET
1) Input:
o The words in the input sentence to be parsed are the first constraint.
o A valid parse is one that covers all the words in the sentence; these
words must constitute the leaves of the final parse tree.
2) Grammar:
o The second kind of constraint comes from the grammar.
o The parse tree should be constructed according to the rules of the
grammar.
PARSING - A SEARCH PROCESS CONTD….
The two constraints on the search process give rise to the two
most widely used search strategies employed by parsers, namely:
1) Top-down or goal-directed parsing
2) Bottom-up or data-directed parsing
TOP-DOWN PARSING
Top-down parsing (as the name suggests) starts its search from the
root node S and works downwards towards the leaves of the tree.
The next step is to find all the sub-trees which can start with S. This
is done by expanding the root node by using all the grammar rules
with S on their left hand side.
Similarly, each non-terminal in the resulting sub-trees is expanded
using the grammar rules having a matching non-terminal on the left
hand side.
The right hand side of the grammar rules provide the nodes to be
generated, which are then expanded recursively.
As the expansion grows, the tree reaches a state where the bottom
of the tree consists only of part-of-speech categories.
At this point, all trees whose leaves do not match the words in the
input sentence are rejected, leaving only trees that represent
successful parses.
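The expansion process described above can be sketched as a small recursive top-down recognizer. The toy grammar below is an assumption, loosely modelled on the "Book that flight" example; the rule names follow common textbook usage, not a specific library.

```python
# A minimal top-down recognizer: expand each non-terminal using every
# grammar rule for it, and reject derivations whose leaves do not
# match the input words. The grammar here is an illustrative toy.
GRAMMAR = {
    "S":       [["NP", "VP"], ["VP"]],
    "NP":      [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP":      [["Verb"], ["Verb", "NP"]],
    "Det":     [["that"]],
    "Noun":    [["flight"]],
    "Verb":    [["book"]],
}

def expand(symbol, words, i):
    """Yield every index j such that `symbol` derives words[i:j]."""
    if symbol not in GRAMMAR:            # terminal: must match the next word
        if i < len(words) and words[i] == symbol:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:          # expand using each rule for `symbol`
        positions = [i]
        for sym in rhs:
            positions = [j for p in positions for j in expand(sym, words, p)]
        yield from positions

def recognize(words):
    # A successful parse starts at S and covers all the words.
    return len(words) in expand("S", words, 0)

print(recognize(["book", "that", "flight"]))  # True
```

Note that this sketch only works for grammars without left recursion; a left-recursive rule would make the expansion loop forever, which is one of the classic problems of naive top-down parsing.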
TOP-DOWN PARSING EXAMPLE
Using the top-down parsing technique, parse the sentence "Book
that flight". Consider the following phrase structure grammar:
Depth first search means that whenever there is more than one
rule that could be applied at one point, we first explore one
possibility (and all its consequences).
Only if we fail do we consider the alternative(s), following the
same strategy. So, we stick to a decision as long as possible.
In breadth-first search, we pursue all possible choices "in
parallel" instead of just exploring one. So, instead of
committing to one decision, we jump between all alternatives.
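The difference between the two strategies can be sketched as a single top-down search over sentential forms in which only the agenda discipline changes: a LIFO stack gives depth-first search, a FIFO queue gives breadth-first search. The toy grammar below is an assumption for illustration.

```python
from collections import deque

# Top-down search over sentential forms. Swapping pop() (stack/LIFO)
# for popleft() (queue/FIFO) switches depth-first to breadth-first.
GRAMMAR = {
    "S":    [["NP", "VP"], ["VP"]],
    "NP":   [["Det", "Noun"]],
    "VP":   [["Verb"], ["Verb", "NP"]],
    "Det":  [["that"]],
    "Noun": [["flight"]],
    "Verb": [["book"]],
}

def search(words, strategy="dfs"):
    agenda = deque([("S",)])                 # each item is a sentential form
    while agenda:
        form = agenda.pop() if strategy == "dfs" else agenda.popleft()
        if list(form) == words:
            return True
        if len(form) > len(words):           # too long to ever match: prune
            continue
        for k, sym in enumerate(form):       # expand the leftmost non-terminal
            if sym in GRAMMAR:
                for rhs in GRAMMAR[sym]:
                    agenda.append(form[:k] + tuple(rhs) + form[k + 1:])
                break
        else:
            continue                         # all terminals, but no match
    return False

print(search(["book", "that", "flight"], "dfs"))  # True
print(search(["book", "that", "flight"], "bfs"))  # True
```

Depth-first commits to one expansion and backtracks only on failure; breadth-first keeps every alternative alive on the agenda at once, which matches the "in parallel" intuition above at the cost of memory.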
DERIVATION USING TOP-DOWN, DEPTH- FIRST
[Figure: successive steps of the top-down, depth-first derivation of "Book
that flight", expanding S through the NP and VP alternatives until the
leaves match the input words.]
[Figure: two parse trees illustrating prepositional-phrase attachment
ambiguity: the PP ("with a long stick" / "with a stick") can attach either
to the NP ("the flower") or to the VP.]
AMBIGUITY CONTD…..
In coordination ambiguity there are different sets of phrases
that can be joined by a conjunction like and.
For example, the phrase old men and women can be bracketed as
[old [men and women]], referring to old men and old women, or
as [old men] and [women], in which case it is only the men who
are old.
AMBIGUITY CONTD…..
Even if a sentence isn't ambiguous, it can be inefficient to
parse due to local ambiguity.
Local ambiguity occurs when some part of a sentence is
ambiguous, that is, has more than one parse, even if the whole
sentence is not ambiguous.
For example, the sentence Book that flight is unambiguous, but
when the parser sees the first word Book, it cannot know whether it
is a verb or a noun until later. Thus it must consider both
possible parses.
DYNAMIC PROGRAMMING PARSING METHODS
The problems that afflict standard bottom-up or top-down parsers
can be solved by a single class of algorithms called Dynamic
Programming methods.
Dynamic programming approaches systematically fill in tables of
solutions to sub-problems.
When complete, the tables contain the solution to all the sub-
problems needed to solve the problem as a whole.
In the case of parsing, such tables are used to store sub-trees for
each of the various constituents in the input as they are discovered.
These sub-trees are discovered once, stored, and then used in all
parses calling for that constituent.
This solves the re-parsing problem (sub-trees are looked up, not
re-parsed) and partially solves the ambiguity problem.
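The idea of discovering each sub-tree once and reusing it can be sketched with a memoized span recognizer: the cache plays the role of the table, keyed by (constituent, start position). The grammar and sentence below are illustrative assumptions.

```python
from functools import lru_cache

# Dynamic-programming sketch: cache, per (symbol, start), the set of
# end positions that the symbol can span, so each sub-problem is
# solved exactly once and looked up thereafter.
GRAMMAR = {
    "S":    (("NP", "VP"), ("VP",)),
    "NP":   (("Det", "Noun"),),
    "VP":   (("Verb",), ("Verb", "NP")),
    "Det":  (("that",),),
    "Noun": (("flight",),),
    "Verb": (("book",),),
}
WORDS = ("book", "that", "flight")

@lru_cache(maxsize=None)
def spans(symbol, i):
    """End positions j such that `symbol` derives WORDS[i:j] (computed once)."""
    if symbol not in GRAMMAR:            # terminal symbol
        ok = i < len(WORDS) and WORDS[i] == symbol
        return frozenset({i + 1}) if ok else frozenset()
    ends = set()
    for rhs in GRAMMAR[symbol]:
        frontier = {i}
        for sym in rhs:                  # chain the spans of each RHS symbol
            frontier = {j for p in frontier for j in spans(sym, p)}
        ends |= frontier
    return frozenset(ends)

print(len(WORDS) in spans("S", 0))  # True
```

Without the cache this is exactly the exponential re-parsing problem; with it, each constituent over each span is computed a single time.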
DYNAMIC PROGRAMMING PARSING METHODS CONTD…
The three most widely used methods are the
Cocke-Kasami-Younger (CKY) algorithm
Earley algorithm
Chart Parsing
COCKE-KASAMI-YOUNGER (CKY) PARSING
The major requirement for the CKY algorithm is that the CFG
should be in Chomsky Normal Form (CNF).
Grammars in CNF are restricted to rules of the form A → B C
or A → w, i.e. the right-hand side of each rule must expand
either to two non-terminals or to a single terminal.
This single restriction gives rise to an extremely simple and
elegant table-based CKY parsing method.
CONVERSION TO CNF
Assuming we’re dealing with an ε-free grammar, there are three situations
we need to address in any generic grammar:
rules that mix terminals with non-terminals on the right-hand side,
rules that have a single non-terminal on the right,
and rules where the right-hand side’s length is greater than two.
Rules that mix terminals with non-terminals can be converted into CNF by
simply introducing a new dummy non-terminal that covers only the original
terminal.
For example, a rule for an infinitive verb phrase such as INF-VP → to VP
would be replaced by the two rules INF-VP → TO VP and TO → to.
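This first transformation can be sketched as a small rewrite pass. The function name and the dummy-naming scheme (`TO_` for `to`) are illustrative assumptions, not part of a standard library.

```python
# Sketch: replace each terminal appearing in a mixed right-hand side
# with a new dummy non-terminal that covers only that terminal.
def fix_mixed_rules(rules, terminals):
    """rules: list of (lhs, rhs-tuple) pairs. Returns rewritten rules."""
    new_rules = []
    for lhs, rhs in rules:
        if len(rhs) > 1 and any(s in terminals for s in rhs):
            fixed = []
            for sym in rhs:
                if sym in terminals:
                    dummy = sym.upper() + "_"          # e.g. to -> TO_
                    new_rules.append((dummy, (sym,)))  # TO_ -> to
                    fixed.append(dummy)
                else:
                    fixed.append(sym)
            new_rules.append((lhs, tuple(fixed)))      # INF-VP -> TO_ VP
        else:
            new_rules.append((lhs, rhs))
    return new_rules

print(fix_mixed_rules([("INF-VP", ("to", "VP"))], {"to"}))
# [('TO_', ('to',)), ('INF-VP', ('TO_', 'VP'))]
```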
Rules with a single non-terminal on the right are called unit productions.
Unit productions are eliminated by rewriting the right-hand side of the
original rules with the right-hand side of all the non-unit production rules
that they ultimately lead to.
More formally, if A ⇒* B through a chain of unit productions and B → u
is a non-unit production, then we add the rule A → u to the grammar.
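The elimination step can be sketched as follows: compute the transitive closure of the unit relation, then copy each non-unit right-hand side up the chain. The rule representation is the same illustrative (lhs, rhs-tuple) form assumed above.

```python
# Sketch of unit-production elimination: for every chain A =>* B of
# unit productions and every non-unit rule B -> u, add A -> u.
def eliminate_units(rules, nonterminals):
    def is_unit(lhs, rhs):
        return len(rhs) == 1 and rhs[0] in nonterminals

    units = {(a, rhs[0]) for a, rhs in rules if is_unit(a, rhs)}
    closure = set(units)                 # transitive closure of unit pairs
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in units:
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    # keep every non-unit rule ...
    result = [(a, rhs) for a, rhs in rules if not is_unit(a, rhs)]
    # ... and pull each non-unit RHS up the unit chains
    for a, b in closure:
        for lhs, rhs in rules:
            if lhs == b and not is_unit(lhs, rhs):
                result.append((a, rhs))
    return result

rules = [("S", ("VP",)), ("VP", ("Verb", "NP")), ("VP", ("book",))]
print(eliminate_units(rules, {"S", "VP"}))
```

For the toy rules above, S → VP disappears and S inherits both VP → Verb NP and VP → book directly.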
CONVERSION TO CNF
Rules with right-hand sides longer than 2 are remedied through
the introduction of new non-terminals that spread the longer
sequences over several new productions.
Formally, if we have a rule like A → B C γ, we replace the
leftmost pair of non-terminals with a new non-terminal and
introduce a new production, resulting in the following new rules:
X1 → B C
A → X1 γ
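The binarization step above can be sketched directly; the fresh names X1, X2, … follow the convention in the rules just shown, and the rule representation is the same illustrative (lhs, rhs-tuple) form.

```python
# Sketch of binarization: repeatedly replace the leftmost pair of
# symbols in a long right-hand side with a fresh non-terminal.
def binarize(rules):
    out, counter = [], 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            counter += 1
            fresh = f"X{counter}"
            out.append((fresh, (rhs[0], rhs[1])))  # X1 -> B C
            rhs = [fresh] + rhs[2:]                # A  -> X1 gamma
        out.append((lhs, tuple(rhs)))
    return out

print(binarize([("A", ("B", "C", "D", "E"))]))
# [('X1', ('B', 'C')), ('X2', ('X1', 'D')), ('A', ('X2', 'E'))]
```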
The entire conversion process can be summarized as follows:
1) Copy all conforming rules to the new grammar unchanged,
2) Convert terminals within rules to dummy non-terminals,
3) Convert unit productions,
4) Make all rules binary and add them to the new grammar.
CKY RECOGNITION
Since our grammar is in CNF, the non-terminal entries in the table have
exactly two daughters in the parse. Therefore, for each constituent
represented by an entry [i, j] in the table there must be a position in the
input, k, where it can be split into two parts such that i < k < j.
Given such a k, the first constituent [i, k] must lie to the left of entry [i,
j] somewhere along row i, and the second entry [k, j] must lie beneath it,
along column j.
CKY RECOGNITION CONTD…
CKY recognition is simply a matter of filling the parse table in
the right way.
To do this, we’ll proceed in a bottom-up fashion so that at the
point where we are filling any cell [i, j], the cells containing the
parts that could contribute to this entry have already been
filled.
The upper-triangular matrix is filled a column at a time working
from left to right.
Each column is then filled from bottom to top.
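The fill order just described can be sketched as a compact recognizer. The CNF grammar below is an illustrative assumption for "book that flight" (with S → book standing in for the unit-eliminated S → VP → book chain).

```python
# CKY recognizer sketch. table[i][j] holds the non-terminals that span
# words[i:j]; columns are filled left to right, each column bottom-up.
GRAMMAR = [                      # CNF rules: (lhs, rhs), rhs length 1 or 2
    ("S",    ("Verb", "NP")),
    ("NP",   ("Det", "Noun")),
    ("Verb", ("book",)),
    ("Det",  ("that",)),
    ("Noun", ("flight",)),
    ("S",    ("book",)),         # unit-eliminated single-word reading
]

def cky_recognize(words):
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):                    # columns, left to right
        for lhs, rhs in GRAMMAR:                 # terminal rules fill [j-1, j]
            if rhs == (words[j - 1],):
                table[j - 1][j].add(lhs)
        for i in range(j - 2, -1, -1):           # rows, bottom to top
            for k in range(i + 1, j):            # split points i < k < j
                for lhs, rhs in GRAMMAR:
                    if (len(rhs) == 2
                            and rhs[0] in table[i][k]
                            and rhs[1] in table[k][j]):
                        table[i][j].add(lhs)
    return "S" in table[0][n]

print(cky_recognize(["book", "that", "flight"]))  # True
```

Note how the two binary-rule lookups mirror the geometry described above: the first constituent [i, k] lies to the left along row i, and the second [k, j] lies beneath along column j.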
[Figure: first parse tree (S1), in which flying planes forms part of the VP.]
This interpretation of the parse tree indicates that the pilot likes to fly
planes (as flying planes is associated with the VP).
CKY PARSING EXAMPLE CONTD…
The second parse tree generated from the table is :
[Figure: second parse tree (S2) for "a pilot likes flying planes", with
flying tagged JJ and planes tagged NNS inside an NP.]
This interpretation of the parse tree indicates that the pilot likes the
planes that are flying (as flying planes is associated with the NP).