
NLP - MODULE 3

CS - 03
Short Answers:
1. What is Syntactic constituency?
A constituent is a word or group of words that form a unit built around a head. They can be
made up of words, phrases, and even entire clauses. The ‘head’, the word around which the
constituent is built, determines the grammatical properties of its constituent.
Syntactic parsing is the task of assigning a syntactic structure to a sentence. The purpose of this
syntactic parsing is to draw the exact meaning, i.e. the dictionary meaning, from the text. Syntax
analysis checks the text for meaningfulness by comparing it to the rules of a formal grammar.

2. Define CFG?
A context-free grammar is a formal grammar which is used to generate all possible strings in a
given formal language. A context-free grammar G can be defined by a four-tuple:
G = (V, T, P, S)
Where,
V is a finite set of non-terminal symbols,
T is a finite set of terminal symbols,
P is a set of production rules, and
S is the start symbol.
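
For illustration, here is the four-tuple written out for a toy grammar in Python (the grammar
itself is a hypothetical example, not one from the notes):

V = {"S", "NP", "VP", "Det", "Noun", "Verb"}   # non-terminal symbols
T = {"the", "a", "dog", "cat", "sees"}         # terminal symbols
P = {                                          # production rules
    "S": [["NP", "VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
    "Det": [["the"], ["a"]],
    "Noun": [["dog"], ["cat"]],
    "Verb": [["sees"]],
}
S = "S"                                        # start symbol

This grammar generates strings such as "the dog sees a cat".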

3. What is Structural Ambiguity?

Structural or syntactic ambiguity is the potential for multiple interpretations of a piece of
written or spoken language because of the way words or phrases are organized.
Example:
"One morning, I shot an elephant in my pajamas. How he got in my pajamas I don't know."
The ambiguity here is who was in the pajamas: Groucho or the elephant? Groucho, answering
the question in the opposite way to what is expected, gets his laugh.

4. What is PCFG?

A PCFG is a probabilistic version of a CFG where each production has a probability. The
probabilities of all productions rewriting a given non-terminal must add to 1, defining a
distribution for each non-terminal. String generation is now probabilistic: production
probabilities are used to non-deterministically select a production for rewriting a given
non-terminal.

5. What is Conjunction ambiguity with example?

Conjunctions are words that join together other words or groups of words. Conjunction
ambiguity therefore arises when we get two different meanings for the same sentence because
of a conjunction. For example:

Native speakers probably know what cheese and tomato sandwiches are, but they don't realise
that the phrase is actually ambiguous (has more than one meaning).
So why do we have an ambiguity when we say cheese and tomato sandwiches? The answer
concerns the conjunction and. Here are the two meanings of this phrase, with brackets showing
which parts belong together in the two interpretations:

• sandwiches filled with cheese and tomato: [[cheese and tomato] sandwiches]
• cheese, plus sandwiches with a tomato filling: [cheese] and [tomato sandwiches]

Long Answers:
1. Explain CKY parsing algorithm with examples.
• One of the earliest recognition and parsing algorithms
• Bottom-up dynamic programming
• Standard version can only recognize CFGs in Chomsky Normal Form (CNF)
• Grammars are restricted to production rules of the form:
A→BC
A→w
• This means that the righthand side of each rule must expand to either two non-terminals or a
single terminal
• Any CFG can be converted to a corresponding CNF grammar that accepts exactly the same set
of strings as the original grammar!
• Three situations we need to address:
1. Production rules that mix terminals and non-terminals on the righthand side
2. Production rules that have a single non-terminal on the righthand side (unit
productions)
3. Production rules that have more than two non-terminals on the righthand side
Situation #1: Introduce a dummy non-terminal that covers only the original terminal
• INF-VP → to VP could be replaced with INF-VP → TO VP and TO → to
Situation #2: Replace the non-terminals with the non-unit production rules to which they
eventually lead
• A → B and B → w could be replaced with A → w
Situation #3: Introduce new non-terminals that spread longer sequences over multiple rules
• A → B C D could be replaced with A → B X1 and X1 → C D
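
For instance, a minimal sketch of the Situation #3 transformation in Python; the
(lhs, rhs-list) rule representation and the X1, X2, ... naming scheme are assumptions of this
sketch:

from itertools import count

_fresh = count(1)   # supplies fresh names X1, X2, ... for new non-terminals

def binarize(lhs, rhs):
    # Spread A -> B C D ... over binary rules A -> B X1, X1 -> C D, ...
    rules = []
    while len(rhs) > 2:
        new_nt = f"X{next(_fresh)}"
        rules.append((lhs, [rhs[0], new_nt]))   # e.g. A -> B X1
        lhs, rhs = new_nt, rhs[1:]              # continue from X1 -> C D ...
    rules.append((lhs, list(rhs)))              # last rule is already binary
    return rules

print(binarize("A", ["B", "C", "D"]))
# [('A', ['B', 'X1']), ('X1', ['C', 'D'])]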

CKY Algorithm
• With the grammar in CNF, each non-terminal node above the POS level of the parse tree will
have exactly two children
• Thus, a two-dimensional matrix can be used to encode the tree structure
• For sentence of length n, work with upper-triangular portion of (n+1) x (n+1) matrix
• Each cell [i, j] contains a set of non-terminals that represent all constituents spanning
positions i through j of the input
• Cell that represents the entire input resides in position [0, n]
• Non-terminal entries: For each constituent [i, j], there is a position, k, where the constituent
can be split into two parts such that i < k < j
• [i, k] must lie to the left of [i, j] somewhere along row i, and [k, j] must lie beneath it along
column j
• To fill in the parse table, we proceed in a bottom-up fashion so when we fill a cell [i, j], the
cells containing the parts that could contribute to this entry have already been filled.
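
As a rough illustration, a minimal CKY recognizer sketch in Python, assuming a CNF grammar
encoded as a dict from right-hand-side tuples to the set of left-hand sides that produce them
(an encoding chosen for this sketch):

from collections import defaultdict

def cky_recognize(words, grammar, start="S"):
    n = len(words)
    table = defaultdict(set)      # table[(i, j)] = non-terminals spanning i..j
    for j in range(1, n + 1):
        # Terminal rules A -> w for the word ending at position j.
        table[(j - 1, j)] |= grammar.get((words[j - 1],), set())
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):             # split point i < k < j
                for B in table[(i, k)]:           # left part along row i
                    for C in table[(k, j)]:       # right part along column j
                        table[(i, j)] |= grammar.get((B, C), set())
    return start in table[(0, n)]                 # cell [0, n] spans the input

# Hypothetical CNF grammar for illustration:
grammar = {
    ("NP", "VP"): {"S"}, ("Det", "Noun"): {"NP"}, ("Verb", "NP"): {"VP"},
    ("the",): {"Det"}, ("dog",): {"Noun"}, ("cat",): {"Noun"},
    ("chased",): {"Verb"},
}
print(cky_recognize("the dog chased the cat".split(), grammar))  # True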
2. Explain the Earley parsing algorithm with an example.
• Top-down dynamic programming approach
• The table is of length n+1, where n is the number of words
• Table entries contain three types of information:
i. A subtree corresponding to a single grammar rule
ii. Information about the progress made in completing the subtree
iii. The position of the subtree with respect to the input
• In Earley parsing, table entries are known as states.
• States include structures called dotted rules
• A • within the righthand side of a state’s grammar rule indicates the progress made towards
recognizing it.
• A state’s position with respect to the input is represented by two numbers, indicating (1)
where the state begins, and (2) where its dot lies.
Earley Algorithm:
• An Earley parser moves through the n+1 sets of states in a chart in order
• At each step, one of three operators is applied to each state depending on its status
i. Predictor
ii. Scanner
iii. Completer
• States can be added to the chart, but are never removed
• The algorithm never backtracks
• The presence of S → α •, [0, n] indicates a successful parse.

Earley Operators: Predictor
• Creates new states
• Applied to any state that has a non-terminal immediately to the right of its dot (as long as the
non-terminal is not a POS category)
• New states are placed into the same chart entry as the generating state
• They begin and end at the same point in the input where the generating state ends

Earley Operators: Scanner
• Used when a state has a POS category to the right of the dot
• Examines input and incorporates a state corresponding to the prediction of a word with a
particular POS into the chart
• VP → • Verb NP, [0,0]
a. Since the category following the dot is a part of speech (Verb), the scanner examines the
next input word
b. Seeing "book", it adds Verb → book •, [0,1] to the chart

Earley Operators: Completer
• Applied to a state when its dot has reached the right end of the rule
• Indicates that the parser has successfully discovered a particular grammatical category over
some span of input
• Finds all previously created states that were searching for this grammatical category, and
creates new states that are copies with their dots advanced past the grammatical category
• NP → Det Nominal •, [1,3]
a. What incomplete states end at position 1 and expect an NP?
b. VP → Verb • NP, [0,1]
c. VP → Verb • NP PP, [0,1]
d. So, add VP → Verb NP •, [0,3] and the new incomplete VP → Verb NP • PP, [0,3]
to the chart.
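
Putting the three operators together, a minimal Earley recognizer sketch in Python, using the
grammar of the example that follows; the (lhs, rhs, dot, origin) state encoding and the
chart-as-list-of-sets representation are assumptions of this sketch:

GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
POS = {"book": {"Noun", "Verb"}, "include": {"Verb"}, "prefer": {"Verb"},
       "that": {"Det"}, "this": {"Det"}, "a": {"Det"}, "the": {"Det"},
       "flight": {"Noun"}, "meal": {"Noun"}, "money": {"Noun"}}
POS_TAGS = {"Det", "Noun", "Verb"}

def earley_recognize(words, start="S"):
    n = len(words)
    # chart[k] holds states (lhs, rhs, dot, origin) whose dot lies at position k.
    chart = [set() for _ in range(n + 1)]
    for rhs in GRAMMAR[start]:
        chart[0].add((start, tuple(rhs), 0, 0))
    for k in range(n + 1):
        agenda = list(chart[k])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs) and rhs[dot] not in POS_TAGS:
                # PREDICTOR: expand the non-terminal to the right of the dot.
                for new_rhs in GRAMMAR.get(rhs[dot], []):
                    new = (rhs[dot], tuple(new_rhs), 0, k)
                    if new not in chart[k]:
                        chart[k].add(new)
                        agenda.append(new)
            elif dot < len(rhs):
                # SCANNER: a POS category follows the dot; check the input word.
                if k < n and rhs[dot] in POS.get(words[k], set()):
                    chart[k + 1].add((rhs[dot], (words[k],), 1, k))
            else:
                # COMPLETER: advance every state waiting for this category.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[k]:
                            chart[k].add(new)
                            agenda.append(new)
    # Success iff some S -> alpha •, [0, n] is in the final chart entry.
    return any(s[0] == start and s[2] == len(s[1]) and s[3] == 0
               for s in chart[n])

print(earley_recognize(["book", "that", "flight"]))  # True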

Example
Book that flight.

Det → that | this | a | the
Noun → book | flight | meal | money
Verb → book | include | prefer
S → NP VP
S → VP
NP → Det Nominal
Nominal → Noun
VP → Verb
VP → Verb NP

The parser sweeps through the chart left to right, the dot marking its progress through the input:

Book • that flight.
Book that • flight.
Book that flight. •

The states that participate in the final parse are the completed states that contribute to
S → VP •, [0,3]; its presence in the final chart entry indicates a successful Earley parse.

3. Explain PCFG algorithm with an example.
• A PCFG is a probabilistic version of a CFG where each production has a probability.
• Probabilities of all productions rewriting a given non-terminal must add to 1, defining a
distribution for each non-terminal.
• String generation is now probabilistic where production probabilities are used to non-
deterministically select a production for rewriting a given non-terminal.

Sentence Probability
• Assume productions for each node are chosen independently.
• Probability of derivation is the product of the probabilities of its productions.
Syntactic Disambiguation
• Resolve ambiguity by picking most probable parse tree.

Sentence Probability
• Probability of a sentence is the sum of the probabilities of all of its derivations.
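
As a small worked example (the grammar and its probabilities below are hypothetical, not taken
from the notes):

# Hypothetical rule probabilities; rules rewriting each non-terminal sum to 1.
P = {
    "S -> NP VP": 0.8, "S -> VP": 0.2,
    "NP -> Det Noun": 1.0,
    "VP -> Verb NP": 0.7, "VP -> Verb": 0.3,
    "Det -> the": 1.0, "Noun -> flight": 1.0, "Verb -> book": 1.0,
}

# One derivation of "book the flight" uses these productions:
derivation = ["S -> VP", "VP -> Verb NP", "Verb -> book",
              "NP -> Det Noun", "Det -> the", "Noun -> flight"]

p_tree = 1.0
for rule in derivation:   # derivation probability = product of rule probabilities
    p_tree *= P[rule]
print(p_tree)             # 0.2 * 0.7 = 0.14 (up to floating-point rounding)

If the sentence had further derivations, its probability would be the sum of all of their
products.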

Three Useful PCFG Tasks
• Observation likelihood: To classify and order sentences.
• Most likely derivation: To determine the most likely parse tree for a sentence.
• Maximum likelihood training: To train a PCFG to fit empirical training data.

PCFG: Most Likely Derivation
There is an analogue to the Viterbi algorithm to efficiently determine the most probable
derivation (parse tree) for a sentence.
4. Explain PCKY algorithm with example.
The PCKY algorithm is the same as CKY parsing, extended with probabilities.
CKY algorithm (summarized from Question 1)
• With the grammar in CNF, each non-terminal node above the POS level of the parse tree has
exactly two children, so a two-dimensional matrix can encode the tree structure
• For a sentence of length n, the parser works with the upper-triangular portion of an
(n+1) x (n+1) matrix; each cell [i, j] contains the set of non-terminals spanning positions i
through j, and the cell representing the entire input resides in position [0, n]
• The table is filled bottom-up, so when a cell [i, j] is filled, the cells containing the parts
that could contribute to it have already been filled

PCKY algorithm
Finding the most likely tree, argmax_τ P(τ, s), is similar to Viterbi for HMMs:
• Initialization: every chart entry that corresponds to a terminal (entries X in cell[i][i]) has a
Viterbi probability P_VIT(X[i][i]) = 1
• Recurrence: for every entry that corresponds to a non-terminal X in cell[i][j], keep only the
highest-scoring pair of back-pointers to a pair of children (Y in cell[i][k] and Z in
cell[k+1][j]):
P_VIT(X[i][j]) = max over Y, Z, k of P_VIT(Y[i][k]) × P_VIT(Z[k+1][j]) × P(X → Y Z | X)
(the back-pointers record the maximizing Y, Z, and k)
• Final step: return the Viterbi parse for the start symbol S in the top cell[1][n]
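
A minimal Viterbi-CKY sketch in Python following these definitions; the (lhs, rhs, prob) rule
encoding and the 0-indexed spans are assumptions of this sketch:

from collections import defaultdict

def pcky(words, rules, start="S"):
    # rules: list of (lhs, rhs, prob) with the grammar in CNF, where rhs is
    # either a 1-tuple (word,) or a 2-tuple (Y, Z).
    n = len(words)
    best = defaultdict(float)   # best[(i, j, X)] = P_VIT(X[i][j])
    back = {}                   # back-pointers: the maximizing split and children
    for j in range(1, n + 1):
        for lhs, rhs, p in rules:                  # lexical rules X -> w
            if rhs == (words[j - 1],) and p > best[(j - 1, j, lhs)]:
                best[(j - 1, j, lhs)] = p
                back[(j - 1, j, lhs)] = words[j - 1]
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):              # split point i < k < j
                for lhs, rhs, p in rules:
                    if len(rhs) != 2:
                        continue
                    q = p * best[(i, k, rhs[0])] * best[(k, j, rhs[1])]
                    if q > best[(i, j, lhs)]:      # keep only the highest score,
                        best[(i, j, lhs)] = q      # as in the recurrence above
                        back[(i, j, lhs)] = (k, rhs[0], rhs[1])
    return best[(0, n, start)], back               # Viterbi parse probability

Probabilities are multiplied directly here for clarity; a real implementation would typically
add log probabilities instead to avoid numerical underflow.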
