Taaltheorie en Taalverwerking: Summary 1

Lecture 1a
Reasons for studying natural language processing (NLP) within AI:
 You want a computer to communicate with users in their terms.
 There is a vast store of information recorded in natural language that can be made accessible via
computers: news, government reports, social media etc.
 Three main types of applications:
o Enabling human-computer communication
o Improving human-human communication
o Doing useful processing of text or speech

Formal (FLs) and Natural (NLs) languages:

 Formal (computer) languages: e.g. Java, Prolog, Python, HTML
 Natural (human) languages: e.g. Dutch, English, Spanish, German, Japanese
 ‘Languages’ that represent the behavior of a machine or a system: e.g. think about
‘communicating’ with a vending machine via coin insertions and button presses: pressButton1

Syntax trees:
 Syntax: Describes the structural properties of the language. Natural language is much more
complicated than the formal languages used for the artificial languages of logics and computer

Natural language semantics:

 Semantics: Represents the meaning of words and sentences. Logic is a good candidate.
 Consider the sentence:
o Every student has access to a computer
 The meaning of this can be expressed as two different logical formulas:
o ∀x.(student(x) ⇒ ∃y.(computer(y) ∧ hasAccessTo(x, y)))
 Every student has access to some computer (possibly a different one for each student)
o ∃y.(computer(y) ∧ ∀x.(student(x) ⇒ hasAccessTo(x, y)))
 There is a single computer to which every student has access
 Problem: How can (either of) these formulas be mechanically generated from a syntax tree for
the original sentence?
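The difference between the two readings can be checked on a small model. The sketch below is only an illustration: the students, computers and access relation are made-up assumptions, and the two functions simply mirror the two formulas above.

```python
# Hypothetical toy model (names are illustrative, not from the lecture).
students = {"ann", "bob"}
computers = {"pc1", "pc2"}
has_access_to = {("ann", "pc1"), ("bob", "pc2")}

def reading_forall_exists(students, computers, access):
    # ∀x.(student(x) ⇒ ∃y.(computer(y) ∧ hasAccessTo(x, y)))
    return all(any((x, y) in access for y in computers) for x in students)

def reading_exists_forall(students, computers, access):
    # ∃y.(computer(y) ∧ ∀x.(student(x) ⇒ hasAccessTo(x, y)))
    return any(all((x, y) in access for x in students) for y in computers)

print(reading_forall_exists(students, computers, has_access_to))  # True
print(reading_exists_forall(students, computers, has_access_to))  # False
```

In this model every student has a computer, but no single computer is shared by all, so only the first reading is true.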

FLs and NLs:

 There are close relationships between FLs and NLs, but also important differences:
o FLs can be pinned down by a precise definition.
o NLs are fluid, fuzzy at the edges and constantly evolving.
o NLs are riddled with ambiguity at all levels.
 This is normally avoidable in FLs.

Ambiguity in Natural Language:

 Phonological ambiguity: ‘an ice lolly’ vs. ‘a nice lolly’
 Lexical ambiguity: ‘bass’ has at least two meanings: fish and musical instrument
 Syntactic ambiguity: two possible syntax trees for ‘complaints about referees multiplying’
o Could mean:
 There are more and more complaints about referees
 Referees are multiplying and there are complaints about that
 Semantic ambiguity: ‘Please use all available doors while boarding the train’ vs. ‘Please fill all
sections in the questionnaire’
o First sentence: Doesn’t mean that you personally have to use every single door
o Second sentence: Does mean that you have to fill in every single section

Levels of Language Complexity:

 Some language features are ‘more complex’ (harder to describe, harder to process) than
others. We can classify languages on a scale of complexity known as the Chomsky Hierarchy:
o Regular languages: Those whose phrases can be ‘recognized’ by a finite state machine
o Context-free languages: Many aspects of NLs can be described at this level (also used
for most programming languages)
o Context-sensitive languages: Some NLs involve features at this level of complexity
o Recursively enumerable languages: All languages that can in principle be defined via
mechanical rules.

Formal Languages:
 A formal language is a set of strings
 Each string is composed of symbols from a set called an alphabet (or a vocabulary)
o Examples of alphabets:
 English letters: Σ = {a, b, c . . . z}
 Decimal digits: Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
 Programming language ‘tokens’: Σ = {if, while, x, ==}
 ‘Primitive’ actions performed by a machine or system, e.g. a vending machine
Σ = {insert50c, pressButton1, ...}
 Words in (some fragment of) a natural language: Σ = {the, an, a, dog, cat,
o Examples of strings over alphabets:
 Let Σ1 = {0, 1} be an alphabet. Then 01101, 000001, 1101 are strings over Σ1.
— in fact, all binary numbers are strings over Σ1
 Let Σ2 = {a, b, c, d, e, f, g} be an alphabet. Then bee, dad, cabbage, and face
are strings over Σ2, as are fffff and agagag.
 Let Σ3 = {ba, ca, fa, ce, fe, ge} be an alphabet. Then face is a string over Σ3 but
bee, dad or cabbage are not.
 Let Σ4 = {♠, ©, ♣} be an alphabet. Then ♠♠ and ♣©♣ are strings over Σ4.
 The length of a string is the number of token symbols from the alphabet it contains
o Examples:
 The length of ‘face’ over Σ2 = {a, b, c, d, e, f, g} is 4
 The length of ‘face’ over Σ3 = {ba, ca, fa, ce ,fe, ge} is 2
 The string of length 0 is called the empty string, denoted by ε (epsilon).
 Given a string s, a substring of s is a string formed by taking contiguous symbols of s in the
order in which they occur in s.
 An initial substring is called prefix and a final substring a suffix.

o Let unthinkable be a string over Σ = {a, b, c . . . x, y, z}. Then ε, un, unth, and unthinkable
are prefixes, while ε, e, able, thinkable, and unthinkable are suffixes. Other substrings
include nthi, inka, bl.
 Σ* denotes the set of all strings over an alphabet Σ
o Σ* is always infinite, regardless of the number of symbols Σ contains (as long as Σ is not empty).
 We may now define a formal language L over an alphabet Σ as any subset of Σ*: L ⊆ Σ*
o Examples: let Σ = {a, b, c … x, y, z}. Then Σ* is the set of strings over the Latin alphabet
and the following subsets of Σ* are possible formal languages:
 The set of strings consisting of consonants (medeklinker) only
 The set of strings containing at least one vowel (klinker) and one consonant
 The set of strings whose length is less than 9 symbols
 The set {one, two, three, four, five, six, seven, eight, nine, ten}
 The set of all English words
 The empty set
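The definitions of string length, prefix, suffix and substring can be tried out with a short sketch. It assumes that ordinary Python strings stand in for strings over a single-letter alphabet, and the helper names (tokenize, prefixes, suffixes, substrings) are invented for illustration.

```python
def tokenize(s, alphabet):
    """Split s into symbols from `alphabet`; return None if s is not a string over it."""
    if s == "":
        return []
    for symbol in alphabet:
        if symbol and s.startswith(symbol):
            rest = tokenize(s[len(symbol):], alphabet)
            if rest is not None:
                return [symbol] + rest
    return None

def prefixes(s):
    return [s[:i] for i in range(len(s) + 1)]

def suffixes(s):
    return [s[i:] for i in range(len(s) + 1)]

def substrings(s):
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}

SIGMA2 = set("abcdefg")
SIGMA3 = {"ba", "ca", "fa", "ce", "fe", "ge"}

print(len(tokenize("face", SIGMA2)))   # 4
print(len(tokenize("face", SIGMA3)))   # 2
print(tokenize("cabbage", SIGMA3))     # None
```

The same word has a different length depending on the alphabet, because length counts symbols of the alphabet rather than letters.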

Ways to define a Formal Language:

 Given an alphabet Σ and the infinite set Σ* of strings it gives rise to, how can we
select a particular formal language?
o Direct mathematical definition: Σ = {a, b, c}, L1 = {aa, bb, abc, abcc}, L2 = {a^n b^n | n > 0}
o Formalisms (formal expressions and grammars): sets of rules
o Automata: computational devices for computing languages
 Specify some machine for testing whether a string is legal or not
 Formalisms and automata allow us to distinguish a formal language of interest (a set of
strings) from other possible languages over a given alphabet
o They capture the patterns that characterize a language
o As such, they act as a definition of the language they capture
 From an abstract point of view, a natural language, like Dutch, is a set of strings
(sounds/letters/words etc.)
 Therefore, formalisms and automata can help us to model aspects of natural languages

Regular expressions:
 Regular expressions are a formal notation for characterizing sets of strings that follow a fairly
simple regular pattern.
 We can construct regular expressions over an alphabet Σ as follows:
(these regular expressions are written in mathematical notation)

o We often ignore the dot in concatenation, and simply write ab

o Disjunction (or union) may be written as a|b, a+b, or a∪b
o a+ is the set of a-strings with at least one a (same as a*a or aa*)
o a^n can be used to abbreviate the concatenation of a with itself n times
o the notation Σ* can be seen as abbreviating (a|b|…)* for any symbol a, b, … in Σ
 Examples: Let Σ = {a, b, c … x, y, z}
o me(o)*w: mew, meow, meooow, meooooooooooow etc.
o ba(a)+: baa, baaaaa, baaaaaaa, baaaaaaaaaaaa etc.
o (ab)*c: c, abc, ababc, ababababababababc etc.
o (a*b*)c: c, ac, bc, aaaaaabbbc, abbbbbbc etc.
 Another notation is Perl notation

 A regular expression allows us to characterize a formal language declaratively

 We can characterize the same language procedurally by means of an automaton that specifies
how the language is computed.
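The example expressions can be tested with Python's re module, whose syntax follows the Perl-style notation. This is a minimal sketch; the helper name in_language is an assumption, and re.fullmatch is used because a string belongs to the language only if the whole string matches.

```python
import re

def in_language(pattern, s):
    """True if the entire string s matches the regular expression."""
    return re.fullmatch(pattern, s) is not None

examples = {
    "me(o)*w": ["mew", "meow", "meooow"],
    "ba(a)+": ["baa", "baaaaa"],
    "(ab)*c": ["c", "abc", "ababc"],
}

for pattern, strings in examples.items():
    for s in strings:
        print(pattern, s, in_language(pattern, s))

print(in_language("ba(a)+", "ba"))    # False: at least two a's are needed
print(in_language("(ab)*c", "abab"))  # False: the final c is missing
```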

Which of the following strings are legal?

 ε yes
 11 yes
 1010 yes
 1101 no

Finite State Automata (FSA): Formal definition:

 We can formally specify an FSA by 5 parameters:
o Σ: an input alphabet
o Q: a finite set of states
o q0 ∈ Q: the start state
o F: a set of final or accepting states (F ⊆ Q)
o δ: a transition function between states that maps pairs of states and input symbols
<q,i> to a new state q’ (δ: Q × Σ → Q)

 Regular expressions and FSAs capture the same class of languages: any language that is the
denotation of some regular expression can be computed by an FSA and vice-versa
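The five parameters map directly onto a small data structure. The sketch below assumes a hypothetical automaton over {0, 1} that accepts strings with an even number of 1s; the automaton drawn in the lecture is not reproduced in these notes, so this choice is only an assumption (it does agree with the four yes/no answers listed above).

```python
SIGMA = {"0", "1"}                 # input alphabet
Q = {"even", "odd"}                # finite set of states
Q0 = "even"                        # start state
F = {"even"}                       # accepting states
DELTA = {                          # transition function Q × Σ → Q
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

def accepts(string, delta=DELTA, start=Q0, final=F):
    """Run the deterministic automaton over the string, one symbol at a time."""
    state = start
    for symbol in string:
        if (state, symbol) not in delta:
            return False           # no transition defined: reject
        state = delta[(state, symbol)]
    return state in final

for s in ["", "11", "1010", "1101"]:
    print(repr(s), accepts(s))
```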

From Regular Expression to FSA:

 Strategy:
o Base case: build an automaton for simple expressions
o Inductive step: show how to reproduce each of the operations on regular expressions
with an automaton.
 a* always has a loop above the state:

Lecture 1b
Finite State Automata: some remarks:
 There is only 1 start state
 But there can be several final states
 The start state and final state may be the same
 Every time we traverse the FSA from the start state to one of the final states, we have a string
of the formal language (a legal string)
 In case a final state has an outgoing arrow we may continue as long as we are able to reach a
final state later on.

 FSAs can be used to generate strings:

o Each valid computation (from the start state to some final state) generates one of the
possible strings of a language
 They can also be used to recognize strings:
o Does a string belong to the language defined by the automaton?
o If the automaton consumes all input symbols in a string and reaches a final state, the
string is accepted; else, it is rejected.

Non-deterministic FSAs (NFSA)

 Recognizer: If we can transition to more than one state without being in conflict with the input
string, the FSA is non-deterministic.
 Sources of non-determinism:
o States with more than one outgoing arrow with the same label (leading to different
states)
o States with more than one outgoing arrow where some arrow is an ε-transition

 For every NFSA, there is an equivalent DFSA. In the sense that both accept exactly the same
set of strings (same language)
 Simplifying somewhat, to build a DFSA from a NFSA we need to create a deterministic path for
each possible path in the NFSA.
 The recognition behavior of a DFSA is fully determined by the state it is in and the symbol it is
reading.
 NFSAs are not more powerful than DFSAs, but they may be simpler.
 An NFSA accepts a string if there is at least one path (one way of consuming the input string)
that leads to an accepting state
o We don’t stop at the first ‘fail’
 String recognition with an NFSA can thus be seen as a search problem: the problem of finding
an accepting path
 The search space may be explored with two different strategies: depth-first and breadth-first.
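Recognition with an NFSA can be sketched as a breadth-first search that keeps track of every state the automaton could be in after each symbol. The example automaton (strings over {a, b} ending in ab) and the convention of writing ε-transitions with the empty string as label are assumptions made for this illustration.

```python
def epsilon_closure(states, delta):
    """All states reachable from `states` using only ε-transitions (label "")."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def nfsa_accepts(string, delta, start, final):
    """Breadth-first: follow all possible paths in parallel."""
    current = epsilon_closure({start}, delta)
    for symbol in string:
        moved = set()
        for q in current:
            moved |= delta.get((q, symbol), set())
        current = epsilon_closure(moved, delta)
    return bool(current & final)   # accept if at least one path ends in a final state

# Hypothetical NFSA: q0 has two outgoing arrows labelled a.
DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}

print(nfsa_accepts("aab", DELTA, "q0", {"q2"}))  # True
print(nfsa_accepts("aba", DELTA, "q0", {"q2"}))  # False
```

A depth-first version would instead follow one path at a time and backtrack on failure; both accept exactly the same strings.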

The lexicon:
 A lexicon is a repository of words (= a dictionary)
o What sort of information about words should a lexicon contain?
o How can it be represented efficiently?
 Words are made up of morphemes, the smallest meaningful units in a language

 Dictionaries typically don’t include all the inflected forms of a word

 We can use an FSM to define the inflectional paradigm of a word:

 An efficient way to define morphological lexicons consists in specifying FSAs that apply to sub-
categories of words.

 This accounts for the productivity of morphological processes (if we learn a new verb, we can
guess its inflectional paradigm)

 A morphological lexicon lists the lemmas and other morphemes in a language (and their
categories) and specifies the allowable morpheme sequences that make up the words.
 Irregularities:
o Not all morphological processes consist in concatenating morphemes
 foot → feet
 make → makes, making, made
o To account for irregular forms, we require transformations:
 f → f
 o → e
 o → e
 t → t
o We can define such transformations with a variant of FSAs that maps between two
sets of symbols: finite state transducers (FSTs)

Finite State Transducers:

 Σ: an input alphabet
 Δ: an output alphabet
 Q: a finite set of states
 q0 ∈ Q: the start state
 F: a set of final or accepting states (F ⊆ Q)
 δ: a transition function that maps pairs of states and symbols i ∈ Σ to pairs of states and
symbols j ∈ Δ (δ: Q × Σ → Q × Δ)

 Lemmatization is the process of reducing the forms in an inflexional paradigm of a word to
their underlying common lemma:
o sing, sings, singing, sang, sung → sing
o walk, walks, walking, walked → walk
 Why is it important?
o In information retrieval applications like web search, when we search for keywords we
want to find documents that contain any inflectional variants
 We can use FSTs for both regular and irregular forms
 Types of transformations:
o x:x identity (no change)
o x:y substitution
o x:ε deletion
o ε:x insertion

 Stemming has the same goal as lemmatization, but does the reduction with less knowledge,
using heuristic rules:
o Does not rely on morphological lexicon (lemmas are not known)

o Tries to leave only the ‘stem’ (an approximation of a lemma) by stripping off the
endings of the words
 -ational → -ate (e.g. relational → relate)
 -ing → ε (e.g. motoring → motor)
 The stemming rules can also be implemented as FSTs
 Stemming can be useful, but can easily lead to mistakes
o national → nate
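Both ideas can be sketched in a few lines. The first part is a hypothetical deterministic transducer for foot → feet, using x:y pairs on the transitions; it supports identity, substitution and deletion (empty output) but not ε:x insertion. The second part applies the two stemming rules above as plain suffix rewriting, which is a simplification and not an FST implementation.

```python
# (state, input symbol) -> (next state, output symbol)
FOOT_FST = {
    (0, "f"): (1, "f"),   # f:f identity
    (1, "o"): (2, "e"),   # o:e substitution
    (2, "o"): (3, "e"),   # o:e substitution
    (3, "t"): (4, "t"),   # t:t identity
}
FOOT_FINAL = {4}

def transduce(word, delta=FOOT_FST, start=0, final=FOOT_FINAL):
    """Return the output string, or None if the input is not accepted."""
    state, output = start, []
    for symbol in word:
        if (state, symbol) not in delta:
            return None
        state, out = delta[(state, symbol)]
        output.append(out)
    return "".join(output) if state in final else None

# Heuristic stemming rules from the notes: (suffix, replacement)
STEM_RULES = [("ational", "ate"), ("ing", "")]

def stem(word, rules=STEM_RULES):
    """Apply the first rule whose suffix matches; no lexicon is consulted."""
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word

print(transduce("foot"))    # feet
print(stem("relational"))   # relate
print(stem("national"))     # nate
```

The last line shows the kind of mistake a rule-based stemmer makes when it has no lexicon to check against.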

Morphological Parsing:
 To construct well-formed sentences, we need to pay attention to morphological features (e.g.
agreement between subject and verb)
o His foot is broken
o His feet are broken
 We would like to map the surface form (the words as they appear in a text) to more
informative representations
 Parsing: producing some sort of linguistic structure for an input expression

Lecture 2a
Syntax: from words to sentences:
 Syntax deals with the structural properties of sentences
o Not all sequences of words count as sentences of a language

o Speakers have intuitions about what well-formed sentences of a language are, even if
they don’t know what a sentence means

Word classes:
 Nouns, verbs, pronouns, prepositions, adverbs etc.
 Three criteria for classifying words:
o Distributional criteria: Where can the word occur?
o Morphological criteria: What form does the word have? What affixes can it take?
o Notional (or semantic) criteria: What sort of concept does the word refer to?
 Open classes: are typically large, have fluid membership
o Four major word classes are widely found in languages worldwide: nouns, verbs,
adjectives, adverbs
 Closed classes: are typically small, have relatively fixed membership
o E.g. determiners (a, the), prepositions (English, Dutch), postpositions (Korean, Hindi)
o Closed-class words (e.g. of, which, could) often play a structural role in the grammar
as function words.
 Nouns (zelfstandig naamwoord):
o Notionally: nouns refer to things; living things (cat), places (Amsterdam), nonliving
things (ship), or concepts (marriage)
o Formally: -ness, -tion, -ity, -ance tend to indicate nouns (happiness, preparation,
activity, significance)

o Distributionally: we can examine the contexts in which nouns occur. For example,
nouns can appear with possession: ‘his car’, ‘her idea’ etc.
 Verbs (werkwoord):
o Notionally: verbs refer to actions (write, think, observe)
o Formally: words that end in -ate or -ize tend to be verbs; words ending in -ing are
often the present participle of a verb (automate, modernize, sleeping)
o Distributionally: we can examine the contexts where a verb appears. Different types of
verbs have different distributional properties. For example, base form verbs can
appear as infinitives: ‘to jump’, ‘to learn’.
 Adjectives (bijvoeglijk naamwoord):
o Notionally: adjectives convey properties of or opinions about things that are nouns
(small, sensible, excellent)
o Formally: words that end in -al, -ble and -ous tend to be adjectives (formal, sensible,
o Distributionally: adjectives appear before a noun or after a form of be. E.g. ‘the big
building’, ‘John is tall’
 Adverbs (bijwoord):
o Notionally: adverbs convey properties of actions or events (quickly, often, possibly) or
adjectives (really)
o Formally: words that end in -ly tend to be adverbs
o Distributionally: adverbs can appear next to a verb, or an adjective, or at the start of a
sentence
 Importance of criteria:
o Often in reading, we come across unknown words
o Even if we don’t know its meaning, formal and distributional criteria help people (and
machines) recognize which (open) class an unknown word belongs to

Constituency: basic idea

 In sentences, some groups of words fit together and act as units.
o Examples:
 The waiter brought the meal to the table

 The waiter brought the meal of the day

 We call these units constituents or phrases

 They are organized around a headword, and are named after it:
o Noun Phrase (NP): the head is a noun, a proper name, or a pronoun
o Prepositional Phrase (PP): the head is a preposition followed by an NP
o Verb Phrase (VP): the head is a verb and it includes its complements
o Adjective Phrase (AP): the head is an adjective
 The words that belong to a phrase type behave in similar ways:
o Externally: similar distribution within a sentence

 Test for constituency:

o Most basic test: substitution
 The little boy fed the cat → He fed her
o It-cleft test
 It is the meal that the waiter brought to the table
 It is the meal of the day that the waiter brought to the table
o It is not always easy or obvious to decide what constitutes a phrase.

Tree structures:
 We can represent the phrase structure of a sentence by means of a syntactic tree:

 Each node in the tree represents a constituent

 The leaves of the tree correspond to the words in the sentence
 The symbol S denotes the whole sentence and corresponds to the top node

Formal Grammars:
 The tree structures we have seen can be modelled with phrase structure rules:

 A collection of phrase structure rules of this sort constitutes a formal grammar for a particular
language.
 Grammars (like regular expressions, FSAs, FSTs) are a formal device to specify languages
 But they are more powerful because they can be used to define languages that cannot be
captured by FSAs.
 Formally, a grammar can be specified by 4 parameters:
o Σ: a finite set of terminal symbols
o N: a finite set of non-terminal symbols
o S: a special symbol S ∈ N called the start symbol
o R: a finite set of rules or productions containing:
 A sequence of terminal or non-terminal symbols
 The symbol →
 Another sequence of terminal or non-terminal symbols
 Rules have the form: α → β, where α, β ∈ (N ∪ Σ)*
 Context-Free Grammar (CFG):
o Σ: a finite set of terminal symbols
o N: a finite set of non-terminal symbols
o S: a special symbol S ∈ N called the start symbol
o R: a set of rules or productions of the form:
 X → α, where X ∈ N, α ∈ (N ∪ Σ)*

o Conventions:
 Non-terminal symbols are represented with uppercase letters
 Terminal symbols are represented with lowercase letters
 The left-hand side symbol of the first rule is the start symbol
 Terminal symbols: we take these to be words
 Non-terminal symbols:
o Phrase types (NP, VP, PP etc.)
o Word classes (Parts-Of-Speech): interface between words and phrases (N, V, P etc.)
 Start symbol: S stands for ‘sentence’
 Rules:
o Phrase structure rules: rules about phrases and their internal structure
o Lexicon: rules with a POS leading to a word

 A grammar can be used to generate or recognize sentences (sequences of terminals)

o Generation: the language generated by a grammar is the set of sentences that can be
derived by the application of grammar rules.
o Recognition: if a sentence can be derived from the grammar rules, then it is
syntactically well-formed, i.e. part of the language specified by the grammar or
 A grammar can also be used to assign structure to a sentence
o The computational process of assigning a syntactic structure to a sentence is called
parsing.
 A grammar is structurally ambiguous if there is a string in its language that can be given two or
more syntax trees.

 Recursion in a grammar makes it possible to generate an infinite number of sentences

o Direct recursion: a non-terminal on the LHS of the rule also appears on its RHS:
 VP → VP conj VP
o Indirect recursion: some non-terminal can be expanded (via several steps) to a
sequence of symbols containing that non-terminal:
 NP → Det N PP
 PP → Prep NP

 CF grammars are called context-free because a rule X → α says that X can always be expanded
to α, no matter where the X occurs.
 We need some sort of matching between formal features of the constituents in a sentence.
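Generation with a context-free grammar can be sketched as repeatedly rewriting non-terminals until only words remain. The toy grammar and lexicon below are assumptions for illustration; they include the direct recursion on VP and the indirect recursion through NP and PP mentioned above. The depth limit is a practical device to keep random generation finite and is not part of the formal definition.

```python
import random

# Hypothetical toy grammar: non-terminal -> list of right-hand sides.
# The first alternative of each non-terminal is non-recursive.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"]],
    "VP": [["V", "NP"], ["VP", "Conj", "VP"]],
    "PP": [["Prep", "NP"]],
    "Det": [["the"]],
    "N": [["waiter"], ["meal"], ["table"]],
    "V": [["brought"], ["saw"]],
    "Prep": [["to"], ["of"]],
    "Conj": [["and"]],
}

def generate(symbol="S", rng=random, depth=4):
    """Expand `symbol` top-down; once depth runs out, always take the first rule."""
    if symbol not in GRAMMAR:          # terminal symbol: a word
        return [symbol]
    options = GRAMMAR[symbol]
    rule = rng.choice(options) if depth > 0 else options[0]
    words = []
    for child in rule:
        words.extend(generate(child, rng, depth - 1))
    return words

print(" ".join(generate(depth=0)))     # the waiter brought the waiter
print(" ".join(generate()))            # a random sentence of the language
```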
Feature structures:
 Replace atomic categories (NP-1p-sg) with feature structures

o A feature structure is a list of features (=attributes) and their values

o Also called attribute-value matrices
o Usually values are typed
 A complex value is a feature structure itself

 Two feature structures A and B unify (written A ∪ B) if they can be merged into one consistent
feature structure C; otherwise unification fails.
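Unification can be sketched with nested dictionaries standing in for feature structures. This is a simplified illustration under the assumption that values are either atomic or themselves feature structures; it does not handle typed values or shared (re-entrant) substructures, and the feature names are made up.

```python
def unify(a, b):
    """Merge two feature structures; return None if they conflict."""
    result = dict(a)
    for feature, b_value in b.items():
        if feature not in result:
            result[feature] = b_value
            continue
        a_value = result[feature]
        if isinstance(a_value, dict) and isinstance(b_value, dict):
            merged = unify(a_value, b_value)     # complex values: unify recursively
            if merged is None:
                return None
            result[feature] = merged
        elif a_value != b_value:
            return None                          # atomic values clash: failure
    return result

np_sg = {"cat": "NP", "agr": {"num": "sg"}}
third = {"agr": {"per": 3}}
plural = {"agr": {"num": "pl"}}

print(unify(np_sg, third))    # {'cat': 'NP', 'agr': {'num': 'sg', 'per': 3}}
print(unify(np_sg, plural))   # None
```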

Grammars in Prolog:
 Definite Clause Grammars (DCGs)

 DCGs allow us to enhance a grammar with features by adding extra arguments to the DCG
rules and exploiting Prolog’s matching to enforce agreement

Lecture 2b
 The language defined by a grammar is the set of strings composed of terminal symbols that can
be derived from the grammar’s rules.
 Each sequence of rules that produces a string of the language is called a derivation.
 A string s is ambiguous with respect to a grammar G, if there is more than one possible
derivation that allows G to recognize or generate s.

o The first rule to be applied must begin with the start symbol
o To apply a rule, we ‘rewrite’ the left-hand-side symbol with the right-hand-side sequence
o The derivation finishes when we end up with only terminal symbols
o The resulting string of terminal symbols is a string in the language defined by the
grammar
 Can we find regular expressions that are equivalent to the grammars?

Right-Linear Grammars:

 For any regular expression or FSA, we can design an equivalent grammar with the following
properties:
o Terminals are the input symbols
o Non-terminals are the states
o For every transition from state X to state Y on input symbol a, we have a production X → a Y
o For every accepting state X, we have a production X → ε
 This kind of grammar is called a right-linear or regular grammar
 Right-linear grammars are a subset of all possible grammars, and regular languages are a
subset of all possible formal languages
 The languages definable by regular grammars are precisely the regular languages
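The construction of a right-linear grammar from an FSA is mechanical and can be sketched as follows. The two-state automaton used here (state names E and O, accepting strings with an even number of 1s) is a hypothetical example, as is the function name.

```python
def fsa_to_right_linear(delta, final):
    """Return productions as (left-hand side, right-hand side list) pairs."""
    rules = []
    for (state, symbol), target in sorted(delta.items()):
        rules.append((state, [symbol, target]))   # transition X -a-> Y gives X → a Y
    for state in sorted(final):
        rules.append((state, []))                 # accepting state X gives X → ε
    return rules

# Hypothetical FSA: states E (start, accepting) and O
DELTA = {("E", "0"): "E", ("E", "1"): "O", ("O", "0"): "O", ("O", "1"): "E"}
FINAL = {"E"}

for lhs, rhs in fsa_to_right_linear(DELTA, FINAL):
    print(lhs, "→", " ".join(rhs) if rhs else "ε")
```

Every production has at most one non-terminal, and it is the rightmost symbol, which is what makes the grammar right-linear.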

Are all formal languages regular?

 Can we use regular expressions/FSAs/right-linear grammars to define any formal language?
For finite languages: yes.
 How about an infinite language L?
o To be regular, L must be recognized by some FSA, which by definition has a finite
number of states
o Since L is infinite, it includes strings that are longer than the number of states in the
FSA
o Therefore the FSA must contain a loop
o Let x be the substring up to the loop, y the substring on the loop, and z be the
substring following the loop ending on a final state

o Regardless of how many times we traverse the loop, the resulting string will be part of
the language (all strings x y^n z for n ≥ 0)
o The Pumping Lemma:
 All these observations are summarized in the Pumping Lemma, so called
because substring y is said to be ‘pumped’
 Pumping Lemma: Let L be an infinite regular language. Then the following
‘pumping’ condition applies:
 There is a string xyz ∈ L such that y ≠ ε and every string x y^n z with
n ≥ 0 also belongs to L.
 If the pumping condition does not hold, then L is not regular
 The Pumping Lemma can only be used to prove that a language is not regular:
if we can show that it is not possible to find a string in L for which the
pumping condition holds, then the language is not regular.
 Showing that L satisfies the Pumping Lemma doesn’t prove that L is regular.
 Consider L = {a^n b^n | n ≥ 0}

 We can use the Pumping Lemma to show that L is not regular

 If the pumping condition holds, then:
o Some string xyz with y ≠ ε is part of L
o Since y must be non-empty, y can only be either
 Some number of a’s
 Some number of b’s
 Some a’s followed by some b’s
o If we pump y in any of the three cases above, we obtain a
string that does not belong to L
 Therefore the pumping condition does not hold and we can conclude
that L is not regular
 But we can specify it with a grammar:
o S → ε
o S → aSb
o The easiest way to show that a language is regular, is to give a regular expression, FSA,
or right-linear grammar for it.
o Languages that require an ability to count are not regular languages
o Every regular expression or FSA has an equivalent formal grammar, but the reverse
does not hold:
 Not all formal grammars are equivalent to a regex/FSA, only right-linear
grammars are.
o When a formalism or grammar can define a formal language that another one cannot
define, we say that it has greater generative power or greater complexity
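The pumping argument for L = {a^n b^n | n ≥ 0} can be illustrated by brute force: for a given string in L, try every split into x, y, z with non-empty y and check whether pumping y stays inside L. The sketch below only tests a few strings and a few values of n, so it illustrates the argument rather than proving it; the function names are assumptions.

```python
def in_L(s):
    """Membership in L = {a^n b^n | n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

def pumpable_splits(s, max_n=3):
    """Splits s = x y z (y non-empty) such that x y^n z is in L for all n in 0..max_n."""
    good = []
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            x, y, z = s[:i], s[i:j], s[j:]
            if all(in_L(x + y * n + z) for n in range(max_n + 1)):
                good.append((x, y, z))
    return good

for s in ["ab", "aabb", "aaabbb"]:
    print(s, pumpable_splits(s))   # no split survives pumping: []
```

Whichever y is chosen, repeating it either unbalances the number of a's and b's or puts a b before an a, which matches the three cases listed above.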

The Chomsky Hierarchy:

 Chomsky proposed to classify formal grammars into 4 types, which differ in their generative
capacity, in the complexity of the languages they are able to generate or recognize.
 The different types of grammars within the hierarchy differ with respect to the types of
grammar rules they require:

 This classification acts as a subsumption hierarchy: the set of languages described by
grammars of greater power subsumes the set of languages described by grammars of less
power.
o Balance between expressibility and computational efficiency: grammars with less
expressive power are computationally more tractable.

Grammars for Natural Languages:
 Let’s assume that the following sentences are grammatical and that English allows an
indefinite number of these structures, called center-embeddings

o They can be hard to understand, but they are structurally possible

o They become hard to process due to the number of embeddings (connected with
memory), but the structure is the same throughout
o There is no natural way to write a grammar that allows at most n embeddings.

Is Natural Language Syntax Context-Free?

 Many languages, including English, seem to be context-free
 But some languages, like Swiss German, have been proven not to be
 There is another version of the Pumping Lemma that can be used to show that a language is
not context-free
o Suffice to say that this lemma would allow us to show that the following language is
not context-free: a^n b^m c^n d^m
o The key feature that makes this language non context-free are the cross-serial
dependencies it exhibits

Lecture 3a
Syntactic Ambiguity
 A sentence is structurally or syntactically ambiguous with respect to a grammar, if the
grammar can assign more than one parse tree.
 Although the most plausible meaning of the sentence is compatible with only one structure,
the grammar can assign it two structures:

(Figure: two parse trees for the same sentence, the first labelled ‘wrong’ and the second ‘right’.)
 Sometimes, more than one syntactic structure (and their respective associated interpretation)
make sense:
o The tourist saw the astronomer with the telescope
 The astronomer was holding the telescope and the tourist saw him
 The tourist saw the astronomer while looking through the telescope
 We can account for some of the different meanings of a sentence by assigning more than one
possible internal structure to it.
 Main types of syntactic ambiguity:
o Attachment ambiguity: one constituent can appear in more than one location in the
parse tree (can be ‘attached’ to more than one phrase):
 The tourist saw the astronomer with the telescope
 I shot an elephant in my pajamas
 The waiter brought the meal (of the day/to the table)
 We saw (the Eiffel Tower / the plane) flying to Paris
o Coordination ambiguity: uncertainty about the arguments of a coordinating
conjunction such as and or or:
 Secure hardware and software
 Secure [hardware] and [software]
 [secure hardware] and [software]
 A house with a balcony or a garage
 A house with [a balcony] or [a garage]
 [a house with a balcony] or [a garage]
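Attachment ambiguity can be made concrete by counting how many parse trees a grammar assigns to a sentence. The sketch below uses a small hypothetical grammar in which every rule is either binary or lexical, and a chart that stores the number of trees for each span; this chart-based counting is one possible technique, chosen here for brevity.

```python
# Hypothetical grammar: (B, C) -> parents A with a rule A → B C
BINARY = {
    ("NP", "VP"): ["S"],
    ("Det", "N"): ["NP"],
    ("NP", "PP"): ["NP"],    # PP attached to a noun phrase
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],    # PP attached to a verb phrase
    ("P", "NP"): ["PP"],
}
LEXICON = {
    "the": ["Det"], "tourist": ["N"], "astronomer": ["N"],
    "telescope": ["N"], "saw": ["V"], "with": ["P"],
}

def count_parses(words, start="S"):
    """Number of distinct parse trees for `words` rooted in `start`."""
    n = len(words)
    chart = {}                                   # (i, j, category) -> number of trees
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[(i, i + 1, tag)] = chart.get((i, i + 1, tag), 0) + 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):            # split point
                for (b, c), parents in BINARY.items():
                    left = chart.get((i, k, b), 0)
                    right = chart.get((k, j, c), 0)
                    if left and right:
                        for a in parents:
                            chart[(i, j, a)] = chart.get((i, j, a), 0) + left * right
    return chart.get((0, n, start), 0)

sentence = "the tourist saw the astronomer with the telescope".split()
print(count_parses(sentence))   # 2
```

The two trees correspond to attaching ‘with the telescope’ to the noun phrase or to the verb phrase.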

Probabilistic Grammars:
 Ambiguity is pervasive in natural language
 Probabilistic grammars offer a way to resolve structural ambiguities
o Main idea: given a sentence, assign a probability to each possible tree and choose the
most probable one
o Compute the probability of a parse tree
o Compute the probability of a grammar rule
 Probabilistic CFGs: where each rule is augmented with a probability:
o Σ: a finite alphabet of terminal symbols
o N: a finite set of non-terminal symbols
o S: a special symbol S ∈ N called the start symbol
o R: a finite set of rules each of the form A → β [p], where
 A is a non-terminal symbol
 β is any sequence of terminal or non-terminal symbols, including ε
 p is a number between 0 and 1 expressing the probability that A will be
expanded to the sequence β, which we can write as P(A → β)
 For any non-terminal A, the sum of the probabilities for all rules A → β must
be 1: ∑_β P(A → β) = 1
 P(A → β) is a conditional probability P(β | A): the probability of observing a β once we have
observed an A.
 Example:

 These probabilities can provide a criterion for disambiguation: they give us a ranking over
possible parses for any sentence. We can simply choose the parse tree with the highest
probability.
 We can compute the probabilities of the grammar rules with a treebank. The trees in the
treebank are considered the correct trees, the Gold Standard trees.
 For each non-terminal A, we count the number of times each rule A → β occurs
in the treebank
 Divide that by the total number of rules that expand A
(the total number of occurrences of A on the LHS in the
treebank)
 For example: if the rule VP → V NP is seen 105 times
in our corpus, and the non-terminal VP is seen as
LHS 1000 times, then P(VP → V NP) = 105/1000 = 0.105
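The relative-frequency estimate can be sketched in a few lines. The count of 105 for VP → V NP and the total of 1000 VP expansions come from the example above; the other two VP rules and their counts are made up so that the total adds up. The probability of a tree is taken to be the product of the probabilities of the rules used in it.

```python
from collections import Counter
from math import prod

def estimate_probabilities(rule_counts):
    """P(A → β) = count(A → β) / count(A as left-hand side)."""
    lhs_totals = Counter()
    for (lhs, _rhs), count in rule_counts.items():
        lhs_totals[lhs] += count
    return {rule: count / lhs_totals[rule[0]] for rule, count in rule_counts.items()}

def tree_probability(rules_used, probabilities):
    """Multiply the probabilities of all rule applications in a tree."""
    return prod(probabilities[rule] for rule in rules_used)

# Counts as they might be read off a treebank (the last two rows are hypothetical).
RULE_COUNTS = {
    ("VP", ("V", "NP")): 105,
    ("VP", ("V",)): 600,
    ("VP", ("VP", "PP")): 295,
}

probs = estimate_probabilities(RULE_COUNTS)
print(probs[("VP", ("V", "NP"))])   # 0.105
```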
 The gold standard tree can be used to evaluate
other trees made by the grammar
 Example:

 Evaluation measures
o Precision: number of correct constituents in the parse tree created by the parser
divided by the total number of constituents in that tree
 How many of the hypothesized constituents are correct?
o Recall: number of correct constituents in the parse tree created by the parser divided
by the number of constituents in the gold-standard parse tree
 How many of the actual constituents were reproduced correctly?
o F-measure: a score that combines recall and precision as follows: F1 = (2PR)/(P+R)
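The three measures can be computed by treating each tree as a set of constituents. In the sketch below a constituent is represented as a (label, start, end) triple over word positions; this representation and the two example trees (one attaching the PP to the NP, one to the VP) are assumptions for illustration.

```python
def evaluate(predicted, gold):
    """Return (precision, recall, F1) over sets of (label, start, end) constituents."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# "the tourist saw the astronomer with the telescope" (positions 0..8)
GOLD = {("S", 0, 8), ("NP", 0, 2), ("VP", 2, 8), ("NP", 3, 8),
        ("NP", 3, 5), ("PP", 5, 8), ("NP", 6, 8)}
PREDICTED = {("S", 0, 8), ("NP", 0, 2), ("VP", 2, 8), ("VP", 2, 5),
             ("NP", 3, 5), ("PP", 5, 8), ("NP", 6, 8)}

precision, recall, f1 = evaluate(PREDICTED, GOLD)
print(precision, recall, f1)   # each equals 6/7
```

Six of the seven predicted constituents also occur in the gold tree, so precision and recall are both 6/7 here.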

Human Syntactic Processing

 Some difficulties seem related to human memory limitations:
o Recall the sentences with center-embedding constructions we looked at to show that
natural languages were not regular:
 [the cat [the dog [the rat [the goat licked] bit] chased] likes tuna fish]
o This sentence has 3 relative clauses embedded into each other
o It is grammatical: the syntactic structures used are the same as in simpler sentences
with fewer levels of embedding
o There is no way to specify a particular number of embeddings in a grammar, the
grammar allows an infinite number
 These memory limitations are not only due to the number of incomplete phrases we need to
store in memory:

o Other factors play a role as well in what is perceived as complex

 [the pictures [that the artist [I met at the party] took] were great]
o This sentence with a double embedding is relatively easy to process
o But if we replace the pronoun ‘I’ with a different NP (e.g. my neighbor), it becomes
more difficult
o Pronouns that refer to the speaker or the addressee seem to be easier to process
 Another type of processing difficulty humans experience is related to so-called garden-path
sentences: temporarily ambiguous sentences in which the initially preferred parse leads to a
dead end.
o The horse raced past the barn fell
o The government plans to raise taxes were defeated
o The students forgot the solution was at the end of the book
 If a construction is less predictable, it is more difficult to process.

Lecture 3b
Lexicalized PCFGs:
 Phrasal rules in PCFG are ‘insensitive’ to word-level information
 Solution: add lexical information to phrasal rules to help resolve ambiguities
o Replace rules like S → NP VP with S(ate) → NP(boy) VP(ate)
o VP(ate) → V(ate) NP(spaghetti) PP(chopsticks): high probability
o VP(ate) → V(ate) NP(spaghetti) PP(meatballs): low probability
o NP(spaghetti) → N(spaghetti) PP(meatballs): high probability
o NP(spaghetti) → N(spaghetti) PP(chopsticks): low probability
 You can also have a partially lexicalized PCFG with rules like
o VP(brought) → VBD(brought) NP PP(to)
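To see how lexicalized rules can resolve a PP-attachment ambiguity, the four rules above can be put in a small lookup table. This is a simplified sketch: the probability values are invented for illustration (the notes only say ‘high’ and ‘low’), the name `preferred_attachment` is hypothetical, and a real parser would compare the probabilities of complete trees rather than single rules.

```python
# Hypothetical rule probabilities, invented for illustration only.
RULE_PROB = {
    ("VP(ate)", ("V(ate)", "NP(spaghetti)", "PP(chopsticks)")): 0.80,
    ("VP(ate)", ("V(ate)", "NP(spaghetti)", "PP(meatballs)")): 0.10,
    ("NP(spaghetti)", ("N(spaghetti)", "PP(meatballs)")): 0.70,
    ("NP(spaghetti)", ("N(spaghetti)", "PP(chopsticks)")): 0.05,
}


def preferred_attachment(pp_head: str) -> str:
    """Return 'verb' or 'noun' depending on which lexicalized rule is more probable."""
    pp = f"PP({pp_head})"
    verb_rule = RULE_PROB[("VP(ate)", ("V(ate)", "NP(spaghetti)", pp))]
    noun_rule = RULE_PROB[("NP(spaghetti)", ("N(spaghetti)", pp))]
    return "verb" if verb_rule > noun_rule else "noun"


if __name__ == "__main__":
    print(preferred_attachment("chopsticks"))  # the PP modifies the eating
    print(preferred_attachment("meatballs"))   # the PP modifies the spaghetti
```

An unlexicalized PCFG has a single probability for VP → V NP PP and for NP → N PP, so it would make the same attachment choice for both sentences.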

Parsing as a Search problem:

 A grammar defines a set of possible sentences: it is a declarative specification of which
sentences are well-formed
 A parser is a procedural interpretation of a grammar: it computes one or more parse trees for
a given (grammatical) sentence
 Parsing can be viewed as a search problem
o A grammar defines a search space of possible trees
o The parser searches through this space
 The search space consists of all possible trees. Each state in this space corresponds to a tree:
o Complete trees a grammar can generate (trees starting with S and ending with words
at the leaves)

o Partial trees (where some node can still be expanded by a rule), which can be seen as
intermediate steps in the construction of complete trees.
 The search space defined by a grammar is a theoretical search space
 We can follow one of two strategies to explore the space:
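o Depth-first: we work vertically, following one alternative as far as possible before trying the next
o Breadth-first: we work horizontally, exploring all alternatives at one level before going deeper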
 In addition, a parser navigates the search space defined by a grammar following two obvious constraints:
o A complete tree for a sentence must begin with the start symbol S
o And must have as leaves the words in the sentence
 These two constraints give rise to two search strategies:
o Top-down: the parser starts with S, assuming the input is a well-formed sentence
o Bottom-up: the parser starts with the words in the sentence and builds up structure
from there

Parsing Algorithms:
 A parser is an algorithm that computes a structure for an input string given a grammar. All
parsers have two fundamental properties:
o Directionality: the sequence in which the structures are constructed
o Search strategy: the order in which the search space of possible analyses is explored
 Three basic parsing algorithms:
o Recursive descent top-down algorithm
 A recursive descent parsing algorithm builds a parse tree using a top-down
approach with depth-first search:
 Given a parsing goal, the parser tries to prove that the input is such a
constituent by building up structure from the top of the tree down to
the words
 It does so by looking at the grammar rules left-to-right
 It recursively expands its goals descending in a depth-first fashion
 If at some point there is no match, the parser must back up and try a
different alternative
o The parser searches through the trees licensed by the grammar to
find the one that has the required sentence as the leaves of the tree
o Directionality = top-down: it starts from the start symbol of
the grammar and works its way down to the terminals
o Search strategy = depth-first: it expands a given non-terminal
as far as possible before proceeding to the next one.
 Shortcomings
o Because it searches depth-first, some types of recursion (left-recursive
rules such as NP → NP PP) may send it into an infinite loop
o Like all top-down parsers, it may waste a lot of time
considering words and structures that do not correspond to
the input sentence
 Top-down parsers use a grammar to predict what the
input is before inspecting the input at all
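The recursive descent procedure described above can be sketched as follows. The toy grammar and the names `expand`, `expand_sequence` and `parse` are assumptions made for this example. Backtracking is implemented with Python generators: when one rule fails, the next alternative is tried automatically.

```python
# Toy grammar (an assumption for this sketch): non-terminal -> list of right-hand sides.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP"), ("V",)],
    "Det": [("the",)],
    "N": [("dog",), ("cat",)],
    "V": [("chased",), ("slept",)],
}


def expand(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` can cover words[pos:next_pos]."""
    if symbol not in GRAMMAR:
        # Terminal: it must match the next input word.
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        # Top-down prediction: try each rule for this goal in order.
        yield from expand_sequence(rhs, words, pos, (symbol,))


def expand_sequence(rhs, words, pos, tree):
    """Expand the symbols of a right-hand side left to right, depth-first."""
    if not rhs:
        yield tree, pos
        return
    for child, next_pos in expand(rhs[0], words, pos):
        yield from expand_sequence(rhs[1:], words, next_pos, tree + (child,))


def parse(words, goal="S"):
    """Return the first tree for `goal` that covers all the words, or None."""
    for tree, end in expand(goal, words, 0):
        if end == len(words):
            return tree
    return None


if __name__ == "__main__":
    print(parse("the dog chased the cat".split()))
    print(parse("dog the slept".split()))
```

Adding a left-recursive rule such as NP → NP PP to this grammar would make `expand` call itself on NP at the same input position forever, which illustrates the first shortcoming listed above.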

o Shift-reduce bottom-up algorithm

 A shift-reduce parser tries to find sequences of words and phrases that
correspond to the right-hand side of a grammar production and replace them
with the left-hand side:
 Directionality = bottom-up: starts with the words of the input and
tries to build trees from the words up
 Search strategy = breadth-first: starts with the words, then applies
rules with matching right hand sides and so on until the whole
sentence is reduced to an S.
 Given a parsing goal the parser tries to prove that the input is such a
constituent by building up structure from the words at the bottom to the top
of the tree
 It does so by looking at the grammar rules right-to-left
 It recursively reduces the words to constituents until it reaches its goals using
two operations:
 Shift: push the next input word onto the stack (the parser's memory)
 Reduce: if there are n shifted elements that match the right-hand side
of a grammar rule, the rule is applied to build structure.
o Reduce as many times as possible, then go back to shift.
 If at some point no more structure can be built, the parser must back up and
try a different alternative.
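The shift and reduce operations described above can be sketched as a small recognizer with backtracking. The toy grammar and the function name `shift_reduce` are assumptions made for this example. The sketch tries every possible reduction first and only then shifts, and it backs up when a choice leads to a dead end.

```python
# Toy grammar (an assumption for this sketch): (left-hand side, right-hand side).
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
    ("Det", ("the",)),
    ("N", ("dog",)),
    ("N", ("cat",)),
    ("V", ("chased",)),
    ("V", ("slept",)),
]


def shift_reduce(words, goal="S"):
    """Return True if the words can be reduced to the single symbol `goal`."""

    def search(stack, pos):
        # Success: all input consumed and only the goal symbol is left.
        if pos == len(words) and stack == (goal,):
            return True
        # Reduce: replace a right-hand side on top of the stack by its left-hand side.
        for lhs, rhs in RULES:
            n = len(rhs)
            if n <= len(stack) and stack[-n:] == rhs:
                if search(stack[:-n] + (lhs,), pos):
                    return True
        # Shift: push the next input word onto the stack.
        if pos < len(words):
            return search(stack + (words[pos],), pos + 1)
        # Dead end: the caller backs up and tries a different alternative.
        return False

    return search((), 0)


if __name__ == "__main__":
    print(shift_reduce("the dog chased the cat".split()))
    print(shift_reduce("dog the".split()))
```

In the first sentence the parser initially reduces ‘chased’ to a VP too early, reaches a dead end, and has to back up and shift ‘the cat’ first. This is the kind of wasted work on structures that never lead to an S that bottom-up parsers are prone to.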

o Left-corner algorithm combining top-down and bottom-up strategies

 The left-corner of a rule is the first symbol on the right hand side of a rule.
 E.g. S → NP VP, Nom → A Nom, D → a (left corners: NP, A and a respectively)
 Given a parsing goal the parser tries to predict that the input is such a
constituent by building structure from the words at the bottom but using
top-down predictions
 We build structure by choosing a rule whose left-corner matches the current input
 Further bottom-up steps are constrained by top-down predictions
 It may have to back up if an S can’t be reached or not all input is consumed
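The left-corner strategy described above can be sketched as a recognizer. The toy grammar and the names `left_corner_table` and `recognize` are assumptions made for this example. The table records which symbols can start a given category, and it is used as the top-down filter on each bottom-up step.

```python
# Toy grammar (an assumption for this sketch): (left-hand side, right-hand side).
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
    ("Det", ("the",)),
    ("N", ("dog",)),
    ("N", ("cat",)),
    ("V", ("chased",)),
    ("V", ("slept",)),
]


def left_corner_table(rules):
    """Map each category to the symbols that can start it (reflexive, transitive)."""
    reach = {lhs: {lhs} for lhs, _ in rules}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            for corners in reach.values():
                if lhs in corners and rhs[0] not in corners:
                    corners.add(rhs[0])
                    changed = True
    return reach


REACH = left_corner_table(RULES)


def recognize(words, goal="S"):
    def parse(cat, pos):
        # Bottom-up step: start from the next input word.
        if pos < len(words):
            yield from complete(words[pos], cat, pos + 1)

    def complete(found, cat, pos):
        # `found` has been recognized and ends at `pos`; try to grow it into `cat`.
        if found == cat:
            yield pos
        for lhs, rhs in RULES:
            # Pick rules whose left corner is `found`; the top-down filter keeps
            # only those whose left-hand side can start the predicted `cat`.
            if rhs[0] == found and lhs in REACH.get(cat, {cat}):
                for end in parse_rest(rhs[1:], pos):
                    yield from complete(lhs, cat, end)

    def parse_rest(symbols, pos):
        # The remaining right-hand side symbols become new top-down goals.
        if not symbols:
            yield pos
            return
        for mid in parse(symbols[0], pos):
            yield from parse_rest(symbols[1:], mid)

    return any(end == len(words) for end in parse(goal, 0))


if __name__ == "__main__":
    print(recognize("the dog chased the cat".split()))
    print(recognize("dog the slept".split()))
```

For the input ‘dog the slept’ the rule N → dog is rejected immediately, because N cannot start an S according to the table, so no structure is built for it at all.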

o The Cocke-Younger-Kasami (CKY) algorithm (bottom-up): an efficient algorithm for CFGs

o The Earley algorithm (top-down): enhance with prediction
 The algorithms require:
o A grammar
o An input string of words
o A parsing goal (a constituent type: S, NP etc.)
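The three inputs listed above (a grammar, a list of words and a parsing goal) appear as the ingredients of the following sketch of a CKY recognizer, the bottom-up algorithm mentioned earlier. It assumes the grammar is in Chomsky normal form, which the notes do not spell out. The toy grammar and the name `cky_recognize` are likewise assumptions made for the example.

```python
# Toy grammar in Chomsky normal form (an assumption for this sketch).
BINARY = {  # (B, C) -> set of A such that A -> B C
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICAL = {  # word -> set of categories that can produce it
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
    "slept": {"V", "VP"},  # VP covers the intransitive use directly
}


def cky_recognize(words, goal="S"):
    """Return True if `goal` can span the whole input."""
    n = len(words)
    if n == 0:
        return False
    # chart[i][k] holds the categories that span words[i:k].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICAL.get(word, ()))
    for span in range(2, n + 1):           # shorter spans before longer ones
        for i in range(n - span + 1):
            k = i + span
            for j in range(i + 1, k):      # every split point of the span
                for left in chart[i][j]:
                    for right in chart[j][k]:
                        chart[i][k] |= BINARY.get((left, right), set())
    return goal in chart[0][n]


if __name__ == "__main__":
    print(cky_recognize("the dog chased the cat".split()))
    print(cky_recognize("dog the".split()))
```

Each span is analysed once and stored in the chart, so no backtracking is needed. This is what makes the algorithm efficient compared with the backtracking parsers above.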
 Bottom-up vs. top-down
o Top-down parsers never explore illegal parse trees that cannot form an S, but may
waste time on trees that can never match the input words.
o Bottom-up parsers never explore trees that are inconsistent with the input sentence,
but may waste time exploring illegal parse trees that will never lead to an S root.
o So the two are combined for the best results (left-corner)

