Stochastic Definite Clause Grammars: 1 Introduction and Background
International Conference RANLP 2009 - Borovets, Bulgaria, pages 139–143
... of Cussens' algorithm is the estimation of the number of times rules are used in failed derivations. PRISM estimates failed derivations using a failure program, derived through a program transformation called First Order Compilation (FOC) [14].

2 Stochastic Definite Clause Grammars

Stochastic Definite Clause Grammars (SDCG) is a stochastic unification-based grammar formalism. The grammar syntax is modeled after, and is compatible with, Definite Clause Grammars. To facilitate writing stochastic grammars in DCG notation, a custom DCG compiler has been implemented. The compiler converts a DCG to a PRISM program, which is a stochastic model of the grammar.

Utilizing the functionality of PRISM, the grammar formalism supports parameter learning from annotated or unannotated corpora and provides a mechanism for parse selection through statistical inference. Parameter learning and inference are performed using PRISM's built-in functionality.

SDCG includes some extensions to the DCG syntax. It includes a compact way of expressing recursion, inspired by regular expressions. It has expansion macros used for writing template rules, which allow compact expression of multiple similar rules. The grammar syntax also adds a new conditioning operator which makes it possible to condition rule expansions on previous expansions.

... consists of a name which is a Prolog atom, followed by an optional parenthesized, comma-separated list of features, (F1..Fn). Features are either Prolog atoms or variables. Rule constituents may additionally have prefix regular expression modifiers. The allowed modifiers are * (Kleene star) meaning zero or more occurrences, + meaning one or more occurrences, and ? meaning zero or one occurrence.

Embedded code takes the form { P }, where P is a block of Prolog goals and control structures. The allowed subset of Prolog corresponds to what is allowed in the body of a Prolog rule, but with the restriction that every goal must return a ground answer and may not be a variable. Also, while admitted by the syntax, meta-programming goals like call are not allowed. The goals unify with facts and rules defined outside the embedded Prolog code, but not in other embedded code blocks.

Symbol lists are Prolog lists of atoms, variables, or a combination of the two. The list usually takes the form [ S1,S2,..,SN ], but the list operator | may also be used. However, it is required that every variable in the list is ground. A symbol list may not be empty.

Expansion macros have the form

@name(V1,V2,...,Vn)

where name is an atom followed by a non-empty parenthesized, comma-separated list, V1...Vn, consisting of atoms, variables, or a combination. A macro has a corresponding goal, name/n, which must be defined.
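As an illustration of the syntax elements just described, a hypothetical SDCG rule (not taken from the paper; all names are invented for illustration) might combine a feature, a regular expression modifier, embedded code, and a symbol list as follows:

```prolog
% Hypothetical rule: an optional determiner, any number of adjectives,
% a noun agreeing in Number, an embedded goal, and a terminal symbol list.
np(Number) ==> ?(det(Number)), *(adjective), noun(Number),
    { check_number(Number) }, [period].
```

Here check_number/1 would have to be an ordinary Prolog predicate defined outside the embedded code block, returning a ground answer, as the restrictions above require.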
... place of a matching rule constituent.

A rule r_i may have a condition (conditioning clause), in which case the probability of its expansion depends on the probability of the condition c_i \in C^{n,a} being true, C^{n,a} being the set of possible values for condition clauses for rules in R^{n,a}. Each distinct condition (clause value) has a separate probability, such that

\sum_{i=1}^{|C^{n,a}|} P(c_i) = 1

We denote the number of rules in R^{n,a} satisfying a particular condition c by |n, a, c|. It holds for the sum of the probabilities of such rules r_i^{n,a} \in R^{n,a} that

\sum_{i=1}^{|n,a,c|} P(r_i^{n,a} | c) = 1

where the probability of a rule r given a combination of conditions c is their product, P(r|c) = P(r)P(c). If rules with the same head (R^{n,a}) occur without conditioning (C^{n,a} = \emptyset), then the condition true is assumed and P(true) = 1.

The probability of a derivation is the product of the probabilities of all rules used in that derivation. The probability of a given sentence is the sum of the probabilities of each possible derivation of the sentence. A derivation may be unsuccessful due to failure of variable unification. The probabilities of all possible derivations, successful and unsuccessful, sum to unity, given by the relation P_success = 1 - P_failure.

2.4 The translated SDCG

The compiler behaves similarly to a usual DCG compiler, transforming rules in DCG syntax to Prolog rules with difference lists. In addition to these normal Prolog rules, which we call implementation rules, special selection rules are used to control the stochastic derivation process. Rule heads with the same name and arity in the original DCG grammar are grouped together and managed by one selection rule. The selection rule has the same name and number of features as the original rule, but any ground atoms in the original rule are replaced by variables in the selection rule. Consider the two rules in the example below,

np(Number) ==> det(Number), noun(Number).
np(Number) ==> noun(Number).

The generated selection rule for the two rules is shown below:

np(Number, In, Out) :-
    msw(np(1), RuleIdentifier),
    np_impl(RuleIdentifier, Number, In, Out).

The msw goal is a special PRISM primitive which implements simulation of a random variable; here it stochastically unifies RuleIdentifier to a value, given the name of the random variable. The name of the random variable is assigned according to the name of the nonterminal and its arity. For instance, since np has an arity of 1, the corresponding random variable is named np(1). The possible outcomes of this particular random variable are np_1_1 and np_1_2.

The first parameter of the implementation rules uniquely identifies them, and this name corresponds to an outcome of the random variable used by the selection rule. The implementation rules for the above grammar are shown below:

np_impl(np_1_1, Number, In, Out) :-
    det(Number, In, InOut1),
    noun(Number, InOut1, Out).
np_impl(np_1_2, Number, In, Out) :-
    noun(Number, In, Out).

2.5 Grammar extensions

Regular expression operators, expansion macros and conditioning clauses, which are extensions of the usual DCG syntax, make it possible to express aspects of the grammar more compactly. These operators are implemented in a preprocessing step which expands the compacted grammar.

2.5.1 Regular expression modifiers

Regular expression operators are a way of expressing recursion in a more convenient manner. An example grammar rule containing all the allowed regular expression operators is shown below:

name ==> ?(title), *(firstname), +(lastname).

The regular expression operators are implemented by generating some additional rules and replacing the original constituent (orig_const), which the operator is applied to, with another constituent (new_const). All regular expression operators can be implemented by generating a subset of the following rules:

1) new_const ==> []
2) new_const ==> orig_const
3) new_const ==> new_const, new_const

The ? operator is implemented by adding rules 1-2, the + operator by adding rules 2-3, and the * operator by adding all three rules. The name new_const is symbolic. The compiler uses a naming scheme which avoids conflicting names: the name of the regular expression modifier is prefixed to the constituent name. For instance, *(firstname) becomes sdcg_regex_star_firstname/0. The compiler only adds the implementation rules for the same regular expression once, even if it is used in multiple rules.

2.5.2 Expansion macros

Macros are special Prolog goals embedded in grammar rules. They may occur in both the head and the body of rules. Grammar rules with macros are meta grammar rules; they act as templates for the generation of similar rules. The result of macro expansion of a rule is a set of rules, equal in structure to the original rule, but where each macro is replaced with selected parameters from an answer for the goal.
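The probability model of section 2.3 (a derivation's probability is the product of its rules' probabilities; a sentence's probability is the sum over its derivations) can be sketched in a few lines of Python. The rule names and toy parameter values below are illustrative assumptions, not taken from the paper:

```python
def derivation_prob(rule_probs, derivation):
    """Probability of one derivation: the product of the used rules' probabilities."""
    p = 1.0
    for rule in derivation:
        p *= rule_probs[rule]
    return p

def sentence_prob(rule_probs, derivations):
    """Probability of a sentence: the sum over all of its possible derivations."""
    return sum(derivation_prob(rule_probs, d) for d in derivations)

# Toy parameters for the np(1) random variable of the example grammar;
# the two outcomes correspond to the two np implementation rules.
rule_probs = {"np_1_1": 0.7, "np_1_2": 0.3}

# Two hypothetical derivations of the same string, one per np rule.
# Their probabilities sum to 1 (up to floating point), since the two
# outcomes exhaust the selection rule's random variable.
p = sentence_prob(rule_probs, [["np_1_1"], ["np_1_2"]])
```

In the full model, derivations that fail through unification would carry the remaining probability mass, matching the relation P_success = 1 - P_failure above.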
141
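The rule-generation scheme of section 2.5.1 can also be sketched procedurally. The following Python reconstruction is illustrative only; the paper shows the generated prefix for * (sdcg_regex_star_), while the prefixes assumed here for ? and + are guesses:

```python
# Which of the helper rules 1-3 each regex modifier generates.
# Only the sdcg_regex_star_ prefix appears in the paper; the other
# two prefixes are assumptions for the sake of the sketch.
PREFIX = {"?": "sdcg_regex_question_",
          "*": "sdcg_regex_star_",
          "+": "sdcg_regex_plus_"}

def helper_rules(op, const):
    """Return the helper rules generated for modifier op applied to const."""
    new = PREFIX[op] + const
    rule1 = f"{new} ==> []"            # rule 1: zero occurrences
    rule2 = f"{new} ==> {const}"       # rule 2: exactly one occurrence
    rule3 = f"{new} ==> {new}, {new}"  # rule 3: recursion for repetition
    return {"?": [rule1, rule2],       # ? adds rules 1-2
            "+": [rule2, rule3],       # + adds rules 2-3
            "*": [rule1, rule2, rule3]}[op]  # * adds all three
```

For example, helper_rules("*", "firstname") produces the three rules defining sdcg_regex_star_firstname, mirroring how the compiler replaces *(firstname) in the grammar.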
The ground input to the goal is omitted by default. It is possible to explicitly configure which parameters of a goal should be inserted using an expand_mode directive. If the goal contains more than one non-ground/answer parameter, the answer parameters are inserted comma-separated. If a rule contains more than one macro, then the set of expanded rules corresponds to a cartesian product of the answers for all the macros. When several macros in the same rule use the same name for a variable, this works as a constraint on the answers for the macros. This is exactly as if the goals of the macros were constituents in the body of a Prolog rule.

The original motivation for expansion macros was integration of lexical resources. Suppose that we wish to integrate the lexicon defined by the following simple Prolog program,

word(he,sg,masc). word(she,sg,fem).
number(Word,Number) :- word(Word,Number,_).
gender(Word,Gender) :- word(Word,_,Gender).

expand_mode(number(-,+)).
expand_mode(gender(-,+)).

...

sentence ==>
    np(nohead,NPHead),vp(NPHead,VPHead).
np(ParentHead,Head) | @headword(W) ==>
    det(ParentHead,DetHead),noun(DetHead,Head).
vp(ParentHead,Head) | @headword(W) ==>
    verb(ParentHead,Head).

We have not specified conditioning modes for the rules, but in each case the condition corresponds to the first parameter in the head. Assume that the macro @headword expands to each of the words (terminals) in the grammar. The headword is propagated from the terminals, so for instance in the sentence rule, the choice of which vp rule to expand depends on the headword propagated from the preceding np. Conditioning a rule on every word implies that the rule, given that word, will have a distinct probability distribution.

More advanced lexicalization schemes can easily be created using the conditioning mechanism. The limitation lies in the order in which the variables conditioned on are unified (and thus in derivation order). It is not possible to condition on a variable which is not yet ground.
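The cartesian-product expansion of template rules described in section 2.5.2 can be sketched as follows. This is an illustrative Python reconstruction with hypothetical positional placeholders, not the actual SDCG preprocessor, and it does not model the shared-variable constraint between macros:

```python
from itertools import product

def expand_template(template, macro_answers):
    """Expand a template rule into the cartesian product of macro answers.

    template: a rule as a string with positional placeholders {0}, {1}, ...
    macro_answers: one list of answer strings per macro in the rule.
    """
    return [template.format(*combo) for combo in product(*macro_answers)]

# Hypothetical answer sets for two macros, e.g. @number(W) and @gender(W):
numbers = ["sg", "pl"]
genders = ["masc", "fem"]

# Four expanded rules, one per combination of the two macros' answers.
rules = expand_template("word({0},{1}) ==> [{0}].", [numbers, genders])
```

When two macros share a variable name, the real preprocessor would additionally filter the combinations to those where the shared answers agree, just as if the goals co-occurred in a Prolog rule body.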
learn([ start([det,noun,modalverb,verb],
            [the,can,will,rust],[]),
        start([det,noun,modalverb,verb],
            [the,can,can,rust],[]),
        start([det,noun,noun], [the,can,rust],[]),
        start([det,noun],[the,rust],[]),
        start([modalverb,noun,verb],
            [will,rust,rust],[]),
        start([noun,modalverb,verb],
            [will,can,rust],[]),
        start([noun,noun],[the,the],[]) ]).

When the grammar/tagger has been trained, we can pose a viterbi query to find the most likely tag sequence for a sentence,

| ?- viterbig(start(T,[the,can,will,rust],[])).

T = [det,noun,modalverb,verb|_4794] ?

yes

... allow expression of probabilistic grammars very compactly. This naturally includes probabilistic regular grammars (such as the demonstrated POS tagger) and probabilistic context-free grammars, but also context-sensitive grammars. It was demonstrated that lexicalization schemes can be compactly expressed in the formalism through conditioning and macros.

Some optimizations are needed in order to utilize large grammars (and training sets) for natural languages. Alternative methods for parameter learning may be explored.

Finally, the success of the grammar formalism depends on the applications that use it. SDCG will evolve with the development of applications using it.