
From: AAAI-94 Proceedings. Copyright © 1994, AAAI (www.aaai.org). All rights reserved.

Inducing Deterministic Prolog Parsers from Treebanks:
A Machine Learning Approach*
John M. Zelle and Raymond J. Mooney
Department of Computer Sciences
University of Texas
Austin, TX 78712
zelle@cs.utexas.edu, mooney@cs.utexas.edu

Abstract

This paper presents a method for constructing deterministic Prolog parsers from corpora of parsed sentences. Our approach uses recent machine learning methods for inducing Prolog rules from examples (inductive logic programming). We discuss several advantages of this method compared to recent statistical methods and present results on learning complete parsers from portions of the ATIS corpus.

Introduction

Recent approaches to constructing robust parsers from corpora primarily use statistical and probabilistic methods such as stochastic context-free grammars (Black et al., 1992; Pereira and Schabes, 1992). Although several current methods learn some symbolic structures such as decision trees (Black et al., 1993) and transformations (Brill, 1993), statistical methods still dominate. In this paper, we present a method that uses recent techniques in machine learning to construct symbolic, deterministic parsers from parsed corpora (treebanks). Specifically, our approach is implemented in a program called CHILL (Zelle and Mooney, 1993b) that uses inductive logic programming (ILP) (Muggleton, 1992) to learn heuristic rules for controlling a deterministic shift-reduce parser written in Prolog.

We believe our approach offers several potential advantages compared to current methods. First, it constructs deterministic shift-reduce parsers, which are very powerful and efficient (Tomita, 1986) and arguably more cognitively plausible (Marcus, 1980; Berwick, 1985). Second, it constructs complete parsers from scratch that produce full parse trees, as opposed to producing only bracketings (Pereira and Schabes, 1992; Brill, 1993) or requiring an existing, complex parser that over-generates (Black et al., 1992; Black et al., 1993). Third, the approach is more flexible in several ways. It can produce parsers from either tagged or untagged treebanks.[1] When trained on an untagged corpus, it constructs its own syntactic and/or semantic classes of words and phrases that allow it to deterministically parse the corpus. It can also learn to produce case-role assignments instead of syntactic parse trees, and can use learned lexical and semantic classes to resolve ambiguities such as prepositional phrase attachment and lexical ambiguity (Zelle and Mooney, 1993b). Fourth, it uses a single, uniform parsing framework to perform all of these tasks, and a single, general learning method that has also been used to induce a range of diverse logic programs from examples (Zelle and Mooney, 1994).

The remainder of the paper is organized as follows. In section 2, we summarize our ILP method for learning deterministic parsers and how this method was tailored to work with existing treebanks. In section 3, we present and discuss experimental results on learning parsers from the ATIS corpus of the Penn Treebank (Marcus et al., 1993). Section 4 covers related work, and section 5 presents our conclusions.

* This research was partially supported by the NSF under grant IRI-9102926 and the Texas Advanced Research Program under grant 003658114.

[1] In an untagged treebank, parses are represented as phrase-level word groupings without lexical categories dominating the words (e.g. parsed text from the Penn Treebank (Marcus et al., 1993)).

The CHILL System

Overview

Our system, CHILL (Constructive Heuristics Induction for Language Learning), is an approach to parser acquisition which utilizes a general learning mechanism. The input to the system is a set of training instances consisting of sentences paired with the desired parses. The output is a shift-reduce parser (in Prolog) which maps sentences into parse trees.

The CHILL algorithm consists of two distinct tasks. First, the training instances are used to formulate an overly-general shift-reduce parser that is capable of producing parses from sentences. The initial parser is overly-general in that it produces a great many spurious analyses for any given input sentence. The parser is then specialized by introducing search-control heuristics. These control heuristics limit the contexts in which certain operations are performed, eliminating the spurious analyses.
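As a concrete illustration of the input, a training instance pairing a sentence with its desired parse might be written as follows (the predicate name train/2 is our own choice for illustration; the parse notation is introduced in the next section):

    % The noun phrase "a trip to Dallas" paired with its analysis.
    train([a,trip,to,dallas],
          np:[np:[a,trip], pp:[to,np:[dallas]]]).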


Constructing the Overly-General Parser

The syntactic parse of a sentence is a labeled bracketing of the words in the sentence. For example, the noun phrase "a trip to Dallas" might be decomposed into component noun and prepositional phrases as: [np [np a trip] [pp to [np dallas]]]. We represent such an analysis as a Prolog term of the form: np:[np:[a,trip], pp:[to,np:[dallas]]].

A shift-reduce parser to produce such analyses is easily implemented as a logic program. The state of the parse is reflected by the contents of the stack and input buffer. A new state is produced either by removing a single word from the buffer and pushing it onto the stack (a shift operation), or by popping the top one or two stack elements and combining them into a new constituent which is then pushed back onto the stack (a reduce operation).

Each operation can be represented by a single program clause with two arguments representing the current stack and input buffer, and two arguments representing their state after applying the operator. For example, the operations and associated clauses required to parse the above example phrase are as follows (the notation reduce(N) Cat indicates that the top N stack elements are combined to form a constituent with label Cat):

    reduce(2) pp:
        op([S1,S2|Ss], Words, [pp:[S1,S2]|Ss], Words).
    reduce(2) np:
        op([S1,S2|Ss], Words, [np:[S1,S2]|Ss], Words).
    reduce(1) np:
        op([S1|Ss], Words, [np:[S1]|Ss], Words).
    shft:
        op(Stack, [Word|Words], [Word|Stack], Words).

Building an overly-general parser from a set of training examples is accomplished by constructing clauses for the op predicate. Each clause is a direct translation of a required parsing action; there must be a reduce operation for each constituent structure as well as the general shift operator illustrated above. If the sentence analyses include empty categories (detectable as lexical tokens that appear in the analyses but not in the sentence), each empty marker is introduced via its own shift operator, which does not consume a word from the input buffer.

The first step in the CHILL system is to analyze the training examples to produce the set of general operators that will be used in the overly-general parser. Once the necessary operators have been inferred, they are ordered according to their frequency of occurrence in the training set.

Parser Specialization

The overly-general parser produces a great many spurious analyses for the training sentences because there are no conditions specifying when it is appropriate to use the various operators. The program must be specialized by including control heuristics that guide the application of operator clauses. This section outlines the basic approach used in CHILL. More detail on incorporating clause selection information in Prolog programs can be found in (Zelle and Mooney, 1993a).

Program specialization occurs in three phases. First, the training examples are analyzed to construct positive and negative control examples for each operator clause. Examples of correct operator applications are generated by finding the first correct parsing of each training pair with the overly-general parser; any subgoal to which an operator is applied in a successful parse becomes a positive control example for that operator. A positive control example for any operator is considered a negative example for all previous operators that do not have it as a positive example. Note that this assumes a deterministic framework in which each sentence has a single preferred parsing: once an operator is found to be applicable to a particular parser state, subsequent operators will not be tried. For example, in parsing the above phrase, when the reduce(2) np operator is first applied, the call to op appears as op([trip,a], [to,dallas], A, B), where A and B are as yet uninstantiated output variables. This subgoal would be stored as a positive control example for the reduce(2) np operator, and as a negative control example for reduce(2) pp, assuming the order of operator clauses shown above.

In the second phase, a general first-order induction algorithm is employed to learn a control rule for each operator. This control rule comprises a Horn-clause definition that covers the positive control examples for the operator but not the negative ones. There is a growing body of research in inductive logic programming which addresses this problem. CHILL combines elements from bottom-up techniques found in systems such as CIGOL (Muggleton and Buntine, 1988) and GOLEM (Muggleton and Feng, 1992) with top-down methods from systems like FOIL (Quinlan, 1990), and is able to invent new predicates in a manner analogous to CHAMP (Kijsirikul et al., 1992). Details of the CHILL induction algorithm can be found in (Zelle and Mooney, 1993b; Zelle and Mooney, 1994).

The final step in program specialization is to "fold" the control information back into the overly-general parser. A control rule is easily incorporated into the overly-general program by unifying the head of an operator clause with the head of the control rule for the clause and adding the induced conditions to the clause body. The definitions of any invented predicates are simply appended to the program. As an example, the reduce(2) pp clause might be modified as:

    op([np:A,B|Ss], Words, [pp:[np:A,B]|Ss], Words) :-
        preposition(B).

    preposition(of).    preposition(to).    ...

Here, the induction algorithm invented a new predicate representing the category "preposition".[2] This new predicate has been incorporated to form a rule which may be roughly interpreted as stating: "If the stack contains an NP followed by a preposition, then reduce this pair to a PP." The actual control rule learned for this operator is more complex, but this simple example illustrates the basic process.

[2] Invented predicates actually have system-generated names. They are renamed here for clarity.
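To make the overall control flow concrete, the following is a minimal sketch of a top-level driver that threads the op/4 clauses together; the predicate names parse/2 and parse/3 and the termination test are our own illustrative assumptions, not code from the system described in this paper:

    % Parse a sentence (a list of words) into a single parse tree.
    parse(Sentence, Parse) :-
        parse(Sentence, [], Parse).

    % Success: the buffer is empty and one constituent remains.
    parse([], [Parse], Parse) :- !.
    % Otherwise, apply the first applicable operator and continue.
    % The cut gives the deterministic behavior described above; the
    % overly-general parser used during training would omit it so
    % that all analyses can be found on backtracking.
    parse(Buffer, Stack, Parse) :-
        op(Stack, Buffer, NewStack, NewBuffer), !,
        parse(NewBuffer, NewStack, Parse).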
Parsing the Treebank

Training a program to do accurate parsing requires large corpora of parsed text. Fortunately, such treebanks are being compiled and becoming available. For the current experiments, we have used parsed text from a preliminary version of the Penn Treebank (Marcus et al., 1993). One complication in using this data is that sentences are parsed only to the "phrase level", leaving the internal structure of NPs unanalyzed and allowing constituents of arbitrary arity. Rather than forcing the parser to learn reductions for arbitrary-length constituents, CHILL was restricted to learning binary-branching structures. This simplifies the parser and allows for a more direct comparison to previous bracketing experiments (e.g. Brill, 1993; Pereira and Schabes, 1992), which use binary bracketings.

Making the treebank analyses compatible with the binary parser required "completion" of the parses into binary-branching structures. This "binarization" was accomplished automatically by introducing special internal nodes in a right-linear fashion. For example, the noun phrase np:[the,big,orange,cat] would be binarized to create: np:[the, int(np):[big, int(np):[orange,cat]]]. The special labeling (int(np) for noun phrases, int(s) for sentences, etc.) permits restoration of the original structure by merging internal nodes. Using this technique, the resulting parses can be compared directly with treebank parses. All of the experiments reported below were done with automatically binarized training examples; control rules for the artificial internal nodes were learned in exactly the same way as for the original constituents.
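The following is a sketch of this binarization under our own naming (binarize/2 and internal/2 are illustrative, and the clauses handle only flat constituents whose children are words, as in the example above):

    % Right-linear binarization, introducing int(Label) nodes.
    binarize(Label:[A], Label:[A]) :- !.           % unary: unchanged
    binarize(Label:[A,B], Label:[A,B]) :- !.       % already binary
    binarize(Label:[A|Rest], Label:[A,Inner]) :-   % three or more children:
        internal(Label, IntLabel),                 % split off the first child
        binarize(IntLabel:Rest, Inner).            % and recurse right-linearly

    internal(int(L), int(L)) :- !.                 % don't re-wrap internal labels
    internal(L, int(L)).

    % ?- binarize(np:[the,big,orange,cat], T).
    % T = np:[the, int(np):[big, int(np):[orange,cat]]]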
Experimental Results

The Data

The purpose of our experiments was to investigate whether the mechanisms in CHILL are sufficiently robust for application to real-world parsing problems. There are two facets to this question: the first is whether the parsers learned by CHILL generalize well to new text; an additional issue is whether the induction mechanism can handle the large numbers of examples that would be necessary to achieve adequate performance on relatively large corpora.

We selected as our test corpus a portion of the ATIS dataset from a preliminary version of the Penn Treebank (specifically, the sentences in the file ti-tb). We chose this particular data because it represents realistic input from human-computer interaction, and because it has been used in a number of other studies on automated grammar acquisition (Brill, 1993; Pereira and Schabes, 1992) that can serve as a basis for comparison to CHILL.

Experiments were actually carried out on four different variations of the corpus. A subset of the corpus comprising sentences of length less than 13 words was used to form a more tractable corpus for systematic evaluation and to test the effect of sentence length on performance. The entire corpus contained 729 sentences with an average length of 10.3 words; the restricted set contains 536 sentences averaging 7.9 words in length. A second dimension of variation is the form of the input sentences and analyses. Since CHILL has the ability to create its own categories, it can use untagged parse trees. In order to test the advantage gained by tagging, we also ran experiments using lexical tags instead of words on both the full and restricted corpus.

Experimental Method

Training and testing followed the standard paradigm of first choosing a random set of test examples and then creating parsers using increasingly larger subsets of the remaining examples. The performance of these parsers was then determined by parsing the test examples. Obviously, the most stringent measure of accuracy is the proportion of test sentences for which the produced parse tree exactly matches the treebanked parse for the sentence. Sometimes, however, a parse can be useful even if it is not perfectly accurate; the treebank itself is not entirely consistent in the handling of various structures.

To better gauge the partial accuracy of the parser, we adopted a procedure for returning and scoring partial parses. If the parser runs into a "dead-end" while parsing a test sentence, the contents of the stack at the time of impasse are returned as a single, flat constituent labeled S. Since the parsing operators are ordered and the shift operator is invariably the most frequently used operator in the training set, shift serves as a sort of default when no reduction action applies. Therefore, at the time of impasse, all of the words of the sentence will be on the stack, and partial constituents will have been built. The contents of the stack reflect the partial progress of the parser in finding constituents.
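In terms of the driver sketched earlier (again our own illustrative code, not the published system), this impasse behavior amounts to one additional final clause that fires only when no op/4 clause applies:

    % Impasse fallback: return the stack, reversed into sentence
    % order, as a single flat constituent labeled s. Reached only
    % when no operator applies, since the driver commits to the
    % first applicable operator.
    parse(_Buffer, Stack, s:Constituents) :-
        reverse(Stack, Constituents).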


Table 1: Results for restricted length corpus
(Table 1a: Lexical Tags; Table 1b: Raw Text)

Partial scoring of trees is computed by determining the extent of overlap between the computed parse and the correct parse as recorded in the treebank. Two constituents are said to match if they span exactly the same words in the sentence; if constituents match and have the same label, then they are identical. The overlap between the computed parse and the correct parse is computed by trying to match each constituent of the computed parse with some constituent in the correct parse. If an identical constituent is found, the score is 1.0; a matching constituent with an incorrect label scores 0.5. The sum of the scores for all constituents is the overlap score for the parse. The accuracy of the parse is then computed as

    Accuracy = (O/Found + O/Correct) / 2

where O is the overlap score, Found is the number of constituents in the computed parse, and Correct is the number of constituents in the correct tree. The result is an average of the proportion of the computed parse that is correct and the proportion of the correct parse that was actually found.
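Written as a small Prolog predicate for concreteness (accuracy/4 is our own name; O, Found, and Correct are as defined above):

    % Average of the proportion of the computed parse that is
    % correct and the proportion of the correct parse that was found.
    accuracy(O, Found, Correct, Accuracy) :-
        Accuracy is (O / Found + O / Correct) / 2.

    % Example: a computed parse with 5 constituents, 3 matching
    % identically (1.0 each) and 1 matching with the wrong label
    % (0.5), gives O = 3.5; against a correct tree of 6 constituents:
    % ?- accuracy(3.5, 5, 6, A).    % A is about 0.64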
Another accuracy measure, which has been used in evaluating systems that bracket the input sentence into unlabeled constituents, is the proportion of constituents in the parse that do not cross any constituent boundaries in the correct tree (Black et al., 1991). Of course, this measure only allows for direct comparison of systems that generate binary-branching parse trees.[3] By binarizing the output of the parser in a manner analogous to that described above, we can compute the number of sentences with parses containing no crossing constituents, as well as the proportion of constituents which are non-crossing over all test sentences. This gives a basis of comparison with previous bracketing results, although it should be emphasized that CHILL is designed for the harder task of actually producing labeled parses and is not directly optimized for the bracketing task.

[3] A tree containing a single, flat constituent covering the entire sentence always produces a perfect (non)crossing score.
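A sketch of the crossing test itself, representing each constituent by the inclusive interval of word positions it covers (the span/2 representation and predicate names are our own, not from the paper):

    % Two constituents cross if their spans overlap but neither
    % contains the other.
    crosses(span(S1,E1), span(S2,E2)) :-
        S1 < S2, S2 =< E1, E1 < E2.
    crosses(span(S1,E1), span(S2,E2)) :-
        S2 < S1, S1 =< E2, E2 < E1.

    % A constituent is non-crossing if it crosses no constituent
    % of the correct parse.
    non_crossing(Span, CorrectSpans) :-
        \+ (member(S, CorrectSpans), crosses(Span, S)).

    % ?- crosses(span(2,4), span(4,6)).   % succeeds: the spans share
    %                                     % word 4, neither contains
    %                                     % the other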
Results

The results of these experiments are summarized in Tables 1 and 2. The figures for the restricted length corpus in Table 1 reflect averages of three trials, while the results on the full corpus are averaged over two trials. The first column shows the size of the training set from which the parsers were derived, while the remaining columns present results for each of the four metrics outlined above. Correct is the percentage of test sentences with parses that matched the treebanked parse exactly. Partial is partial correctness using the overlap metric. The remaining columns reflect measures based on re-binarizing the parser output. 0-Cross is the proportion of test sentences having no constituents that cross constituents in the correct parsing. The remaining column reports the percentage of (binarized) constituents that are consistent with the treebank (i.e. cross no constituents in the correct parse).

The results for the restricted corpus in Table 1 are encouraging. While we know of no other results for parsing accuracy of automatically constructed parsers on this corpus, the figures of 33% completely correct using the tagged input and 17% on the raw text seem quite good for a relatively modest training set of 300 sentences. The figures for 0-Cross and Crossing % are about the same as those reported in studies of automated bracketing for the unrestricted ATIS corpus (Brill (1993) reports 60% and 91.12%, respectively). However, our bracketing results for the unrestricted corpus are not as good.

A comparison of Tables 1a and 1b shows that considerable advantage is gained by using word-class tags rather than the actual words. This is to be expected, as tagging significantly reduces the variety in the input. The results for raw text use no special mechanism for handling previously unseen words occurring in the testing examples. Achieving 70% (partial) accuracy under these conditions seems quite good. Statistical approaches relying on n-grams or probabilistic context-free grammars would have difficulty due to the large number of terminal symbols (around 400) appearing in the modest-sized training corpus. The data for lexical selection would be too sparse to adequately train the pre-defined models. Likewise, the transformational approach of (Brill, 1993) is limited to bracketing strings of lexical classes, not words. A major advantage of our approach is the ability of the learning mechanism to automatically construct and attend to just those features of the input that are most useful in guiding parsing.

It should also be noted that the system created new categories in both situations. In the raw text experiments, CHILL regularly created categories for preposition, verb, form-of-to-be, etc. With tagged input, various tags were grouped into classes such as the verb and noun forms. In both cases, the system also formed numerous categories and relations that seemed to defy any simple linguistic explanation. Nevertheless, these categories were helpful in parsing new text. These results support our view that any practical acquisition system should be able to create its own categories, as it is unlikely that independently-crafted feature systems will capture all of the nuances necessary to do accurate parsing in a reasonably complex domain.
Table 2: Results for full corpus
(Table 2a: Lexical Tags; Table 2b: Raw Text)

    Size    Correct    Partial    0-Cross    Crossing %
     50       6.2       54.0       33.1        68.9
    100       4.7       54.9       41.2        72.0
    150       8.5       59.6       39.7        71.5

Table 2 shows results for the full corpus. As one might expect, the results are not as good as for the restricted set. There are a number of factors that could lead to diminishing performance as a function of increasing sentence length. One explanation might be that the longer sentences are simply more complicated and thus harder to parse accurately. If the difficulty is inherent in the sentences, the only solution is larger training sets.

Another possible problem is the compounding of errors. If an operator is chosen incorrectly early in the parse, it might lead the parser into states that have not been encountered in training, leading to subsequent errors in the application of other operators. This factor might be mitigated by developing more robust training procedures. By providing control examples from erroneous states as well as correct ones, the parser might be trained to be somewhat self-correcting, choosing correct operators later on even in the face of previous errors.

A third possibility is that the additional sentence length is "swamping" the induction algorithm. Increasing the average sentence length significantly increases the number of control examples that must be handled by the induction mechanism. Training sizes of several hundred sentences give rise to induction over thousands of control examples. Additionally, longer sentences lend themselves to more conflicting analyses and may increase the amount of noise in the control data, making it more difficult to spot useful generalizations. Additional progress here would require further improvement in the efficiency and noise-handling capabilities of the induction algorithm.

Clearly, further experimentation is needed to pin down where the most improvement can be made. The results so far do indicate that the approach has potential. The current Prolog implementation running on a SPARC 2 was able to induce parsers from several hundred sentences in a few hours, producing over 700 lines of Prolog code. One trial was run using 400 tagged sentences from the full corpus; the resulting parser achieved 29.4% absolute accuracy and a partial scoring of 82%. Further improvements in efficiency may make it feasible to produce parsers from thousands of training sentences.

Related Work

As mentioned above, most recent work on automatically constructing parsers from corpora has focused on acquiring stochastic grammars rather than symbolic parsers. When learning in this framework, "one simply gathers statistics" to set the parameters of a pre-defined model (Charniak, 1993). However, there is a long tradition of research in AI and machine learning suggesting the utility of techniques that extract underlying structural models from the data. Earlier work in learning symbolic parsers (Anderson, 1977; Berwick, 1985) used fairly weak learning methods specific to language acquisition and was not tested on real corpora. CHILL represents the first serious application of modern machine-learning methods to acquiring parsers from corpora.

Brill (1993) presents a technique for acquiring parsers that produce binary-branching syntax trees with unlabeled nonterminals. The technique, utilizing structural transformation rules based on lexical category information, has proven quite successful on real corpora. CHILL, which creates fully labeled parses, has a more general learning mechanism allowing it to make distinctions based on more subtle structural and lexical cues (e.g. creating semantic word classes for resolving attachment).

Our framework for learning deterministic, context-dependent parsers is very similar to that of (Simmons and Yu, 1992); however, there are two advantages of our ILP method compared to their exemplar-matching method. First, ILP methods can handle unbounded, structured data, so the context does not need to be fixed to a limited window of the stack and the remaining sentence; the entire stack and remaining sentence are available as potential context for deciding which parsing operator to apply at each step. Second, the system is capable of creating its own syntactic and semantic word and phrase classes instead of relying on the user to provide part-of-speech tagging.

CHILL's ability to invent new classes of words and phrases specifically for resolving ambiguities such as prepositional phrase attachment makes it particularly interesting. There is some existing work on learning lexical classes from corpora (Schütze, 1992); however, the classes are based on word co-occurrence rather than the specific needs of parsing.


There are also methods for learning to resolve attachments using lexical information (Hindle and Rooth, 1993); however, they do not create new lexical classes. CHILL uses a single learning algorithm to perform both of these tasks.

Conclusion

This paper has demonstrated that modern machine-learning methods are capable of inducing traditional shift-reduce parsers from corpora, complementing the results of recent statistical methods. We believe that the primary strength of corpus-based methods is not the particular approach or type of parser employed (e.g. statistical, connectionist, or symbolic), but the fact that large amounts of real data are used to automatically construct complex parsers that are intractable to build manually. However, we believe our approach based on a very general inductive-logic-programming method has several advantages, such as efficient, deterministic parsing; production of complete labeled parse trees; and an ability to use untagged text and automatically create new, useful lexical and phrasal categories based directly on the needs of parsing.

References

Anderson, J. R. (1977). Induction of augmented transition networks. Cognitive Science, 1:125-157.

Berwick, R. (1985). The Acquisition of Syntactic Knowledge. Cambridge, MA: MIT Press.

Black, E., Jelinek, F., Lafferty, J., Magerman, D., Mercer, R., and Roukos, S. (1993). Towards history-based grammars: Using richer models for probabilistic parsing. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pages 31-37. Columbus, Ohio.

Black, E., Lafferty, J., and Roukos, S. (1992). Development and evaluation of a broad-coverage probabilistic grammar of English-language computer manuals. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, pages 185-192. Newark, Delaware.

Black, E., et al. (1991). A procedure for quantitatively comparing the syntactic coverage of English grammars. In Proceedings of the Fourth DARPA Speech and Natural Language Workshop, pages 306-311.

Brill, E. (1993). Automatic grammar induction and parsing free text: A transformation-based approach. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pages 259-265. Columbus, Ohio.

Charniak, E. (1993). Statistical Language Learning. Cambridge, MA: MIT Press.

Hindle, D. and Rooth, M. (1993). Structural ambiguity and lexical relations. Computational Linguistics, 19(1):103-120.

Kijsirikul, B., Numao, M., and Shimura, M. (1992). Discrimination-based constructive induction of logic programs. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 44-49. San Jose, CA.

Marcus, M. (1980). A Theory of Syntactic Recognition for Natural Language. Cambridge, MA: MIT Press.

Marcus, M., Santorini, B., and Marcinkiewicz, M. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313-330.

Muggleton, S. and Buntine, W. (1988). Machine invention of first-order predicates by inverting resolution. In Proceedings of the Fifth International Conference on Machine Learning, pages 339-352. Ann Arbor, MI.

Muggleton, S. and Feng, C. (1992). Efficient induction of logic programs. In Muggleton, S., editor, Inductive Logic Programming, pages 281-297. New York: Academic Press.

Muggleton, S. H., editor (1992). Inductive Logic Programming. New York, NY: Academic Press.

Pereira, F. and Schabes, Y. (1992). Inside-outside reestimation from partially bracketed corpora. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, pages 128-135. Newark, Delaware.

Quinlan, J. (1990). Learning logical definitions from relations. Machine Learning, 5(3):239-266.

Schütze, H. (1992). Context space. In Working Notes, AAAI Fall Symposium Series, pages 113-120. AAAI Press.

Simmons, R. F. and Yu, Y. (1992). The acquisition and use of context dependent grammars for English. Computational Linguistics, 18(4):391-418.

Tomita, M. (1986). Efficient Parsing for Natural Language. Boston: Kluwer Academic Publishers.

Zelle, J. M. and Mooney, R. J. (1993a). Combining FOIL and EBG to speed-up logic programs. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, pages 1106-1111. Chambery, France.

Zelle, J. M. and Mooney, R. J. (1993b). Learning semantic grammars with constructive inductive logic programming. In Proceedings of the Eleventh National Conference on Artificial Intelligence, pages 817-822. Washington, D.C.

Zelle, J. M. and Mooney, R. J. (1994). Combining top-down and bottom-up methods in inductive logic programming. In Proceedings of the Eleventh International Conference on Machine Learning. New Brunswick, NJ.
