Extracting Verb Valency Frames With Nooj
The paper discusses the possibilities of describing verb valency on the basis of local grammars developed in the NooJ format. We use NooJ to describe the semantic and syntactic valency of approximately 120 Croatian verbs belonging to the semantic field of consumption (e.g. eat, drink, devour, imbibe). The whole semantic field, consisting of verbal lexical units, is viewed as a single semantic frame. The approach relies on the theoretical background of frame semantics used in the development of FrameNet (Baker et al., 1998; Ruppenhofer et al., 2006). In such an approach the semantic valency of lexical units is described in terms of core (central) and non-core (peripheral) elements characteristic of the whole frame.

The level of syntax is observed as a level of realization or non-realization of conceptual arguments. As a starting point we use approximately 40 sentence types consisting of morphosyntactic combinations possible in Croatian (e.g. They taught them mathematics – Nom (nominative) – Acc (accusative) – Acc (accusative)). For each sentence type a local grammar is built, with free word order taken into consideration. At the same time every verb is additionally described in the Croatian NooJ dictionary. Each local grammar is applied to a corpus, and each occurrence or non-occurrence of lexical […]

The obtained results show that certain verbs, although they can intuitively be described as two-argument verbs in terms of semantic valency, are realized exclusively as one-argument verbs on the syntactic level. Further results show the importance of non-core frame elements (e.g. means, company) for certain lexical units.

The obtained results are further used for the refinement of verb frames in existing and future valency lexica of Croatian verbs.

1 Introduction

Our main aim is to describe the valency frames of Croatian verbs of consumption as fully as possible. This will allow us to search for non-core (peripheral) elements such as time, place, manner, company, instrument, cause and others in the verb’s co-text. To do this, we use the description of core verb valency frames and then check a co-text window of 4 phrases¹ that precede and follow the main verb. The data obtained are used for improving grammars for syntactic and semantic recognition of the verb’s co-text.

¹ Chunks have already been labeled in the text, so the term ‘phrase’ covers VP, PP and NP chunks.

We start in Section 2 with an explanation of the theoretical background used in this approach. Section 3 follows with a description of the main valency characteristics of Croatian verbs of consumption and the description of data
in our lexicon. Then in Sections 4 and 5 we explain in more detail the syntactic grammars used for detecting the verb’s co-text. Finally, we conclude with a description of the data obtained in the extracted frames and possible future directions.

2 Semantic Frame of Consumption (Ingestion) Verbs

The Berkeley FrameNet project is an on-line lexical resource for English based on scenes-and-frames semantics (Fillmore, 1977a; 1977b) and supported by corpus evidence. Ruppenhofer et al. (2006:5) point out that the project's "aim is to document the range of semantic and syntactic combinatory possibilities – valences – of each word in each of its senses […]." The FrameNet lexical database contains approximately 10,000 lexical units in nearly 800 hierarchically related semantic frames.

The lexical unit is defined as a pairing of a word with a meaning, i.e. each sense of a (potentially polysemous) word belongs to a different semantic frame. A semantic frame is conceived as "a script-like conceptual structure that describes a particular type of situation, object, or event along with its participants and props." (ibid., 2006:5)

Fillmore and Atkins (1994:370) stress that the "frame semantics [...] begins with the effort to discover and describe the conceptual framework underlying the meaning of the word, and ends with an explanation of the relationship between elements of the conceptual frame and their realizations within the linguistic structures that are grammatically built up around the word." Each semantic frame in FrameNet contains the description of a typical situation or event, the lexical units that belong to this frame, the typical or expected participants in this event and the circumstances in which the whole event occurs. The commonality or prototypicality of participants is conceived in terms of Fillmore's (1977b) scenes or Schank and Abelson's (1977) scripts. In other words, each frame represents a typical event with typical participants (core or central frame elements) and typical circumstances (non-core or peripheral frame elements).

The sentences from the corpora are annotated on three levels: a frame element (semantic role), a grammatical function (e.g. subject or object) and a phrase type (e.g. NP or PP). Ruppenhofer et al. (2006:26) define a core frame element as the "one that instantiates a conceptually necessary component of a frame, while making the frame unique and different from other frames."

On the other hand, "frame elements that do not introduce additional, independent or distinct events from the main reported event are characterized as peripheral. Peripheral FEs [i.e. frame elements] mark such notions as TIME, PLACE, MANNER, MEANS, DEGREE and the like." (ibid., 2006:27). This does not mean that frame elements classified as peripheral in one frame cannot be classified as central in another.

An element is classified as central even when it does not appear in a sentence but is conceived as present on the conceptual level. The semantic interpretation of a non-appearing or missing element on the level of syntax can be definite (definite null instantiation) or indefinite (indefinite null instantiation).² The indefinite cases are illustrated by the missing objects of verbs like eat, drink, sew, bake etc., when these transitive verbs are used in intransitive (monovalent) constructions.

Verbs like eat and drink belong to the semantic frame Ingestion, defined as: "An Ingestor consumes food, drink, or smoke (Ingestibles), which entails putting the Ingestibles in the mouth and taking them further into the body to be absorbed. This may include the use of an Instrument."

The central frame elements are Ingestor and Ingestibles. The Ingestor is defined as the person eating, drinking or smoking, and the Ingestibles as the entities that are being consumed by the Ingestor. Peripheral frame elements of the semantic frame Ingestion in FrameNet are Degree, Duration, Instrument, Manner, Means, Place, Purpose, Source and Time.

In what follows we use the term Consumer for Ingestor and the term Consumed for Ingestibles.

² Ruppenhofer et al. (2006:33): "Sometimes FEs that are conceptually salient do not show up as lexical or phrasal material in the sentence chosen for annotation. [...] The FE that has been identified indicates which semantic role the missing element would fill, if it were present."

3 Lexicon

The Croatian NooJ lexicon now has 1960 verbs with the lexical information described in Vučković et al. (2008). Of these, 102 are verbs of consumption, with the following distribution:
• 12 verbs require only consumer (nominative case),
• 3 verbs require consumer (nominative case) and what is being consumed (genitive case),
• 34 verbs require consumer (nominative case) and what is being consumed (accusative case),
• 2 verbs require consumer (nominative case) and what is being consumed (instrumental case),
• 14 verbs require either only consumer or consumer and what is being consumed (genitive case),
• 28 verbs require either only consumer or consumer and what is being consumed (accusative case),
• 5 verbs require either only consumer or consumer and what is being consumed (instrumental case),
• 3 verbs require either consumer and what is being consumed (genitive case) or consumer and what is being consumed (accusative case),
• 1 verb requires either consumer and what is being consumed (accusative case) or consumer and what is being consumed (instrumental case).

All the verbs of consumption in our lexicon have been given the ‘cons’ label to show that they belong to the semantic field of consumption in general. Additional labels for core arguments are added in the following manner:

• <+cons1> if the verb needs a consumer (in nominative case)

Ja jedem.
(I am eating.)

• <+cons12> if the verb needs a consumer and what is being consumed (in genitive case)

Ona se najela gljiva.
(She has stuffed herself with mushrooms.)

• <+cons14> if the verb needs a consumer and what is being consumed (in accusative case)

Ja jedem ribu.
(I am eating fish.)

• <+cons17> if the verb needs a consumer and what is being consumed (in instrumental case)

Oni se hrane kukuruzom.
(They are feeding on corn.)

In all these cases the consumer may or may not be mentioned, i.e. it is optional in all the descriptions since it can be understood from the verb form. Thus, the full description of our verbs of consumption in the lexicon looks like this:

jesti,V+FLX=JESTI+Prelaz=pov+cons+cons1+cons14

meaning that the verb ‘jesti’ (to eat) is a reflexive verb (+Prelaz=pov) of consumption (+cons) with two possible co-texts. One is with only a consumer (that may or may not be mentioned in the sentence) and the other is with a consumer in nominative case and something being consumed in accusative case.
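The lexicon entry for ‘jesti’ can be unpacked mechanically. The following is a minimal Python sketch, not part of NooJ itself: the names `Entry`, `parse_entry`, `core_frames` and `CONS_LABELS` are hypothetical, and the mapping from labels to cases simply restates the four core-argument labels listed in this section.

```python
from dataclasses import dataclass, field

# Assumed mapping, restating the labels from this section:
# label -> case of the consumed element (None = consumer only).
CONS_LABELS = {
    "cons1": None,
    "cons12": "genitive",
    "cons14": "accusative",
    "cons17": "instrumental",
}


@dataclass
class Entry:
    lemma: str
    category: str
    properties: dict = field(default_factory=dict)  # key=value features, e.g. FLX=JESTI
    flags: list = field(default_factory=list)       # bare features, e.g. cons, cons14


def parse_entry(line: str) -> Entry:
    """Split 'lemma,CAT+feature+key=value...' into its parts."""
    lemma, _, info = line.strip().partition(",")
    category, *features = info.split("+")
    entry = Entry(lemma=lemma, category=category)
    for feature in features:
        key, sep, value = feature.partition("=")
        if sep:
            entry.properties[key] = value
        else:
            entry.flags.append(key)
    return entry


def core_frames(entry: Entry) -> list:
    """Return (label, case of the consumed element) for each core-argument label."""
    return [(flag, CONS_LABELS[flag]) for flag in entry.flags if flag in CONS_LABELS]


entry = parse_entry("jesti,V+FLX=JESTI+Prelaz=pov+cons+cons1+cons14")
frames = core_frames(entry)
```

For this entry, `frames` holds the two co-texts described in the text: consumer only (`cons1`) and consumer plus an accusative consumed element (`cons14`). The general `cons` label is kept among the flags but is not treated as a frame.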
Picture 1: the main graph for detecting verb’s co-text

4 Syntactic Grammar for Detecting Verb’s Co-Text

[…]rova street but carries their food home.)

Table 4: [cell labels were rotated in the source and survive only as fragments: CONSUMER>, +cons1>, +cause>, +manner, <ADV, <NP+]

[…] pili voćne sokove. (Table 3)
(Before the beginning of the meeting they ate croissants and fruit and drank fruit juices.)

Position   Text                    Chunk
-4 to -2   (empty)
-1         Prije početka susreta   <PP+G>
0          jeli su                 <VP+cons14>
+1         kroasane i voće         <NP+Acc>
+2         i                       <C>
+3         pili                    <VP+cons14>
+4         voćne sokove            <NP+Acc>
Table 3

Table 5: [same sentence as Table 3, positions -1 to +4; surviving label fragments: <ADV, +time>, <VP, +cons14>, <NP+, CONSUMED>, <C>]
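The co-text window of 4 phrases before and after the main verb, as laid out in Table 3, can be sketched in Python. This is only an illustration under stated assumptions: the real detection is done with NooJ syntactic grammars, the function names `cotext_window` and `consumption_verbs` are hypothetical, chunk labels are written without angle brackets, and the chunked sentence is typed in by hand from Table 3.

```python
from typing import Dict, List, Tuple

Chunk = Tuple[str, str]  # (surface text, chunk label)


def consumption_verbs(chunks: List[Chunk]) -> List[int]:
    """Indices of VP chunks carrying a core consumption label (cons1, cons12, ...)."""
    return [i for i, (_, label) in enumerate(chunks) if label.startswith("VP+cons")]


def cotext_window(chunks: List[Chunk], verb_index: int, size: int = 4) -> Dict[int, Chunk]:
    """Map relative positions -size..+size to the chunks around the verb.

    Positions that fall outside the sentence are left out of the result.
    """
    window = {}
    for offset in range(-size, size + 1):
        i = verb_index + offset
        if 0 <= i < len(chunks):
            window[offset] = chunks[i]
    return window


# Chunked sentence from Table 3 (labels assumed to be already assigned).
sentence = [
    ("Prije početka susreta", "PP+G"),
    ("jeli su", "VP+cons14"),
    ("kroasane i voće", "NP+Acc"),
    ("i", "C"),
    ("pili", "VP+cons14"),
    ("voćne sokove", "NP+Acc"),
]

verbs = consumption_verbs(sentence)
first = cotext_window(sentence, verbs[0])
```

Centred on the first verb chunk ‘jeli su’, the window covers positions -1 to +4, which are the filled columns of Table 3. Running it on the second verb (‘pili’) would give a different window over the same chunks.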