
Computational Lexicography
Elina Rozvod
Lecture 1.
Computational Lexicography as a Branch of
General Lexicological Science
Ever since the well-known dictum that “in the beginning
was the word”, the word has always been basic to
human understanding and communication.
Facts about the word are recorded in a dictionary,
whose making has been undertaken for centuries by
the lexicographer.
Modern Dictionary Making
Nowadays dictionary-making requires a number of new
means, especially the computer, which offers
its large resources for storage, analysis,
dissemination and exchange of data.
It also suggests increased sophistication in data-base
design and linguistic research.
Thus the procedure of dictionary-making now
prominently includes the work of linguists,
engineers and computer scientists.
Sciences: Interrelation
• Linguistic research on the theory and nature of the word is
concerned with the nature of the lexicon.
• Lexicology in general aims at the analysis of the lexicon.
Lexicography, popularly known as dictionary-making, is
concerned with the description of the lexicon.
• If these two related disciplines combine their efforts with the
computer for both lexicon-building and dictionary-making,
we have a new science – computer lexicography,
which represents a convergence of interest from the
viewpoints of Computational Linguistics, Computational
Lexicography and Computer Corpus Linguistics.
Sciences: Tasks
• The task of Computational Linguistics is to specify lexicons
which are formal (i.e. explicit enough for the computer) and rich
enough for the building of natural language processing
systems. Such lexicons, however, may not necessarily be
suitable for human consumption.
• Computational Lexicography refers either to using
the computer to carry out lexicographical tasks fully
automatically or to converting existing machine-readable
versions of dictionaries into a format explicit
enough for computational linguistic systems.
• Computer Corpus Linguistics focuses on the principles and
practice of compiling texts of actual language in use.
CL: Tasks
Thus we can distinguish the following main tasks of
Computational Lexicography:
• lexicon extraction and building;
• lexicon-based language modelling;
• computational storage of the lexicon;
• the employment of richer lexicons for natural
language processing systems;
• defining standards for lexical exchange and
reusability, so that individual efforts can be
maximised.
The Lexicon in Computational
Lexicography
Contemporary linguistic theories are now emphasizing an ever-
greater reliance on the lexicon because the lexicon may be
viewed as the central repository of linguistic knowledge.
For the computational linguist the lexicon is the “bottleneck” of
natural language processing systems.
Work on this bottleneck includes attempting to manipulate
machine-readable versions of printed dictionaries and
transforming them into computational lexicons.
Storing a dictionary in a lexical data/knowledge base in this way
allows the search for lexical information beyond the perspective
of a mere printed dictionary, and allows various lexicons to be
created when needed.
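The idea of searching a lexical database and deriving lexicons on demand can be sketched in a few lines of Python. This is only a minimal illustration: the Entry fields, the sample entries and the function names search and build_lexicon are assumptions made for the example, not part of any real lexical database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    headword: str
    pos: str         # part of speech
    domain: str      # subject label, e.g. "general" or "linguistics"
    definition: str


# Hypothetical toy data, invented for illustration only.
LEXICAL_DB = [
    Entry("word", "noun", "general", "a single unit of language"),
    Entry("lexicon", "noun", "linguistics", "the vocabulary of a language"),
    Entry("compile", "verb", "general", "to put together from several sources"),
    Entry("lemmatize", "verb", "linguistics", "to reduce a word form to its lemma"),
]


def search(db, **criteria):
    """Return the entries whose fields match all of the given criteria."""
    return [e for e in db
            if all(getattr(e, field) == value for field, value in criteria.items())]


def build_lexicon(db, **criteria):
    """Derive a specialised lexicon (headword -> definition) when needed."""
    return {e.headword: e.definition for e in search(db, **criteria)}


# A query that an alphabetical printed dictionary cannot answer directly.
linguistics_verbs = search(LEXICAL_DB, pos="verb", domain="linguistics")

# A derived lexicon containing nouns only.
noun_lexicon = build_lexicon(LEXICAL_DB, pos="noun")
```

The same stored data serves both the cross-field query and the derived noun lexicon, which is the flexibility a printed dictionary lacks.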
Developing Notions of the Lexicon
• In the early days the lexicon was equated merely with “a
dictionary, a book teaching the signification of words”.
• Nowadays the lexicon is generally understood as “the
vocabulary of a language, especially in dictionary form
offering various types of linguistic information”. D. Crystal
also called it lexis.
• The word “lexicon” can be treated differently. A useful
distinction may be made between the lexicon as ‘an object
defined by linguistic theory’ and the dictionary, which
presents certain information drawn from the lexicon in a
stylized way.
Lexicon: Definitions
• George Grimes describes the lexicon as ‘simply the totality of all the
information about words and word-like objects in a natural language; it
registers items and their properties in contrast to the grammar (which
registers combinations of items and their properties)’ (1988).
• Paul Bennet makes a distinction between a grammar (i.e. a set of rules
for the formation of meaningful and well-formed sentences) and a lexicon
(i.e. a set of words and expressions whose use is governed by those
rules) (1986).
Grimes's definition of the lexicon is interesting since it raises the question
of whether a theory-neutral lexicon can be created. His definition also
bears on the problem of what the lexicon should contain, since
individual lexicons will have their own specifications, depending on the
purpose for which they were built.
Lexicon: Definitions
A more recent definition was suggested by J.Mel’cuk
(1992).
He views the lexicon as ‘a specific list of lexical units
of a language, arranged in a specific way and
supplied with specific information, the whole being
designed for a specific purpose’.
Conclusion: the lexicon has to be discussed through its
relations to the grammar, since what precisely
constitutes grammatical and lexical facts
respectively continues to be a matter of debate.
Bloomfield’s Definition
Within the framework of American structural
linguistics the lexicon was treated as a
peripheral component in relation to grammar,
as illustrated by L. Bloomfield’s statement:
‘the lexicon is really an appendix of the
grammar, a list of basic irregularities’,
whereas a grammar was treated as ‘the
meaningful arrangement of forms of a
language’ (1933).
Chomsky’s Definition
The lexicon was conceptualized as an independent
component in linguistic theory by Noam Chomsky,
one of the most influential linguists of this century.
However, in his theory lexical facts were not only said
to be of a different type from general facts; the
lexicon was also still viewed as a ‘wastebin’ into which
irregular items went: ‘regular variations are
not matters for the lexicon, which should contain
only idiosyncratic items’ (Ch, 1968).
Chomsky’s Definition
Chomsky suggests the differentiation between
‘Internalized Language’ (I-language) i.e. mental
knowledge of the language, assuming that this
occurs in a homogeneous speaker-hearer community
(also called language competence), and
‘Externalized Language’ (E-language) (also called
language performance), i.e. everyday speech and
writing (newspapers, televised speeches and
dialogues etc).
Associative Lexicon
• The relation between the lexicons of E- and I-language
may be formulated in terms of the
Associative Lexicon. The term was suggested
by A. Makkai in 1980.
• An Associative Lexicon is an information
retrieval system that represents in visual and
audible form the knowledge native speakers
possess about the lexicon of their language.
Associative Lexicon and the Human
Brain
The human brain is, naturally, the primary ‘information
retrieval system’ activating our ability to associate
lexemes with one another.
Any artificial system we may build must, therefore, try
to do justice to what there is in human
sociopsychological reality.
The natural Associative Lexicons we carry in our heads
are dialectically and sociolinguistically limited.
They are subject to growth and shrinkage due to
learning and forgetting.
Associative Lexicon: Features
Associative Lexicon represents the cumulative
knowledge of geographic and sociolinguistic
dialects.
It indicates that members of various speech
communities have the ability to learn from one
another either by memorization or by immigration.
The AL is not to be linked to the ideal speaker-hearer in
a homogeneous society, because such people do not
exist.
Associative Lexicon VS Printed Dictionary
The differences between a printed dictionary and the AL are as follows:
• conventional dictionaries tend to form natural semantic nets
around concretely observable and abstract entities, while the AL
aims at building associative groups of lexemes;
• conventional dictionaries traditionally rely on
alphabetization, by which they try to present the totality of the
available lexis in the form of a list, while the AL represents the
set of lexemes according to their frequency of usage, exact
range of dialectal habitat, the speaker’s sociological status
etc.
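The contrast between alphabetical listing and ordering by usage can be illustrated with a short Python sketch. The Lexeme fields, the frequency figures and the dialect labels below are invented for the example and are not drawn from any corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    form: str
    frequency: int   # hypothetical occurrences per million words
    dialect: str     # hypothetical dialect label


# Toy data, invented for illustration only.
LEXEMES = [
    Lexeme("truck", 120, "US"),
    Lexeme("lorry", 45, "UK"),
    Lexeme("car", 900, "general"),
    Lexeme("automobile", 30, "US"),
]


def alphabetical(lexemes):
    """The ordering of a conventional printed dictionary."""
    return [lx.form for lx in sorted(lexemes, key=lambda lx: lx.form)]


def by_frequency(lexemes, dialects=None):
    """Most frequent first, optionally restricted to a set of dialects."""
    selected = [lx for lx in lexemes
                if dialects is None or lx.dialect in dialects]
    return [lx.form for lx in sorted(selected, key=lambda lx: -lx.frequency)]


printed_order = alphabetical(LEXEMES)
uk_usage_order = by_frequency(LEXEMES, dialects={"UK", "general"})
```

A fuller model would also record the speaker's sociological status; it is left out here to keep the sketch short.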
Associative Lexicon: Advantages
• From its inception the AL was intended to be computerized rather
than printed, because a computerized lexicon offers
various non-alphabetic paths of access to the word according
to various linguistic (e.g. phonetic, grammatical and semantic)
features of classification.
• Access can also be based on the word’s associative or
semantic interconnections with other words.
• Hence, the course of Computational Lexicography
emphasizes the importance of storing the lexicon in a
computer format. Storing the lexicon in this format allows for
flexibility in its retrieval.
• It also reduces the number of problems associated with its
organization.
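Non-alphabetic access paths of the kind listed above can be sketched as a set of indexes built over the same entries. In this Python illustration the entries and their associations are invented, and the last four letters of a word serve as a crude orthographic stand-in for a phonetic key, which a real system would take from a transcription.

```python
from collections import defaultdict

# Hypothetical toy entries: (word, part of speech, associated words).
ENTRIES = [
    ("destroy", "verb", {"destruction", "ruin"}),
    ("destruction", "noun", {"destroy", "ruin"}),
    ("ruin", "noun", {"destruction"}),
    ("construction", "noun", {"build"}),
    ("build", "verb", {"construction"}),
]

by_pos = defaultdict(set)      # grammatical access path
by_ending = defaultdict(set)   # stand-in for a phonetic access path
associations = {}              # associative access path

for word, pos, assoc in ENTRIES:
    by_pos[pos].add(word)
    by_ending[word[-4:]].add(word)
    associations[word] = set(assoc)


def associated(word, depth=1):
    """Words reachable from `word` within `depth` association steps."""
    seen, frontier = {word}, {word}
    for _ in range(depth):
        # Expand one step, keeping only words not visited before.
        frontier = {n for w in frontier for n in associations.get(w, ())} - seen
        seen |= frontier
    return seen - {word}


verbs = by_pos["verb"]
tion_words = by_ending["tion"]
near_destroy = associated("destroy")
```

Each index is a different route to the same word, so none of the lookups depends on alphabetical order.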
The Trend Towards Lexicalism
• Lexicalism, i.e. the tendency to shift linguistic
explanation from facts about constructions to facts
about words, may be said to have started with
N. Chomsky.
• It emphasizes that the transformational rules within
the grammar are unsuitable for explaining the
relations between partially analogous structures, e.g.:
• They destroyed Pompeii.
• Their destruction of Pompeii.
Analysis:
The lexical information for destroy should include
subcategorization features which allow for an object
Noun Phrase (NP) Pompeii.
The task of the lexicon is then to specify either the
nominal form (if destroy is the head of an NP) or the
verbal form (if destroy is the head of a VP).
The relations between destroy and destruction can be
explained not by means of the transformational
component but rather in terms of the lexicon.
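The analysis above can be sketched as a single lexical entry that carries both forms and a subcategorization frame. This Python illustration is a simplification under stated assumptions: the LexicalEntry fields and the realize function are hypothetical names, and the past-tense form is stored directly instead of being derived by morphology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    lemma: str
    verbal_form: str    # stored inflected form (an assumption for brevity)
    nominal_form: str
    subcat: tuple       # categories of the complements the item requires


DESTROY = LexicalEntry(
    lemma="destroy",
    verbal_form="destroyed",
    nominal_form="destruction",
    subcat=("NP",),     # destroy subcategorizes for an object NP
)


def realize(entry, head_category, subject, obj):
    """Build a phrase headed by the entry as a verb (VP) or as a noun (NP)."""
    if "NP" in entry.subcat and not obj:
        raise ValueError(f"{entry.lemma} requires an object NP")
    if head_category == "VP":
        return f"{subject} {entry.verbal_form} {obj}"
    if head_category == "NP":
        return f"{subject} {entry.nominal_form} of {obj}"
    raise ValueError(f"unsupported head category: {head_category}")


clause = realize(DESTROY, "VP", "They", "Pompeii")
nominal = realize(DESTROY, "NP", "Their", "Pompeii")
```

Both structures come from the same entry, so the link between destroy and destruction is stated in the lexicon and no transformational rule is involved.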
Lexicalism: Trends
Nowadays lexicalism has moved to a more thoroughgoing shift
from the grammar to the lexicon.
But it is only one of several parallel trends noticeable in linguistics
since 1980. The others are:
• Wholism – the tendency to minimize the distinction between
the lexicon and the rest of the grammar.
• Trans-constructionism – the tendency to reduce the number
of rules that are specific to just one construction.
• Poly-constructionism – the tendency to increase the number
of particular constructions that are recognized in grammar.
Lexicalism: Trends
• Relationism – the tendency to refer explicitly to grammatical
relations, and even to treat them as primary in relation to the
constituent structure.
• Mono-stratalism – the tendency to reject the
transformational idea that a sentence’s syntactic structure
cannot be shown in a single structural representation.
• Cognitivism – the tendency to emphasize the similarities and
continuities between linguistic and non-linguistic knowledge.
• Implementationism – the tendency to implement grammars
in terms of computer programs.
Lexicalism and Word Grammar
• In its early form lexicalism failed to define precisely what the
lexicon is, and how it differs from the grammar.
• In 1990 there appeared Word Grammar, introduced by
R. Hudson, who reinterprets lexicalism as an approach to
grammar in which words are basic and the boundary around
the lexicon plays no part.
• In this view the word assumes central importance, because it
is universally recognized as the unit the grammar describes
and as the boundary between morphology (inside the word)
and syntax (relations between words).
• For Hudson the word is where internal structure is most
arbitrary and relative to the meaning, so it is the unit in
the recognition of which memory plays the biggest part.
