UNIT 1 – INTRODUCTION

Because dictionaries focus on the meaning of words, the understanding of a word’s meaning by the users of a
certain language, and their capability to explain it by putting it into other words, are preconditions for the
creation of dictionaries.

Because ontologies focus on how concepts are captured and how they are codified in words, semantic analysis
is also a kind of precondition for the further construction of ontologies.

Hanks states that lexicographical compilations are inventories of words that have multiple applications and are
compiled out of many different sources (manually and computationally). However, none of them are completely
comprehensive and none of them are perfect.

For concepts to be transmitted we as humans need words, and this is why it is important to differentiate between
concepts and words. Understanding the similarities, differences and interrelations between them becomes
increasingly important in the present context of massive Internet use.

This concept-word relationship concerns the process of conceptualization. A conceptualization is the relevant
information itself. It is independent of specific situations or representational languages, because it is not yet
about representation. A conceptualization becomes accessible only after a specification step.

The precodification of entities or relations, which usually leads to the lexicalization of nouns and verbs, is such a
specification step. It is the marking of either an entity or a relation in the notation of an ontology. For example, the
verb run, as in “She runs the Brussels Marathon”, is precodified as a ‘predicate’ and thus as a ‘relation’, and the nouns
She, Brussels and Marathon are precodified as entities.
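A minimal Python sketch (with an invented, simplified notation) of this precodification step, marking the verb as the relation and the nouns as the entities it links:

```python
# Toy sketch of precodification: the verb is marked as a relation (predicate)
# and the nouns as the entities it links. The notation is invented purely for
# illustration; it is not the notation of any particular ontology.

entities = {"She", "Brussels Marathon"}            # precodified as entities
relation = ("run", "She", "Brussels Marathon")     # the verb precodified as a relation

predicate, agent, theme = relation
print(f"{predicate}({agent}, {theme})")            # -> run(She, Brussels Marathon)
```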

What is the objective of an ontology? Basically, it is to conventionalize concepts in order to handle meaning and
knowledge efficiently. Every conceptualization is a mental product which stands for the view of the world adopted
by an agent; it is by means of ontologies, which are language-specifications of those mental products, that
heterogeneous agents (human, artificial, or hybrid) can assess whether a given conceptualization is shared or
not, and choose whether it is worthwhile to negotiate meaning or not. To be useful, a conceptualization has to
be shared among agents, such as humans. The information content is defined by the collectivity of speakers.

There are two ways to access the study of words and concepts:

• Onomasiological approach > it tries to answer the question ‘how do you express X?’
o This approach is used to build ontologies
o It starts from a concept (an idea, an object, a quality, an activity, etc.) and asks for its names.
o What is the name for medium-high objects with four legs that are used to eat or to write on
them? > Table
• Semasiological approach > it tries to answer the question ‘what does X mean?’ (both approaches are sketched in the example after this list)
o This approach is used in the construction of terminologies, or banks of terms
o It starts with the word and asks what it means, or what concepts the word refers to.
o What is the meaning of the word ‘table’? > A medium-high object with four legs…
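The two directions of access can be pictured with a small Python sketch over an invented one-entry lexicon: the semasiological lookup goes from the word to its meaning, the onomasiological one from a description of the concept to its name.

```python
# Toy mini-lexicon (invented) used to contrast the two directions of access.
lexicon = {
    "table": "a medium-high object with four legs used to eat or to write on",
}

# Semasiological direction: start from the word and ask for its meaning.
def meaning_of(word):
    return lexicon.get(word)

# Onomasiological direction: start from a concept description and ask for its name.
def name_for(description):
    return next((word for word, gloss in lexicon.items() if gloss == description), None)

print(meaning_of("table"))
print(name_for("a medium-high object with four legs used to eat or to write on"))
```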

1. THE USE OF A METALANGUAGE

Saeed defines semantics as the study of meaning communicated through language. Since there are quite a
number of languages and since meaning and knowledge are, to some extent, interchangeable terms, we can say
that knowledge representation is fairly connected to the particular language on which the referred knowledge is
based.
Consequently, this author suggests that the use of a metalanguage could be a possible solution to the problem
of the circularity of the meaning of a word in a dictionary. A related issue is the boundary between semantic and
encyclopaedic knowledge, since designing meaning representations, for example of words, involves arguing about
which elements of knowledge should be included. But metalanguages also create problems for lexical representation.

Generally, a metalanguage will be necessary to build up any ontology, especially if it is aimed to be applicable to
more than one language. There are two kinds of components in an ontology that make use of a metalanguage:
the represented categories and relations, and the represented procedures. Sometimes the represented
procedures are just the relations themselves.

2. LANGUAGE DEPENDENT AND LANGUAGE INDEPENDENT KNOWLEDGE REPRESENTATION

The representation of knowledge based on language or based on concepts differs in a number of aspects. It
affects the creation of ontologies and dictionaries.

The importance of prototypicality in the organization of the lexicon blurs the distinction between semantic
information and encyclopaedic information. This means that the references to typical examples and
characteristic features are a natural thing to expect in dictionaries. On the other hand, in the construction of
ontologies prototypical examples of categories tend to be linked to higher categories or inclusive categories,
usually taking the lexical form of hyperonyms.

2.1 Language dependent applications

Language dependent knowledge representation is based on the way in which a certain language codifies a certain
category, since this codification affects the representation of that particular piece of world knowledge. Within
semantics there are certain linguistic categories that are highly dependent on the language in which they are
identified, for example the pronominal system.

As is well known, there are more or less universal linguistic features that all languages codify with various
syntactic, morphological or lexical devices to refer to the addressee. An example of language independent
knowledge or conceptual representation could be the lexical terms for numbers or the mathematical symbols
and the symbols of logic.

Language is a conceptual phenomenon. This means that different languages lexicalize with more or less detail
those aspects (concepts) of the external world that affect their speakers in particular ways. For example, the
different types of snow are lexicalized with different words in Eskimo languages.

Therefore, the way a certain language codifies a certain category – for example a certain state of affairs – affects
the representation of this particular piece of world knowledge because it selects some elements instead of
others. The codification of a certain state of affairs, for instance, affects the types of constructions that a certain
language may produce.

Knowledge representation in language applications

A particular kind of knowledge representation is based on lexical organization, which can take many forms:

• Network > a group of words that are not so tightly organized


• Set > an organized and bounded group of words
• Hierarchy > an organized group of words usually following a certain semantic relationship, e.g.
hyponymy or hyperonymy.
In inheritance systems the main link is a certain feature (semantic or syntactic) that can be identified as recurrent
at different levels of a structure. Understanding how words are organized in a certain format is important for
both dictionaries and ontologies.
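A minimal Python sketch of such an inheritance system, using an invented three-node taxonomy: each concept stores its hyperonym and its own features, and the features of a word are those it declares plus everything it inherits from the levels above it.

```python
# Invented mini-taxonomy: each concept records its hyperonym ("isa") and its
# own features; inherited features are collected by walking up the hierarchy.

hierarchy = {
    "entity":    {"isa": None,        "features": set()},
    "furniture": {"isa": "entity",    "features": {"artefact"}},
    "table":     {"isa": "furniture", "features": {"has_legs", "flat_top"}},
}

def all_features(concept):
    collected = set()
    while concept is not None:
        node = hierarchy[concept]
        collected |= node["features"]
        concept = node["isa"]
    return collected

print(all_features("table"))   # {'artefact', 'has_legs', 'flat_top'}
```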

Meaning representation in language applications

Language is organized using a limited number of formal structures of different kinds (lexical, syntactic,
morphological, phonetic), which are complemented with a whole repertoire of ontological and situational
(sensory-perceptual) information processed simultaneously. Formal codification of purely linguistic input is
inevitably limited because of processing constraints, but it is complemented with additional non-linguistic
information that enters the human processing system via other non-linguistic means.

Trying to represent something like meaning, highly dependent on contextual information, is not an easy task.
Part of the difficulty is that it is precisely the lexicon, syntax, morphology, or phonetics of a certain language that
we use to convey meaning. Only a small part of the aspects of social behaviour constitutes contextual information
which has been codified in human languages in many different ways, and which can be labelled with different
kinds of pragmatic parameters.

In addition, other non-pragmatically codified information is missing in linguistic representation as such, and must
enter the system through other non-linguistic-processing-input-systems. The paradox is that in order to codify
all that non-linguistic information we sometimes need a language, whether this language be a conventional
language, a metalanguage or another symbolic system.

Lexical representation in language applications

A typical example of lexical representations for language applications is a common parser, and the clearest
example of a language dependent linguistic application is a dictionary of any kind.

2.2 Language independent applications

An easy example of a language independent representation could be the Latin figures for numbers or the
mathematical symbols that mathematicians of all languages use. An easy example of language independent
application is mathematical notation.

The organization of concepts into hierarchies is relevant. Each concept is studied as related to its super- and sub-
concepts. The inheritance of defined attributes and relations from more general to more specific affects complex
conceptualizations.

2.3 Conclusions about knowledge representation

Knowledge representation has taken the form of printed and electronic dictionaries and ontologies. Dictionaries
are one possible kind of language-dependent instrument of knowledge representation in the sense that
dictionaries compile all or most part of the lexical information of a particular language.

Ontologies, on the other hand, compile other than lexical information. However, in order to codify non-lexical
information, lexical means are needed. Ontologies can be defined as knowledge networks. A knowledge network
is a collection of concepts that structure information and allows the user to view it. In addition, concepts are
organized into hierarchies where each concept is related to its super- and sub-concepts.

3. DIFFERENT APPROACHES TO MEANING REPRESENTATION

Lexical resources or linguistic ontologies have a structure motivated by natural language, and more particularly
by the lexicon.
3.1 Cognitivism and Ontologies

As for the influence of cognitivism on the design of ontologies, the main issue relates to categorization. Studies
on categorization explain:
• how componential semantics developed, with the category of a word defined as a set of features
(syntactic and semantic) that distinguishes this word from others (a toy illustration follows this list).
• how this model fits extremely well with Formal Concept Analysis (FCA), and how componential
semantics is nowadays in use in several ontological approaches.
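A toy Python sketch of componential analysis (the feature set is invented): each word is a set of semantic features, and two words are told apart by the features they do not share. Object/attribute tables of exactly this kind are what FCA takes as its formal context.

```python
# Invented feature sets for a classic componential-analysis example.
components = {
    "man":   {"+human", "+adult", "+male"},
    "woman": {"+human", "+adult", "-male"},
    "boy":   {"+human", "-adult", "+male"},
}

def contrast(word1, word2):
    # The features that distinguish the two words (symmetric difference).
    return components[word1] ^ components[word2]

print(contrast("man", "woman"))   # {'+male', '-male'}
print(contrast("man", "boy"))     # {'+adult', '-adult'}
```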

However, cognitive approaches see componential semantics as limited by various developments centred on the
notion of prototypicality. The association of words and even concepts within a category is a scalar notion. The
problem of category boundaries is therefore a serious one. Contextual factors have been said to influence these
boundaries. Another issue is the use of a feature list that has been considered too simplistic and that raises the
question of the definition of the features themselves.

Another common ground is the investigation of models for concept types (sortal, relational, individual, and
functional concepts), category structure, and their respective relationships to ‘frames’. On the other hand, there
is wide converging evidence that language has a great impact on categorization: when a certain language has a
word labelling a concept, children learn the corresponding category much faster and more easily.

3.2 Predication in Linguistics and in Ontologies

A predicate is a relational concept that links one or several entities. Languages frequently lexicalize predicates in
the form of verbs and adjectives.

Fillmore proposed that we should analyse words in relation to each other according to the frame or script in
which they appear. This field of research has resulted in the creation of resources such as FrameNet and VerbNet.

All contextual information configures the background knowledge which is organized in a mental ontology. All this
encyclopaedic or background knowledge is stored in our minds. The challenge for language representation and,
after that, for computational linguistics too, is first how to model it, and then how to transfer this modelling to a
machine-readable program. This is how we go from a linguistic model to a computational program.

3.3 Are conceptual and semantic levels identical?

Many proponents of the cognitive approach to language postulate that semantics is equated with the conceptual
level. Jackendoff explains that surface structures are directly related to the concepts that they express by a set
of rules. Since, according to him, it is impossible to disentangle purely semantic from encyclopaedic information
without losing generalizations, semantic and conceptual levels must form a single level.

However, Levinson advanced serious arguments, involving Pragmatics in particular, in favour of highlighting the
distinction between semantic and conceptual. These differences are explained by the differing views on language
inherited from different theoretical perspectives. While Jackendoff focuses on Chomsky’s internal language,
Levinson insists on the importance of the social nature of language. Language primarily understood as a tool for
communicating rather than as a tool for representing knowledge (in someone’s mind) corresponds to these
different perspectives.

Bateman argues for the need for an interface ontology between the conceptual and surface levels. He specifies
that such a resource should neither be too close to nor too far from the linguistic surface.

3.4 The philosophical perspective


Prevot et al. claim that determining a system of ontological categories in order to answer the question ‘What kinds
of things exist?’ is one of the essential tasks of metaphysics. In this context, the focus has been mainly on the
upper level of the knowledge architecture and on finding an exhaustive system of mutually exclusive categories.
Recent approaches are rendering philosophical considerations worth exploring.

Authors state that the crucial area in which philosophy can help the ontology builder might not be the positioning
of such and such a concept in a ‘perfect’ taxonomy, but rather the methodology for making such decisions. The
second important aspect is the distinction between:

• Descriptive metaphysics > holds the view that the material accessible to the philosopher is not the
real world but rather how the philosopher perceives and conceptualizes it
• Revisionary metaphysics > uses rigorous paraphrases for supposedly imperfect common-sense
notions.

This move leads to the distinction between Ontology as a philosophical discipline and ontologies as the knowledge
artefacts we are trying to develop. Modern ontology designers consider many potential ontologies that concern
different domains and capture different views of the same reality.

4. LEXICAL DESCRIPTION THEORIES

4.1 Introduction

De Miguel et al. recognize four main models in the description of the lexicon. They are called structural,
functional, cognitive and formal.

4.2 Structuralism

4.2.1 European Structuralism

The so-called structuralism has Lévi-Strauss as one of its main representatives, and Saussure not only as a
structuralist, but also as the real founder of modern linguistics.

Structural linguistics is a reaction against 19th-century historicism and philological studies and the related
comparative grammars. Against this approach based on the analysis of isolated units, structuralism supports the
idea that there are no isolated linguistic units; they only exist and make sense in relation to other units.

Among Saussure’s many relevant contributions, the notion of system is of particular importance. For him
language is a system in which its units are determined by their place within the system rather than by any other
external reference. He claimed that concepts of meaning in language are purely differential. They are defined
not by positive terms (by their content), but rather negatively (by their relation to other meanings in the
language).

The main outcome of his notion of a system was the development of structural phonology in the Prague Circle.
While the Prague Circle developed structural phonology, the Copenhagen School focused strongly on
glossematics, a discipline more related to mathematics than to linguistics. Hjelmslev presented a formal logic
model of language based on mathematical correlations, anticipating present developments in computer science
and its applications. Greimas, one of Hjelmslev’s followers, developed a structural model of lexico-semantic
analysis of meaning.
Initially, structural semantics or lexematics was just a direct application of phonological methods with
subsequent problems derived from the fact that phonemes and words are quite different types of units which
possibly called for different ways and methods of analysis.

The lack of regularity which characterizes the lexicon and the loose structure of semantic units gave rise to
Coseriu’s three problems of structural lexematics: complexity, subjectivity and imprecision.

Coseriu also held that structural semantics, or lexematics, belongs to the systematic level of the lexicology of
contents, and he regretted that most lexical studies are based on the relations among meanings or, more
frequently, on the correlation between a signifier and a signified; that is, a semasiological perspective. In his
view, lexical analysis should only concentrate on the relations among meanings.

Lexematic structures, according to him, include paradigmatic and syntagmatic perspectives. Among paradigmatic
structures, what he defined as lexical fields is one of the most important ones. It is based on the idea that the
meaning of one lexical piece depends on the meaning of the rest of pieces in its lexical field.

Other linguists such as Pottier showed a tendency to base linguistic distinctions on the real world rather than on
the language itself. Coseriu adopts a much more purely linguistic approach and suggests that going bottom-up,
starting from basic oppositions within a lexical field, is a better option. He was the first linguist who tried to offer
a complete typology of lexical fields based on the lexematic oppositions that operate within them. He divides
lexical fields into two classes: unidimensional and pluridimensional ones. The former are divided into three
possible categories: antonymic fields (high/low), gradual fields (cold, cool, warm, hot) and serial fields. These last
ones are based on equipollent oppositions, that is, oppositions that are logically equivalent and in which each
element is equally opposed to the rest (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday).

4.2.2 American Structuralism

Franz Boas and Edward Sapir paved the way for a more fully-fledged linguistic structuralism, with Leonard
Bloomfield as its most relevant proponent.

One of the main differences between European and American structuralism is the type of languages they focused
on. While Europeans concentrated on well-known languages – old or modern –, American structuralists studied
American Indian languages with an important focus on distributionalism. Their studies were much more
syntagmatically oriented. Zellig Harris holds that meaning can only be used heuristically, as a source of
hints for determining the criteria to be used in distributional studies.

American structuralists were also heavily influenced by psychological behaviourist theories. It is perhaps because
of this that they lacked interest in the study of meaning.

4.3 Functional Models

The functional group of theories share a number of basic aspects, making them contrast with, for example, the
formalist and cognitivist approaches to language description and the description of the lexicon in particular.
Among these aspects, the most widely recognized concerns the importance given to communication as the
driving force of linguistic organization. A second important aspect is related to the role given to cognitivism in
lexical descriptions.

As a reaction to the Chomskian claim that language is a defining human competence, the first aspect shared by
most functional approaches is the claim that communication is the most basic function of language. Language’s
representational power is less important for them than its communicative one. On the other hand, cognitive
linguistics, while sharing this emphasis on communication, has been less influential on lexical studies. Only when
allied with functionalist models is a cognitive approach to language description relevant for identifying the role
of the lexicon in language modelling.
For example, Goldberg’s works are based on how constructions are learned and processed and their own internal
relationships, apart from general cognitive processes such as categorization and various aspects of social
cognition. As with many other linguists, her work can be included within both the functional and the cognitive
paradigms.

Special attention should be given to those Spanish functionalists who first followed Coseriu’s lexicology and then
developed a combined model of lexical representation with strong ontological and cognitive connections.
Ricardo Mairal and Pamela Faber are key names in this line of work.

For them, the relationship between functionalism and the role of lexicon in language modelling and description
is first established and linked to ontologies. They recognize the importance of such links and provide a three-
group classification of theories depending on the scope of lexical representation:

• First group > includes those lexical representations with purely syntactically relevant information.
• Second group > includes those whose lexical representations feature a rich semantic component and
provide an inventory of syntactic constructions forming the core grammar of a language.
o These constructions use a decompositional system and only capture those elements that have
a syntactic impact.
• Third group > a group of linguists which they define as ontologically driven lexical semanticists

An important development is Hengeveld and Mackenzie’s Functional Discourse Grammar. It features two
relevant aspects: it incorporates and emphasizes the level of discourse in the model; and it claims that this model
only recognizes linguistic distinctions that are reflected in the language in question. Because of this, the
conceptual component of the model restricts itself to those aspects with immediate communicative intention.

Hengeveld and Mackenzie stress the importance of not postulating universal categories of any nature until their
universality has been empirically demonstrated. This emphasis of FDG on not generalizing without empirical
support should be valued in a context of proliferation of linguistic models based on little empirical background.

Butler recognizes the need for a more developed lexical component. However, the presence of the lexicon
permeates all levels of description, which is only natural given that the model inherited Dik’s emphasis on the
Fund as a lexical repository.

Mairal’s later developments include a full incorporation of ontologies. This takes the form of a language of
conceptual representation (COREL) as in the development of his model where the initial 2002 lexical templates
have turned into meaning postulates and where a fully-fledged knowledge representation system is included to
integrate cognitivist modellization.

The inclusion of cognitivism has two different but related sides. On the one hand, conceptual schemes include a
semantic memory with cognitive information about words, a procedural memory about how events are
perceived, and an episodic memory storing information about autobiographical events.

On the other hand, these three types of knowledge are projected into a knowledge base. This in turn, includes a
Cognicon featuring procedural knowledge in the form of scripts, an Ontology focusing on semantic knowledge
that is represented in meaning postulates, and a general Onomasticon with names captured in two
microstructures, where pictures and stories are stored.

This language of conceptual representation (COREL) features a series of conceptual units. These, in turn, include
metaconcepts, basic concepts and terminal concepts, thus configuring a classic ontology. Again, COREL provides
a collection of hierarchically organized terms and a notational system. The authors claim that most of the
concepts are interpreted as having a lexical motivation in the sense that the concept is lexicalized in at least
one language. And finally, these concepts have semantic properties captured in thematic frames and meaning
postulates.
Thematic frames specify the participants that typically take part in a cognitive situation, while meaning postulates
include two kinds of features (a toy sketch follows the second list below):
• Those with a categorization function > they determine class inclusion
• Those with an identification function > they are used to exemplify the most prototypical members of the
category

Features are reduced to two classes:


• Nuclear > with a categorizing function
• Exemplar > with an identifying function
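Purely as an illustration (this is not the actual COREL notation), a concept could be sketched in Python as a small record bundling a thematic frame with the two kinds of features just listed:

```python
# Toy structure only, not the real COREL notation: a hypothetical concept with
# a thematic frame (typical participants), nuclear features (categorizing,
# i.e. class inclusion) and exemplar features (identifying, prototypical traits).

concept = {
    "id": "+DOG_00",                                        # hypothetical label
    "thematic_frame": {"theme": "dog", "owner": "human"},   # typical participants
    "nuclear":  ["is_a(animal)", "is_a(pet)"],              # categorization function
    "exemplar": ["barks", "wags_tail"],                     # identification function
}

for kind in ("nuclear", "exemplar"):
    print(kind, "->", concept[kind])
```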

While the inclusion and development of an ontological perspective in linguistic modelization seems fruitful, the
attempt to include the role of cognition in functional linguistic descriptions leads to a dead end. General human
linguistic processing and linguistic elicitation are indeed influenced by the general rules of human cognition,
but the actual linguistic codification and marking of meaningful items is probably determined by the basic
communicative nature of language.

4.4 Formal Models

Mendikoetxea explains how the division of labour between lexicon and syntax has always been a major issue in
the Chomskian tradition. Within this approach, the lexicon is considered a storage of lexical entries with arbitrary
semantic/phonological pairs together with idiosyncratic information about the lexical entries. In addition, a series
of formal features are translated into syntactic instructions that specify and determine the syntactic behaviour
of each lexical unit.

Mendikoetxea summarizes the type of problems that the lexical-syntax interface has to deal with:
• The levels at which lexical items are inserted in all these models
• The nature of lexical representations
• The relationship between lexical-semantic and syntactic properties of predicates

4.5 Cognitive Models

Lakoff and Langacker established the basis for an approach that focuses on how human perception, and the
human body in general, affect language.

One of the most widely accepted views among cognitive linguists is the idea that there is no separation of
linguistic knowledge from general thinking. In this sense, they oppose the influential views of other linguists,
such as Chomsky, who see linguistic behaviour as a separate module, distinct from the general cognitive abilities
which allow learning and reasoning.

Formal and functional approaches to linguistics are usually linked to certain views of language and cognition. For
instance, generative grammar is generally associated with the idea that language acquisition forms an
autonomous module or faculty independent of other mental processes of attention, memory and reasoning. This
view is often combined with a view of internal modularity so that different levels of linguistic analysis, such as
phonology, syntax, and semantics, form independent modules.

In Saeed’s view, cognitivists identify themselves with functional theories of language. Cognitivism holds that the
principles of language use embody more general cognitive principles. Saeed explains that the difference between
language and other mental processes is one of degree but not one of kind, and he adds that it makes sense to
look for principles shared across a range of cognitive domains. Similarly, he argues that no adequate account of
grammatical rules is possible without taking the meaning of elements into account.

Because of this, cognitive linguistics does not differentiate between linguistic knowledge and encyclopaedic
knowledge. The explanation of grammatical patterns can only be given in terms of the speaker’s intended
meaning in particular contexts of language use.

The rejection of objectivist semantics as described by Lakoff is another defining characteristic of cognitive
semantics. One ‘doctrine’ Lakoff rejects is the theory of truth-conditional meaning, which holds that truth consists
of the correspondence between symbols and states of affairs in the world. Lakoff also rejects the ‘doctrine’ of
objective reference, which holds that there is an objectively correct way to associate symbols with things in the
world.

The conceptualization that language allows has been essential for us to survive and dominate other less
intellectually-endowed species. It is the abstraction potential of concepts that helps us to navigate the otherwise
chaotic world which surrounds us. Because speaker and hearer share a category [SNAKE], communication
between them has been possible.

Cruse explains how concepts are vital to the efficient functioning of human cognition, and he defines them as
organized bundles of stored knowledge which represent an articulation of events, entities, situations, and so on,
in our experience. If we were not able to assign aspects of our experience to stable categories, the world around
us would remain disorganized chaos. We would not be able to learn because each experience would be unique.
It is only because we can put similar (but not identical) elements of experience into categories that we can
recognize them as having happened before, and we can access stored knowledge about them.

One alternative proposal within the cognitive linguistic framework is called experientialism, which maintains that
words and language in general have meaning only because of our interaction with the world. Meaning is
embodied and does not stem from an abstract and fixed correspondence between symbols and things in the
world but from the way we human beings interact with the world.

Embodiment constitutes a central element in the cognitive paradigm. In this sense, our conceptual and linguistic
system and its respective categories are constrained by the way in which we, as human beings, perceive,
categorize and symbolize experience. Linguistic codification is ultimately grounded in experience: bodily,
physical, social and cultural.

What cognitivist approaches have contributed to linguistic analysis is an account of how general human cognition,
with operations such as comparison, deduction, synthesis, etc., affects linguistic codification on many different
levels. However, these approaches sometimes tend to confuse the explanation of how meanings are understood
with how they are codified. Not all meaning that is understood or expressed through language is actually
linguistically codified. There is a lot of information that does not need to be linguistically codified because it can
be retrieved from knowledge repositories whose ontological organization is already present in the human mind
and is continually updated by new information.

Linguistic models should differentiate between communication and cognition as two different human
functionalities, which, although related, are different systems and should be explained differently when included
as parts of a linguistic model. The consequences of this distinction affect the design of both linguistic models and
their subsequent applications.
UNIT 2

WORDS AND WORD BOUNDARIES

1. INTRODUCTION

As dictionaries consist of words, we need to understand how words are formed.

2. INTERACTION BETWEEN SEMANTICS AND MORPHOLOGY

The semantic analysis of derivation processes addresses the issue of interaction between semantics and
morphological studies. Precisely because a change in meaning is often achieved by a general manipulation of
morphological resources available in each particular language, this connection is worth studying. The exploitation
of morphological analysis for the development of parsers in corpora is a well-known example of this connection.

Morphology is interconnected with all the other disciplines of linguistics. In this respect, Dieter Kastovsky has been
a pioneer of the idea of the important role of morphology in language analysis and studies. In the 1970s, thanks to
him and others, such as his mentor Hans Marchand, morphology broke out of its marginal role and started to
have a more important position in linguistics.

The fact that word-formation studies are connected to diachronic methods of language analysis is obvious: word
formation phenomena happen over time. They are a direct consequence of language change. However, we can
study them in synchrony, because in the present language we have proof of the effects of word-formation
processes in the language. Sometimes it is easy to observe language change:

• Transparent word-formation phenomena > whenever word-formation processes are overtly realized
o drive > driver, driving
• Opaque word-formation phenomena > cases in which we have a terminal word with no overtly realized
derivative morphemes, as in the case of the verb and the noun saw. Which one was first? The noun or
the verb? Marchand called this process zero derivation.

In order to see the derivative chain of non-transparent derivation phenomena, we have two types of analysis:

• A diachronic analysis > by looking at older stages of the language and seeing which of these words
appeared first in texts and/or in lexicographical compilations
• A synchronic analysis > by making use of other devices, normally principles deriving from other areas of
language, such as semantics

In the latter case, we can refer to some of Marchand’s content-dependent criteria:

• Principle of semantic dependency > the word that is dependent for its analysis on the content of the
other pair member is automatically the derivative
• Principle of semantic range > of two homophonous words exhibiting similar sets of semantic features,
the one with the smaller field of reference is the derivative; i.e., the more specific word is the derivative.

Marchand also developed other criteria that linked morphology to other levels of language, such as phonology,
which he called form-related criteria:

• Principle of Phonetic shape > a certain phonetic shape may put a word into a definite word class
o Typical example > endings for nouns: -ation, -action, -ension, or -ment. Thus, verbs ending in -
ment could be considered derived from nouns in -ment
▪ To torment, or to ferment
• Principle of Morphological type > morphological type is indicative of the primary or derived character of
composite words
o It refers to the distinction between single or simple forms, and derived and compound forms
o Examples > blacklist (adj/noun), snowball (noun/noun), screwdriver (noun/noun), whose base
forms are list, ball, and driver, respectively.
• Stress criterion > with composites, stress is sometimes indicative of a directional relationship between
a substantive and a verb.
o Example > compound verbs with locative particles, since they have the basic stress pattern
“middle stress/heavy stress”, such as oùtlíve, ùnderéstimate, ìnteráct, whereas compound
nouns carry the heavy stress on the first element, as outhouse, únderwèar, or úndercùrrent.
Verbs carrying this pattern are called desubstantival verbs.

Additionally, Kastovsky also notices the interconnectivity of morphology and other areas of language. However,
even if morphology is the foundation for many fields of study, nowadays we still see that a model of description
for morphology in certain theories of grammar is under construction.

3. SETTING THE GROUNDS: REVIEW OF BASIC CONCEPTS

The kind of meaning we analyse in words and in word formation is related to the interconnection of form and
meaning; it is the type of linguistic analysis that connects word-structure and function. The word unit was
accurately outlined by Saussure with his famous distinction between signifier and signified, and later on, when
dealing with semiotics, by Ogden & Richards, who represented the process of meaning with their famous triangle of symbol, thought (or reference) and referent.

It is important to note that the meaning of a word is not the object, but the concept that we have of that word
in our mind.

Concept > a concept is the mental image we have of a word, based on our daily experience.

Referent > the external, physical (or mental, if it is abstract) object that the word represents. For instance, for
the word smile, we have a mental image of a smiling face: the concept ‘smile’.

In the case of abstract concepts such as ‘love’ or ‘hate’, we do not have a physical reference, but we may relate
them to our thoughts and/or feelings. In both cases, referents are tied to our experience, and language is thus a
conceptualization of referents, of our external reality.

This is why different languages express different things according to what the speakers of these languages
experience. An example of this is the Spanish word merienda, referring to the snack or meal that is customary
in Spain between 17:00 and 18:00. Notice that the concept of ‘merienda’ is not lexicalized in other languages
such as English. Besides, defining it as an ‘afternoon snack’ can be confusing for an English recipient, for whom
the word afternoon refers to a span of time between 12:00 and 16:00.

Finally, the word is usually identified as a lexical item, so we have to distinguish lexical item from lexeme.

4. WORDS AND WORD BOUNDARIES: LEXEME, STEM, LEMMA

A word is a lexical item. Words are separated by spaces or punctuation signs in written language. Both a word
and a lexeme are composed of morphemes. A word is a concrete unit of morphological analysis.

A lexeme is a basic lexical unit of language, consisting of one or several words, considered as an abstract unit,
and applied to a family of words related by form or meaning. A lexeme is an abstract unit of morphological
analysis in linguistics, which roughly corresponds to a set of forms taken by a single word. An example is run,
runs, ran and running, which are forms of the same lexeme, conventionally written as RUN.
The lexeme is then the set of inflected forms taken by a single word, such as the lexeme RUN including as
members run (lemma), running (inflected form) or ran, and excluding runner (derived term). Likewise, the forms
go, went, gone and going are all members of the English lexeme GO. Thus, word families in their inflectional
system are all part of the same lexeme. Lexeme, thus, refers to the set of all the forms that have the same base
meaning.

A lemma is the canonical form, dictionary form, or citation form of a set of words, i.e., of a lexeme. In English,
for example, run, runs, ran and running are forms of the same lexeme RUN with run as the lemma.

Lemmas and lexemes have special significance in highly inflected languages such as Turkish in order to observe
word-formation and inflectional processes. A lemma is also very important in the process of dictionary writing,
or in the compilation of any kind of lexicological or terminological work. The process of determining the lemma
for a given lexeme is called lemmatisation.
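A minimal dictionary-based lemmatisation sketch in Python (toy data, not a real lemmatiser): every inflected form is mapped back to the lemma that heads its lexeme.

```python
# Toy lexemes: each one lists its lemma (citation form) and its inflected forms.
lexemes = {
    "RUN": {"lemma": "run", "forms": {"run", "runs", "ran", "running"}},
    "GO":  {"lemma": "go",  "forms": {"go", "goes", "went", "gone", "going"}},
}

# Invert the table: every inflected form points to its lemma.
form_to_lemma = {
    form: entry["lemma"] for entry in lexemes.values() for form in entry["forms"]
}

def lemmatise(word):
    return form_to_lemma.get(word.lower(), word)   # unknown words are returned unchanged

print([lemmatise(w) for w in ("Ran", "going", "runs")])   # ['run', 'go', 'run']
```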

4.1 Lemma versus lexeme: on-going discussions

The distinction between lemma and lexeme is the concern of several areas of linguistics. Within the field of
psycholinguistics, Roelofs et al. state that the distinction between lemma and lexeme, known as lemma/lexeme
distinction is controversial. In their theory, the lemma of a lexical entry specifies its semantic-syntactic properties,
and the lexeme specifies its morphophonological properties.

There are two steps involved in the retrieval of lexical items:

• First step > lemma retrieval > the syntactic properties of the word and, in some views, its meaning, are
retrieved from memory
• Second step > lexeme retrieval > information about the morpho-phonological form of the word is
recovered.

However, Caramazza and Miozzo state that the lemma/lexeme distinction is unnecessary, because there is only
one lexical level in the meaning of the word, and not two.

Within the field of lexicology, Kaszubski discusses the status of lemma versus the status of lexeme within the
framework of the building of a corpus database called the CELEX corpus. Kaszubski states that the lemma is the
headword of all lexical units included in a lexical entry, and it contains several types of linguistic information,
rather than just semantic information, as would occur in dictionary compilation.

CELEX is an example of what type of linguistic information a lexicological compilation may include:

• A lemma lexicon > is the most similar to an ordinary dictionary, as every entry in this lexicon represents
a set of related inflected words. In a lexicon, a lemma can be represented by using a headword such as
call or cat.
• A wordform lexicon > it yields all possible inflected words: each entry in the lexicon is an inflectional
variant of the related headword or stem. Therefore, a wordform lexicon contains words like call, calls,
calling, called, cat, cats and so on.
• A corpus lexicon > it gives you an ordered list of all alphanumeric strings found in the corpus with raw
string counts

The lexical data that can be selected for each entry in the different English lexicon types can be divided into
orthography, phonology, morphology, syntax and frequency.
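The three lexicon types can be illustrated with a small Python sketch (toy corpus and a hand-made form-to-lemma mapping; this is not the real CELEX format):

```python
from collections import Counter

# Toy corpus and a hand-made wordform-to-lemma table (both invented).
corpus = "the cat calls and the cats call the cat".split()
to_lemma = {"cat": "cat", "cats": "cat", "call": "call", "calls": "call"}

corpus_lexicon = Counter(corpus)                          # every string with raw counts
wordform_lexicon = {w for w in corpus if w in to_lemma}   # attested inflectional variants
lemma_lexicon = {to_lemma[w] for w in wordform_lexicon}   # one headword per word family

print(corpus_lexicon)     # Counter({'the': 3, 'cat': 2, 'calls': 1, ...})
print(wordform_lexicon)   # {'cat', 'cats', 'call', 'calls'}
print(lemma_lexicon)      # {'cat', 'call'}
```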

4.2 Prandi’s view of lexical meanings: lexemes, the lexicon and terminology

As Prandi explains, lexical meanings have two-fold roots, both formal and cognitive. In particular, he notes that
it is relevant to define to what extent their structure depends on language-specific formal lexical structures,
and to what extent it depends on an independent cognitive categorization shared across different linguistic
communities.

Prandi also contrasts:


• Endocentric concepts > concepts whose identity is critically dependent on specific lexical structures
o It would be related in form to a specific language’s lexical structure
• Exocentric concepts > concepts firmly rooted in an independent categorization of shared things and
experiences, easily transferable across languages.
o It would be more cross-linguistic, precisely due to its higher independency from specific
linguistic categorizations.

An example of an exocentric concept would be the concept of ‘mother’, expressed in different languages as
mum, mamá, mother, mère, moeder, etc. The lexicon related to the concept of ‘mother’ is similar across languages,
as its lexical structure is independent of any specific language and is more related to universal categorizations.

Lexemes that carry exocentric concepts circumscribe within language-specific lexical structures a layer of natural
terminology, which finds its complement in specialized terminological repertories connected with special
purposes. Endocentric concepts, as they are more specialized, provide a useful bridge between natural language
and specialized lexicon, and between lexical and terminological research. Terms (specialized lexicon) connect a
signifier and a signified as any other sign, and their specificity lies in the way they are shared: not by a whole
linguistic community, but by groups containing specialised sectors of different linguistic communities.

5. THE LEVELS OF LANGUAGE ANALYSIS AND THEIR UNITS

There are different linguistic levels of analysis, or fields, which go from more abstract to more concrete, and from
more universal to more idiosyncratic. Furthermore, each of these levels is arranged in a way that the one above
comprises the level below. This organization of linguistic levels is represented in the internal structure of the
clause, and also in the internal structure of the word.

According to the functional paradigm then, Pragmatics, the outermost level of linguistic analysis, is seen as the
framework that embraces Semantics and Syntax. In turn, Semantics is one instrument of Pragmatics, and Syntax
is one instrument of Semantics. That is, we use syntax to convey meanings (semantics), and we use meanings to
communicate ideas in different, more or less subtle, ways (pragmatics). The main idea is that units at all linguistic
levels serve as part of the general enterprise: to communicate meaning.

This also means that meaning is a product of all linguistic levels. Let’s see how.

At the level of Phonology, if we change one phoneme for another we get a different meaning: cat-rat. Thus,
phonemes carry some type of meaning. This system is also attached to meaning in the sense of prosody. When
we utter a sentence, we can transmit different meanings depending on how we utter it.

At the level of morphology, both inflectional and derivational morphemes provide words with specific meaning.

For instance, the word drivers is composed of three morphemes: driv(e)- + -er + -s: the root driv(e)- takes two
bound morphemes, the derivational agentive suffix -er and the inflectional plural suffix -s. The meaning that -er
gives is ‘person who does the action referred to by driv(e)-’; that is, -er gives the meaning of agency at a semantic
level. -er also implies that the resulting derived word will be a noun. Thus, a derivational suffix interacts with
semantics and with grammar, since the derived word will belong to a specific or different grammatical class
depending on the derivational suffix.
As for the inflectional suffix -s, it does not modify the grammatical class of lexical items, but it changes the
grammatical information and the semantic information of the item. Inflectional affixes provide the word with
relevant syntactic information: if instead of one driver we have two or more drivers, the subject becomes plural
and the verb will also have to be modified in order to follow certain concordance rules. Semantically it is also
different to have one agent or to have two or more agents.
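The analysis of drivers can be laid out as a small hand-written table in Python (a fixed toy analysis, not a general morphological segmenter):

```python
# Hand-written segmentation of "drivers" and what each morpheme contributes.
analysis = [
    ("driv(e)-", "root",                "the action of driving"),
    ("-er",      "derivational suffix", "agent noun: 'one who drives'"),
    ("-s",       "inflectional suffix", "plural number"),
]

for morpheme, kind, contribution in analysis:
    print(f"{morpheme:10} {kind:22} {contribution}")
```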

At the level of syntax, two different structures can produce slightly different meanings, as in:

a) John broke the window.
b) The window was broken by John.

We see that the idea in each sentence is the same, but in each sentence the attention is directed to a different
part. In a) it may be emphasized that it was the window that John broke, and in b) it may be emphasized that it
was John who broke the window. Furthermore, the passive voice allows for the omission of the agent, which also
has meaning implications.

Thus, the way elements are organized and expressed in the sentence also carries meaning. Syntax is connected
to morphology and to semantics, and it is realized less uniformly across languages than phonology or morphology.

Semantics, as a linguistic discipline, studies the codification of the meaning that is transmitted among speakers
of a language. It can be approached from a sort of external logical and philosophical perspective or from an
internal perspective:

• External perspective > it can be seen as the study of concepts and of referents and their relations with
the surrounding world and it also has to do with the idea that language is a conceptual phenomenon.
o It is more dependent on culture and on other aspects that relate language to the world, than
syntax
o Semantics is the study of concepts and of referents, and these are directly tied to both the
external world and speakers. From this comes the idea that language is a conceptual
phenomenon.
o Thus, the interpretation of reality is systematized through the discipline of semantics.
• Internal perspective > in the case of lexical semantics, the study of meaning is connected to words and
the internal structure of conceptual phenomena as represented by such words, and it attempts to reach
higher levels of abstraction, that is, universality, as opposed to the idiosyncrasy of the external world.
o In this view, then, semantics is closely related to pragmatics.

Finally, pragmatics is the discipline that studies the connection of speakers to the world, i.e., the study of the
speaker’s/hearer’s interpretation of language. Thus, it is meaning described in relation to speakers and hearers,
in context.

Language as a conceptual system

This view supports the idea that language structures and cognition are related. It means that different
languages sometimes express the same things differently, depending on the real world that surrounds their
speakers. Lakoff exemplifies this with the fact that in some languages there are only three words for colours: red,
black and white, even though their speakers obviously perceive other colours, such as the green of grass.

Around all this there is still an open debate in linguistics, closely linked to the question of how knowledge is
acquired. There are two main positions on this:

• Universalists > holding that there are pure ideas out of which we form concepts and also emphasizing
the idea that linguistic universality is based on our common biological endowment.
• Relativists > hold that we acquire knowledge through experience.
What is universally shared by all human beings is our ‘hardware’: our innate ability for language, to see the
underlying structure of a string of symbols and to extract a meaning from it, and going further, to relate that
meaning to our external world and to our own experiences as human beings. We have evolved more quickly than
all other species thanks to language.

This organization of language levels provides an important principle in functional theories of language: the more
semantically motivated a phenomenon is, the less cross-linguistic variation we find. That is, those lexically and
grammatically codified concepts are more widespread because they are based on our common human cognitive
equipment. On the other hand, those meanings based on contextual aspects are more language dependent. As
a result, the more pragmatically motivated a concept is, the more cross-linguistic variation we find. This takes us
to the distinction of linguistic levels made in functional grammars, usually represented as a set of concentric circles.

The more outside the circle we go, the more variety we find across languages, that is, the less universal languages
are.

The inner circle shows that phonology deals with the smallest elements of language and that it depends on, or is
directly influenced by, syntax, which is, at the same time, influenced by semantics, which, in turn, is influenced
by pragmatics. Pragmatics is the discipline that deals with the meaning of linguistic units in context. That is, it
goes beyond language itself and relates it to the world.

However, in order to complete this circle we should include morphology, which operates at the level of the word,
and thus it should be placed between the level of phonology and the level of grammar.

In the chart below we have an outline of the main objects of analysis for each sub-discipline:
GRAPHICS: It refers to the systematic representation of language in writing. A single unit in the system is called
a grapheme.

PHONOLOGY: it deals with the sounds of a language and the study of these sounds. The study of the sounds of
speech taken simply as sounds is called phonetics. The study of the sounds of a given language as significantly
contrastive members of a system is called phonemics, and the members of the system are called phonemes.

Prosodic features are also studied by phonology: pitch, stress, tempo. They can even indicate grammatical
meaning.

MORPHOLOGY: it is the arrangement and relationships of the smallest meaningful units in a language. These
minimum units of meaning are called morphemes. In greyish, then, we have the base morpheme grey plus the
suffix morpheme -ish.

We have two types of morphemes: free and bound. Grey is a free morpheme and -ish is a bound morpheme,
since grey can appear independently, not as -ish, which cannot. Bound morphemes only form words when
attached to other morphemes. They are called affixes, and we have four types: prefixes, suffixes, infixes, and
circumfixes.

Affixes are basic units in morphology, and they can be inflectional or derivational.
• An inflectional affix indicates a grammatical feature such as number or tense.
o -s used to form plurals
o -ed used to indicate past tense
• A derivational affix usually changes the category of the word it is attached to, and it may also
change its meaning: e.g. the derivational suffix -ive in generative changes the verb generate into an
adjective.

Morphology is also a cross-linguistic phenomenon, in the sense that all languages have lexical items. What is
different is how the lexical items (words) of a language interact with each other and within each other, since this
brings about different language types. This system is interrelated with syntax, in the sense that the
more inflectional a language is, the less need it has to apply syntactic rules. In turn, a language with few
morphological processes tends to give more weight to the role of syntax in arranging and constructing
meaningful sentences.

SYNTAX: Syntax deals with word order and grammatical functions. Word order is a grammatical sign of natural
languages, even if some languages, like English, depend on it more heavily than others do.
Grammatical functions are related to the functions of the different parts of speech (nouns, adverbs, adjectives,
verbs) and to how they can be organized.

SEMANTICS: it is the study of the meanings expressed by a language. It also studies the codification of the
relationship between language and the external world.

All these systems interact in highly complex ways within a given language. Changes within one subsystem
produce a chain reaction of changes among other systems.

As regards the development of a model of morphology, all language models agree that it is an essential part of
language when we have to describe the components of a grammar, which are two: a rule component and a
lexicon which contains the words and morphemes that appear in the structures generated by those rules.
Role and Reference Grammar represents this system explicitly, as explained below.

In certain linguistic models such as Role and Reference Grammar (RRG) it is argued that grammatical structures
are stored in constructional templates, which can be combined with other templates and which contain
morphosyntactic, semantic and pragmatic properties. These templates are grammatical structures present in all
languages, and they are stored in the syntactic inventory of each specific language. These models also include a
lexicon, formed by lexical items and stored in the lexical inventory. The English language in particular has five
basic syntactic templates, which can combine to form more complex structures.

UNIT 3 – ON THE ARCHITECTURE OF WORDS

1. INTRODUCTION
What is morphology? The term morphology is attributed to the German poet, novelist, playwright, and
philosopher Goethe. In linguistics it is the branch of the discipline that studies the components of words: their
internal structure, and how they are formed.

2. WHAT IS THE SCOPE OF MORPHOLOGY? HOW TO BREAK DOWN A WORD

Morphology is the study of the internal structure of words (word grammar) and of words’ behaviour (word
formation phenomena). Words are composed of morphemes. A morpheme is the smallest linguistic unit with a
grammatical or semantic function.

A morpheme can consist of one word, such as way, car, ball, or a meaningful part of a word, such as the -ed of
played, which indicates that the verb play refers to the past. None of these morphemes can be divided into smaller
parts.

Morphemes can be:


• Free morphemes > those occurring independently, such as ask or drive
• Bound morphemes > those that have to be attached to other morphemes in order to exist. They are
also called affixes.
o Affixes that appear before the stem or root > prefixes
o Affixes that appear after the stem or root > suffixes

For example, in the noun disagreement both dis- and -ment are affixes, attached to the stem agree.

An interesting discussion arises when dealing with locative particles that can be considered as affixes. This is the
case of under, for example. Under- would not be considered a prefix here, but rather a free morpheme that
combines with a base, in order to form a compound. This is because it can occur alone: under.

The limits between compounding and derivation can sometimes blur, especially in the case of locative particles.
In order to see whether we should consider a word as compound or derivative, we can apply two criteria:

1) Whenever the particle can occur independently (i.e. as a free morpheme), we have a case of compounding.
When the particle cannot occur alone, we are faced with a case of derivation. In this light, we could say that
underestimate is a compound, but unnecessary is a derivative, because un- cannot occur as a word on its own.

2) The second criterion is semantic. In the case of the verb overtake, over- has lost its literal locative meaning as
in over. Overtake does not mean ‘to take someone over something’. Here, we can say that over- is indeed a
prefix, not a free morpheme.

Therefore, prefixes are semantically very important for the meaning of a word.

The basic meaning of the word is given by its centre, which is in fact the nucleus, and the elements to its right
and left provide meanings that build on the nucleus: this is the core. Additionally, there can be other layers
encapsulating the meaning of the core, where the affixes’ meaning modifies that core. This implies that, in this
view, word structure has the same organization as the clause, and as language itself.

Let’s see this in this example > the derivate unpremeditatedly > [3.un-[2.pre-[1.meditat(e)]2.-ed]3.-ly]

Step 1. Nucleus: meditate: ‘to think about something deeply’

Step 2. Core: pre-………-ed > the prefix pre- and the suffix -ed affect and modify the nucleus, meditate.

Step 3. Periphery: un-…………….-ly > here, again, the (semantic and grammatical) meaning of the affixes
(grammatical only in the case of the suffix) exerts an influence on the core, premeditated, by modifying its
meaning. Thus, the negation prefix un- negates the meaning referred to by the core, and the suffix -ly changes
the grammatical class of the core from adjective or participle to adverb, and also its semantics, since -ly is an
adverbial suffix that indicates manner.
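
The layered analysis above can be made concrete with a small sketch in code. The following Python fragment is a purely illustrative toy (it is not part of any cited model, and the e-deletion rule is a simplification added for this single example); it builds the word up layer by layer, from nucleus to core to periphery.

# Layered build-up of "unpremeditatedly", mirroring the bracketing
# [3. un- [2. pre- [1. meditat(e)] -ed] -ly] discussed in the text.
def attach(prefix: str, base: str, suffix: str) -> str:
    """Join affixes to a base, dropping a final silent -e before a vowel-initial suffix."""
    if suffix and base.endswith("e") and suffix[0] in "aeiou":
        base = base[:-1]
    return prefix + base + suffix

nucleus = "meditate"                       # step 1: nucleus
core = attach("pre", nucleus, "ed")        # step 2: core      -> 'premeditated'
periphery = attach("un", core, "ly")       # step 3: periphery -> 'unpremeditatedly'
print(nucleus, core, periphery)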

Generally, the more derived a word is, the less frequently it is used. As a result, we have words such as
unpremeditatedly that are not even recorded in all dictionaries.

Word frequency is interesting because it reflects the degree of derivation of a word: the more frequently used a
word is, the more primitive it tends to be. It also shows which affix comes first in the derivative chain. Thus, for
unpremeditatedly, what was added first? The rule says that it is the prefix that comes first, but it may not
necessarily be so. If we want to make sure of the derivative process of a word we can refer to corpora. The more
results we obtain for a word, the more frequently used that word will be, and thus the more primitive.
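
If a plain-text corpus is available, this kind of frequency check can be automated in a few lines. The sketch below is generic: the file name corpus.txt and the word list are placeholders, not references to any particular corpus.

# Count how often each member of a derivational family occurs in a plain-text corpus;
# higher counts suggest a more "primitive" (less derived) member.
from collections import Counter
import re

family = ["meditate", "premeditate", "premeditated", "unpremeditated", "unpremeditatedly"]

with open("corpus.txt", encoding="utf-8") as f:        # placeholder corpus file
    tokens = re.findall(r"[a-z]+", f.read().lower())

freq = Counter(tokens)
for word in family:
    print(f"{word:20s} {freq[word]}")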

Meditate, the base word, has the highest frequency of occurrences. The second most frequent word is
premeditated. Thus, the derived word has become more frequent than the base word (also called stem), which
is a less common situation.

In the second process of derivation, known as recursion or recursivity (when predicates are further derived by
means of the attachment of another prefield or postfield predicate), premeditated derives into unpremeditated.
It is therefore the most frequent word of all those undergoing recursion.

Summarizing: with the exception of premeditated, the more recursion, the less frequency of use.

There are other types of affixes, less often seen (and non-existent in English), called infixes and circumfixes.

• Infixes are affixes that are not bound to the right or the left of a word, but that occur in the middle
o An example is the Tagalog infix -um

• Circumfixes are affixes that come in two parts: one attached to the right of the base and the other to
the left
o One example is the Indonesian ke-………-an in the form kebesaran
o In English there are no cases of circumfixes.

Another important term in morphology is morph or allomorph, which refers to the specific phonological
realization of a morpheme. For example, the plural morpheme -s is realized as [z] after the voiced final of play,
and as [s] after the voiceless /t/ of cuts. The realization of one allomorph or the other depends on the voicing of
the final phoneme of the stem.
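
This conditioning can be stated as a small rule: roughly, [ɪz] after sibilants, [z] after other voiced sounds, and [s] after voiceless sounds. The toy function below approximates the rule from spelling, purely as an illustration, since real allomorph selection operates on sounds rather than letters.

# Simplified choice of the English plural allomorph, approximated from spelling.
SIBILANT_ENDINGS = ("s", "z", "x", "sh", "ch")
VOICELESS_FINALS = set("ptkf")             # voiceless, non-sibilant final consonants

def plural_allomorph(stem: str) -> str:
    if stem.endswith(SIBILANT_ENDINGS):
        return "[ɪz]"                      # e.g. bus -> buses
    if stem[-1] in VOICELESS_FINALS:
        return "[s]"                       # e.g. cut -> cuts
    return "[z]"                           # e.g. play -> plays (voiced final)

for word in ("play", "cut", "bus"):
    print(word, "->", plural_allomorph(word))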

In this regard, stem is also a basic concept. It is the basic unit to which another morphological piece is attached.
Aronoff and Fudeman state that it has to be distinguished from the root, which is like a stem in that it constitutes
the core of the word to which other pieces attach, but it refers only to the morphologically simple units. Let’s see
an example:

disagreement < disagree < agree


Agree is the root of both disagree and disagreement. Likewise, agree is also the stem of disagree, but not of
disagreement. The stem of disagreement is disagree. Thus, they distinguish between simple stems (agree) and
complex stems (disagree). One stem can be realized phonetically by different forms. The abstraction over these
realizations, independent of all its allomorphs, is the stem. Thus, in the same way that a morpheme such as -s
has different allomorphs (-s, -es, etc.), a specific stem can have different allomorphs. This is what we call stem
allomorphy, as in Spanish muevo [muev-], from mover. This type of classification also allows us to account for
irregular patterns, such as the English more/-er/better allomorphy, which is called root allomorphy or irregular
allomorphy in Distributed Morphology.

3. LEXICAL CHANGE AND WORD FORMATION PROCESSES

3.1 Introduction

Word formation processes are common to all languages. What changes cross-linguistically and diachronically is
the frequency of occurrence of one type of process or another. Thus, during the Old English period, the most
common word formation processes were derivation and compounding, whereas present day English shows a
higher tendency for other word formation phenomena such as borrowing and acronymy.

3.2 Word formation processes

Morphological relations such as the process of derivation involve semantic meaning. Thus, strong-strength-
strengthen show the same formal relationship as long-length-lengthen. However, English does not always show
a correspondence between a formal process of derivation and the semantic relationships that result from these
processes. As an example, deep/depth/deepen as opposed to silent/silence/silence or beautiful/beauty/beautify.

3.2.1 Compounding

Compound words are composed of at least two base morphemes (i.e., free morphemes): screwdriver,
skyscraper, landowner, etc.

Would, in this sense, taxi driver be considered a compound word? No, because it is composed of two separate
lexical items.

Compound words can undergo further word formation processes, such as inflection (screwdrivers), and the base
words that are joined together do not necessarily have to be “primitives” (i.e., as yet non-derived). Thus, the
word formation process that has taken place in screwdriver is:

(screw) + (drive + -er > driver) > screwdriver

A question arises with compounding: if two of the morphemes are free, which of them is the head-base of the
word? Prototypically the head of the word is the base that provides the grammatical category for the compound
word, that is, the free morpheme located on the right.

Another criterion is the semantic one: The head is the word that provides most of the semantic content related
to the definition of the output. Since screwdriver refers to an instrument, the head of this compound is driver,
and screw is the adjunct that modifies the meaning transmitted by driver, indicating what type of tool it is: an
instrument used for screws (object, theme).

3.2.2 Inflection

Inflectional morphemes (which, in the case of English, are always suffixes) indicate grammatical features such as
number or tense.
They are directly linked to the syntactic structure of the clause: if a subject has an added suffix -s, it means that
the subject is a plural noun (phrase), so the verb has to agree with this plural form and be used in plural form.
Likewise, if a verb in English ends in –(e)s it means that the subject is singular and refers to a third person, so a
third person singular form has to be selected.

The inflectional system of a language is very important for language typology in order to classify languages into
different types. The genetic (or genealogical) classification is based on the idea of ‘language families’, that is to
say, on the assumption that some languages are closely related because they derive from a common ancestor.
Italian, Spanish, French, Portuguese, and Catalan are all members of the same family of Romance languages.
They all come from Latin.

The second type of classification is the typological, which is based on a comparison of the formal similarities that
exist between languages. It is an attempt to group languages according to their structural characteristics
regardless of whether they have a historical relationship. These formal similarities are mainly based on the
inflectional system of the language.

3.2.2.1 Typological classification of languages

Pyles and Algeo established four kinds of languages, according to their inflectional system.

Isolating, analytic or root languages

An isolating language is one in which words tend to be one syllable long and invariable in form. Each word has
one function. Words take no inflections or other suffixes. The function of words in a sentence is shown primarily
by word order, and also by tone. An example is Chinese.

Agglutinating or agglutinative languages

An agglutinative language is one in which words tend to be made up of several syllables, and each syllable has
one grammatical concept. Typically each word has a base or stem and a number of affixes. An example of such a
language is Turkish.

Inflective or fusional languages

Inflective languages are like agglutinative ones in that each word tends to have a number of suffixes. However,
in an inflective language the suffixes often show great irregularity by varying their shape according to the word-
base to which they are added. Also, a single suffix tends to express a number of different grammatical concepts.
An example is Latin.
Polysynthetic or incorporating languages

An incorporating language is one in which the verb and the subject or object of a sentence may be included in a
single word. What we would think of as the main elements of a sentence are joined in one word and have no
independent existence. For example, Eskimo.

3.2.2.2 The index of synthesis of a language

These four language types are mere abstractions. No real language belongs solely to any one type. For example,
English makes use of invariable monosyllables like to, for, when, not, and relies on word order to signal
grammatical meaning, which is typical of isolating languages. However, the existence of paradigms like ox-oxen,
show-shows-showing-shown, good-better-best is typical of inflective languages. Also, words like activist, which
are built by adding suffixes one by one to a stem (act, active, activist), are characteristic of agglutinative
languages. And finally, forms like baby-sitting or horseback-riding are the hallmark of an incorporating language.

So, what type does English belong to? It is more realistic to speak of tendencies, that is, to say that a given
language is more analytic than synthetic, or that it tends to be highly incorporating or highly agglutinative. It can
be said that English was originally closer to a synthetic language but has gradually evolved towards analyticity.

The problem is how to measure these tendencies. Greenberg provided a measurement tool, which he called “the
index of synthesis”. A low index of synthesis tells us that the language is analytic (Chinese). A higher index of
synthesis characterizes the language as synthetic. Agglutinative and inflective languages are both synthetic. A
very high index of synthesis identifies the language as being polysynthetic or incorporating.
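
Greenberg’s index of synthesis is usually computed as the ratio of morphemes to words in a text sample: values close to 1 point to an analytic language, higher values to a synthetic one. A minimal sketch, assuming the morpheme segmentation of the sample is already given (automatic segmentation is a separate and much harder problem):

# Index of synthesis = number of morphemes / number of words in a sample.
# The morpheme counts below are hand-segmented purely for illustration.
sample = [("the", 1), ("farmer", 2), ("kills", 2), ("the", 1), ("ducklings", 3)]

morphemes = sum(count for _, count in sample)
words = len(sample)
print(f"index of synthesis = {morphemes / words:.2f}")   # 9 / 5 = 1.80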

3.2.3 Derivation

Derivational suffixes change the category of the word they are attached to, and they may also result in a change
in meaning. Example: -ive > generative; -less > joyless, colourless.

Types of derivation

Derivation occurs in different ways:


• adding affixes > addition, through prefixation, suffixation or infixation
• removing a morpheme > reduction or subtraction
• muting a sound or changing the vocalic structure of a syllable > mutation
• adding nothing > conversion or zero derivation

More than one of these processes can take place at the same time, such as mutation and addition.
As prefixes are on the left, they do not modify the word class of a word. Thus, prefixes only affect the semantic
meaning of the word, not the grammatical meaning, as suffixes can. Derivational suffixes, by contrast, provide
semantic meaning to the lexeme but also add grammatical meaning.

Derivational suffixes can be classified into different types according to the category or word class they are
attached to, and the grammatical word class they give rise to. Let’s see an example:

It is important to note that derivational suffixes can have different grammatical functions (and therefore
meaning). Thus, what looks like a single morpheme can in fact be two morphemes. An example is -ly: if we attach
-ly to an adjective the resulting word will be an adverb (quick > quickly), but if we attach -ly to a noun the result
is an adjective (love > lovely). Thus, there are two suffixes -ly: one that produces adverbs and another one that
produces adjectives.

3.2.4 Other word-formation phenomena

Mutation > the words proud and pride are both semantically and formally related, but it is impossible to say that
one is formed because something was added to the other. Rather, derivation is accomplished here by a change
of a vowel; in other word pairs the change may be in consonants, as in believe and belief, or both vowel and
consonant, as with choose and choice; or by change of stress, as in verbs extráct, insúlt, progréss, in contrast to
nouns éxtract, ínsult, and prógress.

Conversion, zero change or zero-derivation > this is a simple change of a word from one class to a word of
another class with no formal alteration. Another term to identify this phenomenon is functional change.

With regard to the productivity of zero-derivation, Marchand notes that English is more productive in it than
other languages such as Latin, French, Spanish and German, especially in denominal verbs.

Examples of zero-derivation in English are clean, dry and equal, which are adjectives and also verbs. Fan, grasp
and hammer are verbs and also nouns; capital, initial and periodical are nouns and adjectives.

Subtraction or reduction > reduction is a word formation process by which new lexemes are formed by removing
parts of certain lexemes.

Acronym is a word derived from the written form of a construction; a construction is a sequence of words that
together have a meaning. The acronym is typically, but not always, formed from the first letter of each written
word. With a few exceptions, acronyms are essentially names.

• Some acronyms are pronounced as a sequence of letters: UK for United Kingdom


• In other acronyms the letters combine to produce something pronounceable: AIDS for Acquired Immune
Deficiency Syndrome.

Clipping is the use of a part of a word to stand for the whole word. Laboratory is abbreviated to lab, telephone
to phone. However, clipped forms sometimes come to have meanings that are distinct from the original sources.
Divvy is used as a verb whereas dividend is a noun. In other cases, the clipped form may have a connotation
different from the source word: hankie, undies are ‘cuter’ than handkerchief, underwear respectively.

Another form of reduction is blending, such as brunch (breakfast + lunch), workaholic (work + alcoholic).
In other stages of the English language the phenomena of word reduction took place at the level of phonology
(i-mutation, umlaut). There are even cases of internal sound loss, within the stem. This loss may either be related
to vowel length, or to the change of a diphthong to a short vowel. In both cases we have a reduction in vowel
quantity.

3.2.5 Transparent and opaque word formation phenomena

When one word is formed by adding a suffix to another, such as paint-er, or by subtracting one or more elements
from another, such as in prep from preposition, the direction of derivation is clear.

But the direction is unclear when the process is mutation or conversion. These are instances of opaque word
formation phenomena. In these cases there may be a problem in deciding what is derived from what: which
comes first, the noun hammer or the verb to hammer, the noun kiss or the verb to kiss?

One general principle is to go from concrete (more primitive) to more abstract meanings (more derived). The
noun hammer denotes a concrete object, the verb hammer means the use of that object; we can say, then, that
the verb is derived from the noun.

In numerous instances, however, we may have to be arbitrary in saying which the primary or basic word is and
which is the derived word. Other principles were given in chapter 2. Opaque phenomena are zero-derivation
phenomena, and transparent phenomena are the rest.

Lexical change takes place when new words appear, or when words that existed before are not used any longer.
Notice that these changes are different from semantic changes, in that in semantic changes the same word
undergoes a change in its meaning. Lexical changes are more idiosyncratic (that is, they are more tied to culture),
so here we will find less regularity.

4. ON WORD GRAMMAR: APPROACHES TO THE STUDY OF MORPHOLOGY

4.1 Introduction

Morphology deals with the internal structure of words: with their form and meaning. We have word grammar,
as opposed to the notion of sentence grammar. That is, it is a set of correspondence rules between forms and
meanings of words, and thus word families are sets of related words. The morphological system has the following
sources:
• Complex words > word formation
• Borrowing
• Phrases becoming words > univerbation
• Word creation > word manufacturing: different from word formation in that in word formation the
meaning of the new word is recoverable from that of its constituents; word creation is typically an
intentional form of language use.

4.2 Paradigmatic and syntagmatic relationships


A paradigm is a set of linguistic elements with a common property. From the paradigmatic perspective,
morphology is seen as the systematic study of form-meaning correspondences between the words of a language.
From the syntagmatic perspective, morphology is seen as the study of the internal constituent structure of words.

An example of syntagmatic relationships would be the relationship that holds between the and book in the book.
An example of paradigmatic relationships would be a and the when added to book; we can say either the book
or a book, but not *the a book.

In this light, there are two main approaches to the study of morphology. The first is the syntagmatic approach.
Its main features are:

1. Morpheme-based morphology: the focus is on the analysis of words into their constituent morphemes.
Thus, morphology is the set of principles to combine morphemes into words.
2. Affixes are represented as bound lexical morphemes, with their own lexical entry.
3. Morphological use of the notion of head
4. Morphemes: minimal linguistic units with a lexical or grammatical meaning.

The challenge to this approach is non-concatenative morphology. This refers to the existence of morphological
operations that do not consist of the concatenation of morphemes. An example is the past tense of irregular
verbs in English, which is not formed through adding -ed to the base of the verb, but through alternations. This
pattern can only be expressed in paradigmatic terms.

The second approach is the paradigmatic approach to morphology. The justification of the existence of this
approach is the existence of paradigmatic cases of word formation, where by replacing one constituent with
another, a new word is formed. For instance, the feminine word for policeman is policewoman (analogy). The
main features of this approach are:

1. Lexeme-based morphology > lexemes are the starting point of morphological processes
2. The creation of new words is primarily the extension of a systematic pattern of form-meaning
relationships to new cases. Once we discover the abstract systematic pattern behind the words, we can
extend the pattern to others. For example: from eater to swimmer.
3. Bound morphemes like -er do not have lexical entries of their own, but exist as part of complex words
and of abstract morphological patterns such as pattern [x]v: [x-er]N “one who Vs”, as in swim: swimm-
er. Patterns are useful for predicting the correct combinations of affixes and stems.
4. Language acquisition > in mother tongue acquisition, one has to discover the existence of morph
patterns.
5. Syntagmatic interpretation > a pattern is a morphological rule for the attachment of bound morphemes.
Morphological rules must be lexeme-based because this allows poly-morphemic lexemes to function as
the bases of word formation.
6. The assumption of this affix-specific rule implies that bound morphemes do not exist as lexical items on
their own, but as part of morphology rules.
7. We can also express these patterns in the form of templates.
8. Semantics > the meaning of a complex word is a compositional function of the meaning of the base word
and its morphological structure. The category determining the nature of an affix is expressed in that
template. They reflect the way in which language users acquire the morphological system of a language.

Both approaches may lead to the same analysis of word structure. The main difference is that in the lexeme-based
approach the bound morpheme -er has no lexical category label of its own, and it is not a lexical entry.

4.3 Some recent proposals for a model of morphology in grammar

The first time that the idea of word grammar was introduced to talk about morphological processes was within
the generativist school, under the influence of American Structuralism. Within it, there are two different
positions: the lexicalist hypothesis and distributed morphology.
Distributed Morphology is a model within the generative tradition opposed to the lexicalist hypothesis in some
theoretical aspects dealing with morphology. For instance, it is held that every morpheme occupies a slot in the
functional structure of the clause (morpheme-based approach), whereas in the lexicalist hypothesis it is held that
complex derived forms are inserted into the syntactic structure (lexeme-based approach) and that morphology
does not reduce to morphosyntax.

Within distributed morphology, the structure of a sentence is a structure that has morphemes as its terminals,
instead of words. In distributed morphology the only unit of “insertion” is thus the morpheme. Idiomatic or non-
compositional meaning is consequently handled in a different way.

On the other hand we have the lexicalist hypothesis, in which both words and phrases have syntax, which is not
the same as clause syntax – that is, they have parts, and there are rules and principles for putting the parts
together and for determining the properties of the resulting constructs.

The example given by Williams, How completeness?, justifies the fact that syntax cannot look into morphology
and vice versa. He says that we know that with how we need an adjective, not a noun, so it is the external
property of the word, its category, which gives us this information. That is, outer morphological properties of
words are related to syntax, and inner properties are related to morphology. This has an important implication,
the modular conception of language. This gives way to a principle called the Hypothesis of Lexical Integrity.

In functional grammar, the same system of analysis and representation is used, by which word syntax works in
the same way as phrase/clause syntax. This does not mean that, when we pay attention to the properties of a
phrase such as how completeness, the inner properties of the adjective can play a role in the phrase. The fact
that the outer property of the word, that is, its category, is important for the clause shows that there is
interaction. This communication and interaction among levels is also acknowledged in the lexicalist hypothesis,
but the rest of its ideas are denied.

In Ibañez Moreno’s view, generative theories and functional theories do not seem to agree, but this is just
because they focus on different aspects. The lexicalist hypothesis does not pay attention to the fact of whether
the system of syntactic phrase analysis can be applied to smaller linguistic units; it only focuses on the analysis
of inflectional and derivational affixes and especially on the role they play in the clause. Their focus of attention
is not, therefore, the same.

In her view, the clause and the word work with similar principles, but they correspond to different levels of
language, and thus each level requires its own specific analysis. Thus, we have to apply the same principles at
different levels but not mix them, with the exception of those outer properties that act as linkers between levels:
in the case of morphology and syntax, gender and category are the morphological properties that are realized
syntactically.

Thus, there is a clear interaction of all levels of grammar. Also the idea of word syntax is originally generative, as
taken from American Structuralism. The difference between functional models and generative models is that
generative models do not apply the same principles of analysis in the word and in the clause, since it is held
that, even though there may be some interaction among them, the different modules of language are
independent, with their own internal mechanisms. Furthermore, functional theories are monostratal, while
generative theories are multistratal. There are, nonetheless, some ideas that may contribute to the development
of a functional model, as present in the lexicalist hypothesis.

Finally, some remarks need to be made on the similarities and differences between the generative (or
transformational) and the functional frameworks. The differences between them lie mainly in the degree of
interaction they admit and, above all, in the descriptive and explanatory model used to account for this interaction.
What is different is that in functional grammars the same principles are applied to the analysis of words and of
phrases and clauses, and in that they focus on different issues. Generative theories focus on the analysis of
inflectional and derivational affixes and on the role they play in the clause. Functional grammars focus on the
interaction of these affixes with the other word constituents and also with the clause. Their attention is not then
focused on the same topics.

UNIT IV – WHAT IS LEXICOGRAPHY?

1. OBJECTIVES

Firstly, to review some basic notions related to the writing of dictionaries and the problems involved in the
definition of a lexical entry. Secondly, to review the links between the shape of a dictionary entry and different
theories of meaning.

2. INTRODUCTION TO THE WRITING OF DICTIONARIES

Wolfgang Teubert’s lament about the difficulty of pinning down word meaning when trying to write dictionaries
is:

When it comes to word meaning, we are in dire straits. Native speakers understand the meaning of
words of their language. But they are less competent in describing these meanings. This incompetence
seems to be shared to some extent by lexicographers. Whatever the reason may be, this may explain
why linguistics as we know it has been preoccupied with grammar. Rules are more elegant than the
intricacies of meaning.

3. MEANING AND DICTIONARY ENTRIES: WHERE MEANING ABIDES

3.1 Lexicography and linguistic theory

Apresjan explains how understanding lexicography as closely related to linguistics is of paramount importance.
He bases his lexicographical work on five principles:

1. Reconstructing the pattern of the conceptualizations underlying lexical and grammatical meanings of a
given language.
2. Integrating linguistic description in dictionary entries so that entries in a dictionary are sensitive to
grammatical rules.
3. Searching for systematicity in the lexicon as manifested in various classes of lexemes, such as searching
for lexicographic types, lexico-semantic paradigms, regular polysemy, etc.
4. Emphasizing meticulous studies of separate word senses in all of their linguistically relevant properties.
5. Formulating rules governing the interaction of lexical and grammatical meanings in the texts.

As Apresjan exemplifies, what different languages codify when they put a somewhat vague concept into words
is not the same in all languages. This is the first lexicographic problem. It is important to distinguish whether the
codification of a certain meaning in a certain language is achieved through the grammaticalization or the
lexicalization of such a concept or idea.

His second principle states that every linguistic description is ultimately made up of a grammar and a dictionary.
This has to do with the idea that ontologies, grammars and descriptive algorithms are closely related. This is why
the study of ontologies is based on the final reference to a set of entities and a set of relations that link those
entities. This is in line with Apresjan’s requirement of coherence and compatibility between grammars and
dictionaries.

Apresjan defines lexicographic types as a group of lexemes with a shared property or sets of properties,
necessarily semantic, which are sensitive to the same linguistic rules and which should be therefore uniformly
described in the dictionary.

He exemplifies this property by comparing factive (know) and putative (think, believe, consider) verbs, because
in English neither class of verbs can occur in the progressive form. However, he also illustrates how the
syntactically codified differences between the two lead to certain lexicographical differences that, again, can be
interpreted as lexicographical rules that help dictionary writing.

His third semantic principle refers to lexico-semantic paradigms. Because the organization of the lexicon consists
of grouping word classes which have a common core meaning and which feature predictable semantic
distinctions, lexico-semantic paradigms can be described. Apresjan’s example here refers to converses. Such
terms denote the same situation seen from different perspectives. Because of this, such terms assign different
syntactic ranks to the “actants” and may therefore enforce different theme-rheme articulations. Verbs such as
buy, sell, pay, or cost have different syntactic organization, depending on where the highest syntactic rank is
placed according to what is made more relevant in each case.

Apresjan’s third lexicographical principle requires that all salient lexical classes be fully taken into account and
that all their linguistically relevant properties be uniformly described in a dictionary.

And this leads to his fourth principle that features what he describes as a lexicographical portrait: an exhaustive
characterization of all the linguistically relevant properties of the lexeme, with particular emphasis on the
semantic motivation of its formal properties. However, he claims that a certain property is considered to be
linguistically relevant if there is a rule of grammar or some other general rule that accesses this property.

As for his last principle, it refers to the interactions of meanings in the text and the subsequent need to identify
the so-called projection rules, semantic amalgamation rules, and the like.

3.2 Lexicology and Lexicography

There is a distinction between lexicological and lexicographical approaches to meaning. Most authors recognize
a division between a more theoretically oriented side of the study of lexicon, closely related to semantics on the
one hand, and the application of this type of knowledge on the other. For Ullman, lexicology together with
phonology and syntax are the three structuring disciplines of linguistics, whereas lexicography constitutes a
special technique related to the writing of dictionaries.

Additionally, we must also distinguish between lexicology and terminology: whereas lexicology is the collection
of ‘all’ the words of a language and all their meanings, terminology deals with subsets of the lexicon (special
languages, that is, LSP).

Terminology, in turn, also has to be distinguished from terminography, which refers to the methodology followed
in the collection of terms. It is then, a special technique related to the collection of terminologies.

Butler says that the art and craft of lexicography is concerned, at its most basic level, with the description of a
given language, and he also adds that lexicography is concerned with the description of all or some of the words
of one or more languages in terms of their characteristic features, notably their meaning.

On the other hand, Porto Dapena thinks that lexicography can really be understood as an art or a technique
leading to the writing of dictionaries. He claims that the architecture of a dictionary is determined by the order
imposed on its entries and that the alphabetical ordering is the most widespread and practical.
Sydney Landau explains the different ways of approaching meaning that linguists and lexicographers have:
“Whereas philosophers are concerned with the internal coherence of their system, lexicographers are concerned
with explaining something their readers will understand”.

He also says that an important variable affecting the writing of dictionaries is the kind of format in which they
are written. The only possibility until quite recently was the book format. However, electronic format is now
becoming at least as important as the traditional book format.

There is a well-known difference in the way in which meaning is handled in dictionaries and thesauri:
semasiologically and onomasiologically. The first starts from lexemes and the second from concepts. Halliday
proposes two principal methods of describing words in the sense of lexical items. One method is writing a
dictionary and the other writing a thesaurus.

• In a thesaurus, words that are similar in meaning are grouped together. For example, all words that are
species of fish.
o There is no separate entry for each word; the words appear simply as part of a list, and it is the
place of a word in the whole construction of the book that tells you what it means.
• In a dictionary, words are arranged so that you can more easily find them, that is, in “alphabetical order”.
Therefore, the place where a word occurs tells you nothing about its meaning.
o In a dictionary each entry stands by itself as an independent piece of work.

Aitchison et al. define a thesaurus as a controlled list of terms linked together by semantic relationships. That is,
it is the vocabulary of a controlled indexing language, formally organized so that the a-priori relationships
between concepts are made explicit. Thesauri can be monolingual or bilingual. We can also find specialized
thesauri, depending on one’s specific interests, such as Old English, wine making or architecture.

A restricted type of dictionary is called a glossary. A glossary is an alphabetical list of terms in a particular domain
of knowledge with the definitions of those terms. Traditionally, a glossary appears at the end of a book and it
includes terms within that book that are either newly introduced or at least uncommon.

3.4 Types of dictionaries

The criteria used to classify the different types of dictionaries include a number of aspects. There is a broad
division between:
• terminological dictionaries, which deal with specific areas of knowledge (also called Language for
Specific Purposes or LSP)
o these are restricted to specific fields and subfields
o they form a whole subarea in dictionary writing devoted to terminology, closely linked to its
computer treatment via corpus analysis
• linguistic dictionaries

A taxonomy of linguistic dictionaries can be carried out according to a number of points of view:

1. A time perspective would divide dictionaries into diachronic and synchronic dictionaries.
2. The treatment of entries includes considering their volume, extension, way of ordering them, and type
of entry format.
3. The linguistic level, which refers to the linguistic differentiation among use, norm and discourse.
4. Whether the dictionary is monolingual or bilingual.

Lexicographers put special emphasis on the selection of a lexicographic corpus and Porto Dapena refers to it as
the database used to write the dictionary. Obviously the kind of texts selected will affect the character of the
dictionary. A reference file and a corpus file are two important and interrelated sources of information.
As Sterkenburg explains, there are many types of dictionaries of all kinds, but the approach taken by him is to
find a definition of a prototypical dictionary, such as

… the alphabetical monolingual general-purpose dictionary. Its characteristics are the use of the same
language for both the object and the means of description, the supposed exhaustive nature of the list
of described words and the more linguistic than encyclopaedic nature of the knowledge offered.

This monolingual general-purpose dictionary can also be defined in the following terms: it contains primarily
semasiological data, gives a description of a standard language, and serves a pedagogical purpose.

3.5 Dictionary entries

Defining what a word is takes us back to the difficulties of pinning down the meaning of words.

The selection of a type of lexical entry is an important decision for lexicographers. It includes a definition of the
type of lexical unit to be used. Thus lexical units or lexical items, as basic organizational elements in dictionaries,
must be defined.

However, depending on the language, this is a more or less difficult task since word boundaries are more clearly
identifiable in some languages rather than in others. For example, Halliday explains how defining “words” and
saying whether take, takes, took, or taking are one or more “words” is, in English, an easier task compared with
Chinese where word boundaries are less clearly marked. Nevertheless, there is a general concept underlying this
diversity. That is the concept of lexical item. According to him every language has a vocabulary, or ‘lexicon’ that
forms one part of its grammar.

Therefore, deciding what to include and what to leave out has to do with the theoretical configuration of a lexicon
and with the practical limitations of a dictionary of a prefixed type. The nature and segmentation of words, the
type of subentries, and whether or not to include fixed expressions or phrasal verbs, for example in the case of
English, also affect the selected type of entry. As a result, it is important to be able to choose among a series of
criteria.

Porto Dapena has grouped these criteria into:


• Extra-linguistic criteria > purpose, size, prejudice…
• Linguistic > frequency, correction, contrast…

Lemmatization or heading is particularly important and must also follow a certain convention that the
lexicographer should make as explicit as possible. The fact of deciding between cases of homonymy and polysemy
is another point to be taken into account in the configuration of the entries in a dictionary.

The layout of the definiens is an essential part of dictionary making as well. As Landau puts it:

The traditional rules of lexical definition based on Aristotle’s analysis demand that the word defined be
identified by genus and differentia. That is, the word must first be defined according to the class of thing
to which it belongs, and then distinguished from all other things within that class. Thus child is a person
(genus) who is young or whose relation to another person is that of a son or a daughter (differentia).

The selection of the criteria used to decide which elements to include in the definiens and which ones to leave
out is ultimately a linguistic decision. Halliday explains how entries are organized as follows:

• The headword or lemma, often in bold or some other special font
o The lemma is the base form under which the word is entered and assigned a place: typically,
the ‘stem’, or simplest form (singular noun, present/infinitive verb, etc.)
• Its pronunciation, in some form of alphabetic notation
o Indicated by the International Phonetic Alphabet in a broad phonetic description
• Its word class (‘part of speech’)
o One of the primary word classes > in English, usually verb, noun, adjective, adverb, pronoun,
preposition, conjunction, determiner/article
• Its etymology (historical origin and derivation)
o It may include not only the earliest known form and the language in which it occurs but also
cognate forms from other languages
• Its definition
o The definition takes one or both of two forms: description and synonymy. A description may
need to include words that are “harder”, or less frequently used, than the lemmatized word;
synonymy brings in a word or a small set of words of similar meaning, often giving slightly
more specific senses
• Citations (examples of its use)
o These show how the word is used in context, illustrating a typical usage

4. THE DIFFERENCE BETWEEN MEANING DEFINITION AND DICTIONARY DEFINITION

According to Sterkenburg: much of what a definition in a dictionary says, particularly where concrete nouns are
concerned, refers to a referent. The most neutral option is to say that a dictionary should provide information
on the meaning of the lexical units included and information on their usage in specific language situations.

Meaning definition requires a notational apparatus while dictionary definition can be directly retrieved from any
paper or online dictionary.

Garcia Velasco and Hengeveld claim that abstract meaning definitions should capture only the syntactically
relevant information that accounts for the lexeme’s distributional properties. In addition, these authors propose
that the meaning definitions of lexemes guide the selection of an appropriate predication frame, implying that
the lexemes are chosen first, followed by the frames.

For Butler, dictionary definitions take the form of a more finely nuanced predicate frame which indicates the
various components of meaning. In addition, he also argues for a hierarchically ordered ontology of concepts.

The way in which Butler paraphrases the meaning postulate of OPEN is most interesting if compared with its
dictionary definition. He explains how the thematic frame is made up: it shows that the event of opening
something has an Agent, a Theme which is typically a door or window, a Location, an Origin state and a final
Goal state. He also includes its associated meaning postulate. As Butler shows, meaning definition is a much
more complex construct than dictionary definition.

5. DICTIONARY WRITING AND CORPUS ANNOTATION

In order to base dictionary entries on empirical data rather than on the almost inevitable subjectivity of the
dictionary writer, most dictionary writing work these days is based on corpus analysis.

5.1 Text encoding and annotation

McEnery and Wilson claimed that empirically-based research in linguistics should be extracted from carefully
annotated texts.

Text encoding and annotation is not only important when constructing dictionaries, but it is also a basic source
of information in the construction of ontologies. The basic empirical reference for identifying some of the
building blocks of ontologies such as entities, relations or properties is a search for tagged nouns, verbs or
adjectives.

Leech viewed the processing of corpora from the perspective of how humans and machines interact and he
referred to research showing resemblances between classes discovered by the machine and those recognized by
grammatical tradition.

The following list shows different types of data and different forms of annotation:
• Morphosyntactic data
• Discursive data
• Pragmatic data
• Semantic data
• Lexical data

McEnery and Wilson state that if a corpus is said to be unannotated, it appears in its existing raw state as plain
text, whereas an annotated corpus has been enhanced with various types of linguistic information. The utility of
the corpus is improved when it has been annotated, since it may then be considered a repository of linguistic
information: the implicit information has been made explicit through the process of annotation.

5.2 Text annotation formats

In the past, many different approaches have been adopted, some more lasting than others. One long-standing
annotation practice is known as COCOA references. COCOA was an early computer program used for extracting
indexes of words in context from machine-readable texts.

A COCOA reference consists of a balanced set of angle brackets (<>) which contains two entities: a code which
stands for a particular variable name and a string or set of strings which are the instantiations of that variable.
For instance, the code “A” could be used to refer to the variable “author” and the string would stand for the
author’s name.

<A CHARLES DICKENS>


<A HOMER>
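
Because COCOA references have such a simple shape, they can be extracted from a text with a short regular expression. The sketch below assumes the one-letter-code convention described above (the title code T is an invented example); it is an illustration, not a reimplementation of the original COCOA program.

# Extract COCOA-style references (<CODE VALUE>) from a text.
import re

text = "<A CHARLES DICKENS> <T BLEAK HOUSE> It was a dark November day..."

# A one-letter variable code followed by its value, inside angle brackets.
pattern = re.compile(r"<([A-Z])\s+([^>]+)>")

for code, value in pattern.findall(text):
    print(code, "=", value)
# A = CHARLES DICKENS
# T = BLEAK HOUSE
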
COCOA references only represent an informal trend for encoding specific types of textual information, e.g.
authors, dates, and titles. Current trends are moving towards more formalized international standards of
encoding.

The most popular project of this current trend is the Text Encoding Initiative (TEI), whose aim is to provide
standardized implementations for machine-readable text interchange. The TEI uses a form of document mark-
up known as SGML. SGML has advantages such as clarity, simplicity and formal rigour, and it has already been
recognized as an international standard.

5.3 Types of annotation

McEnery and Wilson recognize that certain kinds of linguistic annotation, which involve the attachment of special
codes to words in order to indicate particular features, are often known as tagging rather than annotation, and
the codes which are assigned to features are known as tags.

They define part-of-speech annotation as the most basic type of linguistic corpus annotation. The aim of part-of-
speech tagging is to assign a code to each lexical unit in the text indicating its part of speech. Part-of-speech
annotation is useful because it increases the specificity of data retrieval from corpora and also forms an essential
foundation for further forms of analysis (such as syntactic parsing and semantic field annotation). It also allows
us to distinguish between homographs and helps to disambiguate them. Part-of-speech annotation is the most
common type today.

The architecture of part-of-speech tagging includes three main stages (a minimal sketch in code follows the list):

• There is a tokenization stage, where the input text is divided into tokens suitable for further analysis.
• There is an ambiguity look-up stage, involving the use of a lexicon and a guesser for tokens not
represented in the lexicon.
• There is a disambiguation stage, drawing on information about the word itself and about word/tag
sequences, which are basically different types of contextual information.
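
A toy version of these three stages might look as follows. The lexicon, the guesser and the single contextual rule are all invented for the illustration; a real tagger would rely on far richer resources and on statistical or rule-based disambiguation.

# Toy part-of-speech tagger illustrating the three stages described above.
import re

LEXICON = {                                # each known token maps to its possible tags
    "the": ["DET"], "dog": ["NOUN"], "barks": ["NOUN", "VERB"], "loudly": ["ADV"],
}

def tokenize(text):
    """Stage 1: divide the input text into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lookup(token):
    """Stage 2: ambiguity look-up in the lexicon, guessing NOUN for unknown tokens."""
    return LEXICON.get(token, ["NOUN"])

def disambiguate(tokens):
    """Stage 3: pick one tag per token using a minimal contextual rule."""
    tags, previous = [], None
    for token in tokens:
        candidates = lookup(token)
        if len(candidates) > 1:
            # toy rule: after a determiner (or at the start) prefer NOUN, otherwise VERB
            tag = "NOUN" if previous in (None, "DET") else "VERB"
        else:
            tag = candidates[0]
        tags.append((token, tag))
        previous = tag
    return tags

print(disambiguate(tokenize("The dog barks loudly")))
# [('the', 'DET'), ('dog', 'NOUN'), ('barks', 'VERB'), ('loudly', 'ADV')]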

5.4 Parsing

Another highly frequent type of annotation is parsing. It involves the procedure of bringing basic
morphosyntactic categories into high-level syntactic relationships with one another. Parsed corpora are
sometimes known as treebanks. This term alludes to the tree diagrams or phrase markers frequently used in
parsing.
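
Treebank annotation is usually stored as a labelled bracketing, where each bracketed label corresponds to a node of the phrase marker. The sketch below reads a simplified, Penn-Treebank-like bracketing into nested lists; both the format and the sample sentence are assumptions made for the illustration.

# Read a simplified labelled bracketing into nested Python lists.
def parse_brackets(s):
    """Turn '(S (NP (DT the) (NN dog)) (VP (VBZ barks)))' into nested lists."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        assert tokens[pos] == "("          # every constituent starts with '(' + label
        label, pos = tokens[pos + 1], pos + 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child, pos = read(pos)     # recurse into a sub-constituent
            else:
                child, pos = tokens[pos], pos + 1
            children.append(child)
        return [label] + children, pos + 1

    tree, _ = read(0)
    return tree

print(parse_brackets("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))"))
# ['S', ['NP', ['DT', 'the'], ['NN', 'dog']], ['VP', ['VBZ', 'barks']]]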

The semantics of morphological relations can be applied to parsing, as a kind of lexicographical analysis, in a
number of ways. Describing the meanings that are added when a verb becomes a noun, a noun becomes an
adjective or an adjective becomes a verb covers the semantic changes involved in the derivation processes
studied in previous chapters.

The description of these processes expands the research possibilities of collocation analysis of corpora, thus
feeding back into disambiguation processes.

All this means that static collocational analysis of these types of occurrences could be used to feed back into
tagging processes by ruling out or confirming verb/noun/adjective tags.

Carrol divides parsers into three basic types:

Shallow parsers > these usually perform a very basic syntactic analysis, also called shallow analysis. One standard
method consists of partitioning the input into a sequence of non-overlapping units or chunks of words.

Dependency parsers > these are usually based on dependency grammars.

Context-free parsers > there have been many attempts to produce automatic parsing, and in any case context-free
parsing algorithms constitute the basis of practically all approaches to parsing that are built upon hierarchical
phrase structure. However, in the last ten years, the massive development of the so-called language industries,
based on internet expansion and the development of ontologies, has made these types of parsing obsolete.

5.5 Semantic annotation

Initially, McEnery and Wilson assumed that only two types of semantic tagging were possible: those
differentiating relations such as agents or patients, and those for which there is no previous agreement. However,
semantic tagging is much more complicated than other types of tagging.

As is widely recognized in linguistic studies, the use of closely defined categories is problematic and not always
applicable. Because of this, other, less clearly bounded conceptual categories and their use have been explored
in prototypical analysis. According to McEnery and Wilson, the main contribution of corpus analysis to semantics
is that it can account for indeterminacy and gradience. These two concepts have been widely explored in corpus-
based statistical analysis.

On the one hand, indeterminacy in linguistics is closely related to the logical handling of fuzzy concepts. On the
other, the mathematical concept of gradience has helped the understanding of ranges of word occurrence.

Because membership gradients are related to frequency of inclusion, corpora are valuable sources of data in
determining the existence and scale of such gradients.

Mindt demonstrated how corpora can be used to provide objective criteria for assigning meanings to linguistic
items. He points out that, frequently, in semantics the meanings of lexical items and linguistic constructions are
described by reference to the linguist’s own intuitions, that is, by taking a rationalist approach to linguistic
analysis. But he also argues that semantic distinctions are associated in texts with characteristic, observable
contexts – syntactic, morphological and prosodic – and thus that, by considering the environments of the
linguistic entities, we can obtain an empirical, objective indicator for a particular semantic distinction that we
would like to identify.

McEnery and Wilson recognize two types of semantic annotation:

• The marking of semantic relationships between items in the text, for example, the agents or patients
of actions.
• The marking of semantic features of words in the text, essentially the annotation of word senses.

There is no universal agreement about which semantic features ought to be annotated. However, Sedelow and
Sedelow made use of Roget’s Thesaurus, in which words are organized into general semantic categories.

These days, semantic annotation is a very close equivalent to semantic web practices such as semantic wikis,
semantic blogs, collaborative tagging and the different types of programs constantly being created on the
internet to help refine and contextualize intended meaning. However, while the conventional web is only
understandable for humans, computers can also use the Semantic Web.

Aguado et al. offer a classification of different types of semantic tagging of corpora and they include three basic
kinds of tagging: semantic field tagging, word sense and semantic roles or verb arguments, which are taken
together as a whole.

5.6 Lemmatization

Another type of annotation is called lemmatization. A lemma represents the set of all related inflected forms of
a word. The process of producing a list which groups together all forms belonging to each lemma is called
lemmatization. The program or computer tool that does this is called a lemmatizer.
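
A lemmatizer thus maps each inflected form to its lemma, and inverting that mapping yields the grouped list described above. The sketch below uses a tiny hand-made form-to-lemma table purely for illustration; in practice the mapping would come from a lemmatization tool or a lexical database.

# Group inflected forms under their lemma, using a toy hand-made mapping.
from collections import defaultdict

form_to_lemma = {
    "take": "take", "takes": "take", "took": "take", "taking": "take", "taken": "take",
    "book": "book", "books": "book",
}

lemma_groups = defaultdict(list)
for form, lemma in form_to_lemma.items():
    lemma_groups[lemma].append(form)

for lemma, forms in lemma_groups.items():
    print(lemma, "->", sorted(forms))
# take -> ['take', 'taken', 'takes', 'taking', 'took']
# book -> ['book', 'books']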

5.7 Pragmatic annotation

The inclusion of discourse and text linguistics in annotation is more problematic and aspects of language at the
levels of text and discourse are one of the least frequently encountered annotations in corpora. However,
occasionally, such annotations are applied.

There are some examples of discourse tags: apologies (sorry, excuse me), greetings (hello), hedges (really, that’s
right). However, despite their potential role in the analysis of discourse, these kinds of annotation have never
become widely used, possibly because the linguistic categories are context-dependent and their identification in
texts is a greater source of dispute than other forms of linguistic phenomena.

Notwithstanding, there is indeed another kind of text annotation, which is anaphoric annotation. It can only be
carried out by human analysis, since one of the aims of annotation is to train computer programs with these data
to carry out the task. There are only a few instances of corpora that have been anaphorically annotated.

5.8 Concordances and collocation

Concordance and collocation are closely related terms referring to a list of examples of a word, or a group of
words, taken from a corpus. Such lists can be produced with the help of so-called concordance programs.
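
The core of a concordance program is a keyword-in-context (KWIC) listing: every occurrence of the search word is printed together with a fixed window of context on either side. A minimal sketch, assuming the corpus is already available as a list of tokens:

# Minimal keyword-in-context (KWIC) concordance over a list of tokens.
def concordance(tokens, keyword, window=3):
    keyword = keyword.lower()
    for i, token in enumerate(tokens):
        if token.lower() == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            print(f"{left:>25}  [{token}]  {right}")

text = "the cat sat on the mat and the dog sat on the cat".split()
concordance(text, "sat")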

5.9 The concept of legomenon

Hapax legomenon is a transliteration – not the translation – of the Greek meaning “(something) said (only) once”.

The concept of “hapax” is important in lexicographic studies because it can be used to indicate the degree of
relevance of one or various terms in a corpus. It is inversely related to its rank in a frequency table extracted
from the same text.
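
In practical terms, hapax legomena are simply the items with a count of one in the frequency table of a text, as the following minimal sketch shows.

# Extract hapax legomena (words occurring exactly once) from a text.
from collections import Counter
import re

text = "to be or not to be that is the question"
freq = Counter(re.findall(r"[a-z]+", text.lower()))

hapaxes = sorted(word for word, count in freq.items() if count == 1)
print(hapaxes)   # ['is', 'not', 'or', 'question', 'that', 'the']
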
UNIT V – TYPES OF ONTOLOGIES

1. INTRODUCTION

1.1 Background for relevant distinctions

The main difference between the codification of meaning in applications such as dictionaries and the codification
of meaning in ontologies is that:

• in dictionaries the meaning of a word is expressed by means of the definiens (definition).


• in ontologies it is achieved by focusing on the underlying concept that a word may eventually represent.

From a semantic perspective, meaning is what words encapsulate. From an ontology perspective, meaning is
what knowledge encapsulates. This is why ontologies and semantics are closely related. This relation takes us
back to the problem of knowledge and how it is acquired.

There were two reasons for us to go back to what the Greeks said about knowledge some 2,500 years ago. Firstly,
because they were the first ones who theorized about how knowledge is acquired and, secondly, because they
defined the philosophical discipline called ontology. This word comes from the Greek word ONTOS, meaning
“what is”, and the Greek word LOGOS, which means science or study of something. Therefore, ontology is the
study of what is or exists.

We are now interested in the big abstraction the Greek philosophers made about what there is in the world, real
or imagined, physical or mental, because of the great simplification they made when they first distinguished
between entities and relations. Furthermore, relations were divided into two kinds: the relations that link
entities in a state of affairs and the relation of an entity with its properties.

These distinctions represent a comprehensive form of abstraction of reality that allow us to organize human
knowledge in a way that we can interact with machines.

1.2 What do we mean by concepts?

In computer programming a “language” usually refers to a mark-up language, such as HTML or DAML. In this
field, an “ontology” usually refers to the main concepts handled in an area of knowledge or in a database. This is
one important background for applications such as specialized glossaries or terminologies to be used either
online or on paper format.

Wikipedia describes the Semantic web as a collaborative movement led by the international standards body, the
World Wide Web Consortium. According to their description, “The Semantic Web provides a common framework
that allows data to be shared and reused across application, enterprise, and community boundaries”.

Such a standard promotes common data formats on the World Wide Web, basically turning its unstructured and
semi-structured documents into a “web of data”. This purpose is meant to be achieved by encouraging the
inclusion of semantic content in web pages.

The Semantic Web involves publishing in languages specifically designed for data, such as RDF, OWL and XML.
While HTML is a language that describes documents and the links between them, RDF, OWL and XML, by
contrast, can describe arbitrary things such as people, meetings, or airplane parts.
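
As a hedged sketch of what “describing arbitrary things” looks like in practice, the following uses the rdflib library to state a few RDF triples about a person and a meeting; the example.org namespace, the resource names and the attends property are invented for illustration.

```python
# Describe a person and a meeting as RDF triples and print them as Turtle.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/")   # illustrative namespace
g = Graph()

alice = EX.alice
g.add((alice, RDF.type, FOAF.Person))           # Alice is a person
g.add((alice, FOAF.name, Literal("Alice")))     # her name, as a literal value
g.add((alice, EX.attends, EX.projectMeeting))   # an arbitrary, invented relation

print(g.serialize(format="turtle"))             # rdflib 6+ returns a string here
```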

Technologies are combined in order to provide descriptions that supplement or replace the content of web
documents. Thus, content may manifest itself as descriptive data stored in web-accessible databases, or as
markup within documents. The machine-readable descriptions enable content managers to add meaning to the
content. In this way, a machine can process knowledge itself, instead of text, using processes similar to human
deductive reasoning and inference, thereby obtaining more meaningful results and helping computers to
perform automated information gathering and research.

However, for concepts to be transmitted we humans need words most of the time and this is why it is important
to always keep in mind the differences between concepts and words. While the human brain and sensory
processing systems store and relate all these types of information, if we wanted a machine to do the same type
of operations, different types and stages of codification would be required.

Understanding the similarities, differences and interrelations between both in the present situation of massive
use of the internet, where we interact with machines such as computers all the time, becomes more and more
important. As Prevot et al. explain:

Conceptualization is the process that leads to the extraction and generalization of relevant information
from one’s experience. A conceptualization is independent from specific situations or representation
languages, since it is not about representation yet. To be useful, a conceptualization has to be shared
among agents, such as humans, even if their agreement is only implicit.

2. DIFFERENT PERSPECTIVES IN ONTOLOGY DEFINITION AND DESCRIPTION

There is little agreement on what an ontology is; however, a number of definitions can be reviewed.

In Artificial Intelligence terms, an ontology is defined as “an engineering artefact, constituted by a specific
vocabulary, used to describe a certain reality, plus a set of explicit assumptions regarding the intended meaning
of the vocabulary words”.

Gruber says that formal ontologies are designed because when we choose to represent something in an ontology
we are making design decisions. However, to make such decisions we need objective criteria founded on the
purpose of the resulting artefact, rather than based on a priori notions about Truth.

A crucial issue is interchangeability: there is a need for formal tools that allow the importation and interchange of
ontologies.

Ontologies can be defined as knowledge networks and a knowledge network is a collection of concepts that
structures information and allows the user to view it. In addition, concepts are organized into hierarchies where
each concept is related to its superordinate and subordinate concepts. All this forms the basis for inheriting
defined attributes and relations from more general to more specific concepts.
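
A minimal sketch of such a hierarchy with attribute inheritance from superordinate to subordinate concepts; the concepts and attributes are invented for illustration.

```python
# Concepts are linked to their superordinates; attributes defined on general
# concepts are inherited by the more specific ones.
hierarchy = {                      # concept -> its superordinate concept
    "entity": None,
    "animal": "entity",
    "dog": "animal",
}
attributes = {                     # attributes defined locally on each concept
    "entity": {"exists": True},
    "animal": {"animate": True},
    "dog": {"barks": True},
}

def inherited_attributes(concept):
    """Collect attributes from the concept and all of its superordinates."""
    result = {}
    while concept is not None:
        for key, value in attributes.get(concept, {}).items():
            result.setdefault(key, value)   # more specific values win
        concept = hierarchy[concept]
    return result

print(inherited_attributes("dog"))  # {'barks': True, 'animate': True, 'exists': True}
```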

Gruber explains how several philosophers have provided criteria for distinguishing between different kinds of
objects (e.g. concrete vs. abstract) and the relations between them. In the late 20th century, Artificial Intelligence
adopted the term and began using it in the sense of a “specification of a conceptualization” in the context of
knowledge and data sharing.

The word ontology has been used to describe artefacts with different degrees of structure. These range from
simple taxonomies (such as the Yahoo hierarchy), to metadata schemes (such as the Dublin Core), to complete
logical theories.

Ontologies are usually expressed in a logic-based language, so that detailed, accurate, consistent, sound, and
meaningful distinctions can be made among the classes, properties, and relations. Some ontology tools can
perform automated reasoning using ontologies, and thus provide advanced services to intelligent applications.

Heflin also explains that ontologies figure prominently in the Semantic Web as a way of representing the
semantics of documents and enabling the semantics to be used by web applications and intelligent agents.
Ontologies can prove very useful for a community as a way of structuring and defining the meaning of the
metadata terms that are currently being collected and standardized. By using ontologies, tomorrow’s
applications can be “intelligent”, in the sense that they can work at the human conceptual level more accurately.

Barry Smith explains the connections between the term ontology from a philosophical perspective and the
computational and/or web perspective of ontologies. For him:

Ontology, as a branch of philosophy, is the science of what is, of the kinds and structures of the objects,
properties and relations in every area of reality. In simple terms it seeks the classification of entities.
Each scientific field will of course have its own preferred ontology, defined by the field’s vocabulary and
by the canonical formulations of its theories.

Smith also explains the relation between Ontology and Information Systems by saying the following:

… a common backbone taxonomy of relevant entities of an application domain would provide significant
advantages over the case-by-case resolution of incompatibilities. This common backbone taxonomy is
referred to by information scientists as an ‘ontology’.

However, Sowa proposes the following definition:

The subject of ontology is the study of the categories of things that exist or may exist in some domain.
The product of such a study, called an ontology, is a catalogue of the types of things that are assumed
to exist in a domain of interest D from the perspective of a person who uses a language L for the purpose
of talking about D.

There are other more computationally oriented definitions of ontologies, which are also worth quoting. Guarino
defines them as:

A set of logical axioms designed to account for the intended meaning of a vocabulary.

De Groote defines an ontology as an agreed conceptualisation of reality, based on, e.g., Classes, Relations,
Functions, Axioms and Instances.

From an Artificial Intelligence perspective, therefore, ontology is not only a discipline, but also the outcome of
the activity of ontological analysis and modelling. In this sense we can speak of “an ontology of cardiac valves”
or “an ontology of inflammation”. Such ontologies are examples of the so-called “domain ontologies”, whereas
“foundational ontologies” represent domain-independent concepts like objects, events, processes.

It can be concluded that the three elements that must be included in any definition of ontology are the following:

• Classes (general things) in the many domains of interest


• The relationships that can exist among things
• The properties (or attributes) those things may have

And finally, ontologies may represent language-dependent and/or language-independent information, but the
kind of information included must be made explicit in the ontology’s specifications.

3. NATURAL LANGUAGES AS A SOURCE OF DATA FOR ONTOLOGY BUILDING

3.1 Databases, dictionaries, terminologies and ontologies

When using natural languages as a data source, two basic types of knowledge must be considered: the one that
can be extracted from natural languages and the one that comes from experience and/or intuition.
Lenci claims that data must be extracted in order to build an ontology, and that part of these data can be extracted
from text written in a natural language. He classifies the types of potentially usable texts into two kinds:

• Structured > includes dictionaries and glossaries


• Unstructured > includes corpora

The first kind is oriented towards the human reader and is not formally structured. However, to the extent that
such resources are structured, it is that very structure that can be used as a source of information. For example,
the structure of glossaries can be used directly to extract their conceptual categories, which have been previously
stated, and dictionary definitions can be used for research purposes. A long-standing line of research, both in
linguistics and in NLP, has been to use these definitions by converting them into structured semantic entities such
as computational lexicons. Therefore, it is possible to use NLP techniques to convert dictionary definitions into
structured semantic entities for computational purposes, as in the sketch below. The very structure of thesauruses
and glossaries allows them to be used directly in the construction of ontologies because of the way they are
designed in the first place. There are also facilities precisely designed to organize shared knowledge.
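
As a hedged sketch of this line of research, the following converts two invented dictionary definitions into simple term–genus pairs with a deliberately naive pattern; real systems rely on parsing and much richer patterns.

```python
# Extract the genus (superordinate term) from the definiens with a naive regex.
import re

definitions = {                      # invented, simplified definitions
    "marathon": "a running race of about 42 kilometres",
    "beagle": "a small hound with a smooth coat",
}

# Capture the head noun that precedes the first preposition of the definiens.
GENUS = re.compile(r"^an? (?:[a-z]+ )*?([a-z]+) (?:of|with|that|which|used|for)\b")

for term, definiens in definitions.items():
    match = GENUS.match(definiens)
    if match:
        print(f"{term} IS-A {match.group(1)}")   # marathon IS-A race, beagle IS-A hound
```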

Corpus processing is meant to overcome some of the problems that semi-structured lexical resources present.
The concepts relevant for organizing a particular domain of knowledge are what can be extracted from texts
representative of that domain. It is possible to retrieve and store linguistic information in a fast and reliable way,
and corpora can be used for this. However, to serve these purposes corpora must be annotated.

Since corpora can be used to extract the relevant concepts in a domain of knowledge, they can also be used to
extract different kinds of information. For example, a corpus of wine tasting notes allows us to extract the main
concepts related to producing, tasting and commercialising wine. This organized knowledge can
be used in the construction of better search engines.
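
A minimal sketch of that idea: candidate domain concepts are found by comparing relative frequencies in a toy, invented "wine" corpus against a general reference corpus; the texts and the threshold are illustrative.

```python
# Extract candidate domain terms by a simple keyness (frequency ratio) measure.
from collections import Counter

domain_text = "ruby colour intense aroma of ripe cherry firm tannins long finish"
reference_text = "the colour of the car was red and the finish of the paint was good"

domain = Counter(domain_text.split())
reference = Counter(reference_text.split())
d_total, r_total = sum(domain.values()), sum(reference.values())

candidates = []
for word, count in domain.items():
    d_rel = count / d_total
    r_rel = (reference[word] + 1) / (r_total + 1)   # add-one smoothing for unseen words
    if d_rel / r_rel > 2:                           # arbitrary keyness threshold
        candidates.append((word, round(d_rel / r_rel, 1)))

print(sorted(candidates, key=lambda pair: -pair[1])[:5])  # e.g. tannins, aroma, cherry...
```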

It is also possible to analyse corpus-extracted data using a mixture of NLP tools and statistics. This kind of analysis
can help to find regularities or other meaningful aspects. Therefore, because it is possible to understand how
knowledge is lexically codified, this information can be used to build the kind of tool that a machine can
understand and handle.

Because knowledge is codified linguistically in the first place, the study of the lexicon of a particular language
offers a double perspective. On the one hand, analysing the organization of a particular language as a form of
knowledge representation provides some insight into how knowledge is organized in the human mind. On the
other hand, the present possibility of storing vast amounts of linguistic data turns the lexicon into a source of
empirical evidence for more general linguistic research. But for such empirical evidence to be used and shared,
corpora must be annotated, and this annotation must be standardized.

Genuine semantic annotation is limited to lemmatization and to the identification of synonymy, hyperonymy,
meronymy and hyponymy, and these linguistic data are precisely what is used as a source for ontology building.
Thus, hyperonymy provides class labels, hyponymy identifies “kind-of” relations, meronymy identifies
“part-of” relations, and so on, as illustrated in the sketch below.
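
A hedged sketch of reading such relations from WordNet through NLTK; the word car is just an example.

```python
# Read hypernym ("kind-of") and meronym ("part-of") relations from WordNet.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

car = wn.synsets("car")[0]                                    # first sense of "car"
print("class label:", [h.name() for h in car.hypernyms()])    # e.g. motor_vehicle.n.01
print("parts:", [m.name() for m in car.part_meronyms()][:5])  # part-of relations
```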

Linguistic annotation implies different things. If it has been inserted automatically, this is possible because the
annotated characteristics are lexicalized or grammaticalized, or because they have recognizable patterns in a text.
If the annotation has been performed manually, any kind of information can be included, provided that it has been
previously defined.

Databases and terminologies are products that can be linked to ontologies in certain ways. A prototypical
example of a language-dependent knowledge representation product is WordNet, a lexical database for the
English language.
Terminologies can be related to expert domains and restricted ontologies of a field of knowledge. While
lexicology is the study of the words in general, terminology is the study of special language words or terms
associated with particular areas of specialized knowledge.

There are a number of differences and similarities between the work of lexicographers and that of
terminographers. Since dictionaries are usually word-based, the lexicographical work consists in identifying the
different senses of a particular word form and the usual representation is generally alphabetical. However,
terminographers differ from lexicographers in the type of corpus from which they gather their database.

While synonyms are usually scattered throughout the dictionary, polysemous words and homonyms are grouped
together. On the other hand, terminologies are sense-based and reflect the fact that the terms they contain map
an area of specialized knowledge.

The relations between the concepts, which the terms represent, are the main organizing principle of the
terminological work. There are five basic stages in the compilation of terminological resources. First,
terminographers familiarize themselves with the area of specialized knowledge they have to deal with. Second,
they identify the knowledge sources they will need for the process of compilation. Third, terminographers
identify an initial set of forms. Fourth, they analyse them and, finally, they record this information.

Neither lexicology nor terminology is directly concerned with any particular application; they can thus be
considered the theoretical basis for lexicography and terminography respectively. If lexicography is the process of
making dictionaries, terminography is concerned with compiling the vocabulary of specialized languages.

There are important differences between ontologies and dictionaries:

• A dictionary
o does not use a formal language, but rather an informal one: a human natural language.
o is meant to be read and interpreted by humans. No machine is currently capable of
understanding a dictionary.
o is descriptive > it provides definitions which are presumably appropriate at a point in time,
often with annotations about the usage of words up to the time of publication.
• A formal ontology
o uses a formal language to state what a given term means > a term in an ontology is not a word
but a concept, although the concept will normally be given a name which is a word or
combination of words in order to support human understanding of the ontology. However, a
true formal ontology could have all its term names replaced with arbitrary codes and still have
the same formal properties.

3.2 Standards

Researchers in the Semantic Web area are developing models for the semantic annotation of web pages and also
trying to describe web resources by means of computer languages such as OWL or XML.

The increasing need for standardization has resulted in the development of OntoTag, a hybrid linguistic and
ontological annotation model. This product seeks to achieve the standardization goals proposed by the ISO-TC37SC4
committee, an institution which in turn tries to enable interoperability between systems by means of a shared
common vocabulary and resource descriptions based on ontologies.

3.3 Ontologies and related disciplines

Schalley and Zaefferer claim that current progress in linguistic theorizing is increasingly informed by cross-
linguistic research and that comparison of languages relies on those concepts which are essentially the same
across human minds, cultures and languages and which therefore can be activated through the use of any human
language. These instances of mental universals join other less common concepts to constitute a complex
structure in our minds, a network of cross-connected conceptualizations of the phenomena that make up our
world. They label such system of conceptualizations as ontology.

In addition, the most reliable basis for any cross-linguistic research lies in the common core of the different
individual human ontologies.

Their initial idea was to explore the relation between the ontologies in our minds and the languages we
participate in, and their working assumption is that the options provided by human languages for expressing a
given concept are constrained by the ontological status of this concept. In turn, ontological status is defined as
the position a concept holds within a given ontology, and this position is determined by the other concepts in the
same system. The authors also explain how ontological knowledge can be safely presupposed in normal use of
language whereas other knowledge cannot.

They differentiate between ontological knowledge, encyclopaedic knowledge and episodic world knowledge,
but focus on the relation between these kinds of knowledge and linguistic knowledge.

From the onto-linguistic point of view, linguistic knowledge is a special kind of ontological knowledge since:

• To know a language implies to know the signs of this language.


• Definitional knowledge is ontological knowledge.

Metzinger and Gallese explain that to have an ontology is to interpret the world, and they argue that the brain,
viewed as a representational system aimed at interpreting the world, possesses an ontology. It creates primitives
and makes existence assumptions. The empirical evidence these authors gathered demonstrates that the brain
models movements and action goals in terms of multimodal representations of organism–object relations.

3.4 Extracting conceptual knowledge from texts

Lenci establishes four different resources that can be used to help extract ontolexical knowledge directly from
a text. Firstly, he focuses on the identification of semantic types. Secondly, on the detection of semantic
similarity. Thirdly, he concentrates on the use of inference and, finally, on the use of argument structure.

Experts in the field consider semantic typing as the backbone of most NLP applications. Types are formal,
symbolic ways of identifying the concepts expressed in linguistic expressions. Semantic similarity and relatedness
are also aspects that can be extracted from the text.
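
A minimal sketch of corpus-based semantic similarity, using co-occurrence counts and cosine similarity over an invented three-sentence corpus; real systems use much larger corpora and weighting schemes.

```python
# Represent words by their co-occurrence counts and compare them with cosine similarity.
import math
from collections import Counter, defaultdict

corpus = [
    "the red wine has a fruity aroma",
    "the white wine has a floral aroma",
    "the tasting notes describe a fruity beer",
]

cooc = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        for neighbour in tokens[max(0, i - 2):i + 3]:   # window of +/- 2 words
            if neighbour != word:
                cooc[word][neighbour] += 1

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in set(u) & set(v))
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Words sharing contexts obtain non-zero similarity scores.
print("wine ~ beer :", round(cosine(cooc["wine"], cooc["beer"]), 3))
print("wine ~ aroma:", round(cosine(cooc["wine"], cooc["aroma"]), 3))
```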

The real distinction between ontology and natural language, according to Nirenburg and Raskin, is based on the
idea of ambiguity. The premise they start from holds that ambiguity is what distinguishes natural language from
ontology. Because of this, the main distinction between the two is that language emerges and is used by people,
while ontologies, whether constructed for computers or pursued as a philosophical discipline, are made ad hoc.

Nickles et al. explain that there is a branch of computational linguistics that systematically takes advantage of
ontologies in the AI sense of the term. One important outcome in the field is Ontological Semantics, as in
Nirenburg and Raskin. Their ontology lists the definitions of various concepts used for describing the meanings of
lexical items of natural languages, but also for specifying the meanings of the text-meaning
representations that serve the function of an interlingua for machine translation.
There is also a linguistic domain that applies the term ontology to model-theoretic formal semantics. This is one of the
simplest uses of ontology, applying first-order logic to connect individuals, sets of n-tuples of individuals and truth
values.

The need for a more complex ontology for this logical development made the number of
ontological categories grow.

Individual ontologies contain one or more language ontologies, but also something else that is often called
common-sense ontology. It could be ascribed to subjects with no language such as robots, macaque brains and
roundworms. Thus, the challenge consists in factoring the ontogenesis of individual ontologies into different
types:
• The conceptual default setting babies are born with
• The culturally induced development the conceptual system is subject to
• The effects of the individual language on this development.

This relates to another discipline using the term ontology: developmental cognitive science. Cognitivism also has
connections with the way knowledge is organized. This connection has to do with the mental operation of
categorization.

Both language ontologies and common sense ontologies are closely related, since every language ontology is a
conceptualization or categorization of what normal everyday human language can talk about and this is largely
determined by the requirements of everyday life.

4. TYPES OF ONTOLOGIES FROM MULTIPLE PERSPECTIVES

Recent developments of ontologies stem from two distinct traditions: one developed within the field of AI and
cognitive models, and the other based on linguistic work.

This preliminary distinction leads to a very general classification into the group of so-called linguistic ontologies
and the group of formal ontologies. However, linguistic products and ontologies should
always be kept apart. While synonymy and quasi-synonymy are important linguistic relations among words,
they do not fit in formal ontologies, since synonymous words represent the same or nearly the same concept. This is why
genuinely lexical products like lexicons or glossaries are clearly different from ontologies.

The tradition linked to AI and cognitive models focuses on the classes and types of categories that organize
human knowledge and our processing capacities. The formal ontologies it produces deal with the specification of
concepts only, and they are usually stated in a first-order language, as in the illustration below.
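
For illustration only (the predicate names are invented, not taken from any particular ontology), a first-order axiom of the kind such ontologies typically contain, stating the transitivity of a part-of relation together with a simple subsumption statement, could be written as:

```latex
\forall x\,\forall y\,\forall z\;\bigl(\mathit{partOf}(x,y)\land \mathit{partOf}(y,z)\rightarrow \mathit{partOf}(x,z)\bigr)
\qquad
\forall x\;\bigl(\mathit{Dog}(x)\rightarrow \mathit{Animal}(x)\bigr)
```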

The second tradition, the linguistic tradition, is more closely related to language and has a different orientation:
it organizes knowledge by focusing on the meaning of words. This organization, in turn, has followed two
different semantic trends: one takes semantic features as the basis for semantic organization, the other takes
semantic fields as the basis for that same organization.

4.1 Ontologies in computer science: a survey

In computer science the term ontology refers to the construction of information models. An information model
is an abstract formal representation of entities that includes their properties and the relations that hold between
them. By contrast with data models, information models represent their entities without any specification of
implementation issues or protocols for data transportation. However, among computer scientists the word
ontology is used so broadly that it has been employed to refer to practically any information model.

Ontologies in computer science can be defined as a specification in formal language of terms and definitions
describing things that make up the world.
The chart below represents the degree of formalization that the different elements have, ranging from lightly
formalized to highly formalized products.

− formalized (lightly formalized products)
• Terms
• Ordinary Glossaries
• Ad hoc Hierarchies
• Data Dictionaries
• Thesauri
• Structured Glossaries
• XML DTDs
• Principled informal hierarchies
• Database Schema
• XML Schema

+ formalized (highly formalized products)
• Formal Taxonomies
• Frames
• Data and Process Models
• Description Logic-based formal ontologies (KIF, OCL, OWL)

• Terms refer to a controlled and usually domain specific vocabulary


• Ordinary Glossaries are terms with natural language definitions, as one finds in many textbooks.
• Ad hoc Hierarchies are sets of terms with a relationship between terms, but where no formal semantics
for that relation is defined.
• Data dictionaries are more formal models of information, often of relational databases, where each
term and relation has an associated natural language definition.
• Thesauri are word lists that are not ordered alphabetically but according to conceptual relations.
• Structured Glossaries may include relationships among the terms in the glossary.
• XML DTDs are used for communication among software systems. XML is a language for defining syntax
that has no associated constraints on semantics.
• Principled informal hierarchies are those that do not have a formal, logical model of relations between
terms, but at least have informal, common sense explanation of relationships.
• Database Schemas are Data Base structures that have more formal definitions of the meaning of terms
and relations, usually by employing statements in a database constraint language.
• XML Schema is a further development of XML and is the way to specify XML-based communication that
is recommended by the World Wide Web Consortium (W3C).
• Formal taxonomies are those that have a formal, logical semantic for the relations among terms.
• Data and Process Models couple taxonomies and defined relationships with semantics for representing
process and action.
• Description Logic languages combine knowledge representation elements in a frame system with the
ability to define rules; they are sublanguages of predicate logic.
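
As a hedged illustration of the kind of statement the Description Logic languages just mentioned allow (the concept and role names are invented), a frame-like class definition can be written as an axiom such as:

```latex
\mathit{Mother} \equiv \mathit{Woman} \sqcap \exists\,\mathit{hasChild}.\mathit{Person}
\qquad
\mathit{Woman} \sqsubseteq \mathit{Person}
```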

A formal ontology is distinct from the most common instance of a set of terms and definitions, which is the
dictionary.

A dictionary does not use a formal language, but rather an informal one: a human natural language. While a
dictionary is meant to be read and interpreted by humans, no machine is currently capable of understanding a
dictionary in any realistic sense of the word ‘understanding’. Furthermore, a dictionary is descriptive in the sense
that it provides definitions which are presumably appropriate at a point in time, often with annotations about the
usage of words up to the time of publication.

In contrast, a formal ontology states what a given term means in a formal language. A term in an ontology is not
a word but a concept, although the concept will normally be given a name which is a word or combination of
words in order to support human understanding of the ontology. However, a true formal ontology could have all
its term names replaced with arbitrary codes and still have the same formal properties.

There are a few aspects that most ontology experts agree on.

• Firstly, ontologies should be language independent


• Secondly, ontologies need to be well formed according to an axiomatic specification
• Thirdly, they should comply with a similarity principle > a child concept must share the meaning of its parent
• Fourthly, they should comply with a specificity principle > a child must differ from its parent in a distinctive
way that is the necessary and sufficient condition for being the child concept.
• Finally, they should comply with an opposition principle: a concept must be distinguishable from its siblings
and the distinction between each pair of siblings must be represented.

5. ONTOLOGY APPLICATIONS

Vossen and Grishman focus on applications such as information extraction through name identification and
classification, and event extraction. This can be achieved using commercial programs designed for these purposes;
one among many others is WordSmith.

Nirenburg and Raskin also list a series of similar semantic applications: machine translation (MT), information
extraction (IE), question answering (QA), text summarization, general human–computer dialogue
systems, and various combinations of them all. The difference is that they developed a program.

They are, basically, the same types of applications that computational dictionaries offer.

6. ONTOLOGIES AND ARTIFICIAL INTELLIGENCE

Artificial Intelligence can be defined as the branch of computer science that is concerned with the automation of
intelligent behaviour, regardless of whether this effort is based on the abstract concept of rationality or involves
mimicking the human mind.

The majority of AI frameworks include some kind of less powerful knowledge base, and an increasing number
also provide some facility specifically dedicated to concept knowledge. In contrast with other areas of computer
science, in AI the focus is on computational reasoning using ontological knowledge, on the
computationally intelligent acquisition and revision of new knowledge, and on the computational use of
ontological knowledge for decision-making.

7. KNOWLEDGE BASES
Knowledge bases refer to a collection of facts and rules, along with inference procedures to make use of these
rules.

This area of research originally focused on separating declarative, domain-specific knowledge from procedural
knowledge, allowing a software system to behave more intelligently and less mechanically by dynamically
combining small chunks of knowledge to reach an answer.

Whereas a conventional software system would have specified a series of operations to be performed in a certain
order, a knowledge base system has a generic inference process that can apply a range of declaratively specified
knowledge in order to reach different answers to different queries.
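
A minimal sketch of that contrast: a handful of declaratively specified facts and rules, plus a generic forward-chaining procedure that combines them to derive new facts. The facts, the rules and the rule notation are invented for illustration (requires Python 3.8+).

```python
# Ground facts are tuples; rules pair a list of premises with a conclusion,
# where strings starting with "?" are variables.
facts = {("isa", "beagle", "hound"), ("isa", "hound", "dog"), ("barks", "dog")}
rules = [
    ([("isa", "?x", "?y"), ("isa", "?y", "?z")], ("isa", "?x", "?z")),   # transitivity
    ([("isa", "?x", "?y"), ("barks", "?y")], ("barks", "?x")),           # inheritance
]

def substitute(term, binding):
    return tuple(binding.get(t, t) for t in term)

def match(pattern, fact, binding):
    """Try to unify a pattern with a ground fact, extending the variable binding."""
    binding = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if binding.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return binding

def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new facts can be derived."""
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            bindings = [{}]
            for premise in premises:
                bindings = [b2 for b in bindings for fact in facts
                            if len(fact) == len(premise)
                            and (b2 := match(premise, fact, b)) is not None]
            for binding in bindings:
                new_fact = substitute(conclusion, binding)
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts

print(forward_chain(set(facts), rules))  # also derives ("isa", "beagle", "dog"), ("barks", "beagle"), ...
```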

A characteristic of open environments in knowledge base systems is that knowledge domains are dynamic and
can often be modelled with some uncertainty. Because of this, probabilistic ontologies provide the possibility of
describing concepts and instances with variable degrees of belief, denoting the uncertainty of description-logic
terminological axioms.

Although knowledge bases and ontologies are close, the former refers to a collection of facts and rules, whereas
the latter is a much broader term, sometimes also including a database.

However, if a computational program lists a series of well-described objects and a series of rules that link them,
we have an ontology-like product or a database-like product. How to label it very much depends on how broad
or specific the included entities are.
