Language Universals
With Special Reference to Feature Hierarchies
by
Joseph H. Greenberg
with a preface by
Martin Haspelmath
Mouton de Gruyter
Berlin New York
Mouton de Gruyter (formerly Mouton, The Hague)
is a Division of Walter de Gruyter GmbH & Co. KG, Berlin.
This work appeared originally as volume 59 of the series
Janua Linguarum - Minor.
Library of Congress Cataloging-in-Publication Data
Greenberg, Joseph Harold, 1915-
Language universals : with special reference to feature hierarchies /
by Joseph H. Greenberg ; with a preface by Martin Haspelmath.
p. cm.
Includes bibliographical references.
ISBN 3-11-017284-4 (pbk. : alk. paper)
1. Universals (Linguistics) I. Title.
P204.G74 2005
401'.3-dc22
2005003895
Printed on acid-free paper which falls within the guidelines
of the ANSI to ensure permanence and durability.
ISBN 3-11-017284-4
Bibliographic information published by Die Deutsche Bibliothek
Die Deutsche Bibliothek lists this publication in the Deutsche
Nationalbibliografie; detailed bibliographic data is available in the
Internet at <http://dnb.ddb.de>.
Copyright 1966, 2005 by Walter de Gruyter GmbH & Co. KG,
10785 Berlin.
All rights reserved, including those of translation into foreign languages. No
part of this book may be reproduced in any form or by any means, electronic
or mechanical, including photocopy, recording, or any information storage and
retrieval system, without permission in writing from the publisher.
Cover design: Sigurd Wendland, Berlin.
Printed in Germany.
Table of contents
Martin Haspelmath: Preface to the reprinted edition vii
Preface 5
1. Introduction: Marked and unmarked categories 9
2. Phonology 13
3. Grammar and lexicon 25
4. Common characteristics in phonology, grammar, and lexicon 56
5. Universals of kinship terminology 72
References 88
Preface to the reprinted edition
by Martin Haspelmath
Joseph H. Greenberg's short book Language Universals, just 89
pages long, is one of the true gems of 20th century linguistics. While
the title might suggest a bland overview of known facts and issues in
language universals research, Greenberg instead offers us a strikingly
original set of observations about cross-linguistic patterns in phono-
logical, grammatical and lexical categories. In addition, Greenberg
sketches an explanatory account whose essentials have still not been
surpassed, forty years after he first presented these ideas.
The fundamental observation of Language Universals is that pairs
of linguistic categories in phonology, grammar and the lexicon typi-
cally show asymmetrical behavior that is to a very large extent cross-
linguistically uniform. Category oppositions like voiced/voiceless,
glottalized/plain, long/short, singular/plural, present/future, positive/
negative, consanguineal/affinal had been described earlier by the
Prague School linguists Trubetzkoy and Jakobson as representing a
contrast between unmarked and marked. But it was Greenberg who
most forcefully claimed and demonstrated that these contrasts exist
not just as part of particular language systems, but can in principle
be observed in all languages, not only in phonology, but also
throughout the inflectional system and in the lexicon. Where the
structuralists Trubetzkoy and Jakobson saw markedness contrasts
as embedded in the structures of individual synchronic languages,
Greenberg emphasized the universal aspects of the substantive
factors of phonetics, semantics, and language use, and language
change was an integrated part of his explanatory framework.
If Greenberg's book had been written today, a title such as Typo-
logical Markedness Theory would be considered more appropriate.
But the abstract term markedness did not exist in the 1960s (it be-
came current only in the late 1970s), and highly general scientific
ideas were respectable also when they were not named "theories".
But the partly overly general ("language universals") and partly
overly technical ("feature hierarchy") title with the somewhat clumsy
middle part ("with special reference to") cannot fully explain why
Greenberg's book did not receive the attention that it deserved. To
be sure, Language Universals was widely read and cited, and the fact
that the terms marked and unmarked are known to every second-
year linguistics student is to a considerable extent due to its influence.
But Greenberg's earlier 1963 article (with its even clumsier title
"Some universals of grammar with particular reference to the order
of meaningful elements") became far more influential; the book in
which it appeared had to be reprinted three years later and is still
widely available on the antiquarian market, and Greenberg's article
is still commonly assigned as reading to graduate students in linguis-
tics.
Language Universals, too, should be compulsory reading for lin-
guists. The main reason why it did not come close to Greenberg's
word order work was that it mostly deals with phonology, morphol-
ogy, and kinship terminology. But in the 1960s and 1970s, the field
of linguistics was obsessed with syntax and its relation to semantics,
and many of the students entering the field did not have the solid
grounding in historical-comparative linguistics or the linguistics of
some non-European languages that was characteristic of Greenberg's
generation, and that could have helped readers to appreciate the full
significance of the proposed universals. Morphology was simply not
a hot topic, and phonology had to be done in Chomsky and Halle's
(1968) generative framework, which was more interested in morpho-
phonology than in explaining truly phonological patterns and relat-
ing them to phonetic factors. Greenberg's (1963) work on word order
universals was just as remote in spirit from the widely popular gener-
ative syntactic model as his phonological work was from generative
phonology, but the potential relevance of his word order universals
to Chomsky's "Universal Grammar" approach to syntax was evident
to everyone. In the 1980s, generative linguists began to incorporate
Greenberg's discoveries into their theories of Universal Grammar.
The markedness universals of Language Universals never made it onto
the agenda of generative grammarians (in phonology, markedness is
now widely discussed again in the framework of Optimality Theory
[McCarthy 2002], but it mostly follows the markedness concept of
chapter 9 of Chomsky and Halle 1968 rather than Greenberg's).
The full impact of the ideas of Greenberg's typological marked-
ness theory on the field of linguistics is apparently still ahead of us.
That statistical regularities of language use are intimately connected
with language structure and are in fact an important ingredient for
explanatory theories was known before Greenberg (see, in particular,
Zipf 1935, 1949), but structuralist linguists were not interested in
these connections.1 It was only fairly recently that linguists became
more interested in the relation between language use and language
structure (e. g., Barlow and Kemmer 2000), and in particular in the
role of frequency of use in explaining language structure (e. g., Bybee
and Hopper 2001; Bod et al. 2003).
After presenting a large number of correlations that are captured
by the theory of typological markedness, Greenberg (in chapter 4)
goes on to explicate the relationship between phonological marked-
ness and grammatical/lexical markedness, and finally to discuss the
role of frequency of use in the correlations. For phonology, he pro-
poses that tendencies of diachronic change (in particular the ten-
dency for the disappearance of the marked member if a contrast is
given up) are the cause for frequency asymmetries, but for grammar
and the lexicon, he sees the role of frequency as primary (pp. 65-66).
After all, speakers are free to say what they want, and a change
in language structure will not make them choose a meaningful cat-
egory (such as the singular or the future tense) any more or less
often. Greenberg goes so far as to equate "marked/unmarked" in
grammar and semantics with "less frequent/more frequent". This was
criticized by later commentators (e.g., Lehmann 1989; Andersen
1989), and of course it represents a fairly radical departure from
Trubetzkoy's and Jakobson's use of these terms (where "marked"
fundamentally meant "specified for a phonological/semantic fea-
ture"). One could ask whether Greenberg's story could not have been
told without using the terms "marked/unmarked" in the first place
(cf. Haspelmath 2005).
But Greenberg's main interest was in the language universals. He
did not shy away from the deeper explanatory questions, raised them
and attempted answers (from the present perspective, deeply insight-
ful answers). But he did not see his main task in providing these
answers. His unique contribution to linguistics was the truly global
perspective, the empirically based search for universals of human
language, whatever their ultimate explanation.
In his famous 1963 article, he listed and numbered the universals
he found, making the concept of a universal maximally concrete and
accessible. Many of these universals have become famous, and even
today we still refer to them using Greenberg's original numbers. Why
did he not do this in Language Universals? This book does not contain
a single numbered universal, set off from the main text in the
way in which typologists now routinely highlight their precious dis-
coveries.
The reason is simple: Language Universals contains too many uni-
versals to list them all! In an understatement, Greenberg (p. 10) an-
nounces "a considerable number of specific universals". And they
need not be listed individually, because they can be derived in a me-
chanical fashion from "a single rich and complex set of notions"
(p. 10). All we need to list is the set of (un)markedness properties
(called "markedness criteria" in Croft 1990) and the set of category
pairs (or more generally, category hierarchies). A few such properties
and category pairs are listed in (1)-(2).
(1) phonology
    unmarkedness properties:            category pairs:
    neutralization                      voiceless/voiced
    higher text frequency               short/long
    greater phonemic differentiation    non-nasal/nasal
    greater subphonemic variation       unpalatalized/palatalized
    typological implicatum              non-glottalized/glottalized
    basic allophone                     unaspirated/aspirated
(2) grammar
    unmarkedness properties:            category pairs:
    facultative expression              singular/plural
    contextual neutralization           direct case/oblique case
    higher text frequency               masculine/feminine
    zero expression                     positive/comparative
    syncretism                          3rd person/1st and 2nd person
    defectivation                       indicative/hypothetical
    irregularity                        present tense/future tense
For each category pair, it is claimed that universally (i. e., in all lan-
guages), the unmarked member will exhibit the unmarkedness prop-
erties of (1) and (2). For example, the following universals are among
those hypothesized by Greenberg:
(3) In all languages, if there is a frequency difference between unpalatalized
and palatalized consonants, the unpalatalized consonants
are more frequent.
(4) In all languages, if the phoneme inventory contains glottalized
consonants, it also contains (the corresponding) non-glottalized
consonants.
(5) In all languages, if there is a frequency difference between the
indicative and the hypothetical mood, the indicative is more fre-
quent.
(6) In all languages, if there is syncretism in nominal case inflection,
there will be syncretism in the oblique cases.
In phonology, Greenberg discusses just seven category pairs (obvi-
ously a small minority of the existing pairs) and six properties, result-
ing in 42 universals. In morphology, there are twenty-seven category
pairs (a list that is fairly representative of the most commonly occur-
ring grammatical categories) and seven widely applicable properties,2
yielding 189 testable universals. Altogether, Language Universals thus
contains more than 230 universals. If all (or even just most) of these
universals turned out to be empirically supported, this would indeed
reveal "a vast amount of orderliness in language phenomena" (p. 33).
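The counting argument above can be made concrete with a short sketch. The Python below is only an illustration: the property and category-pair lists are the sample ones given in (1), not Greenberg's full inventories, and the function name `derive_universals` and the wording of the generated statements are assumptions introduced here.

```python
from itertools import product

# Sample lists taken from (1); the full inventories are larger.
PHON_PROPERTIES = [
    "neutralization",
    "higher text frequency",
    "greater phonemic differentiation",
    "greater subphonemic variation",
    "typological implicatum",
    "basic allophone",
]
PHON_PAIRS = [  # (unmarked, marked)
    ("voiceless", "voiced"),
    ("short", "long"),
    ("non-nasal", "nasal"),
    ("unpalatalized", "palatalized"),
    ("non-glottalized", "glottalized"),
    ("unaspirated", "aspirated"),
]


def derive_universals(properties, pairs):
    """Return one candidate universal per (property, category pair)."""
    return [
        f"In all languages, '{unmarked}' rather than '{marked}' "
        f"shows the property: {prop}."
        for prop, (unmarked, marked) in product(properties, pairs)
    ]


universals = derive_universals(PHON_PROPERTIES, PHON_PAIRS)
print(len(universals))  # 6 properties x 6 sample pairs = 36

# Counts reported in the preface for the full inventories.
phonology_total = 7 * 6   # seven pairs, six properties
grammar_total = 27 * 7    # twenty-seven pairs, seven properties
print(phonology_total, grammar_total, phonology_total + grammar_total)
```

Each generated string is a hypothesis to be tested, not an established finding; the sketch only shows how a small set of properties and pairs expands mechanically into a large set of predictions.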
Tables 1-2 show the properties in the rows and the category pairs
in the columns, and the cells (each standing for a universal) indicate
the pages in Language Universals where Greenberg discusses the
relevant universal.
In Language Universals, Greenberg does not even begin to test the
predictions he makes (unlike in his 1963 article, where his 30-lan-
guage sample is a serious beginning). Instead, he limits himself to
making them plausible by pointing to individual examples. For the
most part, the empirical work of testing the predictions on a repre-
sentative sample of the world's languages remains to be done. But it
seems fair to say that by and large, at least the more robust proper-
ties (especially frequency, zero expression, defectivation, syncretism,
irregularity) have been confirmed by subsequent research (however,
Croft 2003 suggests that contextual neutralization and agreement a
potiori may not be valid correlating properties).
Thus, Greenberg's prediction that his results are "unlikely to be
seriously modified by subsequent work" (p. 15) seems to have been
on target. But he was in no way dogmatic about his claims. He notes
counterexamples to the general trend at various points (e. g., the unexpected
behavior of long vowels, p. 22, and of the neuter gender,
p. 40) but is not worried by them because he is interested in the trend
itself and has no reason to assume that the trend should be not only
overwhelming, but also exceptionless.
[Tables 1 and 2: unmarkedness properties (rows) against category pairs (columns, each with its unmarked and marked value); cells give page references in Language Universals. Table contents not legible in this copy.]
Greenberg was also aware that markedness is not an absolute
property, but is often relative to a given context. "For example, whereas
for obstruents, voicing seems clearly the marked characteristic, for
sonants the unvoiced feature has many of the qualities of a marked
category" (p. 24).3 This situation has later become known as "marked-
ness reversal" or "local markedness" (e. g., Mayerthaler 1981; Tiersma
1982). It had apparently gone unnoticed before Greenberg.
Another important innovation of Greenberg's is the scalar conception
of markedness. This means that markedness is not just a binary
opposition "unmarked vs. marked", but that we rather have a scale
from maximally unmarked through moderately marked to maxi-
mally marked, and when comparing two categories, we can (or
rather, have to) say that one is less marked and the other is more
marked.4 Markedness becomes a quantitative concept, which is natu-
ral given that frequency, its most important indicator, is also quanti-
tative. For example, in nominal number, the frequency scale can be
described as "singular (most frequent), plural (less frequent), and
dual (least frequent)" (p. 31). Thus, we have a markedness scale of
number values "singular, plural, dual from the most unmarked to
the most marked" (p. 31). This scalar view of markedness has more
recently also been adopted in generative linguistics, in the form of
fixed constraint rankings in Optimality Theory (Prince and Smolen-
sky 1993; Aissen 1999).
Instead of "scale", Greenberg says "hierarchy", and instead of
"value", he says "feature". This results in "feature hierarchies" in-
stead of "scale of values", and this term (which hardly occurs in the
text) has come to be used in the subtitle "with special reference to
feature hierarchies". A binary markedness relation between two val-
ues is just a special case of a markedness hierarchy of features (or
scale of values).5
Notes
1 On p. 14, Greenberg mentions that Trubetzkoy (1939: 230-41) noted the correlation
between higher text frequency and unmarkedness, but Trubetzkoy (in contrast to Zipf)
did not assign much real significance to text frequency. He explicitly rejected Zipf's
ideas about frequency as a causal factor in phonological simplicity. In a letter to
Jakobson in 1930, he put it bluntly: "statistics are beside the point" (Trubetzkoy 1975:
162, cf. Andersen 1989:21).
2 The properties "dominance" (p. 30) and "agreement a potiori" (p. 31) seem to be
relevant only to number and gender, respectively, so they are not included in the
count here.
3 Notice, incidentally, that Greenberg often used the term "feature" where nowadays
"(feature) value" would be used.
4 As Croft (2003) points out, this is true for most of the correlating properties, but not
for facultative expression and neutralization, so this is another reason for treating
these properties separately.
5 Note that Greenberg's "feature hierarchies" are very different from Silverstein's (1976)
"hierarchy of features", which is a true hierarchy (not a scale) and involves binary
features (i. e., features with two values, plus and minus).
References
Aissen, Judith
1999 Markedness and subject choice in Optimality Theory. Natural Language
and Linguistic Theory 17: 673-711.
Andersen, Henning
1989 Markedness theory - the first 150 years. In Mišeska Tomić, Olga (ed.),
Markedness in Synchrony and Diachrony, 11-46. Berlin: Mouton de
Gruyter.
Barlow, Michael and Suzanne Kemmer (eds.)
2000 Usage-Based Models of Language. Stanford: CSLI Publications.
Bod, Rens, Jennifer Hay, and Stefanie Jannedy (eds.)
2003 Probabilistic Linguistics. Cambridge, Mass.: MIT Press.
Bybee, Joan L. and Paul Hopper (eds.)
2001 Frequency and the Emergence of Linguistic Structure. Amsterdam: Benja-
mins.
Chomsky, Noam and Morris Halle
1968 The Sound Pattern of English. New York: Harper & Row.
Croft, William
1990 Typology and Universals. Cambridge: Cambridge University Press.
2003 Typology and Universals. 2nd ed. Cambridge: Cambridge University
Press.
Greenberg, Joseph H.
1963 Some universals of grammar with particular reference to the order of
meaningful elements. In Greenberg, Joseph H. (ed.), Universals of
Language, 73-113. Cambridge, MA: MIT Press.
Haspelmath, Martin
2005 Against markedness (and what to replace it with). Ms., Max-Planck-
Institute for Evolutionary Anthropology, Leipzig.
Lehmann, Christian
1989 Markedness and grammaticalization. In Mišeska Tomić, Olga (ed.),
Markedness in Synchrony and Diachrony, 175-90. Berlin: Mouton de
Gruyter.
Mayerthaler, Willi
1981 Morphologische Natürlichkeit. Wiesbaden: Athenaion.
McCarthy, John J.
2002 A Thematic Guide to Optimality Theory. Cambridge: Cambridge University
Press.
Prince, Alan and Paul Smolensky
1993 Optimality Theory: Constraint Interaction in Generative Grammar. (Tech-
nical report, Rutgers University Center for Cognitive Science) Rutgers
University.
Silverstein, Michael
1976 Hierarchy of features and ergativity. In Dixon, R. M. W. (ed.), Gram-
matical Categories in Australian Languages, 112-71. Canberra: Austral-
ian Institute of Aboriginal Studies.
Tiersma, Peter
1982 Local and general markedness. Language 58: 832-49.
Trubetzkoy, Nikolaj
1939 Grundzüge der Phonologie. Göttingen: Vandenhoeck & Ruprecht.
1975 Letters and notes. The Hague: Mouton.
Zipf, George K.
1935 The Psycho-Biology of Language: An Introduction to Dynamic Philology.
Houghton Mifflin. (Republished 1965 by MIT Press.)
1949 Human Behavior and the Principle of Least Effort: An Introduction to
Human Ecology. Cambridge, MA: Addison-Wesley.
Joseph H. Greenberg
May 28, 1915 - May 7, 2001
PREFACE
The work presented here is a somewhat revised and expanded
version of the paper "Language Universals" which appeared in
Volume III of Current Trends in Linguistics. Like most of the other
papers in that volume, it arose out of a group of four lectures
delivered in the Forum series of the Linguistic Institute at Indiana
University held during the summer of 1964. Discussion with
students and staff at the Institute and comments from colleagues
regarding the version later submitted to Current Trends have been
of value to me in preparing the present version. I wish to express
my appreciation of all those whose reaction to the paper in its
earlier form assisted me in the present revision. I wish particularly
to express my gratitude to Professor Cornelius van Schooneveld
for his assistance in carrying through arrangements for the publica-
tion in its present form.
TABLE OF CONTENTS
Preface 5
1. Introduction: Marked and Unmarked Categories 9
2. Phonology 13
3. Grammar and Lexicon 25
4. Common Characteristics in Phonology, Grammar, and Lexicon 56
5. Universals of Kinship Terminology 72
References 88
INTRODUCTION
MARKED AND UNMARKED CATEGORIES
The problem of universals in the study of human language as in
that of human culture in general concerns the possibility of generalizations
which have as their scope all languages or all cultures.
The question is whether underlying the diversities which are
observable with relative ease there exist valid general principles.
Such invariants would serve to specify in a precise manner the
notion of 'human nature' whether in language or in other aspects
of human behavior. They would, in effect, on the lowest level
correspond to the 'empirical generalization' of the natural sciences.
On higher levels they might be dignified by the name of laws. The
search for universals, therefore, coincides on this view with the
search for laws of human behavior, in the present context more
specifically those of linguistic behavior.
It was pointed out in an earlier study that for a statement about
language to be considered fully general it is sufficient that it have
as its logical scope the set of all languages.1 The logical form may
vary. It is typically, though not invariably, implicational. For
all values of X, if X is a language, then, if it contains some feature
α, it always contains some further feature β, but not necessarily
vice-versa. Statements of this form, it is maintained, satisfy all of
the usual requirements for fully general statements. The logical
equivalence of such statements to certain typological ones has
also been indicated.2 Thus if all languages with the feature α also
have β, then a typology defined by the four logically possible types
produced by the combinations of α and not-α with β and not-β
(i.e. 1. Languages with both α and β, 2. Languages with α and
with not-β, 3. Languages with not-α and β, 4. Languages with
not-α and not-β) when applied to the empirically existing languages
will give the following result. One of the types, namely α and
not-β, will have no members since if a language has α, ex hypothesi
it always has β also, and thus never not-β.
1 J. H. Greenberg, J. J. Jenkins, and C. E. Osgood, "Memorandum concerning
language universals", Universals of language, ed. J. H. Greenberg, 258 (Cambridge,
Mass., 1963).
2 J. H. Greenberg, ed., "Introduction", Universals of language (Cambridge,
Mass., 1963).
It may be pointed out that the unrestricted (non-conditional)
universals can be considered a logically limiting case in which
a single feature only is involved. In this case there are two typological
classes, languages with α and languages with not-α, and
the latter class has no members.
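The equivalence between an implicational universal and an empty cell in the four-way typology can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the language names and feature sets in `SAMPLE` are invented toy data, and the helper names `typology` and `implication_holds` are hypothetical, not drawn from the text.

```python
from collections import Counter


def typology(languages, a, b):
    """Count languages in each of the four types defined by features a and b.

    Keys are (has_a, has_b) pairs of booleans.
    """
    counts = Counter()
    for features in languages.values():
        counts[(a in features, b in features)] += 1
    return counts


def implication_holds(languages, a, b):
    """True if no language in the sample has feature a without feature b."""
    return typology(languages, a, b)[(True, False)] == 0


# Hypothetical toy sample: names and feature sets are invented.
SAMPLE = {
    "Lang1": {"glottalized", "plain"},
    "Lang2": {"plain"},
    "Lang3": {"glottalized", "plain"},
    "Lang4": set(),
}

print(implication_holds(SAMPLE, "glottalized", "plain"))  # True
print(implication_holds(SAMPLE, "plain", "glottalized"))  # False
```

The second call fails because Lang2 has the second feature without the first, which is the "not necessarily vice-versa" part of the implicational form. A check of this kind can only falsify a universal on a given sample; an empty cell in a small sample does not establish it.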
Though in previous studies all of the generalizations stated have
been synchronic in nature, it has been proposed that some connec-
tions between diachronic process and synchronic regularities must
exist since no change can produce a synchronically unlawful state
and all synchronic states are the outcome of diachronic processes.
In the present study, which is frankly speculative and exploratory,
the questions just mentioned are the subject of further investigation.
The topic of universals is here approached through the consideration
of a single, but as it will turn out, rich and complex set of notions,
those pertaining to marked and unmarked categories.
What at first might seem very limited subject matter in relation
to the more general one of universals, will in fact lead to the pro-
posing of a considerable number of specific universals. The
concept of the marked and unmarked will be shown to possess a
high degree of generality in that it is applicable to the phonological,
the grammatical, and the semantic aspects of language. Moreover,
the topic is of such a nature that it will afford the opportunity of
illustrating from concrete materials a number of the general
methodological problems already mentioned: the relation between
typology and universals; the relation of synchronic regularities
to diachronic processes; and the problems of levels of generalization.
In particular, it will be shown that the concept of marked and
unmarked categories provides the possibility of formulating higher
level hypotheses with deductive consequences in the form of more
specific universals commonly arrived at by a more purely empirical
consideration of the evidence. Moreover, as is usual in such cases,
it will in certain instances suggest hypotheses which might not
have occurred to the investigator outside of the more inclusive
theory.
In the final section, a specific application of this kind is made
to the highly organized semantic area of kinship terminologies.
Although this subject matter no doubt presents a more systematic
semantic structure than is to be found by and large in language,
nevertheless it is reasonable to suppose that the principles to be
found here are operative elsewhere to the extent that a similar,
even if usually lesser, degree of such organization is to be found.
The idea of marked and unmarked categories is chiefly familiar
to linguists from Prague school phonology. The best known
instance is doubtless Trubetzkoy's classic Grundzüge der Phonologie,
in which the notion plays an important role.3 In "Signe zéro"
and other writings Jakobson showed that these ideas could be
applied to the study of grammatical categories and to semantics.4
In the present study we shall be chiefly concerned with the following
problems: what, if any, are the common features which would
justify the equating of the concept of unmarked and marked
categories in fields as diverse as phonology, grammar, and semantics?
Is it possible to isolate some one characteristic which
might serve as definitional for this notion which tends to take on
Protean shapes? What is the connection between marked and
unmarked categories and universals? In the discussion of these
subjects, applications will first be considered in phonology and
then in grammar and semantics. The treatment will be at least
partly in terms of the history of the subject, but it should be understood
that the historical material is purely illustrative and merely
incidental to the main purpose.
3 N. S. Trubetzkoy, Grundzüge der Phonologie (Prague, 1939). It is not the
purpose here to give a detailed historical account. The first occurrence of the
terminology marked and unmarked (in phonology) appears to be by Trubetzkoy
in 1931, "Die phonologischen Systeme", TCLP, 4, 96-116, especially p. 97.
The first explicit use of this terminology for grammatical categories is probably
by Jakobson in "Zur Struktur des russischen Verbums", Charisteria Guilelmo
Mathesio ... 74-84 (Prague, 1932). Cf. also with a different terminology
Hjelmslev in La Catégorie des Cas (Aarhus, 1935), particularly p. 113. Earlier
adumbrations of these ideas in reference to inflectional categories are to be
found in certain Russian grammarians, e.g. Peshkovskij, Karcevskij.
4 R. Jakobson, "Signe zéro", in Mélanges Bally 143ff. (Geneva, 1939).
PHONOLOGY
The first use of the concept of marked and unmarked categories
was in Prague school phonology. It arose in the context of the
problem of neutralization and the archiphoneme.
It was noted that in certain environments the contrast between
correlative sets was neutralized in that both could not occur.
By correlative set is meant a group of phonemes, usually two in
number, which differ only in a single feature of the same category
(e.g. voice, when one is unvoiced and the other voiced) and whose
remaining shared features are not found in any other set. Thus,
in English b and p are a correlative pair since they differ in voicing
only and in regard to their remaining features they are the only
non-nasal bilabial stops. In environments in which they do not
contrast, the representative of the so-called archiphoneme, that is,
the unit defined by the common features, may either be externally
determined, that is, be conditioned by adjacent phonemes, or be
internally determined. This last case is one in which a single
phoneme always appears regardless of the environing sounds. A
good example of external determination, found in many languages,
is the neutralization of the contrast among nasals before stops
where the choice is determined by the following homorganic con-
sonant. A commonly cited example of internal conditioning is the
neutralization of voice in final position for obstruents in German.
Here it is always the unvoiced phoneme which appears regardless
of the environment. The choice is thus internally conditioned.
Another well-known example is classical Sanskrit where, in sentence
final the opposition among voiced and unvoiced stops and aspirated
and non-aspirated stops is neutralized and the unvoiced, unaspirated
phoneme appears as the representative of the archiphoneme.
Although, in principle, no doubt, neutralization is viewed as a
phenomenon specific to each language, one cannot help noting
that in different languages it is generally the same category which
appears in the position of neutralization. Thus in both German
and Sanskrit it is the unvoiced member of the unvoiced/voiced
opposition which is found. The feature which occurs in such
instances is called the unmarked feature and the other the marked.
Thus voicing is a marked feature; unvoicing an unmarked feature
in German. Again for Sanskrit the unaspirated feature and the
unvoiced feature are both unmarked as against the aspirated and
voiced which are marked features. It may be noted in passing that
in both these instances the unmarked feature is described phoneti-
cally by a term itself having a negative prefix un- while the marked
feature lacks it. This turns out to be generally true. Thus nasality
is a marked feature while non-nasality is the corresponding un-
marked feature. It is as though the marked feature is a positive
something, e.g. nasality, aspiration, while the unmarked feature
is merely its lack. This aspect, not explicitly noted by Trubetzkoy,
will reappear very importantly in our later consideration of marked
and unmarked features in grammar and lexicon.
Another important characteristic of unmarked and marked cate-
gories noted by Trubetzkoy is that of text frequency. In general
the unmarked category has higher frequency than the marked.
It is of some interest to note that George K. Zipf, in his pioneering
studies of language frequency phenomena, had arrived at the same
hypotheses by a different, but, as can be shown, ultimately related
route, and some of his results are quoted by Trubetzkoy. For,
if the marked feature contains something which is absent from the
unmarked, it is relatively more complex and by Zipf's well known
principle of least effort the more complex should be used less
frequently.1 Most of Zipf's data refer to the categories of voiced
and unvoiced consonants and aspirated and unaspirated consonants.
He also cites data regarding vowel length from Icelandic on the
assumption that long vowels are more complex than the corresponding
short vowels, that is, that length is the marked feature.
In general Zipf's hypotheses regarding aspirated and voiced consonants
hold although there are a few exceptions. It may be noted
that Ferguson's hypothesis regarding the relatively greater text
frequency of non-nasal over nasal vowels is consonant with the
general thesis of the greater frequency of unmarked features.2
1
G. K. Zipf, especially Psychobiology of language (Boston, 1935) and Human
behavior and the principle of least effort (Cambridge, Mass., 1949).
Additional data on the less frequently considered cases of marked
and unmarked phonologic features compiled by myself are presented
here, along with some evidence already published in other sources
and cited here for purposes of comparison. My own data are to
be considered tentative insofar as the samples are small, usually
1000 phonemes. The results, nevertheless, are obviously significant
and unlikely to be seriously modified by subsequent work. The
following are examples of counts, all done by myself, on the relative
frequency of glottalic and non-glottalic consonants in the following
languages: Hausa, in West Africa, and the Amerind languages
Klamath, Coos, Yurok, Chiricahua Apache, and Maidu. In the
case of Hausa, voiced implosives contrast with ordinary voiced
consonants in the pairs b/ɓ and d/ɗ, and glottalized consonants
contrast with non-glottalized in the pairs k/k', s/s' (in Kano and
some other dialects usually ts'), and y/'y. In Maidu voiced im-
plosives as well as glottalized contrast with ordinary unvoiced
consonants in certain positions. In the other languages a single
series of unvoiced, unglottalized consonants occurs, but for
Chiricahua I have counted the three series unaspirated, aspirated,
and glottalized. The results for each language are found in Tables
I, II, III, IV, V and VI, and the results for the six languages are
summarized and compared in Table VII.3
2
C. A. Ferguson, "Assumptions about nasals; a sample study in phonological
universals", in Universals of language ed. J. H. Greenberg 46 (Cambridge,
Mass., 1963).
3
The Hausa count consists of the first 1000 phonemes on pages 1, 5 and 9 of
R. C. Abraham Hausa literature and the Hausa sound system (London, 1959)
[Greenberg]; Klamath from M. A. R. Barker, Klamath texts (Berkeley and
Los Angeles, 1963), first 1000 on pages 6, 16 and 26 [Greenberg]; Coos from
L. Frachtenberg, Coos texts (Leyden, 1913) first 1000 from pages 5, 7, 14, 17,
20 and 24 (14 from commencement of new story on middle of page) [Greenberg];
Yurok from R. H. Robins, The Yurok language (Berkeley and Los Angeles,
TABLE I
Hausa (1000 phonemes)
b   17.0    ɓ    00.2
d   19.8    ɗ    03.7
k   21.9    k'   02.8
s   14.2    ts'  00.3
y   19.3    'y   00.8

TABLE II
Klamath (1000 phonemes)
p   02.8    b   01.8    p'  00.3
t   07.6    d   04.7    t'  01.9
    08.7    j   00.2    '   01.5
k   10.4    g   06.1    k'  02.1
q   02.4    g   02.4    q'  01.9
l   05.4    L   00.5        00.7
m   04.0    M   00.4    m'  01.3
n   13.9    N   00.1    n'  00.8
w   08.3    W   00.2    w'  00.4
y   08.6    Y   00.0    y'  00.6
TABLE III
Coos (1000 phonemes)
p   02.9    p'   00.0
t   23.9    t'   01.1
ts  12.8    ts'  00.0
C   15.8    C'   01.9
k   03.8    k'   01.0
k   07.7    k'   02.0
q   09.9    q'   00.6
l   11.4         05.2

TABLE IV
Yurok (1000 phonemes)
p   08.9    p'   01.0
t   14.3    t'   00.2
c   12.9    c'   00.8
k   38.3    k'   10.1
kʷ  11.4    kʷ'  02.1
1958) first 1000 on pages 162, 164 and 166 [Greenberg]; Chiricahua from
H. Hoijer, Chiricahua and Mescalero Apache texts (Chicago, 1938), first 1000
on pages 5,10,15,20,23 and 25 [Greenberg]; Maidu from W. F. Shipley, Maidu
texts and dictionary (Berkeley and Los Angeles, 1963) first 1000 on pages 10,
20, 30 and 40 [Greenberg].
TABLE V
Chiricahua (1000 phonemes)
unaspirated/unglottalized    aspirated    glottalized
d   28.2                     t   05.3     t'  03.3
z   03.0                     c   05.8     c'  00.0
Z   07.7                         01.8     C'  02.8
    02.3                         00.1     '   01.2
g   21.4                     k   13.4     k'  03.7

TABLE VI
Maidu (1000 phonemes)
p   09.3    p'   00.5    ɓ   05.9
t   19.6    t'   01.4    ɗ   13.1
ts  00.2    ts'  19.9
k   19.2    k'   11.4
TABLE VII
Summary
Hausa: non-glottalic 92.2; glottalic 07.8
Klamath: unvoiced stops/voiced sonants 72.1; voiced stops/unvoiced sonants 16.4; glottalized 11.5
Coos: unvoiced non-glottalized 88.2; glottalized 11.8
Yurok: unvoiced non-glottalized 85.8; glottalized 14.2
Chiricahua: unaspirated 62.6; aspirated 26.4; glottalized 11.0
Maidu: unvoiced unglottalized 48.3; glottalized 32.7; implosive 19.0
The material just cited displays a decisively greater over-all
frequency for non-glottalized over glottalized consonants and, in
the case of Chiricahua, also the unaspirated over aspirated as is
evident from the summary in Table VII. Further, with the single
exception of the Maidu pair ts/ts' this relationship holds for every
single pair. In Chiricahua where the second and third column
consist of consonants with the different marked features aspiration
and glottalization the usual hierarchy is unaspirated, aspirated,
glottalized as though glottalization were an even more marked
feature than aspiration. However, for the sets with the lowest
over-all frequency /' and /' this is reversed. There is, however,
no exception in Chiricahua to the rule that in each set the consonant
in the first column is more frequent than that in either the second
or third.
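The column totals reported in Table VII, and the pair-by-pair relationship just described, can be recomputed from the individual tables. The following is only a minimal sketch in Python and is not part of the original text. The names TABLES, column_totals and unmarked_dominates are assumptions introduced for illustration. Only the figures of Tables I, III and IV are transcribed, and rows are identified by position rather than by symbol because several symbols are illegible in this copy.

```python
# Figures transcribed from Table I (Hausa), Table III (Coos) and
# Table IV (Yurok). Each tuple is one pair in table order:
# (non-glottalic figure, glottalic figure).
TABLES = {
    "Hausa": [(17.0, 0.2), (19.8, 3.7), (21.9, 2.8), (14.2, 0.3), (19.3, 0.8)],
    "Coos": [(2.9, 0.0), (23.9, 1.1), (12.8, 0.0), (15.8, 1.9),
             (3.8, 1.0), (7.7, 2.0), (9.9, 0.6), (11.4, 5.2)],
    "Yurok": [(8.9, 1.0), (14.3, 0.2), (12.9, 0.8), (38.3, 10.1), (11.4, 2.1)],
}


def column_totals(rows):
    """Sum each column and round to one decimal, as in Table VII."""
    return [round(sum(column), 1) for column in zip(*rows)]


def unmarked_dominates(rows):
    """True if the first (unmarked) figure is the larger one in every pair."""
    return all(unmarked > marked for unmarked, marked in rows)


for language, rows in TABLES.items():
    print(language, column_totals(rows), unmarked_dominates(rows))
```

For these three languages the totals agree with the corresponding lines of Table VII, and the unmarked member is the more frequent one in every pair.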
For vowel nasalization, Ferguson and Chowdhury report a short
count on Bengali vowels in which the ratio of non-nasalized to
nasalized vowels was 50:1. I counted the first thousand vowels
in Stendhal's Le rouge et le noir and found 82.5% oral vowels to
17.5% nasal.4 In connection with vowel length, further data are
given below for Chiricahua Apache in which the vowel system
involves both length and nasalization. Here the ratio of oral
vowels, whether short or long, to nasalized vowels, whether short
or long, was 12.8:1.
Data are now presented for vowel length for Icelandic, Sanskrit,
Czech, Hungarian, Finnish, Karok, and Chiricahua.5
TABLE VIII
Icelandic
a   9.724    a:   1.560
e   5.728    e:   1.612
i   7.028    i:   1.476
    1.488     :    .632
u   4.336    u:    .524
     .600

TABLE IX
Sanskrit
a   19.78    a:    8.19
i    5.85    i:    1.19
u    2.61    u:     .73
e    2.84    a:i    .51
o    1.88    a:u    .18
4
C. A. Ferguson and M. Chowdhury, "The phonemes of Bengali", Language,
36, 22-59 (1960). A. Valdman, in "Les bases statistiques de l'antériorité
articulatoire du français", Le Français Moderne 27.102-10 (1959) reports a frequency
of 16.2% for nasal vowels in a sample of 12,144 from a variety of oral texts.
5
Icelandic, reported in G. K. Zipf, Psychobiology of language 318 (Boston,
1935), (sample size 25,000); Sanskrit, W. D. Whitney, Sanskrit grammar 26
(Cambridge, Mass., 1889) (sample size 10,000); Czech, H. Kucera, "Entropy,
redundancy and functional load in Russian and Czech", American contributions
to the 5th International Congress of Slavists 191-218 (Sofia, 1963), (sample
size 100,000); Hungarian, J. Lotz, "Vowel frequency in Hungarian", Word
8.227-35 (1952) (17,760 from writings of Petőfi); Finnish, count by J. H.
Greenberg from R. Austerlitz, Finnish reader and glossary (The Hague, 1963)
(first 5000 in pages 1, 3, 6 and 16); Karok by J. H. Greenberg from W. Bright,
The Karok language (Berkeley and Los Angeles, 1957) (first 1000 on pages
162 and 182); Chiricahua, by J. H. Greenberg from H. Hoijer, Chiricahua and
Mescalero Apache texts (Chicago, 1938) (first 1000 from pages 5, 10 and 15).
TABLE X
Czech
a   6.83    a:   2.08
e   9.40    e:   1.11
i   6.49    i:   3.70
o   8.24    o:   0.00
u   3.07    u:   0.60

TABLE XI
Hungarian
a   22.48    a:   11.62
e   26.64    e:   07.57
i   09.30    i:   01.06
o   11.00    o:   01.95
u   02.63    u:   00.70
    02.95     :   01.62
    01.08     :   00.32

TABLE XII
Finnish
a   21.4    a:   03.0
e   16.7    e:   01.4
i   23.7    i:   01.3
o   11.3    o:   00.2
u   07.9    u:   00.8
ä   07.5    ä:   01.3
    00.3     :   00.0
    02.9     :   00.3

TABLE XIII
Karok (1000 phonemes)
a   40.3    a:   7.2
            e:   3.2
i   20.6    i:   3.1
            o:   2.1
u   19.1    u:   2.1

TABLE XIV
Chiricahua
a   31.8    a:   06.8    ą   00.8    ą:   00.7
e   07.7    e:   05.0    ę   00.0    ę:   00.0
i   19.4    i:   02.5    į   03.5    į:   00.9
o   08.1    o:   04.4    ǫ   00.8    ǫ:   00.1
vocalic nasals: n 07.4; n: 00.1
For Icelandic, Sanskrit, and Czech the frequencies have been given
in reference to the entire set of phonemes; for the others the
percentages are of the vowel total. In Table XV the figures are all
reduced to percentages of vowel occurrences:
TABLE XV
Icelandic: short vowels 83.3; long vowels 16.7
Sanskrit: short vowels 74.8; long vowels 25.2
Czech: short vowels 82.0; long vowels 18.0
Hungarian: short vowels 75.2; long vowels 24.8
Finnish: short vowels 91.7; long vowels 08.3
Karok: short vowels 80.0; long vowels 20.0
Chiricahua:
short non-nasal vowels 67.0; long non-nasal vowels 18.7;
short nasal vowels 05.1; long nasal vowels 01.7;
short syllabic nasal 07.4; long syllabic nasal 00.1
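The reduction to percentages of vowel occurrences is a simple renormalization. As a hedged illustration only, the sketch below applies it to the Czech figures of Table X, which are percentages of all phonemes. The function name vowel_shares is hypothetical and not taken from the text.

```python
# Czech figures from Table X, given as percentages of ALL phonemes.
czech_short = [6.83, 9.40, 6.49, 8.24, 3.07]
czech_long = [2.08, 1.11, 3.70, 0.00, 0.60]


def vowel_shares(short, long):
    """Re-express the short and long totals as percentages of all vowels."""
    total = sum(short) + sum(long)
    return (round(100 * sum(short) / total, 1),
            round(100 * sum(long) / total, 1))


print(vowel_shares(czech_short, czech_long))  # (82.0, 18.0)
```

The result agrees with the Czech line of Table XV.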
The Chiricahua results are particularly noteworthy since we have
two marked features, length and nasality, in combination, and
the prediction is borne out that the vowels with two unmarked
features should show the highest frequency, and those with the
two marked features, the lowest frequency, while the other two
categories have intermediate values.
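The prediction can be checked directly against Table XIV. The sketch below is an illustration under stated assumptions and not the author's procedure: each category is keyed by a hypothetical pair (is_long, is_nasal), the syllabic nasals are left out, and the categories are ranked by their summed frequency.

```python
# Chiricahua figures from Table XIV (percent of vowel occurrences),
# keyed by (is_long, is_nasal). Syllabic nasals are omitted.
table_xiv = {
    (False, False): [31.8, 7.7, 19.4, 8.1],  # short oral
    (True, False): [6.8, 5.0, 2.5, 4.4],     # long oral
    (False, True): [0.8, 0.0, 3.5, 0.8],     # short nasal
    (True, True): [0.7, 0.0, 0.9, 0.1],      # long nasal
}

totals = {key: round(sum(values), 1) for key, values in table_xiv.items()}


def marked_count(key):
    """Number of marked features (length, nasality) in a category."""
    return sum(key)


# Rank the categories from most to least frequent.
ranked = sorted(totals, key=totals.get, reverse=True)
for key in ranked:
    print(marked_count(key), key, totals[key])
```

The category with no marked feature comes first, the category with both marked features comes last, and the two categories with one marked feature fall in between, as the text states.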
As a final piece of evidence for the normally higher frequency
of unmarked features, data concerning the palatalized and un-
palatalized consonants of Russian are presented in Table XVI.
TABLE XVI
100 samples, 1000 each
p   23.090    p'    4.750
b   10.960    b'    3.680
f    9.470    f'     .590
v   29.780    v'   10.160
t   42.660    t'   18.850
d   16.650    d'   10.390
s   30.930    s'   18.630
k   31.750    k'    5.340
m   23.170    m'    8.050
n   41.000    n'   22.970
l   26.640    l'   20.810
r   29.070    r'   13.810
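As a rough check on Table XVI, the pairs can be compared one by one. This is a minimal sketch: the variable names are assumptions, and the pairs are listed in table order without their symbols.

```python
# (unpalatalized, palatalized) figures from Table XVI, in table order.
pairs = [
    (23.090, 4.750), (10.960, 3.680), (9.470, 0.590), (29.780, 10.160),
    (42.660, 18.850), (16.650, 10.390), (30.930, 18.630), (31.750, 5.340),
    (23.170, 8.050), (41.000, 22.970), (26.640, 20.810), (29.070, 13.810),
]

# Does the unpalatalized member exceed its palatalized partner everywhere?
all_hold = all(plain > palatalized for plain, palatalized in pairs)

# Share of the unpalatalized members in the combined total of these pairs.
plain_total = sum(plain for plain, _ in pairs)
grand_total = sum(plain + palatalized for plain, palatalized in pairs)
plain_share = round(100 * plain_total / grand_total, 1)

print(all_hold, plain_share)
```

Every pair in the table shows the unpalatalized consonant as the more frequent member.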
Thus far two characteristics of unmarked features have been considered,
appearance in internally conditioned neutralization and
higher frequency. Certain additional observations may be offered.
For example, Hockett notes two criteria of the unmarked member,
which he calls the simple as opposed to the complex.6 The first,
wider distribution in terms of environment, seems to resolve itself
into the already mentioned phenomenon of neutralization. For
insofar as the unmarked (simple) member has a wider distribution
it appears in environments in which the marked (complex) member
does not. Such environments are, precisely, environments of
neutralization in which the unmarked member appears as the
representative of the set. The other, however, is genuinely new,
namely, the greater variety of subphonemic variation. Thus in the
example mentioned by Hockett, the stops in Nootka, 'the unglottalized
stops are unaspirated before vowels, aspirated finally or
before consonants; the glottalized stops and the spirants show no
such variations.' From this, the conclusion is drawn that the
unglottalized stops are the unmarked or simple series.
Further, a connection exists, as a general rule, between the more
basic, that is, the implied feature in universal implicational state-
ments of phonology, and the unmarked feature. For example in
statements of the type that in any language the number of phonemes
with a particular feature is never greater than the number of
phonemes with some other feature, it generally seems to be the set
characterized by the marked feature which is less than or equal in
number to the set with the unmarked feature. Thus in Ferguson's
already cited statement that the number of nasal vowels is never
greater than the number of non-nasal vowels, it is nasality which
is the marked feature. It will be noted that other evidence for the
marked status of nasal vowels, of a frequency nature, has already
been adduced. The weaker form of such statements, namely that
all languages with nasal vowels have oral vowels, will, of course,
also have the marked feature as implicans and the unmarked feature
as implication: nasality implies non-nasality in vowels. There is
one fairly common exception to this general principle, namely
vowel length. It is not unusual for the number of long vowels to
be larger than the number of short vowels. The usual type of
exception is one in which mid front and back vowels e: and o: exist
without a short partner, as in Karok cited above, in many colloquial
forms of Arabic, and elsewhere.7 These quite possibly always arise
from the monophthongization of the diphthongs ai and au as is
known to have occurred in Arabic.
6
C. F. Hockett, Manual of phonology 166-7 (Baltimore, 1955).
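The two forms of implicational statement discussed above, and the vowel-length exception, can be made concrete with small inventories. The sketch below is an illustration only. The function names are hypothetical, the Karok inventory follows Table XIII, and the Chiricahua nasal vowels are written with placeholder capitals because only their number matters here.

```python
# Vowel inventories as sets of symbols. Karok follows Table XIII; the
# Chiricahua nasal symbols are placeholders (only their count matters).
karok = {
    "short": {"a", "i", "u"},
    "long": {"a:", "e:", "i:", "o:", "u:"},
}
chiricahua = {
    "oral": {"a", "e", "i", "o", "a:", "e:", "i:", "o:"},
    "nasal": {"A", "E", "I", "O", "A:", "E:", "I:", "O:"},
}


def count_statement_holds(inventory, marked, unmarked):
    """Stronger form: marked phonemes never outnumber unmarked ones."""
    return len(inventory[marked]) <= len(inventory[unmarked])


def implication_holds(inventory, marked, unmarked):
    """Weaker form: the presence of marked phonemes implies unmarked ones."""
    return not inventory[marked] or bool(inventory[unmarked])


print(count_statement_holds(chiricahua, "nasal", "oral"))  # nasality
print(count_statement_holds(karok, "long", "short"))       # length
print(implication_holds(karok, "long", "short"))
```

For Karok the stronger statement fails for length, which is the exception noted in the text, while the weaker implication still holds.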
Again it will be found that in generalizing statements regarding
sound sequences it is usually the unmarked feature which figures
in the implication of conditional statements. Thus in the statement
that the existence of clusters containing at least one glottalized
member implies the existence of clusters containing exclusively
non-glottalized members, it is the unmarked feature, non-glottalized,
which is the implied one.8
One further phonological peculiarity of the marked/unmarked
opposition may be noted. In some cases, a particular allophone of a
phoneme may be looked upon as basic compared to one or more
others. I believe that it will always turn out that the basic variant
is the most frequent, but since frequency counts are always made
in terms of phonemes, rather than allophones, there are no available
data. An alternative method for accounting for this choice is that
the non-basic allophone occurs in environments which share
specific features with the allophone, i.e. are assimilative, while the
basic allophone is independent of the phonetic nature of its environment.
It is hypothesized that the non-basic allophone differs from
the basic allophone by possession of a marked feature, while the
basic allophone has the corresponding unmarked feature.
7
However, in certain cases at least more detailed consideration may suggest
a different analysis. Thus Egyptian Arabic is usually described in terms of a
vowel system a, i, u, a:, i:, u:, e:, o: but according to T. F. Mitchell, An introduction
to colloquial Arabic 112 (London, 1958) there is a qualitative distinction
in unstressed syllables between i proper and i in alternation with the i: of
stressed syllables and similarly for some speakers with u. Besides this, shortened
forms of e: and o: exist in unstressed syllables. Phonetically then there would
be six (or for some speakers seven) short vowels as against five long vowels.
Perhaps the system can be interpreted as qualitative rather than quantitative.
On these questions, cf. C. F. Hockett, Manual of phonology 76-8 (Baltimore,
1955).
8
J. H. Greenberg, "Nekotoryje obobščenija, kasajuščijesja vozmožnyx
načal'nyx i konečnyx posledovatel'nostej soglasnyx", Voprosy Jazykoznanija,
4.41-65 (1964).
An example is conversational Dutch where, to quote Daniel
Jones, "the sound g exists, but only before voiced consonants, e.g.
before d in [zagduk] (zakdoek, handkerchief) ... g is therefore a
member of the k phoneme in that language".9 It is psychologically
interesting that Jones here identifies the phoneme with its basic
allophone, the unvoiced k as against the marked and assimilative
voiced allophone g. I believe that in general where phonemicists
are forced to a choice of symbols for a phoneme with a number of
differing allophones, they choose one which phonetically represents
the unmarked feature.
It may be further noted that the concept of the basic allophone
is closely related on the allophonic level to that of the internally
conditioned representative of the archiphoneme on the phonemic level.
If, for example, k and g contrast in certain classes of environments
they are separate phonemes. If in some other environment
/k/ appears as the representative of the archiphoneme where its
occurrence is not determined by the unvoiced nature of the environment,
it is internally conditioned. If [k] and [g] do not contrast in
any environment they are allophones of the same phoneme but
the (usually unmarked) allophone which is independent of the
phonetic nature of the environment is the basic allophone as here
defined.
The characteristics of the marked/unmarked oppositions in
phonology have, no doubt, been drawn in very broad strokes. It is
in order, however, to call attention to the fact that certain opposi-
tions show these characteristics more clearly and coherently than
others, e.g. the glottalized/non-glottalized opposition in consonants
and the nasal/oral opposition in vowels, while others, e.g. aspiration
in consonants and length in vowels, are less consistent. However,
it is reasonable to assert a general predominance of evidence for
the view presented here. It is also realized that the criteria of the
marked/unmarked oppositions have been presented very much as
an empirically arrived at cluster. To what extent there exists an
inner connection among these and to what extent they are logically
independent has not been treated. This matter will be taken up later
after the problem of the marked and unmarked in grammar and
semantics has also been considered. One additional observation
may be offered before going on to these other topics. It should be
noted that in some cases we had what might be called conditional
categories for marked and unmarked. For example, whereas for
obstruents, voicing seems clearly the marked characteristic, for
sonants the unvoiced feature has many of the qualities of a marked
category.
9
D. Jones, The phoneme: its nature and use 20 (Cambridge, Mass., 1962).
GRAMMAR AND LEXICON
As was noted earlier, Jakobson in his article "Signe Zéro" indicated
the wide applicability of the marked/unmarked concept already
current at that time in phonology. In a much later formulation an
over-all definition is attempted. "The general meaning of a marked
category states the presence of a certain property A; the general
meaning of the corresponding unmarked category states nothing
about the presence of A and is used chiefly but not exclusively
to indicate the absence of A."1 This definition may be illustrated
by a number of examples. In English 'man' has two meanings, the
masculine being the unmarked category. Thus, in Jakobson's
terms, 'woman' states the presence of the marked category,
'feminine', while 'man' is used chiefly but not exclusively to indicate
the absence of 'feminine'. 'Man' thus has two meanings, to indicate
the explicit absence of 'feminine' in the meaning 'male human being'
but also to indicate 'human being' in general. It is this ambiguity
of 'man' which is exploited by Shakespeare in Hamlet's interview
with Rosencrantz and Guildenstern when he says, "No! Man
delights not me, nor woman neither, though by your smiling you
seem to say so." In the earlier part of his speech Hamlet had de-
scribed his reactions to nature so that the opposition first in mind
is nature vs. man, but immediately the possibility of the opposition
man vs. woman also occurs to Hamlet. The pervasive nature in
human thinking of this tendency to take one of the members
of an oppositional category as unmarked so that it represents
either the entire category or par excellence the opposite member to
the marked category can be shown to operate even within the
austere confines of mathematical and logical symbolism. Thus
negative is always taken as the marked member of the positive-negative
opposition; -5 is always negative, but 5 by itself is either
the absolute value of 5, that is 5 abstracted from its sign value, or
+5 as the opposite of the marked negative category. So, in logic, p
was used ambiguously either as the proposition p abstracted from
its truth value as either true or false or, on the other hand, for the
assertion of the truth of p. Note that logicians use the term 'truth
value', involving the unmarked member, not 'falsity value' to express
the over-all category which has truth and falsity as members so
that, as usual, the unmarked member stands for the whole category
in the position of neutralization.
1
R. Jakobson, Shifters, verbal categories and the Russian verb 5 (Cambridge,
Mass., 1957).
It is probable that the observation widely reported in the ethno-
graphic literature that the term for 'human being' and the tribal
name are the same words is not to be set down to feelings of tribal
superiority as is often assumed, but is rather an example of the
same principle. Thus for the Maidu of California in whose language
majdy indicates both a member of the Maidu tribe and human being
in general, Maidu is the unmarked member of a category in which
all other human groupings are separate marked members referred
to by phrases in which majdy has a modifier: Paviotso folommam
majdy; Negro pibutim majdy; white man wolem majdy; Yana
kombom majdy, etc.; but majdy alone is either 'human being' or 'Maidu'.
The at first sight rather tenuous connection between this notion
and that of marked and unmarked in phonology might be stated
in the following terms: the ambiguous nature of the unmarked term,
as indicating both the generic category and the specific opposite
of the marked member, is paralleled by the likewise ambiguous
status of the unmarked member of the phonological opposition
which in the position of neutralization represents the archiphone-
me, that is the common features of both the marked and unmarked,
as well as, by its physical nature, the unmarked member.
An important further characteristic of the marked/unmarked
opposition is indicated by the title of Jakobson's earlier article,
"Signe Zéro"; I shall refer to it as zero expression of the unmarked
category. We have already encountered this in the Maidu example
quoted above. Thus, parallel to the example man (unmarked),
woman (marked), we have author (unmarked), authoress (marked)
in which 'author' indicates either a writer regardless of sex or
specifically a male writer, whereas 'authoress' designates only a
female writer. In this latter instance the unmarked term author
has a zero where the marked term authoress has an overt affix -ess.
A third characteristic of the unmarked/marked distinction, also
pointed out by Jakobson, is syncretization. By this is meant that
distinctions existing in the unmarked member are often neutralized
in the marked categories. To illustrate this point, and a number
of others, reference will be made to the category of number within
which the unmarked status of the singular as against the marked
status of the plural provides a 'classic' example of the distinction
between marked and unmarked. In German the article and both
weak and strong forms of the adjectival declension have the same
forms for all three genders in the plural. In Hausa with two
genders, the masculine-feminine distinction is maintained in the
singular but neutralized in the plural. In classical Latin the dative
and ablative cases, in general distinct in the singular, are syncretized
in the plural. In the same language the vocative is different from
the nominative only for certain members of the two unmarked
categories masculine and singular but is the same for both cases
in the feminine and neuter and throughout the plural. Exactly
the same holds for the vocative in Bulgarian a language with only
two genders. Many other examples could be cited.
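Syncretization can be illustrated by counting distinct forms in each number. The sketch below uses the nominative forms of the German definite article. These forms are the standard paradigm and are supplied here for illustration, since the text itself cites none, and the function name distinct_forms is hypothetical.

```python
# Nominative forms of the German definite article by number and gender.
# The forms are the standard paradigm, added here for illustration.
article = {
    "singular": {"masculine": "der", "feminine": "die", "neuter": "das"},
    "plural": {"masculine": "die", "feminine": "die", "neuter": "die"},
}


def distinct_forms(paradigm, number):
    """Count how many different forms the genders have in one number."""
    return len(set(paradigm[number].values()))


for number in article:
    print(number, distinct_forms(article, number))
```

The singular keeps three distinct forms while the plural has one, so the gender distinction is syncretized in the marked category.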
Among the writers besides Jakobson who have discussed the
unmarked-marked distinction in grammar are Hjelmslev and
Trnka.2 Hjelmslev in the Prolegomena considers what is essentially
the unmarked/marked distinction under the terms extensive
(unmarked) vs. intensive (marked). In the systematic discussion of
the various criteria for marked and unmarked in grammatical
categories which follows there will be occasion to mention some of
those suggested by Hjelmslev and to make some use of his terminology.
2
L. Hjelmslev, Prolegomena to a theory of language (Baltimore, 1953);
B. Trnka, "On Some Problems of Neutralization", Omagiu lui Iorgu Iordan
861-6 (Bucharest, 1958).
The criterion of syncretism has already been illustrated for the
generic category of number. That of zero expression is also easily
exemplified for the same category. The singular frequently has no
overt mark while the plural is marked by an affix as in English,
except for plurals of the type 'sheep'. A more careful statement
would therefore be that in no language is the plural expressed by
a morpheme which has no overt allomorph, while this is frequently
true for the singular. Another example is the marked status of
the adjectival comparative and superlative as against the unmarked
positive, e.g. in English where the positive has no overt mark while
the comparative and superlative have the suffixes -er and -est
respectively.
A third criterion is that equivalent to the definition of Jakobson
quoted above with semantic and derivational exemplification. We
will call it facultative expression. An example for the category of
number is Korean which has a plural suffix -tul which need not
always be used. Thus the form labelled singular in Korean gram-
mars, which incidentally has zero expression, may be either
specifically singular, or on occasion be used when more than one
object is involved, while the plural form is only used with plural
meaning. The phenomenon which corresponds to facultative
expression from the viewpoint of the hearer may be called par
excellence interpretation. Thus it may be presumed that the
Korean listener interprets the zero form usually or par excellence
as singular but as plural where the situation demands it. Indeed it
is precisely in such cases that the zero form will be used to express
the plural rather than -tul because the plural interpretation is
forced by the context. The suffix -tul, on the other hand, is always
interpreted as plural. I therefore consider facultative expression
and par excellence interpretation as statements of the same fact
from two points of view and usually refer to it by the first of these
expressions.
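The relation between facultative expression and par excellence interpretation can be put as a toy model. This is a sketch under loud assumptions: the two function names and the boolean arguments are invented for illustration, and the model captures only what the paragraph above says about the Korean zero form and the suffix -tul.

```python
def possible_readings(has_tul):
    """Number readings open to the hearer for a Korean noun form.

    The zero form (unmarked) leaves number open; the form with -tul
    (marked) is plural only.
    """
    return {"plural"} if has_tul else {"singular", "plural"}


def par_excellence_reading(has_tul, context_forces_plural=False):
    """Default reading: singular for the zero form unless context forces plural."""
    if has_tul or context_forces_plural:
        return "plural"
    return "singular"


print(possible_readings(False), par_excellence_reading(False))
print(possible_readings(True), par_excellence_reading(True))
print(par_excellence_reading(False, context_forces_plural=True))
```

The asymmetry lies in the first function: the marked form admits one reading, the unmarked form both.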
A fourth criterion is that called participation by Hjelmslev. I
prefer to call it contextual neutralization or simply neutralization
where the context makes it clear that we are dealing with non-
phonological matters. In certain environments the opposition
between two or more categories is suppressed, and it is the un-
marked member which appears. In Hungarian, Turkish and
certain other languages only the singular form of nouns may appear
with cardinal numbers. This is obviously the closest analogue to
neutralization in phonology.
A fifth characteristic is the lesser degree of morphological
irregularity in marked forms. For example, in the verbs of classical
Arabic, the basic form as against such derived forms as the causative
and intensive shows variation in the internal vowel of the imperfect,
i.e. the forms yaqtilu, yaqtulu, and yaqtalu all exist so that there are
three allomorphs in the discontinuous morpheme of the imperfect
tense forms. In all the derived forms there is a single allomorph,
e.g. in yuqattilu in the corresponding form of the intensive. In
German all dative plurals have uniformly -n or -en depending on
phonological factors while the dative singular varies with gender
and declensional class. In Sanskrit, the dual which is so to speak
even more marked than the plural has not only extensive case syn-
cretism so that there are only three distinct forms but also greater
regularity than plural or singular, particularly in the oblique cases.
It may be observed that in general the oblique cases have a marked
character as against the direct cases.
A sixth characteristic will be called, in conformity with Hjelmslev's
terminology, defectivation. The marked category may simply
lack certain categories present in the unmarked category. In fact
for inflectional categories, defectivation can be considered a form
of syncretism. Thus one might say that in the marked subjunctive
category, French lacks a future. This would be in conformity with
the usual terminology of grammars of French, but one might also
argue that there has been syncretism of the present and future in the
subjunctive and that the concept of defectivation rests in the
identification of the subjunctive as a form of the present rather
than the future because of its greater formal resemblance to the
present indicative. It is of interest to note here that the present as
an unmarked category in relation to the future is taken as the
representative of both. Indeed whenever defectivation occurs in a
category of a set which intersects with another set, we get simultaneous
evidence for the marked character of two categories. Thus
in the above example the absence of a future subjunctive shows
both the marked character of the future since it lacks a subjunctive
and of the subjunctive since it lacks a future.
Somewhat different are instances where certain items have fewer
categories than others and these can be identified on formal grounds
with one category rather than another which is then to be considered
unmarked. Thus in the example cited below of the Hebrew nominal
category of number, some nouns, the majority, have two forms
while others have three. For those that have two the categories
are singular and a category which on formal grounds as well as by
adjective agreements can be identified with the plural rather than
the dual of nouns with three forms.
I shall also consider periphrastic forms which supply forms for
'expected' inflectional categories as instances of defectivation.
Thus in the Latin verb, the tenses of the perfective system (perfect,
pluperfect, future perfect) are supplied in the passive by a syntactic
construction of past passive participle + verb 'to be', e.g. amatus
sum 'I have been loved'.
However this may be, the concept of defectivation has a useful
application in the area of derivation which is, by definition perhaps,
not compulsory. Thus in classical Arabic practically all verbs occur
in Form I, the basic form, but hardly a single verb possesses all of
the derivational forms, and most are defective for a majority of
them.
A seventh characteristic of marked and unmarked category is
perhaps confined to the category of number. Where a heterogeneous
collection is to be named, that is one which has members of two
or more categories, one of them is often regularly chosen as
representative in the plural, or in the dual where that is appropriate.
The Arab grammarians call this taghlib or 'dominance'. An example
cited by them is ʔabawani literally 'the fathers (dual)' with the
meaning 'father and mother', where once more the unmarked
masculine functions as a surrogate for the gender category.
Another instance is Sanskrit ahani 'the days (dual)' for 'day and
night'. Compare also such usages as Spanish los padres for 'parents'
lit. 'the fathers'; los hijos 'the children' lit. 'the sons'.
A related phenomenon is agreement a potiori in which words
from two or more selective categories such as gender have a
common modifier and the modifier is in the unmarked category,
e.g. Spanish el hijo y la hija son buenos 'The son and the daughter
are good' (masculine plural).
Finally the question may be raised whether an analogue to the
frequency phenomenon in phonology exists likewise for grammatical
categories. Data here are very sparse, for there are very few word
frequency studies which give information about the frequency of
the grammatical categories to which the words belong. Data will
be presented at this point only in regard to the category of number
in the noun where there is much evidence for a hierarchy singular,
plural, dual from the most unmarked to the most marked. Cor-
responding to the situation in phonology we might expect that the
text frequency for nominal categories of numbers will be singular
(most frequent), plural (less frequent), and dual (least frequent).
The data that I have been able to collect are: (1) for the noun in
the Rigveda by C. H. Lanman; (2) for the Russian noun by
Josselson; (3) for the Latin noun by using the data in the exhaustive
concordance of Terence by Edgar B. Jenkins.3 Zipf's list of word
frequencies compiled from four plays of Plautus was not suitable
for this purpose because homonymous forms are lumped together
(even forms which differ in vowel quantity), thus the occurrences of
eo 'go, thither, in him, in it' are all under one undifferentiated
entry. (4) I have recorded for the first thousand nouns in François
Mauriac's La chair et le sang whether they were singular or plural.
Data from these and other studies will be cited later in regard to
other grammatical categories. The results for number in the noun
are set forth in Table XVII.
3 C. H. Lanman, "Noun inflection in the Veda", Journal of the American
Oriental Society 10.325-601 (1880); H. H. Josselson, The Russian word count
(Detroit, 1953); E. B. Jenkins, Index verborum Terentianus (Chapel Hill, 1932).
TABLE XVII
Language          Size of Sample   Singular   Plural   Dual
Sanskrit          93,277           70.3       25.1     04.6
Latin (Terence)    8,342           85.2       14.8
Russian            8,194           77.7       22.3
French             1,000           74.3       25.7
The results are therefore in accordance with expectations.
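The comparison of each language with the expected order can be made mechanical. The following is a minimal sketch in Python, with the percentages transcribed from Table XVII. The names TABLE_XVII, HIERARCHY and follows_hierarchy are illustrative assumptions and form no part of the original counts. A language without a dual is tested only on the categories it possesses.

```python
# Percentages transcribed from Table XVII; None marks a language with no dual.
TABLE_XVII = {
    "Sanskrit": {"singular": 70.3, "plural": 25.1, "dual": 4.6},
    "Latin (Terence)": {"singular": 85.2, "plural": 14.8, "dual": None},
    "Russian": {"singular": 77.7, "plural": 22.3, "dual": None},
    "French": {"singular": 74.3, "plural": 25.7, "dual": None},
}

# Order from least marked to most marked, as proposed in the text.
HIERARCHY = ("singular", "plural", "dual")


def follows_hierarchy(row, hierarchy=HIERARCHY):
    """Return True if frequencies strictly decrease along the hierarchy.

    Categories absent from the language (value None) are skipped.
    """
    present = [row[c] for c in hierarchy if row.get(c) is not None]
    return all(a > b for a, b in zip(present, present[1:]))


results = {lang: follows_hierarchy(row) for lang, row in TABLE_XVII.items()}
for lang, ok in results.items():
    print(f"{lang}: {'conforms' if ok else 'exception'}")
```

A strict inequality is used here, so that a tie between two categories would count as an exception; whether ties should be so treated is a choice of the sketch and not a claim of the text.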
Taking into account the criteria just discussed and thus far illus-
trated chiefly from the category of number in the noun, the evidence
for the marked or unmarked character of a number of generic
grammatical categories will now be considered. It will appear
that the various criteria tend to converge in a large number of cases
so that particular categories can be said to be marked or unmarked
on a cross-linguistic basis. The present conclusions are not based
on a formal sample and may therefore seem to be unsystematic
and anecdotal. However, whenever exceptions are known to
me or there is no clear evidence for the marked or unmarked status of
a particular set of categories, this is pointed out. It should be noted
that for hypotheses of the type presented here certain languages
will not provide relevant evidence or will provide evidence which
is compatible though not confirmatory. Consider, for example,
evidence that the singular is unmarked as against the plural on the
basis of syncretization. Relevant evidence will not be forthcoming
from a particular language (1) if it does not have the category of
number in the noun; (2) if it does have number, but there are no
intersecting categories such as gender that are susceptible of syn-
cretization. Even if number exists and intersects with another
inflectional category, the language may not show syncretization
in either the singular or plural. In this case the facts of the language
are compatible with the thesis but not confirmatory in the strong
sense.4 If there is syncretization in the plural but not the singular,
4 This has been called the paradox of confirmation, that is, that a hypothesis
of the form p → q on the formal interpretation of implication is true for all
truth values of p and q except truth of p combined with falsity of q. Hence it
holds if both p and q are false. Cf. G. H. Von Wright, A treatise on induction
and probability, 64-65 (Paterson, 1960).
the case is a confirmatory one, and such examples will be cited. If
there is syncretization in the singular but not in the plural, then the
language in question presents a disconfirming instance. In general
confirming and disconfirming instances are discussed here but no
others.
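The four-way distinction just drawn (not relevant, compatible, confirming, disconfirming) may be stated as a small decision procedure. The sketch below is in Python; the function classify_evidence and its parameter names are hypothetical and serve only to restate the reasoning for the thesis that the singular is unmarked as against the plural on the criterion of syncretization. The case of syncretization in both numbers is not discussed in the text and is treated here, by assumption, as merely compatible.

```python
def classify_evidence(has_number: bool,
                      has_intersecting_category: bool,
                      syncretism_in_singular: bool = False,
                      syncretism_in_plural: bool = False) -> str:
    """Classify one language as evidence that the singular is unmarked."""
    # Conditions (1) and (2) of the text: the language cannot bear on the thesis.
    if not has_number or not has_intersecting_category:
        return "not relevant"
    # Syncretization in the marked category only: a confirming instance.
    if syncretism_in_plural and not syncretism_in_singular:
        return "confirming"
    # Syncretization in the unmarked category only: a disconfirming instance.
    if syncretism_in_singular and not syncretism_in_plural:
        return "disconfirming"
    # No syncretization in either number: compatible but not confirmatory.
    # Assumption of this sketch: syncretization in both is treated the same way.
    return "compatible"


# Illustrative calls with made-up feature settings, not data from any language.
print(classify_evidence(False, False))
print(classify_evidence(True, True))
print(classify_evidence(True, True, syncretism_in_plural=True))
print(classify_evidence(True, True, syncretism_in_singular=True))
```

Only the last two outcomes correspond to the instances that are discussed in what follows.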
It should be evident that for distinctions in grammatical categories
to be clearly analyzable in terms of marked and unmarked a vast
amount of orderliness in language phenomena is required. Consider
for example a single generic category in relation to a single one
of the criteria just mentioned, say, the singular/plural category in
verbs and the criterion of frequency. Proceeding inductively for
the moment, we do not predict which will be the more frequent,
but we do expect that one or the other will be the more frequent
in all languages. We thus have a statement of universal scope.
If the available evidence indicates, as it does, that the singular
category is more frequent, we now call the singular the unmarked
and the plural marked. From this we now deduce certain other
universals, for example, that cases of zero expression of the singular
in the verb may be found as against non-zero expression in the
plural or, at least, that the opposite will not occur. Such predicted
universals are deduced from what may be called second level
hypotheses as against empirically derived universals. These second
level hypotheses take such forms as the following: whenever
one specific category shows consistently greater text frequency than
another specific category of the same generic category, then the
one with greater text frequency also shows zero expression and
vice versa. Then unmarked is merely the name that we give to
the category which exhibits these features of higher frequency,
zero expression, etc. A still higher or third level of generalization
would be reached if we could construct a general theory from which
the second level interconnections of predicates of the kind just
cited could be derived. Such a theory would have two facets. On
the one hand from it we could deduce that any categories which have
one predicate, e.g. greater frequency, would have some other predi-
cate, e.g. zero expression. On the other hand it would allow us
to predict, a priori, regarding any specific set of categories, when
given their definitional specifications, what the relations of marked
and unmarked would be among them.
Even the first stage, of course, is far from accomplished yet and
requires far more extensive investigations than those reported here.
All that is asserted is that a cursory examination of the evidence is
encouraging in that in most cases the establishment of a marked-
unmarked categorization lies unambiguously and decisively in one
direction rather than another.
The category of number in regard to nouns has been largely
utilized for illustrative purposes in the previous sections. We now
proceed more systematically considering a fairly large number of
grammatical categories. In this enumeration we start with number
but now treat it as a category in pronouns, adjectives, and verbs,
in addition to nouns. In all cases we find the same hierarchy.
As exemplified in the frequency data from the Vedic noun, where
the dual is present, we have the singular as unmarked in relation
to the plural and the plural as unmarked in relation to the dual.
Those binarily inclined may look upon this as two binary relations,
or, if one prefers, one may describe it as a three-fold hierarchy,
singular, plural, and dual. Of course when the dual is present, the
plural no longer has the same meaning as when it is absent since
with the dual it means three-or-more, without the dual two-or-more.
It is noteworthy that when the dual is lost, it is absorbed into the
plural and not into the singular. No language has a single category
embracing one or two and another referring to three or more. In
languages with a dual and plural we sometimes have facultative
expression of the dual, in that the plural may be used for two
objects, while the dual only indicates two objects and need not
always be used. Sometimes the dual shows defectivation, e.g. in
Biblical Hebrew where only a limited number of nouns have a dual,
or rather, as explained earlier, for nouns with two forms the second
is to be identified with the plural rather than the dual of those with
three forms.
In pronouns, there are instances in which the plural is facultatively
expressed, e.g. in the older form of Mandarin, where, in addition,
the singular has zero expression, e.g. wo 'I', or on occasion 'we',
plural wo men, always 'we'. In languages in which the second
person plural may be used as a polite form of address for a single
person, this same plural pronoun is often either polite or intimate,
e.g. French. In such cases we can say that the category of polite/
intimate is neutralized in the plural (the marked category). Pro-
nouns like nouns often show syncretism of gender or case categories
in the plural as against the singular or the dual as against the plural,
e.g. the lack of gender distinction in the English third person
pronoun in the plural, and that of classical Arabic in the dual in
contrast to the singular and plural. Thus in the third person
pronouns of classical Arabic we have singular huwa 'he'; hiya 'she';
in the plural hum 'they (masc.)'; hunna 'they (fem.)'; but in the dual
humā 'they two' for either gender.
From the index to Terence's plays mentioned above we can
extract the evidence given in Table XVIII regarding the relative
frequency of Latin pronouns.
TABLE XVIII
                        Singular   Plural
First person            1,786      146
Second person           1,267       98
Third person 'is'         750       75
Third person 'ille'       531       90
Third person 'iste'        88       32
Total                   4,422      441
The following figures (see Table XIX) for pronominal forms are
taken from the Lorge Magazine count.5 It includes, however,
many non-pronominal occurrences of 'it'. The second person
pronouns are not included since they are identical for singular and
plural.
5 In E. L. Thorndike and I. Lorge, The teacher's word book of 30,000 words
(New York, 1944).
TABLE XIX
Singular: I 89,489; me 23,364; my 22,184; he 49,268; she 31,087;
it 52,107; his 30,748; her 31,869; its 5,827; him 18,136.
Total, 354,079.
Plural: we 17,996; us 4,943; our 7,599; they 18,010; their
12,312; them 10,278.
Total, 71,138.
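The totals of Tables XVIII and XIX and the resulting proportion of singular forms can be recomputed from the individual entries. The following Python sketch does so. The variable and function names are illustrative assumptions, and the first singular entry of Table XIX is read as I 89,489, which is the reading under which the entries add up to the stated total of 354,079.

```python
# Latin pronouns in Terence (Table XVIII): (singular, plural) occurrences.
TABLE_XVIII = {
    "first person": (1786, 146),
    "second person": (1267, 98),
    "third person 'is'": (750, 75),
    "third person 'ille'": (531, 90),
    "third person 'iste'": (88, 32),
}

# English pronominal forms from the Lorge Magazine Count (Table XIX).
TABLE_XIX_SINGULAR = {
    "I": 89489, "me": 23364, "my": 22184, "he": 49268, "she": 31087,
    "it": 52107, "his": 30748, "her": 31869, "its": 5827, "him": 18136,
}
TABLE_XIX_PLURAL = {
    "we": 17996, "us": 4943, "our": 7599,
    "they": 18010, "their": 12312, "them": 10278,
}


def singular_share(singular, plural):
    """Percentage of singular forms among singular and plural together."""
    return 100.0 * singular / (singular + plural)


latin_sg = sum(sg for sg, _ in TABLE_XVIII.values())
latin_pl = sum(pl for _, pl in TABLE_XVIII.values())
english_sg = sum(TABLE_XIX_SINGULAR.values())
english_pl = sum(TABLE_XIX_PLURAL.values())

print(f"Latin:   {latin_sg} : {latin_pl} ({singular_share(latin_sg, latin_pl):.1f}% singular)")
print(f"English: {english_sg} : {english_pl} ({singular_share(english_sg, english_pl):.1f}% singular)")
```

It should be remembered that the English singular total includes the non-pronominal occurrences of 'it', so that the English proportion is somewhat inflated.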
For adjectives, the category of number is considered to be present
when the adjective agrees in number with the noun. Adjectives
display syncretism in the plural or dual just as does the noun. Thus
the adjective in German, Russian, Danish, and certain other
gender languages has total syncretization for gender in the plural.
In classical Arabic there is facultative use of the singular feminine
adjective in agreement with feminine plural nouns and, in fact,
it is more frequent than the plural feminine adjective form in this
syntactic function.
Data were compiled on number and gender forms for adjectives
in Spanish from Bou's volume on Spanish word frequencies.6
This work lists inflected forms separately. Unfortunately homo-
phones are not distinguished, e.g. corto 'cut' and corto 'short'
(masc. sing.) are consolidated. The hundred most frequent
adjectives which did not have any homophonous forms were
utilized. For 99 out of 100, the total singular occurrences were
more numerous than the plural, comprising 77.2% of 215,362,
while the plural accounted for the remaining 22.8%. The
percentages are thus close to those given above for nouns.
The category of number in verbs is found in two different ways.
Verbs may agree with their subject in number or more frequently in
number and person. There are also languages in which the verb
itself has singular and plural (sometimes also dual) stems which
indicate singularity or plurality of action. In such cases plurality
includes plural subject and/or plural object or frequentative action.
Considering the number category in terms of agreement with the
subject there are numerous indications once more of the unmarked
status of the singular. An example of contextual neutralization is
found in classical Arabic where if the verb is initial in the sentence
and there is an expressed nominal masculine subject only the
6 I. R. Bou, Recuento de Vocabulario Español vol. I (Universidad de Puerto
Rico, 1952).
masculine singular of the verb can occur whether the noun subject
is singular, dual, or plural. In Semitic languages where the verb
has both sex gender and number concord with the subject there
is often syncretization of gender in the plural. Thus in Biblical
Hebrew in the perfective of the verb, there is gender distinction
in the third person singular but not in the plural. Many colloquial
Arabic dialects have gender distinction in the singular second and
third person but not in the plural for either person. In Tunica, an
Amerind language of the Gulf group, the verb has singular and
plural agreement forms. According to Haas, "the use of the plural
is far from consistent; one finds cases of plural occurrence referred
to by singular".7 This then appears to be an instance of facultative
expression.
Zero expression of the singular is common in imperatives.
Instances are German, which always suffixes -t or -et to the singular
imperative to form the plural, and Russian, which always suffixes
-te for the plural.
Quantitative data regarding number in verb forms are presented
in Table XX for Vedic Sanskrit, Latin, and Russian.8
TABLE XX
                           Total    Singular   Dual   Plural
Sanskrit                   29,370   71.0       05.6   23.4
Latin                      10,948   91.0              09.0
Russian (conversational)    3,560   77.1              22.9
The next category to be considered is case in the noun. Here for
the first time we encounter in particularly acute form a problem
which enters to some extent in all comparisons of grammatical
categories. Comparing the case systems of even fairly closely
related languages, we see that they cannot be equated with anything
7 M. Haas, "Tunica", Linguistic Structures of Native America ed. H. Hoijer 35
(New York, 1946).
8 Russian, H. Josselson, op. cit.; Latin, E. B. Jenkins, op. cit.; Sanskrit,
J. Avery, "Verb-inflection in Sanskrit", Journal of the American Oriental Society
10.219-324 (1880).
like the same closeness of fit that is generally possible for the category
of number. There are often a different number of cases and even
the same conventional name may hide important differences.
However, some reasonable, if rough, equivalences can be made,
e.g. the notion of direct cases (nominative, accusative, vocative) as
a group and the oblique cases. There is generally a possessive or
genitive case and a case of the subject and one of the object. Con-
fining ourselves to the direct/oblique opposition, we often find
that one or more direct cases have zero expression as compared
to the oblique suggesting that the direct cases comprise an un-
marked category in relation to the oblique. Thus in Turkish the
nominative and the indefinite accusative have a zero affix, while
definite accusative and all the oblique cases have overt marking.
In Sanskrit neuter nouns are distinguished from masculine and
feminine nouns in the direct cases of the plural, but this gender
opposition is neutralized in the oblique cases. In Latin the neuter
and masculine in general are only distinguished in the direct cases
and are merged in the oblique.9
In Table XXI data for direct and oblique case of the noun are
given for Sanskrit, Latin, and Russian.
TABLE XXI
           Sample Size   Direct   Oblique
Sanskrit   93,277        72.5     27.5
Latin       8,342        68.7     31.3
Russian     6,194        65.2     34.8
The total for the direct cases is thus substantially greater than for
the oblique cases even though for each language the number of
oblique cases is larger.
For the category of gender, whether sex or non-sex, the evidence
is less clear than for the items already discussed. Here the problem
of interlinguistic comparability is, in general, even more difficult
than for case systems. By such terms as masculine or feminine are
9 The fourth declension is a marginal exception in that standard grammars
give for the dative singular -ui but for the rare neuter -u. In fact -u is at least
as common as -ui for the masculine.
meant heterogeneous collections of nouns which, however, share a
common semantic core in that all, or nearly all, masculine living
beings are in the gender labelled 'masculine' and correspondingly
for the feminine. In principle the same holds for other cases of
'natural' gender, e.g. animate vs. inanimate, 'tree' gender (as in
Bantu languages), etc. There are other instances, however, in noun
class languages in Africa and New Guinea, in which certain genders
have no discernible semantic core and cannot be reasonably
equated with specific genders in other languages.
Where masculine and feminine genders exist with or without
further genders, the masculine usually appears to be the unmarked
gender. Thus in Semitic languages the masculine singular in adjec-
tives has zero expression in the adjective and usually in the noun
as against the feminine singular marked by a suffix -at (classical
Arabic), Hebrew -ā, etc. In Tunica, with a two-gender system,
the dual/plural distinction in the pronoun and article is found in
the masculine but neutralized in the feminine. In Bulgarian the
adjective has a vocative distinct from the nominative only in the
masculine singular. The masculine singular also has a nominative/
accusative case distinction in the definite form, a difference not
found in feminines, neuters, or plurals. The masculine plural
shows two main variants in -i and in -ove ~ -eve, while the feminine
has uniformly -i. An example of contextual neutralization may be
cited for classical Arabic. Certain adjectives which can by definition
only modify terms for females, e.g. 'pregnant', have a single form.
The unmarked masculine category appears in such cases. The
example is mutatis mutandis just like that of the neutralization of
the nominal number category with numbers, e.g. in Finnish and
Hungarian.
A possible contrary instance is Oneida and other Iroquoian
languages in which, according to Lounsbury, the feminine is the
unmarked gender.
Where a neuter exists alongside of a masculine and feminine,
the neuter is the most marked category and can be opposed to the
masculine/feminine. A well-known example is the neutralization
in Indo-European of the nominative/accusative case distinction in
the neuter noun. In Dravidian languages the neuter syncretizes the
singular and plural. Parallel to this is the unmarked status of
animates in languages with animate/inanimate gender distinction.
Thus in Algonkian languages the inanimate lacks the singular/
plural distinction which exists in animates and also the category of
obviation roughly defined as reference to the second mentioned of
two animates. On the other hand, there are instances where in
context of neutralization the neuter is used as representative of all
genders in three gender systems, e.g. German Es ist ein Tisch,
Russian eto stol 'It is a table'.
Frequency data on gender in Spanish are available from the
count of the hundred most frequent adjectives mentioned above.
Of the 100 adjectives utilized, 64 show gender distinctions. Of the
total of 155,500 occurrences of these 64 adjectives, 62.7% are
masculine and 37.3 % feminine, thus providing further evidence for
the unmarked status of the masculine. With number and gender
both considered the distribution of frequencies in this set of adjec-
tives was as follows: masculine singular 49.0%; feminine singular
27.8%; masculine plural 13.7%; feminine plural 09.5%.
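The percentages for gender alone follow from the fourfold distribution by summation over number, and the same distribution yields the proportions for number within this set of 64 adjectives. The Python sketch below performs the arithmetic; the names JOINT and marginal are illustrative assumptions only.

```python
# Joint distribution (percent) of gender and number for the 64 Spanish
# adjectives that show gender distinctions, as given in the text.
JOINT = {
    ("masculine", "singular"): 49.0,
    ("feminine", "singular"): 27.8,
    ("masculine", "plural"): 13.7,
    ("feminine", "plural"): 9.5,
}


def marginal(joint, axis):
    """Sum the joint percentages over the other category.

    axis 0 gives totals by gender, axis 1 gives totals by number.
    """
    totals = {}
    for key, pct in joint.items():
        totals[key[axis]] = totals.get(key[axis], 0.0) + pct
    return {name: round(value, 1) for name, value in totals.items()}


gender = marginal(JOINT, 0)
number = marginal(JOINT, 1)
# Combinations ordered from most to least frequent.
ranked = sorted(JOINT, key=JOINT.get, reverse=True)

print(gender)
print(number)
print(ranked)
```

The combination of the two unmarked members (masculine singular) is the most frequent and that of the two marked members (feminine plural) the least frequent, while the two mixed combinations fall between them.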
For the adjectival category of comparison, the positive is clearly
unmarked while the comparative and superlative are marked.
Where the distinction exists the positive almost always has zero
expression as in English and the comparative and superlative may
utilize periphrasis as again in English 'more' and 'most'.
In Serbian, the distinction between definite and indefinite
adjective declension is syncretized in the comparative and
superlative where only the definite appears. There is evidence for
the unmarked character of the comparative as against the superla-
tive, so that the hierarchy would be positive, comparative, and
superlative from unmarked to marked. The marked status of the
superlative is shown by the fact that in certain languages the
superlative is formed from the comparative by an additional affix,
e.g. French plus grand 'larger', le plus grand 'the largest'; Hungarian
jó 'good', jobb 'better', leg-jobb 'best'.
A few frequency data supporting this hierarchy may be cited.
The Thorndike-Lorge list in general coalesces the frequencies
of positive, comparative, and superlative. Thus for long, longer,
longest there is only a single entry under long. However the irregular
forms good, better, best, and bad, worse, worst are all listed
separately. The frequencies from the Lorge Magazine Count are
as follows:
1. good 5,122; better 2,354; best 1,850
2. bad 1,011; worse 430; worst 292
For Latin a count of adjective forms in Terence gives these results
in a total of 1,544:
positive 89.2; comparative 05.6; superlative 05.2
Josselson records for the Russian adjective:
positive 95.1; comparative 02.7; superlative 02.2
A further substantival category is that of normal size versus
diminutive or augmentative. While this is treated derivationally in
European languages, elsewhere e.g. in Bantu languages, diminutives
and augmentatives are gender classes. In European languages
normal size, the unmarked member, always has zero expression.
In languages of the Bantu group and elsewhere where there are
diminutive or augmentative classes (the latter often pejorative),
there is facultative expression of small or large size, that is the
normal gender can also be used for small or large objects. There is
probably a further hierarchy in that, as it seems, the existence of
an augmentative class implies the presence of a diminutive but not
vice versa, suggesting that diminutive is unmarked in relation to
augmentative.
In numerals the cardinals are obviously the unmarked class as
compared to ordinals. The cardinal almost invariably has zero
expression. A single exception of which I am aware is Somali where
cardinals have a suffix -ad appended to ordinals. However in some
instances the ordinal, probably because it is closer to the descriptive
adjective, possesses a category of gender lacking in the cardinal,
e.g. in Spanish and Italian. Sometimes the cardinal/ordinal dis-
tinction is neutralized for all numbers above a fixed lower limit.
In such cases it is invariably the cardinal as the unmarked member
that functions for both categories. An example is Hopi in which the
cardinal/ordinal distinction does not exist for numbers larger than
four and a set which resembles the cardinals formally represents
both categories.
This latter example illustrates another hierarchy for numerals,
lower numerals being unmarked in relation to higher numerals.
It is fairly common for all the numerals above a certain number to
be neutralized for some or all inflectional categories. Thus in
Sanskrit the numbers from five on have no gender distinctions
although they are declined for case. In Rumanian all numbers
from three on are undeclined. In Swahili, as in other Bantu
languages, numbers above five do not take class concord. These
data suggest a hierarchy within which successively higher numbers
are more marked, except that possibly the base of the numeral
system, most often ten and its multiples, will be less marked than
other comparably high numbers. The frequency correlate of this
will be the successively lower frequency of higher numbers. Com-
bining this with the cardinal/ordinal contrast, we would predict
that there is a similar hierarchy in ordinals, but that each ordinal is
less frequent than the corresponding cardinal. This receives striking
confirmation for English from the Lorge Magazine Count (see
Table XXII).
TABLE XXII
      Cardinals   Ordinals
 1    17,569      5,154
 2     5,958        926
 3     2,873        501
 4     1,637        221
 5     1,426        193
 6       806         65
 7       615         59
 8       657         57
 9       468         42
10     1,260         69
Here the only reversal is for the cardinals seven and eight. Every
cardinal is more frequent than the corresponding ordinal. There
is a further remarkable regularity in the number ten, which is less
frequent than five but more frequent than six for both the cardinal
and the ordinal. Very similar results were obtained for Spanish,
French and German. These are shown in Table XXIII.10
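For the English figures of Table XXII the observations just made can be verified mechanically. The following Python sketch lists the reversals among the numbers one to nine, tests whether every cardinal exceeds its ordinal, and locates ten between five and six. The names CARDINALS, ORDINALS and reversals are illustrative assumptions.

```python
# Frequencies for the numbers 1 to 10 from Table XXII (Lorge Magazine Count).
CARDINALS = [17569, 5958, 2873, 1637, 1426, 806, 615, 657, 468, 1260]
ORDINALS = [5154, 926, 501, 221, 193, 65, 59, 57, 42, 69]


def reversals(counts, upto=9):
    """Return each number n below `upto` that is less frequent than n + 1."""
    return [n for n in range(1, upto) if counts[n - 1] < counts[n]]


cardinal_reversals = reversals(CARDINALS)
ordinal_reversals = reversals(ORDINALS)

# Each cardinal should be more frequent than the corresponding ordinal.
cardinal_above_ordinal = all(c > o for c, o in zip(CARDINALS, ORDINALS))


def ten_between_five_and_six(counts):
    """True if ten is less frequent than five but more frequent than six."""
    return counts[4] > counts[9] > counts[5]


print("cardinal reversals:", cardinal_reversals)
print("ordinal reversals:", ordinal_reversals)
print("cardinal > ordinal throughout:", cardinal_above_ordinal)
print("ten between five and six:",
      ten_between_five_and_six(CARDINALS), ten_between_five_and_six(ORDINALS))
```

Ten is excluded from the test for reversals because, as the base of the numeral system, it is expected to be less marked than other comparably high numbers.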
TABLE XXIII
       Spanish              French               German
       Cardinal   Ordinal   Cardinal   Ordinal   Cardinal   Ordinal
 1     36,000+    9,698     1,000+     817       230,000+   10,960
 2     36,000+    4,188     1,000+     237       7,331      4,760
 3     36,000+    2,365     631+       97        4,535      2,489
 4     5,714      (3,923)   349+       31        2,073      760
 5     3,714      1,341     336        17        1,296      352
 6     2,654      611       193                  1,015      277
 7     1,960      273       157                  669        186
 8     1,894      (589)     229        12        (1,018)    (490)
 9     955        463       92         7         264        122
10     2,078      112       244        12        921        154
The vast size of the Kaeding count and the fact that in German
higher numbers are treated orthographically as single words e.g.
zweiundzwanzig versus 'twenty-two' allows us to pursue this topic
for the interval 11-99. The relatively higher frequency of multiples
of 10 and the small absolute frequencies of numbers in this range
even in a very large scale count would seem to justify a summary
by decades. The results once again are strikingly confirmatory as
can be seen from Table XXIV.11
10 Spanish from I. R. Bou, op. cit.; French, G. E. Vander Beke, French word
book (New York, 1929); German, F. W. Kaeding, Häufigkeitswörterbuch der
Deutschen Sprache (Berlin, 1897-8). In all three languages the numeral 'one'
includes occurrences of the indefinite article. For French, 1,000+ is arbitrarily
chosen for the 69 items not counted by Vander Beke. These are the 69 most
frequent items in the earlier and smaller count of Henmon. The highest
frequency among the items figuring in the Vander Beke count is quelque with
1232 occurrences. The figures in parentheses have larger than expected frequen-
cies because they include homonymous forms, e.g. Spanish cuarto, both
'fourth' and 'room.'
11 F. W. Kaeding, op. cit.
TABLE XXIV
         Cardinal   Ordinal
 2-9     18,199     9,590
10-19     2,307       822
20-29       721       115
30-39       416        69
40-49       220        93
50-59       239        17
60-69       113        10
70-79       101        13
80-89        90        20
90-99        55         8
We now consider the category of person. The situation is complex.
In general the third person appears as the 'most unmarked' and
may be considered as in opposition to the first/second person.
The following are examples tending to show the unmarked status
of the third person vis-a-vis the first and second persons. In Syriac
in the perfective form of all conjugations, basic and derived, the
third person masculine singular and the plural for both genders
has zero expression. In the Akkadian permansive and the Hebrew
perfective the third person masculine is the only zero form. In
some languages, e.g. Latin, there is a class of impersonal verbs
which are neutralized for person and in which the third person
appears as the surrogate for all three persons. In Masai and Nandi,
Nilotic languages of East Africa, there are verb forms which
include the pronominal object. The third person object has zero
expression. The same is true for Kanuri, a Saharan language of
a different branch of Nilo-Saharan. There are a few discordant
facts, e.g. the zero of the first person singular in the Dutch verb
where the third person has overt form. As between first and second
persons the predominant evidence is for the unmarked status of the
first person. In German in the preterite both the first and third
person singular have zero, whereas the second person singular
and the entire plural have suffixes. However in imperative and
hortatory forms, the second person is evidently the unmarked form
and frequently has zero expression.
These considerations would lead one to posit, tentatively at
least, a hierarchy in which the third person was the least marked,
and the second person the most marked, with the first person
intermediate. This is in general supported by the frequency data
which are available. There is, of course, the problem of the kind
of text being sampled. Ideally we should prefer texts which
represent the normal conversational use of language. Noncon-
versational texts will tend, presumably, to overrepresent third
person forms. In this respect the Thorndike-Lorge material and
Vedic Sanskrit would doubtless tend to favor the third person.
However the data from Terence's plays and Josselson's Russian
evidence in which conversational and non-conversational frequen-
cies are segregated still show the same phenomenon of third
person predominance. Except for Vedic Sanskrit the first person
is more frequent than the second person. The extraordinarily
high frequency of second person forms is here, no doubt, due to
the nature of the texts which are odes addressed to divinities.
Josselson's Russian materials show the predicted frequency order
third person, first person, second person both for conversational
and non-conversational samples with a much greater predomi-
nance of third person in the non-conversational texts, as would
have been expected. The statistical data for verb forms are given
in Table XXV.
TABLE XXV
                           1st person   2nd person   3rd person
Sanskrit                   11.3         34.6         54.1
Latin                      29.3         25.4         45.3
Russian (conversational)   31.9         17.7         50.4
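The predicted order of frequency (third person, first person, second person) can be compared with the order actually found in each row of Table XXV. The Python sketch below does this; the names TABLE_XXV, PREDICTED and observed_order are illustrative assumptions.

```python
# Percentages of verb forms by person, transcribed from Table XXV.
TABLE_XXV = {
    "Sanskrit": {"1st": 11.3, "2nd": 34.6, "3rd": 54.1},
    "Latin": {"1st": 29.3, "2nd": 25.4, "3rd": 45.3},
    "Russian (conversational)": {"1st": 31.9, "2nd": 17.7, "3rd": 50.4},
}

# Order predicted by the hierarchy, from most to least frequent.
PREDICTED = ["3rd", "1st", "2nd"]


def observed_order(row):
    """Persons sorted from most to least frequent."""
    return sorted(row, key=row.get, reverse=True)


orders = {lang: observed_order(row) for lang, row in TABLE_XXV.items()}
exceptions = [lang for lang, order in orders.items() if order != PREDICTED]

for lang, order in orders.items():
    print(f"{lang}: {' > '.join(order)}")
print("exceptions:", exceptions)
```

The third person is the most frequent in all three rows; the single departure from the predicted order concerns the relative position of first and second person in Vedic Sanskrit, for which an explanation from the nature of the texts has been suggested above.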
In the opposition active vs. mediopassive, it is clear that the active
is the unmarked member.12 Zero expression of the active is normal.
12 The marked character of the passive only applies to languages in which
there is one construction in which the agens whether with an intransitive or
transitive verb is compulsorily expressed (active) and another in which the
agens may be deleted while the patiens must be expressed (passive). The
situation in other types of languages, e.g. the ergative type in which the transitive
patiens is equated with the intransitive agens remains to be investigated.
Examples are Danish in which the passive suffixes -es, while the
active has no overt mark, and Swahili in which the passive is
formed by -w suffixed to the active stem. Sometimes as in English,
the passive is formed periphrastically. The passive often syncretizes
forms which are distinct in the active. Thus the Finnish passive has a
single form for the three persons and two numbers of the active. In
Albanian the indicative and subjunctive are not distinguished in
the passive. The expected higher frequency of the active over the
mediopassive is shown by the data of Table XXVI, from Latin
(Terence) and Vedic Sanskrit:
TABLE XXVI
           Active   Passive
Latin      90.2     09.8
Sanskrit   73.1     26.9
We now consider mode. Here there are of course considerable
interlinguistic differences. Primary is the difference between the
indicative from which statements can be formed which are true
or false and the various non-indicatives, imperatives, hortatives,
subjunctives, optatives, etc.
Leaving aside for the moment imperative-hortatives, which raise
certain special problems, the indicative may be considered the
unmarked category as against the marked character of the one or
more hypothetical modes. An example of syncretism is Italian in
which for the present subjunctive all the three persons of the
singular, distinguished in the indicative, have a single form, while
in the past subjunctive the first and second person singular which
are distinct in the indicative are merged. In Akkadian the sub-
junctive is marked by a suffix -u added to the indicative.
Hortatives usually of the first and third person but sometimes
found in the second person distinct from the imperative are surely
a marked category as against the indicative. Sometimes, as in
Latin, a form with general subjunctive meaning may be used
hortatively, including in this instance a second person hortative.
Latin has in addition an infrequent second or third person so-called future
imperative always marked by an -o or -to formative added to the
indicative or imperative, e.g. esto, sunto. Hebrew has a hortative
suffix -o which may be appended to the imperfect indicative in the
first or third person indicative or to the second person imperatives.
In classical Arabic a jussive is formed for all persons with zero
ending as against suffixes in the indicative and subjunctive but it
is usually accompanied by a hortatory prefix li-. Hortatives then,
whether confined to the first and/or third persons or including also
a second person distinct from the imperative, show the characteris-
tics of marked categories. The Latin future imperative has a very
low text frequency.
On the other hand, imperatives proper often have zero expression,
particularly in the singular. In such cases, however, there is
sometimes a difference of stress pattern or in other suprasegmentals
so that the form is not in fact a 'pure stem' form. In Table XXVII
frequency data are shown for the indicative, subjunctive, imperative,
(including future imperative) in the Latin of Terence; for the indic-
ative, subjunctive, optative, and imperative of Vedic Sanskrit; and
for the indicative, conditional, and imperative in Russian:
TABLE XXVII
            Indicative   Subjunctive   Optative   Conditional   Imperative
Latin          70.0         22.7                                   07.3
Sanskrit       58.5         12.4         03.7                      25.4
Russian        84.1                                  02.3          13.6
Another basic category of the verb to be considered is that of
tense/aspect. Amidst the complexities of this category, there is
at least one point on which the evidence is unmistakeable, the
marked status of the future and its special relation to the present.
The future is practically always marked overtly by an auxiliary or
affix. Sometimes there is facultative expression of the future.
A single non-past form indicates either the present or future while
a specialized future form only refers to the future. This is by and
large the situation in English. Often the future is less differentiated
than the present in regard to moods and non-finite forms such as
participles and infinitives. Somewhat less clearly preterites seem
to form a marked category in relation to the present. Thus in
English the preterite has an overt marker -ed and does not distinguish
person and number except in the auxiliary 'to be'. Here
the preterite has fewer overtly expressed categories with two forms
was/were as compared to the present with three, am/is/are. In
certain languages a simple 'timeless' form is par excellence present,
while both past and future are facultatively expressed by overtly
marked forms. Thus in Khasi of Assam, the past and future have
auxiliaries, while the present does not, and this form which is
present par excellence can also be used to express the past or future.
In regard to Swahili, Ashton states concerning the -- tense that
it indicates the "act in taking place within a period or at a point
in time. If nothing in the context indicates past or future time, it
refers to the present."13 The so-called historical present of Latin
may perhaps be interpreted in this manner.
The frequency evidence from Vedic Sanskrit and the Latin of
Terence supports this view insofar as the most frequent tense is the
unmarked present, the next most frequent are various past tenses
which even taken together do not approach the present in frequency,
while the least common is the future. These data are presented in
Table XXVIII.
TABLE XXVIII
            Present   Past   Future
Sanskrit      53.6    46.3    00.1
Latin         62.1    26.6    11.3
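The ranking claimed for Table XXVIII can be checked mechanically. The following is a minimal sketch in Python, with the percentages transcribed from the table; the names TENSE_FREQ and rank_tenses are illustrative only and are not part of the original study.

```python
# Percentages transcribed from Table XXVIII (share of tense occurrences).
TENSE_FREQ = {
    "Sanskrit": {"present": 53.6, "past": 46.3, "future": 0.1},
    "Latin": {"present": 62.1, "past": 26.6, "future": 11.3},
}


def rank_tenses(freqs):
    """Return the tense names ordered from most to least frequent."""
    return sorted(freqs, key=freqs.get, reverse=True)


for language, freqs in TENSE_FREQ.items():
    # Print the ranking and the row total as a transcription check.
    print(language, rank_tenses(freqs), round(sum(freqs.values()), 1))
```

In both rows the present ranks first and the future last, and each row sums to 100 per cent.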
It is tempting to identify the imperfective of aspectual systems
with the unmarked present and the perfective with the marked
past. A supporting instance is classical Arabic. Here the imper-
fective is commonly used for past habitual, continuous, or frequen-
tative acts, much like the Latin past imperfective as well as for
present and future, while the perfective is usually past in meaning
but can also be used much like the Latin future perfect, e.g. in the
protasis of conditions. The subordinate moods are built on the
imperfective. A facultative future formed by the prefix sa- added to
the imperfective is used relatively rarely. All of this is evidence for
the unmarked character of the imperfective.
13 E. O. Ashton, Swahili Grammar, 37 (London, 1944).
The Russian aspect system, and that of Slavic in general, is
really quite different in that both the imperfective and perfective have
their own past and that the perfective without past formative has a
future meaning. The perfective is usually marked by a prefix or some
other extension from the imperfective but the opposite also occurs
not infrequently, e.g. dat' 'to give' (perfective), davat' 'to give'
(imperfective). Jakobson considers the imperfective to be the
unmarked category. The frequency data presented by Josselson
are discordant with this hypothesis as well as with the hypotheses
regarding tenses described above. Thus, of all the verb forms
recorded, in conversational texts the relative frequency of the
imperfective, 46.9%, is somewhat less than that of the perfective
and a similar disparity holds in non-conversational texts. The
imperfective present is the single most frequent tense with 32.5%
of all indicative occurrences, but the two past tenses, the perfective
with 29.6% and the imperfective with 13.5%, together total 43.1%.
The two futures together account for 24.5%.
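The arithmetic behind this comparison can be laid out explicitly. The sketch below is in Python and uses only the percentages quoted in the text from Josselson; the variable names are illustrative, and since the text gives only a combined figure for the two futures, it is entered as a single value.

```python
# Shares of all indicative occurrences, as quoted in the text (per cent).
SHARES = {
    ("imperfective", "present"): 32.5,
    ("perfective", "past"): 29.6,
    ("imperfective", "past"): 13.5,
}
FUTURES_TOTAL = 24.5  # only the combined figure for both futures is given

present = SHARES[("imperfective", "present")]
past_total = round(
    sum(v for (_aspect, tense), v in SHARES.items() if tense == "past"), 1
)
grand_total = round(sum(SHARES.values()) + FUTURES_TOTAL, 1)

print("present:", present)
print("past tenses together:", past_total)
print("past exceeds present:", past_total > present)
print("grand total:", grand_total)
```

The two pasts together outweigh the present, which is the sense in which these data are discordant with the tense hypothesis; the grand total departs from 100 only by rounding in the quoted figures.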
We next consider derivational categories in the verb. Some
languages have a systematic set of derived categories such as
causative, reciprocal, intensive, etc. These are invariably accom-
panied by overt markers whereas the basic non-derived verb has
zero. There is sometimes syncretism in the derived as compared to
the basic form. Thus in Arabic, Hebrew, and Aramaic the base
form of the verb distinguishes several sub-types marked by a
difference in the vowel of the second syllable in the perfect roughly
correlated with meaning, e.g. classical Arabic kataba 'he wrote' but
salima 'he is safe' (intransitive) and thaqula 'it is heavy' (adjectival).
This distinction is completely obliterated in all the forms of the
derived conjugations. Frequency data are available for Vedic San-
skrit in which the basic verb form accounts for 90.3% of all verb
occurrences and the remaining 09.7% consists of causatives, intensives,
denominatives, and desideratives in that order. A study of verb
forms in the Koran by Elchouemi shows that the basic form (I of
traditional Arabic grammar) has a proportional frequency greater
than the total of the ten derived forms which occur in the Koran.14
Of these, the causative (class IV) again contains the second
largest number of occurrences.
Finally among grammatical categories we may consider those of
negation and interrogation which pertain to the sentence as a whole
although they may occur in connection with the verb as the indispensable
element of the major sentence type in many languages.
The negative always receives overt expression while the positive
usually has zero expression. There are certain rare instances, e.g.
Vietnamese, where there is a form which expresses the positive
category, but I do not believe that it is ever compulsory. Where
there are inflected negative forms of the verb, there is sometimes
neutralization of categories which are distinct in the positive form.
Thus in Shilluk there is syncretization of the present and future in
the negative. All of this is evidence for the marked character of the
negative as opposed to the positive.
In the contrast of declarative and interrogative, it is clearly the
interrogative which is the marked member. Languages often have
an interrogative particle but very rarely to my knowledge a declara-
tive indicator and then always with an interrogative marker in
addition (e.g. Kate in New Guinea). The declarative/interrogative
distinction is, of course, often expressed by a difference of intonation
pattern in addition to, or without the existence of, an interrogative
particle. A third method often found with or without those just
mentioned is a difference in word order. A possible justification
for taking the typical interrogative intonations and word order as
evidence for the marked character of the category of interrogation
is considered below. I have no data on the relative frequency of
positive and negative utterances, but for declarative and inter-
rogative Fries gives information for a large corpus of English
telephone conversations in which it was found that 'statements'
furnished more than 60 per cent of the bulk; 'questions' something
over 28 per cent; 'requests' less than 7 per cent; and 'calls' less than
1 per cent.15
14 Elchoueini, "Statistique des formes verbales dans le Coran", Résumé,
Bulletin de la Société Linguistique de Paris 50/1, xxx-xxxi (1954).
In all of the grammatical examples considered here, the categories
have belonged to what are conventionally called the same part of
speech. However, it may be observed in passing that parts of speech
as a whole give some evidence of hierarchical structuring along the
lines being discussed. Thus the pronoun commonly has some of
the characteristics of an unmarked category as contrasted with the
noun, e.g. greater formal irregularity and greater differentiation
of inflectional categories. The frequency interpretation here would
presumably be applied in a somewhat different fashion. Thus,
although the over-all frequency of nouns is greater than that of pronouns
in all instances where data are available, the number of
pronominal forms is always far smaller than the number of nominal
forms, so that individual pronouns usually have very high frequency.
Perhaps here the average frequency of individual forms of the two
classes is a fitting measure but, of course, the details remain to be
worked out.
In general the same criteria for marked and unmarked apply to
the area of lexical meaning as for grammatical categories. Instances
of zero expression may be cited from kinship terminology, e.g.
brother vs. brother-in-law, father vs. grandfather. It will be shown
later that there is much evidence to show that in general in kinship
systems consanguineal terms are unmarked in relation to affinal
and less distant are unmarked in relation to more distant lineal kin.
As an example of facultative expression in a lexical category we
may note the use of the unmarked 'author', incidentally with zero
expression, to refer to a writer regardless of sex, while 'authoress'
indicates only a female writer. A further illustration from kinship
is the extended use of such terms as 'mother' to include both the
consanguineal kin type female parent and the affinal type female
parent of spouse, while the marked term 'mother-in-law' designates
only the affinal kin type. This is then further evidence of the
unmarked character of consanguineal in contrast to affinal in
kinship terminologies. An example of syncretization can also be
drawn from kin terms, namely the neutralization of sex differences
in the higher degree of collaterality, i.e. in cousin terms, as against
the existence of this distinction in the lowest degree of collaterality,
that of siblings where we have the separate words 'brother' and
'sister'. The principle of contextual neutralization operates in such
cases as the following. In the context, female, the opposition
between author and authoress is suspended and only the unmarked
member 'author' appears.
15 C. C. Fries, The Structure of English, 51 (New York, 1952).
Frequency data regarding kinship terms are presented later in
the discussion of the semantics of kinship systems. They show in
every case a substantially higher frequency for terms in the un-
marked category than for the corresponding terms in the marked
category. For the pair 'author' and 'authoress', Thorndike and
Lorge give a total of 9 occurrences of 'authoress' on the following
four counts, the Thorndike general count, the Lorge magazine
count, the juvenile book count, and the Lorge-Thorndike semantic
count. The combined frequency of 'author' on these four counts
is 1,102, well over 100 times the frequency of
'authoress'.
In addition to the universals of kinship to be described later a
set of probable semantic universals based on the categories of
marked and unmarked in adjectival opposites may be briefly
considered. The most common example is probably good/bad.
A considerable number of languages, African, Amerind, and
Oceanic, have no separate term for 'bad' which is expressed by
'not good'. On the other hand, there is as far as is known to me,
no language which lacks a separate term for 'good' and expresses
it normally by 'not-bad'. Thus 'good' is the unmarked member.
Similarly, for long/short, wide/narrow, deep/shallow, many/few,
and possibly a few others, the first member is unmarked and the
second marked. For example, Hausa has no word for 'narrow'.
'It is broad' is literally 'it is with width', while 'it is narrow' is
'it is not with width', while for qualifiers 'possessing width' and
'lacking width' are used respectively. Likewise in Hausa 'shallow'
is 'lacking depth'. It is noteworthy that in English contextual
neutralization occurs with these terms and it is the unmarked
member which appears, e.g. What is its width? How wide is it?
not What is its narrowness? How narrow is it? Further examples
are how good, how many, how long, how deep. Frequency data
from English and Spanish show without exception a higher
frequency for the unmarked member for each of the pairs just
cited. Note that Spanish exemplifies the hypothesis of zero
expression of the unmarked term in the case of 'shallow' which is
normally expressed as 'poco profundo'. The English is drawn
from the Lorge magazine count, the Spanish from Bou.
TABLE XXIX
English                      Spanish
good 5,122; bad 1,001        bueno 36,000+; malo 9,811
many 3,874; few 2,730        mucho 36,000+; poco 6,321
long 5,362; short 887        largo 5,361; corto 1,612
wide 593; narrow 391         ancho 1,544; estrecho 507
deep 881; shallow 104
The figure 36,000+ is used for the most frequent 87 words in
Spanish whose frequency in Bou's count is larger than 36,000 but
for which the exact number of occurrences is not recorded.
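The size of the frequency advantage for each pair in Table XXIX can be expressed as a ratio of unmarked to marked member. The following is a minimal sketch in Python with the counts transcribed from the table; the names PAIRS and ratios are illustrative only, and the value 36000 stands in for the entries given as 36,000+, so the ratios for those two Spanish pairs are lower bounds rather than exact figures.

```python
# (unmarked, marked) frequency pairs transcribed from Table XXIX.
# 36000 stands for the "36,000+" entries, so those ratios are lower bounds.
PAIRS = {
    "English": {
        "good/bad": (5122, 1001),
        "many/few": (3874, 2730),
        "long/short": (5362, 887),
        "wide/narrow": (593, 391),
        "deep/shallow": (881, 104),
    },
    "Spanish": {
        "bueno/malo": (36000, 9811),
        "mucho/poco": (36000, 6321),
        "largo/corto": (5361, 1612),
        "ancho/estrecho": (1544, 507),
    },
}


def ratios(pairs):
    """Ratio of unmarked to marked frequency for each pair, to two decimals."""
    return {name: round(u / m, 2) for name, (u, m) in pairs.items()}


for language, pairs in PAIRS.items():
    for name, ratio in ratios(pairs).items():
        print(f"{language:8} {name:15} {ratio}")
```

Every ratio exceeds 1, in line with the statement that the unmarked member is the more frequent in each pair.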
A further manifestation of the marked-unmarked hierarchy is
shown in word association where the stimulus words selected by
psychologists have been exclusively drawn from the unmarked
category, e.g. singular nouns, positive adjectives. However, a
recent set of norms by D. S. Palermo and J. J. Jenkins [Word
association norms, grade school through college (Minneapolis, 1963)]
employs some stimulus words from grammatically marked cate-
gories. In formulating an hypothesis in advance of an examination
of these data, a further factor independent of the marked-unmarked
relationship has to be considered. There is a well-attested tendency
for stimulus words of a particular grammatical category to elicit
responses of the same category. This is most completely documented
in a study of L. V. Jones and S. Fillenbaum, Grammatically
classified word associations (Chapel Hill, 1964), in which 'Stimuli
were classified on a part of speech basis into categories for each
one of which the response frequency was greatest for the same part
of speech.'
If we hypothesize on the basis that, for example, singular nouns
ceteris paribus will elicit singular nouns and plural nouns will elicit
plural nouns, we will make a set of predictions of the following
form. A stimulus of an unmarked category will have responses
of the same unmarked category almost exclusively since both
factors, the tendency towards responses in the same category and the
marked-unmarked hierarchy, are working in the same direction.
A marked stimulus will have a marked response but to a substan-
tially smaller degree.
It was possible to test this general hypothesis from the Palermo-
Jenkins material in the following instances with consistently
favorable results. For nouns there were 64 singulars as stimuli and
11 plurals and one ambiguous ('sheep'). The noun responses to
each noun were classified as singular or plural with the following
results.
TABLE XXX
               Singular R   Plural R   Total R
Singular S        .940        .060      41456
Plural S          .367        .633       7058
Ambiguous S       .897        .103        817
For adjectives some comparatives were included along with the
usual positives, but no superlatives. The number of comparative
responses to positive stimuli was so small (4 in 15,353) that it does
not figure in the percentage summary. Superlative responses to
comparative stimuli were exclusively with the same adjective base,
e.g. 'hottest' to 'hotter' as stimulus. There were 29 positive and 9
comparative adjective forms in the study, with these results.
TABLE XXXI
                 Positive R   Comparative R   Superlative R   Total R
Positive S         1.000          .000            .000         15353
Comparative S       .294          .689            .017          6018
For verbs the data only included the 'general' (i.e. infinitive) form
and the present participle in utilizable form. In two instances,
'come' and 'become', the stimulus was ambiguous as between the
general form and past participle but the results were tabulated with
other examples of the general form. Practically all participle
responses involved the same base as the general form stimulus.
There were 22 verb stimuli of the general category and 5 present
participles. The results are once more summarized in Table XXXII.
TABLE XXXII
                General R   Participle R   Total R
General S         .997         .003         7686
Participle S      .194         .806         1749
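The prediction tested in Tables XXX to XXXII can be restated as a simple check on the proportion of responses that stay in the stimulus category. The sketch below is in Python with the proportions transcribed from the three tables. The name prediction_holds is illustrative, and reading "a marked response" as "more than half of the responses" is an assumption made here for the sake of the check, not a threshold stated in the text.

```python
# Share of responses falling in the same category as the stimulus,
# transcribed from Tables XXX (number), XXXI (degree), XXXII (verb form).
SAME_CATEGORY = {
    "number": {"unmarked": 0.940, "marked": 0.633},     # singular vs. plural
    "degree": {"unmarked": 1.000, "marked": 0.689},     # positive vs. comparative
    "verb form": {"unmarked": 0.997, "marked": 0.806},  # general vs. participle
}


def prediction_holds(row):
    """Unmarked stimuli stay in category more often than marked ones,
    while marked stimuli still favour their own category (assumed > 0.5)."""
    return row["unmarked"] > row["marked"] > 0.5


for category, row in SAME_CATEGORY.items():
    gap = round(row["unmarked"] - row["marked"], 3)
    print(category, prediction_holds(row), gap)
```

On this reading the prediction holds for all three categories, with the gap between unmarked and marked stimuli ranging from roughly .19 to .31.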
COMMON CHARACTERISTICS IN
PHONOLOGY, GRAMMAR, AND LEXICON
The closeness of the relationship between the notion of marked
and unmarked in grammar and in lexicon is evident from the fact
that the same major criteria apply in what seems to be intuitively
the same manner so that we have the assurance that we are dealing
with essentially the same phenomenon in both cases. Only the
relatively minor categories of defectivation, dominance ['taghlib']
which might in fact be considered lexical, and agreement a potiori
are missing in the lexical area. Even of these the first, defectivation,
can possibly be exemplified, once more in kinship systems from
the absence of terms for certain kin types in certain systems, e.g.
more distant affinals in English such as spouse's cousin.
Indeed in certain cases, we may consider the same evidence from
one point of view to exemplify a contrast of grammatical category
and from another a lexical contrast. Thus in the instance of 'author'
and 'authoress' cited above, the addition of -ess may be taken as
evidence of the non-zero expression of the marked member of the
lexical set 'author' and 'authoress'. On the other hand, given the
recurrent nature of such pairs as author/authoress, sculptor/
sculptress, etc., we isolate an element -ess labelled as derivational
so that a generalization of the relationship noted in the single
lexical pair leads to the over-all characterization of the derivational
category as marked in relation to the underlying category with
zero expression.
In contrast to this obvious inner relation of the lexical and gram-
matical uses of the concept of marked and unmarked categories,
its employment in phonology seems a quite different matter.
At first glance it seems by no means implausible to see here perhaps
no more than a tenuous metaphor, or, at best, a partial or complete
isomorphism. In fact such an isomorphism can be established
through a set of correspondences of which the fundamental one
is that of the phoneme to the word or lexemic unit, when both are
considered to be constructed from features. For the phoneme the
features are the familiar ones of phonetics, for the lexemic unit it
is the constituent morphemes. However, this analysis into mor-
phemes must be of a particular type as we shall soon see. There is
further the notion of environment which corresponds in both cases,
preceding and following phonemes, preceding and following
lexemic units.
Then in either case the concept of marked and unmarked is a
relation between features which are mutually exclusive where they
are the source of minimal contrast between two phonemes or two
lexemic units. Thus in phonology the features of glottalization
and non-glottalization are mutually exclusive and susceptible of the
relation of marked to unmarked where they form correlative pairs,
e.g. where we have such oppositions as glottalized dental stop and
non-glottalized dental stop. In grammar we deal chiefly with
inflectional categories where, e.g. a form cannot be singular and
plural at the same time and where we compare forms with the same
bases or class of bases existing in these two inflectional categories,
e.g. noun stem singular vs. noun stem plural. Such contrasts can
be expressed, of course, in terms of now traditional morphemics as
morpheme class of noun bases + singular morpheme vs. morpheme
class of noun bases + plural morpheme, but we must note then
that (1) the base has a different status than the inflection since it
is the latter that corresponds to the feature susceptible to being
marked or unmarked; (2) the base is often a morphemic sequence;
(3) we must in this case use a type of morphemic analysis in which
inflections involving different generic categories simultaneously
are treated as containing as many morphemes as there are categories
since we must sometimes, for example, contrast a first person
singular with its corresponding plural and on other occasions with
the second or third person singular.
For lexical items, e.g. kinship terms, the features correspond to
the components of contemporary componential analysis. Here
there is no necessary hierarchy among the components, such as
that of base and inflection. The essential difference is that feature
here cannot be equated with morpheme but is rather a semantic
component or 'seme', if you will. Thus 'brother' is grammatically
a single morpheme, but it can be analyzed semantically into such
components as 'male', 'zero generation', 'consanguineal' which do
not themselves have morphemic status.
Given these equivalences, we can attempt to translate by sub-
stituting corresponding terms from the language of phonology
into that of grammar or lexicon and vice versa. Let us consider the
chief criteria for unmarked and marked categories of phonology
from this point of view. It will be recalled that our first phonological
criterion was that of neutralization in which the unmarked feature
appears. Our choice of terminology in the grammatical discussion
suggests that it is possible to equate this with contextual neutrali-
zation. And indeed, one can be mapped into the other by the
appropriate equivalences. The terms marked and unmarked like
environment, are, of course, invariant under this transformation.
Hence we have the following: when in a particular class of environ-
ment no contrast occurs within a set of phonemes which differ from
each other only in a single feature, it is the unmarked feature which
appears in this environment.
The second phonologic characteristic, greater frequency of the
unmarked member, is likewise subject to straightforward transla-
tion from one mode of speech to the other. In both cases we are
dealing with relative text frequencies of members of a set formed by
phonemes/lexemes which differ in a correlative feature, and we
predict the greater frequency of the unmarked member.
The greater allophonic variability of the unmarked member of
a correlative set was mentioned as a third indicator of unmarked
versus marked status. Translating allophone into allomorph, we
have indeed one of the criteria of the unmarked category in
grammar. We have seen that in general, though exceptions can
be found, the unmarked grammatical category shows greater
allomorphic variation, except of course when, as is characteristically
the case, it is expressed by zero.
As a fourth clue to unmarked status in phonology, it was
mentioned that the number of phonemes with the marked feature
is always less than or equal to the number with the unmarked
feature but not greater. Thus the number of nasal vowels is always
less than or equal to the number of oral vowels. A resolute attempt
to translate this into the language of grammatical analysis will in
fact show that it is the analogue of syncretization, provided we
keep in mind that, as mentioned earlier, the correspondent of
feature in phonology is really the inflective or derivational morpheme,
that is, the semanteme of European terminology. Thus the smaller
number of nasal vowels in some languages means that certain
oppositions present in the unmarked category non-nasal are
syncretized in the marked category. Thus comparing nasalization
with plurality as marked features, one may say that the opposition
between high and low vowels present among oral vowels is syncre-
tized among the nasalized in French, just as the opposition between
masculine and feminine is syncretized in the plural of the article,
the demonstratives, and the possessive adjectives in the same
language.
The fifth and last indicator mentioned for distinguishing marked
from unmarked in phonology was that the basic allophone, defined
in terms of phonologic independence of its environment, was the
one with the unmarked feature. The translation of this statement
into grammatical terminology requires that we find an equivalence
to independence in relation to environment. Now, it will be recalled,
that by independence in this case was meant non-assimilation
phonetically to adjacent sounds. A sound is assimilated to another
if it shares more features with it. Similarly a lexeme may be said
to be assimilated to another lexeme if it shares an additional feature
with it, meaning in this connection, as has been seen, a semanteme.
Now the sharing of semantemes in grammar is concord. Hence
we may equate the phonological character just mentioned with
agreement a potiori. Thus in Spanish the adjective agrees with the
noun it modifies in gender, i.e. it shows a common semanteme.
The unmarked masculine is, however, more independent of its
environment in that it may be used in a Spanish expression such
as cuello y camisa blancos 'white collar (masc.) and shirt (fem.)'
where blancos, which contains the masculine morpheme, appears
in the environment of camisa 'shirt' which is feminine while the
feminine morpheme of an adjective could never appear except in
the environment of another feminine.
The possibility of translation for every one of the five character-
istics of the unmarked/marked dichotomy in phonology as enu-
merated earlier into grammatical terminology under fixed rules of
translation and with unmarked and marked corresponding to each
other in each case is sufficient evidence that the analogy between
these concepts in phonology and in grammar is not a far-fetched
one. As will be developed in more detail later, what connects the
uses of the term unmarked and of marked in at least some of these
statements with each other, and in corresponding ways in phonology
and grammar is the basic or fundamental character of the unmarked
as against the marked. This can be shown more exactly, in the
following way: whenever a statement of one of the above five types
can be put in terms of a universal implication, it is the unmarked
member which is the implied or basic term and the marked which
is the implying or secondary. Thus to the first type statement that
gender may be syncretized in the marked category, i.e. in the plural
and the phonologic statement that the opposition between high
and low vowels may be syncretized in the presence of nasality,
we have the implicational universals: (1) distinction of gender in
the plural implies its distinction in the singular but not necessarily
the converse; (2) distinction of vowel height in nasal vowels implies
its presence in oral vowels but not necessarily the converse. In
both of these statements the implicatum is the unmarked category,
singularity and non-nasality respectively.
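An implicational universal of this form can be tested against a language sample by searching for counterexamples, that is, languages where the implying property holds and the implied one does not. The following is a minimal sketch in Python; the function name violations, the property labels, and the three sample records A, B and C are hypothetical placeholders chosen for illustration and are not survey data.

```python
from typing import Mapping


def violations(languages: Mapping[str, Mapping[str, bool]],
               implying: str, implied: str) -> list:
    """Names of languages where `implying` holds but `implied` does not."""
    return [name for name, props in languages.items()
            if props[implying] and not props[implied]]


# Hypothetical toy records (placeholders, not attested languages).
SAMPLE = {
    "A": {"gender_in_plural": True, "gender_in_singular": True},
    "B": {"gender_in_plural": False, "gender_in_singular": True},
    "C": {"gender_in_plural": False, "gender_in_singular": False},
}

# Universal (1): gender distinction in the plural implies it in the singular.
print(violations(SAMPLE, "gender_in_plural", "gender_in_singular"))
```

All three record types are compatible with the universal. Only a fourth type, with the distinction in the plural but not in the singular, would count as a violation, which is why the converse is not implied.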
Viewed psychologically there is perhaps justification for seeing
a similarity between the implied, fundamental characteristic, that
is the unmarked member, whether in phonology, grammar, or
semantics, and the Gestalt notion of ground, the frequent, the
taken-for-granted, whereas the marked character would answer to figure
in the familiar dichotomy. It may be noted in passing that the
traditional arrangement of paradigms in grammars seems to display
an intuitive recognition of these relationships. The singular is
always put above or in the left hand column, so with the active
versus the passive, etc.
It is time, however, to turn to points where the isomorphism is
perhaps not complete. Specifically, we can ask whether the addi-
tional characteristics of the unmarked/marked in the grammatical
and semantic spheres to which no correspondent has yet been
mentioned from phonology do have such a partner and, if they do
not, to seek for an explanation of the impossibility of a mapping
in these cases.
The characteristics involved are the following: zero expression,
facultative expression, defectivation, and dominance (taghlib). This
last can be eliminated as a relatively minor phenomenon. In fact,
it only applies to the category of number since it refers to the
characteristics of a collection and is therefore irrelevant to the
analyses of most grammatical and semantic categories. Defectiva-
tion also raises no real difficulties. It was seen that defectivation
is closely related to the concept of syncretism. In fact it might be
considered a variety of syncretism in which the representative of
the syncretized category can be definitely identified with a particular
member of that category. The others can then be said to be lacking
or defective. Thus if gender difference is syncretized in the plural
but the single gender present can, on some grounds, be identified
as masculine, then the feminine plural is missing. Similarly we
can say that the oral high vowels in French have no nasal partners
so that there is defectivation in the marked category of nasal vowels.
This, however, leaves two conspicuous indicators of the marked/
unmarked in grammar and semantics: facultative expression of
the marked and zero expression of the unmarked. It will be recalled
that what has been called here 'facultative expression' is given
definitional status by Jakobson in his discussion of the marked/
unmarked dichotomy in relation to grammar. The analogy to
phonology has already been pointed out; namely, that the unmarked
member acts as a surrogate for the entire category. However, as
was just pointed out, the more exact analogy of phonological
neutralization is contextual neutralization. The comparison of
phonological neutralization, however, to facultative expression does
serve to point out the important ambiguity of the unmarked terms
in grammar and semantics as a simultaneous bearer of the generic
category meaning and the specific unmarked subcategory and the
similar ambiguous role of the archiphoneme. However, it remains
to be pointed out that, as important as the phenomenon of faculta-
tive expression is, it does not in fact apply for a number of the
grammatical categories mentioned.
These include the important categories of positive/negative and
declarative/interrogative. Thus it is not true that a statement in the
positive form, that is without an overt indicator of negation, can
be taken as either positive or negative and is merely par excellence
positive. Likewise in a language with a question particle, it is not
the case that the absence of this particle indicates that the sentence
can be taken as either declarative or interrogative. The same holds
for the lexical case of adjectival opposites. Again it is not true that
the unmarked member 'wide' can also mean 'narrow' saving
indication to the contrary. A further instance is lower versus higher
numerals.
This of course leaves us with the alternative of excluding cases
such as those just cited from consideration as instances of marked
and unmarked categories and setting up perhaps a still wider notion
under which these cases can be subsumed along with a different
subset including true instances of the marked/unmarked dichotomy.
However, the instances just considered are so like the other cases
that this seems inadvisable. It thus turns out that one of the two
remaining types which does not seem to have an exact respondent
in the area of phonology is not itself universally present in grammar
and semantics. This still leaves us with the very important gram-
matical indication of the unmarked category by zero expression.
Here, however, a literal translation, at least using the equivalences
which showed themselves to be efficacious in instances considered
above, is not possible. For zero expression involves the relation
between content, the grammatical or semantic category involved,
and expression, in this case the lack of overt sound sequences.
At this point the fundamental difference between the phonological
and grammatical level asserts itself, the sound-meaning relationship
which is absent in the former and present in the latter.
Up to now the criteria of the marked and unmarked, whether in
phonology or grammar/semantics, have been treated as an empirically
given bundle of concurrent phenomena; that is, such questions as
the following have not been asked. Why, for example, should the
less frequent category be the one which is subject to syncretizations?
The following remarks are to be taken merely as exploratory
soundings.
Consider first the situation in phonology. Here the fundamental
factor is quite possibly constituted by certain dynamic diachronic
factors. Of these the chief would be the tendency for a more
complex (marked) item to lose its mark whenever it no longer
contrasts with the corresponding unmarked item. Thus in the
presumed course of events embodied in Grimm's first law, once
unvoiced stops had become fricatives, we would be left with such
sets as bh, b, f. The b with its marked feature of voicing, having
no partner p, was free to lose its mark and become p. Now given
bh, p, f, in similar fashion the bh, having no partner b, could lose
its marked feature of aspiration, although it became a voiced
fricative rather than a voiced stop in most environments. In this
schematic statement various complications are not considered,
notably those concerning Verner's law. The Grimm's law changes
do not, in general, involve merger. In other cases of complete or
conditioned merger under conditions where, typically but not
always, functional yield is low, it seems to be the general rule that
the merger is produced by the marked feature losing its mark.
Conditional mergers will evidently produce neutralization. Thus
in German and other languages voiced and unvoiced obstruents
have merged in word or sentence final by the loss of voicing in this
position.
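The neutralization produced by such a conditioned merger can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each character stands for one segment, the table FINAL_DEVOICING and the function devoice_final are hypothetical names, and the spellings "rad" and "rat" are schematic stand-ins for German Rad 'wheel' and Rat 'counsel'.

```python
# Hypothetical table: each voiced (marked) obstruent paired with its
# voiceless (unmarked) partner.
FINAL_DEVOICING = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}


def devoice_final(segments: str) -> str:
    """Return the form with a word-final voiced obstruent devoiced."""
    if not segments:
        return segments
    last = segments[-1]
    # Only the final segment is affected; all other positions keep the contrast.
    return segments[:-1] + FINAL_DEVOICING.get(last, last)


# Two forms that contrast only in final voicing fall together.
merged = devoice_final("rad") == devoice_final("rat")
```

In this sketch the contrast survives in non-final position, so that only the unmarked member appears in the position of neutralization.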
Of course not all sound changes operate in this direction. For
example, by assimilative changes a complex may acquire a marked
feature of an adjacent sound as in assimilative voicing. There are
further sources of phonemes with marked features. An important
one is surely the development of complex articulations from previous
sequences. A typical instance is nasalized vowels, conjectured by
Ferguson to arise in all cases from sequences of oral vowel and
nasal consonant. In such instances it would presumably be the
case that first the oral vowel is nasalized non-distinctively before
the nasal consonant, and the consonant is subsequently lost.
In their relation to marked and unmarked features then, two
major classes of regular sound changes may be distinguished. The
first includes unconditioned changes, particularly mergers, and
those conditioned changes in which the specific class of environing
sounds is irrelevant, e.g. changes in word final. In these which may
be assigned to the paradigmatic aspect of language the overall
tendency is for the marked or phonetically complex series to give
way to the unmarked or simpler. Thus it may be asserted as a
diachronic universal that a glottalized series may merge with the
corresponding unglottalized series in an unconditional merger
but not vice versa. If the opposite occurred it would produce a
phonological system in which glottalized consonants occurred
without an unglottalized series and such is not known to occur.
The other class of changes which may be considered syntagmatic
consists of the mass of assimilatory conditioned changes which
often give rise to marked features. Thus the answer to the objection
that 'ease' of articulation, an expression which is avoided here
but which can be given objective content, should produce constantly
simpler phonologic systems in the evolution of language is that
there are two kinds of 'ease', paradigmatic which favors simplifica-
tion by loss of additional articulatory features regardless of context
and syntagmatic which favors the genesis of new assimilatory
modifications conditioned by the phonetic environment and so
gives rise to articulations which taken in isolation are more complex.
The greater frequency of the unmarked set can be largely ex-
plained as a resultant of the two processes just described. In
positions of neutralization only the unmarked member appears.
Where a set of marked phonemes arises from a sequence, the
original frequency of the undifferentiated protophoneme would
presumably be smaller before the limited set which furnished the
second members of the sequence, and this lesser frequency will
be reflected at a later stage by the correspondingly smaller frequency
of the marked set. Thus in Latin the frequency of any vowel before
all the non-nasal phonemes was presumably greater than before
the nasals alone. This same hypothesis will also explain another
characteristic of the marked category in phonology; namely, that
the number of marked phonemes of a set of correlative pairs is
usually less than or equal to the number of unmarked. When they
arise in this fashion they will in the beginning be equal in number.
They may then decrease by mergers, as with the French nasal vo-
wels. Given their initially smaller frequency, their functional yield
with each other is necessarily small. A further psychological
factor is the probably greater acoustic similarity of sets which
share a marked feature as against an unmarked feature. In a
psycholinguistic experiment of Greenberg and Jenkins, subjects
judged each pair distinguished by voice as closer together than
correlative pairs distinguished by voicelessness.1 Thus b:d was
closer than p:t; b:g than p:k, etc. It is remarkable, to cite the
example of nasality as the marked feature, that a change m > n is
not uncommon, but b > d or p > t is practically unheard of.
The greater frequency of the unmarked then would be a resultant
of certain common diachronic factors. Where other diachronic
factors are at work, however, discrepancies may arise. Thus as
was pointed out, some languages have a larger number of long
vowel phonemes than short vowels because of the common mono-
phthongization of the diphthongs aj and aw. Of course, ē and ō,
having no short partner, may be expected to become shorter, but
various morphological or canonical form factors may serve to
maintain length. For these reasons, while there is a far better
than chance tendency not only for the total text frequency of an
unmarked set to be greater than that of the corresponding marked,
but even for each individual pair, there are occasional exceptions.
While frequency is thus merely a resultant, though a very im-
portant one, of overall diachronic tendencies in phonology, it is
tempting to adjudge its role in grammar-semantics as primary.
1 J. H. Greenberg and J. J. Jenkins, "Studies in the Psychological Correlates
of the Sound System of American English". Word 20.157-177 [esp. 177] (1964).
There is a real difference between frequency phenomena in phonology
and in the grammatical-semantic sphere. For the former,
we do not choose our expression in terms of sounds, except perhaps
marginally in poetry, so that phonologic frequency is an incidental
characteristic which bears the marks of past diachronic changes.
But we make grammatical and semantic choices based on the
momentary situation. It is therefore plausible, insofar as there are
constants in the human situation, that, for example, everywhere
the singular should be more frequent than the plural and that this
remains quite constant over time in spite of changes in the means
of expression. Hence also generalizations regarding relative
phoneme frequencies are more precarious and exceptions are to
be expected. De Saussure here, perhaps anachronistically inter-
preted, had a real insight where he has sometimes been judged to
be obviously wrong; namely, in his identification of the diachronic
with the phonological and the synchronic with the grammatical.
The important phenomena of zero and facultative expression can
be understood in terms of frequency phenomena based on the
situation in the world with which the users of language must deal.
In fact there is here no real difference between semantic and gram-
matical phenomena. For example, it is not so much in English
that male is in general the unmarked category in relation to female,
but the frequency of association of things in the real world.
'Author' means facultatively a writer of either sex, but par excellence,
male, because in fact most authors are male. We see this if we
compare the term 'nurse'. Since nurses are usually female, nurse
takes on the meaning of nurse in general, or non-male nurse. To
express the maleness of the nurse, when relevant, we use the marked
expression 'male nurse'. Just so we may compare the ordinary
semantic interpretation of words with or without syntactic modifiers
with the morphological expression of corresponding categories.
In a language without a grammatical category of diminutives and
augmentatives, where size is indicated by modifying adjectives, if
we use 'house' in a sentence without modifiers, the size is unspecified
but the house may in fact be unusually large or unusually small.
We will usually assume that it is of normal size because most
houses are of normal size. On the other hand, 'small house' or
'large house' explicitly exclude interpretation as normal size.
The frequent assimilates the ambiguous, save contrary indications.
There are other advantages to a frequency interpretation of
marked and unmarked in grammar and semantics by which marked
simply means definitionally less frequent and unmarked means
more frequent. To begin with there is the obvious methodological
advantage that frequency phenomena can be explored for every
language whereas the other criteria are more limited in this respect,
e.g. neutralization of certain subcategories may not exist in a given
language. Frequency data will allow of degrees of marked and
unmarked by which the associated phenomena will be expected to
be most common and least subject to exception where the frequency
disparity is the greatest. This indeed seems to be the case insofar
as, for example, the hierarchy of persons is both less certain and
overwhelming in regard to frequency and also less clear in other
matters, whereas the hierarchy of numbers shows almost no
exception in non-frequency phenomena and great constancy
together with large frequency disparity for singular, plural, and
dual. In addition to gradualizing and quantifying the scale, it also
allows the construction of a much more subtle and manifold
hierarchy, for example, for the cardinal and ordinal numbers.
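The frequency definition lends itself to a simple computational statement, sketched here in Python. The counts are hypothetical figures chosen only for illustration and are not drawn from any frequency study; the function names markedness_scale and disparity are likewise assumptions of this sketch.

```python
def markedness_scale(counts: dict[str, int]) -> list[str]:
    """Order categories from least to most marked, i.e. by falling frequency."""
    return sorted(counts, key=lambda category: counts[category], reverse=True)


def disparity(counts: dict[str, int], unmarked: str, marked: str) -> float:
    """Ratio of unmarked to marked frequency; a larger value is a sharper contrast."""
    return counts[unmarked] / counts[marked]


# Hypothetical occurrences per 1000 noun tokens (illustrative only).
number_counts = {"plural": 280, "dual": 20, "singular": 700}

scale = markedness_scale(number_counts)
singular_vs_plural = disparity(number_counts, "singular", "plural")
```

On this view the scale is a matter of degree: the larger the ratio returned by disparity, the more consistently the other characteristics of the unmarked category would be expected to appear.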
In addition the frequency definition will cover at least one case
in which none of the other criteria is present but which has been
considered as an example of the marked/unmarked distinction by
Jakobson; namely, normal (unmarked) versus emphatic (marked)
word order. The so-called normal order, it would seem, is neces-
sarily the most frequent. We may refer here to the well-known
story of the boy who cried wolf.
Finally it may help to overcome the problem of lack of inter-
linguistic comparability of categories. Thus, for gender categories,
we may at least conjecture that the associated phenomena such as
zero expression and neutralization will be present to the degree
that frequency differences exist among the genders. Since these
are largely or completely conventional semantically and differ in
size of membership, it is entirely plausible that the gender labelled
'masculine' in one language will be of much greater text frequency
than the feminine in that language, while in another language,
the relationship is reversed. We may hypothesize that in the first
language the masculine will display the other characteristics of the
unmarked category, while in the second it will rather be the feminine.
Where the categories are not 'conventional', e.g. for cases, the
way lies open to explain the frequencies of specific cases as a sum-
mation of a number of discrete uses, each substantially similar in
frequency among languages but differently combined in different
languages. For example, traditional grammar describes the uses
of the ablative in Latin under such rubrics as the ablative of personal
agent, separation, instrument, etc. If we had the frequencies of each
of these, we could then, for example, compare it with the Russian
cases by equating a component of separation with the genitive with
prepositions ot and iz while agent and instrument would be equated
with the Russian instrumental.
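The proposed summation can be illustrated by a short Python sketch. The per-use frequencies are invented for the purpose, since, as noted below, counts of this kind are not yet available; only the three uses named above are included, and the names use_freq, case_frequency, and case_table are assumptions of this sketch.

```python
# Hypothetical frequency of each discrete use (occurrences per 1000 noun
# tokens), assumed for illustration to be the same in both languages.
use_freq = {"agent": 5, "separation": 30, "instrument": 25}

# How each language groups the uses under a case (only the uses named in the
# text; the Latin ablative has further uses not listed here).
latin = {"ablative": ["agent", "separation", "instrument"]}
russian = {
    "genitive with ot/iz": ["separation"],
    "instrumental": ["agent", "instrument"],
}


def case_frequency(uses: list[str], freq: dict[str, int]) -> int:
    """Frequency of a case as the sum of the frequencies of its uses."""
    return sum(freq[use] for use in uses)


def case_table(grouping: dict[str, list[str]], freq: dict[str, int]) -> dict[str, int]:
    """Predicted frequency of every case in one language."""
    return {case: case_frequency(uses, freq) for case, uses in grouping.items()}


latin_table = case_table(latin, use_freq)
russian_table = case_table(russian, use_freq)
```

The same component frequencies thus yield different case frequencies according to how each language combines them.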
There is at the moment a great practical difficulty here, of course,
as well as the theoretic problems of sampling. It is rare to have
frequency studies of grammatical categories, and even these do not
specify the separate uses of the categories. But this can in principle,
of course, be overcome in order to test the hypotheses presented
here.
The connection between frequency and the phenomena of gram-
matical or semantic neutralizations and morphological irregularities
has not yet been discussed. It has often been noted that the most
frequent forms are the most irregular. These are indeed now by
our definition the unmarked forms.
Where there is a complex set of intersecting categories, the
frequency differences between combinations of unmarked categories
and of marked categories are very great. For example, in Avery's
study of the Rigvedic verb, the form which involves all of the most
unmarked categories, singular, third person, present, active,
indicative has 1404 occurrences, while the dual, second person,
medio-passive perfect optative has zero frequency. Such enormous
disparities must surely have an effect in that such a highly infrequent
formation must follow analogically other parts of the system, while
only a fairly frequent form can preserve irregularities. Hence, also
syncretisms produced by the accidents of sound change will in such
cases not lead immediately or inevitably to new formations to
reintroduce the lost distinctions. Thus the general course of the
reduction of the case system in Indo-European languages leads
to the coalescence of the marked oblique cases, and where the
whole structure finally collapses, it seems to be one of the direct
cases, nominative or accusative, which is the historical source of
the nouns now undifferentiated for case. Thus in phonology,
diachronic process explains frequency, while in grammar, frequency
explains diachronic process. Frequency, not included in la langue
definitionally, is in fact an ever present and powerful factor in the
evolution of grammatical categories and thus helps in explaining
the types of synchronic states actually found.
That such things happen is not to be wondered at. Though we
may justifiably define our subject in a coherent and consistent way,
the world is under no obligation to respect these boundaries and
it is a commonplace that we must often bring in external explanatory
factors.
A particular type of connection between marked categories in
phonology and grammar may be pointed out, and its explanation
will now be clear on the basis of the above considerations. Some-
times the marked category in phonology is the expression of a
marked category in grammar. Thus certain Amerind languages
use the marked feature of glottalization to express the marked
grammatical category of the diminutive. In German umlauted
vowels may be considered a marked phonetic category as against
their non-umlauted partners. Rounded front vowels always imply
rounded back vowels in a particular language; their number is
never greater, and their text frequency is generally less. Umlaut
is used in German as a grammatical process to express the marked
categories of plurality in the noun, comparative and superlative
in the adjective, and past subjunctive in the verb. These phenomena
result from zero expression of the unmarked where a phoneme
involved in the expression of the marked disappears after having
modified the simple preceding sound to produce a marked complex
sound, e.g. umlauting produced by a former i or glottalization
from a former glottal stop.
Another example of phonological-grammatical connection is the
widespread use of the marked category of final rising pitch for the
expression of interrogation. Here the problem is somewhat
different in that since the intonational pattern has this meaning
directly we may seem to be tautologous in asserting that the less
usual intonation expresses the less frequent category. However,
there is further independent evidence for the 'normality' of tonal
descent in that phonemes of pitch often have progressively lower
allophones the later they occur in the sentence, but the phenomenon
of allophonic raising never seems to occur.
If it turns out that in fact frequency is an adequate unifying
principle for the domain of the marked and unmarked in semantics
and grammar, a great over-all simplification will have been achieved.
But frequency is itself but a symptom and the consistent relative
frequency relations which appear to hold for lexical items and
grammatical categories are themselves in need of explanation.
Such explanations will not, in all probability, arise from a single
principle. Thus it may be noted that in adjectival opposites where
a theoretical scale with an implied zero point is unmarked, e.g.
heavy, large, wide, deep, etc., there is obviously a unifying principle
but it will not even apply to all adjectival opposites, e.g. good/bad,
and is irrelevant in a host of other examples. Again the center of
a normal frequency distribution is unmarked in relation to the
extremes, e.g. normal size as against diminutive or augmentative.
This topic is left for future exploration.
In phonology, a third level principle which, while requiring
further refinement, is evidently sufficient to predict for a wide range
which features will be marked and which unmarked is articulatory
complexity which is correlated with acoustic complexity. This can
be defined in an objective manner independently of the distribu-
tional and frequency phenomena employed here to distinguish
marked and unmarked categories. A particular articulation is to
be considered more complex than some other if it includes an
additional articulation defined in terms of departure of an organ
from the position it normally has in the absence of speech. This
notion can be extended to include successive additional articulations
in the case of length and diphthongization.
An apparent exception is nasality. Acoustically the nasal is
more complex in that it involves additional nasal resonances but
from the articulatory view it seems to be superficially the oral
articulation that is complex since it requires a raising of the velum.
Note however the remarks of Heffner regarding nasal vowels.
"The contraction of the pillar of the fauces is a feature of the pro-
duction of nasal vowels" and "... nasal vowels are produced by
adding the vigorous lowering of the velum, accompanied by some
constriction of the palatopharyngeal arch, to the usual movements
of articulation peculiar to the analogous oral vowel."2
2 R-M. S. Heffner, General phonetics (Madison, 1964), 31, 113.

UNIVERSALS OF KINSHIP TERMINOLOGY
In the discussion of universals of kinship terminology to which
we now turn, the attempt will be made to apply the principles
discussed earlier, in a particular semantic domain. In this con-
nection it will be possible to illustrate from concrete materials the
relationship between the over-all theory of the marked and un-
marked and typologies which accompany the specific universals
derived from the theory. It will then appear that such a theory
is of a higher level in that it binds together within a common
deductive structure various typologies which lack overt inter-
connections.
In the foregoing discussion several examples were adduced from
the kinship terminology of speakers of English as an illustration
of the principle of marked and unmarked categories. Thus in the
English term 'cousin' there is neutralization of sex reference as
against 'brother' and 'sister'. Again there is zero expression of the
consanguineal as against the affinal relation in such pairs as 'father'
vs. 'father-in-law' and 'brother' vs. 'brother-in-law'. As a further
example we might cite the absence of a term 'cousin-in-law', con-
cocted here for illustrative purposes, which will exemplify
defectivation of the marked category 'affinal' which lacks, in ordinary
usage, a term corresponding to 'cousin' among consanguineal terms.
Of course all of these examples are taken from English. But as
will be shown later, the specific hierarchy of categories in English
kinship terminology such as lineal (unmarked) vs. collateral
(marked), consanguineal (unmarked) vs. affinal (marked) are very
widespread, and in fact for these, and others to be shortly men-
tioned, no significant exceptions have been found as yet. Let us
then pursue the matter further, confining ourselves for the moment
to English. In addition to the evidence we have already found for
the marked or unmarked nature of the lineal/collateral and
consanguineal/affinal categories, we have certain other evidence.
In the direct descent line, i.e. among lineal ascendants and des-
cendants, we see zero expression for the first ascending as against the
second ascending in the pairs father/grandfather, mother/grandmother.
A corresponding contrast exists between G-1 and G-2 in the pairs
son/grandson, daughter/granddaughter. This system, of course,
extends further since G+3 is marked as against G+2 by
the prefix 'great-' and G+4 as against G+3 by an additional occur-
rence of 'great-' and correspondingly for descending generations.
We have then, in English, a recursive device by which a more
remote generation is always marked as against a less remote
generation.
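The recursive character of this device can be made explicit in a brief Python sketch. The functions lineal_term and mark_count are hypothetical helpers introduced only for illustration; generations are numbered positively for ascending and negatively for descending, and only lineal relatives are covered.

```python
def lineal_term(generation: int, male: bool) -> str:
    """English term for a lineal relative `generation` steps above (+) or below (-) ego."""
    if generation == 0:
        raise ValueError("ego's own generation has no lineal term")
    if generation > 0:
        base = "father" if male else "mother"
    else:
        base = "son" if male else "daughter"
    depth = abs(generation)
    if depth == 1:
        return base  # zero expression for the least remote generation
    # 'grand' marks the second generation; each further one adds a 'great-'.
    return "great-" * (depth - 2) + "grand" + base


def mark_count(generation: int) -> int:
    """Number of overt marks ('grand', 'great-') carried by the term."""
    return abs(generation) - 1
```

Each step away from ego adds exactly one overt mark, so that the count of marks rises with remoteness in both directions.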
These additional data already suggest a tentative hypothesis of
the third level as defined in the previous section. Of two categories
it is the more remote from the speaker which is always marked in
relation to the less remote. In fact it can be shown formally by the
counting of the number of occurrences in definitions reduced to a
chain of successive applications of the relation 'parent' and its
converse 'child' (abstracting from qualifiers such as sex, relative
age, etc.) that collateral and affinal relatives are more remote than
lineal and consanguineal respectively.
In testing further these and similar hypotheses, I have not set
up a formal sample. As the basic set, the Gifford study of California
kinship terminologies which contains kinship terminologies of
approximately 80 California Indian groups was utilized.1 This
was supplemented by approximately 40 additional terminologies
from various other parts of the world. While it cannot be, of
course, guaranteed that exceptions to the conclusions described
here do not exist, their absence in the set examined gives reasonable
assurance of at least statistical predominance. In what follows,
therefore, I will illustrate with examples from this sample but
without giving all the supporting instances from the sample.
1 E. W. Gifford, "California Kinship Terminologies", University of California
Publications in American Archaeology and Ethnology 18.1-285 (1922).
In the terminology of English speakers it was seen that the less
remote generation has zero expression as against the more remote
marked category. Neutralization for sex reference, not found in
English, is fairly common elsewhere. Thus for the Bavenda, a
South African Bantu-speaking group, a single term makhulu
includes all four grandparents, father's father, father's mother,
mother's father, and mother's mother in its reference, whereas
there are separate terms for the male parent, khotsi 'father' and
female parent, mme 'mother'. It is indeed a probable 'factual
universal' that all systems distinguish male and female parent by
separate terms even though very frequently other kin types are
included in the referents of both, e.g. father's brother is often
designated by the same term as father. The Bavenda example also
involves neutralization of the distinction between lineal and
collateral in the marked second ascending generation as against
the unmarked first ascending. The just quoted term makhulu
also comprehends siblings of grandparents, e.g. mother's mother's
brother. In the first ascending generation there are separate terms
for the mother's brother and the father's sister, malume and
makhadzi, respectively.
The Venda system is an example of the widespread bifurcate
merging type in which the father and father's brother are referred
to by the same term, while there is a separate term for mother's
brother. Similarly mother's sister and mother have a single designation,
while father's sister has a separate term. A similar
neutralization of the lineal-collateral distinction in the second
ascending generation is found also in some systems which like
English have a single term for both father's brother and mother's
brother and another for father's sister and mother's sister. Thus
Hanunoo in the Philippines has qmaq for 'father' and bpaq
which designates all the kin types to which we apply the term
'uncle'. Likewise there is qinaq 'mother' and byih 'aunt'. But
for the second ascending generation a single term lakih includes
both grandfathers and either grandmother's or grandfather's broth-
ers or grandmother's or grandfather's sister's husband. The term
qiduh 'grandmother' has a corresponding extension for females.
The same Hanunoo system exhibits still further neutralization
in the third generation in that the word qumput in addition to
covering both lineals and collaterals as does the second generation
term, makes no distinction in the sex of the referent. It covers,
therefore, all lineal and collateral relatives of the third ascending
generation. Returning to the Bavenda we find here also evidence
of the relatively marked character of the third ascending as against
the second ascending generation, in that the first is makhulukuku
formed from the grandparental term by the addition of a suffix
-kuku.
Similarly there are numerous evidences for a corresponding
hierarchy for the descending generations. Since the Hanunoo terms
for aunts, uncles, grandparents, and great-grandparents are all
self-reciprocal, that is whenever A calls B by one of these terms,
the same term is the appropriate one for B to call A, neutralizations
of successively increasing scope are found in the first, second, and
third descending generations as in the first, second, and third
ascending generations. A further example is the Sara dialect of
Ainu in which the sex difference found in the first descending
generation terms po 'son' and matne-po 'daughter' is neutralized
in the second descending generation in the sex-undifferentiated
term mitpo 'grandchild', which is, incidentally, also marked by a
prefix to the term for son. The third generation appellation is
likewise not distinguished for sex and has an additional prefix to
the second generation term, i.e. mitpo 'grandchild', san-mitpo
'great-grandchild'.
Similarly for the other categories already mentioned for which
there is evidence in English, such as lineal-collateral, consanguineal/
affinal and, we may add, step-relatives as against non-step-relatives,
examples of neutralization and non-zero expression in the marked
members are not difficult to find. Thus for ego's generation in
Malay of Singapore we have three terms abang 'older brother',
kakak 'older sister', and adik 'younger sibling of either sex'. For
cousin these distinctions of sex and relative age are all obliterated
in the single term sa-pupu.
As an example of neutralization in affinal as against consanguineal
relatives we may cite Umbundu from Angola in which, as everywhere,
father and mother are distinguished in the terminology.
Here the term tata 'father' also includes male collaterals of the first
and second degree, i.e. father's brothers and father's cousins,
and mai includes 'mother' as well as female collateral relatives of
the mother. However, there is a single parent-in-law term
ndatembo embracing parents of either sex of either husband or
wife and with collateral extensions like that of the consanguineal
parental terms. There is thus neutralization for sex of the person
addressed.
Further observation of the generational hierarchy shows that an
additional factor to that of remoteness from ego must be taken into
consideration. There are many examples which show that ascending
generations are unmarked as against descending generations of
equal genealogical distance from ego. An example is from Logoli,
a Bantu-speaking people of Kenya, where we have guga 'grandfather',
guku 'grandmother', and omwitjuxulu 'grandchild' for a
lineal descendant of the second generation and of either sex.
For the first ascending as against the first descending generation
it is fairly common to find systems in which the marked character
of the latter is evidenced by neutralization for sex reference,
whereas, as has been seen, the distinction of father and mother
terms is universal. Thus in Bantu languages generally there is a
single 'child' term. This same situation usually holds in Austro-
nesian languages also. Thus in Malay we have both bapa 'father',
emak 'mother', but anak 'child' without distinction in gender.
Of course both here and in the Bantu cases a qualifier can be added
when necessary to specify the sex of the child, but this is not usual.
At any rate, there are distinct morphemes for father and mother
and a monomorphemic term of the first descending generation
designates the child regardless of sex.
That seniority is involved as an additional factor distinct from
genealogical distance from ego is also shown in sibling terms.
When relative age is indicated in the terminology, which is quite
frequent, there are often indications that the terms designating
older siblings are unmarked whereas those indicating younger
siblings are marked. It may be noted that in the earlier example
cited in a different connection of sibling terms from the Malay of
Singapore, older siblings are distinguished for sex while younger
are not. The terms are abang 'older brother', kakak 'older sister',
and adik 'younger sibling of either sex'.
Further evidence for the factor of seniority is the marked
character of ego's own generation G as against the first ascending
generation. For example, it is not uncommon to find systems in
which father is distinguished from father's sister and mother from
mother's brother but in which their respective offspring, the siblings
and cousins of the speaker, are all merged in a single term, thus
eliminating the lineal-collateral distinction.
In this particular instance, however, it might be claimed that
kinship distance to a sibling is greater than to a parent. This
would follow from the uniform procedure of reckoning kinship
distance by the number of occurrences of the relation 'parent'
or its converse 'child' in the relational product required to define
the terms. Thus for either father or mother of ego the relation
parent obviously occurs once, whereas for brother or sister it
occurs twice, since my sibling is my parent's child.
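This reckoning may be sketched in Python as follows. Each kin type is written as a chain of the relations 'parent' and 'child', abstracting from sex and relative age; the chains given for kin types other than parent and sibling, and the names KIN_TYPES, kinship_distance, and more_remote, are assumptions of this sketch rather than definitions taken from any particular terminology.

```python
# Relational products, read from ego outward: my sibling is my parent's child.
KIN_TYPES = {
    "parent": ["parent"],
    "child": ["child"],
    "sibling": ["parent", "child"],
    "grandparent": ["parent", "parent"],
    "parent's sibling": ["parent", "parent", "child"],
    "cousin": ["parent", "parent", "child", "child"],
}


def kinship_distance(kin_type: str) -> int:
    """Number of occurrences of 'parent' or 'child' in the defining chain."""
    return len(KIN_TYPES[kin_type])


def more_remote(first: str, second: str) -> bool:
    """True if `first` is genealogically more remote from ego than `second`."""
    return kinship_distance(first) > kinship_distance(second)
```

By this count a parent stands at distance one and a sibling at distance two, and the collateral parent's sibling is more remote than the lineal parent.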
Taking the two factors of seniority and genealogical distance
from ego, then, the hierarchy of generations will begin with the
first ascending as unmarked in relation to all others, then ego's
generation and the first descending generation as about equal, the
first descending being lesser in seniority but closer genealogically.
After these we have successively, second ascending generation,
second descending generation, third ascending generation, third
descending generation, etc.
The marked character of descending generations in relation to
corresponding ascending generations is also shown in the phenom-
enon of reciprocal terms. Two terms may be defined as reciprocals
if whenever x refers to y by the first term, y refers to x by the second.
If the two terms are identical, then we say the term is self-reciprocal.
Our English system of terminology has only one true reciprocal
term 'cousin' and it is self-reciprocal. Take, for example, grand-
father and grandson. If x calls y grandfather then y calls x grandson
only if x is male. Therefore these terms are only partial reciprocals.

On the other hand, uncle and grandfather are non-reciprocal since
if calls y uncle, y never calls grandfather.
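The three cases just distinguished (true, partial, and non-reciprocal) can be tested mechanically. The following is a minimal sketch under stated assumptions: the eight-person family, the simplified English definitions, and the helper names are all hypothetical and serve only to exercise the definition of reciprocity.

```python
from itertools import permutations

# A hypothetical family: name -> (sex, parents). Spouses are omitted.
FAMILY = {
    "George": ("m", ()), "Grace": ("f", ()),
    "Paul": ("m", ("George", "Grace")), "Mary": ("f", ("George", "Grace")),
    "Tom": ("m", ("Paul",)), "Sue": ("f", ("Paul",)),
    "Jim": ("m", ("Mary",)), "Kay": ("f", ("Mary",)),
}


def sex(p):
    return FAMILY[p][0]


def parents(p):
    return set(FAMILY[p][1])


def grandparents(p):
    return {g for q in parents(p) for g in parents(q)}


def siblings(p):
    return {q for q in FAMILY if q != p and parents(q) & parents(p)}


# Each term is a predicate term(x, y): "x refers to y by this term".
def grandfather(x, y):
    return y in grandparents(x) and sex(y) == "m"


def grandson(x, y):
    return x in grandparents(y) and sex(y) == "m"


def uncle(x, y):
    return sex(y) == "m" and any(y in siblings(q) for q in parents(x))


def cousin(x, y):
    shared_grandparent = bool(grandparents(x) & grandparents(y))
    return shared_grandparent and not (parents(x) & parents(y))


def classify(first, second):
    """Whenever x calls y `first`, does y call x `second`?"""
    pairs = [(x, y) for x, y in permutations(FAMILY, 2) if first(x, y)]
    answered = [second(y, x) for x, y in pairs]
    if pairs and all(answered):
        return "reciprocal"
    return "partial" if any(answered) else "non-reciprocal"


print("cousin / cousin:", classify(cousin, cousin))
print("grandfather / grandson:", classify(grandfather, grandson))
print("uncle / grandfather:", classify(uncle, grandfather))
```

In this toy family the granddaughters call George grandfather without being called grandson in return, which is exactly why the pair is only partially reciprocal.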
The reason that complete reciprocity fails in the case of the
English terms grandfather and grandson is obviously that for both
the speaker may be of either sex while the person addressed is
distinguished for sex. Where reciprocity holds, the following
cases are possible: Both speaker and addressee may be of either
sex, as with English 'cousin'. In such instances, the term may
be self-reciprocal as in the case with 'cousin'. Both speaker and
addressee may be of the same sex. Here also self-reciprocity is
possible. Thus many Bantu languages have a sibling term com-
monly glossed as 'sibling of the same sex'. This word may be used
by males to refer to males and by females to refer to females.
These are necessarily self-reciprocal. Finally sex of the speaker
and addressee may be different by the definitional requirement of
the term. The same Bantu languages which have a term 'sibling of
the same sex' normally have also a term meaning 'sibling of the
opposite sex', naturally also self-reciprocal.
Now very many kinship systems, of which our own is an example,
only contain terms which do not involve sex of the speaker in their
definition. Other systems contain some terms in which the sex
of the speaker is involved, but only along with terms of the former
type which are thus universal.
In the present connection what is significant is that commonly,
though not always, terms involving sex of the speaker are reciprocal
or self-reciprocal terms. They are, as it were, secondary, arising
from the reciprocal use of the extremely common type of term
in which sex of the speaker is not specified but sex of the addressee
is, as with all English terms except cousin. Thus the true reciprocal
of the term grandfather will be child's child where the speaker is
necessarily male and of grandmother will be child's child with the
speaker specified as female. In such instances we will gloss the
terms as man's child's child and woman's child's child.
Logically we could have reciprocals or self-reciprocals either of
the type grandfather with its reciprocal man's child's child or
grandson with its reciprocal man's parent's parent. The remarkable

fact seems to be that examples of the first type in which the so-to-
speak normal situation that the sex of the speaker is not specified
occurs for the ascending generation term but never, as for the
second type, in the descending generation term. This tentative
universal may be stated as follows: whenever there are two terms
differing in generation which are true reciprocals, or there is one
which is a self-reciprocal term with two referents, and one involves
the sex of the speaker in its definition and the other does not, it is
always the term of lower generational reference which contains
the sex of the speaker in its definition.
It may at first seem rather far-fetched to interpret the association
of the normal situation of lack of reference to speaker's sex in the
higher generation as a further evidence for the unmarked status
of higher generation in distinction from lower generation terms.
However, there are cases in which two distinct words are used,
that is the terminology is not self-reciprocal and the ascending
generation term, here interpreted as unmarked, has zero expression
while the lower generation term with sex of speaker specified has
an affix. Kawaiisu, a Shoshonean language of California, may serve
as an example. We have a whole series of paired terms of the
following type: Sinu- 'mother's brother', sinuci- 'man's sister's
child', togo- 'mother's father,' 'spouse's mother's father', togoci-
'man's daughter's child', 'man's daughter's child's spouse', etc.
Such reciprocals are most common in grandparental and uncle-aunt
terms but are also found in great-grandparental and parental terms
and for in-laws.
The generalizations thus far offered have all been based on the
concept of marked and unmarked categories. It may be observed
that at least one very important category, sex, has not been con-
sidered from this point of view. It may well be that neither male
nor female can be described as the unmarked category on a uni-
versal basis. In a number of instances the male term has zero
expression where the corresponding female term has an additional
morpheme, but the data on neutralizations give conflicting evidence.
Further, Lounsbury, in a pioneering contribution on the subject,
describes the feminine as unmarked in Iroquois, in consonance

with purely linguistic facts concerning Iroquois sex gender.
In view of the earlier observations regarding the higher text
frequencies of unmarked forms, it will be of interest to consider
the data from English based once more on the Lorge magazine
count and data from Spanish by Bou. We approach these data
with the following expectations: among lineal terms the generational
hierarchy leads to the predicted ordering 1. parental terms;
2. sibling terms and first descending generation terms; 3. grandparental;
4. grandchildren; 5. great-grandparental; 6. great-grandchildren.
On the basis of the discussion of the sex category we
will not expect a consistent preponderance of either male or female
terms. The results conform fully to these expectations. In category
two, the children terms are more frequent than the sibling terms,
except for Spanish hija and hermana, suggesting that generational
remoteness is here more important than seniority as a factor. The
results are subsumed in Table XXXIII.
TABLE XXXIII
      English                      Spanish
G+1   father               3235    padre      5631
      mother               3993    madre      5598
G-1   son                   993    hijo       3765
      daughter              865    hija       1749
      child                1574
G     brother               659    hermano    3120
      sister                590    hermana    1811
G+2   grandfather           173    abuelo     1234
      grandmother           346    abuela     1540
G-2   grandson               32    nieto        94
      granddaughter          33    nieta        58
G+3   great-grandfather       8    bisabuelo    83
      great-grandmother      19    bisabuela    10
G-3   great-grandson          0    bisnieto      4
      great-granddaughter     0    bisnieta      4
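The two expectations stated above can be checked directly against the figures of Table XXXIII. The sketch below is illustrative only: the frequencies are transcribed from the table, while the rank list, the function names, and the decision to leave out 'child' (which has no Spanish counterpart in the table) are assumptions of the sketch.

```python
# Frequencies from Table XXXIII: generation -> (male term, female term).
ENGLISH = {"G+1": (3235, 3993), "G-1": (993, 865), "G": (659, 590),
           "G+2": (173, 346), "G-2": (32, 33), "G+3": (8, 19), "G-3": (0, 0)}
SPANISH = {"G+1": (5631, 5598), "G-1": (3765, 1749), "G": (3120, 1811),
           "G+2": (1234, 1540), "G-2": (94, 58), "G+3": (83, 10), "G-3": (4, 4)}

# Predicted ordering. G-1 and G share one rank, so their order relative
# to each other is not part of the prediction.
RANKS = [("G+1",), ("G-1", "G"), ("G+2",), ("G-2",), ("G+3",), ("G-3",)]


def totals(table):
    """Consolidate male and female terms of the same generation."""
    return {gen: male + female for gen, (male, female) in table.items()}


def hierarchy_holds(table):
    """Every total in one rank must exceed every total in the next rank."""
    t = totals(table)
    return all(min(t[g] for g in higher) > max(t[g] for g in lower)
               for higher, lower in zip(RANKS, RANKS[1:]))


def male_majority(table):
    """Generations in which the male term outnumbers the female term."""
    return [gen for gen, (male, female) in table.items() if male > female]


for name, table in (("English", ENGLISH), ("Spanish", SPANISH)):
    print(name, totals(table), hierarchy_holds(table), male_majority(table))
```

The list returned by `male_majority` differs between the two languages, in keeping with the expectation that neither sex shows a consistent preponderance.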
Data from English, Spanish, French, German and Russian from
the earlier mentioned sources in which terms of the same generation
with different sex reference are consolidated shows that the predicted
hierarchy holds for these languages without exception.2
TABLE XXXIV
English Spanish French German Russian
G+1 7,228 11,229 1,260 9,428 +721
G-1 1,858 5,514 1,030 6,047 721
G 1,249 4,931 419 3,449 703
G+2 519 2,774 83 614 293
G-2 65 152 31 242 20
G+3 27 93 31
G-3 0 4 29
A second set of hypotheses predicts greater frequency for lineal
than corresponding collateral terms. This is also verified in the
figures of Table XXXV.
TABLE XXXV
English Spanish French German
G+1 lineal 7,228 11,229 1,260 9,428
G+1 collateral 1,504 4,717 511 1,219
G-1 lineal 1,858 5,514 1,030 6,047
G-1 collateral 148 361 140 464
G lineal 1,249 4,931 419 3,449
G collateral 316 867 151 427
G+2 lineal 519 2,774 83 614
G+2 collateral 0 0 6
G-2 lineal 65 152 31 242
G-2 collateral 0 0 6
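As a rough check on the second set of hypotheses, the lineal and collateral figures can be compared language by language. The sketch below uses only those rows of Table XXXV that carry a figure in every column (G+1, G-1 and G); the data layout and the function name are assumptions made for illustration.

```python
# Column order follows the table: English, Spanish, French, German.
LANGUAGES = ("English", "Spanish", "French", "German")

# Only the rows with a figure in every column are transcribed here.
TABLE_XXXV = {
    "G+1": {"lineal": (7228, 11229, 1260, 9428),
            "collateral": (1504, 4717, 511, 1219)},
    "G-1": {"lineal": (1858, 5514, 1030, 6047),
            "collateral": (148, 361, 140, 464)},
    "G": {"lineal": (1249, 4931, 419, 3449),
          "collateral": (316, 867, 151, 427)},
}


def lineal_to_collateral_ratios(table):
    """Ratio of lineal to collateral frequency per generation and language."""
    ratios = {}
    for generation, rows in table.items():
        cells = zip(LANGUAGES, rows["lineal"], rows["collateral"])
        for language, lineal, collateral in cells:
            ratios[(generation, language)] = lineal / collateral
    return ratios


ratios = lineal_to_collateral_ratios(TABLE_XXXV)
for (generation, language), ratio in ratios.items():
    print(f"{generation:4s} {language:8s} lineal/collateral = {ratio:.1f}")
```

A ratio above one in every cell corresponds to the greater frequency of lineal terms that the hypothesis predicts.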
Finally, as would be expected there is overwhelmingly greater
frequency for consanguineal terms over corresponding affinal ones.
2 English, E. L. Thorndike and I. Lorge, op. cit.; Spanish, I. R. Bou, op. cit.;
German, Kaeding, op. cit.; Russian, H. H. Josselson, op. cit. Blanks indicate
items not included in the count. In Russian both 'father' and 'mother', otets
and mat', are in Josselson's group of words (Group I) whose frequency was
so great that they were not counted after a certain point. The figures are therefore
not comparable with the rest but are necessarily greater than any of the
others in the first sources counted.
father    3235  father-in-law    17    padre    5631  suegro  15
mother    3993  mother-in-law    53    madre    5598  suegra  37
brother    659  brother-in-law   23    hermano  3120  cuñado  50
sister     590  sister-in-law    18    hermana  1811  cuñada  16
son        993  son-in-law       27    hijo     3765  yerno   17
daughter   865  daughter-in-law  19    hija     1749  nuera   16
The approach to universals of kinship has thus far been through

the concept of marked and unmarked categories. There has been
no overt mention of typologies. Yet it is easy to see that implicit
typologies are involved. Thus, to take one example among many,
the neutralization for sex of referent in the marked category of
second descending generation for lineal terms as against second
ascending generation can be restated in terms of a typology. We
classify kinship terminologies into those which distinguish sex of
referent in second ascending generation lineal terms and those
which do not. We similarly classify systems into two types for
second descending generation lineal terms. The operation of these
two sets of criteria simultaneously produces four logically possible
types of language. Type one, in which both ascending and des-
cending generations distinguish sex, is exemplified by English. Type
two, in which neither second ascending nor second descending
generations distinguish sex of referent, is represented by Lunda, a
Bantu group. Type three, with sex distinguished in the ascending
but not descending generation, has as one of its members Sara
Ainu. The fourth type, however, with sex distinction in the second
descending generation but not in the second ascending generation,
apparently has no members. From this we restate our universal
in the common implicational form, distinction of sex in the second
descending generation implies the same distinction in the second
ascending generation, but not vice versa.
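The restatement of the universal as a two-way typology can be set out as a small sketch. The example languages are those named in the text; the Boolean encoding and the function name are assumptions introduced only for illustration.

```python
from itertools import product

# Key: (sex distinguished in second ascending generation lineal terms,
#       sex distinguished in second descending generation lineal terms).
ATTESTED = {
    (True, True): "English",
    (False, False): "Lunda",
    (True, False): "Sara Ainu",
}


def implication_holds(ascending, descending):
    """Sex distinction in G-2 implies the same distinction in G+2."""
    return ascending or not descending


for language_type in product((True, False), repeat=2):
    example = ATTESTED.get(language_type, "no known members")
    print(language_type, implication_holds(*language_type), example)
```

The one combination that violates the implication is precisely the type for which no member is cited.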
The approach through typologies in these and similar instances
is clumsy and rather unrevealing because a separate typology is
required for almost every universal and because the connections
among these universals through the master principle of marked
and unmarked categories do not appear. In other instances the
sheer number of possible typologies makes this approach inadvis-

able. Consider, for example, the question of sex of speaker and
addressee discussed earlier. A full typology will be based on the
existence of nine possible classes of terms according to whether the
speaker is male, female, or either sex and the addressee male, female,
or either sex. Of these 9 types of terms, any system might theore-
tically contain a single type only, some combination of two types,
and so on up to use of all 9. Of course, some of these are excluded
by certain considerations. For example, a system consisting
exclusively of terms with addressees of male sex only would lack all
designations for female kin. The theoretical possibilities are 2^9
or 512 types, and even the exclusion of some of these for reasons
such as those just described will leave several hundred types. On a
pure sampling basis some of these will be expected to be lacking.
A large variety of unenlightening implications will be possible.
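The size of this typology is easily verified by enumeration. In the sketch below the nine classes and the 2^9 selections follow the text; the filter `covers_everyone`, which discards systems that leave some pairing of speaker and addressee without any usable term, is only one possible reading of the exclusions mentioned and is an assumption of the sketch.

```python
from itertools import combinations, product

SEXES = ("male", "female", "either")
# The nine classes of terms: (sex of speaker, sex of addressee).
CLASSES = list(product(SEXES, repeat=2))


def all_systems():
    """Every selection of classes; 2**9 counts the empty selection too."""
    for size in range(len(CLASSES) + 1):
        yield from combinations(CLASSES, size)


def usable(term_class, speaker, addressee):
    """Can a speaker of this sex use a term of this class for this addressee?"""
    return (term_class[0] in (speaker, "either")
            and term_class[1] in (addressee, "either"))


def covers_everyone(system):
    """Assumed filter: each concrete pairing of sexes has some usable term."""
    pairings = product(("male", "female"), repeat=2)
    return all(any(usable(c, s, a) for c in system) for s, a in pairings)


systems = list(all_systems())
viable = [s for s in systems if covers_everyone(s)]
print(len(CLASSES), "classes,", len(systems), "systems,", len(viable), "viable")
```

A system restricted to terms for male addressees fails the filter, as in the example given in the text.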
However there are some instances in which a typological
approach is useful. There was earlier current a typology of kinship
systems, the main lines of which continue to be followed in more
recent work.3 Kinship systems were classified on the basis of
parental and parents' sibling terms, in other words, those of the
first ascending generation. The key terms here are for males
father, father's brother, and mother's brother.
We may distinguish four types of kinship terminologies. In the
generational type all three of these relatives are referred to by the
same terms. In the lineal type, to which our system belongs, the
father is distinguished from the two collateral relatives which are
merged in a single uncle term. In the bifurcate collateral system
all three, father, paternal uncle, and maternal uncle, are
designated by separate terms. Finally in the bifurcate merging
systems the paternal line relatives, father and father's brother,
receive the same appellation, while a second term is used for the
mother's brother. There are thus four types, generational, lineal,
bifurcate collateral, and bifurcate merging, and no other type is
even considered. But, in fact, there are five logical possibilities.
3 R. H. Lowie, "Kinship Terminology", Encyclopaedia Britannica (date of
first edition in which this article appears was not obtainable).
For we can have either one, two, or three terms for these three
kin types. Obviously the use of a single term or three separate terms
each give one type. But for systems with two terms, any one of the
three can receive a unique designation, while the other two fall
under a second term. There are therefore three additional types
producing a total of five not four. The missing type is the one in
which the father and mother's brother are covered by a single kin
term, while the father's brother is given a separate name. The fact
that this type is not even mentioned is sufficient evidence of its
extreme rarity or non-existence. In fact, I do not know of a single
instance of this type. Its usual absence leads to the following im-
plicational universal: whenever the father and mother's brother
are designated by the same term the father's brother is likewise
designated by the same term. Note that the father and mother's
brother are the two most divergent, as it were, of the three relatives
in that they differ both in the lineal/collateral dimension and in
line of descent paternal/maternal.
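The count of five logical possibilities amounts to listing every way of grouping the three kin types under shared terms. The following sketch does this by brute force; the abbreviations Fa, FaBr, MoBr and the helper names are assumptions of the sketch, while the four type names are those of the traditional typology.

```python
KIN = ["Fa", "FaBr", "MoBr"]  # father, father's brother, mother's brother


def partitions(items):
    """Yield every grouping of the items into classes sharing one term."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Either `first` receives a term of its own ...
        yield [[first]] + smaller
        # ... or it joins each existing class in turn.
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]


def key(partition):
    """Order-independent form of a grouping, usable for lookup."""
    return frozenset(frozenset(block) for block in partition)


NAMED = {
    key([["Fa", "FaBr", "MoBr"]]): "generational",
    key([["Fa"], ["FaBr", "MoBr"]]): "lineal",
    key([["Fa"], ["FaBr"], ["MoBr"]]): "bifurcate collateral",
    key([["Fa", "FaBr"], ["MoBr"]]): "bifurcate merging",
}

types = [key(p) for p in partitions(KIN)]
for t in types:
    classes = sorted(sorted(block) for block in t)
    print(classes, NAMED.get(t, "no traditional name"))
```

The single grouping left without a name is the one that classes father with mother's brother against father's brother.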
Analogous typologies can be constructed in other cases, and
their complexity, in the sense of number of possible types, will
depend of course on the size of the basic set of relatives. The
earlier observation that all languages distinguish father from
mother was an example of the simplest possible case. Here there
are only two kin types, father and mother, and therefore only
two logically possible types, those which use two terms and those
which use one, that is have no separate father and mother term.
Of these two types, apparently all languages belong to the first and
none to the second.
An example of a more complex typology is one based on grandparent
terms, for here there are four kin types to be considered:
father's father, mother's father, father's mother, and mother's
mother. In this instance there are fifteen logically possible clas-
sifications. With one term there is one possibility. With two terms
either term covers two relatives, or one covers three and the other
a single kin type. The former occurs three ways, the latter four,
making a total of seven. For three terms the only possible division
is two, one, one; and this can occur in six ways. There is only one
possible way of applying four terms. This gives us a total of

1 + 7 + 6 + 1 or fifteen types. Of these fifteen types, only six
types occur in Gifford's survey of California kinship systems. Two
other types occur in my material, one being common elsewhere but
not found in California. In the following table each type is listed,
together with a judgement as to whether it is frequent, uncommon,
or, at least to my present knowledge, non-existent:
 1. A. FaFa, FaMo, MoFa, MoMo           common
 2. A. FaFa, FaMo  B. MoFa, MoMo        occurs
 3. A. FaFa, MoFa  B. FaMo, MoMo        common
 4. A. FaFa, MoMo  B. FaMo, MoFa        not found
 5. A. FaFa  B. FaMo, MoFa, MoMo        not found
 6. A. FaMo  B. FaFa, MoFa, MoMo        not found
 7. A. MoFa  B. FaFa, FaMo, MoMo        not found
 8. A. MoMo  B. FaFa, FaMo, MoFa        occurs
 9. A. FaFa, FaMo  B. MoFa  C. MoMo     occurs
10. A. FaFa, MoFa  B. FaMo  C. MoMo     occurs
11. A. FaFa, MoMo  B. FaMo  C. MoFa     not found
12. A. FaMo, MoFa  B. FaFa  C. MoMo     occurs
13. A. FaMo, MoMo  B. FaFa  C. MoFa     not found
14. A. MoFa, MoMo  B. FaFa  C. FaMo     not found
15. A. FaFa  B. FaMo  C. MoFa  D. MoMo  common
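The arithmetic behind the fifteen types (1 + 7 + 6 + 1) can be confirmed by enumerating every grouping of the four grandparental kin types. This is a minimal sketch; the recursive helper and the variable names are assumptions introduced for the purpose.

```python
from collections import Counter

GRANDPARENTS = ["FaFa", "FaMo", "MoFa", "MoMo"]


def partitions(items):
    """Yield every grouping of the items into classes sharing one term."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        yield [[first]] + smaller
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]


all_types = list(partitions(GRANDPARENTS))

# Number of classifications for each number of terms.
by_terms = Counter(len(p) for p in all_types)

# For two-term systems: two and two, or three and one.
two_term_shapes = Counter(
    tuple(sorted(len(block) for block in p))
    for p in all_types if len(p) == 2
)

print(len(all_types), dict(by_terms), dict(two_term_shapes))
```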
A certain order is brought into this multiplicity of types if we search
for those combinations of kin types which are never classified
together in occurrent types, except in type 1 which involves a single
term for all relatives. In fact, there is only one such set consisting
of FaFa and MoMo. Among all terminologies which involve a
classification, that is all outside of type 1, those for which I have
found examples, namely 2, 3, 8, 9, 10, 12, and 15, put FaFa and
MoMo in different classes. For the converse there are only two
types, 5 and 14, which put FaFa and MoMo in different classes,
but which do not occur in my material. The explanation for the
non-occurrence of these types is probably that both involve some
terms in which sex of referent is specified and some which do not.
This result is obviously consonant with the conclusion derived
earlier from the consideration of first ascending generation terms.

It will be recalled that the only theoretically possible type which
did not occur was that in which father and mother's brother are
classified together as against father's brother. Here, similarly,
father's father differs from mother's mother in the two coordinates
of sex of connecting relative and sex of referent. This is true of
one other pair, mother's father and father's mother, and indeed
there is only one occurrent type in which these are classified
together, outside of type 1. Of course, this is type 12 for which I
have thus far found only a single example, Wikmunkan in Australia.
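The observation that FaFa and MoMo are never classed together in an occurrent type, apart from type 1, can be verified against the table of fifteen types. In the sketch below the classes and judgements are transcribed from that table; the data structure and the helper `together` are assumptions of the sketch.

```python
# Types 1-15 with the judgement attached to each in the table.
TYPES = {
    1: ([{"FaFa", "FaMo", "MoFa", "MoMo"}], "common"),
    2: ([{"FaFa", "FaMo"}, {"MoFa", "MoMo"}], "occurs"),
    3: ([{"FaFa", "MoFa"}, {"FaMo", "MoMo"}], "common"),
    4: ([{"FaFa", "MoMo"}, {"FaMo", "MoFa"}], "not found"),
    5: ([{"FaFa"}, {"FaMo", "MoFa", "MoMo"}], "not found"),
    6: ([{"FaMo"}, {"FaFa", "MoFa", "MoMo"}], "not found"),
    7: ([{"MoFa"}, {"FaFa", "FaMo", "MoMo"}], "not found"),
    8: ([{"MoMo"}, {"FaFa", "FaMo", "MoFa"}], "occurs"),
    9: ([{"FaFa", "FaMo"}, {"MoFa"}, {"MoMo"}], "occurs"),
    10: ([{"FaFa", "MoFa"}, {"FaMo"}, {"MoMo"}], "occurs"),
    11: ([{"FaFa", "MoMo"}, {"FaMo"}, {"MoFa"}], "not found"),
    12: ([{"FaMo", "MoFa"}, {"FaFa"}, {"MoMo"}], "occurs"),
    13: ([{"FaMo", "MoMo"}, {"FaFa"}, {"MoFa"}], "not found"),
    14: ([{"MoFa", "MoMo"}, {"FaFa"}, {"FaMo"}], "not found"),
    15: ([{"FaFa"}, {"FaMo"}, {"MoFa"}, {"MoMo"}], "common"),
}


def together(classes, first, second):
    """True if one term covers both kin types."""
    return any(first in c and second in c for c in classes)


occurrent = [n for n, (_, judgement) in TYPES.items()
             if judgement != "not found"]
merging = [n for n in occurrent if together(TYPES[n][0], "FaFa", "MoMo")]

print("occurrent types:", occurrent)
print("occurrent types merging FaFa and MoMo:", merging)
```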
It may be noted that throughout this discussion free use has been
made of a number of categories, e.g. consanguineal vs. affinal, lineal
vs. collateral, generation, etc. It is worth noting that these cate-
gories play much the same role in the analysis of kinship termi-
nologies as features do in phonological comparison. It is this
analogy which underlies the current development of componential
analysis. Like the features they are a finite set of categories, usually
binary, in terms of which any kin term in any system may be
adequately specified and which provide the indispensable analytic
framework for comparative analyses, such as the present one. As
with the phonological features, certain ones are utilized in all
systems and certain ones are more restricted in their distribution.
A consideration of these facts leads to universals of a kind analogous
to those of Jakobsonian phonology, e.g. that all languages exhibit
an opposition of vocalic and non-vocalic. This set of categories
was first described in a fundamental study of A. L. Kroeber, in
which he proposed eight categories: 1. generation; 2. lineal vs.
collateral; 3. relative age within generation; 4. consanguineal vs.
affinal; 5. sex of relative; 6. sex of speaker; 7. sex of connecting
relative; 8. condition of life of connecting relative, i.e. whether
living or dead.4
4 A. L. Kroeber, "Classificatory Systems of Relationship", Journal of the
Royal Anthropological Society 39.77-84 (1909).
A category may be said to be used in a system if it enters into
the definition of at least one kinship term. Thus sex of relative
(i.e. of referent) is present in the English system because of terms
like 'brother' and 'sister' even though it is neutralized in 'cousin'.
On this basis three of Kroeber's eight categories seem to be uni-
versal. All systems make some use of (1) generation; (4) consan-
guineal vs. affinal distinction; and (5) sex of relative.
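The criterion for a category being used in a system can be illustrated with a toy componential description. The feature values below are a deliberately simplified and hypothetical rendering of a few English terms, with `None` marking a category that a term leaves unspecified; they are not offered as a full analysis.

```python
# Hypothetical, simplified definitions of a few English terms.
# None means the term does not specify that category (neutralization).
ENGLISH_TERMS = {
    "father": {"generation": 1, "lineal": True, "consanguineal": True,
               "sex_of_relative": "m", "sex_of_speaker": None},
    "mother": {"generation": 1, "lineal": True, "consanguineal": True,
               "sex_of_relative": "f", "sex_of_speaker": None},
    "brother": {"generation": 0, "lineal": True, "consanguineal": True,
                "sex_of_relative": "m", "sex_of_speaker": None},
    "sister": {"generation": 0, "lineal": True, "consanguineal": True,
               "sex_of_relative": "f", "sex_of_speaker": None},
    "uncle": {"generation": 1, "lineal": False, "consanguineal": True,
              "sex_of_relative": "m", "sex_of_speaker": None},
    "cousin": {"generation": 0, "lineal": False, "consanguineal": True,
               "sex_of_relative": None, "sex_of_speaker": None},
    "father-in-law": {"generation": 1, "lineal": True, "consanguineal": False,
                      "sex_of_relative": "m", "sex_of_speaker": None},
}


def categories_used(system):
    """A category is used if it enters into at least one term's definition."""
    return {category
            for features in system.values()
            for category, value in features.items()
            if value is not None}


print(sorted(categories_used(ENGLISH_TERMS)))
```

Sex of relative counts as used because of terms like 'brother' and 'sister', even though 'cousin' leaves it unspecified, whereas sex of speaker enters into none of these definitions.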
Conclusions such as those proposed here may seem of lesser
interest to students of social structure than the current typological
approaches in which differences in kinship types are the objects
of attention because they lend themselves to the framing of hypo-
theses connecting terminology with social institutions. In fact, of
course, the two enterprises are complementary. In seeking for
differences we by the same token discover similarities as a negative
result, and in seeking for similarities we uncover differences as
a negative result. It may also be pointed out that in the wider
context of social sciences in general, among which linguistics must
be numbered, a correlation involving kinship and social institutions
is a universal connecting linguistic and non-linguistic social data,
while a universal within terminologies connects linguistic with
other linguistic data, and these are also in the broad sense social.
It is to be hoped that the present results, which will certainly
both be amplified in scope and rectified in details in subsequent
studies, will serve to show that at least one area of semantics can
be treated with as great exactitude and can be as fruitful in con-
clusions of universal scope as the study of the more formal aspects
of language such as phonology and grammar.
REFERENCES
1. J. Avery, "Verb-inflection in Sanskrit", Journal of the American Oriental
Society 10.219-324 (1880).
2. I. R. Bou, Recuento de Vocabulario Español vol. I (Universidad de Puerto
Rico, 1952).
3. Elchouemi, "Statistique des formes verbales dans le Coran, Résumé",
Bulletin de la Société de Linguistique de Paris 50, xxx-xxxxi (1954).
4. C. A. Ferguson and M. Chowdhury, "The Phonemes of Bengali", Language
36.22-59 (1960).
5. E. W. Gifford, "California Kinship Terminologies", University of California
Publications in American Archaeology and Ethnology 18.1-285 (1922).
6. J. H. Greenberg, ed., Universals of Language (Cambridge, Mass. 1963).
7. J. H. Greenberg, "Nekotoryje obobščenija, kasajuščijesja vozmožnyx
načal'nyx i konečnyx posledovatel'nostej soglasnyx", Voprosy Jazykoznanija,
4.41-65 (1964).
8. J. H. Greenberg and J. J. Jenkins, "Studies in the Psychological Correlates
of the Sound System of American English", Word 20.157-177 (1964).
9. R-M. S. Heffner, General Phonetics (Madison, 1964).
10. L. Hjelmslev, La Catégorie des Cas (Aarhus, 1935).
11. L. Hjelmslev, Prolegomena to a theory of language (Baltimore, 1953).
12. C. F. Hockett, Manual of Phonology (Baltimore, 1955).
13. R. Jakobson, "Zur Struktur des russischen Verbums", Charisteria Guilelmo
Mathesio, 74-84 (Prague, 1932).
14. R. Jakobson, "Signe Zéro", Mélanges Bally 143ff. (Geneva, 1939).
15. R. Jakobson, Shifters, verbal categories and the Russian verb (Cambridge, 1957).
16. D. Jones, The phoneme: its nature and use (Cambridge, Mass., 1962).
17. L. V. Jones and S. Fillenbaum, Grammatically classified word associations
(Chapel Hill, 1964).
18. H. H. Josselson, The Russian word count (Detroit, 1953).
19. F. W. Kaeding, Häufigkeitswörterbuch der Deutschen Sprache (Berlin,
1897-1898).
20. A. L. Kroeber, "Classificatory Systems of Relationship", Journal of the
Royal Anthropological Society 39.77-84 (1909).
21. H. Kucera, "Entropy, redundancy and functional load in Russian and
Czech", American contributions to the 5th International Congress of Slavists
191-218 (Sofia, 1963).
22. C. H. Lanman, "Noun inflection in the Veda", Journal of the American
Oriental Society 10.325-601 (1880).
23. J. Lotz, "Vowel frequency in Hungarian", Word 8.227-35 (1952).
24. D. S. Palermo and J. J. Jenkins, Word association norms, grade school
through college (Minneapolis, 1963).
25. E. L. Thorndike and I. Lorge, The teacher's word book of 30,000 words
(New York, 1944).
26. B. Trnka, "On some problems of neutralization", Omagiu lui Jorgu Jordan
861-6 (Bucharest, 1958).
27. N. S. Trubetskoy, Grundzüge der Phonologie (Prague, 1939).
28. A. Valdman, "Les bases statistiques de l'antériorité articulatoire du
français", Le Français Moderne 27.102-10 (1959).
29. G. E. Vander Beke, French word book (New York, 1929).
30. G. K. Zipf, Psychobiology of language (Boston, 1935).
31. G. K. Zipf, Human behavior and the principle of least effort (Cambridge, Mass.,
1963).
