
Computational Linguistics

What is it and what (if any) are its unifying themes?

Computational linguistics

I often agree with XKCD

[Figure, after XKCD's "purity" strip: fields arranged on a scale from less rigorous ("flakey") to more rigorous: literary criticism, psychology, neuropsychology, biology, chemistry, physics. Where do linguistics and computational linguistics fit?]

What defines the rigor of a field?


- Whether results are reproducible
- Whether theories are testable/falsifiable
- Whether there is a common set of methods for similar problems
- Whether approaches to problems can yield interesting new questions/answers

Linguistics

[Figure: fields arranged on a rigor scale: engineering (more rigorous), then linguistics, sociology, and literary criticism (less rigorous)]

The true situation with linguistics

[Figure: subfields of linguistics arranged on a rigor scale. Less rigorous: other areas of sociolinguistics (e.g. Deborah Tannen), theoretical linguistics (e.g. minimalist syntax). In between: theoretical linguistics (e.g. lexical-functional grammar), historical linguistics. More rigorous: some areas of sociolinguistics (e.g. Bill Labov), psycholinguistics, experimental phonetics]

Okay, enough already. What is computational linguistics?


- Text normalization/segmentation
- Morphological analysis
- Automatic word pronunciation prediction
- Transliteration
- Word-class prediction, e.g. part-of-speech tagging
- Parsing
- Semantic role labeling
- Machine translation
- Dialog systems
- Topic detection
- Summarization
- Text retrieval
- Bioinformatics
- Language modeling for automatic speech recognition
- Computer-aided language learning (CALL)

Computational linguistics
Often thought of as natural language engineering

But there is also a serious scientific component to it.

Why CL may seem ad hoc


- Wide variety of areas (as in linguistics)
- If it's natural language engineering, the goal is often just to build something that works
- Techniques tend to change in somewhat faddish ways; for example, machine learning approaches fall in and out of favor

Machine learning in CL
- In general it's a plus, since it has meant that evaluation has become more rigorous
- But it's important that the field not turn into applied machine learning
- For this to be avoided, people need to continue to focus on what linguistic features are important
- Fortunately, this seems to be happening

Some interesting themes


Finite-state methods:
- Many application areas
- Raise interesting questions about how much of language is regular (in the sense of finite-state)

Grammar induction:
- Linguists have done a poor job at their stated goal of explaining how humans learn grammar

Computational models of language change:
- Historical evidence for language change is only partial: there are many changes in language for which we have no direct evidence.

Finite state methods


- Used from the 1950s onwards
- Went out of fashion a bit during the 1980s
- Then a revival in the 1990s with the advent of weighted finite-state methods

Some applications
- Analysis of word structure (morphology)
- Analysis of sentence structure: part-of-speech tagging, parsing
- Speech recognition
- Text normalization
- Computational biology

Regular languages
A regular language is a language over a finite alphabet that can be constructed using one or more of the following operations:
- Set union
- Concatenation
- Transitive closure (Kleene star)
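A quick way to see these three operations in action is Python's re module, whose patterns denote regular languages: | is union, juxtaposition is concatenation, and * is the Kleene star (the pattern below is made up purely for illustration):

```python
import re

# (a|b) is the union of {a} and {b}; (ab)* is the Kleene star of {ab};
# writing them side by side concatenates the two languages.
pattern = re.compile(r"(a|b)(ab)*")

for s in ["a", "b", "aab", "babab", "c"]:
    print(s, bool(pattern.fullmatch(s)))  # True, True, True, True, False
```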


Finite state automata: formal definition
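For reference, the standard definition: a finite-state automaton is a 5-tuple

$M = (Q, \Sigma, \delta, q_0, F)$

where $Q$ is a finite set of states, $\Sigma$ a finite alphabet, $\delta : Q \times \Sigma \rightarrow Q$ the transition function, $q_0 \in Q$ the initial state, and $F \subseteq Q$ the set of final states. $M$ accepts a string $w$ if reading $w$ symbol by symbol from $q_0$ via $\delta$ ends in a state in $F$.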

Every regular language can be recognized by a finite-state automaton, and every finite-state automaton recognizes a regular language (Kleene's theorem).

Representation of FSAs: State Diagram


Regular relations: formal definition
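For reference, a standard characterization: a regular relation over alphabets $\Sigma$ and $\Delta$ is a subset of $\Sigma^* \times \Delta^*$ built up from finite sets of string pairs by union, (pairwise) concatenation, and Kleene closure; equivalently, it is a relation computable by a finite-state transducer.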


Finite-state transducers


An FST


Composition
In addition to union, concatenation, and Kleene closure, regular relations are closed under composition. Composition is to be understood here the same way as composition in algebra:
- R1 ∘ R2 means: take the output of R1 and feed it as the input to R2. For example, if R1 maps a to b and R2 maps b to c, then R1 ∘ R2 maps a to c.

Composition: an illustration


R1 as a transducer


R2 as a transducer


R1 ∘ R2

Some things you can do with FSTs


- Text analysis/normalization, i.e. mapping from writing to language:
  - Word segmentation
  - Abbreviation expansion
  - Digit-to-number-name mappings
- Morphological analysis
- Syntactic analysis, e.g. part-of-speech tagging
- (With weights) pronunciation modeling and language modeling for speech recognition

That's fine for engineering, but…


Does it really account for the facts?
- Is morphology really regular?
- Is the mapping between writing and speech really regular?

What is morphology?
Morphology relates word forms:
- scripsērunt is the third person plural perfect active of scrībō ('I write')
- the lemma of scripsērunt is scrībō

Morphology analyzes the structure of word forms:
- scripsērunt has the structure scrīb+s+ērunt

Morphology is a relation
Imagine you have a Latin morphological analyzer comprising:
- D: a relation that maps between surface form and decomposed form
- L: a relation that maps between decomposed form and lemma

Then:
- scripsērunt ∘ D = scrīb+s+ērunt
- scripsērunt ∘ D ∘ L = scrībō
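A toy sketch of the same pipeline in Python, with the two relations modeled as one-entry dictionaries (macrons dropped; the entries are just the slide's example, not a real analyzer):

```python
# Toy "relations": D decomposes a surface form, L maps a decomposition
# to its lemma. A real analyzer would be a pair of transducers.
D = {"scripserunt": "scrib+s+erunt"}   # surface -> decomposed
L = {"scrib+s+erunt": "scribo"}        # decomposed -> lemma

def compose(*relations):
    """Compose relations left to right: feed each output to the next."""
    def apply(form):
        for rel in relations:
            form = rel[form]
        return form
    return apply

analyze = compose(D)       # surface -> decomposed form
lemmatize = compose(D, L)  # surface -> lemma

print(analyze("scripserunt"))    # scrib+s+erunt
print(lemmatize("scripserunt"))  # scribo
```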

English regular plurals


- cat + s = cats, with /s/
- dog + s = dogs, with /z/
- spouse + s = spouses, with /ɪz/

This can be implemented by a rule that composes with the base word, inserting the relevant form of the affix at the end (a sketch follows).
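A minimal sketch of the allomorphy in Python, keyed off the final sound of the base (the phone classes here are simplified and assumed for illustration; a real implementation would be a rewrite-rule transducer):

```python
SIBILANTS = {"s", "z", "sh", "zh", "ch", "jh"}  # simplified class
VOICELESS = {"p", "t", "k", "f", "th"}          # simplified class

def plural_suffix(final_phone: str) -> str:
    """Choose the plural allomorph from the base's final phone."""
    if final_phone in SIBILANTS:
        return "ɪz"
    if final_phone in VOICELESS:
        return "s"
    return "z"

print(plural_suffix("t"))  # cat    -> cats    /s/
print(plural_suffix("g"))  # dog    -> dogs    /z/
print(plural_suffix("s"))  # spouse -> spouses /ɪz/
```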

Templatic affixes in Yowlumne

Transducer for each affix transforms base into required templatic form and appends the relevant string.

Subtractive morphology

Transducer deletes final VC of the base



Bontoc infixation

- Insert a marker > after the first consonant (if any)
- Change > into the infix um

(A regex sketch follows.)
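A two-step sketch of the cascade in Python regexes (my own illustration; forms like fikas 'strong' → fumikas are the standard textbook examples of Bontoc infixation):

```python
import re

def insert_marker(word: str) -> str:
    # Step 1: insert ">" after the first consonant, or at the very
    # start if the word begins with a vowel.
    return re.sub(r"^([^aeiou]?)", r"\1>", word, count=1)

def realize_infix(word: str) -> str:
    # Step 2: rewrite the marker as the infix -um-.
    return word.replace(">", "um")

for w in ["fikas", "kilad"]:
    print(w, "->", realize_infix(insert_marker(w)))
# fikas -> fumikas, kilad -> kumilad
```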

Side note: infixation in English

Kalamazoo → Kalama-f*****g-zoo

Reduplication: Gothic

Problem: the mapping w → ww (unbounded copying) is not a regular relation

Factoring Reduplication
Prosodic constraints

Copy verification transducer C


Non-Exact Copies
Dakota (Inkelas & Zoll, 1999):


Non-Exact Copies
Basic and modified stems in Sye (Inkelas & Zoll, 1999); example gloss: 'they will fall all over'

Morphological Doubling Theory (Inkelas & Zoll, 1999)

Most linguistic accounts of reduplication assume that the copying is done as part of the morphology. In MDT:
- Reduplication involves doubling at the morphosyntactic level, i.e. one is actually simply repeating words or morphemes
- Phonological doubling is thus expected, but not required

Gothic Reduplication under Morphological Doubling Theory


Summary
- If Inkelas & Zoll are right, then all morphology can be computed using regular relations
- This in turn suggests that computational morphology has picked the right tool for the job

Another Example: Linguistic analysis of text


- Maps between the stuff you see on the page (e.g. text written in the standard orthography of a language) and linguistic units (words, morphemes, phonemes)
- For example: I ate a 25kg bass → [aɪ eɪt ə twɛnti faɪv kɪləgræm bæs]
- This can be done using transducers
- But is the mapping between writing and language really regular (finite-state)?

Linguistic analysis of text


- Abbreviation expansion
- Disambiguation
- Number expansion
- Morphological analysis of words
- Word pronunciation

A transducer for number names


Consider a machine that maps between digit strings and their readings as number names in English:

30,294,005,179,018,903.56 → thirty quadrillion, two hundred and ninety-four trillion, five billion, one hundred seventy-nine million, eighteen thousand, nine hundred three, point five six

(A sketch follows.)
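A compact Python sketch of the mapping (my own toy version: it groups the integer part by thousands, reads the fraction digit by digit, and ignores the optional 'and'):

```python
ONES = ["", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen",
        "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
        "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
        "seventy", "eighty", "ninety"]
SCALES = ["", " thousand", " million", " billion", " trillion",
          " quadrillion"]

def three_digits(n: int) -> str:
    """Read a number 1-999 as words."""
    parts = []
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        parts.append(TENS[n // 10] + ("-" + ONES[n % 10] if n % 10 else ""))
    elif n:
        parts.append(ONES[n])
    return " ".join(parts)

def number_name(digits: str) -> str:
    """Map a digit string like '30,294,005,179,018,903.56' to words."""
    digits = digits.replace(",", "")
    integer, _, frac = digits.partition(".")
    groups, n = [], int(integer)
    while n:                      # split into groups of three digits
        groups.append(n % 1000)
        n //= 1000
    words = [three_digits(g) + SCALES[i]
             for i, g in reversed(list(enumerate(groups))) if g]
    name = ", ".join(words) or "zero"
    if frac:                      # read the fraction digit by digit
        name += ", point " + " ".join(ONES[int(d)] or "zero" for d in frac)
    return name

print(number_name("30,294,005,179,018,903.56"))
```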

Mapping between speech and writing


It seems obvious on the face of it that the mapping between speech and its written form is regular. After all, the words are ordered in the same way as speech. Even the letters tend to be ordered in the same way as the sounds they represent.

49

Some examples where it isn't

- Honorific inversion, e.g. in Egyptian hieroglyphic writing, where the name of a god or king is written before material that precedes it in speech

[Transliterated Egyptian example omitted]

Finite state methods


In morphology they seem almost exactly correct as characterizations of the natural phenomenon In the mapping from writing to language, again, finite-state models seem almost exactly correct

51

Grammar induction
The common nativist view in linguistics
From Gilbert Harman's review of Chomsky's New Horizons in the Study of Language and Mind (Journal of Philosophy, 98(5), May 2001):

"Further reflection along these lines and a great deal of empirical study of particular languages has led to the 'principles and parameters' framework which has dominated linguistics in the last few decades. The idea is that languages are basically the same in structure, up to certain parameters, for example, whether the head of a phrase goes at the beginning of a phrase or at the end. Children do not have to learn the basic principles, they only need to set the parameters. Linguistics aims at stating the basic principles and parameters by considering how languages differ in certain more or less subtle respects. The result of this approach has been a truly amazing outpouring of discoveries about how languages are the same yet different."

Similarly
Cedric Boeckx and Norbert Hornstein. 2003. "The Varying Aims of Linguistic Theory":

"Children come equipped with a set of principles of grammar construction (i.e. Universal Grammar (UG)). The principles of UG have open parameters. Specific grammars arise once values for these open parameters are specified. Parameter values are determined on the basis of [the primary linguistic data]. A language specific grammar, then, is simply a specification of the values that the principles of UG leave open."

My challenge with Shalom Lappin


Automatic induction of grammars from unannotated text


- Klein, Dan and Christopher Manning. 2004. "Corpus-based induction of syntactic structure: models of dependency and constituency." Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics.
- Lots of subsequent work

Different syntactic representations


Dependency Model with Valence (DMV)


- Each head generates a set of non-STOP arguments to one side, then a STOP argument; then similarly on the other side
- Trained using expectation maximization

(A schematic sketch of the generative story follows.)
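A schematic Python sketch of that generative story (not Klein & Manning's implementation; the probabilities and part-of-speech inventory below are made-up placeholders that EM would normally estimate):

```python
import random

# The stop probability is conditioned on the head, the direction, and
# "adjacency" (has the head generated an argument on that side yet?);
# this adjacency conditioning is the "valence" in DMV. Values made up.
def p_stop(head: str, direction: str, adjacent: bool) -> float:
    return 0.3 if adjacent else 0.8

# Which dependents each head may take (toy inventory).
P_CHILD = {"V": ["N", "ADV"], "N": ["DET", "ADJ"],
           "DET": [], "ADJ": [], "ADV": []}

def generate(head: str) -> list:
    """Sample a dependency subtree headed by `head`, flattened to word order."""
    out = {"left": [], "right": []}
    for direction in ("left", "right"):
        adjacent = True
        while P_CHILD[head] and random.random() > p_stop(head, direction, adjacent):
            child = random.choice(P_CHILD[head])
            out[direction].extend(generate(child))
            adjacent = False
    return out["left"] + [head] + out["right"]

print(generate("V"))  # a sampled tree in word order, e.g. ['DET', 'N', 'V']
```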

Performance


Improvements
- Constituent structure can be induced in a similar way to inducing word classes (e.g. parts of speech): by considering the environments in which the putative constituent finds itself.
- In Klein & Manning's constituent-context model (CCM), the probability of a bracketing is computed as follows:
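In Klein & Manning's (2002) formulation, roughly: a sentence $S$ and bracketing $B$ are scored as

$P(S, B) = P(B) \prod_{\langle i,j \rangle} P(\alpha_{ij} \mid B_{ij}) \, P(\beta_{ij} \mid B_{ij})$

where, for each span $\langle i,j \rangle$, $\alpha_{ij}$ is its yield, $\beta_{ij}$ is its linear context (the terminals immediately to its left and right), and $B_{ij}$ says whether the span is a constituent.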

Combined DMV+CCM

- Subsequent work, e.g. Rens Bod's 2006 Unsupervised Data-Oriented Parsing, reports F-scores close to 83.0
- For comparison, the best supervised parsers get about 91.0

Some objections and a synopsis


- Objection: children do not learn grammars from unannotated text corpora; they get a lot of guidance from the environmental situation.
  - Sure.
- Objection: the performance of automatic induction algorithms is still far from human performance, so they do not constitute evidence that we can do away with (nativist) linguistic theories of language acquisition.
  - They do not show this. But the argument would have more weight if nativist theories had already been demonstrated to contribute to a working model of grammar induction.
- But computational linguistics is starting to make some serious contributions to this 50-year-old debate.

The evolution of complex structure in language

Examples from: Stump, Gregory (2001) Inflectional Morphology: A Theory of Paradigm Structure. Cambridge University Press.

Evolutionary Modeling (A tiny sample)


Hare, M. and Elman, J. L. (1995) Learning and morphological change. Cognition, 56(1):61--98. Kirby, S. (1999) Function, Selection, and Innateness: The Emergence of Language Universals. Oxford Nettle, D. "Using Social Impact Theory to simulate language change". Lingua, 108(2-3):95--117, 1999. de Boer, B. (2001) The Origins of Vowel Systems. Oxford Niyogi, P. (2006) The Computational Nature of Language Learning and Evolution. Cambridge, MA: MIT Press.

64

A multi-agent simulation
- The system is seeded with a grammar and a small number of agents
  - Each agent randomly selects a set of phonetic rules to apply to forms
  - Agents are assigned to one of a small number of social groups
- Two parents beget child agents
  - Children are exposed to a predetermined number of training forms combined from both parents
  - Forms are presented in proportion to their underlying frequency
- Children must learn to generalize to unseen slots for words
- The learning algorithm is similar to: David Yarowsky and Richard Wicentowski (2000) "Minimally supervised morphological analysis by multimodal alignment." Proceedings of ACL-2000, Hong Kong, pages 207-216.
  - Features include the last n characters of the input form, plus semantic class
- Learners select the optimal surface form to derive other forms from, where optimal = requiring the simplest resulting ruleset: a Minimum Description Length (MDL) criterion (see the sketch after this list)
- Forms are periodically pooled among all agents, and the n best forms are kept for each word and each slot
- The population grows, but is kept in check by natural disasters and a quasi-Malthusian model of resource limitations
- Agents age and die according to reasonably realistic mortality statistics
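A toy illustration of the MDL-style choice (my own sketch, not the simulation's code): pick the base slot whose rules for deriving the rest of the paradigm come out simplest.

```python
# Toy paradigms: slot -> surface form.
paradigms = [
    {"sg": "cat", "pl": "cats"},
    {"sg": "dog", "pl": "dogs"},
    {"sg": "ox",  "pl": "oxen"},
]

def rule(base: str, derived: str) -> str:
    """Describe derived as base + suffix, or store it as an exception."""
    if derived.startswith(base):
        return "+" + derived[len(base):]
    return "!" + derived  # exception: memorize the whole form

def description_length(base_slot: str) -> int:
    """Size of the ruleset needed if base_slot is taken as the base."""
    rules = set()
    for p in paradigms:
        for slot, form in p.items():
            if slot != base_slot:
                rules.add(rule(p[base_slot], form))
    return sum(len(r) for r in rules)

print(min(["sg", "pl"], key=description_length))
# "sg": deriving plurals from singulars yields the simpler ruleset
```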

Final states for a given initial state


Another example
- Kirby, Simon. 2001. "Spontaneous evolution of linguistic structure: an iterated learning model of the emergence of regularity and irregularity." IEEE Transactions on Evolutionary Computation, 5(2): 102-110.
- Assumes two meaning components, each with 5 values, for 25 possible words
- The initial speaker randomly selects examples from the 25, producing random strings for each, and teaches them to the hearer
- Not all of the slots are filled, thus producing a bottleneck: the hearer must compute forms for the missing slots

(A sketch of the iterated-learning loop follows.)
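A minimal sketch of such an iterated-learning loop (my own toy version, not Kirby's model): meanings are pairs (i, j), a learner memorizes the forms it hears and builds unseen forms from the majority prefix for i and suffix for j, so the bottleneck gradually drives the lexicon toward compositional regularity.

```python
import random
from collections import Counter

random.seed(1)
MEANINGS = [(i, j) for i in range(5) for j in range(5)]  # 5 x 5 = 25

def random_word() -> str:
    return "".join(random.choice("abcdefg") for _ in range(4))

def learn(heard: dict) -> dict:
    """Induce a full lexicon from a partial one."""
    prefix = {i: Counter() for i in range(5)}
    suffix = {j: Counter() for j in range(5)}
    for (i, j), w in heard.items():
        prefix[i][w[:2]] += 1
        suffix[j][w[2:]] += 1
    lexicon = {}
    for i, j in MEANINGS:
        p = prefix[i].most_common(1)[0][0] if prefix[i] else random_word()[:2]
        s = suffix[j].most_common(1)[0][0] if suffix[j] else random_word()[:2]
        # Keep heard forms verbatim; invent compositional ones otherwise.
        lexicon[(i, j)] = heard.get((i, j), p + s)
    return lexicon

lexicon = {m: random_word() for m in MEANINGS}  # generation 0: holistic
for generation in range(20):
    # Bottleneck: each learner hears only 15 of the 25 meanings.
    heard = dict(random.sample(sorted(lexicon.items()), 15))
    lexicon = learn(heard)

print(lexicon[(0, 0)], lexicon[(0, 1)], lexicon[(1, 0)])
# Shared first meaning -> shared prefix; shared second -> shared suffix.
```

As the next slide notes, a bare version like this over-regularizes: nothing in it preserves irregularity.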

The basic algorithm produces results that are too regular


Initial state

Final state

68

A more realistic result


Addition of other constraints, including:
- a random tendency for speakers to omit symbols
- a frequency distribution over the 25 possible meaning combinations

Summary
- Evolutionary modeling is evolving slowly
- We are a long way from being able to model the complexities of known language evolution
- Nonetheless, computational approaches promise to lend insight into how complex social systems such as language change over time, and to complement discoveries in historical linguistics

Final thoughts
Language is central to what it means to be human. Language is used to:
- Communicate information
- Communicate requests
- Persuade, cajole
- (In written form) record history
- Deceive

Other animals do some or most of these things (cf. Anindya Sinha's work on bonnet macaques), but humans are better at all of them.

Final thoughts
- So the scientific study of language ought to be more central than it is
- We need to learn much more about how language works:
  - How humans evolved language
  - How languages changed over time
  - How humans learn language
- Computational linguistics can contribute to all of these questions.
