

Wittgenstein, Turing, and Neural Networks

Article · July 2018


2 authors:

Gianluigi Oliveri Salvatore Gaglio


Università degli Studi di Palermo Università degli Studi di Palermo

All content following this page was uploaded by Gianluigi Oliveri on 21 July 2018.



Introduction

When dealing with problems like ‘What is mind?’, ‘What is meaning?’, etc., some philosophers put forward solutions which presuppose a commitment
to the existence of, for example, a res cogitans, a Platonistic realm of abstract
objects, etc. Other philosophers, instead, guided by an Ockham’s razor inspired
iconoclastic zeal, devote themselves to addressing the questions above within a
‘naturalizing context’ whereby, as Quine famously put it:

«[K]nowledge, mind, and meaning are part of the same world that they have
to do with, and […] are to be studied in the same empirical spirit that ani-
mates natural science. There is no place for prior philosophy»1.

We take the work of the later Wittgenstein to be broadly characterized by a naturalistic attitude which manifests itself, in particular, in passages like the
following where he claims that:

«When philosophers use a word – “knowledge”, “being”, “object”, “I”, “proposition”, “name” – and try to grasp the essence of the thing, one must always
ask oneself: is the word ever actually used in this way in the language-game
which is its original home? – What we do is to bring words back from their
metaphysical to their everyday use»2.

The main task of this paper is to ground the socio-anthropological ‘naturalization’ of meaning carried out by the later Wittgenstein in his remarks on rule-following in the Philosophical Investigations in considerations relating to models of low-level (biological) processes of imitation, training, and learning. If the

1 W.V. Quine, Ontological Relativity & Other Essays, Columbia University Press, New York 1993, p. 26.
2 L. Wittgenstein, Philosophical Investigations, Basil Blackwell, Oxford 1983, section 116, p. 48e.

GdM 1/2018 215-235



operation suggested above is successful, two of its immediate consequences are that the social aspect of language can no longer be considered to be a primitive
notion, but needs to be placed upon, if not reduced to, a biological foundation;
and that the study of thought, and actually of certain brain processes, becomes
prior in the order of explanation to the study of language.
The issues raised in this article are relevant to Wittgenstein scholarship, to
any attempt to produce an acceptable philosophy of language, and to all those
interested in the tenability of one of the cornerstones of analytic philosophy,
the so-called ‘priority thesis’: the study of language is prior, in the order of ex-
planation, to the study of thought, «[t]hat is to say, there can be no account of
what thought is, independently of its means of expression»3.
The core discussion begins with a paradigmatic example of how metaphysics comes about when studying matters relating to meaning, and, after having given an account of the socio-anthropological ‘naturalization’ of meaning carried out by the later Wittgenstein, it proceeds to question the coherence of the notions of ‘training’ (brute training) and learning he uses in his account of rule-following in the Philosophical Investigations.
In the remaining part of the paper we argue that the particular example of neural network considered there successfully models brute training and learning, thereby resolving the incongruities present in the Wittgensteinian treatment of these two concepts. This shows that the appropriate foundation for the corresponding phenomena is biological rather than socio-anthropological, as Wittgenstein would have it.

From equations to metaphysics

One of the most stimulating questions for Frege’s reflections on mathematics was the problem of explaining why equations like (1) are not informative
whereas equations like (2) are wonderfully so:

$$e^{i\pi} = e^{i\pi}; \tag{1}$$

$$e^{i\pi} = \cos \pi + i \sin \pi. \tag{2}$$

As is well known, Frege’s solution of the conundrum posed by equations (1) and (2)4, besides contributing a new theory of meaning to the philosophy of language, presupposes a rather elaborate metaphysics which appeals to the existence of three realms of reality or, as Popper has it, three worlds.
To see this latter point consider that, for Frege, corresponding to a proper name5 like ‘The star nearest to the Earth’ (a definite description), there is (a) a reference, represented by a unique object, the Sun, belonging to what he calls ‘first realm’6; (b) a sense, which is the way the reference of the proper name is presented to us: the star nearest to the Earth; and (c) a representation, that is, our subjective image of the reference, a subjective image which, besides being part of the second realm7, is an object of description of the so-called ‘private language’ or ‘language of sensations’8.
According to Frege, the property of having a sense and a reference is not confined to proper names. For him, those declarative sentences in which the only thing that interests us is the denotation of the words occurring in them have a reference and a sense as well, a reference and a sense represented, respectively, by a truth-value and a thought.
However, in contrast with the psycho-physiological process of thinking taking place at a certain point in time in our heads with all its surrounding cohorts of feelings, moods, emotions, etc., thoughts, according to Frege, are

3 M.A.E. Dummett, The Logical Basis of Metaphysics, Duckworth, London 1991, p. 3.
4 Equation (2) is informative, because, although the proper names ‘$e^{i\pi}$’ and ‘$\cos \pi + i \sin \pi$’ occurring on either side of the equality sign have the same reference, they, nevertheless, have a different sense (content). See on this G. Frege, Senso e Denotazione, in A. Bonomi (ed.), La struttura logica del linguaggio, Valentino Bompiani, Milano 1973, pp. 9-32.
5 According to Frege, a proper name is a word (or a sign or a connection of signs or an expression) endowed with a sense and a (unique) reference. The reference of a proper name is an object (concrete or abstract). For Frege, concepts (and relations) are not to be regarded as objects (see on this G. Frege, ‘Concetto e Oggetto’, in A. Bonomi [ed.], La struttura logica del linguaggio, Valentino Bompiani, Milano 1973, pp. 373-386).
6 According to Frege, concrete objects belong to the first realm; they: (1) can be perceived; (2) do not belong to the content of our consciousness; (3) exist independently of us; (4) are ‘actual,’ that is, they are subject to the principle of action and reaction. See G. Frege, Thoughts, in Id., Logical Investigations, P.T. Geach editor, B. Blackwell, Oxford 1977, pp. 1-30.
7 For Frege, representations or ideas (1) cannot be perceived; (2) belong to the content of our consciousness; (3) do not exist independently of us; (4) ‘every idea has only one owner’; (5) are the objects of description of a private language, i.e., a language that only the speaker can understand (on this last point see footnote 8). Cf. G. Frege, Thoughts, cit.
8 Frege’s argument in favour of the existence of a private language can be found in the quotation below: «[E]veryone is presented to himself in a special and primitive way, in which he is presented to no-one else. So, when Dr Lauben has the thought that he was wounded, he will probably be basing it on this primitive way in which he is presented to himself. And only Dr Lauben himself can grasp thoughts specified in this way. But now he may want to communicate with others. He cannot communicate a thought he alone can grasp» (ibi, pp. 12-13).


abstract, timeless, and objective entities populating, together with the natural
numbers and other mathematical objects, his third realm9.
For Frege, thoughts are entities which we grasp with our understanding
(with ‘the power of thinking’). To this he adds that10:

«The expression ‘grasp’ is as metaphorical as ‘content of consciousness’. The nature of language does not permit anything else. What I hold in my hand
can certainly be regarded as the content of my hand; but all the same it is
the content of my hand in quite another and a more extraneous way than
are the bones and muscles of which the hand consists or again the tensions
these undergo».

In spite of its suggestiveness, elegance, and explanatory power, Frege’s theory of meaning presents a serious defect: it presupposes, among other things, a
Platonist third realm of abstract entities which is bound to remain but a myth
until the famously quasi-intractable problem concerning the accessibility con-
ditions to such a realm is satisfactorily solved. And, unfortunately, as the follow-
ing argument shows, the accessibility conditions offered by Frege in The Foun-
dations of Arithmetic, accessibility conditions which are largely based on the
so-called ‘context principle’, are neither necessary nor sufficient for believing
in the existence of, for example, abstract entities such as the natural numbers.
According to Frege, we are justified in believing in the existence of num-
bers as objects, even though these are neither ideas nor objects of intuition,
because, being in possession of non-vague identity conditions for numbers11,
(α) we can identify and re-identify numbers and (β) say true (or false) things
about them.
But, now, let us imagine we are playing Hamletic, where Hamletic is a game that consists in producing true assertions about the characters of Shakespeare’s Hamlet. In Hamletic, the winner is the player who produces the largest number of true assertions in a fixed interval of time.

9 For Frege the objects belonging to the third realm: (1) cannot be perceived; (2) do not belong to the content of our consciousness; (3) exist independently of us; (4) are ‘objective’; (5) are ‘real’, but not ‘actual’, because they are not subject to the principle of action and reaction. G. Frege, Thoughts, cit.
10 Ibi, footnote 6, pp. 24-25.
11 The identity conditions for natural numbers are provided by the definition of (cardinal) number in conjunction with the so-called ‘Hume’s Principle’.
12 H. Field, Realism, Mathematics & Modality, B. Blackwell, Oxford 1989, p. 3.

Note that in Hamletic an assertion is true if and only if it is true according to Hamlet12; and that, given any character of the play, we have identity conditions for it if and only if we can identify and re-identify the character according to Hamlet.
The interesting thing that emerges here is that in spite of the fact that
Hamletic satisfies Frege’s conditions (α) and (β) above this would, nevertheless,
be insufficient to show that Hamlet, Ophelia, Rosencrantz, Guildenstern, etc.
exist13.
Moreover, Frege’s conditions (α) and (β) are not necessary for believing in
the existence of objects either. For we are justified in believing in the existence
of X, simply because we perceive (are acquainted with) X, even though we do
not know what the identity conditions for X are, etc.

What is the meaning of ‘2n’?

The shortcomings of Frege’s theory of meaning – declarative sentences are not names, etc. – and a merciless, Ockham’s-razor-driven investigation of the metaphysical presuppositions one should make to develop a tenable theory of meaning eventually led philosophers like the later Wittgenstein and Quine to take a very different stand on these matters from Frege’s.
According to the later Wittgenstein, and in contrast with the previous tradition, before formulating ‘armchair theories of meaning’ which at best produce only a possible account of, for instance, the language of mathematics, and at worst generate pseudo-problems, we should rather pay close attention to how we actually use, and learn to use, the expressions belonging to such a language.
For Wittgenstein, it is only on the basis of such a preliminary account of
‘learning and use’ that we should eventually reach out to the metaphysical pre-
suppositions needed for such phenomena/activities to be possible.
The new questions ‘How do we use expression φ in such-and-such a lan-
guage-game?’ and ‘How do we learn to use φ in such-and-such a way?’, loom
large from the very beginning of the Philosophical Investigations where, already
in section 1, Wittgenstein expounds and starts commenting upon Augustine’s
picture of language.

13 A well-known discussion of whether Frege’s context principle can be used to establish arithmetical Platonism can be found in C. Wright, Frege’s Conception of Numbers as Objects, Aberdeen University Press, Aberdeen 1983, and in H. Field, Realism, Mathematics & Modality, cit., ch. 5, «Platonism for cheap? Crispin Wright on Frege’s Context Principle», pp. 147–170.


According to Wittgenstein’s understanding of Augustine’s views on these matters, for Augustine14: «[T]he individual words in language name objects
[and] sentences are combinations of such names» and one of the crucial points
here is that, in the passage from the Confessions quoted by Wittgenstein, Augus-
tine proceeds to justify the position above by means of a recollection of how he
learned language.
Now, if in dealing with traditional controversies like the realism/anti-realism dispute raging, for example, in the philosophy of mathematics, we followed Wittgenstein’s advice to study how we actually learn and do mathematics, then,
according to him, we would realize that, in particular, we learn mathematics by
means of public and objective procedures, which involve neither teaching some-
one to contemplate abstract Platonic entities nor showing him how to engage in
introspective practices aimed at the identification of particular inner mental states.
It is the appreciation of the anti-metaphysical potentials of learning that
leads the later Wittgenstein to focus on the question ‘What is involved in the
understanding of the expression “2n”?’, rather than on the more traditional, and
metaphysically loaded, ‘What is the meaning of “2n”?’15.
This is because, whereas understanding ‘2n’, besides being strongly connect-
ed with the meaning of ‘2n’, is, nevertheless, dependent on the common or garden,
public and objective phenomenon known as ‘having learned some basic arithme-
tic’, this is not the case with the ‘deep question’ relating to the meaning of ‘2n’.
Indeed, for some authors, the meaning of ‘2n’ is one of the exhibits displayed
in Quine’s mythical museum so aptly described in ‘Ontological Relativity’, i.e., it is
either an abstract Platonic entity or a mental object which is given independently
of anybody’s understanding of ‘2n’ and of anybody’s learning arithmetic.
In particular, we should notice that, for Wittgenstein, understanding ‘2n’ is not the outcome of a correct interpretation of the meaning of ‘2n’16:

«[T]here is a way of grasping a rule which is not an interpretation, but which is exhibited in what we call “obeying the rule” and “going against it”
in actual cases.
‘[O]beying a rule’ is a practice»17.

14 L. Wittgenstein, Philosophical Investigations, cit., section 1, p. 2e.
15 As is well known, Wittgenstein defuses the question ‘What is the meaning of “2n”?’ by means of the ‘mantra’: «For a large class of cases – though not for all – in which we employ the word “meaning” it can be defined thus: the meaning of a word is its use in the language». L. Wittgenstein, Philosophical Investigations, cit., section 43, p. 20e.
16 Ibi, section 201, p. 81e.
17 Ibi, section 202, p. 81e.


According to Wittgenstein, understanding ‘2n’ is a matter of becoming master of the technique that has been conventionally associated with the expression ‘2n’ within a given language-game: «The grammar of the word “knows” is evidently closely related to that of “can”, “is able to”. But also closely related to that
of “understands”. (‘Mastery’ of a technique)»18.
In an attempt to produce a further demystification of what he takes to be
the apparently referential rôle of ‘2n’, Wittgenstein considers ‘2n’ simply as the
expression of a rule belonging to a given (arithmetical) language-game.
The main consequences of this operation, which takes place within a ‘nat-
uralizing context’, are that: (i) saying that ‘The arithmetical expression “2n” has
meaning’ is equivalent to saying that ‘Within arithmetic there are objective cri-
teria of correctness for the use of “2n”’; (ii) the technique conventionally associ-
ated with the expression ‘2n’ needs to be exemplified by a particular algorithm,
because, given the appropriate input, a rule must tell us what to do19; (iii) the
mastery of such a technique is something that is acquired neither through mag-
ic nor through psychoanalysis, but by being trained to carry out the algorithm;
(iv) ‘What is involved in the understanding of the expression “2n”?’ is replaced
by the much sharper question ‘What is involved in following correctly the 2n-
rule?’; and, finally, (v) to obey a rule is not a single-shot psychological process,
but is a custom, an institution.
Note how large the ‘philosophical distance’ is between our original ques-
tion ‘What is the meaning of “2n”?’ and the end product of the Wittgensteini-
an naturalization process: What is involved in following correctly the 2n-rule20?

18 Ibi, section 150, p. 59e.
19 The correctness of this point is confirmed by Wittgenstein’s criticism of the Law of Excluded Middle in the Philosophical Investigations, a criticism which develops along Brouwerian lines: «The law of excluded middle says here: It must either look like this, or like that. So it really – and this is a truism – says nothing at all, but gives us a picture. And the problem ought now to be: does reality accord with the picture or not? And this picture seems to determine what we have to do, what to look for, and how – but it does not do so, just because we do not know how it is to be applied» [ibi, section 352, p. 112e]. This particular ‘take’ on Wittgenstein is at the heart of Dummett’s version of intuitionism.
20 The literature concerning Wittgenstein’s view of rule-following is very extensive. We are going to mention here a few items which might help the interested reader in developing a critical view of the subject: S. Kripke, Wittgenstein on Rules and Private Language, Basil Blackwell, Oxford 1982; G.P. Baker - P.M.S. Hacker, Scepticism, Rules and Language, Basil Blackwell, Oxford 1984; G. Oliveri, Le Ricerche di Wittgenstein nella lettura di S. Kripke, in «Paradigmi» 6(1984), pp. 523-540; P. Frascolla, Wittgenstein’s Philosophy of Mathematics, Routledge, London and New York 1994.

However, in the extremely interesting operation carried out by Wittgenstein in the Philosophical Investigations, an operation aimed, as we have seen, at expunging metaphysics from the philosophy of language, apart from the possibly insuperable obstacle represented by the existence and importance in mathematics of non-computable functions and of highly non-constructive proof procedures, there is at least one more obvious gap: the vague concept of algorithm, which appears to be based simply on intuition.
This is a serious problem, because if the connection between algorithms
and rules is internal, that is, if the algorithm is constitutive of the rule, it follows
that any vagueness present in the concept of algorithm is bound to spill over
onto the concept of rule. In this connection Turing’s proposal of the Turing
machine as a mathematically rigorous substitute for the concept of algorithm
proves to be of vital importance.
But, apart from the question concerning the vagueness surrounding the concept of algorithm, there is another source of difficulty present in Wittgenstein’s account of the process of acquisition of the mastery of a particular technique (algorithm) associated (constitutively) with following the 2n-rule. Such a source of difficulty has to do with the concept of training, and it is of great importance, because the concept of training is what is meant to bring Wittgenstein’s socio-anthropological ‘naturalization’ of the concept of understanding to a successful conclusion.
To see this consider that, for Wittgenstein, understanding the expression ‘2n’ no longer consists in ‘grasping an abstract object’ nor in being in a particular inner mental state, but is rather a matter of becoming, through training, master of a technique established within the context provided by a set of practices forming a language-game or an institution:

«“So are you saying that human agreement decides what is true and what is
false?” – It is what human beings say that is true and false; and they agree in
the language they use. That is not agreement in opinions but in form of life»
[ibi, section 241, p. 88e].

Sophisticated and brute training

In reading the Philosophical Investigations there appear to be two different types of training involved in Wittgenstein’s account of rule-following. We shall
call the first type ‘sophisticated training’ and the other ‘brute training’. The main
difference between these two types of training is that whereas sophisticated train-
ing makes use, among other things, of explanations, brute training does not.
The explanations present in sophisticated training, besides appealing to descriptions of the instructions concerning what to do in order to φ, make essential use of justifications for interpreting the terms of the instructions in such-and-such a way.
An example of sophisticated training is that of the trainer/instructor who
explains to the trainee how a particular Turing machine computes the function
2n by, among other things, attributing certain meanings to the symbols appear-
ing in the rows of the Turing machine, and in the tape on which the machine
operates.
Unfortunately, the concept of sophisticated training leads to an infinite regress, because the trainee can ask the trainer to justify the particular interpretation of some of the terms present in the explanation, and once the required meta-explanation has been provided the trainee can again ask the trainer to justify the interpretation offered of some of the terms present in the meta-explanation, and so on without end.
The rôle of brute training in the Wittgensteinian account of rule-follow-
ing is precisely that of stopping the infinite regress of explanations/justifications
present within sophisticated training. Brute training exemplifies the phenom-
enon known as ‘hitting rock-bottom’ in the regressing chain of explanations/
justifications.
An example of brute training is showing someone how to carry out the
instructions present in the rows of a particular Turing machine which calculates
the function 2n without providing explanations. Brute training is normative: it consists of prescriptions, examples, and trials accompanied by positive or negative feedback.
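What carrying out the rows of such a machine amounts to can be sketched in Python. This is only an illustration under stated assumptions: we read ‘2n’ as doubling, we write n in unary as a block of n strokes, and the state names q0 to q4, the auxiliary marks X and Y, and the helper `run` are our own hypothetical choices rather than a machine taken from the literature. The point of the sketch is that each step is fixed by the row matching the current state and the scanned symbol, with no appeal to what the symbols mean.

```python
BLANK = "_"

# Each row reads: (state, scanned symbol) -> (symbol to write, head move, next state).
# Hypothetical table for doubling a unary numeral; the names are illustrative only.
RULES = {
    ("q0", "1"): ("X", +1, "q1"),        # mark the next unprocessed stroke
    ("q0", "Y"): ("1", +1, "q3"),        # no strokes left: start cleaning up
    ("q0", BLANK): (BLANK, 0, "halt"),   # empty input: nothing to do
    ("q1", "1"): ("1", +1, "q1"),        # run right to the first blank
    ("q1", "Y"): ("Y", +1, "q1"),
    ("q1", BLANK): ("Y", -1, "q2"),      # append one copy mark
    ("q2", "1"): ("1", -1, "q2"),        # run left to the rightmost X
    ("q2", "Y"): ("Y", -1, "q2"),
    ("q2", "X"): ("X", +1, "q0"),
    ("q3", "Y"): ("1", +1, "q3"),        # turn copy marks into strokes
    ("q3", BLANK): (BLANK, -1, "q4"),
    ("q4", "1"): ("1", -1, "q4"),        # turn the X marks back into strokes
    ("q4", "X"): ("1", -1, "q4"),
    ("q4", BLANK): (BLANK, +1, "halt"),
}


def run(n, max_steps=100_000):
    """Run the table on a tape holding n strokes and count the strokes left at halt."""
    tape = {i: "1" for i in range(n)}
    head, state, steps = 0, "q0", 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step limit reached before halting")
        scanned = tape.get(head, BLANK)
        write, move, state = RULES[(state, scanned)]
        tape[head] = write
        head += move
        steps += 1
    return sum(1 for symbol in tape.values() if symbol == "1")


if __name__ == "__main__":
    for n in range(5):
        print(n, "->", run(n))
```

The loop in `run` never inspects anything except the table, which is the sense in which following the rows requires no explanation of the symbols.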
For Wittgenstein, brute training, operating on the basis of our specific an-
thropological background, is at the root of the formation of our social practices,
traditions, institutions, and, in more general terms, of our form of life:

«[O]ne human being can be a complete enigma to another. We learn this when we come into a strange country with entirely strange traditions; and,
what is more, even given a mastery of the country’s language. We do not un-
derstand the people. (And not because of not knowing what they are saying
to themselves.) We cannot find our feet with them... If a lion could talk, we
could not understand him» [ibi, Part ii, Ch. xi, p. 223e].

But, is the notion of brute training coherent? If, even for brute training
to be successful, the trainee needs to see the point of it, we seem to be moving
in a circle, because the trainee now has the problem of understanding what the
point of the training is or, to put it in a different way, what the training is at-
tempting to make salient, that is, the intention of the trainer in doing so-and-so.


In other words, it seems that, for brute training to be successful, the trainee
needs to ‘read the mind’ of the trainer.
Of course, if this is correct, it follows that only a theory of thought and, in particular, a theory about ‘reading the mind of X’ can explain how Y can master via brute training the technique which establishes the conditions for the correct use of the expression ‘2n’. Note that an immediate consequence of the position above is that the study of thought becomes prior in the order of explanation to the study of language, turning the priority thesis upside down.
Moreover, how are we at this point going to construe ‘reading the mind of
X’ without reintroducing, as it were, from the window either the Platonism or
the psychologism which had been kicked out of the door? This is an extremely
important point, and it appears that, if we substitute for ‘reading the mind of
X’ the more modest ‘empathizing with X’, then research on mirror neurons can
show that our ability to empathize with X, and imitate X, is innate21. This, if
correct, would provide, among other things, a sharp biological underpinning
for the high-level phenomenon we have called ‘brute training’ which would
otherwise rest on vague socio-anthropological narratives.
Although appealing to mirror neurons prevents the ghosts of departed (metaphysical) entities from reappearing to haunt the dreams of the naturalist philosopher of language, there is still a danger for the plausibility of the naturalist’s story concerning brute training and learning, a danger represented by the possibly circular account of these two phenomena: for brute training to be possible some learning must be given prior to it and vice versa.
On the other hand, if in our quest for a naturalist foundation of brute training and learning we were prepared to widen our view ‘beyond mirror neurons’, we would soon realize that recent research on neural networks seems to
show that some of these provide an interesting low-level biological model of
both brute training and learning. Two of the most interesting features of such
models are that: (a) training and learning are phenomena which can take place
independently of considerations relating to meaning and intentionality; and

21 See M. Iacoboni, Imitation, Empathy, and Mirror Neurons, in «Annu. Rev. Psychol.» 60(2009), pp. 666-667: «[R]esearch on mirror neurons, imitation, and empathy [...] tells us that our ability to empathize [and, therefore, to imitate], a building block of our society (R. Adolphs, The social brain: neural basis of social knowledge, in «Annu. Rev. Psychol.» 60 [in press]) and morality (F.B. de Waal, Putting the altruism back into altruism: the evolution of empathy, in «Annu. Rev. Psychol.» 59[2008], pp. 279-300, J.P. Tangney et al., Moral emotions and moral behaviour, in «Annu. Rev. Psychol.» 58[2007], pp. 345-372), has been built “bottom up” from relatively simple mechanisms of action production and perception (M. Iacoboni, Mirroring People, Farrar, Straus & Giroux, New York, 2008)».


that (b) there are some forms of learning which are independent of any type of
training.
With regard to the importance of points (a) and (b) above, we need to
consider that whereas point (a) dispenses with the idea that, for the brute train-
ing of Y on the part of X to be successful, Y must be able to ‘read the mind of
X’; point (b) resolves the brute training/learning circularity objection.
In the remainder of this paper we study a particular type of neural network which makes a strong case in favour of the idea that the appropriate foundation for the phenomena of brute training and learning is biological rather than socio-anthropological.

Neural networks

A neural network is an abstract information processing system inspired by the way biological nervous systems process information. The basic element of a neural network is the artificial neuron, proposed for the first time by McCulloch and Pitts22 in 1943 as a simplified model of the biological neuron (see figure 1).
A biological neuron is a cell made up of a cell body (also called ‘soma’), a set of dendrites and an axon. The dendrites and the axon are extensions of the cell body. The axon forms connections, called ‘synapses’, with the dendrites of other neurons. Through synapses on its dendrites, the neuron receives electro-chemical signals from other neurons, and through the synapses that its axon forms with other neurons, it transmits its processed signals. Synapses may have different strengths and can be either excitatory or inhibitory. The change in the strength of synapses over time, which depends on the rate of transmission of signals that pass through them, is known as synaptic plasticity and permits important biological functions like memory and adaptation.
An artificial neuron is an abstraction of a biological neuron. It models the strength of synapses through a set of n real-valued weights $\{w_1, \ldots, w_n\}$, one weight per synapse. We are going to represent these weights by means of the n-vector $W = (w_1, \ldots, w_n)$. We are also going to represent the corresponding set of n inputs $\{u_1, \ldots, u_n\}$ – one input per synapse, such that $u_i = 0$ or $u_i = 1$, for $1 \leq i \leq n$ – by means of the n-vector $U = (u_1, \ldots, u_n)$. Now, if $W \cdot U$ is the inner
22 W. McCulloch, and W. Pitts, A Logical Calculus of the Ideas Immanent in Nervous Activity, in «Bulletin of Mathem. Biophysics» 5(1943), pp. 115-133.


product of W by U, the neuron computes its output according to the following function:

$$f(U) = \begin{cases} 1 & \text{if } W \cdot U \geq \mu \\ 0 & \text{otherwise} \end{cases}$$

Fig. 1

where $\mu \in \mathbb{R}$ is a threshold. Therefore, an artificial neuron computes a weighted sum of its inputs and fires if such a value is greater than or equal to a given threshold.
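The threshold function above is short enough to write out directly. The following Python sketch is only illustrative: the function name `neuron` and the weights and thresholds chosen for the two logical gates are our own assumptions, picked to show how one and the same unit computes different functions under different settings.

```python
def neuron(weights, inputs, threshold):
    """Fire (return 1) if the weighted sum of the binary inputs reaches the threshold."""
    if len(weights) != len(inputs):
        raise ValueError("one weight per input is required")
    weighted_sum = sum(w * u for w, u in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0


if __name__ == "__main__":
    # Illustrative settings (assumptions): same weights, different thresholds.
    for u in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(u, "AND:", neuron((1, 1), u, 2), "OR:", neuron((1, 1), u, 1))
```

With weights (1, 1), a threshold of 2 makes the unit fire only when both inputs are active, while a threshold of 1 makes it fire when at least one is.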
The way in which neurons are connected to other neurons determines different network structures. The most popular network is known as the multilayer perceptron, where neurons are arranged in layers, and each neuron of a layer connects only to neurons of the following layer, as in figure 2. The neurons of
the last layer are called ‘output neurons,’ and the neurons of the intermediate
layers are called ‘hidden neurons.’
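A layered arrangement of this kind can be sketched as a list of layers, each layer being a list of threshold neurons whose outputs feed the next layer. The small network below is a hypothetical example of our own (it is not the network of figure 2): two hidden neurons and one output neuron whose hand-picked weights make the whole compute exclusive-or, a mapping that no single threshold neuron can perform.

```python
def neuron(weights, inputs, threshold):
    """Threshold unit: 1 if the weighted sum reaches the threshold, else 0."""
    return 1 if sum(w * u for w, u in zip(weights, inputs)) >= threshold else 0


def forward(network, inputs):
    """Propagate a binary input through the layers, one layer at a time."""
    signal = list(inputs)
    for layer in network:
        # Every neuron of a layer sees only the outputs of the previous layer.
        signal = [neuron(weights, signal, threshold) for weights, threshold in layer]
    return signal


# Hypothetical weights chosen by hand for illustration.
XOR_NETWORK = [
    [((1, 1), 1), ((1, 1), 2)],   # hidden layer: an OR unit and an AND unit
    [((1, -1), 1)],               # output layer: fires if OR is on and AND is off
]

if __name__ == "__main__":
    for u in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(u, "->", forward(XOR_NETWORK, u))
```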


Fig. 2

The multilayer perceptron learns to perform a so-called ‘associative computation’, in the sense that it maps an input binary string (such as 101) into an output binary string (such as 10). The mapping is determined by the weights of all the
neurons in the network, and such weights can be determined by means of a
supervised learning algorithm.
In supervised learning a teacher provides the network with a set of examples of input-output mappings, and the network progressively adjusts its weights in order to perform the desired mapping. The multilayer perceptron can be considered, therefore, a universal approximator, since it can be trained to compute (approximate) a desired input-output function y = f (x)23.
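The idea of a teacher supplying examples while the learner adjusts its weights can be illustrated in its simplest form with a single threshold neuron. Note that the sketch below uses the classical perceptron learning rule, not the back-propagation algorithm, and that the names `predict` and `train`, the learning rate, and the choice of logical OR as the target mapping are our own assumptions. Each trial is followed only by a correction proportional to the difference between desired and actual output; nothing about the meaning of the inputs is communicated.

```python
def predict(weights, threshold, inputs):
    """Output of a single threshold neuron."""
    return 1 if sum(w * u for w, u in zip(weights, inputs)) >= threshold else 0


def train(examples, n_inputs, rate=1, max_epochs=100):
    """Perceptron learning rule: nudge weights and threshold after every mistake."""
    weights = [0] * n_inputs
    threshold = 0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, desired in examples:
            delta = desired - predict(weights, threshold, inputs)
            if delta != 0:
                mistakes += 1
                # Strengthen or weaken only the synapses that were active.
                weights = [w + rate * delta * u for w, u in zip(weights, inputs)]
                threshold -= rate * delta
        if mistakes == 0:   # a full pass without errors: training is complete
            break
    return weights, threshold


# The teacher's examples for logical OR (an illustrative target).
OR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

if __name__ == "__main__":
    w, mu = train(OR_EXAMPLES, n_inputs=2)
    print("weights:", w, "threshold:", mu)
```

This rule is guaranteed to settle only when the target mapping is linearly separable, which is one reason multilayer networks need a different training method.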
The back-propagation algorithm24 provides an effective, and biologically
evocative, method to accomplish supervised learning. It minimizes the quadratic
error at the output neurons of the network by means of a collective adaptation
process in which each single neuron adjusts itself in response to a feedback error
signal. For the output neuron $y_k$ of figure 2, where k = 1 or k = 2, the error is:

\[ E = (d - r)^2 \]

23
Here f can be any function from a set D1 to a set D2. The elements of D1 and D2 are codified by
strings of binary digits (0 or 1).
24
D.E. Rumelhart et al., Learning Internal Representations by Error Propagation, in D.E.
Rumelhart - J.L. McClelland (eds.), Parallel Distributed Processing, vol. 1, MIT Press,
Cambridge, Massachusetts 1986, pp. 318-362.

where r is the actual output of the neuron and d is the output desired by the
teacher. A quadratic error function is chosen since it penalizes large errors
more heavily and treats positive and negative errors in the same fashion.

The mathematics of learning

The network learns to accomplish a given task through the active and
collective participation of each neuron, which modifies itself, via the modification
of the weights/strengths of its synapses, in relation to a feedback error signal.
Each neuron in the network must adjust its weights in order to reduce the error
E at the output neurons. To do so (see fig. 2), a mathematical entity known as
the gradient – the analogue of the first derivative for functions of more than
one variable – of the error E with respect to the set of weights
$\{w_i^{1j}, \ldots, w_i^{nj}\}$, for $i, j \in \mathbb{N}$, of the given neuron i of
the jth layer must be computed.


The gradient is a measure of the rate of change of the error E with respect
to the variation of each weight $w_i^{kj}$, where 1 ≤ k ≤ n. We can say that it is a
measure of the amount of responsibility to be ascribed to the single synapses of the
neuron for the global error at the output. The error E is reduced most rapidly if
the weights are modified by subtracting from them an amount proportional to their
specific contribution to the gradient of E (so that a weight is increased whenever
its contribution is negative). This correction performs a sort of compensation
of the error for each weight and is known as the gradient descent technique. The
amount of correction for each synapse is obtained by propagating a feedback
signal of such a gradient, called ‘delta value,’ starting from the output neurons
back to the neurons of the previous layers (see fig. 3).

Fig. 3
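In symbols, the correction just described can be sketched as follows. We write $w_i^{kj}$ for the k-th weight of neuron i in layer j, and we assume a positive step size $\eta$ (usually called the ‘learning rate’), a parameter which is not introduced in the text and is added here only for illustration:

\[
w_i^{kj} \;\leftarrow\; w_i^{kj} - \eta \, \frac{\partial E}{\partial w_i^{kj}},
\qquad 1 \leq k \leq n, \quad \eta > 0 .
\]

When the partial derivative is positive the weight is decreased, and when it is negative the weight is increased; provided the gradient is not zero, for a sufficiently small $\eta$ this step reduces E.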

What happens is that each neuron receives the delta values carrying
the error information from the neurons to which it is connected, computes its
contribution to the global error, computes its delta value to determine its
responsibility for the error, corrects its weights and, in turn, sends back its delta
value to the neurons that are connected to it.
The above learning procedure also provides quite an interesting side result,
which can be considered significant from the biological point of view. Namely,
it can be shown that the network is able to generalize, in the sense that it
responds to patterns not seen during the training phase according to a criterion of
similarity: it generates similar outputs in response to similar inputs.

Some examples

The neural network of figure 4 can be trained to sort patterns, codified
as strings of 4 binary digits, into two classes, ‘good’ and ‘bad’. We can
codify the class ‘good’ with 1 and the class ‘bad’ with 0. In order to train the
network, the teacher repeatedly presents to the network all the input-output
pairs in table 1, until the network has learned the association. The classes need
not be exhaustive; therefore, the table includes only the entries of interest. The
network will perform its generalization by answering new patterns not
seen during training according to their similarity to the seen patterns.

Fig. 4

Input Output
0000 Bad
0010 Good
0011 Good
0111 Bad
1001 Good
1100 Good
1101 Bad

Table 1

At the beginning of the training the weights are randomly selected. When
the teacher feeds the network with each entry in the table, the network com-
putes its output value r. If the network gives a wrong answer, it uses its error to
modify itself, by computing for the output neuron the delta value, namely the
contribution to the gradient of the error. Applying the back-propagation algo-
rithm, the delta values are propagated in a backward fashion, as in fig. 3, and
by using them each neuron adjusts its synapses to reduce the error. The whole
procedure is repeated until the network has learned the classification.
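The training procedure just described can be sketched in a few lines of Python. This is only an illustration under assumptions of our own, none of which comes from the text or from figure 4: the neurons use a smooth sigmoid output instead of a hard threshold (so that the gradient exists), each threshold is treated as a trainable bias weight on a constant input 1, and the number of hidden neurons, the step size `ETA`, the number of passes `EPOCHS`, and the random seed are arbitrary choices.

```python
import math
import random

# Table 1, with 'good' codified as 1 and 'bad' as 0.
TABLE_1 = [
    ((0, 0, 0, 0), 0), ((0, 0, 1, 0), 1), ((0, 0, 1, 1), 1), ((0, 1, 1, 1), 0),
    ((1, 0, 0, 1), 1), ((1, 1, 0, 0), 1), ((1, 1, 0, 1), 0),
]
N_HIDDEN = 4   # assumed number of hidden neurons
ETA = 0.5      # assumed step size (learning rate)
EPOCHS = 2000  # assumed number of presentations of the whole table


def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))


def forward(hidden, output, u):
    """Return the hidden activations h and the network output r for input u."""
    inputs = list(u) + [1.0]  # constant 1 feeds the bias weight
    h = [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in hidden]
    r = sigmoid(sum(w * x for w, x in zip(output, h + [1.0])))
    return h, r


def total_error(hidden, output):
    """Sum of the quadratic errors (d - r)^2 over Table 1."""
    return sum((d - forward(hidden, output, u)[1]) ** 2 for u, d in TABLE_1)


def train(seed=0):
    rng = random.Random(seed)
    # Weights are randomly selected at the beginning of the training.
    hidden = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(N_HIDDEN)]
    output = [rng.uniform(-1, 1) for _ in range(N_HIDDEN + 1)]
    before = total_error(hidden, output)
    for _ in range(EPOCHS):
        for u, d in TABLE_1:
            h, r = forward(hidden, output, u)
            # Delta value of the output neuron for E = (d - r)^2.
            delta_out = -2.0 * (d - r) * r * (1.0 - r)
            # Delta values propagated back to the hidden neurons.
            delta_hid = [delta_out * output[i] * h[i] * (1.0 - h[i])
                         for i in range(N_HIDDEN)]
            # Gradient descent: each neuron corrects its own weights.
            for i, x in enumerate(h + [1.0]):
                output[i] -= ETA * delta_out * x
            for i in range(N_HIDDEN):
                for k, x in enumerate(list(u) + [1.0]):
                    hidden[i][k] -= ETA * delta_hid[i] * x
    return hidden, output, before, total_error(hidden, output)
```

Calling `hidden, output, before, after = train()` returns the trained weights together with the total error before and after training; `forward(hidden, output, (1, 0, 0, 0))` can then be used to probe the response to a pattern that does not appear in table 1.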
A neural network that implements the rule 2n, namely the multiplication table
of 2 by n expressed in binary form with k bits, is shown in fig. 5. However,
this network is hardwired: it is purposely designed to implement a sort of shift
register25 by means of a neural network. On the other hand, a neural network of the
kind represented in fig. 2 can learn the rule by itself, as in the previous example.
Again, the teacher repeatedly shows, one after the other, the pairs shown in
table 2. At each step the network adjusts its weights until it learns the desired
mapping.

25
A shift register is a logical device that stores a linear string of binary digits and that can perform,
upon a specific command, a shift operation over the string, either to the left or to the right. For instance,
if it stores the string 01101, after the left-shift command, the new string is 11010. In this case the first 0
is lost, and a new 0 is added on the right.

Fig. 5

Input   Output   Input   Output
0000    00000    1000    10000
0001    00010    1001    10010
0010    00100    1010    10100
0011    00110    1011    10110
0100    01000    1100    11000
0101    01010    1101    11010
0110    01100    1110    11100
0111    01110    1111    11110

Table 2
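The connection between the 2n rule and a left shift can be checked with a short Python sketch. The function names are our own; unlike the shift register described in footnote 25, the output here has one more bit than the input, so no leading digit is lost.

```python
def double_by_shift(bits: str) -> str:
    """Left shift with a 0 appended: the result encodes 2n on len(bits) + 1 bits."""
    if not bits or any(b not in "01" for b in bits):
        raise ValueError("expected a non-empty string of binary digits")
    return bits + "0"


def rule_2n_table(k: int = 4):
    """All input-output pairs of the 2n rule for inputs of k bits."""
    rows = []
    for n in range(2 ** k):
        u = format(n, f"0{k}b")          # n in binary, padded to k bits
        out = double_by_shift(u)
        assert int(out, 2) == 2 * n      # the shift really doubles n
        rows.append((u, out))
    return rows


rows = rule_2n_table()
```

With k = 4 the list `rows` contains the sixteen pairs of table 2, in increasing order of n.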

Other forms of learning

The supervised learning that we have seen so far is only one of the possible
ways in which a neural network can learn from examples. Two other important
ways are reinforcement learning and unsupervised learning.
In reinforcement learning the network does not receive an indication of
what the correct answer is in each case. It occasionally receives a reward or a
punishment from the external environment for its behavior26. Such a reward or
punishment may have different values, and the network adjusts its synapses to
increase what is called its ‘utility’, that is the algebraic sum of the rewards and
the punishments calculated over the sequence of its answers. ‘Learning’ in this
case means exploration of the environment and subsequent adaptation.
Unsupervised learning is the ability of a neural network to learn in the
absence of a teacher or of something that gives some indication as to what is
correct or best to do given a certain input. In such a situation the neural net-
work learns to discriminate among its inputs and adjusts its synapses to do so.
For instance, in the absence of a teacher a neural network may learn to classify
the patterns shown in fig. 6 into two different classes or clusters. We may call
these classes ‘ovals’ and ‘lines’, but for the neural network names do not matter,
because it uses a similarity criterion to distinguish between the members of
different classes. A popular neural network that uses this method of learning is
ART, proposed by Grossberg27. Such a neural network is also able to introduce
automatically new classes if the match of a specimen with the existing classes
is poor. Kohonen’s self-organizing maps28, instead, project their inputs over a
space in which closeness means similarity.

Fig. 6
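The idea of grouping inputs by similarity, and of opening a new class when the match with the existing classes is poor, can be sketched as follows. This is a deliberately simple scheme of our own (each class is represented by its first member), not an implementation of ART or of self-organizing maps; the similarity measure, the parameter `min_match`, and the sample patterns are all illustrative assumptions.

```python
def similarity(a, b):
    """Fraction of positions in which two binary patterns agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster(patterns, min_match=0.75):
    """Assign each pattern to the most similar class, or open a new class."""
    prototypes = []  # one representative pattern per class
    labels = []      # class index assigned to each pattern, in order
    for p in patterns:
        scores = [similarity(p, q) for q in prototypes]
        if scores and max(scores) >= min_match:
            labels.append(scores.index(max(scores)))
        else:
            # Poor match with every existing class: introduce a new one.
            prototypes.append(p)
            labels.append(len(prototypes) - 1)
    return labels, prototypes


# Hypothetical data: two families of 8-bit patterns, each with small variations.
PATTERNS = [
    (1, 1, 1, 1, 0, 0, 0, 0),
    (0, 0, 0, 0, 1, 1, 1, 1),
    (1, 1, 1, 0, 0, 0, 0, 0),
    (0, 0, 0, 1, 1, 1, 1, 1),
    (1, 1, 1, 1, 1, 0, 0, 0),
]
labels, prototypes = cluster(PATTERNS)
```

No names such as ‘ovals’ or ‘lines’ appear anywhere in the procedure: the classes are just indices, distinguished only by the similarity criterion.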

26
M.R. Coulom, Apprentissage par Renforcement Utilisant des Réseaux de Neurones, avec des Ap-
plications au Contrôle moteur, Ph. D. Thesis, Institut National Polytechnique de Grenoble, France 2002.
27
S. Grossberg, Competitive Learning: From interactive activation to adaptive resonance, in
«Cognitive Science» 11(1987), pp. 23-63.
28
T. Kohonen, Self-Organizing Maps, Springer, Berlin 2001.

Training, learning, and neural networks

In sections 6-8 we saw that there are at least three different types of learn-
ing of which a neural network is capable: supervised learning, reinforcement
learning, and unsupervised learning.
It is important to notice that, given a careful examination of the
above-mentioned phenomena, we are justified in using the term ‘learning’ to describe
what happens to the neural network in question when it displays them. For
in all these cases the neural network acquires a new ability as a consequence of
well-specified ways of treating information on its part so as to reprogram itself
in order to accomplish a particular task29.
If what we have argued so far is correct then, taking information as a primi-
tive notion, and in spite of neural networks being classified by scholars like Nils-
son30 as mere ‘stimulus-response agents’, we have that: (1) the teacher’s part in
the supervised learning procedures of a neural network effectively models the
notion of brute training we dealt with in the first sections of this article; and
that, (2) the case of unsupervised learning of a neural network shows that there
are forms of learning which are independent of brute training, and, therefore,
of training in general.
Concerning point (1) above, consider that, although training a neural net-
work in no way involves the use of explanations, nevertheless the neural net-
work learns something (acquires the ability to ... ) as a consequence of it.
Furthermore, since the training of a neural network, and the learning cor-
responding to it, have nothing to do with ‘Understanding the meaning of the
expression φ’ nor with ‘Individuating the point of the assertion ψ’, we can con-
clude that such processes take place independently of the existence of language.
One of the immediate consequences of these considerations is that, if we
are prepared to abandon the socio-anthropological level of analysis and reach
deep down to the biological level, there is a possible naturalistic foundation for
the concepts of brute training and (brute) learning.
To this, of course, a note of caution needs to be added, because, as we have
already remarked in section 5, neural networks are after all only a mathemati-
cal model inspired by the way biological nervous systems process information.

29
A mathematical treatment of learning based on the minimization of a loss or risk function
which justifies our proposal can be found in V.N. Vapnik, The Nature of Statistical Learning Theory,
Springer-Verlag, New York, Berlin, Heidelberg 1995. A loss or risk function measures the risk of not
being able to accomplish the required task.
30
See N.J. Nilsson, Intelligenza artificiale, transl. by S. Gaglio, Apogeo, Milano 2002, ch. 3, p. 37.

But, even so, it seems to us that the relevance of neural networks to the prob-
lems we have been dealing with in this paper remains undiminished. For neural
networks provide compelling evidence in favour of the existence of relatively
simple information processing systems which can be subjected to brute train-
ing, can learn independently of training, etc.
Secondly, point (2) above can be used to argue that there is no obvious cir-
cularity between the concepts of brute training and learning, because although
it is the case that for brute training to be possible some learning must be given
prior to it, the converse is not true. Indeed, as we have seen, it is possible to have
(brute) learning on the part of a neural network in the absence of the relevant
training given prior to it.
Thirdly, if a tenable Wittgenstein-style naturalization of meaning is, as we
have argued, conditional upon seeing brute training and learning as phenomena
which can take place at the level of neural networks, it follows that the theory
of neural networks, that is, the mathematical theory of certain information pro-
cessing systems inspired by a particular type of brain processes, is prior, in the
order of explanation, to meaning theory and, a fortiori, to the philosophy of
language. This, of course, shows once more that the priority thesis is wrong.
Fourthly, an immediate consequence of the idea that a theory of thought
is prior to a theory of language in the order of explanation, is the most plausible
view according to which learning, training, and generalizing are activities to be
found also among those living organisms which do not have a language. Indeed,
it is the presence in living organisms of abilities such as these that, together
with the Russian roulette of evolution, is capable of providing a satisfactory
account of their capacity to adapt to a rapidly changing environment, and survive.
Fifthly, let us now ask whether a neural network that has been trained to
follow the 2n-rule, where n is a natural number, and has learned to do so – i.e. it
has mastered the technique associated with the expansion of the 2n-rule, through
a modification of the weights of its neurons, etc. – has thereby obtained an under-
standing of the 2n-rule. Well, the obvious answer to this question has to be a clear
and resounding ‘No’, because ‘The neural network’, as some people say in such
circumstances, ‘does not know what it is doing’. Indeed, there is more to following
the 2n-rule than learning to associate with an input n (given in binary notation) an
output 2n (given in binary notation); one such thing being, for example, knowing
that ‘2n’ is a rule of a given game. We take these considerations to show that, in
contrast with what Wittgenstein argues in the Philosophical Investigations, under-
standing cannot be construed simply in terms of mastery of a technique.
Lastly, since learning, distinguishing between different patterns,
generalizing, etc. are cognitive functions, it seems to us that neural networks capable
of performing such functions should be legitimately considered to be cognitive
systems. And, of course, since these cognitive systems operate in the absence of,
among other things, representations, concepts, and consciousness, it is
necessary to engage in a profound rethinking of the standard views of cognition, and
of the nature of mind31. This is the stimulating topic of part of our past and
present research work32.
Gianluigi Oliveri - Salvatore Gaglio
University of Palermo ICAR / CNR
gianluigi.oliveri@unipa.it / salvatore.gaglio@unipa.it

ABSTRACT

The main task of this paper is grounding the socio-anthropological
“naturalization” of meaning operated by the later Wittgenstein in his remarks on
rule-following in the Philosophical Investigations in considerations relating to models of
low-level (biological) processes of imitation, training, and learning. If the operation
suggested above is successful, two of its immediate consequences are that the social
aspect of language can no longer be considered as a primitive notion, but needs
to be placed upon, if not reduced to, a biological foundation; and that the study
of thought, and, actually, of certain brain processes, becomes prior in the order of
explanation to the study of language. The issues raised in this article are relevant
to Wittgenstein scholarship, to any attempt to produce an acceptable philosophy of
language, and to all those interested in the tenability of one of the cornerstones of
analytic philosophy, the so-called “priority thesis”: the study of language is prior, in
the order of explanation, to the study of thought.

31
See on this F.J. Varela et al., The Embodied Mind, The MIT Press, Cambridge, Massachusetts
1993 (especially Part IV, Chapter 8), where the standard views of cognition and mind are discussed,
and alternative bold proposals (enactivism, embodied mind) are put forward; and the intriguing theory
of conceptual spaces, which challenges current conceptions of mind and cognition. A good place to
learn about conceptual spaces is P. Gärdenfors, Conceptual Spaces: The Geometry of Thought, MIT Press,
Cambridge, Massachusetts 2004.
32
See on this A. Augello et al., An algebra for the manipulation of conceptual spaces in cognitive
agents, in «Biologically Inspired Cognitive Architectures» 6(2013), pp. 23-29; A. Augello et al.,
Mathematical Patterns and Cognitive Architectures, in A. Lieto et al. (eds.), Artificial Intelligence and
Cognition 2014 (CEUR Workshops Proceedings, vol. 1510), University of Aachen, Aachen 2014, pp.
68-82; A. Augello et al., Pattern-Recognition: A Foundational Approach, in A. Lieto et al. (eds.), Artificial
Intelligence and Cognition 2015 (CEUR Workshops Proceedings, vol. 1315), University of Aachen,
Aachen 2015, pp. 134-139.
