Gianluigi Oliveri - Salvatore Gaglio
Wittgenstein, Turing, and Neural Networks
Introduction
«[K]nowledge, mind, and meaning are part of the same world that they have
to do with, and […] are to be studied in the same empirical spirit that ani-
mates natural science. There is no place for prior philosophy»1.
1. W.V. Quine, Ontological Relativity & Other Essays, Columbia University Press, New York 1993, p. 26.
2. L. Wittgenstein, Philosophical Investigations, Basil Blackwell, Oxford 1983, section 116, p. 48e.
3. M.A.E. Dummett, The Logical Basis of Metaphysics, Duckworth, London 1991, p. 3.
4. Equation (2) is informative because, although the proper names ‘e^{iπ}’ and ‘cos π + i sin π’ occurring on either side of the equality sign have the same reference, they nevertheless have a different sense (content). See on this G. Frege, Senso e Denotazione, in A. Bonomi (ed.), La struttura logica del linguaggio, Valentino Bompiani, Milano 1973, pp. 9-32.
5. According to Frege, a proper name is a word (or a sign, or a connection of signs, or an expression) endowed with a sense and a (unique) reference. The reference of a proper name is an object (concrete or abstract). For Frege, concepts (and relations) are not to be regarded as objects (see on this G. Frege, Concetto e Oggetto, in A. Bonomi (ed.), La struttura logica del linguaggio, Valentino Bompiani, Milano 1973, pp. 373-386).
6. According to Frege, concrete objects belong to the first realm; they: (1) can be perceived; (2) do not belong to the content of our consciousness; (3) exist independently of us; (4) are ‘actual’, that is, they are subject to the principle of action and reaction. See G. Frege, Thoughts, in Id., Logical Investigations, P.T. Geach (ed.), B. Blackwell, Oxford 1977, pp. 1-30.
7. For Frege, representations or ideas: (1) cannot be perceived; (2) belong to the content of our consciousness; (3) do not exist independently of us; (4) are such that ‘every idea has only one owner’; (5) are the objects of description of a private language, i.e., a language that only the speaker can understand (on this last point see footnote 8). Cf. G. Frege, Thoughts, cit.
8. Frege’s argument in favour of the existence of a private language can be found in the quotation below: «[E]veryone is presented to himself in a special and primitive way, in which he is presented to no-one else. So, when Dr Lauben has the thought that he was wounded, he will probably be basing it on this primitive way in which he is presented to himself. And only Dr Lauben himself can grasp thoughts specified in this way. But now he may want to communicate with others. He cannot communicate a thought he alone can grasp» (ibi, pp. 12-13).
abstract, timeless, and objective entities populating, together with the natural
numbers and other mathematical objects, his third realm9.
For Frege, thoughts are entities which we grasp with our understanding
(with ‘the power of thinking’). To this he adds that10:
9. For Frege, the objects belonging to the third realm: (1) cannot be perceived; (2) do not belong to the content of our consciousness; (3) exist independently of us; (4) are ‘objective’; (5) are ‘real’, but not ‘actual’, because they are not subject to the principle of action and reaction. G. Frege, Thoughts, cit.
10. Ibi, footnote 6, pp. 24-25.
11. The identity conditions for natural numbers are provided by the definition of (cardinal) number in conjunction with the so-called ‘Hume’s Principle’.
12. H. Field, Realism, Mathematics & Modality, B. Blackwell, Oxford 1989, p. 3.
tions for it if and only if we can identify and re-identify the character according
to Hamlet.
The interesting thing that emerges here is that, in spite of the fact that Hamletic satisfies Frege’s conditions (α) and (β) above, this would nevertheless be insufficient to show that Hamlet, Ophelia, Rosencrantz, Guildenstern, etc. exist13.
Moreover, Frege’s conditions (α) and (β) are not necessary for believing in
the existence of objects either. For we are justified in believing in the existence
of X, simply because we perceive (are acquainted with) X, even though we do
not know what the identity conditions for X are, etc.
13. A well-known discussion of whether Frege’s context principle can be used to establish arithmetical Platonism can be found in C. Wright, Frege’s Conception of Numbers as Objects, Aberdeen University Press, Aberdeen 1983, and in H. Field, Realism, Mathematics & Modality, cit., ch. 5, «Platonism for cheap? Crispin Wright on Frege’s Context Principle», pp. 147-170.
14. L. Wittgenstein, Philosophical Investigations, cit., section 1, p. 2e.
15. As is well known, Wittgenstein defuses the question ‘What is the meaning of “2n”?’ by means of the ‘mantra’: «For a large class of cases – though not for all – in which we employ the word “meaning” it can be defined thus: the meaning of a word is its use in the language». L. Wittgenstein, Philosophical Investigations, cit., section 43, p. 20e.
16. Ibi, section 201, p. 81e.
17. Ibi, section 202, p. 81e.
18. Ibi, section 150, p. 59e.
19. The correctness of this point is confirmed by Wittgenstein’s criticism of the Law of Excluded Middle in the Philosophical Investigations, a criticism which develops along Brouwerian lines: «The law of excluded middle says here: It must either look like this, or like that. So it really – and this is a truism – says nothing at all, but gives us a picture. And the problem ought now to be: does reality accord with the picture or not? And this picture seems to determine what we have to do, what to look for, and how – but it does not do so, just because we do not know how it is to be applied» [ibi, section 352, p. 112e]. This particular ‘take’ on Wittgenstein is at the heart of Dummett’s version of intuitionism.
20. The literature concerning Wittgenstein’s view of rule-following is very extensive. We mention here a few items which might help the interested reader in developing a critical view of the subject: S. Kripke, Wittgenstein on Rules and Private Language, Basil Blackwell, Oxford 1982; G.P. Baker - P.M.S. Hacker, Scepticism, Rules and Language, Basil Blackwell, Oxford 1984; G. Oliveri, Le Ricerche di Wittgenstein nella lettura di S. Kripke, in «Paradigmi» 6(1984), pp. 523-540; P. Frascolla, Wittgenstein’s Philosophy of Mathematics, Routledge, London and New York 1994.
«“So are you saying that human agreement decides what is true and what is
false?” – It is what human beings say that is true and false; and they agree in
the language they use. That is not agreement in opinions but in form of life»
[ibi, section 241, p. 88e].
sential use of justifications for interpreting the terms of the instructions in such-
and-such a way.
An example of sophisticated training is that of the trainer/instructor who
explains to the trainee how a particular Turing machine computes the function
2n by, among other things, attributing certain meanings to the symbols appear-
ing in the rows of the Turing machine, and in the tape on which the machine
operates.
Unfortunately, the concept of sophisticated training leads to an infinite regress: the trainee can ask the trainer to justify the particular interpretation of some of the terms present in the explanation; once the required meta-explanation has been provided, the trainee can again ask the trainer to justify the interpretation offered of some of the terms present in the meta-explanation; and so on.
The rôle of brute training in the Wittgensteinian account of rule-follow-
ing is precisely that of stopping the infinite regress of explanations/justifications
present within sophisticated training. Brute training exemplifies the phenom-
enon known as ‘hitting rock-bottom’ in the regressing chain of explanations/
justifications.
An example of brute training is showing someone how to carry out the instructions present in the rows of a particular Turing machine which calculates the function 2n without providing explanations. Brute training is normative: it consists of prescriptions, examples, and trials accompanied by positive or negative feedback.
For Wittgenstein, brute training, operating on the basis of our specific an-
thropological background, is at the root of the formation of our social practices,
traditions, institutions, and, in more general terms, of our form of life:
But, is the notion of brute training coherent? If, even for brute training
to be successful, the trainee needs to see the point of it, we seem to be moving
in a circle, because the trainee now has the problem of understanding what the
point of the training is or, to put it in a different way, what the training is at-
tempting to make salient, that is, the intention of the trainer in doing so-and-so.
In other words, it seems that, for brute training to be successful, the trainee
needs to ‘read the mind’ of the trainer.
Of course, if this is correct, then only a theory of thought and, in particular, a theory about ‘reading the mind of X’ can explain how Y can master, via brute training, the technique which establishes the conditions for the correct use of the expression ‘2n’. Note that an immediate consequence of the position above is that the study of thought becomes prior, in the order of explanation, to the study of language, turning the priority thesis upside down.
Moreover, how are we at this point going to construe ‘reading the mind of
X’ without reintroducing, as it were, from the window either the Platonism or
the psychologism which had been kicked out of the door? This is an extremely
important point, and it appears that, if we substitute for ‘reading the mind of
X’ the more modest ‘empathizing with X’, then research on mirror neurons can
show that our ability to empathize with X, and imitate X, is innate21. This, if
correct, would provide, among other things, a sharp biological underpinning
for the high-level phenomenon we have called ‘brute training’ which would
otherwise rest on vague socio-anthropological narratives.
Although appealing to mirror neurons prevents the ghosts of departed (metaphysical) entities from reappearing to haunt the dreams of the naturalist philosopher of language, there is still a danger for the plausibility of the naturalist’s story concerning brute training and learning: the account of these two phenomena may be circular, since for brute training to be possible some learning must be given prior to it, and vice versa.
On the other hand, if in our quest for a naturalist foundation of brute
training and learning we were prepared to widen our look ‘beyond mirror neu-
rons’, we would soon realize that recent research on neural networks seems to
show that some of these provide an interesting low-level biological model of
both brute training and learning. Two of the most interesting features of such
models are that: (a) training and learning are phenomena which can take place
independently of considerations relating to meaning and intentionality; and
21. See M. Iacoboni, Imitation, Empathy, and Mirror Neurons, in «Annu. Rev. Psychol.» 60(2009), pp. 666-667: «[R]esearch on mirror neurons, imitation, and empathy [...] tells us that our ability to empathize [and, therefore, to imitate], a building block of our society (R. Adolphs, The social brain: neural basis of social knowledge, in «Annu. Rev. Psychol.» 60 [in press]) and morality (F.B. de Waal, Putting the altruism back into altruism: the evolution of empathy, in «Annu. Rev. Psychol.» 59[2008], pp. 279-300; J.P. Tangney et al., Moral emotions and moral behaviour, in «Annu. Rev. Psychol.» 58[2007], pp. 345-372), has been built “bottom up” from relatively simple mechanisms of action production and perception (M. Iacoboni, Mirroring People, Farrar, Straus & Giroux, New York 2008)».
that (b) there are some forms of learning which are independent of any type of
training.
With regard to the importance of points (a) and (b) above, we need to
consider that whereas point (a) dispenses with the idea that, for the brute train-
ing of Y on the part of X to be successful, Y must be able to ‘read the mind of
X’; point (b) resolves the brute training/learning circularity objection.
In what follows in this paper we are going to study a particular type of neu-
ral network which makes a strong case in favour of the idea that the appropri-
ate foundation for the phenomena of brute training and learning is biological
rather than socio-anthropological.
Neural networks
22. W. McCulloch - W. Pitts, A Logical Calculus of the Ideas Immanent in Nervous Activity, in «Bulletin of Mathematical Biophysics» 5(1943), pp. 115-133.
f(U) = \begin{cases} 1 & \text{if } W \cdot U \ge \mu \\ 0 & \text{otherwise} \end{cases}
Fig. 1
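The threshold rule above is easy to make concrete. Here is a minimal sketch in Python (the weight vector W and threshold μ below are illustrative values, not taken from the paper):

```python
# A McCulloch-Pitts threshold unit: it fires (outputs 1) exactly when
# the weighted sum of its inputs, W . U, reaches the threshold mu.

def neuron(weights, inputs, mu):
    weighted_sum = sum(w * u for w, u in zip(weights, inputs))
    return 1 if weighted_sum >= mu else 0

# Example: with these (illustrative) values the unit behaves as an AND gate.
W, mu = [1.0, 1.0], 2.0
print([neuron(W, [a, b], mu) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# [0, 0, 0, 1]
```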
Fig. 2
E = (d − r)²
23. Here f can be any function from a set D1 to a set D2. The elements of D1 and D2 are codified by strings of binary digits (0 or 1).
24. D.E. Rumelhart et al., Learning Internal Representations by Error Propagation, in D.E. Rumelhart - J.L. McClelland (eds.), Parallel Distributed Processing, vol. 1, MIT Press, Cambridge, MA 1986, pp. 318-362.
where r is the actual output of the neuron and d is the output desired by the teacher. A quadratic error function is chosen because it penalizes large errors more heavily and treats positive and negative errors in the same fashion.
The network learns to accomplish a given task through the active and col-
lective participation of each neuron which modifies itself, via the modification
of the weights/strengths of its synapses, in relation to a feedback error signal.
Each neuron in the network must adjust its weights in order to reduce the error
E at the output neurons. To do so (see fig. 2), a mathematical entity known as the gradient – the analogue of the first derivative for functions of more than one variable – of the error E with respect to the set of weights {w1j, …, wnj} must be computed.
Fig. 3
What happens is that each neuron receives the delta values carrying the error information from the neurons to which it is connected, computes its contribution to the global error, computes its delta value to determine its responsibility for the error, corrects its weights and, in turn, sends back its delta value to the neurons that are connected to it.
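The delta-rule update just described can be sketched for a single neuron (a minimal illustration; the input pattern, initial weights, and learning rate are invented for the example, and the sigmoid activation is one common choice):

```python
import math

# One sigmoid neuron trained by gradient descent on the quadratic error
# E = (d - r)^2.  Its delta value is dE/dnet = 2*(r - d) * r*(1 - r);
# each weight then moves against the gradient: w_i -= lr * delta * u_i.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(w, u, d, lr=0.5):
    net = sum(wi * ui for wi, ui in zip(w, u))
    r = sigmoid(net)                      # actual output
    delta = 2 * (r - d) * r * (1 - r)     # the neuron's delta value
    w = [wi - lr * delta * ui for wi, ui in zip(w, u)]
    return w, (d - r) ** 2

w = [0.1, -0.2, 0.05]                     # illustrative initial weights
for _ in range(2000):
    w, err = train_step(w, [1.0, 0.0, 1.0], d=1.0)

print(err)  # the quadratic error has shrunk towards 0
```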
The above learning procedure also yields an interesting side result which is significant from the biological point of view: it can be shown that the network is able to generalize, in the sense that it responds to patterns not seen during the training phase according to a criterion of similarity – it generates similar outputs in response to similar inputs.
Some examples
Fig. 4
Input Output
0000 Bad
0010 Good
0011 Good
0111 Bad
1001 Good
1100 Good
1101 Bad
Table 1
At the beginning of the training the weights are randomly selected. When
the teacher feeds the network with each entry in the table, the network com-
putes its output value r. If the network gives a wrong answer, it uses its error to
modify itself, by computing for the output neuron the delta value, namely the
contribution to the gradient of the error. Applying the back-propagation algo-
rithm, the delta values are propagated in a backward fashion, as in fig. 3, and
by using them each neuron adjusts its synapses to reduce the error. The whole
procedure is repeated until the network has learned the classification.
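The whole supervised procedure – random initial weights, forward pass, backward propagation of the delta values, repeated presentation of the table – can be sketched for a small network trained on Table 1 (a toy reconstruction: the 4-4-1 architecture, learning rate, and number of epochs are our assumptions, not the authors'):

```python
import math, random

random.seed(0)

# Table 1, with 1 = Good and 0 = Bad.
PAIRS = [("0000", 0), ("0010", 1), ("0011", 1), ("0111", 0),
         ("1001", 1), ("1100", 1), ("1101", 0)]

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

# 4 inputs -> 4 hidden sigmoid neurons -> 1 output neuron; the fifth
# weight of each row is a bias.  Weights start out random, as in the text.
W1 = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(4)]
W2 = [random.uniform(-1, 1) for _ in range(5)]

def forward(x):
    h = [sig(sum(w * v for w, v in zip(row, x + [1.0]))) for row in W1]
    r = sig(sum(w * v for w, v in zip(W2, h + [1.0])))
    return h, r

def total_error():
    return sum((d - forward([float(c) for c in s])[1]) ** 2 for s, d in PAIRS)

initial = total_error()
lr = 0.5
for _ in range(5000):
    for s, d in PAIRS:
        x = [float(c) for c in s]
        h, r = forward(x)
        delta_out = 2 * (r - d) * r * (1 - r)                 # output delta
        deltas_h = [delta_out * W2[j] * h[j] * (1 - h[j])     # back-propagated
                    for j in range(4)]
        for j in range(5):                                    # output weights
            W2[j] -= lr * delta_out * (h + [1.0])[j]
        for j in range(4):                                    # hidden weights
            for k in range(5):
                W1[j][k] -= lr * deltas_h[j] * (x + [1.0])[k]

print(initial, total_error())  # the error drops as the network learns
```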
A neural network that implements the rule 2n, namely the product table of 2 by n expressed in binary form with k bits, is shown in fig. 5. However, this network is hard-wired, designed specifically to realize a sort of shift register25 with neurons. On the other hand, a neural network of the kind represented in fig. 2 can learn the rule by itself, as in the previous example. Again, the teacher shows, one after the other and many times over, the pairs shown in table 2. At each step the network adjusts its weights until it learns the desired mapping.
25. A shift register is a logical device that stores a linear string of binary digits and that can perform, on a specific command, a shift operation over the string, either to the left or to the right. For instance, if it stores the string 01101, after the left-shift command the new string is 11010. In this case the first 0 is lost, and a new 0 is added on the right.
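The footnote's left-shift behaviour, which is exactly the doubling the hard-wired network of fig. 5 computes, can be checked in a couple of lines of Python:

```python
# Doubling a natural number in binary is a one-bit left shift: the first
# bit of the fixed-width string is lost and a 0 is appended on the right.

def double_via_shift(bits):
    """'01101' -> '11010', i.e. 13 -> 26 in a 5-bit register."""
    return bits[1:] + "0"

print(double_via_shift("01101"))  # '11010'
# As long as the leading bit is 0, the shift agrees with multiplication by 2:
assert int(double_via_shift("00101"), 2) == 2 * int("00101", 2)
```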
Fig. 5
Table 2
The supervised learning that we have seen so far is only one of the possible
ways in which a neural network can learn from examples. Two other important
ways are reinforcement learning and unsupervised learning.
In reinforcement learning the network does not receive an indication of
what is the correct answer in each case. It occasionally receives a reward or a
punishment from the external environment for its behavior26. Such a reward or
punishment may have different values, and the network adjusts its synapses to
increase what is called its ‘utility’, that is the algebraic sum of the rewards and
the punishments calculated over the sequence of its answers. ‘Learning’ in this
case means exploration of the environment and subsequent adaptation.
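A minimal sketch of this reward-driven adaptation (a two-action toy agent rather than a full neural network; the ε-greedy exploration rate and the size of the update step are our choices):

```python
import random

random.seed(1)

# The agent never receives the correct answer: it only collects occasional
# rewards (+1) or punishments (-1) and nudges its action preferences so as
# to increase its utility, the running algebraic sum of rewards received.

prefs = [0.0, 0.0]      # preference for actions 0 and 1
utility = 0

def choose():
    # Occasionally explore the environment at random, otherwise exploit.
    if random.random() < 0.1:
        return random.randrange(2)
    return 0 if prefs[0] >= prefs[1] else 1

for _ in range(500):
    a = choose()
    reward = 1 if a == 1 else -1    # the environment secretly rewards action 1
    utility += reward
    prefs[a] += 0.1 * reward        # reinforce rewarded behaviour

print(prefs[1] > prefs[0])  # True: the rewarded action ends up preferred
```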
Unsupervised learning is the ability of a neural network to learn in the
absence of a teacher or of something that gives some indication as to what is
correct or best to do given a certain input. In such a situation the neural net-
work learns to discriminate among its inputs and adjusts its synapses to do so.
For instance, in the absence of a teacher a neural network may learn to classify
the patterns shown in fig. 6 into two different classes or clusters. We may call
these classes ‘ovals’ and ‘lines’, but for the neural network names do not mat-
ter, because it uses a similarity criterion to distinguish between the members of
different classes. A popular neural network that uses this method of learning is
ART proposed by Grossberg27. Such a neural network is also able to introduce
automatically new classes if the match of a specimen with the existing classes
is poor. Kohonen’s self-organizing maps28, instead, project their inputs over a
space in which closeness means similarity.
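A far simpler relative of ART and of Kohonen's maps – plain competitive learning – already shows the key phenomenon: prototype vectors drift toward clusters of inputs with no teacher in sight. A sketch (the two Gaussian clusters, starting prototypes, and learning rate are invented for illustration):

```python
import random

random.seed(2)

# Unlabelled data: two noisy clusters around (1, 0) and (0, 1).
cluster_a = [[1.0 + random.gauss(0, .1), 0.0 + random.gauss(0, .1)] for _ in range(20)]
cluster_b = [[0.0 + random.gauss(0, .1), 1.0 + random.gauss(0, .1)] for _ in range(20)]
data = cluster_a + cluster_b
random.shuffle(data)

protos = [[0.5, 0.4], [0.4, 0.5]]    # starting prototype vectors

def dist2(p, x):
    return sum((pi - xi) ** 2 for pi, xi in zip(p, x))

for _ in range(50):
    for x in data:
        # Winner-takes-all: only the nearest prototype moves towards the input.
        w = min(range(2), key=lambda k: dist2(protos[k], x))
        protos[w] = [p + 0.1 * (xi - p) for p, xi in zip(protos[w], x)]

print(protos)  # one prototype has drifted near (1, 0), the other near (0, 1)
```

The classes have no names here, exactly as in the text: what the procedure discovers is a similarity criterion, not a label.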
Fig. 6
26. M.R. Coulom, Apprentissage par Renforcement Utilisant des Réseaux de Neurones, avec des Applications au Contrôle moteur, Ph.D. Thesis, Institut National Polytechnique de Grenoble, France 2002.
27. S. Grossberg, Competitive Learning: From Interactive Activation to Adaptive Resonance, in «Cognitive Science» 11(1987), pp. 23-63.
28. T. Kohonen, Self-Organizing Maps, Springer, Berlin 2001.
In sections 6-8 we saw that there are at least three different types of learn-
ing of which a neural network is capable: supervised learning, reinforcement
learning, and unsupervised learning.
It is important to notice that, on a careful examination of the above-mentioned phenomena, we are justified in using the term ‘learning’ to describe what happens to the neural network in question when it displays them. For in all these cases the neural network acquires a new ability as a consequence of well-specified ways of treating information on its part, reprogramming itself in order to accomplish a particular task29.
If what we have argued so far is correct then, taking information as a primi-
tive notion, and in spite of neural networks being classified by scholars like Nils-
son30 as mere ‘stimulus-response agents’, we have that: (1) the teacher’s part in
the supervised learning procedures of a neural network effectively models the
notion of brute training we dealt with in the first sections of this article; and
that, (2) the case of unsupervised learning of a neural network shows that there
are forms of learning which are independent of brute training, and, therefore,
of training in general.
Concerning point (1) above, consider that, although training a neural network in no way involves the use of explanations, the neural network nevertheless learns something (acquires the ability to ... ) as a consequence of it.
Furthermore, since the training of a neural network, and the learning cor-
responding to it, have nothing to do with ‘Understanding the meaning of the
expression φ’ nor with ‘Individuating the point of the assertion ψ’, we can con-
clude that such processes take place independently of the existence of language.
One of the immediate consequences of these considerations is that, if we
are prepared to abandon the socio-anthropological level of analysis and reach
deep down to the biological level, there is a possible naturalistic foundation for
the concepts of brute training and (brute) learning.
To this, of course, a note of caution needs to be added, because, as we have
already remarked in section 5, neural networks are after all only a mathemati-
cal model inspired by the way biological nervous systems process information.
29. A mathematical treatment of learning based on the minimization of a loss or risk function, which justifies our proposal, can be found in V.N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York-Berlin-Heidelberg 1995. A loss or risk function measures the risk of not being able to accomplish the required task.
30. See N.J. Nilsson, Intelligenza artificiale, transl. by S. Gaglio, Apogeo, Milano 2002, ch. 3, p. 37.
But, even so, it seems to us that the relevance of neural networks to the prob-
lems we have been dealing with in this paper remains undiminished. For neural
networks provide compelling evidence in favour of the existence of relatively
simple information processing systems which can be subjected to brute train-
ing, can learn independently of training, etc.
Secondly, point (2) above can be used to argue that there is no obvious cir-
cularity between the concepts of brute training and learning, because although
it is the case that for brute training to be possible some learning must be given
prior to it, the converse is not true. Indeed, as we have seen, it is possible to have
(brute) learning on the part of a neural network in the absence of the relevant
training given prior to it.
Thirdly, if a tenable Wittgenstein-style naturalization of meaning is, as we
have argued, conditional upon seeing brute training and learning as phenomena
which can take place at the level of neural networks, it follows that the theory
of neural networks, that is, the mathematical theory of certain information pro-
cessing systems inspired by a particular type of brain processes, is prior, in the
order of explanation, to meaning theory and, a fortiori, to the philosophy of
language. This, of course, shows once more that the priority thesis is wrong.
Fourthly, an immediate consequence of the idea that a theory of thought is prior to a theory of language in the order of explanation is the highly plausible view according to which learning, training, and generalizing are activities to be found also among those living organisms which do not have a language. Indeed, it is the presence in living organisms of abilities such as these that, together with the Russian roulette of evolution, is capable of providing a satisfactory account of their capacity to adapt to a rapidly changing environment, and survive.
Fifthly, let us now ask whether a neural network that has been trained to
follow the 2n-rule, where n is a natural number, and has learned to do so – i.e. it
has mastered the technique associated with the expansion of the 2n-rule, through
a modification of the weights of its neurons, etc. – has thereby obtained an understanding of the 2n-rule. Well, the obvious answer to this question has to be a clear and resounding ‘No’, because ‘The neural network’, as some people say in such circumstances, ‘does not know what it is doing’. Indeed, there is more to following
the 2n-rule than learning to associate to an input n (given in binary notation) an
output 2n (given in binary notation); one such thing being, for example, knowing
that ‘2n’ is a rule of a given game. We take these considerations to show that, in
contrast with what Wittgenstein argues in the Philosophical Investigations, under-
standing cannot be construed simply in terms of mastery of a technique.
Lastly, since learning, distinguishing between different patterns, general-
izing, etc. are cognitive functions, it seems to us that neural networks capable
ABSTRACT
31. See on this F.J. Varela et al., The Embodied Mind, The MIT Press, Cambridge, Massachusetts 1993 (especially Part IV, Chapter 8), where the standard views of cognition and mind are discussed, and alternative bold proposals (enactivism, embodied mind) are put forward; and the intriguing theory of conceptual spaces, which challenges current conceptions of mind and cognition. A good place to learn about conceptual spaces is P. Gärdenfors, Conceptual Spaces: The Geometry of Thought, MIT Press, Cambridge, Massachusetts 2004.
32. See on this A. Augello et al., An algebra for the manipulation of conceptual spaces in cognitive agents, in «Biologically Inspired Cognitive Architectures» 6(2013), pp. 23-29; A. Augello et al., Mathematical Patterns and Cognitive Architectures, in A. Lieto et al. (eds.), Artificial Intelligence and Cognition 2014 (CEUR Workshop Proceedings, vol. 1510), University of Aachen, Aachen 2014, pp. 68-82; A. Augello et al., Pattern-Recognition: A Foundational Approach, in A. Lieto et al. (eds.), Artificial Intelligence and Cognition 2015 (CEUR Workshop Proceedings, vol. 1315), University of Aachen, Aachen 2015, pp. 134-139.