(COSTANTINI, D. & GALAVOTTI, M. C. 1986) 'Induction and Deduction in Statistical Analysis', Erkenntnis 24, 73-94.
(1) P(H|E) = P(H ∩ E)/P(E).
This can be done by introducing some hypotheses as to the way in which E affects P(H), i.e., as to the way in which experimental observations affect the initial estimated distribution given by P(H). In particular, an inference of this kind starts from an initial evaluation of a distribution on the compositions of the population, P(H), and after a sample from the population described by E has been observed, comes, by means of (1), to the evaluation of the probabilities of those compositions which are compatible with E. This represents a purely inductive approach to statistical inference.
(2) P(H|E) = P(H)P(E|H)/P(E).
The difference between (1) and (2) clearly lies in the fact that P(H ∩ E) is transformed in order to take into account the likelihood P(E|H).
This implies that in applying (2) we assume that the family of laws for
the population is known. In fact, the use of (2) can bring about
satisfactory results only in the case when the law of the population
actually belongs to such a family, otherwise it will lead to worthless
results. This reflects the hypothetical aspect of Bayes' theorem. What is
assumed in this case is not a particular law, but rather the hypothesis
that the law belongs to a certain family; the unknown part of the law,
then, remains the parameters. So, what H amounts to is a formal
representation over the possible values of the parameters. In other
words, when we apply Bayes' theorem we start from an initial distribution P(H) on the possible values of the parameters and we modify it in view of the likelihood P(E|H), to obtain the final distribution P(H|E) on the parameters. Given the essential role played in such an
inference by the assumption of a hypothesis regarding the family to
which the law of the population belongs, it seems that it can be viewed
as hypothetico-inductive, i.e., as combining hypothetical and inductive
elements. One should note the different meanings of P(H) in a hypothetico-inductive and in a purely inductive inference. While in the former case P(H) represents a distribution over the possible values of the parameters, in the latter it represents a distribution on the possible compositions of the population, or, what can be shown to be the same, a distribution on the properties under consideration. In other words, P(H) gives in the first case the probabilities of the possible values of the
parameters characterizing the law of the population, and in the second
it gives the probabilities that an individual of the population has the
considered properties.
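The hypothetico-inductive use of (2) can be sketched numerically. The sketch below is ours, not the authors': a discrete initial distribution over candidate parameter values is updated by a binomial likelihood; the parameter grid and the data are illustrative assumptions.

```python
# Sketch of a hypothetico-inductive (Bayesian) inference: a prior over
# candidate parameter values is updated by a likelihood, as in (2).
# The parameter grid and the data below are illustrative assumptions.

def posterior(prior, likelihood):
    """Apply Bayes' theorem: P(H|E) = P(H) P(E|H) / P(E)."""
    joint = {h: p * likelihood(h) for h, p in prior.items()}
    total = sum(joint.values())          # P(E), the normalizing constant
    return {h: j / total for h, j in joint.items()}

# Hypothesis: the population law is binomial with unknown success rate h.
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
prior = {h: 1 / len(thetas) for h in thetas}   # uniform initial distribution

# Evidence E: 7 successes in 10 observations.
def binom_likelihood(h, successes=7, trials=10):
    from math import comb
    return comb(trials, successes) * h**successes * (1 - h)**(trials - successes)

final = posterior(prior, binom_likelihood)
best = max(final, key=final.get)   # parameter value with highest final probability
```

If the true law does not belong to the assumed binomial family, the update still produces a final distribution, but, as the text observes, a worthless one: the hypothetical element is doing real work.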
As suggested by the preceding considerations, in the context of
estimation a distinction can be made between two methods of inference: a purely inductive method and a hypothetico-inductive one.
All the methods mentioned represent different ways of learning from
experience. To summarize, we think that it is possible to single out three
i.e., that partition of the set Z which is relative to the individual aᵢ. The set of all partitions relative to each individual is the set of atomic propositions

E_at = ⋃_{i∈N} {Eᵢ}.
(4) ⋂_{i∈N} P₁aᵢ.
This is not the only possible kind of deterministic law. The following qualifies as a deterministic law as well: "Except for the individuals belonging to N_e, all individuals bear P₁", and can be formulated as
(7) ⋂_h ⋃_m ⋂_{N>m} {Z; |r₁Z_N − p| < 1/h}

(8) ⋂_{j∈N₂} ⋂_{h_j} ⋃_{m_j} ⋂_{N>m_j} {Z; |r_j Z_N − p_j| < 1/h_j},  p₁ + p₂ = 1,  h_j > 0.
It should be noted that in the case of an infinite population the number 1 does not correspond only to a law of type (4), but also to a law of type (5), as well as to all those laws the relative frequency of whose exceptions, though infinitely many, tends to 0.
Let us consider next a more complex case, that is the case in which
individuals of the population can bear one out of k properties. Here we
have I = Nk. For deterministic laws, with or without exceptions, what
has already been said still holds, obviously with some modifications,
because k properties are taken into account. Then, for example, there
will be k exceptionless deterministic laws instead of two. Statistical laws
will be analogous to (8). Then the law "the limiting frequency of individual constants bearing P_j is p_j", j ∈ N_k, has the formulation
but this leads away from the path followed so far. The introduction of a
density can be seen in some respects as an arbitrary step. In fact only a
few densities can be reached as limiting cases of laws of type (9). When
this happens, starting from (4), we can reach (10) in a gradual way; such
a graduality, however, represents something which must be shown in
each case.
(Q₁, p(j)), j ∈ N_k, and Σ_{j∈N_k} p(j) = 1.
In a finite 1-instance law the first member is called the support, and the second the distribution. It is worth noting that Q₁ = {s_j; j ∈ N_k}, and s_j represents a (potential) sample, or a (degenerate) population of 1 individual bearing the attribute P_j. A finite 1-instance law is then an assessment of probability to a set of (potential) samples.
A finite s-instance law is an ordered pair

(Q_s, p(j)), j ∈ N_q,  where q = C(s + k − 1, k − 1) and Σ_j p(j) = 1.
As before, the first member of the pair is called the support and the
second member the distribution. A finite s-instance law is then an
assessment of probability to a set of (potential) populations, more
precisely to all the possible populations originated by s individuals and
k attributes.
It should be noted, first, that since the supports of 1-instance and
s-instance laws do not refer to specific individuals but rather to relative
frequencies, such laws refer to each individual of the population and to
each population of s individuals respectively and, second, that a well
defined universal law is a particular case of an s-instance law. In fact, if
we know (or suppose) that the composition of a population of s
individuals is (s₁/s, …, s_j/s, …, s_k/s) and we want to express this knowledge (or supposition), we choose the s-instance law whose distribution assigns 0 probability to all the k-tuples different from (s₁/s, …, s_j/s, …, s_k/s) and assigns probability 1 to this k-tuple.
As s → ∞, Q_s becomes R_k, i.e., the set of all k-tuples of non-negative real numbers whose sum is 1. If the distribution of an s-instance law converges to a k-dimensional distribution function, a finite ∞-instance law is obtained. It is also possible to define the concepts of denumerable and continuous 1-instance, s-instance and ∞-instance laws.
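The support of a finite s-instance law can be made concrete in code. The sketch below (ours; names are illustrative) enumerates all compositions of a population of s individuals over k attributes and checks that their number is the binomial coefficient C(s + k − 1, k − 1).

```python
# Sketch: the support of a finite s-instance law, i.e. all possible
# compositions (relative-frequency k-tuples) of a population of s
# individuals each bearing one of k attributes. Names are illustrative.

from math import comb
from itertools import combinations_with_replacement
from collections import Counter

def support(s, k):
    """All k-tuples (s_1/s, ..., s_k/s) with s_1 + ... + s_k = s."""
    comps = set()
    for assignment in combinations_with_replacement(range(k), s):
        counts = Counter(assignment)
        comps.add(tuple(counts[j] / s for j in range(k)))
    return sorted(comps)

Q = support(s=4, k=3)
# The number of compositions is the binomial coefficient C(s+k-1, k-1).
assert len(Q) == comb(4 + 3 - 1, 3 - 1)   # 15 compositions
```

An s-instance law is then simply an assignment of probabilities, summing to 1, over this finite set; as s grows the support fills out the simplex of k-tuples of non-negative reals summing to 1.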
3. HYPOTHETICO-DEDUCTIVE INFERENCES
TABLE I

          P₁     ¬P₁
 P₂      s₁₁    s₁₂    s₁.
¬P₂      s₂₁    s₂₂    s₂.
         s.₁    s.₂    s
members do or do not bear P₂, P₁ will have the same probability in both of them. Let p stand for the probability of P₁, and q = 1 − p for the probability of ¬P₁. On the same assumption, it can be argued that (a) the selection of s elements from the population, s.₁ of which bear P₁ and s.₂ bear ¬P₁, can occur in (s choose s.₁) different ways, all having the same probability, namely p^{s.₁}q^{s.₂}; (b) the selection of s elements, of which s₁. bear P₂ (supposing that s₁₁ of these bear P₁ and the remaining s₁₂ do not) and s₂. bear ¬P₂ (supposing that s₂₁ of these bear P₁ and the remaining do not), can be obtained in (s₁. choose s₁₁)(s₂. choose s₂₁) different ways, again having the same probability, equal to p^{s₁₁}q^{s₁₂}p^{s₂₁}q^{s₂₂} = p^{s.₁}q^{s.₂}.
It is easy to see that the possibilities considered in (a) are mutually exclusive, and that they are the only ones compatible with the marginal values of table I. It is equally easy to realize that the possibilities considered in (b) are also mutually exclusive and that they are the only ones favourable to the data of table I. It then follows that the probability
of obtaining an experimental result which conforms to the result
observed is given by
(17) (Q₂, p₂(j)),

where p₂(j) = 0 for j ≠ s₁. and p₂(j) = 1 for j = s₁..
Having assumed the validity of (16) and (17), the four frequencies of table I are univocally determined by each one of them. Consider the number of individuals bearing both properties, and let it be denoted by f. Such a number may vary between max(0, s.₁ − s₂.) and min(s.₁, s₁.). (16) and (17) can be used in order to determine the probabilities of (14) and (15). The relative frequencies s.₁/s and s₁./s, to which (16) and (17) assign probability 1, are taken as the probabilities that an individual of the sample bears the properties P₁ and P₂ respectively. In other words, these values are used in order to determine the distributions of (14) and (15). We limit our attention to
(18) (Q₂, p₁(j)),

with p₁(1) = s.₁/s. In spite of the stochastic independence, the laws of the samples to be considered are fixed by (16) and (17). In fact, if we
suppose we have observed one individual with P₁, the individuals with this property become s.₁ − 1 and the total of individuals in the sample s − 1. It follows that after such an observation the law of the unobserved part of the sample is no longer (16), but

(Q₂, p(j)), with p(1) = (s.₁ − 1)/(s − 1).
On the basis of this law the acceptability of the hypothesis of an association between the considered properties can be evaluated.
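The shift from s.₁/s to (s.₁ − 1)/(s − 1) is just sampling without replacement. A minimal sketch (ours; the numbers are illustrative):

```python
# Sketch of the sampling-without-replacement update described above: after
# observing an individual bearing P1, the probability that the next one
# bears P1 drops from s.1/s to (s.1 - 1)/(s - 1). Numbers are illustrative.

from fractions import Fraction

def updated_law(s_col1, s, observed_p1):
    """Law of the unobserved part of the sample after `observed_p1`
    individuals, all bearing P1, have been drawn."""
    return Fraction(s_col1 - observed_p1, s - observed_p1)

s_col1, s = 30, 100            # s.1 individuals bear P1 in a sample of s
p0 = Fraction(s_col1, s)       # initial law (18): p1(1) = s.1/s
p1 = updated_law(s_col1, s, 1) # after one observation: (s.1 - 1)/(s - 1)

assert p0 == Fraction(3, 10)
assert p1 == Fraction(29, 99)
```

The dependence of p1 on the earlier draw is exactly what makes the laws of the sample, despite the stochastic independence of the population, fixed by the observed margins.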
In general terms, the core of the preceding argument can be outlined
as follows. We have a corpus of experimental data, provided by the
observation of a sample of s elements taken from the population, and k
properties are taken into account. On the basis of the relative frequencies of the properties we can state a 1-instance (empirical) law of the form

(22) (Q₁, pᵀ(j))
that we want to test for acceptance or rejection. To do this, we must
compare (22) with (21). In view of the fact that an empirical distribution
given by a sample of s elements represents one of the possible members
of the support of an s-instance law, first we must determine, by means of
(22), the s-instance law relative to k properties
4. HYPOTHETICO-INDUCTIVE INFERENCES
As we have already said, Bayesian inference can be seen as hypothetico-inductive in character.⁶ Its hypothetical aspect is represented by the choice of an initial law assigning 0 probability to all distribution functions except for those belonging to a certain family. Its inductive aspect is represented by the shift from an initial to a final law. One should note that a Bayesian inference usually starts with an ∞-instance law, to end up with a law of the same kind. However, it makes use of 1-instance laws to determine the influence of experimental results on the initial law.
Let us suppose that a family with continuously many attributes is considered, for instance those connected with the measurement of a given length. We suppose also that the result of such a measurement can be any real number, so that the properties to be considered are all members of R. After having performed s measurements, we have the s-tuple (x₁, …, x_s). The distribution we start with is relative to all distribution functions, that is to all the possible compositions, of an infinite population with respect to continuously many properties. We
now assume as a hypothesis that this distribution assigns 0 probability to
all distribution functions except for normal distributions with variance σ². In this way all distributions can be represented by the set of all
possible mean values, that is by the set of real numbers. The initial
(continuous) oo-instance law we start with will then be
(26) (R, p_I(j)), j ∈ R;

the final law obtained through Bayes' theorem has the same form, with a final density p_F which is normal with mean

μ_F = (μ_I/σ_I² + Σ_{i=1}^{s} x_i/σ²) / (1/σ_I² + s/σ²)

and variance σ_F² = (1/σ_I² + s/σ²)⁻¹, where μ_I and σ_I² denote the mean and variance of the initial density.
With respect to this law, the possible compositions of the population
having non-0 density are all and only the normal ones with variance a2.
All the remaining compositions have density 0 for the initial law, and
have the same density for the final law. This should clarify the role played
by the hypothesis that has been formulated as to the composition of the
population. In fact, if such a composition actually conforms to the
hypothesis, the inference will tell us more about the composition itself. If,
on the contrary, the population is not normal with variance σ², the
inference will not modify our knowledge about its composition.
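The inference just described is the standard conjugate update for a normal mean with known variance. A minimal numerical sketch, under our own symbols (mu_i, tau2_i for the initial mean and variance; all values illustrative):

```python
# Sketch of the Bayesian inference for a normal mean with known measurement
# variance sigma2: a normal initial density (mean mu_i, variance tau2_i),
# combined with s measurements x_1..x_s, yields a normal final density.
# This is the standard conjugate-normal update; symbols and data are ours.

def normal_mean_update(mu_i, tau2_i, xs, sigma2):
    """Return (mu_f, tau2_f) of the final normal density."""
    s = len(xs)
    precision = 1 / tau2_i + s / sigma2          # inverse final variance
    mu_f = (mu_i / tau2_i + sum(xs) / sigma2) / precision
    return mu_f, 1 / precision

# Illustrative data: initial mean 0, initial variance 1, measurement variance 1.
mu_f, tau2_f = normal_mean_update(mu_i=0.0, tau2_i=1.0,
                                  xs=[2.0, 2.0, 2.0, 2.0], sigma2=1.0)
assert abs(mu_f - 1.6) < 1e-12    # (0 + 8) / (1 + 4)
assert abs(tau2_f - 0.2) < 1e-12  # 1 / (1 + 4)
```

Notice that the update only ever moves mass among normal densities with the assumed variance: compositions outside the hypothesized family keep density 0, which is the hypothetical aspect the text emphasizes.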
In some cases it is possible to eliminate the hypothetical aspect of a
Bayesian inference. This is the case, for example, when a family of two
properties is considered. As an instance, we mention the well known
Laplace's rule of succession. All the possible compositions of an infinite
population with regard to two properties are given by the real numbers
in the interval [0, 1]. Let us suppose that s observations have been performed, so as to give the pair (s₁, s₂). Let the initial finite ∞-instance
law we start with be
(27) ([0, 1], p_I(j)), j ∈ [0, 1],

where p_I(j) = 1.
Through Bayes' theorem we determine the final distribution

(28) p_F(j) = j^{s₁}(1 − j)^{s₂} / ∫₀¹ j^{s₁}(1 − j)^{s₂} dj,
by means of which we obtain the final finite oo-instance law
(29) ([0, 1], p_F(j)), j ∈ [0, 1].
As we have already said, in view of the fact that all the possible
compositions of the population can be taken into account, in this case it
is possible to eliminate from the inference its hypothetical aspect. To this end, we only need, in this case, to evaluate the probability for a generic individual to have one of the two properties, irrespective of the
composition of the population. Such a probability is, with regard to the
initial law
(30) ∫₀¹ j p_I(j) dj = 1/2,
and, with regard to the final law
(31) ∫₀¹ j p_F(j) dj = (s₁ + 1)/(s + 2).
The initial 1-instance law is then

(32) (Q₁, p_I(j)), j ∈ N₂,

where p_I(1) = 1/2 and p_I(2) = 1/2, while the final 1-instance law is

(33) (Q₁, p_F(j)), j ∈ N₂,

where p_F(1) = (s₁ + 1)/(s + 2) and p_F(2) = (s₂ + 1)/(s + 2).
Finally, by means of (33) it is possible to evaluate the probability of
the possible compositions of an as yet unobserved sample and the law
relative to it.
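Laplace's rule of succession lends itself to a direct check. The sketch below (ours) evaluates the integrals in (30) and (31) exactly, via the factorial identity ∫₀¹ j^a (1 − j)^b dj = a! b!/(a + b + 1)!, and recovers (s₁ + 1)/(s + 2).

```python
# Sketch of Laplace's rule of succession: with a uniform initial law on [0,1]
# and s observations, s1 of which show the first property, the final
# probability that a new individual shows it is (s1 + 1)/(s + 2). Here the
# value is recovered exactly from the two beta integrals.

from fractions import Fraction
from math import factorial

def rule_of_succession(s1, s):
    s2 = s - s1
    # numerator:   integral of j^(s1+1) (1-j)^s2 over [0,1]
    # denominator: integral of j^s1     (1-j)^s2 over [0,1]
    # using the identity  integral j^a (1-j)^b dj = a! b! / (a+b+1)!
    num = Fraction(factorial(s1 + 1) * factorial(s2), factorial(s1 + s2 + 2))
    den = Fraction(factorial(s1) * factorial(s2), factorial(s1 + s2 + 1))
    return num / den

assert rule_of_succession(s1=0, s=0) == Fraction(1, 2)   # initial law, cf. (30)
assert rule_of_succession(s1=7, s=10) == Fraction(2, 3)  # (s1 + 1)/(s + 2)
```

Because every composition in [0, 1] carries non-zero initial density, no family hypothesis is excluded here, which is exactly why the hypothetical aspect can be eliminated in this case.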
5. INDUCTIVE INFERENCES
(34) (Q₁, p_I(j)), j ∈ N_k,

where p_I(j) = γ_j.
However, such a law does not remain unchanged through all the steps of the inference. In fact, a purely inductive inference modifies the law it starts with as new evidence becomes available. The way in which such modifications are performed obeys the well known conditions that Carnap called axioms of inductive logic. These conditions have as a consequence that after s observations, s_j of which bear P_j, j ∈ N_k, the 1-instance law, when s = (s₁, …, s_j, …, s_k) is the k-tuple of performed observations, becomes

(35) (Q₁, p_F(j)),

where p_F(j) = (s_j + γ_jλ)/(s + λ).
where

p_F(N₁, …, N_j, …, N_k) = (N!/(N₁! … N_j! … N_k!)) × …,

(38) f(x₁, …, x_j, …, x_k) = (B(γ₁λ, …, γ_jλ, …, γ_kλ))⁻¹ x₁^{γ₁λ−1} … x_j^{γ_jλ−1} … x_k^{γ_kλ−1},  Σ_i x_i = 1,

which after the s observations becomes

(B(γ₁λ + s₁, …, γ_jλ + s_j, …, γ_kλ + s_k))⁻¹ x₁^{γ₁λ+s₁−1} … x_k^{γ_kλ+s_k−1},  Σ_i x_i = 1,
and again the values of the densities for various (x₁, …, x_j, …, x_k) are not all on a par. This means that also in relation to a purely inductive estimate of an ∞-instance law, evidence plays an important role, as it allows us to assign a higher density to some compositions than to others. The example is meant to show how, starting with a 1-instance law, another 1-instance law can be arrived at by a purely inductive inference. In some cases the estimate can be extended to an ∞-instance law
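The purely inductive update (35) can be sketched as code. The following is our own illustration of Carnap's λ-continuum rule p_F(j) = (s_j + γ_jλ)/(s + λ); the counts, the γ values and λ are illustrative assumptions.

```python
# Sketch of the purely inductive update (35): with initial probabilities
# gamma_j and Carnap's parameter lambda, after s observations of which s_j
# bear P_j, the final 1-instance law is (s_j + gamma_j*lam) / (s + lam).
# The counts, gammas and lam below are illustrative assumptions.

from fractions import Fraction

def carnap_update(counts, gammas, lam):
    """Return the final 1-instance distribution p_F over the k attributes."""
    s = sum(counts)
    return [Fraction(sj + gj * lam, s + lam) for sj, gj in zip(counts, gammas)]

k = 3
gammas = [Fraction(1, 3)] * k        # symmetric initial law, gamma_j = 1/k
lam = 3                              # illustrative choice of lambda
pF = carnap_update(counts=[5, 3, 1], gammas=gammas, lam=lam)

assert sum(pF) == 1
assert pF[0] == Fraction(5 + 1, 9 + 3)   # (s_1 + gamma_1*lambda)/(s + lambda)
```

No family of population laws is excluded in advance here: the rule interpolates between the observed relative frequencies (λ → 0) and the initial probabilities (λ → ∞), which is what makes the inference purely inductive rather than hypothetico-inductive.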
6. CONCLUDING REMARKS
APPENDIX
(Q^{k,1}, p(j)), j ∈ N_k,

and

p(s₀) = p(1) × … × p(s).

A finite s-instance law is an ordered pair

(Q^{k,s}, p(H | s₀)), H ∈ Q^{k,s}.

Likewise, denumerable and continuous laws can be defined.
NOTES
¹ See R. Carnap: 1962, Logical Foundations of Probability, Chicago, and K. R. Popper: 1965, The Logic of Scientific Discovery, New York. See also K. R. Popper, 'The Demarcation Between Science and Metaphysics', and R. Carnap, 'Replies', in P. A. Schilpp (ed.), The Philosophy of Rudolf Carnap, La Salle, Ill., 1963.
² To answer a possible objection, which was in fact raised during the debate in Siena, we stress that our analysis is concerned only with statistical methods, which are essentially ampliative in kind. Thus, our distinction between inductive and deductive methods in statistical analysis is not to be taken as a distinction between non-demonstrative and demonstrative, but rather between probabilistic and non-probabilistic methods. Some interesting remarks on the nature of statistical methods are to be found in I. Scardovi: 1984, 'A proposito di conoscenza e strategia', Statistica XLIV, 131-156.
³ In this section we adopted the symbols and concepts used by Carnap. See especially R. Carnap, 'A Basic System of Inductive Logic', Part 1, in R. Carnap and R. C. Jeffrey (eds.), Studies in Inductive Logic and Probability, vol. I, Berkeley, 1971, and Part 2 in R. C. Jeffrey (ed.), Studies in Inductive Logic and Probability, vol. II, Berkeley, 1980.
⁴ For the sake of simplicity only prior probabilities are considered.
⁵ See R. A. Fisher, Statistical Methods for Research Workers (1925), 14th edition, New York-London, 1970, pp. 96-97.
⁶ Theo Kuipers pointed out that Hintikka's system of inductive logic is also of the kind we call hypothetico-inductive. In the present paper, however, we limit our attention to those methods of inference which are actually used in statistical research.
⁷ See W. E. Johnson, 'Probability: The Relation of Proposal to Supposal', 'Probability: Axioms', 'Probability: The Deductive and Inductive Problems', Mind XLI (1932), 1-16, 281-296, 409-423; and Logic, Dover, 1924. See also R. Carnap: 1945, 'On Inductive Logic'.
University of Bologna
Dipartimento di Scienze
Statistiche "Paolo Fortunati"
Via Belle Arti 41
I-40126 Bologna
Italy
University of Bologna
Dipartimento di Filosofia
Via Zamboni 38
1-40126 Bologna
Italy