
Induction and Deduction in Statistical Analysis

Author(s): Domenico Costantini and Maria Carla Galavotti


Source: Erkenntnis (1975-), Vol. 24, No. 1, Foundations of Estimation and Testing of
Hypotheses in Statistics (Jan., 1986), pp. 73-94
Published by: Springer
Stable URL: http://www.jstor.org/stable/20006549
Accessed: 19/06/2014 20:14

DOMENICO COSTANTINI AND MARIA CARLA GALAVOTTI

INDUCTION AND DEDUCTION IN


STATISTICAL ANALYSIS

... it will be found useful to distinguish three types of propositions, each of which presents its own kind of probability problem; namely (1) the singular proposition ... (2) the class-fractional proposition ... (3) the universal proposition.

W. E. Johnson, 'Probability: The Relation of Proposal to Supposal', Mind 41 (1932), p. 2.

1. INDUCTION AND DEDUCTION IN STATISTICS

As is well known, "inductivism" and "deductivism" represent different, and in some aspects opposite, tendencies. Their epistemological leaders, namely R. Carnap and K. R. Popper, waged a bitter argument, each inspired by the conviction that his own position represented the correct approach to epistemology.1 Neither of them seems to have considered the possibility of combining their individual viewpoints into a single perspective.
The opposition is, after all, between confirmation and falsification, or, using statistical terminology, between methods of estimation, which are intrinsically inductive, and tests of significance, which are intrinsically deductive.2 This paper aims to show that such opposition should leave room for a more pluralistic view, according to which induction and deduction represent complementary, albeit different, approaches to scientific methodology. Induction and deduction, it will be argued, are not always sharply separable since, in addition to purely inductive methods (like Johnson's or Carnap's methods), there are "mixed" methods (like Bayesian estimation).
In addition, purely inductive and purely deductive methods would
better be considered not as opposed, but rather as devised for different
purposes, to be used in different contexts. In particular, we think that in
scientific research it is possible to distinguish between two contexts: a
"context of testing" and a "context of estimation". In the context of
testing one aims at comparing a hypothesis (stated in the form of a law)
with facts. This is done by performing a certain number of
experimental observations, by means of which one tries to reject the hypothesis; the procedure adopted in such cases is a hypothetico-deductive one.
Hypothetico-deductive methods in statistics are represented by tests of significance as defined, for example, by R. A. Fisher. They work by assuming a law (let us call it H) that will be rejected (falsified) in the case that the observed results are such as to put in action a suitable rejection criterion. The assumed distribution is called "theoretical" (the null hypothesis), while the distribution of observed results is called "empirical". In this context no use is made of the notion of probability as such, but rather of "hypothetical probabilities", which are referred to the law that has been assumed. The core of such methods is falsification. In fact, they can be seen as the counterpart, for statistical laws, of the principle of strict falsification, which applies to deterministic laws.
On the other hand, the context of estimation is characterized by the use of inductive procedures, which are probabilistic in kind. A main difference between the two contexts is that in the context of testing one starts by assuming a distribution is true, whereas in the context of estimation one does not make such an assumption. In the context of estimation the application of the methods adopted does not involve accepting or rejecting a distribution, as happens in the context of testing. Instead, we have a process of gradual learning from experience. More precisely, inductive inferences make use of conditional probabilities, being based essentially on the principle of conditionalization. In probabilistic terms, inferences of this kind reflect the shift from $\mathscr{C}(H)$ to $\mathscr{C}(H \mid E)$, where H represents a distribution and E a corpus of experimental observations. According to the multiplication axiom, the conditionalization is made as follows:

(1) $\mathscr{C}(H \mid E) = \dfrac{1}{\mathscr{C}(E)}\, \mathscr{C}(H \cap E).$
This can be done by introducing some hypotheses as to the way in which E affects $\mathscr{C}(H)$, i.e., as to the way in which experimental observations affect the initial estimated distribution given by $\mathscr{C}(H)$. In particular, an inference of this kind starts from an initial evaluation of a distribution on the compositions of the population, $\mathscr{C}(H)$, and after a sample from the population described by E has been observed, comes, by means of (1), to the evaluation of the probabilities of those compositions which are compatible with E. This represents a purely inductive approach to statistical inference.
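To make the mechanics of (1) concrete, here is a minimal sketch in Python of conditionalization over the compositions of a small finite population. The population size, the uniform prior and the hypergeometric sampling model are illustrative assumptions of ours, not part of the authors' argument.

```python
# A minimal sketch of conditionalization (1) over the compositions of a
# small finite population; all names and numbers are illustrative.
from fractions import Fraction
from math import comb

N = 4  # assumed population size: 4 individuals, two predicates P1, P2

# Prior C(H): a uniform distribution over the possible compositions H,
# where H = number of individuals bearing P1 (0..N).
prior = {h: Fraction(1, N + 1) for h in range(N + 1)}

# Evidence E: a sample of n individuals, s1 of which bear P1.
n, s1 = 2, 2

def likelihood(h, n, s1):
    """Hypergeometric chance of drawing s1 P1-bearers in n draws
    from a population containing h P1-bearers out of N."""
    if s1 > h or n - s1 > N - h:
        return Fraction(0)
    return Fraction(comb(h, s1) * comb(N - h, n - s1), comb(N, n))

# C(H|E) = C(H and E) / C(E): the principle of conditionalization.
joint = {h: prior[h] * likelihood(h, n, s1) for h in prior}
c_e = sum(joint.values())
posterior = {h: p / c_e for h, p in joint.items()}

for h, p in posterior.items():
    print(f"composition h = {h}: C(H|E) = {p}")
```

The posterior assigns probability 0 to the compositions incompatible with E (here h = 0 and h = 1), exactly as the text describes.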


A different approach to estimation is represented by Bayesian inference. According to Bayes' theorem, the shift from the initial to the final distribution is performed as

(2) $\mathscr{C}(H \mid E) = \dfrac{\mathscr{C}(H)\, \mathscr{C}(E \mid H)}{\mathscr{C}(E)}.$
The difference between (1) and (2) clearly lies in the fact that $\mathscr{C}(H \cap E)$ is transformed in order to take into account the likelihood $\mathscr{C}(E \mid H)$.
This implies that in applying (2) we assume that the family of laws for the population is known. In fact, the use of (2) can bring about satisfactory results only in the case when the law of the population actually belongs to such a family; otherwise it will lead to worthless results. This reflects the hypothetical aspect of Bayes' theorem. What is assumed in this case is not a particular law, but rather the hypothesis that the law belongs to a certain family; the unknown part of the law, then, remains the parameters. So, what H amounts to is a formal representation over the possible values of the parameters. In other words, when we apply Bayes' theorem we start from an initial distribution $\mathscr{C}(H)$ on the possible values of the parameters and we modify it in view of the likelihood $\mathscr{C}(E \mid H)$, to obtain the final distribution $\mathscr{C}(H \mid E)$ on the parameters. Given the essential role played in such an inference by the assumption of a hypothesis regarding the family to which the law of the population belongs, it seems that it can be viewed as hypothetico-inductive, i.e., as combining hypothetical and inductive elements. One should note the different meanings of $\mathscr{C}(H)$ in a hypothetico-inductive and in a purely inductive inference. While in the former case $\mathscr{C}(H)$ represents a distribution over the possible values of the parameters, in the latter it represents a distribution on the possible compositions of the population, or, what can be shown to be the same, a distribution on the properties under consideration. In other words, $\mathscr{C}(H)$ gives in the first case the probabilities of the possible values of the parameters characterizing the law of the population, and in the second it gives the probabilities that an individual of the population has the considered properties.
As suggested by the preceding considerations, in the context of
estimation a distinction can be made between two methods of inference: a purely inductive method and a hypothetico-inductive one.
All the methods mentioned represent different ways of learning from
experience. To summarize, we think that it is possible to single out three of them: a purely deductive one, which belongs to the context of testing, plus a purely inductive and a "mixed" one, which belong to the context of estimation.

2. PROBABILITY AND LAWS

In the preceding section, reference has been made to statistical laws. In


the present section we will clarify what we mean by "laws". We will
then state two different interpretations of laws, both entering into
statistical methods. It must be stated clearly from the beginning that we
use the term "law" to mean generalizations of various kinds to be
specified. However, we stress that our approach does not cover a
particular kind of laws, namely laws in which the order of individuals is
relevant, like, for example, laws describing time series.
First of all, let us consider a set of individuals $\{a_i;\ i \in \mathbb{N}\}$ and a family of predicates $\{P_j;\ j \in I\}$, where $I$ can be $\mathbb{N}_k$, or $\mathbb{N}$, or $\mathbb{R}$. Each individual can bear one and only one predicate of the family. A model (state description)3 is a sequence $Z: \mathbb{N} \to I$. The set of all models is $\mathbf{Z}$, i.e., the certain proposition. An atomic proposition is represented by

(3) $P_j a_i = \{Z;\ Z \in \mathbf{Z} \text{ and } Z(i) = j\}.$


An atomic proposition is then a set of sequences. For the sake of simplicity, however, we will simply say, when referring to a proposition such as (3), that the individual $a_i$ bears the predicate $P_j$.
The set of all atomic propositions relative to the predicates considered is

$\mathbf{E}_i^{at} = \{P_j a_i;\ j \in I\},$

i.e., that partition of the set $\mathbf{Z}$ which is relative to the individual $a_i$. The set of all partitions relative to each individual is the set of atomic propositions

$\mathbf{E}^{at} = \bigcup_{i \in \mathbb{N}} \{\mathbf{E}_i^{at}\}.$

The set of propositions $\mathbf{E}$ is the $\sigma$-field generated by $\mathbf{E}^{at}$ on $\mathbf{Z}$.


Deterministic laws can be expressed as statements like "all individuals of a certain kind bear a property". The formulation "if an individual bears a property, then it bears another property as well" cannot be used, since, as we are considering only one family, the same individual cannot bear two properties at the same time. It should be noted that, in view of this characteristic, Hempel's paradox does not arise here.
Let us consider first the simplest case. In this case each individual of the population can bear one or the other of two properties $P_1$ and $P_2$; we then have $I = \mathbb{N}_2$. The deterministic law "All individuals bear $P_1$" can be formulated by means of the sequence: for all $i \in \mathbb{N}$, $Z(i) = 1$, or else as follows:

(4) $\bigcap_{i \in \mathbb{N}} P_1 a_i.$

This is not the only possible kind of deterministic law. The following qualifies as a deterministic law as well: "Except for the individuals belonging to $\mathbb{N}_e$, all individuals bear $P_1$", and can be formulated as

(5) $\left(\bigcap_{i \in \mathbb{N}_e} P_2 a_i\right) \cap \left(\bigcap_{i \in \mathbb{N} - \mathbb{N}_e} P_1 a_i\right).$


When we cannot or are not willing to continue enumerating the indices of the individual constants of $\mathbb{N}_e$, and we consider a finite population of N members limiting our attention to the number of exceptions, we obtain a statistical law. In order to define this kind of law we introduce the notion of relative frequency of individuals bearing a predicate in the initial part of a sequence Z. If $N_1$ stands for the number of predicates whose index is 1 in $Z_N$, $r_1 Z_N =_{df} N_1/N$. A statistical law regarding a population of N individuals will be expressed by the proposition

(6) $\{Z;\ Z \in \mathbf{Z} \text{ and } r_1 Z_N = i/N\}$

where $i \in \mathbb{N}_N$.
There will be N + 1 laws of this kind, 2 of which, namely those where $r_1 Z_N$ equals 1 or 0, will be of kind (4). Clearly, what is expressed by (5), i.e., the specification of individuals not bearing $P_1$, cannot be expressed by any one of such numbers. In other words, a given relative frequency is compatible with different compositions of the population.
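As a small illustration, the relative frequency $r_1 Z_N$ used in (6) can be computed directly from the initial segment of a model Z; the particular sequence below is an invented example, not one from the paper.

```python
# A minimal sketch of the relative frequency r_1 Z_N of definition (6):
# the fraction of the first N individuals in a model Z bearing P_1.
def rel_freq(Z, N, j=1):
    """Relative frequency of predicate index j in the initial segment Z_N."""
    return sum(1 for i in range(N) if Z[i] == j) / N

Z = [1, 2, 1, 1, 2, 1, 1, 1, 2, 1]   # an illustrative model, restricted to N = 10
print(rel_freq(Z, 10))                # 0.7, i.e., law (6) with i/N = 7/10
```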
If we take into account an infinite population, the statistical law "the limiting frequency of individual constants bearing $P_1$ is $p_1$" can be expressed as

(7) $\bigcap_h \bigcup_m \bigcap_{N>m} \{Z;\ |r_1 Z_N - p_1| < 1/h\}.$


If we want to show explicitly that we are considering just two properties we can write (7) as

(8) $\bigcap_{j \in \mathbb{N}_2} \bigcap_{h_j} \bigcup_{m_j} \bigcap_{N>m_j} \{Z;\ |r_j Z_N - p_j| < 1/h_j\}, \qquad p_1 + p_2 = 1,\ h_j > 0.$
It should be noted that in the case of an infinite population the number 1 does not correspond only to a law of type (4), but also to a law of type (5), as well as to all those laws whose exceptions, though infinite in number, have relative frequency tending to 0.
Let us consider next a more complex case, that is the case in which individuals of the population can bear one out of k properties. Here we have $I = \mathbb{N}_k$. For deterministic laws, with or without exceptions, what has already been said still holds, obviously with some modifications, because k properties are taken into account. Then, for example, there will be k exceptionless deterministic laws instead of two. Statistical laws will be analogous to (8). Then the law "the limiting frequency of individual constants bearing $P_j$ is $p_j$, $j \in \mathbb{N}_k$" has the formulation

(9) $\bigcap_{j \in \mathbb{N}_k} \bigcap_{h_j} \bigcup_{m_j} \bigcap_{N>m_j} \{Z;\ |r_j Z_N - p_j| < 1/h_j\}, \qquad \sum_{j \in \mathbb{N}_k} p_j = 1,\ h_j > 0.$
If denumerably many properties are considered, that is $I = \mathbb{N}$, a statistical law can be formulated as (9), with $\mathbb{N}$ in place of $\mathbb{N}_k$.
What we have said shows that the shift from deterministic laws of
type (4) to statistical laws with denumerably many properties is gradual,
in the sense that there is no conceptual gap between the former and the
latter. However, this does not apply to the passage from statistical laws
with denumerably many properties to statistical laws with continuously
many properties. In such a case $I = \mathbb{R}$, and a law of this kind can be formulated as

(10) $f(j), \quad j \in \mathbb{R}.$


The gap between laws previously considered and laws of this kind is
such that the latter cannot be formulated by means of our symbolism.
Of course it is possible to introduce a density function, i.e., a function
such that

$f(j) \geq 0 \quad\text{and}\quad \int_{-\infty}^{+\infty} f(j)\, dj = 1,$

but this leads away from the path followed so far. The introduction of a
density can be seen in some respects as an arbitrary step. In fact only a
few densities can be reached as limiting cases of laws of type (9). When
this happens, starting from (4), we can reach (10) in a gradual way; such
a graduality, however, represents something which must be shown in
each case.

We now give some examples of the laws we have been considering.


Given that "mortal" means "dead before 150 years old", "all men are
mortal" is an example of (4). For Christians "Jesus Christ is the sole
immortal man" is an example of (5). "In Italy the relative frequency of
death before 20 years old is 0.02" is an example of (6). "If we throw this
coin infinitely many times the relative frequency of heads is 0.497" is an
example of (7). The mortality tables of Italians represent an example of
(9) referred to a finite population, i.e., the $p_j$s are relative frequencies.
"The infinite population of all army corps is such that the relative
frequency of army corps with a given number of deaths due to horse
kicks in one year tends to a limit" is an example of (9) with denumerably
many attributes. "The infinite population of elementary errors is such
that the relative frequency of whatever deviation from the mean value
measured in terms of the standard deviation tends to a limit" is an
example of (10).
The way in which laws have been regarded so far can be seen as representing a "universal interpretation". However, there is also another way of interpreting laws. For reasons which will be made clear, the latter will be called the "instance interpretation", and it is referred to a generic individual. It is not difficult to see how a law can be interpreted in two different ways. For example, the law "all men are mortal" can be taken to mean (i) Adam will die, Eve will die, Abel will die, and so on, enumerating all men of the past, present and future; or (ii) whatever man is observed, at any time or in any place, such a man will die. While the universal interpretation is perhaps the more immediate one, in many circumstances reference is made to the instance interpretation. In fact, when stating a statistical law the instance interpretation is more often adopted. As a matter of fact one says: "the probability that this coin turns up heads is 0.497", or "the probability of being killed by a horse kick is given by the Poisson distribution", or "the probability of an error of a given width is given by the normal distribution", and so on. However, both interpretations are equally important in this respect, as both of them enter into statistical inferences.


The instance interpretation of a deterministic law amounts to:

(11) for each $i \in \mathbb{N}$, $P_1 a_i$.

It is easy to see that (4) and (11) can have the same probability4 only in the case when the law which is dealt with is supposed to be true. In this case, in fact, for each $i \in \mathbb{N}$, $\mathscr{C}(P_1 a_i \mid Z)$ is equal to $\mathscr{C}(\bigcap_{i \in \mathbb{N}} P_1 a_i \mid Z)$, and both probabilities are equal to 1. On the other hand, in the case in which, for each $i \in \mathbb{N}$, $\mathscr{C}(P_1 a_i \mid Z) = r$, $0 < r < 1$, such equality does not hold, since we can have $\mathscr{C}(\bigcap_{i \in \mathbb{N}} P_1 a_i \mid Z) = 0$.
|
As we have already said, the universal interpretation of a statistical law amounts to assigning a certain characteristic to the population. More precisely, it amounts to stating that the limiting frequencies of individual constants bearing $P_j$, $j \in I$, are equal to $p_j$. On the other hand, the instance interpretation of a statistical law amounts to stating that for each individual of the population, no matter which one, the probability of bearing $P_j$, $j \in I$, is equal to $p_j$. For example, with reference to a law with k properties, such interpretation would state that

(12) for each $i \in \mathbb{N}$, $\mathscr{C}(P_j a_i \mid Z) = p_j$, $j \in \mathbb{N}_k$.

It is easy to see that the two interpretations in general lead to different probability values, and that it can often happen that while (12) differs from 0, (9) can be equal to 0.
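A two-line computation makes the gap between the two interpretations vivid. If each individual independently bears $P_1$ with instance probability r, the universal proposition receives $r^N$ over a population of N individuals, which vanishes as N grows; the independence assumption and the numbers are ours, chosen only for illustration.

```python
# Instance probability r for P_1 versus the probability of the universal
# proposition over N independent individuals: r^N tends to 0.
r = 0.9
for N in (10, 100, 1000):
    print(f"N = {N:4d}: the universal proposition gets {r ** N:.3e}")
```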
Let us now give a formalization to both interpretations of laws that
have been singled out. Only statistical laws will be considered, since
deterministic laws can be seen as representing a limiting case of
statistical laws. As we have said, the laws we are considering make
reference to the void evidence; their extension to other kinds of
evidence is immediate.
We define a 1-instance law as an assessment of probability to the
possible compositions of a population of one individual considered with
respect to the attributes of a family. This definition can be easily
extended to universal laws, as a law of this kind can be seen as an
assessment of probability to the possible compositions of a population
of n individuals considered with respect to the attributes of a family.
In order to formalize these definitions, we introduce some new symbols. $s_0^j = (0, \dots, 1, \dots, 0)$ is the ordered k-tuple whose elements are all 0 except for the jth element, which is equal to 1. $\Delta Q^k_s$ is the set of all k-tuples of fractions whose denominators are equal to s and whose sum is 1.
Such a set then contains all the possible relative frequencies which can be obtained by examining a population of s individuals. $p(j)$ is a probability function.
A finite 1-instance law is an ordered pair

$(\Delta Q^k_1,\ p(j)), \quad j \in \mathbb{N}_k, \quad \sum_j p(j) = 1.$

In a finite 1-instance law the first member is called the support, and the second the distribution. It is worth noting that $\Delta Q^k_1 = \{s_0^j;\ j \in \mathbb{N}_k\}$ and $s_0^j$ represents a (potential) sample, or a (degenerate) population of 1 individual bearing the attribute $P_j$. A finite 1-instance law is then an assessment of probability to a set of (potential) samples.
A finite s-instance law is an ordered pair

$(\Delta Q^k_s,\ p(q)), \quad q \in \Delta Q^k_s, \quad \sum_q p(q) = 1.$

As before, the first member of the pair is called the support and the second member the distribution. A finite s-instance law is then an assessment of probability to a set of (potential) populations, more precisely to all the possible populations originated by s individuals and k attributes.
It should be noted, first, that since the supports of 1-instance and s-instance laws do not refer to specific individuals but rather to relative frequencies, such laws refer to each individual of the population and to each population of s individuals respectively and, second, that a well defined universal law is a particular case of an s-instance law. In fact, if we know (or suppose) that the composition of a population of s individuals is $(s_1/s, \dots, s_j/s, \dots, s_k/s)$ and we want to express this knowledge (or supposition) we choose the s-instance law whose distribution assigns 0 probability to all the k-tuples different from $(s_1/s, \dots, s_j/s, \dots, s_k/s)$ and assigns probability 1 to this k-tuple.
As $s \to \infty$, $\Delta Q^k_s$ becomes $\Delta R^k$, i.e., the set of all k-tuples of non-negative real numbers whose sum is 1. If the distribution of an s-instance law converges to a k-dimensional distribution function, a finite ∞-instance law is obtained. It is also possible to define the concepts of denumerable and continuous 1-instance, s-instance and ∞-instance laws.

According to the universal interpretation, laws are seen as ∞-instance, while according to the instance interpretation they are seen as 1-instance. Finally, according to the predictive approach to statistics, laws are interpreted as s-instance. We stress that, at least in the present paper, the distinction between various kinds of laws is intended to clarify the structure of statistical inferences. In particular, we regard it as very useful to state at what stages of an inference an appeal is made to laws, and of what kind. At the same time, this will help to distinguish between various kinds of predictions which can be made by means of statistical inferences. It seems, in fact, obvious that there is a sharp difference between 1-instance, s-instance and ∞-instance arguments. In general, conclusions relative to a small value of s cannot be simply extended to large values of s.
In what follows it will be shown by means of some examples how the
different interpretations of statistical laws that have been considered
enter into statistical methods of inference.

3. HYPOTHETICO-DEDUCTIVE INFERENCES

As has already been said, tests of significance can be seen as hypothetico-deductive methods. To exemplify them, we will examine in this section Fisher's exact test.5 This is a method for testing a statistical law, and applies to contingency tables like the following:

TABLE I

             P1      ¬P1
  P2         s11     s12     s1.
 ¬P2         s21     s22     s2.
             s.1     s.2     s

The question to be answered is whether there is some kind of association between the properties $P_1$ and $P_2$ or not. Obviously we suppose that $s_{11}/s_{.1} > s_{12}/s_{.2}$. More precisely, the question can be posed as: is the association between the two properties shown by the contingency table really due to an association linking the properties in the population to which the observed individuals belong?
Fisher's exact method starts by assuming that no association exists (the null hypothesis) and continues by comparing the consequences of such a hypothesis with the data of table I. If $P_1$ and $P_2$ are not associated in the population, then the probability of bearing $P_1$ must be the same, regardless of the fact of bearing $P_2$ or not. In other words, if we select from the population two samples which differ as to the fact that their members do or do not bear $P_2$, $P_1$ will have the same probability in both
of them. Let p stand for the probability of $P_1$ and $q = 1 - p$ for the probability of $\neg P_1$. On the same assumption, it can be argued that (a) the selection of s elements from the population, $s_{.1}$ of which bear $P_1$ and $s_{.2}$ bear $\neg P_1$, can occur in $\binom{s}{s_{.1}}$ different ways, all having the same probability, namely $p^{s_{.1}} q^{s_{.2}}$; (b) the selection of s elements, of which $s_{1.}$ bear $P_2$ (supposing that $s_{11}$ of these bear $P_1$ and the remaining $s_{12}$ do not) and $s_{2.}$ bear $\neg P_2$ (supposing that $s_{21}$ of these bear $P_1$ and the remaining do not), can be obtained in $\binom{s_{1.}}{s_{11}} \binom{s_{2.}}{s_{21}}$ different ways, again all having the same probability, equal to $p^{s_{11}} q^{s_{12}} p^{s_{21}} q^{s_{22}} = p^{s_{.1}} q^{s_{.2}}$.
It is easy to see that the probabilities considered in (a) are mutually
exclusive, and that they are the only ones compatible with the marginal
values of table I. It is equally easy to realize that the possibilities
considered at (b) are also mutually exclusive and that they are the only
favourable ones to the data of table I. It then follows that the probability
of obtaining an experimental result which conforms to the result
observed is given by

(13) $\dbinom{s_{1.}}{s_{11}} \dbinom{s_{2.}}{s_{21}} \bigg/ \dbinom{s}{s_{.1}}$


The null hypothesis will be rejected if (13) takes a sufficiently small
value. In other words, in this case the hypothesis in question will be
considered falsified.
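The following sketch shows how (13) can be computed and turned into a rejection criterion. The cell counts are invented, and the step of summing (13) over the tables at least as extreme as the observed one (the usual one-sided p-value) is our gloss on "sufficiently small", not a prescription found in the text.

```python
# A hedged sketch of Fisher's exact probability (13) for a 2x2 table.
from math import comb

# Table:        P1    ~P1   row totals
#   P2          s11   s12   s1.
#  ~P2          s21   s22   s2.
# col totals    s.1   s.2   s
s11, s12, s21, s22 = 8, 2, 3, 7            # illustrative counts
s1_, s2_ = s11 + s12, s21 + s22
s_1, s_2 = s11 + s21, s12 + s22
s = s1_ + s2_

# Probability (13): C(s1., s11) * C(s2., s21) / C(s, s.1)
p_observed = comb(s1_, s11) * comb(s2_, s21) / comb(s, s_1)
print(f"probability of the observed configuration: {p_observed:.4f}")

# One-sided p-value: sum (13) over tables at least as extreme, letting
# s11 range up to its maximum compatible with the margins.
p_value = sum(
    comb(s1_, a) * comb(s2_, s_1 - a) / comb(s, s_1)
    for a in range(s11, min(s_1, s1_) + 1)
)
print(f"one-sided p-value: {p_value:.4f}")
```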
Let us now formalize the matter. Fisher's exact method, like every hypothetico-deductive inference, starts with the assumption of a 1-instance law. In this case we have two families of attributes, and we assume the two finite 1-instance laws:

(14) $(\Delta Q^2_1,\ p_1(j)), \quad p_1(1) + p_1(2) = 1,$

(15) $(\Delta Q^2_1,\ p_2(j)), \quad p_2(1) + p_2(2) = 1,$

where $p_1(1)$ and $p_2(1)$ stand for the probabilities that an individual has $P_1$ and $P_2$ respectively. Since $p_1(1)$ and $p_2(1)$ are unknown, (14) and (15) do not suffice to calculate the distribution (13). To see how it is possible
to determine such probabilities, we consider once again table I. The
table may be seen as one of the members of the support of an s-instance
law. What has to be determined, then, is precisely an s-instance law. To
assume the validity of an s-instance law for the population brings forth the stochastic independence of observations, but does not exclude physical dependence. According to Fisher, the marginal frequencies of table I are taken as given, and this implies the assumption of a hypergeometric distribution. This amounts to supposing the validity of the two finite s-instance laws for the sample of s individuals:

(16) $(\Delta Q^2_s,\ p_1(j))$

where $p_1(j) = 0$ for $j \neq s_{.1}$ and $p_1(j) = 1$ for $j = s_{.1}$;

(17) $(\Delta Q^2_s,\ p_2(j))$

where $p_2(j) = 0$ for $j \neq s_{1.}$ and $p_2(j) = 1$ for $j = s_{1.}$.
Having assumed the validity of (16) and (17), the four frequencies of table I are univocally determined by each one of them. Consider the number of individuals bearing both properties, and let it be denoted by $l$. Such number may vary between $\max(0, s_{.1} - s_{2.})$ and $\min(s_{.1}, s_{1.})$. (16) and (17) can be used in order to determine the probabilities of (14) and (15). The relative frequencies $s_{.1}/s$ and $s_{1.}/s$, to which (16) and (17) assign probability 1, are taken as the probabilities that an individual of the sample bears the properties $P_1$ and $P_2$ respectively. In other words, these values are used in order to determine the distributions of (14) and (15). We limit our attention to

(18) $(\Delta Q^2_1,\ p_1(j))$

with $p_1(1) = s_{.1}/s$. In spite of stochastic independence, the laws of the samples to be considered are fixed by (16) and (17). In fact, if we suppose we have observed one individual with $P_1$, the individuals with this property become $s_{.1} - 1$ and the total of individuals in the sample $s - 1$. It follows that after such an observation the law of the unobserved part of the sample is no longer (16), but

(19) $(\Delta Q^2_{s-1},\ p_1(j))$

where $p_1(j) = 0$ for $j \neq s_{.1} - 1$ and $p_1(j) = 1$ for $j = s_{.1} - 1$.
The 1-instance law to be used in order to determine the probability that the second individual to be observed has $P_1$ is as (18), but with $p_1(1) = (s_{.1} - 1)/(s - 1)$. Analogously we will have an $(s-2)$-instance law, an $(s-3)$-instance law, and so on, until we obtain the 1-instance law we are interested in. This is the following:

(20) $(Q^2,\ p_1(l))$

where $Q^2$ is the set of all the relative frequencies compatible with the limit fixed for $l$, and

$p_1(l) = \dbinom{s_{.1}}{l} \dbinom{s_{.2}}{s_{1.} - l} \bigg/ \dbinom{s}{s_{1.}}.$
On the basis of this law the acceptability of the hypothesis of an association between the considered properties can be evaluated.
In general terms, the core of the preceding argument can be outlined as follows. We have a corpus of experimental data, provided by the observation of a sample of s elements taken from the population, and k properties are taken into account. On the basis of the relative frequencies of the properties we can state a 1-instance (empirical) law of the form

(21) $(\Delta Q^k_1,\ p^E(j))$

where $p^E(j) = s_j/s$ is the relative frequency of individuals bearing $P_j$ in the sample. We then formulate a (theoretical) 1-instance law (the null hypothesis) of the form

(22) $(\Delta Q^k_1,\ p^T(j))$
that we want to test for acceptance or rejection. To do this, we must compare (22) with (21). In view of the fact that an empirical distribution given by a sample of s elements represents one of the possible members of the support of an s-instance law, first we must determine, by means of (22), the s-instance law relative to k properties

(23) $(\Delta Q^k_s,\ p^T(j))$

and next we evaluate the conformity of (22) to (21) by means of the probability that (23) assigns to the observed sample $(p^E(j),\ j \in \mathbb{N}_k) = (s_j/s,\ j \in \mathbb{N}_k) \in \Delta Q^k_s$. If such a probability is sufficiently small, (22) will be rejected, otherwise it will be regarded as corroborated.
This represents an example of hypothetico-deductive statistical inference, where a loose criterion of falsification is adopted, according to which a hypothesis (the null hypothesis) is falsified (rejected) if the probability of the empirical distribution, determined on the basis of it, is sufficiently small. In the example two interpretations of laws occur, namely 1-instance and s-instance. It is worth mentioning that other tests of significance, like those based on $\chi^2$, make use of ∞-instance laws.


4. HYPOTHETICO-INDUCTIVE INFERENCES

As we have already said, Bayesian inference can be seen as hypothetico-inductive in character.6 Its hypothetical aspect is represented by the choice of an initial law assigning 0 probability to all distribution functions except for those belonging to a certain family. Its inductive aspect is represented by the shift from an initial to a final law. One should note that a Bayesian inference usually starts with an ∞-instance law, to end up with a law of the same kind. However, it makes use of 1-instance laws to determine the influence of experimental results on the initial law.
Let us suppose that a family with continuously many attributes is considered, for instance those connected with the measurement of a given length. We suppose also that the result of such measurement can be any real number, so that the properties to be considered are all members of $\mathbb{R}$. After having performed s measurements, we have the s-tuple $(x_1, \dots, x_s)$. The distribution we start with is relative to all distribution functions, that is to all the possible compositions, of an infinite population with respect to continuously many properties. We now assume as a hypothesis that this distribution assigns 0 probability to all distribution functions except for normal distributions with variance $\sigma^2$. In this way all distributions can be represented by the set of all possible mean values, that is by the set of real numbers. The initial (continuous) ∞-instance law we start with will then be

(24) $(\mathbb{R},\ p^I(j)), \quad j \in \mathbb{R}$

where $p^I(j) = (2\pi\sigma_I^2)^{-1/2} \exp(-(j - \mu_I)^2 / 2\sigma_I^2)$.
Through Bayes' theorem we determine the final distribution

(25) $p^F(j) = (2\pi\sigma_F^2)^{-1/2} \exp(-(j - \mu_F)^2 / 2\sigma_F^2)$

where

$\mu_F = \frac{\sum_{i=1}^{s} x_i/\sigma^2 + \mu_I/\sigma_I^2}{s/\sigma^2 + 1/\sigma_I^2}, \qquad \sigma_F^2 = \frac{1}{s/\sigma^2 + 1/\sigma_I^2}.$

Then, the final continuous ∞-instance law is obtained:

(26) $(\mathbb{R},\ p^F(j)), \quad j \in \mathbb{R}$.

With respect to this law, the possible compositions of the population having non-0 density are all and only the normal ones with variance $\sigma^2$.


All the remaining compositions have density 0 for the initial law, and have the same density for the final law. This should clarify the role played by the hypothesis that has been formulated as to the composition of the population. In fact, if such a composition actually conforms to the hypothesis, the inference will tell us more about the composition itself. If, on the contrary, the population is not normal with variance $\sigma^2$, the inference will not modify our knowledge about its composition.
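The update (24)-(26) is easy to state in code. The sketch below assumes, as in the text, a known variance and a normal initial law; the numerical values of the initial parameters and of the measurements are invented for illustration.

```python
# A minimal sketch of the normal-mean update of (24)-(26): known
# variance sigma2, normal initial law with parameters mu_i, sigma2_i.
def posterior_normal_mean(xs, sigma2, mu_i, sigma2_i):
    """Conjugate update: returns (mu_F, sigma2_F) of the final law (26)."""
    s = len(xs)
    precision = s / sigma2 + 1.0 / sigma2_i        # s/sigma^2 + 1/sigma_I^2
    mu_f = (sum(xs) / sigma2 + mu_i / sigma2_i) / precision
    return mu_f, 1.0 / precision

measurements = [10.2, 9.8, 10.1, 10.0, 9.9]   # s measurements of a length
mu_f, s2_f = posterior_normal_mean(measurements, sigma2=0.04,
                                   mu_i=10.0, sigma2_i=1.0)
print(f"final law (26): normal with mean {mu_f:.4f} and variance {s2_f:.5f}")
```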
In some cases it is possible to eliminate the hypothetical aspect of a Bayesian inference. This is the case, for example, when a family of two properties is considered. As an instance, we mention the well known Laplace's rule of succession. All the possible compositions of an infinite population with regard to two properties are given by the real numbers in the interval [0, 1]. Let us suppose that s observations have been performed, so as to give the pair $(s_1, s_2)$. Let the initial finite ∞-instance law we start with be

(27) $([0, 1],\ p^I(j)), \quad j \in [0, 1]$

where $p^I(j) = 1$.
Through Bayes' theorem we determine the final distribution

(28) $p^F(j) = j^{s_1}(1 - j)^{s_2} \Big/ \int_0^1 j^{s_1}(1 - j)^{s_2}\, dj,$

by means of which we obtain the final finite ∞-instance law

(29) $([0, 1],\ p^F(j)), \quad j \in [0, 1]$.
As we have already said, in view of the fact that all the possible compositions of the population can be taken into account, in this case it is possible to eliminate from the inference its hypothetical aspect. To this end, we need only, in this case, evaluate the probability for a generic individual to have one of the two properties, irrespective of the composition of the population. Such a probability is, with regard to the initial law,

(30) $\int_0^1 j\, p^I(j)\, dj = 1/2$
and, with regard to the final law,

(31) $\int_0^1 j\, p^F(j)\, dj = (s_1 + 1)/(s + 2)$.


The initial 1-instance law is then

(32) $(\Delta Q^2_1,\ p^I(j)), \quad j \in \mathbb{N}_2$

where $p^I(1) = 1/2$ and $p^I(2) = 1/2$, while the final 1-instance law is

(33) $(\Delta Q^2_1,\ p^F(j)), \quad j \in \mathbb{N}_2$

where $p^F(1) = (s_1 + 1)/(s + 2)$ and $p^F(2) = (s_2 + 1)/(s + 2)$.
Finally, by means of (33) it is possible to evaluate the probability of the possible compositions of an as yet unobserved sample and the law relative to it.
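The final 1-instance probability (33) is a one-line computation; the counts below are illustrative.

```python
# Laplace's rule of succession, eqs. (31) and (33): final probability that
# the next individual bears P_1 after s observations, s_1 of them positive.
from fractions import Fraction

def rule_of_succession(s1, s):
    return Fraction(s1 + 1, s + 2)

print(rule_of_succession(7, 10))   # (7+1)/(10+2) = 2/3
```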

5. INDUCTIVE INFERENCES

In this section we will present an example of purely inductive inference,


and we will see how laws are interpreted in such a context. The method
which will be considered was first devised by Johnson, and later
developed by Carnap.7 While briefly recalling it, we will try to
emphasize some of its consequences, which have often been neglected.
First of all, it should be noted that the Johnson-Carnap method aims at estimating a 1-instance law, which can then be used to estimate an s-instance law, as well as an ∞-instance one, when this is possible. To do this, it starts by formulating a prior 1-instance law. In other words, the method in question always works on a given 1-instance law, the choice of which represents the first step of the inference. Such a law has the form

(34) $(\Delta Q^k_1,\ p^I(j)), \quad j \in \mathbb{N}_k$

where $p^I(j) = \gamma_j$.
However, such a law does not remain unchanged through all the steps of the inference. In fact, a purely inductive inference modifies the law it starts with as new evidence becomes available. The way in which such modifications are performed obeys the well known conditions that Carnap called axioms of inductive logic. These conditions have as a consequence that after s observations, $s_j$ of which bear $P_j$, $j \in \mathbb{N}_k$, the 1-instance law, where $\mathbf{s} = (s_1, \dots, s_j, \dots, s_k)$ is the k-tuple of performed observations, becomes

(35) $(\Delta Q^k_1,\ p^F(j))$

where $p^F(j) = (s_j + \gamma_j \lambda)/(s + \lambda)$.


The distribution of (35) is a weighted mean between the distribution of the a priori law (34) and the distribution of the empirical law

(36) $(\Delta Q^k_1,\ p^E(j))$

where $p^E(j) = s_j/s$, whose weights are $\lambda/(s + \lambda)$ and $s/(s + \lambda)$ respectively.
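The rule (35), and its reading as a weighted mean of (34) and (36), can be checked numerically; the value of λ and the counts below are illustrative choices of ours.

```python
# The Johnson-Carnap rule (35): p_F(j) = (s_j + gamma_j * lam) / (s + lam),
# a weighted mean of the prior law (34) and the empirical law (36).
def carnap_update(counts, gammas, lam):
    s = sum(counts)
    return [(s_j + g_j * lam) / (s + lam) for s_j, g_j in zip(counts, gammas)]

counts = [6, 3, 1]          # s_j: observed individuals bearing P_j
gammas = [1/3, 1/3, 1/3]    # prior distribution p_I(j) = gamma_j of (34)
print(carnap_update(counts, gammas, lam=2.0))
# Identical to lam/(s+lam) * gamma_j + s/(s+lam) * (s_j/s) for each j.
```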
The cardinality of the population has no role in estimating a 1-instance law. On the other hand, the cardinality becomes important if we are estimating an s-instance law. Let us suppose first that the cardinality of the population is N + s and that we have observed a sample of s individuals. In this case we estimate the N-instance law as

(37) $(\Delta Q^k_N,\ p^F(N_1, \dots, N_j, \dots, N_k))$

where

$p^F(N_1, \dots, N_j, \dots, N_k) = \frac{N!}{N_1! \cdots N_j! \cdots N_k!} \cdot \frac{\Gamma(\lambda + s)}{\Gamma(\lambda + s + N)} \prod_{j=1}^{k} \frac{\Gamma(\gamma_j \lambda + s_j + N_j)}{\Gamma(\gamma_j \lambda + s_j)}.$

The estimation is performed using (35) and the multiplicative axiom. It is worth noting that, irrespective of the value $p^F(j)$ of (35), if N is sufficiently large, then the values of (37) are all very small. But this does not mean that they are on a par. If we make a suitable choice of $\lambda$ there can also be large differences among them.
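Formula (37) can be evaluated stably through log-gamma functions. Reading (37) as what is nowadays called a Dirichlet-multinomial predictive probability is our gloss, and the numerical values are invented.

```python
# A hedged sketch of the N-instance estimate (37), computed via lgamma
# to avoid overflowing the factorials and Gamma functions.
from math import lgamma, exp

def p_F(Ns, counts, gammas, lam):
    """Probability (37) of a future composition (N_1, ..., N_k)."""
    N, s = sum(Ns), sum(counts)
    log_p = lgamma(N + 1) - sum(lgamma(n + 1) for n in Ns)
    log_p += lgamma(lam + s) - lgamma(lam + s + N)
    for n_j, s_j, g_j in zip(Ns, counts, gammas):
        log_p += lgamma(g_j * lam + s_j + n_j) - lgamma(g_j * lam + s_j)
    return exp(log_p)

# After observing the counts (6, 4) with gamma = (1/2, 1/2) and lambda = 2,
# the probability of a further population of N = 4 with composition (3, 1):
print(p_F([3, 1], [6, 4], [0.5, 0.5], lam=2.0))
```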
Let us now suppose that the population we are considering contains countably many individuals. In this case we have a continuum of possible populations. Each population is represented by a k-tuple of real numbers $(r_1, \dots, r_j, \dots, r_k)$, $r_j \in [0, 1]$, namely the limit of the relative frequency of individuals bearing $P_j$. In dealing with an infinite number of individuals we must take densities into account. To this end, we consider the probabilities of relative frequencies of $P_j$ in the possible populations. That is, we consider the random variable whose values are in $\Delta Q^k_N$. Let $X = (X_1, \dots, X_j, \dots, X_k)$ be this random variable, $\mathscr{C}_N(X)$ its distribution and $F_N(x)$ its distribution function. If $N \to \infty$, then


$\mathscr{C}_N(X) \to 0$ for every X. But if we consider $F_N(x)$ we see that

(i) the $F_N(x)$ converge towards a unique function $F(x)$;
(ii) $F(x)$ is a Dirichlet distribution, i.e., a distribution function whose density is

(38) $f(x_1, \dots, x_j, \dots, x_k) = (B(\gamma_1\lambda, \dots, \gamma_j\lambda, \dots, \gamma_k\lambda))^{-1}\, x_1^{\gamma_1\lambda - 1} \cdots x_j^{\gamma_j\lambda - 1} \cdots x_k^{\gamma_k\lambda - 1}, \qquad \sum_i x_i = 1.$

The convergence towards 0 of the probability of each composition of the population is a very natural consequence of the fact that we are performing an estimation related to a continuum of possible compositions. It is useful to notice what happens if we take into account an observed sample described by $(s_1, \dots, s_j, \dots, s_k)$. In this case (38) becomes

$f(x_1, \dots, x_j, \dots, x_k \mid s_1, \dots, s_j, \dots, s_k) = (B(\gamma_1\lambda + s_1, \dots, \gamma_j\lambda + s_j, \dots, \gamma_k\lambda + s_k))^{-1}\, x_1^{\gamma_1\lambda + s_1 - 1} \cdots x_k^{\gamma_k\lambda + s_k - 1}, \qquad \sum_i x_i = 1,$
and again the values of the densities for various $(x_1, \dots, x_j, \dots, x_k)$ are not all on a par. This means that also in relation to a purely inductive estimate of an ∞-instance law, evidence plays an important role, as it allows us to assign a higher density to some compositions than to others.
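The prior and posterior densities of (38) can be compared directly; the use of scipy and all parameter values are illustrative choices of ours.

```python
# A sketch of the Dirichlet density (38) before and after conditioning on
# an observed sample (s_1, ..., s_k); the parameters are alpha_j = gamma_j * lam.
from scipy.stats import dirichlet

gammas = [0.25, 0.25, 0.5]   # gamma_j of the prior 1-instance law (34)
lam = 4.0                    # Carnap's lambda
counts = [6, 3, 1]           # s_j: observed individuals bearing P_j

alpha_prior = [g * lam for g in gammas]
alpha_post = [a + s_j for a, s_j in zip(alpha_prior, counts)]

x = [0.5, 0.3, 0.2]          # a candidate composition (x_1, x_2, x_3)
print("prior density:    ", dirichlet.pdf(x, alpha_prior))
print("posterior density:", dirichlet.pdf(x, alpha_post))
```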
The example is meant to show how, starting with a 1-instance law, another 1-instance law can be arrived at by a purely inductive inference. In some cases the estimate can be extended to an ∞-instance law by means of the rules of probability calculus.

6. CONCLUDING REMARKS

The preceding considerations were intended to clarify the different features characterizing methods of statistical inference. In particular, we have seen that in the case of a purely inductive inference we estimate a distribution, while in the case of a Bayesian inference we estimate a parameter, after having assumed a family of distributions. On the other hand, in the case of tests of significance a theoretical distribution is compared with an empirical one, with the aim of either rejecting or corroborating it.
Being so different in kind, such methods seem to apply to different contexts. The distinction between the "context of estimation" and the "context of testing" traced in section 1 represents a first, very broad distinction between the proper contexts of application of inductivist and deductivist methods. In the context of estimation, inferences of two different kinds have also been distinguished, i.e., purely inductive and hypothetico-inductive inferences. We can say that, in general, these inferences also apply to different contexts, as they seem to be based on different bodies of information, and they make different assumptions. In general, the choice of an inductive method will be inspired by the information available, or by the lack of information, and, in particular, it will be conditioned by the possibility (or impossibility) of formulating hypotheses as to the law of the population. It should be noted that inductive and Bayesian inferences qualify as self-corrective, though in different ways. The situation is quite different with hypothetico-deductive inferences, since they involve a yes-or-no kind of response, not a gradual adjustment of an estimate as observational evidence increases.
Going back to our distinction between context of estimation and context of testing, and keeping in mind the further distinctions we have made, we can say that a central feature of the former is the fact that the initial distribution is introduced with the aim of modifying it. Once an initial distribution has been chosen, we modify it according to observations, obtaining a sequence of distributions by adjusting each distribution as evidence increases. The passage from one distribution to another is precisely what characterizes the context of estimation. On the other hand, in the context of testing the law we start with is not modified, but rather corroborated or falsified. In other words, it remains unchanged until it is found to be in contrast with experimental results.
The different features of statistical inferences that have been stressed
seem to lead to a conclusion, which was anticipated at the beginning of
these pages, namely that induction and deduction qualify as equally
important components of statistical methodology, which should be
considered complementary, rather than opposed. At the moment, we
do not have much to say about the ways in which such methods combine in scientific research, and on the conditions underlying the choice of one method rather than another. However, we regard it as an
important issue for further study. Very probably the quantity and the
kind of information available play a determining role in such a choice.
A component of the information which is very important in statistics,
namely the size of the sample, should perhaps be evaluated in this
connection. Even if this idea should be dealt with in more detail, it
seems interesting to relate the size of the sample to the kind of
assumptions underlying various methods of inference. It seems, after
all, quite natural that when a small sample is available the hypotheses
one relies upon do not regard the distribution, but only the way of
learning from experience, while a larger sample allows for hypotheses
about the distribution. It seems likewise natural that when the size of the
sample is very large, inductive methods are no longer essential, and
deductive methods of inference can be adopted. Clearly, a complex set
of factors both historical and pragmatic in nature enter into the choice
of different methods of statistical inference, as well as the relationships
between context of estimation and context of testing. In our opinion,
such factors appear worthy of more detailed investigation.

APPENDIX

In what follows we give an alternative definition of 1-instance and s-instance laws making use of a slightly different symbolism, along the lines suggested by Theo Kuipers, whom we warmly thank.
Let $a_1, a_2, \dots, a_N$ stand for the individuals (random variables) in the population, $F = \{P_j;\ j \in \mathbb{N}_k\}$ for the family of properties and $p(\cdot \mid \cdot)$ for a relative probability.

$F(i) = \{P_j a_i;\ j \in \mathbb{N}_k\}$

and

$F^{(s)} = F(1) \times \cdots \times F(s).$

$E_s \in F^{(s)}$ is a state description of length s.

$s_j(E_s) = s_j$ is the number of individuals in $E_s$ bearing $P_j$.

$D_s$ is a structure description when it is a set of state descriptions isomorphic to a given one.


Such a structure description is symbolized by the k-tuple $\mathbf{s} = (s_1, \dots, s_j, \dots, s_k)$, $s_j \geq 0$ for all $j$, and $\sum_{j=1}^{k} s_j = s$.

$\mathbb{N}^{k,s} \subseteq \mathbb{N}^k$ is the set of all the k-tuples of natural numbers which are less than or equal to s and whose sum is s.

If $\mathbf{s} \in \mathbb{N}^{k,s}$, then $\mathbf{s}^{(j)} = (s_1, \dots, s_j + 1, \dots, s_k) \in \mathbb{N}^{k,s+1}$, $\mathbf{s}_0^{(j)} = s_0^j \in \{s_0^j;\ j \in \mathbb{N}_k\}$, $\mathbf{s}_0 = (0, \dots, 0, \dots, 0)$.

When $s > 0$, $Q^{k,s} \subseteq Q^k$ is the set of all k-tuples of fractions whose denominators are equal to s and whose sum is 1.

A finite 1-instance law is an ordered pair

$(Q^{k,1},\ p(H \mid \mathbf{s}_0)), \quad H \in Q^{k,1}.$

A finite s-instance law is an ordered pair

$(Q^{k,s},\ p(H \mid \mathbf{s}_0)), \quad H \in Q^{k,s}.$
Likewise denumerable and continuous laws can be defined.

NOTES

1 See R. Carnap: 1962, Logical Foundations of Probability, Chicago, and K. R. Popper: 1965, The Logic of Scientific Discovery, New York. See also K. R. Popper, 'The Demarcation Between Science and Metaphysics', and R. Carnap, 'Replies', in P. A. Schilpp (ed.), The Philosophy of Rudolf Carnap, La Salle, Ill., 1963.
2 To answer a possible objection, which was in fact raised during the debate in Siena, we stress that our analysis is concerned only with statistical methods, which are essentially ampliative in kind. Thus, our distinction between inductive and deductive methods in statistical analysis is not to be taken as a distinction between non-demonstrative and demonstrative, but rather between probabilistic and non-probabilistic methods. Some interesting remarks on the nature of statistical methods are to be found in I. Scardovi: 1984, 'A proposito di conoscenza e strategia', Statistica XLIV, 131-156.
3 In this section we adopted the symbols and concepts used by Carnap. See especially R. Carnap, 'A Basic System of Inductive Logic', Part 1 in R. Carnap and R. C. Jeffrey (eds.), Studies in Inductive Logic and Probability, vol. I, Berkeley, 1971, and Part 2 in R. C. Jeffrey (ed.), Studies in Inductive Logic and Probability, vol. II, Berkeley, 1980.
4 For the sake of simplicity only prior probabilities are considered.
5 See R. A. Fisher, Statistical Methods for Research Workers (1925), 14th edition, New York-London, 1970, pp. 96-97.
6 Theo Kuipers pointed out that Hintikka's system of inductive logic is also of the kind we call hypothetico-inductive. In the present paper, however, we limit our attention to those methods of inference which are actually used in statistical research.
7 See W. E. Johnson, 'Probability: The Relation of Proposal to Supposal', 'Probability: Axioms', 'Probability: The Deductive and Inductive Problems', Mind XLI (1932), 1-16, 281-296, 409-423; and Logic, Dover, 1924. See also R. Carnap: 1945, 'On Inductive Logic', Philosophy of Science 12, 72-97.

Manuscript received 8 May 1985

University of Bologna

Dipartimento di Scienze
Statistiche "Paolo Fortunati"
Via Belle Arti 41
I-40126 Bologna
Italy

University of Bologna

Dipartimento di Filosofia
Via Zamboni 38
I-40126 Bologna
Italy
