
More Trouble for Regular Probabilities

Matthew W. Parker

Many have suggested that probabilities should be regular, meaning roughly that only impossible events should be assigned probability zero (Kemeny 1955, Shimony 1955, Jeffreys 1961, Edwards et al. 1963, Carnap 1963, De Finetti 1964, Stalnaker 1970, Lewis 1980, Skyrms 1980, Appiah 1985, Jackson 1987, Jeffrey 1992). For objective chances, regularity encodes the seemingly sensible principle that what is possible is more likely than what is impossible. For credences it represents a reasonable willingness to update one's expectations in light of evidence (since credences of zero cannot be modified by Bayesian updating). But regularity faces a number of mathematical and philosophical obstacles. Several authors have attempted to overcome the technical obstacles by letting probabilities take values in the non-standard real numbers, or hyperreals, which include the real numbers, infinitesimals, and their sums (Bernstein and Wattenberg 1969, Lewis 1980, 1981, Nelson 1987, Benci, Horsten, and Wenmackers 2011). However, Timothy Williamson (2007) has shown that this in itself does not solve all of the problems with regularity. Here we expose another problem, related to Williamson's, but with broader scope and immune to some of the objections that might be brought against Williamson. Briefly, regularity implies certain violations of rigid transformation invariance, and hence, regular probabilities cannot be uniform in the fullest sense.

Let us call S = ⟨Ω, A, P⟩ a probability space, and P a probability function, if

(i) A ⊆ 𝒫(Ω) is an algebra of sets (where 𝒫(A) is the power set {B: B ⊆ A}),
(ii) P maps A into an ordered field F = ⟨F, +, ·, 0, 1, <⟩,
(iii) for all A ∈ A, 0 ≤ P(A) ≤ 1,
(iv) P(Ω) = 1, and
(v) P is at least finitely additive, i.e., for all disjoint A, B ∈ A, P(A ∪ B) = P(A) + P(B).
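To fix ideas, here is a small sanity check of conditions (i)-(v) in Python (my own illustration, not part of the paper), for a four-point Ω with the full power set as A and rational values standing in for the ordered field F; the hyperreal-valued ranges that motivate the definition are not modelled here:

```python
from fractions import Fraction
from itertools import chain, combinations

Omega = frozenset({1, 2, 3, 4})
# (i) take A to be the full power set of Omega, trivially an algebra
algebra = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(Omega), k) for k in range(len(Omega) + 1))]

def P(A):
    return Fraction(len(A), len(Omega))   # uniform, valued in the field Q

assert all(Omega - A in algebra for A in algebra)               # closed under complement
assert all(A | B in algebra for A in algebra for B in algebra)  # closed under union
assert all(0 <= P(A) <= 1 for A in algebra)                     # (iii)
assert P(Omega) == 1                                            # (iv)
assert all(P(A | B) == P(A) + P(B)                              # (v) finite additivity
           for A in algebra for B in algebra if A.isdisjoint(B))
```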

Unlike the standard Kolmogorov definition, this does not restrict probabilities to the standard real interval [0, 1], but allows them to fall between the identity elements of any ordered field, including a hyperreal field. It also permits probability functions that are not countably additive, but only finitely additive. Of course, we do not rule out countable additivity, and below we will always use 'finitely additive' to mean at least finitely additive, without ruling out stronger properties.

Regularity may take stronger and weaker forms. Let's say that a probability space S, and by extension its probability function P, is weakly regular if for all A ∈ A, if P(A) = 0 then A = ∅. Say S and P are strongly regular if they are weakly regular and A = 𝒫(Ω). In applications, Ω represents the set of possible outcomes of some experiment or process, so if S is only weakly regular but not strongly, there are some sets of possible outcomes that do not have positive probability, because they do not have probabilities at all. Hájek (2012) gives several arguments that rational credences need not always be strongly regular, and in some cases shouldn't be.¹ But it is weak regularity that mainly concerns us here, for provided A contains certain simple, bounded, Lebesgue-measurable point sets, weak regularity is enough to contradict rigid transformation invariance.

Assuming finite additivity or better, weak regularity is equivalent to Euclideanism. Say a probability function P is Euclidean if for all A, B ∈ A = dom(P), if A ⊂ B (where ⊂ denotes the proper subset relation) then P(A) < P(B).² The name derives from Euclidean notions of set size, i.e., those that satisfy the Elements' Common Notion 5, 'The whole is greater than the part' (Katz 1981, Mayberry 2000, Benci and Di Nasso 2003, Benci, Di Nasso, and Forti 2007, Mancosu 2009, Parker 2009, 2012). Benci, Horsten, and Wenmackers (2011; Wenmackers and Horsten 2010) highlight the connection between infinitesimal probabilities and Euclidean notions of set size. They have introduced a theory of Non-Archimedean Probabilities (NAPs) in which P(A | B) = n(A)/n(B), where n is a numerosity function, a Euclidean measure of set size with a range that forms an ordered semi-ring (Benci and Di Nasso 2003, Benci et al. 2007). NAPs have the nice feature that they apply to all subsets of a continuum, not just the Lebesgue-measurable or Borel subsets. In fact, a total finitely additive probability function with standard real values can be obtained by taking the real parts of Non-Archimedean Probabilities (Benci et al. 2011).

¹ Hájek also gives one argument that can be directed against weak regularity, at least for probabilities that are defined over all singletons and infinitesimal sub-intervals in a hyperreal interval. We will not take up that argument here, but it is worth noting that, contra Hájek, there is no particular difficulty in giving Kolmogorov-style axioms of probability that allow probability functions to take a variety of different ranges besides those included in the standard real numbers. This is just what we have done here, and what Benci, Horsten, and Wenmackers (2011) do with a stronger, transfinite additivity property.

² Proof: Suppose P is weakly regular, A, B ∈ A, and A ⊂ B. Since A is an algebra, B \ A ∈ A. Since B \ A is nonempty, regularity implies that P(B \ A) > 0. By additivity, P(B) = P(A ∪ (B \ A)) = P(A) + P(B \ A) > P(A). So P is Euclidean. Conversely, if P is Euclidean and ∅ ≠ A ∈ A, then ∅ ⊂ A, so P(∅) < P(A); since additivity gives P(∅) = P(∅) + P(∅), and hence P(∅) = 0, it follows that P(A) > 0. Hence P is weakly regular.

Parker (2012) discusses the limitations of Euclidean set sizes due to their unavoidable arbitrariness and violations of rigid transformation invariance. For any Euclidean measure of size, some sets that are exactly alike in structure but positioned differently in a background space must have different sizes. Even Euclidean relations of larger, smaller, and equal are not preserved by translations or rotations. In that paper it is remarked that Euclidean sizes may nonetheless have useful applications in areas including probability theory. Here we will see that some of the limitations of Euclidean set sizes are also significant limitations on Euclidean or regular probabilities. They imply that the attractive symmetries normally afforded by a uniform distribution are not possible for regular probabilities.
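The tension between Euclidean sizes and real-valued measures can be seen in a toy comparison (my illustration, in the spirit of numerosity theory rather than the NAP construction itself): on every finite truncation the even numbers E outnumber E \ {0} by exactly one, but the real-valued densities of the two sets converge to the same limit, so a whole-exceeds-part difference can survive only as an infinitesimal, and taking real parts erases it.

```python
from fractions import Fraction

E = set(range(0, 1000, 2))       # the evens (up to a bound, for illustration)
E0 = E - {0}                     # a proper subset

def count(A, n):
    """Euclidean (counting) size of A within the truncation {0, ..., n-1}."""
    return sum(1 for k in range(n) if k in A)

for n in (10, 100, 1000):
    cE, cE0 = count(E, n), count(E0, n)
    assert cE == cE0 + 1         # whole > part at every stage of truncation
    print(n, Fraction(cE, n), Fraction(cE0, n))  # densities: both approach 1/2
```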

The uniform distribution over a bounded region Ω of R^n is the normalized Lebesgue measure Pu(A) = λ(A)/λ(Ω), where λ is Lebesgue measure, on the Lebesgue measurable subsets A of Ω. All measure-zero sets are assigned probability zero, and sets that are not Lebesgue measurable get no probability at all. The term 'uniform' here is justified by the fact that for each ε > 0, Pu assigns the same probability to all open balls B(x, ε) ⊆ Ω. More generally, Pu is invariant under isometries (distance-preserving transformations) on R^n: for any isometry T: D ⊆ R^n → R^n and any Lebesgue measurable A ⊆ Ω, if TA ⊆ Ω then Pu(TA) = Pu(A). Consequently, Pu assigns the same probabilities to any two geometrically congruent point sets as well as their reflections, provided the sets are measurable.

Uniform distributions are standard examples in probability theory, and they are especially popular as prior probabilities, since they are thought by some to represent complete ignorance. This has sometimes been justified by appeal to the much maligned Principle of Indifference (Keynes 1921), or to the more sophisticated Maximum Entropy Principle (Jaynes 1968). However, these principles do not pick out the uniform distribution uniquely unless we presuppose a privileged coordinate system or background measure (Uffink 1995). Jaynes (1973) argues that the choice of prior probabilities can be narrowed or entirely determined by symmetry considerations. For example, if a problem concerning a probability distribution over a dartboard does not state the position or orientation of the dartboard, then in order for the problem to have a unique solution, we must assume that the solution is invariant under translations and rotations. Jaynes also argues that such considerations are likely to yield accurate predictions in physical applications, because distributions that have such symmetries require less 'skill' (Jaynes's scare quotes). Jaynes's views are certainly debatable, but they illustrate some of the special uses of uniform distributions and rigid transformation invariance. In any case, we normally expect it to be at least possible to assign rational credences or generate objective chances in such a way that isometric point sets get the same value. But a regular probability function on an infinite domain cannot be uniform in this sense.
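A quick Monte Carlo illustration of this isometry invariance (my sketch, not the paper's; the ball centre, radius, and rotation angle are arbitrary choices): congruent balls in the unit disk receive the same Pu, up to sampling error.

```python
import math
import random

random.seed(0)

def sample_disk():
    """Draw a point uniformly from the unit disk by rejection sampling."""
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return (x, y)

center = (0.4, 0.0)
theta = 1.0  # rotate the centre by one radian about the origin
rotated = (0.4 * math.cos(theta), 0.4 * math.sin(theta))

pts = [sample_disk() for _ in range(100_000)]
p1 = sum(math.dist(p, center) <= 0.1 for p in pts) / len(pts)
p2 = sum(math.dist(p, rotated) <= 0.1 for p in pts) / len(pts)
print(p1, p2)   # both estimate (0.1/1)^2 = 0.01
```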

Consider for example the radii of a circular dartboard with radius one. For every θ ∈ R, let r_θ be the radius at an angle of θ radians to the right-hand horizontal radius. Let R_θ = R_{θ (mod 2π)} = ∪{r_{θ+n}: n ∈ N}. Then each R_θ is the union of a countable infinity of distinct radii, and for each n ∈ N, R_{θ+n} ⊂ R_θ; e.g., R_1 ⊂ R_0. Now let P(R_θ) denote the probability that an idealised dart with a point-like tip, thrown by a very poor player, but which nonetheless manages to hit the board, first strikes a point in R_θ. If P is regular then it is also Euclidean, so P(R_1) < P(R_0). But R_1 is a mere rotation of R_0. So it's impossible to have a regular distribution on the dartboard that's indifferent to rotations. If physical chances really are regular, then the dart must somehow discriminate between point sets that are exactly alike except for an angle of rotation. It's impossible, if regularists are right, to throw darts so badly that they don't discriminate in this way. And if rational credences really should be regular, credences too must so discriminate. In contrast, the usual uniform distribution Pu on the dartboard is rotation invariant, but not regular; in particular, Pu(R_θ) = Pu(R_θ′) = 0 for all θ, θ′ ∈ R.
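A finite window into this construction may help (my illustration; the key fact, that the angles n mod 2π are pairwise distinct because one radian is incommensurable with a full turn, lets us represent the radius r_n by its integer index n):

```python
import math

# Rotation by one radian sends r_n to r_(n+1), so within the window
# {r_0, ..., r_N} the image of R_0 is the truncation of R_1, which omits r_0.
N = 1000
R0 = set(range(N + 1))                 # indices of r_0, ..., r_N
R1 = {n + 1 for n in range(N)}         # image of r_0, ..., r_(N-1) under rotation
assert R1 < R0 and 0 not in R1         # proper inclusion: r_0 is the witness

# Sanity check: the first hundred angles n mod 2*pi really are distinct points.
angles = [n % (2 * math.pi) for n in range(100)]
assert len({round(a, 9) for a in angles}) == len(angles)
```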
Defenders of regularity and Euclideanism might find such rotation dependence perfectly natural. After all, R_0 contains all of the points in R_1 and others as well, so from a Euclidean point of view, R_0 ought to be assigned higher probability. But as we have just seen, what seems natural from a regularist or Euclidean perspective has strange and limiting consequences. Objective Bayesians like Jaynes would say that if we lack other information, then rationally, we must adopt rotationally symmetric probabilities for our dart experiment. But even if we don't accept Jaynes's arguments, we can surely imagine situations where rotationally symmetric probabilities are appropriate. Suppose a needle rotates at a constant speed for several revolutions. Elsewhere, in a small vacuum, pairs of particles spontaneously appear and annihilate each other. What is the probability that one such vacuum fluctuation begins when our needle is at an angle in a set A? Other things being equal, shouldn't it be the same as the probability that a fluctuation begins when the needle is at an angle in A + ε = {θ + ε: θ ∈ A}? But given regularity, this can't be so for all sets A ⊆ [0, 2π).

But it gets worse. Not only rotations on a dartboard or spinner, but even translations and reflections on an interval fail to preserve Euclidean probabilities. Let us say T: [0, 1) → [0, 1) is a two-piece translation on [0, 1) if Tx = x + c (mod 1) = (x + c if x + c < 1, x + c − 1 otherwise) for some c ∈ [0, 1). So a two-piece translation is a piecewise translation, so to speak, made up of two translations T|[0, 1 − c) and T|[1 − c, 1). Like rotations of the circle, two-piece translations on [0, 1) generally fail to preserve regular probabilities. For let Tx = x + c (mod 1) with c irrational. Let S_x = {x + nc (mod 1): n ∈ N} for all x. Then TS_x = {x + (n + 1)c (mod 1): n ∈ N} ⊂ S_x. If P is regular, then it is Euclidean, so if P(S_x) and P(TS_x) are defined, P(TS_x) < P(S_x). Hence T does not preserve P.

This implies that ordinary translations don't preserve regular probabilities either. Assume that P([0, 1 − c)) is also defined. Since P(S_x) ≠ P(TS_x), either P(T|[0, 1 − c)(S_x|[0, 1 − c))) ≠ P(S_x|[0, 1 − c)) or P(T|[1 − c, 1)(S_x|[1 − c, 1))) ≠ P(S_x|[1 − c, 1)), by additivity. (Here A|B = A ∩ B if A is a set and not a function.) So at least one of the translations T|[0, 1 − c), T|[1 − c, 1) fails to preserve P. Thus translations do not preserve weakly regular probabilities that are defined on simple sets like S_x, TS_x, and [0, 1 − c).
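A small numerical sketch of the orbit argument (my illustration, taking c = √2 − 1 as the irrational shift; floating point can only approximate the exact claim, which rests on the irrationality of c):

```python
import math

c = math.sqrt(2) - 1.0   # an irrational shift; any irrational c in (0, 1) works
x0 = 0.1

def orbit_point(n):
    """The n-th point of S_x0 = {x0 + n*c (mod 1) : n in N}."""
    return (x0 + n * c) % 1.0

def T(x):
    """The two-piece translation Tx = x + c (mod 1)."""
    return x + c if x + c < 1.0 else x + c - 1.0

# T sends the n-th orbit point to the (n+1)-th, so T(S_x0) sits inside S_x0 ...
for n in range(10):
    assert abs(T(orbit_point(n)) - orbit_point(n + 1)) < 1e-12

# ... but x0 itself is not in T(S_x0): x0 = x0 + (n+1)*c (mod 1) would make
# (n+1)*c an integer, which is impossible for irrational c. So T(S_x0) is a
# proper subset of S_x0, and a Euclidean P must assign P(T(S_x0)) < P(S_x0).
```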

A fortiori, strongly regular probabilities on an interval are never preserved by all translations. Reflections on an interval don't preserve regular probabilities either, for every translation is a composition of two reflections. Further, if T translates a set A within [0, 1), we can easily find two reflections T1 and T2 defined on [0, 1) such that T1A ⊆ [0, 1) and T2T1A = TA.³ But if P(A) ≠ P(TA) = P(T2T1A), then at least one of the two reflections T1 or T2 fails to preserve P. Since translations within [0, 1) don't preserve regular probabilities, reflections within [0, 1) don't either.

Notice that we are no longer talking about cases where TA is a subset of A, and hence where the intuitions supporting regularity and Common Notion 5 directly justify a difference in probabilities. In fact, if P is defined on all singletons and intervals in [0, 1), and there is a set A ⊆ [0, 1) and a translation or reflection T such that P(A) ≠ P(TA), then additivity implies that there are disjoint sets B and TB ⊆ [0, 1) such that P(B) ≠ P(TB).⁴ So if a regular probability measure is defined on all singletons and intervals (as continuous probability measures normally are) then for any translation T and reflection T′, there are disjoint sets A and TA that differ in probability, and disjoint sets B and T′B that differ in probability.

³ To see this, let Tx = x + c and A ⊆ [0, 1) be such that TA ⊆ [0, 1). Choose [a, b] ⊆ [0, 1] such that A ⊆ [a, b] and T[a, b] ⊆ [0, 1). Let T1 be the reflection T1x = a + b + c/2 − x and let T2x = a + b + 3c/2 − x. Then T1[a, b] = [a + c/2, b + c/2] ⊆ [0, 1), so T1A ⊆ [0, 1), and T2T1A = TA.
⁴ Proof: Let T be a translation Tx = x + r with r ∈ (−1, 1), r ≠ 0. Suppose A, TA ⊆ [0, 1) and P(A) ≠ P(TA). Choose n ∈ N so that 1/n < |r|. Then for each x, (A ∩ [x, x + 1/n)) ∩ T(A ∩ [x, x + 1/n)) = ∅, and by finite additivity, Σ_{m ∈ {1, 2, ..., n}} P(A ∩ [(m − 1)/n, m/n)) = P(A) ≠ P(TA) = Σ_{m ∈ {1, 2, ..., n}} P(T(A ∩ [(m − 1)/n, m/n))). So for at least one m, P(A ∩ [(m − 1)/n, m/n)) ≠ P(T(A ∩ [(m − 1)/n, m/n))), and hence some translation of a subset B of A, disjoint from B, has a different probability from B. If T is a reflection Tx = (1 − x) + r, again with r ∈ (−1, 1), then T is a reflection about the fixed point (1 + r)/2, so (A ∩ [0, (1 + r)/2)) ∩ T(A ∩ [0, (1 + r)/2)) = (A ∩ ((1 + r)/2, 1)) ∩ T(A ∩ ((1 + r)/2, 1)) = ∅, and by additivity, P(A ∩ [0, (1 + r)/2)) + P(A ∩ {(1 + r)/2}) + P(A ∩ ((1 + r)/2, 1)) = P(A) ≠ P(TA) = P(T(A ∩ [0, (1 + r)/2))) + P(T(A ∩ {(1 + r)/2})) + P(T(A ∩ ((1 + r)/2, 1))). Since (1 + r)/2 = T((1 + r)/2), either P(A ∩ [0, (1 + r)/2)) ≠ P(T(A ∩ [0, (1 + r)/2))) or P(A ∩ ((1 + r)/2, 1)) ≠ P(T(A ∩ ((1 + r)/2, 1))). So in either case, we must have disjoint and isometric subsets of [0, 1) with different probabilities.

These inconvenient inequities cannot be made more palatable by pointing out that A and B include their own images, because they don't. So on the regularist or Euclidean view, it is impossible to choose a random number in the interval so that no set is privileged over any of its disjoint translations, nor over its disjoint reflections. We cannot throw a dart at a rectangular dartboard in such a way that it is as likely to hit a point with x-coordinate in a set A as in a translation or reflection TA. Likewise, if quantum fluctuations occur in a small cubic vacuum with Cartesian coordinates, there will be bounded sets A of x-coordinates such that a fluctuation is slightly more likely to occur at a point in A than in certain translations and reflections of A, and similarly there will be bounded sets B of times such that a fluctuation is more likely to occur at a time t ∈ B than at a time in certain translations and reflections of B. Furthermore, regularist arguments imply that rational priors must be regular, so certain sets of events must be given higher credence than some of their translations and reflections.
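The constructive core of the partition argument in footnote 4 can be exercised on a toy example (my illustration: the point-mass 'probability' below is deliberately translation-variant, since no real-valued measure is regular; it merely lets the pigeonhole step find a disjoint witness pair):

```python
from fractions import Fraction

w = {Fraction(1, 10): Fraction(1, 2),   # point masses on a finite support
     Fraction(4, 10): Fraction(1, 2)}

def P(A):
    return sum(p for x, p in w.items() if x in A)

r = Fraction(3, 10)                     # translation Tx = x + r
A = set(w)
assert P(A) != P({x + r for x in A})    # P is not translation invariant

n = 4                                   # choose n with 1/n < r
blocks = [{x for x in A if Fraction(m, n) <= x < Fraction(m + 1, n)}
          for m in range(n)]

# By additivity the block probabilities sum to P(A) and P(TA) respectively, so
# some block B must differ from its translate; width 1/n < r makes them disjoint.
for B in blocks:
    TB = {x + r for x in B}
    if P(B) != P(TB):
        assert B.isdisjoint(TB)
        print("witness:", sorted(B), "P(B) =", P(B), "P(TB) =", P(TB))
```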

Let us now relate these observations to the current debate on regular probabilities. Williamson (2007) considers a countably infinite sequence S of independent coin flips, and the proper subsequence S′ beginning with the second flip in S. Let us write H(S) for the proposition that a sequence S of coin flips comes out all heads. For the regularist, 0 < P(H(S)) < P(H(S′)). Williamson argues very effectively (though in a tentative, open-minded spirit) that this is mistaken. By hypothesis, the two sequences of flips are identical in their qualitative physical properties. If physical circumstances determine chances (and, by the Principal Principle, well-informed rational credences), then P(H(S)) should not differ at all from P(H(S′)), even by an infinitesimal. Likewise, if an entirely separate sequence U of coin flips begins at the same time as S′ and carries on in parallel, then P(H(U)) should be the same as P(H(S)) and P(H(S′)), provided the physical circumstances are the same in all three cases.

Weintraub (2008) responds to Williamson by pointing out that the coin flips in S′ and U occur at different times from those in S. (The former times form a proper subset of the latter.) In this way the physical circumstances are different, so there is no paradox if the respective probabilities differ. But this seems to miss an important point. It is a time-honored principle that the laws of physics do not depend on or change over time. If we find that physical systems behave differently at different times, we look for some other change in the physical circumstances besides the time itself. This may be only a methodological convention (Poincaré 1911) vulnerable to eventual rejection (Quine 1951), but it is a usefully simplifying one, and to drop it would dramatically change our picture of how the world works. It would amount to saying that the way things behave changes over time for no physical reason at all. So while Williamson's argument doesn't show that regularity is logically paradoxical, it puts the regularist in an awkward dilemma: she must either abandon the standard and sensible principle that physical laws are unchanging and time invariant, or deny that chances are determined by physical circumstances and laws.⁵
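For concreteness, here is a finite truncation of Williamson's setup (my illustration, using the fair product measure on length-n initial segments; the argument itself concerns the infinite sequences, where both probabilities shrink to zero or, for the regularist, to infinitesimals):

```python
from fractions import Fraction
from itertools import product

n = 6
P_seq = Fraction(1, 2) ** n   # each length-n outcome is equiprobable

H_S  = {s for s in product("HT", repeat=n) if all(f == "H" for f in s)}
H_S1 = {s for s in product("HT", repeat=n) if all(f == "H" for f in s[1:])}

assert H_S < H_S1             # all-heads entails heads-from-the-second-flip-on
print(len(H_S) * P_seq, "<", len(H_S1) * P_seq)   # 1/64 < 1/32
```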

Haverkamp and Schulz (2012) offer a related critique of Weintraub's argument. They argue that the same physical device or set-up ought to produce outcomes with the same probabilities regardless of when it is set running. However, there are difficulties with the idea of having the same physical device implement an infinite process at two different times. If we let the coin flips occur at shorter and shorter intervals, as Haverkamp and Schulz suggest, we lose the strong parallel between one sequence of flips and a later subsequence. The objection to Weintraub made here is more general and avoids these problems.

⁵ In response, regularists might suggest that the local laws governing individual finite-time processes are time invariant, as tradition would have it, but the derived laws governing the probabilities of infinite sequences of events are not. It is an assumption of Williamson's examples that the outcomes of the individual coin tosses are all independent and have exactly the same probability, namely one half. So the regularist must hold that even if these probabilities are determined by physical circumstances and time-invariant laws, the derived probabilities of infinite sequences are not time invariant. In particular, probabilities are partly determined by the principle of regularity itself, and under regularity, P(H(S)) must be smaller than P(H(S′)). For the regularist or Euclidean, this may seem quite natural, for the possible worlds in which H(S) holds are a proper subset of those in which H(S′) does.

But the situation is different for the physical examples presented here. These concern bounded sets of points in time and space, so they show that, in a sense, regular probabilities must violate even local invariance: even a set of possible outcomes contained in a very small region of space-time will differ in probability from a very small translation or rotation of that set. And even more strikingly, it will differ in probability from disjoint translations and rotations. This is not particularly troubling for the dart example, for there is no reason to expect that the outcomes of dart throws should be so evenly distributed that translations and rotations always preserve probability. More likely they will be prejudiced toward the centre of the dartboard (or for me, to the bottom left). But for the vacuum fluctuation examples, we can assume identical qualitative physical circumstances at each point in space and time within the vacuum in question, so if the chance of a vacuum fluctuation occurring within a given set of times or places is determined by qualitative physical circumstances and space- and time-invariant laws, then it should not violate rigid transformation invariance. If instead regularity holds, some sets of outcomes will differ in probability from disjoint rigid transformations of those sets. We cannot make such violations of space and time invariance more palatable by pointing out that the sets in question include or are included in their images, because that's just not so.

Another objection that might be brought against Williamson is that his infinite sequences of coin flips are unrealistic or even physically impossible. One cannot flip the same coin under the same circumstances infinitely many times, and arguably, one cannot perform any experiment infinitely many times. Williamson could propose ways around this difficulty, or argue that it misses the point (and I think it does), but our examples avoid it altogether. One does not need to conduct infinitely many chance experiments nor assume infinite time or space in order to show that regular probabilities violate rigid transformation invariance. We need only consider the possible outcomes of one instance of one experiment, conducted in a small region of space-time. Of course, one might argue that our quantum fluctuation examples are not realistic either, for other reasons. (Perhaps quantum fluctuations do not define exact points in space-time.) But we need only find some example of a distribution over exact real values which ought to be translation invariant, or rotation invariant, or reflection invariant, to obtain a similar argument against regularity. Regularity implies the impossibility of a rigid transformation invariant distribution of chances over exact real-valued quantities, and that is a heavy load to bear.

For rational subjective credences, regularity generates another dilemma. The regularist claims that it is irrational to be so convinced of a contingent proposition that one is willing to bet everything for an arbitrarily small return (or none at all), or to be unswayed by any amount of statistical evidence (as zero credence and Bayesianism would imply). But the objective Bayesian claims that in certain situations it is irrational to assign credences that are not translation- or rotation-invariant. They cannot both be right.

References

Appiah, A. 1985. Assertion and Conditionals. New York: Cambridge University Press.

Benci, V., and Di Nasso, M. 2003. Numerosities of labeled sets: A new way of counting. Advances in Mathematics 173: 50–67.

Benci, V., Di Nasso, M., and Forti, M. 2007. An Euclidean measure of size for mathematical universes. Logique et Analyse 50: 43–62.

Benci, V., Horsten, L., and Wenmackers, S. 2011. Non-Archimedean probability. Preprint.

Bernstein, A. R., and Wattenberg, F. 1969. Non-standard measure theory. In Applications of Model Theory to Algebra, Analysis, and Probability, W. A. J. Luxemburg, ed. New York: Holt, Rinehart and Winston.

Carnap, R. 1963. Carnap's Intellectual Autobiography. In The Philosophy of Rudolf Carnap, The Library of Living Philosophers Vol. XI, Paul Arthur Schilpp, ed. Chicago: Open Court.

De Finetti, B. 1964. Foresight: Its logical laws, its subjective sources. In Studies in Subjective Probability, H. Kyburg and H. Smokler, eds., 93–158. Huntington, NY: Krieger.

Edwards, W., Lindman, H., and Savage, L. J. 1963. Bayesian statistical inference for psychological research. Psychological Review 70: 193–242.

Hájek, A. 2012. Staying Regular? http://fitelson.org/few/hajek_paper.pdf. Retrieved 24 May.

Haverkamp, N., and Schulz, M. 2012. A note on comparative probability. Erkenntnis 76: 395–402.

Jackson, F. 1987. Conditionals. Oxford: Blackwell.

Jaynes, E. T. 1968. Prior probabilities. IEEE Transactions on Systems Science and Cybernetics 4: 227–241.

-----. 1973. The well-posed problem. Foundations of Physics 3: 477–492.

Jeffrey, R. 1992. Probability and the Art of Judgment. Cambridge: Cambridge University Press.

Jeffreys, H. 1961. Theory of Probability, 3rd ed. Oxford: Clarendon Press.

Katz, F. M. 1981. Sets and Their Sizes. Ph.D. dissertation, MIT. Available from: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.7026.

Kemeny, J. G. 1955. Fair bets and inductive probabilities. The Journal of Symbolic Logic 20: 263–273.

Keynes, J. M. 1921. A Treatise on Probability. London: Macmillan.

Lewis, D. 1980. A subjectivist's guide to objective chance. In Studies in Inductive Logic and Probability, Vol. II, R. C. Jeffrey, ed. Berkeley and Los Angeles: University of California Press.

Mancosu, P. 2009. Measuring the size of infinite collections of natural numbers: Was Cantor's theory of infinite number inevitable? Review of Symbolic Logic 2: 612–646.

Mayberry, J. 2000. The Foundations of Mathematics in the Theory of Sets. Cambridge: Cambridge University Press.

Nelson, E. 1987. Radically Elementary Probability Theory. Princeton, NJ: Princeton University Press.

Parker, M. 2009. Philosophical method and Galileo's paradox of infinity. In New Perspectives on Mathematical Practices, B. van Kerkhove, ed., 76–113. Hackensack, NJ: World Scientific.

-----. 2012. Set size and the part-whole principle. Submitted to Review of Symbolic Logic.

Poincaré, H. 1911. L'évolution des lois. Scientia 9: 275–292. Trans. in Mathematics and Science: Last Essays. New York: Dover, 1963.

Quine, W. 1951. Two dogmas of empiricism. The Philosophical Review 60: 20–43.

Shimony, A. 1955. Coherence and the axioms of confirmation. The Journal of Symbolic Logic 20: 1–28.

Skyrms, B. 1980. Causal Necessity: A Pragmatic Investigation of the Necessity of Laws. New Haven and London: Yale University Press.

Stalnaker, R. C. 1970. Probability and conditionals. Philosophy of Science 37: 64–80.

Uffink, J. 1995. Can the maximum entropy principle be explained as a consistency requirement? Studies in History and Philosophy of Modern Physics 26: 223–261.

Weintraub, R. 2008. How probable is an infinite sequence of heads? A reply to Williamson. Analysis 68: 247–250.

Wenmackers, S., and Horsten, L. 2010. Fair infinite lotteries. Synthese (Online First): 1–25. Available from: http://dx.doi.org/10.1007/s11229-010-9836-x.

Williamson, T. 2007. How probable is an infinite sequence of heads? Analysis 67: 173–180.
