A New Argument For An Old Causal Decision Theory (Full Text)
Abstract:
The debate between evidential decision theory (EDT) and causal decision theory (CDT) has been open since the appearance of examples such as Newcomb's problem and Common cause cases. Both of these examples favor CDT over EDT, and CDT was the more widely accepted theory (at least among philosophers).
Recently, important problems for CDT were presented by Andy Egan in his paper "Some
Counterexamples to Causal Decision Theory". He concluded that neither CDT nor EDT is an acceptable
basis for a theory of rational decision. More recently, Dorothy Edgington proposed a way of saving CDT from
Egan's counterexamples, while at the same time accepting some of the ideas from Egan's paper.
In this paper, I go beyond Edgington's proposal. Although I agree with most of the important points
of both Egan's and Edgington's articles, I will argue that their conclusions are not applicable generally, but
only in restricted areas, as shown in the table below.
However, I believe that the main ideas from these articles, together with some older points from
Savage's and Jeffrey's work, could form a good basis for accepting a version of CDT that has been “in the shadow”
of the Lewisian type of CDT, namely the theory proposed by Brian Skyrms in Causal Necessity. I will defend this
idea in the final part of the paper.
Introduction
Causal decision theory has gained popularity during the last two decades or more, winning the
battle against evidential decision theory. The main arguments in favor of CDT in the debate against
EDT were based on examples such as Newcomb's problem and Common cause cases. These examples
made a strong case against the strongest version of EDT, introduced by Richard Jeffrey. In section 1, I
offer a brief historical review of the discussion between EDT and CDT, analyzing the relevant
characteristics of both theories and their main pros and cons.
Section 2 covers recent developments in the discussion. Andy Egan1 reopened the discussion
with some interesting counterexamples to CDT. His article made CDT look significantly less attractive,
and one could conclude from his arguments that neither EDT nor CDT offers a satisfactory solution
for decision theory (or that EDT and CDT are equally problematic theories). Dorothy Edgington2
proposed a way of defending CDT from Egan's arguments. Her defense of CDT can be seen as an
indirect one: instead of showing that Egan's conclusions are wrong, Edgington argued that his
conclusions apply not to CDT in general, but only to a limited range of versions of CDT.
In section 3, I analyze Edgington's proposal. I will present an argument against one of her
points. At the same time, I will agree with some other points she made, arguing that, together with some
older ideas of Jeffrey's and Savage's, they could form a good basis for accepting Skyrms's version of
CDT3. His version stood slightly in the shadow of the Lewisian type of CDT in the past, but I think that the
development of the debate between EDT and CDT shows that it should be reintroduced as a legitimate
candidate for a good decision theory. (By the Lewisian type I mean theories that treat the probabilities of
subjunctive conditionals as the probabilities relevant for deciding.) In the final part (section 4) I will defend that claim.
1 Egan, Andy. 2007. “Some Counterexamples to Causal Decision Theory.” Philosophical Review 116: 93–114.
2 Edgington, Dorothy. 2011. “Conditionals, Causation, and Decision.” Analytic Philosophy 52 (2): 75–87.
3 Skyrms, Brian. 1980. Causal Necessity: A Pragmatic Investigation of the Necessity of Laws. New Haven, CT: Yale
University Press.
1. Some historical background
We should start this historical review from Savage's decision theory.4 One reason for that is the
connection between his theory and EDT (Jeffrey introduced EDT to solve troubles he saw in Savage's
work); the other is the relevance of Savage's theory for the later part of this paper.
At the start, we define a decision problem D (for an agent N) as a triple {A, S, O}, where A is the set of acts available to the agent, S is the set of possible states of the world, and O is the set of outcomes (an outcome being determined by an act and a state).
When risk is involved, the decisions of an agent must be made with respect to the probability of the
states occurring. Savage thought that we should consider the absolute probability of the states of the
world. So, every state si from the set S has some positive probability p(si). There are two principles an
agent should use in a decision situation: maximizing expected utility (EU5) and the Sure Thing
Principle. The expected utility is determined with respect to the relevant probability and the cardinal utility
of each of the outcomes6:
[Sav] EU(a) = ∑ p(si)·U(si ∧ a)
The Sure Thing Principle says that an agent should choose the dominant act if there is such an act in the set
A. From the definition of the dominant act we can infer that the dominant act, when it exists, is always the one with
the maximal EU among the possible acts (the converse does not hold, since an act with maximal EU doesn't
have to be the dominant one).
               Heads   Tails
Bet on heads     50     -50
Bet on tails   -100     100
Table 1
The probability of both states is .5, and consequently the EU of both acts is 0. That is an intuitive result: with
either act, an agent loses, half of the time, the same amount that he wins the other half of the time.
Also, if the utilities were as in this matrix:
               Heads   Tails
Bet on heads    100      50
Bet on tails     10     -10
Table 2
The dominant act is betting on heads, because in both of the relevant states we get more by betting
on heads. The Sure Thing Principle is valid in situations like this one.
But, as Jeffrey pointed out in The Logic of Decision7, there could be problems with
Savage's Sure Thing Principle (and, implicitly, with Savage's definition of expected utility) when an agent is
in a slightly different decision situation.
Marlboro man:
Take, for example, a man who is deciding whether to start smoking. He prefers smoking
over not smoking if the state of the world is such that he is going to get lung cancer, and he also
prefers smoking over not smoking if the state of the world is such that he is not going to get lung
cancer. The utilities of the outcomes are presented in Table 3:
6 For a review of Savage's theory, and of its problems, see: Joyce, James. 1999. The Foundations of Causal Decision
Theory. Cambridge: Cambridge University Press, part 3.4.
7 Jeffrey, Richard. [1965] 1983. The Logic of Decision. Second edition. Chicago: University of Chicago Press.
                  s1 [Get lung cancer]   s2 [Don't get lung cancer]
a [Smoke]                  2                        10
b [Don't smoke]            1                         8
Table 3
By the Sure Thing Principle the rational act would be [smoke], because it is the dominant act. By EU [Sav]
an agent should choose [smoke], because that act has the higher EU for any probability
distribution over the set of relevant states of the world.
However, as Jeffrey pointed out, there is a mistake in such an account. In the case of the Marlboro man,
acts may affect the probabilities of the relevant states of the world, since the chances of getting
lung cancer are higher if we smoke than if we don't. Therefore, by Jeffrey, we should consider conditional
expected utility:
[Jeff] EU(a) = ∑ p(si | a)·U(si ∧ a).8
In this particular case, it is clear that p(s1) could change
depending on which of the possible acts the agent chooses – it gets lower conditional on the agent choosing not to
smoke, and higher otherwise. Therefore, the rational act need not be the dominant one, since choosing
the dominant act in this case maximizes our chances of getting the outcome we don't desire. Savage's theory does not
detect this connection between the relevant states and the acts, so it endorses an irrational decision in
such cases. Jeffrey's proposal seemed promising, because it handled the problematic situations, and
there were no apparent troubles with the situations that Savage's theory handled well.
Soon there were troubles with Jeffrey's account too. The conditional probability that was essential to
Jeffrey's theory led to irrational decisions in some situations, such as Newcomb's problem:
An agent may choose either to take an opaque box or to take both the opaque box and
a transparent box. The transparent box contains one thousand dollars that the agent plainly
sees. The opaque box contains either nothing or one million dollars, depending on a prediction
already made. The prediction was about the agent's choice. If the prediction was that the agent
will take both boxes, then the opaque box is empty. On the other hand, if the prediction was that
the agent will take just the opaque box, then the opaque box contains a million dollars. The
prediction is reliable. The agent knows all these features of his decision problem. 11
By Jeffrey's account, an agent should take only one box, because the prediction is highly reliable, so the conditional probability that the opaque box contains a million dollars, given one-boxing, is high. But that is an irrational decision: the prediction is already made, and taking both boxes yields a thousand dollars more whatever the opaque box contains.
The mistake in the EDT line of thinking is easily detected in these examples: conditional probabilities
reflect the evidential dependence between acts and states, but not a causal one. In cases where the
evidential dependence goes along with a causal relation, this makes no trouble, but in cases
with a common cause, the mistake is a fatal one.
Gibbard and Harper13 (following Stalnaker's14 proposal) and Lewis15 presented a natural solution for
these troubles. By Gibbard and Harper, the expected utility should be:
[CDT] EU(a) = ∑ p(a → si)·U(si ∧ a).
Continuing in the same line, Lewis proposed the construction of a dependency hypothesis K, a proposition which is maximally specific about how the things
that the agent cares about depend causally on what the agent does. K is a conjunction of appropriate
subjunctive conditionals of the form: if a were performed, then s would obtain (a → s, from now on16).
These conditionals must be of the appropriate kind, that is, non-backtracking17. Expected utility by his
account would be:
[Lew] EU(a) = ∑ p(K)·U(K ∧ a).
As was said, we have to count the probabilities
of subjunctive conditionals as the relevant probabilities, because the dependency hypotheses consist of them.
This was a reasonable solution: there were no troubles with Common cause cases or Newcomb's problem:
in these examples p(aj → si) ≠ p(si | aj), and consequently, depending on the exact values of the
probabilities, CDT and EDT may select different acts as rational. In the Newcomb example12,
p(take one box → prediction of one-boxing) < p(prediction of one-boxing | take one box), and the agent
should choose two boxes. In the Marlboro man in Fisher's world example, an agent should choose to
smoke. In the general case, this conception seemed to handle the tracking of causal relations well, which
12 There were many more opinions that EDT endorses the rational act after all in Newcomb's problem than in the Common
cause cases. Although I don't think that is the case in Newcomb's problem, it should be noted that the case against EDT
stands on the Common cause examples alone, if one has problems with the Newcomb example.
13 Gibbard, Allan and William Harper. [1978] 1981. “Counterfactuals and Two Kinds of Expected Utility.” In William
Harper, Robert Stalnaker, and Glenn Pearce, eds., Ifs: Conditionals, Belief, Decision, Chance, and Time, pp. 153–190.
Dordrecht: Reidel.
14 Stalnaker, Robert. [1972] 1981. “Letter to David Lewis.” In William Harper, Robert Stalnaker, and Glenn Pearce,
eds., Ifs: Conditionals, Belief, Decision, Chance, and Time, pp. 151–152. Dordrecht: Reidel.
15 Lewis, David. 1981. “Causal Decision Theory.” Australasian Journal of Philosophy, 59: 5–30.
16 For convenience sake, I use the same letters a and s both for the act/state and for the propositions saying that the act is
chosen/ the state obtains.
17 Lewis, op. cit., pp. 8–14.
was the main problem for EDT.
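The divergence between the evidential and the causal formulas in Newcomb's problem can be sketched numerically. The predictor's reliability (.99) and the prior probability that the opaque box is full (.5) are my assumptions, chosen only for illustration:

```python
# Newcomb's problem: EDT weighs outcomes by p(state | act), which tracks the
# predictor's reliability; CDT-style reasoning holds the box contents fixed,
# since the prediction is already made and causally independent of the act.

MILLION, THOUSAND = 1_000_000, 1_000

def edt_eu(act):
    # Assumed reliability: p(box is full | one-box) = .99, | two-box) = .01.
    p_full = 0.99 if act == "one-box" else 0.01
    return MILLION * p_full + (THOUSAND if act == "two-box" else 0)

def cdt_eu(act, p_full=0.5):
    # Contents are causally independent of the act: same p_full either way,
    # so two-boxing always adds the extra thousand.
    return MILLION * p_full + (THOUSAND if act == "two-box" else 0)

print(edt_eu("one-box") > edt_eu("two-box"))  # True: EDT says one-box
print(cdt_eu("two-box") > cdt_eu("one-box"))  # True: CDT says two-box
```

The CDT comparison holds for every value of `p_full`, which is the point: given causal independence, two-boxing dominates.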
2. Recent developments
Again, let us change the causal structure of the smoking example. Instead of the desire to smoke
and lung cancer having a common cause, imagine that there is a genetic condition that
causes one to smoke and causes one's lungs to be vulnerable to tobacco smoke, in such a way
that smoking causes lung cancer in those with the genetic condition, but not in those without it.
Mary is debating whether to shoot Alfred. If she shoots and hits, things will be very good for
her. If she shoots and misses, things will be very bad. (Alfred always finds out about
unsuccessful assassination attempts, and he is sensitive about such things.) If she doesn’t
shoot, things will go on in the usual, okay-but-not-great kind of way. She thinks that it is very
likely that, if she were to shoot, then she would hit. So far, so good. But Mary also knows that
there is a certain sort of brain lesion that tends to cause both murder attempts and bad aim at
the critical moment. Happily for most of us (but not so happily for Mary) most shooters have this
lesion, and so most shooters miss. Should Mary shoot?
For CDT (at least for the Lewisian type of CDT) it is essential to use non-conditional probabilities of the
appropriate subjunctive conditionals20 (plus, there is a ban on subjunctive conditionals of the backtracking
kind). In Egan's example(s), that is exactly where the problem lies. Take the Murder lesion case:
the relevant partition of the dependency hypothesis is {Shoot → hit; Shoot → ¬hit}. The non-conditional
probabilities over that partition are:
p(S → H) > .5
p(S → ¬H) < .5
while the probabilities conditional on the act are:
p(S → H | S) < .5
p(S → ¬H | S) > .5
In the smoking example there is a similar situation. The relevant partition of the dependency hypothesis in this
case is: {S → LC; S → ¬LC; ¬S → LC; ¬S → ¬LC} (S stands for smoking, LC for lung cancer).
The non-conditional probabilities will be:
p(S → LC) < .5
p(S → ¬LC) > .5
Because of this, CDT still endorses smoking as the rational act (again, take the utilities of the outcomes
from Table 3). But the probabilities conditional on the act are:
p(S → LC | S) > .5
p(S → ¬LC | S) < .5
Figure 1. [Diagrams omitted: left, the Common cause case, in which a common cause cc produces both the act a and the state s; right, Egan's smoking case, in which cc produces a and an enabling condition ec that allows a to cause s.]
In Figure 1, we could see one important aspect of Egan's critique – transformation of examples against
EDT to examples against CDT, by changing the causal structure of the example. We just have to
change Common cause case, in this manner: the common cause now puts in place an enabling
condition which allows act a to cause state s.
3. Edgington's proposal
Dorothy Edgington raised several important points in defense of causal decision theory in her
article “Conditionals, Causation, and Decision”21. Her thesis is that Egan's counterexamples present
problems for the counterfactual part22 of causal decision theory, not for causal decision theory in general, as
Egan thought.23
By Edgington's account, the trouble for CDT of the Lewisian type in Egan's examples is that “no
two contingent, logically independent propositions are probabilistically independent in all probability
distributions”24. In Egan's smoking case, take A to be the proposition that I will start smoking, and X the
conditional that if I were to start smoking, I wouldn't get lung cancer. In most ¬A-worlds, if I
were to start smoking, I wouldn't get lung cancer (because of the genetic condition that causes one to
smoke and makes one's lungs vulnerable to tobacco smoke: if I don't smoke, then I probably don't have the genetic
condition, so if I were to start smoking, I wouldn't get cancer). But in most A-worlds, if I were to
start smoking, I would get lung cancer. Egan exploits the fact that the probability of not getting lung
cancer if we were to smoke comes mainly from the ¬A-worlds.
When we make decisions, we want all the information about the causal structure on the assumption
that A (i.e. we want information from the A-worlds). Edgington concludes this part of her paper with the
advice to use conditional probabilities, because “it is sometimes essential to use conditional
probabilities, and it never does any harm [to use conditional probabilities] (provided we are focusing on
the restricted category based on causal relations)”26.
But we shouldn't just go back to EDT and Jeffrey – we need conditional probabilities of a
causal kind. Our goal is an up-front approach to causation. That means that we need conditional
statements like: if I do a, then s will be the case as a result of a.
According to this, the goal is to use conditional probabilities (because of Egan's counterexamples, and
the fact that the probability of an appropriate subjunctive conditional may get most of its probability from
the case where its antecedent is false – which is worthless when we are in a decision situation), and to be
up-front about causation. In that line, Edgington concludes that we need to assess the probability, on
the assumption that I do a, that s will be a causal consequence27. She proposes that we should consider
25 Ibid.
26 Ibid.
27 Ibid.
p(s as a result of a | a). For all of the examples we have examined so far, her suggestion is a good one:
together with these probabilities, it endorses the rational act in each of the cases. In the
regular smoking case the probabilities are like Jeffrey's, and so is the solution; in the Fisher's smoking
case the probabilities differ from Jeffrey's, and the rational act is [smoke]; and in Egan's case the
probabilities revert to Jeffrey's, and the rational act is once again [not smoke]. So far, so good.
While the suggestion made by Edgington works well in the situations that presented trouble for each of
the earlier decision theories, it fails, surprisingly, to provide a good solution for the class of decision
problems raised by Savage at the beginning of the formal decision theory debate. Consider this example:
an agent draws a random card from a fair deck and may bet either that it will be a heart (act a) or that it will not (act b).
Clearly, the rational act is betting on non-heart. It's a fair deck, so a heart is as probable as any
of the three other suits. Therefore, the chance that a random card is a heart is .25, and .75
that it is not. Given the utilities of the outcomes, EU(a) is 3.5 and EU(b) is 7. But we cannot
arrive at that solution with either of the proposals Edgington made. Take
the second: both p(a doesn't affect s1, s2 | a) and p(b doesn't affect s1, s2 | b) will equal 1, and that
information is of no help to us. We need information on the probability that a random card from the deck
will or will not be a heart, not information on the causal connection between the act of betting on heart
and the card being a heart.
Let us consider another example: we are deciding whether to learn for an exam. Naturally, we prefer
good grades over bad ones, and we think that we will get a good grade if we get questions that we know
how to answer, and a bad grade otherwise. In this example, information on the probability of a causal
connection between the act of learning and the state of getting the right questions (ones we know how to
answer) is relevant.
But there is an important underlying difference between the case with the cards and the case with the
exams. Of course, in both cases we want to maximize our utility, but the ways of doing so are
different: in the first type of case, we recognize that there is no causal connection between acts and
states, and we want to exploit the existing state of the world for our utility; in the second type of case, we
want to manipulate (or change) the state of the world in favor of our utility, because we recognize that
now there is a causal connection. Edgington's proposal makes a lot of sense for all situations of the second
type – conditional probability does no damage, and we're up-front about causation,
unlike the Lewisian type of CDT. But there are problems with situations of the first type. What went wrong
there?
First, we should note that by this classification, the Fisher's smoking case is a first-type
situation. Edgington's proposal worked well for that example, but only incidentally. In that particular
case, the proposal worked because the utilities are such that we're not interested in the probabilities if there
is no causal connection between acts and states. That is, if the probability of getting lung cancer
doesn't change with respect to the acts we can perform, the exact value of that probability is
irrelevant in this particular case. The act [smoke] is dominant, and if there is no causal dependence
between acts and states, the dominant act is the one with the highest EU, which makes the exact
probabilities irrelevant. Because of that characteristic of the example, the information
p(S doesn't affect LC, ¬LC | S) = 1 is useless, although at first it appeared to be useful because it
favored the right decision. For most first-type situations, Edgington's proposal doesn't work.
Consider another example:
We are learning for an exam in Medieval philosophy. Because we were partying for the last seven
days, we're in a bad situation – we have just 6 hours to learn, and have no doubt that we are
not going to learn all the questions that could come up in the exam. Let us simplify the case: there are
two groups of questions on the exam (the first is about St. Anselm's Ontological proof, the
other about St. Augustine's theory of time), and we can learn just one group.
We also know that the professor gives one group of questions more frequently than the other, but
we forgot which one. So we call a friend to ask for the information about probability that is relevant
to our decision between learning one of the two groups of questions. What should we ask the friend
before making a decision:
“What is the probability that the professor will give the “St. Anselm questions”, and what is the probability
that he will give the “St. Augustine questions”?”
“What is the probability that getting the “St. Anselm questions” will be a causal consequence of our
decision to learn the “St. Anselm questions”, conditional on our learning the “St. Anselm questions”?”
“What is the probability that our decision to learn the “St. Anselm questions” doesn't affect getting the
“St. Anselm questions”, conditional on our learning the “St. Anselm questions”?”
By Edgington's proposal, we should make the decision with respect to the answers to the latter
two questions, not the first. But that could lead to irrational decisions: let us say
that the professor gives the “St. Anselm questions” in 95% of cases, and the “St. Augustine questions”
in just 5% of cases. We should be getting the books about St. Anselm in no time, but by the
probabilities that Edgington proposes, we should prefer neither learning the “St. Anselm questions”
nor learning the “St. Augustine questions”, because our decisions don't affect the relevant states, and
consequently the probabilities will be:
p(s1 as a result of a | a) = 0
p(s2 as a result of a | a) = 0
p(a doesn't affect s1, s2 | a) = 1
p(s1 as a result of b | b) = 0
p(s2 as a result of b | b) = 0
p(b doesn't affect s1, s2 | b) = 1
Since the utilities are essentially the same for both of the acts, we should prefer neither of these
acts if we use these probabilities.
What we see in this case is that these probabilities are of no importance to us in first-type
cases. Using Edgington's proposal we could end up with the right decision in the Fisher's smoking
case, but that is purely incidental, because this case belongs to the first type. Therefore, we should
reject Edgington's proposal, because it can lead to irrational decisions.
4. Could several proposals with troubles lead to one without
these troubles: Skyrms and his version of CDT
Let us go back to the classification of the types of cases that I made in section 3. As we
go from Savage to Edgington, what we see is: Savage handled the first type of situations well, but
not the second. Jeffrey handled the second, but failed with one class of the first type (common cause
cases). Edgington handled the second type well, along with the class that made problems for Jeffrey, but failed
on a different class of first-type situations. Could we learn some lessons from the
mistakes and/or from the good parts of their proposals?
In what I take29 to be a common everyday decision situation, there are very few mistakes that
we would make if guided by any of the mentioned theories. The reason, I think, is that in
such situations we easily choose the right kind of probability. If we recognize that we're in a first-type
situation, rarely do we see anyone interested in p(si | a) (or even p(a → si) or
p(si as a result of a | a)); we want to know p(si). Let us consider again the Learning for the
exams with a little time left example. We will never ask our friend for some kind of conditional
probability, or the probability of some conditional, in that situation – we will ask our friend for the
probability of the professor giving one group of questions or the other. And that makes perfect sense.
If, on the other hand, we recognize that we are in a second-type situation, then often we'll
be interested in the appropriate conditional probability.30 Let us say that in a game of Texas Hold'em
poker we are deciding whether to pull a big bluff or a small bluff on the river (the last round of the hand).
We aren't interested in the absolute probability of our opponent folding his hand on the river, but in the
probability that he will fold conditional on our big-bluff act or our small-bluff act.
That is a way of doing what Edgington said: we need conditional probabilities of a causal sort,
i.e. we build our treatment of a situation on our information about its causal structure31, and then
(implicitly) choose the theory that can handle that particular type of situation. But this shows us a
little more than just the everyday way of doing what Edgington proposed – it shows a mistake in the
search for the relevant probability in decision situations. Throughout the
debate between EDT and CDT, the main question was: “Which probability should we use in
decision situations?” But that question isn't specific enough. We should instead look for the answer to
the question: “Which probability should we use in this particular type of decision situation?” The
answer is given in accordance with our knowledge (or beliefs) relevant to the particular decision situation.
The causal part comes in here: we determine the type of situation, and consequently the relevant type of
probability, according to our knowledge (or beliefs) about the causal aspects of the particular decision
situation.
We can learn from the mistakes and the good parts of the previous proposals. Egan's arguments and
Edgington's ideas favor the use of conditional probabilities, but with attention to causal structure.
Savage's and Jeffrey's theories handle some types of situations well, but not all. We should use the good
parts of these theories, with attention to causal structures. All that together leads us to just one
thing: Skyrms's version of CDT, from his book Causal Necessity.32 Skyrms proposes constructing two
partitions relevant to our decisions: Ki, a maximally specific proposition about the factors which
lie outside the field of our influence in the current decision problem; and Sj, a maximally specific
30 This point is far less intuitive than the first one. But, from my experience, this is a common everyday decision process.
31 Someone may see a problem here – we want to have formed beliefs about causal structure before making decisions. But this
is a common basis for all the theories – all of the counterexamples presented assume that the agent has formed beliefs about
the causal structure of the world.
32 Skyrms, op cit.; see part IIC.
proposition about the factors which lie within the field of our influence in the current decision problem.33
According to that, the expected utility is34:
[Sky] EU(a) = ∑i,j p(Ki)·p(Sj | Ki ∧ a)·U(Ki ∧ Sj ∧ a)
In a situation where all the factors lie outside the field of our influence, the formula shrinks down to
Savage's EU (because Sj is an empty partition), while in the opposite situation it shrinks down to
Jeffrey's EU. And that is exactly what Edgington proposes – considering conditional probabilities of a
causal sort. Skyrms's theory also answers (although indirectly) the more specific question about
probabilities: “Which probability should we use in this particular type of decision situation?”
If we accept that we should maximize [Sky] EU, then we avoid the troubles of the previous
theories. In Egan's counterexamples the relevant factors lie within the field of our influence. Therefore, the K
partition is empty, and the S partition consists of {getting lung cancer; not getting lung cancer}. Since
the K partition is empty, we use just conditional probabilities, and the act with the highest [Sky] EU is [not
smoke] (similarly, in the Murder lesion example, we shouldn't shoot, because conditional on
shooting it is very unlikely that we are going to hit). In Fisher's case, the relevant factors lie outside the
field of our influence, and now the K partition consists of {getting lung cancer; not getting lung
cancer}. Therefore, we should consider just p(lung cancer) and p(no lung cancer), so it is rational
to smoke. Here we can see another benefit of using [Sky] EU – we save the Sure Thing Principle, but
with a limited scope. When all the factors lie outside the field of our influence, the dominant act (if there
is such an act) is always the one with the highest EU, and in this type of case we should use the Sure Thing
Principle. And, of course, in the regular smoking case, the factors again lie within the field of our
influence, and the solution is the same as Jeffrey's.35
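How [Sky] EU subsumes both limiting cases can be sketched as follows. The dictionary encoding, the helper names, and the probability values are mine, chosen only for illustration:

```python
# A sketch of Skyrms's expected utility: unconditional probabilities over the
# K partition (factors outside our influence), probabilities conditional on
# the act over the S partition (factors within it).

def skyrms_eu(act, k_probs, s_given_k_and_act, utility):
    """[Sky] EU(a) = sum_i sum_j p(K_i) * p(S_j | K_i & a) * U(K_i & S_j & a)."""
    total = 0.0
    for k, p_k in k_probs.items():
        for s, p_s in s_given_k_and_act[(k, act)].items():
            total += p_k * p_s * utility[(k, s, act)]
    return total

# Egan's case: K is empty (one trivial cell "-"), S = {LC, noLC}, with
# probabilities conditional on the act (illustrative values).
k1 = {"-": 1.0}
s1 = {("-", "smoke"): {"LC": 0.8, "noLC": 0.2},
      ("-", "dont"):  {"LC": 0.1, "noLC": 0.9}}
u1 = {("-", "LC", "smoke"): 2, ("-", "noLC", "smoke"): 10,
      ("-", "LC", "dont"): 1,  ("-", "noLC", "dont"): 8}
print(skyrms_eu("smoke", k1, s1, u1)
      < skyrms_eu("dont", k1, s1, u1))   # True: don't smoke (Jeffrey-style)

# Fisher's case: K = {LC, noLC}, S empty (trivial), so the formula collapses
# to Savage's EU and the dominant act [smoke] wins for any p(LC).
k2 = {"LC": 0.3, "noLC": 0.7}
s2 = {(k, a): {"-": 1.0} for k in k2 for a in ("smoke", "dont")}
u2 = {("LC", "-", "smoke"): 2, ("noLC", "-", "smoke"): 10,
      ("LC", "-", "dont"): 1,  ("noLC", "-", "dont"): 8}
print(skyrms_eu("smoke", k2, s2, u2)
      > skyrms_eu("dont", k2, s2, u2))   # True: smoke (Savage-style)
```

The same function handles both verdicts; only the assignment of factors to the K or S partition changes, which is exactly the causal judgment the paper says must come first.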