
GSOE9210 Topic 7 Lecture Slides

Decisions and uncertainty


People find deciding difficult when uncertain outcomes are at stake. It can be difficult even with homogeneous outcomes, and much more so when the outcomes belong to different categories. The difficulty is compounded when the degree of uncertainty is unknown.
Two illustrative scenarios
Example (Ex 1.11.1) An urn contains 30 Red (R) balls and 60 more, Yellow (Y) and Black (B), in an unknown proportion. If a bet is proposed on the colour of a randomly drawn ball, most people like to rely on a known quantity and prefer R > B and also R > Y. Implicitly, they assume Prob(R) > Prob(B) and Prob(R) > Prob(Y), which is impossible with the numbers of balls as given.
Paradoxically, if the betting propositions are reversed they prefer non-Red to non-Black (and also to non-Yellow), implying their belief that Prob(not-R) > Prob(not-B). But this is the same as
  1 − Prob(R) > 1 − Prob(B), i.e., Prob(R) < Prob(B),
contradicting the earlier preference. This paradox (due to D. Ellsberg) can be termed the fear of the unknown.
Example (based on Ex 1.11.2) Two patients are given a chance of survival of 1/2 and 5/6, respectively, without treatment. There is a treatment that will improve that chance by 1/6:
  (a) 1/2 → 2/3
  (b) 5/6 → 1 (= 100%),
but it is available only to one patient. Which one should be treated? (It is a typical case of medical triage.)
Leaving emotions aside, one has to decide what to aim at, that is, which objective to maximise. We compare (a) and (b) and also the null option (∅) (fair?) of not treating anyone (as we cannot treat both).
Expected number of survivors
  (a) 2/3 · 1 + 5/6 · 1 = 1.5
  (b) 1/2 · 1 + 1 · 1 = 1.5
  (∅) 1/2 · 1 + 5/6 · 1 ≈ 1.33
Probability that someone survives
It is 1 − Prob(both die).
  (a) 1 − 1/3 · 1/6 = 17/18 ≈ 0.94
  (b) 1 − 1/2 · 0 = 1 = 1.00
  (∅) 1 − 1/2 · 1/6 = 11/12 ≈ 0.92
Probability that both survive
  (a) 2/3 · 5/6 = 10/18 ≈ 0.56
  (b) 1/2 · 1 = 1/2 = 0.50
  (∅) 1/2 · 5/6 = 5/12 ≈ 0.42
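The three objectives above can be checked with exact arithmetic. A minimal sketch in Python (the function and variable names are ours, not from the reference):

```python
# Verifying the triage computations with exact rational arithmetic.
from fractions import Fraction

p1, p2 = Fraction(1, 2), Fraction(5, 6)   # untreated survival chances
boost = Fraction(1, 6)                    # improvement from treatment

def objectives(q1, q2):
    """Return (expected survivors, P(someone survives), P(both survive))."""
    expected = q1 + q2
    someone = 1 - (1 - q1) * (1 - q2)
    both = q1 * q2
    return expected, someone, both

plan_a = objectives(p1 + boost, p2)   # treat patient 1: chances 2/3 and 5/6
plan_b = objectives(p1, p2 + boost)   # treat patient 2: chances 1/2 and 1
plan_0 = objectives(p1, p2)           # treat no one

print(plan_a)  # (Fraction(3, 2), Fraction(17, 18), Fraction(5, 9))
print(plan_b)  # (Fraction(3, 2), Fraction(1, 1), Fraction(1, 2))
print(plan_0)  # (Fraction(4, 3), Fraction(11, 12), Fraction(5, 12))
```

As the output shows, (a) and (b) tie on expected survivors, and the remaining two objectives rank them in opposite directions.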
Strict uncertainty
We consider that we choose one of the actions a1, …, ak and, independently, Nature decides on the state of the (relevant) world, one of θ1, …, θl. Nature is deemed indifferent, neither hostile (malevolent) nor helpful (benevolent).
At the time of decision we know the possible states of Nature. For each combination (ai, θj) we know, in advance, the outcome to us. We deem these outcomes directly comparable, for example summarised in monetary terms.
For now we suppose that we know nothing about the likelihoods of the possible states of Nature. Thus, we cannot speak of a state being more/less likely than any other, or of it being very/extremely likely/unlikely.
This is the basic framework; nevertheless, it represents a great many real-life situations, and it permits a simple tabular form for the formal analysis. More complex are the scenarios where the (eventuating) states of the world would depend on our decision choice. Their analysis requires decision trees and, in general, cannot be folded into a tabular form. In particular, tables cannot easily include impossible states, those that cannot occur for a given decision. This is why decision trees are strictly more powerful than decision tables.
Standard decision criteria
The setting for deciding can be compressed to a table

         θ1    . . .    θl
  a1    v11    . . .    v1l
   :      :               :
  aj    vj1  ... vji ...  vjl
   :      :               :
  ak    vk1    . . .    vkl

with vij the values of the consequences.
For every decision ai we have its worst possible outcome, the resulting value termed the security level of ai:
  si = min_j vij  (= min_{θj} vij)
We also have the best possible outcome, its value termed the optimism level of ai:
  oi = max_j vij  (= max_{θj} vij)
single outcome rules
Maximin rule (most pessimistic)
Choose an action ak whose security level is as high as possible.
That is, look for k where
  sk = max_{ai} si = max_{ai} min_{θj} vij
If there is more than one maximising choice (several such k), break the tie arbitrarily.
Maximax rule (utopian, most optimistic)
Choose an action ak whose optimism level is as high as possible.
That is, look for k where
  ok = max_{ai} oi = max_{ai} max_{θj} vij
If there is more than one such maximising choice, break the tie arbitrarily.
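Both single-outcome rules can be sketched in a few lines of Python; the table below and the helper names are illustrative, not from the reference:

```python
# A minimal sketch of the maximin and maximax rules on a decision table
# given as a list of rows, one per action.

def security_level(row):
    return min(row)          # worst outcome of the action

def optimism_level(row):
    return max(row)          # best outcome of the action

def maximin(table):
    # index of an action whose security level is as high as possible
    return max(range(len(table)), key=lambda i: security_level(table[i]))

def maximax(table):
    # index of an action whose optimism level is as high as possible
    return max(range(len(table)), key=lambda i: optimism_level(table[i]))

table = [
    [2, 2, 0, 1],   # a1
    [1, 1, 1, 1],   # a2
    [0, 4, 0, 0],   # a3
]
print(maximin(table))  # 1  (a2: security level 1)
print(maximax(table))  # 2  (a3: optimism level 4)
```

Note that Python's `max` with a `key` breaks ties by returning the first maximiser, which matches the "break the tie arbitrarily" provision.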
Leximin rule
If there are multiple candidate actions, iterate the maximin procedure, looking at the second smallest, third smallest, etc. entry among each candidate's possible outcomes.
Important: look at the second (third, . . . ) smallest entry, not at the second (third, . . . ) smallest distinct value; repeated entries count with multiplicity.
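The iterated comparison can be done in one step by sorting each row and comparing lexicographically; a sketch (the example table is ours):

```python
# Leximin compares actions by their ascending-sorted outcome lists: first
# the worst entries, then the second-worst, and so on. Duplicate entries
# are kept, matching the "entry, not value" remark.

def leximin(table):
    # sorting each row ascending and comparing the lists lexicographically
    # implements the iterated-maximin procedure in one step
    return max(range(len(table)), key=lambda i: sorted(table[i]))

table = [
    [1, 3, 1],   # a1: sorted -> [1, 1, 3]
    [1, 2, 2],   # a2: sorted -> [1, 2, 2]
]
# both actions tie on the worst entry (1); the second-worst breaks the tie
print(leximin(table))  # 1  (a2, since [1, 2, 2] > [1, 1, 3])
```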
averaging rules
Insufficient reason (indifference) rule
We can postulate a make-believe scenario of likelihoods and assume that every relevant state of nature is equally likely. This is really a limited probabilistic model, with equal risks replacing uncertainties. Here we compute the average of all consequence values
  avi = (vi1 + . . . + vil) / l.
Choose an action ak such that its mean outcome (average of consequence values)
  avk = (vk1 + . . . + vkl) / l
is as high as possible.
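This rule (often called the Laplace rule) is a one-liner; a sketch with an illustrative table:

```python
# Sketch of the insufficient-reason (indifference) rule: treat every state
# as equally likely and pick the action with the highest mean outcome.
from statistics import mean

def insufficient_reason(table):
    return max(range(len(table)), key=lambda i: mean(table[i]))

table = [
    [2, 2, 0, 1],   # a1: mean 1.25
    [1, 1, 1, 1],   # a2: mean 1.0
    [0, 4, 0, 0],   # a3: mean 1.0
]
print(insufficient_reason(table))  # 0  (a1)
```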
Optimism-pessimism indexed rule
Instead of averaging all the possible outcomes (for a given action) we can look only at the extreme outcomes and average them. Also, we can give different weights to the greatest and least valued outcomes.
Select, beforehand, a number α as your optimism¹ level, 0 ≤ α ≤ 1. Choose ak that maximises α·ok + (1 − α)·sk. That is, look for
  max_{ai} (α max_θ v + (1 − α) min_θ v)
In practice, α = 0.5 is most common.

¹ The principal reference uses α as the pessimism level, but this is uncommon.
regret centered rules
Instead of basing the choice on the outcome values, we can use outcome regrets (aka opportunity losses). For every vij (ie pair (ai, θj)) define
  rij = (max_{az} vzj) − vij
and replace the decision table with a regret table. Note that the maxima are computed for each column separately.
All the criteria can be adapted to reasoning with regrets. As we want to minimise the regrets (instead of maximising the outcomes), the various rules would usually switch the roles of maximising/minimising as compared to their direct counterparts.
In practice only one regret-based rule is used. We define the (worst) regret of action ai as
  ρi = max_{θj} rij.
Minimax regret rule
Choose ak for which the regret is minimal. That is, look for
  ρk = min_{ai} max_{θj} rij
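Building the regret table and applying the minimax regret rule can be sketched as follows (the sample table reuses the four-action example discussed later in these notes):

```python
# Sketch of the regret table and the minimax regret rule; column maxima
# are taken over actions, separately for each state.

def regret_table(table):
    col_max = [max(col) for col in zip(*table)]
    return [[m - v for m, v in zip(col_max, row)] for row in table]

def minimax_regret(table):
    # index of an action whose worst regret is as small as possible
    r = regret_table(table)
    return min(range(len(table)), key=lambda i: max(r[i]))

table = [
    [2, 2, 0, 1],   # a1
    [1, 1, 1, 1],   # a2
    [0, 4, 0, 0],   # a3
    [1, 3, 0, 0],   # a4
]
print(regret_table(table))    # a4's regret row is [1, 1, 1, 1]
print(minimax_regret(table))  # 3  (a4: worst regret 1)
```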
Regret and satisfaction
Given a decision table D we denote by R(D) its regret table. This process can be iterated: we can consider R(R(D)), the regret table based on a regret table.
We shall term it the satisfaction table S(D). The reason is that its entries are obtained by subtracting from every entry in D the minimal value in its respective column. If there are n actions a1, …, an and m possible states of nature, we denote by vij the value of the outcome of action ai given the state θj. The corresponding entries for R(D) are
  rij = max_z(vzj) − vij.
For S(D) they are
  sij = vij − min_z(vzj).
Thus the regret expresses the difference to the best possible outcome (if we could have predicted it!), while the satisfaction measures the difference from the worst possible outcome (that we have managed to avoid).
Continuing to apply the regret operator R(·) does not produce anything new:
  R(R(R(D))) = R(D),  S(S(D)) = S(D).
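These identities are easy to verify mechanically; a quick check on a small table (the table is ours):

```python
# Checking the claims R(R(D)) = S(D), R(R(R(D))) = R(D) and S(S(D)) = S(D).
# R subtracts each entry from its column maximum; S subtracts the column
# minimum from each entry.

def R(table):
    col_max = [max(col) for col in zip(*table)]
    return [[m - v for m, v in zip(col_max, row)] for row in table]

def S(table):
    col_min = [min(col) for col in zip(*table)]
    return [[v - m for m, v in zip(col_min, row)] for row in table]

D = [[2, 2, 0, 1], [1, 1, 1, 1], [0, 4, 0, 0], [1, 3, 0, 0]]
print(R(R(D)) == S(D))      # True
print(R(R(R(D))) == R(D))   # True
print(S(S(D)) == S(D))      # True
```

The algebra behind the check: R(R(D))ij = max_z(Mj − vzj) − (Mj − vij) = vij − min_z vzj, which is exactly S(D)ij.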
When outcomes have equal values
Entries in a decision table are, typically, real numbers and thus deemed (in principle) never to be equal. However, there are some exceptions:
- Classroom, toy examples use integers.
- Entries did arise from a sequential decision process; converting it to a strategic (tabular) presentation may lead to several values being identical.
- Some entries were very close; it might be prudent to round them to a common equal value.
The presence of such equal entries may lead to decision ambiguities, where two actions score the same value. The simplest way of resolving such ties is to use a form of lexicographic ordering. We can define generalisations:
- leximin, for maximin
- leximax, for maximax
- lexi-α, for α-optimism-pessimism
- lexi-regret, for minimax regret
Leximin iterates the maximin procedure: it looks first for the action for which the minimum outcome has the highest value. If such an action is not uniquely defined, then it looks at the next-to-minimum value and selects the action with the highest such value. And so on.
The only time this procedure fails to select a unique action is when all the entries for one action match those for another. These entries need not be in the same order; we still consider such actions equally good, as we act under (strict) uncertainty.
Leximax iterates the maximax in a very similar way: it looks at the best outcomes first; if they lead to a tie, it looks at the second-best outcomes, and so on.
Lexi-regret applies this last procedure to the regret values rather than to the entries themselves.
Optimising randomised decisions
A (much) more important extension comes from the possibility of optimising a choice of the action on average. For example

             θ1   θ2   min
  a1          0    2    0
  a2          2    0    0
  ½a1 + ½a2   1    1    1

Clearly, averaging our action choice appears beneficial, as long as we can make sense of such interpolated behaviour. For a single instance of decision-making such a 50-50 action makes little sense. However, if we face this choice situation repeatedly, we can randomise between actions a1 and a2. Then, in the long run, this randomised behaviour gives a better expected minimal value, thus it is a better way to proceed.
To find the best randomised mix is not trivial. Fortunately, it turns out to be equivalent to solving a zero-sum game based on the matrix of the outcome values. We shall discuss, in detail, how to achieve it in the game theory part of the course.
There can be an objection to this (raised in the reference): if we observe this situation a great many times, we can learn something useful about the frequencies of the states of nature and advance from decision under uncertainty to decision under risk. Still, we may prefer to decide, intentionally, without paying attention to those frequencies. For example, we may seek a randomised action that produces a certain minimal outcome for every state of nature separately.
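For this 2x2 example the best mix can be found by a crude sweep over mixture weights; a sketch (a proper solution would solve the zero-sum game, as discussed later):

```python
# Sweep mixture weights p on a1 (and 1 - p on a2) and track the security
# level of the mixed action; for the 2x2 table above the optimum is at
# p = 1/2 with guaranteed value 1.

table = [[0, 2], [2, 0]]   # rows a1, a2; columns are the two states

def mixed_security(p):
    # expected payoff of the mix in each state, then the worst of those
    payoffs = [p * table[0][j] + (1 - p) * table[1][j] for j in range(2)]
    return min(payoffs)

best_p = max((i / 100 for i in range(101)), key=mixed_security)
print(best_p, mixed_security(best_p))  # 0.5 1.0
```

Either pure action guarantees only 0, while the 50-50 mix guarantees 1 in every state, matching the table above.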
Discrepancy among decisions
These criteria may disagree, giving different recommendations with the same data. The table below (suggested by J. Milnor) shows how, with four choices of action, each of them could be preferred under one of the criteria.

decision table
  outcomes  θ1  θ2  θ3  θ4 |  si  oi  avi
  a1         2   2   0   1 |   0   2  5/4
  a2         1   1   1   1 |   1   1   1
  a3         0   4   0   0 |   0   4   1
  a4         1   3   0   0 |   0   3   1

regret table
  regrets   θ1  θ2  θ3  θ4 |  ρi
  a1         0   2   1   0 |   2
  a2         1   3   0   0 |   3
  a3         2   0   1   1 |   2
  a4         1   1   1   1 |   1

All four criteria give different answers.
Minimax regret is not the same as minimax return.
The latter would mean (in a utopian world) that Nature would choose its state to maximally benefit the action that we chose; at the same time we would select the action that gains us the least of those benefits.
Sort of: Nature is benevolent, but we are meek and take no advantage.
Optimism-pessimism index α: here α represents the optimism level.
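Running all four criteria on this reconstruction of Milnor's table confirms that each action wins under exactly one of them:

```python
# Each of the four criteria selects a different action on Milnor's table.

table = [
    [2, 2, 0, 1],   # a1
    [1, 1, 1, 1],   # a2
    [0, 4, 0, 0],   # a3
    [1, 3, 0, 0],   # a4
]

def argmax(scores):
    return max(range(len(scores)), key=scores.__getitem__)

maximin = argmax([min(row) for row in table])
maximax = argmax([max(row) for row in table])
mean    = argmax([sum(row) / len(row) for row in table])

col_max = [max(col) for col in zip(*table)]
regrets = [max(m - v for m, v in zip(col_max, row)) for row in table]
mm_regret = min(range(len(table)), key=regrets.__getitem__)

print(mean, maximin, maximax, mm_regret)  # 0 1 2 3  (a1, a2, a3, a4)
```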
Choosing decision methods
Four main criteria can give different answers from the same inputs. A simple example with four choices and all different results is in Tables 2.4 and 2.5 of the Handout.
Therefore, we could try to look for the most reasonable criterion (one of the four above or perhaps some new one, yet to be discovered). Reasonableness means satisfying certain natural criteria, and the following eight seem non-controversial:
(I) Complete ranking of all actions
(II) Independence of labeling
(III) Invariance wrt value scale
(IV) Strong domination
(V) Independence of spurious alternatives (additional choices)
(VI) Invariance under uniform change of values of a state
(VII) Invariance under reordering of a row
(VIII) Independence of multiple listing of any state
Table 2.7 (in Sec. 2.5) shows that each of the four main decision rules satisfies only seven out of these eight criteria. It turns out that we cannot do any better. There holds an
Impossibility theorem
No decision strategy can satisfy all criteria (I)-(VIII).
We'll see later (Problem 2.7.7) that if we would agree to reason under risk (postulate the probabilities of the states), then the MEU method would satisfy axioms (I)-(VI).
Such probability-based methods can never be expected to satisfy the remaining axioms. These axioms (VII and VIII) specifically enforce our complete ignorance of risk values. To use them would mean that we could distort the probability distribution and still come to the same decision.
Rationale for the axioms
complete ranking
It contains two provisions:
- any two actions are comparable:
    ai ≻ aj, or aj ≻ ai, or ai ∼ aj
- there is no cycle:
    ai ≻ aj and aj ≻ ai (directly or through a chain of preferences) is not allowed
Humans often have difficulty being so decisive. It is especially visible when an explicit degree of risk is present: someone may prefer 7 to J if 7 is certain, but change if even a minuscule risk attaches to 7.
independence of labeling
Order of presentation should not matter, neither the order of options nor the order of the states of nature to which we respond.
In practice, most people are swayed quite easily by reordering the list of possible actions.
independence of rewards/losses scale
The requirement is that changing all values linearly,
  vij → λ·vij + μ,  λ > 0, μ arbitrary,
does not affect anything.
The first part, multiplying by λ > 0, is normally observed. It is like expressing the values in a different currency, say changing from dollars to euros, which does not affect anything.
The second part might lead to practical difficulties: $5 and $10 do not compare in the same way as $1,000,005 and $1,000,010.
strong domination
It states that if v(ai, θ) > v(aj, θ) for all states θ then ai must be preferred to aj. The general idea is that the dominated actions can be removed from further consideration.
There is also a notion of weak domination. It states that if v(ai, θ) ≥ v(aj, θ) for all states θ and, at least once, v(ai, θ0) > v(aj, θ0), then ai must be preferred to aj. This second assertion is logically weaker (more permissive) than the first.
In some settings one can insist on removing weakly dominated actions as well; however, it may lead to complications (discussed later in the Game Theory part). Additional comments are in Ex. 2.7.10.
independence of irrelevant alternatives
There is more to it than the example in the reference, which illustrates the principle of independence of irrelevant alternatives.
The axiom requires that the values for each action can be computed in isolation from other actions.
And there is a cute rationalisation of that irrational flip between chicken and beef dinners.
independence of augmenting a state (column)
If one state of nature suddenly brings additional c units to all actions' payoffs, we should not change our decision.
Again, in practice people react differently when comparing small values v and w and large v + 10^k and w + 10^k.
The reactions can also readily change for small values of c when there is a change of sign. One may compare $10 vs. $20 differently from −$5 vs. $5.
The last two axioms make sure that we indeed act under strict uncertainty.
independence of rearrangement of payoffs for the action
We should be completely indifferent to which state of nature may produce which value, as long as the same collection of values is available.
It implies that there can be no sentiment, none whatsoever, toward any state of nature being even a little more plausible than another. And the payoff values must summarise all that there is to the decision.
independence of state (column) repetition
Firstly, it means that multiple listing of the same state does not change anything.
More relevantly, it means that if two different states of nature θ and θ′ have the same outcomes for each possible action, then one of these states can be disregarded.
Discussion of problems² in Sec. 2.7
Ex. 2.7.1
(i) maximin: choose a3, as its security level is 1 = min_θ u(a3, θ)
(ii) optimism-pessimism: if the index is α then a3 is strictly preferred for all 0 < α ≤ 1 and weakly for α = 0 (jointly with a1).
Such superiority of a single action is due to
  max_θ u(a3, θ) ≥ max_θ u(ai, θ),  min_θ u(a3, θ) ≥ min_θ u(ai, θ)
(iii) minimax regret: choose a4.
It has the least difference between the best and worst outcomes.
(iv) mean (average) return: choose a1.
The average of the a1-row is the greatest.
Ex. 2.7.2
(a) maximin: a1 if x ≤ 2, a2 if x ≥ 2
(b) 50-50 optimism, α = 1/2: a1 if x ≤ 7, a3 if x ≥ 7
(c) mean return: a1 if x ≤ 3, a4 if x ≥ 3
Note: a4 > a2, a3

² Problems not in the course contents are omitted.
(d) minimax regret

               x ≤ 6          x ≥ 6
  a1        max(6 − x, 3)       3
  a2            5          max(x − 2, 5)
  a3            4          max(x − 3, 4)
  a4            6          max(x − 6, 6)

  x ≥ 6:      a1, as regret = 3
  2 ≤ x < 6:  a1, as regret ≤ 4
  x ≤ 2:      a3, as regret = 4
Ex. 2.7.3
An additional criterion is considered: minimise the mean regret.
It is always equivalent to maximising the mean return. It is not equivalent to minimising the maximum regret; Milnor's table is an example.
Ex. 2.7.4
We can give an example smaller than the official solution if we permit the actions to be initially equivalent. This problem is very relevant to designing voting schemes.
Build the table with a < b and c > d:

        θ1  θ2
  a1     a   c
  a2     b   d

Now a1 ∼ a2. However, adding an additional choice a3: (x, y) with x > b > a and d < y < c makes a1 ≺ a2. A similar effect occurs when the first column is replicated.
Ex. 2.7.5
Fourth action, a4: buy information at cost c

          θ1      θ2      θ3
  a1       5       0      13
  a2       6       7       7
  a3       2       4       9
  a4     6 − c   7 − c   13 − c

Criteria for choosing a4, ie purchasing information:
- maximin return: c = 0, competing against a2 (security level 6)
- minimax regret: c ≤ 4; the max regret under a4 is c, while the max regret under a3 is 4 (which is better than a1 or a2)
- mean return: c ≤ 2
- pessimism level 2/3 (ie α = 1/3); then the values of the choices are
    v(a4) = (2/3)·6 + (1/3)·13 − c
    v(a1) = (2/3)·0 + (1/3)·13
    v(a2) = (2/3)·6 + (1/3)·7
    v(a3) = (2/3)·2 + (1/3)·9
  It leads to c ≤ 2.
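The cost thresholds for Ex. 2.7.5 can be checked numerically; a hedged sketch (ties count as still selecting a4):

```python
# For a given information cost c, which of three criteria still select a4?

def rows(c):
    return [[5, 0, 13], [6, 7, 7], [2, 4, 9], [6 - c, 7 - c, 13 - c]]

def picks_a4(c):
    t = rows(c)
    maximin = max(min(r) for r in t) == min(t[3])
    mean = max(sum(r) for r in t) == sum(t[3])
    col_max = [max(col) for col in zip(*t)]
    regrets = [max(m - v for m, v in zip(col_max, r)) for r in t]
    regret = min(regrets) == regrets[3]
    return maximin, mean, regret

print(picks_a4(0))  # (True, True, True)
print(picks_a4(2))  # (False, True, True)
print(picks_a4(4))  # (False, False, True)
print(picks_a4(5))  # (False, False, False)
```

The transitions confirm the thresholds above: maximin only at c = 0, mean return up to c = 2, minimax regret up to c = 4.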
Ex. 2.7.7 The rule is that of MEU, maximum expected utility. The reason that it does not satisfy axioms (VII)-(VIII) was mentioned earlier.
It satisfies the first six axioms, as their application cannot change the computed utility of any given action.
Ex. 2.7.10 Weak domination is a reasonable decision criterion, under uncertainty, if one believes that all the states of nature are possible (may occur).
Under risk, if all the states have probability greater than 0, this is equivalent to strong domination.
Ex. 2.7.12 This is Milnor's table. Action a4 is stochastically dominated (weakly) by a combination of a1 and a3. Using the notation of expected utility:
  (0.5, a1; 0.5, a3) ≽ a4
Therefore, a4 can never be strictly preferred when the states have probabilities of occurrence.