Games Machines Play
Wynn C. Stirling
Electrical and Computer Engineering Department
459CB Brigham Young University, Provo, UT, USA 84602
Competition, which is the instinct of selfishness, is another word for dissipation of
energy, while combination is the secret of efficient production.
— Edward Bellamy
Looking Backward (1888)
Abstract
Individual rationality, or doing what is best for oneself, is a standard model used to
explain and predict human behavior, and von Neumann-Morgenstern game theory is the
classical mathematical formalization of this theory in multiple-agent settings. Individual
rationality, however, is an inadequate model for the synthesis of artificial social systems
where cooperation is essential, since it does not permit the accommodation of group in-
terests other than as aggregations of individual interests. Satisficing game theory is based
upon a well-defined notion of being good enough, and does accommodate group as well
as individual interests through the use of conditional preference relationships, whereby a
decision maker is able to adjust its preferences as a function of the preferences, and not
just the options, of others. This theory is offered as an alternative paradigm to construct
artificial societies that are capable of complex behavior that goes beyond exclusive self
interest.
Keywords
game theory, decision theory, individual rationality, group rationality, satisficing, altruism
January 18, 2002 Page 1 Submitted to Minds and Machines

1 Introduction
In an environment of rapidly increasing computer power and greatly increased scientific knowl-
edge of human cognition, it is inevitable that serious consideration will be given to designing
artificial systems that function analogously to the way humans function. Many researchers
in this field concentrate on four major metaphors: (a) brain-like models (neural networks), (b)
natural language models (fuzzy logic), (c) biological evolutionary models (genetic algorithms),
and (d) cognition models (rule-based systems). The assumption is that, by designing according
to these metaphors, machines can be made at least to imitate, if not replicate, human behavior.
Such systems are often claimed to be intelligent.
The word “intelligent” has been appropriated by many groups, and may mean anything
from nonmetaphorical cognition (for example, strong AI) to advertising hype (for example,
intelligent lawn mowers). Some of the definitions in use are quite complex, some are circular,
and some are self-serving. But when all else fails, we may appeal to etymology: it owns
the deed to the word, and everyone else can only claim squatter’s rights. Intelligent comes from the
Latin roots inter (between) + legĕre (to choose). Ultimately, it seems, there is only one essential
characteristic of intelligence in man or machine—an ability to choose between alternatives.
Classifying “intelligent” systems in terms of anthropomorphic metaphors deals primar-
ily with the way knowledge is represented, rather than with the way decisions are made.
Whether knowledge is represented by neural connection weights, fuzzy set-membership func-
tions, genes, production rules, or even differential equations, is a choice that must be made
according to the context of the problem and the preferences of the system designer. The way
knowledge is represented, however, does not dictate the rational basis underlying the way de-
cisions are made, and therefore has little to do with this important aspect of intelligence.
One question, when designing a decision-making machine or a society of machines, is the
issue of just where the actual choosing mechanism lies—with the designer, who must supply
the machine with all of the rules it is to follow, or with the machine itself, in that it possesses
a degree of true autonomy. This essay does not address that question. Instead, it focuses
primarily on the issue of how decisions might be made, rather than who ultimately bears the
responsibility for making them.
Much current research is being devoted to the design and eventual implementation of
artificial social systems. The envisioned applications of this technology include automated
air-traffic control, automated highway control, automated shop floor management, computer
network control, and so forth. Although much effort has concentrated on such applications,
attempts to design artificial systems that are compatible with human behavior, and thus have
a chance of being accepted and trusted by people, have met with limited success. Perhaps the
most important (and most difficult) social attribute to imitate is that of coordinated behavior,
whereby a system of autonomous distributed machines coordinate their actions to accomplish
tasks that achieve the goals of both the society and its members.
This essay investigates rationality models that may be used by men or machines. If ef-
fective and trustworthy decision-making machines are to be constructed, they must function
according to an adequate model of human behavior, both in isolation and in social settings.
The decision-making mechanisms employed by such machines must be understandable to and
viewed as reasonable by the people who interface with such systems. In Section 2 we discuss
conventional rational choice theory as it is typically instantiated via game theory, and point out
its shortcomings. In Section 3 we turn our attention to the problem of synthesizing artificial
decision systems and motivate the need for a model of rationality that accounts for group as
well as individual interests. In Section 4 we introduce an alternative to conventional rational
choice, and then describe a new theory of games, which we call satisficing games, in Section
5. Section 6 describes a key feature of our approach, namely, the capability to condition on
preferences; this feature distinguishes our approach from approaches based upon conventional
notions of rationality. Section 7 then presents an example of a satisficing game, and Section 8
finishes with a discussion of our theory.
2 Individual Rationality
People make choices according to various criteria, ranging from capriciousness at the low end
of sophistication, then to random guesses, then to heuristics, and finally to rational choice as
perhaps the most sophisticated form of decision making. Rational choice requires the decision
maker to form a total ordering of his or her preferences, and then to select a most preferable
option. This procedure is termed the principle of individual rationality.
Individual rationality is the acknowledged standard for calculus/probability-based knowl-
edge representation and decision making, and it is also the de facto standard for the alternative
approaches based on anthropomorphic metaphors. When designing neural networks, algo-
rithms are designed to calculate the optimum weights, fuzzy sets are defuzzified to a crisp set
by choosing the element of the fuzzy set with the highest degree of set membership, genetic
algorithms are designed under the principle of survival of the fittest, and rule-based systems are
designed according to the principle that a decision maker will operate in its own best interest
according to what it knows.
When more than one decision maker is involved, each participant must take into consider-
ation the possible actions of the other participants when determining his or her best choice, but
the principle of individual rationality still applies—the decision maker’s best choice is merely
constrained by the possible choices of others. Game theory, as developed by (von Neumann
and Morgenstern, 1944), is a mathematical formalization of this activity. It takes into account
all possible actions and consequences for all participants, and determines what the best actions
are for all players. Game theory provides a normative rule that tells a rational person what he
or she should do in various circumstances. A solution to a game is called a strategy, which is
a set of rules the decision maker should follow in every possible circumstance, or state, of the
game. A strategy vector is a set of strategies, one for each participant.
Equilibrium strategy vectors are the most useful. A strategy vector is said to be a Nash
equilibrium if, should any one participant change his or her own strategy, the new strategy
would not increase the level of his or her satisfaction. If all players simultaneously feel that way,
then there is no incentive for any player to change unilaterally—hence the equilibrium. There
are two important issues with equilibrium solutions. First, there may be more than one equi-
librium state, and second, the solutions, though instructive, are not constructive, in that game
theory does not tell us how to obtain an equilibrium state; it merely identifies such states. A significant
part of classical game theory literature is devoted to refinements of the theory in attempts
to address these key issues.
Recently, much attention has turned to the investigation of games in repeated-play situa-
tions. In this context, it is possible for the participants to adapt their strategies over time in
an attempt to improve their long-run performance. These are called evolutionary games, since
they invoke metaphors such as natural selection in an attempt to converge to a Nash equilibrium
or to discover some new concept that provides some form of mutual or individual benefit. At
the end of the day, however, evolutionary games are based upon exactly the same fundamen-
tal premise as is conventional game theory; namely, the premise of individual rationality—the
decision maker does the best thing for itself, regardless of the consequences to others. One
of the main distinctions between conventional game theory and evolutionary game theory is
that in the latter, preference orderings emerge through experience, while in the former they are
assumed to be known a priori, perhaps provided by some Laplacian daemon who has access
to the hearts and minds of all participants.
The individual rationality premise is perhaps the dominant concept when constructing mod-
els of multiple-agent decision making. It appears to be one of the favorite models of economics,
political science, and psychology. One of its virtues is its simplicity. It is the Occam’s razor
of interpersonal interaction, and relies only upon the minimal assumption that an individual
will put its own interests above everything and everyone else. It is understood, however, that
this model is an abstraction of reality and is not causal. That is, a participant may exhibit
behavior that is consistent with the individual rationality premise, but that does not mean that
the individual is actually computing numerical preference orderings according to some math-
ematical formula and consciously optimizing (in the single-agent case) or equilibrating (in the
multiple-agent case). The value of such models is that they provide insight into the workings
of a complex society, and can be used to explain past behavior or to predict future behavior.
In his celebrated book, The Evolution of Cooperation, (Axelrod, 1984) makes a com-
pelling argument that cooperation may emerge in repeated-play situations where, although
non-cooperation may be more satisfying in the short run, long run satisfaction is better served
by cooperating, even in situations that are conducive to exploitation. The main vehicle for
establishing this hypothesis is the famous Prisoner’s Dilemma game, which is taken by many
as a model to explain the actions of people in a wide variety of social situations, and provides
a normative basis for their behavior.
While this game may be a reasonable model in the context of human behavior, it is not
obvious that it should be applied to artificial systems that are to be designed from their inception
to cooperate. With the Prisoner’s Dilemma model, cooperation, if it occurs at all, emerges as
the participants gain experience with repeated play. Unfortunately, such cooperation is not
binding, and the participants are free to defect at any time. Thus, even if a machine were to
learn to cooperate, it would still be under no obligation to do so, and it could also learn in the
future that defection is more to its advantage, especially in an evolutionary environment where
the participants are susceptible to infection by agents with different value systems.
The Prisoner’s Dilemma game may be an appropriate model of behavior when the opportu-
nity for exploitation exists and cooperation, though possible, incurs great risk, while defection,
even though it offers diminished rewards, also protects the participant from catastrophe. Many
social situations, however, possess a strong cooperative flavor with very little incentive for
exploitation. One prototypical game that captures this behavioral environment is the Battle of
the Sexes game.1 This is a game involving a man and a woman who plan to meet in town for
a social function. She (S) prefers to go to the ballet (B), while he (H) prefers the dog races
(D). Each also prefers to be with the other, however, wherever it may be. This is a coordina-
tion game in which the players act simultaneously, and is a prototype for many coordination
scenarios, such as might occur with multiple robots trying to achieve some task, or with allo-
cating resources on an automated shop floor. The classical way to formulate this game is via a
payoff matrix, as given below in ordinal form, with the payoff pairs which compose this matrix
representing the benefits to H and S, respectively.

                   S
              D          B
H     D     (4,3)      (2,2)
      B     (1,1)      (3,4)

Key: 4 = best; 3 = next best; 2 = next worst; 1 = worst

Rather than competing, these players wish to cooperate, but they must make their decisions
without benefit of communication. Both players lose if they make different choices, but the
choices are not all of equal value to the players. This game has two Nash equilibria, (D, D)
and (B, B). Unfortunately, this observation is of little help in defining a strategy.
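
The two equilibria can be confirmed by brute-force enumeration. The following is a minimal sketch, not part of the original analysis: it considers pure strategies only, and the names PAYOFFS, ACTIONS, and pure_nash_equilibria are illustrative assumptions. Each joint choice is tested to see whether either player could gain by deviating alone.

```python
from itertools import product

# Ordinal payoffs (H, S) from the matrix above; keys are (H's choice, S's choice).
PAYOFFS = {
    ("D", "D"): (4, 3),
    ("D", "B"): (2, 2),
    ("B", "D"): (1, 1),
    ("B", "B"): (3, 4),
}
ACTIONS = ("D", "B")


def pure_nash_equilibria(payoffs, actions):
    """Return the joint choices from which neither player gains by deviating alone."""
    equilibria = []
    for h, s in product(actions, repeat=2):
        u_h, u_s = payoffs[(h, s)]
        # H varies his own choice while S's choice is held fixed, and vice versa.
        h_can_gain = any(payoffs[(alt, s)][0] > u_h for alt in actions)
        s_can_gain = any(payoffs[(h, alt)][1] > u_s for alt in actions)
        if not (h_can_gain or s_can_gain):
            equilibria.append((h, s))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, ACTIONS))  # [('D', 'D'), ('B', 'B')]
```

The enumeration identifies the equilibria but, as noted, gives the players no basis for choosing between them.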
One of the perplexing aspects of this game is that it does not pay to be altruistic, since
both participants doing so guarantees the worst outcomes for both. Nor does it pay to be
selfish—that guarantees the next worst outcome for both. The best and next-best outcomes
obtain if one participant is selfish and the other altruistic. It seems, however, that an attitude
of moderation should be helpful, but there is no fool-proof way to express this attitude without
direct communication.
An approach that does not require communication is for each player to flip a coin and
choose according to the outcome of that randomizing experiment. If they were to repeat this
game many times, then, on average, each player would realize an outcome midway between
next best and next worst for each. But, for any given trial, they would be in one of the four
states with equal probability.
If the players could communicate, then a much better strategy would be to alternate be-
tween (D, D) and (B, B), thus ensuring an average level of satisfaction midway between best
and next best for each. If they possess the ability to learn from experience, they may also
converge to an alternation scheme under repeated play.
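
The averages quoted for the coin-flip and alternation schemes can be checked with a few lines of arithmetic. This sketch treats the ordinal ranks 1 through 4 as if they were numerical payoffs, which is an assumption made only for illustration (the discussion above does the same informally); the helper name average_payoffs is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Ordinal payoffs (H, S); keys are (H's choice, S's choice).
PAYOFFS = {
    ("D", "D"): (4, 3),
    ("D", "B"): (2, 2),
    ("B", "D"): (1, 1),
    ("B", "B"): (3, 4),
}
ACTIONS = ("D", "B")


def average_payoffs(outcomes):
    """Average the (H, S) payoffs over a list of equally weighted outcomes."""
    n = len(outcomes)
    avg_h = Fraction(sum(PAYOFFS[o][0] for o in outcomes), n)
    avg_s = Fraction(sum(PAYOFFS[o][1] for o in outcomes), n)
    return avg_h, avg_s


# Independent fair coin flips make the four joint states equally likely.
coin_flip = list(product(ACTIONS, repeat=2))
# Alternation visits the two coordinated outcomes equally often.
alternation = [("D", "D"), ("B", "B")]

print(average_payoffs(coin_flip))    # 5/2 for each player
print(average_payoffs(alternation))  # 7/2 for each player
```

An average of 5/2 lies midway between next worst (2) and next best (3), and 7/2 lies midway between next best (3) and best (4), in agreement with the text.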
Regardless of the strategies that may be employed, this game, as it is configured by the
payoff matrix, illustrates the shortcomings of conventional utility theory for the characteriza-
tion of behavior when cooperation is essential. Each player’s level of satisfaction is determined
completely as a function of his or her own enjoyment. For example, the strategy vector (D, D)
is best for H, but it is because he gets his way on both counts: he goes to his favorite event
and he is with S. Her feelings, however, are not taken into consideration. According to the
setup, it would not matter to H if S were to detest dog races and were willing to put up with
that event at great sacrifice of her own enjoyment, just to be with H. Such selfish attitudes,
though not explicit, are at least implied by the structure of the payoff matrix, and are likely to
send any budding romance to the dogs. The problem is that the solution concept, based as it is
upon individual rationality, fosters competition, even though cooperation is desired.
As an illustration of a subtle type of competition that may emerge from this game, consider
the operation of a shop floor. Producer X1 can manufacture lamps or tables, and Producer X2
can manufacture lamp shades or table cloths, but each must choose which product to manufac-
ture without direct communication. Coordinated behavior would make both of their products
more marketable, as indicated below, where the net profits accruing to each producer as a function
of their joint decisions are displayed. Clearly, this is an instantiation of the Battle of the
Sexes game.
                         X2
                  Shades        Cloths
X1     Lamps     ($20, $5)     ($10, $4)
       Tables    ($8, $3)      ($15, $10)
Using these numbers, X1 might reason that, since his profit for (Lamps, Shades) is twice
the profit to X2 for (Tables, Cloths) but the incremental change in profit is the same for both,
then his preference is stronger, and should prevail. On the other hand, however, X2 might
reason that, since it is worth twice as much to her if they produce (Tables, Cloths), rather
than (Lamps, Shades) and it is only 4/3 as valuable to X1 if they produce (Lamps, Shades)
rather than (Tables, Cloths), her preference is stronger, and should prevail. Of course, if side
payments are allowed, then this could help resolve the dilemma, but that would also create a
secondary game of how to arrive at equitable side payments—which may produce an infinite
regress of dilemmas.
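
The two competing arguments rest on different comparisons of the same four numbers. The sketch below simply recomputes them from the profit table; the dictionary PROFIT and the variable names are illustrative assumptions, not notation from the original analysis.

```python
from fractions import Fraction

# Net profits (X1, X2) in dollars; keys are (X1's product, X2's product).
PROFIT = {
    ("Lamps", "Shades"): (20, 5),
    ("Lamps", "Cloths"): (10, 4),
    ("Tables", "Shades"): (8, 3),
    ("Tables", "Cloths"): (15, 10),
}

x1_ls, x2_ls = PROFIT[("Lamps", "Shades")]
x1_tc, x2_tc = PROFIT[("Tables", "Cloths")]

# X1's argument: his best profit is twice X2's best profit,
# while the increment between the two coordinated outcomes is the same for both.
x1_best_over_x2_best = Fraction(x1_ls, x2_tc)
x1_increment = x1_ls - x1_tc
x2_increment = x2_tc - x2_ls

# X2's argument: her preferred outcome doubles her own profit,
# whereas X1's preferred outcome multiplies his own profit by only 4/3.
x2_own_ratio = Fraction(x2_tc, x2_ls)
x1_own_ratio = Fraction(x1_ls, x1_tc)

print(x1_best_over_x2_best, x1_increment, x2_increment)  # 2 5 5
print(x2_own_ratio, x1_own_ratio)                        # 2 4/3
```

Both arguments are arithmetically sound; they differ only in whether profits are compared across producers or within each producer, and the payoff matrix itself offers no grounds for preferring one comparison to the other.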
Unfortunately, rational choice does not resolve such potential conflicts, although a com-
promise may emerge with repeated play. Fortunately, however, the payoff structure does not
tell the whole story. It does not capture the interrelationships that may exist between the two
producers. But interrelationships, such as deference to the other, are vital to human interaction,
and they should also be considerations for artificial social systems if they are to imitate human
behavior. These examples serve to motivate us to take a hard look at the basic premises of
decision making.

Taylor (Taylor, 1987) attempts to mitigate the effects of individual rationality with a formal
notion of altruism by transforming the game through the creation of a new game according to a
utility array whose entries account for the payoffs to others as well as to itself. He suggests that
the utility functions be expressed as a weighted average of the payoffs to itself and to others.
By adjusting the weights, the agent is able to take into consideration the needs of others. Taylor
even goes further, and suggests that each player’s utility might be a function of other players’
utilities as well as their payoffs, and shows how this potentially leads to a situation of infinite
regress, with the players trying to account for higher and higher orders of altruism.
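
A weighted-average transformation of this kind can be sketched as follows, using the shop-floor profits as the underlying payoffs. The specific form (a weight w on one's own payoff and 1 - w on the other's) and the function name altruistic_game are our assumptions for illustration; they are not taken from Taylor's text.

```python
from fractions import Fraction

# Net profits (X1, X2) in dollars; keys are (X1's product, X2's product).
PROFIT = {
    ("Lamps", "Shades"): (20, 5),
    ("Lamps", "Cloths"): (10, 4),
    ("Tables", "Shades"): (8, 3),
    ("Tables", "Cloths"): (15, 10),
}


def altruistic_game(payoffs, w1, w2):
    """Build a new game whose utilities are weighted averages of the payoffs.

    w1 (w2) is the hypothetical weight player 1 (player 2) places on its own
    payoff; the remaining weight goes to the other player's payoff.
    """
    transformed = {}
    for joint, (p1, p2) in payoffs.items():
        u1 = w1 * p1 + (1 - w1) * p2
        u2 = w2 * p2 + (1 - w2) * p1
        transformed[joint] = (u1, u2)
    return transformed


half = Fraction(1, 2)
for joint, utilities in altruistic_game(PROFIT, half, half).items():
    print(joint, utilities)
```

With weights of 1 the original game is recovered. With equal weights of 1/2 both producers receive a utility of 25/2 at (Lamps, Shades) and at (Tables, Cloths), so for these particular numbers the transformation removes the conflict of interest but leaves the coordination problem untouched.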
Taylor’s form of altruism does not distinguish between the state of actually relinquishing
one’s own self-interest and the state of being willing to relinquish one’s own self interest under
the appropriate circumstances. The unconditional relinquishment of one’s own self interests
is a condition of categorical altruism, whereby a decision maker unconditionally modifies
its preferences to accommodate the preferences of others. A purely altruistic player would
completely replace its preferences with the preferences of others. A state of being willing to
modify one’s preferences to accommodate others if the need arises is a state of situational
altruism, whereby a decision maker is willing to accommodate the preferences of others in lieu
of its own preferences if doing so would actually benefit the other, but otherwise retains its own
egoistic preferences intact and avoids needless sacrifice.
Categorical altruism may be too much to expect from a reasonable decision maker, whereas
the same decision maker may be willing to permit a form of situational altruism. It is one thing
for an individual to modify its behavior if it is sure that doing so will benefit another individual,
but it is quite another thing for an individual to modify its behavior regardless of its effect on the
other. With the Battle of the Sexes, given that S has a very strong aversion to D (even though
she would be willing to put up with those extremely unpleasant surroundings simply to be
with him and thus receive her second-best payoff), H might be willing to manifest situational
altruism by preferring B to D, but if she did not have a strong aversion to D he would stick to
his egoistic preference for D over B.
The appeal of optimization and equilibration is a strongly entrenched attitude that dominates
much of decision making theory. There is great comfort in following traditional paths,
especially when those paths are founded on such a rich and enduring tradition as rational choice
affords. The justification for the individual rationality hypothesis is generally attributed to
Bergson (Bergson, 1938) and Samuelson (Samuelson, 1948), who assert that individual inter-
ests are fundamental, and that social welfare is an aggregation of individual welfares. While
this perception may be appropriate for environments of perfect competition, the individual ra-
tionality hypothesis loses much of its power in more general settings. As expressed by Arrow,
when the assumption of perfect competition fails, “the very concept of rationality becomes
threatened, because perceptions of others and, in particular, of their rationality become part
of one’s own rationality (Arrow, 1986).” Arrow further asserts that the use of the individual
rationality hypothesis is “ritualistic, not essential.”
What is essential is that any useful model of society be ecologically balanced in that it is
able to accommodate the various relationships that exist between agents and their environment,
including other agents. Many thoughtful scholars argue that individual rationality is not an
adequate model for group behavior (for example, see (Bicchieri, 1993; Raiffa, 1968)), and
perhaps Luce and Raiffa summarized the situation most succinctly when they observed that
general game theory seems to be in part a sociological theory which does not
include any sociological assumptions . . . it may be too much to ask that any so-
ciology be derived from the single assumption of individual rationality (Luce and
Raiffa, 1957, p. 196).
Often, the most articulate advocates of a theory are also its most insightful critics. Regardless
of the merits of individual rationality as a model of human societies, society is not bound to
comply with it or any other model that purports to characterize it. All such models are simply
analysis tools, and their use is limited to that role. They cannot dictate behavior to the members
of the society.

3 Synthesis
In the absence of truly nonmetaphorical artificial intelligence, any artificial society that is not
completely whimsical, random, or anarchic must be designed according to an externally im-
posed sociological model that accounts for inter-agent relationships. In contrast to natural
societies, where models are used for analysis purposes, models in an artificial society are used
for synthesis. The analyst is free to concoct any story-line that is not contradicted by the ob-
servations, and then to abstract this story-line into a game that captures the essence of the
interaction without the encumbrance of irrelevant details. In the synthesis context, however,
the story-line is critical. The participants must actually live the story as they function in their
environment. Thus, not only will the sociological models explain and predict behavior, they
will actually dictate behavior.
Most artificial social system designs that appear in the literature are based on exactly the
same premise that governs models that are used to analyze natural systems—individual ratio-
nality. But Arrow’s impossibility theorem demonstrates that group preferences cannot always
be made to be consistent with individual preferences if reasonable assumptions regarding the
structure of the decision problem are made (Arrow, 1951). Despite the inability to guarantee
consistency between individual and group preferences, there are a number of approaches to
group decision making, based on the premise of individual rationality, that are seriously con-
sidered. One approach is to view the group itself as a higher-level decision making entity, or
superplayer, who is able to fashion a group expected utility to be maximized. This “group-
Bayesian” approach, however, fails to distinguish between the notion of group choices and
group preferences. While it is one thing to say that a group decides upon a group alternative,
it is quite another thing to assume that the group itself prefers the alternative in any meaning-
ful sense. Shubik refers to this conflation of concepts as an “anthropomorphic trap (Shubik,
1982)” and strongly suggests that it be avoided.
Another classical approach to group decision making is to invoke the Pareto principle,
which is to choose a group decision such that, if any individual were to make a unilateral
change to increase its level of satisfaction, the satisfaction level for at least one other member of
the group would be decreased. The obvious problem with this approach is that all participants
must agree to abide by the Pareto principle. It seems that, for this principle to be viable,
some notion of group preference must enter through the back door, if only to agree to be loyal
Paretians, not to mention the problem of deciding which Pareto choice to select if there is
not a unique one. Nevertheless, Pareto optimality is viewed by many as a rational means of
group decision making. But there is still some disquiet, as Raiffa confesses: “Not too long ago
this principle [Pareto optimality] seemed to me unassailable, the only solid cornerstone in an
otherwise swampy area. I am not so sure now, and I find myself in that uncomfortable position
in which the more I think the more confused I become . . . somehow the group entity is more
than the totality of its members (Raiffa, 1968, p. 233, 237).”
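
The non-uniqueness problem is easy to exhibit with the shop-floor game of Section 2. The sketch below uses the common textbook formulation of Pareto optimality (a joint choice is Pareto optimal if no other joint choice leaves every member at least as well off and some member strictly better off); that formulation and the function name pareto_optimal are assumptions made for illustration.

```python
# Net profits (X1, X2) in dollars; keys are (X1's product, X2's product).
PROFIT = {
    ("Lamps", "Shades"): (20, 5),
    ("Lamps", "Cloths"): (10, 4),
    ("Tables", "Shades"): (8, 3),
    ("Tables", "Cloths"): (15, 10),
}


def pareto_optimal(payoffs):
    """Return the joint choices that are not dominated by any other joint choice."""

    def dominates(a, b):
        # a dominates b if nobody is worse off under a and somebody is better off.
        no_one_worse = all(x >= y for x, y in zip(a, b))
        someone_better = any(x > y for x, y in zip(a, b))
        return no_one_worse and someone_better

    return [
        joint
        for joint, u in payoffs.items()
        if not any(dominates(v, u) for v in payoffs.values())
    ]


print(pareto_optimal(PROFIT))  # [('Lamps', 'Shades'), ('Tables', 'Cloths')]
```

Two joint choices survive, and the Pareto principle by itself says nothing about which of them the group should adopt.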
Perhaps the source of the concerns raised by Shubik and Raiffa is a critical flaw in the
doctrine of individual rationality, which flaw becomes a serious issue when designing a system
whose charter is cooperation. The problem is that it is not possible to accommodate both group
and individual interests simultaneously and still strictly adhere to the notion of rationality as
doing the best thing possible.
Much research has been devoted to ways, within the context of game theory, to overcome
the limitations of strict individual rationality (but without abandoning it) when dealing with
group-wide decision problems. (Schelling, 1960) advocates what he calls a “reorientation”
of game theory, wherewith he attempts to counterbalance the notion of conflict with a game-
theoretic notion of coordination. The antipodal extreme to a game of pure conflict is a game
of pure coordination, where all players have coincident interests and desire to take actions that
are simultaneously of self and mutual benefit. In contrast to pure conflict games where one
player wins only if the others lose, with a pure coordination game, either all players win or
all lose. In general, games may involve both conflict and coordination; Schelling terms such
games mixed-motive games. Although Schelling’s attempt at reorientation is useful, it is not a
fully adequate way to mitigate the notion of conflict, since his reorientation does not alter the
fundamental structure of game theory as ultimately dependent on the principle of individual
rationality. Lewis attempts to patch up the problem by introducing the notion of coordination
equilibrium as a refinement of Nash equilibrium. Whereas a Nash equilibrium is a combination
in which no one would be better off had he alone acted otherwise, Lewis strengthens this
concept and defines a coordination equilibrium as a combination in which no one decision
maker would have been better off had any one decision maker alone acted otherwise, either
himself or someone else (Lewis, 1969, p. 14). Coordination equilibria, however, are common
in situations of mixed opposition and coincidence of interest. In fact, even a zero-sum game of
pure conflict can have a coordination equilibrium.
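
The difference between the two equilibrium concepts is a difference in whose payoff is examined after a unilateral deviation. The following sketch applies the definition quoted above to the Battle of the Sexes payoffs of Section 2; the function name is_coordination_equilibrium is a hypothetical helper, and only pure strategies are considered.

```python
from itertools import product

# Ordinal payoffs (H, S); keys are (H's choice, S's choice).
PAYOFFS = {
    ("D", "D"): (4, 3),
    ("D", "B"): (2, 2),
    ("B", "D"): (1, 1),
    ("B", "B"): (3, 4),
}
ACTIONS = ("D", "B")


def is_coordination_equilibrium(joint, payoffs, actions):
    """True if no single player's deviation would leave anyone better off."""
    base = payoffs[joint]
    for player in range(len(joint)):
        for alt in actions:
            if alt == joint[player]:
                continue
            deviated = list(joint)
            deviated[player] = alt
            after = payoffs[tuple(deviated)]
            # A Nash test would compare only the deviator's payoff;
            # here every player's payoff is compared.
            if any(a > b for a, b in zip(after, base)):
                return False
    return True


for joint in product(ACTIONS, repeat=2):
    print(joint, is_coordination_equilibrium(joint, PAYOFFS, ACTIONS))
```

For this game the coordination equilibria coincide with the two Nash equilibria, (D, D) and (B, B), so the refinement does not reduce the multiplicity of solutions.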
There are many other attempts to overcome the problems that arise with the individual
rationality hypothesis. For example, (Shapley, 1953) suggests that players in an N-person
game should formulate a measure of how much their joining a coalition contributes to its
value, and should use this metric to justify their decision to join or not to join. This formulation,
however, requires the acceptance of additional axioms involving the play of composite games
(Rapoport, 1970). Another way to extend the notion of optimality is to form coalitions on
the basis of no player having a justifiable objection against any other member of the coalition,
resulting in what is called the bargaining set.
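
The measure of coalition contribution attributed to Shapley is commonly computed as a player's marginal contribution averaged over all orders in which the coalition might form. The sketch below follows that standard construction; the three-player majority game used as input is a hypothetical example chosen for illustration and does not come from the sources cited above.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Average each player's marginal contribution over all joining orders.

    `value` maps a frozenset of players to the worth of that coalition.
    """
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(joined) - value(coalition)
            coalition = joined
    return {p: total / len(orderings) for p, total in totals.items()}


def majority(coalition):
    """Hypothetical game: a coalition is worth 1 if it has at least two members."""
    return 1 if len(coalition) >= 2 else 0


print(shapley_values(("a", "b", "c"), majority))  # each player receives 1/3
```

Enumerating all orderings is practical only for small N, since the number of orderings grows factorially with the number of players.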
Yet another approach to the problem is to modify the decision maker’s utility function so
it is a function of the group reward as well as its individual reward. In effect, the participant
is “brainwashed” into substituting group interests for its personal interests. Then, when acting
according to its own supposed self-interest, it is actually accommodating the group (Wolpert
and Tumer, 2001). A somewhat similar, although less radical, approach is taken by (Glass
and Grosz, 2000) and (Cooper et al., 1996), who attempt to instill a social consciousness into
agents by rewarding them for good social behavior by adjusting their utility functions with
“brownie points” and “warm glow” utilities for doing the “right thing.” (Sen, 1996) introduces
a probabilistic reciprocity mechanism that elicits cooperation between self-interested agents.
Also, it is certainly possible to invoke various voting or auctioning protocols to address this
problem (for example, see (Sandholm, 1999)).
Granted, it is possible under the individual rationality regime for a decision maker to suppress
its own egoistic preferences in deference to others, but doing so is only a device to trick
individual rationality into providing a response that can be interpreted as unselfish. Such an
artifice provides only an indirect way to simulate socially useful attributes of cooperation, un-
selfishness, and altruism with a model that is more naturally attuned to competition, exploita-
tion, and avarice. All of the approaches discussed above, however, are based, at the end of the
day, upon the premise of individual rationality, and serve as refinements or extensions of that
fundamental premise.2 Rapoport summarizes the situation succinctly:
If . . . we wish to construct a normative theory, i.e., be in a position of advising
“rational players” on what the outcomes of a game “ought” to be, we see that
we cannot do this without further assumptions about what explicitly we mean by
“rationality” (Rapoport, 1970, p. 136).
It seems that the vast majority of the decision-theoretic community is still committed to the
fundamental concept of individual rationality. Yet, researchers understand that achieving the
ideal is often impractical. Proponents of the so-called bounded rationality approach to decision
making recognize that, in the real world, strict optimization/equilibration often cannot be
achieved, due to either computational or informational constraints. (Sims, 1980) characterizes
this problem as a research “wilderness of disequilibrium economics” with no general theory
to guide the search for alternatives to equilibrium-based solutions. (Kreps, 1990) suggests
that bounded rationality should be defined as “intendedly [individually] rational, but limitedly
so,” and presents the case for short-term optimization and long-term adaptation through ret-
rospection of past results. (Sargent, 1993) offers that this might be done via the methods of
artificial intelligence such as neural computing, genetic algorithms, simulated annealing, and
pattern recognition, with the expectation that artificial boundedly rational agents will learn to
behave as if they were operating under conventional rational choice and at least achieve an “ap-
proximate” equilibrium. Machine learning researchers have had some success in uncovering
such processes (Bowling, 2000; Kalai and Lehrer, 1993; Fudenberg and Levine, 1993; Hu and
Wellman, 1998). Despite the successes of bounded rationality, however, it still holds optimiza-
tion as the ideal to be sought, and accepts a compromise only as an unavoidable consequence
of real-world constraints.

Although much research has been concentrated on ways to deal with the exigencies of
practical decision making, it is perhaps somewhat surprising that these exigencies have not also
prompted a re-examination of the most fundamental premise of all—individual rationality. At
present, there does not appear to be a body of theory that supports the systematic synthesis of
multiple-agent decision systems that does not rely upon the individual rationality premise, and
any relaxation of the demand for the best seems to place us even further into Sims’ wilderness.
But there is always the possibility that somewhere in the wilderness is a promised land.
4 Dichotomies
Let us begin our search at the headwaters of rationality. It is a platitude that decision makers
should make the best choices possible, but we cannot rationally choose an option, even if we
do not know of anything better, unless we know that it is good enough. Being good enough is
the fundamental desideratum of rational decision makers—being best is a bonus. Therefore,
let us replace the demand for the best, and only the best, with a desideratum of being good
enough. To be useful, we must, of course, precisely define what it means to be good enough.
Mathematically formalizing a concept of being good enough is not as straightforward as equi-
librating. Being best is an absolute concept—it does not come in degrees. Being good enough,
however, is not an absolute, and does come in degrees. Consequently, we must not demand or
expect a unique good-enough solution, but instead must be willing to accept multiple solutions
with varying degrees of adequacy.
Optimization is a very sophisticated concept, and requires that the decision maker be able to
accomplish three tasks. The first is to define all of its relevant options, the second is to rank-
order the options according to its preferences for the consequences of implementing them, and
the third is to search through the set of choices to identify the one that is the most preferable.
The starting point for much of conventional decision theory is that the first two tasks are
assumed to be given, and most attention is concentrated on ways to conduct the search. Thus,
much of what is usually termed “decision theory” might more properly be termed “search
theory.” In point of fact, once a decision maker has defined a rank-ordering and has determined

to optimize, it has made its decision—it will implement the highest-ranking option, and all that
remains is to find it.
Rank ordering requires that all options be evaluated according to their expected utility. Al-
though conventional utility theory maps options into numerical degrees of satisfaction, utilities
must also account for any dissatisfaction that may accrue to the decision maker. For example,
owning a luxury vehicle may be very satisfying, hence have high utility, but the cost of doing
so may be very dissatisfying, hence would have high dis-utility. Conventional utility theory
aggregates these two polarizing aspects into a single function that characterizes the net utility
of the option.
As a practical matter, however, people often separate these two aspects. Attached to virtu-
ally every nontrivial option are attributes that are desirable and attributes that are not desirable.
To increase performance, one usually expects to pay more. To win a larger reward, one expects
to take a greater risk. People are naturally wont to evaluate the upside versus the downside, the
pros versus the cons, the pluses versus the minuses, the benefits versus the costs. Operating this
way is almost intuitive; it may be the nearest thing to instinct. Perhaps the most fundamental
human decision-making activity is embodied in the process of evaluating tradeoffs, option by
option—putting the gains and the losses on the balance to see which way it tips. The result of
evaluating dichotomies in this way is that the benefits must be at least as great as the costs. In
this sense, such evaluations provide an intuitive notion of being good enough.
Suppose two hamburgers are placed before you and you are to make a choice. One way to
frame the question before you is to compare the two hamburgers to each other and select
the better product (presumably on the basis of appearance and cost). Another way to frame the
question is first to confine attention to, say, Hamburger A, and form the decision either to accept
it or to reject it, and then form a similar decision for Hamburger B. The difference is that the
first way to frame the question is extrinsic to Hamburger A, that is, it involves considerations
of issues in addition to that item (namely, its comparison with Hamburger B), while the second
framing is intrinsic, in that it involves consideration only of that particular hamburger. With
the extrinsic framing, one must combine appearance and cost into a single utility to be rank-ordered,
but under the intrinsic framing, one may keep appearance and cost separate and form
the binary evaluation of appearance versus cost. If only one of the hamburgers passes such a
test, the problem is resolved. If you conclude that neither hamburger’s appearance is worthy
of the cost, you are justified in rejecting them both. If you think both are worthy, you may
choose them both (assuming that your budget and appetite are both adequate). Suppose that
Hamburger A costs more than Hamburger B, but is also much larger and has more trimmings.
If you view both as being worth the price, but you only want one, then whatever your final
choice, you can’t make a bad decision. You at least get a good hamburger—you get your
money’s worth.
Let us take the notion of getting at least what you pay for as an operational definition of
being good enough. This definition circumvents the logical problem that occurs with other
concepts of being good enough, such as meeting minimum requirements. Simon has appropri-
ated the term satisficing to mean “good enough” and advocates the construction of “aspiration
levels” of how good a solution might reasonably be achieved, and halting the search when these
expectations are met (Simon, 1955). Although aspiration levels at least superficially establish
minimum requirements, this approach relies primarily upon experience-derived expectations.
If the aspiration is too low, performance may needlessly be sacrificed, and if it is too high, there
may be no solution. It is difficult to establish a good and practically attainable aspiration level
without first exploring the limits of what is possible, that is, without first identifying optimal
solutions—the very activity that satisficing is intended to circumvent. For single-agent low-
dimensional problems, specifying the aspirations may be noncontroversial. But, with multiple-
agent systems, interdependence between decision makers can be complex and aspiration levels
can be conditional (what is satisfactory for me may depend upon what is satisfactory for you).
The current state of affairs regarding aspiration levels does not appear to address completely
the problem of specifying minimum requirements in multiple-agent contexts. Furthermore, this
latter definition of being good enough actually begs the question, because being good enough
requires us to define minimum requirements (aspirations), which immediately plunges us into
semantic nonsense, because the criterion for defining minimum requirements can only be that

they are good enough.
Let us retain the “satisficing” terminology3 because our usage is consistent with the issue
that motivated Simon’s original usage—to identify options that are good enough in the sense
of comparing attributes of the options to a standard. Our usage differs only in the standard
used for comparison. Whereas Simon’s approach is extrinsic and compares attributes to exter-
nally supplied aspiration levels, our usage is intrinsic and compares the positive and negative
attributes of each option.
Definition 1 A choice is satisficingly rational if the expected gains achieved by making it
equal or exceed the expected losses, provided the gains and losses can be expressed in
commensurable units. □
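Definition 1 can be read as an option-by-option test. The following Python sketch applies it to the hamburger example. The names `Option` and `is_satisficing`, the numeric valuations, and the third hamburger C are illustrative assumptions and are not taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    expected_gain: float  # benefit, in the same (commensurable) units as the loss
    expected_loss: float  # cost


def is_satisficing(option: Option) -> bool:
    """Intrinsic test of Definition 1: gains must equal or exceed losses."""
    return option.expected_gain >= option.expected_loss


# Hypothetical valuations: A is larger and dearer, B is smaller and cheaper,
# and C (not in the text) is added to show an option that fails the test.
hamburgers = [
    Option("A", expected_gain=7.0, expected_loss=6.0),
    Option("B", expected_gain=4.0, expected_loss=3.5),
    Option("C", expected_gain=2.0, expected_loss=5.0),
]

good_enough = [o.name for o in hamburgers if is_satisficing(o)]
print(good_enough)
```

Each option is judged only against its own costs, so the result is a set of acceptable options rather than a single winner.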
Getting at least what you pay for may be a natural concept for people but, surprisingly,
the notion has received little attention in the literature on the design of artificial systems.
The optimality paradigm is usually taken for granted as the standard against which all ap-
proaches should be measured, and there simply has not been a perceived need to explore other
paradigms. Nevertheless, if people are wont to make decisions by evaluating dichotomies, then
it is reasonable to investigate ways that machines may incorporate this same paradigm.
One of the benefits of replacing strict optimality as the ideal with the softer stance of
being good enough is to move closer to conforming to Arrow’s observation that other agents’
rationality is part of one’s own rationality. A willingness to be content with a choice that
is adequate, if not optimal, opens the way to expanding one’s sphere of interest to permit the
accommodation of others, thereby paving the way for the development of a notion of rationality
that is able to account for the interests of others as well as for one’s self interest. The key
enabling concept for this to happen is the notion of conditional preferences.
Our appreciation of conditional preferences may be strengthened by a brief summary of
conventional utility theory as it is employed in mathematical games. Utility theory was devel-
oped as a mathematical way to encode individual preference orderings. It is built on a set of
axioms that describe how a “rational man” would express his preference between two alter-

natives in a consistent way. An expected utility function is a mathematical expression that is
consistent with the preferences, conforms to the axioms, and orders the individual’s preferences
for the various options. Since, in a game-theoretic context, an individual’s preferences are gen-
erally dependent upon the actions of others, an individual’s expected utility function must be
a function of not only the individual’s own options, but of the options of all other individuals.
The individual may then compute the expected utility of its possible options conditioned on the
actions of other players. These expected utilities may then be juxtaposed into a payoff array,
and an equilibrium strategy may be identified.
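To make the conventional construction concrete, the following Python sketch builds a payoff array for a two-player game and identifies its pure-strategy equilibria. The payoff numbers are hypothetical, chosen only to give the game a Battle-of-the-Sexes shape, and the function name `is_pure_nash` is ours.

```python
from itertools import product

ACTIONS = ["D", "B"]

# Hypothetical payoffs (H's utility, S's utility), indexed by
# (H's action, S's action). Each entry depends on actions only.
PAYOFF = {
    ("D", "D"): (2, 1),
    ("D", "B"): (0, 0),
    ("B", "D"): (0, 0),
    ("B", "B"): (1, 2),
}


def is_pure_nash(a_h, a_s):
    """True if neither player gains by unilaterally switching actions."""
    u_h, u_s = PAYOFF[(a_h, a_s)]
    h_ok = all(PAYOFF[(alt, a_s)][0] <= u_h for alt in ACTIONS)
    s_ok = all(PAYOFF[(a_h, alt)][1] <= u_s for alt in ACTIONS)
    return h_ok and s_ok


equilibria = [pair for pair in product(ACTIONS, ACTIONS) if is_pure_nash(*pair)]
print(equilibria)
```

Nothing in `PAYOFF` records how one player feels about the other's preferences; the array is indexed by joint actions alone.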
The important thing to note about this structure is that it is not until the expected utilities
are juxtaposed into an array that the actual “game” aspects of the situation emerge. It is the
juxtaposition that reveals possibilities for conflict and coordination. These possibilities are not
explicitly reflected in the individual expected utility functions. In other words, although the
individual’s expected utility is a function of other players’ strategies, it is not a function of
other players’ preferences. This structure is completely consistent with exclusive self-interest,
where all a player cares about is its personal benefit as a function of its own and other players’
strategies, without any regard for the benefit to the others. Under this paradigm, the only
way the preferences of others factor into an individual’s decision-making deliberations is to
constrain behavior to limit the amount of damage they can do. This situation obtains if the
game is a game of pure competition (such as a zero-sum game), a game of mixed motives, or
even a game of pure coordination.
In societies that value cooperation, it is unlikely that the preferences of a given individ-
ual will be formed independently of the preferences of others. Knowledge about one agent’s
preferences may alter another agent’s preferences. Such preferences are conditioned on the
preferences of others. In contrast to conditioning only on the actions of other participants,
conditioning on the preferences of others permits a decision maker to adjust its preferences to
accommodate the preferences of others. It can bestow either deference or disfavor to others
according to their preferences as they relate to its own preferences. With the Battle of the
Sexes game, for example, it would not be unreasonable, say, for H to take into consideration

S’s attitude about D. Since traditional utility theory is a function of participant options, rather
than participant preferences, it cannot be used to express such relationships.
5 Satisficing Games
While the statement “What is best for me and what is best for you is also jointly best for us to-
gether” may be nonsense, the statement “What is good enough for me and what is good enough
for you is also jointly good enough for us together” may be perfectly sensible, especially when
we do not have inflexible notions of what it means to be good enough, and if decision makers
are able to accommodate the interests of others as well as of themselves. To generate a useful
theory, however, we must be able to define, in precise mathematical terms, what it means to
be good enough, we must develop a theory of decision making that is compatible with this
notion, and we must establish that it permits the accommodation of both group and individual
interests.
An alternative to von Neumann-Morgenstern N -player game theory is a new approach to
multiple-agent decision making called satisficing games (Stirling and Goodrich, 1999b; Stir-
ling and Goodrich, 1999a; Goodrich et al., 2000). Let X1 , . . . , XN be a society of decision
makers, and let Ui be the set of options available to Xi , i = 1, . . . , N . The joint action set is
the product set U = U1 × · · · × UN , and we denote elements of this set as u = {u1 , . . . , uN },
where ui ∈ Ui . Rather than defining a game in terms of a payoff array of expected utilities,
as is done with conventional normal-form games, a satisficing game incorporates the same in-
formation that is used to define expected utilities to form an interdependence function, denoted
pS1 ···SN R1 ···RN : U × U → [0, 1], which encodes all of the positive and negative interrela-
tionships between the members of the society. The interdependence function is a multivariate
mass function, that is, it is non-negative and normalized to unity. In this respect it is similar
to a probability mass function, but does not characterize uncertainty or randomness. Rather,
it jointly characterizes the selectability, that is, the degree to which the options achieve the
goals of the decision maker and the rejectability, or the degree to which the options consume
resources (money, fuel, exposure to hazard, or other costs). From this function we may derive

two functions, called joint selectability and joint rejectability functions, denoted pS1 ···SN and
pR1 ···RN , respectively, according to the formulas

\[ p_{S_1 \cdots S_N}(u) = \sum_{v \in U} p_{S_1 \cdots S_N R_1 \cdots R_N}(u, v) \tag{1} \]
\[ p_{R_1 \cdots R_N}(v) = \sum_{u \in U} p_{S_1 \cdots S_N R_1 \cdots R_N}(u, v) \tag{2} \]
for all (u, v) ∈ U × U. These functions are also multivariate mass functions. The joint
selectability function characterizes the degree to which a joint option leads to success for the
group, in the sense of achieving the goal of the decision problem, and the joint rejectability
function characterizes the cost, or degree of resource consumption, that is associated with the
joint option. The two functions are compared for each possible joint outcome, and the set of
joint outcomes for which joint selectability is at least as great as joint rejectability form the
jointly satisficing set.
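Equations (1) and (2) are ordinary marginalizations of a mass function. A minimal Python sketch, assuming two agents with two options each and an arbitrary (hypothetical) interdependence function stored as a dictionary, might look as follows.

```python
from itertools import product

# Two agents with two options each (hypothetical labels).
U1 = ["a", "b"]
U2 = ["c", "d"]
U = list(product(U1, U2))  # joint action set U = U1 x U2

# Hypothetical interdependence function on U x U: arbitrary positive
# weights 1, 2, ..., 16, normalized so that the total mass is one.
raw = {(u, v): 1.0 + i for i, (u, v) in enumerate(product(U, U))}
total = sum(raw.values())
p_SR = {uv: w / total for uv, w in raw.items()}

# Equation (1): sum over the rejectability argument v.
p_S = {u: sum(p_SR[(u, v)] for v in U) for u in U}
# Equation (2): sum over the selectability argument u.
p_R = {v: sum(p_SR[(u, v)] for u in U) for v in U}

print(sum(p_S.values()), sum(p_R.values()))
```

Both derived functions again sum to one, which is what allows them to be compared option by option.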
Definition 2 A satisficing game is a triple {U, pS1 ···SN , pR1 ···RN }. The joint solution to a
satisficing game is the set
Σq = {u ∈ U: pS1 ···SN (u) ≥ qpR1 ···RN (u)}, (3)
where q is the index of caution, and parameterizes the degree to which the decision maker is
willing to accommodate increased costs to achieve success. An equivalent way of viewing this
parameter is as an index of boldness, characterizing the degree to which the decision maker is
willing to risk rejecting successful options in the interest of conserving resources. Nominally,
q = 1, which attributes equal weight to success and resource conservation interests. Σq is
termed the joint satisficing set, and elements of Σq are jointly satisficing actions. □
The jointly satisficing set provides a formal definition of what it means to be good enough
for the group; namely, a joint option is good enough if the joint selectability is greater than or
equal to the index of caution times the joint rejectability.
Definition 3 A decision-making group is jointly satisficingly rational if the members of the
group choose a vector of options for which joint selectability is greater than or equal to the
index of caution times joint rejectability. □
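The test in (3) is a pointwise comparison of two mass functions. The sketch below, with hypothetical mass values and a function name of our own choosing, shows how the set grows as the index of caution q is lowered.

```python
def jointly_satisficing(p_S, p_R, q=1.0):
    """Equation (3): joint options u with p_S(u) >= q * p_R(u)."""
    return {u for u in p_S if p_S[u] >= q * p_R[u]}


# Hypothetical joint selectability and rejectability over four joint options.
p_S = {("a", "c"): 0.40, ("a", "d"): 0.10, ("b", "c"): 0.20, ("b", "d"): 0.30}
p_R = {("a", "c"): 0.25, ("a", "d"): 0.25, ("b", "c"): 0.25, ("b", "d"): 0.25}

sigma_1 = jointly_satisficing(p_S, p_R, q=1.0)
sigma_half = jointly_satisficing(p_S, p_R, q=0.5)
print(sigma_1)
print(sigma_half)
```

Because both functions sum to one, at q = 1 at least one joint option always passes the test.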

The marginal selectability and marginal rejectability mass functions for each Xi may be
obtained by summing the joint selectability and joint rejectability over the options of all other
participants, yielding:
\[ p_{S_i}(u_i) = \sum_{\substack{u_j \in U_j \\ j \neq i}} p_{S_1 \cdots S_N}(u_1, \ldots, u_N) \tag{4} \]
\[ p_{R_i}(u_i) = \sum_{\substack{u_j \in U_j \\ j \neq i}} p_{R_1 \cdots R_N}(u_1, \ldots, u_N). \tag{5} \]
Definition 4 The individually satisficing solutions to the satisficing game {U, pS1 ···SN , pR1 ···RN }
are the sets
\[ \Sigma_q^i = \{ u_i \in U_i : p_{S_i}(u_i) \ge q\, p_{R_i}(u_i) \}. \tag{6} \]
The product of the individually satisficing sets is the satisficing rectangle:
\[ R_q = \Sigma_q^1 \times \cdots \times \Sigma_q^N = \{ (u_1, \ldots, u_N) : u_i \in \Sigma_q^i \}. \tag{7} \]
Definition 5 A decision maker is individually satisficingly rational if it chooses an option
for which the marginal selectability is greater than or equal to the index of caution times the
marginal rejectability. □
The individually satisficing sets identify the options that are good enough for the individ-
uals; namely, the options such that the marginal selectability is greater than or equal to the
index of caution times marginal rejectability. It remains, however, to reconcile, if possible, the
individual choices with the group choices. To do this, we need to establish the relationship
between the jointly satisficing set and the satisficing rectangle.
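The individual-level quantities can be computed mechanically from the joint functions. The following Python sketch uses hypothetical mass values and helper names of our own (`marginal`, `individually_satisficing`) to form the marginals, the individually satisficing sets, and the satisficing rectangle for two agents.

```python
from itertools import product


def marginal(p_joint, i):
    """Sum a joint mass function over the options of all agents except i."""
    out = {}
    for u, mass in p_joint.items():
        out[u[i]] = out.get(u[i], 0.0) + mass
    return out


def individually_satisficing(p_Si, p_Ri, q=1.0):
    """Options of one agent whose marginal selectability is at least
    q times the marginal rejectability."""
    return {ui for ui in p_Si if p_Si[ui] >= q * p_Ri[ui]}


# Hypothetical joint selectability and rejectability for two agents.
p_S = {("a", "c"): 0.40, ("a", "d"): 0.10, ("b", "c"): 0.20, ("b", "d"): 0.30}
p_R = {("a", "c"): 0.10, ("a", "d"): 0.30, ("b", "c"): 0.20, ("b", "d"): 0.40}

sigma = [
    individually_satisficing(marginal(p_S, i), marginal(p_R, i), q=1.0)
    for i in range(2)
]
# The satisficing rectangle is the Cartesian product of the individual sets.
rectangle = set(product(*sigma))
print(sigma, rectangle)
```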
Definition 6 A compromise at q is a jointly satisficing solution such that each element is
individually satisficing. We denote this set
Cq = Σq ∩ Rq . (8)

A compromise at q exists if Cq ≠ ∅; otherwise an impasse at q occurs. □
The following theorem expresses the relationship between the individual and jointly satis-
ficing sets.
Theorem 1 If ui is individually satisficing for Xi , that is, if ui ∈ Σiq , then it must be the ith
element of some jointly satisficing vector u ∈ Σq .
Proof This theorem is proven by establishing the contrapositive, namely, that if ui is not
the ith element of any u ∈ Σq , then $u_i \notin \Sigma_q^i$. Without loss of generality, let i = 1. By
hypothesis, pS1 ···SN (u1 , v) < q pR1 ···RN (u1 , v) for all v ∈ U2 × · · · × UN , so
\[ p_{S_1}(u_1) = \sum_{v} p_{S_1 \cdots S_N}(u_1, v) < q \sum_{v} p_{R_1 \cdots R_N}(u_1, v) = q\, p_{R_1}(u_1), \]
hence $u_1 \notin \Sigma_q^1$. □
The content of this theorem is that no one is ever completely frozen out of a deal—every
decision maker has a seat at the negotiating table. This is perhaps the weakest condition under
which negotiations are possible. Perhaps the simplest way to negotiate is to lower one’s
standards in a controlled way.
Corollary 1 There exists an index of caution value q0 ∈ [0, 1] such that a compromise exists
at q0 .
The proof of this corollary is trivial and is omitted. If the players are each willing to lower
their standards sufficiently by decreasing the level of caution, q, they may eventually reach a
compromise solution that is both individually and jointly satisficingly rational. The parameter
q0 is a measure of how much they must be willing to compromise to avoid an impasse. Note
that willingness to lower one’s standards is not total capitulation, since the participants are able
to control the degree of compromise by setting a limit on how small a value of q they can
tolerate. Thus, a controlled amount of altruism is possible with this formulation. But, if any
player’s limit is reached without a mutual agreement being obtained, the game has reached an
impasse.
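The negotiation described above can be sketched as a search over q. The Python code below is only an illustration under stated assumptions: the mass values are hypothetical, the helper names are ours, and the uniform grid over [0, 1] is one simple way to lower q in a controlled manner, not a procedure prescribed by the theory.

```python
from itertools import product


def marginal(p_joint, i):
    """Sum a joint mass function over the options of all agents except i."""
    out = {}
    for u, mass in p_joint.items():
        out[u[i]] = out.get(u[i], 0.0) + mass
    return out


def compromise(p_S, p_R, q):
    """Equation (8): jointly satisficing options that lie in the rectangle."""
    n = len(next(iter(p_S)))
    joint = {u for u in p_S if p_S[u] >= q * p_R[u]}
    sides = []
    for i in range(n):
        s_i, r_i = marginal(p_S, i), marginal(p_R, i)
        sides.append({ui for ui in s_i if s_i[ui] >= q * r_i[ui]})
    return joint & set(product(*sides))


def largest_compromise_q(p_S, p_R, steps=100):
    """Lower q from 1 toward 0 on a grid until a compromise appears."""
    for k in range(steps, -1, -1):
        q = k / steps
        found = compromise(p_S, p_R, q)
        if found:
            return q, found
    raise AssertionError("unreachable: at q = 0 every option is satisficing")


# Hypothetical game with an impasse at q = 1.
p_S = {("a", "c"): 0.1, ("a", "d"): 0.4, ("b", "c"): 0.4, ("b", "d"): 0.1}
p_R = {("a", "c"): 0.2, ("a", "d"): 0.2, ("b", "c"): 0.2, ("b", "d"): 0.4}

at_unity = compromise(p_S, p_R, 1.0)
q0, agreed = largest_compromise_q(p_S, p_R)
print(at_unity, q0, agreed)
```

A player who refuses to go below some floor on q would simply stop the loop early, which corresponds to declaring an impasse.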

6 Conditioning
The interdependence function provides a complete description of the interrelationships that
may exist between participants in a multiple-agent decision problem. This function, however,
can be rather complex but, fortunately, its structure as a mass function permits its decomposi-
tion into constituent parts according to the law of compound probability, or chain rule (Eisen,
1969).4 Applying the formalism (but not the usual probabilistic semantics) of the chain rule,
we may express the interdependence function as a product of conditional selectability and re-
jectability functions. To illustrate, consider a two-agent satisficing game involving decision
makers X1 and X2 , with option sets U1 and U2 , respectively. The interdependence function
may be factored in several ways, for example, we may write
\[ p_{S_1 S_2 R_1 R_2}(x, y, z, w) = p_{S_1 \mid S_2 R_1 R_2}(x \mid y, z, w) \cdot p_{S_2 \mid R_1 R_2}(y \mid z, w) \cdot p_{R_1 \mid R_2}(z \mid w) \cdot p_{R_2}(w). \tag{9} \]
We interpret this expression as follows. pS1 |S2 R1 R2 (x|y, z, w) is the conditional selectability
mass associated with X1 taking action x, given that X2 places all of its selectability mass on
y, X1 places all of its rejectability mass on z, and X2 places all of its rejectability mass on
w. Similarly, pS2 |R1 R2 (y|z, w) is the conditional selectability mass associated with X2 taking
action y, given that X1 and X2 place all of their rejectability masses on z and w, respectively.
Continuing, pR1 |R2 (z|w) is the conditional rejectability mass associated with X1 rejecting z,
given that X2 places all of its rejectability mass on w. Finally, pR2 (w) is the unconditional
rejectability mass associated with X2 rejecting option w.
Many such factorizations are possible, but the appropriate factorization must be determined
by the context of the problem. These conditional mass functions are mathematical instantia-
tions of production rules. For example, we may interpret pR1 |R2 (z|w) as the rule: If X2 rejects
w, then X1 feels pR1 |R2 (z|w) strongly about rejecting z. In this sense, they express local behavior,
and such behavior is often much easier to express than global behavior. Furthermore, this
structure permits irrelevant interrelationships to be eliminated. Typically, there will be some
close relationships between some subgroups of agents, while other subgroups of agents will function
essentially independently of each other. For example, suppose that X1 ’s selectability has

nothing to do with X2 ’s rejectability. Then we may simplify pS1 |S2 R1 R2 (x|y, z, w) to become
pS1 |S2 R1 (x|y, z).
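As an illustration of how local rules assemble into a global description, the following Python sketch builds a two-agent interdependence function from the factorization in (9). All table entries and option labels are hypothetical, and the particular independence assumptions are chosen only to show how irrelevant arguments drop out.

```python
from itertools import product

U1 = ["x0", "x1"]  # options of X1 (hypothetical labels)
U2 = ["y0", "y1"]  # options of X2 (hypothetical labels)

# p_R2(w): unconditional rejectability of X2.
p_R2 = {"y0": 0.3, "y1": 0.7}


def p_R1_given_R2(z, w):
    # Rule: "if X2 rejects w, X1 feels this strongly about rejecting z".
    table = {"y0": {"x0": 0.8, "x1": 0.2}, "y1": {"x0": 0.4, "x1": 0.6}}
    return table[w][z]


def p_S2_given_R1R2(y, z, w):
    # Assumed here to depend only on X1's rejectability z, so w is ignored.
    table = {"x0": {"y0": 0.1, "y1": 0.9}, "x1": {"y0": 0.5, "y1": 0.5}}
    return table[z][y]


def p_S1_given_S2R1R2(x, y, z, w):
    # Assumed independent of w (and, for brevity, of z as well).
    table = {"y0": {"x0": 0.7, "x1": 0.3}, "y1": {"x0": 0.2, "x1": 0.8}}
    return table[y][x]


# Equation (9): the interdependence function as a product of conditionals.
p = {
    (x, y, z, w): p_S1_given_S2R1R2(x, y, z, w)
    * p_S2_given_R1R2(y, z, w)
    * p_R1_given_R2(z, w)
    * p_R2[w]
    for x, y, z, w in product(U1, U2, U1, U2)
}
print(round(sum(p.values()), 12))
```

Because every conditional table is itself normalized, the product is automatically a mass function on U × U.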
Such conditioning permits the designer of the decision problem to construct the interdepen-
dence function as a natural consequence of the relevant interdependencies that exist between
the participants. In extreme cases where all of the participants are closely interrelated, the
construction of the interdependence function may be very complex. Many multiple-agent sce-
narios, however, are not closely interconnected. Hierarchical systems and other Markovian-like
societies, for example, may be very economically characterized by such factorizations. At the
other extreme, anarchic systems (every man for himself) may be characterized by complete
inter-independence, yielding factorizations of the form
pS1 S2 R1 R2 (x, y, z, w) = pS1 R1 (x, z) · pS2 R2 (y, w). (10)
Conditioning is the key to the accommodation of the desires of others. For example, if X2
were very desirous of implementing y if X1 were not to implement x, X1 could accommodate
X2 ’s preference by setting pR1 |S2 (x|y) to a high value (close to unity). Then, x would be highly
rejectable to X1 if y were highly selectable to X2 . Note, however, that if X2 should turn out
not to highly prefer y and so sets pS2 (y) ≈ 0, then the joint selectability/rejectability of (x, y),
namely, pS2 R1 (y, x) = pR1 |S2 (x|y)pS2 (y) ≈ 0, so the joint event of X1 rejecting x and X2
selecting y has negligible interdependence mass. Thus, X1 is not penalized for being willing
to accommodate X2 when X2 does not need or expect that accommodation. By controlling the
conditioning weights, X1 is able to achieve a balance between its own egoistic interests and
the interests of others.
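The arithmetic behind this observation is a single product. In the sketch below the numerical values are hypothetical and the function name is ours; it simply evaluates pS2 R1 (y, x) = pR1 |S2 (x|y) pS2 (y) for a strongly deferential X1 in two situations.

```python
def joint_mass(p_R1_given_S2, p_S2):
    """p_{S2 R1}(y, x) = p_{R1|S2}(x|y) * p_{S2}(y)."""
    return p_R1_given_S2 * p_S2


deference = 0.95  # hypothetical p_{R1|S2}(x|y): X1 defers strongly to X2

# Case 1: X2 strongly prefers y, so X1's deference carries real weight.
needed = joint_mass(deference, p_S2=0.9)
# Case 2: X2 barely cares for y, so the same deference costs X1 almost nothing.
not_needed = joint_mass(deference, p_S2=0.01)

print(needed, not_needed)
```

The conditioning weight stays fixed in both cases; only X2's own selectability determines how much of it is felt in the joint function.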
Preference conditioning cannot be accommodated by von Neumann-Morgenstern game
theory, since the utility functions required for that formalism are not conditioned on prefer-
ences. The only kind of conditioning possible with conventional utility theory is for X1 to
condition its preferences on the options that are available to X2 , but not on X2 ’s attitudes
about the options. For X1 to accommodate the preferences of X2 , it would be required to
change its entire utility structure; that is, it must be willing to throw the game, if need be, to
accommodate X2 , even if X2 does not require or expect such a sacrifice. This is the effect of

categorical altruism.
Structuring the interdependence function according to the axioms of probability theory
assures (i) consistency, in that none of the preference linkages are contradictory; (ii) com-
pleteness, in that all relevant linkages can be accommodated; (iii) non-redundancy, in that
duplication is avoided; and (iv) parsimony, in that the linkages are characterized by the fewest
number of parameters possible. Thus, although a system model may be complex, it will not be
more complex than it needs to be. As noted by Palmer, “Complexity is no argument against a
theoretical approach if the complexity arises not out of the theory itself but out of the material
which any theory ought to handle (Palmer, 1971, p. 176).”
7 Satisficing Battle of the Sexes
Let us now cast the Battle of the Sexes as a satisficing game. We must first establish each
player’s notions of success and resource consumption. In accordance with our previous dis-
cussion, success is related to the most important goal of the game, which is for the two players
to be with each other, regardless of where they go. Resource consumption, on the other hand,
deals with the costs of being at a particular function. Obviously, H would prefer D if he did
not take into consideration S’s preferences; similarly, S would prefer B. Thus, we may express
the myopic rejectabilities for H and S in terms of parameters h and s, respectively, as
\[ p_{R_H}(D) = h, \qquad p_{R_H}(B) = 1 - h \tag{11} \]
and
\[ p_{R_S}(D) = 1 - s, \qquad p_{R_S}(B) = s, \tag{12} \]
where h is H’s rejectability of D and s is S’s rejectability of B. The closer h is to zero, the
more H is averse to B, with an analogous interpretation for s with respect to S attending D.

To be consistent with the stereotypical roles, we may assume that 0 ≤ h < 1/2 and 0 ≤ s < 1/2.
As will be subsequently seen, only the ordinal relationship need be specified, that is, either
s < h or h < s.

Selectability is a measure of the success support associated with the options. Since being
together is a joint, rather than an individual objective, it is difficult to form unilateral assess-
ments of selectability, but it is possible to characterize individually the conditional selectability.
To do so requires the specification of the conditional mass functions pSH |RS and pSS |RH ; that
is, H’s selectability conditioned on S’s rejectability and S’s selectability conditioned on H’s
rejectability. If S were to place her entire unit mass of rejectability on D, H may account for
this, if he cares at all about S’s feelings, by placing some portion of his conditional selectability
mass on B. S may construct her conditional selectability in a similar way, yielding
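\[
\begin{aligned}
p_{S_H \mid R_S}(D \mid D) &= 1 - \alpha \\
p_{S_H \mid R_S}(B \mid D) &= \alpha \\
p_{S_H \mid R_S}(D \mid B) &= 1 \\
p_{S_H \mid R_S}(B \mid B) &= 0
\end{aligned}
\tag{13}
\]
and
\[
\begin{aligned}
p_{S_S \mid R_H}(D \mid D) &= 0 \\
p_{S_S \mid R_H}(B \mid D) &= 1 \\
p_{S_S \mid R_H}(D \mid B) &= \beta \\
p_{S_S \mid R_H}(B \mid B) &= 1 - \beta.
\end{aligned}
\tag{14}
\]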
We observe that the valuations pSH |RS (B|D) = α and pSS |RH (D|B) = β are conditions
of situational altruism. If S were to place all of her rejectability mass on D, then H may
defer to S’s strong dislike of D, by placing α of his selectability mass, as conditioned by her
preference, on B. Similarly, S shows a symmetric conditional preference for D if H were to
reject B strongly. The parameters α and β are H’s and S’s indices of altruism, respectively,
and serve as a way for each to control the amount of deference each is willing to grant to the
other. In the interest of simplicity, we shall assume that both players are maximally altruistic,
and set α = β = 1. In principle, however, they may be set independently to any value in [0, 1].
Notice that, even in this most altruistic case, these conditional preferences do not commit one
to unconditional abdication of his or her own unilateral preferences. He still myopically (that
is, without taking S into consideration) prefers D and she still myopically prefers B, and there

is no intimation that either participant must throw the game to accommodate the other.
With these conditional and marginal functions, we may factor the interdependence function
as follows:
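\[
\begin{aligned}
p_{S_H S_S R_H R_S}(x, y, z, w) &= p_{S_H \mid S_S R_H R_S}(x \mid y, z, w) \cdot p_{S_S \mid R_H R_S}(y \mid z, w) \cdot p_{R_H R_S}(z, w) \\
&= p_{S_H \mid R_S}(x \mid w) \cdot p_{S_S \mid R_H}(y \mid z) \cdot p_{R_H}(z)\, p_{R_S}(w),
\end{aligned}
\tag{15}
\]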
where we have assumed that H’s selectability conditioned on S’s rejectability is dependent
only on S’s rejectability, that S’s selectability conditioned on H’s rejectability is dependent
only on H’s rejectability, and that the myopic rejectability values of H and S are independent.
Application of (1) and (2) results in joint selectability and rejectability values of
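\[
\begin{aligned}
p_{S_H S_S}(D, D) &= (1 - h)s \\
p_{S_H S_S}(D, B) &= hs \\
p_{S_H S_S}(B, D) &= (1 - h)(1 - s) \\
p_{S_H S_S}(B, B) &= h(1 - s)
\end{aligned}
\tag{16}
\]
and
\[
\begin{aligned}
p_{R_H R_S}(D, D) &= h(1 - s) \\
p_{R_H R_S}(D, B) &= hs \\
p_{R_H R_S}(B, D) &= (1 - h)(1 - s) \\
p_{R_H R_S}(B, B) &= (1 - h)s.
\end{aligned}
\tag{17}
\]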
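These closed forms can be checked numerically. The Python sketch below assembles the interdependence function from (11) through (15) and then applies (1) and (2); the function names and the particular values h = 0.3, s = 0.2 are our own illustrative choices, and the altruism indices default to α = β = 1 as assumed in the text.

```python
from itertools import product

OPTIONS = ["D", "B"]


def interdependence(h, s, alpha=1.0, beta=1.0):
    p_RH = {"D": h, "B": 1 - h}  # equation (11)
    p_RS = {"D": 1 - s, "B": s}  # equation (12)
    # Equation (13): H's selectability of x given S's rejectability w, keyed (x, w).
    p_SH_RS = {("D", "D"): 1 - alpha, ("B", "D"): alpha,
               ("D", "B"): 1.0, ("B", "B"): 0.0}
    # Equation (14): S's selectability of y given H's rejectability z, keyed (y, z).
    p_SS_RH = {("D", "D"): 0.0, ("B", "D"): 1.0,
               ("D", "B"): beta, ("B", "B"): 1 - beta}
    # Equation (15): product of the conditionals and the myopic rejectabilities.
    return {
        (x, y, z, w): p_SH_RS[(x, w)] * p_SS_RH[(y, z)] * p_RH[z] * p_RS[w]
        for x, y, z, w in product(OPTIONS, repeat=4)
    }


def joint_functions(p):
    p_S, p_R = {}, {}
    for (x, y, z, w), mass in p.items():
        p_S[(x, y)] = p_S.get((x, y), 0.0) + mass  # equation (1)
        p_R[(z, w)] = p_R.get((z, w), 0.0) + mass  # equation (2)
    return p_S, p_R


h, s = 0.3, 0.2  # hypothetical values with s < h < 1/2
p_S, p_R = joint_functions(interdependence(h, s))
print(p_S)
print(p_R)
```

Setting `alpha` or `beta` below one in `interdependence` gives the less altruistic variants, for which the closed forms above no longer apply as written.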

The marginal selectability and rejectability values for H and S are
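\[ p_{S_H}(D) = s, \qquad p_{R_H}(D) = h, \tag{18} \]
\[ p_{S_H}(B) = 1 - s, \qquad p_{R_H}(B) = 1 - h, \tag{19} \]
and
\[ p_{S_S}(D) = 1 - h, \qquad p_{R_S}(D) = 1 - s, \tag{20} \]
\[ p_{S_S}(B) = h, \qquad p_{R_S}(B) = s. \tag{21} \]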
Setting the index of caution, q, equal to unity, we obtain the jointly satisficing set as
\[
\Sigma_q =
\begin{cases}
\{(D, B), (B, D), (B, B)\} & \text{for } s < h \\
\{(D, D), (D, B), (B, D)\} & \text{for } s > h \\
\{(D, D), (D, B), (B, D), (B, B)\} & \text{for } s = h
\end{cases},
\tag{22}
\]


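the individually satisficing sets are
\[
\Sigma_q^H =
\begin{cases}
\{B\} & \text{for } s < h \\
\{D\} & \text{for } s > h \\
\{B, D\} & \text{for } s = h
\end{cases}
\tag{23}
\]
\[
\Sigma_q^S =
\begin{cases}
\{B\} & \text{for } s < h \\
\{D\} & \text{for } s > h \\
\{B, D\} & \text{for } s = h
\end{cases},
\tag{24}
\]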

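and the satisficing rectangle is
\[
R_q = \Sigma_q^H \times \Sigma_q^S =
\begin{cases}
\{(B, B)\} & \text{for } s < h \\
\{(D, D)\} & \text{for } s > h \\
\{(B, B), (B, D), (D, B), (D, D)\} & \text{for } s = h
\end{cases}.
\tag{25}
\]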
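The case analysis can be reproduced numerically from the closed-form values in (16) through (21). The following Python sketch is only an illustration: the function name and the sample values of h and s are our own, and pairs are ordered as (H's option, S's option).

```python
from itertools import product


def battle_of_sexes_sets(h, s, q=1.0):
    # Joint selectability and rejectability, equations (16) and (17).
    p_S = {("D", "D"): (1 - h) * s, ("D", "B"): h * s,
           ("B", "D"): (1 - h) * (1 - s), ("B", "B"): h * (1 - s)}
    p_R = {("D", "D"): h * (1 - s), ("D", "B"): h * s,
           ("B", "D"): (1 - h) * (1 - s), ("B", "B"): (1 - h) * s}
    # Marginal selectability and rejectability, equations (18)-(21).
    sel_H, rej_H = {"D": s, "B": 1 - s}, {"D": h, "B": 1 - h}
    sel_S, rej_S = {"D": 1 - h, "B": h}, {"D": 1 - s, "B": s}

    joint = {u for u in p_S if p_S[u] >= q * p_R[u]}
    sigma_H = {o for o in sel_H if sel_H[o] >= q * rej_H[o]}
    sigma_S = {o for o in sel_S if sel_S[o] >= q * rej_S[o]}
    rectangle = set(product(sigma_H, sigma_S))
    return joint, rectangle, joint & rectangle


# Hypothetical parameter values, one for each strict ordering of s and h.
joint_a, rect_a, comp_a = battle_of_sexes_sets(h=0.3, s=0.2)  # s < h
joint_b, rect_b, comp_b = battle_of_sexes_sets(h=0.1, s=0.4)  # s > h
print(comp_a, comp_b)
```

In both strict cases the compromise set contains a single joint option, so the players end up together without either having to randomize.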

Thus, if S’s aversion to D is less than H’s aversion to B, then both players will go to
H’s preference, namely, D, and conversely. This interpretation is an example of interpersonal
comparisons of utility. Such comparisons are frowned upon by conventional game theorists,
but are essential to social choice theory, so long as the utilities are expressed in the same
units and have the same zero-level. Since the utilities are mass functions, however, they each
apportion a unit of value (selectability or rejectability) among the possible options.
It is useful to discuss structural differences between the representation of this game as a
traditional game and its representation as a satisficing game. With the traditional game, all of
the information is encoded in the payoff matrix, with the values that are assigned representing
the importance, cost, or informational value to the players. The payoff matrix primarily cap-
tures information conditioned on options; that is, gains or losses to a player conditioned on the
choice possibilities of the other player.
With the satisficing game structure, all of the information is encoded into the interdepen-
dence function. This function may be factored into products of conditional interdependencies
that represent the joint and individual goals and preferences of the players. The joint selectability
function (16) and the joint rejectability function (17) characterize the state of the problem
as represented by the conditional goals and individual preferences of the players. Specifying
numerical values for the preference parameters, h and s (also for α and β), is as natural, it
may be argued, as it would be to specify numerical values for the payoff matrix.
The essential difference between the von Neumann-Morgenstern and the satisficing
representations of this game is that the von Neumann-Morgenstern utilities do not permit the
preferences of one player to influence the preferences of the other player. But in the context of
this game, it is reasonable to assume that such preferences exist. If they do, then there should
be a mechanism to account for them.
Another important way to compare these two game formulations is in terms of the solution
concept. The classical solution to the traditional problem is to solve for mixed strategy Nash
equilibria corresponding to a numerically definite payoff matrix. The main problem with using
a mixed strategy, as far as this discussion is concerned, is that, for it to be successful, both
players must use exactly the same payoff matrix, and they must use the randomized decision
probabilities that are calculated to maximize their payoffs. Even small deviations in these
quantities destroy the equilibrium completely, and the solution has no claim even on rationality,
let alone optimality. The reason for this sensitivity is that the players are attempting to optimize,
and to do so, they must exploit the model structure and parameter values to the maximum extent
possible.
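The sensitivity of a mixed strategy can be illustrated with a small sketch. The payoff numbers below are hypothetical and chosen only for illustration; they are not the payoff matrix of this essay. The sketch shows that H is indifferent between its two options, and hence willing to randomize, only when S mixes with exactly one probability; any small deviation leaves H with a single pure best response.

```python
from fractions import Fraction

# Hypothetical payoffs to H, keyed by (H's option, S's option).
# These values are assumptions for illustration only.
PAYOFF_H = {("D", "D"): 2, ("D", "B"): 0, ("B", "D"): 0, ("B", "B"): 1}

def expected_payoffs_h(prob_s_plays_d):
    """Expected payoff to H of each pure option when S plays D with this probability."""
    q = prob_s_plays_d
    return {
        u: q * PAYOFF_H[(u, "D")] + (1 - q) * PAYOFF_H[(u, "B")]
        for u in ("D", "B")
    }

def best_responses_h(prob_s_plays_d):
    """Pure options that maximize H's expected payoff."""
    ev = expected_payoffs_h(prob_s_plays_d)
    best = max(ev.values())
    return {u for u, v in ev.items() if v == best}

# For these payoffs, H is indifferent when 2q = 1 - q, that is, q = 1/3.
q_star = Fraction(1, 3)
eps = Fraction(1, 100)
print(best_responses_h(q_star))        # both options tie
print(best_responses_h(q_star + eps))  # S leans slightly toward D
print(best_responses_h(q_star - eps))  # S leans slightly toward B
```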
The satisficing solution concept, on the other hand, adopts a completely different approach
to decision making. There is no explicit attempt to optimize. Rather than ranking all of the
possible options according to their expected utility, the attributes (selectability and
rejectability) of each option are compared, and a binary decision is made with respect to each
option: it is either rejected or it is not. For this problem, the comparison is made in terms of the
parameters s and h, and all that is important is their ordinal relationship; numerically precise
values would not even be exploited were they available.
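The claim that only the ordinal relationship between s and h matters can be checked by brute force. The sketch below assumes the marginal values (18) through (21) with q equal to unity, sweeps a grid of parameter values (the grid itself is an arbitrary choice), and records which individually satisficing sets occur on each side of the diagonal s = h.

```python
from fractions import Fraction
from itertools import product

def individual_sets(h, s, q=1):
    """Individually satisficing sets for H and S from the marginals (18)-(21)."""
    sel_h, rej_h = {"D": s, "B": 1 - s}, {"D": h, "B": 1 - h}
    sel_s, rej_s = {"D": 1 - h, "B": h}, {"D": 1 - s, "B": s}
    set_h = {u for u in "DB" if sel_h[u] >= q * rej_h[u]}
    set_s = {u for u in "DB" if sel_s[u] >= q * rej_s[u]}
    return set_h, set_s

# Arbitrary grid of exact parameter values strictly between 0 and 1.
grid = [Fraction(k, 20) for k in range(1, 20)]
outcomes = {"s<h": set(), "s>h": set()}
for h, s in product(grid, grid):
    if s == h:
        continue  # the tie case is treated separately in (23) and (24)
    key = "s<h" if s < h else "s>h"
    set_h, set_s = individual_sets(h, s)
    outcomes[key].add((frozenset(set_h), frozenset(set_s)))

# Each side of the diagonal should contain exactly one distinct outcome.
print(outcomes)
```

Every grid point with s < h yields the same pair of sets, and likewise for s > h, so the magnitudes of the parameters play no role beyond their ordering.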
This game illustrates the common-sense attribute of the satisficing solution without the
need for randomizing decisions or for S to guess what H is guessing S is guessing, and so
forth, ad infinitum. The players go to the function that is considered by both to be the least
rejectable. This decision is easily justified by common sense reasoning. Furthermore, the
satisficing solution is more robust than the conventional solution. There is no need to define
numerically precise payoffs, and there is no need to calculate precise probability distributions
according to which random decisions will be chosen. Finally, there is no need to specify more
than an ordinal relationship regarding player preferences to arrive at a decision.

To apply this result to the distributed manufacturing game, we interpret h as $X_1$’s
rejectability of producing lamps and s as $X_2$’s rejectability of producing table cloths. One reasonable
way to compute these rejectabilities is to argue that the rejectability ratio for each player for
its two options ought to be the reciprocal of the individual profit ratios for the two cooperative
solutions. This approach yields the ratios $\frac{h}{1-h} = \frac{15}{20}$ and $\frac{s}{1-s} = \frac{5}{10}$,
or $h = \frac{3}{7}$ and $s = \frac{1}{3}$. Since
s < h in this case, the only jointly and individually satisficing solution is (Tables, Cloths).
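The arithmetic behind these values is short enough to verify directly. The sketch below solves x / (1 − x) = r for x, which gives x = r / (1 + r), and applies it to the two ratios quoted above; the helper name is illustrative.

```python
from fractions import Fraction

def rejectability_from_ratio(ratio):
    """Solve x / (1 - x) = ratio for x, giving x = ratio / (1 + ratio)."""
    return ratio / (1 + ratio)

h = rejectability_from_ratio(Fraction(15, 20))  # ratio for X1
s = rejectability_from_ratio(Fraction(5, 10))   # ratio for X2

print(h, s, s < h)
```

Since s < h, the first case of (23) and (24) applies, which corresponds to the solution (Tables, Cloths).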
8 Discussion
Exclusive self-interest fosters competition and exploitation, and engenders attitudes of distrust
and cynicism. An exclusively self-interested decision maker would likely assume that the other
decision makers also will act in selfish ways. Such a decision maker might therefore impute
damaging behavior to others, and might respond defensively. While this may be appropriate in
the presence of serious conflict, many decision scenarios involve situations where coordinative
activity, even if it leads to increased vulnerability, may greatly enhance performance. Espe-
cially when designing artificial decision-making communities, individual rationality may not
be an adequate principle with which to characterize desirable group behavior.
Satisficing game theory provides a new approach to decision making in multiple distributed
decision-making settings that permits decision makers to take into account the preferences of
others, as well as their own egoistic preferences, when defining their preferences in social
environments. This formulation permits the accommodation of group interests as well as of
individual interests via the mechanism of conditional preferences. The price for this accom-
modation, however, is that the individual rationality concept of demanding the best outcome
for oneself (in terms of equilibration) must be replaced by being content with a solution that is
merely good enough, in the sense that the benefits of implementing it outweigh the costs.
The ability to condition on the preferences of others, not just on their actions, is a key
enabling feature of satisficing game theory. Because of this capability, satisficing game theory
is based upon a more general notion of rationality than individual self interest, and this feature
distinguishes the satisficing game theoretic approach from all approaches that are based on
individual rationality, including game theory and, to the best of the author’s knowledge, virtually
all instantiations of decision theory that appear in the literature. Satisficing game theory permits
decision makers to achieve a balance between their own egoistic interests and the interests
of others, and does so without requiring an unconditional sacrifice of their egoistic interests.
The decision maker is able to control the degree to which it would be willing to sacrifice to
accommodate others.
When designing machines to perform tasks that humans might otherwise perform, it is
imperative that the machines function in ways that are compatible with the ways humans function;
otherwise, humans will not be willing to rely upon the machines. The individual rationality
paradigm of demanding the best and only the best for oneself is a very rigid model of behav-
ior with which humans are not easily able to comply, and machine designs based upon this
paradigm are not likely to be compatible with human behavior. Satisficing, on the other hand,
defined in this essay as getting what one pays for, is compatible with a great deal of human
behavior, and artificial decision-making systems designed according to that paradigm may be
more compatible with human behavior.
Individual rationality has dominated the decision-theoretic scene for a long time. Over a
century ago, the American pragmatist Charles Sanders Peirce observed the following:
Take, for example, the doctrine that man only acts selfishly—that is, from the
consideration that acting in one way will afford him more pleasure than acting in
another. This rests on no fact in the world, but it has had a wide acceptance as
being the only reasonable theory. (Peirce, 1877).
If exclusive self-interest is not an adequate model for human interaction, why should we sup-
pose it should be an adequate model for artificial decision-making societies that we may wish
to create to interface with humans? Perhaps one of the great appeals of the individual rational-
ity hypothesis is that it leads to simplified analysis of natural systems and systematic synthesis
of artificial systems. Search procedures based upon the calculus of maximization are of such
enormous value that one may be tempted to adopt the individual rationality hypothesis largely
out of convenience. Perhaps what is needed is an alternative systematic synthesis procedure
that does not rely upon the individual rationality hypothesis, yet admits a calculus that, while
perhaps more complicated, nevertheless provides a systematic synthesis tool. The calculus of
satisficing, as developed in this essay, points to such a tool.

Notes
1 Apologies to those who might be offended by gender stereotypes, but this game was widely
discussed long before sexist issues became sensitive and cell-phones became available.
2 Extra-game-theoretic considerations, such as friendship, habits, fairness, etc., may also
be applied to modify the behavior of decision makers, but these approaches typically lead to ad
hoc rules of behavior that may not be compatible with any well-defined notion of rationality.
3 Other researchers have appropriated this term to describe various notions of constrained
optimization. In this essay, however, usage is restricted to be consistent with Simon’s original
concept.
4 In the probability context, let X and Y be random variables and let x and y be possible values
for X and Y, respectively. By the law of compound probability, $p_{XY}(x, y) = p_{X|Y}(x|y)\,p_Y(y)$
expresses the joint probability of the occurrence of the event (X = x, Y = y) as the conditional
probability of the event X = x occurring conditioned on the event Y = y occurring, times the
probability of Y = y occurring. This relationship may be extended to the general multivariate
case by repeated applications, resulting in what is called the chain rule.
References
Arrow, K. J. (1951). Social Choice and Individual Values. John Wiley, New York. 2nd ed.
1963.
Arrow, K. J. (1986). Rationality of self and others. In Hogarth, R. M. and Reder, M. W.,
editors, Rational Choice. Univ. of Chicago Press, Chicago.
Axelrod, R. (1984). The Evolution of Cooperation. Basic Books, New York.
Bergson, A. (1938). A reformulation of certain aspects of welfare economics. Quarterly
Journal of Economics, 52:310–334.
Bicchieri, C. (1993). Rationality and Coordination. Cambridge Univ. Press, Cambridge.

Bowling, M. (2000). Convergence problems of general-sum multiagent reinforcement learning.
In Proceedings of the Seventeenth International Conference on Machine Learning.
Cooper, R., DeJong, D. V., Forsythe, R., and Ross, T. W. (1996). Cooperation without rep-
utation: experimental evidence from prisoner’s dilemma games. Games and Economic
Behavior, 12(1):187–218.
Eisen, M. (1969). Introduction to Mathematical Probability Theory. Prentice-Hall, Englewood
Cliffs, NJ.
Fudenberg, D. and Levine, D. K. (1993). Steady state learning and Nash equilibrium. Econo-
metrica, 61(3):547–573.
Glass, A. and Grosz, B. (2000). Socially conscious decision-making. In Proceedings of
Agents2000 Conference, pages 217–224, Barcelona, Spain.
Goodrich, M. A., Stirling, W. C., and Boer, E. R. (2000). Satisficing revisited. Minds and
Machines, 10:79–109.
Hu, J. and Wellman, M. P. (1998). Multiagent reinforcement learning: Theoretical frame-
work and an algorithm. In Shavlik, J., editor, Proceedings of the Fifteenth International
Conference on Machine Learning, pages 242–250.
Kalai, E. and Lehrer, E. (1993). Rational learning leads to Nash equilibrium. Econometrica,
61(5):1019–1045.
Kreps, D. M. (1990). Game Theory and Economic Modelling. Clarendon Press, Oxford.
Lewis, D. K. (1969). Convention. Harvard Univ. Press, Cambridge, MA.
Luce, R. D. and Raiffa, H. (1957). Games and Decisions. John Wiley, New York.
Palmer, F. R. (1971). Grammar. Penguin, Harmondsworth, Middlesex.
Peirce, C. S. (1877). The fixation of belief. Popular Science Monthly, 12.

Raiffa, H. (1968). Decision Analysis. Addison-Wesley, Reading, MA.
Rapoport, A. (1970). N-Person Game Theory. The Univ. of Michigan Press, Ann Arbor, MI.
Samuelson, P. A. (1948). Foundations of Economic Analysis. Harvard University Press, Cam-
bridge, MA.
Sandholm, T. W. (1999). Distributed rational decision making. In Weiss, G., editor, Multiagent
Systems, chapter 5, pages 201–258. MIT Press, Cambridge, MA.
Sargent, T. J. (1993). Bounded Rationality in Macroeconomics. Oxford Univ. Press, Oxford.
Schelling, T. C. (1960). The Strategy of Conflict. Oxford Univ. Press, Oxford.
Sen, S. (1996). Reciprocity: a foundational principle for promoting cooperative behavior
among self-interested agents. In Proceedings of The Second International Conference
on Multi-Agent Systems, pages 322–329, Kyoto, Japan.
Shapley, L. S. (1953). A value for n-person games. In Kuhn, H. W. and Tucker, A. W., editors,
Contributions to the Theory of Games. Princeton Univ. Press, Princeton, NJ.
Shubik, M. (1982). Game Theory in the Social Sciences. MIT Press, Cambridge, MA.
Simon, H. A. (1955). A behavioral model of rational choice. Quarterly Journal of Economics,
69:99–118.
Sims, C. (1980). Macroeconomics and reality. Econometrica, 48:1–48.
Stirling, W. C. and Goodrich, M. A. (1999a). Satisficing equilibria: A non-classical
approach to games and decisions. In Parsons, S. and Wooldridge, M. J., editors,
Workshop on Decision Theoretic and Game Theoretic Agents, pages 56–70,
University College, London, United Kingdom, 5 July.
http://www.ee.byu.edu/faculty/wynns/publications.html.
Stirling, W. C. and Goodrich, M. A. (1999b). Satisficing games. Information Sciences,
114:255–280.

Taylor, M. (1987). The Possibility of Cooperation. Cambridge Univ. Press, Cambridge.
von Neumann, J. and Morgenstern, O. (1944). The Theory of Games and Economic Behavior.
Princeton Univ. Press, Princeton, NJ. (2nd ed., 1947).
Wolpert, D. H. and Tumer, K. (2001). Reinforcement learning in distributed domains: An
inverse game theoretic approach. In Proceedings of the 2001 AAAI Symposium: Game
Theoretic and Decision Theoretic Agents, pages 126–133. March 26–28, Stanford,
California. Technical Report SS-01-03.
