
Mathematical Social Sciences 67 (2014) 1-8

Fictitious play with incomplete learning


Zhanwen Ding*, Qiao Wang, Chaoying Cai, Shumin Jiang
Faculty of Science, Jiangsu University, Zhenjiang 212013, PR China
* Corresponding author. Tel.: +86 051188791976. E-mail address: dgzw@ujs.edu.cn (Z. Ding).

highlights

- We model an incomplete learning process where learning need not occur in every period.
- Examples show that the results for complete learning may not hold for incomplete learning.
- In an incomplete learning process a strict NE is absorbing if it is uniformly played.
- FP with infrequent switches exhibits consistency if learning is frequent enough.
- A 2 × 2 game with an identical learning-period set has the FPP if learning is frequent enough.

abstract

In this paper we consider the case in which a game is played repeatedly in an incomplete learning process, where each player updates his belief only in his learning periods rather than in every stage. For a fictitious play process with incomplete learning, we discuss the absorbability of Nash equilibria and the consistency of utilities in a finite game, and we discuss convergence in a 2 × 2 game with an identical learning-period set. The main results for incomplete learning models are the following: a strict Nash equilibrium is absorbing in a fictitious play process if it is uniformly played; a fictitious play has the property of utility consistency if it exhibits infrequent switches and players learn frequently enough; and a 2 × 2 game with an identical learning-period set has the fictitious play property, i.e. any fictitious play process for the game converges to equilibrium, provided that players learn frequently enough. 2013 Elsevier B.V. All rights reserved.

Article history: Received 27 July 2012; Received in revised form 1 August 2013; Accepted 17 October 2013; Available online 28 October 2013.

1. Introduction

We consider a number of players repeatedly playing a game in strategic form. A typical scenario is that each player keeps track of his opponents' past actions, forms a belief by interpreting the empirical joint distribution of opponents' actions as a stationary distribution of the opponents' strategy profile, and plays a best response to his belief. This is the idea of fictitious play (FP), first introduced by Brown (1951) as an algorithm to calculate the value of a zero-sum game. In a process of fictitious play, a strict Nash equilibrium has the property of absorbability. That is, once a strict Nash equilibrium is played in a period, it will be played in all the subsequent periods (e.g., Fudenberg and Levine, 1998). It is also known that, if a fictitious play process exhibits infrequent switches, utility consistency will be observed along the play history. Namely, in the long run any player's time average of realized utility will be close enough to the utility he expects to get (see also Fudenberg and Levine, 1998). Fictitious play is said to converge to equilibrium if the sequence of players' beliefs converges to the set of Nash equilibria of the game.


A game is said to have the fictitious play property (FPP) if every fictitious play for the game converges to equilibrium. Zero-sum games (Robinson, 1951), games with identical payoff functions or games that are best-response equivalent to a game with identical payoff functions (Monderer and Shapley, 1996), non-degenerate 2 × 2 games (Monderer and Shapley, 1996), and non-degenerate 2 × n games (Berger, 2005) are known to have the FPP. Assuming a particular tie-breaking rule, Miyasawa (1961) established the same result for all 2 × 2 games, and Metrick and Polak (1994) proved it by an intuitive geometric approach. With no tie-breaking rule assumed, Monderer and Sela (1996) gave an example of a 2 × 2 game that does not have the FPP. In a game which has the FPP, a fictitious play may require exponentially many rounds (in the size of the representation of the game) before an equilibrium is eventually played (Brandt et al., 2010). Convergence of fictitious play need not occur in games with more than two pure strategies per player (Shapley, 1964; Foster and Young, 1998; Richards, 1997). If players do not track the entire past history, they may sample from their past observations. This idea leads to partial sampling models, e.g., a model where players have full memory and draw observations randomly from the entire past (Kaniovski and Young, 1995), and a model where players have finite memory and sample from the observations in a certain number of the most recent periods (Young, 1993).



In all the work mentioned above, the common aspect is that every player moves (learns from the entire or partial history and updates his belief) in each period. We refer to such a learning process as a complete learning process. For various reasons in real markets (e.g. lags in producing or disseminating price lists), a firm may be committed to its actions for a finite length of time, during which other firms might move (Maskin and Tirole, 1988a). Maskin and Tirole (1987, 1988a,b) introduced and studied a class of alternating-move infinite-horizon models of duopoly, in which the two players move not simultaneously but alternately (once one firm has moved, it will not move again in the next period). Based on the assumption that each firm reacts only to the current action of the other rather than to an entire history of actions by both firms, dynamic programming equations for a Markov perfect equilibrium (MPE) are derived in their models.

In this work, we also relax the assumption of complete learning in the process of fictitious play, and we consider a general situation in which a player's belief updating and strategy changes occur only in a subsequence of the periods. Updating sporadically may be more rational when beliefs cannot be updated in time because the game is played fast (e.g., a rock-scissors-paper game), or when frequent changes are unnecessary because there is some trading cost to pay (e.g., in stock markets). On the other hand, we still assume that all players track the entire past history. In other words, we assume in our work that each player observes his opponents' moves in each period, but only updates his belief and plays a new best response in a learning period, and keeps his belief and action in any non-learning period. We call this a process with incomplete learning if learning occurs only in a subsequence of the entire period set. In this work, we study fictitious play with incomplete learning, discuss the absorbability of strict Nash equilibria and utility consistency in a fictitious play for any finite game, and discuss the convergence of fictitious play for 2 × 2 games with an identical learning-period set.

2. Fictitious play
We consider a game $G = \{\Gamma = \{1, 2, \ldots, N\};\ S = \prod_{i=1}^{N} S^i;\ u = (u^1, u^2, \ldots, u^N)\}$, where $\Gamma$ is the player set, $S^i$ is player $i$'s pure strategy space consisting of a finite number of actions (pure strategies), $S$ is the pure strategy profile space for all players, and $u^i$ is player $i$'s payoff function. $S^{-i}$ is used to denote $\prod_{j \neq i} S^j$, and every $s^{-i} \in S^{-i}$ represents a pure strategy profile of the other $N-1$ players except player $i$. The $N$ players play the game $G$ repeatedly in discrete time, and the horizon is infinite. Periods are indexed by $t$ ($t = 1, 2, \ldots$) and the game $G$ is played once in each period.

In a process of fictitious play, players behave as if they think they are facing a stationary, but unknown, distribution of opponents' strategies (Fudenberg and Levine, 1998). In this kind of process, each player forms his belief about the probability distribution of his opponents' moves according to his observations in the past periods and chooses a myopic pure best reply to his belief. Below we briefly introduce a mathematical description by Fudenberg and Levine (1998) for complete learning processes.

Suppose that player $i$ has an exogenous initial weight function $k_1^i(s^{-i})$, which is defined on $S^{-i}$. In the following periods, this weight function is updated by adding 1 to the weight of each strategy profile $s^{-i} \in S^{-i}$ each time it is actually played, so that

$$ k_t^i(s^{-i}) = k_{t-1}^i(s^{-i}) + \begin{cases} 1, & \text{if } \hat{s}^{-i}_{t-1} = s^{-i}, \\ 0, & \text{if } \hat{s}^{-i}_{t-1} \neq s^{-i}. \end{cases} \tag{1} $$

Namely, for any $s^{-i} \in S^{-i}$, $k_t^i(s^{-i}) = k_{t-1}^i(s^{-i}) + I_{\{\hat{s}^{-i}_{t-1}\}}(s^{-i})$, where $\hat{s}^{-i}_{t-1}$ is the strategy profile actually played by the opponents in period $t-1$ and $I_{\{\hat{s}^{-i}_{t-1}\}}$ is the indicator function of the single-point set $\{\hat{s}^{-i}_{t-1}\}$. Then player $i$'s observations along the history are summed up in the weight vector $(k_t^i(s^{-i}) : s^{-i} \in S^{-i})$, which is simply written as $k_t^i$.

According to the weight vector $k_t^i$, player $i$ assigns a probability

$$ r_t^i(s^{-i}) = \frac{k_t^i(s^{-i})}{\sum_{s \in S^{-i}} k_t^i(s)} \tag{2} $$

to $s^{-i}$ being used by his opponents in period $t$. Player $i$'s fictitious belief in period $t$ is described by the probability vector $r_t^i = (r_t^i(s^{-i}) : s^{-i} \in S^{-i})$. A fictitious play (with complete learning) is then defined as a learning process in which every player $i$ updates his belief in every period and chooses a best pure response to his belief. That is, in any time period he forms his belief according to (2) and chooses a pure strategy $\sigma^i(r_t^i) \in B^i(r_t^i)$, where $B^i(\cdot)$ is player $i$'s best-response correspondence.

In the following we generalize the above formulation for the complete learning process to our model of an incomplete learning process. We suppose that player $i$'s learning process is incomplete and that he updates his belief only in a subsequence of periods $t_1^i, t_2^i, \ldots, t_n^i, \ldots$, which are called $i$'s learning periods. In the other periods (non-learning periods) player $i$ keeps his belief and action unchanged. When player $i$ does not update his belief in a non-learning period, we assume that he still observes the other players' actions. It needs to be noted, however, that in any period an opponent who also has an incomplete learning-period set may actively change his strategy or passively keep his action unchanged. If player $i$ were able to detect an active selection by that player, player $i$ might pay more attention to it. However, in our work we make the assumption that a player $i$ does not know when his opponents are active or passive and can only assign an equal weight 1 to $s^{-i} \in S^{-i}$ whenever it is observed (no matter whether it is actively played or not). That is, he always counts the weight of $s^{-i} \in S^{-i}$ in the same way as shown in (1).

In any learning period $t_n^i$, then, player $i$ is able to update his belief according to the weight vector $k_{t_n^i}^i$, so that

$$ r_{t_n^i}^i(s^{-i}) = \frac{k_{t_n^i}^i(s^{-i})}{\sum_{s \in S^{-i}} k_{t_n^i}^i(s)}. \tag{3} $$

Since we assume that player $i$ uses (1) to count weights, the sum of all the weights counted for $S^{-i}$ up to time $t_n^i$ is $K_1^i + t_n^i - 1$, where $K_1^i = \sum_{s \in S^{-i}} k_1^i(s)$ sums player $i$'s initial weights. So we rewrite (3) as

$$ r_{t_n^i}^i(s^{-i}) = \frac{k_{t_n^i}^i(s^{-i})}{K_1^i + t_n^i - 1}. \tag{4} $$

After updating his belief in $t_n^i$, player $i$ will not do so again until the next learning period $t_{n+1}^i$. That is, he keeps his belief up to $t_{n+1}^i - 1$:

$$ r_t^i = r_{t_n^i}^i, \qquad t_n^i \le t < t_{n+1}^i. \tag{5} $$

Then a fictitious play process with incomplete learning can be formulated in a similar way as for the complete learning process: in each of his learning periods $t$ player $i$ updates his belief according to (4) and plays a pure strategy $\sigma^i(r_t^i) \in B^i(r_t^i)$, while he keeps his belief and action unchanged in his non-learning periods. If the action actually played by player $i$ in period $t$ is denoted by $\hat{s}_t^i$, then the action path of player $i$ in a fictitious play with incomplete learning can be formulated as

$$ \hat{s}_{t_n^i}^i = \sigma^i(r_{t_n^i}^i) \in B^i(r_{t_n^i}^i), \qquad \hat{s}_t^i = \hat{s}_{t_n^i}^i \quad (t_n^i \le t < t_{n+1}^i). \tag{6} $$

Eq. (6) means that player $i$'s actions are adjusted in learning periods and locked in non-learning periods. It is noted that this formulation depends only on the beliefs in learning periods. In other words, whether player $i$ adjusts his beliefs in non-learning periods or not, Eq. (6) takes the same form provided that actions are locked in non-learning periods. So we can give a different interpretation of incomplete learning processes: even if actions are locked in non-learning periods, beliefs may be adjusted in every period. This interpretation does not change Eq. (6), and hence does not change the results in Sections 3 and 4 (because the analysis in the following two sections is based only on the players' action paths).

Given a path of strategy profiles $\{(\hat{s}_\tau^i, \hat{s}_\tau^{-i})\}_{\tau=1}^{\infty}$ actually played by the $N$ players, for every $t_n^i$ and every $s^{-i} \in S^{-i}$ we must have $k_{t_n^i}^i(s^{-i}) = k_{t_{n-1}^i}^i(s^{-i}) + \sum_{\tau = t_{n-1}^i}^{t_n^i - 1} I_{\{\hat{s}_\tau^{-i}\}}(s^{-i})$. Then from (4) we get

$$ r_{t_n^i}^i(s^{-i}) = \frac{(K_1^i + t_{n-1}^i - 1)\, r_{t_{n-1}^i}^i(s^{-i}) + \sum_{\tau = t_{n-1}^i}^{t_n^i - 1} I_{\{\hat{s}_\tau^{-i}\}}(s^{-i})}{K_1^i + t_n^i - 1}. \tag{7} $$
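As a concrete reading of the formulation (1)-(6), the following Python sketch simulates the incomplete-learning process for a two-player finite game. It is only an illustration: the function name, the payoff-matrix convention and the particular inputs are our own choices rather than part of the model, and ties in the best response are broken by numpy's argmax (the first maximizer).

```python
import numpy as np

def incomplete_fictitious_play(U1, U2, T1, T2, k1_init, k2_init, horizon):
    """Simulate Eqs. (1)-(6) for a two-player game.

    U1[a, b], U2[a, b]: payoffs of players 1 and 2 when player 1 plays row a
                        and player 2 plays column b.
    T1, T2:             learning-period sets of players 1 and 2 (1-indexed).
    k1_init, k2_init:   initial weight vectors over the opponent's actions.
    """
    k1 = np.asarray(k1_init, dtype=float)   # player 1's weights on player 2's actions
    k2 = np.asarray(k2_init, dtype=float)   # player 2's weights on player 1's actions
    a1 = a2 = None
    path = []
    for t in range(1, horizon + 1):
        if t in T1 or a1 is None:            # learning period: belief (4), then a pure best reply
            a1 = int(np.argmax(U1 @ (k1 / k1.sum())))
        if t in T2 or a2 is None:
            a2 = int(np.argmax(U2.T @ (k2 / k2.sum())))
        path.append((a1, a2))                # actions stay locked until the next learning period, Eq. (6)
        k1[a2] += 1.0                        # both players observe the profile and add weight 1, Eq. (1)
        k2[a1] += 1.0
    return path

# Example usage with a 2 x 2 coordination game and sparse (illustrative) learning sets:
U = np.array([[1.0, 0.0], [0.0, 1.0]])
print(incomplete_fictitious_play(U, U, {1, 4, 11}, {1, 5, 10}, [0, 1], [1, 0], 12))
```

Fed the game, initial weights and learning-period sets of Example 1 in the next section, this sketch should trace out the play pattern recorded in Table 1, up to the tie-breaking convention.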

3. Absorbability of strict Nash equilibrium

Suppose that $\hat{s} = (\hat{s}^i, \hat{s}^{-i})$ is a strict Nash equilibrium, that is, a strategy profile such that for each player $i$ the strict inequality $u^i(\hat{s}^i, \hat{s}^{-i}) > u^i(s^i, \hat{s}^{-i})$ holds for any $s^i \in S^i \setminus \{\hat{s}^i\}$. Then in a model of complete learning, $\hat{s}$ must be absorbing for the process of fictitious play. That is, once $\hat{s}$ is played in a period, it will be played in all subsequent periods (see e.g. Fudenberg and Levine, 1998). However, this conclusion cannot be established for the incomplete learning process. The following example shows that a strict Nash equilibrium may not be absorbing for the process of fictitious play in an incomplete learning model.

Example 1. Consider a two-player symmetric game with the following payoff bimatrix (player 1 chooses the row, player 2 chooses the column; each cell gives player 1's and player 2's payoffs):

         y0        y1
 x0    (1, 1)    (0, 0)
 x1    (0, 0)    (1, 1)

Obviously, the strategy profile $\hat{s} = (x_0, y_0)$ is a strict Nash equilibrium.

In such a two-player game, player 1's weight vector is $k^1 = (k^1(y_0), k^1(y_1))$ and player 2's is $k^2 = (k^2(x_0), k^2(x_1))$; player 1's belief vector is $r^1 = (r^1(y_0), r^1(y_1)) = (1-q, q)$ and player 2's is $r^2 = (r^2(x_0), r^2(x_1)) = (1-p, p)$. The above payoff matrix tells us that player 1 will play $x_1$ when $q > 1/2$ (i.e. $k^1(y_1) > k^1(y_0)$) and play $x_0$ when $q < 1/2$ (i.e. $k^1(y_1) < k^1(y_0)$). Player 2 has a similar response to $r^2$ and $k^2$. Suppose player 1's initial weight vector is $k_1^1 = (0, 1)$ and his learning-period set is $T^1 = \{1, 4, 11, 18, 29, 40, \ldots\}$; player 2's initial weight vector is $k_1^2 = (1, 0)$ and his learning-period set is $T^2 = \{1, 5, 10, 19, 28, \ldots\}$. Then we obtain the two players' weight vectors and actions as shown in Table 1, which skips all the non-learning periods of both players; in each listed period, one of the two players updates his belief and action in his learning period.

Table 1
Players' weight vectors and actions. The last column records whether $(x_0, y_0)$ is played (P) or deviated from (D).

Jump step   t    k^1_t                        ŝ^1_t   k^2_t                        ŝ^2_t   (x0, y0)
            1    (0, 1)                       x1      (1, 0)                       y0
3           4    (0+3, 1) = (3, 1)            x0      (1, 0+3) = (1, 3)            y0      P
            5    (3+1, 1) = (4, 1)            x0      (1+1, 3) = (2, 3)            y1      D
5           10   (4, 1+5) = (4, 6)            x0      (2+5, 3) = (7, 3)            y0      P
            11   (4+1, 6) = (5, 6)            x1      (7+1, 3) = (8, 3)            y0      D
7           18   (5+7, 6) = (12, 6)           x0      (8, 3+7) = (8, 10)           y0      P
            19   (12+1, 6) = (13, 6)          x0      (8+1, 10) = (9, 10)          y1      D
9           28   (13, 6+9) = (13, 25)         x0      (9+9, 10) = (18, 10)         y0      P
            29   (13+1, 25) = (14, 25)        x1      (18+1, 10) = (19, 10)        y0      D
11          40   (14+11, 25) = (25, 25)       x0      (19, 10+11) = (19, 21)       y0      P

From Table 1 we see that the strict Nash equilibrium $(x_0, y_0)$ is actually played by both players in an infinite number of periods (in periods $t = 4, 10, 18, 28, 40, \ldots$, or equivalently, in any period $t = m^2 + 3m$ for $m = 1, 2, 3, \ldots$). However, $(x_0, y_0)$ is always deviated from in a later period once it has been played. That is, the strict Nash equilibrium $(x_0, y_0)$ is played infinitely many times and is also deviated from infinitely many times. Therefore, this strict Nash equilibrium $(x_0, y_0)$ does not have the property of absorbability in this incomplete learning process. But we also note that in every period $t = m^2 + 3m$ in this example, only one player has learnt and updated his belief and action, while the other player has been passive. So we give the following definition for the situation in which each player has learnt at least once in the periods considered.

Definition 1. A strategy profile $\hat{s} = (\hat{s}^i, \hat{s}^{-i})$ is uniformly played (by all players) if there exist $T_1$ and $T_2$ such that $\hat{s}$ is played in all the periods contained in the interval $[T_1, T_2]$ and each player has learnt in at least one period contained in $[T_1, T_2]$.

Below we show that, in an incomplete learning process, a strict Nash equilibrium will also be fixed in all later periods once it is uniformly played in a period interval.

Lemma 1. Suppose that $\hat{s}^i$ is player $i$'s only best response to $\hat{s}^{-i}$, i.e. $u^i(\hat{s}^i, \hat{s}^{-i}) > u^i(s^i, \hat{s}^{-i})$ for all $s^i \in S^i \setminus \{\hat{s}^i\}$. If $\hat{s}^i$ is a best response to player $i$'s belief $r_T^i$ in period $T$ and the other players keep their strategy profile $\hat{s}^{-i}$ in all the periods contained in $[T, T')$, then $\hat{s}^i$ must be a unique best response to player $i$'s belief $r_{T'}^i$ in period $T'$.

Proof. Consider any $s^{-i} \in S^{-i}$. Because the other players keep the strategy profile $\hat{s}^{-i}$ in $[T, T')$, it must hold that

$$ k_{T'}^i(s^{-i}) = k_T^i(s^{-i}) + (T' - T)\, I_{\{\hat{s}^{-i}\}}(s^{-i}). $$
Then by (4) we get $r_{T'}^i(s^{-i}) = \big[ k_T^i(s^{-i}) + (T' - T)\, I_{\{\hat{s}^{-i}\}}(s^{-i}) \big] / (K_1^i + T' - 1)$. So we have

$$ r_{T'}^i = \frac{K_1^i + T - 1}{K_1^i + T' - 1}\, r_T^i + \frac{T' - T}{K_1^i + T' - 1}\, I_{\{\hat{s}^{-i}\}} = (1 - \alpha)\, r_T^i + \alpha\, I_{\{\hat{s}^{-i}\}}, \tag{8} $$

where $\alpha = (T' - T)/(K_1^i + T' - 1)$. Therefore, for every $s^i \in S^i$ we must have

$$ u^i(s^i, r_{T'}^i) = (1 - \alpha)\, u^i(s^i, r_T^i) + \alpha\, u^i(s^i, I_{\{\hat{s}^{-i}\}}) = (1 - \alpha)\, u^i(s^i, r_T^i) + \alpha\, u^i(s^i, \hat{s}^{-i}). $$

Then we conclude that $\hat{s}^i$ must be player $i$'s unique best response to $r_{T'}^i$, since $\hat{s}^i$ is a best response to $r_T^i$ and $u^i(\hat{s}^i, \hat{s}^{-i}) > u^i(s^i, \hat{s}^{-i})$ for all $s^i \in S^i \setminus \{\hat{s}^i\}$. □

Proposition 1. If $\hat{s} = (\hat{s}^i, \hat{s}^{-i})$ is a strict Nash equilibrium and $\hat{s}$ is uniformly played, then $\hat{s}$ will be played in all subsequent periods. Conversely, any pure-strategy steady state in the process of fictitious play must be a Nash equilibrium.

Proof. According to the definition of uniform play (Definition 1), there must exist $T_1$ and $T_2$ such that all the $N$ players keep the strategy profile $\hat{s} = (\hat{s}^i, \hat{s}^{-i})$ in $[T_1, T_2]$. Suppose that $T_3$ is the first learning period after $T_2$, and let $P$ denote the set of players who learn in period $T_3$. Then each player $i \in P$ keeps the strategy $\hat{s}^i$ in $[T_2, T_3)$ (and hence in $[T_1, T_3)$), and each player $i \in \Gamma \setminus P$ keeps the strategy $\hat{s}^i$ in $[T_2, T_3]$ (and hence in $[T_1, T_3]$).

Now we consider the players in $P$. According to the definition of uniform play again, each player $i \in P$ must have learnt in some period $\tau_i \in [T_1, T_2]$, in which period player $i$'s actual choice $\hat{s}^i$ must be a best response to $r_{\tau_i}^i$. Noting that all the other $N-1$ players keep $\hat{s}^{-i}$ in $[T_1, T_3)$ (and hence in $[\tau_i, T_3)$) and that for a strict Nash equilibrium $(\hat{s}^i, \hat{s}^{-i})$ the strategy $\hat{s}^i$ is the only best response to $\hat{s}^{-i}$, we conclude from Lemma 1 that $\hat{s}^i$ must be player $i$'s unique best response in period $T_3$, which means player $i$ must choose $\hat{s}^i$ in period $T_3$. Therefore, each player $i \in P$ keeps his strategy $\hat{s}^i$ up to $T_3$. So all the $N$ players keep the strategy profile $\hat{s}$ in all the periods in $[T_1, T_3]$ and play $\hat{s}$ uniformly in $[T_1, T_3]$. Repeating this argument, we can show that the strategy profile $\hat{s}$ will be played in all the subsequent periods.

Conversely, if the game play remains at a pure-strategy profile $(\hat{s}^i, \hat{s}^{-i})$ after a period $T$, then

$$ \lim_{n \to \infty} r_{t_n^i}^i = I_{\{\hat{s}^{-i}\}}, \qquad i = 1, 2, \ldots, N. $$

And it is required that, in the process of fictitious play, for every $s^i \in S^i$,

$$ u^i(\hat{s}^i, r_{t_n^i}^i) \ge u^i(s^i, r_{t_n^i}^i), \qquad t_n^i \ge T. $$

Letting $n \to \infty$, we have $u^i(\hat{s}^i, I_{\{\hat{s}^{-i}\}}) \ge u^i(s^i, I_{\{\hat{s}^{-i}\}})$, i.e., for all $i \in \Gamma$,

$$ u^i(\hat{s}^i, \hat{s}^{-i}) \ge u^i(s^i, \hat{s}^{-i}) $$

holds for all $s^i \in S^i$. This means $(\hat{s}^i, \hat{s}^{-i})$ is a Nash equilibrium. □

About Lemma 1 and Proposition 1 we make three remarks. (i) Our proof of Proposition 1 is based on Lemma 1, and Lemma 1 is based on the linearity of Eq. (8), which is similar to the relation considered in the proof of Proposition 2.1 in Fudenberg and Levine (1998). The difference is that in our proof of Lemma 1 the utility is computed at a more distant future stage rather than at the next stage as done in Fudenberg and Levine (1998). (ii) The intuitive meaning of Lemma 1 is that, if the strategies of the other players stay the same within some window of time, and the strategy of player $i$ is the best response both to his beliefs at the beginning of the window and to the profile of strategies of the other players, then the strategy of player $i$ will still be the best response to his beliefs at the end of the window. (iii) Instead of being stated for all players, Lemma 1 is stated for a specific player whose strategy is the only best response to the profile of strategies of the other players. However, the proof of Proposition 1 requires that Lemma 1 hold for any active player in any time window. So it is needed that any player $i$'s strategy $\hat{s}^i$ be the only best response to the opponents' strategy profile $\hat{s}^{-i}$, i.e. that $\hat{s} = (\hat{s}^i, \hat{s}^{-i})$ be a strict Nash equilibrium.

4. Utility consistency

In a process of fictitious play, each player keeps track of the data about the frequency of his opponents' actions and forms a belief about the joint distribution of opponents' strategies. In period $t$, with a belief $r_t^i$ player $i$ has an expected payoff

$$ U_t^i = \max_{\sigma^i \in S^i} u^i(\sigma^i, r_t^i). \tag{9} $$

However, player $i$'s realized payoff in any period $\tau$ is $u^i(\hat{s}_\tau^i, \hat{s}_\tau^{-i})$, since the actually played strategy profile is $(\hat{s}_\tau^i, \hat{s}_\tau^{-i})$. Then the time average of player $i$'s realized payoffs is

$$ \bar{U}_t^i = \frac{1}{t} \sum_{\tau=1}^{t} u^i(\hat{s}_\tau^i, \hat{s}_\tau^{-i}). \tag{10} $$

The concept of utility consistency focuses on how well each player does relative to the utility he expects to get. This issue is formally addressed by the following definition (e.g. Fudenberg and Levine, 1998).

Definition 2. Fictitious play is $\varepsilon$-consistent along a history if for any $\varepsilon > 0$ there exists a $T$ such that, for all $i \in \Gamma$,

$$ \bar{U}_t^i + \varepsilon \ge U_t^i \tag{11} $$

holds for any $t \ge T$.

Utility consistency in fictitious play processes with complete learning can be guaranteed by the condition of infrequent switches, which requires a low frequency of strategy changes, i.e. a low frequency of switches. In a process with complete learning, player $i$'s frequency of switches at any time $t$ is defined as (Fudenberg and Levine, 1998)

$$ \phi_t^i = \frac{1}{t} \sum_{\tau \le t,\ \hat{s}_\tau^i \neq \hat{s}_{\tau-1}^i} 1, \tag{12} $$

which leads to the following definition of infrequent switches in a process with complete learning (Fudenberg and Levine, 1998).

Definition 3. Fictitious play exhibits infrequent switches if for every $\varepsilon > 0$ there exists a $T$ such that for any $t \ge T$, $\phi_t^i \le \varepsilon$ for all $i \in \Gamma$. Namely, $\lim_{t \to \infty} \phi_t^i = 0$ for all $i \in \Gamma$.

The result about utility consistency in fictitious play processes with complete learning was established about two decades ago, and it is stated by the following proposition (Fudenberg and Levine, 1998).

Proposition 2. If fictitious play exhibits infrequent switches along a history, then it is $\varepsilon$-consistent along that history for every $\varepsilon > 0$.
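For reference, the switch frequency in (12) is straightforward to compute from one player's recorded action path; the helper below is an illustrative sketch of ours, not part of the paper.

```python
# Illustrative helper: the switch frequency of Eq. (12) for a complete learning process.
def switch_frequency(actions):
    """phi_t = (1/t) * #{tau <= t : a_tau != a_{tau-1}}, evaluated at t = len(actions)."""
    switches = sum(1 for tau in range(1, len(actions)) if actions[tau] != actions[tau - 1])
    return switches / len(actions)

print(switch_frequency(["x0", "x0", "x1", "x1", "x1", "x0"]))   # 2 switches in 6 periods -> 1/3
```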

In what follows we generalize the result established in Proposition 2 for complete learning processes to fictitious play processes with incomplete learning.

We first note that player $i$'s switch frequency (12) can be directly extended to the case of an incomplete learning process. That is, in an incomplete learning model, given player $i$'s learning periods $(t_1^i, t_2^i, \ldots, t_n^i, \ldots)$, we define $i$'s frequency of switches at any time $t$ as

$$ \phi_t^i = \frac{1}{t} \sum_{t_j^i \le t,\ \hat{s}^i_{t_j^i} \neq \hat{s}^i_{t_{j-1}^i}} (t_j^i - t_{j-1}^i). \tag{13} $$

Obviously, when $\{t_n^i\} = \{1, 2, \ldots, n, \ldots\}$, $\phi_t^i$ in (13) reduces to that in (12). Then, with the notation $\phi_t^i$ described in (13), infrequent switches in our incomplete learning model can be defined in the same way as in Definition 3. In addition, for an incomplete learning process we need to consider a new condition, which requires that each player's learning be somewhat frequent even though his learning does not occur in all periods. Formally, we have the following definition.

Definition 4. A player's learning is frequent enough if his learning-period set $\{t_n\}$ satisfies the condition $\lim_{n \to \infty} (t_{n+1} - t_n)/t_n = 0$, i.e. $\lim_{n \to \infty} t_{n+1}/t_n = 1$. (This condition can easily be satisfied mathematically; for instance, it is obviously satisfied by any sequence $t_n$ given by a polynomial in $n$.)
It needs to be pointed out that, if the frequent-learning test given in Definition 4 fails, a fictitious play process may fail to exhibit utility consistency. This can be seen from the following example.

Example 2. Consider a 2 × 2 game with the same payoff matrices as in Example 1. Suppose that player 1's initial weight vector is $k_1^1 = (0, 1)$, player 2's is $k_1^2 = (1, 0)$, and both players have the same set of learning periods $\{1, 3 \cdot 2^0, 3 \cdot 2^1, 3 \cdot 2^2, 3 \cdot 2^3, \ldots\}$, i.e. $t_n = 3 \cdot 2^{n-2}$ and $t_{n+1} - t_n = 3 \cdot 2^{n-2} = t_n$ for $n \ge 2$. Since $\lim_{n \to \infty} t_{n+1}/t_n = 2$, this learning process does not meet the frequent-learning requirement of Definition 4. We show below that this example does not have the property of utility consistency.

By a calculation similar to that in Example 1, we obtain the two players' weight vectors and best responses in their learning periods; they are listed in Table 2.

Table 2
Players' weight vectors and best responses (learning periods only).

t         k^1_t                                            ŝ^1_t   k^2_t                                            ŝ^2_t
1         (0, 1)                                           x1      (1, 0)                                           y0
3·2^0     (2, 1)                                           x0      (1, 2)                                           y1
3·2^1     (2, 1 + 3·2^0)                                   x1      (1 + 3·2^0, 2)                                   y0
3·2^2     (2 + 3·2^1, 1 + 3·2^0)                           x0      (1 + 3·2^0, 2 + 3·2^1)                           y1
3·2^3     (2 + 3·2^1, 1 + 3·2^0 + 3·2^2)                   x1      (1 + 3·2^0 + 3·2^2, 2 + 3·2^1)                   y0
3·2^4     (2 + 3·2^1 + 3·2^3, 1 + 3·2^0 + 3·2^2)           x0      (1 + 3·2^0 + 3·2^2, 2 + 3·2^1 + 3·2^3)           y1
3·2^5     (2 + 3·2^1 + 3·2^3, 1 + 3·2^0 + 3·2^2 + 3·2^4)   x1      (1 + 3·2^0 + 3·2^2 + 3·2^4, 2 + 3·2^1 + 3·2^3)   y0

From Table 2 we can easily read off the two players' weight vectors:

$$ k^1_{t = 3 \cdot 2^{2n-1}} = (2^{2n-1}, 2^{2n}), \qquad k^1_{t = 3 \cdot 2^{2n}} = (2^{2n+1}, 2^{2n}); \qquad k^2_{t = 3 \cdot 2^{2n-1}} = (2^{2n}, 2^{2n-1}), \qquad k^2_{t = 3 \cdot 2^{2n}} = (2^{2n}, 2^{2n+1}). $$

Then from (4) we have

$$ r_t^1 = (r_t^1(y_0), r_t^1(y_1)) = \begin{cases} (1/3,\ 2/3), & \text{if } t = 3 \cdot 2^{2n-1}, \\ (2/3,\ 1/3), & \text{if } t = 3 \cdot 2^{2n}; \end{cases} \tag{14a} $$

$$ r_t^2 = (r_t^2(x_0), r_t^2(x_1)) = \begin{cases} (2/3,\ 1/3), & \text{if } t = 3 \cdot 2^{2n-1}, \\ (1/3,\ 2/3), & \text{if } t = 3 \cdot 2^{2n}. \end{cases} \tag{14b} $$

And from the payoff matrices (see Example 1) we immediately get that, for both $t = 3 \cdot 2^{2n-1}$ and $t = 3 \cdot 2^{2n}$,

$$ U_t^1 = \max_{\sigma^1 \in S^1} u^1(\sigma^1, r_t^1) = \frac{2}{3}, \qquad U_t^2 = \max_{\sigma^2 \in S^2} u^2(\sigma^2, r_t^2) = \frac{2}{3}. $$

However, Table 2 shows that in each period the actually played strategy profile $(\hat{s}_t^1, \hat{s}_t^2)$ is either $(x_1, y_0)$ or $(x_0, y_1)$, which means each player's time average of his realized payoffs along such a history is zero. So this history fails to satisfy inequality (11) in Definition 2 for $\varepsilon$-consistency whenever $\varepsilon < 2/3$.

In the following we show that if an incomplete fictitious play process meets both the condition given by Definition 3 and that given by Definition 4, then it must have the property of utility consistency. We state this generalized result in Proposition 3 below, before which we give a lemma.
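A short script can confirm the calculation of Example 2 by iterating the weight updates directly; the horizon and variable names below are illustrative choices of ours.

```python
import numpy as np

U = np.array([[1.0, 0.0], [0.0, 1.0]])             # the Example 1 payoff matrix (both players)
learn = {1} | {3 * 2 ** n for n in range(0, 12)}   # the geometric learning-period set of Example 2
k1, k2 = np.array([0.0, 1.0]), np.array([1.0, 0.0])
a1 = a2 = 0
realized, beliefs = 0.0, []
horizon = 3 * 2 ** 11
for t in range(1, horizon + 1):
    if t in learn:
        q, p = k1[1] / k1.sum(), k2[1] / k2.sum()
        beliefs.append((t, round(p, 3), round(q, 3)))
        a1 = int(np.argmax(U @ (k1 / k1.sum())))    # player 1 best-responds to (1-q, q)
        a2 = int(np.argmax(U.T @ (k2 / k2.sum())))  # player 2 best-responds to (1-p, p)
    realized += U[a1, a2]                           # both players receive the same payoff here
    k1[a2] += 1.0
    k2[a1] += 1.0
print(beliefs[-4:])          # (p, q) keeps alternating between (1/3, 2/3) and (2/3, 1/3)
print(realized / horizon)    # the average realized payoff is 0, far below the expected 2/3
```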
Lemma 2. Suppose that player $i$'s learning-period set is $\{t_1^i, t_2^i, \ldots, t_n^i, \ldots\}$. If his learning is frequent enough, then for any sequence $\{\lambda_n^i\}_{n=1}^{\infty}$ such that $t_n^i \le \lambda_n^i < t_{n+1}^i$ for $n = 1, 2, \ldots$, we have $\lim_{n \to \infty} (\bar{U}_{\lambda_n^i}^i - \bar{U}_{t_n^i}^i) = 0$.

Proof. By the definition of $\bar{U}_t^i$ (see Eq. (10)),

$$ \bar{U}_{\lambda_n^i}^i - \bar{U}_{t_n^i}^i = \frac{1}{\lambda_n^i} \sum_{\tau=1}^{\lambda_n^i} u^i(\hat{s}_\tau^i, \hat{s}_\tau^{-i}) - \frac{1}{t_n^i} \sum_{\tau=1}^{t_n^i} u^i(\hat{s}_\tau^i, \hat{s}_\tau^{-i}) = \frac{1}{\lambda_n^i} \sum_{\tau=t_n^i+1}^{\lambda_n^i} u^i(\hat{s}_\tau^i, \hat{s}_\tau^{-i}) - \frac{\lambda_n^i - t_n^i}{\lambda_n^i\, t_n^i} \sum_{\tau=1}^{t_n^i} u^i(\hat{s}_\tau^i, \hat{s}_\tau^{-i}). $$

Let $\bar{u}^i = \max_{s \in S} |u^i(s)|$; then

$$ |\bar{U}_{\lambda_n^i}^i - \bar{U}_{t_n^i}^i| \le \bar{u}^i\, \frac{(\lambda_n^i - t_n^i)\, t_n^i + t_n^i\, (\lambda_n^i - t_n^i)}{\lambda_n^i\, t_n^i} = \frac{2 \bar{u}^i\, (\lambda_n^i - t_n^i)}{\lambda_n^i}. $$

Since player $i$'s learning is frequent enough, $\lim_{n \to \infty} t_{n+1}^i / t_n^i = 1$. Then from $t_n^i \le \lambda_n^i < t_{n+1}^i$ (which means $\lim_{n \to \infty} \lambda_n^i / t_n^i = 1$), we have $\lim_{n \to \infty} (\bar{U}_{\lambda_n^i}^i - \bar{U}_{t_n^i}^i) = 0$. □

Proposition 3. Any fictitious play process with incomplete learning is $\varepsilon$-consistent if it exhibits infrequent switches and each player's learning is frequent enough in this process.

Proof. Suppose that the path of the players' actually played strategy profiles is $\{(\hat{s}_\tau^i, \hat{s}_\tau^{-i})\}_{\tau=1}^{\infty}$. Then from (7) we have, for any $i \in \Gamma$ and any learning period $t_n^i$,

$$ r_{t_n^i}^i = \frac{(K_1^i + t_{n-1}^i - 1)\, r_{t_{n-1}^i}^i + \sum_{\tau=t_{n-1}^i}^{t_n^i - 1} I_{\{\hat{s}_\tau^{-i}\}}}{K_1^i + t_n^i - 1}; $$

hence

$$ (K_1^i + t_n^i - 1)\, r_{t_n^i}^i = (K_1^i + t_{n-1}^i - 1)\, r_{t_{n-1}^i}^i + \sum_{\tau=t_{n-1}^i}^{t_n^i - 1} I_{\{\hat{s}_\tau^{-i}\}}. $$

Writing $U_{t_n^i}^i = u^i(\sigma^i(r_{t_n^i}^i), r_{t_n^i}^i)$ and using $u^i(\sigma^i(r_{t_n^i}^i), r_{t_{n-1}^i}^i) \le u^i(\sigma^i(r_{t_{n-1}^i}^i), r_{t_{n-1}^i}^i) = U_{t_{n-1}^i}^i$, we obtain

$$ (K_1^i + t_n^i - 1)\, U_{t_n^i}^i = (K_1^i + t_{n-1}^i - 1)\, u^i(\sigma^i(r_{t_n^i}^i), r_{t_{n-1}^i}^i) + \sum_{\tau=t_{n-1}^i}^{t_n^i - 1} u^i(\sigma^i(r_{t_n^i}^i), I_{\{\hat{s}_\tau^{-i}\}}) \le (K_1^i + t_{n-1}^i - 1)\, U_{t_{n-1}^i}^i + \sum_{\tau=t_{n-1}^i}^{t_n^i - 1} u^i(\hat{s}_{t_n^i}^i, \hat{s}_\tau^{-i}), $$

where the equality uses the linearity of $u^i(\sigma, \cdot)$ in the belief and the fact that $u^i(\sigma, I_{\{\hat{s}_\tau^{-i}\}}) = u^i(\sigma, \hat{s}_\tau^{-i})$, since $I_{\{\hat{s}_\tau^{-i}\}}(\hat{s}_\tau^{-i}) = 1$ but $I_{\{\hat{s}_\tau^{-i}\}}(s^{-i}) = 0$ for any $s^{-i} \neq \hat{s}_\tau^{-i}$. Therefore, by induction,

$$ (K_1^i + t_n^i - 1)\, U_{t_n^i}^i \le (K_1^i + t_1^i - 1)\, U_{t_1^i}^i + \sum_{j=1}^{n-1} \sum_{\tau=t_j^i}^{t_{j+1}^i - 1} u^i(\hat{s}_{t_{j+1}^i}^i, \hat{s}_\tau^{-i}), $$

and hence

$$ U_{t_n^i}^i \le \frac{K_1^i + t_1^i - 1}{K_1^i + t_n^i - 1}\, U_{t_1^i}^i + \frac{1}{K_1^i + t_n^i - 1} \sum_{j=1}^{n-1} \sum_{\tau=t_j^i}^{t_{j+1}^i - 1} u^i(\hat{s}_{t_j^i}^i, \hat{s}_\tau^{-i}) + \frac{1}{K_1^i + t_n^i - 1} \sum_{j=1}^{n-1} \sum_{\tau=t_j^i}^{t_{j+1}^i - 1} \big[ u^i(\hat{s}_{t_{j+1}^i}^i, \hat{s}_\tau^{-i}) - u^i(\hat{s}_{t_j^i}^i, \hat{s}_\tau^{-i}) \big] \equiv A_n^i + B_n^i + C_n^i, \tag{15} $$

where $A_n^i \to 0$ obviously.

Noting that $u^i(\hat{s}_{t_{j+1}^i}^i, \hat{s}_\tau^{-i}) - u^i(\hat{s}_{t_j^i}^i, \hat{s}_\tau^{-i}) = 0$ whenever $\hat{s}_{t_{j+1}^i}^i = \hat{s}_{t_j^i}^i$, we can rewrite $C_n^i$ as

$$ C_n^i = \frac{1}{K_1^i + t_n^i - 1} \sum_{t_{j+1}^i \le t_n^i,\ \hat{s}^i_{t_j^i} \neq \hat{s}^i_{t_{j+1}^i}}\ \sum_{\tau=t_j^i}^{t_{j+1}^i - 1} \big[ u^i(\hat{s}_{t_{j+1}^i}^i, \hat{s}_\tau^{-i}) - u^i(\hat{s}_{t_j^i}^i, \hat{s}_\tau^{-i}) \big]. $$

Since $u^i$ is bounded (for any $s \in S$, $|u^i(s)| \le \bar{u}^i$), it must hold that

$$ |C_n^i| \le \frac{2 \bar{u}^i}{K_1^i + t_n^i - 1} \sum_{t_{j+1}^i \le t_n^i,\ \hat{s}^i_{t_j^i} \neq \hat{s}^i_{t_{j+1}^i}} (t_{j+1}^i - t_j^i) = \frac{2 \bar{u}^i\, t_n^i}{K_1^i + t_n^i - 1}\ \phi_{t_n^i}^i, $$

where $\phi_{t_n^i}^i$ is defined as in (13). By the definition of infrequent switches, $\phi_{t_n^i}^i \to 0$. So we have $\lim_{n \to \infty} C_n^i = 0$.

Next we prove $\lim_{n \to \infty} (B_n^i - \bar{U}_{t_n^i}^i) = 0$. From (6), $\hat{s}_\tau^i = \hat{s}_{t_j^i}^i$ holds for all $t_j^i \le \tau < t_{j+1}^i$; hence

$$ B_n^i = \frac{1}{K_1^i + t_n^i - 1} \sum_{\tau=t_1^i}^{t_n^i - 1} u^i(\hat{s}_\tau^i, \hat{s}_\tau^{-i}) = \frac{t_n^i - t_1^i}{K_1^i + t_n^i - 1}\ \bar{B}_n^i, $$

where $\bar{B}_n^i$ is player $i$'s average of realized payoffs between $t_1^i$ and $t_n^i - 1$. Since $\bar{U}_{t_n^i}^i$ is player $i$'s average of realized payoffs between 1 and $t_n^i$, we have

$$ \bar{U}_{t_n^i}^i = \frac{1}{t_n^i} \Big[ \sum_{\tau=1}^{t_1^i - 1} u^i(\hat{s}_\tau^i, \hat{s}_\tau^{-i}) + (t_n^i - t_1^i)\, \bar{B}_n^i + u^i(\hat{s}_{t_n^i}^i, \hat{s}_{t_n^i}^{-i}) \Big]. $$

Then

$$ \bar{B}_n^i - \bar{U}_{t_n^i}^i = \frac{t_1^i}{t_n^i}\, \bar{B}_n^i - \frac{1}{t_n^i} \Big[ \sum_{\tau=1}^{t_1^i - 1} u^i(\hat{s}_\tau^i, \hat{s}_\tau^{-i}) + u^i(\hat{s}_{t_n^i}^i, \hat{s}_{t_n^i}^{-i}) \Big]. $$

Noting that the boundedness of $u^i$ ($|u^i| \le \bar{u}^i$) implies $|\bar{B}_n^i| \le \bar{u}^i$, we get

$$ |\bar{B}_n^i - \bar{U}_{t_n^i}^i| \le \bar{u}^i\, \frac{t_1^i + (t_1^i - 1) + 1}{t_n^i} = \frac{2 \bar{u}^i\, t_1^i}{t_n^i}, $$

which means $\lim_{n \to \infty} (\bar{B}_n^i - \bar{U}_{t_n^i}^i) = 0$. The boundedness of $u^i$ also guarantees that $|\bar{U}_{t_n^i}^i| \le \bar{u}^i$, since $\bar{U}_{t_n^i}^i$ is player $i$'s average of realized payoffs between 1 and $t_n^i$. Then, from $\lim_{n \to \infty} (\bar{B}_n^i - \bar{U}_{t_n^i}^i) = 0$ and

$$ B_n^i - \bar{U}_{t_n^i}^i = \frac{t_n^i - t_1^i}{K_1^i + t_n^i - 1}\, (\bar{B}_n^i - \bar{U}_{t_n^i}^i) - \frac{K_1^i + t_1^i - 1}{K_1^i + t_n^i - 1}\, \bar{U}_{t_n^i}^i, $$

we immediately get $\lim_{n \to \infty} (B_n^i - \bar{U}_{t_n^i}^i) = 0$.

By now we have shown that in (15) $A_n^i \to 0$, $B_n^i - \bar{U}_{t_n^i}^i \to 0$ and $C_n^i \to 0$ hold for all $i \in \Gamma$. So for any $\varepsilon > 0$ there must exist a $T$ such that, for all $i \in \Gamma$, $\bar{U}_{t_n^i}^i + \varepsilon > U_{t_n^i}^i$ for any $t_n^i \ge T$. Together with the fact that $U_t^i = U_{t_n^i}^i$ ($t_n^i \le t < t_{n+1}^i$) and the result in Lemma 2, we conclude that there must be a $T$ such that, for all $i \in \Gamma$, $\bar{U}_t^i + \varepsilon \ge U_t^i$ for any $t \ge T$. □
5. Convergence of fictitious play in 2 × 2 games

In a complete learning process with a particular tie-breaking rule assumed, the convergence of fictitious play in any 2 × 2 game was shown by Miyasawa (1961) and also by Metrick and Polak (1994). In this section we discuss the convergence of fictitious play in a 2 × 2 game played in an incomplete learning process, and we restrict attention to the particular case in which the learning-period sets of the two players are identical.

Consider a game with two players and two pure strategies for each player. The two players have the same learning-period set: $\{t_n^1\} = \{t_n^2\} = \{t_n\}$. Player 1's pure strategy space is $S^1 = \{x_0, x_1\}$ and player 2's is $S^2 = \{y_0, y_1\}$. Denote player 1's belief by $r^1 = (1-q, q)$, which assigns probabilities $1-q$ and $q$ to player 2's pure strategies $y_0$ and $y_1$, respectively. Player 2's belief is described as $r^2 = (1-p, p)$, by which player 2 assigns a probability $p$ to player 1's pure strategy $x_1$ and a probability $1-p$ to $x_0$. In a period $t$, the strategy actually played by player 1 is denoted by $x(t)$ and the one actually played by player 2 is denoted by $y(t)$. Without loss of generality, we suppose the first learning period is $t_1 = 1$; otherwise, we need only put each player's observed data before the first learning time into his initial weights. From (7) we have, in a process with the same learning-period set $\{t_1, t_2, \ldots, t_n, \ldots\}$,

$$ p(t_n) = \frac{(K_1^2 + t_{n-1} - 1)\, p(t_{n-1}) + (t_n - t_{n-1})\, I_{\{x(t_{n-1})\}}(x_1)}{K_1^2 + t_n - 1}, \tag{16a} $$

$$ q(t_n) = \frac{(K_1^1 + t_{n-1} - 1)\, q(t_{n-1}) + (t_n - t_{n-1})\, I_{\{y(t_{n-1})\}}(y_1)}{K_1^1 + t_n - 1}, \tag{16b} $$

where $t_1 = 1$, $p(1) = k_1^2(x_1)/K_1^2$, $q(1) = k_1^1(y_1)/K_1^1$, $K_1^2$ is the sum of player 2's initial weights, and $K_1^1$ is the sum of player 1's initial weights.

The convergence of a fictitious play process in a 2 × 2 game $G$ means that $(p(t_n), q(t_n))$ converges to some $(p^*, q^*)$ and the strategy profile $((1-p^*, p^*), (1-q^*, q^*))$ is a Nash equilibrium of $G$. In an incomplete learning process, this convergence property may fail to be observed. For instance, we return to Example 2 with the infrequent learning-period set $\{1, 3 \cdot 2^0, 3 \cdot 2^1, 3 \cdot 2^2, 3 \cdot 2^3, \ldots\}$. Note that $p(t)$ is written for $r_t^2(x_1)$ and $q(t)$ for $r_t^1(y_1)$; then from (14a) and (14b) in Example 2 we get

$$ p(t = 3 \cdot 2^{2n-1}) = \frac{1}{3}, \quad q(t = 3 \cdot 2^{2n-1}) = \frac{2}{3}; \qquad p(t = 3 \cdot 2^{2n}) = \frac{2}{3}, \quad q(t = 3 \cdot 2^{2n}) = \frac{1}{3}, $$

which means that $(p(t_n), q(t_n))$ does not converge. This observation tells us that a fictitious play process with infrequent learning may not be convergent. However, the fictitious play for this game with complete learning must converge, since it is a game with identical payoff functions (Monderer and Shapley, 1996) and also a non-degenerate game (Monderer and Shapley, 1996; Berger, 2005). A question then arises: under what condition can we ensure the convergence of an incomplete learning process in a 2 × 2 game? Our answer is that, for a 2 × 2 game with an identical learning-period set, any fictitious play is convergent if a particular tie-breaking rule is assumed (Metrick and Polak, 1994) and the players' learning is frequent enough (Definition 4). The proof of this conclusion can be carried out in the same way as in Metrick and Polak (1994).

Metrick and Polak (1994) proved that a fictitious play (with complete learning) in a 2 × 2 game must converge to equilibrium if a tie-breaking rule is assumed (i.e. whenever there is a tie, $x_1$ is selected over $x_0$ and $y_1$ is selected over $y_0$). Their approach rests entirely on the geometric properties of the best-response correspondence. The mathematical key in their proof is that the step size of the players' moves in the process of fictitious play converges to zero and hence becomes arbitrarily small. That is, for the complete learning process in any 2 × 2 game it must hold that

$$ \lim_{t \to \infty} (p(t+1) - p(t)) = 0, \qquad \lim_{t \to \infty} (q(t+1) - q(t)) = 0. \tag{17} $$

For an incomplete learning process in a 2 × 2 game, the convergence property of fictitious play can be proven by the same method as in Metrick and Polak (1994), provided that the same tie-breaking rule is assumed and the jump conditions analogous to (17) are satisfied:

$$ \lim_{n \to \infty} (p(t_{n+1}) - p(t_n)) = 0, \qquad \lim_{n \to \infty} (q(t_{n+1}) - q(t_n)) = 0. \tag{18} $$

Since the proof can be done almost word for word as in Metrick and Polak (1994), we omit it for brevity. Here we need only point out that (18) must hold if the learning is frequent enough. In fact, from (16a) and (16b) we get

$$ p(t_{n+1}) - p(t_n) = \frac{(t_n - t_{n+1})\,\big[ p(t_n) - I_{\{x(t_n)\}}(x_1) \big]}{K_1^2 + t_{n+1} - 1}, \qquad q(t_{n+1}) - q(t_n) = \frac{(t_n - t_{n+1})\,\big[ q(t_n) - I_{\{y(t_n)\}}(y_1) \big]}{K_1^1 + t_{n+1} - 1}, $$

which tells us that (18) must hold if $\lim_{n \to \infty} (t_{n+1} - t_n)/t_{n+1} = 0$, i.e. if the players' learning is frequent enough.
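The difference between the two regimes can be seen by iterating (16a)-(16b) directly. The sketch below uses the payoff structure and initial weights of Examples 1 and 2 and the tie-breaking rule quoted from Metrick and Polak (1994); the particular learning-period sets are illustrative choices of ours.

```python
def run(t_seq, p0=0.0, q0=1.0, K1=1.0, K2=1.0):
    """Iterate (16a)-(16b) along the common learning-period set t_seq for the Example 1 game."""
    p, q = p0, q0
    for n in range(len(t_seq) - 1):
        plays_x1 = 1 if q >= 0.5 else 0      # player 1's best reply; tie broken toward x1
        plays_y1 = 1 if p >= 0.5 else 0      # player 2's best reply; tie broken toward y1
        t_now, t_next = t_seq[n], t_seq[n + 1]
        p = ((K2 + t_now - 1) * p + (t_next - t_now) * plays_x1) / (K2 + t_next - 1)   # (16a)
        q = ((K1 + t_now - 1) * q + (t_next - t_now) * plays_y1) / (K1 + t_next - 1)   # (16b)
    return p, q

frequent = [n * n for n in range(1, 400)]              # polynomial set: frequent enough
infrequent = [1] + [3 * 2 ** n for n in range(0, 20)]  # geometric set of Example 2
print(run(frequent))     # approaches (1/2, 1/2), the mixed equilibrium of this game
print(run(infrequent))   # keeps jumping between (1/3, 2/3) and (2/3, 1/3)
```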

Therefore, under the same tie-breaking assumption as in Metrick and Polak (1994), we have the following result.

Proposition 4. In an incomplete process of fictitious play for a 2 × 2 game $G$ with an identical learning-period set $\{t_n\}_{n=1}^{\infty}$, if the players' learning is frequent enough then, for any initial value $(p(1), q(1))$, the sequence $(p(t_n), q(t_n))$ must converge to some $(p^*, q^*)$, and the strategy profile $((1-p^*, p^*), (1-q^*, q^*))$ must be a Nash equilibrium of $G$.

6. Conclusion

In this work we have formulated fictitious play (FP) for an incomplete learning process, in which learning need not occur in every period. In an incomplete learning process, each player observes the other players' actions in every period, but keeps his belief and action in any non-learning period and updates them only in his learning periods. Examples are given to show that properties of a complete learning process may not carry over to an incomplete learning process, such as the absorbability of strict Nash equilibria, utility consistency, and convergence in 2 × 2 games. We have shown that, in a model with incomplete learning, a strict Nash equilibrium is absorbing in a process of fictitious play if it is uniformly played; a fictitious play has the property of utility consistency if each player's learning is frequent enough and the fictitious play process exhibits infrequent switches; and in a 2 × 2 game with an identical learning-period set, a fictitious play must converge to equilibrium if players learn frequently enough and a tie-breaking rule is assumed.

Acknowledgments

We are very grateful to the anonymous referees for their valuable comments and suggestions that greatly helped us to improve the paper. Financial support by the National Natural Science Foundation of China (No. 71171098 and No. 51306072) and by the Jiangsu Provincial Fund of Philosophy and Social Sciences for Universities (No. 2010-2-8) is gratefully acknowledged.
References

Berger, U., 2005. Fictitious play in 2 × n games. Journal of Economic Theory 120 (2), 139-154.
Brandt, F., Fischer, F., Harrenstein, P., 2010. On the rate of convergence of fictitious play. In: Proceedings of the 3rd International Symposium on Algorithmic Game Theory, SAGT 2010. Lecture Notes in Computer Science, vol. 6386. Springer, Berlin, pp. 102-113.
Brown, G.W., 1951. Iterative solutions of games by fictitious play. In: Koopmans, T.C. (Ed.), Activity Analysis of Production and Allocation. John Wiley & Sons, New York, pp. 374-376.
Foster, D., Young, P., 1998. On the nonconvergence of fictitious play in coordination games. Games and Economic Behavior 25 (1), 79-96.
Fudenberg, D., Levine, D.K., 1998. The Theory of Learning in Games. The MIT Press, London, pp. 29-50 (Chapter 2).
Kaniovski, Y., Young, P., 1995. Learning dynamics in games with stochastic perturbations. Games and Economic Behavior 11 (2), 330-363.
Maskin, E., Tirole, J., 1987. A theory of dynamic oligopoly, III: Cournot competition. European Economic Review 31 (4), 947-968.
Maskin, E., Tirole, J., 1988a. A theory of dynamic oligopoly, I: overview and quantity competition with large fixed costs. Econometrica 56 (3), 549-569.
Maskin, E., Tirole, J., 1988b. A theory of dynamic oligopoly, II: price competition, kinked demand curves, and Edgeworth cycles. Econometrica 56 (3), 571-599.
Metrick, A., Polak, B., 1994. Fictitious play in 2 × 2 games: a geometric proof of convergence. Economic Theory 4 (6), 923-933.
Miyasawa, K., 1961. On the convergence of the learning process in a 2 × 2 non-zero-sum two-person game. Economic Research Program, Princeton University, Research Memorandum No. 33.
Monderer, D., Sela, A., 1996. A 2 × 2 game without the fictitious play property. Games and Economic Behavior 14 (1), 144-148.
Monderer, D., Shapley, L.S., 1996. Fictitious play property for games with identical interests. Journal of Economic Theory 68 (1), 258-265.
Richards, D., 1997. The geometry of inductive reasoning in games. Economic Theory 10 (1), 185-193.
Robinson, J., 1951. An iterative method of solving a game. Annals of Mathematics 54 (2), 296-301.
Shapley, L., 1964. Some topics in two-person games. In: Dresher, M., Shapley, L.S., Tucker, A.W. (Eds.), Advances in Game Theory. Annals of Mathematics Studies, vol. 52. Princeton University Press, Princeton, pp. 1-29.
Young, P., 1993. The evolution of conventions. Econometrica 61 (1), 57-84.