
Game Theory, Evolutionary Dynamics, and Multi-Agent Learning

Prof. Nicola Gatti (nicola.gatti@polimi.it)

Game theory
Game theory: basics

Normal form:
• Players
• Actions
• Outcomes
• Utilities
• Strategies
• Solutions

Game theory: basics

Running example: Rock-Paper-Scissors. The game is represented as a matrix in which Player 1 picks a row and Player 2 picks a column; each player's actions are Rock, Paper, and Scissors.

Game theory: basics

Outcomes (rows: Player 1, columns: Player 2):

            Rock            Paper           Scissors
Rock        Tie             Player 2 wins   Player 1 wins
Paper       Player 1 wins   Tie             Player 2 wins
Scissors    Player 2 wins   Player 1 wins   Tie

Game theory: basics

Utilities (Player 1's payoff, Player 2's payoff):

            Rock    Paper   Scissors
Rock        0,0     -1,1    1,-1
Paper       1,-1    0,0     -1,1
Scissors    -1,1    1,-1    0,0

Game theory: basics

Strategies: each player assigns a probability to each action. Player 1 plays Rock, Paper, and Scissors with probabilities σ_1(Rock), σ_1(Paper), σ_1(Scissors); Player 2 plays them with probabilities σ_2(Rock), σ_2(Paper), σ_2(Scissors).

Game theory: basics

A mixed strategy: both players play each action with probability 1/3.

            Rock    Paper   Scissors
Rock        0,0     -1,1    1,-1     1/3
Paper       1,-1    0,0     -1,1     1/3
Scissors    -1,1    1,-1    0,0      1/3
            1/3     1/3     1/3

Nash Equilibrium

A strategy profile $(\sigma_1^*, \sigma_2^*)$ is a Nash equilibrium if and only if:

• $\sigma_1^* \in \arg\max_{\sigma_1} \left\{ \sigma_1^\top U_1\, \sigma_2^* \right\}$
• $\sigma_2^* \in \arg\max_{\sigma_2} \left\{ \sigma_1^{*\top} U_2\, \sigma_2 \right\}$
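As a concrete check, here is a minimal sketch (Python with NumPy; the variable names are illustrative) that verifies the uniform strategy (1/3, 1/3, 1/3) for both players satisfies the two best-response conditions above in Rock-Paper-Scissors.

```python
import numpy as np

# Rock-Paper-Scissors payoff matrices (rows: Player 1's action, columns: Player 2's action).
U1 = np.array([[0, -1, 1],
               [1, 0, -1],
               [-1, 1, 0]], dtype=float)
U2 = -U1  # zero-sum game

sigma1 = np.array([1/3, 1/3, 1/3])
sigma2 = np.array([1/3, 1/3, 1/3])

# Expected utility of each pure action against the opponent's mixed strategy.
u1_per_action = U1 @ sigma2     # Player 1's utility for Rock, Paper, Scissors
u2_per_action = sigma1 @ U2     # Player 2's utility for Rock, Paper, Scissors

# A mixed strategy is a best response iff no pure action does better than the mix.
print(u1_per_action, sigma1 @ U1 @ sigma2)  # [0. 0. 0.] and 0.0 -> sigma1 is a best response
print(u2_per_action, sigma1 @ U2 @ sigma2)  # [0. 0. 0.] and 0.0 -> sigma2 is a best response
```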

Evolutionary dynamics
An evolutionary interpretation

Consider an infinite population of individuals associated with Player 1: 33% of them play Rock, 33% Paper, and 33% Scissors.

An evolutionary interpretation

The opponent is a second population (think, e.g., of two populations of bacteria), associated with Player 2: 50% play Rock, 20% Paper, and 30% Scissors.


An evolutionary interpretation

Player 1's population: 33% Rock, 33% Paper, 33% Scissors
Player 2's population: 50% Rock, 20% Paper, 30% Scissors

            Rock    Paper   Scissors
Rock        0,0     -1,1    1,-1
Paper       1,-1    0,0     -1,1
Scissors    -1,1    1,-1    0,0

At each round, each individual of a population plays against each individual of the opponent's population, weighted by the corresponding probability.

An evolutionary interpretation

Against Player 2's population (Rock 1/2, Paper 1/5, Scissors 3/10), the utility of each of Player 1's types is:

Utility(Rock)     =  0·1/2 − 1·1/5 + 1·3/10 =  0.1
Utility(Paper)    =  1·1/2 + 0·1/5 − 1·3/10 =  0.2
Utility(Scissors) = −1·1/2 + 1·1/5 + 0·3/10 = −0.3

An evolutionary interpretation

Player 1's population: 33% Rock (utility 0.1), 33% Paper (utility 0.2), 33% Scissors (utility −0.3).

The average utility is 1/3·0.1 + 1/3·0.2 + 1/3·(−0.3) = 0.

A positive (utility − average utility) leads to an increase of that type's share of the population; a negative (utility − average utility) leads to a decrease.

An evolutionary interpretation

New population after the replication: 37% Rock, 46% Paper, 17% Scissors (the types with above-average utility grow, Scissors shrinks).


Revision protocol

Question: how do the populations change?

Replicator dynamics:

$$\dot{\sigma}_1(a, t) = \sigma_1(a, t)\Big(e_a^\top U_1\, \sigma_2(t) - \sigma_1(t)^\top U_1\, \sigma_2(t)\Big)$$

where $e_a^\top U_1\, \sigma_2(t)$ is the utility obtained by playing $a$ with probability 1, and $\sigma_1(t)^\top U_1\, \sigma_2(t)$ is the average population utility.

Example (1)

$$\sigma_1(t) = \begin{bmatrix} \sigma_1(R,t) & \sigma_1(P,t) & \sigma_1(S,t) \end{bmatrix}, \qquad
\sigma_2(t) = \begin{bmatrix} \sigma_2(R,t) & \sigma_2(P,t) & \sigma_2(S,t) \end{bmatrix}$$

$$U_1 = \begin{bmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{bmatrix}, \qquad
U_2 = \begin{bmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{bmatrix}$$

$$U_1(R,\cdot) = \begin{bmatrix} 0 & -1 & 1 \end{bmatrix}, \quad
U_1(P,\cdot) = \begin{bmatrix} 1 & 0 & -1 \end{bmatrix}, \quad
U_1(S,\cdot) = \begin{bmatrix} -1 & 1 & 0 \end{bmatrix}$$

Example (2)

$$\dot{\sigma}_1(R, t) = \sigma_1(R, t)\Big(U_1(R,\cdot)\,\sigma_2(t) - \sigma_1(t)^\top U_1\, \sigma_2(t)\Big)$$
$$\dot{\sigma}_1(P, t) = \sigma_1(P, t)\Big(U_1(P,\cdot)\,\sigma_2(t) - \sigma_1(t)^\top U_1\, \sigma_2(t)\Big)$$
$$\dot{\sigma}_1(S, t) = \sigma_1(S, t)\Big(U_1(S,\cdot)\,\sigma_2(t) - \sigma_1(t)^\top U_1\, \sigma_2(t)\Big)$$
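A minimal sketch (Python with NumPy; assumptions: a simple Euler discretisation and the populations of the running example) that integrates these equations; at t = 0 it reproduces the per-type utilities 0.1, 0.2, and −0.3 computed earlier.

```python
import numpy as np

U1 = np.array([[0, -1, 1],
               [1, 0, -1],
               [-1, 1, 0]], dtype=float)
U2 = -U1

def replicator_step(own, other, U, dt=0.01):
    """Euler step of sigma_dot(a) = sigma(a) * (e_a' U other - own' U other)."""
    per_action = U @ other            # utility of each pure action
    average = own @ per_action        # average population utility
    return own + dt * own * (per_action - average)

sigma1 = np.array([1/3, 1/3, 1/3])    # Player 1: Rock, Paper, Scissors
sigma2 = np.array([1/2, 1/5, 3/10])   # Player 2: Rock, Paper, Scissors

print(U1 @ sigma2)                    # [ 0.1  0.2 -0.3]: utilities of Rock, Paper, Scissors
for _ in range(1000):
    sigma1, sigma2 = (replicator_step(sigma1, sigma2, U1),
                      replicator_step(sigma2, sigma1, U2.T))  # U2.T: rows index Player 2's actions
print(sigma1, sigma2)                 # populations after 1000 steps
```
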
Evolutionary Stable Strategies (1)

A strategy is an ESS if it is immune to invasion by mutant strategies, given that the mutants initially occupy a small fraction of the population.

Every ESS is an asymptotically stable fixed point of the replicator dynamics.

The ESSs are a subset of the Nash equilibria: while a NE always exists, an ESS may not exist.

Evolutionary Stable Strategies (2)

Given a Nash equilibrium, a small perturbation could make the equilibrium unstable.

            Action 1   Action 2
Action 1    1, 1       0, 0
Action 2    0, 0       0, 0

(Action 2, Action 2) is a Nash equilibrium. Perturb both players so that Action 1 is played with probability ε and Action 2 with probability 1 − ε: against the perturbed opponent, the only best response is Action 1, so the populations drift away from (Action 2, Action 2) and toward (Action 1, Action 1), which is the ESS.

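A small sketch (Python/NumPy, assuming the replicator dynamics of the previous slides and an illustrative ε = 0.05) of this perturbation argument: starting both populations at (ε, 1 − ε), the probability of Action 1 grows toward 1.

```python
import numpy as np

U = np.array([[1.0, 0.0],
              [0.0, 0.0]])       # the same payoff matrix for both players

eps = 0.05
x = np.array([eps, 1 - eps])     # Player 1: P(Action 1), P(Action 2)
y = np.array([eps, 1 - eps])     # Player 2

for _ in range(3000):
    fx, fy = U @ y, U.T @ x                    # per-action utilities
    x = x + 0.05 * x * (fx - x @ fx)           # replicator step for Player 1
    y = y + 0.05 * y * (fy - y @ fy)           # replicator step for Player 2

print(x, y)  # both populations end up (close to) [1, 0]: the (Action 1, Action 1) ESS
```
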
Prisoner's dilemma

            Cooperate   Defect
Cooperate   3,3         0,5
Defect      5,0         1,1


Prisoner's dilemma

[Figure: replicator dynamics for the prisoner's dilemma, plotted over Player 1's and Player 2's cooperate probabilities; the only asymptotically stable point is mutual defection.]

Stag hunt

            Stag    Hare
Stag        4,4     1,3
Hare        3,1     3,3


Stag hunt

[Figure: replicator dynamics for the stag hunt, plotted over Player 1's and Player 2's stag probabilities; both (Stag, Stag) and (Hare, Hare) are asymptotically stable.]

Matching pennies

            Head    Tail
Head        0,1     1,0
Tail        1,0     0,1


Matching pennies

[Figure 4: The replicator dynamics, plotted in the unit simplex, for the prisoner's dilemma (left), the stag hunt (center), and matching pennies (right). For matching pennies the trajectories are plotted over the players' head probabilities.]

Starting point

$$\dot{\sigma}_1(a, t) = \sigma_1(a, t)\Big(e_a^\top U_1\, \sigma_2(t) - \sigma_1(t)^\top U_1\, \sigma_2(t)\Big)$$

If the starting point is 0, i.e. $\sigma_1(a, 0) = 0$, then $\sigma_1(a, t)$ will always be zero.

Having a starting point with zero strategies therefore does not make sense.

Starting point

[Figure: replicator dynamics trajectories for a generic 2×2 game, plotted from different starting points over the players' action probabilities.]

Single-population games

• There is a single population of individuals that is playing against itself.
• An example: bacteria.

Example

A single population with three types; the entry in row i, column j is the payoff obtained by an individual of type i against an individual of type j. The population state is $\sigma = (\sigma(\text{Type 1}), \sigma(\text{Type 2}), \sigma(\text{Type 3}))$.

            Type 1   Type 2   Type 3
Type 1      1        2        0
Type 2      0        1        2
Type 3      2        0        1

Convergence

Theorem. The replicator equation for a matrix game (single-population) satisfies:
1. A stable rest point is a NE.
2. A convergent trajectory in the interior of the strategy space evolves to a NE.
3. A strict NE is locally asymptotically stable.



Multi-population games

• Every player is associated with a different population of individuals.
• Theorem. (p*, q*) is a two-species ESS if and only if (p*, q*) is a strict Nash equilibrium in a neighbourhood, and:
  • it is a locally asymptotically stable rest point of the replicator equation;
  • if it is in the interior, then it is a globally asymptotically stable rest point of the replicator equation.

Known dynamics

• Replicator dynamics
• Logit
• BNN (Brown–von Neumann–Nash)
• Smith

These dynamics differ in their properties.

Multi-agent learning
Markov decision problem
Reinforcement learning
Q-learning (1)

For every state/action pair:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \Big[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \Big]$$

where $\alpha$ is the learning rate, $r$ the reward, $\gamma$ the discount factor, and $s'$ the next state.
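A minimal sketch of this update in Python (the table is a plain dict; the α and γ values are illustrative):

```python
from collections import defaultdict

Q = defaultdict(float)       # Q(s, a), initialised to 0
alpha, gamma = 0.1, 0.9      # learning rate and discount factor (illustrative values)

def q_update(s, a, r, s_next, actions):
    """Tabular Q-learning update for one observed transition (s, a, r, s_next)."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```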


Example: normal-form games

Player 1:
• 1 state
• 2 actions (Cooperate, Defect)

            Cooperate   Defect
Cooperate   3,3         0,5
Defect      5,0         1,1

Example: normal-form games

Player 1 always cooperates, Player 2 mostly defects:

$$\sigma_1(a) = \begin{cases} 1.0 & a = \text{Cooperate} \\ 0.0 & a = \text{Defect} \end{cases} \qquad
\sigma_2(a) = \begin{cases} 0.2 & a = \text{Cooperate} \\ 0.8 & a = \text{Defect} \end{cases}$$

With a single state, the Q-learning update reduces to $Q(a) \leftarrow Q(a) + \alpha\,(r - Q(a))$, with $\alpha = 0.2$.

round   Player 2's action   Player 1's Q function
t=0     —                   Q(Cooperate) = 0
t=1     a = Cooperate       Q(Cooperate) = 0.6
t=2     a = Defect          Q(Cooperate) = 0.48
t=3     a = Defect          Q(Cooperate) = 0.384
t=4     a = Defect          Q(Cooperate) = 0.3072
t=5     a = Defect          Q(Cooperate) = 0.24576
t=6     a = Cooperate       Q(Cooperate) = 0.796608

            Cooperate   Defect
Cooperate   3,3         0,5
Defect      5,0         1,1
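A short sketch (Python) that replays the table above: Player 1 always cooperates, so its reward is 3 when Player 2 cooperates and 0 when Player 2 defects, and Q(Cooperate) is updated with α = 0.2.

```python
alpha = 0.2
q_cooperate = 0.0
reward_when_p1_cooperates = {"Cooperate": 3.0, "Defect": 0.0}

# Player 2's observed actions at rounds t = 1..6, as in the table above.
p2_actions = ["Cooperate", "Defect", "Defect", "Defect", "Defect", "Cooperate"]

for t, a2 in enumerate(p2_actions, start=1):
    r = reward_when_p1_cooperates[a2]
    q_cooperate += alpha * (r - q_cooperate)   # stateless update: Q <- Q + alpha * (r - Q)
    print(t, a2, round(q_cooperate, 6))
# prints 0.6, 0.48, 0.384, 0.3072, 0.24576, 0.796608
```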


Q-learning (2)

Softmax (a.k.a. Boltzmann exploration):

$$\sigma_i(a) = \frac{\exp(Q(s, a)/\tau)}{\sum_{a'} \exp(Q(s, a')/\tau)}$$

where $\tau$ is the temperature.

• Every action is played with strictly positive probability.
• The larger the temperature, the smoother the function.
• If the temperature were 0, we would have a best response.

Example: normal-form games

With temperature τ = 1:

Q(Cooperate)   Q(Defect)   σ1(Cooperate)   σ1(Defect)
0              0           0.5             0.5
1              0           0.731           0.269
5              0           0.99331         0.00669
10             0           0.999955        0.000045
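A sketch (Python, standard library only) of the softmax policy with τ = 1, reproducing the rows of the table:

```python
import math

def softmax_policy(q_values, tau=1.0):
    """Boltzmann exploration: action probabilities from Q-values at temperature tau."""
    exps = [math.exp(q / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

for q_cooperate in (0, 1, 5, 10):
    p_c, p_d = softmax_policy([q_cooperate, 0.0])   # Q(Defect) = 0
    print(q_cooperate, round(p_c, 6), round(p_d, 6))
# 0 -> 0.5 / 0.5, 1 -> 0.731059 / 0.268941, 5 -> 0.993307 / 0.006693, 10 -> 0.999955 / 4.5e-05
```
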
Self-play Q-learning dynamics

Self-play learning: both Player 1 and Player 2 run a Q-learning algorithm and repeatedly play the prisoner's dilemma against each other.

            Cooperate   Defect
Cooperate   3,3         0,5
Defect      5,0         1,1


Learning dynamics

Assumptions:
• Time is continuous.
• All the actions can be selected simultaneously.

$$\dot{\sigma}_1(a, t) = \frac{\alpha\,\sigma_1(a, t)}{\tau}\Big(e_a^\top U_1\, \sigma_2(t) - \sigma_1(t)^\top U_1\, \sigma_2(t)\Big) - \alpha\,\sigma_1(a, t)\Big(\log \sigma_1(a, t) - \sum_{a'} \sigma_1(a', t)\log \sigma_1(a', t)\Big)$$

The first term is the exploitation term (a replicator-like term); the second is the exploration term.

When the temperature is 0, the Q-learning dynamics behave as the replicator dynamics.
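A minimal sketch (Python/NumPy) that integrates these dynamics for the prisoner's dilemma under the reconstruction above; the α, τ, step size, and starting point are illustrative, and the code only exercises the equation, it is not the implementation behind the figures that follow.

```python
import numpy as np

# Prisoner's dilemma payoffs (rows and columns: Cooperate, Defect).
U1 = np.array([[3.0, 0.0],
               [5.0, 1.0]])
U2 = U1.T                       # symmetric game: Player 2's payoffs are the transpose

alpha, tau, dt = 0.01, 0.1, 0.1

def q_learning_dynamics_step(own, other, U):
    """Euler step of the Q-learning dynamics: exploitation term plus exploration term."""
    per_action = U @ other
    exploit = (alpha / tau) * own * (per_action - own @ per_action)
    explore = -alpha * own * (np.log(own) - own @ np.log(own))
    return own + dt * (exploit + explore)

x = np.array([0.9, 0.1])        # Player 1: P(Cooperate), P(Defect)
y = np.array([0.9, 0.1])        # Player 2

for _ in range(5000):
    x, y = q_learning_dynamics_step(x, y, U1), q_learning_dynamics_step(y, x, U2.T)

print(x, y)   # both players drift toward defection
```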

Prisoner's dilemma

[Figure: FAQ-learning and lenient FAQ-learning dynamics for the prisoner's dilemma, plotted over the players' cooperate probabilities, with the asymptotically stable point marked.]

Stag hunt

[Figure: FAQ-learning and lenient FAQ-learning dynamics for the stag hunt, plotted over the players' stag probabilities; both (Stag, Stag) and (Hare, Hare) are asymptotically stable.]

Matching pennies

[Figure: FAQ-learning dynamics for matching pennies, plotted over the players' head probabilities.]

Other MAL algorithms' dynamics based on the replicator equation:

• Infinitesimal Gradient Ascent (IGA)
• IGA-Win or Learn Fast
• Weighted Policy Learning
• Cross Learning
• Frequency-Adjusted Q-learning
• Regret Minimization (Polynomial Weights)
