06 MAS Multiagent Learning 5
Game theory
Game theory: basics
Normal form
• Players
• Actions
• Outcomes
• Utilities
• Strategies
• Solutions

Example: rock-paper-scissors. Player 1 chooses a row, Player 2 a column; each cell lists (Player 1's utility, Player 2's utility). Off the diagonal one of the players wins; on the diagonal the outcome is a tie.

                 Rock    Paper   Scissors
    Rock         0,0     -1,1    1,-1
    Paper        1,-1    0,0     -1,1
    Scissors     -1,1    1,-1    0,0

The only Nash equilibrium is in mixed strategies: each player plays every action with probability 1/3.
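As a minimal sketch, the expected utility of a mixed-strategy profile in this game can be computed directly from the payoff matrix (actions ordered Rock, Paper, Scissors):

```python
# Expected utilities in the rock-paper-scissors normal-form game.
U1 = [
    [0, -1, 1],    # Rock     vs Rock, Paper, Scissors
    [1, 0, -1],    # Paper
    [-1, 1, 0],    # Scissors
]

def expected_utility(U, sigma1, sigma2):
    """sum_{a,b} sigma1[a] * U[a][b] * sigma2[b]."""
    return sum(
        sigma1[a] * U[a][b] * sigma2[b]
        for a in range(len(U))
        for b in range(len(U[0]))
    )

uniform = [1/3, 1/3, 1/3]
print(expected_utility(U1, uniform, uniform))       # → 0.0 for the uniform profile
print(expected_utility(U1, [1, 0, 0], [0, 1, 0]))   # → -1: Rock loses to Paper
```

The uniform profile yields expected utility 0 for both players, consistent with the game being zero-sum and symmetric.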
Nash Equilibrium
A strategy pro le ( ⇤
1,
⇤
2) is a Nash equilibrium if and if
n o
•
⇤ ⇤
1 2 arg max 1 U1 2
1
n o
• ⇤
2 2 arg max
2
⇤
1 U2 2
fi
:
Evolutionary dynamics

An evolutionary interpretation
Consider a population of individuals (e.g., bacteria), each hard-wired to play one pure strategy; the population shares play the role of Player 1's mixed strategy, starting at (33%, 33%, 33%). Against the opponent population, the three strategies earn utilities of, say, 0.2, 0.1, and -0.3, so the average utility of the population is 0.
• A positive (utility - average utility) leads to an increase of that strategy's share of the population.
• A negative (utility - average utility) leads to a decrease.
After some evolution, the shares might become, e.g., (46%, 37%, 17%).
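One discrete step of this growth rule can be sketched as follows (the utilities and initial shares are the ones on the slide; the step size is an illustrative assumption):

```python
# One discrete step of the population-share update: a share grows when its
# utility exceeds the population average, and shrinks otherwise.
shares = [1/3, 1/3, 1/3]        # initial population shares (33% each)
utilities = [0.2, 0.1, -0.3]    # per-strategy utilities from the slide

average = sum(s * u for s, u in zip(shares, utilities))   # here: 0.0
step = 1.0                      # step size (an illustrative assumption)
new_shares = [s * (1 + step * (u - average)) for s, u in zip(shares, utilities)]
total = sum(new_shares)
new_shares = [s / total for s in new_shares]  # renormalize onto the simplex

print([round(s, 3) for s in new_shares])  # → [0.4, 0.367, 0.233]
```

The first strategy (utility above average) grows, the third (below average) shrinks, matching the qualitative behaviour described above.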
Replicator dynamics
$$\dot{\sigma}_1(a, t) = \sigma_1(a, t) \Big( e_a U_1 \sigma_2(t) - \sigma_1(t)\, U_1\, \sigma_2(t) \Big)$$
where $e_a$ is the unit vector selecting action $a$.
Example (1)
$\sigma_1(t) = \big[\,\sigma_1(R,t)\;\; \sigma_1(P,t)\;\; \sigma_1(S,t)\,\big]$
$\sigma_2(t) = \big[\,\sigma_2(R,t)\;\; \sigma_2(P,t)\;\; \sigma_2(S,t)\,\big]$

$$U_1 = \begin{bmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{bmatrix}
\qquad
U_2 = \begin{bmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{bmatrix}$$

$U_1(R,\cdot) = [\,0 \;\; -1 \;\; 1\,]$
$U_1(P,\cdot) = [\,1 \;\; 0 \;\; -1\,]$
$U_1(S,\cdot) = [\,-1 \;\; 1 \;\; 0\,]$
Example (2)
$$\dot{\sigma}_1(R,t) = \sigma_1(R,t)\Big( U_1(R,\cdot)\,\sigma_2(t) - \sigma_1(t)\,U_1\,\sigma_2(t) \Big)$$
$$\dot{\sigma}_1(P,t) = \sigma_1(P,t)\Big( U_1(P,\cdot)\,\sigma_2(t) - \sigma_1(t)\,U_1\,\sigma_2(t) \Big)$$
$$\dot{\sigma}_1(S,t) = \sigma_1(S,t)\Big( U_1(S,\cdot)\,\sigma_2(t) - \sigma_1(t)\,U_1\,\sigma_2(t) \Big)$$
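These per-action equations can be integrated numerically; the following sketch runs Euler steps of the two-population replicator dynamics for rock-paper-scissors (starting points, step size, and horizon are illustrative choices):

```python
# Euler integration of the two-population replicator dynamics for
# rock-paper-scissors.
U1 = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]                 # Player 1's payoffs
U2T = [[-U1[b][a] for b in range(3)] for a in range(3)]   # Player 2's payoffs, own action first

def replicator_step(s_own, s_opp, U, dt=0.01):
    """One Euler step of d/dt s(a) = s(a) * (e_a U s_opp - s U s_opp)."""
    payoff = [sum(U[a][b] * s_opp[b] for b in range(3)) for a in range(3)]
    base = sum(s_own[a] * payoff[a] for a in range(3))
    return [s_own[a] + dt * s_own[a] * (payoff[a] - base) for a in range(3)]

s1, s2 = [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]
for _ in range(1000):
    s1, s2 = replicator_step(s1, s2, U1), replicator_step(s2, s1, U2T)

print([round(p, 3) for p in s1])  # still a valid mixed strategy; play cycles
```

Note that the update preserves the simplex: the derivatives sum to zero, so no renormalization is needed.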
Evolutionary Stable Strategies (1)
A strategy is an ESS if it is immune to invasion by mutant strategies, given that the mutants initially occupy a small fraction of the population.
Evolutionary Stable Strategies (2)
Given a Nash equilibrium, a small perturbation could make the equilibrium unstable.

                        Player 2
                    Action 1   Action 2
Player 1  Action 1    1,1        0,0
          Action 2    0,0        0,0

Both (Action 1, Action 1) and (Action 2, Action 2) are Nash equilibria. Now perturb (Action 2, Action 2) with a small mutant fraction $\epsilon$ playing Action 1, so the population plays Action 1 with probability $\epsilon$ and Action 2 with probability $1 - \epsilon$. Action 1 then earns $\epsilon > 0$ while Action 2 earns 0, so the mutants grow: (Action 2, Action 2) is not an ESS. (Action 1, Action 1), by contrast, is an ESS.
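The invasion argument can be checked numerically: a sketch below runs single-population replicator dynamics from a 1% perturbation of the all-Action-2 state (step size and horizon are illustrative choices):

```python
# Replicator check that (Action 2, Action 2) is not an ESS: a 1% mutant
# fraction playing Action 1 earns more than the population average and
# eventually takes over.
A = [[1, 0], [0, 0]]  # symmetric payoffs from the table above

def step(x, dt=0.1):
    payoff = [sum(A[a][b] * x[b] for b in range(2)) for a in range(2)]
    base = sum(x[a] * payoff[a] for a in range(2))
    return [x[a] + dt * x[a] * (payoff[a] - base) for a in range(2)]

x = [0.01, 0.99]  # population: 1% mutants on Action 1, 99% on Action 2
for _ in range(5000):
    x = step(x)

print(round(x[0], 3))  # → 1.0: the mutants invade, so Action 2 is not an ESS
```

The derivative of the mutant share is $x_1^2 (1 - x_1) > 0$, so any positive perturbation grows all the way to the (Action 1, Action 1) equilibrium.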
Prisoner's dilemma

                        Player 2
                    Cooperate   Defect
Player 1  Cooperate    3,3       0,5
          Defect       5,0       1,1

[Plot (from "Evolutionary Dynamics of Multi-Agent Learning"): the cooperation probability y1 under the replicator dynamics; defection takes over from every starting point.]
Stag hunt

                        Player 2
                     Stag    Hare

[Plot (from "Evolutionary Dynamics of Multi-Agent Learning"): replicator dynamics for the stag hunt; the pure equilibria (Stag, Stag) and (Hare, Hare) are asymptotically stable.]
Matching pennies

                        Player 2
                     Head    Tail

[Plot (from "Evolutionary Dynamics of Multi-Agent Learning"): replicator dynamics for matching pennies; trajectories cycle around the mixed equilibrium (1/2, 1/2) instead of converging.]
Starting point
$$\dot{\sigma}_1(a, t) = \sigma_1(a, t) \Big( e_a U_1 \sigma_2(t) - \sigma_1(t)\, U_1\, \sigma_2(t) \Big)$$
Since $\dot{\sigma}_1(a,t)$ is proportional to $\sigma_1(a,t)$, an action that starts with probability zero stays at zero forever. A starting point with zero-probability strategies therefore does not make sense: the dynamics must be started in the interior of the simplex.
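This boundary invariance is easy to verify numerically; the sketch below holds the opponent fixed at an illustrative mixed strategy:

```python
# The replicator derivative is proportional to sigma_1(a, t), so an action
# that starts at exactly zero probability stays at zero.
U1 = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rock-paper-scissors payoffs

def step(s_own, s_opp, dt=0.01):
    payoff = [sum(U1[a][b] * s_opp[b] for b in range(3)) for a in range(3)]
    base = sum(s_own[a] * payoff[a] for a in range(3))
    return [s_own[a] + dt * s_own[a] * (payoff[a] - base) for a in range(3)]

s1 = [0.0, 0.6, 0.4]     # Rock starts with probability exactly zero
s2 = [0.2, 0.5, 0.3]     # opponent held fixed for simplicity
for _ in range(1000):
    s1 = step(s1, s2)

print(s1[0])  # → 0.0: the boundary of the simplex is invariant
```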
[Figure 4 (from "Evolutionary Dynamics of Multi-Agent Learning"): The replicator dynamics, plotted in the unit simplex, for the prisoner's dilemma (left), the stag hunt (center), and matching pennies (right). The horizontal axis x1 is Player 1's probability of its first action (for matching pennies, the head probability).]
Single-population games
• There is a single population of individuals that is playing against itself
• An example: bacteria
Example
A single-population game with three types; the entry in row i, column j is the payoff of a type-i individual against a type-j opponent:

              Type 1   Type 2   Type 3
    Type 1      1        2        0
    Type 2      0        1        2
    Type 3      2        0        1
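For the single-population case, only one distribution of types evolves. A sketch of the single-population replicator dynamics for the cyclic game in the table above (the starting shares are illustrative):

```python
# Single-population replicator dynamics for the cyclic three-type game.
A = [[1, 2, 0], [0, 1, 2], [2, 0, 1]]  # payoff of row type against column type

def step(x, dt=0.01):
    payoff = [sum(A[a][b] * x[b] for b in range(3)) for a in range(3)]
    base = sum(x[a] * payoff[a] for a in range(3))
    return [x[a] + dt * x[a] * (payoff[a] - base) for a in range(3)]

x = [0.5, 0.3, 0.2]
for _ in range(2000):
    x = step(x)

print([round(p, 3) for p in x])  # shares remain a probability distribution
```

Because the cyclic part of this matrix is rock-paper-scissors-like, the shares circulate rather than settling on a single type.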
Convergence
Theorem. The replicator equation for a matrix game (single-population) satisfies:
1. A stable rest point is a Nash equilibrium.
Multi-population games
• Every player is associated with a different population of individuals.
• Theorem. (p*, q*) is a two-species ESS if and only if (p*, q*) is a strict Nash equilibrium in a neighbourhood, and
  • it is a locally asymptotically stable rest point of the replicator equation;
  • if it is in the interior, then it is a globally asymptotically stable rest point of the replicator equation.
Known dynamics
• BNN (Brown-von Neumann-Nash) dynamics
• Smith dynamics

Properties
Multi-agent learning
Markov decision problem
Reinforcement learning
Q-learning (1)
$$Q(s, a) \leftarrow Q(s, a) + \alpha \Big( r + \gamma \max_{a'} Q(s, a') - Q(s, a) \Big)$$

Setting:
• 1 state (so the next state is again $s$)
• 2 actions (Cooperate, Defect)
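A minimal single-state Q-learner for this setting can be sketched as follows; the always-cooperate opponent, the Defect-row payoffs, the epsilon-greedy choice rule, and all hyperparameters are illustrative assumptions, not prescribed by the slide:

```python
import random

# A single-state Q-learner playing the prisoner's dilemma against a fixed
# always-cooperate opponent.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ["C", "D"]

def train(episodes=5000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        # epsilon-greedy action choice
        a = rng.choice(ACTIONS) if rng.random() < eps else max(Q, key=Q.get)
        r = PAYOFF[(a, "C")]  # opponent always cooperates (assumption)
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s,a') - Q(s,a))
        Q[a] += alpha * (r + gamma * max(Q.values()) - Q[a])
    return Q

Q = train()
print(sorted(Q.items()))
```

Against a cooperator, the learner should settle on Defect, since defecting earns 5 per step versus 3 for cooperating.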
Q-learning (2)
Softmax (a.k.a. Boltzmann exploration), with temperature $\tau$:
$$\sigma_i(a) = \frac{\exp\big(Q(s, a)/\tau\big)}{\sum_{a'} \exp\big(Q(s, a')/\tau\big)}$$
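The role of the temperature is easy to see in a small sketch:

```python
import math

# Softmax (Boltzmann) action selection: the temperature tau interpolates
# between near-uniform exploration (large tau) and greedy choice (small tau).
def softmax(q_values, tau):
    weights = [math.exp(q / tau) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

q = [1.0, 2.0]
print([round(p, 3) for p in softmax(q, 10.0)])  # → [0.475, 0.525]: near-uniform
print([round(p, 3) for p in softmax(q, 0.1)])   # → [0.0, 1.0]: near-greedy
```

In production code one usually subtracts `max(q_values) / tau` inside the exponentials to avoid overflow at low temperatures; it is omitted here for brevity.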
Q-learning algorithm
Learning dynamics
Assumptions:
• Time is continuous
• All the actions can be selected simultaneously

$$\dot{\sigma}_1(a, t) = \frac{\alpha\, \sigma_1(a, t)}{\tau} \Big( e_a U_1 \sigma_2(t) - \sigma_1(t)\, U_1\, \sigma_2(t) \Big) - \alpha\, \sigma_1(a, t) \Big( \log \sigma_1(a) - \sum_{a'} \sigma_1(a') \log \sigma_1(a') \Big)$$

The first term is the replicator dynamics scaled by $\alpha/\tau$; the second is an entropy term induced by Boltzmann exploration.
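The effect of the exploration term can be seen by integrating the learning dynamics above for matching pennies (a sketch; $\alpha$, $\tau$, the step size, and the starting points are illustrative choices):

```python
import math

# Euler integration of the Boltzmann Q-learning dynamics for matching pennies.
U1 = [[1, -1], [-1, 1]]                                   # Player 1 wins on a match
U2T = [[-U1[b][a] for b in range(2)] for a in range(2)]   # Player 2's payoffs, own action first
alpha, tau, dt = 1.0, 0.5, 0.01

def deriv(s_own, s_opp, U):
    payoff = [sum(U[a][b] * s_opp[b] for b in range(2)) for a in range(2)]
    base = sum(s_own[a] * payoff[a] for a in range(2))
    ref = sum(s_own[k] * math.log(s_own[k]) for k in range(2))
    return [
        (alpha * s_own[a] / tau) * (payoff[a] - base)      # replicator term
        - alpha * s_own[a] * (math.log(s_own[a]) - ref)    # exploration term
        for a in range(2)
    ]

s1, s2 = [0.7, 0.3], [0.3, 0.7]
for _ in range(5000):
    d1, d2 = deriv(s1, s2, U1), deriv(s2, s1, U2T)
    s1 = [s1[a] + dt * d1[a] for a in range(2)]
    s2 = [s2[a] + dt * d2[a] for a in range(2)]

print([round(p, 2) for p in s1], [round(p, 2) for p in s2])
```

Where the plain replicator dynamics cycle around the mixed equilibrium of matching pennies, the entropy term damps the oscillation and the trajectories spiral into (1/2, 1/2).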
Prisoner's dilemma
[Plot: FAQ-learning dynamics for the prisoner's dilemma.]

Stag hunt
[Plot: FAQ-learning dynamics for the stag hunt; the equilibria are asymptotically stable.]

Matching pennies
[Plot: FAQ-learning dynamics for matching pennies.]
Cross Learning
Frequency-Adjusted Q-learning