
This is a video about the most famous problem in game theory.

Problems of this sort pop up everywhere, from nations locked in conflict to


roommates
doing the dishes.
Even game shows have been based around this concept.
Figuring out the best strategy can mean the difference between life and death, war
and
peace, flourishing and the destruction of the planet.
And in the mechanics of this game, we may find the very source of one of the most
unexpected
phenomena in nature.
Cooperation
On the 3rd of September 1949, an American weather monitoring plane collected air
samples
over Japan.
In those samples, they found traces of radioactive material.
The Navy quickly collected and tested rainwater samples from their ships and bases
all over
the world.
They also detected small amounts of Cerium-141 and Yttrium-91.
But these isotopes have half-lives of one or two months, so they must have been
produced
recently.
And the only place they could have come from was a nuclear explosion.
But the US hadn't performed any tests that year, so the only possible conclusion
was that
the Soviet Union had figured out how to make a nuclear bomb.
This was the news the Americans had been dreading.
Their military supremacy, achieved through the Manhattan Project, was quickly
fading.
This makes the problem of Western Europe and the United States far more serious than it was before, and perhaps makes the threat of war greater.
Some thought their best course of action was to launch an unprovoked nuclear strike
against
the Soviets while they were still ahead.
In the words of Navy Secretary Matthews, to become aggressors for peace.
John von Neumann, the founder of Game Theory, said,
If you say why not bomb them tomorrow, I say why not bomb them today?
If you say today at five o'clock, I say why not at one o'clock?
Something needed to be done about nuclear weapons and fast, but what?
In 1950, the Rand Corporation, a US-based think tank, was studying this question.
And as part of this research, they turned to Game Theory.
That same year, two mathematicians at Rand had invented a new game, one which, unbeknownst to them at the time, closely resembled the US-Soviet conflict.
This game is now known as the Prisoner's Dilemma, so let's play a game.
A banker with a chest full of gold coins invites you and another player to play
against
each other.
You each get two choices.
You can cooperate or you can defect.
If you both cooperate, you each get three coins.
If one of you cooperates but the other defects, then the one who defected gets five
coins
and the other gets nothing.
And if you both defect, then you each get a coin.
The goal of the game is simple, to get as many coins as you can.
So what would you do?
Suppose your opponent cooperates, then you could also cooperate and get three
coins.
Or you could defect and get five coins instead.
So you are better off defecting.
But what if your opponent defects instead?
Well, you could cooperate and get no coins, or you could defect and at least get
one coin.
So no matter what your opponent does, your best option is always to defect.
Now if your opponent is also rational, they will reach the same conclusion and
therefore
also defect.
As a result, when you both act rationally, you both end up in the suboptimal
situation,
getting one coin each, when you could have gotten three instead.
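To make the payoff structure concrete, here is a minimal sketch of the game in Python. The 3, 5, 0, and 1 coin payoffs are the ones described above; everything else about the code is just for illustration.

```python
# Payoffs as (your coins, their coins). C = cooperate, D = defect.
# 3/3 for mutual cooperation, 5/0 for defecting on a cooperator,
# 1/1 for mutual defection, as described above.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# Whatever the opponent plays, defecting earns you more:
for their_move in ("C", "D"):
    if_cooperate = PAYOFF[("C", their_move)][0]
    if_defect = PAYOFF[("D", their_move)][0]
    print(f"Opponent plays {their_move}: cooperate -> {if_cooperate}, defect -> {if_defect}")
# Opponent plays C: cooperate -> 3, defect -> 5
# Opponent plays D: cooperate -> 0, defect -> 1
```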
In the case of the US and Soviet Union, this led both countries to develop huge
nuclear
arsenals of tens of thousands of nuclear weapons each, more than enough to destroy
each other
many times over.
But since both countries had nukes, neither could use them.
And both countries spent around ten trillion dollars developing these weapons.
Both would have been better off if they had cooperated and agreed not to develop
this technology
further.
But since they both acted in their own best interest, they ended up in a situation
where
everyone was worse off.
The prisoner's dilemma is one of the most famous games in game theory.
Thousands and thousands of papers have been published on versions of this game.
In part, that's because it pops up everywhere.
Impalas living in between African woodlands and savannas are prone to catching
ticks, which
can lead to infectious diseases, paralysis, even death.
So it's important for Impalas to remove ticks, and they do this by grooming.
But they can't reach all the spots on their bodies and therefore they need another
Impala
to groom them.
Now grooming someone else comes at a cost.
It costs saliva, electrolytes, time, and attention.
All vital resources under the hot African sun, where a predator could strike at any
moment.
So for the other Impala, it would be best not to pay this cost.
But then again, it too will need help grooming.
So all Impalas face a choice.
Should they groom each other or not?
In other words, should they cooperate or defect?
Well, if they only interact once, then the rational solution is always to defect.
That other Impala is never going to help you, so why bother?
But the thing about a lot of problems is that they're not a single prisoner's
dilemma.
Impalas see each other day after day, and the same situation keeps happening over
and
over again.
So that changes the problem.
Because instead of playing the prisoner's dilemma just once, you're now playing it
many,
many times.
And if I defect now, then my opponent will know that I defected, and they can use
that
against me in the future.
So what is the best strategy in this repeated game?
That is what Robert Axelrod, a political scientist, wanted to find out.
So in 1980, he decided to hold a computer tournament.
He invited some of the world's leading game theorists from many different fields to submit computer programs that would play each other.
Axelrod called these programs strategies.
Each strategy would face off against every other strategy, and against a copy of
itself.
And each matchup would go for 200 rounds.
That's important, and we'll come back to it.
Now Axelrod used points instead of coins, but the payoffs were the same.
The goal of the tournament was to win as many points as possible.
And in the end, the whole tournament was repeated five times over, to ensure the
success was
robust and not just a fluke.
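As a rough sketch of that setup, the round-robin loop might look something like this in Python: every strategy meets every other strategy and a copy of itself, each matchup runs 200 rounds, and the whole thing is repeated five times. The three strategies here are simple stand-ins, not the actual 1980 entries.

```python
import itertools
import random

# Same payoffs as before, now counted as points: (player A, player B).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

# A strategy is a function of (my_history, their_history) -> "C" or "D".
def always_cooperate(mine, theirs):
    return "C"

def always_defect(mine, theirs):
    return "D"

def random_strategy(mine, theirs):
    return random.choice("CD")

def play_match(strat_a, strat_b, rounds=200):
    """Play one matchup and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        pts_a, pts_b = PAYOFF[(move_a, move_b)]
        score_a += pts_a
        score_b += pts_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

def tournament(strategies, rounds=200, repetitions=5):
    """Round-robin (including self-play), repeated to smooth out luck."""
    totals = {name: 0 for name in strategies}
    for _ in range(repetitions):
        for a, b in itertools.combinations_with_replacement(strategies, 2):
            score_a, score_b = play_match(strategies[a], strategies[b], rounds)
            totals[a] += score_a
            if a != b:  # don't double-count self-play
                totals[b] += score_b
    return totals

entries = {"Always cooperate": always_cooperate,
           "Always defect": always_defect,
           "Random": random_strategy}
print(tournament(entries))
```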
Axelrod gave an example of a simple strategy.
It would start each game by cooperating, and only defect after its opponent had
defected
twice in a row.
In total, Axelrod received 14 strategies, and he added a 15th called random, which
just
randomly cooperates or defects 50% of the time.
All strategies were loaded onto a single computer where they faced off against each
other.
One of the strategies was called Friedman.
It starts off by cooperating, but if its opponent defects just once, it will keep
defecting
for the remainder of the game.
Another strategy was called Joss.
It also starts by cooperating, but then it just copies what the other player did on
the
last move.
Then around 10% of the time, Joss gets sneaky and defects.
There was also a rather elaborate strategy called Graaskamp. It works the same as Joss, but instead of defecting probabilistically, Graaskamp defects in the 50th round to try to probe the strategy of its opponent and see if it can take advantage of any weaknesses.
The most elaborate strategy was Name Withheld, with 77 lines of code.
After all the games were played, the results were tallied up and the leaderboard
established.
The crazy thing was that the simplest program ended up winning.
A program that came to be called Tit for Tat.
Tit for Tat starts by cooperating, and then it copies exactly what its opponent did in the last move. So it follows cooperation with cooperation and defection with defection, but only once: if its opponent goes back to cooperating, so does Tit for Tat.
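Read directly off the descriptions above, Friedman, Joss, Graaskamp, Tit for Tat, and Axelrod's sample strategy might be sketched like this. These are hedged reconstructions for illustration, not Axelrod's original programs (the real Graaskamp, in particular, was more elaborate). Each function uses the same history-in, move-out interface as the tournament sketch earlier, so they could be dropped into its entries dictionary.

```python
import random

def tit_for_tat(mine, theirs):
    # Cooperate first, then copy the opponent's previous move.
    return "C" if not theirs else theirs[-1]

def friedman(mine, theirs):
    # Cooperate until the opponent defects once, then defect forever.
    return "D" if "D" in theirs else "C"

def joss(mine, theirs):
    # Like Tit for Tat, but roughly 10% of the time it sneaks in a
    # defection where it would have cooperated.
    if not theirs:
        return "C"
    move = theirs[-1]
    if move == "C" and random.random() < 0.10:
        return "D"
    return move

def graaskamp(mine, theirs):
    # Simplified reading: like Joss, but the extra defection comes on
    # move 50 instead of at random.
    if len(mine) == 49:  # 49 moves played so far, so this is the 50th
        return "D"
    return "C" if not theirs else theirs[-1]

def tit_for_two_tats(mine, theirs):
    # Axelrod's sample strategy: defect only after two defections in a row.
    if len(theirs) >= 2 and theirs[-2:] == ["D", "D"]:
        return "D"
    return "C"
```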
When Tit for Tat played against Friedman, both started by cooperating and they kept
cooperating, both ending up with perfect scores for complete cooperation.
When Tit for Tat played against Joss, they too started by cooperating, but then on
the
sixth move, Joss defected.
This sparked a series of back and forth defections, a sort of echo effect.
OK, so now you've got this alternating thing, which will remind you of some of the
politics
of the world today, where we have to do something to you because of what you did to
us.
And then this weird program throws in a second, unprovoked defection. Now it's really bad, because now both programs are going to defect on each other for the rest of the game.
And that's also like some of the things that we're seeing in politics today and in
international
relations.
As a result of these mutual retaliations, both Tit for Tat and Joss did poorly.
But because Tit for Tat managed to cooperate with enough other strategies, it still
won
the tournament.
Hey, my God, there's Professor Axelrod.
Hey!
There's James George.
And I imagine initially it'd be sort of like computer chess where you need a pretty
complicated program to play a sophisticated game, but in fact, it was not like that
at
all.
It was the simplest strategy that did the best.
So I analyzed how that happened.
First, the best-performing strategies were all nice, which just means they are not the first to defect.
So Tit for Tat is a nice strategy; it can defect, but only in retaliation.
The opposite of nice is nasty: that's a strategy that defects first.
So Joss is nasty.
Out of the 15 strategies in the tournament, eight were nice and seven nasty.
The top eight strategies were all nice.
And even the worst-performing nice strategy still far outscored the best-performing nasty one.
The second important quality was being forgiving.
A forgiving strategy is one that can retaliate, but it doesn't hold a grudge.
So Tit for Tat is a forgiving strategy.
It retaliates when its opponent defects, but it doesn't let defections from before
the
last round influence its current decisions.
Friedman, on the other hand, is maximally unforgiving. After a single defection from its opponent, it defects for the rest of the game.
Okay, that's it.
No mercy.
And that might feel good to do, but it doesn't end up working out well in the long
run.
This conclusion that it pays to be nice and forgiving came as a shock to the
experts.
Many had tried to be tricky and create subtle nasty strategies to beat their
opponent and
eke out an advantage, but they all failed.
Instead in this tournament, nice guys finished first.
Now Tit for Tat is quite forgiving, but it's possible to be even more forgiving.
Axelrod's sample strategy only defects after its opponent has defected twice in a row. It was called Tit for Two Tats.
Now that might sound overly generous, but when Axelrod ran the numbers, he found
that
if anyone had submitted the sample strategy, they would have won the tournament.
I mean, it's so clever.
There are so many layers to this story.
After Axelrod published his analysis of what happened and circulated it among these game theorists, he said, now that we all know what worked well, let's try again.
So he announced a second tournament where everything would be the same except for
one
change, the number of rounds per game.
See, in the first tournament, each game lasted precisely 200 rounds, and that is
important.
Because if you know when the last round is, then there's no reason to cooperate in
that
round.
So you're better off defecting.
Of course, your opponent should reason the same and so they should also defect in
the
last round.
But if you both anticipate defection in the last round, then there's no reason for
you
to cooperate in the second to last round or the round before that or before that
and
so on all the way to the very first round.
And so in Axelrod's second tournament, it was a very important thing that the players didn't know exactly how long they were going to be playing.
They knew on average it would be 200 rounds, but there was a random number
generator that
prevented them from knowing with certainty.
Yeah, if you're not sure when it ends, then you have to kind of keep cooperating
because
it might keep going and you might need them on your side.
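One simple way to model that uncertainty, purely as an illustration and not necessarily the exact random-number scheme Axelrod used, is to continue after each round with a fixed probability chosen so that the expected length is 200 rounds:

```python
import random

def random_match_length(expected_rounds=200):
    # Continue after each round with probability 1 - 1/expected_rounds,
    # so the length is geometric with mean expected_rounds and no player
    # can be certain which round will be the last.
    p_continue = 1 - 1 / expected_rounds
    rounds = 1
    while random.random() < p_continue:
        rounds += 1
    return rounds

lengths = [random_match_length() for _ in range(10_000)]
print(sum(lengths) / len(lengths))  # averages out to roughly 200
```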
For this second tournament, Axelrod received 62 entries and again added random.
The contestants had gotten the results and analysis from the first tournament and
could use this
information to their advantage.
This created two camps.
Some thought that clearly being nice and forgiving were winning qualities, so they
submitted
nice and forgiving strategies.
One even submitted Tit for Two Tats.
The second camp anticipated that others would be nice and extra forgiving, and
therefore
they submitted nasty strategies to try to take advantage of those that were extra
forgiving.
One such strategy was called Tester.
It would defect on the first move to see how its opponent reacted.
If it retaliated, Tester would apologize and play tit for tat for the remainder of
the
game.
If it didn't retaliate, Tester would defect every other move after that.
But again, being nasty didn't pay.
And once again, tit for tat was the most effective.
Nice strategies again did much better.
In the top 15, only one was not nice.
Similarly, in the bottom 15, only one was not nasty.
After the second tournament, Axelrod identified the other qualities that
distinguished the
better performing strategies.
The third is being retaliatory, which means if your opponent defects, strike back
immediately.
Don't be a pushover.
Always cooperate is a total pushover, and so it's very easy to take advantage of.
Tit for tat, on the other hand, is very hard to take advantage of.
The last quality that Axelrod identified is being clear.
Programs that were too opaque seemed too similar to a random program. You couldn't figure them out because they were so complicated, and it was very hard to establish any pattern of trust with a program like that, because you couldn't tell what it was doing. The other programs they were playing against couldn't figure them out either.
So they would end up more or less defaulting to thinking every turn is like the
last time
I'm going to see you, so I might as well defect.
What to me is mind-blowing about this is that these four principles, being nice, forgiving, provocable, and clear, are a lot like the morality that has evolved around the world, often summarized as an eye for an eye.
It's not Christianity, by the way.
It's not the turn-the-other-cheek philosophy.
It's some older philosophy.
What's interesting is that while Tit for Two Tats would have won the first tournament, it only came 24th in the second tournament.
This highlights an important fact.
In the repeated prisoner's dilemma, there is no single best strategy.
The strategy that performs best always depends on the other strategies it's
interacting
with.
For example, if you put tit for tat in an environment with only the ultimate
bullies
of always defect, then tit for tat comes in last.
I wanted to see whether, for example, Tit for Tat did well because it did well with really stupid rules that didn't do well at all themselves, that basically it took advantage of them.
So he ran a simulation where successful strategies in one generation would see
their numbers
grow, and unsuccessful ones would see their numbers drop.
In this simulation, the worst performing strategies quickly shrink and go extinct,
while
the top performing strategies become more common.
Harrington, the only nasty strategy in the top 15, grew quickly at first. But then, as the strategies it was preying on went extinct, Harrington's numbers also quickly dropped.
This shows the main benefit of this simulation: it tests how well a strategy does with other successful strategies.
After a thousand generations, the proportions are mostly stable, and only nice
strategies
survive.
Again, tit for tat comes out on top, representing 14.5% of the total population.
Now this process may sound similar to evolution, but there is a subtle difference,
which is
that in this case there are no mutations, so it's actually an ecological
simulation.
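A minimal version of that kind of ecological update might look like this: each generation, a strategy's share of the population grows in proportion to the average score it earns against the current population mix. The two-strategy setup and per-round scores below are placeholders (roughly what 200-round matches under the payoffs above would give), not Axelrod's actual data.

```python
def ecological_step(shares, avg_score):
    """One generation: a strategy's share grows in proportion to the
    average score it earns against the current population mix.

    shares:    {strategy: fraction of population}, summing to 1
    avg_score: avg_score[a][b] = average per-round score a earns vs b
    """
    fitness = {a: sum(shares[b] * avg_score[a][b] for b in shares)
               for a in shares}
    mean_fitness = sum(shares[a] * fitness[a] for a in shares)
    return {a: shares[a] * fitness[a] / mean_fitness for a in shares}

# Toy two-strategy mix: rough per-round averages from 200-round matches
# under the 3/5/0/1 payoffs (Tit for Tat vs Always Defect: one sucker
# payoff for Tit for Tat, then mutual defection for the rest).
avg_score = {
    "Tit for Tat":   {"Tit for Tat": 3.0,  "Always defect": 0.995},
    "Always defect": {"Tit for Tat": 1.02, "Always defect": 1.0},
}
shares = {"Tit for Tat": 0.5, "Always defect": 0.5}
for generation in range(1000):
    shares = ecological_step(shares, avg_score)
print(shares)  # Tit for Tat ends up dominating this particular mix
```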
But what if the world you started in was different?
Imagine a world that is a really nasty place to live, more or less populated with
players
that always defect, except there's a little cluster of tit for tat players that
live in
some kind of nucleus.
And they get to play with each other a lot because they're geographically
sequestered.
They will start building up a lot of points, and also because that translates into
offspring,
they'll start to take over the population.
So in fact, Axelrod showed that a little island of cooperation can emerge and
spread,
and eventually will take over the world, which is fantastic.
How can cooperation emerge in a population of players who are self-interested, who
are
not trying to be good because they're good-hearted?
You don't have to be altruistic.
You could be looking out for number one for yourself and your own interests, and
yet cooperation
can still emerge.
Some argue that this could explain how we went from a world full of completely selfish organisms, where every organism only cared about itself, to one where cooperation emerged and flourished, from impalas grooming each other to fish cleaning sharks.
Many life forms experience conflicts similar to the prisoner's dilemma, but because
they
don't interact just once, both can be better off by cooperating.
And this doesn't require trust or conscious thought either, because the strategy
could
be encoded in DNA.
As long as it performs better than the other strategies, it can take over a
population.
Axelrod's insights were applied to areas like evolutionary biology and
international conflicts,
but there was one aspect that his original tournaments didn't cover.
What happens if there is a little bit of random error in the game?
Some noise in the system.
For example, one player tries to cooperate, but it comes across as a defection.
Little errors like this happen in the real world all the time.
Like in 1983, the Soviet satellite-based early warning system detected the launch
of an
intercontinental ballistic missile from the US, but the US hadn't launched
anything.
The Soviet system had confused sunlight reflecting off high-altitude clouds with a
ballistic
missile.
Thankfully, Stanislav Petrov, the Soviet officer on duty, dismissed the alarm.
This example shows the potential costs of a signal error, and the importance of
studying
the effects of noise on those strategies.
The word game sounds like it's a children's game, and there's something of a misnomer, maybe, in calling it game theory, because these are life and death matters, obviously.
As you mentioned, this came up in the Cold War, and it could actually be life and death of the whole planet; we could annihilate human civilization.
So these are not games in any kind of trivial sense.
It's just the term that is used by mathematicians and theorists.
When tit-for-tat plays against itself in a noisy environment, both start off by
cooperating.
But if a single cooperation is perceived as a defection, then the other tit-for-tat
retaliates,
and it sets off a chain of alternating retaliations.
And if another cooperation is perceived as a defection, then the rest of the game
is constant
mutual defection.
Therefore, in the long run, both would only get a third of the points they would
get
in a perfect environment.
Tit-for-tat goes from performing very well to performing poorly.
So how do you solve this?
Well, you need a reliable way to break out of these echo effects.
And one way to do this is by playing Tit-for-tat, but with around 10% more
forgiveness.
So instead of retaliating after every defection, you only retaliate around 9 out of
every 10 times.
This helps you break out of those echoes, while still being retaliatory enough to
not be
taken advantage of.
And so we also ran the tournament with noise and generosity.
And that did quite well.
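A hedged sketch of that idea in Python: each intended move is flipped with some small probability (the 5% noise rate here is an arbitrary choice for illustration), and the generous variant forgives a defection about one time in ten, as described above.

```python
import random

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(mine, theirs):
    return "C" if not theirs else theirs[-1]

def generous_tit_for_tat(mine, theirs, forgiveness=0.10):
    # Retaliate against a defection only about 9 times out of 10.
    if theirs and theirs[-1] == "D" and random.random() > forgiveness:
        return "D"
    return "C"

def noisy_match(strat_a, strat_b, rounds=200, noise=0.05):
    """Play a match where each intended move is flipped with probability `noise`."""
    hist_a, hist_b = [], []  # the moves as they actually came out
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        if random.random() < noise:
            move_a = "D" if move_a == "C" else "C"
        if random.random() < noise:
            move_b = "D" if move_b == "C" else "C"
        pts_a, pts_b = PAYOFF[(move_a, move_b)]
        score_a += pts_a
        score_b += pts_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print("Tit for Tat vs itself, with noise:", noisy_match(tit_for_tat, tit_for_tat))
print("Generous variant vs itself, noise:", noisy_match(generous_tit_for_tat, generous_tit_for_tat))
```

Averaged over many runs, the generous pairing should recover cooperation after a flipped move much faster than plain Tit for Tat, which tends to get stuck in the echo of alternating retaliations described above.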
My favorite example is this: Tit for Tat does really well, but it can never do better than the player it's playing with.
I mean, think about it.
By design, all it can do is lose or draw.
And yet, when the results of all interactions are tallied up, it comes out ahead of all other strategies.
Similarly, always defect can never lose a game.
It can only draw or win.
But overall, it performs extremely poorly.
This highlights a common misconception, because for many people when they think
about winning,
they think they need to beat the other person.
In games like chess or poker, this is true, since one person's gain is necessarily another person's loss. So these games are zero sum.
But most of life is not zero sum.
To win, you don't need to get your reward from the other player.
Instead, you can get it from the banker.
Only in real life, the banker is the world.
It is literally everything around you.
It is just up to us to find those win-win situations and then work together to
unlock those
rewards.
Cooperation pays, even among rivals.
From 1950 to 1986, the US and Soviet Union had trouble cooperating and both kept
developing
nukes.
But then, from the late 80s onwards, they started reducing their nuclear
stockpiles.
They too had learned how to resolve conflict.
Rather than making an agreement to abolish all nuclear arms at once, essentially turning it into a single prisoner's dilemma, they would disarm slowly, a small number of nukes each year. Then they'd check on each other to see that they had both cooperated, and repeat the year after, and the year after that, all along verifying mutual cooperation.
In the more than 40 years since Axelrod's tournaments, researchers have continued
to study
which strategies perform best in a variety of environments.
In doing so, they varied everything from payoff structures to strategies to errors and more; some even allowed the strategies to mutate.
While tit for tat or generous tit for tat doesn't always come out on top, Axelrod's
main
takeaways still hold.
Be nice, forgiving, but don't be a pushover.
Can I ask you, why did Anatol Rapoport submit Tit for Tat?
Well, the reason was because I asked him to.
And he wrote saying, yeah, I'm going to do that, but I just want to be clear that I'm not sure that this is really such a good idea.
He was a peace researcher, and I think his own inclinations were to be much more
forgiving
and maybe not be so provocable.
But what I find fascinating is that one of the main things that sets life apart from non-living things is that life gets to make decisions.
We get to make choices, choices that don't only change our future, but also the
future
of those we interact with.
You see, in the short term, it is often the environment that shapes the player, that determines who does well. But in the long run, it is the players that shape the environment.
So let's play a game, the game of life, and make your choices wisely, because their
impact may reach further than you think.
Using the right strategy matters, but figuring out what the best strategy is isn't easy.
It requires critical thinking and innovative solutions, like Axelrod's tournaments.
If you're looking for an easy way to build your problem-solving skills, then check
out
this video's sponsor, Brilliant.
Brilliant will help make you a better thinker in everything from math and data science to programming and technology, you name it.
You can start right now for free, straight from your device, the device you're
watching
this on.
All you need to do is set your learning goal, and Brilliant will design the perfect
path
for you, equipping you with all the tools you need to reach it.
Did you like today's insights from Game Theory?
Then Brilliant's brand new course, Intro to Probability, is the perfect next step.
Intro to Probability is your gateway to mastering the tools of chance, risk, and prediction.
You will learn how to construct and analyze models of real world situations, from elections and business decisions all the way to the 2023 Women's World Cup.
Whether you're a budding statistician, or you just want to learn about randomness
and
chance, this course will equip you with the skills you need to make decisions in
uncertain
situations.
You'll even take a page out of Axelrod's playbook, and learn how to build computer simulations to put your strategies to the test.
Beyond Probability, Brilliant has a massive library of content covering everything
from
math to data science and programming to technology.
What I love about Brilliant is that each lesson is hands-on, so you'll build real
intuition,
and the best part is that you can learn with Brilliant on the go.
To try everything Brilliant has to offer for free for a full 30 days, visit brilliant.org/veritasium, and the first 200 of you to sign up will get 20% off Brilliant's annual Premium subscription.
So I want to thank Brilliant for sponsoring this video, and I want to thank you for
watching.
