A prisoner’s dilemma cheat sheet

Because cooperation is hard.

Buster Benson
Jul 29, 2018

The prisoner’s dilemma is a strange but fascinating thought experiment / game that can teach us all why some strategies for cooperation are better than others.

Imagine 2 people being questioned about the same crime. They are each talking to the interrogator separately. The interrogator gives each person the same deal: they can choose to vouch for the other person’s innocence, or rat them out. And of course there’s a twist. If both people vouch for each other, they’ll each get 3 months off their sentence, but if the first person vouches for the second person, and the second person rats them out, the first person will get no time off their sentence and the second person will get 5 months off theirs. Lastly, if they both rat each other out, they each get 1 month off their sentence.

It ends up working like this:
Vouch + Vouch = 3 month reduction each
Vouch + Rat = 0 month reduction for first person, 5 for other
Rat + Vouch = 5 month reduction for first person, 0 for other
Rat + Rat = 1 month reduction each

The reason this is tricky is that if you know what the other person is going to do, it’s always to your advantage to rat them out. If they’re going to rat you out, it’s better to rat them out as well. And if they’re going to vouch for you, it’s still better for you to rat them out. BUT since you don’t know what the other person will do, the total number of months reduced across the two of you is highest when you both vouch (3+3 = 6, which is larger than 0+5 and also larger than 1+1).

The reason this is interesting is that there’s a slight advantage when you cooperate with others. You don’t benefit as much immediately, but over time the total months reduced increases faster as a result of cooperation. And this is how we think cooperation evolved… because the group becomes stronger over the long haul.

(For the purposes of making the game a bit easier to understand we will refer to the outcomes from each game as points rather than time off a sentence.)

There’s no board involved in this game; it’s really as simple as picking a strategy and running a program that plays your strategy against everyone else’s to see how it does. Points are assigned to encourage cooperation, but cooperating also makes you vulnerable to betrayal. When you’re betrayed, the other player gets all the points instead of sharing them. We’ll be going into more detail soon, but if you want a bit more info about the mechanics of the game, it might help to read this or watch this video.

How prisoner’s dilemma strategies work: In prisoner’s dilemma, each player chooses a strategy built from two possible moves (Cooperate and Defect) plus some logic that describes when they want to cooperate with others and when they don’t. All you have to go on is your past interactions with the other player.
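To make that concrete, here’s one minimal way the game could be encoded in Python. The payoff numbers are the ones described above; the names (PAYOFFS, always_cooperate, always_defect) and the convention that a strategy is a function of the two move histories are just my illustrative choices, not anyone’s official implementation.

# Points awarded for one interaction, keyed by (my_move, their_move).
# "C" = Cooperate, "D" = Defect. Values match the payoffs described above.
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # I get played for a sucker
    ("D", "C"): (5, 0),  # they get played for a sucker
    ("D", "D"): (1, 1),  # mutual defection
}

# A strategy is just a function: given my past moves and the opponent's past
# moves, it returns "C" or "D". The two simplest possible strategies:
def always_cooperate(my_history, their_history):
    return "C"

def always_defect(my_history, their_history):
    return "D"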

Here are some examples of popular strategies, and their pros and cons in various situations:

  • Always cooperate, no matter what.
  • Always defect, no matter what.
  • Cooperate unless someone defects, then punish them to some degree.
  • Try to figure out what someone’s strategy is, then play what’s best against that.

Let’s talk through each one and identify their pros and cons.

Always Cooperate

Strategy: Cooperate every move.

This is the strategy that exemplifies “why can’t we all just get along”. It’s unconditional love: you always cooperate no matter how others treat you. It’s generally acknowledged to be an unsustainable strategy if applied to all relationships. This scene in Mars Attacks demonstrates why:

If you haven’t seen Mars Attacks recently, it’s so good! And it happens to be a great illustration of how being too optimistic about cooperation can have painful results.

Pros: Always Cooperate is the most altruistic strategy possible. It’s a viable strategy in environments of extremely high trust like high-functioning teams, loving families, etc.

Cons: Cooperation without trust is an invitation for tons of abuse. Just a minor flaw.

Poll for you to take at the end of this article:

Question: If aliens landed on Earth tomorrow, which strategy would you vote for the world’s leaders to use when interacting with them?

What would you do if you were Earth’s ambassador, in charge of figuring out how to respond to the aliens? Think carefully, because the way you answer this question is, ultimately, a reflection of your strategy for cooperation. There will be a poll at the end of the article (if you’re impatient, you can also go straight to it) that will ask you to pick a strategy for interacting with aliens, once we tease apart the different strategies a bit more.

Always Defect

Strategy: Defect every move.

This strategy exemplifies lost faith in cooperation. It’s the ultimate defensive strategy because nobody can turn you into a sucker like President James Dale in Mars Attacks. A classic example of this strategy is the Daleks in Doctor Who:

Daleks were engineered to hate everything and to destroy everything that they hate.

Pros: You will always win or tie against any specific opponent because they never have an opening to grab points from you.

Cons: Though you win or tie every round, you do so with a lower point total than cooperation would have earned.

Time to explain how the game actually works!

❋ How Prisoner’s Dilemma works

A “turn” in a prisoner’s dilemma game takes place between two players and can include one or more “interactions”. An interaction is one opportunity to Cooperate or Defect.

If Always Cooperate interacts with Always Defect, we get this result:

Cooperate|Defect

In this case, Always Cooperate gets 0 points, and Always Defect gets 5 points. Always Cooperate is the sucker here, like President James Dale. The Martians took advantage of a vulnerability, and got an outsized gain.

Now imagine President James Dale came back to life, and had another chance. Always Cooperate would dictate that he continue to cooperate even if the Martians continue to attack.

Let’s say a Turn is 100 interactions: Always Cooperate offers to cooperate 100 times, and Always Defect defects 100 times. In that scenario, Always Cooperate gets 0 points, and Always Defect gets 500 points.

Now let’s say Always Defect plays another Always Defect strategy… a typical war scenario, where both sides hate each other and would never consider a truce. That looks like this:

Defect|Defect 

In this case, where both sides defect, it’s a tie, and both sides get 1 point per interaction. You aren’t a sucker, but you don’t make much progress because both sides have their guard up. After 100 interactions this would result in 100 points for each side.

Finally, imagine Always Cooperate played another Always Cooperate. We’d get this result:

Cooperate|Cooperate

Both sides offer to cooperate, and as a result progress is made! At least, a bit more progress than would have been made if both sides had defected. On the other hand, it might require some trade-offs that make the result less ideal than if you just got your way entirely.

When both sides cooperate, each side gets 3 points. And after 100 interactions, both sides would have 300 points.

Going forward, I’ll be referring to the above results as (C|D), (D|D), (C|C), and the reverse of the first, from the Martians’ perspective: (D|C). And a full turn will be reported like this:

Always Defect vs Always Cooperate:
100% (D|C)
----------------------------------
= 500 points | 0 points
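In code, scoring a full turn could be sketched like this, reusing the PAYOFFS table and strategy functions from the earlier sketch (again, a rough illustration rather than the code used in any real tournament):

def play_turn(strategy_a, strategy_b, interactions=100):
    # Play one turn (a series of interactions) and return both total scores.
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(interactions):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        points_a, points_b = PAYOFFS[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

# Reproduces the turn reported above: 500 points | 0 points
print(play_turn(always_defect, always_cooperate))  # (500, 0)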

You might be asking, “Why have 100 interactions when every interaction is the same?” Good question! These first 2 strategies are the simplest strategies possible — they always do the same thing. But that’s not how we typically behave in the real world. Most likely, if President James Dale came back to life, he would not choose to cooperate a 2nd, 3rd, and 100th time.

Let’s talk about a few more strategies that account for this new information.

Tit For Tat

Strategy: Start by cooperating, then copy whatever the other player did last move.

Tit For Tat is similar to the philosophy of “an eye for an eye”. The punishment matches the crime. If someone defects against you, then immediately defect against them. If they start cooperating again, then you start cooperating again too. It’s the incarnation of strict fairness.
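In the same function-of-history style as the earlier sketch, Tit For Tat might look like this (illustrative only):

def tit_for_tat(my_history, their_history):
    # Cooperate on the first move, then copy the opponent's previous move.
    if not their_history:
        return "C"
    return their_history[-1]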

How does this strategy do against the others? Like this:

Tit For Tat vs Always Cooperate: 
100% (C|C)
-------------------------------
= 300 points | 300 points
Tit For Tat vs Always Defect:
1% (C|D)
+ 99% (D|D)
-----------------------------
= 99 points | 104 points

Pros: Because Tit For Tat starts by cooperating, and then copies the other player’s last move, it behaves like Always Cooperate when interacting with Always Cooperate (getting 300 points/turn), and behaves like Always Defect, except for the first move, when interacting with Always Defect (getting 99 points/turn instead of the 0 that Always Cooperate would get). This adaptability makes it a very strong strategy for people who like the idea of Always Cooperate but don’t want to play the sucker.

Cons: This strategy plays to tie, not to win.

Amongst Prisoner’s Dilemma fans, Tit For Tat was considered the best strategy for a couple decades, even though it technically loses to Always Defect in a one-on-one match.

Why is that? Because of the way game dynamics change in one-on-one versus many-to-many, or multiplayer, matches.

❋ Multi-player strategies give the game a twist

If you’re playing a single person a single time, the optimal strategy is to defect. If you do that, no matter what, you’ll tie (D|D) or win (D|C).

This holds true no matter how many times you play that one player.

If you’re playing against 2 other players, however, the dynamics can change if the other two players team up against you. Imagine a match with 2 Tit For Tat players and 1 Always Defect player. After everyone has played everyone else, the results would be:

Always Defect vs Tit For Tat 1:
1% (D|C) : starts nice
+ 99% (D|D) : but quickly adjusts
-------------------------------
= 104 points | 99 points
Always Defect vs Tit For Tat 2:
1% (D|C) : starts nice
+ 99% (D|D) : then quickly adjusts
-------------------------------
= 104 points | 99 points
Tit For Tat 1 vs Tit For Tat 2:
100% (C|C) : starts and remains nice
-------------------------------
= 300 points | 300 points

The final result of the matches would be:

Always Defect: 208 points
Tit For Tat 1: 399 points
Tit For Tat 2: 399 points

Even though Always Defect technically won both of its matches, it earned a lower point total than the other two players, because they benefited from being able to cooperate with each other, which more than made up for the small point loss from their matches with Always Defect.
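A round-robin like this is easy to sketch on top of the play_turn helper from earlier (the round_robin name and the shape of the output are illustrative choices):

from itertools import combinations

def round_robin(players, interactions=100):
    # Every named strategy plays one turn against every other; totals are summed.
    totals = {name: 0 for name, _ in players}
    for (name_a, strat_a), (name_b, strat_b) in combinations(players, 2):
        score_a, score_b = play_turn(strat_a, strat_b, interactions)
        totals[name_a] += score_a
        totals[name_b] += score_b
    return totals

print(round_robin([
    ("Always Defect", always_defect),
    ("Tit For Tat 1", tit_for_tat),
    ("Tit For Tat 2", tit_for_tat),
]))
# {'Always Defect': 208, 'Tit For Tat 1': 399, 'Tit For Tat 2': 399}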

This family of strategies, the ones that respond to defection with some kind of punishment, can be understood as living on a spectrum from forgiving to unforgiving.

Here are some examples of punishing strategies that start with cooperation, sorted from most forgiving to least forgiving (two of them are sketched in code right after the list):

  • Tit For Two Tats: Cooperates on the first move, and defects only when the opponent defects two times in a row.
  • Firm But Fair: Cooperates on the first move, and cooperates except after receiving a (C|D). Will cooperate after (D|D).
  • Generous Tit for Tat: Same as Tit For Tat, except that it cooperates with a 10% probability when the opponent defects.
  • Soft Majority: Cooperates on the first move, and cooperates as long as the number of times the opponent has cooperated is greater than or equal to the number of times it has defected, else it defects.
  • Hard Tit for Tat: Cooperates on the first move, and defects if the opponent has defected on any of the previous 3 moves, else cooperates.
  • Two Tits for Tat: Same as Tit For Tat except that it defects 2x whenever the opponent defects.
  • Gradual: Cooperates on the first move, and cooperates as long as the opponent cooperates. After the first defection of the other player, it defects 1x and cooperates 2x. After the Nth defection it reacts with N consecutive defections and then calms down its opponent with two cooperations, in order to reset them if they are also forgiving.
  • Soft Grudger: Cooperates, until the opponent defects, then punishes them with D, D, D, D, C, C.
  • Grim trigger: Cooperates, until the opponent defects, and thereafter always defects.
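To make the two ends of that spectrum concrete, here’s how Tit For Two Tats (the most forgiving on the list) and Grim Trigger (the least forgiving) could be written in the same style as before. These are my translations of the descriptions above, not canonical implementations:

def tit_for_two_tats(my_history, their_history):
    # Forgiving: only defect if the opponent defected twice in a row.
    if their_history[-2:] == ["D", "D"]:
        return "D"
    return "C"

def grim_trigger(my_history, their_history):
    # Unforgiving: after a single defection, never cooperate again.
    if "D" in their_history:
        return "D"
    return "C"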

And here are some punishing strategies that start with defecting, but can come back to cooperating:

  • Suspicious Tit for Tat: Same as Tit For Tat, except that it defects on the first move.
  • Hard Majority: Defects on the first move, and defects if the number of defections of the opponent is greater than or equal to the number of times it has cooperated, else cooperates.
  • Reverse Tit for Tat: It does the reverse of TFT. It defects on the first move, then plays the reverse of the opponent’s last move.

What’s the best strategy for punishment? It’s not as simple as you may think. Here are some thoughts from the Theories of Punishment section of an online legal encyclopedia I found:

Theories of punishment can be divided into two general philosophies: utilitarian and retributive.

The utilitarian theory of punishment seeks to punish offenders to discourage, or “deter,” future wrongdoing. Under the utilitarian philosophy, laws should be used to maximize the happiness of society.

Under the retributive theory of punishment, offenders are punished for criminal behavior because they deserve punishment. Criminal behavior upsets the peaceful balance of society, and punishment helps to restore the balance.

A common strategy for punishment is denunciation. Under the denunciation theory, punishment should be an expression of societal condemnation. The denunciation theory is a hybrid of utilitarianism and retribution. It is utilitarian because the prospect of being publicly denounced serves as a deterrent. Denunciation is likewise retributive because it promotes the idea that offenders deserve to be punished.

❋ The optimal strategy dances around

When you’re playing against only 1 other player, the optimal strategy is to Always Defect, because you’re guaranteed to win or tie.

When you’re playing against multiple other players, Tit For Tat becomes optimal, if you can team up and benefit from cooperation while also defending against Always Defectors.

It’s like a 4-dimensional game of rock-paper-scissors:

Left: Always Cooperate plays everyone; Right: Grim Trigger plays everyone.

But what happens when everyone starts playing Tit For Tat and other cooperative strategies, and that becomes the norm?

The fascinating thing about prisoner’s dilemma is that whenever a strategy is effective and gains popularity, it opens up an opportunity for another strategy to exploit those strengths and turn them into weaknesses.

Enter…

Prober

Strategy: Start with Defect, Cooperate, Cooperate, then defect if the other player has cooperated in the second and third move (meaning they may be Always Cooperate or another forgiving strategy); otherwise, play Tit For Tat.
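Here’s how that description could translate into the same function-of-history style used earlier (an illustrative sketch, not a canonical Prober implementation):

def prober(my_history, their_history):
    # Open with the probe sequence: Defect, Cooperate, Cooperate.
    opening = ["D", "C", "C"]
    move_number = len(my_history)
    if move_number < 3:
        return opening[move_number]
    # If the opponent cooperated on the 2nd and 3rd moves, assume they're
    # overly forgiving and exploit them forever; otherwise play Tit For Tat.
    if their_history[1] == "C" and their_history[2] == "C":
        return "D"
    return their_history[-1]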

When we greet each other in person, we often smile, wave, nod our head, or shake hands. Other cultures bow, or stick out their tongue, or kiss, or press foreheads. Why do we do this? A greeting helps us calibrate the intent of the other person.

Photo by doc obee via Flickr, via 11 Ways People Greet Each Other Around the World.

The Prober strategy starts with a “handshake” of three moves (Defect, Cooperate, Cooperate). From the way the other player responds, Prober can make a better guess about which strategy to play for the rest of the round. Let’s look at what would happen in the first 3 moves when it plays each of the others:

Prober vs Always Cooperate: D|C, C|C, C|C = Play Always Defect
98% (D|C)
2% (C|C)
---------------------------
= 496 points | 6 points
Prober vs Always Defect: D|D, C|D, C|D = Play Always Defect
98% (D|D)
2% (C|D)
------------------------
= 98 points | 108 points
Prober vs Tit For Tat: D|C, C|D, C|C = Play Tit For Tat
1% (D|C)
1% (C|D)
98% (C|C)
----------------------
= 299 points | 299 points

With this handshake, Prober can now play a strategy that is appropriate to the player. It’s similar to Tit For Tat in that it responds to the other player’s behavior, but it responds to their overall strategy rather than to their individual moves.

Pros: If the game were filled with only Always Defect and Tit For Tat players, Prober wouldn’t win (you can see that it earns fewer points against those players than Tit For Tat does on its own), but if there were a small pocket of Always Cooperate players in the game, that would give it the edge it needs to win.

Cons: This strategy only works in a certain game environment, so unless you know that environment ahead of time, it’s difficult to know whether or not to pick this strategy.

The Martians at the beginning of this article used Prober against President James Dale: they determined his intent to cooperate, and then defected for the kill. Which is why handshakes are both powerful and dangerous. They help you learn about the other player, but also help the other player learn about you.

❋ The power of secret handshakes

11 secret handshakes you and your bestie should learn (BuzzFeed)

There is an infinite variety of handshake-like strategies out there. Each is designed to exploit a certain kind of environment, and each also needs to protect against being exploited itself. Here are a few others (from this list of strategies; the first one is sketched in code right after the list):

  • Handshake: Defects on the first move and cooperates on the second move. If the opponent plays the same moves, it always cooperates. Otherwise, it always defects.
  • Naive Prober: Like Tit for Tat, but occasionally defects with a small probability. The probing, in this case, is to occasionally test for an overly-forgiving strategy.
  • Remorseful Prober: Like Naive Prober, but it tries to break the series of mutual defections after defecting. In effect, probing for forgiveness.
  • Adaptive: Starts with C, C, C, C, C, C, D, D, D, D, D and then takes choices which have given the best average score re-calculated after every move.
  • Pavlov: Plays Tit For Tat in the first six moves and identifies the opponent by means of a rule-based mechanism. The strategies of the opponent are categorized into four groups: Tit For Tat, Always Defect, Suspicious Tit For Tat, and Random. If the other player doesn’t start defecting, it is identified to be cooperative and will behave as Tit For Tat. If the other player defects more than four times in six consecutive moves, it is identified as an Always Defect type and will always defect. If the opponent just defects three times in six moves, it is identified as Suspicious Tit For Tat type and will adopt Tit For Two Tats in order to recover mutual cooperation. Any strategy that does not belong to the former three categories will be identified as a random type. In this situation, Pavlov will play Always Defect. In order to deal with the situations in which the opponents may change their actions, the average payoff is computed every six rounds. If it is lower than a threshold, the process of opponent identification may restart.
  • Fortress3: Like Handshake, it tries to recognize kin members by playing D, D, C. If the opponent plays the same sequence of D, D, C, it cooperates until the opponent defects. Otherwise, it defects until the opponent defects on continuous two moves, and then it cooperates on the following move.
  • Fortress4: Same as Fortress3 except that it plays D, D, D, C in recognizing kin members. If the opponent plays the same sequence of D, D, D, C, it cooperates until the opponent defects. Otherwise, it defects until the opponent defects on continuous three moves, and then it cooperates on the following move.
  • Collective strategy: Plays C and D in the first and second move. If the opponent has played the same moves, plays Tit For Tat. Otherwise, plays Always Defect.
  • Southampton Group strategies: A group of strategies designed to recognize each other through a predetermined sequence of 5–10 moves at the start. Once two of these strategies recognize each other, they take on a ‘master’ or ‘slave’ role: a master will always defect while a slave will always cooperate in order for the master to win the maximum points. If the opponent is recognized as not being a SGS, it will immediately defect to minimize the score of the opponent.
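As promised, here’s the first of these, Handshake, sketched in the same style (my reading of the description above, not a reference implementation):

def handshake(my_history, their_history):
    # Open with the recognition signal: Defect, then Cooperate.
    opening = ["D", "C"]
    move_number = len(my_history)
    if move_number < 2:
        return opening[move_number]
    # If the opponent opened the same way, treat them as kin and always
    # cooperate; otherwise always defect.
    if their_history[:2] == ["D", "C"]:
        return "C"
    return "D"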

Pros: If you know who you’re playing, that knowledge is power.

Cons: At the same time, it takes several moves to gain this knowledge, and that means you take a hit in total points even if you end up picking the right strategy.

As you can see with Fortress3 and Fortress4, strategies are susceptible to being found out and worked around. The way to play against a handshake strategy is to mimic one kind of strategy until the other player locks their strategy in, and then to switch to a strategy that exploits that. Pavlov is an example of a strategy that anticipates a lot of trickery from the other player, and tries to constantly stay one step ahead.

This arms race between strategies is similar to Batesian mimicry in evolution. Hoverflies evolve to look like bees, and milk snakes evolve to look like venomous coral snakes, to avoid getting eaten. Weeds evolve to look like crops to avoid getting weeded.

Left: Milk Snake (not venomous); Right: Coral Snake (venomous)

There is no optimal strategy, for long. You gotta stay on your toes.

How cooperation evolves

Chaos → Fight to survive → Team up against common enemy → Cooperation norms form → Norms expose new opportunities for mimicry to evolve → Someone exploits cooperation → Repeat ⏎

The ultimate cooperation strategy isn’t static. It has to be a strategy for changing strategies, more than anything else. And it needs to consider optimizations to the following broad problem areas:

  • It needs to defect against defectors, yet also be forgiving.
  • It needs to cooperate with cooperators, yet also set boundaries.
  • It needs to continuously improve and camouflage its methods for identifying defectors and cooperators, to account for mimicry and exploitation.
  • Meanwhile, it pays to build trust with others in order to be taken into account as other players continuously improve their methods for identifying defectors and cooperators.

You’ve now seen many of the basic strategy building blocks: how tactics like teaming up, punishment, and secret handshakes can all be used to get a temporary advantage.

And you probably also sense that somewhere in your subconscious, you’re already well aware of all of these building blocks, and have been testing different strategies your entire life.

The most interesting thing is that cooperation has evolved, even if it feels impossibly complicated and always on the verge of tipping over into fake cooperation (mimicry) and probing (extortion).

Prisoner’s dilemma is a simple way to give words to these subconscious strategies that have evolved in us all. Now that you have this framework, you’ll probably start seeing different strategies showing up everywhere in life.

Hopefully all of this has sparked some self-reflection. Even if it didn’t, perhaps re-watching this scene from The Princess Bride will make this whole article worth reading:

If aliens landed on Earth tomorrow, which strategy would you vote for the world’s leaders to use when interacting with them? Would you pick Always Cooperate? Tit For Tat? Grim Trigger? Prober? Or something else?

Choose wisely. 👇

Here’s a direct link to the poll, as well.

Thank you!
