In the iterated Prisoner's Dilemma, how would a change in the payoff matrix affect strategy?

durron597

Quick summary of the Prisoner's Dilemma: Two criminals are charged with a crime. Each is given an opportunity by the police to confess.

  • If both stay silent, the police don't have enough evidence to convict them of much, and both get 1 year in prison.
  • If both confess, the police have a lot more evidence, and both get 2 years in prison.
  • If one confesses and the other does not, the one who confesses gets a plea bargain and goes free. The one who stays silent takes all the liability and gets 3 years in prison.

You can put these outcomes in a chart, called a "payoff matrix" in game theory. Here's a typical payoff matrix for the Prisoner's Dilemma (adapted from Wikipedia):

         | B coop | B defect
----------------------------
A coop   | R, R   | S, T
----------------------------
A defect | T, S   | P, P

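Where T > R > P > S, and higher is better (going free corresponds to T, 3 years in prison to S). The exact values don't matter much if you're only playing a single round of the game, since defecting always scores higher than cooperating. But in the iterated Prisoner's Dilemma, you want to maximize your score over many rounds, and there are strategies that can do better than "always defect", e.g. tit for tat.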
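A minimal sketch of the setup, in case it helps pin down what "iterated" means here. The values T, R, P, S = 5, 3, 1, 0 are illustrative only (any T > R > P > S would do), and the names `payoff`, `play`, `tit_for_tat` and `always_defect` are made up for this example:

```python
from typing import Callable, List, Tuple

# Illustrative values only (an assumption); any T > R > P > S is a Prisoner's Dilemma.
T, R, P, S = 5, 3, 1, 0

def payoff(a: str, b: str) -> Tuple[int, int]:
    """Return (A's score, B's score) for moves 'C' (cooperate) or 'D' (defect)."""
    table = {
        ("C", "C"): (R, R),
        ("C", "D"): (S, T),
        ("D", "C"): (T, S),
        ("D", "D"): (P, P),
    }
    return table[(a, b)]

# A strategy sees only the opponent's past moves and returns 'C' or 'D'.
Strategy = Callable[[List[str]], str]

def always_defect(opponent_history: List[str]) -> str:
    return "D"

def tit_for_tat(opponent_history: List[str]) -> str:
    # Cooperate first, then copy the opponent's previous move.
    return opponent_history[-1] if opponent_history else "C"

def play(a: Strategy, b: Strategy, rounds: int) -> Tuple[int, int]:
    """Play a fixed number of rounds and return the total scores (A, B)."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = a(hist_b), b(hist_a)
        pa, pb = payoff(move_a, move_b)
        score_a += pa
        score_b += pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat, 10))      # (30, 30)
print(play(always_defect, always_defect, 10))  # (10, 10)
print(play(tit_for_tat, always_defect, 10))    # (9, 14)
```

With these numbers, two tit-for-tat players each score three times what two always-defect players score, even though always defect wins the head-to-head matchup.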

What I would like to know is what happens if, for example, R >>> P. Example:

T = 101
R = 100
P = 1
S = 0

In this case, the cost of your opponent defecting is so large that it dwarfs anything your own decision can do: your own decision can only move your score by 1 point, but your opponent's decision can move it by 100 points. To me, this implies that both players would be much more inclined to cooperate (and, of course, always defect in the last round).
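The "1 point versus 100 points" claim can be checked directly from the matrix. This is only a sketch; the dictionary names `mine`, `own_swing` and `opponent_swing` are made up for the example:

```python
T, R, P, S = 101, 100, 1, 0

# What I score, indexed by (my move, opponent's move).
mine = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

# Gain from switching my own move to defect, holding the opponent's move fixed.
own_swing = {opp: mine[("D", opp)] - mine[("C", opp)] for opp in "CD"}

# Loss when the opponent switches to defect, holding my own move fixed.
opponent_swing = {me: mine[(me, "C")] - mine[(me, "D")] for me in "CD"}

print(own_swing)       # {'C': 1, 'D': 1}
print(opponent_swing)  # {'C': 100, 'D': 100}
```

Whatever the other player does, defecting gains me exactly 1 point (T - R or P - S), while the other player's choice is worth 100 points to me (R - S or T - P).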

  • Does this make sense? Would you adjust an algorithm to be more cooperative if this were the payoff matrix?
  • What about other extremes, e.g. P >>> S, as in T = 101, R = 100, P = 99, S = 0? Would you ever cooperate?
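One hedged way to compare these matrices uses a setting that is not the one described above: assume the game has no known last round, future payoffs are discounted by a factor delta per round, and the opponent plays "grim trigger" (cooperates until you defect once, then defects forever). All three are assumptions made for this sketch. Cooperating forever is then worth R / (1 - delta), and defecting once is worth T + delta * P / (1 - delta), so cooperation is at least as good exactly when delta >= (T - R) / (T - P). The function name `min_discount` is made up for the example:

```python
from fractions import Fraction

def min_discount(T: int, R: int, P: int) -> Fraction:
    """Smallest discount factor at which cooperating against grim trigger
    is at least as good as defecting once. S does not enter this bound."""
    return Fraction(T - R, T - P)

print(min_discount(101, 100, 1))   # 1/100  (R >>> P)
print(min_discount(101, 100, 99))  # 1/2    (P close to R)
print(min_discount(5, 3, 1))       # 1/2    (illustrative textbook-style values)
```

Under these assumptions, the R >>> P matrix supports cooperation even for players who barely value the future, while the P = 99 matrix requires the same patience as the illustrative 5, 3, 1, 0 values. This says nothing about a game with a known final round.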
