Quick summary of the Prisoner's Dilemma: two criminals are arrested and charged with a crime. The police separately offer each one a chance to confess.
- If both stay silent, the police don't have enough evidence to convict them of much, and each gets 1 year in prison.
- If both confess, the police have much more evidence, and each gets 2 years in prison.
- If one confesses and the other stays silent, the confessor gets a plea bargain and goes free, while the silent one bears all the blame and gets 3 years in prison.
You can put these in a chart, called a "payoff matrix" in game theory. Here's the standard payoff matrix for the Prisoner's Dilemma (adapted from Wikipedia):
|              | B cooperates | B defects |
|--------------|--------------|-----------|
| A cooperates | R, R         | S, T      |
| A defects    | T, S         | P, P      |
where T > R > P > S. The specific values don't matter much if you're only playing a single round. But in iterated Prisoner's Dilemma, you want to maximize your score over many rounds, and there are good strategies beyond "always defect", e.g. tit for tat.
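To make the iterated setting concrete, here's a minimal sketch (the function names and the textbook payoff values T=5, R=3, P=1, S=0 are my own illustrative choices, not from any particular source) pitting tit for tat against always-defect:

```python
# Minimal iterated Prisoner's Dilemma sketch. Payoffs follow the standard
# ordering T > R > P > S; the specific numbers are a common textbook choice.
T, R, P, S = 5, 3, 1, 0

def payoff(a, b):
    """Return (score_a, score_b) for one round; 'C' = cooperate, 'D' = defect."""
    if a == 'C' and b == 'C':
        return R, R
    if a == 'D' and b == 'D':
        return P, P
    return (T, S) if a == 'D' else (S, T)

def tit_for_tat(opponent_history):
    # Cooperate on the first round, then copy the opponent's previous move.
    return opponent_history[-1] if opponent_history else 'C'

def always_defect(opponent_history):
    return 'D'

def play(strat_a, strat_b, rounds=10):
    hist_a, hist_b = [], []          # each player's past moves
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_b)     # each strategy sees the opponent's history
        move_b = strat_b(hist_a)
        pa, pb = payoff(move_a, move_b)
        score_a += pa
        score_b += pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # → (30, 30): mutual cooperation
print(play(tit_for_tat, always_defect))  # → (9, 14): TFT loses round 1, then matches
```

Over 10 rounds, two tit-for-tat players each earn 10·R = 30, while always-defect beats tit for tat head-to-head but forgoes the cooperation payoff, which is why it loses tournaments against a mixed field.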
What I would like to know is: what happens if, for example, R >>> P? Example:
T = 101
R = 100
P = 1
S = 0
In this case, the penalty when your opponent defects is so severe that it dominates your own decision. Your own choice can only move your score by 1 point, but your opponent's choice can move it by 100 points. To me, this implies that both players would be much more inclined to cooperate (and, of course, to defect in the last round of a known-length game).
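The "swing" argument above can be checked directly. This is just a quick sketch using the values from the question; the variable names are mine:

```python
# How much can each side's choice move MY score under the extreme matrix?
T, R, P, S = 101, 100, 1, 0

# Payoff to the row player for (my_move, their_move); 'C'/'D'.
payoff = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}

# Largest change my own switch (C <-> D) can make, holding the opponent fixed:
my_swing = max(abs(payoff[('D', o)] - payoff[('C', o)]) for o in 'CD')
# Largest change the opponent's switch can make, holding my move fixed:
their_swing = max(abs(payoff[(m, 'D')] - payoff[(m, 'C')]) for m in 'CD')

print(my_swing)     # → 1: defecting gains me at most 1 point
print(their_swing)  # → 100: the opponent's move shifts my score by 100 points
```

Note that defection is still strictly dominant (it gains 1 point whatever the opponent does); the argument for cooperating here is about the iterated game, where provoking future defection costs far more than the 1-point gain.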
- Does this make sense? Would you adjust an algorithm to be more cooperative if this were the payoff matrix?
- What about other extremes? e.g. P >>> S, or T = 101, R = 100, P = 99, S = 0. Would you ever cooperate?
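For the second extreme matrix, the same per-branch check (again a sketch with my own variable names) shows why it pushes the other way:

```python
# Second extreme from the question: mutual defection (P = 99) is nearly as
# good as mutual cooperation (R = 100), but the lone cooperator (S = 0) is ruined.
T, R, P, S = 101, 100, 99, 0
payoff = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}

# Gain from defecting instead of cooperating, for each opponent move:
gain_vs_cooperator = payoff[('D', 'C')] - payoff[('C', 'C')]  # 101 - 100 = 1
gain_vs_defector   = payoff[('D', 'D')] - payoff[('C', 'D')]  # 99 - 0 = 99

print(gain_vs_cooperator)  # → 1
print(gain_vs_defector)    # → 99
```

Here defection is not only dominant but nearly costless when mutual (99 vs. 100), while cooperating against a defector is catastrophic, so the incentive to cooperate largely disappears even in the iterated game.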