Quick summary of the Prisoner's Dilemma: two criminals are arrested and charged with a crime. The police separately offer each one a chance to confess.
- If both stay silent, the police don't have enough evidence to convict them of much, and each gets 1 year in prison.
- If both confess, the police have much more evidence, and each gets 2 years in prison.
- If one confesses and the other stays silent, the confessor gets a plea bargain and goes free, while the silent one bears all the blame and gets 3 years in prison.
You can put these in a chart, called a "payoff matrix" in game theory. Here's the standard payoff matrix for the Prisoner's Dilemma (adapted from Wikipedia):
|              | B cooperates | B defects |
|--------------|--------------|-----------|
| A cooperates | R, R         | S, T      |
| A defects    | T, S         | P, P      |
where T > R > P > S. The specific values don't matter much if you're only playing a single round. But in iterated Prisoner's Dilemma, you want to maximize your score over many rounds, and there are good strategies beyond "always defect", e.g. tit for tat.
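To make the iterated setting concrete, here's a minimal sketch (the function names and the textbook payoff values T=5, R=3, P=1, S=0 are my own illustrative choices, not from any particular source) pitting tit for tat against always-defect:

```python
# Minimal iterated Prisoner's Dilemma sketch. Payoffs follow the standard
# ordering T > R > P > S; the specific numbers are a common textbook choice.
T, R, P, S = 5, 3, 1, 0

def payoff(a, b):
    """Return (score_a, score_b) for one round; 'C' = cooperate, 'D' = defect."""
    if a == 'C' and b == 'C':
        return R, R
    if a == 'D' and b == 'D':
        return P, P
    return (T, S) if a == 'D' else (S, T)

def tit_for_tat(opponent_history):
    # Cooperate on the first round, then copy the opponent's previous move.
    return opponent_history[-1] if opponent_history else 'C'

def always_defect(opponent_history):
    return 'D'

def play(strat_a, strat_b, rounds=10):
    hist_a, hist_b = [], []          # each player's past moves
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_b)     # each strategy sees the opponent's history
        move_b = strat_b(hist_a)
        pa, pb = payoff(move_a, move_b)
        score_a += pa
        score_b += pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # → (30, 30): mutual cooperation
print(play(tit_for_tat, always_defect))  # → (9, 14): TFT loses round 1, then matches
```

Over 10 rounds, two tit-for-tat players each earn 10·R = 30, while always-defect beats tit for tat head-to-head but forgoes the cooperation payoff, which is why it loses tournaments against a mixed field.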
What I would like to know is: what happens if, for example, R >>> P? Example:
T = 101
R = 100
P = 1
S = 0
In this case, the penalty when your opponent defects is so severe that it dominates your own decision. Your own choice can only move your score by 1 point, but your opponent's choice can move it by 100 points. To me, this implies that both players would be much more inclined to cooperate (and, of course, to defect in the last round of a known-length game).
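The "swing" argument above can be checked directly. This is just a quick sketch using the values from the question; the variable names are mine:

```python
# How much can each side's choice move MY score under the extreme matrix?
T, R, P, S = 101, 100, 1, 0

# Payoff to the row player for (my_move, their_move); 'C'/'D'.
payoff = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}

# Largest change my own switch (C <-> D) can make, holding the opponent fixed:
my_swing = max(abs(payoff[('D', o)] - payoff[('C', o)]) for o in 'CD')
# Largest change the opponent's switch can make, holding my move fixed:
their_swing = max(abs(payoff[(m, 'D')] - payoff[(m, 'C')]) for m in 'CD')

print(my_swing)     # → 1: defecting gains me at most 1 point
print(their_swing)  # → 100: the opponent's move shifts my score by 100 points
```

Note that defection is still strictly dominant (it gains 1 point whatever the opponent does); the argument for cooperating here is about the iterated game, where provoking future defection costs far more than the 1-point gain.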
- Does this make sense? Would you adjust an algorithm to be more cooperative if this were the payoff matrix?
- What about other extremes? e.g. P >>> S, or T = 101, R = 100, P = 99, S = 0. Would you ever cooperate?
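For the second extreme matrix, the same per-branch check (again a sketch with my own variable names) shows why it pushes the other way:

```python
# Second extreme from the question: mutual defection (P = 99) is nearly as
# good as mutual cooperation (R = 100), but the lone cooperator (S = 0) is ruined.
T, R, P, S = 101, 100, 99, 0
payoff = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}

# Gain from defecting instead of cooperating, for each opponent move:
gain_vs_cooperator = payoff[('D', 'C')] - payoff[('C', 'C')]  # 101 - 100 = 1
gain_vs_defector   = payoff[('D', 'D')] - payoff[('C', 'D')]  # 99 - 0 = 99

print(gain_vs_cooperator)  # → 1
print(gain_vs_defector)    # → 99
```

Here defection is not only dominant but nearly costless when mutual (99 vs. 100), while cooperating against a defector is catastrophic, so the incentive to cooperate largely disappears even in the iterated game.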