Asymmetric Iterated Prisoner’s Dilemma on BA Scale-Free Network

Yunhao Ding (first author), Chunyan Zhang (second author), Jianlei Zhang (jianleizhang@nankai.edu.cn)

Abstract

In real-world scenarios, individuals often cooperate for mutual benefit. However, differences in wealth, reputation, and rationality can lead to different outcomes for similar actions. Moreover, in complex social networks, an individual’s choices are frequently influenced by their neighbors. To explore the evolution of strategies in realistic settings, we conducted repeated asymmetric prisoner’s dilemma experiments on a weighted Barabási-Albert (BA) scale-free network using a memory-one strategy framework. First, our analysis shows how each of the four components of a memory-one strategy affects its win rate. Second, during strategy evolution on the network, two key strategies emerged: “self-bad, partner-worse” and “altruist”. Finally, by introducing optimization mechanisms, we increased the level of cooperation among individuals within the group. These findings offer practical insights for addressing real-world problems.

Keywords: Iterated Prisoner’s Dilemma, Evolutionary Game, BA Scale-Free Network, Cooperation

[a] Department of Automation, College of Artificial Intelligence, Nankai University, Tianjin 300071, China

[b] Tianjin Key Laboratory of Intelligence Robotics, Nankai University, Tianjin 300071, China

Highlights

We analyze and compare the characteristics of each component of the memory-one strategy framework.

We identify an “altruists” strategy and a “self-bad, partner-worse” strategy in an iterated asymmetric prisoner’s dilemma game on a weighted BA scale-free network.

We explore methods to enhance the average fitness of the population.

1 Introduction

Cooperation refers to the behavior where individuals coordinate to achieve better outcomes driven by common interests[1]. In the biological realm, cooperative behaviors are ubiquitous, ranging from foraging activities among animals to relations between nations[2]. To study the impact of individuals’ choices to cooperate or not under complex conditions on the benefits to both parties, game theory and evolutionary game theory have emerged successively[3]. Game theory provides the mathematical framework for analyzing scenarios characterized by conflict or competition. Evolutionary game theory, a branch of game theory, integrates concepts from evolutionary biology to explore strategic choices within a population and the dynamic processes of behavioral evolution[4, 5].

The Prisoner’s Dilemma (PD) is a classic model in game theory, originating from a scenario involving two captured prisoners who are unable to communicate with each other. PD presents a seemingly paradoxical problem: when faced with the choice between betrayal and cooperation, the rational choice for each prisoner is to betray, because, regardless of the other’s decision, confessing yields the best individual outcome. However, if both prisoners choose to betray, they will end up with a worse outcome than if they both had cooperated[6, 7, 8].

In the 1980s, Robert Axelrod organized two tournaments to study the performance of various strategies in the iterated Prisoner’s Dilemma (IPD) and to determine which strategies could balance cooperation and betrayal[9]. In IPD studies, strategies are often endowed with a degree of “memory,” allowing individuals to recall the outcomes of several previous rounds. It is generally believed that players with longer memory perform better in repeated games. However, research indicates that long-term memory offers no significant advantage over short-term memory[10]. As a result, memory-one strategies have become the most widely used framework in repeated games. Scholars have proposed several strategies within this framework, highlighting their benefits in specific contexts. In the aforementioned tournaments, the simple Tit-for-Tat (TFT) strategy won both times. TFT cooperates in the first round and then replicates the opponent’s action from the previous round. This approach yielded excellent results by never initiating betrayal but always responding to it, thus balancing cooperation and punishment[11, 12]. Inspired by TFT, the Generous-TFT (GTFT) strategy was later proposed; it also starts with cooperation and continues to cooperate as long as the opponent does. However, unlike TFT, GTFT forgives the opponent’s betrayal with a certain probability[13]. This strategy retains TFT’s ability to establish cooperation while adding tolerance, helping to avoid vicious cycles and promoting long-term cooperation. In the 1950s, psychologist Donald Hebb introduced the concept of Win-Stay, Lose-Shift (WSLS). In the 1990s, Nowak and Sigmund formally defined this strategy. Its principle is simple: if the previous round’s result was favorable, keep the same decision in the current round; otherwise, change the decision[14, 15].
In 2012, Press and Dyson introduced the Zero-Determinant (ZD) strategy, which can unilaterally control the opponent’s payoff and enforce a linear relationship between their payoffs. Unlike the aforementioned strategies with clear rules, the ZD strategy encompasses a cluster of strategies based on repeated games[16]. Notably, its payoff is the expected long-term payoff rather than the exact payoff in any specific round.

Complex networks are systems composed of numerous interconnected nodes, which can be individuals, organizations, or other social units in reality. In these networks, the decisions and behaviors of individuals are often influenced by their surrounding nodes, leading to complex interactions and dynamic evolution[17, 18, 19]. Studying evolutionary games on complex networks allows us to better understand the dynamics of interactions, cooperation, and competition among individuals, offering solutions to real-world social problems.

Game theory typically assumes that participants are completely rational and symmetric. However, in reality, participants often differ in identity, character, and assets, which significantly influence their decisions and payoffs[20]. Introducing these differences makes game models more realistic and better able to reflect the complex interactions of the real world[21, 22, 23, 24]. Such participants have different goals and resources in the game, leading to different strategies. For instance, resource-rich participants may be more willing to take risks, while resource-limited participants might prefer conservative strategies. By accounting for differences in these attributes, more complex and optimized game models can be designed, resulting in fairer and more effective solutions.

The remainder of this paper is organized as follows: Section 1 gives a brief introduction to game theory, the prisoner’s dilemma, complex networks, and asymmetric games. Section 2 describes in detail the models and experimental procedures used in the study. Section 3 presents and analyzes the experimental results. Section 4 summarizes the work.

2 Models and Settings

2.1 The Prisoner’s Dilemma

The Prisoner’s Dilemma (PD) is a classic game theory model. In the traditional PD, there are two participants, X and Y, each with the same options: cooperate (C) or defect (D). Let the cost of cooperation be denoted as $c$ , and the benefit obtained be denoted as $b$ [25, 26]. In a single round, if both X and Y choose to cooperate, they both receive the same payoff $R(reward)=b-c$ . If X chooses to cooperate while Y chooses to defect, the naive cooperator X incurs the cost of cooperation, resulting in payoff $S(sucker)=-c$ , while the greedy defector Y avoids the cooperation cost and directly gains payoff $T(temptation)=b$ . If both X and Y choose to defect, neither incurs the cost, and neither gains the benefit, resulting in a payoff $P(punish)=0$ for both. Generally, it holds that $b>c>0$ and $T>R>P>S$ [27, 28]. In this paper, we set $b=4$ and $c=1$ , yielding the following payoff matrix.

X \ Y	C	D
C	R(3)	S(-1)
D	T(4)	P(0)

Table 1: Payoff matrix. Rows are X’s action, columns are Y’s action, and each entry is X’s payoff.

2.2 Memory-one Strategy

In this study, all individuals are assumed to adopt a memory-one strategy $\textbf{p}=(p_{R},p_{S},p_{T},p_{P})$ in their interactions. The four parameters are the probabilities that an individual cooperates in the current round given that the outcome of the previous round was $\textbf{XY}=CC$, $\textbf{XY}=CD$, $\textbf{XY}=DC$ and $\textbf{XY}=DD$ respectively. Each probability lies in the unit interval: $p_{R}\in[0,1]$, $p_{S}\in[0,1]$, $p_{T}\in[0,1]$ and $p_{P}\in[0,1]$[29]. To distinguish the strategies of the two parties when X and Y engage in a repeated PD, the strategy of X is denoted as $\textbf{p}=(p_{1},p_{2},p_{3},p_{4})$, where $p_{1}$, $p_{2}$, $p_{3}$ and $p_{4}$ represent the probabilities of X choosing to cooperate given the previous round’s outcomes of $\textbf{XY}=CC$, $\textbf{XY}=CD$, $\textbf{XY}=DC$ and $\textbf{XY}=DD$ respectively. Similarly, the strategy of Y is denoted as $\textbf{q}=(q_{1},q_{2},q_{3},q_{4})$, where $q_{1}$, $q_{2}$, $q_{3}$ and $q_{4}$ represent the probabilities of Y choosing to cooperate given the previous round’s outcomes of $\textbf{XY}=CC$, $\textbf{XY}=DC$, $\textbf{XY}=CD$ and $\textbf{XY}=DD$ respectively. The order of $CD$ and $DC$ is swapped for Y because each strategy is indexed from its own player’s point of view.
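The indexing convention above is easy to get wrong, so a minimal sketch may help. It assumes the outcome order $CC$, $CD$, $DC$, $DD$ used in the text; the names `STATES`, `cooperation_probs` and `next_outcome` are illustrative and do not come from the original experiments.

```python
import random

# Outcome order used in the text, written as XY (X's move first).
STATES = ("CC", "CD", "DC", "DD")

def cooperation_probs(p, q, prev_xy):
    """Return (P[X cooperates], P[Y cooperates]) given the previous outcome XY."""
    x_index = STATES.index(prev_xy)
    # Y indexes outcomes from its own side, so XY = CD reads as "DC" for Y.
    y_index = STATES.index(prev_xy[::-1])
    return p[x_index], q[y_index]

def next_outcome(p, q, prev_xy, rng=random):
    """Sample the next outcome XY of one round between memory-one players."""
    px, qy = cooperation_probs(p, q, prev_xy)
    x_move = "C" if rng.random() < px else "D"
    y_move = "C" if rng.random() < qy else "D"
    return x_move + y_move
```

For example, with Tit-for-Tat written as $(1,0,1,0)$ for both players, the outcome $CD$ is always followed by $DC$: X retaliates while Y returns to cooperation.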

2.3 Asymmetric Element

The asymmetry in this study is captured by the concept of “wealth value”. Wealth value integrates factors such as reputation, status and capital, leading to different returns for different individuals in the game. In this study, the wealth value $k$ lies in the interval $(0,10)$ to reflect the differences between individuals[30].

Let the total number of individuals in the group be $N$. Before the first round, $N$ random numbers within the range $(0,10)$ are generated and assigned to the individuals as their initial wealth values.

From Table 1, the basic payoff matrix under symmetric games can be derived as follows.

A=\begin{bmatrix}R&S\\ T&P\end{bmatrix}=\begin{bmatrix}b-c&-c\\ b&0\end{bmatrix}=\begin{bmatrix}3&-1\\ 4&0\end{bmatrix}

(1)

For X and Y, with wealth values $k_{1}$ and $k_{2}$ respectively, the payoff matrix is defined as follows.

A_{\textbf{X}}=k_{1}A=\begin{bmatrix}k_{1}(b-c)&-k_{1}c\\ k_{1}b&0\end{bmatrix}=\begin{bmatrix}3k_{1}&-k_{1}\\ 4k_{1}&0\end{bmatrix}

(2)

A_{\textbf{Y}}=k_{2}A=\begin{bmatrix}k_{2}(b-c)&-k_{2}c\\ k_{2}b&0\end{bmatrix}=\begin{bmatrix}3k_{2}&-k_{2}\\ 4k_{2}&0\end{bmatrix}

(3)

This setting integrates the impact of wealth value into the payoff matrix. It can be understood as follows: because the two participants differ in identity, status, and assets, each can only obtain returns that match their own position. For instance, in a special PD, two prisoners have committed a crime together, but their sentences differ because of their different roles in the crime. Suppose prisoner X’s crime is more severe, leading to a longer sentence, while prisoner Y’s crime is less severe, resulting in a shorter sentence. If they both cooperate, they will receive sentences proportional to their roles: 2 years and 1 year respectively. If X cooperates and Y defects, Y will be released immediately, while X will stay in prison for 10 years. Conversely, if X defects and Y cooperates, X will be released immediately, while Y will receive 8 years. If they both defect, they will both receive relatively bad outcomes: 5 years and 3 years respectively.
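As a small illustration of the wealth-scaled matrices in Eqs. (2) and (3), the sketch below flattens them into per-outcome payoff tuples ordered as $CC$, $CD$, $DC$, $DD$. The function name `payoff_vectors` is an illustrative choice; the defaults $b=4$ and $c=1$ are the values used in this paper.

```python
def payoff_vectors(k1, k2, b=4.0, c=1.0):
    """Wealth-scaled payoffs of X and Y over outcomes XY = CC, CD, DC, DD."""
    base_x = (b - c, -c, b, 0.0)  # (R, S, T, P) as seen by X
    base_y = (b - c, b, -c, 0.0)  # the same outcomes as seen by Y
    r_x = tuple(k1 * v for v in base_x)
    r_y = tuple(k2 * v for v in base_y)
    return r_x, r_y
```

With $k_{1}=k_{2}=1$ this reduces to the symmetric game of Eq. (1); with $k_{1}=2$ and $k_{2}=0.5$, X risks and gains four times as much as Y in every outcome.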

2.4 Payoff Calculation

In a symmetric game, where the wealth values of X and Y are both 1, the payoff vector for X is defined as $R_{X}=(3,-1,4,0)$ , and correspondingly, the payoff vector for Y is defined as $R_{Y}=(3,4,-1,0)$ . For the asymmetric game, due to changes in the payoff matrix, the payoff vectors for both players become $R_{\textbf{X}}=(3k_{1},-k_{1},4k_{1},0)$ and $R_{\textbf{Y}}=(3k_{2},4k_{2},-k_{2},0)$ respectively. For both scenarios, after a single interaction, the expected payoffs for X and Y can be calculated using the following formula.

	$\displaystyle r_{\textbf{X}}=\frac{\mu\cdot R_{\textbf{X}}}{\mu\cdot\mathbf{1}}=\frac{D(p,q,R_{\textbf{X}})}{D(p,q,\mathbf{1})}$
	$\displaystyle r_{\textbf{Y}}=\frac{\mu\cdot R_{\textbf{Y}}}{\mu\cdot\mathbf{1}}=\frac{D(p,q,R_{\textbf{Y}})}{D(p,q,\mathbf{1})}$		(4)

Here, $\mu$ is the (unnormalized) stationary vector of the Markov chain over the outcomes $(CC,CD,DC,DD)$ induced by the strategies $p$ and $q$, and $\mathbf{1}=(1,1,1,1)$. In addition, for any vector $\mathbf{h}=(h_{1},h_{2},h_{3},h_{4})$,

	$\displaystyle\mu\cdot\mathbf{h}=D(p,q,\mathbf{h})=\left|\begin{array}{cccc}-1+p_{1}q_{1}&-1+p_{1}&-1+q_{1}&h_{1}\\ p_{2}q_{3}&-1+p_{2}&q_{3}&h_{2}\\ p_{3}q_{2}&p_{3}&-1+q_{2}&h_{3}\\ p_{4}q_{4}&p_{4}&q_{4}&h_{4}\end{array}\right|$		(5)

In this paper, the expected payoff of each node in the network is defined as the average of the payoffs it obtains from interactions with all of its neighbors.
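The payoff calculation of this subsection can be sketched in a few lines. The code below evaluates the determinant of Eq. (5) and the ratios of Eq. (4), then averages over neighbors as described above. It is a minimal sketch rather than the code used in the experiments: the helper names (`det`, `D`, `expected_payoffs`, `node_payoff`) are illustrative, and the ratio is undefined when $D(p,q,\mathbf{1})=0$, which can happen for some deterministic strategy pairs.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * a * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j, a in enumerate(m[0]))

def D(p, q, h):
    """Determinant D(p, q, h) of Eq. (5)."""
    p1, p2, p3, p4 = p
    q1, q2, q3, q4 = q
    return det([
        [-1 + p1 * q1, -1 + p1, -1 + q1, h[0]],
        [p2 * q3, -1 + p2, q3, h[1]],
        [p3 * q2, p3, -1 + q2, h[2]],
        [p4 * q4, p4, q4, h[3]],
    ])

def expected_payoffs(p, q, k1=1.0, k2=1.0):
    """Long-run payoffs (r_X, r_Y) of Eq. (4) for wealth values k1 and k2."""
    r_x = [3 * k1, -k1, 4 * k1, 0.0]
    r_y = [3 * k2, 4 * k2, -k2, 0.0]
    norm = D(p, q, [1.0] * 4)
    return D(p, q, r_x) / norm, D(p, q, r_y) / norm

def node_payoff(p, k, neighbours):
    """Average payoff of a node against neighbours given as [(q, k_q), ...]."""
    total = sum(expected_payoffs(p, q, k, kq)[0] for q, kq in neighbours)
    return total / len(neighbours)
```

As a sanity check, two players who cooperate with probability 0.5 in every state visit all four outcomes equally often, so each earns $(3-1+4+0)/4=1.5$ in the symmetric game.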

2.5 Strategy Updating Method

In complex networks, the actions of individuals can be divided into interaction and updating. During the simulation process, interaction involves each individual playing a two-person asymmetric PD game with all its neighbors and obtaining the corresponding payoff. Updating occurs after each individual completes a round of games: each individual randomly selects one of its neighbors to compare payoffs and decide whether to update their strategy.

To decide whether an individual updates its strategy, we use the Fermi function. Consider X, whose neighbor set is P, and who obtains an average payoff $r_{\textbf{X}}$ from the most recent round against all players in P. X randomly selects a player Y from P, who has obtained an average payoff $r_{\textbf{Y}}$ in the same round. According to the Fermi dynamics, X will adopt Y’s strategy in the next round with probability $w$, or will continue using its current strategy with probability $1-w$[31, 32].

w=\frac{1}{1+\exp{\frac{r_{\textbf{X}}-r_{\textbf{Y}}}{k}}}

(6)

In the exponent of Eq. (6), $k$ represents the rationality level of individuals in the network (this parameter is distinct from the wealth value of Section 2.3, although both are denoted $k$). As $k$ approaches infinity, individuals tend to make completely random choices about whether to update their strategy. Conversely, as $k$ approaches zero, individuals become fully rational, meaning that they adopt the other individual’s strategy whenever the other’s expected payoff is higher than their own.

According to previous research, $k$ is often set to 1 in symmetric games. However, this value cannot be applied directly to asymmetric games. For example, consider a game between a very wealthy X and a relatively poor Y. Because X has a much larger principal, X can obtain significantly higher payoffs than Y. However, Y should not readily adopt X’s strategy, because with a smaller principal, doing so will not significantly increase Y’s payoff. In this paper, the parameter $k$ is set to 8 to match the outcomes of symmetric games with $k=1$.
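The update rule of Eq. (6) is short enough to sketch directly. The function name `adoption_probability` is illustrative; the default $k=8$ is the value chosen in this paper. The expression $1/(1+e^{x})$ is rewritten with `tanh`, which is mathematically identical and avoids floating-point overflow when $k$ is very small.

```python
import math

def adoption_probability(r_x, r_y, k=8.0):
    """Fermi probability w that X adopts Y's strategy (Eq. 6)."""
    x = (r_x - r_y) / k
    # 1 / (1 + exp(x)) == 0.5 * (1 - tanh(x / 2)), but safe for large |x|.
    return 0.5 * (1.0 - math.tanh(0.5 * x))
```

Equal payoffs give $w=0.5$ for any $k$, while a very small $k$ drives $w$ towards 1 when Y earns more and towards 0 when Y earns less, which is the fully rational limit described above.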

Figure 1: (a) Distribution map of conversion probability on the symmetric network. (b) Distribution map of conversion probability on the asymmetric network.

2.6 Wealth Updating Method

In this paper, all wealth values are defined within the interval $(0,10)$. Initially, each participant in the network randomly receives a wealth value within this range. During one round, each participant receives an average payoff, which depends on its current wealth value and strategy. Because wealth values are bounded, the average payoff also falls within a bounded range. After the round, each participant’s wealth and its average payoff from that round are summed. These totals are then normalized to the interval $(0,10)$ to ensure a unified standard for wealth values; this prevents the wealth of any strong individual from growing excessively and causing the strategy set to converge too quickly. It is important to emphasize that after each round of wealth updates, the payoff vector $R=(3k,-k,4k,0)$ of each individual changes accordingly. The strategy dynamics are therefore continually influenced by the updated wealth and payoff values, which keeps the system dynamic and adaptive throughout the simulation.
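The text does not specify which normalization is used, so the following sketch makes an explicit assumption: a min-max rescaling of the summed values, with a small margin `eps` so that the results stay strictly inside $(0,10)$. The function name `update_wealth` and the parameter `eps` are hypothetical.

```python
def update_wealth(wealth, payoffs, low=0.0, high=10.0, eps=1e-6):
    """Add each payoff to the matching wealth, then rescale into (low, high).

    Assumption: min-max rescaling; the paper only says "normalized".
    """
    totals = [w + r for w, r in zip(wealth, payoffs)]
    lo, hi = min(totals), max(totals)
    if hi == lo:
        # Everyone is equally wealthy: place them all at the midpoint.
        return [(low + high) / 2.0] * len(totals)
    span = (high - low) - 2.0 * eps
    return [low + eps + span * (t - lo) / (hi - lo) for t in totals]
```

Under this assumption the wealth ranking is preserved in every round, while the absolute gap between the richest and poorest individuals can never exceed the width of the interval.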

3 Results

This section is divided into three parts. Section 3.1 provides a brief classification and discussion of the strategy domain. Section 3.2 describes the evolutionary game of the asymmetric Prisoner’s Dilemma on a BA scale-free network and analyzes the results. Section 3.3 reports supplementary experiments on the evolutionary outcomes.

3.1 Classification and Discussion of Strategy Domain

3.1.1 Analysis of Win Rate Curves at Different Cooperation Levels

An analysis is conducted to understand the impact of each component of $S$ on its win rate against random strategies. For each component $p\in[0,1]$ of $S=(p_{1},p_{2},p_{3},p_{4})$, we choose $p=0.2$, $p=0.5$ and $p=0.8$ to represent “low cooperation willingness”, “medium cooperation willingness” and “high cooperation willingness” respectively. $p_{1}$, $p_{2}$, $p_{3}$ and $p_{4}$ are controlled separately for simulation and statistical classification. For each setting, 10,000 random strategies are generated to calculate the win rate of $S$ against them. Partial experimental results are presented below, and the remainder are presented in the appendix.

In Figure 2, the curves indicate that the performance of strategy $S$ against random strategies is negatively affected by increases in the values of $p_{1}$, $p_{2}$, $p_{3}$ and $p_{4}$.

Specifically, the trends shown by the solid, dashed, and dotted curves of the same color suggest that an increase in $p_{4}$ has a detrimental effect on the win rate. Similarly, the trends shown by curves of the same line type in different colors indicate that an increase in $p_{3}$ has the same effect. Furthermore, the overall downward trends observed across the three panels as $p_{1}$ increases, as well as the progressive reductions in the values of the corresponding curves, demonstrate that growth in both $p_{1}$ and $p_{2}$ adversely affects the win rate.
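The win-rate estimate described in this subsection can be sketched as a simple Monte Carlo loop. Several choices below are assumptions, since the text does not state them: a “win” means that $S$ earns a strictly higher long-run payoff than its opponent, both players have wealth value 1, and the stationary distribution is approximated by repeatedly applying the outcome transition matrix. The names `long_run_payoffs` and `win_rate` are illustrative.

```python
import random

def long_run_payoffs(p, q, k1=1.0, k2=1.0, steps=200):
    """Approximate long-run payoffs by iterating the outcome distribution."""
    qs = (q[0], q[2], q[1], q[3])  # Y's entries reordered to XY = CC, CD, DC, DD
    M = [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
         for a, b in zip(p, qs)]
    v = [0.25] * 4
    for _ in range(steps):
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    r_x = k1 * (3 * v[0] - v[1] + 4 * v[2])
    r_y = k2 * (3 * v[0] + 4 * v[1] - v[2])
    return r_x, r_y

def win_rate(p, n_opponents=10_000, seed=0):
    """Fraction of uniformly random memory-one opponents that p out-earns."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_opponents):
        q = [rng.random() for _ in range(4)]
        r_x, r_y = long_run_payoffs(p, q)
        wins += r_x > r_y
    return wins / n_opponents
```

Comparing, say, `win_rate((0.2,) * 4)` with `win_rate((0.8,) * 4)` reproduces the qualitative trend reported above: under these assumptions the less cooperative strategy wins far more often.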

3.1.2 Comparison of Each $p$

To analyze the relative impact of the four components on win rates, heatmaps were generated for different combinations of these components. For instance, when comparing the relative effects of $p_{1}$ and $p_{2}$, fixed values were assigned to $p_{3}$ and $p_{4}$. As $p_{1}$ and $p_{2}$ varied from low to high, 1000 random strategies were generated at each sampling point, and the win rate against these random strategies was computed to create a heatmap of the distribution. A selection of the experimental results is shown below, and the remaining results are given in the appendix.

Figure 3 illustrates the impact of $p_{1}$ and $p_{2}$ on the win rate of $S=(p_{1},p_{2},p_{3},p_{4})$ against random strategies when $p_{3}$ and $p_{4}$ are set to 0.2, 0.5 and 0.8 respectively. The horizontal and vertical axes have the same meaning in every panel, and the two distinct color bands in the figure mark win rates ($W$) satisfying $0.495\leq W\leq 0.505$ and $0.745\leq W\leq 0.755$ (only the former appears in panel (i)). Taking panel (a) as an example, both $p_{3}$ and $p_{4}$ are at relatively low levels (corresponding to the practical scenario in which the player using strategy $S$ is unlikely to cooperate in the current round after defecting in the previous round). To improve the player’s win rate, $p_{1}$ and $p_{2}$ should be kept at low levels, which is consistent with the preliminary conclusions obtained earlier. Furthermore, keeping $p_{1}$ at a low level is more conducive to achieving better results than keeping $p_{2}$ low; that is, “reducing $p_{1}$” is more beneficial for victory than “reducing $p_{2}$”. Given the practical meaning of $p_{1}$ and $p_{2}$, it can be inferred that “greedily” defecting after mutual cooperation in the previous round reaps greater benefits, while showing some tolerance when the S-user cooperated and the opponent defected in the previous round might also yield favorable outcomes.

Figure 4 illustrates the impact of $p_{3}$ and $p_{4}$ on the win rate of $S=(p_{1},p_{2},p_{3},p_{4})$ against random strategies when $p_{1}$ and $p_{2}$ are set to 0.2, 0.5 and 0.8 respectively. In the example considered here, $p_{1}$ is at a high level and $p_{2}$ is at a moderate level. This corresponds to a situation in which the player has a high probability of cooperating if both players cooperated in the previous round, and a moderate probability of cooperating if the player cooperated but the opponent defected in the previous round. The results show that to achieve a high win rate, $p_{3}$ and $p_{4}$ should be kept at relatively low levels. Specifically, keeping $p_{3}$ at a low level is more beneficial for outperforming the random strategy than keeping $p_{4}$ at a low level. This suggests that when both players defected in the previous round, it may be advantageous to “reconcile” to some degree rather than continuing to defect. Conversely, when the player defected but the opponent cooperated in the previous round, the player should consider continuing to “exploit” the opponent’s goodwill, as this can lead to more favorable outcomes.

Our research on the four-parameter set reveals that the parameters have varying degrees of impact on the win rate, with $p_{1}$ being the most influential, followed by $p_{2}$ , $p_{3}$ and $p_{4}$ . This suggests that if the strategy $S=(p_{1},p_{2},p_{3},p_{4})$ has to focus on improving one parameter, it would be most beneficial to prioritize keeping $p_{1}$ at a relatively low level, while considering a moderate increase in $p_{4}$ . Interpreting this in practical terms, when facing the outcome of mutual cooperation in the previous round, it would be the optimal choice for strategy $S$ to lean towards defection. Similarly, when confronting the outcome of mutual defection in the previous round, continuing to defect would certainly be the best option. However, moderately increasing the probability of cooperation in this scenario can help maintain one’s own payoff while appearing less purely self-interested.

3.2 Asymmetric IPD on BA Scale-Free Network

3.2.1 Experimental Settings and Steps

The BA scale-free network model is a dynamic network model commonly used to generate scale-free networks. It simulates the “rich-get-richer” phenomenon observed in social networks, where nodes with higher degrees are more likely to attract new connections[33]. In our experiment, the network is defined with a final node count of $n=1000$, and each newly added node has an initial degree $m=20$. This results in a total of $e=19600$ edges.
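A minimal sketch of preferential attachment shows where the edge count comes from. It assumes the common construction that starts from $m$ isolated nodes and attaches each of the remaining $n-m$ nodes with $m$ edges, giving $m(n-m)=20\times 980=19600$ edges; the paper does not state which implementation was used, and the function name `barabasi_albert_edges` is illustrative. Libraries such as NetworkX offer an equivalent generator, `barabasi_albert_graph(n, m)`.

```python
import random

def barabasi_albert_edges(n, m, seed=0):
    """Edge list of a BA graph grown by degree-proportional attachment."""
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))  # the first new node links to all m initial nodes
    repeated = []             # each node appears once per incident edge
    for new in range(m, n):
        edges.extend((new, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([new] * m)
        # Draw m distinct targets with probability proportional to degree.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges
```

Because targets are drawn from a list in which every node appears once per incident edge, well-connected nodes are chosen more often, which is the “rich-get-richer” effect described above.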

In the experiment, each node in the network represents an individual with an initial state characterized by a randomly assigned memory-one strategy. These individuals engage in interactions with all their neighbors in each round of the game, generating payoffs based on these interactions. After each round, individuals update their strategies based on the payoffs they and their neighbors received. This iterative process continues until the network reaches an equilibrium state. An equilibrium state is defined as either a state where only one strategy remains across the network, or a dynamic equilibrium where, after 2000 rounds of games, several (usually no more than three) strategies persist.

3.2.2 Experimental Results

Following the aforementioned method, 1000 repeated experiments were conducted, resulting in over 1200 distinct dominant strategies. The strategies were categorized and analyzed using clustering algorithms.

Because the strategy set is narrowly distributed within the interval $p\in[0,1]$, clustering algorithms such as DBSCAN, which require the point spacing and neighborhood size to be specified, presented significant challenges. We therefore used the K-Means algorithm, supported by the elbow method and the silhouette coefficient, and found that the appropriate number of clusters is 6. Table 2 presents the coordinates of the center points and the number of individuals in each cluster.

Number	Center Points of Clusters	Cluster Sizes
1	$[0.2761,0.6548,0.1609,0.2059]$	229
2	$[0.7709,0.3186,0.1537,0.1782]$	256
3	$[0.1964,0.3716,0.6072,0.6647]$	170
4	$[0.7031,0.3372,0.7134,0.3185]$	155
5	$[0.6894,0.3174,0.1914,0.7038]$	167
6	$[0.2365,0.1649,0.1814,0.2794]$	284

Table 2: Clusters’ Information

To further investigate the characteristics of each cluster, we recorded the payoffs when clusters confronted each other, as well as their win rates and average payoffs against 10,000 random strategies. The results are shown in Table 3.

	$S_{1}$	$S_{2}$	$S_{3}$	$S_{4}$	$S_{5}$	$S_{6}$
$S_{1}$	0.9122	0.6274	1.7089	1.4411	1.6576	0.6205
$S_{2}$	1.0206	0.7428	1.7172	1.4502	1.7318	0.7242
$S_{3}$	0.5270	0.5229	1.4123	1.5617	1.4366	0.4060
$S_{4}$	0.6135	0.6807	1.3813	1.5878	1.3996	0.4989
$S_{5}$	0.5841	0.6079	1.4118	1.5600	1.5476	0.3979
$S_{6}$	1.0201	0.6828	1.7492	1.4266	1.7859	0.7163

Table 3: The Game Results of Each Cluster and the Win Rate and Average Payoff Facing Random Strategies

The table entries indicate the payoff obtained by the row strategy when facing the column strategy. For example, the value 0.6274 in row $S_{1}$, column $S_{2}$ indicates that strategy $S_{1}$ gains a payoff of 0.6274 when confronting $S_{2}$.

The red entries show the payoffs of strategy $S_{2}$ against all other strategies, highlighting that $S_{2}$ consistently achieves a higher payoff than its opponent, regardless of which strategy the opponent uses. Additionally, $S_{2}$’s self-play payoff (0.7428) is lower than that of every other strategy except $S_{6}$ (0.7163). $S_{2}$ also displays a significant advantage against random strategies. It thus demonstrates a “self-bad, partner-worse” outcome in the asymmetric prisoner’s dilemma, suggesting that such strategies can emerge and persist at a certain scale[34].

The green-background entries represent the results of strategy $S_{4}$ against all other strategies. $S_{4}$ always obtains a lower payoff than its opponents, who achieve relatively high payoffs. Moreover, $S_{4}$’s self-play payoff is the highest among all strategies, indicating its inclination towards seeking cooperation and ensuring better outcomes for both parties. In real life, this strategy corresponds to the “altruists” who prioritize the overall good. Furthermore, $S_{4}$ does not exhibit an advantage against random strategies and forms the smallest of the six strategy clusters, which is consistent with intuition.
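The pairwise claims about $S_{2}$ and $S_{4}$ can be checked directly from the values in Table 3. The sketch below copies the table, with `payoff[i][j]` holding the payoff of row strategy $S_{i+1}$ against column strategy $S_{j+1}$; the helper names `beats_all` and `loses_to_all` are illustrative.

```python
# Table 3: payoff[i][j] is the payoff of S_(i+1) when facing S_(j+1).
payoff = [
    [0.9122, 0.6274, 1.7089, 1.4411, 1.6576, 0.6205],
    [1.0206, 0.7428, 1.7172, 1.4502, 1.7318, 0.7242],
    [0.5270, 0.5229, 1.4123, 1.5617, 1.4366, 0.4060],
    [0.6135, 0.6807, 1.3813, 1.5878, 1.3996, 0.4989],
    [0.5841, 0.6079, 1.4118, 1.5600, 1.5476, 0.3979],
    [1.0201, 0.6828, 1.7492, 1.4266, 1.7859, 0.7163],
]
n = len(payoff)

def beats_all(i):
    """True if strategy i out-earns every other strategy head to head."""
    return all(payoff[i][j] > payoff[j][i] for j in range(n) if j != i)

def loses_to_all(i):
    """True if strategy i earns less than every other strategy head to head."""
    return all(payoff[i][j] < payoff[j][i] for j in range(n) if j != i)

self_play = [payoff[i][i] for i in range(n)]
dominant = [i + 1 for i in range(n) if beats_all(i)]      # [2]
exploited = [i + 1 for i in range(n) if loses_to_all(i)]  # [4]
```

Only $S_{2}$ out-earns every opponent and only $S_{4}$ is out-earned by every opponent, while $S_{4}$ has the highest self-play payoff (1.5878) and $S_{6}$ the lowest (0.7163).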

3.3 The Evolution and Spread of Cooperative Strategies on Network

In the previous section, we observed that in the evolutionary game of the asymmetric prisoner’s dilemma on a BA scale-free network, the $S_{4}$ strategy might ultimately evolve. In real life, we always hope that groups tend towards cooperation. For example, parents teach their children in kindergarten to become good friends with their peers rather than encouraging hostility towards them. Similarly, in international affairs, powerful nations have always sought friendly exchanges with other nations to promote cooperation and mutual development. This part of the experiment aims to find a method to foster cooperation.

A crucial question is how to define the manifestation of enhanced cooperation. We propose the following research method: if the level of cooperation increases, the fitness of the population will improve, corresponding to an increase in the average payoff of the group in this experiment[35]. Based on this idea, we conducted two supplementary experiments.

In this section, we employ a BA scale-free network with 100 nodes, in which each newly added node has an initial degree of 4. Each node also has an initial “wealth value”. Starting from the experimental procedure described in Section 3.2, we made the following two modifications.

(1) Initial Entry of Strategy $S_{4}$

Before starting the experiment, we introduced strategy $S_{4}$ to the initial network according to the following rules.

Random Selection: Randomly select several nodes at the initial stage and assign them Strategy $S_{4}$ .

Degree-Based Selection: Select several nodes at the initial stage based on their degree from high to low and assign them Strategy $S_{4}$ .

Wealth-Based Selection: Select several nodes at the initial stage based on their initial wealth value from high to low and assign them Strategy $S_{4}$ .

These operations aim to spread Strategy $S_{4}$ by leveraging the strategies of important nodes.
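The three selection rules can be sketched as follows. The function names, the dictionary-based inputs, and the use of the cluster center from Table 2 as the concrete vector for Strategy $S_{4}$ are all assumptions made for illustration; the text does not specify which representative of the cluster was injected.

```python
import random

# Assumption: the S4 cluster center from Table 2 stands in for Strategy S4.
S4 = (0.7031, 0.3372, 0.7134, 0.3185)

def choose_seed_nodes(rule, count, degree, wealth, seed=0):
    """Pick `count` nodes by rule: 'random', 'degree' or 'wealth'."""
    nodes = list(degree)
    if rule == "random":
        return random.Random(seed).sample(nodes, count)
    if rule == "degree":
        ranking = degree
    elif rule == "wealth":
        ranking = wealth
    else:
        raise ValueError(f"unknown rule: {rule}")
    # Highest values first, as in the degree- and wealth-based rules.
    return sorted(nodes, key=ranking.get, reverse=True)[:count]

def seed_strategy(strategies, seeds, strategy=S4):
    """Return a copy of the strategy map with the seed nodes set to S4."""
    updated = dict(strategies)
    for node in seeds:
        updated[node] = strategy
    return updated
```

A mixed rule, such as 5 nodes chosen by wealth and 5 by degree, can be obtained by concatenating two calls before passing the result to `seed_strategy`.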

(2) Eliminating Low-Cooperation Strategies

During the strategy update phase after each round of the game, if a component $p$ of a node’s strategy $S=(p_{1},p_{2},p_{3},p_{4})$ is lower than a certain threshold (indicating a very low level of cooperation), then with a certain probability the node adopts, in the next round, the strategy of a randomly selected neighboring node whose strategy $S^{\prime}=(p_{1}^{\prime},p_{2}^{\prime},p_{3}^{\prime},p_{4}^{\prime})$ has all four components greater than this threshold.
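This elimination mechanism can be sketched as a single pass over the network. The reset probability of 0.5 is the value reported in the results of this section, whereas the threshold of 0.1 is a hypothetical placeholder, since the text does not give its value. The function name `reset_low_cooperation` and the dictionary-based data layout are likewise illustrative.

```python
import random

def reset_low_cooperation(strategies, neighbours, threshold=0.1, prob=0.5,
                          rng=random):
    """Replace low-cooperation strategies with a cooperative neighbour's.

    strategies: {node: (p1, p2, p3, p4)}; neighbours: {node: [nodes]}.
    The threshold value is an assumption; it is not given in the text.
    """
    updated = dict(strategies)
    for node, s in strategies.items():
        if min(s) >= threshold:
            continue  # no component is below the threshold
        # Eligible donors need all four components above the threshold.
        donors = [n for n in neighbours[node]
                  if min(strategies[n]) > threshold]
        if donors and rng.random() < prob:
            updated[node] = strategies[rng.choice(donors)]
    return updated
```

All decisions are based on the strategies held before the pass, so the result does not depend on the order in which nodes are visited.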

Each type of experiment described above was repeated 100 times, generating 100 dominant strategies that evolved. Each of these strategies was then subjected to 1000 tests against random strategies, and the average payoff of the random strategies was recorded. A distribution curve of these 100 average payoffs was plotted.

The rationale is that the initial entry of Strategy $S_{4}$ may influence the evolution of strategies within the network. If the influence is positive, the evolved dominant strategies should be able to promote cooperation within the group. Promoting group cooperation, in turn, is partly reflected in the increased fitness of the group when facing a random population, manifested as an increase in average payoff. The experiment yielded the following results.

From Figure 6, it can be observed that randomly assigning strategy $S_{4}$ to 10 nodes in the initial network did not have a significant impact on the fitness of the population when facing a random population. Similarly, assigning strategy $S_{4}$ to the 10 nodes with the highest wealth values, to the 10 nodes with the highest degrees, or to a combination of the 5 nodes with the highest wealth values and the 5 nodes with the highest degrees did not significantly affect the average fitness of the population. It only resulted in the fitness of individuals in the population being more concentrated around an intermediate level.

From Figure 7, it can be seen that assigning strategy $S_{4}$ to the 10 nodes with the highest wealth values in the first round, combined with a 50% probability of resetting low-cooperation strategies, resulted in a better distribution of population fitness. On the one hand, this approach led to more individuals having intermediate fitness levels within the population. On the other hand, it also resulted in a certain number of high-fitness individuals. Furthermore, the range of the horizontal axis indicates that the mechanism of resetting low-cooperation strategies with a 50% probability effectively eliminated individuals with negative payoffs in the random population. This suggests that the strategy reset mechanism can significantly enhance cooperation in the evolutionary outcome, thereby promoting cooperation during the evolution process.

4 Conclusion and Discussion

In the experiments above, the strategy domain of the framework used was classified and discussed, with a qualitative analysis of the impact of the four components of strategy $S=(p_{1},p_{2},p_{3},p_{4})$ on the win rate of it against random strategies. To increase its win rate, the S-user should maintain a low level of cooperation. The experiments also compared the relative effects of the four components. In the study of $(p_{1},p_{2})$ , it was found that ”reducing $p_{1}$ ” is more conducive to victory compared to ”reducing $p_{2}$ ”. Specifically, in scenarios where both parties chose to cooperate in the previous round, a very low cooperation rate is optimal for the current round. Conversely, when the individual cooperated and the opponent defected in the previous round, a certain level of tolerance can help avoid mutually detrimental outcomes. In the study of $(p_{3},p_{4})$ , it was found that ”reducing $p_{3}$ ” is more conducive to victory compared to ”reducing $p_{4}$ ”.

In the strategy evolution experiments, the K-Means clustering algorithm identified six strategy clusters. Among them, both a “self-bad, partner-worse” strategy cluster and an “altruists” strategy cluster emerged. Similar to zero-determinant strategies, the “self-bad, partner-worse” strategy can keep the opponent’s payoff below its own. The “altruists” strategy, in contrast, is at a disadvantage against the other strategy clusters but achieves the highest payoff in self-play. The existence of this strategy benefits the continuation and development of the group.
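The clustering step can be illustrated with a plain implementation of Lloyd's K-Means algorithm applied to strategy vectors $(p_{1},p_{2},p_{3},p_{4})$. This is a minimal sketch under assumptions: the function names, the random initialization from data points, and the stopping rule are illustrative choices, and the paper does not state which implementation was used.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iters=100, seed=0):
    """Cluster points into k groups; return (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct data points (assumption).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centroids
```

With the evolved strategies collected in a list `strategies` (hypothetical name), a call such as `kmeans(strategies, k=6)` would return one label per strategy and six centroids, each of which can be read as a representative memory-one strategy. In practice a library implementation such as scikit-learn's `KMeans` would normally be preferred.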

In the network evolution experiments, introducing the “altruists” strategy cluster into the initial network under different rules led to evolved strategies whose fitness, when facing the random population, generally lay at intermediate to high levels, but it had little impact on the average fitness. After a mechanism for eliminating low-cooperation strategies was added, individual fitness became more concentrated around intermediate levels, and a certain number of high-fitness individuals also emerged. Overall, this mechanism raised the fitness of the population and proved to be an effective way to promote cooperation.

The experiments provide a theoretical foundation for studying the evolutionary processes of social networks. By studying the asymmetric prisoner’s dilemma on a weighted network, we uncover the relationships among strategies in complex evolutionary game environments, offering a framework that individuals and organizations can use to deploy effective countermeasures in practical decision-making. These findings enrich evolutionary game theory and provide new perspectives for understanding and promoting cooperative behavior in social systems, opening new avenues for enhancing overall population fitness and sustainable development.

Appendix A Other Win Rate Curves at Different Cooperation Levels

Appendix B Other Comparison of $p$

References

[1] Robert Axelrod and Robert O Keohane. Achieving cooperation under anarchy: Strategies and institutions. World Politics, 38(1):226–254, 1985.
[2] Haihui Cheng and Xinzhu Meng. Evolution of cooperation in multigame with environmental space and delay. Biosystems, 223:104801, 2023.
[3] Karl Sigmund. Introduction to evolutionary game theory. Evolutionary game dynamics, 69:1–26, 2011.
[4] Jörgen W Weibull. Evolutionary game theory. MIT Press, 1997.
[5] Zhengwu Zhao and Chunyan Zhang. The mechanisms of labor division from the perspective of task urgency and game theory. Physica A: Statistical Mechanics and its Applications, 630:129284, 2023.
[6] Jonathan B King. Prisoner’s paradoxes. Journal of Business Ethics, 7:475–487, 1988.
[7] Jesus Gomez-Gardenes, Miguel Romance, Regino Criado, Daniele Vilone, and Angel Sánchez. Evolutionary games defined at the network mesoscale: The public goods game. Chaos: An Interdisciplinary Journal of Nonlinear Science, 21(1), 2011.
[8] Alexander J Stewart and Joshua B Plotkin. Extortion and cooperation in the prisoner’s dilemma. Proceedings of the National Academy of Sciences, 109(26):10134–10135, 2012.
[9] Robert Axelrod and William D Hamilton. The evolution of cooperation. Science, 211(4489):1390–1396, 1981.
[10] Seung Ki Baek, Hyeong-Chai Jeong, Christian Hilbe, and Martin A Nowak. Comparing reactive and memory-one strategies of direct reciprocity. Scientific Reports, 6(1):25676, 2016.
[11] Martin A Nowak and Karl Sigmund. Tit for tat in heterogeneous populations. Nature, 355(6357):250–253, 1992.
[12] Lee Alan Dugatkin and Michael Alfieri. Guppies and the tit for tat strategy: preference based on past interaction. Behavioral Ecology and Sociobiology, 28:243–246, 1991.
[13] Claus Wedekind and Manfred Milinski. Human cooperation in the simultaneous and the alternating prisoner’s dilemma: Pavlov versus generous tit-for-tat. Proceedings of the National Academy of Sciences, 93(7):2686–2689, 1996.
[14] Martin Nowak and Karl Sigmund. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner’s dilemma game. Nature, 364(6432):56–58, 1993.
[15] Lorens A Imhof, Drew Fudenberg, and Martin A Nowak. Tit-for-tat or win-stay, lose-shift? Journal of Theoretical Biology, 247(3):574–580, 2007.
[16] William H Press and Freeman J Dyson. Iterated prisoner’s dilemma contains strategies that dominate any evolutionary opponent. Proceedings of the National Academy of Sciences, 109(26):10409–10413, 2012.
[17] Marialisa Scatà, Alessandro Di Stefano, Aurelio La Corte, Pietro Liò, Emanuele Catania, Ermanno Guardo, and Salvatore Pagano. Combining evolutionary game theory and network theory to analyze human cooperation patterns. Chaos, Solitons & Fractals, 91:17–24, 2016.
[18] Martin A Nowak and Robert M May. Evolutionary games and spatial chaos. Nature, 359(6398):826–829, 1992.
[19] KM Ariful Kabir, Jun Tanimoto, and Zhen Wang. Influence of bolstering network reciprocity in the evolutionary spatial prisoner’s dilemma game: A perspective. The European Physical Journal B, 91:1–10, 2018.
[20] Wen-Bo Du, Xian-Bin Cao, and Mao-Bin Hu. The effect of asymmetric payoff mechanism on evolutionary networked prisoner’s dilemma game. Physica A: Statistical Mechanics and its Applications, 388(24):5005–5012, 2009.
[21] Jose A Cuesta, Carlos Gracia-Lázaro, Alfredo Ferrer, Yamir Moreno, and Angel Sánchez. Reputation drives cooperative behaviour and network formation in human groups. Scientific Reports, 5(1):7843, 2015.
[22] Qing Jian, Xiaopeng Li, Juan Wang, and Chengyi Xia. Impact of reputation assortment on tag-mediated altruistic behaviors in the spatial lattice. Applied Mathematics and Computation, 396:125928, 2021.
[23] Yu-Zhong Chen, Zi-Gang Huang, Sheng-Jun Wang, Yan Zhang, and Ying-Hai Wang. Diversity of rationality affects the evolution of cooperation. Physical Review E—Statistical, Nonlinear, and Soft Matter Physics, 79(5):055101, 2009.
[24] Wenxing Ye and Suohai Fan. Evolutionary snowdrift game with rational selection based on radical evaluation. Applied Mathematics and Computation, 294:310–317, 2017.
[25] Nastaran Lotfi and Francisco A Rodrigues. On the effect of memory on the prisoner’s dilemma game in correlated networks. Physica A: Statistical Mechanics and its Applications, 607:128162, 2022.
[26] Zhipeng Zhang, Yu’e Wu, and Shuhua Zhang. Reputation-based asymmetric comparison of fitness promotes cooperation on complex networks. Physica A: Statistical Mechanics and its Applications, 608:128268, 2022.
[27] Christian Hilbe, Martin A Nowak, and Karl Sigmund. Evolution of extortion in iterated prisoner’s dilemma games. Proceedings of the National Academy of Sciences, 110(17):6913–6918, 2013.
[28] Yan Bi and Hui Yang. Heterogeneity of strategy persistence promotes cooperation in spatial prisoner’s dilemma game. Physica A: Statistical Mechanics and its Applications, 624:128939, 2023.
[29] Genki Ichinose and Naoki Masuda. Zero-determinant strategies in finitely repeated games. Journal of Theoretical Biology, 438:61–77, 2018.
[30] Jia-Xu Han and Rui-Wu Wang. Complex interactions promote the frequency of cooperation in snowdrift game. Physica A: Statistical Mechanics and its Applications, 609:128386, 2023.
[31] Jialu He, Jianwei Wang, Fengyuan Yu, and Lei Zheng. Reputation-based strategy persistence promotes cooperation in spatial social dilemma. Physics Letters A, 384(27):126703, 2020.
[32] György Szabó and Csaba Tőke. Evolutionary prisoner’s dilemma game on a square lattice. Physical Review E, 58(1):69, 1998.
[33] Guoyong Mao and Ning Zhang. Fast approximation of average shortest path length of directed BA networks. Physica A: Statistical Mechanics and its Applications, 466:243–248, 2017.
[34] Chunyan Zhang, Siyuan Liu, Zhijie Wang, Franz J Weissing, and Jianlei Zhang. The “self-bad, partner-worse” strategy inhibits cooperation in networked populations. Information Sciences, 585:58–69, 2022.
[35] Chao Luo, Xiaolin Zhang, Hong Liu, and Rui Shao. Cooperation in memory-based prisoner’s dilemma game on interdependent networks. Physica A: Statistical Mechanics and its Applications, 450:560–569, 2016.