
Journal of Theoretical Biology 486 (2020) 110103

Contents lists available at ScienceDirect

Journal of Theoretical Biology


journal homepage: www.elsevier.com/locate/jtb

Probabilistic punishment and reward under rule of trust-based decision-making in continuous public goods game

Yuhang Jiao, Tong Chen, Qiao Chen∗
College of Management and Economics, Tianjin University, Tianjin 300072, PR China

Article history:
Received 31 July 2019
Revised 1 November 2019
Accepted 2 December 2019
Available online 3 December 2019

Keywords:
Probabilistic punishment and reward
Trust
Executing probability
Cooperation
Public goods game

Abstract

Altruistic punishment and reward have been proved to promote the evolution of cooperation in the public goods game (PGG), but punishers and rewarders have to pay a price for these behaviors, which results in an overall loss of interest. In the present work, probabilistic punishment and reward are introduced to the PGG: punishment and reward are executed only with a certain probability. Although this reduces unnecessary costs, occasional absence of execution can lead to distrust. We therefore focus on how to implement punishment and reward efficiently within structured populations. Numerical simulations show that probabilistic punishment and reward promote the evolution of cooperation more effectively. Further analysis indicates that there is an optimal executing probability that promotes cooperation while maximizing the reduction of cost. In addition, when the unit cost is high, the PGG with probabilistic punishment and reward still supports the evolution of altruistic punishers and rewarders, thereby avoiding the collapse of cooperation.

© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

In both nature and human society, cooperative behaviors are common: humans are inseparable from each other in activities of production and living. However, not all individuals will take cooperative action in a group (Hardin, 1968; West et al., 2006). How to understand the emergence and maintenance of cooperative behavior among selfish individuals has attracted extensive attention (Nowak and Sigmund, 2004; Sasaki and Uchida, 2013). Evolutionary game theory explores how individuals with bounded rationality can maximize returns over time in repeated games (Hauert et al., 2019; McAvoy and Hauert, 2015), and it is currently considered one of the most powerful means of studying cooperative behavior (Perc and Szolnoki, 2010; Wang et al., 2015; Gintis, 2000a). The prisoner's dilemma game (PDG) (Weibull, 1997; Szolnoki et al., 2008; Wang and Lv, 2019b) and the public goods game (PGG) (Hardin, 1968; Wang and Lv, 2019a; dos Santos, 2015) are the most common models for studying cooperative behaviors.

Some experiments have also shown that, because of the friendliness, revenge, and transparency of Tit-for-Tat, cooperation can evolve among selfish individuals (Newth, 2009; Kümmerli et al., 2007; Nowak and Sigmund, 1992; Milinski, 1987). It is well known that various selective incentives, such as reward (Sasaki et al., 2015; Szolnoki and Perc, 2010; Sasaki and Uchida, 2014; Dos Santos and Peña, 2017), punishment (Fowler, 2005; dos Santos and Wedekind, 2015; Brandt et al., 2003; Bettina and Manfred, 2006), and volunteering (Semmann et al., 2003; Hauert et al., 2002), have been used to modify payoff structures and suppress selfish behavior. In addition, diversity (Santos et al., 2008; Maciejewski et al., 2014), information (Liu et al., 2018; Chen et al., 2018; Uchida and Sasaki, 2013) and emotions (Chen et al., 2016; Wang et al., 2017) have also been shown to be conducive to promoting cooperation. Reciprocity (Gintis, 2000b; Nowak and Sigmund, 1998; Boyd and Richerson, 1988; Uchida, 2010) plays an important role in promoting cooperation, and indirect reciprocity is often combined with reputation (Ohtsuki and Iwasa, 2004). A body of literature in psychology suggests that reward is a better incentive than punishment for changing attitudes and behavior (Kim et al., 2006; Larsen and Tentis, 2003), while other studies concluded that punishment is more effective than reward in the repeated PGG (Choi and Ahn, 2013; Sutter et al., 2010). Some researchers have suggested that punishment is very effective for deterring free-riding and thus facilitates cooperation in the PGG (Bochet et al., 2006; Carpenter, 2007). Tolerance-based punishment (Gao et al., 2012) has also been investigated, and the results may enhance the understanding of altruistic punishment in the evolution of human cooperation. However, punishment is costly, and who bears the costs has become a new problem: "second-order free riders" who avoid execution prevail and thus eliminate the threat of punishment (Szolnoki and Perc, 2017).

∗ Corresponding author.
E-mail address: chenqiao@tju.edu.cn (Q. Chen).

https://doi.org/10.1016/j.jtbi.2019.110103
0022-5193/© 2019 Elsevier Ltd. All rights reserved.

Fig. 1. The whole circulatory process of evolution.

At present, some researchers (Chen et al., 2016) turn to practical examples to explore ways of solving the dilemma of public goods. "Carrots and sticks" are measures widely used in practice to promote cooperation. For example, in order to raise funds from local villagers for a kind of public cultural good, village opera, village officials and local prestigious elders take measures of punishment and reward in some areas. However, they sometimes do not execute them, being unwilling to bear the additional costs. Under this circumstance, cooperation can still emerge and be sustained a few years later (village opera is held once a year). We also find that when they hardly ever perform punishment and reward, the effect of promoting cooperation is poor in the long run because of the distrust of villagers. We therefore explore whether there is an optimal probability of implementation that promotes and maintains cooperation efficiently. Inspired by previous literature and these realities, this article introduces probabilistic punishment and reward into the PGG: individuals perform punishment or reward with a certain probability ρ. That is to say, the punisher (rewarder) sometimes does not punish (reward) the defectors (cooperators). If an individual performs punishment (reward) on average v times per w public goods games, then ρ = v/w, with 0 ≤ ρ ≤ 1. Since punishment and reward are costly, probabilistic execution can reduce unnecessary losses in the PGG. However, this kind of incomplete performance leads to distrust, which in turn induces opportunistic behavior and a decrease of cooperators. Some research has shown that expressing trust is an important social mechanism for promoting cooperative behavior (Axelrod and Hamilton, 1981). The rise of donation is regarded as a result of a sense of trust, and the level of trust changes over time and with experience (Fehr and Fischbacher, 2003; King-Casas et al., 2005). Our design takes this into account and combines trust with probabilistic punishment and reward. In general, this work aims to investigate the role of probabilistic performance in the emergence and maintenance of cooperation within structured populations. The simulation results provide advice on how individuals should perform reward and punishment so that public goods are well supplied.

The rest of this paper is structured as follows. The public goods game model with probabilistic punishment and reward is introduced in detail in Section 2. The results of the numerical simulation are analyzed in Section 3, and the conclusions are drawn in Section 4.

2. The PGG model with probabilistic punishment and reward

To preserve comparability with previous works, the public goods game is staged on a square lattice with periodic boundary conditions. In the network, lattice sites represent agents, and every agent has ki = 4 neighbors. At each time step, agents attend ki + 1 game groups, centered on themselves and on each of their neighbors. A group consists of a central site and its four neighbors. There are six types of strategies (Si∗): pure cooperator (CN), pure defector (DN), punishing cooperator (PC), punishing defector (PD), rewarding cooperator (RC) and rewarding defector (RD). Each punishing (rewarding) individual sanctions (rewards) the defectors (cooperators) in the group centered on itself (not considering itself) with a fine (bonus), at a personal cost. Initially, CN and DN each account for 25 percent, and the other types of players each account for 12.5 percent. Fig. 1 shows the whole circulatory process of evolution. The important parameters and their meanings are summarized in Table 1.

The interaction. Each cooperator i (CN, PC, RC) contributes ai = 1 to each group it attends, while each defector i (DN, PD, RD) contributes nothing, ai = 0. Subsequently, the sum of contributions in a group is multiplied by the synergy factor R.

Table 1
The definitions and descriptions of parameters.

Parameter   Definition and description
ki          The number of agent i's neighbors
R           The synergy factor
ai          The contribution of agent i
Nc          The number of cooperators in a group
Ωi          The set of game groups containing agent i
pi          The payoff of agent i in the absence of punishment and reward
p∗i         The final payoff of agent i under punishment and reward
ρ           The probability of executing punishment or reward
f           The fine
r           The bonus
h           The unit cost of punishment or reward
u           The number of RCt and RDt in the group centered on a cooperator
m           The number of PCt and PDt in the group centered on a defector
g           The number of cooperators in the group centered on a RCt or RDt
b           The number of defectors in the group centered on a PCt or PDt
κ           The amplitude of environment noise
j           Agent i's nearest neighbors
k           A punisher or rewarder among j
cik(t)      Agent i's evaluation of the credibility of k at time t
Δi          The increment or decrement of cik
CT          The tolerance value of credibility of the surroundings
Ci(t)       Agent i's overall evaluation of credibility of the surroundings at time t
n           The number of k
n1          The number of punishers among k
n2          The number of rewarders among k
α1          The weight coefficient of fines for a defector
α2          The weight coefficient of bonuses for a defector
β1          The weight coefficient of fines for a cooperator
β2          The weight coefficient of bonuses for a cooperator
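For concreteness, the fixed parameters of Table 1 can be collected in one small configuration object. This is an illustrative sketch in Python with names of our own choosing, not the authors' code; the defaults shown are the baseline values used in the paper's simulations (R = 2.6, κ = 0.1, h = 0.5, f = r = 1), with ρ set to the moderate value 0.6 discussed in Section 3.

```python
from dataclasses import dataclass

@dataclass
class PGGParams:
    """Illustrative container for the fixed model parameters of Table 1."""
    R: float = 2.6      # synergy factor
    rho: float = 0.6    # probability of executing punishment or reward
    f: float = 1.0      # fine imposed on a defector
    r: float = 1.0      # bonus given to a cooperator
    h: float = 0.5      # unit cost of punishment or reward
    kappa: float = 0.1  # amplitude of environment noise in the Fermi rule
    k: int = 4          # number of nearest neighbors on the square lattice

params = PGGParams()
```

Per-agent quantities (Δi, CT, α1, α2, β1, β2) are drawn at random per agent in the model and so would live on the agents rather than in this shared configuration.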

The resulting amount is then shared equally among all members of the group, irrespective of their strategies. Agent i's payoff is the sum of its shares over all the groups it belongs to. Let pi denote the payoff of agent i in the absence of punishment and reward; it can be expressed as

    pi = Σ_{j∈Ωi} pij = Σ_{j∈Ωi} [ R · Nc / (kj + 1) − ai ]    (1)

where j denotes one of the game groups that player i attends, R is the synergy factor, Nc is the number of cooperators in group j, and ai is the contribution of player i: ai = 1 if agent i is a cooperator and ai = 0 if it is a defector. kj is the number of neighbors of the central agent in group j, and Ωi is the set of all game groups containing player i.

Probabilistic punishment and reward. At the end of the interaction, punishers (PC, PD) and rewarders (RC, RD) execute punishment and reward. Because punishment and reward are costly, an agent sometimes does not perform them; in this article this behavior is called probabilistic execution. That is, punishment and reward are performed with a certain probability ρ, and we assume that all agents have the same probability of execution (0 ≤ ρ ≤ 1). Thus, after an interaction, some performers authentically punish or reward their nearest neighbors and others do not; accordingly, some of the defectors and cooperators receive fines and bonuses and others do not. To clearly distinguish these actions, we divide punishing and rewarding agents into two categories according to whether they actually act. As shown in the following formula, the true implementers, who truly perform punishment or reward, are denoted by PCt, PDt, RCt, RDt; the false implementers, who do not, are denoted by PCf, PDf, RCf, RDf:

    PC, PD, RC, RD → { true (ρ): PCt, PDt, RCt, RDt ;  false (1 − ρ): PCf, PDf, RCf, RDf }    (2)

The calculation of the final payoff p∗i. After probabilistic punishment and reward, the payoffs change. Obviously, only the true implementers (PCt, PDt, RCt, RDt) bear the corresponding costs, while the payoffs of the false implementers (PCf, PDf, RCf, RDf) are equal to those of CN or DN, respectively. We use p∗i to denote the final payoff under probabilistic punishment and reward; it is calculated on the basis of pi as

    p∗i = pi + u·r              if Si∗ = CN (PCf, RCf)
          pi − m·f              if Si∗ = DN (PDf, RDf)
          pi + u·r − b·h        if Si∗ = PCt
          pi + u·r − g·h        if Si∗ = RCt
          pi − m·f − b·h        if Si∗ = PDt
          pi − m·f − g·h        if Si∗ = RDt    (3)

where Si∗ is the strategy adopted by i, and f and r are the fine and bonus, respectively. It is assumed that the costs of punishment and reward are equal; both are denoted by h. u (m) is the number of RCt and RDt (PCt and PDt) in the group centered on cooperator (defector) i, not considering i itself. g (b) is the number of cooperators (defectors) in the group centered on an RCt or RDt (PCt or PDt) agent, not considering that agent itself.

Initial strategy (Si). After calculating p∗i, each agent tries to maximize its own payoff and imitates others based on the Fermi rule. We call the strategy obtained by imitation the initial strategy (Si). The probability of imitation depends on the difference between the payoffs of the two individuals:

    W(Si ← Sj) = 1 / (1 + exp((p∗i − p∗j) / κ))    (4)

where p∗i and p∗j are the final payoffs of agents i and j, respectively, and κ is the amplitude of environment noise. In this paper, κ = 0.1 is fixed (Szabó and Hauert, 2002).

Since the issue of social information is not the focal point here, we assume that agents have local information about the strategy types of their nearest neighbors (Boros et al., 2010). Because individuals' impressions of punishers (rewarders) who did not carry out punishment (reward) become worse, they tend not to believe those people in future interactions.
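The payoff pipeline of Eqs. (1)–(4) can be sketched as follows. This is a simplified illustration under our own naming, not the authors' implementation: it evaluates a single five-member group rather than the full lattice, and since ai ∈ {0, 1}, the sum of contributions in a group equals Nc.

```python
import math

def group_share(contributions, R=2.6):
    """Eq. (1), one group: the pooled contributions (= Nc, since a_i is 0 or 1)
    are multiplied by R and split equally among the k_j + 1 = 5 members;
    each member then subtracts its own contribution a_i."""
    share = R * sum(contributions) / len(contributions)
    return [share - a for a in contributions]

def final_payoff(p, strategy, u=0, m=0, g=0, b=0, f=1.0, r=1.0, h=0.5):
    """Eq. (3): adjust the plain payoff p by received bonuses/fines and,
    for true implementers only, by the execution cost h per target."""
    if strategy in ("CN", "PCf", "RCf"):
        return p + u * r
    if strategy in ("DN", "PDf", "RDf"):
        return p - m * f
    if strategy == "PCt":
        return p + u * r - b * h
    if strategy == "RCt":
        return p + u * r - g * h
    if strategy == "PDt":
        return p - m * f - b * h
    if strategy == "RDt":
        return p - m * f - g * h
    raise ValueError(f"unknown strategy {strategy!r}")

def imitation_prob(p_i, p_j, kappa=0.1):
    """Eq. (4): Fermi rule -- probability that agent i adopts neighbor j's strategy."""
    return 1.0 / (1.0 + math.exp((p_i - p_j) / kappa))
```

For example, in a group with two cooperators and three defectors at R = 2.6, each defector's share is 2.6 · 2 / 5 = 1.04, while each cooperator nets 0.04 after deducting its contribution.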

To capture the impact of probabilistic execution on the degree of trust, we define cik(t) as agent i's evaluation of the credibility of a punisher or rewarder k (k ∈ j) among its nearest neighbors j (j = {1, 2, 3, 4}) at time t. cik(t) is updated at the end of the interactions. If k authentically performs punishment (reward) at time t − 1, cik(t) increases by Δi at time t; conversely, if k does not perform punishment (reward) at time t − 1, cik(t) decreases by Δi at time t. To keep cik(t) in the interval (0, 1), the program sets cik(t) = 1 when cik(t) > 1 and cik(t) = 0 when cik(t) < 0. Thus

    cik(t) = cik(t − 1) ± Δi,  0 ≤ cik(t) ≤ 1    (5)

where we set cik(0) = 0.5, and k (k ∈ j) denotes every punisher or rewarder among i's neighbors j. Notably, cik is agent i's subjective evaluation, or impression, of k. Considering heterogeneity (Fischbacher and Gachter, 2010), the Δi of each agent is assigned a random number from the interval (0, 0.5) and remains unchanged during evolution. Each punishing or rewarding agent thus builds up its credibility, or "image score" (Fu et al., 2008), by true execution.

Trust-based rule of decision-making. Probabilistic implementation affects the degree of trust, and whether individuals trust punishers and rewarders to carry out their actions influences decision-making. After the initial decision, only when individuals believe that punishment or reward will actually be executed do they believe that their payoffs will vary and make a sensible decision again. Therefore, a trust-based rule of decision-making (see Fig. 1) is introduced: through a two-step judgment, each agent makes a final decision (Si∗) based on the initial strategy (Si) chosen by imitation.

In step one, agent i decides whether to trust the punishers and rewarders to execute punishment and reward. We define Ci(t) (0 ≤ Ci(t) ≤ 1) as agent i's overall evaluation of the credibility of its surroundings, determined by the cik(t) (k ∈ j):

    Ci(t) = (1/n) Σ_k cik(t),  k ∈ {1, 2, 3, 4}    (6)

where t is the time step, the meaning of k is as above, the sum runs over all k, and n (1 ≤ n ≤ 4) is the total number of k. From this formula, Ci(t) is the mean of all of agent i's cik(t), so Ci(t) stays in (0, 1) because each cik(t) lies in (0, 1). If Ci(t) is equal to or greater than a threshold called the tolerance value CT, agent i trusts all the punishers and rewarders to sanction and reward. Considering the heterogeneity of requirements for credibility, each agent's CT is drawn from the uniform distribution on (0, 1) and remains unchanged during evolution. If Ci(t) < CT, the credibility of the surroundings does not meet i's requirement at time t, and thus Si∗ = Si. Otherwise, the expected benefit (EB) or expected loss (EL) of agent i is calculated to help determine Si∗.

In step two, agent i determines Si∗ according to the calculation of the expected benefit (EB) or expected loss (EL). Because individuals are payoff-driven, they only adopt a strategy that is profitable or reduces losses. For a defector, if the return after a transition to the cooperative strategy is greater than the loss, it is willing to cooperate; otherwise it maintains the initial strategy. Similarly, for a cooperator, if the loss after a transition to the defective strategy is greater than the return, it maintains the initial strategy; otherwise it defects. Based on this analysis, EB and EL are calculated as

    EB = α1·n1·f + α2·(n2·r − 1),  Si = D
    EL = β1·(n1·f − 1) + β2·n2·r,  Si = C    (7)

where Si is the initial strategy, the basis for determining Si∗, and n1 and n2 are the total numbers of punishers and rewarders among i's nearest neighbors. The meanings of f and r are as above. As shown in these formulas, when Si = D, defector i calculates the expected benefit (EB) of switching to cooperation using the upper formula. EB is composed of the benefit of being exempt from sanctions and the added bonus due to the transition to the cooperative strategy. Since a defector can get rewards only if it first contributes ai = 1 to the public pool, this contribution is deducted as a cost from the bonus n2·r. α1 and α2 are the weight coefficients of fines and bonuses, respectively. The bonus is valued more than the fine by defectors in the process of switching to cooperation, so α1 < α2; considering individual heterogeneity, α1 and α2 are drawn from the uniform distributions on (0, 0.5) and (0.5, 1), respectively. Finally, if the calculated EB is greater than zero (EB > 0), the defector transitions to cooperator (Si∗ = C); otherwise the initial strategy remains unchanged (Si∗ = Si = D). When Si = C, cooperator i calculates the expected loss (EL) of switching to defection using the lower formula. EL is composed of the additional fine and the lost bonus due to the transition to defection. Because a cooperator that switches to defection saves the cost of contributing 1 to the public pool, this cost is deducted from the fine n1·f. β1 and β2 are the weight coefficients of fines and bonuses, respectively. Cooperators mind the fine more in the process of switching to defection, so β1 > β2; considering individual heterogeneity, β1 and β2 are drawn from the uniform distributions on (0.5, 1) and (0, 0.5), respectively. Finally, if the calculated EL is less than zero (EL < 0), the cooperator transitions to defector (Si∗ = D); otherwise the initial strategy remains unchanged (Si∗ = Si = C).

3. Numerical simulation results and analysis

The simulations are conducted on a square lattice of size N = 10,000 (L × L = 100 × 100) with periodic boundary conditions. At the initial moment, cooperators (CN, PC, RC) and defectors (DN, PD, RD) each account for 50 percent. After each initialization, the system evolves for 10,000 time steps, and 20 independent simulations ensure the reliability of the evolutionary results. We set the synergy factor R to 2.6 and assume that the fine f and bonus r are equal, and that the costs h of reward and punishment are equal. In addition, to explore the impact of probabilistic punishment and reward more intuitively, we introduce traditional punishment and reward as a comparison. In the traditional punishment and reward model there are six different types of strategies; the strategy types and their initial proportions are the same as in the probabilistic execution model: the pure cooperators (CN) and pure defectors (DN) each account for 25 percent, and punishing cooperators (PC), punishing defectors (PD), rewarding cooperators (RC) and rewarding defectors (RD) each account for 12.5 percent. However, unlike the probabilistic execution model, punishment and reward are fully executed in the traditional model, so PC, PD, RC, RD correspond exactly to the true implementers (PCt, PDt, RCt, RDt) of the probabilistic execution model, and there are no false implementers. The calculation of payoffs in the traditional model is the same as in the probabilistic execution model. In addition, players in the traditional model update their strategies solely by imitation learning based on the Fermi rule, which differs from the probabilistic execution model. The simulation results are as follows.

First of all, we need to explore whether probabilistic execution affects the emergence and maintenance of cooperation in the continuous public goods game. Then, we further explore whether there is an optimal probability that can minimize the executing cost and maximize the fraction of cooperators. Therefore, we compare the PGG of traditional punishment and reward with that of probabilistic execution.
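The two-step trust-based rule of Eqs. (5)–(7) can be sketched as follows. This is an illustrative reading with function names of our own choosing, not the authors' code; the clipping of cik to [0, 1] mirrors the paper's program.

```python
def update_credibility(c_prev, executed, delta):
    """Eq. (5): raise or lower agent i's impression of neighbor k by Delta_i,
    clipped to [0, 1] as in the paper's program."""
    c = c_prev + delta if executed else c_prev - delta
    return min(1.0, max(0.0, c))

def overall_credibility(c_values):
    """Eq. (6): mean of c_ik over the n punishers/rewarders among i's neighbors."""
    return sum(c_values) / len(c_values)

def final_strategy(S_init, C_i, C_T, n1, n2, f, r,
                   alpha1, alpha2, beta1, beta2):
    """Step one: trust test against the tolerance value C_T;
    step two: expected benefit / expected loss of Eq. (7)."""
    if C_i < C_T:
        # surroundings not trusted: keep the initial (imitated) strategy
        return S_init
    if S_init == "D":
        EB = alpha1 * n1 * f + alpha2 * (n2 * r - 1)  # benefit of switching to C
        return "C" if EB > 0 else "D"
    EL = beta1 * (n1 * f - 1) + beta2 * n2 * r        # loss of switching to D
    return "D" if EL < 0 else "C"
```

For instance, a trusted defector with two punishing and one rewarding neighbor (n1 = 2, n2 = 1, f = r = 1) and weights α1 = 0.3, α2 = 0.7 gets EB = 0.6 > 0 and therefore switches to cooperation.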

Fig. 2. The evolution of cooperation over time in the PGG with different probabilities of execution (ρ = 0.2, 0.4, 0.6, 0.8) and in the PGG with traditional punishment and reward. The PGG with traditional punishment and reward finally reaches the state of global defection. The greater the executing probability ρ, the more favorable it is to promoting cooperation. Other parameter settings: κ = 0.1, R = 2.6, f = r = 1, h = 0.5.

Fig. 3. (a) The evolution of the fraction of cooperation over time for high, medium and low f (r) (f = r = 1.5, 1, 0.5). The cooperation rate increases with f (r), but due to the low ρ, the maximal cooperation rate is only up to 0.45. Other parameter settings: ρ = 0.4, κ = 0.1, R = 2.6, h = 0.5. (b) The evolution of the fraction of cooperation over time for the same f (r) values (f = r = 1.5, 1, 0.5). The cooperation rate increases with f (r); when ρ is large, the maximal cooperation rate reaches 1. Other parameter settings: ρ = 0.8, κ = 0.1, R = 2.6, h = 0.5.

As shown in Fig. 2, we investigate the cooperative evolution of the PGG model in five different cases. Here we set f = r = 1 and h = 0.5. The dark blue curve represents the PGG with traditional punishment and reward, and the other four curves represent the PGG with different executing probabilities (ρ = 0.2, 0.4, 0.6 and 0.8). It can be clearly seen that the cooperation rate drops to 0 at the steady state under traditional punishment and reward. When ρ is low (ρ = 0.2), the result is still global defection. The situation improves for ρ = 0.4, but the cooperation rate remains at a low level. When ρ is large (ρ = 0.6 or 0.8), evolution reaches a state of global cooperation. Therefore, we can draw the following conclusions: (i) probabilistic punishment and reward can promote the emergence and maintenance of cooperation in the continuous public goods game; (ii) the greater ρ is, the more favorable it is to promoting cooperation. Already at ρ = 0.6 the result of evolution is close to global cooperation; in this case, about 40 percent of the cost can be saved and the efficiency of supplying public goods is improved.

According to the model introduced in Section 2, the calculation of expected returns and expected losses is, besides the executing probability, another important factor affecting decision-making. A payoff-driven agent considers changing strategy only if doing so is profitable or reduces losses, and the expected return and expected loss are calculated from the fine and bonus. Therefore, we continue by studying how different f and r affect the evolution of cooperation in the PGG with probabilistic punishment and reward.

Fig. 3(a) shows the evolution of the fraction of cooperation over time for different fines and bonuses (f = r = 1.5, 1, 0.5) when the executing probability is at a relatively low level (ρ = 0.4). When the fine and bonus are 1.5 or 1, the cooperation rate first rises to a peak, then declines and gradually stabilizes over time; the fraction of cooperation reaches 0.45 and 0.25, respectively, in equilibrium, both lower than 0.5. When f = r = 0.5, the fraction of cooperation decreases from the beginning and finally stabilizes (≈ 0.1). It can be concluded that higher f and r are more conducive to the emergence and maintenance of cooperation than lower fines (bonuses). Fig. 3(b) shows the evolution of the fraction of cooperation over time for the same three levels of fines and bonuses (f = r = 1.5, 1, 0.5) when the executing probability is high (ρ = 0.8). From Fig. 3(a) and (b), it can be seen that, for different executing probabilities ρ, the level of cooperation increases with increasing fine and bonus. In addition, for a relatively small ρ (see Fig. 3(a)), even very large f and r do not promote cooperation significantly; but for a large executing probability (see Fig. 3(b)), moderate f and r suffice for the system to achieve global cooperation. The possible reason is that when ρ is small, people largely distrust punishers and rewarders, so even very high f and r do not promote cooperation well. Moreover, when f and r are very small, the state of all-defection cannot be changed even if ρ is large. Therefore, it is inappropriate to choose very small ρ and f (r).

To better explore the effect of probabilistic execution, we next focus on how different combinations of ρ and f (r) affect the level of cooperation. Here we introduce the mean payoff (MP) as a measure, defined as MP = (1/N) Σ_{i=1}^{N} p∗i, where p∗i is each agent's final payoff (see Section 2) and N is the population size (N = 10,000).

Fig. 4. (a) The cooperation rate as a function of the executing probability ρ for three kinds of fines and bonuses (f = r = 1.5, 1, 0.5). For different f (r) values, the cooperation rate increases with ρ. (b) The mean payoff as a function of the executing probability ρ for the same three kinds of fines and bonuses. For different f (r), the mean payoff increases with ρ. Other parameter settings: κ = 0.1, R = 2.6, h = 0.5.
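The mean payoff MP defined in the text, and plotted in Fig. 4(b), is simply the population average of the final payoffs p∗i. A minimal sketch, with naming of our own:

```python
def mean_payoff(final_payoffs):
    """MP = (1/N) * sum_i p*_i over the whole population of N agents."""
    return sum(final_payoffs) / len(final_payoffs)

# A toy population of four agents with final payoffs p*_i:
mp = mean_payoff([0.5, 1.5, 2.0, 0.0])  # -> 1.0
```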

Fig. 5. (a) The cooperation rate as a function of the fine f and bonus r for different executing probabilities (ρ = 0.9, 0.6, 0.3) and for traditional punishment and reward. For different ρ, the cooperation rate increases with increasing f (r). (b) The mean payoff as a function of the fine f and bonus r for the same executing probabilities and for traditional punishment and reward. For different ρ, the mean payoff increases with increasing f (r). Other parameter settings: κ = 0.1, R = 2.6, h = 0.5.

Fig. 4(a) and (b) show, respectively, the variation of the fraction of cooperators and of the mean payoff with ρ under three different levels of punishment and reward (f = r = 1.5, 1, 0.5). As shown in Fig. 4(a), the patterns of the three curves are broadly analogous: the cooperation rate increases with ρ regardless of the value of f and r, and when ρ is less than 0.6 all three curves show an upward trend. In addition, for f = r = 1.5 or f = r = 1, the system reaches global cooperation when the execution probability is 0.6. However, for f = r = 0.5, the system cannot achieve global cooperation even under complete execution (ρ = 1). Thus, in order to promote cooperation, f and r should not be too low. From Fig. 4(b) we can draw a similar conclusion: the mean payoff increases with ρ, but there is no significant increase once ρ > 0.6. In addition, for a fixed ρ, the mean payoff increases with f (r) but stops increasing once f (r) reaches a critical value. Thus, a combination of a moderate execution probability and moderate fines (bonuses), such as ρ = 0.6 and f = r = 1, should be adopted in order to promote cooperation effectively in real life; in this case, 40 percent of the cost can be saved. If the cost of execution is relatively high, it is better to choose a small executing probability (ρ = 0.5) and a high f (r) (f = r = 1.5).

Fig. 6. The cooperation rate as a function of the cost h of executing punishment (reward), under different executing probabilities (ρ = 0.9, 0.6, 0.3) and traditional punishment and reward. For ρ = 0.9, 0.6, 0.3, the cooperation rate hardly decreases with increasing cost h, but for traditional execution the cooperation rate drops sharply from h = 0.3 and eventually falls to zero. Other parameter settings: κ = 0.1, R = 2.6, f = r = 1.

In order to explore the critical f (r) values that enable the system to reach global cooperation under a given executing probability, we continue to run simulations. The numerical results are shown in Fig. 5(a). The blue curve represents the public goods game with traditional punishment and reward; the other three curves represent the PGG under high, medium and low executing probabilities. It can be seen that when f = r < 1, the fraction of cooperators in the traditional PGG model is always 0, while in the PGG with ρ = 0.6 and ρ = 0.9 the fraction of cooperators increases with the values of f and r. The fraction of cooperators in the PGG with ρ = 0.3 is always at a low level. Interestingly, when the executing probability is high (ρ = 0.9), the critical f (r) value that enables the system to reach global cooperation is 0.7; when the executing probability is moderate (ρ = 0.6), the critical f (r) value increases to 1. Moreover, when the probability is low (ρ = 0.3), the peak cooperation rate is only 0.1, at f = r = 1. The possible reason is that a low ρ leads to distrust. Fig. 5(b) shows the relation between the mean payoff and the fine (bonus) for different ρ. It can be seen that, in the PGG model with traditional punishment and reward, the
Y. Jiao, T. Chen and Q. Chen / Journal of Theoretical Biology 486 (2020) 110103 7

Fig. 7. (a)The fraction of cooperators as synergy factor (R) for different f and r ( f = r = 1.5, 1, 0.5). For both f = r = 1.5 and f = r = 0.5, the levels of cooperation remain
unchanged with the increase of R, and they are maintained at 1 and 0.5 respectively, and for f = r = 1, the level of cooperation increases with the increase of R. Other
parameter settings:  = 0.1, ρ = 0.6. (b)The fraction of cooperators as synergy factor (R) for different executing probabilities (ρ = 0.9, 0.6, 0.3). For ρ = 0.9 and ρ = 0.3, the
levels of cooperation remain unchanged with the increase of R, they are maintained at 1 and 0.1 respectively, and for ρ = 0.6, the level of cooperation increases with the
increase of R. Other parameter settings:  = 0.1, f = r = 1 .

Fig. 8. Snapshots of typical distributions of six types of players at different time steps (t = 0, 10, 10 0 0, 10 0 0 0) for four cases. Colors distinguish CN (blue), DN (red), PC
(green), PD (yellow), RC (pink), RD (orange). The first row shows the PGG with traditional punishment and reward. DN (red) occupy the whole network in the end. In the
second row (ρ = 0.3), RD (orange) take over almost all lattices in the end. In the third row (ρ = 0.6), PC (green) occupy almost all lattices in the end, and a small number
of PD (yellow) still survive in the population. In the fourth row (ρ = 0.9), PC (green) occupy all lattices in the end. Other parameter settings:  = 0.1, R = 2.6, f = r = 1,
h = 0.5. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

mean payoff reaches the peak when f = r = 1.1, whereas in the low f (r) and high executing probability (ρ = 0.9) could be chosen.
PGG model with ρ = 0.9 and ρ = 0.6, the fine (bonus) values Conversely, if the cost of execution is high, a combination of high f
that maximize the mean payoff respectively decrease to 0.7 and (r) and moderate executing probability (ρ = 0.6) could be chosen.
1. Therefore, the critical f (r) value is negatively correlated with Compared with the PGG with traditional execution, achieving
ρ . In real life, if the cost of execution is low, a combination of the aim of reducing cost for the rewarder and the punisher is the
8 Y. Jiao, T. Chen and Q. Chen / Journal of Theoretical Biology 486 (2020) 110103

advancement of probabilistic execution. Therefore, we continue to make final decision. Researchers have discovered the existence of
explore the relation between h and cooperation rate. The results trust in social cooperation in the past study (Fehr and Fischbacher,
are shown in Fig. 6. In the PGG with traditional punishment and 2003; King-Casas et al., 2005). This design takes this into account
reward, the cooperation rate decreases sharply from h = 0.3. When and combine it with probabilistic punishment and reward. The
h = 0.5, the system reaches a state of global defection. However, present work aims to examines the role of trust-based probabilistic
when probabilistic execution is introduced, the level of coopera- punishment and reward in continuous public goods games within
tion has been significantly improved. When ρ = 0.9, the rate of structured population. We conducted numerical simulations from
cooperation does not decrease with the increase of h and is always several aspects through agent-based model.
maintained at a high level ( ≈ 1). When ρ = 0.6, the cooperation Through numerical simulation, the following conclusions can
rate shows a slightly downward trend, but it is still close to 1. For be drew. First of all, compared with traditional punishment and
ρ = 0.3, since the executing probability is too low, it directly leads reward, the PGG with probabilistic punishment and reward can
to distrust. Thus the cooperation rate remains at a low value of reduce unnecessary costs in the long run and make punishment
0.1. Therefore, we can draw the following conclusion: When cost is and reward more efficient. Secondly, when the ρ is extremely low
very high, compared with traditional punishment and reward, the (ρ < 0.3), the cooperation rate is at a low level regardless of
probabilistic execution is more conducive to the evolution of al- the value of f (r). The possible reason is that a very low ρ eas-
truistic punishers and rewarders, thereby avoiding the collapse of ily causes distrust of individuals who interacts with the punisher
cooperation. (rewarder). When individuals distrust, they will not calculate the
Synergy factor (R) is fixed in the above simulations, but it is un- expected benefit or expected loss. Under this circumstance, deter-
clear whether the probabilistic execution still plays a role in coop- rence of punishment and temptation of reward do not work. Sim-
erative evolution when R varies. Therefore, we further explore how ilarly, when f (r) is extremely low ( f = r = 0.3), global cooperation
R affect the level of cooperation. Fig. 7(a) shows the relation be- cannot be achieved even if ρ is high. The reason is that deterrence
tween fraction of cooperators and the R for different f and r. It can of punishment and the temptation of the reward could not be suf-
be clearly observed that there are two horizontal curves, which in- ficient to promote global cooperation, instead coexist of coopera-
dicate that for extremely large or small fines and bonuses ( f = r = tors and defectors. Therefore, in practice, a combination of moder-
1.5 or f = r = 0.5), the level of cooperation is almost independent ate ρ and f (r) could be adopted. When one of them is high, the
of the synergy factor R. Specifically, when they are extremely large other can be relatively low. This kind of combination can signifi-
( f = r = 1.5), the fraction of cooperators is always maintained at cantly promote cooperation with minimal cost in interactions. Fi-
1. When they are extremely small ( f = r = 0.5), the fraction of co- nally, through the analysis of results, we find that when ρ = 0.6,
operators is always maintained at a low value ( ≈ 0.5). The other as long as the fines and bonuses are not very small, the emer-
curve represents f = r = 1, and the level of cooperation increases gence and maintenance of cooperation can be achieved. Therefore,
with R and finally reaches a stable state of global cooperation. ρ = 0.6 is an optimal choice. Moreover, the result of simulation
Fig. 7(b) shows the relation between fraction of cooperators and R suggests that when h is large, cooperative rate in PGG with prob-
for different ρ . The blue curve represents the PGG with traditional abilistic execution is still maintained at a high level. But there is a
execution, and R = 2.8 is the critical value that enable the coopera- significant decrease of cooperation with increase of cost h in PGG
tion rate to rise sharply from 0 to 1. The two horizontal curves re- with traditional punishment and reward. This reveals that even
spectively represent ρ = 0.3 and ρ = 0.9. These indicate that when when the cost of punishment and reward is relatively high, the
ρ is extremely low or high, the level of cooperation is hardly af- PGG model with probabilistic execution can help evolution of al-
fected by the synergy factor R. Specifically, when ρ is extremely truistic punishers and rewarders, thereby avoiding collapse of co-
large, the fraction of cooperators is always maintained at a very operation. The finding provides a reference for solving the dilemma
high level of 1 even if R is small. When ρ is extremely small, the in the supply of public goods. It is possible to rely on probabilistic
fraction of cooperators is always maintained at a low level ( ≈ 0.1) punishment and reward to aspire people’s enthusiasm for coop-
even if R is large. The other curve represents the medium proba- eration. In addition, the village officials and other social members
bility of execution (ρ = 0.6). The fraction of cooperators increases should pay attention to the establishment of credibility and avoid
with the R and finally stabilizes ( ≈ 1). use a low executing probability.
The snapshots of Fig. 8 demonstrate the coarsening process im-
pressively for four cases. In the PGG with traditional punishment
and reward (the first row), DN (red) fast spread in the population Acknowledgments
and occupy all lattices in the end. Compared with the first row,
other cases respectively promote the evolution of reward and pun- We would like to acknowledge funding support from the Na-
ishment. In the PGG with ρ = 0.3 (the second row), although the tional Social Science Fund of China (NSSFC) (Grant No. 17BZZ006).
rewarders (RD, RC) have gained an advantage of evolution, there
are only a small fraction of cooperators (RC). In the PGG with
ρ = 0.6 (the third row), although a small number of PD (yellow) References
still survive in the population, PC (green) occupy almost all lat-
Axelrod, R., Hamilton, W.D., 1981. The evolution of cooperation. Science 211 (4489),
tices in the end. In the PGG with ρ = 0.9 (the fourth row), PC
1390–1396.
(green) occupy all lattices in the end, which is similar to previous Bettina, R., Manfred, M., 2006. The efficient interaction of indirect reciprocity and
researches (Hilbe and Sigmund, 2010). costly punishment. Nature 444 (7120), 718–723.
Bochet, O., Page, T., Putterman, L., 2006. Communication and punishment in volun-
tary contribution experiments. J. Econ. Behav. Org. 60 (1), 11–26.
4. Conclusion Boros, E., Elbassioni, K., Gurvich, V., Makino, K., 2010. A pumping algorithm for er-
godic stochastic mean payoff games with perfect information. In: International
Conference on Integer Programming and Combinatorial Optimization. Springer,
In this article, the probabilistic punishment and reward are in-
pp. 341–354.
troduced to PGG. It means that punishers (rewarders) execute pun- Boyd, R., Richerson, P.J., 1988. The evolution of reciprocity in sizable groups. J. Theor.
ishment (reward) with a certain probability (ρ ) due to saving cost. Biol. 132 (3), 337–356.
Thus when individuals interact with them, they first judge their Brandt, H., Hauert, C., Sigmund, K., 2003. Punishment and reputation in spatial pub-
lic goods games. Proc. R. Soc. London. Ser. B 270 (1519), 1099–1104.
credibility. If punishers and rewarders are creditable, the expected Carpenter, J.P., 2007. The demand for punishment. J. Econ. Behav. Org. 62 (4),
benefit or expected loss is further calculated to help individual to 522–542.
Chen, Q., Chen, T., Wang, Y., 2016. How the expanded crowd-funding mechanism of some southern rural areas in China affects cooperative behaviors in threshold public goods game. Chaos Soliton. Fractal. 91, 649–655.
Chen, T., Wu, Z., Wang, L., 2018. Disseminators or silencers: the effect of information diffusion intensity on cooperation in public goods game. J. Theor. Biol. 452, 47–55.
Choi, J.-K., Ahn, T., 2013. Strategic reward and altruistic punishment support cooperation in a public goods game experiment. J. Econ. Psychol. 35, 17–30.
Dos Santos, M., Peña, J., 2017. Antisocial rewarding in structured populations. Sci. Rep. 7 (1), 6212.
Fehr, E., Fischbacher, U., 2003. The nature of human altruism. Nature 425 (6960), 785.
Fischbacher, U., Gachter, S., 2010. Social preferences, beliefs, and the dynamics of free riding in public goods experiments. Am. Econ. Rev. 100 (1), 541–556.
Fowler, J.H., 2005. Altruistic punishment and the origin of cooperation. Proc. Nat. Acad. Sci. 102 (19), 7047–7049.
Fu, F., Hauert, C., Nowak, M.A., Wang, L., 2008. Reputation-based partner choice promotes cooperation in social networks. Phys. Rev. E 78 (2), 026117.
Gao, J., Li, Z., Cong, R., Wang, L., 2012. Tolerance-based punishment in continuous public goods game. Phys. A 391 (16), 4111–4120.
Gintis, H., 2000. Game Theory Evolving: A Problem-Centered Introduction to Modeling Strategic Behavior. Princeton University Press.
Gintis, H., 2000. Strong reciprocity and human sociality. J. Theor. Biol. 206 (2), 169–179.
Hardin, G., 1968. The tragedy of the commons. Science 162 (3859), 1243–1248.
Hauert, C., De Monte, S., Hofbauer, J., Sigmund, K., 2002. Volunteering as red queen mechanism for cooperation in public goods games. Science 296 (5570), 1129–1132.
Hauert, C., Saade, C., McAvoy, A., 2019. Asymmetric evolutionary games with environmental feedback. J. Theor. Biol. 462, 347–360.
Hilbe, C., Sigmund, K., 2010. Incentives and opportunism: from the carrot to the stick. Proc. R. Soc. B 277 (1693), 2427–2433.
Kim, H., Shimojo, S., O'Doherty, J.P., 2006. Is avoiding an aversive outcome rewarding? Neural substrates of avoidance learning in the human brain. PLoS Biol. 4 (8), e233.
King-Casas, B., Tomlin, D., Anen, C., Camerer, C.F., Quartz, S.R., Montague, P.R., 2005. Getting to know you: reputation and trust in a two-person economic exchange. Science 308 (5718), 78–83.
Kümmerli, R., Colliard, C., Fiechter, N., Petitpierre, B., Russier, F., Keller, L., 2007. Human cooperation in social dilemmas: comparing the snowdrift game with the prisoner's dilemma. Proc. R. Soc. B 274 (1628), 2965–2970.
Larsen, M.A., Tentis, E., 2003. The art and science of disciplining children. Pediatr. Clin. 50 (4), 817–840.
Liu, X., Pan, Q., He, M., 2018. Promotion of cooperation in evolutionary game dynamics with local information. J. Theor. Biol. 437, 1–8.
Maciejewski, W., Fu, F., Hauert, C., 2014. Evolutionary game dynamics in populations with heterogenous structures. PLoS Comput. Biol. 10 (4), e1003567.
McAvoy, A., Hauert, C., 2015. Structural symmetry in evolutionary games. J. R. Soc. Interf. 12 (111), 20150420.
Milinski, M., 1987. Tit for tat in sticklebacks and the evolution of cooperation. Nature 325 (6103), 433.
Newth, D., 2009. Asynchronous iterated prisoner's dilemma. Adaptive Behavior 17 (2), 175–183.
Nowak, M.A., Sigmund, K., 1992. Tit for tat in heterogeneous populations. Nature 355 (6357), 250.
Nowak, M.A., Sigmund, K., 1998. The dynamics of indirect reciprocity. J. Theor. Biol. 194 (4), 561–574.
Nowak, M.A., Sigmund, K., 2004. Evolutionary dynamics of biological games. Science 303 (5659), 793–799.
Ohtsuki, H., Iwasa, Y., 2004. How should we define goodness? Reputation dynamics in indirect reciprocity. J. Theor. Biol. 231 (1), 107–120.
Perc, M., Szolnoki, A., 2010. Coevolutionary games: a mini review. BioSyst. 99 (2), 109–125.
Santos, F.C., Santos, M.D., Pacheco, J.M., 2008. Social diversity promotes the emergence of cooperation in public goods games. Nature 454 (7201), 213.
dos Santos, M., 2015. The evolution of anti-social rewarding and its countermeasures in public goods games. Proc. R. Soc. B 282 (1798), 20141994.
dos Santos, M., Wedekind, C., 2015. Reputation based on punishment rather than generosity allows for evolution of cooperation in sizable groups. Evolut. Human Behav. 36 (1), 59–64.
Sasaki, T., Uchida, S., 2013. The evolution of cooperation by social exclusion. Proc. R. Soc. B 280 (1752), 20122498.
Sasaki, T., Uchida, S., 2014. Rewards and the evolution of cooperation in public good games. Biol. Lett. 10 (1), 20130903.
Sasaki, T., Uchida, S., Chen, X., 2015. Voluntary rewards mediate the evolution of pool punishment for maintaining public goods in large populations. Sci. Rep. 5, 8917.
Semmann, D., Krambeck, H.-J., Milinski, M., 2003. Volunteering leads to rock–paper–scissors dynamics in a public goods game. Nature 425 (6956), 390.
Sutter, M., Haigner, S., Kocher, M.G., 2010. Choosing the carrot or the stick? Endogenous institutional choice in social dilemma situations. Rev. Econ. Stud. 77 (4), 1540–1566.
Szabó, G., Hauert, C., 2002. Phase transitions and volunteering in spatial public goods games. Phys. Rev. Lett. 89 (11), 118101.
Szolnoki, A., Perc, M., 2010. Reward and cooperation in the spatial public goods game. EPL (Eur. Lett.) 92 (3), 38003.
Szolnoki, A., Perc, M., 2017. Second-order free-riding on antisocial punishment restores the effectiveness of prosocial punishment. Phys. Rev. X 7 (4), 041027.
Szolnoki, A., Perc, M., Szabó, G., 2008. Diversity of reproduction rate supports cooperation in the prisoner's dilemma game on complex networks. Eur. Phys. J. B 61 (4), 505–509.
Uchida, S., 2010. Effect of private information on indirect reciprocity. Phys. Rev. E 82 (3), 036111.
Uchida, S., Sasaki, T., 2013. Effect of assessment error and private information on stern-judging in indirect reciprocity. Chaos Solit. Fractal. 56, 175–180.
Wang, X., Lv, S., 2019. The public goods game with shared punishment cost in well-mixed and structured populations. J. Theor. Biol.
Wang, X., Lv, S., 2019. The roles of particle swarm intelligence in the prisoner's dilemma based on continuous and mixed strategy systems on scale-free networks. Appl. Math. Comput. 355, 213–220.
Wang, Y., Chen, T., Chen, Q., Si, G., 2017. Emotional decisions in structured populations for the evolution of public cooperation. Phys. A 468, 475–481.
Wang, Z., Wang, L., Szolnoki, A., Perc, M., 2015. Evolutionary games on multilayer networks: a colloquium. Eur. Phys. J. B 88 (5), 124.
Weibull, J.W., 1997. Evolutionary Game Theory. MIT Press.
West, S.A., Griffin, A.S., Gardner, A., Diggle, S.P., 2006. Social evolution theory for microorganisms. Nature Rev. Microbiol. 4 (8), 597.
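Editor's note: as a supplementary illustration (not part of the original article), the trust-based decision rule described in the Conclusion — judge the executor's credibility first, and only if the executor is creditable weigh the expected fine ρ·f or expected bonus ρ·r — can be sketched in a few lines. This is an assumed, minimal rendering: the trust threshold of 0.3 is taken from the distrust regime (ρ < 0.3) reported in the results, and the function names and payoff bookkeeping are our own, not the authors' specification.

```python
def expected_defector_payoff(base_payoff, rho, fine, trust_threshold=0.3):
    """Payoff a defector anticipates when facing a punisher who executes
    punishment with probability rho.

    If rho is below the trust threshold, the individual distrusts the
    punisher and ignores the sanction entirely (no expected-loss
    calculation); otherwise the expected fine rho * fine is subtracted.
    trust_threshold = 0.3 is an assumption based on the reported
    distrust regime (rho < 0.3).
    """
    if rho < trust_threshold:
        return base_payoff            # distrust: deterrence does not work
    return base_payoff - rho * fine   # trust: discount by expected fine


def expected_cooperator_payoff(base_payoff, rho, bonus, trust_threshold=0.3):
    """Symmetric rule for a cooperator facing a rewarder: a creditable
    rewarder adds the expected bonus rho * bonus."""
    if rho < trust_threshold:
        return base_payoff            # distrust: temptation does not work
    return base_payoff + rho * bonus  # trust: add expected bonus


# A credible punisher (rho = 0.6, f = 1) discounts the defector's payoff,
# while rho = 0.2 falls below the trust threshold and deters nothing.
print(expected_defector_payoff(2.0, 0.6, 1.0))
print(expected_defector_payoff(2.0, 0.2, 1.0))
```

Under this sketch, the qualitative findings follow directly: raising ρ strengthens the expected fine (bonus) and hence deterrence (temptation), but only as long as ρ stays above the trust cutoff, which is why a very low executing probability collapses cooperation regardless of f and r.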