Part 2
Python code walkthrough
This is part 2 of a series of blog posts. Please read part 1 first.
In this second blog post I want to present a brief description of the Python code of my implementation of the iterated prisoner's dilemma experiment.
All agent classes are derived from the Agent base class. A simplified version of it looks like this:
class Agent:
    no_of_instances = 0

    def __init__(self) -> None:
        # give every agent a unique instance number
        self.instance_no = Agent.no_of_instances
        Agent.no_of_instances += 1

    def reset(self):
        # called before every new encounter
        self.history = []
        self.payoff = 0

    def get_decision(self): # abstract method
        return None

    def set_result(self, result, payoff):
        # result is a tuple (own decision, opponent's decision)
        self.history.append(result)
        self.payoff += payoff
From this base class many agent variants with different behavior can be constructed. We will discuss some of them later.
Now a population of agents needs to be generated. The frequency of each agent class can be defined:
agent_distribution = [
(DefectAgent,0.15),
(CooperateAgent,0.15),
(TitForTatAgent,0.15),
(ForgivingTitForTatAgent,0.4),
#(GrimTriggerAgent,0.1),
(RandomAgent,0.05),
#(RNNPredictorAgent,0.1),
#(LSTMPredictorAgent,0.075),
(OptimisticRNNPredictorAgent,0.05),
(LookAheadRNNPredictorAgent,0.05),
#(SmartLearnLookAheadRNNPredictorAgent,0.1)
#(ThresholdedOptimisticRNNPredictorAgent,0.05)
]
The creation probabilities (the second element of each tuple) should add up to 1. This is checked, and an error is raised if agent_distribution is not configured correctly.
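A minimal sketch of such a check could look like this (the actual error handling in the repository may differ):

# sanity check: the creation probabilities must sum to 1
total = sum(weight for _, weight in agent_distribution)
if abs(total - 1.0) > 1e-9:
    raise ValueError(f"agent_distribution weights sum to {total}, expected 1.0")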
Now NO_OF_AGENTS agents are generated according to this distribution (i.e. the agents list is populated):
for i in range(NO_OF_AGENTS):
    # pick an agent class according to the configured probabilities
    ag = [ a[0] for a in agent_distribution ]
    weights = [ a[1] for a in agent_distribution ]
    selected_agent = choices(ag, weights)[0]
    # get the list of hyperparameters the selected class needs
    required_params = selected_agent.get_config_options()
    params = {}
    for p in required_params:
        params[p] = hyperparameters[p]
    # create the agent instance and add it to the population
    agents.append(selected_agent(**params))
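Each agent class reports the hyperparameters it needs via get_config_options. As a purely hypothetical illustration (the class name and option names below are invented for this post and are not taken from the repository):

class SomePredictorAgent(Agent):
    @classmethod
    def get_config_options(cls):
        # names of the entries this class expects in the hyperparameters dict
        return ["hidden_size", "learning_rate"]

    def __init__(self, hidden_size, learning_rate):
        super().__init__()
        self.hidden_size = hidden_size
        self.learning_rate = learning_rate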
Now, for a number of training episodes, the following steps are repeated:
1 - Two different (!) agents are chosen randomly:
while True:
    agent1 = choice(agents)
    agent2 = choice(agents)
    if agent1 != agent2:
        break
2 - Before each encounter with another agent, the reset methods of both agent instances must be called; these reset the history list to [] and the payoff variable to 0:
agent1.reset()
agent2.reset()
3 - Then we let the two agents play a (fixed) number of steps against each other. Note that this number of steps is always unknown to the agents:
for j in range(steps_per_episode):
    # get the decisions from both agents
    cooperating1 = agent1.get_decision() # some parameters omitted
    cooperating2 = agent2.get_decision()
    # communicate the results back to the two agents
    # (each result tuple is (own decision, opponent's decision))
    if cooperating1 and cooperating2:
        agent1.set_result((True, True), -2)
        agent2.set_result((True, True), -2)
    elif (not cooperating1) and (not cooperating2):
        agent1.set_result((False, False), -5)
        agent2.set_result((False, False), -5)
    elif cooperating1 and (not cooperating2):
        agent1.set_result((True, False), -10)
        agent2.set_result((False, True), 0)
    else: # (not cooperating1) and cooperating2
        agent1.set_result((False, True), 0)
        agent2.set_result((True, False), -10)
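For reference, the same payoff scheme can be written as a dictionary keyed by (own decision, opponent's decision); this is only an illustration, the actual code uses the explicit if/elif branches shown above:

PAYOFFS = {
    (True, True):   -2,   # both cooperate
    (False, False): -5,   # both defect
    (True, False):  -10,  # cooperate while the opponent defects
    (False, True):   0,   # defect while the opponent cooperates
}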
These are the key ideas implemented in the code. Now let's look at some examples of agent classes:
DefectAgent and CooperateAgent are agents with a very simple, constant response.
class DefectAgent(Agent):
    def get_decision(self, learning_phase, maturity):
        return False # always defect

class CooperateAgent(Agent):
    def get_decision(self, learning_phase, maturity):
        return True # always cooperate
The well-known Tit for Tat agent always starts with cooperation. In later steps the decision depends on the adversary's previous move: if the adversary defected in the last iteration, it will defect too; otherwise it will cooperate.
class TitForTatAgent(Agent):
    def get_decision(self, learning_phase, maturity):
        if len(self.history) == 0:
            return True # always start with cooperation!
        else:
            return self.history[-1][1] # repeat the opponent's last decision
Another classic agent is the Grim Trigger agent. Like the Tit for Tat agent, it always starts by cooperating. In later decisions, the agent only cooperates if the adversary has never defected before.
class GrimTriggerAgent(Agent):
    def get_decision(self, learning_phase, maturity):
        if len(self.history) == 0:
            return True # start with cooperation
        else:
            # if the opponent defected once -> always defect from then on
            cooperate = True
            for h in self.history:
                if h[1] == False:
                    cooperate = False
            return cooperate
The more advanced agents, which make this implementation interesting, are based on neural networks. I have tried two different architectures, and surprisingly the simpler one (a plain RNN) seems to work better than the more sophisticated one (an LSTM). All of these agents try to predict the adversary's next move from the history list of previous moves and implement different strategies for reacting to this prediction.
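To give an idea of how such a predictor agent works, here is a heavily simplified sketch assuming PyTorch; the class name, architecture and hyperparameters are invented for illustration and differ from the actual RNNPredictorAgent:

import torch
import torch.nn as nn

class SketchRNNPredictorAgent(Agent):
    def __init__(self, hidden_size=8, learning_rate=0.01):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)
        self.optimizer = torch.optim.SGD(
            list(self.rnn.parameters()) + list(self.head.parameters()),
            lr=learning_rate)
        self.loss_fn = nn.BCEWithLogitsLoss()

    def _predict_opponent_cooperation(self):
        # encode the opponent's past moves as a (1, T, 1) tensor of 0.0/1.0 values
        moves = torch.tensor([[[1.0 if h[1] else 0.0] for h in self.history]])
        _, hidden = self.rnn(moves)  # hidden: (num_layers, 1, hidden_size)
        return torch.sigmoid(self.head(hidden[-1])).item()

    def get_decision(self, learning_phase, maturity):
        if len(self.history) == 0:
            return True # start optimistically, like Tit for Tat
        # cooperate if the opponent is predicted to cooperate
        return self._predict_opponent_cooperation() > 0.5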
Due to the nature of the experiments, the learning process cannot use batches, which makes learning rather inefficient. The agent based on tree search (class LookAheadRNNPredictorAgent) is computationally very expensive; to speed up the experiment, it helps to create only a small fraction of such agents.
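To illustrate what learning without batches means here, a per-encounter update for the hypothetical sketch class above could look like this (again only a sketch, not the repository's training code):

# imagine this as an additional method of SketchRNNPredictorAgent above
def learn_one_step(self):
    # train on the current encounter only: predict the opponent's latest move
    # from all of their earlier moves (effectively a batch size of 1)
    if len(self.history) < 2:
        return
    seq = [1.0 if h[1] else 0.0 for h in self.history]
    inputs = torch.tensor([[[x] for x in seq[:-1]]]) # shape (1, T-1, 1)
    target = torch.tensor([[seq[-1]]])               # shape (1, 1)
    _, hidden = self.rnn(inputs)
    loss = self.loss_fn(self.head(hidden[-1]), target)
    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()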
The complete Python code for this experiment is available for download on my GitHub page (MIT license).
In the next blog post (following soon), I will present some results and insights from my experiments. Stay tuned.
Follow me on X to get informed about new content on this blog.