I've been fiddling around with a card game that is a slight variation of President (wikipedia link), which I used to play a lot at university during breaks with friends. As a personal project to learn more about reinforcement learning, I've coded the game logic and trained a deep reinforcement learning model to play against itself (Github link to my code). I used deep RL because the state space is of order close to $2^{80}$, so a regular Q-table would take around 40GB even storing everything as int8. So far my model wins around 80% of the time in 1 versus 1 games against a random agent, and finishes first around 43% of the time in 4-player games against random agents.
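
For concreteness, here is a minimal sketch of the kind of Q-network I mean; the 80-dimensional binary state encoding, the hidden sizes, and the action count are placeholders rather than my exact architecture:

```python
import torch
import torch.nn as nn

STATE_BITS = 80   # one bit per card/location feature (placeholder encoding)
N_ACTIONS = 64    # upper bound on distinct plays (placeholder)

class QNetwork(nn.Module):
    """Maps a binary state encoding to one Q-value per action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_BITS, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, N_ACTIONS),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def greedy_action(qnet: QNetwork, state: torch.Tensor,
                  legal_mask: torch.Tensor) -> int:
    """Greedy play restricted to legal moves (illegal ones are masked out)."""
    with torch.no_grad():
        q = qnet(state)
        q[~legal_mask] = -float("inf")
        return int(q.argmax().item())
```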

One key feature of the game is that when played 1 versus 1 it is a game of perfect information: all the cards are dealt, so you can deduce exactly which cards your opponent is holding. Since it is also a finite two-player zero-sum game, an optimal strategy exists and could in principle be computed by backward induction.
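
To make that concrete, here is a sketch of the backward-induction (negamax) computation over a hypothetical game-state interface; `is_over`, `winner`, `player_to_move`, `play`, and `legal_moves` are assumed method names, not the actual API of my implementation:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def negamax(state) -> int:
    """Value of `state` for the player to move: +1 if they can force a
    win, -1 otherwise. Assumes `state` is an immutable, hashable game
    state (hypothetical interface)."""
    if state.is_over():
        return 1 if state.winner() == state.player_to_move() else -1
    # Best move, seen as the negation of the opponent's resulting value.
    return max(-negamax(state.play(move)) for move in state.legal_moves())
```

This is obviously intractable for full deals, but on small endgames it would give an exact ground truth to compare a learned policy against.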

I'm looking for references on learning optimal policies for competitive probabilistic games like this one. So far all my training has been done with deep Q-learning using time-penalizing rewards (the agent receives a reward of $-1$ for every play that doesn't win the game and $0$ if the play wins or the agent has already finished), but I don't know how to test how 'optimal' the deep Q-learning policy is other than playing it against random agents.
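
In code, the reward I'm currently using looks roughly like this (the flag names are just illustrative):

```python
def step_reward(won_this_turn: bool, already_finished: bool) -> float:
    """Time-penalizing reward: -1 for every play that doesn't end the
    agent's game, 0 once it wins (or has already finished earlier)."""
    if won_this_turn or already_finished:
        return 0.0
    return -1.0
```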

There are three main things I'd like to address:

  1. What kinds of reward functions are used in this kind of framework?
  2. Is there a metric for optimality that is common to this kind of problem?
  3. Is deep Q-learning a good approach for finding optimal policies, or is there another method that may yield better results?