
For an evolutionary extensive-form game, is the use of multi-agent systems a necessity, or can more traditional approximate Nash equilibria and mean-field interactions suffice?

Few games can be represented completely by a simple tree and solved exactly from the outset, or even far enough that a general direction of play can be established. Often there are too many players, who don't use the strategies expected of them, and the rules are frequently not completely known, or keep evolving as new elements come into play.

An example where the rules are simple but there is no pure strategy to settle on is matching pennies (its only equilibrium is a mixed one), and an example where there are expected strategies that aren't necessarily always followed is the coordination game (it has several equilibria, and players can still miscoordinate). In more realistic examples nothing is so simple.
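
To make that concrete (standard textbook material, not tied to any of the references below): in matching pennies the matcher's payoffs are the matrix below and the mismatcher's are their negatives, so no pure strategy survives. If the opponent plays Heads with probability $q$, indifference between the matcher's two actions forces

$$
A_{\text{matcher}} \;=\; \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix},
\qquad
q\cdot 1 + (1-q)\cdot(-1) \;=\; q\cdot(-1) + (1-q)\cdot 1
\;\;\Longrightarrow\;\; q = \tfrac{1}{2},
$$

and by symmetry both players randomize 50/50, with expected payoff zero.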

Realistic games have long time horizons, partially observed state, and high-dimensional, continuous action and observation spaces.

This makes choosing a specific strategy difficult if not impossible.
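
Stated slightly more formally (this is the standard formalisation, not taken from any one of the references below), such a game can be modelled as a partially observed stochastic game

$$
\mathcal{G} \;=\; \Big(\mathcal{N},\; \mathcal{S},\; \{\mathcal{A}_i\}_{i\in\mathcal{N}},\; \{\mathcal{O}_i\}_{i\in\mathcal{N}},\; p,\; \{r_i\}_{i\in\mathcal{N}},\; \gamma\Big),
$$

where $\mathcal{N}$ is the set of players, $\mathcal{S}$ the (possibly continuous, high-dimensional) state space, $\mathcal{A}_i$ and $\mathcal{O}_i$ player $i$'s action and observation spaces, $p(s', o_1,\dots,o_N \mid s, a_1,\dots,a_N)$ the joint transition and observation kernel, $r_i$ player $i$'s reward, and $\gamma \in (0,1)$ the discount factor over a long (effectively unbounded) horizon; each player must act on its own observation history alone.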

An example relevant to this question: in Dota 2, OpenAI's agents were trained by playing against copies of themselves (self-play).
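
To make the idea concrete on the matching-pennies example above, here is a toy sketch of my own (nothing like the scale or the actual algorithms used for Dota 2): two copies of the same simple learner are trained against each other via fictitious play, each best-responding to the other's empirical history, and their empirical action frequencies converge to the 50/50 mixed equilibrium.

```python
# Toy "learning by playing against another learner" on matching pennies
# (illustration only; a stand-in for large-scale self-play).
HEADS, TAILS = 0, 1

def best_response_matcher(opp_counts):
    # The matcher profits from matching, so it plays whatever the
    # opponent has played most often so far.
    return HEADS if opp_counts[HEADS] >= opp_counts[TAILS] else TAILS

def best_response_mismatcher(opp_counts):
    # The mismatcher profits from mismatching, so it plays the opposite.
    return TAILS if opp_counts[HEADS] >= opp_counts[TAILS] else HEADS

counts = {"p1": [1, 1], "p2": [1, 1]}   # smoothed empirical action counts

for _ in range(100_000):
    a1 = best_response_matcher(counts["p2"])     # p1 best-responds to p2's history
    a2 = best_response_mismatcher(counts["p1"])  # p2 best-responds to p1's history
    counts["p1"][a1] += 1
    counts["p2"][a2] += 1

freq = {p: [c / sum(cs) for c in cs] for p, cs in counts.items()}
print(freq)   # both players' frequencies approach [0.5, 0.5], the mixed equilibrium
```

The point is that neither learner is handed a strategy; the "teacher" is simply another, continually adapting copy of the learner.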

There is an interesting Q&A on Stats.SE titled "The Two Cultures: statistics vs. machine learning?", which contains a few excellent points:

  • "Methodological Statistics papers are still overwhelmingly formal and deductive, whereas Machine Learning researchers are more tolerant of new approaches even if they don't come with a proof attached."

  • "The biggest difference I see between the communities is that statistics emphasizes inference, whereas machine learning emphasized prediction. When you do statistics, you want to infer the process by which data you have was generated. When you do machine learning, you want to know how you can predict what future data will look like w.r.t. some variable."

  • "Ken Thompson quote: ''When in doubt, use brute force''.
    In this case, machine learning is a salvation when the assumptions are hard to catch; or at least it is much better than guessing them wrong."

Phrased differently, without asking a new question: at what point are agents beyond explicit calculations and their proofs, so that we are better off letting the program write its own equations (learn which questions to ask, and how to solve them)? There are a couple of examples below where the program knew nothing at the outset; its only inputs were the pixels on the screen (and the score), yet it was able to work out what the rules must be and how best to exploit them, beating human players.
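
As a purely hypothetical sketch of what "only pixels as input" means in practice (a standard DQN-style convolutional policy, not the actual architectures used in the references below; the action count and frame size are assumptions), the agent's entire interface is a stack of screen frames in and one action out, with the score appearing only as the reward used to update the weights:

```python
import torch
import torch.nn as nn

n_actions = 6          # assumed size of the game's discrete action set

policy = nn.Sequential(                          # classic Atari-style convnet
    nn.Conv2d(4, 32, kernel_size=8, stride=4),   # 4 stacked 84x84 grayscale frames
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 512),
    nn.ReLU(),
    nn.Linear(512, n_actions),                   # one logit per possible action
)

frames = torch.zeros(1, 4, 84, 84)               # a batch of one raw-pixel observation
action = policy(frames).argmax(dim=1)            # greedy action chosen from pixels alone
print(action)
```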

Definition: Approximation Theory

References:

  • "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", article by the AlphaStar and DeepMind teams.

    "AlphaStar interacted with the StarCraft game engine directly via its raw interface, meaning that it could observe the attributes of its own and its opponent’s visible units on the map directly, without having to move the camera - effectively playing with a zoomed out view of the game. ... its neural network architecture is capable of modelling very long sequences of likely actions - with games often lasting up to an hour with tens of thousands of moves - based on imperfect information. Each frame of StarCraft is used as one step of input, with the neural network predicting the expected sequence of actions for the rest of the game after every frame.".

  • "Human-level performance in 3D multiplayer games with population-based reinforcement learning" (May 31 2019), by Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castañeda, Charles Beattie, Neil C. Rabinowitz, et al.

    "Conclusion: In this work, we have demonstrated that an artificial agent using only pixels and game points as input can learn to play highly competitively in a rich multiagent environment: a popular multiplayer first-person video game. This was achieved by combining PBT of agents, internal reward optimization, and temporally hierarchical RL with scalable computational architectures. The presented framework of training populations of agents, each with their own learned rewards, makes minimal assumptions about the game structure and therefore could be applicable for scalable and stable learning in a wide variety of multiagent systems.".

  • "Population Based Training of Neural Networks", (arxiv paper), (Nov 28 2017), by Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, and Koray Kavukcuoglu

    "Conclusions: We have presented Population Based Training, which represents a practical way to augment the standard training of neural network models. We have shown consistent improvements in accuracy, training time and stability across a wide range of domains by being able to optimise over weights and hyperparameters jointly. It is important to note that contrary to conventional hyperparameter optimisation techniques, PBT discovers an adaptive schedule rather than a fixed set of hyperparameters. We already observed significant improvements on a wide range of challenging problems including state of the art models on deep reinforcement learning and hierarchical reinforcement learning, machine translation, and GANs. While there are many improvements and extensions to be explored going forward, we believe that the ability of PBT to enhance the optimisation process of new, unfamiliar models, to adapt to non-stationary learning problems, and to incorporate the optimisation of indirect performance metrics and auxiliary tasks, results in a powerful platform to propel future research.".

  • "Approximate Nash Equilibria in Partially Observed Stochastic Games with Mean-Field Interactions" (June 5 2018), by Naci Saldi, Tamer Basar, and Maxim Raginsky

    "Conclusion This paper has considered discrete-time partially observed mean-field games subject to infinite-horizon discounted cost, for Polish state, observation, and action spaces. Under mild conditions, the existence of a Nash equilibrium has been established for this game model using the conversion of partially observed Markov decision processes to fully observed Markov decision processes in the belief space and then using the dynamic programming principle. We have also established that the mean-field equilibrium policy, when used by each agent, constitutes a nearly Nash equilibrium for games with sufficiently many agents.".

The single question I ask is in the first sentence.
