Multi-Agent Simulation: A Key Function in Inference-Time Intelligence

Gadi_Singer · 10-26-2022

Published October 25th, 2022

Image credit: Local_doctor via Adobe Stock.

Gadi Singer is Vice President and Director of Emergent AI Research at Intel Labs leading the development of the third wave of AI capabilities.

Avoiding Combinatorial Explosions in What-if Scenarios Involving Multiple People or Intelligent Machines

We are about to see a significant change in the role that simulation plays in machine intelligence, particularly in evaluating what-if scenarios in real time. I believe that simulation can play an even more purposeful role if it is expanded to include agent-based simulation at inference time. This type of computation seeks to iteratively resolve problems based on inputs from multiple agents (humans or other AIs), which is characteristic of real-world learning. As such, it has the potential to impart multiple “models of mind” during the machine learning process and advance the next generation of AI.

What is simulation, really?

To ground the discussion below, we first need a definition of simulation.

Here, we define simulation as a method that uses a specialized model to mimic real or proposed system operations to provide evidence for decision-making under various scenarios or process changes.

To better understand how simulation is relevant to human cognition, consider a situation commonly encountered by humans — a meeting of a medium-sized group of individuals. For example, this could be a meeting of a school sports team and their coach before an important game or match. All the individuals in the meeting will have slightly different contexts and objectives.

Fig 1. A coach has a detailed model of each team player’s personality. Image source: Adobe Stock.

The coach will be able to simulate the unfolding of the meeting with a fairly high degree of precision and will actively utilize this simulation capability to plan what to say and how to achieve the best effect. What cognitive functions does this simulation require?

The coach must be able to keep track of what information is available to which individuals. Some information is public, like the name of the opposing team and the date of the match, while other information is private, like the health records of the individual players. She knows not to restate publicly known information unnecessarily, and to keep private information concealed.
She will need to model the mental and physical state of each player, as well as their objectives. She knows which players have been recently injured and which ones have beaten their personal records. She understands that some are defending an already strong position while others are hoping for an opportunity to shine. She also knows which players respond well to challenges and which ones need extra encouragement.
She will continue to build her models of the players throughout the meeting. For example, if one child shows behavior that indicates strong personal growth, the coach will make note of it and adjust her future behavior accordingly.
Finally, the coach can model a sequence of potential interactions. For example, she knows that critiquing a player once will have a different effect than critiquing the same player three times in quick succession.

This causal multi-agent simulation capacity is at the very core of human social cognition. Translated into more technical terms, the features above imply that an AI must have the following capabilities to simulate in a way that resembles human simulation:

Ability to model, instantiate and update individual, distinguishable agents and other complex objects in the environment.
Ability to iterate through environment and agent states: the AI would need to be capable of iteratively playing out sequences of relevant behaviors and interactions, both among the agents and between the agents and the environment.
Ability to model the behavior of each agent/object as a combination of generic and potentially custom functions (e.g., all children behave like F(x), and Kelly, in particular, has F(x=a) behavior).
Ability to track relevant input sequences and internal state (including state of knowledge) of each agent.
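
To make the four capabilities above more concrete, here is a minimal Python sketch. It is illustrative only: the names `Agent`, `generic_child_policy`, and `kelly_policy`, the single `morale` number, and every numeric value are assumptions invented for this example, not part of any system discussed in this article.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# All names and numbers below are hypothetical, chosen only for illustration.

def generic_child_policy(morale: float, event: str) -> float:
    """Generic behavior F(x): a critique lowers morale, praise raises it."""
    delta = {"critique": -0.2, "praise": 0.1}.get(event, 0.0)
    return max(0.0, min(1.0, morale + delta))

def kelly_policy(morale: float, event: str) -> float:
    """Custom behavior for one player who responds well to challenges."""
    if event == "critique":
        return min(1.0, morale + 0.05)
    return generic_child_policy(morale, event)

@dataclass
class Agent:
    name: str
    morale: float = 0.5
    policy: Callable[[float, str], float] = generic_child_policy
    knowledge: set = field(default_factory=set)  # state of knowledge
    history: list = field(default_factory=list)  # input sequence seen so far

    def observe(self, event: str, fact: Optional[str] = None) -> None:
        """Record the input, update internal state, and store any new fact."""
        self.history.append(event)
        self.morale = self.policy(self.morale, event)
        if fact is not None:
            self.knowledge.add(fact)

# Instantiate two distinguishable agents: one custom, one generic.
team = [Agent("Kelly", policy=kelly_policy), Agent("Sam")]
for player in team:
    player.observe("critique", fact="match is on Saturday")

for player in team:
    print(player.name, player.morale, player.history, player.knowledge)
```

The same critique moves the two players in opposite directions because each agent carries its own behavior function, while the shared fact ends up in both agents' individual knowledge sets.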

In the standard context of modern artificial intelligence, simulation does not typically include the above capabilities, especially at inference time.

Environmental simulation and its limitations

Most simulation-based AI research today focuses on problems like environmental simulation for the motion training of robots or autonomous vehicles. It is also used to compute an optimal action in reinforcement learning scenarios like video games. This type of simulation is based on a monolithic model — meaning that all inference is based on internally stored data. It is usually characterized by an explicitly defined objective (e.g. win the game). The AI agent’s objective does not account for potential qualitative changes in the environment or the objectives of other agents it must interact with.

Environmental simulation has achieved several impressive milestones. Notable among them is the work of Professor Joshua Tenenbaum and the team within the Department of Brain and Cognitive Sciences at MIT, who study simulation in the context of developmental milestones and physical scene understanding. In a similar vein, researchers at Google Brain have achieved more robust reasoning capabilities in large language models by injecting information from a physics simulation engine. And OpenAI’s Dota bot is the first AI bot to ever beat a world champion e-sports team in Dota 2, an online, multiplayer battle arena game.

Still, standard approaches in machine learning lack several features:

The simulations are typically run during training time rather than at inference time.
The simulation environment is typically “faceless” in that it doesn’t include complex, continuously evolving agents whose behavior can vary depending on the preceding sequence of interactions.
The simulations cannot model agents acting on different objectives, something that humans do with ease. This would require a type of simulation that incorporates a more complex world model and theory of mind, key tenets of advanced intelligence that are seamlessly embedded in the developing brain of a child and manifested in the crayon drawings of a kindergartener.

Open-ended real-world interactions involve agents acting on a variety of objectives, and therefore cannot be easily simulated using the paradigm of selecting the best possible action given the environmental state. Furthermore, reinforcement learning (the paradigm traditionally used in this context) is already beset with immense state spaces, even for the narrowly defined environments in use today.

Zeroing in on causal agent-based simulation

Most machine learning does not incorporate multi-agent simulation, which is largely computationally prohibitive due to the explosion in the size of the sample space that it causes. This is a barrier that must be crossed to give AI the anticipatory capability it needs to address some of the world’s more overarching problems.

Could there be an approach that overcomes this computational intractability of an open-ended, multi-agent environment and that allows AI agents to become usefully integrated into such environments?

First, let’s more precisely describe where the computational intractability of traditional end-to-end approaches comes from.

Most of the intelligent tasks targeted by AI-based solutions today are non-situational, in the sense that the output is not dependent on the context or the specific situation in which the query is made. They also do not track the recent history of particular individuals or complex objects in their environment. In contrast, humans always apply their intelligence in a very strong contextual/situational setting; they are rarely ‘generic’ in their responses. Next-generation AI must incorporate representational constructs and functional modeling to rectify this gap.

When an AI with situational intelligence is placed in an environment with multiple complex agents, it must be able to perform two key functions:

track the input and previous behavior of those agents;
simulate what-if scenarios with potential response sequences and determine how those sequences might impact the environment and those agents.

Within current approaches, the system tries to create a comprehensive input-to-output function (e.g., implemented as a massive scale neural network) so that when presented with a situation, it can predict or recommend the next step. To map a multi-agent setting to such a “flat” input-to-output function, it needs to unroll all the potential sequences and multi-agent interactions during training, which can quickly become intractable.

However, if the paradigm is changed to use simulation of “what-if” scenarios during inference, there is no need to unroll a large combinatorial space. One would only simulate the relevant sequences to be evaluated at inference time. This would involve a vastly smaller number of sequences, thus avoiding a combinatorial explosion.
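
A minimal Python sketch of this idea follows. It assumes a toy setting in which each agent is a dictionary with a hypothetical `morale` value and a per-agent response function; the names `sensitive`, `resilient`, and `simulate`, the players, and all numbers are invented for illustration. Only a handful of candidate plans are played out at inference time, each on a copy of the current state.

```python
import copy

# Hypothetical toy agent models; nothing here comes from a real library.

def sensitive(morale, action, repeats):
    # Each consecutive critique hurts more than the previous one.
    if action == "critique":
        return morale - 0.1 * repeats
    return morale + 0.1

def resilient(morale, action, repeats):
    # This agent gains a little from critiques and from praise.
    return morale + (0.05 if action == "critique" else 0.02)

def simulate(agents, plan):
    """Play out one what-if plan on a copy; the real state is untouched."""
    sim = copy.deepcopy(agents)
    for target, action in plan:
        agent = sim[target]
        # Count consecutive critiques; any other action resets the count.
        agent["repeats"] = (agent["repeats"] + 1) if action == "critique" else 0
        agent["morale"] = agent["respond"](agent["morale"], action, agent["repeats"])
    return sum(agent["morale"] for agent in sim.values())

agents = {
    "Ann": {"morale": 0.5, "repeats": 0, "respond": sensitive},
    "Bo": {"morale": 0.5, "repeats": 0, "respond": resilient},
}

# Only the sequences under consideration are simulated, not every possible one.
candidate_plans = [
    [("Ann", "critique"), ("Ann", "critique"), ("Ann", "critique")],
    [("Ann", "critique"), ("Ann", "praise"), ("Bo", "critique")],
    [("Ann", "praise"), ("Bo", "critique"), ("Bo", "critique")],
]

best_plan = max(candidate_plans, key=lambda plan: simulate(agents, plan))
print(best_plan)
```

Because the response functions depend on the preceding sequence, three critiques in a row score very differently from a single critique, which mirrors the coach example earlier in this article.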

In such cases, causal simulation with encapsulated agent models is not only the most efficient way of achieving the desired outcome but the only way. This simulation would allow the agent to interact with partial what-if scenarios without the need to unroll the entire environment at once. Reasoning could then be performed by iteratively going from non-viable to viable scenarios.

To illustrate this process, consider our earlier example of a sports team and coach. Let’s say we have ten players (agents), each of whom has 100 possible behaviors. Our AI tries to generate potential what-if scenarios to choose the best course of action. If the AI tries to learn a model of each of the ten agents executing each of the possible behaviors for each possible environmental state, this would result in a massive combinatorial explosion. But in any realistic scenario, only a small fraction of agents’ behaviors and world states would be relevant. If the agent models are individually encapsulated and separated from the world model, the AI could perform a search to first select the relevant behaviors and world states, and then only unroll those simulated scenarios which would be causally likely and relevant.
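
The scale of the difference can be checked with simple arithmetic. In the Python sketch below, the figures of ten agents and 100 behaviors come from the example above; the pruned figures (3 relevant agents with 4 candidate behaviors each) are assumptions picked purely for illustration, and environmental states are left out entirely.

```python
from itertools import product

# From the example in the text: ten agents, 100 possible behaviors each.
n_agents, n_behaviors = 10, 100
joint_full = n_behaviors ** n_agents  # joint behavior combinations for one step

# Assumed for illustration only: a relevance search keeps 3 agents
# and 4 candidate behaviors for each of them.
relevant_agents, relevant_behaviors = 3, 4
joint_pruned = relevant_behaviors ** relevant_agents

# The pruned space is small enough to enumerate explicitly.
scenarios = list(product(range(relevant_behaviors), repeat=relevant_agents))

print(f"full joint space:   {joint_full:.1e}")
print(f"pruned joint space: {joint_pruned}")
print(f"scenarios to unroll: {len(scenarios)}")
```

Even before environmental states are considered, a single step of joint behavior for all ten agents spans 100^10 = 10^20 combinations, whereas the pruned set under these assumed numbers contains only 64 scenarios.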

This would be akin to a monolithic embedding space (learned by an end-to-end network) that is disentangled into discrete units, each holding the representation of the relevant environment or individual agent. These discrete units could then be queried to generate counterfactual scenarios, thereby containing the combinatorial explosion.

Summary

As AI systems move from the lab into businesses and homes, they will require new capabilities to become more adaptive, situational, deeply contextual, and adept in persistent interaction with the people and entities around them. Causal agent-based simulation holds the key to the next generation of AI solutions. It addresses two massive needs: supporting the human labor force with cooperative AI-based agents, and performing tasks that rely on situation awareness but are beyond human capacity. Making these advances tractable and scalable will inevitably require the modularization of AI architectures to enable inference-time simulation capabilities.

References

Wikipedia contributors. (2022, October 10). Simulation. Wikipedia. https://en.wikipedia.org/wiki/Simulation
What is Simulation? What Does it Mean? (Definition and Examples). (n.d.). TWI. Retrieved October 24, 2022, from https://www.twi-global.com/technical-knowledge/faqs/faq-what-is-simulation
Li, Y., Hao, X., She, Y., Li, S., & Yu, M. (2021). Constrained motion planning of free-float dual-arm space manipulator via deep reinforcement learning. Aerospace Science and Technology, 109, 106446.
Pérez-Gil, Ó., Barea, R., López-Guillén, E., Bergasa, L. M., Gómez-Huélamo, C., Gutiérrez, R., & Díaz-Díaz, A. (2022). Deep reinforcement learning based control for autonomous vehicles in carla. Multimedia Tools and Applications, 81(3), 3553–3576.
Joshua Tenenbaum. (2022, October 6). MIT-IBM Watson AI Lab. https://mitibmwatsonailab.mit.edu/people/joshua-tenenbaum/
Battaglia, P. W., Hamrick, J. B., & Tenenbaum, J. B. (2013). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, 110(45), 18327–18332.
Liu, R., Wei, J., Gu, S. S., Wu, T. Y., Vosoughi, S., Cui, C., … & Dai, A. M. (2022). Mind’s Eye: Grounded Language Model Reasoning through Simulation. arXiv preprint arXiv:2210.05359.
Berner, C., Brockman, G., Chan, B., Cheung, V., Dębiak, P., Dennison, C., … & Zhang, S. (2019). Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680.
Piper, K. (2019, April 14). OpenAI’s Dota AI beats pro team OG as first AI to defeat reigning world champions. Vox. https://www.vox.com/2019/4/13/18309418/open-ai-dota-triumph-og
Singer, G. (2022a, August 17). Beyond Input-Output Reasoning: Four Key Properties of Cognitive AI. Medium. https://towardsdatascience.com/beyond-input-output-reasoning-four-key-properties-of-cognitive-ai-3f82cde8cf1e
Singer, G. (2022b, October 7). Advancing Machine Intelligence: Why Context Is Everything. Medium. https://towardsdatascience.com/advancing-machine-intelligence-why-context-is-everything-4bde90fb2d79