During the Game Developers Conference (GDC) in March 2023, we showcased our multi-agent demo, Candy Clash, a mobile game containing 100 intelligent agents. In the demo, the agents are built with Unity's ML-Agents Toolkit, which allows us to train them using reinforcement learning (RL). To find out more about the demo and its development, see our previous blog series. Previously, the agents used a simple Multi-Layer Perceptron (MLP) Neural Network (NN) model. This blog explores the impact of using other types of neural network models on the gaming experience and performance.
The Game Developers Conference was held near Easter, which inspired the setup for the Candy Clash demo. In Candy Clash, there are two teams of rabbits, each with an egg to protect. The aim is for the rabbits to attack the opposing team's egg while defending their own.
There are three rabbit roles: Attacker, Defender, and Wanderer. There is also a Planner agent, which dynamically assigns roles to the rabbits during play.
Figure 1: Different agent roles in the Candy Clash demo
The rabbits' behaviors are created by training the rabbits with different reward functions. Each behavior is governed by a policy, which is effectively the rabbit's "brain". The policy takes an observation as input and produces an action as output, and it is trained to output the action that maximizes the reward for any given observation. For more details, see the ML-Agents official documentation. A Neural Network (NN) is used to model the policy, and the same network architecture is shared by all of the rabbits. However, the policy modelled by the NN differs after training, because the weights are adjusted towards the optimal policy for each role's reward.
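As a rough illustration of what a policy is, the sketch below shows a minimal MLP that maps an observation vector to action scores, written in PyTorch. The observation and action sizes are placeholders rather than the demo's actual values, and this is not the exact network that ML-Agents builds internally.

# Illustrative only: a minimal MLP policy mapping an observation vector to
# action scores. Sizes are placeholders, not the demo's actual values.
import torch
import torch.nn as nn

class MLPPolicy(nn.Module):
    def __init__(self, obs_size=32, action_size=5, hidden_units=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_size, hidden_units), nn.ReLU(),
            nn.Linear(hidden_units, hidden_units), nn.ReLU(),
            nn.Linear(hidden_units, action_size),
        )

    def forward(self, obs):
        # Returns unnormalized action preferences; training adjusts the
        # weights so the chosen actions maximize the expected reward.
        return self.net(obs)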
The original demo used a simple MLP network. This is one of many networks that can be used to model the policy. Unity's ML-Agents Toolkit allows the user to make changes to the baseline MLP network by editing the settings in the YAML configuration file; see Unity's official documentation. For example, you can add "memory" to the agent. In NN terms, this means adding a recurrent layer, which in ML-Agents is a single Long Short-Term Memory (LSTM) layer. This network is more complex than the simple MLP and enables the agent to learn which observations to remember and which to forget. To set up a network with an LSTM layer, the following changes were made to the configuration file.
network_settings:
  normalize: false
  hidden_units: 64
  num_layers: 2
Code 1: Network configuration for the MLP Network
network_settings:
  normalize: false
  hidden_units: 64
  num_layers: 1
  memory:
    sequence_length: 32
    memory_size: 32
Code 2: Network configuration for the LSTM Network
Figure 2 shows the NN structures produced by the two configurations and the resulting ONNX models after training. The differences are highlighted by the orange and blue boxes: the blue boxes mark the LSTM layer and its recurrent inputs and outputs. The LSTM model is clearly the more complex of the two.
Figure 2: MLP NN structure and LSTM NN structure
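To make the structural difference concrete, here is a rough PyTorch analogue of the LSTM-based policy, with the recurrent state passed in and returned each step, much like the recurrent inputs and outputs visible in the exported ONNX model. It is a sketch only: the sizes loosely mirror the configuration above, but ML-Agents' own implementation differs in detail.

# Illustrative only: a policy with a single LSTM layer. The recurrent state
# is fed in and returned each step, letting the agent keep or discard
# information over time. Sizes are placeholders, not ML-Agents' exact layout.
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    def __init__(self, obs_size=32, action_size=5, hidden_units=64, memory_size=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_size, hidden_units), nn.ReLU())
        self.lstm = nn.LSTM(hidden_units, memory_size, batch_first=True)
        self.head = nn.Linear(memory_size, action_size)

    def forward(self, obs, memory=None):
        # obs: [batch, seq_len, obs_size]; memory: optional (h, c) recurrent state.
        x = self.encoder(obs)
        x, memory = self.lstm(x, memory)
        # The updated memory is returned so it can be fed back on the next step.
        return self.head(x), memory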
To investigate the effect of changing the model on the mobile gaming experience, we trained new Wanderer agents using the LSTM-based model for 50 million steps, the same number of steps used to train the original MLP model. In this trial, we only retrained the Wanderer rabbits, because the gaming experience improvements are easier to observe in Wanderer vs. Wanderer games, and any performance improvements are similar irrespective of rabbit type. To explore the impact of the different NN models on the gaming experience, we looked at the relative "intelligence" of the agents and the game's frame rate performance.
Both network models were trained using self-play. To learn more about the training setup, see the previous blog series about the Boss Battle demo. A consequence of training the models using self-play is that there is no readily available metric to compare the models' "intelligence". Consequently, we decided to pit the Wanderer rabbits with the two different NN models against each other. See Figure 3:
Figure 3: Wanderers with the LSTM NN (blue) vs. Wanderers with the MLP NN (orange)
There are only Wanderer rabbits on both teams. The aim of the Wanderer rabbits is to defeat all their opponents, so the Wanderer's reward function encourages attacking and defeating opponents. The orange rabbits use the old MLP model and the blue rabbits use the new LSTM model, and there are noticeable differences in strategy. Both teams employ a form of wave attack, but the LSTM team stays in groups and plays more defensively. The rabbits with the LSTM-based model also appear to win more consistently than the MLP-based rabbits; see Table 1 for the result after 11 games. This suggests that they have learnt a better strategy, at least against the simpler MLP-based agents, and are arguably "cleverer".
Table 1: Number of wins for LSTM model-based rabbit team against MLP model-based rabbits.
Note: Games were played with only Wanderer rabbits.
The other aspect we measured is inference performance, which influences the frame rate. The game ran on a Google Pixel 7 Pro with inference on the CPU only; to understand the reasoning behind this, see our previous blog. Comparing inference time between the two models, there is a noticeable increase in computation time for the LSTM-based model: the DecideAction call is approximately 1.6 times slower, see Figure 4.
Figure 4: Inference time comparison between the LSTM-based model and the MLP-based model.
The DecideAction function call is the point in Unity's ML-Agents where the actual model computation runs, so this is a relatively significant increase in computation time. However, the Frames Per Second (FPS) profiling runs indicated that the game can still run close to 60 FPS with 100 Wanderer agents using the LSTM-based NN, see Figure 5.
Figure 5: FPS comparison between MLP and LSTM models for the Wanderer rabbits
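For readers who want a rough off-device comparison of the two exported policies, the sketch below times them with onnxruntime on a desktop CPU. The file names are placeholders, and the figures above were measured inside Unity on the Pixel 7 Pro, so this only gives a sense of the relative cost of the two models rather than reproducing those numbers.

# Hedged sketch: time two exported ML-Agents .onnx policies with onnxruntime.
# Model file names are placeholders; inputs are filled with zeros purely to
# exercise the network, not to reproduce real gameplay observations.
import time
import numpy as np
import onnxruntime as ort

def average_inference_ms(model_path, runs=1000):
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    feeds = {}
    for inp in session.get_inputs():
        # Replace dynamic/unknown dimensions (e.g. batch size) with 1.
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        dtype = np.float32 if "float" in inp.type else np.int32
        feeds[inp.name] = np.zeros(shape, dtype=dtype)
    start = time.perf_counter()
    for _ in range(runs):
        session.run(None, feeds)
    return (time.perf_counter() - start) * 1000.0 / runs

print("MLP :", average_inference_ms("WandererMLP.onnx"), "ms")
print("LSTM:", average_inference_ms("WandererLSTM.onnx"), "ms")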
The results show that the non-player character (NPC) behavior is improved by changing the NN used, which leads to a more engaging gaming experience. The trade-off is increased inference time, which could reduce game performance; however, no drop in FPS was seen in our results.
Using different NN models for the policy when training multi-agent systems results in different, and perhaps "cleverer", behaviors. This may lead to a more engaging gaming experience. However, the trade-off is that, as models become more complex, the model size and inference time increase, which affects the game's performance. This potential drop in performance might be offset by reducing the number of intelligent NPCs in the game, which also reduces the total inference time. Finding the trade-off that is optimal for a particular gaming scenario is therefore difficult. However, as mobile devices become more efficient at processing data, and new NN architectures become available, there are more opportunities to explore different models to deliver the most engaging gaming experience.