In part 2 of this blog series, we showed how the game AI agents were designed for our Candy Clash Demo. Part 3 looks at how the game performs on mobile devices.
Up to now, I have mainly discussed how the multi-agent system works. I will now shift the discussion to inference performance on Arm when the game runs on mobile devices. Let's start by looking at the execution time when the ML-Agents models are run.
Figure 1. ML-Agents execution time
The graph illustrates how the execution time varies as the number of agents increases. Light blue shows the execution time when ML-Agents inference runs on the CPU, while dark blue shows the execution time on the GPU. As you can see, the CPU runs the models more quickly. This is because the ML-Agents NN models are not large enough to use the GPU efficiently, and the data transfer between the CPU and GPU becomes a bottleneck. In addition, the GPU is typically busy with graphics-intensive work, which is another reason to favor the CPU for ML-Agents inference.
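For reference, here is a minimal sketch of how inference could be pinned to the CPU in a Unity project. It assumes the ML-Agents BehaviorParameters component and its InferenceDevice property, and the available enum values differ between ML-Agents releases (recent releases expose Burst as the CPU backend), so treat it as illustrative rather than the exact setup used in Candy Clash.

```csharp
using Unity.MLAgents.Policies;
using UnityEngine;

// Illustrative sketch only: forces every agent's policy in the scene to run on the CPU.
// Assumes BehaviorParameters.InferenceDevice is available; the enum members
// (Burst vs. the older CPU value) depend on the ML-Agents release in use.
public class ForceCpuInference : MonoBehaviour
{
    void Awake()
    {
        foreach (var behavior in FindObjectsOfType<BehaviorParameters>())
        {
            // Burst is the CPU inference path in recent ML-Agents releases.
            behavior.InferenceDevice = InferenceDevice.Burst;
        }
    }
}
```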
Next, I would like to introduce frame-interleaving inference, an implementation technique I used to improve overall performance when running the ML-Agents models. Interleaving is commonly used to distribute processing over time, and I applied it in the same way to agent inference.
Figure 2. Frame-interleaving inference
This diagram illustrates how inference is distributed frame by frame. The horizontal axis is time, with frames numbered 1 to 8. We start with Team A's attackers in frame 1, followed by its defenders and wanderers. We then switch to Team B from frame 4: attackers, defenders, and then wanderers. We switch back to Team A at frame 7. As a result, the NN model for any given rabbit is executed once every six frames. In this way, the processing is distributed by rabbit role and team in turn.

The reason for grouping agents this way is that models with the same weights can be executed in a batch. Batch execution lets models that share weights run together, which is more efficient than running the models one by one. The image below compares Unity profiler screenshots when only the attacker role is executed in a single frame (top) and when all three roles are executed (bottom). Because all agents with the same role share the same weights, the models are batch-executed per role. Executing even one rabbit model of a different role in the same frame adds significant overhead, which suggests that executing one role per frame is the most efficient distribution.
Figure 3. Comparison of model executions between one role (top) and all roles (bottom) in a single frame
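To make the schedule in Figure 2 concrete, here is a small self-contained sketch (the names are hypothetical, not the demo's actual code) that maps a frame number to the team-and-role group that should run inference in that frame:

```csharp
// Hypothetical sketch of the Figure 2 schedule: six groups (team x role),
// one group per frame, so each rabbit's model runs once every six frames.
public enum AgentGroup
{
    TeamA_Attacker, TeamA_Defender, TeamA_Wanderer,
    TeamB_Attacker, TeamB_Defender, TeamB_Wanderer
}

public static class InterleaveSchedule
{
    const int GroupCount = 6;

    // Returns the single group whose agents should request a decision this frame.
    public static AgentGroup GroupForFrame(int frameCount)
    {
        return (AgentGroup)(frameCount % GroupCount);
    }
}
```

An agent then only requests a decision when GroupForFrame(Time.frameCount) matches its own group, which is exactly the one-role-per-frame pattern shown in the profiler comparison above.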
The planner's execution is also distributed, but in a different way. The planner for both teams runs every frame, but only one rabbit's role is updated per frame. Instead of updating all the rabbits at once, the role updates are spread across successive frames.
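As a rough sketch of this round-robin idea (again with hypothetical names, since the planner code is not shown here), the planner can keep an index into the team roster and advance it by one rabbit each frame:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Stand-in for the game's rabbit agent type (hypothetical).
public class RabbitAgent : MonoBehaviour { }

// Illustrative sketch: the planner runs every frame, but refreshes the role
// of only one rabbit per frame, cycling through the roster in order.
public class PlannerRoundRobin : MonoBehaviour
{
    [SerializeField] List<RabbitAgent> rabbits = new List<RabbitAgent>();
    int nextIndex;

    void Update()
    {
        if (rabbits.Count == 0) return;

        // Re-evaluate a single rabbit's role this frame.
        UpdateRole(rabbits[nextIndex]);
        nextIndex = (nextIndex + 1) % rabbits.Count;
    }

    void UpdateRole(RabbitAgent rabbit)
    {
        // Placeholder for the planner's actual role-assignment logic.
    }
}
```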
The model execution interval can easily be set using the DecisionPeriod parameter of the DecisionRequester component provided by ML-Agents. However, it does not support frame-interleaving inference, so you need to implement that yourself. Without it, all rabbit models would be executed on the same frame: if the interval is set to 6, every rabbit model would run on frames 1, 7, 13, and so on. We modified the existing DecisionRequester code to implement this feature. Hopefully, we will soon generalize this code further and contribute it to the ML-Agents GitHub repository.
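To give an idea of what such a modification can look like, below is a hedged sketch of a DecisionRequester-style component with an added DecisionOffset field. This is not the code we actually changed, and the real DecisionRequester is driven by Academy step callbacks rather than FixedUpdate, but it captures the core idea: with a period of 6 and offsets 0 through 5, each team-and-role group requests its decisions on a different frame.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Sketch of an interleaved decision requester. Unlike the stock component,
// each instance also has a DecisionOffset, so different agent groups ask for
// decisions on different frames instead of all on the same one.
[RequireComponent(typeof(Agent))]
public class InterleavedDecisionRequester : MonoBehaviour
{
    [Range(1, 20)] public int DecisionPeriod = 6;  // same meaning as in ML-Agents
    [Range(0, 19)] public int DecisionOffset = 0;  // which step within the period

    Agent m_Agent;
    int m_StepCount;

    void Awake()
    {
        m_Agent = GetComponent<Agent>();
    }

    void FixedUpdate()
    {
        if (m_StepCount % DecisionPeriod == DecisionOffset)
        {
            // This group's turn: run the NN model and pick a new action.
            m_Agent.RequestDecision();
        }
        else
        {
            // Between decisions, keep repeating the last decided action.
            m_Agent.RequestAction();
        }
        m_StepCount++;
    }
}
```

Giving Team A's attackers offset 0, its defenders offset 1, and so on reproduces the schedule shown in Figure 2.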
How effective is frame-interleaving inference? The graph below illustrates the changes in frame rate with and without this technique. The horizontal axis is the number of agents, and the vertical axis is the frame rate.
Figure 4. Frame rate with and without frame-interleaving inference
As the number of agents increases, the amount of processing required increases, leading to a decrease in the frame rate.
There are several key considerations to keep in mind when deploying ML-Agents to mobile devices: run inference on the CPU, since the models are small and the GPU is busy with graphics work; spread inference across frames with frame-interleaving so that all agents do not run on the same frame; and group agents that share the same weights so their models can be batch-executed.
In conclusion, our exploration of multi-agent systems presents exciting potential for mobile gaming. We have demonstrated that carefully designed roles, dynamic strategies, and efficient use of computational resources can lead to complex and emergent behaviors in games. While there are challenges to be addressed – from ensuring efficient processing on mobile devices to fine-tuning the training of multiple models – the future is promising. With continual technological advancements, we look forward to even more compelling game experiences and improved performance in the mobile gaming space.