Part 3: Build dynamic and engaging mobile games with multi-agent reinforcement learning

Koki Mitsunami
July 3, 2023
Part 3 of 3 blog series


In part 2 of this blog series, we showed how the game AI agents were designed for our Candy Clash demo. Part 3 looks at how the game runs on mobile devices.

Performance on Arm-based devices

Up until now, I have mainly discussed how the multi-agent system works. I will now shift the discussion to inference performance on Arm when the game runs on mobile devices. Let's start by looking at the performance when the ML-Agents models are executed.


Figure 1. ML-Agents execution time

The graph illustrates how the execution time varies as the number of agents increases. Light blue shows the execution time when the ML-Agents models run on the CPU, while dark blue shows the execution time on the GPU. As you can see, the CPU runs the models more quickly. This is because the models are not large enough to utilize the GPU efficiently, and data transfer between the CPU and GPU becomes a bottleneck. Additionally, the GPU is typically busy with graphics-intensive processing, which is another reason to favor the CPU for ML-Agents inference.

Frame-interleaving inference

Next, I would like to introduce frame-interleaving inference, an implementation technique I used when running the ML agents to improve overall performance. This technique is commonly used to distribute processing over time, and I applied it in the same way to the ML agent execution.


Figure 2. Frame-interleaving inference

This diagram illustrates how the inference is performed frame by frame. The horizontal axis is time, with frames numbered from 1 to 8. We start with attackers on Team A at frame 1, followed by defenders and wanderers. We then switch to Team B from frame 4: attackers, defenders, and wanderers. We switch back to Team A again at frame 7. As a result, the NN model for one rabbit is executed every 6 frames. In this manner, we distribute the processing based on rabbit roles and teams in turn.
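The schedule above can be sketched in a few lines. This is a simplified illustration (the function and constant names are hypothetical, and the real implementation is a Unity C# component): each frame maps to exactly one team/role group, cycling every six frames.

```python
# Hypothetical sketch of the 6-frame interleaving schedule described above:
# frame 1: Team A attackers, 2: A defenders, 3: A wanderers,
# frame 4: Team B attackers, 5: B defenders, 6: B wanderers, then repeat.

TEAMS = ["A", "B"]
ROLES = ["attacker", "defender", "wanderer"]

def group_for_frame(frame: int) -> tuple[str, str]:
    """Return the (team, role) group whose models run on this frame (1-based)."""
    slot = (frame - 1) % (len(TEAMS) * len(ROLES))  # 0..5 within the cycle
    return TEAMS[slot // len(ROLES)], ROLES[slot % len(ROLES)]

# Each group runs once every 6 frames, e.g. Team A attackers on frames 1, 7, 13, ...
```

With this mapping, each rabbit's model runs once per six-frame cycle, matching the diagram.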

The reason for this distribution is that models with the same weights can be executed in a batch. Batch execution allows models that share the same weights to run together, which is more efficient than running multiple models separately. The image below compares Unity profiler screenshots when only the attacker role is executed in a single frame (top) and when all three roles are executed (bottom). Since all rabbits of a role share the same weights, the models are batch-executed per role. Even executing one rabbit model of a different role can cause significant overhead, suggesting that executing one role per frame is the most efficient distribution.


Figure 3. Comparison of model executions between one role (top) and all roles (bottom) in a single frame
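Why batching helps can be sketched numerically. This is illustrative only: a single hypothetical linear layer stands in for one role's NN model, and NumPy stands in for the engine's actual inference backend.

```python
import numpy as np

# Illustrative only: agents that share the same weights can be run as one batch.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))     # one set of weights, shared by a whole role
obs = rng.standard_normal((16, 8))  # observations from 16 rabbits of that role

# Per-agent execution: 16 separate forward passes.
per_agent = np.stack([o @ W for o in obs])

# Batched execution: one forward pass over the stacked observations.
batched = obs @ W

assert np.allclose(per_agent, batched)  # same results, one kernel launch
```

The outputs are identical; the batched form simply replaces many small dispatches with one larger one, which is what the profiler comparison above shows at the engine level.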

The planner's execution is also distributed, but in a different way. The planners for both teams run every frame, but only one rabbit's role is updated per frame. Instead of updating all the rabbits simultaneously, we spread the role updates over time.
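This round-robin update can be sketched as follows (class and method names are hypothetical; the real code is a Unity C# script):

```python
class PlannerScheduler:
    """Sketch: the planner runs every frame, but only one rabbit's role
    assignment is refreshed per frame, cycling through the team."""

    def __init__(self, num_rabbits: int):
        self.num_rabbits = num_rabbits
        self.next_index = 0

    def apply(self, planner_roles: list[str], current_roles: list[str]) -> None:
        # Copy the planner's output for a single rabbit, then advance the cursor.
        i = self.next_index
        current_roles[i] = planner_roles[i]
        self.next_index = (i + 1) % self.num_rabbits
```

With a team of ten rabbits, each rabbit's role is refreshed once every ten frames rather than all at once.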

The model execution interval can easily be set using the DecisionPeriod parameter of the DecisionRequester component provided by ML-Agents. However, the component does not support frame-interleaved inference, so you need to implement it yourself. Without it, all rabbit models would execute on the same frame: if the interval is set to 6, every rabbit model would run on frames 1, 7, 13, and so on. We modified the existing DecisionRequester code to implement this feature, and we hope to further generalize the code and contribute it to the ML-Agents GitHub repository.
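The core of the modification is a per-group offset on the decision check. The sketch below is Python pseudologic for that idea (the actual component is C#, and the parameter names here are assumptions, not the shipped API):

```python
def should_request_decision(step: int, period: int, offset: int) -> bool:
    """True on the steps when this agent's model should run.

    Stock behavior corresponds to offset == 0 for every agent, so all agents
    decide on the same frames. Giving each group a distinct offset within the
    period spreads the inference across frames.
    """
    return step % period == offset

# With period 6, a group with offset 0 decides on steps 0, 6, 12, ...
# while a group with offset 1 decides on steps 1, 7, 13, ...
```

Assigning one offset per team/role group reproduces the six-frame interleaving described earlier.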

How effective is frame-interleaving inference? The graph below illustrates the changes in frame rate with and without this technique. The horizontal axis is the number of agents, and the vertical axis is the frame rate.

Figure 4. Frame rate with and without frame-interleaving inference

As the number of agents increases, the amount of processing required grows and the frame rate drops. Spreading the inference across frames smooths out this load, which helps sustain a higher frame rate than running every model on the same frame.

Deploying ML-Agents on devices: best practices

There are several key considerations to keep in mind when deploying ML-Agents to mobile devices.

  • Profiling on Target Devices: When measuring performance, it's crucial to profile the performance on the target device. This ensures that your measurements are accurate and relevant to the actual use case.
  • Accounting for Pre-processing and Post-processing Times: The execution time of your NN model is not the only factor that affects overall performance. It's also important to account for the time spent on pre-processing, such as collecting input data for the NN model, and post-processing, which includes performing actions based on the output of the model.
  • Understanding Agent Capacity: To understand how many agents your device can handle, consider creating a plot similar to the ML-Agents execution time graph I showed earlier. As you can see from the graph, there's a correlation between the number of agents and the execution time. Therefore, you can predict the number of agents your device can handle within acceptable performance limits.
  • Using Multiple Small NN Models: Rather than using one large model, consider implementing multiple smaller NN models, as in the implementation we've discussed. Smaller models are easier to fine-tune during training and provide the advantage of being more modular, making them easier to replace or update.
  • Choosing Between CNN and MLP: In our implementation, we chose to use a CNN-based model for the planner. However, our experience shows some potential drawbacks to this approach. While CNNs are good at learning patterns in image-like information, they need large amounts of data and take significantly longer to train compared to MLPs. Therefore, you may want to consider whether an MLP-based model can achieve what you’re trying to do before deciding to use a CNN.
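The capacity-planning point above can be made concrete. Assuming the per-agent inference cost grows roughly linearly, as in the execution-time graph, you can back out an agent budget from a frame budget. The numbers below are made up for illustration; measure your own on the target device.

```python
def max_agents(frame_budget_ms: float, fixed_cost_ms: float,
               per_agent_ms: float) -> int:
    """Largest agent count whose estimated frame cost stays within budget,
    assuming cost = fixed_cost_ms + per_agent_ms * agents (a linear fit
    to measurements like the execution-time graph)."""
    if per_agent_ms <= 0:
        raise ValueError("per_agent_ms must be positive")
    return max(0, int((frame_budget_ms - fixed_cost_ms) / per_agent_ms))

# e.g. a 33 ms budget (30 FPS) with 5 ms of fixed per-frame work and an
# (assumed) 0.5 ms per agent leaves room for 56 agents.
```

The fixed and per-agent costs should come from profiling on the target device, including pre- and post-processing time, not just the NN execution itself.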

Conclusion

In conclusion, our exploration of multi-agent systems presents exciting potential for mobile gaming. We have demonstrated that carefully designed roles, dynamic strategies, and efficient use of computational resources can lead to complex and emergent behaviors in games. While there are challenges to be addressed – from ensuring efficient processing on mobile devices to fine-tuning the training of multiple models – the future is promising. With continual technological advancements, we look forward to even more compelling game experiences and improved performance in the mobile gaming space.
