As mobile games become more visually ambitious, players expect their favorite titles to rival console-quality experiences. That means better lighting, richer environments—and, increasingly, more realistic characters. One detail that’s been difficult to get right on mobile, however, is clothing. Not just texture or appearance, but the way clothes move, fold, stretch, and react to the characters and their environment.
Traditionally, clothing simulation has relied on physics-based methods—techniques that require substantial computational power to calculate how fabric behaves. While this is feasible on high-end PCs, it’s a serious challenge on mobile devices, which are inherently limited in processing power, thermal budget, and battery life.
But what if we could bring realistic cloth simulation to mobile without those heavy physics calculations? That’s the challenge we took on, and it led us to neural graphics.
In modern games, clothing is more than a cosmetic detail. It plays a key role in how characters are perceived and how their actions are communicated. A cape billowing in the wind, a hoodie bouncing as a character sprints, or a skirt flowing when dancing—all these elements enhance realism and responsiveness.
For extended reality (XR) and interactive storytelling, such nuances contribute to immersion. On mobile, though, these effects have been difficult to achieve. The usual simulation methods are simply too demanding to run in real time on phones and tablets.
Physically based simulations work well, but they’re compute-intensive. They require solving complex equations frame-by-frame, which eats up processing power and drains battery life. Real-time cloth must react instantly to unpredictable gameplay events (like sudden character movements or environmental changes), which rules out extensive pre-baking and forces developers to choose simplified approximations that cannot fully capture realistic bending, stretching, inertia, gravity, and collision. The challenge deepens in customizable XR experiences, where players swap garments and avatars on the fly: here, the simulation must stay robust and flexible even when the characters, clothes, and animations are unknown until runtime. Encouragingly, recent advances in lightweight machine-learning models are beginning to bridge the gap, delivering convincing cloth behavior at a fraction of the traditional computational cost.
Our goal was to deliver high-fidelity cloth simulation that runs in real time, directly on mobile devices—without requiring ground-truth simulation data.
To do that, we focused on four core requirements: visual fidelity, real-time performance, on-device execution, and training without ground-truth simulation data.
We explored a range of machine learning approaches. Recurrent Neural Network (RNN) methods were lightweight but could not generalize beyond specific garments. Graph Neural Network (GNN) solutions demonstrated great generalization but were too slow. Graph Attention Network (GAT) approaches, while powerful, came with their own limitations in training and deployment.
Inspired by great research from academia, we built a model to address these challenges.
We developed a hierarchical GAT tailored specifically for mobile graphics. This model delivers realistic cloth motion while being lean enough to run on modern mobile chips.
The model takes as input the garment vertices, each paired with its closest body vertex (within a search radius), and outputs a predicted velocity for every garment vertex. The GAT uses a latent size of 32 channels with 4 attention heads. We perform 3 message-passing steps at each of the 4 levels: the base garment topology plus 3 coarser levels.
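To make that structure concrete, here is a minimal PyTorch sketch of a hierarchical GAT with 32 latent channels, 4 attention heads, and 3 message-passing steps at each of 4 levels. The fixed-size neighbor lists, class names, and residual update are illustrative assumptions rather than our production implementation (the neighbor-list design also anticipates the static-topology point discussed below).

```python
import torch.nn as nn


class FixedNeighborGAT(nn.Module):
    """Multi-head graph attention over a fixed-size neighbor list.

    A dense [N, K] neighbor index (padded to a constant K) keeps every tensor
    shape static and avoids scatter-reduce, at the cost of a little redundant
    compute on padded slots. Assumes each vertex has at least one valid neighbor.
    """

    def __init__(self, channels: int = 32, heads: int = 4):
        super().__init__()
        assert channels % heads == 0
        self.heads, self.dim = heads, channels // heads
        self.q = nn.Linear(channels, channels)
        self.k = nn.Linear(channels, channels)
        self.v = nn.Linear(channels, channels)
        self.out = nn.Linear(channels, channels)

    def forward(self, x, nbr_idx, nbr_mask):
        # x: [N, C] latent features, nbr_idx: [N, K] indices, nbr_mask: [N, K] bool
        N, K = nbr_idx.shape
        q = self.q(x).reshape(N, self.heads, self.dim)              # [N, H, D]
        k = self.k(x)[nbr_idx].reshape(N, K, self.heads, self.dim)  # [N, K, H, D]
        v = self.v(x)[nbr_idx].reshape(N, K, self.heads, self.dim)
        att = (q.unsqueeze(1) * k).sum(-1) / self.dim ** 0.5        # [N, K, H]
        att = att.masked_fill(~nbr_mask.unsqueeze(-1), float("-inf")).softmax(dim=1)
        agg = (att.unsqueeze(-1) * v).sum(dim=1).reshape(N, -1)     # [N, C]
        return x + self.out(agg)                                    # residual update


class HierarchicalClothGAT(nn.Module):
    """Encode -> (3 attention steps x 4 levels: base topology + 3 coarse) -> decode."""

    def __init__(self, in_feats: int, channels: int = 32, heads: int = 4,
                 levels: int = 4, steps: int = 3):
        super().__init__()
        self.levels, self.steps = levels, steps
        self.encode = nn.Linear(in_feats, channels)
        self.layers = nn.ModuleList(
            FixedNeighborGAT(channels, heads) for _ in range(levels * steps)
        )
        self.decode = nn.Linear(channels, 3)  # per-vertex velocity

    def forward(self, feats, level_nbrs, level_masks):
        # feats: [N, in_feats]; level_nbrs / level_masks: one [N, K] tensor per level,
        # where coarser levels simply use longer-range neighbor lists over the same vertices.
        h = self.encode(feats)
        layer = iter(self.layers)
        for lvl in range(self.levels):
            for _ in range(self.steps):
                h = next(layer)(h, level_nbrs[lvl], level_masks[lvl])
        return self.decode(h)  # [N, 3] predicted velocities
```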
Because we use a GAT, collision information is passed as vertex features rather than as dynamically constructed edges. This keeps the graph topology static and lets us eliminate scatter-reduce operations, which can be computationally expensive and are not well supported. The current model does include edge features, but importantly not stateful edge features. We also remove the overhead of dynamically shaped inputs: the number of edges and vertices stays constant, and only the connectivity changes from frame to frame to reflect which body vertex is currently closest.
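As an illustration of attaching collision information as vertex features with fully static shapes, the sketch below finds, for every garment vertex, its closest body vertex and packs the result into a fixed-size feature tensor. The radius value, feature layout, and function name are assumptions made for this example.

```python
import torch


def body_collision_features(garment_pos, body_pos, radius=0.05):
    """Per-vertex collision features with constant shapes frame to frame.

    garment_pos: [N, 3], body_pos: [M, 3]. The output is always [N, 5]
    regardless of how many vertices are actually near the body, so tensor
    sizes and graph topology never change; only the values do.
    """
    # Brute-force pairwise distances; a production pipeline would likely use
    # a spatial acceleration structure instead of torch.cdist.
    dist, idx = torch.cdist(garment_pos, body_pos).min(dim=1)
    closest = body_pos[idx]                           # [N, 3] closest body vertex

    in_range = (dist < radius).float().unsqueeze(-1)  # 1 inside the radius, else 0
    offset = (closest - garment_pos) * in_range       # zeroed when out of range

    # Offset to the body, distance, and a validity flag per garment vertex.
    return torch.cat([offset, dist.unsqueeze(-1), in_range], dim=-1)
```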
We profiled the model on device and identified a number of operators that are either unsupported on the GPU or perform sub-optimally; where appropriate, we swapped them for alternatives. We recommend doing this for any use case where performance is critical.
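As a starting point for that kind of investigation, a host-side pass with torch.profiler can flag expensive or exotic operators before moving to the target runtime’s own on-device tooling. The model and input below are stand-ins, not our network.

```python
import torch
from torch.profiler import ProfilerActivity, profile

# Stand-in for the cloth model; swap in the real network and its inputs.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 3)
)
feats = torch.randn(4424, 16)  # e.g. one feature row per garment vertex

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with torch.no_grad():
        model(feats)

# Sort by total CPU time to see which operators dominate a frame.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=15))
```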
Finally, we experimented with a range of configurations to accommodate different quality and performance targets. Our “tiny” model uses only 3 message-passing steps across the base topology and 2 coarse resolution levels, for even lighter workloads.
The system learns cloth behavior without any pre-labeled data. We trained it in a fully unsupervised way, using a physically based loss function covering bending, stretching, inertia, gravity, and collision. To keep the cloth grounded while allowing fluid movement, we stabilized simulations by pinning the waist vertices. The model is trained without skinning; during inference, its output is blended with a small amount of skinning (5-20%) after each frame. This improves stability without overly penalizing the dynamics.
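We do not reproduce the exact energy formulations or weights here, but a much-simplified sketch of an unsupervised, physics-based loss (stretching, inertia, gravity, and a collision penalty; bending and pinned-vertex handling are omitted) and of the inference-time skinning blend might look like the following. All coefficients are illustrative.

```python
import torch


def physics_loss(x, x_prev, x_prev2, edges, rest_len, body_pos, mass, dt,
                 w_stretch=1.0, w_inertia=1.0, w_gravity=1.0, w_collision=1.0,
                 eps=4e-3, g=9.81):
    """Simplified unsupervised loss over garment positions at t, t-1, t-2 ([N, 3]).

    edges: [E, 2] vertex indices, rest_len: [E], mass: [N], body_pos: [M, 3].
    Bending and the full material model are omitted; the y axis is assumed up.
    """
    # Stretching: edge lengths should stay close to their rest lengths.
    e = x[edges[:, 0]] - x[edges[:, 1]]
    stretch = ((e.norm(dim=-1) - rest_len) ** 2).mean()

    # Inertia: penalize deviation from the constant-velocity (ballistic) prediction.
    inertia = (mass * ((x - 2 * x_prev + x_prev2) ** 2).sum(-1) / (2 * dt ** 2)).mean()

    # Gravity: minimize the potential energy m * g * height.
    gravity = (mass * g * x[:, 1]).mean()

    # Collision: penalize garment vertices closer than eps to the body
    # (closest body vertex used as a cheap stand-in for the true surface).
    dist = torch.cdist(x, body_pos).min(dim=1).values
    collision = torch.relu(eps - dist).pow(2).mean()

    return (w_stretch * stretch + w_inertia * inertia
            + w_gravity * gravity + w_collision * collision)


def blend_with_skinning(x_pred, x_skinned, alpha=0.1):
    """Inference-time stabilization: mix 5-20% of the skinned pose back in."""
    return (1.0 - alpha) * x_pred + alpha * x_skinned
```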
Our dataset included nine stylized body types, about six garments per character, and around 300 Mixamo animations—giving the model a robust sense of how clothes should respond realistically across a range of human motion.
The system produced smooth and believable deformation, generalizing well across different garments and characters. Smaller variants of the model provide developers with flexible trade-offs between performance and visual fidelity, depending on the target hardware.
Inference runs separately for each garment, with the body serving as the only collision obstacle. Additional information can be incorporated, but the performance trade-offs need to be weighed carefully for each use case.
In our case, we assume that top garments sit over or on top of bottoms, and we implemented collision resolution as a post-processing step to enforce this (a simple sketch of the idea follows the figures below).
Figure 1: Results with “Baseline” model
Figure 2: Results with “Baseline” model
Figure 3: Results with “Baseline” model
This opens up a new level of realism for mobile games. Developers no longer need to pre-bake every cloth animation or design specific models for each outfit. The same neural engine can handle it all, dynamically and on the fly.
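The layering post-processing mentioned above is not detailed in this post. Purely as an illustration, one simple approximation is to treat distance to the body as an inside/outside proxy and pull offending bottom-garment vertices back underneath the top; every threshold and name below is hypothetical.

```python
import torch
import torch.nn.functional as F


def resolve_layering(bottom_pos, top_pos, body_pos, radius=0.02, margin=2e-3):
    """Push bottom-garment vertices back underneath nearby top-garment vertices.

    Distance to the closest body vertex serves as a cheap "layer depth" proxy:
    where a bottom vertex overlaps the top garment but sits farther from the
    body than the top does, it is pulled toward the body by the excess depth.
    """
    # Closest top-garment vertex for every bottom-garment vertex.
    d_top, idx_top = torch.cdist(bottom_pos, top_pos).min(dim=1)

    # Layer depth (distance to the body) for the bottom garment and for the
    # corresponding top-garment vertices.
    depth_bottom, idx_body = torch.cdist(bottom_pos, body_pos).min(dim=1)
    depth_top = torch.cdist(top_pos[idx_top], body_pos).min(dim=1).values

    # Offending vertices: overlapping the top garment yet sitting outside it.
    bad = (d_top < radius) & (depth_bottom > depth_top - margin)

    # Pull offenders toward their closest body vertex until they sit `margin` inside.
    direction = F.normalize(body_pos[idx_body] - bottom_pos, dim=-1)
    push = (depth_bottom - depth_top + margin).clamp(min=0.0).unsqueeze(-1)
    return torch.where(bad.unsqueeze(-1), bottom_pos + direction * push, bottom_pos)
```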
We tested it on two recently released phones, using both CPU and GPU configurations, to see how well it handles real-time inference. The results below use a T-shirt mesh with 4,424 vertices.
We compared three versions: our Baseline model, a Tiny variant, and HOOD, a state-of-the-art method that, like ours, is based on Graph Neural Networks. HOOD makes a good benchmark because it shares many structural similarities with our approach, offering a fair and meaningful comparison.
For HOOD, we were only able to gather performance data on the CPU due to unsupported operators on the GPU. Even so, our model achieved nearly a 4x speedup compared to HOOD, highlighting its efficiency and suitability for real-time mobile applications.
Figure 4: Performance measurements from two configurations on recently released phones
Figure 5: Results with our “Tiny” version of the model
We believe our model strikes a strong balance between quality and generalization. By training just once across a wide range of garments, body types, and animations, we have created a solution that adapts well to diverse scenarios—without the need for garment-specific tuning. Importantly, the training is fully unsupervised, meaning we don’t rely on ground-truth simulation data, which makes the approach far more scalable.
On the performance side, we have made significant progress compared to existing methods, especially in terms of real-time readiness on mobile. That said, there’s still room for further optimization. We’re confident the gap will continue to close as neural hardware accelerators improve and model efficiency advances. Looking ahead, our goal is to scale this system to handle multiple characters and garments simultaneously in real time, all while preserving enough compute budget for the rest of the frame’s processing needs.
If you are building mobile games, virtual try-ons or XR experiences, this work changes the game. Graph Neural Networks—once thought too heavy for mobile—are now within reach for real-time applications. With unsupervised training, studios can scale their production pipelines without needing complex simulation setups.
As neural hardware acceleration becomes more common in mobile devices, and with techniques such as model quantization, this approach will only get faster and more efficient. Whether you’re using Unity, Unreal, or a custom engine, neural clothing can elevate your characters with realism once reserved for AAA titles, without sacrificing FPS or thermal budgets.
We see this work as just the beginning. Similar neural methods could soon power real-time material deformation, neural lighting and radiance caching, or even full-scene reconstructions using techniques such as NeRFs and Gaussian splatting.
We are actively seeking collaborators—both from academia and industry—who want to help bring this research into production. If you are interested in shaping the future of mobile graphics, we would love to hear from you.