As mobile games become more visually ambitious, players expect their favorite titles to rival console-quality experiences. That means better lighting, richer environments—and, increasingly, more realistic characters. One detail that’s been difficult to get right on mobile, however, is clothing. Not just texture or appearance, but the way clothes move, fold, stretch, and react to the characters and their environment.
Traditionally, clothing simulation has relied on physics-based methods—techniques that require substantial computational power to calculate how fabric behaves. While this is feasible on high-end PCs, it’s a serious challenge on mobile devices, which are inherently limited in processing power, thermal budget, and battery life.
But what if we could bring realistic cloth simulation to mobile without those heavy physics calculations? That’s the challenge we took on, and it led us to neural graphics.
In modern games, clothing is more than a cosmetic detail. It plays a key role in how characters are perceived and how their actions are communicated. A cape billowing in the wind, a hoodie bouncing as a character sprints, or a skirt flowing when dancing—all these elements enhance realism and responsiveness.
For extended reality (XR) and interactive storytelling, such nuances contribute to immersion. On mobile, though, these effects have been difficult to achieve. The usual simulation methods are simply too demanding to run in real time on phones and tablets.
Physically based simulations work well, but they’re compute-intensive. They require solving complex equations frame-by-frame, which eats up processing power and drains battery life. Real-time cloth must react instantly to unpredictable gameplay events (like sudden character movements or environmental changes), which rules out extensive pre-baking and forces developers to choose simplified approximations that cannot fully capture realistic bending, stretching, inertia, gravity, and collision. The challenge deepens in customizable XR experiences, where players swap garments and avatars on the fly: here, the simulation must stay robust and flexible even when the characters, clothes, and animations are unknown until runtime. Encouragingly, recent advances in lightweight machine-learning models are beginning to bridge the gap, delivering convincing cloth behavior at a fraction of the traditional computational cost.
Our goal was to deliver high-fidelity cloth simulation that runs in real time, directly on mobile devices—without requiring ground-truth simulation data.
To do that, we focused on four core requirements: visual fidelity, real-time performance, on-device execution, and training without ground-truth simulation data.
We explored a range of machine learning approaches. Recurrent Neural Network (RNN) methods were lightweight but could not generalize beyond specific garments. Graph Neural Network (GNN) solutions demonstrated great generalization but were too slow. Graph Attention Network (GAT) approaches, while powerful, came with their own limitations in training and deployment.
Inspired by great research from academia, we built a model to address these challenges.
We developed a hierarchical GAT tailored specifically for mobile graphics. This model delivers realistic cloth motion while being lean enough to run on modern mobile chips.
The model takes as input the garment vertices, each paired with its closest body vertex (within a search radius), and outputs a predicted velocity for every garment vertex. The GAT uses a latent size of 32 channels with 4 attention heads. We perform 3 message-passing steps at each of the 4 levels: the base garment topology plus 3 coarser levels.
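To make that structure concrete, here is a minimal PyTorch sketch of a hierarchical GAT with 32 latent channels, 4 attention heads, and 3 message-passing steps at each of 4 levels. The fixed-size neighbor lists, class names, and residual update are illustrative assumptions rather than our production implementation (the neighbor-list design also anticipates the static-topology point discussed below).

```python
import torch.nn as nn


class FixedNeighborGAT(nn.Module):
    """Multi-head graph attention over a fixed-size neighbor list.

    A dense [N, K] neighbor index (padded to a constant K) keeps every tensor
    shape static and avoids scatter-reduce, at the cost of a little redundant
    compute on padded slots. Assumes each vertex has at least one valid neighbor.
    """

    def __init__(self, channels: int = 32, heads: int = 4):
        super().__init__()
        assert channels % heads == 0
        self.heads, self.dim = heads, channels // heads
        self.q = nn.Linear(channels, channels)
        self.k = nn.Linear(channels, channels)
        self.v = nn.Linear(channels, channels)
        self.out = nn.Linear(channels, channels)

    def forward(self, x, nbr_idx, nbr_mask):
        # x: [N, C] latent features, nbr_idx: [N, K] indices, nbr_mask: [N, K] bool
        N, K = nbr_idx.shape
        q = self.q(x).reshape(N, self.heads, self.dim)              # [N, H, D]
        k = self.k(x)[nbr_idx].reshape(N, K, self.heads, self.dim)  # [N, K, H, D]
        v = self.v(x)[nbr_idx].reshape(N, K, self.heads, self.dim)
        att = (q.unsqueeze(1) * k).sum(-1) / self.dim ** 0.5        # [N, K, H]
        att = att.masked_fill(~nbr_mask.unsqueeze(-1), float("-inf")).softmax(dim=1)
        agg = (att.unsqueeze(-1) * v).sum(dim=1).reshape(N, -1)     # [N, C]
        return x + self.out(agg)                                    # residual update


class HierarchicalClothGAT(nn.Module):
    """Encode -> (3 attention steps x 4 levels: base topology + 3 coarse) -> decode."""

    def __init__(self, in_feats: int, channels: int = 32, heads: int = 4,
                 levels: int = 4, steps: int = 3):
        super().__init__()
        self.levels, self.steps = levels, steps
        self.encode = nn.Linear(in_feats, channels)
        self.layers = nn.ModuleList(
            FixedNeighborGAT(channels, heads) for _ in range(levels * steps)
        )
        self.decode = nn.Linear(channels, 3)  # per-vertex velocity

    def forward(self, feats, level_nbrs, level_masks):
        # feats: [N, in_feats]; level_nbrs / level_masks: one [N, K] tensor per level,
        # where coarser levels simply use longer-range neighbor lists over the same vertices.
        h = self.encode(feats)
        layer = iter(self.layers)
        for lvl in range(self.levels):
            for _ in range(self.steps):
                h = next(layer)(h, level_nbrs[lvl], level_masks[lvl])
        return self.decode(h)  # [N, 3] predicted velocities
```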
Because we use a GAT, collision information is passed as vertex features rather than as dynamically constructed edges. This keeps the graph topology static and lets us eliminate scatter-reduce operations, which can be computationally expensive and are not well supported. The current model does include edge features, but importantly not stateful edge features. We also remove the overhead of dynamically shaped inputs: the number of edges and vertices stays constant, and only the connectivity changes from frame to frame to reflect which body vertex is currently closest.
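As an illustration of attaching collision information as vertex features with fully static shapes, the sketch below finds, for every garment vertex, its closest body vertex and packs the result into a fixed-size feature tensor. The radius value, feature layout, and function name are assumptions made for this example.

```python
import torch


def body_collision_features(garment_pos, body_pos, radius=0.05):
    """Per-vertex collision features with constant shapes frame to frame.

    garment_pos: [N, 3], body_pos: [M, 3]. The output is always [N, 5]
    regardless of how many vertices are actually near the body, so tensor
    sizes and graph topology never change; only the values do.
    """
    # Brute-force pairwise distances; a production pipeline would likely use
    # a spatial acceleration structure instead of torch.cdist.
    dist, idx = torch.cdist(garment_pos, body_pos).min(dim=1)
    closest = body_pos[idx]                           # [N, 3] closest body vertex

    in_range = (dist < radius).float().unsqueeze(-1)  # 1 inside the radius, else 0
    offset = (closest - garment_pos) * in_range       # zeroed when out of range

    # Offset to the body, distance, and a validity flag per garment vertex.
    return torch.cat([offset, dist.unsqueeze(-1), in_range], dim=-1)
```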
We profiled the model on device and identified a number of operators that are either unsupported on the GPU or perform sub-optimally; where appropriate, we swapped them for alternatives. We recommend doing this for any use case where performance is critical.
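As a starting point for that kind of investigation, a host-side pass with torch.profiler can flag expensive or exotic operators before moving to the target runtime’s own on-device tooling. The model and input below are stand-ins, not our network.

```python
import torch
from torch.profiler import ProfilerActivity, profile

# Stand-in for the cloth model; swap in the real network and its inputs.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 3)
)
feats = torch.randn(4424, 16)  # e.g. one feature row per garment vertex

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with torch.no_grad():
        model(feats)

# Sort by total CPU time to see which operators dominate a frame.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=15))
```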
Finally, we experimented with a range of configurations to accommodate different quality and performance targets. Our “tiny” model uses only 3 message-passing steps across the base topology and 2 coarse resolution levels, for even lighter workloads.
The system learns cloth behavior without any pre-labeled data. We trained it in a fully unsupervised way, using a physically based loss function covering bending, stretching, inertia, gravity, and collision. To keep the cloth grounded while allowing fluid movement, we stabilized simulations by pinning the waist vertices. The model is trained without skinning; during inference, its output is blended with a small amount of skinning (5-20%) after each frame. This improves stability without overly penalizing the dynamics.
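We do not reproduce the exact energy formulations or weights here, but a much-simplified sketch of an unsupervised, physics-based loss (stretching, inertia, gravity, and a collision penalty; bending and pinned-vertex handling are omitted) and of the inference-time skinning blend might look like the following. All coefficients are illustrative.

```python
import torch


def physics_loss(x, x_prev, x_prev2, edges, rest_len, body_pos, mass, dt,
                 w_stretch=1.0, w_inertia=1.0, w_gravity=1.0, w_collision=1.0,
                 eps=4e-3, g=9.81):
    """Simplified unsupervised loss over garment positions at t, t-1, t-2 ([N, 3]).

    edges: [E, 2] vertex indices, rest_len: [E], mass: [N], body_pos: [M, 3].
    Bending and the full material model are omitted; the y axis is assumed up.
    """
    # Stretching: edge lengths should stay close to their rest lengths.
    e = x[edges[:, 0]] - x[edges[:, 1]]
    stretch = ((e.norm(dim=-1) - rest_len) ** 2).mean()

    # Inertia: penalize deviation from the constant-velocity (ballistic) prediction.
    inertia = (mass * ((x - 2 * x_prev + x_prev2) ** 2).sum(-1) / (2 * dt ** 2)).mean()

    # Gravity: minimize the potential energy m * g * height.
    gravity = (mass * g * x[:, 1]).mean()

    # Collision: penalize garment vertices closer than eps to the body
    # (closest body vertex used as a cheap stand-in for the true surface).
    dist = torch.cdist(x, body_pos).min(dim=1).values
    collision = torch.relu(eps - dist).pow(2).mean()

    return (w_stretch * stretch + w_inertia * inertia
            + w_gravity * gravity + w_collision * collision)


def blend_with_skinning(x_pred, x_skinned, alpha=0.1):
    """Inference-time stabilization: mix 5-20% of the skinned pose back in."""
    return (1.0 - alpha) * x_pred + alpha * x_skinned
```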
Our dataset included nine stylized body types, about six garments per character, and around 300 Mixamo animations—giving the model a robust sense of how clothes should respond realistically across a range of human motion.
The system produced smooth and believable deformation, generalizing well across different garments and characters. Smaller variants of the model provide developers with flexible trade-offs between performance and visual fidelity, depending on the target hardware.
Inference runs separately for each garment, with the body serving as the only collision obstacle. Additional information can be incorporated, but the performance trade-offs need to be weighed carefully for each use case.
In our case, we assume that top garments sit over or on top of bottoms, and we implemented collision resolution as a post-processing step to enforce this (a simple sketch of the idea follows the figures below).
Figure 1: Results with “Baseline” model
Figure 2: Results with “Baseline” model
Figure 3: Results with “Baseline” model
This opens up a new level of realism for mobile games. Developers no longer need to pre-bake every cloth animation or design specific models for each outfit. The same neural engine can handle it all, dynamically and on the fly.
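The layering post-processing mentioned above is not detailed in this post. Purely as an illustration, one simple approximation is to treat distance to the body as an inside/outside proxy and pull offending bottom-garment vertices back underneath the top; every threshold and name below is hypothetical.

```python
import torch
import torch.nn.functional as F


def resolve_layering(bottom_pos, top_pos, body_pos, radius=0.02, margin=2e-3):
    """Push bottom-garment vertices back underneath nearby top-garment vertices.

    Distance to the closest body vertex serves as a cheap "layer depth" proxy:
    where a bottom vertex overlaps the top garment but sits farther from the
    body than the top does, it is pulled toward the body by the excess depth.
    """
    # Closest top-garment vertex for every bottom-garment vertex.
    d_top, idx_top = torch.cdist(bottom_pos, top_pos).min(dim=1)

    # Layer depth (distance to the body) for the bottom garment and for the
    # corresponding top-garment vertices.
    depth_bottom, idx_body = torch.cdist(bottom_pos, body_pos).min(dim=1)
    depth_top = torch.cdist(top_pos[idx_top], body_pos).min(dim=1).values

    # Offending vertices: overlapping the top garment yet sitting outside it.
    bad = (d_top < radius) & (depth_bottom > depth_top - margin)

    # Pull offenders toward their closest body vertex until they sit `margin` inside.
    direction = F.normalize(body_pos[idx_body] - bottom_pos, dim=-1)
    push = (depth_bottom - depth_top + margin).clamp(min=0.0).unsqueeze(-1)
    return torch.where(bad.unsqueeze(-1), bottom_pos + direction * push, bottom_pos)
```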
We tested it on two recently released phones, using both CPU and GPU configurations, to see how well it handles real-time inference. The results below use a T-shirt mesh with 4,424 vertices.
We compared three versions: our Baseline model, a Tiny variant, and HOOD, a state-of-the-art method that, like ours, is based on Graph Neural Networks. HOOD makes a good benchmark because it shares many structural similarities with our approach, offering a fair and meaningful comparison.
For HOOD, we were only able to gather performance data on the CPU due to unsupported operators on the GPU. Even so, our model achieved nearly a 4x speedup compared to HOOD, highlighting its efficiency and suitability for real-time mobile applications.
Figure 4: Performance measurements from two configurations on recently released phones
Figure 5: Results with our “Tiny” version of the model
We believe our model strikes a strong balance between quality and generalization. By training just once across a wide range of garments, body types, and animations, we have created a solution that adapts well to diverse scenarios—without the need for garment-specific tuning. Importantly, the training is fully unsupervised, meaning we don’t rely on ground-truth simulation data, which makes the approach far more scalable.
On the performance side, we have made significant progress compared to existing methods, especially in terms of real-time readiness on mobile. That said, there’s still room for further optimization. We’re confident the gap will continue to close as neural hardware accelerators improve and model efficiency advances. Looking ahead, our goal is to scale this system to handle multiple characters and garments simultaneously in real time, all while preserving enough compute budget for the rest of the frame’s processing needs.
If you are building mobile games, virtual try-ons or XR experiences, this work changes the game. Graph Neural Networks—once thought too heavy for mobile—are now within reach for real-time applications. With unsupervised training, studios can scale their production pipelines without needing complex simulation setups.
As neural hardware acceleration becomes more common in mobile devices, and with techniques such as model quantization, this approach will only get faster and more efficient. Whether you’re using Unity, Unreal, or a custom engine, neural clothing can elevate your characters with realism once reserved for AAA titles, without sacrificing FPS or thermal budgets.
We see this work as just the beginning. Similar neural methods could soon power real-time material deformation, neural lighting and radiance caching, or even full-scene reconstructions using techniques such as NeRFs and Gaussian splatting.
We are actively seeking collaborators—both from academia and industry—who want to help bring this research into production. If you are interested in shaping the future of mobile graphics, we would love to hear from you.