With the announcement of Arm neural technology, Arm is enabling neural networks and a new class of neural graphics capabilities to run efficiently on mobile GPUs. Neural Super Sampling (NSS), denoising, and machine learning (ML)-powered rendering enhancements are just the start.
Today, we are excited to share that ExecuTorch, with its new General Availability (GA) release, now includes support for Arm neural technology through the VGF backend. This backend provides a complete ahead-of-time (AOT) export and runtime execution path for a large set of neural networks targeting Arm's next-generation neural GPU acceleration. It also enables export for direct use in game engines.
This builds on the foundations Arm has laid over the past few years. A key enabler is TOSA (Tensor Operator Set Architecture), which standardizes ML operators for acceleration on Arm platforms. Thanks to TOSA, ExecuTorch could already target Arm Ethos-U-based devices, and that same infrastructure now extends to future neural-technology-capable hardware, providing:
- Consistent behavior and high performance across Arm accelerator technology, be that Ethos-U or the neural technology in future Arm GPUs, thanks to the TOSA standard.
- A suite of open-source software for working with TOSA (including compilers, torch.fx passes, and an MLIR dialect) that interoperates cleanly with both PyTorch and ExecuTorch.
This continues the story we told in two earlier blog posts: "ExecuTorch and TOSA" and "ExecuTorch support for Ethos-U85".
What makes this new support possible is the VGF backend. It introduces an ahead-of-time compilation flow and a runtime integration that bridge the gap between PyTorch models and efficient deployment on neural-technology-capable hardware. The backend provides tooling to export models as portable files, load them through the ExecuTorch runtime, and execute them on a VGF emulator. This makes it possible to develop networks on a standard ML development platform today and target future Arm GPUs when they arrive.
To run this example, first install ExecuTorch and set up the Arm backend dependencies:
```bash
pip install executorch
./examples/arm/setup.sh --i-agree-to-the-contained-eula --disable-ethos-u-deps --enable-mlsdk-deps
```
A short Python program then produces the exported model as a PTE file.
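The snippet below is a minimal sketch of that export flow. The `VgfCompileSpec` and `VgfPartitioner` names (and their `executorch.backends.arm.vgf` module path) are assumptions based on the Arm backend in the GA release; the capture and lowering calls (`torch.export.export`, `to_edge_transform_and_lower`, `to_executorch`) are standard ExecuTorch APIs. Check the example notebooks for the exact names in your release.

```python
import torch

# NOTE: the `vgf` module path and class names are assumptions based on the
# Arm backend's layout in the GA release; consult the ExecuTorch example
# notebooks for the exact API in your version.
from executorch.backends.arm.vgf import VgfCompileSpec, VgfPartitioner
from executorch.exir import to_edge_transform_and_lower


class AddModule(torch.nn.Module):
    """Trivial network matching the "add" example used elsewhere on this page."""

    def forward(self, x, y):
        return x + y


example_inputs = (torch.ones(1, 4), torch.ones(1, 4))

# 1. Capture the model with torch.export.
exported_program = torch.export.export(AddModule().eval(), example_inputs)

# 2. Lower to the Edge dialect, delegating supported subgraphs to the VGF backend.
edge_program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[VgfPartitioner(VgfCompileSpec())],
)

# 3. Serialize to a PTE file for the ExecuTorch runtime.
executorch_program = edge_program.to_executorch()
with open("add_module_vgf.pte", "wb") as f:
    f.write(executorch_program.buffer)
```

The resulting add_module_vgf.pte is the file passed to the executor_runner below.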
You can also explore the example models in the ExecuTorch examples/models tree:
```bash
python3 -m examples.arm.aot_arm_compiler -t vgf --delegate --model_name="add" -i ./out_add -o out_add.pte
```
A PTE file produced by either flow can then be executed by building and using the example executor_runner:
```bash
# Set up the target build environment (host Linux with the MLSDK emulator)
./setup.sh --disable-ethos-u-deps --enable-mlsdk-deps
source examples/arm/ethos-u-scratch/setup_path.sh

# Build the ExecuTorch runtime
cmake --preset linux -DEXECUTORCH_BUILD_VULKAN=ON -DEXECUTORCH_BUILD_VGF=ON \
  -DCMAKE_INSTALL_PREFIX=cmake-vgf -Bcmake-vgf
cmake --build cmake-vgf -j$(nproc) --target executor_runner

# Run the produced PTE file with the runtime example application
./cmake-vgf/executor_runner -model_path add_module_vgf.pte
```
For further details and many additional options to tailor the flow to your requirements, take a look at our ExecuTorch example notebooks.
Better still, these networks can be used directly in game engines for use cases such as NSS.
We invite developers to try out the VGF backend today. By doing so, you will be ready to target Arm neural technology as it arrives in upcoming generations of Arm GPUs.
To help you get started, here are some resources: