Generative AI applications for mobile devices have evolved beyond traditional chatbots and virtual assistants into far more capable intelligent features. Current capabilities range from sound generation and image captioning to mathematical reasoning and summarization across text, audio, video, and group chats. Looking ahead, developers can use SME2 for advanced image processing, multi-modal AI, and interactive NPC speech in games.
Arm has introduced Scalable Matrix Extension 2 (SME2), a set of advanced instructions in the Armv9 architecture to accelerate matrix multiplications common in AI workloads across a wide range of domains. SME2 enables these complex workloads to run directly on power-efficient mobile devices.
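To see what SME2 is speeding up, it helps to recall that a matrix multiplication can be decomposed into a sum of rank-1 outer products, one per step of the inner dimension; this is the formulation that SME2's tile-based outer-product instructions execute in hardware. The plain-C sketch below is an illustrative scalar reference of that decomposition, not SME2 code:

```c
#include <stddef.h>

/* Reference matmul written as a sum of outer products: for each kk,
 * accumulate (column kk of A) x (row kk of B) into the M x N result C.
 * SME2 applies this same rank-1 update to a whole tile of C with a
 * single outer-product instruction. */
void matmul_outer_product(const float *a, const float *b, float *c,
                          size_t m, size_t n, size_t k) {
    for (size_t i = 0; i < m * n; ++i) c[i] = 0.0f;
    for (size_t kk = 0; kk < k; ++kk) {              /* one rank-1 update per kk */
        for (size_t i = 0; i < m; ++i) {
            float a_ik = a[i * k + kk];              /* column kk of A */
            for (size_t j = 0; j < n; ++j) {
                c[i * n + j] += a_ik * b[kk * n + j];/* row kk of B */
            }
        }
    }
}
```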
And the very best part is that developers do not need to change a single line of code to take advantage of SME2 in their models or applications. Arm has ensured the seamless integration of SME2 into leading AI frameworks and runtimes in the ecosystem through KleidiAI. These integrations mean that SME2 should already be embedded within the software stack for developers, provided their apps use the supported frameworks and runtimes.
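To make the "no code changes" point concrete, here is a minimal sketch of ordinary TensorFlow Lite inference through its C API, one of the runtimes with KleidiAI integration. Nothing in it mentions SME2: when the runtime is built with XNNPack (the default in recent releases), the best available micro-kernels, including SME2 ones on capable hardware, are selected automatically. The model path and buffer sizes are placeholders:

```c
#include <stdio.h>
#include "tensorflow/lite/c/c_api.h"

int main(void) {
    /* Placeholder model path; any .tflite model works the same way. */
    TfLiteModel *model = TfLiteModelCreateFromFile("model.tflite");
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    /* No SME2- or KleidiAI-specific options: the default XNNPack path
     * picks the fastest micro-kernels for the host CPU at runtime. */
    TfLiteInterpreterOptions *opts = TfLiteInterpreterOptionsCreate();
    TfLiteInterpreterOptionsSetNumThreads(opts, 4);

    TfLiteInterpreter *interp = TfLiteInterpreterCreate(model, opts);
    TfLiteInterpreterAllocateTensors(interp);

    float input[128] = {0};  /* placeholder: match your model's input size */
    TfLiteTensor *in = TfLiteInterpreterGetInputTensor(interp, 0);
    TfLiteTensorCopyFromBuffer(in, input, sizeof(input));

    TfLiteInterpreterInvoke(interp);

    float output[128];       /* placeholder: match your model's output size */
    const TfLiteTensor *out = TfLiteInterpreterGetOutputTensor(interp, 0);
    TfLiteTensorCopyToBuffer(out, output, sizeof(output));

    TfLiteInterpreterDelete(interp);
    TfLiteInterpreterOptionsDelete(opts);
    TfLiteModelDelete(model);
    return 0;
}
```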
Accelerating AI workloads on Android with XNNPack
The first AI inference library to support SME2 is XNNPack, a neural network inference library widely used to accelerate machine learning (ML) frameworks on Arm, from PyTorch and ExecuTorch to MediaPipe and TensorFlow Lite. Together, Arm and Google accelerated Int4 matrix multiplications in the Gemma 3 model using Int8 outer-product instructions, achieving a 6x speedup in chatbot response times with SME2 enabled.
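The core trick is worth sketching: the 4-bit weights are unpacked to signed 8-bit values and multiplied against Int8 activations with 32-bit accumulation, the shape that SME2's Int8 outer-product instructions execute across whole tiles. Below is a hedged scalar illustration of that scheme, not XNNPack's actual kernel; the packing layout and quantization parameters are simplified assumptions:

```c
#include <stdint.h>
#include <stddef.h>

/* Extract a signed 4-bit weight from a packed byte stream (two weights
 * per byte, low nibble first). Real kernels use different layouts. */
static inline int8_t get_int4(const uint8_t *w, size_t idx) {
    uint8_t b = w[idx / 2];
    return (idx & 1) ? (int8_t)b >> 4 : (int8_t)(b << 4) >> 4;
}

/* One output element of an Int4-weight matmul: Int8 activations times
 * unpacked Int4 weights, accumulated in int32, dequantized at the end.
 * SME2 performs the multiply-accumulate step with Int8 outer products. */
float dot_int4(const int8_t *act, const uint8_t *w_packed,
               size_t k, float scale) {
    int32_t acc = 0;
    for (size_t kk = 0; kk < k; ++kk)
        acc += (int32_t)act[kk] * (int32_t)get_int4(w_packed, kk);
    return (float)acc * scale;  /* single scale; real kernels use per-channel */
}
```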
With SME2 on a single CPU core, Gemma 3 can also begin summarizing a four-paragraph page (around 800 words) in under one second. Read the full SME2 optimization analysis in this blog post, which explains how it works and how these results are the culmination of a year spent optimizing XNNPack to give developers seamless, transparent AI performance.
We are thrilled to be at WeAreDevelopers World Congress 2025 in Berlin, Germany this week, introducing SME2 to one of the world’s leading events for developers and AI innovators. Join Gian Marco Iodice this Thursday July 10th at 2:10PM local time (CEST) on Stage 1 as he shares an in-depth overview of SME2 technology and how developers can take advantage of it right away.
If you would like to get hands-on experience accelerating AI applications on mobile devices, join us for the 2-hour workshop Mobile AI on Arm: AI Text and Audio Generation Entirely on Device, Thursday July 10th, 1:00PM-3:00PM local time. This two-part workshop walks through building AI text and audio generation entirely on device.
If you are not at WeAreDevelopers this week, no worries, we have a fantastic audio generation Code-Along and Expert Q&A planned for next month:
Register here
While next-generation Android mobile devices capable of SME2 acceleration are on the horizon, there is no need to wait. With SME2 already available across the latest iOS devices, developers can start building AI applications on top of a wide range of AI frameworks and runtime libraries that have native support for SME2.
The SME2-enhanced performance in your applications will then be portable across Arm-based platforms, from iOS and iPadOS to macOS and Android.
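If you want to verify at runtime that a device actually exposes SME2, a minimal detection sketch follows. It assumes the HWCAP2_SME2 auxiliary-vector bit on Linux/Android and the hw.optional.arm.FEAT_SME2 sysctl key on Apple platforms; both are OS-version dependent, so treat a missing flag as "not detected" rather than proof of absence:

```c
#include <stdbool.h>
#include <stdio.h>

#if defined(__linux__)
#include <sys/auxv.h>
#include <asm/hwcap.h>

static bool has_sme2(void) {
#ifdef HWCAP2_SME2
    /* Bit is only defined on recent kernels/headers. */
    return (getauxval(AT_HWCAP2) & HWCAP2_SME2) != 0;
#else
    return false;  /* headers too old to know */
#endif
}

#elif defined(__APPLE__)
#include <sys/sysctl.h>

static bool has_sme2(void) {
    int v = 0;
    size_t len = sizeof(v);
    /* Key assumed present on M4-class devices; a missing key reads as false. */
    if (sysctlbyname("hw.optional.arm.FEAT_SME2", &v, &len, NULL, 0) != 0)
        return false;
    return v != 0;
}

#else
static bool has_sme2(void) { return false; }
#endif

int main(void) {
    printf("SME2 %s\n", has_sme2() ? "available" : "not detected");
    return 0;
}
```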
Check out our new Arm Developer Launchpad for SME2, a one-stop shop with all of the information and developer resources you need to learn more about SME2, accelerated AI use cases, and hands-on code examples and tutorials to explore the performance.
And if you are ready to see the performance benefits of SME2 first-hand, you can get started today on an Apple M4-based device or iPhone 16 with this learning path on accelerating matrix multiplication with SME2.
Get started building the AI apps of tomorrow, today!