With each new generation, Arm CPUs deliver significant performance improvements and introduce architectural advancements that cater to the needs of evolving compute workloads. In this blog post, we highlight three use cases demonstrating the proven impact of the architectural features of Armv9 CPUs in real-world scenarios, particularly the SVE2 vector extension.
The good news is that some of Arm’s SVE2 optimizations, discussed in this blog post, are available for developers to use now. They have the potential to enhance the user experience in the most popular media apps that shape how we communicate, work, and entertain ourselves.
First, it is important to explore the current challenges for mobile app developers. There are more than 2 million Android apps in the market today, competing for user adoption. To remain competitive, apps need to land their innovations quickly across a large cross-section of mobile devices, and relying on fixed-function hardware makes both time to market and portability harder. Metrics synonymous with a great end-user experience, such as app launch time, UI fluidity, tokens per second, and frames per second (FPS) stability, need to meet real user expectations. At the same time, OEMs need to balance performance improvements against wider user needs, such as longer battery life, reduced data usage, and the cost of the device. Falling short on any of these parameters is likely to leave users dissatisfied and questioning the value of upgrading their mobile devices.
Developing software for Armv9 CPUs can address these challenges for both OEMs and developers.
Let us look at three case studies where software optimizations have been proven to accelerate real-world workloads. First, as a refresher, here is a subset of the new SVE2 vector instructions in Armv9 CPUs that accelerate key workloads on mobile devices:
Using these vector instructions allows optimized software to use fewer CPU cycles, which has two key benefits: longer battery life, because fewer CPU cycles translate to lower energy consumption, and improved app performance.
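To make this concrete, here is a minimal sketch of a predicated SVE2 loop written with Arm C Language Extensions (ACLE) intrinsics. It is not code from any of the libraries discussed below; the function and buffer names are hypothetical, and it assumes a compiler invoked with an SVE2-enabled flag such as -march=armv9-a+sve2. It saturating-adds two buffers of 16-bit samples, the kind of container used for 10-bit video pixels, while the governing predicate handles the loop tail so no scalar fallback is needed.

```cpp
#include <arm_sve.h>
#include <cstdint>

// Hypothetical helper: saturating add of two arrays of 16-bit samples,
// processing one full hardware vector per iteration regardless of the
// vector length of the CPU it runs on.
void add_sat_u16(const uint16_t *a, const uint16_t *b, uint16_t *dst, int64_t n) {
    for (int64_t i = 0; i < n; i += static_cast<int64_t>(svcnth())) {
        svbool_t pg = svwhilelt_b16(i, n);      // predicate masks the loop tail
        svuint16_t va = svld1_u16(pg, a + i);   // predicated contiguous loads
        svuint16_t vb = svld1_u16(pg, b + i);
        svuint16_t vr = svqadd_u16(va, vb);     // SVE2 saturating vector add
        svst1_u16(pg, dst + i, vr);             // predicated store
    }
}
```

Because the loop asks the hardware how many 16-bit lanes it has (svcnth) instead of hard-coding a width, the same binary scales across Armv9 CPUs with different vector lengths.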
Consuming multimedia content is one of the most common workloads on mobile devices and the biggest source of traffic on mobile networks. So there is a continuous drive for more efficient codecs that conserve network bandwidth while also delivering great image quality.
HDR technology brings more life-like detail to even very dark and very bright scenes thanks to more accurate color. It uses 10 bits, instead of 8, to represent each color channel. Both AV1 and VP9, along with other modern codecs, support HDR video.
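One practical consequence for decoders is illustrated by the hypothetical helper below: a 10-bit sample does not fit in a byte, so it is stored in a 16-bit container and arithmetic results are clamped to the 10-bit range of 0 to 1023. This is the kind of 16-bit data that the SVE2 loop sketched earlier operates on.

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative only (the helper name is hypothetical): a 10-bit HDR sample is
// kept in a 16-bit container, and results of arithmetic such as filtering or
// prediction are clamped back to the valid range [0, 1023].
static inline uint16_t clamp_10bit(int32_t value) {
    return static_cast<uint16_t>(std::clamp(value, 0, 1023));
}
```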
AV1 is a more recent format and offers better compression, while VP9 has wider compatibility across browsers and devices. Popular apps like Netflix, Instagram, Facebook, and YouTube, among others, use the AV1 and VP9 formats for their videos. For example, libdav1d, an open-source AV1 decoder, is bundled in the Facebook app.
The SVE2 optimization accelerated HDR video decode by around 10 percent, with an 8 percent uplift in VP9 decode and a 10 percent uplift in AV1 decode. This translates to a roughly 10 percent reduction in CPU cycles and an equivalent reduction in battery consumption, allowing users to enjoy longer battery life while streaming on-demand video on their mobile devices. Essentially, watching Facebook and Instagram reels, YouTube shorts and videos, and Netflix content just got better!
Optimizations to libdav1d (the AV1 decoder) and libvpx (the VP9 decoder) have been upstreamed and are available for developers to use now.
It is important to note that everyone uses LibYUV without even realizing it.
LibYUV is an open-source library used for color space conversion (between RGB and YUV), scaling of camera sensor data, and camera filtering and rotation. It processes data coming in from the camera sensor before it is consumed by the video encoder, and in many cases the output of the video decoder passes through LibYUV before it is sent to the display.
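As an illustration of where LibYUV sits in that pipeline, the sketch below converts one decoded I420 (planar YUV 4:2:0) frame into interleaved ARGB for display using LibYUV's I420ToARGB. The wrapper function name and the assumption that the planes are tightly packed (stride equal to width) are illustrative.

```cpp
#include "libyuv.h"
#include <cstdint>

// Hypothetical wrapper around one of the conversion kernels that the SVE2
// work accelerates: I420 (planar YUV 4:2:0) to interleaved ARGB.
bool i420_frame_to_argb(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                        int width, int height, uint8_t* argb) {
    // The Y plane is full resolution; the U and V planes are subsampled 2x2.
    int ret = libyuv::I420ToARGB(y, width,
                                 u, width / 2,
                                 v, width / 2,
                                 argb, width * 4,  // 4 bytes per ARGB pixel
                                 width, height);
    return ret == 0;                               // LibYUV returns 0 on success
}
```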
The SVE2 optimization accelerates LibYUV by 26 percent (geomean of multiple kernels across Armv9 CPUs). Around 100 kernels in LibYUV have been optimized with SVE2, and work on further kernels is in progress. Some of the work has already been upstreamed and can be found here.
LibYUV is distributed as part of Chromium, the open-source browser project that is the foundation for Chrome and for custom browsers from leading mobile OEMs, including Mi Browser and Samsung Internet, among others. It is also integrated into AOSP and Android Jetpack. Since LibYUV is so integral to mobile devices, it has the potential for a far-reaching impact on the overall mobile experience: from better video conferencing and smoother rotation between portrait and landscape modes, to improved video playback with far better battery life.
Halide is a domain-specific language for image processing. It is used by apps like Adobe Photoshop and by some of the OEMs for their camera pipelines.
SVE2 instructions such as Gather Load, Scatter Store, and TBL (programmable table lookup, used to vectorize small lookup tables) have accelerated some of the key computer vision (CV) pipelines in Halide. iToFDepth (for sensing depth), Bilateral Grid (for edge-aware tone mapping), and Local Laplacian (for filtering) are just some of the compute-intensive algorithms that have seen nearly 20 percent uplift with SVE2.
Optimizing software with SVE2 can enable some of these photographic effects to be applied in real time, opening new possibilities for entry-tier mobile devices, where users can enjoy better-quality photos without dedicated hardware.
Arm has optimized the Halide back-end for SVE2 code generation. Some of these patches have already been upstreamed, while others are in progress.
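To show what targeting that back-end looks like from a developer's point of view, here is a minimal sketch, not one of the pipelines measured above: a 3x1 box blur expressed in Halide, scheduled with vectorize(), and ahead-of-time compiled for a target with the SVE2 feature enabled. The pipeline and output names are illustrative, and the availability of the Target::SVE2 feature is assumed to depend on your Halide build.

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    Var x("x"), y("y");
    Func input("input"), blur("blur");

    input(x, y) = cast<uint16_t>(x + y);  // stand-in for a real image source
    blur(x, y) = (input(x - 1, y) + input(x, y) + input(x + 1, y)) / 3;

    // Vectorize the innermost loop; on an SVE2-capable target the Arm
    // back-end can lower this to vector instructions instead of scalar code.
    blur.vectorize(x, 16);

    // Ahead-of-time compile for an Armv9 Android device (flags illustrative).
    Target target("arm-64-android");
    target = target.with_feature(Target::SVE2);  // assumption: SVE2 feature flag in your build
    blur.compile_to_static_library("blur3x1", {}, "blur3x1", target);
    return 0;
}
```

The same Halide schedule can be recompiled for Neon or SVE2 targets without rewriting the algorithm, which is what makes back-end optimizations like this transparent to app developers.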
Graph showing the comparison between Halide-SVE2 and Halide-Neon CPU cycles.
Example image showing depth effects.
Example image showing edge-aware tone mapping.
SVE2 introduces several new instructions that are ideal for accelerating key real-life workloads and apps. Our upcoming technical blog posts will describe in greater detail how some of these performance uplifts were achieved on Armv9 CPUs.
Arm is committed to bringing the right balance of developer enablement and performance enhancement to the ecosystem. Some of the open-source libraries and kernels optimized for SVE2 have already been upstreamed, and there is more to come.
Targeting the latest developments in Armv9 CPUs will enable developers to land innovations faster and bring ever-better user experiences to consumers across all tiers of mobile devices.
The time to adopt and build with SVE2 is now!