In this blog post, I will give an overview of Python on Arm platforms, focusing on the Linux and Windows operating systems.
Let me start by introducing myself. My name is Diego Russo and in October 2023, I celebrated 12 years at Arm. During my time at Arm, I have had a variety of different roles. However, they all have one thing in common: Python.
During my 18 years working, I have used a variety of programming languages and Python has always been there. Python is popular because of its versatility. It can be used in different environments. For example, I have used it for:
Since 2011, I have been attending EuroPython. I was part of the organizing team for the last two events, in 2022 and 2023, and I am also helping to organize this year's edition in 2024. At EuroPython 2023, I gave a talk: Python on Arm. Some of the information from that presentation is reported in this blog post, along with other important updates that we have been working on since then.
First, I want to recognize the efforts of the upstream community to enable Arm architectures. Python and Arm share a long history: the first Arm-related commit is from 2001 and the first AArch64-related commit is from 2012. In 2019, PEP 599 was created and accepted. This PEP defines the manylinux2014 platform tag, and it is important because it officially introduced support for Arm platforms.
There are examples where the community has made great progress in enabling Python packages on AArch64. Conda Forge is a community-led collection of recipes, build infrastructure, and distributions for the Conda package manager (distinct from pypi.org). They have done a fantastic job of migrating available packages to AArch64. In fact, you can check from the Conda migration status page that the majority of packages are available on AArch64.
Arm and partners joined the community effort to enable the Python ecosystem on AArch64. As part of the first launch of Arm instances in the public cloud, we wanted developers to have a smoother experience when dealing with the Python ecosystem. For example, if a developer installs a built distribution package, it should work without falling back to a recompilation of the source distribution. Starting in 2020, almost 2900 packages were analyzed. This was achieved by testing them on x86 and AArch64 and checking what the issues were. The failing ones were sorted into a priority list and fixed. After two years, more than 200 Python projects have been enabled by generating AArch64 built distributions. As an example, I wrote a learning path on how to deploy a Django application on an Arm server. It just works and no extra steps are needed.
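To see which built distributions pip will look for on your machine, you can inspect the platform identifiers Python reports. This is a minimal sketch using only the standard library; the exact strings vary by OS and interpreter build, but on an Arm server you would typically see aarch64 here, which is what manylinux2014_aarch64 wheels match:

```python
import platform
import sysconfig

def wheel_platform() -> str:
    """Return the platform string used when selecting a built distribution."""
    # e.g. 'linux-aarch64' on an Arm Linux server, 'linux-x86_64' on Intel/AMD,
    # 'win-arm64' on Windows on Arm
    return sysconfig.get_platform()

print("machine:", platform.machine())
print("wheel platform:", wheel_platform())
```

If the output shows aarch64 and a package still compiles from source on install, that usually means no AArch64 wheel has been published for it yet, which is exactly the gap this enablement work addressed.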
So far, we have seen the enablement of the Python ecosystem on AArch64. But what about performance? If you go to speed.python.org, you can see AArch64 performance across a range of benchmarks. Last year, only x86 results were present, and we have been working on getting AArch64 results enabled. You can see the beginning of the engagement in the Python forum. The website is powered by codespeed, a Django application that runs the web interface, the API and the database. The benchmarks are part of the pyperformance benchmark suite. Pyperformance can upload benchmark data directly into codespeed, and the two are well integrated. I have been working with Łukasz Langa, who has supported me in the development and deployment of the fixes, because the website needed some care. We have integrated AArch64 results and backfilled data back to May 2023.
Windows on Arm (WoA) support was added in Python 3.8, but no official builds were available until 2022. Python 3.11 is the first release to officially support WoA, and this has been possible thanks to a joint effort between Arm, Qualcomm, Microsoft, CIX Technology and Linaro. The overall goal of this partnership is to have an ecosystem which supports native development on WoA. Of course, Python is part of this ecosystem, but other applications have been enabled as well.
Similar to the AArch64 enablement, the top 520 packages have been tested on WoA. Over 70 percent of these packages passed. The work covered not only the Python packages themselves, but also the dependencies needed to build them, such as third-party libraries and toolchain packages. Also, Linaro has been hosting a Surface Pro X in the official CPython buildbot fleet to enable native arm64 builds of Python. If you want to build a package for WoA, you can do so using cross-compilation on any other platform.
The main reason for building native WoA applications is performance. Windows 11 on Arm can run binaries compiled for other architectures under emulation. However, the overhead of translating between different instruction sets is significant. In the case of pyperformance, natively compiled AArch64 Python is almost twice as fast as the emulated x86 version. This has a direct impact on the user experience of the application.
Time normalized in Python.
We also want to briefly touch on the activities that we have been doing in the Machine Learning (ML) space. At the beginning of 2020, I was working in the team that collaborates with Google on TensorFlow Lite and the TensorFlow Model Optimization Toolkit. TensorFlow Lite is a library for deploying models on mobile, microcontrollers (MCUs) and other edge devices. The contributions to the library include the introduction of the int16 post-training quantization mode for activations and the corresponding reference kernels, continuous improvements in quantization support, maturing the TFLite-to-TOSA legalization code, and engagement around 4-bit support in TFLite.
The TensorFlow Model Optimization Toolkit is a suite of tools for optimizing ML models for deployment and execution. We have contributed different optimizations and ways to apply them in combination.
Thanks to a collaboration between AWS, Arm, Google and Linaro, AArch64 packages are available on pypi.org starting with TensorFlow 2.10. Also, since TensorFlow 2.10, the Compute Library for the Arm Architecture (ACL) is integrated through oneDNN to accelerate performance on AArch64 CPUs. So, if you are on an AArch64 machine, you can just pip install tensorflow and it will be optimized for the platform.
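After installing the wheel, you can inspect how it was built. A minimal sketch, assuming TensorFlow 2.10 or later is installed (the try/except keeps it safe to run on a machine without TensorFlow):

```python
def tf_build_summary() -> str:
    """Report the installed TensorFlow version and its build configuration."""
    try:
        import tensorflow as tf
        # get_build_info() returns a dict describing how this wheel was built
        return f"TensorFlow {tf.__version__}: {dict(tf.sysconfig.get_build_info())}"
    except ImportError:
        return "TensorFlow is not installed; try: pip install tensorflow"

print(tf_build_summary())
```

On an AArch64 machine with the pypi.org wheel, this confirms you are running the officially published build rather than a locally compiled one.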
For PyTorch, AArch64 packages have been available since version 1.8, but if you want to test a bleeding-edge version of PyTorch, Arm provides Docker images. These contain both OpenBLAS, which is the default backend, and the oneDNN+ACL backend.
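You can check from Python which backend your PyTorch build exposes. A small illustrative sketch (PyTorch surfaces oneDNN under its historical name, mkldnn; the try/except keeps it runnable without PyTorch installed):

```python
def torch_backend_summary() -> str:
    """Report the installed PyTorch version and whether oneDNN is available."""
    try:
        import torch
        # torch.backends.mkldnn reports the oneDNN backend's availability
        onednn = torch.backends.mkldnn.is_available()
        return f"PyTorch {torch.__version__}, oneDNN available: {onednn}"
    except ImportError:
        return "PyTorch is not installed; try: pip install torch"

print(torch_backend_summary())
```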
If you want to experiment with Keras Core and different ML backends (TensorFlow, PyTorch and JAX) on AArch64, have a look at this learning path I wrote. It just works!
Other contributions include:
One of the gaps we identified at the time of the presentation was the CPython platform support for Arm platforms defined in PEP 11: Arm platforms currently sit in both Tier-2 and Tier-3. Recently, we have engaged with the community with a proposal to move AArch64 platforms to Tier-1. The reception has been positive, but more work is needed before promoting the platform to Tier-1. Watch this space.
So, what's next? Our request is that you try migrating your workloads to Arm. The migration should be seamless and painless, but if you see any issues, please raise them with the upstream communities. Every developer can now access Arm platforms on all major clouds, so I recommend that you give it a try.
Also, if you provide a package, start building and validating it on Arm platforms. We saw that there are real performance benefits on WoA, with more and more people migrating workloads to AArch64.
Finally, we are here to help. Feel free to engage directly with us via the Arm Developer Program. This is a community of almost 10,000 developers where you can ask questions and share your expertise. We have a Discord server where our Arm experts can help you with your queries. Visit arm.com/developerprogram and sign up.