The proliferation of alternative platforms across cloud, edge, desktop, and laptop heralds a new era of both opportunity and complexity. Opportunity arrives because different platform characteristics can improve performance, security, efficiency, latency, and more. Complexity arrives when we actively exploit that opportunity and need to manage the lifecycle of our application across alternative platforms. For example, your app may target AArch64 at the edge and Armv7 in a gateway device, while also including cloud components running on legacy x86.
This is the world of multi-platform software engineering. In this blog I am going to explore how, with care, your application can be made multi-platform ready. We use Python and the Python ecosystem for this example, but the concepts are applicable across many language ecosystems.
A multi-platform application has three key characteristics:
Portability is a measure of a code’s ability to run on a different platform and is a desirable property of good code. We can observe that long-lived high-performance computing (HPC) code bases have migrated across different platforms as platform popularity in HPC has ebbed and flowed. Wikipedia’s page on portability provides a good overview of the benefits and value of portability as a property of software quality.
Distribution is a property of modern applications seeking robustness in a world of massive scale and inevitable component failure. Microservice architecture attempts to codify the design decisions that make a distributed application possible. Developers who can manage the complexity of a distributed app gain two benefits: resiliency, because no single point of failure halts the app, and reduced overhead, because they can move parts of the app to a faster or less expensive host.
Awareness means that a component of the app is aware of, and able to exploit, unique properties of the host platform. For example, hardware cryptography extensions should be used if present. In practice, when an app is written against a runtime environment (for example, Java or Python), the property of platform awareness is typically implemented in libraries that are introduced to the app as dependencies.
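As a minimal sketch of platform awareness in Python, an app might check the host architecture at run time and prefer an accelerated code path when one exists. The "fastcrypto" module named here is hypothetical and stands in for any library with per-platform native code:

```python
# A minimal sketch of platform awareness: check the host architecture at run
# time and prefer a platform-optimized dependency when it is available.
# The "fastcrypto" module below is hypothetical.
import platform

machine = platform.machine()  # for example "aarch64", "armv7l", or "x86_64"

try:
    import fastcrypto  # hypothetical library with per-platform native code
    backend = "hardware-accelerated"
except ImportError:
    backend = "pure Python fallback"

print(f"Running on {machine} using the {backend} crypto backend")
```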
Multi-platform computing is not new. In the micro, exploiting different configurations within a single machine is described as heterogeneous computing. In the macro, using multiple different machines federated together as a distributed application was pioneered by grid computing.
Multi-platform code is higher-quality code. It is well tested and fit for the opportunities of today and tomorrow. Arm makes the step from mono-platform to multi-platform easiest, with the largest choice of platforms and tooling to fit every budget and configuration.
If we look at a typical software lifecycle (shown previously), each phase requires us to consider how our app works in a multi-platform environment. In this blog, I focus on the build, test, and release phases, taking each in turn and using Python as our chosen language.
Python is a hugely popular language with a vigorous and active development community. The Python library ecosystem is extensive and has an artifact distribution system that supports multiple platforms. An understanding of multiple platforms within the artifact distribution system greatly simplifies managing multi-platform complexity. Many of today’s most important toolsets for machine learning and data science choose Python as a primary language.
Python programs are written as human readable text. This text is compiled by the Python interpreter to produce 'pyc' files. These pyc files are typically executed by Python's virtual machine. This design means that programs written in Python run anywhere the Python virtual machine is present. In some cases, however, the behavior of the Python virtual machine is unsuitable for the program being written. A programmer can choose to call out to native code and bypass the Python virtual machine. A common motivation to bypass the Python virtual machine is to improve performance.
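As a small illustration of that bypass, here is one way a Python program can call native code directly. This is a minimal sketch assuming a Linux host where the C library is available as libc.so.6:

```python
# A minimal sketch of bypassing the Python virtual machine: call native code
# directly through ctypes. Assumes a Linux host exposing the C library as
# "libc.so.6".
import ctypes

libc = ctypes.CDLL("libc.so.6")   # load the platform's C library
print(libc.getpid())              # invoke a native function from Python
```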
One example of a popular library that has a Python virtual machine bypass is lxml. This library provides XML tooling to a Python application by exposing the popular C libraries libxml2 and libxslt. Because there is a dependency on C libraries, if you build an application in Python using lxml, your Python application will also need these C libraries. C libraries must be compiled for the platform (AArch64, x86, and so on) that they execute on.
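To see why such packages become platform-specific, consider a minimal, hypothetical package that ships a C extension. The wheel built from it contains compiled code and is therefore tagged for a single architecture (the package name, module name, and source file below are all illustrative):

```python
# setup.py for a hypothetical package with a C extension. Building it produces
# a wheel containing compiled code, so the wheel is specific to one platform
# (for example manylinux2014_aarch64 or manylinux2014_x86_64).
from setuptools import setup, Extension

setup(
    name="fastxml",                        # hypothetical package name
    version="0.1.0",
    ext_modules=[
        Extension("fastxml._speedups",     # native module compiled per platform
                  sources=["src/speedups.c"]),
    ],
)
```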
One of the challenges software engineers face today is maintaining platform neutrality while pursuing performance. Platform-neutral applications cost nothing to port to a new platform, and moving to a new platform can save money or improve one or more performance characteristics. As the network edge grows in importance as a venue for hosting low-latency, high-bandwidth applications, the ability to run code at the edge becomes a commercial advantage. Platform-neutral code lets a developer be first to run on the edge regardless of the underlying architecture (AArch64, x86, or other).
Since I began working on ecosystem enablement for Arm, a lot has changed. Today, there are many options for building natively, whether using CI/CD offerings, something local on your desktop, or a cloud instance. Emulation has become significantly better and more convenient with containers and tools like buildx. Before I look at emulation, let us begin with cross-compiling.
Cross-compiling is a time-honored method to get your software running on a platform on which building natively is either inconvenient or impossible. The primary criticism leveled at cross-compiling is legitimate: you need a separate, inconvenient step to test the output of your build.
Let us say that, in our case, we can automate away the testing activity and choose to cross-compile for its many benefits.
Dockcross provides cross-compiling toolchains as Docker images for x86_64 platforms. The community maintains many images for cross-compiling everything from Linux to Windows, and from x86 to s390x. Of particular interest to me is:
dockcross/manylinux2014-aarch64: a Docker manylinux2014 image for building Linux AArch64 / arm64 Python wheel packages. It includes Python 3.5, 3.6, 3.7, 3.8, and 3.9, supports the dockcross script, and ships with CMake, Ninja, and scikit-build.
This page: https://github.com/ARM-software/developer/blob/master/projects/python-wheels/multi-platform.md#building-aarch64-wheels-on-x86 describes the operation of Dockcross in detail.
Dockcross reduces the complexity of managing a cross-compiler environment within your existing workflow to zero. In addition to building for an architecture other than the one you have locally, you can also build for alternative Python runtimes, such as older versions of CPython or alternative runtimes like PyPy.
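As a sketch of the workflow, the dockcross image emits a small helper script that then runs build commands inside the cross-compiling container. These steps are normally typed into a shell; they are expressed here in Python for consistency with the rest of this blog, and the CPython 3.8 interpreter path follows the usual manylinux layout and is an assumption:

```python
# A sketch of driving dockcross from Python; these steps are usually run by
# hand in a shell. Assumes Docker is installed and the project to build is in
# the current directory.
import os
import stat
import subprocess

IMAGE = "dockcross/manylinux2014-aarch64"
HELPER = "./dockcross-manylinux2014-aarch64"

# The dockcross image prints a helper script on stdout; save it and mark it
# executable.
helper = subprocess.run(["docker", "run", "--rm", IMAGE],
                        check=True, capture_output=True).stdout
with open(HELPER, "wb") as f:
    f.write(helper)
os.chmod(HELPER, os.stat(HELPER).st_mode | stat.S_IXUSR)

# Build an AArch64 wheel inside the cross-compiling container. The CPython 3.8
# path below follows the manylinux layout and is an assumption.
subprocess.run([HELPER, "bash", "-c",
                "/opt/python/cp38-cp38/bin/pip wheel . -w dist/"],
               check=True)
```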
As an alternative to cross-compiling, you can run native code in an emulated environment. Platform emulation is the technique of making one machine behave like another. For the purposes of this blog, we are talking about software running on x86_64 platforms that gives us an AArch64 environment. A good example of such a software solution is QEMU, which is freely licensed and actively developed; supported architectures include x86_64 and AArch64.
Emulators are valuable tools when developing for platforms other than your current platform. They offer the opportunity to bring up entire operating systems on a given hardware platform, and they let you observe the behavior of the app during execution. This can be particularly valuable if the emulator allows activation of hardware features (for example, hardware crypto) or variation in software environmental factors (for example, the choice of kernel scheduler). Emulation also allows for advanced debugging where investigating the contents of individual registers is helpful.
One criticism of emulators is that they are slow or inaccurate, or both. QEMU uses dynamic translation to improve performance. However, there appears to be an unavoidable compromise with emulation: the more closely an emulator models the underlying platform, the slower its performance. Faster performance comes at a cost of accuracy. For many applications, the accuracy of a fast emulator is good enough, but in the case of JIT’d languages (for example, languages running on the JVM) emulator accuracy requirements can become acute.
Docker again provides a convenient mechanism to get a recent QEMU environment set up. https://hub.docker.com/r/multiarch/qemu-user-static includes a recent QEMU and a convenient setup.
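A minimal sketch of that setup follows, written as a Python script for consistency with the rest of this blog; the same two commands are usually typed directly into a shell, and the image names are the ones published by the multiarch and arm64v8 projects:

```python
# A minimal sketch: register QEMU's binfmt handlers, then confirm that an
# AArch64 container runs on an x86_64 host. Assumes Docker is installed.
import subprocess

# Register qemu-user-static so the kernel can execute foreign-architecture
# binaries transparently.
subprocess.run(["docker", "run", "--rm", "--privileged",
                "multiarch/qemu-user-static", "--reset", "-p", "yes"],
               check=True)

# Verify the emulated environment: this should print "aarch64".
subprocess.run(["docker", "run", "--rm", "arm64v8/ubuntu", "uname", "-m"],
               check=True)
```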
The goal of the manylinux project is to provide a convenient way to distribute binary Python extensions as wheels on Linux. The most recent version of the spec they work to is PEP 599: manylinux2014. The project has a goal of making wheels that run on the largest possible variety of Linux distros; to achieve this, careful attention to the build environment is necessary. To simplify the process of building manylinux-compatible wheels, the project provides Docker images that control the build environment.
A containerized environment is available for AArch64 to build wheels to the current manylinux2014 specification. The container is available here: https://quay.io/repository/pypa/manylinux2014_aarch64. The steps to run the manylinux2014_aarch64 container on x86_64 are described in some detail here:
https://github.com/ARM-software/developer/blob/master/solutions/infrastructure/languages-and-libraries/python/multi-platform.md#run-a-aarch64-native-container-on-x86-with-emulation
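Once the QEMU handlers from the previous section are registered, a sketch of building and repairing a wheel inside the AArch64 manylinux container might look like the following. It is again expressed in Python; the /io mount point, the CPython 3.8 interpreter path, and the project layout are assumptions:

```python
# A sketch of building a manylinux2014 AArch64 wheel on an x86_64 host via
# emulation. Assumes the QEMU binfmt handlers are already registered and the
# project lives in the current directory; the CPython 3.8 path follows the
# usual manylinux layout.
import os
import subprocess

IMAGE = "quay.io/pypa/manylinux2014_aarch64"
BUILD = ("/opt/python/cp38-cp38/bin/pip wheel /io -w /io/dist && "
         "auditwheel repair /io/dist/*.whl -w /io/wheelhouse")

subprocess.run(["docker", "run", "--rm",
                "-v", f"{os.getcwd()}:/io",   # mount the project into /io
                IMAGE, "bash", "-c", BUILD],
               check=True)
```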
Building natively on AArch64 has a significant benefit: both build and test run at native performance on AArch64. There is a downside though: to achieve a multi-platform library, you still need to build and test for the alternative platforms. The techniques described previously (emulation, cross-compiling) can be employed to build and test for x86 on AArch64.
There are many platforms available today that provide an AArch64 environment, in all shapes and sizes, including laptops and workstations. Several vendors offer hosted AArch64 platforms. A list of offerings can be found here:
https://developer.arm.com/solutions/infrastructure/developer-resources/development-platforms
Continuous integration and continuous delivery (or deployment) (CI/CD) describes a service where software is built and tested in a controlled, reproducible, and convenient way. A typical open-source development model will build and test every contribution before it is reviewed or accepted into the project. CI/CD systems are available as remotely hosted services, for example Travis-CI and GitHub Actions. CI/CD systems can also be ‘on-prem’, where you download and run the software on your own machines, for example Jenkins. In addition, there are hybrid models where you download and host the worker component and do your builds locally; GitHub Actions includes support for the hybrid model.
Often, it is possible to apply the techniques of cross-compiling and emulation (described previously) to a CI/CD service that is missing native support today. However, the same advantages and disadvantages you experience locally are inherited by the CI/CD service. In this blog, I am going to focus on the Travis-CI hosted CI/CD service, which supports AArch64. Travis-CI has a zero-cost option for both AArch64 and x86.
Instructions on using Travis-CI to build a wheel are included here:
https://github.com/ARM-software/developer/blob/master/solutions/infrastructure/languages-and-libraries/python/multi-platform.md#building-a-wheel-from-a-cicd-system-that-supports-aarch64
One distinct advantage of using native CI/CD is the convenience of a native build. In addition, once built, you can immediately test your code without needing emulation. A disadvantage of this method is that you need a connection to the Internet and you are relying on a remote service.
Avoiding unconsciously adding platform dependencies is simple in a simple app, particularly if you are writing in a language that typically executes in a runtime environment, like Python. However, all the simple apps are already taken, and your app will probably depend on libraries that others have perfected and optimized. In this case, it is worth spending a little time now to ensure your app has multi-platform support.
In this blog, we have covered generating wheel artifacts for AArch64. Building wheels for AArch64 is an important part of your multi-platform journey with Python. Once you have your wheels, you need to deploy them, either into a public repository like PyPI.org or into something private. Managing software distribution, in Python or other languages, is vital to any modern application, and designing in platform awareness today provides for new opportunities tomorrow.
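As a closing sketch, uploading the built wheels to a repository with twine might look like this. It assumes twine is installed, credentials are configured (for example via ~/.pypirc or the TWINE_USERNAME and TWINE_PASSWORD environment variables), and that the repaired wheels were collected in a wheelhouse/ directory:

```python
# A minimal sketch of uploading built wheels with twine. The wheelhouse/
# directory name is an assumption carried over from the earlier build step.
import glob
import subprocess

wheels = glob.glob("wheelhouse/*.whl")
subprocess.run(["python", "-m", "twine", "upload", *wheels], check=True)
```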
By including multi-platform as a requirement for your application today, you begin to future-proof your system. Testing your app on multiple platforms improves code quality and allows you to exploit total cost of ownership (TCO) opportunities. In short, multi-platform apps are ready for the markets of tomorrow, when the edge, cloud, and accelerators are needed to keep your app competitive.
Learn more about the hardware and service options to begin your multi-platform journey today.
Thank you!