The proliferation of alternative platforms across cloud, edge, desktop, and laptop heralds a new era of both opportunity and complexity. Opportunity arrives because different platform characteristics can improve performance, security, efficiency, latency, and more. Complexity arrives when we actively exploit that opportunity and need to manage the lifecycle of our application across alternative platforms. For example, your app may target AArch64 at the edge and Armv7 in a gateway device, while also including cloud components running on legacy x86.
This is the world of multi-platform software engineering. In this blog I am going to explore how, with care, your application can be made multi-platform ready. We use Python and the Python ecosystem for this example, but the concepts are applicable across many language ecosystems.
A multi-platform application has three key characteristics:
Portability is a measure of a code’s ability to run on a different platform and is a desirable property of good code. We can observe that long-lived high-performance computing (HPC) code bases have migrated across different platforms as platform popularity in HPC has ebbed and flowed. Wikipedia’s page on portability provides a good overview of the benefits and value of portability as a property of software quality.
Distribution is a property of modern applications seeking robustness in a world of massive scale and inevitable component failure. Microservice architecture attempts to codify the design decisions that make a distributed application possible. Developers who can manage the complexity of a distributed app gain two benefits: resiliency, because no single point of failure halts the app, and reduced overhead, because they can move parts of the app to a faster or less expensive host.
Awareness means that a component of the app is aware of, and able to exploit, unique properties of the host platform. For example, hardware cryptography extensions should be used if present. In practice, when an app is written against a runtime environment (for example, Java or Python), the property of platform awareness is typically implemented in libraries that are introduced to the app as dependencies.
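As a minimal sketch of platform awareness in Python, an app might check the host architecture at run time and prefer an accelerated code path when one exists. The "fastcrypto" module named here is hypothetical and stands in for any library with per-platform native code:

```python
# A minimal sketch of platform awareness: check the host architecture at run
# time and prefer a platform-optimized dependency when it is available.
# The "fastcrypto" module below is hypothetical.
import platform

machine = platform.machine()  # for example "aarch64", "armv7l", or "x86_64"

try:
    import fastcrypto  # hypothetical library with per-platform native code
    backend = "hardware-accelerated"
except ImportError:
    backend = "pure Python fallback"

print(f"Running on {machine} using the {backend} crypto backend")
```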
Multi-platform computing is not new. In the micro, exploiting different configurations within a single machine is described as heterogeneous computing. In the macro, using multiple different machines federated together as a distributed application was pioneered by grid computing.
Multi-platform code is higher-quality code. It is well tested and fit for the opportunities of today and tomorrow. Arm makes the step from mono-platform to multi-platform easiest, with the largest choice of platforms and tooling to fit every budget and configuration.
If we look at a typical software lifecycle (shown previously), each phase requires us to consider how our app works in a multi-platform environment. In this blog, I focus on the build, test, and release phases, taking each in turn and using Python as our chosen language.
Python is a hugely popular language with a vigorous and active development community. The Python library ecosystem is extensive and has an artifact distribution system that supports multiple platforms. An understanding of multiple platforms within the artifact distribution system greatly simplifies managing multi-platform complexity. Many of today’s most important toolsets for machine learning and data science choose Python as a primary language.
Python programs are written as human readable text. This text is compiled by the Python interpreter to produce 'pyc' files. These pyc files are typically executed by Python's virtual machine. This design means that programs written in Python run anywhere the Python virtual machine is present. In some cases, however, the behavior of the Python virtual machine is unsuitable for the program being written. A programmer can choose to call out to native code and bypass the Python virtual machine. A common motivation to bypass the Python virtual machine is to improve performance.
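As a small illustration of that bypass, here is one way a Python program can call native code directly. This is a minimal sketch assuming a Linux host where the C library is available as libc.so.6:

```python
# A minimal sketch of bypassing the Python virtual machine: call native code
# directly through ctypes. Assumes a Linux host exposing the C library as
# "libc.so.6".
import ctypes

libc = ctypes.CDLL("libc.so.6")   # load the platform's C library
print(libc.getpid())              # invoke a native function from Python
```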
One example of a popular library that has a Python virtual machine bypass is lxml. This library provides XML tooling to a Python application by exposing the popular C libraries libxml2 and libxslt. Because there is a dependency on C libraries, if you build an application in Python using lxml, your Python application will also need these C libraries. C libraries must be compiled for the platform (AArch64, x86, and so on) that they execute on.
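To see why such packages become platform-specific, consider a minimal, hypothetical package that ships a C extension. The wheel built from it contains compiled code and is therefore tagged for a single architecture (the package name, module name, and source file below are all illustrative):

```python
# setup.py for a hypothetical package with a C extension. Building it produces
# a wheel containing compiled code, so the wheel is specific to one platform
# (for example manylinux2014_aarch64 or manylinux2014_x86_64).
from setuptools import setup, Extension

setup(
    name="fastxml",                        # hypothetical package name
    version="0.1.0",
    ext_modules=[
        Extension("fastxml._speedups",     # native module compiled per platform
                  sources=["src/speedups.c"]),
    ],
)
```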
One of the challenges software engineers face today is maintaining platform neutrality while pursuing performance. Platform-neutral applications cost nothing to port to a new platform, and moving to a new platform can save money or improve one or more performance characteristics. As the network edge grows in importance as a venue for hosting low-latency, high-bandwidth applications, the ability to run code at the edge becomes a commercial advantage. Platform-neutral code lets a developer be first to run on the edge regardless of the underlying architecture (AArch64, x86, or other).
Since I began working on ecosystem enablement for Arm, a lot has changed. Today, there are many options for building natively, whether using CI/CD offerings, something local on your desktop, or a cloud instance. Emulation has become significantly better and more convenient with containers and tools like buildx. Before I look at emulation, let us begin with cross-compiling.
Cross-compiling is a time-honored method to get your software running on a platform on which building natively is either inconvenient or impossible. The primary criticism leveled at cross-compiling is legitimate: you need a separate, inconvenient step to test the output of your build.
Let us say that, in our case, we can automate away the testing activity and choose to cross-compile for its many benefits.
Dockcross provides cross-compiling toolchains as Docker images for x86_64 platforms. The community maintains many images for cross-compiling everything from Linux to Windows, and from x86 to s390x. Of particular interest to me is:
dockcross/manylinux2014-aarch64: a Docker manylinux2014 image for building Linux AArch64 / arm64 Python wheel packages. It includes Python 3.5, 3.6, 3.7, 3.8, and 3.9, supports the dockcross script, and ships with CMake, Ninja, and scikit-build.
This page: https://github.com/ARM-software/developer/blob/master/projects/python-wheels/multi-platform.md#building-aarch64-wheels-on-x86 describes the operation of Dockcross in detail.
Dockcross reduces the complexity of managing a cross-compiler environment within your existing workflow to zero. In addition to building for an architecture other than the one you have locally, you can also build for alternative Python runtimes, such as older versions of CPython or alternative runtimes like PyPy.
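As a sketch of the workflow, the dockcross image emits a small helper script that then runs build commands inside the cross-compiling container. These steps are normally typed into a shell; they are expressed here in Python for consistency with the rest of this blog, and the CPython 3.8 interpreter path follows the usual manylinux layout and is an assumption:

```python
# A sketch of driving dockcross from Python; these steps are usually run by
# hand in a shell. Assumes Docker is installed and the project to build is in
# the current directory.
import os
import stat
import subprocess

IMAGE = "dockcross/manylinux2014-aarch64"
HELPER = "./dockcross-manylinux2014-aarch64"

# The dockcross image prints a helper script on stdout; save it and mark it
# executable.
helper = subprocess.run(["docker", "run", "--rm", IMAGE],
                        check=True, capture_output=True).stdout
with open(HELPER, "wb") as f:
    f.write(helper)
os.chmod(HELPER, os.stat(HELPER).st_mode | stat.S_IXUSR)

# Build an AArch64 wheel inside the cross-compiling container. The CPython 3.8
# path below follows the manylinux layout and is an assumption.
subprocess.run([HELPER, "bash", "-c",
                "/opt/python/cp38-cp38/bin/pip wheel . -w dist/"],
               check=True)
```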
As an alternative to cross-compiling, you can run native code in an emulated environment. Platform emulation is the technique of making one machine behave like another. For the purposes of this blog, we are talking about software running on x86_64 platforms that gives us an AArch64 environment. A good example of such a software solution is QEMU, which is freely licensed and actively developed; supported architectures include x86_64 and AArch64.
Emulators are valuable tools when developing for platforms other than your current platform. They offer the opportunity to bring up entire operating systems on a given hardware platform, and they let you observe the behavior of the app during execution. This can be particularly valuable if the emulator allows activation of hardware features (for example, hardware crypto) or variation in software environmental factors (for example, the choice of kernel scheduler). Emulation also allows for advanced debugging where investigating the contents of individual registers is helpful.
One criticism of emulators is that they are slow or inaccurate, or both. QEMU uses dynamic translation to improve performance. However, there appears to be an unavoidable compromise with emulation: the more closely an emulator models the underlying platform, the slower its performance. Faster performance comes at a cost of accuracy. For many applications, the accuracy of a fast emulator is good enough, but in the case of JIT’d languages (for example, languages running on the JVM) emulator accuracy requirements can become acute.
Docker again provides a convenient mechanism to get a recent QEMU environment set up. https://hub.docker.com/r/multiarch/qemu-user-static includes a recent QEMU and a convenient setup.
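A minimal sketch of that setup follows, written as a Python script for consistency with the rest of this blog; the same two commands are usually typed directly into a shell, and the image names are the ones published by the multiarch and arm64v8 projects:

```python
# A minimal sketch: register QEMU's binfmt handlers, then confirm that an
# AArch64 container runs on an x86_64 host. Assumes Docker is installed.
import subprocess

# Register qemu-user-static so the kernel can execute foreign-architecture
# binaries transparently.
subprocess.run(["docker", "run", "--rm", "--privileged",
                "multiarch/qemu-user-static", "--reset", "-p", "yes"],
               check=True)

# Verify the emulated environment: this should print "aarch64".
subprocess.run(["docker", "run", "--rm", "arm64v8/ubuntu", "uname", "-m"],
               check=True)
```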
The goal of the manylinux project is to provide a convenient way to distribute binary Python extensions as wheels on Linux. The most recent version of the spec they work to is PEP 599: manylinux2014. The project has a goal of making wheels that run on the largest possible variety of Linux distros; to achieve this, careful attention to the build environment is necessary. To simplify the process of building manylinux-compatible wheels, the project provides Docker images that control the build environment.
A containerized environment is available for AArch64 to build wheels to the current manylinux2014 specification. The container is available here: https://quay.io/repository/pypa/manylinux2014_aarch64. The steps to run the manylinux2014_aarch64 container on x86_64 are described in some detail here:
https://github.com/ARM-software/developer/blob/master/solutions/infrastructure/languages-and-libraries/python/multi-platform.md#run-a-aarch64-native-container-on-x86-with-emulation
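Once the QEMU handlers from the previous section are registered, a sketch of building and repairing a wheel inside the AArch64 manylinux container might look like the following. It is again expressed in Python; the /io mount point, the CPython 3.8 interpreter path, and the project layout are assumptions:

```python
# A sketch of building a manylinux2014 AArch64 wheel on an x86_64 host via
# emulation. Assumes the QEMU binfmt handlers are already registered and the
# project lives in the current directory; the CPython 3.8 path follows the
# usual manylinux layout.
import os
import subprocess

IMAGE = "quay.io/pypa/manylinux2014_aarch64"
BUILD = ("/opt/python/cp38-cp38/bin/pip wheel /io -w /io/dist && "
         "auditwheel repair /io/dist/*.whl -w /io/wheelhouse")

subprocess.run(["docker", "run", "--rm",
                "-v", f"{os.getcwd()}:/io",   # mount the project into /io
                IMAGE, "bash", "-c", BUILD],
               check=True)
```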
Building natively on AArch64 has a significant benefit: both build and test run at native performance on AArch64. There is a downside though: to achieve a multi-platform library, you still need to build and test for the alternative platforms. The techniques described previously (emulation, cross-compiling) can be employed to build and test for x86 on AArch64.
There are many platforms available today that provide an AArch64 environment, in all shapes and sizes, including laptops and workstations. Several vendors offer hosted AArch64 platforms. A list of offerings can be found here:
https://developer.arm.com/solutions/infrastructure/developer-resources/development-platforms
Continuous integration and continuous delivery (or deployment) (CI/CD) describes a service where software is built and tested in a controlled, reproducible, and convenient way. A typical open-source development model will build and test every contribution before it is reviewed or accepted into the project. CI/CD systems are available as remotely hosted services, for example Travis-CI and GitHub Actions. CI/CD systems can also be ‘on-prem’, where you download and run the software on your own machines, for example Jenkins. In addition, there are hybrid models where you download and host the worker component and do your builds locally; GitHub Actions includes support for the hybrid model.
Often, it is possible to apply the techniques of cross-compiling and emulation (described previously) to a CI/CD service that is missing native support today. However, the same advantages and disadvantages you experience locally are inherited by the CI/CD service. In this blog, I am going to focus on the Travis-CI hosted CI/CD service, which supports AArch64. Travis-CI has a zero-cost option for both AArch64 and x86.
Instructions on using Travis-CI to build a wheel are included here:
https://github.com/ARM-software/developer/blob/master/solutions/infrastructure/languages-and-libraries/python/multi-platform.md#building-a-wheel-from-a-cicd-system-that-supports-aarch64
One distinct advantage of using native CI/CD is the convenience of a native build. In addition, once built, you can immediately test your code without needing emulation. A disadvantage of this method is that you need a connection to the Internet and you are relying on a remote service.
Avoiding unconsciously adding platform dependencies is simple in a simple app, particularly if you are writing in a language that typically executes in a runtime environment, like Python. However, all the simple apps are already taken, and your app will probably depend on libraries that others have perfected and optimized. In this case, it is worth spending a little time now to ensure your app has multi-platform support.
In this blog, we have covered generating wheel artifacts for AArch64. Building wheels for AArch64 is an important part of your multi-platform journey with Python. Once you have your wheels, you need to deploy them, either into a public repository like PyPI.org or into something private. Managing software distribution, in Python or other languages, is vital to any modern application, and designing in platform awareness today provides for new opportunities tomorrow.
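As a closing sketch, uploading the built wheels to a repository with twine might look like this. It assumes twine is installed, credentials are configured (for example via ~/.pypirc or the TWINE_USERNAME and TWINE_PASSWORD environment variables), and that the repaired wheels were collected in a wheelhouse/ directory:

```python
# A minimal sketch of uploading built wheels with twine. The wheelhouse/
# directory name is an assumption carried over from the earlier build step.
import glob
import subprocess

wheels = glob.glob("wheelhouse/*.whl")
subprocess.run(["python", "-m", "twine", "upload", *wheels], check=True)
```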
By including multi-platform as a requirement for your application today, you begin to future-proof your system. Testing your app on multiple platforms improves code quality and allows you to exploit total cost of ownership (TCO) opportunities. In short, multi-platform apps are ready for the markets of tomorrow, when the edge, cloud, and accelerators are needed to keep your app competitive.
Learn more about the hardware and service options to begin your multi-platform journey today.
Thank you!