Arm Architecture v8.2 features were detailed in 2016. Just two years later, in 2018, flagship Android phones with Armv8.2 features (Google Pixel 3, Huawei Mate 20 Pro, Samsung Galaxy S9, etc.) started shipping in volume. Naturally, phone vendors and software developers want to see the latest Arm architectural features fully exploited in their products.
The Android Runtime (ART) is the main application runtime layer of the Android operating system. It supports the Android framework code and Android apps written in Java/Kotlin. This is the layer where Armv8.2 instructions can play an important role: the ART compiler can use the new instructions to optimise Android APIs and Java/Kotlin code on the user's behalf and deliver significant performance improvements across the Android system. Because ART sits quite high in the Android software stack, introducing new instructions in ART requires a few additional steps. In 2018, Linaro, Arm and Google engineers collaborated to enable some Armv8.2 features in the AOSP ART project, and therefore in the latest Android builds.
The Armv8 architecture has continued to evolve. The Armv8.1 and Armv8.2 instruction sets introduced several enhancements, including AArch64 atomic read-write instructions, additions to the Advanced SIMD instruction set, half-precision floating-point data processing support, memory model enhancements, RAS support, and statistical profiling. In addition, the CRC instructions that were optional in Armv8.0 became mandatory in Armv8.1. For details, read the Armv8.2-A documentation on Arm Developer. These updates are described more fully in Armv8-A architecture evolution.
For end users, Android app developers and game developers, the good news is that you don't need to do anything for Java apps. The optimisations that use the new Armv8.2 features described below are enabled automatically by the JIT (just-in-time compiler) on supported Android devices.
For Arm partners and phone manufacturers who work at the lower platform level and need to build and flash Android images for their own products, please make sure the architecture variant is set correctly for the target CPUs, so that the new features are propagated to and enabled in the Clang and ART compilers when building Android images. For example, the BoardConfig for the Google Pixel 3 device in the AOSP tree contains:
TARGET_ARCH := arm64
TARGET_ARCH_VARIANT := armv8-2a
TARGET_CPU_ABI := arm64-v8a
The target arch variant 'armv8-2a' enables Armv8.2 features in Clang and ART compiler when building Android images on host machines. For more details: android / device / google / crosshatch / pie-dr1-release / . / BoardConfig-common.mk
In Java, the CRC-32 checksum of a given array of bytes can be calculated through the java.util.zip.CRC32 class APIs. The ART compiler can optimise this API using the Armv8 CRC32{B,H,W,X} instructions. The following example shows how the Android API java.util.zip.CRC32.update(int b) is optimised with the CRC32B instruction. The CRC32-instruction implementation of java.util.zip.CRC32.update() delivers a 20 times performance improvement compared to the unoptimised version.
java.util.zip.CRC32.update(int b)
MVN W0, W2         ; W2 == b
CRC32B W0, W0, W1  ; W1 == this.crc
MVN W0, W0         ; return W0
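From the application side, nothing changes in how the API is called. The following minimal sketch (the class name `Crc32Demo` and the helper `checksum` are hypothetical names chosen for illustration) feeds bytes one at a time through `CRC32.update(int)`, which is the call the text says ART can map to CRC32B on supported devices. On a standard JVM it simply exercises the same public API.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class Crc32Demo {
    // Feeds the input one byte at a time through CRC32.update(int) and
    // returns the resulting 32-bit checksum as an unsigned value in a long.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        for (byte b : data) {
            crc.update(b); // only the low 8 bits of the argument are used
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        // "123456789" is the conventional CRC check string.
        byte[] input = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("crc32 = 0x%08X%n", checksum(input)); // 0xCBF43926
    }
}
```

Whether a given call is actually compiled to CRC32B depends on the ART version and the device, so this should be read as a usage sketch rather than a guarantee about generated code.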
Armv8.2 provides support for half-precision floating-point data processing instructions. Such instructions are ideal for optimising the Android public API class android.util.Half, a wrapper and utility class that provides a full software implementation for manipulating half-precision 16-bit IEEE 754 floating-point data. For user code that uses the android.util.Half class, conversions between 16-bit FP16 data and 32-bit single-precision float data are sped up with Armv8.2 FP16 instructions. The following example shows how the current android.util.Half.toFloat() API can be optimised with Armv8.2 FP16 instructions. The Armv8.2-FP16 implementation of the android.util.Half.toFloat method delivers a 50% performance improvement compared to the unoptimised version.
android.util.Half.toFloat(short h)
FMOV H31, W1  ; W1 == h
FCVT S0, H31  ; return S0
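To give a feel for the work that the two instructions above replace, here is a minimal pure-Java sketch of a half-to-float conversion. It is not the android.util.Half source. The class name `HalfToFloatSketch` is hypothetical, and the code only assumes the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits with bias 15, 10 mantissa bits).

```java
final class HalfToFloatSketch {
    // Converts IEEE 754 binary16 bits (held in a short) to a float.
    static float toFloat(short h) {
        int bits = h & 0xFFFF;
        int sign = (bits & 0x8000) << 16;   // move sign to bit 31
        int exp = (bits >>> 10) & 0x1F;     // 5-bit biased exponent
        int mant = bits & 0x3FF;            // 10-bit mantissa

        if (exp == 0x1F) {
            // Infinity or NaN: all-ones float exponent, mantissa widened.
            return Float.intBitsToFloat(sign | 0x7F800000 | (mant << 13));
        }
        if (exp == 0) {
            // Zero or subnormal: value is mant * 2^-24, exactly representable.
            float magnitude = mant * (1.0f / (1 << 24));
            return sign != 0 ? -magnitude : magnitude;
        }
        // Normal: rebias exponent from 15 to 127, widen mantissa to 23 bits.
        return Float.intBitsToFloat(sign | ((exp - 15 + 127) << 23) | (mant << 13));
    }

    public static void main(String[] args) {
        System.out.println(toFloat((short) 0x3C00)); // 1.0
        System.out.println(toFloat((short) 0xC000)); // -2.0
        System.out.println(toFloat((short) 0x7BFF)); // 65504.0
    }
}
```

The branches and bit manipulation here are the kind of software path that a single FCVT can stand in for, which is consistent with the speed-up reported above.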
In Armv8.2, as supported by Cortex-A55 / Cortex-A75 and later CPUs, the signed dot product (SDOT) and unsigned dot product (UDOT) instructions were introduced to improve machine learning performance. For user code that calculates a dot product from byte arrays, the new ART compiler supports auto-vectorizing the unrolled loop pattern and automatically replaces the kernel loop with SDOT/UDOT SIMD instructions for much faster parallel execution. Compared to the unoptimised loop, the Armv8.2 dot product instructions bring over 7x performance improvement on a Java loop like this:
// byte[] a; byte[] b;
for (int i = 0; i < size; i++) {
  sum += a[i] * b[i];
}
...
UDOT v0.4s, v1.16b, v2.16b
...
For more information about the Armv8 dot product instructions and optimisations, please refer to our blog post: Exploring the Arm dot product instructions.
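For reference, the loop pattern described in this section can be wrapped into a complete method as in the sketch below. The class and method names (`DotProductDemo`, `dot`) are hypothetical, and whether ART vectorizes this exact method with SDOT/UDOT depends on the ART version and the target CPU. The code only shows the plain Java shape of the loop.

```java
final class DotProductDemo {
    // Accumulates the products of corresponding bytes into a 32-bit sum,
    // matching the byte-array dot product loop shown above.
    static int dot(byte[] a, byte[] b) {
        int size = Math.min(a.length, b.length);
        int sum = 0;
        for (int i = 0; i < size; i++) {
            sum += a[i] * b[i]; // bytes are widened to int before multiplying
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, -4};
        byte[] b = {5, 6, 7, 8};
        System.out.println(dot(a, b)); // 5 + 12 + 21 - 32 = 6
    }
}
```

Keeping the loop body this simple, with a single multiply-accumulate per iteration, is what gives a vectorizer a recognisable pattern to work with.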
A dynamic compiler such as the Android Runtime compiler requires up-to-date support for assembling and disassembling instruction opcodes for its dynamically compiled code sequences. VIXL is an open-source Arm/Arm64 assembler/disassembler designed to make it easy to support dynamic code generation for the latest Arm architecture versions. The VIXL release from 2018-Q3 includes full support for Armv8.3 instructions. The Arm ART team has worked closely with Google engineers to fully integrate the latest upstream VIXL into the Google AOSP project. VIXL also provides a set of development, debug and simulation features that developers can use to debug and test the latest Armv8.2 features. The following example shows how engineers can download VIXL and test CRC32 instructions and checksum implementations with the VIXL simulator.
This is also the workflow our Arm ART developers use to debug new Armv8.2 instructions for the ART compiler.
$ git clone https://android.googlesource.com/platform/external/vixl; cd vixl
$ vi examples/aarch64/crc-checksums.cc
$ scons aarch64_examples -j8
$ obj/latest/examples/aarch64/crc-checksums
Currently, the Android Runtime compiler has an option to detect CPU features based on the Clang compiler's predefined macros. AOSP's prebuilt Clang has been updated to support the new Armv8.2 instructions and new CPUs, including Cortex-A55 and Cortex-A75. If you are building Android apps or Android images on host machines, you should ensure that your IDEs, compilers and BoardConfig are configured to enable the correct predefined macros for your selected Arm CPUs, because ART uses these macros to select the appropriate instructions to generate. The following examples show how developers can quickly check the Arm CPU feature macros provided by the Clang compiler:
# Example: check Dot Product feature
$ clang++ -target aarch64-none-linux-gnueabi -mcpu=cortex-a55 -E -dM - < /dev/null | grep DOTPROD
#define __ARM_FEATURE_DOTPROD 1
# Example: check FP16 feature
$ clang++ -target aarch64-none-linux-gnueabi -march=armv8.2a+fp16 -E -dM - < /dev/null | grep FP16
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FP16_ARGS 1
#define __ARM_FP16_FORMAT_IEEE 1
A more advanced way of configuring code generation optimally is to detect CPU features dynamically and take advantage of new Arm instructions when they are found to be available on the target. Such optimisation is especially useful when running Android apps in ART just-in-time mode. Dynamic detection requires 'HWCAP' support from the Android kernel, and the Bionic header files have also been updated to allow testing of the latest HWCAP features. Arm has worked with Google to introduce runtime detection in the ART JIT compiler, which enables these new Armv8.2 features automatically on Android phones that have the latest CPUs, without rebuilding. The following example shows how dynamic HWCAP detection is typically done in runtime systems:
uint64_t hwcaps = getauxval(AT_HWCAP);
has_crc_feature = (hwcaps & HWCAP_CRC32) != 0;
has_lse_feature = (hwcaps & HWCAP_ATOMICS) != 0;
has_fp16_feature = (hwcaps & HWCAP_FPHP) != 0;
has_dotprod_feature = (hwcaps & HWCAP_ASIMDDP) != 0;
Refer to android / platform / bionic / pie-release / . / libc / kernel / uapi / asm-arm64 / asm / hwcap.h for the Armv8.2 features that can be detected dynamically through HWCAP.
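Java code has no direct equivalent of getauxval(), but on arm64 Linux kernels the same capability bits are also reported as names on the "Features" line of /proc/cpuinfo (for example crc32, atomics, fphp and asimddp for the four HWCAP bits above). The following minimal sketch parses that line. It is an illustration only and not how ART performs detection, and the class and method names (`CpuFeatureSketch`, `parseFeatures`) are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class CpuFeatureSketch {
    // Returns the tokens on the first "Features" line of /proc/cpuinfo-style
    // text, or an empty set if no such line exists (e.g. on non-Arm hosts).
    static Set<String> parseFeatures(String cpuinfo) {
        for (String line : cpuinfo.split("\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equals("Features")) {
                String[] tokens = line.substring(colon + 1).trim().split("\\s+");
                return new HashSet<>(Arrays.asList(tokens));
            }
        }
        return new HashSet<>();
    }

    public static void main(String[] args) throws IOException {
        Path cpuinfo = Paths.get("/proc/cpuinfo");
        if (!Files.isReadable(cpuinfo)) {
            System.out.println("/proc/cpuinfo is not available on this system");
            return;
        }
        String text = new String(Files.readAllBytes(cpuinfo), StandardCharsets.UTF_8);
        Set<String> features = parseFeatures(text);
        // Feature names corresponding to the HWCAP bits tested above.
        for (String name : new String[] {"crc32", "atomics", "fphp", "asimddp"}) {
            System.out.println(name + ": " + features.contains(name));
        }
    }
}
```

Native code should prefer getauxval(AT_HWCAP) as shown above. Text parsing like this is mainly useful for quick diagnostics from managed code.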
For any questions about the Armv8.2 deployment on Android please contact developer@arm.com.
Xueliang,
You pointed out that it's already supported in AOSP. I would like to know which branch you mean. It's still not available in the current pie release.
In https://android.googlesource.com/platform/build/soong/+/refs/heads/pie-release/cc/config/arm64_device.go, the option "-mcpu=" for cortex-a55 is not yet supported; it uses "-mcpu=cortex-a53" instead.
Thanks!
Hi,
The option "-mcpu=" for cortex-a55/cortex-a75 is supported in AOSP master, and should be included in the Android Q preview branch. Please refer to:
- https://android.googlesource.com/platform/build/soong/+/refs/heads/master/cc/config/arm64_device.go
- https://android.googlesource.com/platform/build/soong/+/refs/tags/android-q-preview-2.5/cc/config/arm64_device.go
Thanks