Bringing Armv8.2 Instructions to Android Runtime

February 26, 2019

Arm Architecture v8.2 features were detailed in 2016. Just two years later, in 2018, flagship Android phones with Armv8.2 features (Google Pixel 3, Huawei Mate 20 Pro, Samsung Galaxy S9, etc.) started shipping in volume. Naturally, phone vendors and software developers want to see the latest Arm architectural features fully exploited in their products.

The Android Runtime (ART) is the main application runtime layer of the Android operating system; it runs the Android framework code and Android apps written in Java/Kotlin. This is the layer where the Armv8.2 instructions can play an important role: the ART compiler can use them to optimise Android APIs and Java/Kotlin code on the user's behalf and deliver a big performance improvement on the Android system. Because the Android Runtime sits quite high in the Android software stack, introducing new instructions in ART requires a few additional steps. In 2018, Linaro, Arm and Google engineers collaborated to enable some Armv8.2 features in the AOSP ART project, so that they are enabled in the latest Android builds.

Quick overview of Armv8.1 and Armv8.2 enhancements

The Armv8 architecture has continued to evolve. The Armv8.1 and Armv8.2 instruction sets introduced several enhancements, including AArch64 atomic read-write instructions, additions to the Advanced SIMD instruction set, half-precision floating-point data processing support, memory model enhancements, RAS support, and statistical profiling. In addition, the CRC instructions that were optional in Armv8.0 become a requirement in Armv8.1. Read the Armv8.2-A documentation on Arm Developer. These updates are described more fully in Armv8-A architecture evolution.

How do users enable new Armv8.2 instructions in ART compiler?

For end users, Android app developers and game developers, the good news is that you don't need to do anything for Java apps. The optimisations that use the new Armv8.2 features described below are enabled automatically by the JIT (just-in-time) compiler on supported Android devices.

Arm partners and phone manufacturers who work on the lower-level platform and build and flash Android images for their own products should make sure the architecture variant is set correctly for the target CPUs, so that the new features are propagated to and enabled in the Clang and ART compilers when building Android images. For example, the BoardConfig for the Google Pixel 3 device in the AOSP tree contains:

Pixel3 BoardConfig

TARGET_ARCH := arm64
TARGET_ARCH_VARIANT := armv8-2a
TARGET_CPU_ABI := arm64-v8a

The target arch variant 'armv8-2a' enables Armv8.2 features in Clang and ART compiler when building Android images on host machines. For more details: android / device / google / crosshatch / pie-dr1-release / . / BoardConfig-common.mk

CRC32

In Java, the CRC-32 checksum of a byte array can be computed through the java.util.zip.CRC32 class APIs. The ART compiler can optimise this API using the Armv8 CRC32{bhwx} instructions.
The following example shows how the Android API java.util.zip.CRC32.update(int b) is optimised with the CRC32B instruction. The CRC32-instruction implementation of java.util.zip.CRC32.update() delivers a 20 times performance improvement over the unoptimised version.

Android API: java.util.zip.CRC32.update(int b)

MVN W0, W2         ; W2 == b
CRC32B W0, W0, W1  ; W1 == this.crc
MVN W0, W0         ; return W0
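From the application side, nothing special is needed to reach this code path. The following is a minimal sketch of Java code that calls CRC32.update(int b) byte by byte; the class and helper names (Crc32Demo, checksum) are hypothetical, and whether a given call is actually compiled to CRC32B depends on the device and the ART build.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Minimal sketch: feeds bytes one at a time through CRC32.update(int b),
// the API the article describes as optimised with CRC32B.
final class Crc32Demo {
    // Hypothetical helper, not part of any Android or Java API.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        for (byte b : data) {
            crc.update(b); // update(int b) uses only the low 8 bits of b
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        // "123456789" is the standard CRC-32 check input.
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("crc32 = 0x%08X%n", checksum(data)); // prints crc32 = 0xCBF43926
    }
}
```

The result is the same whether or not the intrinsic is used; only the speed differs.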

FP16 extensions

Armv8.2 provides support for half-precision floating-point data processing instructions. Such instructions are ideal for optimising the Android public API class android.util.Half, a wrapper and utility class that provides a full software implementation for manipulating half-precision 16-bit IEEE 754 floating-point data. For user code that uses android.util.Half, conversions between 16-bit FP16 data and 32-bit single-precision float data are sped up with Armv8.2 FP16 instructions. The following example shows how the current android.util.Half.toFloat() API can be optimised with Armv8.2 FP16 instructions. The Armv8.2-FP16 implementation of the android.util.Half.toFloat method delivers a 50% performance improvement over the unoptimised version.


Android API: android.util.Half.toFloat(Half h)

FMOV H31, W1  ; W1 == h
FCVT S0, H31  ; return S0
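To give a feel for the work that the two instructions above replace, here is a minimal sketch of a software half-to-float conversion in plain Java. It is not the android.util.Half source; the class and method names (HalfToFloatDemo, halfToFloat) are hypothetical, and the half value is assumed to be passed as its raw 16 bits in a short.

```java
// Minimal sketch of an IEEE 754 binary16 -> binary32 conversion in software.
final class HalfToFloatDemo {
    // Hypothetical helper; h holds the raw 16 bits of the half value.
    static float halfToFloat(short h) {
        int bits = h & 0xFFFF;
        int sign = (bits & 0x8000) << 16;   // move sign to bit 31
        int exp = (bits >>> 10) & 0x1F;     // 5-bit exponent, bias 15
        int mant = bits & 0x3FF;            // 10-bit mantissa

        if (exp == 0x1F) {
            // Infinity or NaN: all-ones exponent, mantissa widened to 23 bits.
            return Float.intBitsToFloat(sign | 0x7F800000 | (mant << 13));
        }
        if (exp == 0) {
            // Zero or subnormal: value is mant * 2^-24, exactly representable.
            float magnitude = mant * (1.0f / (1 << 24));
            return sign != 0 ? -magnitude : magnitude;
        }
        // Normal number: rebias the exponent from 15 to 127 (add 112).
        return Float.intBitsToFloat(sign | ((exp + 112) << 23) | (mant << 13));
    }

    public static void main(String[] args) {
        System.out.println(halfToFloat((short) 0x3C00)); // prints 1.0
        System.out.println(halfToFloat((short) 0xC000)); // prints -2.0
    }
}
```

The branches for special values and subnormals are what make the software path comparatively slow; a single FCVT handles all of these cases in hardware.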

Dot Product

In Armv8.2, as supported by Cortex-A55 / Cortex-A75 and later CPUs, the signed dot product (SDOT) and unsigned dot product (UDOT) instructions are introduced to improve machine learning performance. For user code that computes a dot product over byte arrays, the new ART compiler supports auto-vectorizing the unrolled loop pattern and automatically replaces the kernel loop with SDOT/UDOT SIMD instructions for much faster parallel execution. Compared to the unoptimised loop, the Armv8.2 dot product instructions bring over 7x performance improvement on a Java loop like this.

Java Dot Product Loop Code

// byte[] a; byte[] b;
for (int i = 0; i < size; i++) {
  sum += a[i] * b[i];
}

Armv8 Dot-product implementation

...
UDOT v0.4s, v1.16b, v2.16b
...
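For reference, the loop above can be wrapped into a self-contained method as in the following sketch. The names DotProductDemo and dotProduct are hypothetical. Note that Java's byte type is signed, so a[i] * b[i] is a signed product here; whether a particular ART version recognises and vectorizes this exact form is an assumption, not something this sketch can show.

```java
// Minimal sketch: the article's byte-array dot product loop as a method.
final class DotProductDemo {
    // Hypothetical helper; accumulates byte products into a 32-bit sum.
    static int dotProduct(byte[] a, byte[] b) {
        int size = Math.min(a.length, b.length);
        int sum = 0;
        for (int i = 0; i < size; i++) {
            sum += a[i] * b[i]; // each byte is sign-extended to int before the multiply
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4};
        byte[] b = {5, 6, 7, 8};
        System.out.println(dotProduct(a, b)); // prints 70
    }
}
```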

For more information about Armv8 dot product instructions and optimisations, please refer to our blogpost: Exploring the Arm dot product instructions.

Assembler/disassembler & simulator support

A dynamic compiler such as the Android Runtime compiler requires up-to-date support for assembling and disassembling instruction opcodes in the code sequences it compiles dynamically. VIXL is an open-source Arm/Arm64 assembler/disassembler designed to make it easy to support dynamic code generation for the latest Arm architecture versions. The VIXL release from 2018-Q3 includes full support for Armv8.3 instructions. The Arm ART team has worked closely with Google engineers to fully integrate the latest upstream VIXL into the Google AOSP project. VIXL also provides a set of development, debug and simulation features that developers can use to debug and test the latest Armv8.2 features. The following example shows how engineers can download VIXL and test CRC32 instructions and checksum implementations with the VIXL simulator.

This is also the workflow our Arm ART developers use to debug new Armv8.2 instructions for the ART compiler.


$ git clone https://android.googlesource.com/platform/external/vixl; cd vixl
$ vi examples/aarch64/crc-checksums.cc
$ scons aarch64_examples -j8
$ obj/latest/examples/aarch64/crc-checksums

Static CPU feature detection support in ART

Currently, the Android Runtime compiler has an option to detect CPU features based on the Clang compiler's predefined macros. AOSP's prebuilt Clang has been updated to support the new Armv8.2 instructions and new CPUs including Cortex-A55 and Cortex-A75. Developers who build Android apps or Android images on host machines should ensure that the IDEs, compilers and BoardConfig are configured to enable the correct predefined macros for the selected Arm CPUs; ART uses these macros to select appropriate instruction generation. The following examples show how developers can quickly check the Arm CPU feature macros provided by the Clang compiler:

Clang command line example

# Example: check Dot Product feature
$ clang++ -target aarch64-none-linux-gnueabi -mcpu=cortex-a55 -E -dM - < /dev/null | grep DOTPROD
#define __ARM_FEATURE_DOTPROD 1


# Example: check FP16 feature
$ clang++ -target aarch64-none-linux-gnueabi -march=armv8.2a+fp16 -E -dM - < /dev/null | grep FP16
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FP16_ARGS 1
#define __ARM_FP16_FORMAT_IEEE 1

Dynamic CPU feature detection support in ART

A more advanced way of configuring code generation optimally is to detect CPU features dynamically and take advantage of new Arm instructions when they are found to be available on the target. Such optimisation is especially useful when running Android apps in ART Just-in-Time mode. Dynamic detection requires 'HWCAP' support from the Android kernel, and the Bionic header files have also been updated to allow testing of the latest HWCAP features. Arm has worked with Google to introduce runtime detection in the ART JIT compiler, so that these new Armv8.2 features are enabled automatically on Android phones that have the latest CPUs, without rebuilding. The following example shows how dynamic HWCAP detection is typically done in runtime systems:

HW_CAP example

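#include <sys/auxv.h>   // getauxval(), AT_HWCAP
#include <asm/hwcap.h>  // HWCAP_* bit masks for AArch64

uint64_t hwcaps = getauxval(AT_HWCAP);  // capability bits reported by the kernel
bool has_crc_feature = (hwcaps & HWCAP_CRC32) != 0;
bool has_lse_feature = (hwcaps & HWCAP_ATOMICS) != 0;
bool has_fp16_feature = (hwcaps & HWCAP_FPHP) != 0;
bool has_dotprod_feature = (hwcaps & HWCAP_ASIMDDP) != 0;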

Refer to android / platform / bionic / pie-release / . / libc / kernel / uapi / asm-arm64 / asm / hwcap.h for Armv8.2 features supported for dynamic HW_CAPs detection.
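Java code has no direct equivalent of getauxval(). As a rough way to inspect the same information from Java, the sketch below parses the "Features" line that the AArch64 Linux kernel prints in /proc/cpuinfo, where names such as crc32, atomics, fphp and asimddp correspond to the HWCAP bits used above. This is a different technique from the one ART uses, the class and method names (CpuFeaturesDemo, parseFeatures) are hypothetical, and whether /proc/cpuinfo is readable from a given app or device is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Minimal sketch: reads CPU feature names from /proc/cpuinfo-style text.
final class CpuFeaturesDemo {
    // Hypothetical helper; returns the names on the first "Features" line,
    // or an empty set if there is no such line (for example on non-Arm hosts).
    static Set<String> parseFeatures(String cpuinfo) {
        for (String line : cpuinfo.split("\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equals("Features")) {
                String list = line.substring(colon + 1).trim();
                if (list.isEmpty()) {
                    return new HashSet<>();
                }
                return new HashSet<>(Arrays.asList(list.split("\\s+")));
            }
        }
        return new HashSet<>();
    }

    public static void main(String[] args) throws IOException {
        Path cpuinfo = Paths.get("/proc/cpuinfo");
        if (!Files.isReadable(cpuinfo)) {
            System.out.println("/proc/cpuinfo is not readable on this host");
            return;
        }
        String text = new String(Files.readAllBytes(cpuinfo), StandardCharsets.UTF_8);
        Set<String> features = parseFeatures(text);
        System.out.println("crc32:   " + features.contains("crc32"));
        System.out.println("atomics: " + features.contains("atomics"));
        System.out.println("fphp:    " + features.contains("fphp"));
        System.out.println("asimddp: " + features.contains("asimddp"));
    }
}
```

This is only useful for diagnostics; application code does not need it to benefit from the ART optimisations described earlier.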

For any questions about the Armv8.2 deployment on Android please contact developer@arm.com.

Haili Tian over 5 years ago

Xueliang,

You pointed out that it's already supported in AOSP. Which branch do you mean? It's still not available in the current pie release.

In https://android.googlesource.com/platform/build/soong/+/refs/heads/pie-release/cc/config/arm64_device.go, the option "-mcpu=" for cortex-a55 is not yet supported; "-mcpu=cortex-a53" is used instead.

Thanks!
Xueliang Zhong over 5 years ago in reply to Haili Tian

Hi,
The option "-mcpu=" for cortex-a55/cortex-a75 is supported in AOSP master, and should be included in Android Q preview branch, please refer to:

- https://android.googlesource.com/platform/build/soong/+/refs/heads/master/cc/config/arm64_device.go

- https://android.googlesource.com/platform/build/soong/+/refs/tags/android-q-preview-2.5/cc/config/arm64_device.go

Thanks