Develop software with the Arm Scalable Vector Extension (SVE) and Arm Fast Models

The Scalable Vector Extension (SVE) is an optional extension to the Armv8-A architecture. Because the vector length is scalable, implementations can target multiple performance and cost points. Prior to silicon availability, one way to learn SVE is to use the Arm HPC Compiler and Arm Fast Models to experiment with Linux applications compiled for SVE.

To demonstrate this, a Fixed Virtual Platform (FVP) can be used to run openSUSE, one of the Linux distributions used for HPC applications. FVPs are complete simulations of an Arm system, including processors, memories, and peripherals, and provide a way to start executing software without a physical system.

Put simply, a Linux distribution can be viewed as the combination of a Linux kernel and a file system with applications. These are the two primary components needed to achieve the goal of running SVE applications on the Tumbleweed release of openSUSE.

Because SVE is a recent technology, some extra work is required to create an environment capable of running SVE applications. Support for SVE was added to the Linux kernel during versions 4.15 and 4.16. As of this writing, 4.16 is the latest stable kernel and is used for this article. A kernel configuration option called CONFIG_ARM64_SVE, in the Armv8.2 section, was added to enable SVE support. It must be present and enabled in the kernel configuration for SVE applications to work. Without it, SVE applications will generate an unrecognized exception and fail.
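One way to check a kernel build for this option is to look for it in the generated configuration file. The following is a minimal sketch, assuming a plain-text kernel config file; the function name sve_enabled_in_config and the example path are made up for illustration.

```shell
# Hypothetical helper: succeed if the given kernel config file enables SVE.
sve_enabled_in_config() {
    # Only a built-in option (=y) matches; a line such as
    # "# CONFIG_ARM64_SVE is not set" does not.
    grep -q '^CONFIG_ARM64_SVE=y$' "$1"
}

# Example usage (the path to the config file is an assumption):
#   sve_enabled_in_config linux/.config && echo "SVE support is enabled"
```

If the option is missing, it can be turned on with the usual kernel configuration tools before rebuilding.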

There are multiple ways to attain the goal of running Linux SVE applications, but one way is to perform the following steps:

  • Download, build, and run the Linaro deliverables on the FVP to establish a working build and run environment
  • Update the Linux kernel to version 4.16 to get SVE support
  • Change the OpenEmbedded file system to the openSUSE Tumbleweed file system
  • Set up virtual networking, ssh, and scp access to the simulated system
  • Install the Arm HPC compiler and license
  • Compile and run SVE applications

The following sections outline how to perform each step and arrive at the goal of running SVE applications on openSUSE Tumbleweed. The steps were done on an Ubuntu 16.04 host. Other Linux hosts should work, but some workarounds will be necessary. About 25GB of disk space is needed to complete the process. The FVP is only available for Linux. With the Arm Fast Models product, the FVP can be changed, other virtual prototypes can be created, and Windows is supported.

Environment setup

A good first step is the software stack from Linaro. This provides confidence in the setup as this is a documented procedure. There are many different FVPs available. The one needed to run SVE applications is the Armv8-A Base Platform FVP with the Architectural Envelope Model (AEM) for Armv8-A. AEM is a generic CPU model of the Arm architecture which supports versions 8.0 to 8.4 but does not represent any specific CPU implementation. Find the Armv8-A FVP on the FVP page.

Extract the downloaded FVP. The executable for the FVP is in the models/Linux64_GCC-4.9 directory.

The only requirement to run the FVP using the Linaro scripting is to set the MODEL environment variable to point to the executable. I downloaded the FVP to my ~/Downloads/ directory, extracted it, and set the environment variable in bash using:

$ export MODEL=~/Downloads/Base_RevC_AEMv8A_pkg/models/Linux64_GCC-4.9/FVP_Base_RevC-2xAEMv8A
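Since the Linaro scripts depend on this variable, it can save time to confirm it before launching anything. This is a small sketch of such a check; the function name check_model is hypothetical and not part of the Linaro scripting.

```shell
# Hypothetical pre-flight check: MODEL must name an existing, executable file.
check_model() {
    if [ -z "${MODEL:-}" ]; then
        echo "MODEL is not set" >&2
        return 1
    fi
    # Reject directories, which also pass the -x test.
    if [ ! -x "$MODEL" ] || [ -d "$MODEL" ]; then
        echo "MODEL does not point to an executable: $MODEL" >&2
        return 1
    fi
    echo "Using FVP: $MODEL"
}

# Example usage:
#   check_model && ./model-scripts/run_model.sh output/fvp/fvp-oe/uboot
```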

The FVP does not require a FlexLM license but is subject to the included EULA.

The development platforms wiki on Arm community provides the instructions to set up the Linaro software stack. The path to follow is 64-bit Linaro with the latest-armlt kernel running the OpenEmbedded LAMP software. Although this is the wrong kernel and file system for the final goal, it provides all of the scripting to download the software components including software tools, UEFI, and u-boot. It is much easier to start from a working system and change the Linux kernel and file system than to start from scratch.

First, download the workspace_1801.py script and make sure python3 is installed. The workspace setup may flag missing packages which can be installed using apt-get. If this happens, install the missing packages and try again.

$ python3 workspace_1801.py

The answers to the script questions are 3, 1, 1, 2, 3, and y to create the workspace. This is a good time to take a break while all of the tools and software are downloaded. Once the setup is complete, a message will appear with instructions to build.

To build use:

$ build-scripts/build-all.sh all

This is another appropriate time to take a break while the compilation proceeds.

When the build is done, edit the file model-scripts/run_model.sh to specify the disk image. I did this around line 157. The DISK parameter should be a path to the .img file for OpenEmbedded. The model runs from a subdirectory four levels down which contains the other boot artifacts, so the path is relative to that directory. The complete run_model.sh is attached to the bottom of this article.

Here is the added line:

DISK=../../../../lt-vexpress64-openembedded_lamp-armv8-gcc-4.9_20150912-729.img

To run the FVP use:

$ ./model-scripts/run_model.sh output/fvp/fvp-oe/uboot

If everything works as expected, Linux should boot to the root prompt. This confirms UEFI, u-boot, the Linux kernel, device tree, and OpenEmbedded file system are all set up correctly. This serves as a good starting point to update the kernel from 4.14 to 4.16 for SVE support and then move to openSUSE.

Linaro OpenEmbedded boot

SVE support must be enabled in two places, the simulated Arm system and the Linux kernel.

SVE support in the FVP

To run SVE instructions, a plugin is required for the FVP. Without this plugin, the FVP will fail with illegal instructions when SVE is used. Edit model-scripts/run_model.sh and add the plugin to the launch command, around line 307.

--plugin /home/jasand01/Downloads/Base_RevC_AEMv8A_pkg/plugins/Linux64_GCC-4.9/ScalableVectorExtension.so \

The SVE plugin can be configured with various parameters, including the vector length (as a multiple of 64 bits). For example, to set the vector length to 128 bits use:

-C SVE.ScalableVectorExtension.veclen=2 \
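The arithmetic behind the parameter can be sketched as a small helper. It assumes only what is stated above (veclen counts 64-bit units) plus the architectural rule that SVE vector lengths run from 128 to 2048 bits in multiples of 128; the function name sve_veclen is hypothetical, and I have not confirmed that the plugin accepts every value in that range.

```shell
# Hypothetical helper: convert an SVE vector length in bits to the plugin's
# veclen value, described above as a multiple of 64 bits.
sve_veclen() {
    bits=$1
    # The SVE architecture allows 128 to 2048 bits in multiples of 128.
    if [ "$bits" -lt 128 ] || [ "$bits" -gt 2048 ] || [ $((bits % 128)) -ne 0 ]; then
        echo "invalid SVE vector length: $bits" >&2
        return 1
    fi
    echo $((bits / 64))
}

# Print the model option for a 512-bit vector length.
printf '%s\n' "-C SVE.ScalableVectorExtension.veclen=$(sve_veclen 512)"
# prints: -C SVE.ScalableVectorExtension.veclen=8
```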

Even with the plugin, SVE will still fail without proper Linux kernel support.

Update Linux to 4.16

To upgrade the Linux kernel, download the 4.16 kernel from kernel.org. There are multiple ways to do this, but I typically download the tarball and extract it using tar. For a seamless transition, rename the linux/ directory in the workspace created above to linux-4.14 and move the linux-4.16 directory to linux/ in place of the old source tree. Swapping the linux/ source directory alone is not enough because the mainline kernel doesn’t have the device tree files for the FVP. These can be copied from the 4.14 kernel tree:

$ cp linux-4.14/arch/arm64/boot/dts/arm/fvp-base* linux/arch/arm64/boot/dts/arm/ 

The Makefile in linux/arch/arm64/boot/dts/arm also needs to be edited to include the device tree blobs for the FVP. Add the third line shown here, which lists the two fvp-base* .dtb files. This will build the FVP device tree files for the new kernel.

dtb-$(CONFIG_ARCH_VEXPRESS) += \
        foundation-v8.dtb foundation-v8-psci.dtb \
        fvp-base-aemv8a-aemv8a.dtb fvp-base-aemv8a-aemv8a-t1.dtb \
        foundation-v8-gicv3.dtb foundation-v8-gicv3-psci.dtb

Once the 4.16 kernel source is in place, rebuild and confirm everything still works.

$ rm -rf output/ 
$ build-scripts/build-all.sh all

Run again using:

$ ./model-scripts/run_model.sh output/fvp/fvp-oe/uboot

Now uname -a should print a 4.16 kernel.

Linux 4.16 kernel
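The same check can be scripted on the simulated system. This is a rough sketch that compares only the major and minor numbers of a release string; the function name kernel_at_least is made up, and passing the version test does not by itself prove that CONFIG_ARM64_SVE is enabled.

```shell
# Hypothetical helper: succeed if a kernel release string such as "4.16.0"
# or "4.16.0-rc1" is at least the given major.minor version.
kernel_at_least() {
    want_major=${1%%.*}
    want_minor=${1#*.}
    release=$2
    have_major=${release%%.*}
    rest=${release#*.}
    # Keep only the leading digits of the minor field.
    have_minor=${rest%%[!0-9]*}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

if kernel_at_least 4.16 "$(uname -r)"; then
    echo "kernel version is 4.16 or later"
else
    echo "kernel version is older than 4.16"
fi
```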

Change to openSUSE

The next step is to change the file system to openSUSE Tumbleweed for Arm. I used the XFCE image, but other variations can be used. For this article the window manager is not important, and I ended up disabling it and running everything from the command line. After the file is downloaded, uncompress it using unxz:

$ unxz openSUSE-Tumbleweed-ARM-XFCE-efi.aarch64-2018.02.02-Build1.2.raw.xz

This is the openSUSE file system which will be used. Unfortunately, it has a couple of things which need attention: the partition table and the amount of free space. The partition layout is different from the OpenEmbedded .img file used above. The openSUSE file has three partitions instead of the two in the OpenEmbedded file. The other problem is that the free space is not enough to install the Arm HPC Compiler, so more space needs to be added.

One solution is to make a copy of the OpenEmbedded file, add more space, delete all of the data, and copy the data from the openSUSE file to the new copy of the OpenEmbedded file. The result is an openSUSE image with more space. This can be achieved using Linux utilities.

$ cp lt-vexpress64-openembedded_lamp-armv8-gcc-4.9_20150912-729.img sve.img
$ dd if=/dev/zero bs=1M count=10240 >> sve.img
$ sudo parted sve.img
GNU Parted 3.2
Using /home/jasand01/linaro/sve.img
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print                                                            
Model:  (file)
Disk /home/jasand01/linaro/sve.img: 14.0GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start   End     Size    Type     File system  Flags
 1      32.3kB  79.7MB  79.7MB  primary  fat16        boot, lba
 2      79.7MB  3221MB  3142MB  primary  ext4

(parted) resizepart 2 13900MB
(parted) print                                                            
Model:  (file)
Disk /home/jasand01/linaro/sve.img: 14.0GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start   End     Size    Type     File system  Flags
 1      32.3kB  79.7MB  79.7MB  primary  fat16        boot, lba
 2      79.7MB  13.9GB  13.8GB  primary  ext4

(parted) quit    

                 
$ fdisk -lu sve.img
Disk sve.img: 13 GiB, 13958643712 bytes, 27262976 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x517cf287

Device     Boot  Start      End  Sectors  Size Id Type
sve.img1   *        63   155646   155584   76M  e W95 FAT16 (LBA)
sve.img2        155648 26291015 26135368 12.5G 83 Linux

Multiply the start sector of partition 2 (155648) by 512 to get the byte offset of the partition, then mount it on /mnt on the host machine. Once mounted, resize the file system so it uses the available space.

$ sudo mount -o loop,offset=79691776 sve.img /mnt

$ ls /mnt
./   bin/   dev/  etc/   lib/         media/  opt/   run/   sys/  usr/
../  boot/  EFI/  home/  lost+found/  mnt/    proc/  sbin/  tmp@  var/

$ df /mnt
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/loop0       2954128 1903696    880656  69% /mnt

$ sudo resize2fs /dev/loop0 12500M

$ df /mnt
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/loop0      12537600 1906688  10074980  16% /mnt
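The offset passed to mount above is simply the start sector multiplied by the sector size. A tiny helper makes the arithmetic explicit; the name partition_offset is hypothetical, and the 512-byte default matches the sector size reported by fdisk.

```shell
# Hypothetical helper: byte offset of a partition from its start sector.
# The sector size defaults to 512 bytes.
partition_offset() {
    echo $(( $1 * ${2:-512} ))
}

partition_offset 155648     # partition 2 of sve.img, prints 79691776
partition_offset 1128448    # partition 3 of the openSUSE image, prints 577765376
```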

Now mount the openSUSE file on a new directory, /mnt2, using the same approach of multiplying the start of partition 3 (1128448) by 512.

$ sudo mkdir /mnt2
$ fdisk -lu openSUSE-Tumbleweed-ARM-XFCE-efi.aarch64-2018.02.02-Build1.2.raw
Disk openSUSE-Tumbleweed-ARM-XFCE-efi.aarch64-2018.02.02-Build1.2.raw: 5.5 GiB, 5891948544 bytes, 11507712 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 2B7A8EB5-992C-45DE-9806-EC8C7D84759F

Device                                                              Start      End  Sectors  Size Type
openSUSE-Tumbleweed-ARM-XFCE-efi.aarch64-2018.02.02-Build1.2.raw1    2048   411651   409604  200M EFI System
openSUSE-Tumbleweed-ARM-XFCE-efi.aarch64-2018.02.02-Build1.2.raw2  413696  1126403   712708  348M Linux file
openSUSE-Tumbleweed-ARM-XFCE-efi.aarch64-2018.02.02-Build1.2.raw3 1128448 11507584 10379137    5G Linux file

$ sudo mount -o loop,offset=577765376 openSUSE-Tumbleweed-ARM-XFCE-efi.aarch64-2018.02.02-Build1.2.raw /mnt2
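To avoid copying sector numbers by hand, the start sector can be pulled out of the fdisk listing. This is a sketch only: fdisk_offset is a made-up name, and it assumes 512-byte sectors and a device column that ends in the partition number, as in the two listings shown in this article.

```shell
# Hypothetical helper: read `fdisk -lu` output on stdin and print the byte
# offset of partition number $1 (assumes 512-byte sectors).
fdisk_offset() {
    awk -v n="$1" '
        $1 ~ ("[^0-9]" n "$") {
            # Skip the boot flag column ("*") used by MS-DOS partition tables.
            start = ($2 == "*") ? $3 : $2
            printf "%.0f\n", start * 512
        }'
}

# Example usage:
#   fdisk -lu sve.img | fdisk_offset 2
```

The result can be passed directly to mount as the offset= value.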

Remove the OpenEmbedded data and replace it with the openSUSE file system:

$ cd /mnt
$ sudo rm -rf *
$ sudo cp -r -p /mnt2/* .
$ cd
$ sudo umount /mnt2 /mnt

Now test the new openSUSE file system by editing run_model.sh to change the DISK image from the OpenEmbedded .img file to the newly created sve.img. This is at line 159 of the attached run_model.sh file.

$ ./model-scripts/run_model.sh output/fvp/fvp-oe/uboot

The first boot takes more time, but subsequent boots will be faster. The time to boot also depends on the size of the root file system. Adding less extra space will result in a faster boot, so feel free to adjust the numbers. At the end of the process there was still 4GB of free space, so adding only 6-7GB would be enough to complete the process.

The root password is linux. The file system is now openSUSE.

OpenSUSE boot

After logging in, I recommend disabling the graphical login. This can be done with the command:

# update-alternatives --config default-displaymanager

Then select 1 for console. On subsequent boots the graphical login will be disabled. The “FAILED” message for the display manager is shown in the terminal above. This is expected when the graphical desktop is disabled.

Set up networking, ssh, and scp

Before installing the HPC Compiler, the virtual networking should be set up. This is done by making some modifications to the model-scripts/run_model.sh file.

A new line with NET=1 is needed at line 156 to enable the networking support.

There are also two parameters to pass to the FVP to enable user mode networking and port redirection for ssh and scp. These are at line 263.

-C bp.hostbridge.userNetworking=true \
-C bp.hostbridge.userNetPorts="8022=22" \

Now it’s possible to ssh to the FVP from the host machine using the redirected port 8022 instead of the default port 22:

$ ssh -p 8022 root@localhost 
$ scp -P 8022 somefile root@localhost:~/

To enable ssh and scp without a password, use the following on the host machine:

$ ssh-keygen
$ ssh-copy-id -i ~/.ssh/id_rsa.pub  -p 8022 root@localhost
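Typing the port on every command gets tedious, so a host alias in the ssh client configuration can help. The sketch below prints a stanza using standard ssh_config keywords; the alias fvp and the function name fvp_ssh_stanza are arbitrary choices of mine, and port 8022 matches the userNetPorts redirection above.

```shell
# Hypothetical helper: print an ssh_config stanza for the simulated system.
# $1 is the host alias (default "fvp"), $2 the redirected port (default 8022).
fvp_ssh_stanza() {
    cat <<EOF
Host ${1:-fvp}
    HostName localhost
    Port ${2:-8022}
    User root
EOF
}

# Show the stanza; append it to ~/.ssh/config to make it permanent:
#   fvp_ssh_stanza >> ~/.ssh/config
fvp_ssh_stanza
```

With the stanza in place, "ssh fvp" and "scp somefile fvp:" reach the FVP without the -p and -P options.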

With virtual networking enabled and working, the Arm HPC Compiler can be installed and used.

Download the Arm HPC Compiler

Download the Arm HPC Compiler by following the instructions on Arm Developer. The file to download is the one for SUSE 12; the current version is 18.2. The file name is ARM-Compiler-for-HPC-eval_18.2_AArch64_SUSE_12_aarch64.tar.gz

To install the HPC Compiler, mount sve.img on /mnt so the HPC Compiler can be added to the file system directly from the host machine. This is faster than copying it using ssh.

$ sudo mount -o loop,offset=79691776 sve.img /mnt
$ cd /mnt/home/
$ sudo mkdir hpc
$ cd hpc
$ sudo tar xvfz ~/Downloads/ARM-Compiler-for-HPC-eval_18.2_AArch64_SUSE_12_aarch64.tar.gz
$ cd
$ sudo umount /mnt

Run again with the same script as before:

$ ./model-scripts/run_model.sh output/fvp/fvp-oe/uboot

When the system boots, log in and run the installation script for the compiler.

# cd /home/hpc/ARM-Compiler-for-HPC-eval_18.2_AArch64_SUSE_12_aarch64/
# ./arm-compiler-for-hpc-18.2_eval_Generic-AArch64_SUSE-12_aarch64-linux-rpm.sh

Accept the license agreement and wait for the installation to complete. This can take quite some time as there are multiple packages being installed and the installer is over 1GB.

When the HPC Compiler installation is complete, install two additional packages. The environment-modules package is used to configure the HPC Compiler setup, and the devel_basis pattern contains general software development tools.

# zypper install environment-modules
# zypper install --type pattern devel_basis

Set up the modules environment by adding the path to the HPC Compiler modules to the MODULEPATH environment variable. This can be added to ~/.profile or ~/.bashrc to be set up automatically.

# export MODULEPATH=$MODULEPATH:/opt/arm/modulefiles/
# module load Generic-AArch64/SUSE/12/suites/arm-compiler-for-hpc_eval/18.2
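When the export line lives in ~/.bashrc, every nested shell appends the same directory again. A small guard avoids the duplicates; this is a sketch, and add_modulepath is a hypothetical name rather than part of the modules package.

```shell
# Hypothetical helper for ~/.bashrc: add a directory to MODULEPATH only once.
add_modulepath() {
    case ":${MODULEPATH:-}:" in
        *":$1:"*) ;;    # already present, nothing to do
        *) MODULEPATH="${MODULEPATH:+$MODULEPATH:}$1" ;;
    esac
    export MODULEPATH
}

add_modulepath /opt/arm/modulefiles
add_modulepath /opt/arm/modulefiles    # second call leaves MODULEPATH unchanged
echo "$MODULEPATH"
```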

The HPC Compiler requires a license file to work. A free trial is available; fill out the form and a license will be e-mailed.

Copy the file to /opt/arm/licenses/License.bin

$ ssh -p 8022 root@localhost mkdir /opt/arm/licenses
$ scp -P 8022 License.bin root@localhost:/opt/arm/licenses/

Everything should now be set up to compile and run an SVE application.

Create an example SVE program

To try out the SVE instructions use the example program. Copy example.c using ssh or paste it into the terminal. Compile and run as shown on the example page. If all goes well, a table of values should be printed in the terminal. 

# armclang -O3 -march=armv8-a+sve -o example example.c
# ./example

i       a[i]    b[i]    c[i]
=============================
0       197     283     86
1       262     277     15
2       258     293     35
3       194     286     92
4       228     249     21
5       235     262     27
6       231     290     59
7       237     263     26
8       214     240     26
9       236     272     36
10      143     211     68

To confirm SVE, use the disassembler and look at the function subtract_arrays().

#  armllvm-objdump -disassemble -mattr=+sve example > example.dis
# less example.dis 
/subtract_arrays
subtract_arrays:
  400604:	e9 03 16 32 	orr	w9, wzr, #0x400
  400608:	e8 03 1f aa 	mov	x8, xzr
  40060c:	e0 1f a9 25 	whilelo	p0.s, xzr, x9
  400610:	20 40 48 a5 	ld1w	{z0.s}, p0/z, [x1, x8, lsl #2]
  400614:	41 40 48 a5 	ld1w	{z1.s}, p0/z, [x2, x8, lsl #2]
  400618:	00 04 a1 04 	sub	z0.s, z0.s, z1.s
  40061c:	00 40 48 e5 	st1w	{z0.s}, p0, [x0, x8, lsl #2]
  400620:	e8 e3 b0 04 	incw	x8
  400624:	00 1d a9 25 	whilelo	p0.s, x8, x9
  400628:	44 ff ff 54 	b.mi	#-24
  40062c:	c0 03 5f d6 	ret
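For a quick check that does not involve reading the listing by eye, the disassembly can be searched for SVE register operands. This is a rough heuristic of my own, not an official tool: count_sve_lines is a hypothetical name, and it only counts lines that mention a vector (z) or predicate (p) register with an element-size suffix, so an instruction such as incw x8 is not counted.

```shell
# Hypothetical helper: count disassembly lines that reference SVE vector (z)
# or predicate (p) registers with an element-size suffix, e.g. z0.s or p0.s.
count_sve_lines() {
    grep -cE '(^|[^[:alnum:]_])[zp][0-9]+\.[bhsd]' "$1"
}

# Example usage:
#   count_sve_lines example.dis
```

A count of zero suggests the compiler did not emit SVE code, which usually means the -march=armv8-a+sve option was missing from the compile command.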

Summary

Linux applications utilizing SVE can be run today using a Fixed Virtual Platform (FVP) which utilizes the Arm Architectural Envelope Model (AEM) for Armv8-A. The FVP contains a plugin to enable SVE and set parameters such as the vector length. The FVP can run various Linux distributions including the current openSUSE Tumbleweed release as long as the Linux kernel is new enough to support SVE. Using Arm Fast Models before the availability of Arm systems supporting SVE provides needed flexibility for many early software porting tasks. If an Arm AArch64 target system is available, the Arm Instruction Emulator is another way to run SVE. 

Arm Fast Models cover a wide range of different hardware systems and are used for many software development tasks. Fast Models can be tried by requesting an evaluation license using the button below. 

Fast Models Downloads

run_model.sh