The Scalable Vector Extension (SVE) is an optional extension to the Armv8-A architecture. Each implementation chooses its own vector length rather than having it fixed by the architecture, so SVE suits processors at multiple performance and cost points. Before silicon is available, one way to learn SVE is to use the Arm HPC Compiler and Arm Fast Models to experiment with Linux applications compiled for SVE.
To demonstrate this, a Fixed Virtual Platform (FVP) can be used to run openSUSE, one of the Linux distributions used for HPC applications. FVPs are complete simulations of an Arm system, including processors, memories, and peripherals, and provide a way to start executing software without a physical system.
As a simplification, a Linux distribution can be viewed as the combination of a Linux kernel and a file system with applications. These are the two primary components needed to run SVE applications on the Tumbleweed release of openSUSE.
Because SVE is a recent technology, some extra work is required to create an environment capable of running SVE applications. Support for SVE was added to the Linux kernel in versions 4.15 and 4.16. As of this writing, 4.16 is the latest stable kernel, and it is used for this article. A kernel configuration option, CONFIG_ARM64_SVE, was added in the Armv8.2 section to enable SVE support. It must be present and enabled in the kernel configuration for SVE applications to work. Without it, SVE applications generate an unrecognized exception and fail.
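Since a missing CONFIG_ARM64_SVE only shows up later as a failing application, it can be worth checking the kernel configuration file before booting. The function below is a minimal, hypothetical helper (not part of any Arm or Linaro tooling). It only looks for the standard Kconfig line `CONFIG_ARM64_SVE=y`. Where the generated `.config` lives depends on how the workspace builds the kernel, so the path in the example comment is an assumption.

```shell
# Succeed (exit status 0) if the kernel configuration file given as $1
# enables SVE support, i.e. contains the exact line CONFIG_ARM64_SVE=y.
# A disabled option appears as "# CONFIG_ARM64_SVE is not set" and fails.
has_sve_config() {
    local config=$1
    grep -qx 'CONFIG_ARM64_SVE=y' "$config"
}

# Example (hypothetical path; adjust for your workspace):
#   has_sve_config linux/.config && echo "SVE enabled" || echo "SVE missing"
```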
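There are multiple ways to run Linux SVE applications. One way is to perform the following steps:
1. Run the Linaro software stack with the OpenEmbedded file system on the Armv8-A Base Platform FVP.
2. Enable SVE in the FVP and upgrade the Linux kernel to 4.16.
3. Replace the OpenEmbedded file system with openSUSE Tumbleweed.
4. Set up virtual networking between the host and the FVP.
5. Install the Arm HPC Compiler.
6. Compile and run an SVE application.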
The following sections outline how to perform each step and arrive at the goal of running SVE applications on openSUSE Tumbleweed. These steps were carried out on an Ubuntu 16.04 host. Other Linux hosts are possible, but some workarounds will be necessary. About 25GB of disk space is needed to complete the process. The FVP used here is only available for Linux. With the Arm Fast Models product, the FVP can be modified, other virtual prototypes can be created, and Windows is supported.
A good first step is to run the software stack from Linaro. Because this is a documented procedure, it provides confidence in the setup. There are many different FVPs available. The one needed to run SVE applications is the Armv8-A Base Platform FVP with the Architectural Envelope Model (AEM) for Armv8-A. AEM is a generic CPU model of the Arm architecture that supports Armv8.0 through Armv8.4 but does not represent any specific CPU implementation. Find the Armv8-A FVP on the FVP page.
Extract the downloaded FVP. The executable for the FVP is in the models/Linux64_GCC-4.9 directory.
The only requirement to run the FVP using the Linaro scripting is to set the MODEL environment variable to point to the executable. I downloaded the FVP to my ~/Downloads/ directory, extracted it, and set the environment variable in bash using:
$ export MODEL=~/Downloads/Base_RevC_AEMv8A_pkg/models/Linux64_GCC-4.9/FVP_Base_RevC-2xAEMv8A
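A quick sanity check can catch a mistyped path before the Linaro scripts try to launch the model. The function below is a hypothetical helper, not part of the Linaro scripting. It only checks that MODEL is set and names an executable regular file.

```shell
# Verify that MODEL names an executable FVP binary.
# Prints the path on success; prints a reason to stderr and returns 1 otherwise.
check_model() {
    if [ -z "${MODEL:-}" ]; then
        echo "MODEL is not set" >&2
        return 1
    fi
    # -f rejects directories, which would otherwise pass the -x test.
    if [ ! -f "$MODEL" ] || [ ! -x "$MODEL" ]; then
        echo "MODEL is not an executable file: $MODEL" >&2
        return 1
    fi
    echo "Using FVP: $MODEL"
}
```

Running `check_model` right after the export either prints the FVP path or explains what is wrong with it.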
The FVP does not require a FlexLM license but is subject to the included EULA.
The development platforms wiki on the Arm Community site provides the instructions to set up the Linaro software stack. The path to follow is 64-bit Linaro with the latest-armlt kernel running the OpenEmbedded LAMP software. Although this is the wrong kernel and file system for the final goal, it provides all of the scripting to download the software components, including software tools, UEFI, and u-boot. It is much easier to start from a working system and change the Linux kernel and file system than to start from scratch.
First, download the workspace_1801.py script and make sure python3 is installed. The workspace setup may flag missing packages which can be installed using apt-get. If this happens, install the missing packages and try again.
$ python3 workspace_1801.py
The answers to the script questions are 3, 1, 1, 2, 3, and y to create the workspace. This is a good time to take a break while all of the tools and software are downloaded. Once the setup is complete, a message appears with instructions to build.
To build use:
$ build-scripts/build-all.sh all
This is another appropriate time to take a break while the compilation proceeds.
When the build is done, edit the file model-scripts/run_model.sh to specify the disk image. I did this around line 157. The DISK parameter should be a path to the .img file for OpenEmbedded. The model runs relative to a subdirectory four levels down (output/fvp/fvp-oe/uboot), which contains the other boot artifacts, so the path starts with ../../../../. The complete run_model.sh is attached to the bottom of this article.
Here is the added line:
DISK=../../../../lt-vexpress64-openembedded_lamp-armv8-gcc-4.9_20150912-729.img
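The ../../../../ prefix comes from that four-level directory. Instead of counting levels by hand, GNU realpath can compute the relative path. disk_path_for is a hypothetical helper. It assumes it is run from the workspace root and that output/fvp/fvp-oe/uboot is the boot-artifact directory, as passed to run_model.sh in this article.

```shell
# Print an image path relative to the boot-artifact directory, suitable for
# the DISK= line in run_model.sh. Run from the workspace root.
# -m (--canonicalize-missing) lets realpath work even if a path does not exist yet.
disk_path_for() {
    local image=$1 bootdir=${2:-output/fvp/fvp-oe/uboot}
    realpath -m --relative-to="$bootdir" "$image"
}

# Example:
#   echo "DISK=$(disk_path_for lt-vexpress64-openembedded_lamp-armv8-gcc-4.9_20150912-729.img)"
```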
To run the FVP use:
$ ./model-scripts/run_model.sh output/fvp/fvp-oe/uboot
If everything works as expected, Linux should boot to the root prompt. This confirms that UEFI, u-boot, the Linux kernel, the device tree, and the OpenEmbedded file system are all set up correctly. This serves as a good starting point to update the kernel from 4.14 to 4.16 for SVE support and then move to openSUSE.
SVE support must be enabled in two places: the simulated Arm system and the Linux kernel.
To run SVE instructions, the FVP requires a plugin. Without this plugin, the FVP fails with illegal instruction errors when SVE is used. Edit model-scripts/run_model.sh and add the plugin to the launch command, around line 307:
--plugin /home/jasand01/Downloads/Base_RevC_AEMv8A_pkg/plugins/Linux64_GCC-4.9/ScalableVectorExtension.so \
The SVE plugin can be configured with various parameters, including the vector length, which is specified in units of 64 bits. For example, to set the vector length to 128 bits, use:
-C SVE.ScalableVectorExtension.veclen=2 \
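Because veclen counts 64-bit units, choosing a vector length means dividing the number of bits by 64. The helper below is a hypothetical sketch that also rejects lengths the SVE architecture does not allow. It assumes the rule that an SVE vector length is a multiple of 128 bits between 128 and 2048 bits.

```shell
# Convert a desired SVE vector length in bits to the plugin's veclen value,
# which counts 64-bit units (veclen=2 gives 128 bits).
# Only multiples of 128 bits from 128 to 2048 are accepted.
sve_veclen() {
    local bits=$1
    case $bits in
        ''|*[!0-9]*) echo "not a number: $bits" >&2; return 1 ;;
    esac
    if [ "$bits" -lt 128 ] || [ "$bits" -gt 2048 ] || [ $((bits % 128)) -ne 0 ]; then
        echo "invalid SVE vector length: $bits" >&2
        return 1
    fi
    echo $((bits / 64))
}

# Example: the launch option for 512-bit vectors.
#   echo "-C SVE.ScalableVectorExtension.veclen=$(sve_veclen 512)"
```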
Even with the plugin, SVE still fails without proper Linux kernel support.
To upgrade the Linux kernel, download the 4.16 kernel from kernel.org. There are multiple ways to do this, but I typically download the tarball and extract it using tar. For a seamless transition, rename the linux/ directory in the workspace created above to linux-4.14, and rename the linux-4.16 directory to linux/ in place of the old source tree. Swapping the linux/ source directory alone is not enough, because the mainline kernel doesn’t have the device tree files for the FVP. These can be copied from the 4.14 kernel tree:
$ cp linux-4.14/arch/arm64/boot/dts/arm/fvp-base* linux/arch/arm64/boot/dts/arm/
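The extract, rename, and copy steps can be collected into one function. swap_kernel_tree is a hypothetical sketch meant to be run from the workspace root. It assumes the kernel.org tarball (for example linux-4.16.tar.xz) has already been downloaded and extracts to a directory named linux-4.16. It stops if linux-4.14 already exists, so an earlier swap is not overwritten.

```shell
# Replace the workspace's 4.14 kernel tree with an extracted mainline tree and
# copy over the FVP device tree sources that mainline lacks.
# Usage: swap_kernel_tree TARBALL [EXTRACTED_DIR]   (default dir: linux-4.16)
swap_kernel_tree() {
    local tarball=$1 newdir=${2:-linux-4.16}
    local dts=arch/arm64/boot/dts/arm
    [ -d linux ] && [ ! -e linux-4.14 ] || { echo "unexpected workspace layout" >&2; return 1; }
    # GNU tar detects xz/gzip compression automatically when reading a file.
    tar -xf "$tarball" || return 1
    mv linux linux-4.14 && mv "$newdir" linux || return 1
    cp linux-4.14/"$dts"/fvp-base* linux/"$dts"/
}

# Example:
#   swap_kernel_tree linux-4.16.tar.xz
```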
The Makefile in linux/arch/arm64/boot/dts/arm also needs to be edited to include the device tree blobs for the FVP. Add the third line, which lists the two fvp-base* .dtb files. This builds the FVP device tree files for the new kernel.
dtb-$(CONFIG_ARCH_VEXPRESS) += \
	foundation-v8.dtb foundation-v8-psci.dtb \
	fvp-base-aemv8a-aemv8a.dtb fvp-base-aemv8a-aemv8a-t1.dtb \
	foundation-v8-gicv3.dtb foundation-v8-gicv3-psci.dtb
Once the 4.16 kernel source is in place, rebuild and confirm everything still works.
$ rm -rf output/
$ build-scripts/build-all.sh all
Run the FVP again using:
$ ./model-scripts/run_model.sh output/fvp/fvp-oe/uboot
Now uname -a should print a 4.16 kernel.
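The version check can also be scripted, for example in an automated boot test. version_ge is a hypothetical helper that assumes GNU sort with version ordering (-V). A pre-release such as 4.16-rc1 sorts after 4.16 and would also pass.

```shell
# Succeed if version $1 is at least version $2 (GNU "sort -V" ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example, on the FVP:
#   version_ge "$(uname -r)" 4.16 && echo "kernel is new enough for SVE"
```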
The next step is to change the file system to openSUSE Tumbleweed for Arm. I used the XFCE image, but other variations can be used. For this article the window manager is not important, and I ended up disabling it and running everything from the command line. After the file is downloaded, uncompress it using unxz:
$ unxz openSUSE-Tumbleweed-ARM-XFCE-efi.aarch64-2018.02.02-Build1.2.raw.xz
This is the openSUSE file system that will be used. Unfortunately, two things need attention: the partition table and the amount of free space. The partition layout differs from the OpenEmbedded .img file used above: the openSUSE file has three partitions, whereas the OpenEmbedded file has two. In addition, there is not enough free space to install the Arm HPC Compiler, so more space needs to be added.
One solution is to make a copy of the OpenEmbedded file, add more space, delete all of the data, and copy the data from the openSUSE file to the new copy of the OpenEmbedded file. The result is an openSUSE image with more space. This can be achieved using Linux utilities.
$ cp lt-vexpress64-openembedded_lamp-armv8-gcc-4.9_20150912-729.img sve.img
$ dd if=/dev/zero bs=1M count=10240 >> sve.img
$ sudo parted sve.img
GNU Parted 3.2
Using /home/jasand01/linaro/sve.img
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: (file)
Disk /home/jasand01/linaro/sve.img: 14.0GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type     File system  Flags
 1      32.3kB  79.7MB  79.7MB  primary  fat16        boot, lba
 2      79.7MB  3221MB  3142MB  primary  ext4

(parted) resizepart 2 13900MB
(parted) print
Model: (file)
Disk /home/jasand01/linaro/sve.img: 14.0GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type     File system  Flags
 1      32.3kB  79.7MB  79.7MB  primary  fat16        boot, lba
 2      79.7MB  13.9GB  13.8GB  primary  ext4

(parted) quit
$ fdisk -lu sve.img
Disk sve.img: 13 GiB, 13958643712 bytes, 27262976 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x517cf287

Device    Boot  Start      End  Sectors  Size Id Type
sve.img1  *        63   155646   155584   76M  e W95 FAT16 (LBA)
sve.img2       155648 26291015 26135368 12.5G 83 Linux
Multiply the start sector of partition 2 (155648) by the 512-byte sector size to get the mount offset (79691776), and mount the partition on /mnt on the host machine. Once mounted, resize the file system so it uses the available space.
$ sudo mount -o loop,offset=79691776 sve.img /mnt
$ ls /mnt
./   bin/   dev/  etc/   lib/         media/  opt/   run/   sys/  usr/
../  boot/  EFI/  home/  lost+found/  mnt/    proc/  sbin/  tmp@  var/
$ df /mnt
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/loop0       2954128 1903696    880656  69% /mnt
$ sudo resize2fs /dev/loop0 12500M
$ df /mnt
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/loop0      12537600 1906688  10074980  16% /mnt
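The offset arithmetic is simple, but a small helper avoids typos when the numbers are copied by hand. partition_offset is a hypothetical name. It assumes 512-byte sectors unless told otherwise, which matches the fdisk output for both images in this article.

```shell
# Byte offset of a partition within a raw disk image, for "mount -o loop,offset=...".
# $1 is the start sector printed by "fdisk -lu"; $2 is the sector size in bytes
# (default 512).
partition_offset() {
    local start=$1 sector_size=${2:-512}
    echo $((start * sector_size))
}

# Example:
#   sudo mount -o loop,offset="$(partition_offset 155648)" sve.img /mnt
```

With these inputs, `partition_offset 155648` prints 79691776, the offset used above, and `partition_offset 1128448` prints 577765376 for the openSUSE root partition.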
Now mount the openSUSE root partition on a new directory, /mnt2, using the same approach: multiply the start sector of partition 3 (1128448) by 512 to get the offset (577765376).
$ sudo mkdir /mnt2
$ fdisk -lu openSUSE-Tumbleweed-ARM-XFCE-efi.aarch64-2018.02.02-Build1.2.raw
Disk openSUSE-Tumbleweed-ARM-XFCE-efi.aarch64-2018.02.02-Build1.2.raw: 5.5 GiB, 5891948544 bytes, 11507712 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 2B7A8EB5-992C-45DE-9806-EC8C7D84759F

Device                                                               Start      End  Sectors  Size Type
openSUSE-Tumbleweed-ARM-XFCE-efi.aarch64-2018.02.02-Build1.2.raw1     2048   411651   409604  200M EFI System
openSUSE-Tumbleweed-ARM-XFCE-efi.aarch64-2018.02.02-Build1.2.raw2   413696  1126403   712708  348M Linux file
openSUSE-Tumbleweed-ARM-XFCE-efi.aarch64-2018.02.02-Build1.2.raw3  1128448 11507584 10379137    5G Linux file
$ sudo mount -o loop,offset=577765376 openSUSE-Tumbleweed-ARM-XFCE-efi.aarch64-2018.02.02-Build1.2.raw /mnt2
Remove the OpenEmbedded data and replace it with the openSUSE file system:
$ cd /mnt
$ sudo rm -rf *
$ sudo cp -r -p /mnt2/* .
$ cd
$ sudo umount /mnt2 /mnt
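The `*` glob in these commands skips hidden entries at the top level of a file system, and `cp -r -p` does not preserve hard links or extended attributes. If that matters for your image, the sketch below uses `cp -a` with the `src/.` form instead. replace_rootfs is a hypothetical helper. It assumes GNU cp and find, and it refuses to operate on `/`. On the real images it must run as root, for example from a `sudo -s` shell.

```shell
# Replace the contents of the mounted target file system ($2) with the
# contents of the mounted source file system ($1).
replace_rootfs() {
    local src=$1 dst=$2
    case $dst in
        /|'') echo "refusing to replace '$dst'" >&2; return 1 ;;
    esac
    [ -d "$src" ] && [ -d "$dst" ] || return 1
    # Delete everything below dst, including hidden entries, but keep dst itself.
    find "$dst" -mindepth 1 -delete || return 1
    # "src/." copies the directory's contents, hidden entries included; -a keeps
    # symlinks, hard links, ownership, permissions, timestamps, and xattrs where possible.
    cp -a "$src/." "$dst/"
}

# Example (as root):
#   replace_rootfs /mnt2 /mnt
```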
Now test the new openSUSE file system: edit run_model.sh and change the DISK image from the OpenEmbedded .img file to the newly created sve.img. This is at line 159 of the attached run_model.sh file.
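Editing by line number is fragile if the script changes. The sketch below instead rewrites the existing DISK= assignment wherever it is. set_disk_image is a hypothetical helper. It assumes GNU sed (for -i), an assignment that starts at the beginning of a line, and an image path without `|` or `&` characters, which sed would treat specially.

```shell
# Point the DISK variable in run_model.sh ($1) at a different image ($2)
# by rewriting the existing "DISK=" line.
set_disk_image() {
    local script=$1 image=$2
    grep -q '^DISK=' "$script" || { echo "no DISK= line in $script" >&2; return 1; }
    sed -i "s|^DISK=.*|DISK=$image|" "$script"
}

# Example, from the workspace root:
#   set_disk_image model-scripts/run_model.sh ../../../../sve.img
```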
The first boot takes longer, but subsequent boots are faster. Boot time also depends on the size of the root file system. Adding less extra space results in a faster boot, so feel free to adjust the numbers. At the end of the process there was still 4GB of free space, so adding only 6-7GB would be enough to complete the process.
The root password is linux. The file system is now openSUSE.
After logging in, I recommend disabling the graphical login. This can be done with the command:
# update-alternatives --config default-displaymanager
Then select 1 for console. This disables the graphical login on subsequent boots. A “FAILED” message for the display manager during boot is expected when the graphical desktop is disabled.
Before installing the HPC Compiler, set up virtual networking. This is done by making some modifications to the model-scripts/run_model.sh file.
A new line with NET=1 is needed at line 156 to enable the networking support.
There are also two parameters to pass to the FVP to enable user mode networking and port redirection for ssh and scp. These are at line 263.
-C bp.hostbridge.userNetworking=true \
-C bp.hostbridge.userNetPorts="8022=22" \
Now it’s possible to ssh to the FVP from the host machine using the redirected port 8022 instead of the default port 22:
$ ssh -p 8022 root@localhost
$ scp -P 8022 somefile root@localhost:~/
To use ssh and scp without a password, run the following on the host machine:
$ ssh-keygen
$ ssh-copy-id -i ~/.ssh/id_rsa.pub -p 8022 root@localhost
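As an optional convenience, an alias in the ssh client configuration saves typing the port and user each time. The entry name fvp is an arbitrary choice, and add_fvp_ssh_alias is a hypothetical helper. HostName, Port, and User are standard ssh_config keywords.

```shell
# Append a "Host fvp" entry to an ssh client config (default ~/.ssh/config)
# so that "ssh fvp" reaches the FVP as root through the redirected port 8022.
# Does nothing if the entry already exists, so it is safe to rerun.
add_fvp_ssh_alias() {
    local config=${1:-$HOME/.ssh/config}
    mkdir -p "$(dirname "$config")" || return 1
    if grep -qx 'Host fvp' "$config" 2>/dev/null; then
        return 0
    fi
    cat >> "$config" <<'EOF'
Host fvp
    HostName localhost
    Port 8022
    User root
EOF
    # ssh rejects a config file that other users can write to.
    chmod 600 "$config"
}
```

With the entry in place, `ssh fvp` behaves like `ssh -p 8022 root@localhost`, and `scp somefile fvp:~/` replaces the longer scp command.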
With virtual networking enabled and working, the Arm HPC Compiler can be installed and used.
Download the Arm HPC Compiler by following the instructions on Arm Developer. The file to download is the one for SUSE 12, and the current version is 18.2. The file name is ARM-Compiler-for-HPC-eval_18.2_AArch64_SUSE_12_aarch64.tar.gz
To install the HPC Compiler, mount sve.img on /mnt so the compiler package can be added to the file system directly from the host machine. This is faster than copying it using ssh.
$ sudo mount -o loop,offset=79691776 sve.img /mnt
$ cd /mnt/home/
$ sudo mkdir hpc
$ cd hpc
$ sudo tar xvfz ~/Downloads/ARM-Compiler-for-HPC-eval_18.2_AArch64_SUSE_12_aarch64.tar.gz
$ cd
$ sudo umount /mnt
Run the FVP again with the same script as before:
$ ./model-scripts/run_model.sh output/fvp/fvp-oe/uboot
When the system boots, log in and run the installation script for the compiler.
# cd /home/hpc/ARM-Compiler-for-HPC-eval_18.2_AArch64_SUSE_12_aarch64/
# ./arm-compiler-for-hpc-18.2_eval_Generic-AArch64_SUSE-12_aarch64-linux-rpm.sh
Accept the license agreement and wait for the installation to complete. This can take quite some time as there are multiple packages being installed and the installer is over 1GB.
When the HPC Compiler installation completes, install two additional packages. The “modules” package (environment-modules) is used to configure the HPC Compiler setup, and the “devel_basis” pattern contains general software development tools.
# zypper install environment-modules
# zypper install --type pattern devel_basis
Set up the modules environment by adding the path to the HPC Compiler modules to the MODULEPATH environment variable. These commands can be added to ~/.profile or ~/.bashrc so they run automatically.
# export MODULEPATH=$MODULEPATH:/opt/arm/modulefiles/
# module load Generic-AArch64/SUSE/12/suites/arm-compiler-for-hpc_eval/18.2
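To make the module setup persistent, the two commands can be appended to ~/.bashrc on the FVP. The sketch below uses a hypothetical append_once helper so that rerunning it does not duplicate lines. It assumes the module command is available in new shells once environment-modules is installed.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
    local file=$1 line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# On the FVP (single quotes keep $MODULEPATH unexpanded until login):
#   append_once ~/.bashrc 'export MODULEPATH=$MODULEPATH:/opt/arm/modulefiles/'
#   append_once ~/.bashrc 'module load Generic-AArch64/SUSE/12/suites/arm-compiler-for-hpc_eval/18.2'
```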
The HPC Compiler requires a license file to work. A free trial is available: fill out the form, and a license will be e-mailed to you.
Copy the license file from the host to /opt/arm/licenses/License.bin on the FVP:
$ ssh -p 8022 root@localhost mkdir /opt/arm/licenses
$ scp -P 8022 License.bin root@localhost:/opt/arm/licenses/
Everything should now be set up to compile and run an SVE application.
To try out the SVE instructions, use the example program. Copy example.c using scp or paste it into the terminal. Compile and run it as shown on the example page. If all goes well, a table of values is printed in the terminal.
# armclang -O3 -march=armv8-a+sve -o example example.c
# ./example
 i  a[i]  b[i]  c[i]
=============================
 0   197   283    86
 1   262   277    15
 2   258   293    35
 3   194   286    92
 4   228   249    21
 5   235   262    27
 6   231   290    59
 7   237   263    26
 8   214   240    26
 9   236   272    36
10   143   211    68
To confirm that SVE instructions were generated, use the disassembler and look at the function subtract_arrays().
# armllvm-objdump -disassemble -mattr=+sve example > example.dis
# less example.dis
/subtract_arrays
subtract_arrays:
  400604:  e9 03 16 32  orr      w9, wzr, #0x400
  400608:  e8 03 1f aa  mov      x8, xzr
  40060c:  e0 1f a9 25  whilelo  p0.s, xzr, x9
  400610:  20 40 48 a5  ld1w     {z0.s}, p0/z, [x1, x8, lsl #2]
  400614:  41 40 48 a5  ld1w     {z1.s}, p0/z, [x2, x8, lsl #2]
  400618:  00 04 a1 04  sub      z0.s, z0.s, z1.s
  40061c:  00 40 48 e5  st1w     {z0.s}, p0, [x0, x8, lsl #2]
  400620:  e8 e3 b0 04  incw     x8
  400624:  00 1d a9 25  whilelo  p0.s, x8, x9
  400628:  44 ff ff 54  b.mi     #-24
  40062c:  c0 03 5f d6  ret
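In this listing, the z registers and the whilelo, ld1w, st1w, and incw instructions are SVE. whilelo builds the predicate p0 that marks which 32-bit elements are still below the loop bound (0x400, or 1024). incw advances the index by however many 32-bit elements fit in one vector, so the same code works for any vector length. Rather than paging through the file with less, the check can be scripted. function_uses_sve is a hypothetical sketch that assumes GNU grep and awk. It extracts one function's listing, assuming the listing ends at the next blank line, and looks for Z register operands such as z0.s.

```shell
# Succeed if function $2 in the llvm-objdump listing $1 uses SVE Z registers.
# Accepts both "name:" and "<name>:" label styles.
function_uses_sve() {
    local dis=$1 fn=$2
    awk -v fn="$fn" '
        index($0, fn ":") == 1 || index($0, "<" fn ">:") { inside = 1; next }
        inside && /^[[:space:]]*$/ { exit }
        inside { print }
    ' "$dis" | grep -Eq '\<z[0-9]+\.[bhsdq]\>'
}

# Example, on the FVP:
#   function_uses_sve example.dis subtract_arrays && echo "SVE code found"
```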
Linux applications that use SVE can be run today on a Fixed Virtual Platform (FVP) based on the Arm Architectural Envelope Model (AEM) for Armv8-A. The FVP includes a plugin to enable SVE and set parameters such as the vector length. The FVP can run various Linux distributions, including the current openSUSE Tumbleweed release, as long as the Linux kernel is new enough to support SVE. Using Arm Fast Models before Arm systems supporting SVE are available provides the flexibility needed for many early software porting tasks. If an Arm AArch64 target system is available, the Arm Instruction Emulator is another way to run SVE applications.
Arm Fast Models cover a wide range of hardware systems and are used for many software development tasks. Fast Models can be tried by requesting an evaluation license from the Fast Models Downloads page.
Thanks for your reply, and thanks for your work! I will try again!
You actually got farther than I did; I got an error on the AArch32 compiler when I tried it just now. These things do get out of date and need to be updated. I would recommend using workspace_1901.py instead. It is a little different but very similar.
You can get the script for 1901 from this page:
https://community.arm.com/developer/tools-software/oss-platforms/w/docs/304/arm-reference-platforms-deliverables
The 1901 version requires some extra Ubuntu packages, but after I installed them I could run the initial configuration and download all of the required tools and software without errors.
Let me know if you need more specific help.
Thanks,
Jason
Hey guys, when I run workspace_1801.py, an error occurs:
Fetching AArch32 compiler: 87.29 MiB / 87.29 MiB. Extracting... Done.
Fetching AArch64 compiler: 93.88 MiB / 93.88 MiB. Extracting... Done.
Fetching FVP 64-bit OpenEmbedded LAMP image: 488.85 MiB / 488.85 MiB. Extracting... Done.
Fetching Repo: 28.46 KiB / 28.46 KiB. Extracting... Done.
Initialising Repo tool... Done.
Syncing Repo tool (this may take a long time, please be patient)...
[FATAL] RepoSync(): Failed to sync Repo tool
Is something wrong? Please help!