Dear ARM support,
I am currently exploring the Cortex-A320 core and have downloaded the startup_Cortex-A320x1 example for experimentation. To evaluate the functionality of SVE/SVE2 intrinsics, I added a small test snippet using basic SVE instructions.
    #include <stdlib.h>
    #include <stdio.h>
    #include <arm_sve.h>
    #include <arm_neon.h>

    #ifdef USE_SERIAL_PORT
    #include "uart.h"
    #endif

    #ifdef USE_SERIAL_PORT
    extern void UartInit(void);
    #endif

    // declaration of 'extern' functions
    extern void init_timer(void); // in timer_interrupts.c

    __attribute__((noreturn)) int main(void)
    {
    #ifdef USE_SERIAL_PORT
        UartInit();
    #endif

        printf("\nArmv9-A single-core startup code example, using Arm Compiler for Embedded 6\n\n");

        init_timer();

    #if 1
        uint8_t __attribute__((aligned(16))) data[64] = {0};
        uint8_t* unaligned_ptr = data;
        printf("\n\rorig: %d\n\r", ((int32_t*)unaligned_ptr)[0]);

        int32x4_t v_neon = vld1q_s32((int32_t*)unaligned_ptr);
        v_neon = vaddq_s32(v_neon, vdupq_n_s32(1));
        vst1q_s32((int32_t*)unaligned_ptr, v_neon);
        printf("\n\rneon: %d\n\r", ((int32_t*)unaligned_ptr)[0]);

        svbool_t pg = svptrue_b32();
        svint32_t v_sve = svld1_s32(pg, (int32_t*)unaligned_ptr);
        v_sve = svadd_s32_m(pg, v_sve, svdup_s32(1));
        svst1_s32(pg, (int32_t*)unaligned_ptr, v_sve);
        printf("\n\rsve: %d\n\r", ((int32_t*)unaligned_ptr)[0]);
    #endif

        for(;;) { asm volatile ("WFI"); } // loop forever
    }
However, during execution, the core enters an exception state immediately upon encountering the first SVE instruction (svld1_s32).
    //
    // Current EL with SPx
    //
        .balign 0x80
    cxsync1:
        B cxsync1
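One way to confirm why the core ends up spinning in the synchronous handler is to read the exception syndrome register of the exception level that took the exception (for example ESR_EL3) in the debugger and inspect its exception class field. The following is a minimal host-side sketch, not part of the example project. The helper names are hypothetical, and it only decodes the two classes relevant to this thread; consult the ESR_ELx description in the architecture manual for the full list.

```c
#include <stdint.h>

/* Exception class (EC) occupies ESR_ELx bits [31:26]. */
#define ESR_EC_SHIFT         26u
#define ESR_EC_MASK          0x3Fu

/* EC encodings for the two trap types relevant here. */
#define ESR_EC_FP_SIMD_TRAP  0x07u  /* trapped FP / Advanced SIMD access */
#define ESR_EC_SVE_TRAP      0x19u  /* trapped SVE access */

/* Hypothetical helper: extract the EC field from a raw ESR_ELx value. */
unsigned esr_exception_class(uint64_t esr)
{
    return (unsigned)((esr >> ESR_EC_SHIFT) & ESR_EC_MASK);
}

/* Hypothetical helper: describe only the classes of interest in this thread. */
const char *esr_describe(uint64_t esr)
{
    switch (esr_exception_class(esr)) {
    case ESR_EC_FP_SIMD_TRAP:
        return "trapped FP/Advanced SIMD access";
    case ESR_EC_SVE_TRAP:
        return "trapped SVE access";
    default:
        return "other (see the ESR_ELx description)";
    }
}
```

If the value read from the debugger decodes to the SVE class, the instruction itself is fine and the fault comes from the trap controls rather than from the load address or alignment.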
Could you please advise:
Why does the exception occur on the SVE instruction?
How can I properly enable SVE/SVE2 support on the Cortex-A320 using the provided FVP simulator and example setup?
I am using the FVP configuration that came with the example mentioned above. Any guidance on configuring the vector length or enabling necessary system registers would also be appreciated.
Thank you in advance for your assistance.
Best regards, Yevhenii
You must disable the trapping of these instructions in your init code. Modify startup.s as follows:
    ...
    // neither EL3 nor EL2 trap floating point or accesses to CPACR
    //
    msr CPTR_EL3, xzr

    // disable sve traps
    mrs x0, CPTR_EL3
    bic x0, x0, #(1<<10)
    orr x0, x0, #(1<<8)
    msr CPTR_EL3, x0
    ...
    // Enable floating point
    //
    mov x0, #CPACR_EL1_FPEN

    // Disable SVE trap
    orr x0, x0, #(3<<16)
    msr CPACR_EL1, x0
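To make the bit positions in the assembly above easier to follow, here is a small host-side sketch in C that performs the same masking on plain integers. It is only an illustration: the macro and function names are hypothetical (they are not the ones used in the example project), and the field meanings are taken from the architectural register descriptions, assuming SVE is implemented.

```c
#include <stdint.h>

/* CPTR_EL3 fields touched by the startup change. */
#define CPTR_EL3_EZ         (UINT64_C(1) << 8)   /* 1 = SVE accesses are not trapped to EL3 */
#define CPTR_EL3_TFP        (UINT64_C(1) << 10)  /* 1 = FP/SIMD/SVE accesses are trapped to EL3 */

/* CPACR_EL1 fields touched by the startup change. */
#define CPACR_EL1_ZEN_ALL   (UINT64_C(3) << 16)  /* 0b11 = SVE not trapped at EL1 and EL0 */
#define CPACR_EL1_FPEN_ALL  (UINT64_C(3) << 20)  /* 0b11 = FP/SIMD not trapped at EL1 and EL0 */

/* Mirrors: bic x0, x0, #(1<<10) ; orr x0, x0, #(1<<8) */
uint64_t cptr_el3_allow_sve(uint64_t cptr_el3)
{
    cptr_el3 &= ~CPTR_EL3_TFP;  /* clear the FP/SIMD/SVE trap */
    cptr_el3 |= CPTR_EL3_EZ;    /* enable SVE at EL3 and below */
    return cptr_el3;
}

/* Mirrors: mov x0, #FPEN ; orr x0, x0, #(3<<16) */
uint64_t cpacr_el1_allow_fp_and_sve(void)
{
    return CPACR_EL1_FPEN_ALL | CPACR_EL1_ZEN_ALL;
}
```

With these definitions, a CPTR_EL3 value of zero becomes 0x100, and the value written to CPACR_EL1 is 0x330000.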
I found this documentation particularly useful:
https://developer.arm.com/documentation/100748/0624/SVE-Coding-Considerations-with-Arm-Compiler-for-Embedded-6/Running-a-binary-in-an-AEMv8-A-Base-Fixed-Virtual-Platform--FVP-
See also the below for a number of coding examples:
https://developer.arm.com/documentation/dai0548
Thank you! It works now.

By the way, could you please help me understand why the svqdmulh_s32 intrinsic behaves the way it does? As far as I understand, its NEON equivalent is vqdmulhq_s32, but for some reason the result differs when multiplying a negative value by a positive one. For example, for the inputs 100500000 and -25000, SVE2 returns 2147483647, while NEON returns -1170.
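For reference, the expected value can be checked with a scalar model of a signed saturating doubling multiply that returns the high half: double the 64-bit product, keep the upper 32 bits, and saturate the one case that overflows. The sketch below is host-side C with a hypothetical function name. It is an illustration of that arithmetic, not the architectural pseudocode.

```c
#include <stdint.h>

/* Hypothetical scalar model: high 32 bits of (2 * a * b), with saturation. */
int32_t sqdmulh_s32_ref(int32_t a, int32_t b)
{
    /* Only INT32_MIN * INT32_MIN overflows once doubled; it saturates. */
    if (a == INT32_MIN && b == INT32_MIN) {
        return INT32_MAX;
    }

    const int64_t two32 = INT64_C(4294967296);        /* 2^32 */
    int64_t doubled = 2 * ((int64_t)a * (int64_t)b);  /* fits in 64 bits here */

    /* Floor division by 2^32, written without shifting a negative value. */
    int64_t high = doubled / two32;                    /* truncates toward zero */
    if (doubled % two32 < 0) {
        high -= 1;                                     /* round down for negatives */
    }
    return (int32_t)high;
}
```

For the inputs in question, 2 * 100500000 * -25000 = -5025000000000, and dividing by 2^32 and rounding down gives -1170, which matches the NEON result. No saturation is involved for these inputs.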
I have replicated this. Let me discuss internally.
Was this posting also from you? Sorry it did not get a reply:
Unexpected result from svqdmulh_s32 with negative input values
Yes. I tested it on Armv8-A and expected the result to be fixed on Armv9-A as well.

Also, for those looking for where to insert the code that disables SVE traps in startup.S, I placed it right after:
    msr SCTLR_EL3, x0
    msr SCTLR_EL2, x0
    msr SCTLR_EL1, x0
I believe your understanding is correct, but it is the FVP that is not giving the correct result.
I have raised the issue to the FVP team.