How to check if Neon instructions are used?

Former Member

I am running code in Arm Development Studio using the Fixed Virtual Platforms (FVP) simulator, so no hardware board is connected.

I am trying to profile a sub-routine, so I count cycles for a piece of code:

int64_t prev, curr, delta;
asm volatile("isb;mrs %0, pmccntr_el0" : "=r"(prev));

// function body

asm volatile("isb;mrs %0, pmccntr_el0" : "=r"(curr));
delta = curr - prev;

My compiler settings are --target=aarch64-arm-none-eabi -march=armv8-a -mcpu=cortex-a53

I wanted to check whether the compiler uses NEON instructions:

#ifdef __aarch64__
printf("--- THIS IS ARCH64 \n");
#endif

#ifdef __ARM_NEON__
printf("--- THIS IS NEON \n");
#endif

But it seems that it is not using NEON.

1) Is my define __ARM_NEON__ wrong?

2) What is the default -mfpu?

3) How do I force NEON with -mfpu?

4) When I set -mfpu=none, my cycle count is THE SAME as with the default. I find this rather strange. Shouldn't the math-heavy code be much slower? Is there an explanation?

Thanks.

  • Hi Danijel,

    FPU is implied by default with -mcpu=cortex-a53. To deselect, use -mcpu=cortex-a53+nofp+nosimd. See:
    https://developer.arm.com/documentation/101754/0614/armclang-Reference/armclang-Command-line-Options/-mcpu

    Do you simply wish to see if SIMD instructions are used? An easy test would be to disable them from the debugger once you get to main():

    set var $AARCH64::$System::$Other::$CPACR_EL1.FPEN = 0

    Your code will then trigger an exception if such an instruction is hit.
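
A compile-time check can complement the debugger test. The sketch below is a minimal example, assuming the toolchain follows the Arm C Language Extensions (ACLE), which name the feature macros `__ARM_NEON` and `__ARM_FP`. The `__ARM_NEON__` spelling with trailing underscores is an older form and may not be defined for AArch64 targets, which could explain why the check in the question printed nothing. The function names `neon_enabled`, `fp_enabled` and `report_features` are hypothetical.

```c
#include <stdio.h>

/* Returns 1 if the compiler is permitted to emit Advanced SIMD (Neon)
 * instructions for this translation unit, 0 otherwise.
 * __ARM_NEON is the ACLE spelling; __ARM_NEON__ is checked as well
 * because some toolchains define the older form. */
int neon_enabled(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if hardware floating point is available to the compiler
 * (ACLE macro __ARM_FP is defined), 0 otherwise. */
int fp_enabled(void)
{
#if defined(__ARM_FP)
    return 1;
#else
    return 0;
#endif
}

/* Prints the result of both checks. */
void report_features(void)
{
    printf("--- NEON allowed: %s\n", neon_enabled() ? "yes" : "no");
    printf("--- FP allowed:   %s\n", fp_enabled() ? "yes" : "no");
}
```

Calling `report_features()` from main() prints both results. These macros only say what the compiler is allowed to generate. Whether a particular function actually contains SIMD instructions still has to be confirmed with the debugger test above or by inspecting the disassembly.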


    If you wish to use the PMCCNTR register, you should also enable it, again easily done via the debugger:

    set var $AARCH64::$System::$PMU::$PMCR_EL0.E = 1
    set var $AARCH64::$System::$PMU::$PMCNTENSET_EL0.C = 1

    Note the FVP is not cycle accurate, but you should get numbers to help you roughly compare performance of different algorithms.
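
The two debugger commands above set PMCR_EL0.E (bit 0) and PMCNTENSET_EL0.C (bit 31). The same bits can be set from the code under test instead. This is a minimal sketch, assuming the code runs at EL1 or higher (or that EL0 access to the PMU has been granted separately). The helper names are hypothetical, and the register accesses are guarded so that the file still compiles on a non-AArch64 host.

```c
#include <stdint.h>

#define PMCR_EL0_E        (UINT64_C(1) << 0)   /* E: enable all counters   */
#define PMCNTENSET_EL0_C  (UINT64_C(1) << 31)  /* C: enable cycle counter  */

/* Pure helper: returns the PMCR_EL0 value with the E bit set,
 * leaving all other bits unchanged. */
uint64_t pmcr_with_counters_enabled(uint64_t pmcr)
{
    return pmcr | PMCR_EL0_E;
}

#if defined(__aarch64__)
/* Enables the PMU and its cycle counter.
 * Assumption: called at an exception level allowed to write these registers. */
void enable_cycle_counter(void)
{
    uint64_t pmcr;
    __asm__ volatile("mrs %0, pmcr_el0" : "=r"(pmcr));
    __asm__ volatile("msr pmcr_el0, %0" : : "r"(pmcr_with_counters_enabled(pmcr)));
    __asm__ volatile("msr pmcntenset_el0, %0" : : "r"(PMCNTENSET_EL0_C));
    __asm__ volatile("isb");
}

/* Reads the cycle counter after an ISB, as in the original question. */
uint64_t read_cycle_counter(void)
{
    uint64_t value;
    __asm__ volatile("isb; mrs %0, pmccntr_el0" : "=r"(value));
    return value;
}
#endif
```

A typical use would be to call `enable_cycle_counter()` once at the start of main(), then take `read_cycle_counter()` before and after the function body and subtract. Using `uint64_t` rather than `int64_t` for the readings keeps the subtraction well defined if the counter wraps.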

  • Former Member in reply to Ronan Synnott

    Thanks Ronan.

    I did enable the PMCCNTR. It worked fine and showed some cycle count numbers that made sense.

    Do you know how inaccurate the FVP is when it comes to the cycle counter? Just an informed guess will do. Thanks.

  • The Fast Model technology that the FVPs are built on doesn't have the concept of cycles. There is some limited timing annotation you can add to the model, but at the expense of model performance (some of it requires access to the full Fast Model tooling):

    https://developer.arm.com/documentation/100965/1190/timing-annotation

    How inaccurate is the model? The number is approximately the number of instructions executed. If you are running small apps that would likely fit in the cache, it is reasonably close (maybe ~20%) to the real number. If you are dealing with a larger system where cache hits and misses, the L2 cache, etc. come into play, the numbers diverge further. For this reason, I tend to use it as a first-pass relative comparison between implementations rather than as an absolute number.
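
Used that way, the counter deltas from two implementations are compared as a ratio rather than read as absolute cycle counts. This is a minimal sketch of that comparison. The function name `relative_cost` is hypothetical, and the deltas are assumed to come from PMCCNTR_EL0 readings taken around each implementation on the same FVP configuration.

```c
#include <stdint.h>

/* Ratio of the candidate's counter delta to the baseline's.
 * A result below 1.0 means the candidate accumulated fewer counts.
 * Returns 0.0 when the baseline delta is zero, to avoid dividing by zero. */
double relative_cost(uint64_t baseline_delta, uint64_t candidate_delta)
{
    if (baseline_delta == 0) {
        return 0.0;
    }
    return (double)candidate_delta / (double)baseline_delta;
}
```

For example, `relative_cost(1000, 500)` evaluates to 0.5, meaning the candidate used half as many counts as the baseline. Given the caveats above, such a ratio is only a rough guide, especially when the working set does not fit in the cache.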