
How to check if Neon instructions are used?

Former Member over 5 years ago

I am running code in ADStudio using Fixed Virtual Platforms simulator, so no hardware board is connected.

I am trying to profile a subroutine, so I count the cycles for a piece of code:

#include <stdint.h>

uint64_t prev, curr, delta;   /* PMCCNTR_EL0 is an unsigned 64-bit counter */
asm volatile("isb; mrs %0, pmccntr_el0" : "=r"(prev));

// function body

asm volatile("isb; mrs %0, pmccntr_el0" : "=r"(curr));
delta = curr - prev;
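A minimal sketch of the same measurement wrapped in reusable helpers, under a few assumptions: read_cycles(), measurement_overhead() and measure() are hypothetical names; USE_PMCCNTR is a hypothetical macro you would define (-DUSE_PMCCNTR) for the bare-metal FVP build, where the cycle counter has already been enabled. On any other host the sketch falls back to CLOCK_MONOTONIC nanoseconds purely so it compiles and runs. Timing an empty region first gives a rough cost of the ISB/MRS pairs themselves, which can then be subtracted from short measurements.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper returning a free-running counter value.
 * With USE_PMCCNTR on AArch64: PMCCNTR_EL0, which must already be enabled
 * and readable at the current exception level.
 * Otherwise: CLOCK_MONOTONIC nanoseconds (NOT cycles), only so the sketch
 * compiles and runs on a hosted machine. */
static inline uint64_t read_cycles(void)
{
#if defined(__aarch64__) && defined(USE_PMCCNTR)
    uint64_t v;
    /* ISB, as in the question, keeps the read from happening ahead of earlier instructions */
    __asm__ volatile("isb; mrs %0, pmccntr_el0" : "=r"(v) : : "memory");
    return v;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
#endif
}

/* Cost of an empty measurement (two back-to-back reads).
 * Keeps the minimum over `tries` attempts; tries must be > 0. */
static uint64_t measurement_overhead(int tries)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < tries; i++) {
        uint64_t t0 = read_cycles();
        uint64_t t1 = read_cycles();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Time one call of fn(arg) and subtract the calibrated overhead, clamping at 0. */
static uint64_t measure(void (*fn)(void *), void *arg, uint64_t overhead)
{
    uint64_t t0 = read_cycles();
    fn(arg);
    uint64_t t1 = read_cycles();
    uint64_t d = t1 - t0;   /* unsigned subtraction also handles wrap-around */
    return d > overhead ? d - overhead : 0;
}
```

Taking the minimum over several empty measurements is a common calibration choice, since interruptions can only make a reading larger; it is one option, not something prescribed in this thread.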

My compiler settings are --target=aarch64-arm-none-eabi -march=armv8-a -mcpu=cortex-a53.

I wanted to check whether the compiler uses NEON instructions:

#ifdef __aarch64__
printf("--- THIS IS ARCH64 \n");
#endif

#ifdef __ARM_NEON__
printf("--- THIS IS NEON \n");
#endif

But it seems that it is not using NEON.
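On question 1: the Arm C Language Extensions (ACLE) specify the feature macro as __ARM_NEON, without trailing underscores. __ARM_NEON__ is an older spelling used by AArch32 compilers, and AArch64 compilers such as GCC and clang may not define it at all, which could explain the missing output. A sketch that reports both spellings (has_acle_neon(), has_legacy_neon() and report_simd_macros() are hypothetical names). These macros only say that Advanced SIMD is available to the compiler; they do not prove that it emitted SIMD instructions for a particular function, which only a disassembly listing shows.

```c
#include <stdio.h>

/* 1 if the ACLE spelling __ARM_NEON is defined for this build */
static int has_acle_neon(void)
{
#if defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}

/* 1 if the older AArch32-style spelling __ARM_NEON__ is defined */
static int has_legacy_neon(void)
{
#if defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

static void report_simd_macros(void)
{
#ifdef __aarch64__
    printf("--- THIS IS AARCH64\n");
#endif
    printf("__ARM_NEON   : %s\n", has_acle_neon() ? "defined" : "not defined");
    printf("__ARM_NEON__ : %s\n", has_legacy_neon() ? "defined" : "not defined");
}
```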

1) Is my define __ARM_NEON__ wrong?

2) What is the default -gfpu?

3) How do I force neon with -gfpu?

4) When I set -gfpu=none, my cycle count is THE SAME as with the default setting. I find this rather strange: shouldn't math-heavy code be much slower? Is there an explanation?

Thanks.

Ronan Synnott over 5 years ago (verified answer)
Hi Danijel,

FPU is implied by default with -mcpu=cortex-a53. To deselect it, use -mcpu=cortex-a53+nofp+nosimd. See:
https://developer.arm.com/documentation/101754/0614/armclang-Reference/armclang-Command-line-Options/-mcpu
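One way to confirm that the feature modifiers actually took effect is a compile-time guard. This sketch relies on the ACLE macros __ARM_FP (defined when hardware floating point is available) and __ARM_NEON (defined when Advanced SIMD is available); EXPECT_NO_FP_SIMD is a hypothetical macro you would pass yourself (-DEXPECT_NO_FP_SIMD) only in the +nofp+nosimd build, and built_with_hw_fp() is a hypothetical helper. It is not an armclang feature.

```c
/* Fail the build if FP/SIMD is still enabled when we asked for it to be off. */
#if defined(EXPECT_NO_FP_SIMD) && (defined(__ARM_FP) || defined(__ARM_NEON))
#error "FP/SIMD still enabled: check the -mcpu=...+nofp+nosimd modifiers"
#endif

/* 1 if this translation unit was compiled with hardware floating point */
static int built_with_hw_fp(void)
{
#if defined(__ARM_FP)
    return 1;
#else
    return 0;
#endif
}
```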

Do you simply wish to see whether SIMD instructions are used? An easy test would be to disable them via the debugger once you reach main():

set var $AARCH64::$System::$Other::$CPACR_EL1.FPEN = 0

Your code will then trigger an exception if such an instruction is hit.
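The debugger command clears the FPEN field, bits [21:20] of CPACR_EL1. If you would rather do the same from startup code, here is a minimal sketch; cpacr_set_fpen() and trap_fp_simd() are hypothetical names, and it assumes bare-metal code running at EL1 or higher (CPACR_EL1 cannot be written from EL0, and FPEN only traps FP/SIMD instructions executed at EL0 and EL1).

```c
#include <stdint.h>

/* CPACR_EL1.FPEN is bits [21:20]: 0b00 traps FP/SIMD at EL0 and EL1,
 * 0b01 traps at EL0 only, 0b11 disables the trap. */
#define CPACR_FPEN_SHIFT 20u
#define CPACR_FPEN_MASK  (UINT64_C(3) << CPACR_FPEN_SHIFT)

/* Pure helper: return cpacr with its FPEN field replaced by fpen (0..3). */
static uint64_t cpacr_set_fpen(uint64_t cpacr, unsigned fpen)
{
    return (cpacr & ~CPACR_FPEN_MASK) |
           (((uint64_t)fpen << CPACR_FPEN_SHIFT) & CPACR_FPEN_MASK);
}

#if defined(__aarch64__)
/* Only call from bare-metal code at EL1 or higher; at EL0 the MRS/MSR themselves trap. */
static void trap_fp_simd(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, cpacr_el1" : "=r"(v));
    v = cpacr_set_fpen(v, 0u);
    /* ISB so the new setting applies to the following instructions */
    __asm__ volatile("msr cpacr_el1, %0; isb" : : "r"(v) : "memory");
}
#endif
```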

If you wish to use the PMCCNTR register, you should also enable it, again easily done via the debugger:

set var $AARCH64::$System::$PMU::$PMCR_EL0.E = 1
set var $AARCH64::$System::$PMU::$PMCNTENSET_EL0.C = 1
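The two debugger writes set PMCR_EL0.E (bit 0, the global enable for counters) and PMCNTENSET_EL0.C (bit 31, which enables the cycle counter PMCCNTR_EL0). A sketch of equivalent startup code, assuming bare-metal execution at EL1 or higher (EL0 access to these registers additionally requires PMUSERENR_EL0 to permit it); pmcr_enable_value() and enable_cycle_counter() are hypothetical names. Unlike the debugger commands, it also sets PMCR_EL0.C (bit 2) to reset the cycle counter to zero, which is optional.

```c
#include <stdint.h>

#define PMCR_E       (UINT64_C(1) << 0)   /* enable counters enabled in PMCNTENSET_EL0 */
#define PMCR_C       (UINT64_C(1) << 2)   /* write 1: reset PMCCNTR_EL0 to zero */
#define PMCNTENSET_C (UINT64_C(1) << 31)  /* enable the cycle counter PMCCNTR_EL0 */

/* Pure helper: PMCR_EL0 value with E set, and optionally C to reset the counter. */
static uint64_t pmcr_enable_value(uint64_t pmcr, int reset_cycles)
{
    pmcr |= PMCR_E;
    if (reset_cycles)
        pmcr |= PMCR_C;
    return pmcr;
}

#if defined(__aarch64__)
/* Only call from bare-metal code at EL1 or higher. */
static void enable_cycle_counter(void)
{
    uint64_t pmcr;
    __asm__ volatile("mrs %0, pmcr_el0" : "=r"(pmcr));
    __asm__ volatile("msr pmcr_el0, %0" : : "r"(pmcr_enable_value(pmcr, 1)));
    /* PMCNTENSET_EL0 is write-1-to-set, so other counters are left unchanged */
    __asm__ volatile("msr pmcntenset_el0, %0; isb" : : "r"(PMCNTENSET_C) : "memory");
}
#endif
```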

Note that the FVP is not cycle-accurate, but it should give you numbers that help you roughly compare the performance of different algorithms.
Former Member over 5 years ago in reply to Ronan Synnott

Thanks Ronan.

I did enable the PMCCNTR. It worked fine and showed cycle counts that made sense.

Do you know how inaccurate the FVP is when it comes to the cycle counter? Just an informed guess will do. Thanks.
Ronan Synnott over 5 years ago in reply to Former Member

The Fast Model technology that the FVPs are built on does not have a concept of cycles. You can add some limited timing annotation to the model, but at the expense of the model's performance (and some of it requires access to the full Fast Model tooling):

https://developer.arm.com/documentation/100965/1190/timing-annotation

How inaccurate is the model? The reported count is approximately the number of instructions executed. If you are running small applications that would likely fit in the cache, it is reasonably close to the real number (maybe within ~20%). If you are dealing with a larger system, where cache hits and misses, the L2 cache, and so on come into play, the numbers diverge further. For this reason, I tend to use it as a first-pass relative comparison between implementations rather than as an absolute number.
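In the spirit of using the FVP for relative comparisons, a minimal sketch that compares two implementations by the median of several counts each, for example delta values collected as in the question. median_u64() and relative_cost() are hypothetical names; using the median rather than the mean is one reasonable choice to damp a single outlying run, not a method from this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint64_t values */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median of n samples (n > 0); sorts the array in place.
 * For even n, the average of the two middle values, rounded down. */
static uint64_t median_u64(uint64_t *s, size_t n)
{
    qsort(s, n, sizeof s[0], cmp_u64);
    if (n % 2)
        return s[n / 2];
    return s[n / 2 - 1] + (s[n / 2] - s[n / 2 - 1]) / 2;   /* avoids overflow */
}

/* Median(B) / median(A): below 1.0 means B needed fewer counts than A.
 * The median of A must be non-zero. */
static double relative_cost(uint64_t *a, size_t na, uint64_t *b, size_t nb)
{
    return (double)median_u64(b, nb) / (double)median_u64(a, na);
}
```

A ratio of about 0.5 would mean implementation B needed roughly half the counts of A on the model; given the caveats above, it says nothing exact about real Cortex-A53 cycles.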