How to check if Neon instructions are used?

Former Member over 6 years ago

I am running code in ADStudio using a Fixed Virtual Platform (FVP) simulator, so no hardware board is connected.

I am trying to profile a subroutine, so I count cycles for a piece of code:

int64_t prev, curr, delta;
asm volatile("isb;mrs %0, pmccntr_el0" : "=r"(prev));

// function body

asm volatile("isb;mrs %0, pmccntr_el0" : "=r"(curr));
delta = curr - prev;

My compiler settings are --target=aarch64-arm-none-eabi -march=armv8-a -mcpu=cortex-a53.

I wanted to check whether the compiler uses Neon instructions:

#ifdef __aarch64__
printf("--- THIS IS ARCH64 \n");
#endif

#ifdef __ARM_NEON__
printf("--- THIS IS NEON \n");
#endif

But it seems that it is not using Neon.

1) Is my define __ARM_NEON__ wrong?

2) What is the default -mfpu?

3) How do I force Neon with -mfpu?

4) When I set -mfpu=none, my cycle count is THE SAME as with the default. I find this rather strange; shouldn't math-heavy code be much slower? Is there an explanation?

Thanks.

Top replies

Ronan Synnott over 5 years ago +1 verified

Hi Danijel, FPU is implied by default with -mcpu=cortex-a53. To deselect, use -mcpu=cortex-a53+nofp+nosimd. See: https://developer.arm.com/documentation/101754/0614/armclang-Reference/armclang-Command...
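
Editor's sketch of a compile-time check, assuming the compiler follows the Arm C Language Extensions (ACLE): there, the feature macros are spelled `__ARM_NEON` and `__ARM_FP`. As far as I know, the legacy `__ARM_NEON__` spelling may not be defined for AArch64 targets, which would explain why the check in the question prints nothing. The function names below are hypothetical helpers, not part of any Arm API.

```c
#include <stdio.h>

/* Returns 1 when the compiler is targeting Advanced SIMD (Neon), else 0.
   __ARM_NEON is the ACLE spelling; __ARM_NEON__ is a legacy spelling
   that some toolchains define only for 32-bit Arm targets. */
int neon_enabled(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Returns the ACLE __ARM_FP bitmask of supported hardware floating-point
   precisions, or 0 when the macro is not defined. */
int fp_mask(void)
{
#ifdef __ARM_FP
    return __ARM_FP;
#else
    return 0;
#endif
}

/* Hypothetical helper: print what this build was compiled for. */
void report_simd_config(void)
{
    printf("Neon: %s, __ARM_FP: 0x%x\n",
           neon_enabled() ? "yes" : "no", (unsigned)fp_mask());
}
```

These macros only say what the compiler is allowed to use. To see whether Neon instructions were actually emitted for a given function, inspect the generated assembly (for example, compile with -S) and look for instructions operating on the v registers.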

Former Member over 5 years ago in reply to Ronan Synnott

Thanks Ronan.

I did enable the PMCCNTR. It worked fine, showed some cycle count numbers that made sense.

Do you know how inaccurate the FVP is when it comes to the cycle counter? Just an informed guess will do. Thanks.
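
Editor's sketch of what "enabling the PMCCNTR" can look like in bare-metal AArch64 code. The bit positions are taken from the Armv8-A PMU register descriptions (PMCR_EL0 and PMCNTENSET_EL0); the function names are hypothetical, and the code assumes it runs at an exception level that is permitted to access these registers. On other targets the functions compile to stubs.

```c
#include <stdint.h>

/* Bit positions from the Armv8-A PMU register descriptions. */
#define PMCR_E        (UINT64_C(1) << 0)   /* PMCR_EL0.E: enable the counters */
#define PMCR_C        (UINT64_C(1) << 2)   /* PMCR_EL0.C: reset the cycle counter */
#define PMCNTENSET_C  (UINT64_C(1) << 31)  /* PMCNTENSET_EL0.C: enable PMCCNTR_EL0 */

/* Hypothetical helper: reset and start the cycle counter.
   Does nothing when not compiled for AArch64. */
void enable_cycle_counter(void)
{
#ifdef __aarch64__
    uint64_t pmcr;
    __asm__ volatile("mrs %0, pmcr_el0" : "=r"(pmcr));
    pmcr |= PMCR_E | PMCR_C;
    __asm__ volatile("msr pmcr_el0, %0" : : "r"(pmcr));
    __asm__ volatile("msr pmcntenset_el0, %0" : : "r"(PMCNTENSET_C));
    __asm__ volatile("isb");
#endif
}

/* Hypothetical helper: read PMCCNTR_EL0, as in the original question.
   Returns 0 when not compiled for AArch64. */
uint64_t read_cycle_counter(void)
{
#ifdef __aarch64__
    uint64_t count;
    __asm__ volatile("isb; mrs %0, pmccntr_el0" : "=r"(count));
    return count;
#else
    return 0;
#endif
}
```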

Ronan Synnott over 5 years ago in reply to Former Member

The Fast Model technology that the FVPs are built on doesn't have the concept of cycles. There is some limited timing annotation you can add to the model, but at the expense of model performance, and some of it requires access to the full Fast Model tooling:

https://developer.arm.com/documentation/100965/1190/timing-annotation

How inaccurate is the model? The number is approximately the number of instructions executed. If you are running small apps that would likely fit in the cache, it is reasonably close (maybe ~20%) to the real number. If you are dealing with a larger system where cache hits and misses, the L2 cache, and so on come into play, the numbers diverge further. For this reason, I tend to use it as a first-pass relative comparison between implementations, rather than as an absolute number.
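
Editor's sketch of the relative comparison described above: take the counter delta for a baseline and for a candidate implementation, both measured on the same model, and report the percentage change instead of the raw numbers. The function name is hypothetical.

```c
#include <stdint.h>

/* Percentage change of the candidate's count relative to the baseline.
   Negative values mean the candidate needed fewer counts.
   Returns 0.0 when the baseline is 0, to avoid dividing by zero. */
double relative_change_percent(uint64_t baseline, uint64_t candidate)
{
    if (baseline == 0) {
        return 0.0;
    }
    return ((double)candidate - (double)baseline) * 100.0 / (double)baseline;
}
```

For example, a baseline delta of 1000 and a candidate delta of 800 gives -20.0, meaning the candidate took 20% fewer counts on the model. Given the caveats above, this says little about absolute cycles on real hardware.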