How to check if Neon instructions are used?

Former Member

I am running code in ADStudio using the Fixed Virtual Platforms (FVP) simulator, so no hardware board is connected.

I am trying to profile a subroutine, so I count cycles for a piece of code:

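uint64_t prev, curr, delta;  // PMCCNTR_EL0 is an unsigned 64-bit counter
// isb serializes the pipeline before the counter is read
asm volatile("isb;mrs %0, pmccntr_el0" : "=r"(prev));

// function body

asm volatile("isb;mrs %0, pmccntr_el0" : "=r"(curr));
delta = curr - prev;  // elapsed count for the profiled region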

My compiler settings are --target=aarch64-arm-none-eabi -march=armv8-a -mcpu=cortex-a53

I wanted to check whether the compiler uses NEON instructions:

#ifdef __aarch64__
printf("--- THIS IS ARCH64 \n");
#endif

#ifdef __ARM_NEON__
printf("--- THIS IS NEON \n");
#endif

But it seems that it is not using NEON.

1) Is my define __ARM_NEON__ wrong? 

2) What is the default -mfpu?

3) How do I force NEON with -mfpu?

4) When I set -mfpu=none, my cycle count is THE SAME as with the default. I find this rather strange: shouldn't the math-heavy code be much slower? Is there an explanation?

Thanks.
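On question 1, one thing worth checking: as far as I know, the Arm C Language Extensions (ACLE) spell the feature macro `__ARM_NEON`, while `__ARM_NEON__` is an older spelling that some compilers only define for 32-bit targets. A minimal sketch that reports both is below. The helper names `neon_macro_status` and `print_neon_macro_status` and the two flag constants are made up for illustration and are not part of any toolchain. Note that a defined macro only says the compiler is allowed to use Advanced SIMD; whether it actually emitted such instructions still has to be confirmed in the disassembly.

```c
#include <stdio.h>

/* Bit flags returned by neon_macro_status() (illustrative names). */
#define HAS_ACLE_NEON   1  /* __ARM_NEON is defined (ACLE spelling)       */
#define HAS_LEGACY_NEON 2  /* __ARM_NEON__ is defined (older spelling)    */

/* Report which of the two NEON feature macros this build defines. */
static int neon_macro_status(void)
{
    int status = 0;
#if defined(__ARM_NEON)
    status |= HAS_ACLE_NEON;
#endif
#if defined(__ARM_NEON__)
    status |= HAS_LEGACY_NEON;
#endif
    return status;
}

/* Print the result in the same style as the checks in the question. */
static void print_neon_macro_status(void)
{
    int status = neon_macro_status();
    printf("--- __ARM_NEON   is %s\n",
           (status & HAS_ACLE_NEON) ? "defined" : "not defined");
    printf("--- __ARM_NEON__ is %s\n",
           (status & HAS_LEGACY_NEON) ? "defined" : "not defined");
}
```

Calling `print_neon_macro_status()` once at startup, built with the same options as the rest of the project, shows which spelling the toolchain provides for that target.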

  • The Fast Model technology that the FVPs are built on doesn't have the concept of cycles. There is some limited timing annotation you can add to the model, but at the expense of model performance (some of it requires access to the full Fast Model tooling):

    https://developer.arm.com/documentation/100965/1190/timing-annotation

    How inaccurate is the model? The number is approximately the number of instructions executed. If you are running small apps that would likely fit in the cache, it is reasonably close (maybe ~20%) to the real number. If you are dealing with a larger system, where cache hits and misses, the L2 cache and so on come into play, the numbers diverge further. For this reason, I tend to use it as a first-pass relative comparison between implementations rather than as an absolute number.
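The relative comparison described above can be sketched roughly as follows. This is only an illustration: `read_counter` and `relative_cost` are hypothetical helper names, the counter read assumes PMCCNTR_EL0 has already been enabled and is accessible at the current exception level, and on an FVP the value behaves more like an instruction count than a true cycle count.

```c
#include <stdint.h>

/* Read the PMU counter on AArch64. On any other host this returns 0 so the
   sketch still compiles. Assumes the counter is already enabled. */
static inline uint64_t read_counter(void)
{
#if defined(__aarch64__)
    uint64_t value;
    __asm__ volatile("isb; mrs %0, pmccntr_el0" : "=r"(value));
    return value;
#else
    return 0;
#endif
}

/* Ratio of the candidate's count to the baseline's count.
   Below 1.0 means the candidate needed fewer counts than the baseline.
   Returns 0.0 when the baseline is 0, since no ratio can be formed. */
static double relative_cost(uint64_t baseline, uint64_t candidate)
{
    if (baseline == 0) {
        return 0.0;
    }
    return (double)candidate / (double)baseline;
}

/* Typical use, with impl_a() and impl_b() standing in for the two
   implementations being compared (both hypothetical):

       uint64_t t0 = read_counter();  impl_a();
       uint64_t t1 = read_counter();  impl_b();
       uint64_t t2 = read_counter();
       double ratio = relative_cost(t1 - t0, t2 - t1);
*/
```

Reporting a ratio rather than the raw deltas keeps the focus on which implementation is cheaper, which is the part of the measurement the model can reasonably support.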
