Poor performance with GCC

I am porting a project from x86 to ARM64 and I have been struggling with poor performance for some time. Recently I tested switching from GCC to LLVM and, to my surprise, got a massive performance boost. In some cases the code runs several times faster. I experimented with all sorts of optimization flags, but I can't get GCC to generate code that is fast enough. I suspect vectorization isn't working. When I compile any source file with the --verbose flag, LLVM reports +neon, while GCC doesn't report any SIMD features. I tried different ARM64 cores and operating systems, and the result is the same.

Any suggestions on how to enable vectorization with GCC on ARM64?

System:

  • GCC 12
  • LLVM 12
  • RHEL 7 and RHEL 8
  • ARMv8-a+neon
  • On Linux, Arm does not support AArch64 platforms that lack SIMD (Advanced SIMD, also known as Neon), so SIMD is always enabled. That is why GCC does not print +simd: it is implicitly always on.
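    A minimal sketch for confirming this without relying on --verbose output: on AArch64, GCC and Clang both predefine the ACLE macro __ARM_NEON when Advanced SIMD is enabled, so a tiny check compiled with your normal flags shows whether SIMD is on. The function name neon_macro_defined is hypothetical. From a shell, `echo | gcc -dM -E - | grep __ARM_NEON` gives the same answer.

```c
/* Hypothetical helper: returns 1 if the compiler predefines __ARM_NEON for
   the current target and flags, else 0. On AArch64 Linux this is expected
   to return 1 with default flags, since Advanced SIMD is always enabled. */
int neon_macro_defined(void)
{
#if defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}
```

    Because this is decided at preprocessing time, it reflects exactly the -march/-mcpu flags used to build the file.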

    GCC enables some vectorization at -O2 and full vectorization at -O3, but your post contains too few details to tell what the problem is:

    Could you post your full command-line flags? Does your project use floating-point math? Can you give an example of code that LLVM vectorizes and GCC does not?
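    A minimal sketch of how to produce such an example, assuming a hypothetical file vec_demo.c containing the loop below: compile it with `gcc -O3 -fopt-info-vec-optimized -c vec_demo.c` to list the loops GCC vectorized, or with -fopt-info-vec-missed to see why a loop was rejected. The LLVM equivalent is `clang -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c vec_demo.c`. The function add_i32 is only an illustration; substitute one of your own hot loops.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example loop: element-wise add of two int32 arrays.
   `restrict` promises the arrays do not overlap, so the compiler does not
   need runtime alias checks before using SIMD loads and stores. */
void add_i32(int32_t *restrict dst, const int32_t *restrict a,
             const int32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

    Posting the missed-optimization report for a loop that LLVM vectorizes and GCC does not usually points straight at the cause (aliasing, cost model, or floating-point semantics).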

    GCC and LLVM have different defaults for floating-point math. GCC honors floating-point traps by default (-ftrapping-math), while LLVM does not. This means LLVM will vectorize more aggressively by default, whereas GCC needs -Ofast or -fno-trapping-math to vectorize some of those loops.
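    As a hedged illustration (the function name safe_div is hypothetical), a loop with a conditional floating-point division is the kind of code where this default can matter. Vectorizing it with Neon means performing the division in every lane, including lanes where b[i] == 0, which can raise a divide-by-zero exception flag that the scalar loop never raises. With default flags GCC will likely decline to vectorize it, while with -fno-trapping-math it should be able to. Compare the -fopt-info-vec-missed output with and without the flag to confirm this on your compiler version.

```c
#include <stddef.h>

/* Hypothetical example: c[i] = a[i] / b[i], or 0 where b[i] == 0.
   A vectorized version computes a[i] / b[i] for all lanes and then selects
   the result, so it may divide by zero in lanes the scalar code skips.
   Under -ftrapping-math that speculative division is not allowed. */
void safe_div(float *restrict c, const float *restrict a,
              const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = (b[i] != 0.0f) ? a[i] / b[i] : 0.0f;
}
```

    -fno-trapping-math is only safe if the program does not inspect floating-point exception flags (for example with fetestexcept) or rely on FP traps. -Ofast goes further and also enables other fast-math transformations, such as reassociation, that can change numerical results.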

    So we need some more details before we can give you an answer.
