I am porting a project from x86 to ARM64 and have been struggling with poor performance for some time. I recently tried switching from GCC to LLVM and, to my surprise, got a massive performance boost: in some cases the code runs several times faster. I have experimented with all sorts of optimization flags, but I can't get GCC to generate code that is fast enough. I suspect that vectorization isn't working. When I compile an arbitrary source file with the --verbose flag, LLVM reports +neon while GCC doesn't report any SIMD features. I have tried different ARM64 cores and operating systems, and the result is the same.
Any suggestions on how to enable vectorization with GCC on ARM64?
System:
My colleagues recently posted this update on GCC 12: https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/gcc-12
as well as this update on Arm Compiler for Linux: https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/arm-compiler-for-linux-and-arm-performance-libraries-22-0
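To check whether GCC is vectorizing at all, it may help to start from a tiny loop and ask the compiler to report what it did. The sketch below is only an illustration: the file name saxpy.c and the function saxpy are made up for this example and are not from the original project. As far as I know, GCC targeting AArch64 has Advanced SIMD (NEON) enabled by default, so a missing +neon in the --verbose output does not by itself mean SIMD is off. The more likely culprits are the optimization level (the vectorizer is on at -O3, or with -ftree-vectorize) and loops the vectorizer gives up on, which -fopt-info-vec-missed should list.

```c
/* saxpy.c: hypothetical test file for checking GCC auto-vectorization.
 *
 * Suggested commands (assuming a native AArch64 GCC):
 *   gcc -O3 -mcpu=native -fopt-info-vec-optimized -fopt-info-vec-missed -c saxpy.c
 *   gcc -O3 -mcpu=native -S saxpy.c    # then look for v-register instructions in saxpy.s
 */
#include <stddef.h>

/* y[i] = a * x[i] + y[i]
 * 'restrict' promises the compiler that x and y do not overlap, which
 * removes the need for runtime alias checks before the vector loop. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

If this loop is reported as vectorized but the hot loops in the real project are not, the "missed" notes for those loops are the place to look. Typical causes are possible pointer aliasing, function calls inside the loop, or floating-point reductions that GCC will not reorder without -ffast-math (or the narrower -fassociative-math).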