I am porting a project from x86 to ARM64 and have been struggling with poor performance for some time. Recently I tried switching from GCC to LLVM and, to my surprise, got a massive performance boost: in some cases the code runs several times faster. I have experimented with all sorts of optimization flags, but I can't get GCC to generate comparably fast code. I suspect that vectorization isn't working: when I compile a source file with the --verbose flag, LLVM reports +neon, while GCC doesn't report any SIMD features. I tried on different ARM64 cores and operating systems, and the result is the same.
Any suggestions on how to enable vectorization with GCC on ARM64?
System:
I note you are using GCC 12. Can you try the release below? https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/downloads
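On the `+neon` point in the question: Advanced SIMD (NEON) is a mandatory part of Armv8-A, so GCC enables it by default on AArch64, which may be why it does not appear as an extra feature in GCC's verbose output the way it does in LLVM's target-feature list. A more direct way to see what a given toolchain is targeting is to check its predefined macros. The sketch below assumes a C11 compiler; the `TARGET_*` macro names and `print_simd_features` are hypothetical, while `__aarch64__`, `__ARM_NEON` and `__ARM_FEATURE_SVE` are the standard predefined macros.

```c
#include <stdio.h>

/* Compile-time probe of the SIMD features the compiler is targeting.
   __ARM_NEON and __ARM_FEATURE_SVE come from the Arm C Language
   Extensions (ACLE); __aarch64__ marks a 64-bit Arm target. */
#if defined(__aarch64__)
#define TARGET_AARCH64 1
#else
#define TARGET_AARCH64 0
#endif

#if defined(__ARM_NEON)
#define TARGET_NEON 1
#else
#define TARGET_NEON 0
#endif

#if defined(__ARM_FEATURE_SVE)
#define TARGET_SVE 1
#else
#define TARGET_SVE 0
#endif

/* Advanced SIMD is part of the base Armv8-A architecture, so an AArch64
   build without __ARM_NEON means SIMD was explicitly disabled somewhere
   (for example via a +nosimd architecture modifier). */
#if TARGET_AARCH64 && !TARGET_NEON
#error "AArch64 target without Advanced SIMD: check your -march/-mcpu flags"
#endif

/* Prints the detected target features, one flag per feature (1 or 0). */
void print_simd_features(void)
{
    printf("aarch64=%d neon=%d sve=%d\n", TARGET_AARCH64, TARGET_NEON, TARGET_SVE);
}
```

The same information can be obtained without writing any code: `echo | gcc -dM -E - | grep __ARM` lists the macros GCC predefines for the current target flags, and running it with the flags from your build shows whether SIMD is enabled there.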
Depending on your use case, Arm Compiler for Linux may be more appropriate: https://developer.arm.com/Tools%20and%20Software/Arm%20Compiler%20for%20Linux
I also came across the article below, which you may find useful: https://sofiangotrong.wordpress.com/2017/10/16/simd-vectorization-on-aarch64/
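To check whether GCC is vectorizing at all, you can compile a small, easy-to-vectorize kernel with GCC's optimization reports turned on and compare against your own hot loops. This is a minimal sketch; `saxpy` is an illustrative function, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i] ("saxpy").
   The restrict qualifiers promise that x and y do not overlap, so the
   compiler does not need to add a runtime overlap check before using
   vector instructions. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* Precondition check, outside the loop; compiled out with -DNDEBUG. */
    assert(n == 0 || (x != NULL && y != NULL));

    /* A simple counted loop with unit-stride accesses: a good candidate
       for GCC's auto-vectorizer, which on AArch64 can map it to Advanced
       SIMD (NEON) instructions working on four floats per 128-bit register. */
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

For example, `gcc -O3 -mcpu=native -fopt-info-vec-optimized -c saxpy.c` should report the loop as vectorized, and `-fopt-info-vec-missed` on your own code explains why a given loop was not. With Clang, the rough equivalents are `-Rpass=loop-vectorize` and `-Rpass-missed=loop-vectorize`. Note that before GCC 12, auto-vectorization was enabled by default only at `-O3` (or with `-ftree-vectorize`); GCC 12 also enables a conservative form of it at `-O2`.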
My colleagues recently posted this update on GCC 12: https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/gcc-12
as well as this update on Arm Compiler for Linux: https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/arm-compiler-for-linux-and-arm-performance-libraries-22-0