I am porting a project from x86 to ARM64 and I have been struggling with poor performance for some time. Recently I tested switching from GCC to LLVM. To my surprise, I got a massive performance boost. In some cases code execution is several times faster. I experimented with all sorts of optimization flags but I can't get GCC to generate fast enough code. I suspect that vectorization doesn't work. When I compile a random source code file with the --verbose flag, LLVM reports +neon while GCC doesn't report SIMD features. I tried on different ARM64 cores and operating systems and the result is the same.
Any suggestions on how to enable vectorization with GCC on ARM64?
System:
Somehow this has to do with the combination of ARM64 + Linux. GCC performs better than Clang on Intel and AMD hardware. I tested it on Apple M1 under macOS - GCC was significantly better.Ignore this. I tested single-threaded on M1.