I am using ARMv8 GCC compiler and I would like to optimize Neon Intrinsics code for better execution time performance. I have already tried loop unrolling and I am using look up table for the computation of log10. Any ideas?
Here is the code:
static inline…