• Optimization of NEON intrinsics on ARM Cortex-A53

    I am using the GCC compiler for ARMv8 and would like to optimize my NEON intrinsics code for shorter execution time. I have already tried loop unrolling, and I use a lookup table to compute log10. Any ideas?

    Here is the code:

    static inline…