_mm_shuffle_epi8 equivalent on ARM machines

In a project focused on improving performance on ARM, I am using the _mm_shuffle_epi8 implementation from the page below.

But that implementation is suboptimal and is costing performance.

Is there a proper equivalent of _mm_shuffle_epi8 for ARM?

  • There isn't an exact equivalent, but vtbl is likely a useful instruction for implementing _mm_shuffle_epi8 in Neon.

    As there isn't a direct equivalent, a completely generic version won't be as efficient, but if you only need particular shuffle patterns you can get to something better.

    I always plug the searchable list of Neon commands, and this guide will also be useful.
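
As a rough sketch of the table-lookup idea above (not a drop-in replacement): on AArch64 the `vqtbl1q_u8` intrinsic returns 0 for any index outside 0..15, so masking the control bytes with 0x8F reproduces the per-byte rule of _mm_shuffle_epi8 (bit 7 set gives zero, otherwise the low 4 bits select a source byte). The helper names `shuffle_epi8_sketch` and `reverse_bytes_sketch` are made up for illustration, and the scalar fallback is only there so the example compiles on non-ARM machines. On 32-bit ARMv7 the same idea would presumably need `vtbl2_u8` applied to each 8-byte half, which is not shown here. The second helper illustrates the point about particular shuffles: a fixed byte reversal needs no index table at all.

```c
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (name is illustrative):
 * dst[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0F],
 * which is the per-byte rule of _mm_shuffle_epi8. */
static inline void shuffle_epi8_sketch(const uint8_t src[16],
                                       const uint8_t ctl[16],
                                       uint8_t dst[16])
{
#if defined(__aarch64__)
    uint8x16_t table = vld1q_u8(src);
    /* Keep bit 7 and the low 4 bits of each control byte. Bytes with
     * bit 7 set stay >= 0x80, which is out of range for the 16-entry
     * table, so vqtbl1q_u8 writes 0 for them. */
    uint8x16_t idx = vandq_u8(vld1q_u8(ctl), vdupq_n_u8(0x8F));
    vst1q_u8(dst, vqtbl1q_u8(table, idx));
#else
    /* Portable scalar fallback with the same semantics. */
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0F];
    memcpy(dst, tmp, sizeof tmp);
#endif
}

/* Hypothetical helper: one particular shuffle (reverse all 16 bytes)
 * done without a table lookup. */
static inline void reverse_bytes_sketch(const uint8_t src[16],
                                        uint8_t dst[16])
{
#if defined(__aarch64__)
    /* Reverse the bytes inside each 64-bit half, then swap the halves. */
    uint8x16_t r = vrev64q_u8(vld1q_u8(src));
    vst1q_u8(dst, vextq_u8(r, r, 8));
#else
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = src[15 - i];
    memcpy(dst, tmp, sizeof tmp);
#endif
}
```

Whether the generic version is fast enough depends on how the control vector is produced. If it is a compile-time constant, a dedicated permute such as the reversal above is usually the better starting point.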