SSEtoNEON FAQ

_mm_sfence() and _mm_pause() are two intrinsics from the Intel instruction set.

Unlike ordinary computational instructions, they serve synchronization and optimization purposes rather than producing a result. If I want to implement them on Arm, is there a suitable instruction or statement to replace them?

Thanks.

  • Hi,

    Here are two repositories that translate SIMD intrinsics between the two architectures:

    github.com/.../sse2neon

    github.com/.../ARM_NEON_2_x86_SSE

    From https://github.com/DLTcollab/sse2neon, you can download the code and find sse2neon.h, where _mm_sfence() is translated with a compiler built-in function and _mm_pause() with an inline assembly instruction, as there are no equivalent Neon instructions:

    /* Streaming Extensions */

    // Guarantees that every preceding store is globally visible before any
    // subsequent store.
    // msdn.microsoft.com/.../5h2w73d1(v=vs.90).aspx
    FORCE_INLINE void _mm_sfence(void)
    {
        __sync_synchronize();
    }

    // Pause the processor. This is typically used in spin-wait loops and depending
    // on the x86 processor typical values are in the 40-100 cycle range. The
    // 'yield' instruction isn't a good fit because it's effectively a nop on most
    // Arm cores. Experience with several databases has shown an 'isb' is
    // a reasonable approximation.
    FORCE_INLINE void _mm_pause()
    {
        __asm__ __volatile__("isb\n");
    }
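
    If you only need these two operations and not the whole header, the same idea can be sketched directly in C11. This is a minimal sketch, assuming GCC or Clang; the names cpu_relax, store_fence, spin_lock_t and bump_counter are hypothetical helpers made up for illustration and are not part of sse2neon. The fence uses a sequentially consistent C11 fence, which, like __sync_synchronize(), is a full barrier and therefore stronger than a store-only fence strictly requires.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Spin-wait hint: 'pause' on x86, 'isb' on Arm (the approximation quoted
 * above), and a plain compiler barrier on any other target. */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#elif defined(__aarch64__) || (defined(__arm__) && __ARM_ARCH >= 7)
    __asm__ __volatile__("isb" ::: "memory");
#else
    atomic_signal_fence(memory_order_seq_cst); /* compiler barrier only */
#endif
}

/* Full memory barrier, used here as a conservative stand-in for a store
 * fence (assumption: a stronger barrier than needed is acceptable). */
static inline void store_fence(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Hypothetical test-and-set spin lock that uses cpu_relax() while waiting. */
typedef struct {
    atomic_flag locked;
} spin_lock_t;

#define SPIN_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Returns true if the lock was free and is now held by the caller. */
static inline bool spin_trylock(spin_lock_t *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->locked,
                                              memory_order_acquire);
}

static inline void spin_lock(spin_lock_t *lock)
{
    while (!spin_trylock(lock))
        cpu_relax(); /* back off briefly instead of hammering the flag */
}

static inline void spin_unlock(spin_lock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

/* Usage sketch: increment a shared counter under the lock and return
 * the new value. */
static inline int bump_counter(spin_lock_t *lock, int *counter)
{
    assert(lock && counter);
    spin_lock(lock);
    int value = ++*counter;
    spin_unlock(lock);
    return value;
}
```

    The lock itself already orders the protected accesses through its acquire and release operations, so store_fence() is only needed where the original x86 code called _mm_sfence() explicitly. How long an isb actually stalls depends on the core, so the spin loop is worth measuring on your target.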
