How to shuffle bits and check high-bit values using NEON intrinsics?

Note: This was originally posted on 1st November 2011 at http://forums.arm.com

Hi,

I am trying to port code written with SSE3 intrinsics to NEON SIMD and am stuck on a shuffle function. I have looked at the GCC intrinsics documentation and the ARM manuals but have not been able to find a solution.

Is there a NEON equivalent of the _mm_shuffle_epi8 intrinsic (SSSE3)? Any suggestions on how to implement this would be really appreciated, since I can't seem to get past it. I know that a table-lookup instruction exists, but it does not do the initial comparison that _mm_shuffle_epi8 does, so I am not sure how to implement this.
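For reference, the behaviour being asked about can be sketched in C. _mm_shuffle_epi8 writes zero to every output byte whose index byte has bit 7 set, and otherwise selects the source byte named by the low four bits of the index. The NEON table lookup (VTBL) returns zero for any out-of-range index, so masking the index with 0x8F first should reproduce the same result. This is an untested-on-hardware sketch, not a definitive implementation: shuffle_bytes is a hypothetical helper name, and the portable fallback is only there so the function also builds on non-ARM targets.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper (the name is an assumption, not a library function):
 * out[i] = 0 if bit 7 of idx[i] is set, else src[idx[i] & 0x0F]. */
void shuffle_bytes(const uint8_t src[16], const uint8_t idx[16], uint8_t out[16])
{
#if HAVE_NEON
    /* Keep bit 7 and the low nibble. Indices with bit 7 set stay >= 128,
     * which is out of range for a 16-byte table, so VTBL yields 0. */
    uint8x16_t masked = vandq_u8(vld1q_u8(idx), vdupq_n_u8(0x8F));
    uint8x16_t s = vld1q_u8(src);
    uint8x8x2_t table;
    table.val[0] = vget_low_u8(s);
    table.val[1] = vget_high_u8(s);
    uint8x8_t lo = vtbl2_u8(table, vget_low_u8(masked));
    uint8x8_t hi = vtbl2_u8(table, vget_high_u8(masked));
    vst1q_u8(out, vcombine_u8(lo, hi));
#else
    /* Portable fallback with the same semantics. */
    uint8_t tmp[16];
    for (int i = 0; i < 16; ++i)
        tmp[i] = (idx[i] & 0x80) ? 0 : src[idx[i] & 0x0F];
    for (int i = 0; i < 16; ++i)
        out[i] = tmp[i];
#endif
}
```

On AArch64 the two vtbl2_u8 calls could probably be replaced by a single vqtbl1q_u8, which looks up all 16 lanes at once.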

Also, I need to check only the high bit of each byte in a register. Is there any way to check the high bit of each element in a vector? I looked at the manual and was not able to find anything relevant. Any help or information would be sincerely appreciated.
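One possible way to look at the high bits, sketched in C under the same assumptions as before (hypothetical helper names, portable fallback for non-ARM builds): a logical shift right by 7 leaves only the former top bit in each byte lane. The second function packs those bits into one integer in the layout that _mm_movemask_epi8 uses on x86; it is deliberately scalar and is meant as a reference, not as a tuned NEON routine.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: flags[i] = 1 if bit 7 of v[i] is set, else 0. */
void high_bit_flags(const uint8_t v[16], uint8_t flags[16])
{
#if HAVE_NEON
    /* Shifting each unsigned byte right by 7 keeps only its top bit. */
    vst1q_u8(flags, vshrq_n_u8(vld1q_u8(v), 7));
#else
    for (int i = 0; i < 16; ++i)
        flags[i] = (uint8_t)(v[i] >> 7);
#endif
}

/* Hypothetical helper: bit i of the result is the high bit of byte i. */
uint16_t high_bit_mask(const uint8_t v[16])
{
    uint16_t mask = 0;
    for (int i = 0; i < 16; ++i)
        mask = (uint16_t)(mask | ((v[i] >> 7) << i));
    return mask;
}
```

If an all-ones or all-zeros lane is more convenient than 0 or 1, vtstq_u8 against a vector of 0x80 is another candidate.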

Cheers,



  • Note: This was originally posted on 1st November 2011 at http://forums.arm.com


    Is there a NEON equivalent of the _mm_shuffle_epi8 intrinsic (SSSE3)? Any suggestions on how to implement this would be really appreciated, since I can't seem to get past it. I know that a table-lookup instruction exists, but it does not do the initial comparison that _mm_shuffle_epi8 does, so I am not sure how to implement this.


    There is no such instruction in NEON, but you can implement it yourself.

    One simple random-number algorithm is the linear congruential generator:
    x_{n+1} = (1664525 * x_n + 1013904223) mod 2^32



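    .alea_values:
    .word 123456789, 369258147   @ d0: two 32-bit random states (seeds)
    .word 1664525, 1664525       @ d1: random multiplier, one per lane
    .word 1013904223, 1013904223 @ d2: random increment, one per lane
    ...

    @ init: load state, multiplier and increment into d0, d1, d2
    adr   r0, .alea_values
    vld1.32  {d0 - d2}, [r0]

    @ compute: x = x * multiplier + increment, wrapping at 32 bits per lane
    vmul.i32 d0, d0, d1
    vadd.i32 d0, d0, d2

    @ store: write the updated state back so the next call continues from it
    vst1.32  {d0}, [r0]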


    After the computation, d0 holds 64 bits of pseudo-random data (two 32-bit values). You can then, of course, use it as 8-, 16-, 32- or 64-bit values.
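    The same update step can be written in C as a rough cross-check of the assembly above. This is a sketch under assumptions: lcg_step2 is a hypothetical name, the NEON path uses the arm_neon.h intrinsics vmul_u32 and vadd_u32, and the scalar path is a fallback for non-ARM builds that relies on unsigned 32-bit wrap-around.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

#define LCG_MUL 1664525u
#define LCG_INC 1013904223u

/* Hypothetical helper: advances two independent 32-bit LCG states,
 * mirroring the two lanes of d0 in the assembly. */
void lcg_step2(uint32_t state[2])
{
#if HAVE_NEON
    uint32x2_t x = vld1_u32(state);
    x = vmul_u32(x, vdup_n_u32(LCG_MUL)); /* lane-wise multiply, wraps mod 2^32 */
    x = vadd_u32(x, vdup_n_u32(LCG_INC)); /* lane-wise add, wraps mod 2^32 */
    vst1_u32(state, x);
#else
    for (int i = 0; i < 2; ++i)
        state[i] = (uint32_t)(state[i] * LCG_MUL + LCG_INC);
#endif
}
```

    One caveat when slicing the result into 8- or 16-bit values: with a power-of-two modulus the low-order bits of an LCG repeat with short periods (the lowest bit simply alternates), so the upper bits of each 32-bit lane are usually the better choice.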

    Of course, you don't have to load and store the registers every time.
    If you are converting SSE3 code, you should have plenty of NEON registers free, so you can use three of them to keep the random state and coefficients.

    PS: I haven't tested this code, but it should work ;)