
Fast duplicate lane

Note: This was originally posted on 17th April 2013 at http://forums.arm.com

Hi,
I have a small problem.

My input is a Dn register holding 8 bytes:
[a, b, c, d, e, f, g, h]

I'd like to end up with two Dn registers containing
[a, a, b, b, c, c, d, d]
and
[e, e, f, f, g, g, h, h]

The goal is to do this with the minimum number of NEON registers.
For the moment, the best way I have found is something like:


vmovl.u8              Qn, Dn        @ widen each byte to a halfword (0x00bb)
vmul.u16              Qn, Qn, Qx    @ Qx holds 257 (0x0101) in each of its 8 lanes, giving 0xbbbb


I'm looking for a solution that doesn't use an extra register.

Do you have any ideas?
Thanks
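
To make the arithmetic behind the multiply concrete, here is a minimal scalar sketch in C of what each 16-bit lane goes through. It is only a model of the lane values, not NEON code, and the helper names `dup_byte_mul` and `dup_lanes_mul` are made up for illustration.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane: widen the byte (what vmovl.u8 does),
   then multiply by 257 = 0x0101 (what vmul.u16 does).
   Hypothetical helper, not part of any library. */
static uint16_t dup_byte_mul(uint8_t b)
{
    uint16_t wide = b;                 /* 0x00bb */
    return (uint16_t)(wide * 257u);    /* 0xbbbb; the maximum, 255 * 257 = 65535, still fits */
}

/* Model of the whole operation on the 8 bytes of a D register.
   out[0..3] corresponds to the low D half, out[4..7] to the high D half. */
static void dup_lanes_mul(const uint8_t in[8], uint16_t out[8])
{
    for (int i = 0; i < 8; ++i)
        out[i] = dup_byte_mul(in[i]);
}
```

Because 257 is 0x0101, the product places one copy of the byte in each half of the halfword, and it never overflows 16 bits.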
  • Note: This was originally posted on 18th April 2013 at http://forums.arm.com


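    You might get away without the multiply, and with a transient register rather than one holding the constant, by shifting the widened value left by 8 and ORing it with the unshifted version (an AND would clear every lane, since the two values share no set bits). This might be faster (untested theory), but it is technically one instruction longer, so it depends on the pipeline: a shift and an OR should be "simple" operations compared with a MUL, but YMMV.


    You mean


    vmovl.u8    q1, Dn          @ q1 lanes = 0x00bb (zero-extended bytes)
    vshll.u8    q2, Dn, #8      @ q2 lanes = 0xbb00 (bytes moved to the high half)
    vorr        q1, q1, q2      @ q1 lanes = 0xbbbb: d2 = [a,a,b,b,c,c,d,d], d3 = [e,e,f,f,g,g,h,h]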


    Yes, that could be better because I can use a temporary register instead of a fixed one!

    I'll try it.
    Thanks
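
    As a sanity check on the shift-and-combine idea, here is a small scalar sketch in C that models one 16-bit lane. It is an illustration only (the helper names are hypothetical, and it says nothing about timing on a real NEON pipeline), but it shows that the two halves have to be combined with OR: the widened byte and the shifted byte have no set bits in common, so AND always gives zero.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one lane: widened byte OR-ed with the byte shifted left by 8. */
static uint16_t dup_byte_or(uint8_t b)
{
    uint16_t lo = b;                            /* widened value: 0x00bb */
    uint16_t hi = (uint16_t)((unsigned)b << 8); /* shifted value: 0xbb00 */
    return (uint16_t)(lo | hi);                 /* 0xbbbb */
}

/* Same two inputs combined with AND, for comparison. */
static uint16_t dup_byte_and(uint8_t b)
{
    uint16_t lo = b;
    uint16_t hi = (uint16_t)((unsigned)b << 8);
    return (uint16_t)(lo & hi);                 /* no overlapping bits */
}

/* Exhaustive check over all 256 byte values. */
static void check_all_bytes(void)
{
    for (unsigned v = 0; v < 256; ++v) {
        /* OR matches the multiply-by-257 result for every byte. */
        assert(dup_byte_or((uint8_t)v) == (uint16_t)(v * 257u));
        /* AND never produces the duplicated byte; it is always zero. */
        assert(dup_byte_and((uint8_t)v) == 0);
    }
}
```

    Calling `check_all_bytes()` runs through every possible byte, so the equivalence with the multiply version does not rest on a few hand-picked samples.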