vmovl.u8 Qn, Dn       @ convert bytes to half-words
vmul.u16 Qn, Qn, Qx   @ Qx contains 257 in each of its 8 lanes
You might get away without the multiply, and use a scratch register rather than one holding the constant, by shifting the widened value left by 8 and ORing it with the unshifted version. This might be faster (untested theory), but it is technically one instruction longer, so it depends on the pipeline: a shift and an OR should be "simple" operations compared with a MUL, but YMMV.
vmovl.u8 q1, Dn
vshll.u8 q2, Dn, #8
vorr q1, q1, q2
vmovl.u8 q1, Dn
vsli.u16 q1, q1, #8
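To sanity-check the arithmetic, here is a rough scalar model in C of what a single 16-bit lane ends up holding for each variant. It is only a sketch: the function names are made up for illustration, and it models the per-lane math rather than real NEON execution or timing. Note that the combine step has to be an OR; an AND of the shifted and unshifted values is always zero, because the two have no bits in common.

```c
#include <stdint.h>

/* One lane after vmovl.u8: the byte zero-extended to 16 bits. */
static uint16_t widen(uint8_t b) { return (uint16_t)b; }

/* vmovl + vmul by 257: b * 257 == (b << 8) + b. */
static uint16_t dup_mul(uint8_t b) { return (uint16_t)(widen(b) * 257u); }

/* vmovl + vshll #8 + OR: high byte from the shifted copy, low byte from the widened one. */
static uint16_t dup_shift_or(uint8_t b) {
    return (uint16_t)((widen(b) << 8) | widen(b));
}

/* vmovl + vsli #8: insert (src << 8) above the low 8 bits kept from the destination. */
static uint16_t dup_sli(uint8_t b) {
    uint16_t dst = widen(b);
    uint16_t src = dst;
    return (uint16_t)((src << 8) | (dst & 0x00FFu));
}

/* Combining with AND instead of OR: the operands share no set bits, so this is always 0. */
static uint16_t dup_shift_and(uint8_t b) {
    return (uint16_t)((widen(b) << 8) & widen(b));
}

/* Returns 1 if all three working variants give (b << 8) | b for every byte value. */
static int all_variants_agree(void) {
    for (unsigned v = 0; v < 256; ++v) {
        uint8_t b = (uint8_t)v;
        uint16_t want = (uint16_t)((v << 8) | v);
        if (dup_mul(b) != want || dup_shift_or(b) != want || dup_sli(b) != want)
            return 0;
    }
    return 1;
}
```

Calling all_variants_agree() from a small test program confirms that the three sequences are interchangeable as far as the result goes; which one is fastest still has to be measured on the target core.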
vmov d1, d0
vext.8 d1, d0, d0, #0
// d0 = [ a b c d e f g h ]
vmov d1, d0
vzip.8 d0, d1
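For anyone who wants to see what the copy-then-zip sequence produces, here is a small scalar model in C. It is a sketch under the assumption that VZIP.8 on two D registers interleaves them element by element and writes the low half of the result to the first register and the high half to the second; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of "vzip.8 Dd, Dm": interleave the two 8-byte registers in place. */
static void vzip8_model(uint8_t d[8], uint8_t m[8]) {
    uint8_t out[16];
    for (int i = 0; i < 8; ++i) {
        out[2 * i]     = d[i];  /* even positions come from the first register */
        out[2 * i + 1] = m[i];  /* odd positions come from the second register */
    }
    memcpy(d, out, 8);          /* low half of the interleave goes back to Dd */
    memcpy(m, out + 8, 8);      /* high half goes back to Dm */
}

/* Model of "vmov d1, d0" followed by "vzip.8 d0, d1". */
static void duplicate_bytes(const uint8_t in[8], uint8_t lo[8], uint8_t hi[8]) {
    memcpy(lo, in, 8);          /* d0 */
    memcpy(hi, in, 8);          /* d1 = copy of d0 (the vmov) */
    vzip8_model(lo, hi);
}
```

With the input [ a b c d e f g h ], this model leaves [ a a b b c c d d ] in the first register and [ e e f f g g h h ] in the second, so the copy in d1 is overwritten with the upper half of the result.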
I'm looking for a solution that doesn't use an extra register!