Best way to load unsigned 8-bit bytes into 16-bit NEON lanes?

Note: This was originally posted on 3rd August 2011 at http://forums.arm.com

Hi everyone,

I have unsigned 8-bit bytes at consecutive memory addresses that I want to load into the unsigned 16-bit lanes of NEON vector registers, with each byte placed in the upper half of its lane.

For example, I'd like to load the bytes
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17

into NEON quadword registers like so:
Q8 = {0x0000, 0x0100, 0x0200, 0x0300, 0x0400, 0x0500, 0x0600, 0x0700}
Q9 = {0x1000, 0x1100, 0x1200, 0x1300, 0x1400, 0x1500, 0x1600, 0x1700}

The best I've come up with is:
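veor      q8, q8, q8          @ q8 = 0
veor      q9, q9, q9          @ q9 = 0
vld1.8    {q0}, [r0,:128]!    @ load b0..b15 (r0 16-byte aligned), advance r0
vtrn.8    q8, q0              @ q8 lanes = b0,b2,..,b14 << 8; q0 lanes = b1,b3,..,b15 << 8
vshr.u16  q0, q0, #8          @ q0 lanes = b1,b3,..,b15
vtrn.8    q9, q0              @ q9 lanes = b1,b3,..,b15 << 8
vzip.16   q8, q9              @ q8 = b0..b7 << 8, q9 = b8..b15 << 8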

Is there a better (read: faster) way to do this?

jpap