Reading 8 u8s into 8 u16 lanes in NEON Q register

Note: This was originally posted on 2nd June 2011 at http://forums.arm.com

I'm totally new to SIMD hand-coding and the ARM ISA in general, and am having trouble understanding the documentation for the vector load instructions. I have an array of 8 unsigned 8-bit numbers on the heap, and I would like to load them into a Q register as 16-bit unsigned integers (zero-extended), filling all 128 bits. The 16-bit width is needed because of multiplies that I perform after the load.

I believe I have everything else figured out, but even after reading the docs for the VLDn instructions multiple times, I was unable to determine whether this is possible with a single instruction. I do not care what order the u8s are loaded into the register, provided it's deterministic.

Peter Harris over 12 years ago

Note: This was originally posted on 2nd June 2011 at http://forums.arm.com

If you load the data into a 64-bit (D) register and the subsequent multiply is VMULL, the multiply widens the result for you automatically, so no separate extension step is needed.
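As a rough sketch of the VMULL route in C with NEON intrinsics (my own illustration, not code from the original post): `vld1_u8` loads 8 bytes into a D register and `vmull_u8` produces eight 16-bit products in a Q register. The function name `mul_widen_u8x8` and the second input array `b` are assumptions made up for the example, and the scalar branch is only there so the snippet also builds on non-ARM machines.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: multiply 8 u8 values from a[] by 8 u8 values
 * from b[] and store the 8 widened u16 products in out[]. */
void mul_widen_u8x8(const uint8_t *a, const uint8_t *b, uint16_t *out)
{
#if defined(__ARM_NEON)
    uint8x8_t va = vld1_u8(a);           /* 8 x u8 into a D register */
    uint8x8_t vb = vld1_u8(b);           /* 8 x u8 into a D register */
    uint16x8_t prod = vmull_u8(va, vb);  /* widening multiply: 8 x u16 in a Q register */
    vst1q_u16(out, prod);
#else
    /* Scalar fallback with the same result, for non-NEON builds. */
    for (size_t i = 0; i < 8; ++i)
        out[i] = (uint16_t)(a[i] * b[i]);
#endif
}
```

The largest possible product, 255 * 255 = 65025, still fits in 16 bits, which is why no explicit zero-extension is needed before the multiply.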

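If you can't use VMULL then you can load a 64-bit register with your data, a 64-bit register with a zero constant, and then use VZIP.U8 to interleave them. If the data sits in the lower D register and the zeros in the upper D register of the same Q register, that Q register then holds the eight zero-extended 16-bit values.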
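A minimal sketch of the interleave approach using intrinsics, assuming a little-endian target (the usual case) so that reinterpreting the interleaved bytes as `uint16x8_t` gives the zero-extended values. The name `widen_u8x8_zip` is hypothetical, and the scalar branch exists only so the example compiles without NEON.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: zero-extend 8 u8 values to 8 u16 values by
 * interleaving the data bytes with zero bytes. */
void widen_u8x8_zip(const uint8_t *in, uint16_t *out)
{
#if defined(__ARM_NEON)
    uint8x8_t data = vld1_u8(in);          /* 8 x u8 in a D register */
    uint8x8_t zero = vdup_n_u8(0);         /* D register of zeros */
    uint8x8x2_t z = vzip_u8(data, zero);   /* val[0] = d0,0,d1,0,d2,0,d3,0
                                              val[1] = d4,0,d5,0,d6,0,d7,0 */
    uint8x16_t bytes = vcombine_u8(z.val[0], z.val[1]);
    /* Assumes little-endian: each (data, 0) byte pair reads as one u16. */
    vst1q_u16(out, vreinterpretq_u16_u8(bytes));
#else
    /* Scalar fallback with the same result, for non-NEON builds. */
    for (size_t i = 0; i < 8; ++i)
        out[i] = in[i];
#endif
}
```

With intrinsics the compiler picks the registers, so the zero vector is not visibly destroyed here; in hand-written assembly VZIP overwrites both of its operands.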

If you need to do this multiple times then use VADDL to add zero to your 64-bit value instead. VZIP overwrites both of its operands, including the zeros, whereas VADDL widens without clobbering the vector of zeros, so you can reuse it for subsequent operations.
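The repeated case might look something like this with intrinsics: the zero vector is created once outside the loop and `vaddl_u8` widens each 8-byte block to 16-bit lanes. The function name `widen_u8_blocks` and the `blocks` parameter are assumptions for illustration, and the scalar branch is just a fallback for non-NEON builds.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: zero-extend `blocks` groups of 8 u8 values
 * from in[] into u16 values in out[]. */
void widen_u8_blocks(const uint8_t *in, uint16_t *out, size_t blocks)
{
#if defined(__ARM_NEON)
    const uint8x8_t zero = vdup_n_u8(0);      /* built once, reused every iteration */
    for (size_t i = 0; i < blocks; ++i) {
        uint8x8_t v = vld1_u8(in + 8 * i);    /* 8 x u8 in a D register */
        uint16x8_t wide = vaddl_u8(v, zero);  /* widening add: v + 0 as 8 x u16 */
        vst1q_u16(out + 8 * i, wide);
    }
#else
    /* Scalar fallback with the same result, for non-NEON builds. */
    for (size_t i = 0; i < 8 * blocks; ++i)
        out[i] = in[i];
#endif
}
```

In real code the widened vector would normally stay in a register and feed the following multiplies rather than being stored straight back to memory.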