Reading 8 u8s into 8 u16 lanes in NEON Q register
Justin Wick
over 12 years ago
Note: This was originally posted on 2nd June 2011 at http://forums.arm.com
I'm totally new to hand-coding SIMD and to the ARM ISA in general, and I'm having trouble understanding the documentation for the vector load instructions. I have an array of 8 unsigned 8-bit numbers on the heap, and I would like to load them into a Q register as 16-bit unsigned integers (zero-extended), filling all 128 bits. The 16-bit width is needed because of the multiplies that I perform after the load.
I believe I have everything else figured out, but even after reading the docs for the VLDn instructions multiple times, I could not determine whether this is possible with a single instruction. I do not care what order the u8s end up in within the register, provided it's deterministic.
Peter Harris
over 12 years ago
Note: This was originally posted on 2nd June 2011 at http://forums.arm.com
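If you load the data into a 64-bit D register and the subsequent multiply is VMULL, the result is widened for you automatically: VMULL.U8 takes two D registers of 8-bit lanes and writes eight 16-bit products to a Q register.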
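As a rough illustration of the widening multiply, here is a minimal sketch in C using NEON intrinsics. The function name `mul_widen_u8x8` is hypothetical, and it assumes the second operand is also a vector of eight u8 values, which the original question does not state. A scalar fallback is included so the sketch also builds on targets without NEON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: multiply eight u8 values by eight u8 coefficients
 * and store the eight u16 products. No separate widening step is needed,
 * because the multiply itself produces 16-bit lanes. */
void mul_widen_u8x8(const uint8_t *a, const uint8_t *b, uint16_t *out)
{
    assert(a != NULL && b != NULL && out != NULL);
#if HAVE_NEON
    uint8x8_t va = vld1_u8(a);           /* 64-bit load: eight u8 lanes    */
    uint8x8_t vb = vld1_u8(b);           /* 64-bit load: eight u8 lanes    */
    uint16x8_t prod = vmull_u8(va, vb);  /* widening multiply (VMULL.U8)   */
    vst1q_u16(out, prod);                /* store eight u16 products       */
#else
    /* Scalar fallback with the same result, for non-NEON builds. */
    for (size_t i = 0; i < 8; ++i)
        out[i] = (uint16_t)((unsigned)a[i] * (unsigned)b[i]);
#endif
}
```

The largest possible product, 255 * 255 = 65025, still fits in an unsigned 16-bit lane, so nothing is lost in this step.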
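If you can't use VMULL, then you can load one 64-bit register with your data and another 64-bit register with a zero constant, and then use VZIP.8 to interleave them. If the two D registers form a Q register, that Q register then holds your eight values as zero-extended 16-bit lanes. Note that VZIP overwrites both of its operands, so the zeros are gone afterwards.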
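The interleave-with-zero idea might look like the following in C intrinsics. This is only a sketch: `widen_zip_u8x8` is a made-up name, and the reinterpretation of the byte pairs as u16 lanes assumes a little-endian target. Non-NEON builds take a scalar path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: zero-extend eight u8 values to eight u16 values
 * by interleaving the data bytes with zero bytes. */
void widen_zip_u8x8(const uint8_t *src, uint16_t *dst)
{
    assert(src != NULL && dst != NULL);
#if HAVE_NEON
    uint8x8_t data = vld1_u8(src);        /* a0 a1 a2 a3 a4 a5 a6 a7        */
    uint8x8_t zero = vdup_n_u8(0);        /* 0  0  0  0  0  0  0  0         */
    uint8x8x2_t z = vzip_u8(data, zero);  /* val[0] = a0 0 a1 0 a2 0 a3 0   */
                                          /* val[1] = a4 0 a5 0 a6 0 a7 0   */
    uint8x16_t bytes = vcombine_u8(z.val[0], z.val[1]);
    /* Assumption: little-endian, so each (data, 0) byte pair reads back
     * as a u16 whose value equals the data byte. */
    vst1q_u16(dst, vreinterpretq_u16_u8(bytes));
#else
    /* Scalar fallback with the same result, for non-NEON builds. */
    for (size_t i = 0; i < 8; ++i)
        dst[i] = src[i];
#endif
}
```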
If you need to do this multiple times, then use VADDL.U8 to add zero to your 64-bit value. This widens the data into a Q register but does not clobber the vector of zeros, so you can reuse it for subsequent operations.
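A minimal sketch of the reusable-zero approach, again in C intrinsics: the zero vector is created once and reused on every iteration. The name `widen_u8_to_u16` and the requirement that `n` be a multiple of 8 are assumptions made for this example. The comment also points at `vmovl_u8`, an intrinsic not mentioned in the reply, which zero-extends without needing a zero vector at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: zero-extend n u8 values to u16, eight at a time.
 * Assumption for this sketch: n is a multiple of 8. */
void widen_u8_to_u16(const uint8_t *src, uint16_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    assert(n % 8 == 0);
#if HAVE_NEON
    const uint8x8_t zero = vdup_n_u8(0);         /* built once, never clobbered */
    for (size_t i = 0; i < n; i += 8) {
        uint8x8_t data = vld1_u8(src + i);       /* load eight u8 lanes         */
        uint16x8_t wide = vaddl_u8(data, zero);  /* widening add (VADDL.U8)     */
        /* Alternative: vmovl_u8(data) widens directly, with no zero vector. */
        vst1q_u16(dst + i, wide);                /* store eight u16 lanes       */
    }
#else
    /* Scalar fallback with the same result, for non-NEON builds. */
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
#endif
}
```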