Arm Community
Arm Development Studio forum
Reading 8 u8s into 8 u16 lanes in NEON Q register
Justin Wick
over 9 years ago
Note: This was originally posted on 2nd June 2011 at http://forums.arm.com
I'm totally new to hand-coding SIMD and to the ARM ISA in general, and I'm having trouble understanding the documentation for the vector load instructions. I have an array of eight unsigned 8-bit numbers on the heap, and I would like to load them into a Q register as 16-bit unsigned integers (zero-extended), filling all 128 bits. I need the 16-bit width because of multiplies that I perform after the load.
I believe I have everything else figured out, but even after reading the docs for the VLDn instructions several times, I couldn't determine whether this is possible with a single instruction. I don't mind in which order the u8s end up in the register, as long as the order is deterministic.
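For reference, here is the result being asked for, written as a plain C loop. This is a minimal sketch. The function name `widen_u8_to_u16` is hypothetical. It only pins down what "eight zero-extended 16-bit lanes" means, so that a NEON version can be checked against it.

```c
#include <stdint.h>

/* Scalar reference: each u8 input becomes one u16 lane with its upper
   8 bits cleared, so 8 lanes x 16 bits fill a 128-bit Q register.
   widen_u8_to_u16 is a hypothetical name used only for illustration. */
void widen_u8_to_u16(const uint8_t src[8], uint16_t dst[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint16_t)src[i]; /* zero extension: high byte is 0 */
}
```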
Justin Wick
over 9 years ago
Thanks, this solved my problem! The rest of the math was straightforward from there.
Peter Harris
over 9 years ago
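If you load into a 64-bit (D) register and the subsequent multiply is VMULL, it will widen the result for you automatically. VMULL.U8 multiplies two vectors of eight u8 values and writes eight 16-bit products to a Q register, as long as both multiplicands fit in 8 bits.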
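Here is a minimal sketch of the VMULL route using the standard NEON intrinsics `vld1_u8`, `vmull_u8` and `vst1q_u16`. It assumes the other multiplicand is also a vector of eight u8 values. The helper name `mul_widen_u8` is hypothetical. When `__ARM_NEON` is not defined, a scalar fallback produces the same per-lane result, so the sketch also builds on other targets.

```c
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Multiply eight u8 pairs and keep the full 16-bit products.
   mul_widen_u8 is a hypothetical helper name. */
void mul_widen_u8(const uint8_t a[8], const uint8_t b[8], uint16_t out[8])
{
#if defined(__ARM_NEON)
    uint8x8_t va = vld1_u8(a);          /* plain 64-bit (D) load, no widening */
    uint8x8_t vb = vld1_u8(b);
    uint16x8_t prod = vmull_u8(va, vb); /* VMULL.U8: 8 x (u8 * u8) -> 8 x u16 */
    vst1q_u16(out, prod);
#else
    /* Portable fallback with the same per-lane result. */
    for (int i = 0; i < 8; i++)
        out[i] = (uint16_t)(a[i] * b[i]); /* at most 255 * 255 = 65025 */
#endif
}
```

The widening happens inside the multiply, so the load itself never needs to produce 16-bit lanes.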
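If you can't use VMULL, you can load one 64-bit register with your data and another with a zero constant, and then use VZIP.U8 to interleave them. With the data in the first operand and the zeros in the second, and the two D registers forming one Q register (for example d0 and d1 forming q0), each data byte ends up paired with a zero byte. On a little-endian system, each of those pairs reads as a zero-extended 16-bit lane.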
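Below is a minimal sketch of the zip-with-zero approach. The NEON path uses `vld1_u8`, `vdup_n_u8`, `vzip_u8`, `vcombine_u8` and `vreinterpretq_u16_u8`. It is enabled only on little-endian NEON targets, because reading byte pairs as u16 lanes depends on byte order. The helper name `widen_by_zip` and the macro `HAVE_LE_NEON` are hypothetical. The scalar fallback models the interleave explicitly and reads each byte pair as a little-endian u16, mirroring the Q register view.

```c
#include <stdint.h>

#if defined(__ARM_NEON) && defined(__BYTE_ORDER__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#include <arm_neon.h>
#define HAVE_LE_NEON 1 /* hypothetical macro: little-endian NEON target */
#endif

/* Zero-extend eight u8 values to u16 by zipping them with zeros.
   widen_by_zip is a hypothetical helper name. */
void widen_by_zip(const uint8_t src[8], uint16_t dst[8])
{
#ifdef HAVE_LE_NEON
    uint8x8_t data  = vld1_u8(src);         /* D register holding the 8 bytes  */
    uint8x8_t zeros = vdup_n_u8(0);         /* D register of zeros             */
    uint8x8x2_t z   = vzip_u8(data, zeros); /* {s0,0,s1,0,..}, {s4,0,s5,0,..}  */
    uint8x16_t q    = vcombine_u8(z.val[0], z.val[1]);
    vst1q_u16(dst, vreinterpretq_u16_u8(q)); /* each byte pair is one u16 lane */
#else
    /* Scalar model of the interleave VZIP performs on data and zeros. */
    uint8_t zipped[16];
    for (int i = 0; i < 8; i++) {
        zipped[2 * i]     = src[i]; /* low byte of lane i: the data     */
        zipped[2 * i + 1] = 0;      /* high byte of lane i: from zeros  */
    }
    /* Read each byte pair as a little-endian u16, like the Q register. */
    for (int i = 0; i < 8; i++)
        dst[i] = (uint16_t)(zipped[2 * i] | (zipped[2 * i + 1] << 8));
#endif
}
```

Swapping the operands, so that the zeros come first, would put each data byte in the high half of its lane instead, giving `src[i] << 8`.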
If you need to do this many times, use VADDL to add zero to your 64-bit value instead. VADDL writes its widened result to a separate Q register, whereas VZIP overwrites both of its operands. As a result, the vector of zeros is not clobbered and can be reused for subsequent operations.
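Below is a minimal sketch of the VADDL approach. It sets up the zero vector once and reuses it for every group of eight bytes, using the intrinsics `vdup_n_u8`, `vld1_u8`, `vaddl_u8` and `vst1q_u16`. The helper name `widen_chunks_addl` is hypothetical, and the sketch assumes the input length is a multiple of eight. With intrinsics, the compiler does the register allocation, so the clobbering concern mainly matters for hand-written assembly. The scalar fallback gives the same result on non-NEON targets.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Widen n_chunks groups of eight u8 values to u16 with a widening add of
   zero. widen_chunks_addl is a hypothetical helper name. */
void widen_chunks_addl(const uint8_t *src, uint16_t *dst, size_t n_chunks)
{
#if defined(__ARM_NEON)
    const uint8x8_t zeros = vdup_n_u8(0);        /* set up once, reused below */
    for (size_t c = 0; c < n_chunks; c++) {
        uint8x8_t data  = vld1_u8(src + 8 * c);
        uint16x8_t wide = vaddl_u8(data, zeros); /* VADDL.U8: u8 + u8 -> u16  */
        vst1q_u16(dst + 8 * c, wide);            /* zeros still intact        */
    }
#else
    /* Same per-lane result as VADDL.U8 with a zero vector. */
    for (size_t i = 0; i < 8 * n_chunks; i++)
        dst[i] = (uint16_t)src[i];
#endif
}
```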