Implementation in NEON of non uniform address jumps
Rnjai Lamba
over 12 years ago
Peter Harris
over 12 years ago
Note: This was originally posted on 1st July 2012 at http://forums.arm.com
Okay and the second half of D0[0] can be utilized by doing {snip}. Right?
No. Remember this is a load-store architecture. The address is where the data comes from; the register specifier sets where the data is stored. You've changed the address but not the register specifier, so you've just overwritten the bottom 16 bits again.
The first question is: what do you mean by "the second half of D0[0]"? D0 is a doubleword register (8 bytes, 64 bits). It is split into a number of lanes, depending on the vector element size. In sim's example you are using 16-bit elements, so there are 4 lanes per register. In this case you set the bottom 16 bits because you specified lane 0 (D0[0]). But this isn't half of anything: it sets ALL 16 bits of D0[0], and as there are four 16-bit lanes this sets one quarter of the whole D register.
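To make the lane layout concrete, here is a minimal sketch in portable C that models a D register as a plain 64-bit integer. It is only an illustration: the helper names d_lane_count and d_set_lane_u16 are hypothetical, and it assumes lane 0 sits in the least significant bits, matching the "bottom 16 bits" description above.

```c
#include <stdint.h>

/* Number of lanes in a 64-bit D register for a given element width.
 * elem_bits is expected to be 8, 16, 32 or 64. */
unsigned d_lane_count(unsigned elem_bits)
{
    return 64u / elem_bits;
}

/* Return the register image with 16-bit lane `lane` (0..3) replaced by
 * `value`. All 16 bits of that lane are written; the other three lanes
 * are left untouched. Hypothetical helper, not a NEON intrinsic. */
uint64_t d_set_lane_u16(uint64_t d, unsigned lane, uint16_t value)
{
    unsigned shift = 16u * lane;                  /* lane 0 -> bits 0..15 */
    uint64_t mask = (uint64_t)0xFFFFu << shift;   /* bits owned by this lane */
    return (d & ~mask) | ((uint64_t)value << shift);
}
```

Writing lane 0 twice in this model simply replaces the first value, while writing lane 1 fills bits 16 to 31 and leaves lane 0 alone.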
If you want to load the second 16 bits of the register, then you want D0[1].
As I mentioned in one of your other posts, if you end up doing a lot of scalar loads you are really defeating the point of using a vector engine, so restructure your algorithm so that you do not need to do this, if you can.
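For reference, a rough sketch of per-lane loads from unrelated addresses, written in C with the standard arm_neon.h intrinsics (vdup_n_u16, vld1_lane_u16, vst1_u16). The function names gather4_u16 and load_lane0_twice are made up for illustration, the plain C fallback exists only so the sketch also builds on non-NEON targets, and the assembly in the comments is an assumed equivalent rather than the snipped code from the question.

```c
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Gather four 16-bit values from four unrelated addresses into one
 * D register image. The pointer selects the source; the lane index
 * selects the destination. Note this costs four separate loads. */
void gather4_u16(const uint16_t *p0, const uint16_t *p1,
                 const uint16_t *p2, const uint16_t *p3,
                 uint16_t out[4])
{
#if defined(__ARM_NEON)
    uint16x4_t d = vdup_n_u16(0);
    d = vld1_lane_u16(p0, d, 0);   /* roughly VLD1.16 {D0[0]}, [r0] */
    d = vld1_lane_u16(p1, d, 1);   /* roughly VLD1.16 {D0[1]}, [r1] */
    d = vld1_lane_u16(p2, d, 2);
    d = vld1_lane_u16(p3, d, 3);
    vst1_u16(out, d);
#else
    /* Portable fallback with the same result. */
    out[0] = *p0; out[1] = *p1; out[2] = *p2; out[3] = *p3;
#endif
}

/* The mistake discussed above: a new address but the same lane.
 * The second load overwrites lane 0; lanes 1..3 stay zero. */
void load_lane0_twice(const uint16_t *a, const uint16_t *b,
                      uint16_t out[4])
{
#if defined(__ARM_NEON)
    uint16x4_t d = vdup_n_u16(0);
    d = vld1_lane_u16(a, d, 0);
    d = vld1_lane_u16(b, d, 0);    /* replaces the value loaded from a */
    vst1_u16(out, d);
#else
    out[0] = *a;
    out[0] = *b;                   /* same lane, so *a is lost */
    out[1] = 0; out[2] = 0; out[3] = 0;
#endif
}
```

Each lane load is its own instruction, so a gather like this runs at scalar-load speed; it only pays off if enough vector arithmetic follows on the assembled register.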
Iso