Implementation in NEON of non uniform address jumps

Rnjai Lamba over 12 years ago

Peter Harris over 12 years ago

Note: This was originally posted on 1st July 2012 at http://forums.arm.com

Okay and the second half of D0[0] can be utilized by doing {snip}. Right?

No. Remember this is a load-store architecture. The address is where the data comes from; the register specifier sets where the data gets stored. You've changed the address but not the register specifier, so you've just overwritten the bottom 16 bits again.

The first question is: what do you mean by "the second half of D0[0]"? D0 is a doubleword register (8 bytes / 64 bits). It is split into a number of lanes, depending on the vector element size. In sim's example you are using 16-bit elements, so there are 4 lanes per register. In this case you set the bottom 16 bits because you specified lane 0 (D0[0]). But this isn't half of anything: it sets ALL 16 bits of D0[0], and as there are four 16-bit lanes this is setting one quarter of the whole D register.

If you want to load the second 16 bits of the register then you want D0[1].
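To make the lane point concrete, here is a minimal sketch in plain C that models a D register as four 16-bit lanes. It is only an illustration, not NEON code: the names dreg16_t, lanes_per_dreg and load_lane16 are made up for this example. On real hardware the intrinsic that plays this role is vld1_lane_u16.

```c
#include <stdint.h>

/* Hypothetical model of one 64-bit D register viewed as four 16-bit lanes.
   Plain C for illustration only; this is not NEON assembly or intrinsics. */
typedef struct {
    uint16_t lane[4];
} dreg16_t;

/* Number of lanes in a 64-bit D register for a given element size in bits
   (8 -> 8 lanes, 16 -> 4 lanes, 32 -> 2 lanes). */
static unsigned lanes_per_dreg(unsigned element_bits)
{
    return 64u / element_bits;
}

/* Models a single-lane 16-bit load: addr selects the source in memory,
   lane_index selects the destination lane. All other lanes are untouched. */
static void load_lane16(dreg16_t *d, unsigned lane_index, const uint16_t *addr)
{
    d->lane[lane_index] = *addr;
}
```

With this model, calling load_lane16 twice with lane index 0 and two different addresses leaves only the second value in lane 0, which is the overwrite described above. Passing lane index 1 on the second call keeps both values, one per lane.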

As I mentioned in one of your other posts, if you end up doing a lot of scalar loads you are really defeating the point of using a vector engine, so if you can, restructure your algorithm so that you do not need to do this.
