VFP vector example

Note: This was originally posted on 5th January 2011 at http://forums.arm.com

Hi.

The Cortex documentation talks about register banks for vector usage.
That's great, but I don't really understand what a vector instruction (one using those banks) is.

Can anybody give me an example using the register banks and a VFP vector instruction?

Thanks

Jesse Towner over 12 years ago

Note: This was originally posted on 19th January 2011 at http://forums.arm.com

I agree with you.
But this is the only use I have found for the register banks.

NEON does not use the register banks.
Most NEON instructions can use any NEON register without restriction.

With the VFP architecture, the VFP registers were divided into 4 banks. So you had:

Bank #0: S0-S7 and D0-D3
Bank #1: S8-S15 and D4-D7
Bank #2: S16-S23 and D8-D11
Bank #3: S24-S31 and D12-D15

When you set the VFP vector arity (the vector length, held in the LEN field of the FPSCR) to greater than 1, banks #1 through #3 were used for vector operations, while bank #0 was reserved for scalar operations. That way, even with the arity set to, say, 4 for operations on 4-vectors of 32-bit floats, you could still use registers S0-S7 for scalar 32-bit floating-point operations without having to switch the VFP unit's vector arity back to 1.

Of course, now that VFP short-vector mode is deprecated on ARMv7-based architectures such as Cortex, NEON is the way to go. VFP instructions executed with the arity set above 1 on ARMv7/NEON processors will typically run much slower, so you should avoid them on those platforms and use the NEON pipeline instead.

Edit: There was another interesting property of register addressing when used in vector operations that I had forgotten to mention. The subsequent registers comprising a vector would wrap around on the register bank boundaries. So if you issued the following instruction when the vector arity was set to 4:

fadds s16, s14, s20

The first source operand, starting at register s14, would wrap around within bank #1 so that it would be {s14, s15, s8, s9}. It would be the equivalent of:

s16 = s14 + s20
s17 = s15 + s21
s18 = s8 + s22
s19 = s9 + s23

You could exploit this trick to perform shuffling of vector components without additional instructions.