
VFP vector example

Note: This was originally posted on 5th January 2011 at http://forums.arm.com

Hi.

The Cortex documentation talks about register banks for vector usage.
That's great, but I don't really understand what a vector instruction (using those banks) is.

Can anybody give me an example using the register banks and VFP vector instructions?

Thanks
  • Note: This was originally posted on 19th January 2011 at http://forums.arm.com


    I agree with you.
    But this is the only use I found for the register banks.

    NEON does not use the register banks.
    Most NEON instructions can use all NEON registers without any restriction.




    With the VFP architecture, the VFP registers were divided into 4 banks. So you had:

    Bank #0: S0-S7 and D0-D3
    Bank #1: S8-S15 and D4-D7
    Bank #2: S16-S23 and D8-D11
    Bank #3: S24-S31 and D12-D15

    When you set the VFP vector arity (the LEN field of the FPSCR) to greater than 1, Banks #1 through #3 were used for vector operations, while Bank #0 was reserved for scalar operations: an instruction whose destination register is in Bank #0 still executes as a scalar. That way, even if you had set the vector arity to, say, 4 and were performing operations on 32-bit floating-point 4-vectors, you could still use registers S0-S7 for scalar 32-bit floating-point operations without having to switch the VFP unit's vector arity back to 1.
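    To make the bank rule concrete, here is a small Python model (not ARM code) that maps a register name to its bank and classifies an instruction as scalar or vector. The helper names `bank_of` and `operation_kind` are hypothetical. It also includes the "mixed" case as I understand the Architecture Reference Manual's rules: if the destination is outside Bank #0 but the second source operand is in Bank #0, that operand is reused as a scalar for every element. Treat that case as an assumption worth checking against the manual.

```python
# Hypothetical model of VFP short-vector register banks (a simulation, not ARM code).
# Single precision: S0-S31, 8 registers per bank.
# Double precision: D0-D15, 4 registers per bank.


def bank_of(reg: str) -> int:
    """Return the bank number (0-3) of a register name such as 's14' or 'd5'."""
    kind, num = reg[0].lower(), int(reg[1:])
    if kind == "s" and 0 <= num <= 31:
        return num // 8
    if kind == "d" and 0 <= num <= 15:
        return num // 4
    raise ValueError(f"unsupported register: {reg}")


def operation_kind(dest: str, src2: str, length: int) -> str:
    """Classify a VFP data-processing instruction under a given vector length.

    dest  -- destination register (e.g. 's16')
    src2  -- second source operand (e.g. 's20')
    length -- vector arity (1 means plain scalar mode)
    """
    if length == 1 or bank_of(dest) == 0:
        return "scalar"  # destination in Bank #0: always a scalar operation
    if bank_of(src2) == 0:
        return "mixed"   # assumed rule: Bank #0 operand applied to every element
    return "vector"      # both outside Bank #0: full vector operation


print(operation_kind("s16", "s20", 4))  # vector
print(operation_kind("s0", "s20", 4))   # scalar
```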

    Of course, now that VFP vector mode is deprecated on ARMv7-based processors such as the Cortex-A series, NEON is the way to go. VFP instructions executed with the arity set above 1 on ARMv7/NEON processors will perform much slower, so you should avoid them on those platforms and use the NEON pipeline instead.

    Edit: I had forgotten to mention another interesting property of register addressing in vector operations: the registers making up a vector wrap around at register bank boundaries. So if you issued the following instruction with the vector arity set to 4:

    fadds s16, s14, s20

    The first source vector, starting at register s14, would wrap around within Bank #1 so that it would be {s14, s15, s8, s9}. The instruction would be equivalent to:

    s16 = s14 + s20
    s17 = s15 + s21
    s18 = s8 + s22
    s19 = s9 + s23

    You could exploit this trick to shuffle vector components without additional instructions.
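    The wrap-around addressing can be sketched as a short Python simulation that expands a single-precision `fadds sd, sn, sm` into its per-element scalar adds. The function names `vector_regs` and `expand_fadds` are hypothetical, and the model assumes the vector-only case (no operand in Bank #0) with a stride of 1 by default.

```python
# Hypothetical simulation of VFP short-vector register wrap-around (not ARM code).


def vector_regs(start: int, length: int, stride: int = 1, bank_size: int = 8) -> list:
    """Register numbers of a short vector starting at `start`, wrapping inside its bank."""
    base = (start // bank_size) * bank_size  # first register of the bank
    offset = start - base                    # position of `start` inside the bank
    return [base + (offset + i * stride) % bank_size for i in range(length)]


def expand_fadds(d: int, n: int, m: int, length: int = 4) -> list:
    """Expand 'fadds s<d>, s<n>, s<m>' into scalar adds (vector-only case assumed)."""
    dest = vector_regs(d, length)
    src1 = vector_regs(n, length)
    src2 = vector_regs(m, length)
    return [f"s{a} = s{b} + s{c}" for a, b, c in zip(dest, src1, src2)]


# The example from the post: fadds s16, s14, s20 with the arity set to 4.
for line in expand_fadds(16, 14, 20):
    print(line)
```

    Running it prints the same four lines as the expansion in the post. Picking a different start register inside a bank rotates which components line up with each other, which is the shuffling trick described above.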