
Does removing some instructions in ARMv8 affect performance?

I was going through the ARMv8 Architecture Reference Manual and found that it drops many instructions that ARMv7 supported. For example, ARMv8 does not support conditional execution of most instructions and instead provides a separate instruction, CSEL, for the same purpose. On further reading, I learned that conditional execution was removed to free up instruction encoding space and because branch predictors have improved. However, as I looked at more instructions, I found that ARM has also removed the RSB instruction; you now need a combination of SUB and NEG instructions to achieve the same result. My question is: doesn't increasing the number of instructions also increase the number of cycles needed to perform that operation? Similarly, you can no longer load or store multiple registers to or from the stack in one instruction, only a pair of registers at a time. There are also various NEON instructions in ARMv7 that have no equivalent in ARMv8, so doesn't that affect the performance of the program?

  • Like most things in engineering, there isn't a simple answer; it's all about finding a balance point between a simpler instruction set (and therefore smaller and faster hardware) and a more complex instruction set (which can be faster *if* you actually use the additional instructions/options a lot, but otherwise you're just paying a price for having them there even if your program doesn't use them). ARMv8 is definitely simpler, and uses the freed-up encoding space for other things which are generally very useful and will help gain performance (like doubling the register file size to 32 registers, rather than 16) or which will make CPU designs smaller / faster / more energy efficient.

    Also remember that just because some operations were "one instruction" in ARMv7 doesn't necessarily mean that they were "one cycle" on ARMv7. For example, LDM and STM usually transferred 64 bits per clock, so performance-wise they are identical to ARMv8 running a sequence of multiple LDP or STP (load/store pair) instructions. Removing them doesn't really cost the program anything on a modern processor, but it makes the hardware simpler (always good), and frees up space for things which are actually useful (like the larger register file).
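
    As a rough illustration of the "one instruction is not one cycle" point, here is a toy cycle model in C. It is only a sketch: the 64-bits-per-clock figure comes from the paragraph above, while the function names and the assumption that one two-register load/store can issue per clock are hypothetical simplifications, not figures for any real core.

```c
#include <assert.h>

/* Toy model, not a real pipeline: cycles for one LDM/STM that moves
 * n_regs 32-bit registers over a 64-bit-per-clock data path. */
unsigned ldm_cycles(unsigned n_regs) {
    assert(n_regs <= 16);                 /* ARMv7 has 16 core registers */
    unsigned bits = n_regs * 32u;
    return (bits + 63u) / 64u;            /* round up to whole clocks */
}

/* Cycles for the same transfer done as a sequence of two-register
 * loads/stores, ASSUMING one such instruction completes per clock. */
unsigned pair_sequence_cycles(unsigned n_regs) {
    return (n_regs + 1u) / 2u;            /* ceil(n_regs / 2) instructions */
}
```

    Under these assumptions both functions give the same count for any register list, so the single-instruction form saves code size but not transfer time.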

    It's probably possible to find specific things that are slightly slower (losing any specific instruction will do that), but it's important to step back and look at the performance of the whole program. It's unlikely you actually do RSB in a tight loop and nothing else, so what you lose in some places you'll more than gain elsewhere with the additional things ARMv8 can now do.
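
    To see where the removed forms actually show up, here is a small C sketch of the operations mentioned in the question. The function names are made up for illustration, and the instruction sequences in the comments describe what an optimizing compiler will typically emit; the exact output depends on the compiler and flags.

```c
#include <stdint.h>

/* Reverse subtract with a constant. ARMv7 can encode this as a single
 * RSB; an AArch64 compiler typically materializes the constant in a
 * register first and then uses an ordinary SUB. */
int32_t reverse_sub_const(int32_t x) {
    return 10 - x;
}

/* Reverse subtract with two register operands. No special instruction
 * is needed on either architecture: SUB with the operands swapped
 * does the job. */
int32_t reverse_sub_reg(int32_t x, int32_t c) {
    return c - x;
}

/* Conditional select. ARMv7 code would often use a conditionally
 * executed MOV here; AArch64 code typically uses CMP followed by CSEL. */
int32_t select_min(int32_t a, int32_t b) {
    return (a < b) ? a : b;
}
```

    Only the constant case is likely to cost an extra instruction, and that constant load can often be hoisted out of a loop.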

    Cheers,
    Pete
