I was going through the ARMv8 Architecture Reference Manual and learned that the A64 instruction set drops many instructions that the ARMv7 architecture supported. For example, ARMv8 does not support conditional execution of most instructions and has a separate instruction, CSEL, for implementing the same thing. On further reading I learned that conditional execution was removed to relieve pressure on the instruction encoding and because branch predictors have improved. However, as I came across more instructions, I found that ARM has also removed the RSB instruction; instead you need to use SUB and NEG instructions to achieve the same result. My question is: doesn't increasing the number of instructions also increase the number of cycles required to perform the same operation? Similarly, you can't push multiple registers onto the stack with one instruction, only a pair of registers at a time. There are also various NEON instructions in ARMv7 that have no equivalent instruction in ARMv8, so doesn't that affect the performance of the program?
Hi Natesh,
ARMv8-A is focused on high performance and high throughput. The features you are concerned about, such as conditional execution and LDM/STM, get in the way of high-performance implementations of the ARM architecture that use superscalar out-of-order execution, so they were removed at the architecture level. That is part of why the Cortex-A72 performs so much better than the Cortex-A15.
Does that mean that even though we use more instructions to achieve the same functionality, the performance is still better than on ARMv7? I find that a bit difficult to grasp. Can you please elaborate on this part?
Have you studied computer architecture? Reading up on out-of-order execution will help you here.
Take conditional execution as an example: it limits the instruction issue rate and increases hardware cost, while software tests show that it does not deliver good code density in return.
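To make the conditional execution point a bit more concrete, here is a small Python model of the instruction semantics. It is only a sketch: the function names and the temporary register are made up for illustration, and the explanation that the old destination value becomes a hidden extra input (awkward for register renaming) is a commonly cited reason that I am assuming here, not something the manual text quoted above spells out.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers


def addeq(z_flag, rd_old, rn, rm):
    """Model of A32 'ADDEQ rd, rn, rm'.

    rd is written only when the Z flag is set. When the condition fails,
    rd must keep its old value, so rd_old is an implicit input.
    """
    return (rn + rm) & MASK32 if z_flag else rd_old


def add(rn, rm):
    """Model of an unconditional 'ADD rd, rn, rm'."""
    return (rn + rm) & MASK32


def csel(cond, rn, rm):
    """Model of A64 'CSEL rd, rn, rm, cond': both inputs are explicit."""
    return rn if cond else rm


def a64_equivalent(z_flag, rd_old, rn, rm):
    """Two-instruction A64 replacement (hypothetical register names)."""
    tmp = add(rn, rm)                 # ADD  w9, wn, wm   (always executes)
    return csel(z_flag, tmp, rd_old)  # CSEL wd, w9, wd, EQ
```

Both forms compute the same result. The A64 version costs one more instruction, but every operand is named in the encoding, so the core does not need special handling for a destination that may or may not be written.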
LDM/STM: the hardware needs to split an LDM/STM into many uops and send them to the functional units. At the write-back stage, the hardware needs to merge them again. Hardware complexity increases, yet memory access throughput is not balanced. A64 uses the load/store pair instructions (LDP/STP) to replace them.
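A toy Python model of the splitting described above may help. It is a sketch under stated assumptions: one micro-op per loaded register is a simplification (real cores differ), and the tuple layout and function names are invented for illustration.

```python
WORD = 4    # bytes per A32 register
XWORD = 8   # bytes per A64 X register


def crack_ldm(reg_list):
    """Split 'LDMIA rn, {reg_list}' into load micro-ops.

    Assumption of this toy model: one micro-op per register.
    LDM loads registers in ascending register-number order.
    """
    return [("load", reg, i * WORD) for i, reg in enumerate(sorted(reg_list))]


def restore_with_ldp(regs):
    """Restore an even number of registers with a sequence of A64 LDPs.

    Each LDP always transfers exactly two registers, so every
    instruction has the same fixed shape.
    """
    return [
        ("load_pair", (regs[i], regs[i + 1]), i * XWORD)
        for i in range(0, len(regs), 2)
    ]


# One LDM expands to a variable number of micro-ops (1 to 16 on A32).
print(crack_ldm([4, 5, 6, 7]))
# Four registers on A64 need two LDP instructions of fixed size.
print(restore_with_ldp(["x19", "x20", "x21", "x22"]))
```

The point is that a single LDM can turn into anywhere from 1 to 16 pieces depending on its register list, while each LDP is one fixed-size operation that the pipeline can schedule like any other load.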
Regarding RSB, I don't know much about it. I agree with you that it is a useful instruction. Maybe it is just not very useful for C/C++ compilers.
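For what it is worth, here is a small Python model of what RSB computes and how SUB and NEG cover it. This is only a sketch of the arithmetic on 32-bit values; the function names mirror the mnemonics but are otherwise my own, and it says nothing about cycle counts on any particular core.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit (W) registers with wraparound


def rsb(rn, op2):
    """A32 'RSB rd, rn, op2': rd = op2 - rn."""
    return (op2 - rn) & MASK32


def sub(rn, rm):
    """'SUB rd, rn, rm': rd = rn - rm."""
    return (rn - rm) & MASK32


def add(rn, rm):
    """'ADD rd, rn, rm': rd = rn + rm."""
    return (rn + rm) & MASK32


def neg(rm):
    """A64 'NEG rd, rm': rd = 0 - rm (an alias of SUB from the zero register)."""
    return sub(0, rm)


# Register form: RSB is just SUB with the operands swapped, one instruction.
assert rsb(5, 3) == sub(3, 5)

# Immediate form (imm - rn): NEG followed by ADD of the immediate.
assert rsb(7, 10) == add(neg(7), 10)
```

So the extra instruction is only needed when the left-hand operand is an immediate; with two registers a plain SUB with swapped operands does the same job.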
Any comments are welcome.