I was going through the ARMv8 Architecture Reference Manual and learned that it drops many instructions that the ARMv7 architecture supported. For example, ARMv8 does not support conditional execution of most instructions and instead has a separate instruction, CSEL, for implementing the same thing. On further reading I learned that conditional execution was removed to free up instruction encoding space and because branch predictors have improved. However, as I came across more instructions, I found that ARM has also removed the RSB instruction; instead you need to use a combination of SUB and NEG to achieve the same result. My question is whether increasing the number of instructions increases the number of cycles required to perform the same operation. Similarly, you can't push multiple registers onto the stack with one instruction; you can only store a pair of registers at a time. There are also various NEON instructions in ARMv7 that have no equivalent instruction in ARMv8, so doesn't that affect the performance of the program?
The performance metric relevant to your question is the total execution time of a particular job. Total execution time (T) is given by:
T = (total number of clock cycles) x (clock period)
T = (total number of clock cycles) ÷ (clock frequency)
T = Σ_n ((cycles for instruction n) x (clock period))
T = Σ_n ((cycles for instruction n) ÷ (clock frequency))
where the sum runs over every instruction n that is executed.
If all the instructions executed take the same number of clock cycles, this simplifies to:
T = (total number of instructions) x (cycles per instruction) x (clock period)
T = (total number of instructions) x (cycles per instruction) ÷ (clock frequency)
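The formulas above can be sketched in a few lines of Python. This is only an illustration: the function names, the four-instruction trace, and the 1 GHz clock are made-up assumptions, not measurements of any real ARM core.

```python
def execution_time(cycles_per_instruction, clock_hz):
    """General form: T = (sum of cycles over all executed instructions) / f."""
    return sum(cycles_per_instruction) / clock_hz


def execution_time_uniform(instruction_count, cpi, clock_hz):
    """Uniform form: T = (instruction count) x CPI / f."""
    return instruction_count * cpi / clock_hz


# Hypothetical trace: cycle cost of each executed instruction, in order.
trace = [1, 1, 2, 4]
clock_hz = 1e9  # assumed 1 GHz clock

t_general = execution_time(trace, clock_hz)
# With a uniform CPI of 2, four instructions cost the same 8 cycles in total.
t_uniform = execution_time_uniform(4, 2, clock_hz)
print(t_general, t_uniform)
```

Both calls account for 8 cycles at the assumed clock, so they give the same T; the general form is simply the one to use when instructions differ in cost.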
One approach is to minimize T by minimizing the number of instructions needed for the overall job. To achieve this, complex instructions that each perform more operations are employed. Complex instructions typically require more cycles per instruction (CPI) and have the side effect of lowering the maximum clock frequency; both factors work against the objective of minimizing T. The other approach is to minimize T by decreasing the CPI and increasing the clock frequency. So even when more instructions are needed to accomplish a job, performance in terms of T can improve if the CPI is low and a higher clock frequency is attainable.
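To make the trade-off concrete, here is a rough sketch comparing the two approaches. Every number is invented for illustration (instruction counts, CPI values, and clock frequencies are assumptions, not figures for ARMv7 or ARMv8), and the names `Design` and `execution_time` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Design:
    instruction_count: int  # instructions needed for the job
    cpi: float              # average cycles per instruction
    clock_hz: float         # clock frequency in Hz

    def execution_time(self):
        # T = (instruction count) x CPI / (clock frequency)
        return self.instruction_count * self.cpi / self.clock_hz


# Assumed numbers: fewer, more complex instructions on a slower clock...
complex_isa = Design(instruction_count=100, cpi=3.0, clock_hz=1.0e9)
# ...versus 30% more, simpler instructions with lower CPI and a faster clock.
simple_isa = Design(instruction_count=130, cpi=1.5, clock_hz=1.5e9)

t_complex = complex_isa.execution_time()
t_simple = simple_isa.execution_time()
print(f"complex: {t_complex * 1e9:.0f} ns, simple: {t_simple * 1e9:.0f} ns")
```

Under these assumed numbers the design that executes more instructions still finishes sooner, because the drop in CPI and the higher clock outweigh the larger instruction count. With different numbers the comparison could go the other way, which is why T, not instruction count alone, is the metric to look at.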