Question about Arm performance related to register allocation

I'm currently testing code on an embedded board equipped with an Arm Cortex-A72 CPU, which implements the Armv8-A architecture. The following code is used to measure the performance.

To test the above code, I divided it into three versions as shown below and measured their performance.

The first version (kernel_func_0) takes around 8 ms, the second (kernel_func_1) also takes around 8 ms, and the third (kernel_func_2) takes around 11 ms.
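For readers who want to reproduce this kind of measurement, here is a minimal sketch of a timing harness. It is not the code from this post: `dummy_kernel`, `kernel_fn`, and `time_kernel_ms` are hypothetical names, the kernel body is a placeholder, and it assumes a POSIX system where `clock_gettime` with `CLOCK_MONOTONIC` is available. It takes the minimum over several runs to reduce noise from cache warm-up and scheduling.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stddef.h>
#include <time.h>

/* Hypothetical placeholder kernel; the real kernels from the post are not shown. */
static void dummy_kernel(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * 2.0f + 1.0f;
}

/* Assumed common signature so different kernel versions can be swapped in. */
typedef void (*kernel_fn)(float *dst, const float *src, size_t n);

/* Runs fn `runs` times and returns the smallest wall-clock time in milliseconds. */
static double time_kernel_ms(kernel_fn fn, float *dst, const float *src,
                             size_t n, int runs)
{
    double best = -1.0;
    for (int r = 0; r < runs; ++r) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fn(dst, src, n);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        /* Convert the elapsed seconds and nanoseconds to milliseconds. */
        double ms = (double)(t1.tv_sec - t0.tv_sec) * 1e3
                  + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
        if (best < 0.0 || ms < best)
            best = ms;
    }
    return best;
}
```

Calling `time_kernel_ms(dummy_kernel, dst, src, n, 10)` once per kernel version, on the same buffers, gives numbers that can be compared directly. Pinning the process to one core and fixing the CPU frequency, where the board allows it, may further reduce run-to-run variation.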

To identify the reason for the performance difference between the second and third versions, I compiled both to assembly code.

Most of the run time in both assembly listings is spent in the code under the ".L2:" label. However, I believe there should be no performance difference between the two, because if I modify the second assembly code (KERNEL_FUNC_1_MOD) as follows, it appears to execute the same operations as the third assembly code (KERNEL_FUNC_2).

When I run the two versions, KERNEL_FUNC_1_MOD takes 8 ms and KERNEL_FUNC_2 takes 11 ms. I find this result hard to understand: the only difference appears to be whether the address used for the memory load is written to the same register or not, yet the run time differs by about 3 ms.
