Cortex-A53: The execution time of the same four assembly instructions varies depending on where they are located in memory (the memory is non-cacheable). Counting with the PMU shows that in the slower cases there is an additional instruction per cycle. If the instruction cache is enabled, the runtime is consistent across locations. Please help analyze the reason for this.
We used Lauterbach TRACE32 to read the PMU. The difference in initialization time between two 40 MB BSS segments is 1 second.
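
To put the 1-second figure in per-iteration terms, here is a minimal arithmetic sketch. It assumes that the four-instruction sequence is the BSS-clearing loop, that "40 MB" means 40 * 1024 * 1024 bytes, and that each iteration clears a fixed number of bytes. The function name and the `bytes_per_iter` parameter are hypothetical and do not come from the actual startup code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extra time per loop iteration, in nanoseconds, implied by a total
 * slowdown of delta_ns over a region of total_bytes.
 * bytes_per_iter is an assumption: how many bytes one pass of the
 * four-instruction loop clears (for example 8 for one 64-bit store).
 */
double extra_ns_per_iteration(uint64_t total_bytes,
                              uint64_t bytes_per_iter,
                              double delta_ns)
{
    assert(bytes_per_iter > 0 && total_bytes >= bytes_per_iter);
    uint64_t iterations = total_bytes / bytes_per_iter;
    return delta_ns / (double)iterations;
}
```

With a 1-second difference (1e9 ns) over 40 * 1024 * 1024 bytes, this gives roughly 190.7 ns of extra time per iteration if each iteration clears 8 bytes, and roughly 381.5 ns if it clears 16 bytes. Comparing that figure with the PMU cycle count per iteration at the known core clock may help confirm whether the slowdown is a fixed cost per loop pass.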