Why does function execution time differ so much depending on the flash address on Cortex-M7?

We found that a function's execution time depends on its flash address when it loads data from flash into a core register (R0) with LDRB in a loop. The loop runs 60 times. We also used the core PMU to count executed instructions and found that the counts differ as well.

I also tested the case where the LDRB instruction runs only once: its execution time is then not affected by the flash address. The effect appears only when the LDRB instruction is executed in a loop.

We found a strong pattern: performance is very good when the address is manually aligned to 32 bytes. How can we automatically ensure optimal performance when using the LDRB instruction?
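
For reference, here is a minimal sketch of what "manually aligned to 32 bytes" could look like in C. It assumes a GCC or Clang toolchain and that the alignment in question is that of the function entry. The names `sum_bytes`, `table`, `is_aligned_32` and `self_check` are made up for illustration and are not from our actual test code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOOP_COUNT 60u  /* same iteration count as in the test described above */

/* Hypothetical lookup data; on the target this would be placed in flash. */
static const uint8_t table[LOOP_COUNT] = {1, 2, 3};

/* Ask the compiler to place the first instruction of the function on a
 * 32-byte boundary. noinline keeps the loop inside this function so the
 * requested alignment is not lost through inlining. */
__attribute__((aligned(32), noinline))
uint32_t sum_bytes(const volatile uint8_t *src, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += src[i];  /* byte load; typically an LDRB on Cortex-M */
    }
    return acc;
}

/* Returns 1 if addr is 32-byte aligned. Bit 0 is masked off because
 * Thumb function pointers carry the Thumb state bit there. */
int is_aligned_32(uintptr_t addr)
{
    return ((addr & ~(uintptr_t)1) % 32u) == 0;
}

/* Run-time check that the attribute was honoured by the toolchain. */
void self_check(void)
{
    assert(is_aligned_32((uintptr_t)&sum_bytes));
    assert(sum_bytes(table, LOOP_COUNT) == 6u);
}
```

Note that the attribute only aligns the function entry, not the loop body inside it. GCC also has the `-falign-functions=32` and `-falign-loops=32` options, which request the same alignment for all functions or loops without per-function annotations. Whether either approach is enough to reproduce the good case here is something I have not verified.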