Why does function execution time differ so much depending on the flash address on Cortex-M7?

We found that a function's execution time depends on its flash address when it repeatedly loads data (LDRB) from flash into a core register (R0). The loop runs 60 times. We also used the core PMU to count instructions, and the counts differ between addresses.

I also tested running the LDRB instruction only once: in that case its execution time does not depend on the flash address. The difference appears only when the LDRB instruction executes in a loop.

We found a strong pattern: performance is very good when we manually align to 32 bytes. How can we ensure optimal performance automatically when using the LDRB instruction?
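For illustration, here is a minimal sketch of one way to request 32-byte alignment from the toolchain instead of placing code by hand. It assumes that the alignment in question is that of the code (not the data), and that the compiler is GCC or Clang, whose `aligned` function attribute is used here. The names `sum_bytes`, `entry_is_aligned` and `HOT_LOOP_ALIGN` are hypothetical and do not come from the original test code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: the 32-byte figure is taken from the observation above. */
#define HOT_LOOP_ALIGN 32u

static_assert((HOT_LOOP_ALIGN & (HOT_LOOP_ALIGN - 1u)) == 0u,
              "alignment must be a power of two");

/* GCC/Clang extension: ask for the function entry point to be placed on a
 * 32-byte boundary. This aligns the start of the function, not necessarily
 * the loop body inside it. */
__attribute__((aligned(HOT_LOOP_ALIGN), noinline))
uint32_t sum_bytes(const volatile uint8_t *src, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; ++i) {
        sum += src[i]; /* byte load; typically compiled to LDRB on Arm */
    }
    return sum;
}

/* Returns 1 if the function entry is on a HOT_LOOP_ALIGN boundary.
 * Bit 0 is masked because a Thumb function pointer has it set. */
int entry_is_aligned(uint32_t (*fn)(const volatile uint8_t *, size_t))
{
    uintptr_t addr = (uintptr_t)fn & ~(uintptr_t)1;
    return (addr % HOT_LOOP_ALIGN) == 0u;
}
```

GCC also provides the `-falign-functions=32` and `-falign-loops=32` options, which apply the same idea to every function or loop without annotating each one. Whether they take effect depends on the target and optimization level, so the resulting addresses should be checked in the map file or disassembly.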

0 Gofor_core over 2 years ago in reply to Demiholden

Thanks. I only want to know the root cause. There is a lot of code in this style, so it is not reasonable to unroll every loop.

0 Ronan Synnott over 2 years ago in reply to Gofor_core

Which device are you using to benchmark this?
Could it be that the memory system is inefficient for non-word-aligned accesses?