Why does function execution time differ so much between flash addresses on Cortex-M7?

Gofor_core over 1 year ago

We found that a function's execution time depends on the flash address when it loads data from flash into a core register (R0) with LDRB in a loop. The loop runs 60 iterations. We also used the core PMU to count the executed instructions, and the counts differ as well.

I also tested the case where the LDRB instruction runs only once: its timing is not affected by the flash address. The effect appears only when LDRB executes in a loop.

We found a strong pattern: performance is very good with manual 32-byte alignment. How can we automatically ensure optimal performance when using the LDRB instruction?
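For readers who want to reproduce the 32-byte experiment, here is a minimal sketch. It assumes a GCC- or Clang-compatible toolchain and assumes that the alignment in question is that of the function's start address, which the post does not state. The function name `sum_bytes_aligned` and the helper `code_offset_in_32` are hypothetical and only stand in for the measured code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOOP_COUNT 60u /* iteration count quoted in the post */

/* The address check below stores a function pointer in a uintptr_t. */
static_assert(sizeof(uintptr_t) >= sizeof(void (*)(void)),
              "uintptr_t must be able to hold a function address");

/* Hypothetical stand-in for the measured function: one byte load per
 * iteration. aligned(32) asks GCC/Clang to start the function on a
 * 32-byte boundary; noinline keeps it a separately placed function. */
__attribute__((aligned(32), noinline))
uint32_t sum_bytes_aligned(const volatile uint8_t *src)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < LOOP_COUNT; ++i) {
        acc += src[i]; /* volatile forces one byte load per iteration */
    }
    return acc;
}

/* Offset of the function inside a 32-byte block; 0 means aligned.
 * Bit 0 is cleared first because Thumb function pointers have it set. */
uint32_t code_offset_in_32(void)
{
    uintptr_t addr = (uintptr_t)&sum_bytes_aligned & ~(uintptr_t)1;
    return (uint32_t)(addr & 31u);
}
```

GCC also offers the options `-falign-functions=32` and `-falign-loops=32`, which request the same alignment for every function or loop without per-function attributes. Whether either approach actually removes the timing variation depends on the device's flash interface and cache configuration, which this thread does not confirm.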

Demiholden over 1 year ago

You can consider loop unrolling to reduce loop overhead. This technique can improve performance by reducing the number of loop iterations and, consequently, the number of loop-control (compare and branch) instructions executed.
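As an illustration of the suggestion, the sketch below unrolls a 60-iteration byte-load loop by a factor of four. The function names and the unroll factor are assumptions made for the example; the original code was not posted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOOP_COUNT 60u /* iteration count quoted in the thread */
#define UNROLL 4u      /* hypothetical unroll factor */

static_assert(LOOP_COUNT % UNROLL == 0u,
              "unroll factor must divide the loop count");

/* Baseline: 60 iterations, one byte load and one branch each. */
uint32_t sum_bytes_rolled(const volatile uint8_t *src)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < LOOP_COUNT; ++i) {
        acc += src[i];
    }
    return acc;
}

/* Unrolled by four: 15 iterations, four byte loads and one branch each. */
uint32_t sum_bytes_unrolled4(const volatile uint8_t *src)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < LOOP_COUNT; i += UNROLL) {
        acc += src[i];
        acc += src[i + 1u];
        acc += src[i + 2u];
        acc += src[i + 3u];
    }
    return acc;
}
```

Both versions perform the same 60 byte loads. The unrolled one executes a quarter of the loop branches, at the cost of a larger loop body in flash.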

Gofor_core over 1 year ago in reply to Demiholden

Thanks, but I only want to know the root cause. There is a lot of code written in this style, so it is not reasonable to unroll every loop.

Ronan Synnott over 1 year ago in reply to Gofor_core

Which device are you using to benchmark this?
Could it be that the memory system is inefficient for accesses that are not word-aligned?
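One way to probe this hypothesis is to log how the addresses involved sit relative to word and line boundaries before comparing timings. The helpers below are a hypothetical sketch: the 32-byte line size is an assumption taken from the alignment reported in the question, and none of the names come from a vendor library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WORD_SIZE 4u
#define LINE_SIZE 32u /* assumption: matches the 32-byte alignment in the question */

static_assert((LINE_SIZE & (LINE_SIZE - 1u)) == 0u,
              "LINE_SIZE must be a power of two");

/* True when addr is a multiple of the 4-byte word size. */
bool is_word_aligned(uint32_t addr)
{
    return (addr % WORD_SIZE) == 0u;
}

/* Byte offset of addr inside its LINE_SIZE-byte line. */
uint32_t offset_in_line(uint32_t addr)
{
    return addr & (LINE_SIZE - 1u);
}

/* Number of LINE_SIZE-byte lines covered by [addr, addr + len); len must be > 0. */
uint32_t lines_touched(uint32_t addr, uint32_t len)
{
    uint32_t first = addr / LINE_SIZE;
    uint32_t last = (addr + len - 1u) / LINE_SIZE;
    return last - first + 1u;
}
```

With these definitions, a 60-byte span that starts on a 32-byte boundary covers two lines, while the same span starting 16 bytes into a line covers three. Whether that difference accounts for the measured timing is not established in this thread.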