I am running Thumb-2 code on a Cortex-M7 processor, the STM32F750N8.
I am seeing a non-negligible performance variation depending on whether I insert a single NOP right before a tight loop (which shifts the address of every instruction in the loop by one halfword). No other part of the code is touched.
I am not sure why this is happening. Do Thumb-2 instructions (especially branches) run faster or slower depending on whether they are halfword- or word-aligned? Or can there be some other explanation?
Below is my very simple code, which simply loops around and does nothing:
0x80001d6: bf00 nop
0x80001d8: 3c01 subs r4, #1
0x80001da: d1fc bne.n 80001d6
When I run this very simple three-instruction loop 10,000,000 times, it takes about 100 ms. However, when I add a NOP before the loop (so that the addresses shift by 2 bytes to 0x80001d8, 0x80001da, 0x80001dc), the execution time drops significantly, to about 75 ms.
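To make the comparison repeatable, the loop's alignment can be pinned explicitly rather than relying on wherever the linker happens to place it. A minimal sketch, assuming GCC or Clang with -mcpu=cortex-m7 -mthumb; spin() is an illustrative name, not code from the post:

#include <stdint.h>

/* Same 3-instruction body as the listing above: nop / subs / bne.         */
/* .p2align 3 places the loop entry on an 8-byte (64-bit) boundary;        */
/* uncommenting the extra nop shifts the entry by one halfword instead.    */
static void __attribute__((noinline)) spin(uint32_t n)
{
    __asm volatile(
        ".p2align 3         \n"
        /* "nop             \n"    <- uncomment to de-align the loop entry */
        "1:  nop            \n"
        "    subs %0, #1    \n"
        "    bne  1b        \n"
        : "+r"(n)
        :
        : "cc");
}

Calling spin(10000000) with and without the commented-out nop should reproduce both timings if the alignment of the loop entry really is the cause.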
I have tried disabling the I-cache and D-cache, and I turned off the flash prefetcher and ST's flash accelerator, but a similar phenomenon was still there (see the register-level sketch after the list below for how these are typically toggled). Is there any possible explanation for this? What I thought of was:
1. Are halfword-aligned instructions, or a halfword-aligned branch, slower?
2. Can this somehow be related to dual-issue?
3. Can this be because I am crossing some sort of page/bank boundary?
4. Can this be vendor-specific, or is it something about the ARM architecture?
I have searched a lot, but have not found any relevant info.
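(For reference, the cache and accelerator toggles mentioned above typically look like the following. This is a hedged sketch assuming CMSIS core_cm7.h and the ST stm32f7xx device header, not necessarily the exact code used here.)

#include "stm32f7xx.h"

/* Turn off everything between the core and the flash that could cache or  */
/* prefetch instructions, so only the core's own fetch behaviour remains.  */
static void disable_caches_and_accelerators(void)
{
    SCB_DisableICache();                     /* CMSIS: core I-cache off     */
    SCB_DisableDCache();                     /* CMSIS: core D-cache off     */
    FLASH->ACR &= ~(FLASH_ACR_PRFTEN         /* flash prefetch off          */
                  | FLASH_ACR_ARTEN);        /* ST's ART accelerator off    */
    __DSB();                                 /* ensure the changes take     */
    __ISB();                                 /* effect before fetching on   */
}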
Any help will be appreciated.
Thank you,
When the loop starts at ..1d8 it is on a 64-bit boundary, so the whole 6-byte loop fits in a single 64-bit fetch and can be prefetched at once. When it starts at ..1d6, the loop straddles two 64-bit fetch lines, so each iteration likely pays for an extra fetch.
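To make the arithmetic concrete, using the addresses from the question and assuming 8-byte (64-bit) fetch lines:

loop entry 0x80001d6 (slow case):
    fetch line 0x80001d0..0x80001d7 : ...               nop  @ 0x80001d6
    fetch line 0x80001d8..0x80001df : subs @ 0x80001d8   bne @ 0x80001da

loop entry 0x80001d8 (fast case):
    fetch line 0x80001d8..0x80001df : nop @ 0x80001d8   subs @ 0x80001da   bne @ 0x80001dc

In the slow case every iteration spans two fetch lines; in the fast case the whole loop sits in one.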
Thank you for your answer! The performance variability also occurs when the I-cache is turned on. If the I-cache is on, isn't the prefetcher irrelevant? I thought the main benefit of a prefetcher is to bring code from, e.g., flash into the I-cache. Or could a similar effect exist in the instruction fetch unit of the execution pipeline (rather than the prefetcher)?
No. Please read the above quote again.
I read it again, and I read the relevant parts of the Cortex-M7 documentation, but I cannot tell which part of my question you are saying "No" to. My problem still occurs when the I-cache is enabled. With the I-cache on, wouldn't the three instructions sit inside the I-cache, making the prefetcher behavior irrelevant?
No, because the cache sits between the system (RAM/flash) and the prefetcher, so the prefetcher is still in the fetch path even when the code hits in the I-cache; though I would not expect a very large impact from it. The "flash accelerator" is a special kind of cache that is optimized for the specific flash and is ST's IP. The prefetcher is part of the core.
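One way to take the clock configuration and wall-clock measurement out of the discussion is to count core cycles per iteration directly with the DWT cycle counter. A sketch, assuming the CMSIS register definitions and the hypothetical spin() loop from the earlier sketch; on some Cortex-M7 parts the DWT lock access register may also need to be written with the unlock key before the counter can be enabled:

#include "stm32f7xx.h"

/* Returns the number of core cycles spent in spin(n). */
static uint32_t cycles_for(uint32_t n)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  /* enable the DWT block */
    DWT->CYCCNT = 0;                                 /* reset cycle counter  */
    DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;            /* start cycle counter  */

    spin(n);                                         /* loop under test      */

    return DWT->CYCCNT;                              /* cycles elapsed       */
}

Dividing the result by n gives cycles per iteration for each alignment, which is easier to reason about than milliseconds.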