Hi everyone,
I am trying to understand why the cycle count does not increase linearly when I run the same instruction x times on the STM32F405RGT6.
While playing around with the STM32F4 to get a better understanding of how instructions map to cycle counts, I stumbled across this problem. According to the documentation, a NOP and an ADD should each take one cycle. Therefore I would expect x NOPs to take x cycles, and likewise for ADD. What I found is somewhat different.
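If every instruction really cost exactly one cycle, the measurements would follow cycles(x) = x + c, where c is the fixed overhead of the measurement itself. Below is a minimal Python sketch for checking measured counts against that model. It assumes the results have been collected into a dict that maps instruction count to measured cycles. The function names are hypothetical. `marginal_costs` shows how many extra cycles each additional instruction cost, and `linear_fit` estimates the slope and the fixed overhead.

```python
"""Hypothetical helpers for checking the 'x instructions -> x cycles' expectation."""


def marginal_costs(cycles_by_count):
    """Given {instruction_count: measured_cycles}, return the extra cycles that
    each additional instruction cost, keyed by the larger count.

    Only consecutive counts (n-1, n) are compared. Under the one-cycle model
    every value would be exactly 1; the fixed overhead cancels out."""
    counts = sorted(cycles_by_count)
    return {
        b: cycles_by_count[b] - cycles_by_count[a]
        for a, b in zip(counts, counts[1:])
        if b == a + 1
    }


def linear_fit(cycles_by_count):
    """Least-squares fit of cycles ~ slope * count + overhead.

    Needs at least two distinct counts. Returns (slope, overhead)."""
    xs = sorted(cycles_by_count)
    ys = [cycles_by_count[x] for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Any value other than 1 in the output of `marginal_costs` shows exactly which added instruction introduced extra cycles. A slope from `linear_fit` that differs from 1.0 summarizes the same deviation over the whole range.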
To produce these results, I wrote a quick script that generates 30 assembly functions. I read the cycle count from address 0xE0001004 (DWT_CYCCNT) on the STM32F405RGT6. The ADD operands are loaded into r3 and r4 from the input buffer passed in r0, so I can choose any values and check the result in r5. r6 receives the number of cycles the instructions took. I checked the final ELF file with objdump to verify that the toolchain had not removed, reordered, or altered any instructions.
```asm
# Assumed file header for the Cortex-M4 (Thumb-2, unified syntax)
    .syntax unified
    .thumb
    .text

# Replace XXXX by the number of instructions
    .global asm_add_XXXX
    .type asm_add_XXXX, %function
    .align 2
asm_add_XXXX:
    push {r3-r12}

# Read inputs into registers
    ldr r3, [r0, #0]
    ldr r4, [r0, #4]

# r5, r6 = output registers
    mov r5, #0
    mov r6, #0

# Load the address of the cycle counter (DWT_CYCCNT)
    ldr r9, =0xE0001004

# On ARM, .align 4 means a 2^4 = 16-byte boundary
    .align 4
# Save the current cycle count in r7
    ldr r7, [r9, #0]

######################################
###     Start of assembly code     ###
######################################
# Insert the add instruction x times
    add r5, r3, r4
######################################
###      End of assembly code      ###
######################################

# Save the current cycle count in r8
    ldr r8, [r9, #0]

# Calculate cycles: r6 = r8 - r7
    sub r6, r8, r7

# Write back outputs
    str r5, [r2, #0]
    str r6, [r2, #4]

    pop {r3-r12}
    bx lr

# Emit the literal pool for the "ldr r9, =..." pseudo-instruction here
    .ltorg
```
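The post does not show the generator script or say which language it uses. Below is a minimal Python sketch of what such a script could look like, assuming the same function body as the template with its comments omitted. `TEMPLATE`, `make_function`, `make_file`, the `@BODY@` placeholder and the count range 1..30 are all hypothetical choices. As an alternative, GNU as's `.rept N` / `.endr` directives can repeat an instruction inside a single source file without a script.

```python
"""Hypothetical generator for the asm_add_XXXX benchmark functions."""

# Function body from the template (comments omitted). XXXX is replaced by the
# instruction count and @BODY@ by the repeated instruction lines.
TEMPLATE = """\
    .global asm_add_XXXX
    .type asm_add_XXXX, %function
    .align 2
asm_add_XXXX:
    push {r3-r12}
    ldr r3, [r0, #0]
    ldr r4, [r0, #4]
    mov r5, #0
    mov r6, #0
    ldr r9, =0xE0001004
    .align 4
    ldr r7, [r9, #0]
@BODY@
    ldr r8, [r9, #0]
    sub r6, r8, r7
    str r5, [r2, #0]
    str r6, [r2, #4]
    pop {r3-r12}
    bx lr
    .ltorg
"""

# Assumed file header for the Cortex-M4 (Thumb-2, unified syntax).
HEADER = "    .syntax unified\n    .thumb\n    .text\n"


def make_function(count, instr="add r5, r3, r4"):
    """Return one benchmark function with `instr` repeated `count` times."""
    body = "\n".join("    " + instr for _ in range(count))
    return TEMPLATE.replace("XXXX", str(count)).replace("@BODY@", body)


def make_file(counts):
    """Return a complete assembly source file with one function per count."""
    return HEADER + "\n".join(make_function(n) for n in counts)


if __name__ == "__main__":
    # 30 functions, asm_add_1 ... asm_add_30 (the range is an assumption).
    print(make_file(range(1, 31)))
```

Redirecting the output to a `.s` file and assembling it produces functions that differ only in the number of repeated instructions. Passing `instr="nop"` produces the NOP variant.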
Can someone please explain why the cycle count increases non-linearly? Thanks in advance for any insights.
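I checked, and this chip has CCM (core-coupled memory), which seems to be similar to a TCM. Because it is limited in size, you would first need to identify the bottleneck code. Note, though, that on the STM32F4 the CCM RAM is reachable only through the CPU's data bus, so it can hold data and the stack but cannot execute code.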