Hi everyone,
I am trying to understand why the cycle count does not increase linearly when I run the same instruction x times on the STM32F405RGT6.
While playing around with the STM32F4 to get a better understanding of how instructions map to cycle counts, I stumbled across this problem. From the documentation I know that a NOP and an ADD should each take one cycle. I would therefore expect x NOPs to take x cycles, and the same for ADD. What I found is somewhat different.
To produce these results I wrote a quick script that creates 30 assembly code functions. I can read the cycle count from the memory address 0xE0001004 on the STM32F405RGT6. I can also input any values into r3 and r4 for the ADD operation and check the result in r5. In r6 I get the number of cycles my instructions took. I checked the final elf file with objdump to verify that no operations were removed/rearranged/altered by the compiler.
# Replace XXXX by #instructions
.global asm_add_XXXX
.type asm_add_XXXX, %function
.align 2
asm_add_XXXX:
    push {r3-r12}
    # Reading inputs to register
    ldr r3, [r0, #0]
    ldr r4, [r0, #4]
    # r5, r6 = outputs register
    mov r5, #0
    mov r6, #0
    # Cycle count address to register aka. DWT_CYCCNT
    ldr r9, =0xE0001004
    .align 4
    # Save current cycle count in r7
    ldr r7, [r9, #0]
    ######################################
    ### Start of assembly code         ###
    ######################################
    # Insert add instruction x times
    add r5, r3, r4
    ######################################
    ### End of assembly code           ###
    ######################################
    # Save current cycle count in r8
    ldr r8, [r9, #0]
    # Calculate cycles in r6 = r8 - r7
    sub r6, r8, r7
    # Write back output
    str r5, [r2, #0]
    str r6, [r2, #4]
    pop {r3-r12}
    bx lr
    # Avoid literal pools due to fake ldr
    .LTORG
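The generator script itself is not shown in the post. As a rough sketch of how such a script might look, the Python below fills a shortened stand-in for the template above. The names `TEMPLATE`, `make_function` and `make_all` are hypothetical, and the prologue and epilogue are elided to comments, so this is an illustration and not the original script.

```python
# Hypothetical generator sketch; not the original poster's script.
# The template is a shortened stand-in: prologue/epilogue are only hinted at.
TEMPLATE = """\
.global asm_{name}_{n}
.type asm_{name}_{n}, %function
.align 2
asm_{name}_{n}:
    @ ... prologue and first DWT_CYCCNT read go here ...
{body}
    @ ... second DWT_CYCCNT read, write-back and return go here ...
"""


def make_function(name: str, instruction: str, n: int) -> str:
    """Return one assembly function containing `instruction` repeated n times."""
    body = "\n".join(f"    {instruction}" for _ in range(n))
    return TEMPLATE.format(name=name, n=n, body=body)


def make_all(name: str, instruction: str, max_n: int = 30) -> str:
    """Return the functions for 1..max_n repetitions, joined into one source text."""
    return "\n".join(make_function(name, instruction, n) for n in range(1, max_n + 1))
```

For example, `make_all("add", "add r5, r3, r4")` would produce the text for `asm_add_1` through `asm_add_30`, which could then be written to a `.s` file and assembled.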
Can someone please explain why the cycle count is increasing non-linearly? Thanks in advance for any insights.
Are you sure you don't have a perhaps-hidden board_init() function or something similar that is upping the clock rate and turning on wait states? (You could check by reading FLASH_ACR...)
I checked the FLASH_ACR using a JTAG debugger and found out that there really is a hidden section in the board_init() which changes the WAIT_CYCLES to 5. That is a small victory.
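For anyone repeating this check, here is a small sketch for decoding a raw FLASH_ACR value read through the debugger. The bit positions used (LATENCY in bits 2:0, PRFTEN bit 8, ICEN bit 9, DCEN bit 10) are my reading of the STM32F405 reference manual and should be verified against it. The function name `decode_flash_acr` is made up for this example.

```python
# Sketch only: bit positions are taken from my reading of the STM32F405
# reference manual (FLASH_ACR) and should be double-checked there.
def decode_flash_acr(value: int) -> dict:
    """Split a raw FLASH_ACR word into the fields relevant for fetch timing."""
    return {
        "latency_wait_states": value & 0x7,      # LATENCY[2:0]
        "prefetch_enabled": bool(value & (1 << 8)),   # PRFTEN
        "icache_enabled": bool(value & (1 << 9)),     # ICEN
        "dcache_enabled": bool(value & (1 << 10)),    # DCEN
    }
```

Besides the wait states, the prefetch and instruction cache bits are worth noting, since both can change how many cycles an instruction fetch appears to cost.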
Sadly, I still cannot make sense of the plot. Instead of plotting the absolute cycle count, I plotted the number of extra cycles it took when adding another instruction. In a perfect world, every NOP I add should take exactly 1 extra cycle.
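The "extra cycles per added instruction" view is just the difference between consecutive totals. A minimal sketch, with a hypothetical helper name `extra_cycles` and an idealised input rather than measured data:

```python
def extra_cycles(totals):
    """Given total cycle counts for 0, 1, 2, ... instructions,
    return the extra cycles contributed by each added instruction."""
    return [after - before for before, after in zip(totals, totals[1:])]


# Idealised case (not a measurement): one cycle per instruction.
ideal_totals = list(range(6))          # 0, 1, 2, 3, 4, 5 cycles
ideal_steps = extra_cycles(ideal_totals)
```

In the ideal case every entry of `ideal_steps` is 1; any other value in the measured data marks an instruction that cost more or less than a single cycle.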
As you can see, there is an increase in the cycles needed every 8 NOP instructions (16-bit encoding) and every 4 ADD instructions (32-bit encoding). This makes sense, as the flash is read 128 bits at a time. With 5 wait states, I would therefore expect a fetch to take 6 cycles (5 wait states plus the access cycle itself) whenever a new flash line is loaded.
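The "every 8 NOPs / every 4 ADDs" pattern can be reproduced with a very simplified model that only counts which 128-bit flash line each instruction falls into. This is a sketch under strong assumptions: it ignores prefetch, the instruction cache and the pipeline, and the helper names and the `start_offset` parameter are my own. Since the template aligns the first `ldr` to 16 bytes with `.align 4`, the measured instructions probably start a few bytes into a line, so `start_offset` may need adjusting.

```python
LINE_BYTES = 16  # one 128-bit flash line


def lines_touched(n, size_bytes, start_offset=0):
    """Number of flash lines covered by n back-to-back instructions."""
    if n == 0:
        return 0
    first = start_offset // LINE_BYTES
    last = (start_offset + n * size_bytes - 1) // LINE_BYTES
    return last - first + 1


def new_line_steps(max_n, size_bytes, start_offset=0):
    """Instruction counts n at which the n-th instruction reaches a new line."""
    return [
        n
        for n in range(1, max_n + 1)
        if lines_touched(n, size_bytes, start_offset)
        > lines_touched(n - 1, size_bytes, start_offset)
    ]


nop_steps = new_line_steps(30, 2)   # 16-bit encoding
add_steps = new_line_steps(16, 4)   # 32-bit encoding
```

With a line-aligned start, the steps are spaced 8 apart for 2-byte instructions and 4 apart for 4-byte instructions, matching the period seen in the plot. The model says nothing about how many cycles each step costs.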
This is true for the ADD instruction, but not for the NOP instruction. Also, there are cases where another NOP takes 0 cycles, which should not be possible. With the explanations given so far, I can predict the cycle count for 4 or more consecutive ADD instructions. Any idea what causes the behaviour for fewer than 4 ADDs?