Hi folks,
I'm trying to profile some code on the Cortex-M3 of the MCBSTM32C board. In particular, I need to measure how many clock cycles a specific algorithm takes to execute. To do this, I'm using the following code:
// variable definitions
uint32_t clock_cycles_counter;
volatile unsigned int *DWT_CYCCNT = (uint32_t *)0xE0001004; //address of the register
volatile unsigned int *DWT_CONTROL = (uint32_t *)0xE0001000; //address of the register
volatile unsigned int *SCB_DEMCR = (uint32_t *)0xE000EDFC; //address of the register
[...]
// configure and start the clock cycles counter
clock_cycles_counter = 0;
*SCB_DEMCR = *SCB_DEMCR | 0x01000000;
*DWT_CYCCNT = 0;
*DWT_CONTROL |= 1;
// do something
algorithm();
// stop and get the counter value
*DWT_CONTROL &= ~1;
clock_cycles_counter = *DWT_CYCCNT;
// print the counter value
printf("%d\n\r", clock_cycles_counter);
This code works, but something strange is happening. In particular, if I replace the last printf() with this one:
printf("%s %d\n\r", "test", clock_cycles_counter);
The printed clock cycles value is different. I don't think it should be, because the counter has already been stopped and read by the time printf() executes.
I've also done some debugging, such as calling printf() several times in a row, and each call prints the same value.
Unfortunately, I can't see the disassembly since I'm using the free version of the Keil toolchain.
Any hint would be greatly appreciated!
Thank you, Pierpaolo
Hi,
thank you for the reply! I forgot to mention that I've already tried declaring the variable as volatile; unfortunately, the behavior is still the same.
I'm pretty sure that the compiler generated something like: printf("%s %d\n\r", "test", clock_cycles_counter = *DWT_CYCCNT);
So you are probably running into a problem similar to the one discussed at this link: preshing.com/.../
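If the read really is being folded into the printf() call as suggested above, one way to check is to take the snapshot in a separate step and then format that single value both ways. The following is only a sketch under a few assumptions: snapshot_counter() and format_both() are hypothetical names, not part of any Keil or CMSIS API; __attribute__((noinline)) is a GNU extension that your toolchain may or may not accept; and PRIu32 from <inttypes.h> is used because the variable is a uint32_t, so %d is not strictly the matching specifier.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reads the counter exactly once through a volatile
   pointer. noinline (a GNU extension, assumed to be available) asks the
   compiler to keep the read as a separate call. */
__attribute__((noinline))
uint32_t snapshot_counter(const volatile uint32_t *cyccnt)
{
    return *cyccnt;
}

/* Hypothetical helper: formats ONE snapshot in both styles from the thread.
   Returns 0 on success, -1 if either string was truncated or failed. */
int format_both(uint32_t cycles, char *plain, char *tagged, size_t len)
{
    int a = snprintf(plain, len, "%" PRIu32 "\n\r", cycles);
    int b = snprintf(tagged, len, "%s %" PRIu32 "\n\r", "test", cycles);

    if (a < 0 || b < 0 || (size_t)a >= len || (size_t)b >= len) {
        return -1;
    }
    return 0;
}
```

On the target you would call snapshot_counter(DWT_CYCCNT) right after stopping the counter and print both buffers. Since both strings come from the same stored value, any remaining difference between runs would have to come from the timed region itself rather than from the printing.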
Yes, I also believe that is the problem. Do you know of any guidelines (links or similar) for preventing this kind of optimization with the Keil ARM compiler? I've searched, but I've found nothing.
Might this link help? www.keil.com/.../group__intrinsic___c_p_u__gr.html
Thank you for the tip! I've tried using:
__dmb(0xF); __dsb(0xF); __isb(0xF);
But while the clock cycle count changed a bit (of course), the difference between the two printf() variants is still there.
The intrinsics are intended for the processor core: they emit the corresponding barrier instructions.
But if the ARM compiler doesn't recognize the meaning of the intrinsic, it might still assume that it is allowed to produce code that moves the assignment of the variable to after the intrinsic, in which case the barrier makes no difference.
The compiler is expected to recognize the use of the barrier and not try to move any produced processor instructions across the barrier.
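To make the distinction concrete, here is a minimal sketch of a compiler-level barrier placed around the timed region, as opposed to a processor barrier. Several things here are assumptions: the COMPILER_BARRIER() macro uses GCC/Clang inline-asm syntax, which is not what armcc uses (armcc documents a __schedule_barrier() intrinsic for a similar purpose, which I have not tested here); measure_cycles() is a hypothetical helper; and the registers are passed in as pointers, with fake_* stand-ins, only so the sketch can be run off-target.

```c
#include <stdint.h>

/* Compiler-only barrier in GCC/Clang syntax (assumption: not armcc syntax).
   It emits no instruction; it only stops the compiler from moving memory
   accesses across this point. */
#define COMPILER_BARRIER() __asm__ volatile("" ::: "memory")

/* Hypothetical helper. On the target the three pointers would be
   0xE000EDFC (DEMCR), 0xE0001000 (DWT control) and 0xE0001004 (CYCCNT). */
uint32_t measure_cycles(volatile uint32_t *demcr,
                        volatile uint32_t *dwt_ctrl,
                        volatile uint32_t *dwt_cyccnt,
                        void (*work)(void))
{
    uint32_t cycles;

    *demcr |= 0x01000000u;   /* enable trace/DWT, as in the original code */
    *dwt_cyccnt = 0u;        /* reset the counter */
    *dwt_ctrl |= 1u;         /* start counting */
    COMPILER_BARRIER();

    work();                  /* code under test */

    COMPILER_BARRIER();
    *dwt_ctrl &= ~1u;        /* stop counting */
    cycles = *dwt_cyccnt;    /* single read of the stopped counter */
    COMPILER_BARRIER();

    return cycles;
}

/* Host-side stand-ins (hypothetical) so the sketch runs without hardware. */
volatile uint32_t fake_demcr, fake_ctrl, fake_cyccnt;

void fake_work(void)
{
    /* Pretend the work costs 123 cycles, but only while counting is on. */
    if (fake_ctrl & 1u) {
        fake_cyccnt += 123u;
    }
}
```

Because the value is returned from the function, the later printf() only ever sees a plain integer. Whether this changes anything for the behavior described in this thread is not something the sketch can prove; it only shows where a compiler barrier would go.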
Maybe you should open a support ticket.
This link, alas, doesn't provide any additional information, even though it spends some time describing the barrier instructions: infocenter.arm.com/.../index.jsp