Hello,
I am using a Nuvoton M2351 for my research, and I am running into a very weird issue. I am trying to run some portions of FreeRTOS inside the secure domain, and I am using the EventRecorder API (EventStartA and EventStopA) to time sections of the code. The core clock is set to 12 MHz, and I am building the code with all optimizations disabled.
When I time a simple for loop that increments a value 1000 times in the secure domain, I measure around 2 ms. The exact same loop, implemented in a non-secure function and called from the secure domain, takes 2.9 ms. That is nearly a 1.5x increase in runtime. Since the overhead of a function call across domains is only around 20-30 instructions, this is extremely high. Am I doing something wrong that causes this? Does the processor frequency change when switching domains? I can't make sense of this behavior. Any help would be appreciated.
Thank you
No, the frequency should be the same in the secure and non-secure domains. Your benchmark may have some issue you are not aware of (e.g., more instructions are executed than you expect).
Thank you, I suspected as much. However, my benchmark is a simple for loop that increments a uint32_t 1000 times, and the assembly for both loops looks exactly the same (other than the memory addresses). I am using the SysTick timer: I set breakpoints before and after the loop and take the difference of the timer values to determine how long the loop takes. The following is from the Secure and NonSecure project output .txt files.
I also stepped through the loop instruction by instruction, noting the timer value at each step, and some individual instructions show the difference too. Any idea why this difference would exist? Please let me know.
Thanks
A timing difference between the S and NS domains is possible.
I looked up the Nuvoton M2351 data sheet, and the device is based on the Cortex-M23. Please refer to this M2351 SoC component chart:
- https://www.nuvoton.com/export/sites/nuvoton/images/Microcontrollers/Figure1-3_M2351-Series-TrustZone-Architecture_small.png
In your benchmark, do the secure and non-secure code access different memory regions? Per the M2351 chart, different memory regions may have different timing paths.
<quote>
While timing a simple for loop (which increments a value) running 1000 times in the secure domain, I log around 2ms. The exact same loop, when implemented in a non-secure function and called from the secure domain, takes 2.9 ms.
</quote>
I am not sure what your benchmark goal is. If the benchmark code is implemented in an NS function and called from the Secure domain, it needs additional world switches (secure -> non-secure -> secure), whereas benchmark code implemented in the Secure domain and called from the Secure domain does not.
Hello!
Thank you for your response. Yes, the two pieces of code access two different memory regions. The diagram you shared is helpful, but can you elaborate on the difference in timing paths? I don't see why that would cause a roughly 50% difference in execution time. Also, the measurements I reported already account for the overhead of the world switch.
I'm afraid we cannot comment on the memory system outside the Arm processor.