Hello,
I am using a Nuvoton M2351 for my research and am running into a very strange issue. I am trying to run some portions of FreeRTOS inside the Secure domain, and I am using the EventRecorder API (EventStartA and EventStopA) to time sections of the code. The core clock is set to 12 MHz, and I am building the code without any optimizations enabled. When I time a simple for loop that increments a value 1000 times in the Secure domain, I measure around 2 ms. The exact same loop, implemented in a Non-secure function and called from the Secure domain, takes 2.9 ms. That is nearly a 1.5x increase in runtime. Since the function-call overhead across domains is only around 20-30 instructions, this seems far too high. Am I doing something wrong? Does the processor frequency change when switching domains? I can't make sense of this behavior. Any help would be appreciated.
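As a rough cross-check, the reported times can be converted into core cycles per loop iteration. This is a back-of-envelope sketch that assumes the 12 MHz clock stays constant and that the whole measured interval is spent inside the loop (EventRecorder overhead is ignored):

```python
CORE_CLOCK_HZ = 12_000_000  # core clock stated in the question
LOOP_ITERATIONS = 1000      # loop count stated in the question


def cycles_per_iteration(elapsed_ms: float) -> float:
    """Average core cycles per loop iteration, assuming the whole
    measured interval is spent inside the loop."""
    total_cycles = elapsed_ms * 1e-3 * CORE_CLOCK_HZ
    return total_cycles / LOOP_ITERATIONS


secure = cycles_per_iteration(2.0)     # loop placed in Secure code
nonsecure = cycles_per_iteration(2.9)  # same loop placed in Non-secure code

print(f"Secure:     {secure:.1f} cycles/iteration")
print(f"Non-secure: {nonsecure:.1f} cycles/iteration")
print(f"Difference: {nonsecure - secure:.1f} cycles/iteration")
```

Under those assumptions this works out to about 24 versus 34.8 cycles per iteration, so the Non-secure version spends roughly 11 extra cycles on every iteration. The slowdown therefore scales with the iteration count rather than appearing as a single fixed cost.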
Thank you
A timing difference between the Secure and Non-secure domains is possible.
I looked up the Nuvoton M2351 datasheet, and the device is based on the Cortex-M23. Please refer to this M2351 SoC component chart:
- https://www.nuvoton.com/export/sites/nuvoton/images/Microcontrollers/Figure1-3_M2351-Series-TrustZone-Architecture_small.png
In your benchmark, do the Secure and Non-secure code read from different memory regions? According to the M2351 chart, different memory regions may have different access-timing paths.
<quote>
While timing a simple for loop (which increments a value) running 1000 times in the secure domain, I log around 2ms. The exact same loop, when implemented in a non-secure function and called from the secure domain, takes 2.9 ms.
</quote>
I am not sure what your benchmark goal is. If the benchmark code is implemented in a Non-secure function and called from the Secure domain, it requires an additional world switch (Secure -> Non-secure -> Secure), whereas code implemented in the Secure domain and called from the Secure domain does not.
Hello!
Thank you for your response. Yes, the two pieces of code read from two different memory regions. The diagram you shared is helpful, but can you elaborate on how the timing paths differ? I don't see why that would cause a roughly 50% difference in execution time. Also, the measurements I reported already account for the overhead of the world switch.
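For scale, the one-time world-switch cost can be estimated and compared with the 0.9 ms gap. This is an order-of-magnitude sketch only: the 20-30 instruction figure comes from the question, while the average cycles per instruction is a hypothetical, deliberately pessimistic assumption (real Cortex-M23 timing varies per instruction):

```python
CORE_CLOCK_HZ = 12_000_000  # core clock stated in the question

# Assumption: the Secure -> Non-secure -> Secure round trip costs about
# 30 instructions (upper end of the 20-30 estimate in the question).
SWITCH_INSTRUCTIONS = 30
# Hypothetical average cycles per instruction, chosen to be pessimistic.
CYCLES_PER_INSTRUCTION = 2

# One-time cost of the round trip, in microseconds.
switch_us = SWITCH_INSTRUCTIONS * CYCLES_PER_INSTRUCTION / CORE_CLOCK_HZ * 1e6
# Observed difference between the two measurements (2.9 ms vs 2.0 ms), in microseconds.
observed_gap_us = (2.9 - 2.0) * 1e3

print(f"Estimated world-switch cost: {switch_us:.1f} us")
print(f"Observed gap:                {observed_gap_us:.0f} us")
print(f"Gap / switch cost:           {observed_gap_us / switch_us:.0f}x")
```

Even with these generous assumptions, the round trip is about 5 us, well under 1% of the 900 us difference, which is consistent with the slowdown being a per-iteration effect, such as the different memory paths mentioned earlier.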
Thanks
I'm afraid we cannot comment on memory external to the Arm processor.