I'm working on a project using a Texas Instruments AM3517 Cortex-A8 processor. I was seeing lower-than-expected performance, so I did a simple comparison with a Cortex-M3 processor. The M3 performed more than twice as well as the A8(?!).
The test was a simple count to 100,000:
while (1)
{
    volatile uint32_t i;

    dbg_PinSet(DBG_PIN_00);
    for (i = 0; i < 100000; i++) { }
    dbg_PinClear(DBG_PIN_00);
}
This is a bare metal system. Timing was measured with a scope and the debug pin, and found to be about 40 ms on the A8 clocked at 600 MHz, and about 14 ms on the M3 clocked at 72 MHz.
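For scale, the scope timings above can be converted into CPU cycles per loop iteration. The following is only a minimal sketch of that arithmetic; the helper name cycles_per_iteration is hypothetical and not part of the original test code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts a measured pin-high time into the average
 * number of CPU cycles spent per loop iteration.
 *   time_s     - pulse width measured on the scope, in seconds
 *   clock_hz   - CPU core clock, in hertz
 *   iterations - loop count (100000 in the test above)
 */
static double cycles_per_iteration(double time_s, double clock_hz,
                                   uint32_t iterations)
{
    assert(iterations > 0);             /* avoid dividing by zero */
    return (time_s * clock_hz) / (double)iterations;
}
```

With the figures quoted above, cycles_per_iteration(40e-3, 600e6, 100000) gives about 240 cycles per iteration for the A8, while cycles_per_iteration(14e-3, 72e6, 100000) gives about 10 for the M3.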
The code on the A8 is running from the on-chip 64 KB RAM to remove cache and external memory effects. Interrupts are disabled on both processors.
I'm relatively new to the A8, and suspect I'm missing something simple in setup somewhere.
Any pointers or help will be greatly appreciated.
Thanks,
-Rob
Hi rlepage,
As I don't have a Cortex-A8 board, I ran the program on a Cortex-A9 board (a Renesas RZ/A1L) clocked at 384 MHz. The results are below.
icache=ON   branch predict=ON    0.392 ms
icache=ON   branch predict=OFF   16.4 ms
icache=OFF  branch predict=ON    0.392 ms
icache=OFF  branch predict=OFF   11.7 ms
Branch prediction appears to dominate the measurement. With branch prediction OFF, enabling the icache even makes things slower (16.4 ms versus 11.7 ms). I'm not sure why your Cortex-A8 was so slow. By the way, what were your results with the icache and branch prediction both ON?
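For reference, on ARMv7-A cores such as the Cortex-A8 and Cortex-A9, the instruction cache and branch prediction are enabled through the I bit (bit 12) and Z bit (bit 11) of the CP15 System Control Register (SCTLR). The following is only a rough sketch, assuming GCC and privileged mode on bare metal. The function names are hypothetical, and the cache and branch predictor invalidation that startup code normally performs before enabling them is left out.

```c
#include <stdint.h>

#define SCTLR_Z (1u << 11)   /* branch prediction enable */
#define SCTLR_I (1u << 12)   /* instruction cache enable */

/* Hypothetical helper: returns the given SCTLR value with the I and Z
 * bits set and every other bit left unchanged. */
static uint32_t sctlr_with_icache_and_bp(uint32_t sctlr)
{
    return sctlr | SCTLR_I | SCTLR_Z;
}

#if defined(__GNUC__) && defined(__ARM_ARCH_7A__)
/* Read-modify-write of SCTLR (CP15 c1, c0, 0). Must run in a privileged
 * mode. Assumes the icache and branch predictor were already invalidated. */
static inline void enable_icache_and_bp(void)
{
    uint32_t sctlr;

    __asm__ volatile ("mrc p15, 0, %0, c1, c0, 0" : "=r"(sctlr));
    sctlr = sctlr_with_icache_and_bp(sctlr);
    __asm__ volatile ("mcr p15, 0, %0, c1, c0, 0" : : "r"(sctlr) : "memory");
    __asm__ volatile ("isb" : : : "memory");  /* make the change visible */
}
#endif
```

Calling enable_icache_and_bp() once before the timing loop and repeating the scope measurement would show whether these two settings account for the difference.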
For your information, the Cortex-M4 results are as follows. The frequency is 50 MHz. I used the FRDM-K20D50M.
Flash execution 8ms
SRAM execution 12ms
Best regards,
Yasuhiko Koumoto.