
Cortex-A8 performance

I'm working on a project with a Texas Instruments AM3517 Cortex-A8 processor. I was seeing lower-than-expected performance, so I did a simple comparison with a Cortex-M3 processor. The M3 was more than twice as fast as the A8 (?!).

The test was a simple count to 100,000:

while (1)
{
    volatile uint32_t    i;
  
    dbg_PinSet(DBG_PIN_00);
    for ( i = 0; i < 100000; i++ )
    {
    }
    dbg_PinClear(DBG_PIN_00);
}

This is a bare-metal system. Timing was measured with a scope on the debug pin: about 40 ms on the A8 clocked at 600 MHz, and about 14 ms on the M3 clocked at 72 MHz.
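Because the two cores run at very different clock rates, converting the scope readings into core clock cycles per loop iteration makes the gap easier to see. The helper below is a hypothetical sketch (the name `cycles_per_iteration` is not from the original post); it assumes the pin-high time is dominated by the loop itself and ignores the overhead of `dbg_PinSet`/`dbg_PinClear`.

```c
#include <stdint.h>

/* Hypothetical helper: approximate core clock cycles spent per loop
 * iteration, given the measured loop time and the core clock.
 * ms * MHz = (1e-3 s) * (1e6 Hz) = 1e3 cycles, hence the factor 1000.
 * Pin-toggle overhead is ignored (assumed negligible vs. 100,000 passes). */
static double cycles_per_iteration(double elapsed_ms, double clock_mhz,
                                   uint32_t iterations)
{
    double total_cycles = elapsed_ms * clock_mhz * 1000.0;
    return total_cycles / (double)iterations;
}
```

With the figures above, 40 ms at 600 MHz over 100,000 iterations works out to 240 cycles per iteration on the A8, versus about 10 cycles per iteration (14 ms at 72 MHz) on the M3, so the A8 is spending roughly 24 times as many cycles per pass.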

The code on the A8 runs from the 64 KB on-chip RAM to remove cache and external-memory effects. Interrupts are disabled on both processors.

I'm relatively new to the A8 and suspect I'm missing something simple in the setup somewhere.

Any pointers or help will be greatly appreciated.

Thanks,

-Rob

  • Hi rlepage,

    As I don't have a Cortex-A8 board, I ran the program on a Cortex-A9 board (the Renesas RZ/A1L) at 384 MHz. The results are below.


    I-cache  Branch prediction  Time
    ON       ON                 0.392 ms
    ON       OFF                16.4 ms
    OFF      ON                 0.392 ms
    OFF      OFF                11.7 ms


    Branch prediction appears to dominate this measurement. With branch prediction OFF, enabling the I-cache even seems to be a penalty (16.4 ms vs. 11.7 ms).
    I'm not sure why your Cortex-A8 gave such a slow result.
    By the way, what were your results with both the I-cache and branch prediction ON?

    For your information, here are the Cortex-M4 results, at 50 MHz on the FRDM-K20D50M board:

    Flash execution: 8 ms
    SRAM execution:  12 ms

    Best regards,

    Yasuhiko Koumoto.
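Koumoto's question about running with the I-cache and branch prediction ON can be tried on the A8 by setting the corresponding bits in the ARMv7-A System Control Register (SCTLR): bit 12 (I) enables the instruction cache and bit 11 (Z) enables branch (program flow) prediction. The following is a minimal sketch, assuming a GCC-style toolchain, bare-metal code running in a privileged mode, and that the existing boot code has not already set these bits; all names here are hypothetical, and the Cortex-A8 TRM should be checked for any additional Auxiliary Control Register settings on a given silicon revision. The bit manipulation is split into a pure function so it can be checked on any host; the CP15 accesses only compile for ARMv7-A targets.

```c
#include <stdint.h>

/* ARMv7-A SCTLR bits (hypothetical macro names):
 * bit 12 = I (instruction cache enable), bit 11 = Z (branch prediction). */
#define SCTLR_I_BIT (1u << 12)
#define SCTLR_Z_BIT (1u << 11)

/* Hypothetical pure helper: return the SCTLR value with I and Z set,
 * leaving every other bit (MMU, D-cache, alignment, ...) unchanged. */
static uint32_t sctlr_enable_icache_bp(uint32_t sctlr)
{
    return sctlr | SCTLR_I_BIT | SCTLR_Z_BIT;
}

#if defined(__arm__) && defined(__ARM_ARCH_7A__)
/* Hypothetical bare-metal routine; must run in a privileged mode (e.g. SVC). */
static void enable_icache_and_branch_prediction(void)
{
    uint32_t sctlr;
    uint32_t zero = 0;

    /* Invalidate the whole I-cache (ICIALLU) and the branch predictor
     * (BPIALL) so no stale entries are used once they are enabled. */
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 0" : : "r"(zero) : "memory");
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 6" : : "r"(zero) : "memory");
    __asm__ volatile("dsb" ::: "memory");
    __asm__ volatile("isb" ::: "memory");

    /* Read-modify-write SCTLR (CP15 c1, c0, 0), then synchronize. */
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(sctlr));
    sctlr = sctlr_enable_icache_bp(sctlr);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(sctlr) : "memory");
    __asm__ volatile("isb" ::: "memory");
}
#endif
```

Calling `enable_icache_and_branch_prediction()` once early in startup, before the timing loop, would allow filling in the ON/ON row of the table above for the A8.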
