How to rate performance without hardware

Note: This was originally posted on 8th July 2012 at http://forums.arm.com

Hi,
I'm using DS-5 with armcc. Is there any way to rate code performance without running it on hardware, e.g. with profiling? My typical development cycle looks like this: make a small change in a loop and check whether it improves performance; if yes, make further changes in the same direction; if not, go back to the starting point and try something else. For example, the compiler for TI DSPs can generate assembly files with pipeline information, such as iteration intervals for loops. I'm looking for something similar for ARM, because for now the --asm option only lets me inspect the assembly code manually, with no direct information about performance. TI also offers a cycle-accurate simulator, but I guess there is no such thing for ARM. If it matters, my current platform is Cortex-A9.
Thanks in advance for any suggestions.

regards
MS
  • Note: This was originally posted on 9th July 2012 at http://forums.arm.com

    You could try one of the ARM software models - they tend to give an "OK" picture of pipeline performance, although it depends on the core - they are not designed to be cycle accurate for the A-profile cores.

    The main problem you will have is modelling the memory system. For a core running at around 1 GHz you tend to find that CPU cycles are cheap - it's the cache misses and memory accesses which really matter (e.g., 1 cycle benefit for shuffling a loop around, 120 cycles saved by removing a cache miss), and those tend to be very hardware-specific, depending on memory latency, cache size, cache access pattern, and the software you are running.
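
To make the trade-off above concrete, here is a rough first-order cost model in C. It is only a back-of-envelope sketch: the struct, the function name and all the numbers are hypothetical, the 120-cycle miss penalty is simply the figure quoted in the reply rather than a measured Cortex-A9 value, and it ignores overlap between memory stalls and computation.

```c
#include <stdint.h>

/* Hypothetical first-order cost model for one loop.
 * All fields are estimates supplied by the user, not tool output. */
typedef struct {
    uint64_t iterations;      /* loop trip count */
    uint64_t cycles_per_iter; /* pipeline cycles per iteration (hand count or model estimate) */
    uint64_t misses;          /* estimated cache misses over the whole loop */
    uint64_t miss_penalty;    /* assumed cycles lost per miss (120 is the figure from the reply) */
} loop_cost_t;

/* Total estimated cycles = compute cycles + memory stall cycles.
 * Assumes stalls do not overlap with computation, which is pessimistic. */
static uint64_t estimate_cycles(const loop_cost_t *c)
{
    return c->iterations * c->cycles_per_iter + c->misses * c->miss_penalty;
}
```

With an assumed baseline of 1024 iterations at 10 cycles each and 128 misses at 120 cycles, the estimate is 25600 cycles. Saving one cycle per iteration gives 24576 (about 4% better), while halving the misses to 64 gives 17920 (about 30% better), which is why the memory system usually dominates.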