Calculating L1 hit latency and L2 hit latency

Note: This was originally posted on 16th January 2012 at http://forums.arm.com

All,

I am new here. I am interested in measuring the L1 hit latency for A15/A9. Which signals do I need to probe inside the ARM RTL to figure that out for:

1) An L1 hit only.
2) A scenario where I generate an L1 miss and an L2 hit. I need to evaluate the cache latency for the L2 hit in that scenario too. Again, which signals should I probe inside each ARM CPU to figure that out?

Thanks much,

Dmax
  • Note: This was originally posted on 17th January 2012 at http://forums.arm.com

    Gotcha.

    I'm a software guy, so can't really help on the RTL signal side of things, but can hopefully provide some guidance.


    The first question is: what do you really mean by miss latency? At the RTL level you can look at how long that instruction stalled for, but both A9 and A15 execute instructions out of order, so they will try to pull forward other instructions which have no dependencies to fill the gap in the pipeline. So there is "latency", but it isn't actually visible to software performance.

    At this system performance level both A9 and A15 L1 caches behave like a single-cycle access, unless you start pointer chasing (load a pointer, which you immediately dereference), which incurs one or two cycles of overhead.

    L2 cache latency varies from chip to chip. L2 caches tend to use compiled RAMs, enabling the SoC manufacturer to decide the PPA (power, performance, area) tradeoff for the L2 cache, as 512KB-2MB caches tend to be quite large =) On most smartphone-type platforms I've used, the software-visible latency is somewhere around 16-20 cycles for A9 and A15, although it is possible to synthesize a core which is a bit faster, or quite a lot slower, than that.

    HTH,
    Iso
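
The software-visible latency described in the reply above can be estimated without RTL access by using a pointer-chasing loop, where each load address depends on the result of the previous load, so the out-of-order core cannot overlap them. The following is a minimal sketch in C, not anything taken from the original posts. The function names (build_chain, chase, ns_per_load, ns_to_cycles) are hypothetical, it assumes a POSIX system providing clock_gettime, and the working-set sizes and clock frequency you pass in are your own assumptions about the part under test.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Link n slots into one random cycle (Sattolo's algorithm), so that following
   next[i] visits every slot exactly once before returning to slot 0. The
   random order is meant to make the access pattern hard to prefetch. */
static void build_chain(size_t *next, size_t n, unsigned seed)
{
    assert(n >= 2);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;          /* 0 <= j < i */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` dependent loads. Each load needs the result of
   the previous one, so load latency cannot be hidden by reordering. The
   volatile qualifier keeps the compiler from removing or moving the loads. */
static size_t chase(const volatile size_t *next, size_t steps)
{
    size_t i = 0;
    while (steps--)
        i = next[i];
    return i;
}

/* Average time per dependent load, in nanoseconds, for a working set of
   `bytes` bytes. Returns -1.0 on bad arguments or allocation failure. */
static double ns_per_load(size_t bytes, size_t steps)
{
    size_t n = bytes / sizeof(size_t);
    if (n < 2 || steps == 0)
        return -1.0;
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL)
        return -1.0;
    build_chain(next, n, 1u);

    volatile size_t sink = chase(next, n);      /* warm-up pass over the chain */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    sink = chase(next, steps);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    free(next);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}

/* Convert nanoseconds to core cycles; ghz is the (assumed) core clock. */
static double ns_to_cycles(double ns, double ghz)
{
    return ns * ghz;
}
```

As a usage sketch, calling ns_to_cycles(ns_per_load(16 * 1024, 10000000), 1.2) would estimate the L1 hit case on a core assumed to run at a fixed 1.2 GHz with an L1 data cache larger than 16 KB, and repeating the call with a working set that exceeds L1 but fits in L2 (for example 256 * 1024) would approximate the L1-miss, L2-hit case. The result includes loop overhead and possibly TLB effects, and it is only meaningful if frequency scaling is disabled, so treat it as an estimate rather than a replacement for RTL-level measurement.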