Calculating L1 hit latency and L2 hit latency
akshit dayal
over 12 years ago
Note: This was originally posted on 16th January 2012 at
http://forums.arm.com
All,
I am new here. I am interested in measuring the L1 hit latency for A15/A9. Which signals do I need to probe inside the ARM RTL to figure that out for the following cases?
1) L1 hit only.
2) L1 miss and L2 hit. I have a scenario that generates this, and I need to evaluate the cache latency for the L2 hit in that scenario too. Again, which signals should I probe inside each ARM CPU to figure that out?
Thanks much,
Dmax
Peter Harris
over 12 years ago
Note: This was originally posted on 17th January 2012 at
http://forums.arm.com
Gotcha.
I'm a software guy, so can't really help on the RTL signal side of things, but can hopefully provide some guidance.
The first question is: what do you really mean by miss latency? At the RTL level you can look at how long that instruction stalled for, but both A9 and A15 execute instructions out of order, so they will try to pull forward other instructions which have no dependencies to fill the gap in the pipeline. So there is "latency", but it isn't actually visible to software performance.
At this system performance level both A9 and A15 L1 caches behave like a single-cycle access, unless you start pointer chasing (load a pointer which you immediately dereference), which incurs one or two cycles of overhead.
L2 cache latency varies from chip to chip. L2 caches tend to use compiled RAMs, enabling the SoC manufacturer to decide the PPA tradeoff for the L2 cache, as 512KB-2MB caches tend to be quite large =) On most smartphone-type platforms I've used, the software-visible latency is somewhere around 16-20 cycles for both A9 and A15, although it is possible to synthesize a core which is a bit faster, or quite a lot slower, than that.
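For anyone who wants to observe the software-visible latency rather than probe RTL, a pointer-chasing loop is one common way to do it. The following is only a rough sketch, not something taken from this thread: the names `build_chain`, `chase` and `ns_per_load` are made up for illustration, the 64-byte line size is an assumption (check your core's TRM), and the timing relies on POSIX `clock_gettime`. Each load depends on the previous one, so an out-of-order core cannot overlap them.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define LINE_BYTES 64   /* ASSUMPTION: cache line size, not from the thread */

/* One node per assumed cache line, so every step touches a new line. */
struct node {
    struct node *next;
    char pad[LINE_BYTES - sizeof(struct node *)];
};

static struct node *volatile sink;   /* keeps the chase result observable */

/* Link n nodes into one random cycle (Sattolo's algorithm). The random
   order is meant to make the access pattern hard to prefetch. rand() is
   a weak generator, but any j < i still yields a single n-cycle. */
static void build_chain(struct node *nodes, size_t n, unsigned seed)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        nodes[i].next = &nodes[i];
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;          /* j in [0, i-1] */
        struct node *tmp = nodes[i].next;
        nodes[i].next = nodes[j].next;
        nodes[j].next = tmp;
    }
}

/* Follow the chain for `steps` dependent loads. */
static struct node *chase(struct node *start, size_t steps)
{
    struct node *p = start;
    for (size_t i = 0; i < steps; i++)
        p = p->next;    /* address of the next load comes from this load */
    return p;
}

/* Average wall-clock nanoseconds per dependent load. */
static double ns_per_load(struct node *nodes, size_t n, size_t steps)
{
    struct timespec t0, t1;
    sink = chase(nodes, n);                     /* warm-up pass */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    sink = chase(nodes, steps);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

The idea is to call `build_chain` and `ns_per_load` once with a working set (`n * LINE_BYTES`) that fits in L1, and again with one that overflows L1 but still fits in L2; the difference approximates the extra cost of an L2 hit. Multiplying the result by the core clock in GHz gives cycles, which assumes the clock is fixed during the run. Larger working sets may also pick up TLB miss costs, so treat the numbers as indicative only.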
HTH,
Iso