
ARMv6 performance monitor: can I record the instruction that caused a data cache miss?

Hi, I'm new to the community.

I have recently been working on evaluating the cache performance of some software on ARM (which I did not know much about before), and I am aiming to record every instruction that causes a data cache miss.

My current approach is straightforward: I configure PMUIRQ as an FIQ and set the counter to -1 initially. Every time an overflow FIQ occurs, the handler disables the cache, disables the counter, pushes lr onto a stack, resets the counter to -1, re-enables the counter and then the cache, and returns to lr - 4.
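The handler sequence above can be sketched in C as follows. This is a minimal, host-testable sketch, not a drop-in FIQ handler: the `pmu_ops` hooks, `sample_log`, `record_sample` and `on_pmu_overflow` are hypothetical names, and on real hardware each hook would be a CP15 access described in the core's TRM. Writing -1 (0xFFFFFFFF) into a 32-bit event counter means the very next event overflows it. A real ARM11 handler also typically has to clear the counter's overflow flag before returning; that step is not part of the description above and is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware hooks. On an ARM11 each one would be a CP15 access
 * (see the core's TRM); as function pointers the bookkeeping can be tested
 * on a host machine. */
typedef struct {
    void (*cache_disable)(void);
    void (*cache_enable)(void);
    void (*counter_disable)(void);
    void (*counter_enable)(void);
    void (*counter_write)(uint32_t value);
} pmu_ops;

#define SAMPLE_CAP 1024u

typedef struct {
    uint32_t pc[SAMPLE_CAP]; /* candidate PCs, recorded as lr - 8 */
    size_t count;            /* number of samples stored */
    size_t dropped;          /* overflows seen after the buffer filled up */
} sample_log;

/* -1 as an unsigned 32-bit value: the next counted event wraps the counter
 * to 0 and raises the overflow interrupt again. */
#define PMU_RELOAD ((uint32_t)-1)

/* Store one sample and return the value to reload into the counter. */
uint32_t record_sample(sample_log *slog, uint32_t lr_fiq)
{
    assert(slog != NULL);
    if (slog->count < SAMPLE_CAP)
        slog->pc[slog->count++] = lr_fiq - 8u;
    else
        slog->dropped++;
    return PMU_RELOAD;
}

/* Handler body in the order described in the post. The return to
 * lr_fiq - 4 (SUBS PC, LR, #4) would happen in an assembly wrapper, not
 * shown. A real handler would typically also clear the counter's overflow
 * flag; check the TRM for how. */
void on_pmu_overflow(const pmu_ops *ops, sample_log *slog, uint32_t lr_fiq)
{
    assert(ops != NULL);
    ops->cache_disable();
    ops->counter_disable();
    ops->counter_write(record_sample(slog, lr_fiq)); /* record, then reload */
    ops->counter_enable();
    ops->cache_enable();
}
```

Keeping `record_sample` free of hardware access means the sample bookkeeping can be unit-tested on a PC, while only the small set of hooks has to be written against the real core.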

However, looking at the lr values I record, I found that many of the supposed cache-miss instructions (at lr - 8) do not access memory at all (branches, cmp, etc.).
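If the counter event is attributed a few instructions late (for example because of pipelining or hit under miss), one rough workaround for samples that land on branches or compares is to walk back a few instructions from the sampled PC to the nearest load or store. This is a heuristic, not an exact attribution, and the sketch is deliberately limited: `is_basic_load_store` and `nearest_load_store` are hypothetical helpers, the decoder only recognises ARM-state LDR/STR/LDRB/STRB and LDM/STM (no LDRH/LDRD/LDREX, coprocessor loads or Thumb code), and the window size is a guess that would need tuning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Partial ARM-state (A32) decode: true for LDR/STR/LDRB/STRB (single data
 * transfer) and LDM/STM only. LDRH/LDRD/LDREX, coprocessor/VFP loads and
 * Thumb code are deliberately not recognised. */
static bool is_basic_load_store(uint32_t insn)
{
    if ((insn >> 28) == 0xFu)          /* unconditional space: skip */
        return false;
    if (((insn >> 26) & 3u) == 1u) {   /* bits[27:26] == 01 */
        /* bit 25 and bit 4 both set is the ARMv6 media space, not LDR/STR */
        return !(((insn >> 25) & 1u) && ((insn >> 4) & 1u));
    }
    return ((insn >> 25) & 7u) == 4u;  /* bits[27:25] == 100: LDM/STM */
}

/* Check the instruction at sampled_pc, then up to `window` instructions
 * before it, and report the nearest basic load/store. `code` holds the
 * n_words 32-bit words of a text section that starts at address `base`. */
bool nearest_load_store(const uint32_t *code, size_t n_words, uint32_t base,
                        uint32_t sampled_pc, unsigned window, uint32_t *out_pc)
{
    assert(code != NULL && out_pc != NULL);
    if (sampled_pc < base || (sampled_pc - base) % 4u != 0u)
        return false;
    size_t idx = (sampled_pc - base) / 4u;
    if (idx >= n_words)
        return false;
    for (unsigned step = 0; step <= window; step++) {
        if (is_basic_load_store(code[idx])) {
            *out_pc = base + (uint32_t)idx * 4u;
            return true;
        }
        if (idx == 0)
            break;          /* reached the start of the section */
        idx--;
    }
    return false;
}
```

Treat the result as "probably this access" rather than ground truth: with hit under miss enabled, the instruction that actually missed may be further back than any small window.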

I want to ask:

1. Which instructions can cause a cache miss?

2. Is it possible that the FIQ request is delayed?

3. Is there a better way to record the cache-miss instructions?

Thank you!

  • Thanks a lot, Peter. You are right: it is not realistic to record the exact instructions. I plan to try your advice and hope to produce an overview "hot spot" map. But I guess this can only deal with instruction cache misses. For data cache misses, I think we may need to figure out an alternative.

  • > But this can only deal with instruction cache miss I guess

    Why do you think the approach will not work for data cache misses? It should work with any performance counter.

  • > 1. Which instructions can cause a cache miss?

    Many of the ARM11 cores support "hit under miss", so data-processing instructions can keep executing while a non-dependent cache miss is being serviced. As a result, I wouldn't necessarily expect an exact PC match.

    Many of the PMU counters are also pipelined, so the cycle in which a counter is incremented may not exactly match the logical PC of the instruction that generated the event. It should be "close", though.

    > 3. Is there a better way to record the cache-miss instructions?

    Most cache analysis is statistical anyway; it isn't a precise science, so you don't try to make it exact. You don't need to know which instruction generated each cache miss: you can just create a "miss map" based on PC value and plot the density of misses over time. The "hot spots" will generally cluster around a particular set of PC values, and you can then use the symbol table to convert those back to specific functions.

    I believe that DS-5 Streamline can already perform this kind of analysis; it lets you trigger samples from whichever counter you want to use, rather than from time or cycle count.

    HTH,
    Pete
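A minimal sketch of the "miss map" idea: attribute each sampled PC to the function whose address range contains it, using a symbol table sorted by address (for example, one built from `nm -n -S` output). The `symbol` struct and the `find_symbol`, `build_miss_map` and `print_miss_map` functions are hypothetical names. Because the samples can come from whichever counter triggers them, the same code applies to instruction or data cache miss counters. Plotting density over time would additionally need a timestamp per sample, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One function from the symbol table: [start, end) address range.
 * The table must be sorted by start address with no overlaps. */
typedef struct {
    const char *name;
    uint32_t start;
    uint32_t end;
    unsigned hits;   /* samples attributed to this function */
} symbol;

/* Binary search for the symbol whose range contains pc; NULL if none. */
static symbol *find_symbol(symbol *syms, size_t n, uint32_t pc)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < syms[mid].start)
            hi = mid;
        else if (pc >= syms[mid].end)
            lo = mid + 1;
        else
            return &syms[mid];
    }
    return NULL;
}

/* Add every sampled PC to the hit count of its function. Returns the
 * number of samples that fell outside all known symbols. */
size_t build_miss_map(symbol *syms, size_t n, const uint32_t *pcs, size_t n_pcs)
{
    assert(pcs != NULL || n_pcs == 0);
    size_t unknown = 0;
    for (size_t i = 0; i < n_pcs; i++) {
        symbol *s = find_symbol(syms, n, pcs[i]);
        if (s != NULL)
            s->hits++;
        else
            unknown++;
    }
    return unknown;
}

/* Print functions that received at least one sample. */
void print_miss_map(const symbol *syms, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (syms[i].hits > 0)
            printf("%-24s %u\n", syms[i].name, syms[i].hits);
}
```

A binary search keeps the per-sample cost logarithmic in the number of functions, which matters when post-processing many thousands of samples.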

  • Oh, I did not make myself clear. Basically, I want to rearrange our data structures to reduce data cache misses (and restructure the code to reduce instruction cache misses). So I think I need to know which data (or which data region) is accessed, rather than which function causes the data cache misses (admittedly, knowing the function might narrow down the range).
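Once data addresses for the sampled misses are available, one way to look for conflicts between data structures is to bin those addresses by cache set. This sketch assumes the data address of each sampled access can already be obtained by some other means (for example, by decoding the load/store and the saved registers in the handler, which is not shown), and it assumes a 16 KB, 4-way L1 data cache with 32-byte lines whose set index comes from the address bits just above the line offset; `LINE_BYTES`, `WAYS` and `CACHE_BYTES` would need adjusting for the actual core. `cache_set` and `count_set_pressure` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed L1 D-cache geometry: 16 KB, 4-way, 32-byte lines. */
#define LINE_BYTES  32u
#define WAYS        4u
#define CACHE_BYTES (16u * 1024u)
#define NUM_SETS    (CACHE_BYTES / (WAYS * LINE_BYTES)) /* 128 sets */

/* Set index: the line number modulo the number of sets. */
static uint32_t cache_set(uint32_t addr)
{
    return (addr / LINE_BYTES) % NUM_SETS;
}

/* Count sampled data addresses per cache set. Sets with far more samples
 * than the rest hint that several hot structures compete for them. */
void count_set_pressure(const uint32_t *addrs, size_t n, unsigned counts[NUM_SETS])
{
    assert(addrs != NULL || n == 0);
    for (size_t s = 0; s < NUM_SETS; s++)
        counts[s] = 0;
    for (size_t i = 0; i < n; i++)
        counts[cache_set(addrs[i])]++;
}
```

With this assumed geometry each way is 4 KB, so addresses that are a multiple of 4 KB apart land in the same set; more than four hot lines sharing one set will keep evicting one another, which is the kind of pattern that rearranging data structures can fix.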

  • Hi chenzhanalbert,

    How about setting the FI bit of the system control register (CP15 register 1), in case the mismatch comes from the hit-under-miss feature? Setting the FI bit prohibits hit under miss.

    Best regards,
    Yasuhiko Koumoto.
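A read-modify-write of the control register that sets FI could look like the following sketch. It assumes FI is bit 21 of the CP15 c1 Control Register, as documented for the ARM1136 and ARM1176 cores (check the TRM for the core in use), that the code runs in a privileged mode, and that GCC-style inline assembly is available; `SCTLR_FI`, `sctlr_with_fi` and `enable_low_latency_fiq` are hypothetical names. The bit arithmetic is kept in a separate pure function so it can be checked on a host, and the CP15 access is only compiled for 32-bit ARM targets.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: FI ("low interrupt latency configuration") is bit 21 of the
 * CP15 c1 Control Register, as on ARM1136/ARM1176. Verify for your core. */
#define SCTLR_FI (1u << 21)
static_assert(SCTLR_FI == 0x00200000u, "FI is expected to be bit 21");

/* Pure helper: the control register value with FI set, other bits kept. */
static inline uint32_t sctlr_with_fi(uint32_t sctlr)
{
    return sctlr | SCTLR_FI;
}

#if defined(__arm__)
/* Read-modify-write of CP15 c1. Must run in a privileged mode. The TRM may
 * require a prefetch flush after changing this register; not shown here. */
static inline void enable_low_latency_fiq(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(v));
    v = sctlr_with_fi(v);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(v) : "memory");
}
#endif
```

Doing a read-modify-write rather than writing a constant leaves the MMU, cache and alignment settings already in the register untouched.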

  • Thank you! I'm going to test whether this improves the accuracy.