
ARMv7-A: Cache maintenance operation by VA, performance

Hi,

according to this talk, cache maintenance should always be performed by VA and not by set/way, except during boot or shutdown. However, invalidating or cleaning a block of data by VA requires a loop over the entire memory block, in steps equal to the cache line size. If the memory block is large (say, a frame buffer that is subsequently read by DMA on a system without hardware cache coherency), the loop overhead alone is significant. If the memory block is much larger than the cache, most of the maintenance operations are effectively NOPs because the targeted addresses aren't cached, but the software can't know that. Maintenance by set/way, with a loop that iterates through all sets and ways (such as the example given on p. 8-20 of the Cortex-A Programmer's Guide), has a fixed runtime independent of the buffer size and will be faster for large buffers.
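To make the trade-off concrete, here is a minimal sketch of a clean-by-VA loop next to the operation count of a set/way pass. The 64-byte line size and the function names `clean_range_by_va` and `set_way_ops` are assumptions for illustration; real code would read the line size from the cache type registers. On a non-ARM host the maintenance instruction is compiled out, so only the counting remains. The sketch also assumes `start + len` does not wrap around the address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; real code should read it from CTR/CCSIDR. */
#define LINE_SIZE 64u

/* DCCMVAC: clean one data cache line by MVA to the point of coherency.
 * Compiled to nothing on non-ARM hosts so the loop can be exercised there. */
static inline void dccmvac(uintptr_t va)
{
#if defined(__arm__)
    __asm__ volatile("mcr p15, 0, %0, c7, c10, 1" : : "r"(va) : "memory");
#else
    (void)va;
#endif
}

/* Clean [start, start + len) by VA. Returns the number of line operations
 * issued, which is the loop overhead discussed above. */
size_t clean_range_by_va(uintptr_t start, size_t len)
{
    if (len == 0)
        return 0;

    uintptr_t va  = start & ~(uintptr_t)(LINE_SIZE - 1); /* round down to a line */
    uintptr_t end = start + len;
    size_t ops = 0;

    for (; va < end; va += LINE_SIZE) {
        dccmvac(va);
        ops++;
    }
#if defined(__arm__)
    __asm__ volatile("dsb" : : : "memory"); /* wait for the cleans to complete */
#endif
    return ops;
}

/* Number of operations a full set/way pass needs for one cache level:
 * sets * ways, which equals cache size / line size. */
size_t set_way_ops(size_t cache_bytes, size_t line_bytes)
{
    assert(line_bytes != 0 && cache_bytes % line_bytes == 0);
    return cache_bytes / line_bytes;
}
```

With hypothetical numbers, an 800x480 frame buffer at 4 bytes per pixel (1,536,000 bytes) takes 24,000 operations by VA, while a set/way pass over a 32 KiB L1 plus a 256 KiB L2 with 64-byte lines takes 512 + 4096 = 4608. This only illustrates the cost argument; it does not answer whether set/way is architecturally safe here.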

A frame buffer could probably simply be marked as uncached, but that is not a general solution for every use case. So, how do I correctly invalidate/clean the cache for large memory buffers? If I am using a single-core Cortex-A8 with no L3 cache, would set/way be correct?

Thanks,
Niklas

  • That's right, but there might be other cases where a cache is virtually required, e.g. when receiving a large data block via USB, Ethernet, or MMC/SDIO (e.g. an SD card or an SDIO-based Wi-Fi adapter), because multiple read accesses benefit from the cache.
    Of course, for receiving, I need to do an invalidate instead of a clean, but the issue stays the same. According to the mentioned talk, two invalidate loops are necessary, which makes it worse.
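A rough sketch of the receive case with the two invalidate loops, as I understand it: one pass before the DMA starts and one after it completes, the second one presumably to discard lines that were speculatively fetched during the transfer. The names `invalidate_range_by_va`, `dma_receive`, `dma_fn` and `dma_noop` are hypothetical placeholders, not a real driver API, and the 64-byte line size is an assumption. The buffer is required to be line-aligned, because invalidating a partially covered line would throw away neighbouring data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; real code should read it from CTR/CCSIDR. */
#define LINE_SIZE 64u

/* DCIMVAC: invalidate one data cache line by MVA to the point of coherency.
 * Compiled to nothing on non-ARM hosts so the logic can be exercised there. */
static inline void dcimvac(uintptr_t va)
{
#if defined(__arm__)
    __asm__ volatile("mcr p15, 0, %0, c7, c6, 1" : : "r"(va) : "memory");
#else
    (void)va;
#endif
}

/* Invalidate a line-aligned range by VA; returns the number of operations. */
size_t invalidate_range_by_va(uintptr_t start, size_t len)
{
    /* Partial lines would lose unrelated data sharing the same line. */
    assert(start % LINE_SIZE == 0 && len % LINE_SIZE == 0);

    size_t ops = 0;
    for (uintptr_t va = start; va < start + len; va += LINE_SIZE) {
        dcimvac(va);
        ops++;
    }
#if defined(__arm__)
    __asm__ volatile("dsb" : : : "memory"); /* wait for completion */
#endif
    return ops;
}

/* Hypothetical DMA hooks: a real driver would program and poll the controller. */
typedef void (*dma_fn)(uintptr_t buf, size_t len);

void dma_noop(uintptr_t buf, size_t len)
{
    (void)buf;
    (void)len;
}

/* Receive into buf with an invalidate pass before and after the transfer.
 * Returns the total number of maintenance operations issued. */
size_t dma_receive(uintptr_t buf, size_t len, dma_fn start_dma, dma_fn wait_dma)
{
    size_t ops = invalidate_range_by_va(buf, len); /* pass 1: before the transfer */
    start_dma(buf, len);
    wait_dma(buf, len);
    ops += invalidate_range_by_va(buf, len);       /* pass 2: after the transfer */
    return ops;
}
```

For a 4 KiB receive buffer this already issues 128 line operations, so the loop overhead from the original question is doubled on the receive path.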
