I am asking this to better understand how to manage cache consistency for an I/O buffer used for a DMA read.
Is it possible that, between the cache invalidation of the buffer and the end of the DMA read, a speculative data access brings incomplete data into the cache?
When should the cache for a DMA read buffer (device to CPU) be invalidated: before starting the DMA, or after the DMA ends?
These two articles should help: [1], [2]
@a.surati Thank you but these articles do not look helpful. The question is about speculative data reads on CM7 (specifically in my case, STM32H7) and later (CM23/33... 85)
(answering my own question)
The definitive answer can be found in the CM7 TRM for r1p2 (2018), section 5.2:
"Speculative data reads can be initiated to any Normal, read/write, or read-only memory address. In some rare cases, this can occur regardless of whether there is any instruction that causes the data read."
"Speculative cache linefills are never made to Non-cacheable memory addresses"
My conclusion from this: if the DMA read buffer is defined as non-cacheable, a speculative read can still occur, but it won't pollute the D-cache.
Speculative linefills from non-cacheable memory are forbidden.
So, defining a DMA buffer as non-cacheable, or as non-Normal memory (Strongly-ordered or Device), should prevent D-cache pollution.
"Cache maintenance" cannot avoid speculative linefills during a DMA read if the buffer is Normal cacheable memory.
(What is said there about TCM memories is not relevant because TCM memories are normally not used with DMA)
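To make the non-cacheable option concrete, here is a minimal sketch of how the attribute word for such a region could be built. It assumes the ARMv7-M MPU_RASR field layout (TEX=0b001, C=0, B=0 for Normal non-cacheable memory); the helper names mpu_rasr_noncacheable and mpu_region_base_ok are hypothetical, and writing MPU_RBAR/MPU_RASR and enabling the MPU are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR field positions (as used on Cortex-M7). */
#define RASR_ENABLE     (1u << 0)   /* region enable */
#define RASR_SIZE_POS   1u          /* SIZE[5:1], region = 2^(SIZE+1) bytes */
#define RASR_TEX_POS    19u         /* TEX[21:19] */
#define RASR_AP_POS     24u         /* AP[26:24] */
#define RASR_XN         (1u << 28)  /* execute never */

#define AP_FULL_ACCESS  3u          /* read/write, privileged and unprivileged */
#define TEX_NORMAL_NC   1u          /* with C=0, B=0: Normal, non-cacheable */

/* Hypothetical helper: RASR value for a non-cacheable, non-executable,
 * read/write region. Returns 0 if the size is not a power of two >= 32,
 * which the ARMv7-M MPU cannot describe as a single region. */
static uint32_t mpu_rasr_noncacheable(uint32_t size_bytes)
{
    if (size_bytes < 32u || (size_bytes & (size_bytes - 1u)) != 0u) {
        return 0u;
    }
    uint32_t log2_size = 0u;
    while ((1u << log2_size) < size_bytes) {
        log2_size++;
    }
    return RASR_XN
         | (AP_FULL_ACCESS << RASR_AP_POS)
         | (TEX_NORMAL_NC << RASR_TEX_POS)   /* C and B bits stay 0 */
         | ((log2_size - 1u) << RASR_SIZE_POS)
         | RASR_ENABLE;
}

/* Hypothetical helper: the region base must be aligned to the region size. */
static bool mpu_region_base_ok(uint32_t base, uint32_t size_bytes)
{
    assert(size_bytes != 0u);
    return (base & (size_bytes - 1u)) == 0u;
}
```

For example, mpu_rasr_noncacheable(32u * 1024u) describes a 32 KB region; the base address that goes with it has to come from the linker script, so the two must be kept in sync.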
Comments are welcome.
The slides from 28 onwards in [2] answer your second question: invalidation is needed both before and after the DMA read. Slide 29 explicitly refers to cache fills due to speculation. They say that yes, speculation can bring cache lines into the cache (although lines brought in this way are not marked dirty unless the CPU writes to them), and then go on to provide a solution for dealing with that situation.
The article in [1] is about CM7; it gives the sequence of cache-maintenance instructions required to properly operate a DMA read on an RX buffer (a buffer that the CPU will not write to; this assumption is not made in [2], so [2] is more generic).
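The effect of a speculative linefill during the DMA can be illustrated with a toy host-side model. This is only a sketch under stated assumptions: the "cache" is two 32-byte lines with valid bits, every toy_* name is hypothetical, dirty lines are not modelled, and the speculative fill is triggered by hand. On a real CM7 with CMSIS, the invalidate step would typically be a call such as SCB_InvalidateDCache_by_Addr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u                   /* Cortex-M7 D-cache line size in bytes */
#define BUF_SIZE  64u                   /* hypothetical RX buffer: two lines */
#define NUM_LINES (BUF_SIZE / LINE_SIZE)

static uint8_t ram[BUF_SIZE];           /* memory as written by the "DMA" */
static uint8_t cache_data[BUF_SIZE];    /* toy D-cache contents */
static bool    line_valid[NUM_LINES];   /* toy per-line valid bits */

/* Toy stand-in for invalidate-by-address over [off, off + len). */
static void toy_invalidate(size_t off, size_t len)
{
    assert(off % LINE_SIZE == 0 && len % LINE_SIZE == 0);
    assert(off + len <= BUF_SIZE);
    for (size_t l = off / LINE_SIZE; l < (off + len) / LINE_SIZE; l++) {
        line_valid[l] = false;
    }
}

/* A linefill copies one line from memory into the cache. */
static void toy_linefill(size_t line)
{
    memcpy(&cache_data[line * LINE_SIZE], &ram[line * LINE_SIZE], LINE_SIZE);
    line_valid[line] = true;
}

/* A CPU read hits the cache if the line is valid, else fills it first. */
static uint8_t toy_cpu_read(size_t off)
{
    size_t line = off / LINE_SIZE;
    if (!line_valid[line]) {
        toy_linefill(line);
    }
    return cache_data[off];
}

/* One DMA read with a speculative fill in the middle. Returns the first
 * byte the CPU sees: 0xAA is stale data, 0x55 is the new DMA data. */
static uint8_t toy_dma_read(bool invalidate_after)
{
    memset(ram, 0xAA, BUF_SIZE);        /* old buffer contents */
    toy_invalidate(0, BUF_SIZE);        /* invalidate before starting the DMA */
    toy_linefill(0);                    /* speculative fill while DMA is in flight */
    memset(ram, 0x55, BUF_SIZE);        /* DMA completes, memory has new data */
    if (invalidate_after) {
        toy_invalidate(0, BUF_SIZE);    /* discard speculatively filled lines */
    }
    return toy_cpu_read(0);
}
```

In this model toy_dma_read(false) returns the stale byte and toy_dma_read(true) returns the DMA data, which is the case the second invalidation is there to cover.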
All your questions are answered in the links [1] and [2].
From your answer to yourself, it seems that you are not interested in dealing with speculative accesses or caching. If caching and speculative accesses are being disabled altogether, I wonder what the point was in raising questions about handling those very aspects one wants disabled. No point in asking for a scalpel when one is looking for a hammer.
Pavel A said: "Cache maintenance" cannot avoid speculative linefills during DMA read if the buffer is normal cacheable memory.
Are you saying that all such DMA-read buffers are stored in non-cacheable or device memory? The slides in [2] provide a sequence in which cache fills due to speculation and dirty cache lines due to CPU writes can both be handled on a cacheable DMA-read buffer.
a.surati You are absolutely right, your link [2] exactly answers my question (I dismissed it on 1st reading, as it says it is about Cortex-A v8, while my question is about Cortex M v7).
The Microchip article [1] is formally correct, but it misses the very important explanation of speculative linefills: why the cache must be invalidated again after the DMA.
The slide show by Mark Rutland [2] seems to provide the ultimate correct answer: invalidate the cache both before and after the DMA read. But combined with his earlier remarks about how complicated all this is, and how much more complicated it will become, it makes me look for simpler, more robust solutions. What you call a hammer, yes. I want my drivers to pass code review without questions.
But setting up a non-cached area in the MPU is not so easy (it must be synchronized with the linker script...), so after all, cache maintenance could be better for code robustness. The MPU of the CM7 is too limited.
Again, thank you for pointing to these sources. Your answer has been noticed on the STM32 forum, and it looks like some people there are surprised. Some of the official STM32 libraries and examples are affected.