I'm trying to understand read allocate mode in the Cortex-A7 core. From the description of read allocate mode in the Cortex-A7 TRM, my understanding is that it will degrade bzero performance when memset is run over consecutive memory. For example, the lmbench bzero test allocates 100 MB of memory, clears it via memset, and then measures the performance. Please correct me if my understanding is wrong.
The L1 data cache only supports a Write-Back policy. It normally allocates
a cache line on either a read miss or a write miss. However, there are some
situations where allocating on writes is undesirable, such as executing the
C standard library memset() function to clear a large block of memory to a
known value. Writing large blocks of data like this can pollute the cache
with unnecessary data. It can also waste power and performance if a
linefill must be performed only to discard the linefill data because the
entire line was subsequently written by the memset().
To prevent this, the Bus Interface Unit (BIU) includes logic to detect when
a full cache line has been written by the processor before the linefill has
completed. If this situation is detected on three consecutive linefills, it
switches into read allocate mode. When in read allocate mode, loads
behave as normal and can still cause linefills, and writes still lookup in the
cache but, if they miss, they write out to L2 rather than starting a linefill.
The BIU continues in read allocate mode until it detects either a cacheable
write burst to L2 that is not a full cache line, or there is a load to the same
line as is currently being written to L2.
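To make the entry and exit conditions in the excerpt above easier to follow, here is a minimal sketch of them as a small state machine in C. It is only a toy model, not a description of the actual hardware: the type, event, and function names are all made up for illustration, and the rule that any other event resets the "three consecutive linefills" count is my assumption, since the excerpt does not spell it out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the mode switch described in the TRM excerpt.
 * Every name below is hypothetical; none comes from ARM documentation. */

enum alloc_mode { WRITE_ALLOCATE, READ_ALLOCATE };

enum biu_event {
    LINEFILL_FULLY_OVERWRITTEN, /* whole line written before its linefill completed */
    LINEFILL_USED,              /* linefill completed and its data was kept */
    PARTIAL_WRITE_BURST,        /* cacheable write burst to L2 that is not a full line */
    LOAD_TO_LINE_BEING_WRITTEN  /* load to the line currently being written to L2 */
};

struct biu_model {
    enum alloc_mode mode;
    int consecutive_overwritten; /* count of back-to-back overwritten linefills */
};

#define OVERWRITE_THRESHOLD 3    /* "three consecutive linefills" in the excerpt */

void biu_init(struct biu_model *b)
{
    assert(b != NULL);
    b->mode = WRITE_ALLOCATE;
    b->consecutive_overwritten = 0;
}

void biu_step(struct biu_model *b, enum biu_event ev)
{
    assert(b != NULL);
    if (b->mode == WRITE_ALLOCATE) {
        if (ev == LINEFILL_FULLY_OVERWRITTEN) {
            b->consecutive_overwritten++;
            if (b->consecutive_overwritten >= OVERWRITE_THRESHOLD) {
                b->mode = READ_ALLOCATE;
            }
        } else {
            /* Assumption: anything else breaks the "consecutive" run. */
            b->consecutive_overwritten = 0;
        }
    } else {
        /* The excerpt names exactly these two exit conditions. */
        if (ev == PARTIAL_WRITE_BURST || ev == LOAD_TO_LINE_BEING_WRITTEN) {
            b->mode = WRITE_ALLOCATE;
            b->consecutive_overwritten = 0;
        }
    }
}

/* Count the linefills started while a memset-like loop writes `lines`
 * consecutive full cache lines that all miss in the cache. */
int linefills_for_full_line_writes(int lines)
{
    struct biu_model b;
    int linefills = 0;

    biu_init(&b);
    for (int i = 0; i < lines; i++) {
        if (b.mode == WRITE_ALLOCATE) {
            linefills++; /* a write miss starts a linefill in this mode */
            biu_step(&b, LINEFILL_FULLY_OVERWRITTEN);
        }
        /* In READ_ALLOCATE the write goes out to L2 and no linefill starts. */
    }
    return linefills;
}
```

Under these assumptions, `linefills_for_full_line_writes(100)` counts only the first three linefills, after which the model stays in read allocate mode for the rest of the run.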
I'm sorry, but I don't quite follow your concern.
The statements you quoted only describe that sequential data will bypass the data cache if the CPU notices the sequential property of the data before starting the cache refill (i.e. read or write allocation).
Regarding whether read allocate mode will impact bzero performance, the answer is no, because the read allocation would not happen.
Thanks a lot for your comments.
Do you mean that the scenario I described will not trigger read allocate mode?
The bzero test in lmbench calls memset to clear sequential memory. This may satisfy the read allocate condition, which detects three consecutive linefills whose lines were fully written before the linefill completed.
Is my understanding correct?
Of course, you are right.
I understood your question to be about the effect of read allocate mode on bzero performance.
Therefore, I answered that there would be no effect.
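My understanding of the statements is that, once the core is in read allocate mode, any write that misses will bypass the data cache. This means that, to get back to write allocate mode, at least one read to the line being written is necessary (or, per the TRM text, a cacheable write burst that is not a full cache line).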
Regarding the bzero (or memset) case, since there are only sequential writes, read allocate mode will not decrease performance.
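For anyone who wants to check this on their own board, here is a rough sketch of a bzero-style timing loop in C. This is not lmbench's code; it only follows the description in this thread (allocate a large buffer, clear it with memset, time it). The function name `time_memset_seconds` and its parameters are invented for the example, and it assumes a POSIX system that provides `clock_gettime` with `CLOCK_MONOTONIC`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime on strict C compilers */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: time `repeats` memset passes over a `bytes`-sized
 * buffer. Returns elapsed seconds, or -1.0 on bad arguments or failure. */
double time_memset_seconds(size_t bytes, int repeats)
{
    if (bytes == 0 || repeats <= 0) {
        return -1.0;
    }

    unsigned char *buf = malloc(bytes);
    if (buf == NULL) {
        return -1.0;
    }

    /* Call memset through a volatile function pointer so the compiler
     * cannot drop or merge the calls. */
    void *(*volatile memset_fn)(void *, int, size_t) = memset;

    /* Touch every page once so page faults are not part of the timing. */
    memset_fn(buf, 1, bytes);

    struct timespec t0, t1;
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0) {
        free(buf);
        return -1.0;
    }
    for (int i = 0; i < repeats; i++) {
        memset_fn(buf, 0, bytes); /* sequential full-line writes */
    }
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0) {
        free(buf);
        return -1.0;
    }

    /* Sanity check that the buffer really was cleared. */
    assert(buf[0] == 0 && buf[bytes - 1] == 0);
    free(buf);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling it with something like `time_memset_seconds(100u * 1024 * 1024, 10)` roughly matches the 100 MB size mentioned in the question. The result depends on the whole memory system, so a timing like this cannot isolate read allocate mode by itself.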