I am using the Cortex-A53 processor (Xilinx Zynq Ultrascale+ SoC).
My problem is that I get a high BUS_ACCESS_LD count in write-streaming (read-allocate) mode when I do a memset (a self-written memset in assembly). On the Xilinx chip I can also measure the write and read byte counts at the DDR memory controller ports, and there the actual read byte count is not that high.
Testcase 1: memset of 1,085,440 bytes, write-streaming disabled:
L2D_CACHE: 32776
BUS_ACCESS_LD: 65547
L1D_CACHE_REFILL: 16386
L1D_CACHE_WB: 16386
L2D_CACHE_REFILL: 16388
L2D_CACHE_WB: 16375
DDRC.S1 Write Byte Count: 524160
DDRC.S1 Read Byte Count: 524480
DDRC.S2 Write Byte Count: 523840
DDRC.S2 Read Byte Count: 524416
One cache line is 64 bytes. BUS_ACCESS counts beats, and the data width of the bus is 16 bytes, so one cache line takes four beats. With that in mind, these values make sense.
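To make that consistency check explicit, here is a small Python sketch using only the numbers posted for testcase 1. The variable names are mine, and it assumes that every L2 refill is a full 64-byte line read over the bus and that DDRC ports S1 and S2 together carry all of the traffic from the cluster.

```python
# Sanity check for testcase 1 (write-streaming disabled).
# Assumption: each L2 refill reads one full cache line over the bus.
# Assumption: DDRC ports S1 + S2 together see all traffic from the cluster.

CACHE_LINE_BYTES = 64  # from the post
BUS_BEAT_BYTES = 16    # from the post
BEATS_PER_LINE = CACHE_LINE_BYTES // BUS_BEAT_BYTES

# PMU counts as measured in testcase 1
l2d_cache_refill = 16388
l2d_cache_wb = 16375
bus_access_ld = 65547

# DDR controller byte counts, S1 + S2
ddr_read_bytes = 524480 + 524416
ddr_write_bytes = 524160 + 523840

# Read beats we would expect if every L2 refill is a full line read
expected_ld_beats = l2d_cache_refill * BEATS_PER_LINE

# DDR traffic expressed in cache lines
ddr_read_lines = ddr_read_bytes // CACHE_LINE_BYTES
ddr_write_lines = ddr_write_bytes // CACHE_LINE_BYTES

print("BUS_ACCESS_LD expected:", expected_ld_beats, "measured:", bus_access_ld)
print("DDR read lines:", ddr_read_lines, "L2D_CACHE_REFILL:", l2d_cache_refill)
print("DDR write lines:", ddr_write_lines, "L2D_CACHE_WB:", l2d_cache_wb)
```

Under these assumptions the expected read beat count is 65552 against a measured 65547, the DDR read traffic corresponds to 16389 lines against 16388 L2 refills, and the DDR write traffic corresponds to exactly 16375 lines, which equals L2D_CACHE_WB.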
Testcase 2: memset of 1,085,440 bytes, write-streaming enabled:
L2D_CACHE: 16388
BUS_ACCESS_LD: 16419
L1D_CACHE_REFILL: 6
L1D_CACHE_WB: 0
L2D_CACHE_REFILL: 9
L2D_CACHE_WB: 16255
DDRC.S1 Write Byte Count: 520128
DDRC.S1 Read Byte Count: 384
DDRC.S2 Write Byte Count: 520192
DDRC.S2 Read Byte Count: 64
The values of L2D_CACHE, L2D_CACHE_WB and BUS_ACCESS_LD are close together. It makes sense that the cache refill counts are low and that the L1 write-back count is also low. But I do not understand why BUS_ACCESS_LD is so large in this case, because the DDR memory controller ports show that only a few bytes are read.
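The mismatch can be quantified with the same kind of arithmetic. This is only a sketch on the posted testcase 2 numbers, with variable names of my own, and it again assumes that DDRC ports S1 and S2 together see all reads and writes from the cluster. It describes the discrepancy; it does not explain its cause.

```python
# Quantify the BUS_ACCESS_LD discrepancy for testcase 2 (write-streaming enabled).
# Assumption: DDRC ports S1 + S2 together see all traffic from the cluster.

CACHE_LINE_BYTES = 64  # from the post
BUS_BEAT_BYTES = 16    # from the post

# PMU counts as measured in testcase 2
bus_access_ld = 16419
l2d_cache_wb = 16255

# DDR controller byte counts, S1 + S2
ddr_read_bytes = 384 + 64
ddr_write_bytes = 520128 + 520192

# Read beats that the DDR read traffic can account for
ddr_read_beats = ddr_read_bytes // BUS_BEAT_BYTES

# Written cache lines as seen by the DDR controller
ddr_write_lines = ddr_write_bytes // CACHE_LINE_BYTES

# BUS_ACCESS_LD counts with no matching DDR read traffic
unexplained_ld = bus_access_ld - ddr_read_beats

print("read beats backed by DDR reads:", ddr_read_beats)
print("unexplained BUS_ACCESS_LD counts:", unexplained_ld)
print("DDR write lines:", ddr_write_lines, "L2D_CACHE_WB:", l2d_cache_wb)
print("unexplained counts per written line:", unexplained_ld / l2d_cache_wb)
```

The DDR reads account for only 28 beats, which leaves 16391 BUS_ACCESS_LD counts without matching read traffic. The DDR write traffic corresponds to exactly 16255 lines, the same as L2D_CACHE_WB, so the unexplained counts come out at roughly one per written cache line.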
There is an erratum for the Cortex-A53 titled "PMU counter values might be inaccurate when monitoring certain events", but it mentions only BUS_ACCESS and BUS_ACCESS_ST. Is there a similar error with BUS_ACCESS_LD when write-streaming is enabled?