Hi,
It's slightly unclear to me what the reported L/S cycles refer to. Since malioc does not take memory latency etc. into account, are those cycles just related to the number of instructions issued to fetch attribute data and store the pre-interpolated varying results?
e.g.
                           A       LS      T       Bound
Total instruction cycles:  20.60   35.00   0.00    LS
Shortest path cycles:      16.60   29.00   0.00    LS
Longest path cycles:       N/A     N/A     N/A     N/A
Cheers
Hi JPJ,
The aim is that it reports architectural throughput for the Load/Store pipeline, i.e. the number of active cycles spent doing useful work.
In terms of what it counts:
L/S includes any non-texture memory access: attributes, UBOs, SSBOs, atomics, images, local memory in compute shaders, stack spills, and programmatic tile access.
One caveat on the newer hardware (Bifrost / Valhall) is that the LS metric may over-estimate. The hardware can merge LS accesses for threads in the same warp if they hit the same cache line, but the compiler cannot know if this happens at compile time so you get the conservative number.
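To illustrate why the static number is conservative, here is a toy Python model of the merging described above. It is only a sketch: the 64-byte cache line, the 16-thread warp width, and the function name are assumptions for illustration, not hardware specifications. It counts how many distinct cache lines one warp's load addresses touch, which is information the compiler does not have at compile time.

```python
def distinct_cache_lines(addresses, line_bytes=64):
    """Count the distinct cache lines touched by one warp's load addresses.

    line_bytes=64 is an assumed cache line size, used for illustration only.
    """
    return len({addr // line_bytes for addr in addresses})


WARP_WIDTH = 16  # assumed warp width, illustration only

# Every thread reads a neighbouring 4-byte value: 16 * 4 = 64 bytes, one line.
same_line = [0x1000 + 4 * lane for lane in range(WARP_WIDTH)]

# Every thread reads 256 bytes apart: each access lands in its own line.
scattered = [0x1000 + 256 * lane for lane in range(WARP_WIDTH)]

print(distinct_cache_lines(same_line))
print(distinct_cache_lines(scattered))
```

In this model the first pattern could be merged into a single access while the second could not be merged at all; the static report has to assume something closer to the second case.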
HTH, Pete
Thanks for the reply Pete. So, my interpretation of "architectural throughput" is a combination of memory-related cycles (hits/misses, latency) and instructions - is this assumption correct?
Regarding the interpolator cycles, wouldn't that scale with the number of pixels a primitive spans? So, more pixels, more cycles?
"Architectural throughput" is just the processing cost of "doing" the instruction. Most of the time the GPU can hide misses and fetch latency - we have other things to run in parallel - so that's all ignored for the purposes of this metric.
For the interpolator costing, the cycle cost here is per fragment, so primitive size doesn't matter for this metric (but it would for determining total draw call cost - you need to scale these numbers by your screen coverage).
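As a rough sketch of that scaling in Python: the function name, the 2.5 cycles-per-fragment figure, the 1920x1080 resolution, and the 25% coverage are all hypothetical values picked for illustration. The result is an architectural cycle count only; it ignores overdraw, fragments killed before shading, and how the work is spread over shader cores.

```python
def estimate_draw_cycles(cycles_per_fragment, width, height, coverage):
    """Scale a per-fragment cycle cost by the number of fragments shaded.

    coverage is the fraction of the render target the draw call covers (0..1).
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    fragments = width * height * coverage
    return cycles_per_fragment * fragments


# Hypothetical inputs: 2.5 cycles per fragment, 1080p target, 25% coverage.
total = estimate_draw_cycles(2.5, 1920, 1080, 0.25)
print(f"{total:,.0f} cycles")
```

Doubling the coverage doubles the estimate, while the per-fragment number from the report stays the same.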
Thanks for the clarification Pete! So, for the Midgard case this metric combines a mix of vertex and fragment stage cost: a "fixed" cost for the 3 vertices (in case of a triangle) and a variable cost (coverage-dependent) for the fragment side, correct?