Mali offline compiler - L/S cycles meaning

Hi, 

It's slightly unclear to me what the reported L/S cycles refer to. Since malioc does not take memory latency and similar effects into account, are those cycles just related to the number of instructions issued to fetch attribute data and store the pre-interpolated varying results?

e.g. 

                               A      LS       T    Bound
Total instruction cycles:   20.60   35.00    0.00       LS
Shortest path cycles:       16.60   29.00    0.00       LS
Longest path cycles:          N/A     N/A     N/A      N/A

Cheers

  • Hi JPJ, 

    The aim is that it reports architectural throughput for the Load/Store pipeline, i.e. the number of cycles the pipeline is active doing useful work.
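As a rough illustration of how the per-pipeline numbers relate to the "Bound" column, here is a minimal Python sketch that parses the table from the question and picks the pipeline with the highest cycle count. It assumes the bound is simply the largest per-pipe value, which is consistent with the table above but is not stated in this thread; `parse_cycles` and `bound_pipe` are hypothetical helper names, not part of malioc.

```python
# Report text copied from the question above.
REPORT = """\
                               A      LS       T    Bound
Total instruction cycles:   20.60   35.00    0.00       LS
Shortest path cycles:       16.60   29.00    0.00       LS
Longest path cycles:          N/A     N/A     N/A      N/A
"""


def parse_cycles(report):
    """Return {row label: {pipe: cycles}}, or None for rows reported as N/A."""
    lines = [ln for ln in report.splitlines() if ln.strip()]
    pipes = lines[0].split()[:-1]  # header columns without "Bound"
    rows = {}
    for ln in lines[1:]:
        label, _, values = ln.partition(":")
        fields = values.split()[:len(pipes)]
        if "N/A" in fields:
            rows[label.strip()] = None
        else:
            rows[label.strip()] = dict(zip(pipes, map(float, fields)))
    return rows


def bound_pipe(cycles):
    """Assumption: the bounding pipe is the one with the most cycles."""
    return max(cycles, key=cycles.get)


rows = parse_cycles(REPORT)
total = rows["Total instruction cycles"]
print(bound_pipe(total), total["LS"])
```

For the table in the question this selects LS for both the total and the shortest path, matching the reported "Bound" column.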

    In terms of what it counts:

    • For Midgard family GPUs the LS pipe includes the interpolator - there is no separate varying pipeline.
    • For Bifrost and Valhall family GPUs the LS pipe excludes the interpolation - the interpolator is a separate unit reported as the "V" pipe in the reports. 

    L/S includes any non-texture memory access (attributes, UBOs, SSBOs, atomics, images, local memory in compute shaders, stack spills, programmatic tile access).

    One caveat on the newer hardware (Bifrost / Valhall) is that the LS metric may over-estimate. The hardware can merge LS accesses for threads in the same warp if they hit the same cache line, but the compiler cannot know at compile time whether this happens, so you get the conservative number.
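The gap between the conservative estimate and what merging can achieve can be sketched with a toy model in Python. This is only an illustration of the idea described above, not the actual hardware behaviour: the warp width of 4 and the 64-byte cache line are assumed values chosen for the example, and the function names are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumption for illustration only
WARP_WIDTH = 4         # assumption for illustration only


def conservative_accesses(addresses):
    """Compile-time view: one access per thread, no merging assumed."""
    return len(addresses)


def merged_accesses(addresses, line_bytes=CACHE_LINE_BYTES):
    """Runtime view in this toy model: one access per distinct cache line."""
    return len({addr // line_bytes for addr in addresses})


# Every thread in the warp reads the same address (e.g. a uniform value).
same_line = [0x1000] * WARP_WIDTH
# Every thread reads from a different cache line (256-byte stride).
strided = [0x1000 + i * 256 for i in range(WARP_WIDTH)]

for name, addrs in (("same line", same_line), ("strided", strided)):
    print(name, conservative_accesses(addrs), merged_accesses(addrs))
```

In this model the compile-time count is 4 for both patterns, while the merged count drops to 1 only when all threads hit the same line. The addresses are runtime data, which is why a static tool has to report the conservative figure.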

    HTH, 
    Pete
