Confusion about the cycles in Streamline

In Streamline, multiple counters are measured in cycles, e.g. MaliGPUCyclesGPUActive, MaliGPUCyclesNonFragmentQueueActive, MaliGPUCyclesFragmentQueueActive, and MaliCoreTextureCyclesTexturingActive.

In my understanding, these cycles describe elapsed time along the timeline, not the sum of the clock cycles consumed by the individual physical units.


If my understanding is correct, how is the cost of a specific unit such as the texture unit calculated? For example, suppose two texture units work at the same time and each consumes 1 cycle. Is the texture unit's cycle count then 1 or 2? By my understanding it should be 1, but is that correct?


In addition, according to the ppt above, these cycles should be the sum of the clocks consumed by the various workloads. It says that GPU cycles can be compared with the GPU frequency to understand the GPU load, but that comparison looks wrong under my understanding above, because the cycles are not the sum of the cycles of multiple parallel workloads but the span on the timeline.

  • Hi Shawn,

    The GPU has many parallel queues and pipelines. Many of the counters show "something was running" in a particular queue; others show the actual utilization of a particular block in the hardware.

    Queues can contain multiple parallel units, and parallel units can be used concurrently by multiple queues, so in general things *don't* sum together in any meaningful way.

    I'd start by reading through one of our counter guides; the diagrams help explain the hierarchy a little more. For example, for Mali-G77:

     If my understanding is correct, how is the cost of specific units such as texture units calculated? For example, there are two texture units working at the same time, and each consumes 1 cycle.  

    Shader core counters are shown "per core", so they are frequency normalized for the target GPU. You don't need to worry about the shader core count; you can just compare with frequency.

    Similarly, if a shader core has e.g. two arithmetic pipelines, counters are only shown for pipeline zero. The workload across the parallel units will be approximately equal, and showing only unit zero also means that the data is implicitly normalized and can be compared directly with frequency.
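As a rough illustration of the two points above, here is a minimal Python sketch. It is a toy model and not how the hardware or Streamline computes anything: the function names `active_cycles` and `utilization`, the busy intervals, the sample length, and the 1 GHz clock are all hypothetical. It treats a "something was running" counter as the number of cycles covered by at least one busy interval, and a per-core counter as a cycle count that can be divided by the cycles available in the sample.

```python
def active_cycles(intervals):
    """Count cycles covered by at least one half-open (start, end) interval.

    Toy model of a "something was running" counter: overlapping work is
    counted once, so the result is a span on the timeline, not a sum.
    """
    total = 0
    current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: count it in full.
            total += end - start
            current_end = end
        elif end > current_end:
            # Overlaps or touches the previous span: count only the new part.
            total += end - current_end
            current_end = end
    return total


def utilization(counter_cycles, sample_seconds, gpu_hz):
    """Fraction of available cycles in which the counted unit was active."""
    return counter_cycles / (sample_seconds * gpu_hz)


if __name__ == "__main__":
    # Hypothetical busy intervals, in cycles, for two queues that overlap.
    non_fragment = [(0, 60)]
    fragment = [(40, 100)]

    summed = active_cycles(non_fragment) + active_cycles(fragment)
    combined = active_cycles(non_fragment + fragment)
    print(summed, combined)  # 120 100: the queue counters do not sum

    # Two parallel units busy in the same cycle still span one cycle.
    print(active_cycles([(0, 1), (0, 1)]))  # 1

    # Hypothetical per-core texture counter: 500,000 active cycles in a
    # 1 ms sample at an assumed 1 GHz GPU clock.
    print(utilization(500_000, 0.001, 1_000_000_000))
```

In this model the queue counters overlap on the timeline, so adding them overstates the active time, while a single normalized per-core counter can be divided by the available cycles directly.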

    HTH, 
    Pete