Inferring the GPU Active cycles in Mali T-628

Hi,

I have the following questions regarding the Mali T-628 GPU.

I am running a 3D convolution OpenCL kernel on a Mali T-628 GPU (a 4-core device). I obtain the GPU cycle counters using Streamline.

The GPU Vertex-Tiling-Compute:Activity counter shows 100% utilization, which should translate to 100% GPU utilization.

Then the GPU active cycles should match the runtime, right?

In a sampling interval of 1 s at a 600 MHz clock, the GPU active cycles should be 600*10^6, whereas the GPU active cycles reported by Streamline are 3141810.

Why this discrepancy?

Please help me understand this.

Thanks

  • It's unlikely that a silicon system can even clock as low as 3MHz, so it's definitely not 100% utilizing the GPU for only 3M cycles.

    What Mali driver version (should be returned by the vendor string - and should look something like r<N>p<M> - e.g. r8p0) and Streamline gator version are you using? Not sure what I can do to help debug this one remotely - but  I can say it's not working correctly ...

    Cheers, 
    Pete

  • Hi

    Mali driver version: r4p0-02rel0

    Streamline gatord version: 6.10 (DS-5 v5.61)

    My problem is as follows:

    I use a timer to measure the execution time of the OpenCL kernel. For the kernel, which I am executing at 600 MHz, it reports OpenCL Runtime: 8.222271 seconds.

    I profile the same kernel using Streamline (1 s sampling interval) and get the following statistics:

    Time Index    GPU Vertex-Tiling-Compute:Activity    Mali Job Manager Cycles:GPU cycles    Mali Job Manager Cycles:JS1 cycles    Mali Core Cycles:Tripipe cycles    Mali Core Cycles:Compute cycles
    4    0.00%    2432752    0    4004    0

    --- Kernel Start ---
    5    3.58%    23830709    21539073    14360724    14357377
    6    99.55%    597264018    597099182    385657069    385661788
    7    99.52%    597069905    596896010    385843125    385848021
    8    99.64%    476883765    476739306    307584997    307589073
    9    99.36%    327274357    327183748    211693189    211696096
    10    99.52%    597066566    596880847    385265405    385270772
    11    99.58%    597396030    597227832    386222691    386227613
    12    99.52%    597074759    596898333    386156229    386161004
    13    99.50%    596939208    596755666    383900260    383905987
    14    12.45%    76855608    74689989    48011367    48008148

    --- Kernel End ---
    15    0.00%    2432775    0    4004    0

    If GPU activity is 99.36%, the GPU active cycles should be 596160000, but Streamline reports 327274357.

    I calculate my execution time as the sum of all GPU cycles from kernel start to kernel end. That gives 4487654925 cycles = 7.479424875 seconds at 600 MHz, but the actual runtime is 8.222271 seconds.
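The arithmetic in this post can be reproduced with a minimal sketch. It uses the "GPU cycles" column from samples 5 to 14 of the table above and assumes, as the post does, that the GPU clock stays at a constant 600 MHz for the whole run. The names `GPU_CYCLES` and `ASSUMED_CLOCK_HZ` are illustrative and not part of Streamline.

```python
# "Mali Job Manager Cycles:GPU cycles" per 1 s sample, copied from the table
# (time index -> cycle count), covering kernel start to kernel end.
GPU_CYCLES = {
    5: 23_830_709,
    6: 597_264_018,
    7: 597_069_905,
    8: 476_883_765,
    9: 327_274_357,
    10: 597_066_566,
    11: 597_396_030,
    12: 597_074_759,
    13: 596_939_208,
    14: 76_855_608,
}

# Assumption: the GPU runs at a fixed 600 MHz for the entire run.
ASSUMED_CLOCK_HZ = 600_000_000

total_cycles = sum(GPU_CYCLES.values())
estimated_s = total_cycles / ASSUMED_CLOCK_HZ

print(f"total GPU cycles: {total_cycles}")
print(f"estimated time at 600 MHz: {estimated_s:.9f} s")
```

The estimate is only as good as the fixed-clock assumption: any sample taken while the GPU ran below 600 MHz contributes fewer cycles than the time it actually covered.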

    This is the discrepancy I am asking about. Please help me understand how I can fix it.

    Thanks

  • Do you have the ability to rebuild and replace the Mali kernel driver? r4p0 is quite old (almost 5 years old now - our latest release is r18p0) and I suspect has a bug relating to how the counter memory is set up; if you can rebuild and replace it I can try to provide a patch.

  • If you can, please provide the patch. I will try to rebuild the Linux kernel.

  • I've just spotted that in this data you do have data points correctly reporting 599M cycles at 100% utilization, which looks correct to me. This is very different to the 3M cycles you reported in your first comment ...

    As for the data points that report low cycle counts, it's entirely possible that the platform is adjusting frequency and voltage based on idle periods, or based on thermal load if it's overheating due to sustained workloads while running an overclock.

    maasa said:
    7.479424875 seconds but the actual runtime is 8.222271 seconds.

    Software isn't zero cycles - there is some driver load to set up and complete the work.
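To put a number on that gap, here is a small sketch using only the figures quoted in the thread. It assumes a constant 600 MHz GPU clock, so the result is an upper bound on driver overhead: it also absorbs any interval in which the GPU was clocked lower. The function name `unexplained_time` is hypothetical.

```python
# Values quoted in the thread.
WALL_CLOCK_S = 8.222271            # timer around the OpenCL kernel
TOTAL_GPU_CYCLES = 4_487_654_925   # sum of "GPU cycles", samples 5 to 14
ASSUMED_CLOCK_HZ = 600e6           # assumption: constant 600 MHz clock


def unexplained_time(wall_s, cycles, clock_hz):
    """Return (gap in seconds, gap as a fraction of wall-clock time)."""
    counter_s = cycles / clock_hz   # time implied by the cycle counter
    gap_s = wall_s - counter_s      # time not covered by counted GPU cycles
    return gap_s, gap_s / wall_s


gap_s, gap_fraction = unexplained_time(
    WALL_CLOCK_S, TOTAL_GPU_CYCLES, ASSUMED_CLOCK_HZ
)
print(f"gap: {gap_s:.6f} s ({gap_fraction:.1%} of wall-clock time)")
```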

  • Hi,

    Thank you for your inputs. But my question concerns the following data:

    Time Index    GPU Vertex-Tiling-Compute:Activity    Mali Job Manager Cycles:GPU cycles    Mali Job Manager Cycles:JS1 cycles    Mali Core Cycles:Tripipe cycles    Mali Core Cycles:Compute cycles

    6    99.55%    597264018    597099182    385657069    385661788
    7    99.52%    597069905    596896010    385843125    385848021
    8    99.64%    476883765    476739306    307584997    307589073
    9    99.36%    327274357    327183748    211693189    211696096
    10    99.52%    597066566    596880847    385265405    385270772

    Even though GPU Vertex-Tiling-Compute:Activity is 99.64%, why does Streamline report 476M GPU cycles instead of 599M? I checked the temperature of the GPU; it is only ~55 C in this case.

    99.64% activity means there are no idle periods, and at ~55 C the GPU is not overheating either. So why does the discrepancy arise? I am unable to understand.

  • Hi maasa,

    I'm not sure I can give you a better answer. 

    The "Activity" counter is a software metric reported by the kernel driver; i.e. how busy does the driver think the hardware is.

    The other counters are hardware counters reported by the GPU while work is running; i.e. how busy does the hardware think it is.

    The only conclusion I can draw is that your platform BSP is down-clocking the GPU to ~300 MHz for some reason. Frequency management is outside of the Mali driver / hardware and is provided by the chipset manufacturer, so I can't explain why it has decided to do this.
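One way to check the down-clocking explanation against the posted data is to divide each sample's hardware cycle count by the time the driver reports the GPU as busy. This is only a rough sketch: it assumes each Streamline sample spans exactly 1 s, that "Activity" is the busy fraction of that interval, and that the "GPU cycles" counter advances at the GPU clock while work is running. The result is an average over the interval, not an instantaneous frequency, and `effective_clock_mhz` is a hypothetical helper.

```python
SAMPLE_INTERVAL_S = 1.0  # assumption: each Streamline sample spans exactly 1 s

# (time index, Activity as a fraction, "GPU cycles") for samples 6 to 10.
SAMPLES = [
    (6, 0.9955, 597_264_018),
    (7, 0.9952, 597_069_905),
    (8, 0.9964, 476_883_765),
    (9, 0.9936, 327_274_357),
    (10, 0.9952, 597_066_566),
]


def effective_clock_mhz(activity, gpu_cycles, interval_s=SAMPLE_INTERVAL_S):
    """Average GPU clock in MHz implied by cycles counted while busy."""
    busy_s = activity * interval_s  # seconds the driver reports as busy
    return gpu_cycles / busy_s / 1e6


clocks = {idx: effective_clock_mhz(a, c) for idx, a, c in SAMPLES}
for idx, mhz in clocks.items():
    print(f"sample {idx}: ~{mhz:.0f} MHz")
```

Under these assumptions, samples 6, 7 and 10 come out close to 600 MHz, while samples 8 and 9 come out noticeably lower, which is consistent with the clock being reduced for part of that window.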

    HTH, 
    Pete