Hi Peter Harris,
I have the following questions regarding the Mali T-628 GPU.
I am running a 3D convolution OpenCL kernel on a Mali T-628 GPU (a 4-core device). I obtain the GPU cycle counters using Streamline.
The GPU Vertex-Tiling-Compute:Activity counter shows 100% utilization, so this translates to 100% GPU utilization.
In that case the GPU active cycles should match the runtime, right?
In a sampling interval of 1 s, the GPU active cycles should be 600*10^6 (the GPU is clocked at 600 MHz), whereas the GPU active cycles reported by Streamline is 3141810.
Why is there a discrepancy?
Please help me understand this.
Thanks
It's unlikely that a silicon system can even clock as low as 3MHz, so it's definitely not 100% utilizing the GPU for only 3M cycles.
Which Mali driver version and Streamline gator version are you using? The driver version should be returned in the vendor string and should look something like r<N>p<M>, e.g. r8p0. I'm not sure what I can do to help debug this one remotely, but I can say it's not working correctly ...
Cheers, Pete
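As a quick sanity check on the "3MHz" remark, the arithmetic can be sketched in a few lines of Python. The variable names are illustrative only (they are not Streamline fields), and the sketch assumes the reported count covers exactly one 1 s sampling interval.

```python
# Figures quoted in the thread; names are illustrative, not Streamline fields.
NOMINAL_HZ = 600_000_000      # expected GPU clock (600 MHz)
INTERVAL_S = 1.0              # Streamline sampling interval in seconds
reported_cycles = 3_141_810   # GPU active cycles reported for one interval

# If the GPU really were 100% busy for the whole interval, the cycle count
# divided by the interval length would equal the clock frequency.
implied_hz = reported_cycles / INTERVAL_S
fraction_of_nominal = implied_hz / NOMINAL_HZ

print(f"implied clock: {implied_hz / 1e6:.2f} MHz")
print(f"fraction of the nominal clock: {fraction_of_nominal:.2%}")
```

An implied clock of only a few MHz is what makes the combination "100% utilization, 3M cycles" implausible.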
Hi Peter Harris
Mali driver Version -- r4p0-02rel0
Streamline gatord version 6.10 (DS-5 v5.61)
My problem is as follows:
I use a timer to measure the execution time of the OpenCL kernel. With the GPU running at 600 MHz, the measured OpenCL runtime is 8.222271 seconds.
I profile the same kernel using Streamline (1 s sampling interval) and get the following statistics:
Time Index | GPU Vertex-Tiling-Compute:Activity | Mali Job Manager Cycles:GPU cycles | Mali Job Manager Cycles:JS1 cycles | Mali Core Cycles:Tripipe cycles | Mali Core Cycles:Compute cycles
4 | 0.00% | 2432752 | 0 | 4004 | 0
--- Kernel Start ---
5 | 3.58% | 23830709 | 21539073 | 14360724 | 14357377
6 | 99.55% | 597264018 | 597099182 | 385657069 | 385661788
7 | 99.52% | 597069905 | 596896010 | 385843125 | 385848021
8 | 99.64% | 476883765 | 476739306 | 307584997 | 307589073
9 | 99.36% | 327274357 | 327183748 | 211693189 | 211696096
10 | 99.52% | 597066566 | 596880847 | 385265405 | 385270772
11 | 99.58% | 597396030 | 597227832 | 386222691 | 386227613
12 | 99.52% | 597074759 | 596898333 | 386156229 | 386161004
13 | 99.50% | 596939208 | 596755666 | 383900260 | 383905987
14 | 12.45% | 76855608 | 74689989 | 48011367 | 48008148
--- Kernel End ---
15 | 0.00% | 2432775 | 0 | 4004 | 0
If GPU activity is 99.36%, the GPU active cycles should be 596160000, but Streamline reports 327274357.
I calculate my execution time as the sum of all GPU cycles from kernel start to kernel end. This gives 4487654925 cycles = 7.479424875 seconds at 600 MHz, but the actual runtime is 8.222271 seconds.
This is the discrepancy I am talking about. Please help me understand how I can fix this.
Do you have the ability to rebuild and replace the Mali kernel driver? r4p0 is quite old (almost 5 years old now - our latest release is r18p0) and I suspect has a bug relating to how the counter memory is set up; if you can rebuild and replace it I can try to provide a patch.
If you can, please provide the patch. I will try to rebuild the Linux kernel.
I've just spotted that this data does contain data points reporting roughly 597M cycles at close to 100% utilization, which looks correct to me. This is very different to the 3M cycles you reported in your first comment ...
In terms of some data points reporting low, it's entirely possible that the platform is adjusting frequency and voltage based on idle periods or thermal load if it's overheating due to sustained workloads while running an overclock.
maasa said: 7.479424875 seconds but the actual runtime is 8.222271 seconds.
Software isn't zero cycles - there is some driver load to set up and complete the work.
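To make the size of that gap concrete, here is a minimal Python sketch that reproduces the cycle sum from the posted samples (time indices 5 to 14) and compares it with the timer measurement. It assumes, as the original calculation does, that the GPU clock stays at 600 MHz for the whole run; if the clock is lower in some intervals, the same cycle count corresponds to more elapsed time. Variable names are illustrative.

```python
NOMINAL_HZ = 600_000_000          # assumed constant GPU clock (600 MHz)
MEASURED_RUNTIME_S = 8.222271     # wall-clock runtime reported by the timer

# "Mali Job Manager Cycles:GPU cycles" for time indices 5..14 in the thread.
gpu_cycles = [
    23830709, 597264018, 597069905, 476883765, 327274357,
    597066566, 597396030, 597074759, 596939208, 76855608,
]

total_cycles = sum(gpu_cycles)
gpu_time_s = total_cycles / NOMINAL_HZ       # cycles -> seconds at 600 MHz
gap_s = MEASURED_RUNTIME_S - gpu_time_s      # time not covered by the cycle count

print(f"total GPU cycles: {total_cycles}")
print(f"GPU time at 600 MHz: {gpu_time_s:.9f} s")
print(f"gap to measured runtime: {gap_s:.6f} s")
```

The gap is the part of the measured runtime that the summed cycle count does not cover under the constant-clock assumption; driver setup and completion work is one contributor.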
Thank you for your input. My question concerns the following data:
Time Index | GPU Vertex-Tiling-Compute:Activity | Mali Job Manager Cycles:GPU cycles | Mali Job Manager Cycles:JS1 cycles | Mali Core Cycles:Tripipe cycles | Mali Core Cycles:Compute cycles
6 | 99.55% | 597264018 | 597099182 | 385657069 | 385661788
7 | 99.52% | 597069905 | 596896010 | 385843125 | 385848021
8 | 99.64% | 476883765 | 476739306 | 307584997 | 307589073
9 | 99.36% | 327274357 | 327183748 | 211693189 | 211696096
10 | 99.52% | 597066566 | 596880847 | 385265405 | 385270772
Even though GPU Vertex-Tiling-Compute:Activity is 99.64%, why does Streamline report 476M GPU cycles instead of the ~597M seen in the other samples? I checked the temperature of the GPU; it is only ~55 C in the above case.
99.64% activity means there are no idle periods, and at ~55 C the GPU is not overheating either. So why does the discrepancy arise? I am unable to understand this.
Hi maasa,
I'm not sure I can give you a better answer.
The "Activity" counter is a software metric reported by the kernel driver; i.e. how busy does the driver think the hardware is.
The other counters are hardware counters reported by the GPU while work is running; i.e. how busy does the hardware think it is.
The only conclusion I can draw is that your platform BSP is down-clocking the GPU to ~300 MHz for some reason. Frequency management is outside of the Mali driver and hardware and is provided by the chipset manufacturer, so I can't explain why it has decided to do this.
HTH, Pete
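One way to see where the down-clocking estimate comes from is to divide each sample's hardware cycle count by the time the driver believes the GPU was busy. The Python sketch below does this for time indices 6 to 10. It assumes the Activity percentage is the busy fraction of a 1 s interval and that GPU cycles are only counted while busy; both are assumptions made for illustration, and the function name is hypothetical. The result is an average over the interval, so it cannot tell a steady lower clock from a mix of frequencies.

```python
INTERVAL_S = 1.0  # Streamline sampling interval in seconds

# (time index, Activity as a fraction, "GPU cycles") taken from the thread.
samples = [
    (6, 0.9955, 597264018),
    (7, 0.9952, 597069905),
    (8, 0.9964, 476883765),
    (9, 0.9936, 327274357),
    (10, 0.9952, 597066566),
]

def effective_mhz(activity, cycles, interval_s=INTERVAL_S):
    """Average clock in MHz implied by `cycles` counted over the busy time."""
    busy_time_s = activity * interval_s
    return cycles / busy_time_s / 1e6

effective = {index: effective_mhz(activity, cycles)
             for index, activity, cycles in samples}

for index, mhz in effective.items():
    print(f"time index {index}: ~{mhz:.1f} MHz average")
```

Under these assumptions, samples 6, 7 and 10 come out very close to the nominal 600 MHz, while samples 8 and 9 come out well below it, which is consistent with the GPU clock being reduced during those intervals.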