Hi,
I'm trying to profile GPU utilization of the Mali T628 on an Odroid-XU4 board using Streamline. When I run some graphics workloads, I sometimes see "GPU Fragment Activity", "GPU Vertex Compute" and "GPU Vertex-Tiling-Compute Activity" add up to more than 100%. For instance, Fragment Activity and Vertex Activity both reach 100% for short periods.
Are these metrics ("Fragment Activity" and "Vertex Activity") basically the percentage of time that the corresponding Job Manager is running?
Then, if I run some OpenCL workloads (for instance matrix multiplication) alongside the graphics workloads, the Vertex Activity maxes out and the frame rate drops significantly.
Are these tasks (fragment, vertex compute, load-store) using different hardware in a shader core? Can a shader core perform more than 1 task at the same time?
Finally, is it possible to get per-core stats, such as how many shader cores are running, what tasks they are running, and what their utilizations are? From the OpenCL threads on Mali T628 I guess it is not possible from Streamline; is that true? If so, is it exposed somewhere in the kernel driver, or completely opaque to software?
Thanks a lot in advance!
Peter Harris wrote:
For instance, if by only running fragment I can get X FLOPS for fragment, then when I have fragment and vertex running at the same time, would I get something like X/2 FLOPS for fragment and X/2 FLOPS for vertex?
That kind of thing, yes - the shader core is shared if you have two things running at the same time, but the exact ratio is dynamic and content dependent, so it may not be as simple as X/2 (in reality it will be skewed slightly in favor of vertex and compute, as they tend to have longer programs to run).
HTH,
P
I see. So OpenCL and graphics workloads can also run concurrently?
OpenCL can run in parallel with fragment shading, but not vertex shading.
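To illustrate why the activity counters from the original question can add up to more than 100%, here is a minimal Python sketch. It assumes each counter is simply the fraction of the sample window during which that queue had work running, which is the interpretation asked about above and not something confirmed in this thread. The `utilization` helper and the timeline values are hypothetical, made up for illustration; this is not Streamline's or the driver's API.

```python
def utilization(busy_intervals, window_start, window_end):
    """Fraction of the window during which one queue had work running.

    Assumes the intervals for a single queue do not overlap each other.
    """
    window = window_end - window_start
    busy = 0.0
    for start, end in busy_intervals:
        # Clip each busy interval to the sample window.
        s = max(start, window_start)
        e = min(end, window_end)
        if e > s:
            busy += e - s
    return busy / window


# Hypothetical timeline (milliseconds) over a 10 ms sample window.
fragment_busy = [(0.0, 9.0)]
vertex_busy = [(2.0, 5.0), (6.0, 10.0)]

frag = utilization(fragment_busy, 0.0, 10.0)
vert = utilization(vertex_busy, 0.0, 10.0)

# Each queue is measured independently, so overlapping work is counted
# once per queue and the sum is not capped at 100%.
total = frag + vert
print(f"fragment={frag:.0%} vertex={vert:.0%} sum={total:.0%}")
```

Because the two queues overlap in time, each percentage is valid on its own, but the sum says nothing about how the shared shader cores were actually divided between them.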