
Mali T628 GPU activity in Streamline

Hi,

I'm trying to profile GPU utilization of the Mali T628 on an Odroid-XU4 board using Streamline. When I run some graphics workloads, I sometimes see "GPU Fragment Activity", "GPU Vertex Compute" and "GPU Vertex-Tiling-Compute Activity" add up to more than 100%. For instance, Fragment Activity and Vertex Activity both reach 100% for short periods.

Are these metrics ("Fragment Activity" and "Vertex Activity") basically the percentage of time that the corresponding Job Manager is running?

Then, if I run some OpenCL workloads (for instance matrix multiplication) alongside the graphics workloads, the Vertex Activity maxes out and the frame rate drops significantly.

Are these tasks (fragment, vertex compute, load-store) using different hardware in a shader core? Can a shader core perform more than one task at the same time?

Finally, is it possible to get per-core stats, such as how many shader cores are running, what tasks they are running, and what their utilizations are? From the "OpenCL threads on Mali T628" discussion I gather this is not possible in Streamline; is that true? If so, is it exposed somewhere in the kernel driver, or completely opaque to software?

Thanks a lot in advance!

  • Are these metrics ("Fragment Activity" and "Vertex Activity") basically the percentage of time that the corresponding Job Manager is running?

    Yes, exactly that.

    Then, if I run some OpenCL workloads (for instance matrix multiplication) alongside the graphics workloads, the Vertex Activity maxes out and the frame rate drops significantly.

    Is that surprising? If you ask the GPU to do more work (whether graphics or compute) then the performance will inevitably drop.

    Are these tasks (fragment, vertex compute, load-store) using different hardware in a shader core?

    No, they can share the core concurrently.

    Finally, is it possible to get per-core stats, such as how many shader cores are running, what tasks they are running, and what their utilizations are?

    Assuming you are using the default device, or only device[0], OpenCL will use the first 4 of the 6 GPU cores in that chipset.

    Vertex shading will be using the first 4 too, but fragment shading can run on all 6.

    From the "OpenCL threads on Mali T628" discussion I gather this is not possible in Streamline; is that true?

    Correct - it's not possible in Streamline.

    HTH,

    Pete
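To illustrate why the activity counters can sum to more than 100%: if each counter is the fraction of the sample window in which its own Job Manager slot was busy, as confirmed above, then the counters are independent of each other. The following is a minimal Python sketch under that assumption. The function name `activity_percent` and the busy intervals are hypothetical and made up for illustration; they are not Streamline output or a driver API.

```python
def activity_percent(busy_intervals, window_start, window_end):
    """Percentage of the sample window during which one job slot was busy.

    busy_intervals: list of (start, end) times for a single slot,
    assumed not to overlap one another (hypothetical input data).
    """
    window = window_end - window_start
    busy = 0.0
    for start, end in busy_intervals:
        # Clip each busy interval to the sample window.
        lo = max(start, window_start)
        hi = min(end, window_end)
        if hi > lo:
            busy += hi - lo
    return 100.0 * busy / window


# Hypothetical 10-unit sample window: the fragment slot is busy throughout,
# the vertex/tiling/compute slot is busy for 6 of the 10 units.
fragment = activity_percent([(0, 10)], 0, 10)
vertex = activity_percent([(2, 6), (8, 10)], 0, 10)
print(fragment, vertex, fragment + vertex)
```

Because each slot is timed separately, the two values are not shares of a single 100% budget, so their sum can legitimately exceed 100%.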

  • Hi Peter,

    thanks for your reply.

    Then, if I run some OpenCL workloads (for instance matrix multiplication) alongside the graphics workloads, the Vertex Activity maxes out and the frame rate drops significantly.

    Is that surprising? If you ask the GPU to do more work (whether graphics or compute) then the performance will inevitably drop.

    Nope -- totally expected.

    Are these tasks (fragment, vertex compute, load-store) using different hardware in a shader core?

    No, they can share the core concurrently.

    Sorry if I misunderstood. So the hardware is split among these tasks at the discretion of the Job Manager? For instance, if running only fragment work gives me X FLOPS for fragment, then with fragment and vertex running at the same time I would get something like X/2 FLOPS for fragment and X/2 FLOPS for vertex?

    Thanks!

  • For instance, if running only fragment work gives me X FLOPS for fragment, then with fragment and vertex running at the same time I would get something like X/2 FLOPS for fragment and X/2 FLOPS for vertex?

    That kind of thing, yes - the shader core is shared if you have two things running at the same time, but the exact ratio is dynamic and content dependent, so it may not be as simple as X/2 (in reality it will be skewed slightly in favor of vertex and compute, as they tend to have longer programs to run).

    HTH,

    P
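The sharing described above can be pictured as a weighted split of one core's throughput. The following is only a toy Python sketch: the function `split_throughput` and the weights are hypothetical, chosen to show an even split versus a split skewed toward vertex work. The real ratio is dynamic and content dependent, and these numbers are not measurements.

```python
def split_throughput(total, weights):
    """Divide a core's throughput among concurrently running workloads.

    total:   throughput with a single workload running (the "X" above).
    weights: hypothetical relative shares per workload type.
    """
    weight_sum = sum(weights.values())
    return {name: total * w / weight_sum for name, w in weights.items()}


# Even split: each workload gets X/2.
even = split_throughput(100.0, {"fragment": 1, "vertex": 1})

# Illustrative skew in favor of vertex work (the weights are assumptions).
skewed = split_throughput(100.0, {"fragment": 2, "vertex": 3})

print(even)
print(skewed)
```

The shares always sum to the single-workload throughput in this model; what changes with the content is only how that total is divided.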

  • Peter Harris wrote:

    For instance, if running only fragment work gives me X FLOPS for fragment, then with fragment and vertex running at the same time I would get something like X/2 FLOPS for fragment and X/2 FLOPS for vertex?

    That kind of thing, yes - the shader core is shared if you have two things running at the same time, but the exact ratio is dynamic and content dependent, so it may not be as simple as X/2 (in reality it will be skewed slightly in favor of vertex and compute, as they tend to have longer programs to run).

    HTH,

    P

    I see. So OpenCL and graphics workloads can run concurrently as well?

  • I see. So OpenCL and graphics workloads can run concurrently as well?

    OpenCL can run in parallel with fragment shading, but not vertex shading.
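The statements in this thread about which cores each workload type uses, and which workload types can run in parallel, can be collected into a small lookup. The following is a Python sketch that only restates what was said above for the T628 with OpenCL on the default device. Numbering the "first 4" cores as 0 to 3 is an assumption, and the names `CORES`, `PARALLEL_PAIRS`, `can_overlap` and `contended_cores` are hypothetical helpers, not driver or Streamline interfaces.

```python
# Cores available to each workload type, per the replies in this thread.
# Core indices 0-3 for the "first 4" cores are an assumed numbering.
CORES = {
    "fragment": {0, 1, 2, 3, 4, 5},  # fragment shading can run on all 6
    "vertex": {0, 1, 2, 3},          # vertex shading uses the first 4
    "opencl": {0, 1, 2, 3},          # OpenCL on device[0] uses the first 4
}

# Workload pairs that the thread says can run at the same time.
PARALLEL_PAIRS = {
    frozenset({"fragment", "vertex"}),
    frozenset({"fragment", "opencl"}),
}


def can_overlap(a, b):
    """True if the thread says workloads a and b can run in parallel."""
    return frozenset({a, b}) in PARALLEL_PAIRS


def contended_cores(a, b):
    """Cores that a and b would share while running in parallel."""
    if not can_overlap(a, b):
        return set()
    return CORES[a] & CORES[b]


print(can_overlap("opencl", "fragment"), contended_cores("opencl", "fragment"))
print(can_overlap("opencl", "vertex"), contended_cores("opencl", "vertex"))
```

Under these statements, OpenCL and fragment work overlap on four of the six cores, while OpenCL and vertex work take turns rather than overlapping, which is consistent with the Vertex Activity counter saturating when OpenCL runs alongside graphics.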