Hello,
I would like to understand how the Performance Monitoring Unit (PMU) behaves during context switches on the GPU.
For example, if task A is running and PMU_Counter1 reaches a value of 56, then the scheduler removes task A and switches to task B. When task B begins execution, what happens to the value of PMU_Counter1? Does it reset to 0, isolating task B from task A, or does it retain the previous value of 56 from task A, since both tasks are using the same partition?
In other words, how do context switches impact the performance counters on the Mali-G78AE GPU? Are the counters specific to each task, or do they reflect the partition's activity regardless of which task is running?
Thank you,
Luca
Hi Luca,
GPU counters represent the global workload for a single partition, so they include all running contexts.
The counter sampling is virtualized in the kernel driver, so you can have multiple sampling processes and they will see correct values relative to their own samples. One process sampling won't zero the counters for other processes.
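To make the "relative to their own samples" behaviour concrete, here is a minimal sketch under a hypothetical model: one free-running, partition-wide counter, plus a private baseline kept for each sampling session. The class names HardwareCounter and SamplingSession are illustrative only and are not the Mali kernel driver's interface. The sketch shows that every session sees all partition activity, and that one session taking a sample does not reset anything for another.

```python
# Hypothetical model only: names and structure are illustrative,
# not the actual Mali kernel driver interface.

class HardwareCounter:
    """One partition-wide counter: work from every context adds to it."""

    def __init__(self) -> None:
        self.value = 0

    def add_work(self, events: int) -> None:
        # Any context running on the partition increments the same counter.
        self.value += events


class SamplingSession:
    """Per-process view: each sample is a delta since this session's last sample."""

    def __init__(self, hw: HardwareCounter) -> None:
        self.hw = hw
        # Snapshot at session start; the hardware counter itself is never zeroed.
        self.baseline = hw.value

    def sample(self) -> int:
        current = self.hw.value
        delta = current - self.baseline
        # Only this session's baseline moves; other sessions are unaffected.
        self.baseline = current
        return delta


hw = HardwareCounter()
s1 = SamplingSession(hw)   # first profiling process starts sampling
hw.add_work(56)            # activity from any context on the partition
s2 = SamplingSession(hw)   # second profiling process starts later
hw.add_work(10)
print(s1.sample())         # 66: all partition activity since s1 started
print(s2.sample())         # 10: all partition activity since s2 started
hw.add_work(5)
print(s1.sample())         # 5: s2's earlier sample did not reset s1's view
print(s2.sample())         # 5
```

Note that the deltas describe the whole partition, not a particular task: if two workloads overlapped between samples, both sessions would see the combined total.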
Kind regards, Pete
Hi Peter,
Thanks for your reply.
If I understand correctly, you have counters representing the partition that can be separated for each of the processes running on the partition afterwards.
In this case, when the new context comes into the partition, the counters still hold the values accumulated by the process that was previously running on the partition.
How can the counters be isolated for each of the processes based on their sampling? Is the kernel responsible for this task or is it something that Arm Streamline does, for instance? Can we access these sampled values somehow (not necessarily with Arm Streamline)?
Kind regards, Luca
"You have counters representing the partition that can be separated for each of the processes running on the partition afterwards."
The counters represent the partition as a whole, and there is no way to separate them afterwards. Counters are global, and within a partition, workloads from different processes can overlap and run concurrently, so there is no way to attribute counter values to a single application process.
EDIT: Software could (in theory) enforce per-process attribution, e.g. by running workloads serially and sampling between workloads, but this isn't a standard system policy because it impacts performance.
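The serialize-and-sample idea can be sketched as follows. This is a minimal model, assuming each workload adds a fixed, made-up number of events to one global counter; the process names, event counts, and function names are hypothetical. When workloads run one at a time, a sample before and after each one gives a per-process delta. When they overlap, a single pair of samples brackets all of them, so only the total is observable.

```python
# Hypothetical model: each workload adds a fixed number of events to one
# global counter. Names and counts are illustrative only.
from typing import Dict, List, Tuple

Workload = Tuple[str, int]  # (owning process, events it generates)


def run_serialized(workloads: List[Workload]) -> Dict[str, int]:
    """Run one workload at a time and sample the global counter around each one."""
    counter = 0
    per_process: Dict[str, int] = {}
    for owner, events in workloads:
        before = counter     # sample before the workload starts
        counter += events    # only this workload is resident on the partition
        after = counter      # sample after it completes
        per_process[owner] = per_process.get(owner, 0) + (after - before)
    return per_process


def run_concurrent(workloads: List[Workload]) -> int:
    """Overlapping workloads: one sample pair brackets all of them."""
    counter = 0
    before = counter
    for _owner, events in workloads:
        counter += events    # contributions land in the same counter
    return counter - before  # only the combined total is observable


jobs = [("A", 56), ("B", 10), ("A", 4)]
print(run_serialized(jobs))  # {'A': 60, 'B': 10}
print(run_concurrent(jobs))  # 70
```

The model leaves out the cost side: on real hardware, forcing workloads to run one at a time removes the overlap between them, which is the performance impact mentioned above.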