Hello forum,
I wanted to know whether the MaliCorePUInstructionsFMAInstructions hardware counter counts FMA instructions per processing unit, per execution core, or for the entire GPU.
From the name of the counter it seems to be per processing unit. If so, how can I scale it to infer the total number of FMA instructions executed on the entire GPU?
Thank you,
rchakena
Hello Pete,
Thanks for the detailed explanation. It clarified my doubts to a large extent.
Here are some follow-up points to validate my understanding, plus a few additional questions.
1. I am using G77-MP7 and G78-MP14 GPUs, which as I understand it have 7 and 14 cores respectively, with 2 PUs per core. So the scale factor would be 7*2=14 for G77-MP7 and 14*2=28 for G78-MP14.
2. My understanding was that not all cores are used for small workloads, so the scale factor might vary with how many cores were actually active during the workload. (Or should I assume all cores are active irrespective of the workload?)
3. If the number of active cores, and hence the scale factor, depends on the workload, which counter or Streamline information can tell me how many cores were actually active during workload execution, so that I can adjust the scale factor accordingly?
Cheers,
Yes, your scale factors look correct.
Streamline will sum and average counters assuming all cores are active, so scaling by the total core count will give the correct global total.
You cannot tell how many cores were actually active in Streamline; it's a transparent aspect of power management policy controlled by the platform provider.
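For anyone who wants to script the scaling discussed above, here is a minimal Python sketch. The function and dictionary names are hypothetical (they are not part of Streamline or any Arm API), and the core and PU counts are simply the ones quoted in this thread. It assumes the counter value you read is a per-processing-unit figure averaged over all cores.

```python
# Illustrative only: names here are hypothetical, not a Streamline or Arm API.
# Core counts and the 2-PUs-per-core figure are taken from this thread.
GPU_CONFIGS = {
    "G77-MP7": {"cores": 7, "pus_per_core": 2},
    "G78-MP14": {"cores": 14, "pus_per_core": 2},
}


def scale_factor(gpu: str) -> int:
    """Number of processing units on the whole GPU (cores * PUs per core)."""
    cfg = GPU_CONFIGS[gpu]
    return cfg["cores"] * cfg["pus_per_core"]


def total_fma(per_pu_count: float, gpu: str) -> float:
    """Scale a per-PU FMA instruction count up to a whole-GPU estimate.

    Assumes the per-PU value is averaged over all cores, so the full
    core count is used regardless of how many cores were powered on.
    """
    return per_pu_count * scale_factor(gpu)


if __name__ == "__main__":
    for gpu in GPU_CONFIGS:
        print(gpu, "scale factor:", scale_factor(gpu))
```

For example, `total_fma(1000, "G77-MP7")` multiplies a per-PU reading of 1000 by the scale factor of 14. Because the averaging already assumes every core is active, the sketch deliberately does not try to adjust for the number of powered-on cores.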