Hello forum,
I wanted to know whether the MaliCorePUInstructionsFMAInstructions hardware counter counts FMA instructions per processing unit, per execution core, or for the entire GPU.
From the name of the counter, it seems to be per processing unit. If so, how can I scale it to infer the total number of FMA instructions executed on the entire GPU?
Thank you,
rchakena
Yes, your scale factors look correct.
Streamline sums and averages counters on the assumption that all cores are active, so scaling by the total core count gives the correct global total.
Streamline cannot tell you how many cores were actually active, because that is a transparent aspect of the power management policy controlled by the platform provider.
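For illustration, the core-count scaling can be sketched in Python. This is a minimal sketch only: the function name `gpu_total_from_per_core`, the example counter value, and the core count are hypothetical. The real core count has to be looked up for your specific GPU. Like Streamline, the sketch assumes every core was active, so it cannot correct for cores that power management switched off.

```python
def gpu_total_from_per_core(per_core_value: float, shader_core_count: int) -> float:
    """Scale a per-core counter value to an estimated whole-GPU total.

    Hypothetical helper. It assumes, as Streamline does, that every shader
    core was active, so the per-core figure is multiplied by the total
    number of cores.
    """
    if shader_core_count <= 0:
        raise ValueError("shader_core_count must be positive")
    return per_core_value * shader_core_count


# Hypothetical example: a per-core average of 1.5e6 FMA instructions
# on a GPU assumed to have 10 shader cores.
estimated_total = gpu_total_from_per_core(1.5e6, 10)
print(f"Estimated GPU-wide FMA instructions: {estimated_total:.0f}")
```

If some cores were actually powered down during the capture, this estimate inherits the same all-cores-active assumption as Streamline's own totals.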