MaliCorePUInstructionsFMAInstructions hardware counter

Hello forum,

I would like to know whether the MaliCorePUInstructionsFMAInstructions hardware counter counts FMA instructions per processing unit, per execution core, or for the entire GPU.

From the name of the counter, it seems to be per processing unit. If so, how can I scale it to infer the total number of FMA instructions executed on the entire GPU?

Thank you,

rchakena 

  • In Streamline, the Mali instruction counters report the activity of a single processing unit, averaged across all shader cores, so the value reflects single-core performance. This is the most useful measure for performance analysis, because the throughput of the dominant data path within a core is what you need to know to identify critical-path bottlenecks.

    To compute "whole GPU totals", multiply the value shown in Streamline by your core count (accessible via $MaliConstantsShaderCoreCount) and by the number of processing units per core (not accessible programmatically). I can't find a good public reference for the number of PUs per core, but if you let me know which GPU you are using, I can give you the scale factor.

    Cheers, 
    Pete
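
The scaling described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not an official Arm formula or tool: the function name, the parameter names, and the example numbers are all hypothetical. In particular, the number of processing units per core is a placeholder, since it depends on the specific GPU and has to be looked up or requested separately.

```python
def whole_gpu_fma_total(per_unit_value, shader_core_count, pus_per_core):
    """Scale a per-processing-unit counter value to a whole-GPU estimate.

    per_unit_value:    value reported for MaliCorePUInstructionsFMAInstructions
    shader_core_count: value of $MaliConstantsShaderCoreCount for the device
    pus_per_core:      processing units per shader core (GPU-specific;
                       ASSUMPTION: supplied by the caller, not queried)
    """
    if shader_core_count <= 0 or pus_per_core <= 0:
        raise ValueError("core count and PUs per core must be positive")
    return per_unit_value * shader_core_count * pus_per_core


# Hypothetical example values, chosen only to show the arithmetic.
estimate = whole_gpu_fma_total(
    per_unit_value=1_000_000,
    shader_core_count=8,
    pus_per_core=3,
)
print(estimate)
```

Because the Streamline value is already an average across shader cores, the result is an estimate of the GPU-wide total rather than an exact count.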
