Query on $MaliCoreInstructionsNarrowInstructions

Hi Forum,

As per the performance counter guide, the counter $MaliCoreInstructionsNarrowInstructions increments for every 8-bit or 16-bit instruction executed. To understand the counter further, I tried a simple medium-precision test and observed that some of the 16-bit instructions are also reported in $MaliALUInstructionsFMAInstructions. Is this expected?

The following test was executed on a G720, and the data was collected using Streamline version 8.9.0:

#version 300 es  // assumed: ES 3.0 or later, required for the in/out qualifiers
precision mediump float;

in vec4 in1, in2;
out vec4 col;

void main() { col = in1 * in2; }

Counter                                       Value
Mali ALU Instructions: FMA instructions       8335
Mali ALU Instructions: Narrow instructions    8325
Mali Shader Warps: Fragment warps             16454

The total ALU instruction count should be 4. To get 4, Narrow has to be included, using the formula: ((FMA + Narrow) * 4) / Warps

When I ran the same test in high precision, the data was as expected:

Counter                                       Value
Mali ALU Instructions: FMA instructions       16544
Mali ALU Instructions: Narrow instructions    0
Mali Shader Warps: Fragment warps             16454
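
To make the comparison concrete, here is a minimal sketch that evaluates the formula from this post against both sets of counter values. The scale factor of 4 is taken from the formula as written, and the names `runs` and `per_warp` are only illustrative.

```python
# Counter values copied from the two runs reported in this post.
runs = {
    "mediump": {"fma": 8335, "narrow": 8325, "warps": 16454},
    "highp": {"fma": 16544, "narrow": 0, "warps": 16454},
}


def per_warp(count, warps, scale=4):
    """Apply the normalization used in this post: (count * 4) / warps."""
    return count * scale / warps


results = {}
for name, c in runs.items():
    results[name] = {
        # FMA counter on its own.
        "fma_only": per_warp(c["fma"], c["warps"]),
        # FMA and Narrow added together, as proposed in the question.
        "fma_plus_narrow": per_warp(c["fma"] + c["narrow"], c["warps"]),
    }
    print(name, results[name])
```

With these numbers, the medium-precision run gives roughly 2 from FMA alone and roughly 4 only when Narrow is added, while the high-precision run gives roughly 4 from FMA alone.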

Thanks,

Venkatesh.

  • "16-bit instructions are reported in $MaliALUInstructionsFMAInstructions as well. Is it expected?"

    Yes.

    The ALUInstructions counters increment for every issued arithmetic instruction for a specific sub-pipe.

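    The NarrowInstructions counter increments for every issued narrow arithmetic instruction, on any of the three sub-pipes (FMA, CVT, SFU). A narrow instruction is therefore counted twice: once in its sub-pipe counter and once in NarrowInstructions.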

    "Narrow should be included to get 4"

    No. 

    Most 16-bit operations on Mali are vec2 SIMD, so this test is only running two instructions per thread, not four. 

    HTH, 
    Pete
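
The vec2 SIMD point in the reply above can be illustrated with a small model. This is a simplified sketch, assuming that a 16-bit operation packs two components into one issued instruction and that a 32-bit operation handles one component per instruction. The function name `issued_instructions` is hypothetical, and real hardware behaviour may differ, since the reply says only "most" 16-bit operations are vec2.

```python
import math


def issued_instructions(components, bits):
    """Estimate instructions issued per thread for a component-wise operation.

    Assumed model: 16-bit operations process two components per instruction
    (vec2 SIMD); wider operations process one component per instruction.
    """
    lanes = 2 if bits == 16 else 1
    return math.ceil(components / lanes)


# vec4 * vec4 from the test shader, at each precision.
mediump_count = issued_instructions(4, 16)
highp_count = issued_instructions(4, 32)
print("mediump:", mediump_count, "highp:", highp_count)
```

Under this model the mediump vec4 multiply needs two instructions per thread and the highp version needs four. That agrees with the reported FMA counts, which are about half as large in the mediump run (8335) as in the highp run (16544) for the same number of warps.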
