Hi Forum,
As per the performance counter guide, the $MaliCoreInstructionsNarrowInstructions counter increments for every 8-bit or 16-bit instruction executed. To understand the counter better, I ran a simple medium-precision test and observed that some of the 16-bit instructions are also reported in $MaliALUInstructionsFMAInstructions. Is this expected?
The following test was run on a G720, and the data was collected with Streamline version 8.9.0:
precision mediump float;
in vec4 in1, in2;
out vec4 col;
void main() { col = in1 * in2; }
The total ALU count is 4. The Narrow count has to be included to reach 4, using the formula ((FMA + Narrow) * 4) / Warps.
When I ran the same test in high precision, the data was as expected.
Thanks,
Venkatesh.
Venkatesh K R said: 16-bit instructions are reported in $MaliALUInstructionsFMAInstructions as well. Is it expected?
Yes.
The ALUInstructions counters increment for every issued arithmetic instruction for a specific sub-pipe.
The NarrowInstructions counter increments for every issued narrow arithmetic instruction on any of the three sub-pipes (FMA, CVT, or SFU), so a 16-bit FMA instruction increments both counters.
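To illustrate the overlap, here is a minimal Python sketch of how the two kinds of counter relate. It is only a toy model: the Instr class, the count function, and the instruction stream are hypothetical and are not taken from the hardware or from Streamline.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    pipe: str     # sub-pipe the instruction issues to: "FMA", "CVT" or "SFU"
    narrow: bool  # True for an 8-bit or 16-bit instruction


def count(instrs):
    """Toy model: one counter per sub-pipe, plus a Narrow counter that
    overlaps with them (hypothetical, for illustration only)."""
    counters = Counter()
    for instr in instrs:
        counters[instr.pipe] += 1      # per-sub-pipe ALU counter
        if instr.narrow:
            counters["Narrow"] += 1    # also counted as narrow
    return counters


# Made-up stream: two 16-bit FMA instructions and one 32-bit CVT instruction.
issued = [Instr("FMA", True), Instr("FMA", True), Instr("CVT", False)]
counters = count(issued)
print(counters["FMA"], counters["CVT"], counters["Narrow"])
```

In this model the two narrow FMA instructions show up in both the FMA and the Narrow counts, so adding the two together would count them twice.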
Venkatesh K R said: Narrow should be included to get 4,
No.
Most 16-bit operations on Mali are vec2 SIMD, so this test is only running two instructions per thread, not four.
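The arithmetic behind that can be sketched in Python as follows. This is a simplified model, not the compiler's actual behaviour: it assumes one instruction per component for 32-bit operations and that every 16-bit operation packs two components per instruction (in practice this holds for most, not necessarily all). The function name instructions_per_thread is made up for this example.

```python
import math


def instructions_per_thread(components, bits):
    """Estimate arithmetic instructions for one component-wise vector op.

    Assumption: 16-bit ops are issued as vec2 SIMD (two components per
    instruction) and 32-bit ops as scalar (one component per instruction).
    """
    lanes = 2 if bits == 16 else 1
    return math.ceil(components / lanes)


# vec4 * vec4 from the test shader
mediump_count = instructions_per_thread(4, 16)  # 2 instructions
highp_count = instructions_per_thread(4, 32)    # 4 instructions
print(mediump_count, highp_count)
```

Under these assumptions the mediump vec4 multiply needs two instructions per thread and the highp version needs four, which is consistent with the high-precision run matching the expected count of 4.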
HTH, Pete
Thank you for the immediate response. One follow-up question: are 16-bit math operations (e.g. sqrt, log, floor, abs) also reported as vec2 SIMD, or are only 16-bit ALU operations packed into vec2 SIMD?
I have never checked the transcendental operations specifically, but my understanding is that it should apply to all 16-bit ops.