Hi,
I'm trying to understand the performance of one of my test shaders (I'm using the Mali-G78 on a Pixel 6a, which I believe has 20 cores). Using Streamline, I'm seeing around 15 giga-instructions per second, with arithmetic unit utilization of around 99%. According to my calculation, we are processing around 600 giga scalar add/mul operations per second (counting them in the shader, which I don't think can be optimized away, multiplying by the number of pixels, times 4 because I'm using vec4, and times the FPS).
I'm not sure how to reconcile the 15 giga-instructions/s with my calculated 600 GFLOPS. If I assume one instruction can operate on 32 f32 values simultaneously, that gives 15 * 32 = 480 GFLOPS, which is still quite a bit lower than what I estimate from my shader and fill rate.
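The estimate described above can be sketched as follows. The resolution, FPS, and per-pixel op count here are hypothetical placeholders chosen only to illustrate how ~600 GFLOPS could arise, not the actual measured values.

```python
# Sketch of the FLOPs estimate: scalar add/mul per second =
# pixels * scalar ops per vec4 component * vector width * frames per second.
def estimated_flops(width, height, scalar_ops_per_component, fps, vec_width=4):
    return width * height * scalar_ops_per_component * vec_width * fps

# Hypothetical numbers: 1080x2400 screen, 960 scalar ops per component, 60 fps.
print(estimated_flops(1080, 2400, 960, 60) / 1e9)  # ~597 GFLOPS
```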
Thanks,
Lorenzo
Mali counters only count one arithmetic unit, and they increment per warp, not per thread. This allows normalization relative to clock frequency.
For Mali-G78 there are 2 arithmetic units, using 16-wide warps (see https://developer.arm.com/documentation/102849/latest/).
15G * 2 * 16 = 480G instructions/second.
Further, one instruction could be a fused FMA or a scalar op such as MUL or ADD, so 480G instructions could be up to 960 GFLOPS if you count an FMA as 2 ops.
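The normalization above can be written out explicitly. This is a minimal sketch using the Mali-G78 figures already stated (2 arithmetic units, 16-wide warps); the function names are just for illustration.

```python
# Convert the per-warp, single-unit counter value into per-thread
# instruction throughput, then into a FLOPS upper bound.
WARP_WIDTH = 16   # Mali-G78 warp width
ARITH_UNITS = 2   # arithmetic units per core on Mali-G78

def thread_instructions(counter_gips):
    """Counter increments once per warp for one unit -> scale up both."""
    return counter_gips * ARITH_UNITS * WARP_WIDTH

def peak_flops(counter_gips, fma_as_two=True):
    """Upper bound if every instruction is an FMA counted as 2 ops."""
    return thread_instructions(counter_gips) * (2 if fma_as_two else 1)

print(thread_instructions(15))  # 480 (G instructions/s)
print(peak_flops(15))           # 960 (GFLOPS upper bound)
```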
I'm counting FMA as one, but I ended up using only ADDs in my shader to simplify things.
I guess the reason I'm getting more than 480G is either approximation error or something being optimized away.
Thanks for your reply.
I can confirm I'm now matching the 480G as expected. My previous numbers (600G) came from not waiting for the last frame to complete (those frames are very slow).