Mali-G610 Rpeak in int8


I am trying to measure the Rpeak (theoretical peak) performance of the Mali-G610 GPU in an OpenCL environment.

I modified the code in the following git repository to measure int8 compute performance and got the results above. ( )

According to the spec sheet, the Mali-G610 GPU of the RK3588AP runs at 1000 MHz in an MP4 configuration.

The Mali-G610 GPU has two execution engines, and each engine has two 16-wide threads. As a result, a Mali-G610 MP1 has a total of 64 threads ( 2 * 2 * 16 ).

So, in the MP4 configuration, Rpeak is 512 GFLOPS.  ( 64 [threads] * 4 [MP4] * 2 [FMA] * 1000 MHz )
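
To make the arithmetic easy to re-check, here is a minimal sketch of the same calculation in Python. The function name `rpeak_gflops` and its parameter names are my own, purely for illustration. The figures are the ones quoted above; it assumes one FMA per thread per cycle, counted as 2 floating-point operations.

```python
def rpeak_gflops(lanes_per_core: int, cores: int, ops_per_fma: int, clock_mhz: float) -> float:
    """Theoretical peak in GFLOPS: lanes * cores * ops per FMA * clock."""
    # MHz * ops gives MFLOPS; divide by 1000 to get GFLOPS.
    return lanes_per_core * cores * ops_per_fma * clock_mhz / 1000.0


# Figures from the post: 2 execution engines * 2 * 16-wide = 64 threads per core.
lanes = 2 * 2 * 16

# MP4 configuration at 1000 MHz, FMA counted as 2 operations (assumption stated above).
peak = rpeak_gflops(lanes, cores=4, ops_per_fma=2, clock_mhz=1000)
print(peak)  # 512.0
```

Changing `cores` or `clock_mhz` gives the expected peak for other configurations under the same assumptions.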

When I ran the code myself, I got results close to the above calculation for floating-point data types like float32 and float16.

However, for int, the performance is about a quarter of what I expected, and I don't know exactly why.
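
One way to look at the gap is to convert throughput into operations per cycle per core and compare it with the 64 threads per core. The sketch below does that. The helper `ops_per_cycle_per_core` is a name I made up, and the integer figure is not a measured value; it is just "a quarter of 512" used as a stand-in.

```python
def ops_per_cycle_per_core(throughput_gops: float, cores: int, clock_mhz: float) -> float:
    """Convert a throughput in GOPS into operations per cycle per core."""
    # GOPS * 1000 gives MOPS; dividing by (cores * MHz) leaves ops per cycle per core.
    return throughput_gops * 1000.0 / (cores * clock_mhz)


fp_peak_gflops = 512.0                  # theoretical FP peak for MP4 at 1000 MHz
int_assumed_gops = fp_peak_gflops / 4   # hypothetical stand-in for "about a quarter"

fp_per_core = ops_per_cycle_per_core(fp_peak_gflops, cores=4, clock_mhz=1000)
int_per_core = ops_per_cycle_per_core(int_assumed_gops, cores=4, clock_mhz=1000)

print(fp_per_core)   # 128.0 -> 64 threads * 2 ops (FMA)
print(int_per_core)  # 32.0  -> 16 multiply-adds per cycle per core, if each counts as 2 ops
```

Under these assumptions, the integer result corresponds to roughly 16 multiply-adds per cycle per core instead of 64.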

My guess is that the Mali-G610 GPU's execution engine has fewer INT ALUs than FP ALUs, which would explain the lower performance.

If anyone knows anything about this, it would be very helpful.