I am trying to measure the theoretical peak (Rpeak) performance of the Mali-G610 GPU in an OpenCL environment.
I modified the code in the following repository to measure int8 compute performance and got the results above. ( https://github.com/krrishnarraj/clpeak.git )
According to the spec sheet, the Mali-G610 GPU of the RK3588AP runs at 1000 MHz in an MP4 configuration.
The Mali-G610 GPU has two execution engines, and each engine has two 16-wide threads. As a result, a Mali-G610 MP1 has a total of 64 threads (2 * 2 * 16).
So Rpeak is 512 GFLOPS for the MP4 configuration, counting each FMA as two operations. ( 64 [threads] * 4 [MP4] * 2 [FMA] * 1000 MHz )
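
To make the calculation above easy to re-check, here is a minimal Python sketch of the same arithmetic. The function name `rpeak_gflops` and its parameter names are my own for illustration, and the lane count, core count, and clock are just the figures quoted in this post, not values read from the hardware.

```python
def rpeak_gflops(lanes_per_core: int, cores: int,
                 ops_per_lane_per_cycle: int, clock_mhz: float) -> float:
    """Theoretical peak in GFLOPS: lanes * cores * ops per cycle * clock."""
    ops_per_second = lanes_per_core * cores * ops_per_lane_per_cycle * clock_mhz * 1e6
    return ops_per_second / 1e9


# Figures taken from the post (assumptions, not measured values):
# 2 execution engines * 2 * 16-wide = 64 lanes per core.
LANES_PER_CORE = 2 * 2 * 16
CORES = 4            # MP4 configuration
OPS_PER_FMA = 2      # one FMA counted as a multiply plus an add
CLOCK_MHZ = 1000.0

peak = rpeak_gflops(LANES_PER_CORE, CORES, OPS_PER_FMA, CLOCK_MHZ)
print(f"Rpeak: {peak:.0f} GFLOPS")
```

Changing `ops_per_lane_per_cycle` or the lane count is a quick way to see which hardware assumption would have to differ for a measured number to match.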
When I ran the benchmark myself, I got results close to the above calculation for floating-point data types such as float32 and float16.
However, for int, the performance is about a quarter of what I expected, and I don't know exactly why.
My guess is that the Mali-G610's execution engine has fewer INT ALUs than FP ALUs, which would explain the lower performance.
If anyone knows more about this, any pointers would be very helpful.