Hello~!
I am trying to measure the Rpeak (theoretical peak) performance of the Mali-G610 GPU in an OpenCL environment.
I modified the code in the following repository to measure int8 compute performance and got the above results. ( https://github.com/krrishnarraj/clpeak.git )
According to the spec sheet, the Mali-G610 GPU of the RK3588AP runs at 1000 MHz in an MP4 configuration.
The Mali-G610 GPU has two execution engines, and each engine has two 16-wide threads. As a result, a Mali-G610 MP1 has a total of 64 threads (2 * 2 * 16).
So, Rpeak for the MP4 configuration is 512 GFLOPS. ( 64 [threads] * 4 [MP4] * 2 [FMA] * 1000 MHz )
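For reference, the calculation above can be written as a small script. This is only a sketch of the arithmetic in this post. The function name `peak_gflops` and its parameters are made up for illustration and are not part of clpeak. It assumes every lane retires one FMA (counted as 2 operations) per clock.

```python
def peak_gflops(lanes_per_core, cores, ops_per_fma, clock_mhz):
    """Theoretical peak in GFLOPS: lanes * cores * ops per FMA * clock.

    Hypothetical helper for this post; assumes one FMA per lane per cycle.
    """
    # lanes * cores * ops gives operations per cycle; multiplying by MHz
    # gives millions of operations per second, so divide by 1000 for GFLOPS.
    return lanes_per_core * cores * ops_per_fma * clock_mhz / 1000.0


# Figures taken from the post: 2 execution engines, each 2 x 16-wide.
lanes_per_core = 2 * 2 * 16  # 64 per shader core (MP1)

rpeak = peak_gflops(lanes_per_core, cores=4, ops_per_fma=2, clock_mhz=1000)
print(rpeak)  # 512.0
```

The same formula gives the expected integer peak if the int path had as many lanes as the FP path, which is the assumption my int8 expectation is based on.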
When I ran the code myself, I got results close to the above calculation for floating-point data types such as float32 and float16.
However, for int, the measured performance is only about a quarter of what I expected, and I don't know exactly why.
My guess is that the Mali-G610 GPU's execution engine has fewer INT ALUs than FP ALUs, which would explain the lower performance.
If anyone knows anything about this, any information would be very helpful.
Thanks