What is the exact double-precision performance of the Mali T628 MP6 (Arndale Octa board)?

Different sources point to different numbers. For the Arndale board I found a figure of about 72 GFLOPS for the T604.

Wikipedia shows 109 GFLOPS for the T628. Have you heard of any performance measurements for this GPU, or of its theoretical capability?

I'm thinking about using it for low-power HPC; let me know what you think about that. Opinions and links related to this are welcome.

Regards,

Piotr

  • Hi Piotr,

    I've updated my original reply with corrected numbers: it's actually 5 FP64 FLOPS, not 8.5, as simply halving the figure was a bit too naive. So the ratio is slightly less than 1/3, but not as bad as 1/24. That makes it $5.60/GFLOPS.

    As for utilization, we usually suggest about 70% as close to optimal for a real application, though this number varies between applications. Beyond that point the ALU is rarely the bottleneck, and you have to look much more closely at cache utilization, at the memory system feeding the GPU, and at eliminating CPU/GPU sync points to improve the pipelining of work to the GPU.

    Hope this helps,

    Chris
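To make the arithmetic in the reply concrete, here is a rough sketch in Python of how a theoretical peak and a 70%-utilization estimate could be worked out. Several inputs are assumptions that are not stated in this thread: that the 5 FP64 FLOPS figure is per cycle per arithmetic pipe, that the matching FP32 figure is 17 (twice the 8.5 "simple half"), that each T628 core has 2 arithmetic pipes, and that the GPU is clocked at 533 MHz. Substitute the values for your own board.

```python
# Back-of-the-envelope peak throughput estimate for a multi-core GPU.
# All constants below except CORES are ASSUMPTIONS for illustration only.

FP64_FLOPS_PER_CYCLE_PER_PIPE = 5    # from the reply; "per cycle per pipe" is assumed
FP32_FLOPS_PER_CYCLE_PER_PIPE = 17   # assumed: twice the 8.5 "simple half"
PIPES_PER_CORE = 2                   # assumed for the T628
CORES = 6                            # "MP6" in the question
CLOCK_GHZ = 0.533                    # assumed GPU clock; check your board


def peak_gflops(flops_per_cycle_per_pipe, pipes_per_core, cores, clock_ghz):
    """Theoretical peak in GFLOPS: FLOPS per cycle across the GPU times clock in GHz."""
    return flops_per_cycle_per_pipe * pipes_per_core * cores * clock_ghz


def sustained_gflops(peak, utilization=0.70):
    """Scale a theoretical peak by an expected ALU utilization (70% per the reply)."""
    return peak * utilization


fp64_peak = peak_gflops(FP64_FLOPS_PER_CYCLE_PER_PIPE, PIPES_PER_CORE, CORES, CLOCK_GHZ)
fp32_peak = peak_gflops(FP32_FLOPS_PER_CYCLE_PER_PIPE, PIPES_PER_CORE, CORES, CLOCK_GHZ)

print(f"FP32 peak:         {fp32_peak:.1f} GFLOPS")
print(f"FP64 peak:         {fp64_peak:.1f} GFLOPS")
print(f"FP64 at 70% util.: {sustained_gflops(fp64_peak):.1f} GFLOPS")
print(f"FP64/FP32 ratio:   {fp64_peak / fp32_peak:.3f}")
```

Under these assumptions the FP32 peak comes out close to the 109 GFLOPS figure quoted from Wikipedia, and the FP64/FP32 ratio is 5/17, which is the "slightly less than 1/3" mentioned in the reply. These are theoretical ceilings, not measurements.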
