Different sources give different numbers. For the Arndale board I found a figure of about 72 GFLOPS for the T604.
Wikipedia shows 109 GFLOPS for the T628. Have you heard of any performance measurements for this GPU, and what is its theoretical capability?
I'm thinking about using it for low-power HPC; let me know what you think about that. Opinions and links related to this are welcome.
Regards,
Piotr
Hi Piotr,
I've updated my original reply with revised numbers: it's actually 5 FP64 FLOPS, not 8.5. Simply halving the FP32 figure was a bit too naive. So the FP64 rate is slightly less than 1/3 of the FP32 rate, but not as bad as 1/24. That brings it to $5.60/GFLOPS.
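To make the arithmetic behind these peak figures explicit, here is a back-of-envelope sketch. It assumes numbers not stated in this thread: 2 arithmetic pipelines per shader core, 17 FP32 FLOPS per pipeline per cycle (consistent with "8.5" being the naive half), a 533 MHz clock, 4 cores for the T604 and 6 for the T628. Only the 5 FP64 FLOPS figure comes from the reply itself.

```python
# Assumed figures (NOT stated in the thread except FP64_FLOPS_PER_PIPE):
FP32_FLOPS_PER_PIPE = 17   # assumption; "8.5" above is half of this
FP64_FLOPS_PER_PIPE = 5    # figure given in the reply
PIPES_PER_CORE = 2         # assumption
CLOCK_HZ = 533e6           # assumption: 533 MHz shader clock


def peak_gflops(cores, flops_per_pipe, clock_hz=CLOCK_HZ, pipes=PIPES_PER_CORE):
    """Theoretical peak in GFLOPS: cores x pipes x FLOPS/cycle x clock."""
    return cores * pipes * flops_per_pipe * clock_hz / 1e9


if __name__ == "__main__":
    for name, cores in (("T604 MP4", 4), ("T628 MP6", 6)):
        fp32 = peak_gflops(cores, FP32_FLOPS_PER_PIPE)
        fp64 = peak_gflops(cores, FP64_FLOPS_PER_PIPE)
        print(f"{name}: FP32 {fp32:.1f} GFLOPS, FP64 {fp64:.1f} GFLOPS")

    # FP64:FP32 throughput ratio, compared with 1/3 and 1/24
    ratio = FP64_FLOPS_PER_PIPE / FP32_FLOPS_PER_PIPE
    print(f"FP64:FP32 = {ratio:.3f} (1/3 = {1/3:.3f}, 1/24 = {1/24:.3f})")
```

Under these assumptions the FP32 results come out at roughly 72.5 and 108.7 GFLOPS, in line with the 72 and 109 GFLOPS figures quoted at the top of the thread, and 5/17 is about 0.294, i.e. slightly under 1/3.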
As for utilization, we usually suggest 70% as pretty much optimal for a real application, though this number will vary between applications. Beyond that point the ALU is rarely the bottleneck; you have to look much more closely at cache utilization, at the memory system feeding the GPU, and at eliminating CPU/GPU sync points to improve the pipelining of work to the GPU.
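As a rough illustration of what the 70% rule of thumb means in absolute terms, the sketch below simply scales a peak figure by an assumed utilization. The function name is hypothetical, and 70% is treated as a planning estimate, not a measured result for any particular workload.

```python
def sustained_gflops(peak_gflops, utilization=0.70):
    """Rough sustained throughput: peak scaled by an assumed ALU utilization.

    The 0.70 default is the rule-of-thumb figure from the thread, not a
    measurement; real applications may land well above or below it.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return peak_gflops * utilization


if __name__ == "__main__":
    # FP32 peak figures quoted earlier in the thread
    for name, peak in (("T604", 72.0), ("T628", 109.0)):
        print(f"{name}: ~{sustained_gflops(peak):.1f} GFLOPS at 70% utilization")
```

Past this point, the reply suggests, extra gains are more likely to come from the memory system and CPU/GPU synchronization than from the ALUs.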
Hope this helps,
Chris