Hi
As part of my MSc Scientific Computing at UCL, I'm benchmarking a small Raspberry Pi 4 Model B cluster.
I would like to reference the theoretical maximum performance of the BCM2711 (4 x ARM Cortex-A72) in Linpack terminology, R_peak.
I believe R_peak to be: 1.5 GHz x 3-way dispatch x 4 cores = 18 Gflops. This seems to be the "standard" Linpack methodology.
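For reference, the arithmetic above can be written as a short Python sketch. The function name `r_peak_gflops` and its parameter names are hypothetical, and the figure of 3 operations per cycle is simply the dispatch-width assumption stated above, not a number taken from ARM documentation.

```python
def r_peak_gflops(clock_ghz, flops_per_cycle, cores):
    """Theoretical peak in GFLOPS: clock rate x FLOPs per cycle per core x cores."""
    return clock_ghz * flops_per_cycle * cores


# Figures from this post: 1.5 GHz, 3 operations per cycle (assumed), 4 cores.
print(r_peak_gflops(1.5, 3, 4))  # 18.0
```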
It would be very helpful if someone more knowledgeable than me could confirm that this seems reasonable. Even better, is there any official ARM benchmarking material that I could reference in my dissertation?
Best wishes
John
Hi Timo,
Yes, that's right. Each element takes half as much space in single precision, so you can fit 4 "lanes" of single precision into a 128-bit vector, and the overall computation rate is doubled. This enables the A72 to achieve 8 FLOPs per core per cycle when using FMA operations.
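As a rough sketch of that lane arithmetic in Python: the function below is hypothetical and assumes that one 128-bit FMA vector operation completes per core per cycle, which is what the 8 FLOPs figure implies. It counts each FMA as two floating-point operations (a multiply and an add).

```python
def flops_per_cycle(element_bits, vector_bits=128, fma=True):
    """FLOPs per core per cycle, assuming one full-width vector op per cycle."""
    lanes = vector_bits // element_bits   # elements that fit in one vector
    ops_per_lane = 2 if fma else 1        # an FMA counts as multiply + add
    return lanes * ops_per_lane


print(flops_per_cycle(32))  # single precision: 8
print(flops_per_cycle(64))  # double precision: 4
```

Multiplying the per-cycle figure by the clock rate and the core count gives the corresponding peak for the whole chip.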
Note that it is not true of every core that double precision can be calculated at the same rate as single precision, but almost all that I've used do so.
Chris
Thanks Chris for your answer.