Cortex-A72 Maximum Theoretical Linpack Performance R_peak

Hi

As part of my MSc Scientific Computing at UCL, I'm benchmarking a small Raspberry Pi 4 Model B cluster.

I would like to reference the theoretical maximum performance of the BCM2711 (4 x ARM Cortex-A72) in Linpack terminology, R_peak.

I believe R_peak to be: 1.5 GHz x 3-way dispatch x 4 cores = 18 Gflops. This seems to be the "standard" Linpack methodology.

It would be very helpful if someone more knowledgeable than me could confirm that this seems reasonable. Even better, is there any official ARM benchmarking material that I can reference in my dissertation?

Best wishes

John

  • Hi John,

    An A72 core has a single 128-bit vector pipeline, so it can operate on two double-precision values per cycle, which is the precision High Performance Linpack uses. With FMA (fused multiply-add) instructions, each of those operations counts as 2 FLOPs, so every cycle you can do 2 FLOPs on each of 2 doubles, i.e. 4 FLOPs. At 1.5 GHz you therefore have a maximum performance per core of 6 GFLOPS. This gives a peak performance for 4 cores of 24 GFLOPS.

    Hope this helps.

    Chris
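
The arithmetic in the reply above can be written out as a short Python sketch. It only restates the figures quoted in this thread (one 128-bit vector pipeline, FMA counted as 2 FLOPs, 1.5 GHz, 4 cores); the function names `flops_per_cycle` and `r_peak_gflops` are made up for illustration and do not come from any ARM or HPL tool.

```python
def flops_per_cycle(vector_bits, element_bits, pipelines=1, fma=True):
    """FLOPs one core can retire per cycle under the thread's assumptions."""
    lanes = vector_bits // element_bits      # 128-bit vector / 64-bit double = 2 lanes
    flops_per_lane = 2 if fma else 1         # an FMA counts as a multiply plus an add
    return pipelines * lanes * flops_per_lane


def r_peak_gflops(clock_ghz, flops_per_cycle_per_core, cores):
    """Theoretical peak in GFLOPS: clock (GHz) x FLOPs/cycle/core x cores."""
    return clock_ghz * flops_per_cycle_per_core * cores


# Figures as quoted in the reply (assumed, not measured).
per_cycle = flops_per_cycle(vector_bits=128, element_bits=64)   # 4
per_core = r_peak_gflops(1.5, per_cycle, cores=1)               # 6.0
whole_chip = r_peak_gflops(1.5, per_cycle, cores=4)             # 24.0

print(f"{per_cycle} FLOPs/cycle, {per_core} GFLOPS per core, "
      f"{whole_chip} GFLOPS for 4 cores")
```

Changing `pipelines`, `fma`, or the clock shows how sensitive R_peak is to each assumption; for example, `r_peak_gflops(1.5, 3, 4)` reproduces the 18 GFLOPS estimate from the original question.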
