This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

ARM Cortex-A72 64-bit multiply (MADD) instruction low throughput

Hi, I've been benchmarking performance of Cortex-A72 CPU on Raspberry Pi 4 Model B Rev 1.1. It looks like the throughput of int64 multiply (MADD) instruction is about 1/3rd of multiply instructions for int32, float and double C data types on the same hardware.

I've posted the same question on NetBSD arm mailing list. More details can be found here: http://mail-index.netbsd.org/port-arm/2020/04/15/msg006614.html

Is this expected at all? Anyone knows why int64 multiply is so much slower compared to other data types?