
Architectures and Processors forum ARM Cortex-A72 64-bit multiply (MADD) instruction low throughput

State: Accepted Answer, Locked, 8 replies


Sad Clouds over 5 years ago

Hi, I've been benchmarking the Cortex-A72 CPU on a Raspberry Pi 4 Model B Rev 1.1. The throughput of the int64 multiply (MADD) instruction appears to be only about one third of the throughput of the multiply instructions for the int32, float and double C data types on the same hardware.

I've posted the same question on NetBSD arm mailing list. More details can be found here: http://mail-index.netbsd.org/port-arm/2020/04/15/msg006614.html

Is this expected? Does anyone know why int64 multiply is so much slower than multiply for the other data types?
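To reproduce the kind of comparison described in the question, a throughput microbenchmark has to keep several independent multiplies in flight, so that the loop is limited by how often the core can issue a multiply rather than by how long each one takes. Below is a minimal C sketch under stated assumptions: GCC or Clang (for the empty `__asm__` barrier), POSIX `clock_gettime`, and a compiler that turns `a * m + b` into MADD on AArch64, which is worth confirming with `objdump -d`. The function names are hypothetical and not taken from the original benchmark.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime / CLOCK_MONOTONIC */
#include <stdint.h>
#include <time.h>

/* Empty GCC/Clang asm: emits no instructions, but forces the value into a
   general-purpose register and makes it opaque to the optimizer, so the loop
   is not vectorized, folded, or strength-reduced into shifts and adds. */
#define KEEP_IN_GPR(v) __asm__ __volatile__("" : "+r"(v))

/* Four independent 64-bit multiply-add chains. The chains do not depend on
   each other, so the core can overlap them and the loop should run at the
   multiplier's issue rate rather than at its latency. */
uint64_t madd_chains_u64(uint64_t iters, uint64_t m, uint64_t b)
{
    uint64_t a0 = 1, a1 = 2, a2 = 3, a3 = 4;
    KEEP_IN_GPR(m); /* hide the constants from the optimizer */
    KEEP_IN_GPR(b);
    for (uint64_t i = 0; i < iters; i++) {
        a0 = a0 * m + b;
        a1 = a1 * m + b;
        a2 = a2 * m + b;
        a3 = a3 * m + b;
        KEEP_IN_GPR(a0);
        KEEP_IN_GPR(a1);
        KEEP_IN_GPR(a2);
        KEEP_IN_GPR(a3);
    }
    return a0 + a1 + a2 + a3; /* checksum so the work is observable */
}

/* Same loop on 32-bit values (W-register multiplies on AArch64). */
uint32_t madd_chains_u32(uint64_t iters, uint32_t m, uint32_t b)
{
    uint32_t a0 = 1, a1 = 2, a2 = 3, a3 = 4;
    KEEP_IN_GPR(m);
    KEEP_IN_GPR(b);
    for (uint64_t i = 0; i < iters; i++) {
        a0 = a0 * m + b;
        a1 = a1 * m + b;
        a2 = a2 * m + b;
        a3 = a3 * m + b;
        KEEP_IN_GPR(a0);
        KEEP_IN_GPR(a1);
        KEEP_IN_GPR(a2);
        KEEP_IN_GPR(a3);
    }
    return a0 + a1 + a2 + a3; /* wraps modulo 2^32 */
}

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

static volatile uint64_t sink; /* keeps each checksum alive */

/* Multiply-adds per second (four per loop iteration). */
double madd_rate_u64(uint64_t iters)
{
    double t0 = now_seconds();
    sink = madd_chains_u64(iters, 3, 1);
    double t1 = now_seconds();
    return 4.0 * (double)iters / (t1 - t0);
}

double madd_rate_u32(uint64_t iters)
{
    double t0 = now_seconds();
    sink = madd_chains_u32(iters, 3, 1);
    double t1 = now_seconds();
    return 4.0 * (double)iters / (t1 - t0);
}
```

Calling `madd_rate_u32` and `madd_rate_u64` from a small `main` with a large iteration count (for example 100000000) and dividing the two rates gives an int32/int64 ratio comparable to the one in the question. Four chains is an assumption about what is "enough": it should hide a multiply latency of a few cycles, but if the 32-bit rate keeps rising when more chains are added, the loop was still latency-bound.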

Top replies

42Bastian Schick over 5 years ago, in reply to vstehle (verified answer)

vstehle said: As per the Cortex-A72 Software Optimization Guide, the MUL instruction has a throughput of 1 per cycle. The doc shows that MADD has a latency of 5 for 64-bit compared to 3 for 32-bit and...
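The quoted reply separates latency (5 cycles for 64-bit versus 3 for 32-bit) from throughput. A single dependent chain, where each multiply-add consumes the previous result, exposes latency rather than issue rate. The following is a minimal C sketch of such a latency probe, assuming GCC or Clang (for the empty `__asm__` barrier), POSIX `clock_gettime`, and a compiler that emits MADD for `a * m + b` on AArch64 (an expectation to verify with `objdump -d`, not a guarantee). All function names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime / CLOCK_MONOTONIC */
#include <stdint.h>
#include <time.h>

/* Empty GCC/Clang asm: no instructions emitted, but the compiler must treat
   the value as unknown, so it cannot fold or strength-reduce the multiply. */
#define KEEP_IN_GPR(v) __asm__ __volatile__("" : "+r"(v))

/* One dependent 64-bit chain: every multiply-add needs the previous result,
   so each iteration waits for the full multiply latency. */
uint64_t madd_dep_chain_u64(uint64_t iters, uint64_t m, uint64_t b)
{
    uint64_t a = 1;
    KEEP_IN_GPR(m);
    KEEP_IN_GPR(b);
    for (uint64_t i = 0; i < iters; i++) {
        a = a * m + b;
        KEEP_IN_GPR(a);
    }
    return a;
}

/* Same dependent chain on 32-bit values; arithmetic wraps modulo 2^32. */
uint32_t madd_dep_chain_u32(uint64_t iters, uint32_t m, uint32_t b)
{
    uint32_t a = 1;
    KEEP_IN_GPR(m);
    KEEP_IN_GPR(b);
    for (uint64_t i = 0; i < iters; i++) {
        a = a * m + b;
        KEEP_IN_GPR(a);
    }
    return a;
}

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

static volatile uint64_t sink; /* keeps each result alive */

/* Average nanoseconds per dependent multiply-add. Multiplying by the core
   clock in GHz gives an approximate latency in cycles, including any loop
   overhead that is not hidden behind the multiply. */
double ns_per_dep_madd_u64(uint64_t iters)
{
    double t0 = now_seconds();
    sink = madd_dep_chain_u64(iters, 3, 1);
    double t1 = now_seconds();
    return (t1 - t0) * 1e9 / (double)iters;
}

double ns_per_dep_madd_u32(uint64_t iters)
{
    double t0 = now_seconds();
    sink = madd_dep_chain_u32(iters, 3, 1);
    double t1 = now_seconds();
    return (t1 - t0) * 1e9 / (double)iters;
}
```

If the latencies quoted from the optimization guide hold and the loop overhead stays hidden, the 64-bit and 32-bit figures from this probe would be expected to differ by roughly 5:3. A larger gap in a throughput test with many independent chains would point to the issue rate, not the latency, as the limiting factor.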