Stalls in floating-point calculations

Dear ARM Support Team,

I am reaching out to ask whether pipeline stalls can occur during the execution of vector operations such as multiplication, addition, or subtraction when using NEON, SVE, or SVE2 instructions on the target hardware platform.

Specifically, I am interested in the following:

  • Are there any stalls during sequential execution of vector arithmetic instructions?

  • What is the latency between dependent instructions, especially when operating on the same registers?

  • Does the microarchitecture apply techniques to mitigate stalls in these cases?

  • Does the vector length (in SVE) influence the likelihood or duration of stalls?

I would greatly appreciate any technical details or references to documentation that might provide deeper insights into how the processor handles such scenarios.

Illustrative examples to better understand the question:
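
A minimal sketch of the dependent-instruction case, written in portable C rather than NEON or SVE intrinsics so that it builds on any target. The function names sum_dependent and sum_independent are hypothetical and chosen only for illustration. Whether a compiler maps these loops to vector instructions depends on the compiler and its flags, so this is only meant to show the shape of the dependency, not a specific instruction sequence.

```c
#include <stddef.h>

/* Hypothetical helper: a single accumulator. Every addition needs the
 * result of the previous one, so the additions form one serial chain. */
float sum_dependent(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];   /* depends on acc from the previous iteration */
    }
    return acc;
}

/* Hypothetical helper: four accumulators. The four additions inside one
 * iteration do not depend on each other, only on the previous iteration. */
float sum_independent(const float *a, const float *b, size_t n)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i) {      /* leftover elements when n is not a multiple of 4 */
        acc0 += a[i] * b[i];
    }
    return (acc0 + acc1) + (acc2 + acc3);
}
```

In the first function, my question is whether each addition has to wait for the full latency of the previous one. In the second, I would like to know whether the hardware can overlap the independent additions, and whether the answer changes with the SVE vector length. Note that the two functions add the products in a different order, so their results can differ slightly for general floating-point inputs.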

Best regards,
Yevh

