This discussion has been locked.

Stalls in floating-point calculations

Yevh Prill 10 months ago

Dear ARM Support Team,

I am reaching out to ask whether pipeline stalls can occur during the execution of vector operations (such as multiplication, addition, or subtraction) when using NEON, SVE, or SVE2 instructions on the target hardware platform.

Specifically, I am interested in the following:

Are there any stalls during sequential execution of vector arithmetic instructions?
What is the latency between dependent instructions, especially when operating on the same registers?
Does the microarchitecture apply techniques to mitigate stalls in these cases?
Does the vector length (in SVE) influence the likelihood or duration of stalls?
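To make the second question concrete, here is a minimal sketch in portable C (not taken from the original post, and not specific to any Arm core). It contrasts a reduction in which every addition depends on the previous result with one that keeps four independent accumulators. The function names `sum_single_chain` and `sum_four_chains` are hypothetical, and whether the first form actually stalls on a given core depends on that core's floating-point latency and issue width.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: each add needs the result of the previous add,
   so the loop forms a single serial dependency chain. */
float sum_single_chain(const float *a, size_t n)
{
    assert(a != NULL);
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += a[i];
    return acc;
}

/* Four accumulators: the four adds in one iteration do not depend on
   each other, so a pipelined FP unit is free to overlap them. */
float sum_four_chains(const float *a, size_t n)
{
    assert(a != NULL);
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += a[i];
        acc1 += a[i + 1];
        acc2 += a[i + 2];
        acc3 += a[i + 3];
    }
    for (; i < n; ++i)          /* leftover elements when n % 4 != 0 */
        acc0 += a[i];
    return (acc0 + acc1) + (acc2 + acc3);
}
```

The two functions add the elements in a different order, so for general inputs their results can differ in the last bits due to rounding. The same idea carries over to vector registers: a chain of operations that each consume the previous result is serial regardless of how wide the registers are.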

I would greatly appreciate any technical details or references to documentation that might provide deeper insights into how the processor handles such scenarios.

Illustrative examples to better understand the question:

Best regards,
Yevh

Stephen Theobald 10 months ago

Hi Yevh

This forum is for questions about Arm Development Studio. Your questions about pipeline stalls, etc., relate more to architectures and processors than to Arm DS, so they would be best handled by another route.

You could try posting to the Architectures and Processors forum but, given your specific interest in our latest Cortex-A320 processor in your previous post, I suggest instead that you "Open a Support Case" from the links at the bottom of this web page, and our Support team will be able to help with your enquiry.

Stephen

Yevh Prill 10 months ago in reply to Stephen Theobald

Thanks.
I have moved the topic to the Architectures and Processors forum.

Martin Weidmann 10 months ago

Adding to Stephen's answer, you might also want to look at the Software Optimisation Guide for the relevant core(s). For example, for the Cortex-A320:

https://developer.arm.com/documentation/110285/r0p1/?lang=en
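As a rough illustration of how latency and throughput figures of the kind such a guide lists might be used, the sketch below applies the general rule of thumb that keeping a pipelined unit busy takes about latency × throughput independent operations in flight. The function `chains_needed` and its parameter names are hypothetical, and no figures here are taken from the Cortex-A320 guide; the actual values would need to be looked up for the instruction of interest.

```c
#include <assert.h>

/* Estimate how many independent dependency chains are needed to hide
   the latency of a pipelined instruction.
   latency_cycles:  cycles before a dependent instruction can use the result
   issue_per_cycle: how many such instructions can start each cycle
   Both are placeholders to be filled in from the core's documentation. */
unsigned chains_needed(unsigned latency_cycles, unsigned issue_per_cycle)
{
    assert(latency_cycles > 0 && issue_per_cycle > 0);
    return latency_cycles * issue_per_cycle;
}
```

For example, with a purely illustrative latency of 4 cycles and 2 instructions issued per cycle, `chains_needed(4, 2)` returns 8, suggesting about eight independent accumulators. This is an estimate only; register pressure and other pipeline limits can lower the useful number.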
