Cortex-A78 NEON instructions timing

udrzbar over 2 years ago

I am curious to know how far Cortex-A78 goes with concurrent execution of (some) NEON instructions.

-------------------------------------------

Example 1:

/* Flush pipeline & disable ISRs */
SCST_PREPARE_PIPELINE

    /* ABS - 128-bit operation */
    ABS     V31.2D,V0.2D    /* Pipeline V0 */
    ABS     V30.2D,V1.2D    /* Pipeline V1 */
    ABS     V29.2D,V2.2D    /* Pipeline V0 */
    ABS     V28.2D,V2.2D   /* Pipeline V1 */

I assume that lines 1 and 3 go to pipeline V0, and lines 2 and 4 go to pipeline V1.

Then I think lines 1 and 2 execute concurrently in one clock cycle, and lines 3 and 4 execute concurrently in the next clock cycle.

So the code is done in 2 clock cycles.

Is this correct?

----------------------------------------------

Example 2:

/* Flush pipeline & disable ISRs */
SCST_PREPARE_PIPELINE

    /* ABS - 64-bit operation */
    ABS     V16.2S,V3.2S    /* Pipeline V0 */
    ABS     V15.2S,V3.2S    /* Pipeline V1 */
    ABS     V14.2S,V3.2S    /* Pipeline V0 */
    ABS     V14.2S,V3.2S    /* Pipeline V1 */

I assume that lines 1 and 3 go to pipeline V0, and lines 2 and 4 go to pipeline V1.

The Cortex-A78 has two vector execution units, each 128 bits wide.

Does that mean that NEON code using 64-bit operations can execute 4 NEON instructions in one clock cycle?

In other words, is the above code done in 1 cycle?

Is this correct?

Thanks for the answer.

P.S. The code is an example of our special code, so please do not ask why we need it or why we don't write it differently.

Top replies

Zhifei Yang over 2 years ago +1 suggested

Please see the Cortex-A78 Software Optimization Guide (https://developer.arm.com/documentation/102160/latest/), Section 3.15, "ASIMD integer instructions".
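
As a rough way to apply the latency and throughput figures from that section to the two examples, here is a minimal back-of-the-envelope sketch in Python. It is not a cycle-accurate model. It assumes two vector pipelines (V0 and V1, as in the question), that each pipeline accepts one ASIMD instruction per cycle regardless of whether the operands are 64-bit or 128-bit, and that the instructions are independent of each other. The function names and the latency value of 2 are placeholders of mine; please take the real figures for ABS from the guide.

```python
import math


def issue_cycles(n_instructions, n_pipes=2):
    """Cycles needed to issue n independent instructions.

    Assumes each pipeline accepts one instruction per cycle,
    independent of the vector width (64-bit or 128-bit).
    """
    return math.ceil(n_instructions / n_pipes)


def completion_cycles(n_instructions, latency, n_pipes=2):
    """Cycles until the last result is available.

    The last group issues in cycle issue_cycles(...), and its result
    arrives `latency` cycles after that issue begins.
    """
    return issue_cycles(n_instructions, n_pipes) - 1 + latency


# Placeholder latency, to be replaced with the value from the guide.
ASSUMED_ABS_LATENCY = 2

# Four independent ABS instructions on two pipelines.
print(issue_cycles(4))
print(completion_cycles(4, ASSUMED_ABS_LATENCY))
```

Under these assumptions, four instructions need two issue cycles in both examples, because the model counts instructions per pipeline and not bits per pipeline. The sketch also ignores front-end limits and register dependencies, such as the last two instructions of the second example both writing V14, so measured timings may differ.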