Cortex-A78 NEON instructions timing

I am curious to know how far Cortex-A78 goes with concurrent execution of (some) NEON instructions.

-------------------------------------------

Example 1:

    /* Flush pipeline & disable ISRs            */
    SCST_PREPARE_PIPELINE

    /* ABS - 128-bit operation */
    ABS     V31.2D,V0.2D    /* Pipeline V0  */
    ABS     V30.2D,V1.2D    /* Pipeline V1  */
    ABS     V29.2D,V2.2D    /* Pipeline V0  */
    ABS     V28.2D,V2.2D    /* Pipeline V1  */

I assume that lines 1 and 3 go to pipeline V0, and lines 2 and 4 to pipeline V1.

Then I think lines 1 and 2 execute concurrently in one clock cycle, and lines 3 and 4 execute concurrently in the next clock cycle.

So the code is done in 2 clock cycles.

Is that correct?
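
The arithmetic behind that estimate can be sketched in C. This is only a back-of-the-envelope model, not a description of how the Cortex-A78 actually schedules instructions: it assumes the instructions are independent, that each one occupies exactly one vector pipeline for one issue cycle, and it ignores execution latency and any front-end limits. The name `issue_cycles` is made up for illustration.

```c
/* Hypothetical issue model (an assumption, not taken from Arm documentation):
 * each independent instruction takes one pipeline slot for one cycle.
 * Returns the minimum number of cycles needed to issue n_instr instructions
 * on n_pipes identical pipelines. Execution latency is not modelled. */
unsigned issue_cycles(unsigned n_instr, unsigned n_pipes)
{
    if (n_pipes == 0)
        return 0; /* degenerate input: no pipeline to issue to */
    return (n_instr + n_pipes - 1) / n_pipes; /* ceiling division */
}
```

With 4 instructions and 2 pipelines, `issue_cycles(4, 2)` gives 2, which is the figure estimated above.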

----------------------------------------------

Example 2:

    /* Flush pipeline & disable ISRs            */
    SCST_PREPARE_PIPELINE

    /* ABS - 64-bit operation */
    ABS     V16.2S,V3.2S    /* Pipeline V0  */
    ABS     V15.2S,V3.2S    /* Pipeline V1  */
    ABS     V14.2S,V3.2S    /* Pipeline V0  */
    ABS     V14.2S,V3.2S    /* Pipeline V1  */

I assume that lines 1 and 3 go to pipeline V0, and lines 2 and 4 to pipeline V1.

There are two vector execution units in Cortex-A78, each 128 bits wide.

Does that mean that NEON code using 64-bit operations can execute 4 NEON instructions in one clock cycle?

In other words, the above code would be done in 1 cycle.

Is that correct?
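
The two possible answers to this question can be written down as two simple models in C. Both are hypothetical and the function names are made up for illustration. The "slot" model assumes a pipeline accepts one instruction per cycle whatever its vector width; the "packing" model assumes a 128-bit pipeline could accept two 64-bit instructions per cycle. Neither is confirmed here; which one matches the real hardware would need to be checked against Arm's Cortex-A78 Software Optimization Guide or by measurement.

```c
/* Slot model (assumption): one instruction per pipeline per cycle,
 * regardless of whether it is a 64-bit or a 128-bit operation. */
unsigned cycles_slot_model(unsigned n_instr, unsigned n_pipes)
{
    if (n_pipes == 0)
        return 0; /* degenerate input */
    return (n_instr + n_pipes - 1) / n_pipes; /* ceiling division */
}

/* Packing model (the hypothesis in the question, also an assumption):
 * a pipe_bits-wide pipeline accepts pipe_bits / op_bits instructions
 * of width op_bits in the same cycle. */
unsigned cycles_packing_model(unsigned n_instr, unsigned n_pipes,
                              unsigned pipe_bits, unsigned op_bits)
{
    unsigned per_pipe, per_cycle;

    if (op_bits == 0)
        return 0; /* degenerate input */
    per_pipe = pipe_bits / op_bits;   /* instructions per pipeline per cycle */
    per_cycle = n_pipes * per_pipe;   /* instructions per cycle overall */
    if (per_cycle == 0)
        return 0; /* operation wider than the pipeline, or no pipelines */
    return (n_instr + per_cycle - 1) / per_cycle; /* ceiling division */
}
```

For four 64-bit ABS instructions on two 128-bit pipelines, `cycles_slot_model(4, 2)` gives 2 while `cycles_packing_model(4, 2, 128, 64)` gives 1, so the question comes down to which model the hardware follows.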

Thanks for the answer.

P.S. The code is an example of our special code; please do not ask why we need it or why we don't write it differently.
