
Dual-issue problem with NEON instructions on Cortex-A53

Hi ARM experts,

I have a question about the ARM Cortex-A53 platform, and I need your help!

I have written a small program to measure parallel floating-point compute performance. The main loop consists of several "fmla" instructions whose registers have no dependencies on each other. To try to get dual issue, we inserted other NEON instructions whose registers are unrelated to the "fmla" instructions, but the dual-issue behavior was not what I expected. For example:

        fmla v0.4s, v0.4s, v20.s[0]     // line 0
        ldr  q30, [x1]
        fmla v1.4s, v1.4s, v20.s[1]     // line 1

However, the running time increased once the "ldr" instruction was inserted. The only case in which it did not increase was when the destination operand of "ldr" was a general-purpose register (such as Xn); with a SIMD destination register the running time always increased. When we instead inserted "add v22.4s, v22.4s, v23.4s" or "str q30, [x1]" between line 0 and line 1, we got the same result.
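The comparison described above could be reproduced with a small timing harness along the following lines. This is only a sketch under stated assumptions, not the original test program: the function names `time_fmla_only`, `time_fmla_ldr` and `now_ns`, the buffer `load_buf`, the zero-initialization of the vector registers, and the non-AArch64 fallback loops are all hypothetical, and GCC/Clang-style inline assembly is assumed. The loop-control instructions (`subs`, `b.ne`) are part of both variants, so only the difference between the two timings is meaningful.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict -std= modes */
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: monotonic wall-clock time in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* 16-byte buffer read by the inserted "ldr q30" (illustrative only). */
static float load_buf[4] __attribute__((aligned(16)));

/* Variant A: only the two independent fmla instructions per iteration. */
uint64_t time_fmla_only(uint64_t iters) {
    if (iters == 0) return 0;
    uint64_t n = iters;
    uint64_t t0 = now_ns();
#if defined(__aarch64__)
    __asm__ volatile(
        "movi v0.4s, #0\n\t"                /* zero inputs so values stay finite */
        "movi v1.4s, #0\n\t"
        "movi v20.4s, #0\n\t"
        "1:\n\t"
        "fmla v0.4s, v0.4s, v20.s[0]\n\t"   /* line 0 */
        "fmla v1.4s, v1.4s, v20.s[1]\n\t"   /* line 1 */
        "subs %0, %0, #1\n\t"
        "b.ne 1b\n\t"
        : "+r"(n)
        :
        : "v0", "v1", "v20", "cc");
#else
    /* Placeholder so the sketch builds on other hosts; not a NEON measurement. */
    volatile float a0 = 0.0f, a1 = 0.0f;
    while (n--) { a0 += a0 * 0.5f; a1 += a1 * 0.5f; }
#endif
    return now_ns() - t0;
}

/* Variant B: same loop with "ldr q30, [x]" inserted between line 0 and line 1. */
uint64_t time_fmla_ldr(uint64_t iters) {
    if (iters == 0) return 0;
    uint64_t n = iters;
    uint64_t t0 = now_ns();
#if defined(__aarch64__)
    __asm__ volatile(
        "movi v0.4s, #0\n\t"
        "movi v1.4s, #0\n\t"
        "movi v20.4s, #0\n\t"
        "1:\n\t"
        "fmla v0.4s, v0.4s, v20.s[0]\n\t"   /* line 0 */
        "ldr  q30, [%1]\n\t"                /* inserted load, unrelated register */
        "fmla v1.4s, v1.4s, v20.s[1]\n\t"   /* line 1 */
        "subs %0, %0, #1\n\t"
        "b.ne 1b\n\t"
        : "+r"(n)
        : "r"(load_buf)
        : "v0", "v1", "v20", "v30", "cc", "memory");
#else
    /* Placeholder so the sketch builds on other hosts; not a NEON measurement. */
    volatile float a0 = 0.0f, a1 = 0.0f, ld = 0.0f;
    while (n--) { a0 += a0 * 0.5f; ld = load_buf[0]; a1 += a1 * 0.5f; }
#endif
    return now_ns() - t0;
}
```

Calling both functions with the same, large iteration count on the target board and dividing each result by that count gives a rough per-iteration cost. If the inserted load were issued together with an "fmla", the two figures would be expected to be close.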

I referred to the document "Cortex_A57_Software_Optimization_Guide_external.pdf", which states the following:

ldr is issued through the "Load" pipeline,

str is issued through the "Store" pipeline,

fmla is issued through the "FP/ASIMD 0" or "FP/ASIMD 1" pipeline.

As I understand it, ldr and fmla should therefore be able to dual-issue.

Have I misunderstood something?

Also, is there a document for the A53 that corresponds to "Cortex_A57_Software_Optimization_Guide_external.pdf"?

Thanks!


  • Unfortunately, there is no external Cortex-A53 Software Optimization Guide. However, Cortex-A53 can be considered similar to Cortex-A57.

    Note, though, that Cortex-A53 is an in-order core with asymmetric dual issue of branch and data-processing instructions, whereas Cortex-A57 is out-of-order (instruction fetch is in order, but instruction execution is out of order).

    See the Pipeline Overview on page 6 of "Cortex-A57_Software_Optimization_Guide_external.pdf".
