Hi awesome Arm folks,
I have a question about the Arm Cortex-A53 platform, and I need your help!
I have written a small program to measure parallel floating-point compute performance. The main loop consists of several "fmla" instructions whose registers have no dependencies on each other. The dual-issue behavior was not what I expected. To get dual issue, we inserted other NEON instructions whose registers are unrelated to the "fmla" instructions, for example:
fmla v0.4s, v0.4s, v20.s //line 0
ldr q30,[x1]
fmla v1.4s, v1.4s, v20.s //line 1
However, the running time increased once the "ldr" instruction was inserted. It always increases when the "ldr" is inserted, unless the first operand of the "ldr" is a general-purpose register (such as Xn). We then inserted "add v22.4s,v22.4s,v23.4s" or "str q30,[x1]" between line 0 and line 1 instead, and got the same result.
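For anyone who wants to reproduce the experiment described above, here is a minimal timing sketch in C with inline assembly. It is not the original poster's program: the helper names `now_ns` and `time_fmla_loop` are hypothetical, the last fmla operand is written as `v20.4s` (an assumption, since the post only shows `v20.s`), and the C loop adds its own branch and counter overhead. On a non-AArch64 host it compiles to a dummy loop that says nothing about the Cortex-A53.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Monotonic wall-clock time in nanoseconds. */
static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average nanoseconds per iteration of the two-fmla loop body.
 * with_ldr != 0 places "ldr q30, [buf]" between the two fmla instructions.
 * time_fmla_loop is a hypothetical helper name, not from the original post. */
static double time_fmla_loop(long iters, int with_ldr) {
    static float buf[4] __attribute__((aligned(16))) = {1.0f, 2.0f, 3.0f, 4.0f};
    assert(iters > 0);
    double t0 = now_ns();
#if defined(__aarch64__)
    if (with_ldr) {
        for (long i = 0; i < iters; i++) {
            __asm__ volatile(
                "fmla v0.4s, v0.4s, v20.4s\n\t"   /* line 0 */
                "ldr  q30, [%0]\n\t"              /* inserted NEON load */
                "fmla v1.4s, v1.4s, v20.4s\n\t"   /* line 1 */
                :
                : "r"(buf)
                : "v0", "v1", "v20", "v30", "memory");
        }
    } else {
        for (long i = 0; i < iters; i++) {
            __asm__ volatile(
                "fmla v0.4s, v0.4s, v20.4s\n\t"   /* line 0 */
                "fmla v1.4s, v1.4s, v20.4s\n\t"   /* line 1 */
                :
                :
                : "v0", "v1", "v20");
        }
    }
#else
    /* Portable fallback so the file still builds on other hosts;
     * it measures nothing about the Cortex-A53. */
    volatile float sink = 0.0f;
    for (long i = 0; i < iters; i++)
        sink = sink + buf[i & 3] + (float)with_ldr;
#endif
    return (now_ns() - t0) / (double)iters;
}
```

Calling `time_fmla_loop(100000000L, 0)` and `time_fmla_loop(100000000L, 1)` on the target and comparing the two results should show whether the inserted load costs an extra issue slot. Pinning the process to one core and fixing the CPU frequency would make the comparison more reliable.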
I referred to the document "Cortex_A57_Software_Optimization_Guide_external.pdf", which says:
ldr is issued to the "Load" pipeline,
str is issued to the "Store" pipeline,
fmla is issued to the "FP/ASIMD 0" or "FP/ASIMD 1" pipeline.
As I understand it, ldr and fmla should be able to dual-issue.
Have I misunderstood something?
Also, is there a document for the A53 that corresponds to "Cortex_A57_Software_Optimization_Guide_external.pdf"?
Unfortunately, there is no external Cortex-A53 Software Optimization Guide. But we can consider the Cortex-A53 to be similar to the Cortex-A57.
Note, however, that the Cortex-A53 is in-order with non-symmetric dual-issue of branch and data-processing instructions, while the Cortex-A57 is out-of-order (instruction fetch is in order, but instruction execution is out of order).
You can see the Pipeline Overview on page 6 of "Cortex-A57_Software_Optimization_Guide_external.pdf".
Can ldr and fmla dual-issue on the A53? If so, how can I do that?
I have confirmed that the Cortex-A53 Software Optimization Guide is included in the Cortex-A53 bundle for development. Please double-check your Cortex-A53 bundle.
Cortex-A53 can dual-issue under most circumstances. You can refer to Section 3.2 Dual Issue for that.
What is the "Cortex-A53 bundle for development"? We use an NXP platform, so we have no such bundle. Where can I get it? Besides, I would like to ask another question: https://community.arm.com/developer/tools-software/oss-platforms/f/dev-platforms-forum/12810/ldr-and-fmla-instruction-time-consumption-issue#
Please guide me!
That Cortex-A53 document is available to Cortex-A53 licensees. We cannot deliver it directly.
OK, thanks! Could you tell me the exact name of the "bundle for development"? I will request it from our vendor.
If the licensee purchased the Cortex-A53, there are many documents in the Cortex-A53 bundle released by Arm. They can find the Cortex-A53 Software Optimization Guide in that bundle.
As far as we know, the Cortex-A53 has two 64-bit NEON units, so an instruction that uses a 128-bit register such as Q/V.4s occupies both NEON units. Two instructions therefore cannot dual-issue when both of them use 128-bit Q registers.
Am I misunderstanding?
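One way to check the 64-bit versus 128-bit hypothesis above empirically is to time the same fmla sequence in its 64-bit (`.2s`) and 128-bit (`.4s`) forms. The sketch below is only an illustration under stated assumptions: `now_ns`, `time_fmla8`, and the `FMLA`/`FMLA8` macros are hypothetical names, eight independent accumulators are used so that the loop is more likely to be limited by issue rate than by fmla latency, and the non-AArch64 branch is a dummy fallback.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Monotonic wall-clock time in nanoseconds. */
static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* One fmla on accumulator vN with arrangement ARR ("2s" or "4s"). */
#define FMLA(N, ARR) "fmla v" #N "." ARR ", v" #N "." ARR ", v20." ARR "\n\t"
/* Eight fmla instructions on eight independent accumulators v0..v7. */
#define FMLA8(ARR) FMLA(0, ARR) FMLA(1, ARR) FMLA(2, ARR) FMLA(3, ARR) \
                   FMLA(4, ARR) FMLA(5, ARR) FMLA(6, ARR) FMLA(7, ARR)

/* Average nanoseconds per fmla.
 * wide != 0 uses 128-bit (.4s) operands, wide == 0 uses 64-bit (.2s).
 * time_fmla8 is a hypothetical helper name. */
static double time_fmla8(long iters, int wide) {
    assert(iters > 0);
    double t0 = now_ns();
#if defined(__aarch64__)
    if (wide) {
        for (long i = 0; i < iters; i++)
            __asm__ volatile(FMLA8("4s") : : :
                "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v20");
    } else {
        for (long i = 0; i < iters; i++)
            __asm__ volatile(FMLA8("2s") : : :
                "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v20");
    }
#else
    /* Portable fallback so the file still builds on other hosts;
     * it measures nothing about the Cortex-A53. */
    volatile float sink = 0.0f;
    for (long i = 0; i < iters; i++)
        sink = sink + (float)wide;
#endif
    return (now_ns() - t0) / (8.0 * (double)iters);
}
```

If the time per fmla from `time_fmla8(n, 0)` comes out at roughly half of `time_fmla8(n, 1)`, that would be consistent with the hypothesis; similar times would argue against it. The register contents are left uninitialized here, so zeroing v0 to v7 and v20 first would rule out any data-dependent timing.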