Hi Arm experts,
I have a question about the Cortex-A53 platform and would appreciate your help.
I wrote a small program to measure parallel floating-point throughput. The main loop consists of several "fmla" instructions whose registers have no dependencies on each other. The dual-issue behaviour was not what I expected, so we inserted other NEON instructions, whose registers are unrelated to those used by "fmla", hoping to get dual issue. For example:
fmla v0.4s, v0.4s, v20.s[0]   //line 0
ldr  q30, [x1]
fmla v1.4s, v1.4s, v20.s[1]   //line 1
However, the running time became longer once the "ldr" instruction was inserted. The slowdown goes away only when the first operand of the ldr is a general-purpose register (such as Xn); otherwise the running time always increases. We then tried inserting "add v22.4s,v22.4s,v23.4s" or "str q30,[x1]" between line 0 and line 1 and got the same result.
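For anyone who wants to reproduce the measurement, a minimal sketch of such a timing harness might look like the following. It is not the original poster's program: the function name `time_fmla_loop`, the buffer `buf`, the use of four accumulators (so that the loop-carried dependency on each accumulator is less likely to dominate), and the `clock_gettime` timer are all assumptions made for illustration. The inline assembly is only compiled on AArch64; on other targets the function times an empty body so the file still builds.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* MID is spliced between the first and second fmla ("line 0" and "line 1"). */
#define FMLA_LOOP(MID)                                                  \
    __asm__ volatile(                                                   \
        "movi v0.4s, #0\n\t"                                            \
        "movi v1.4s, #0\n\t"                                            \
        "movi v2.4s, #0\n\t"                                            \
        "movi v3.4s, #0\n\t"                                            \
        "fmov v20.4s, #1.0\n\t"                                         \
        "1:\n\t"                                                        \
        "fmla v0.4s, v0.4s, v20.s[0]\n\t"                               \
        MID                                                             \
        "fmla v1.4s, v1.4s, v20.s[1]\n\t"                               \
        "fmla v2.4s, v2.4s, v20.s[2]\n\t"                               \
        "fmla v3.4s, v3.4s, v20.s[3]\n\t"                               \
        "subs %[n], %[n], #1\n\t"                                       \
        "b.ne 1b\n\t"                                                   \
        : [n] "+r"(n)                                                   \
        : [p] "r"(buf)                                                  \
        : "v0", "v1", "v2", "v3", "v20", "v30", "cc", "memory")

/* Hypothetical helper: returns the elapsed nanoseconds for `iters` loop
 * iterations. with_ldr != 0 inserts "ldr q30, [buf]" after the first fmla. */
uint64_t time_fmla_loop(long iters, int with_ldr) {
    static float buf[4] __attribute__((aligned(16))) = {1.0f, 2.0f, 3.0f, 4.0f};
    long n = iters;
    if (iters <= 0) {
        return 0;
    }
    uint64_t start = now_ns();
#if defined(__aarch64__)
    if (with_ldr) {
        FMLA_LOOP("ldr q30, [%[p]]\n\t");
    } else {
        FMLA_LOOP("");
    }
#else
    /* Not an AArch64 build: nothing to measure, keep the compiler quiet. */
    (void)n;
    (void)buf;
    (void)with_ldr;
#endif
    return now_ns() - start;
}
```

Calling `time_fmla_loop(N, 0)` and `time_fmla_loop(N, 1)` with the same large `N` and comparing the two results gives the kind of with/without comparison described above. Other instructions can be tried by changing the string passed to `FMLA_LOOP`.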
I referred to the document "Cortex_A57_Software_Optimization_Guide_external.pdf", which states:
ldr is issued by the "Load" pipeline,
str is issued by the "Store" pipeline,
fmla is issued by the "FP/ASIMD 0" or "FP/ASIMD 1" pipeline.
As I understand it, ldr and fmla should therefore be able to dual-issue.
Have I misunderstood something?
Also, is there a document for the A53 that corresponds to "Cortex_A57_Software_Optimization_Guide_external.pdf"?
Thanks!
What is the "Cortex-A53 bundle for development"? We use an NXP platform, so we do not have this bundle. Where can I get it? Besides, I would like to ask another question: https://community.arm.com/developer/tools-software/oss-platforms/f/dev-platforms-forum/12810/ldr-and-fmla-instruction-time-consumption-issue#
Please guide me!
That Cortex-A53 document is available only to Cortex-A53 licensees. We cannot deliver it directly.
OK, thanks! Could you tell me the exact name of the "bundle for development"? I will request it from our vendor.
If the licensee purchased Cortex-A53, the Cortex-A53 bundle released by Arm contains many documents. They can find the Cortex-A53 Software Optimization Guide in that bundle.
As far as we know, Cortex-A53 has two 64-bit NEON units, so an instruction that uses a 128-bit register such as Q/V.4s occupies both of them. It therefore cannot dual-issue when one instruction uses a 128-bit Q register and the other one also uses a 128-bit Q register.
Have I misunderstood?
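One way to check this hypothesis on hardware would be to time the same fmla sequence with 64-bit (.2s) and with 128-bit (.4s) operands and compare. The sketch below is only an illustration under stated assumptions: the name `time_fmla_width`, the four accumulators, and the `clock_gettime` timer are not from this thread, and the result says nothing by itself about why any difference appears. The inline assembly is only compiled on AArch64; elsewhere the function times an empty body.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* ARR is the vector arrangement: "2s" (64-bit) or "4s" (128-bit). */
#define FMLA_WIDTH_LOOP(ARR)                                            \
    __asm__ volatile(                                                   \
        "movi v0.4s, #0\n\t"                                            \
        "movi v1.4s, #0\n\t"                                            \
        "movi v2.4s, #0\n\t"                                            \
        "movi v3.4s, #0\n\t"                                            \
        "fmov v20.4s, #1.0\n\t"                                         \
        "1:\n\t"                                                        \
        "fmla v0." ARR ", v0." ARR ", v20.s[0]\n\t"                     \
        "fmla v1." ARR ", v1." ARR ", v20.s[1]\n\t"                     \
        "fmla v2." ARR ", v2." ARR ", v20.s[2]\n\t"                     \
        "fmla v3." ARR ", v3." ARR ", v20.s[3]\n\t"                     \
        "subs %[n], %[n], #1\n\t"                                       \
        "b.ne 1b\n\t"                                                   \
        : [n] "+r"(n)                                                   \
        :                                                               \
        : "v0", "v1", "v2", "v3", "v20", "cc")

/* Hypothetical helper: elapsed nanoseconds for `iters` iterations of four
 * independent fmla instructions. wide != 0 selects .4s, otherwise .2s. */
uint64_t time_fmla_width(long iters, int wide) {
    long n = iters;
    if (iters <= 0) {
        return 0;
    }
    uint64_t start = now_ns();
#if defined(__aarch64__)
    if (wide) {
        FMLA_WIDTH_LOOP("4s");
    } else {
        FMLA_WIDTH_LOOP("2s");
    }
#else
    /* Not an AArch64 build: nothing to measure, keep the compiler quiet. */
    (void)n;
    (void)wide;
#endif
    return now_ns() - start;
}
```

If the .2s loop runs in roughly half the time of the .4s loop for the same iteration count, that would be consistent with the idea above, though other explanations are possible and the Cortex-A53 Software Optimization Guide remains the authoritative source.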