Hi awesome guy,
I have a question about the ARM Cortex-A53 platform, and I need your help!
I have two loops: one with 8 ldr operations that use independent Qn registers, and one with 8 fmla operations that also use independent Qn registers. The code is as follows,
and
The addresses held in X1 and X2 are on the stack. Why does the ldr loop take twice as long as the fmla loop?
I have referred to the document "Cortex_A57_Software_Optimization_Guide_external.pdf", which gives the ldr latency as 5 and the fmla latency as 10.
You can find the instruction table in chapter 3, INSTRUCTION CHARACTERISTICS, of Cortex_A57_Software_Optimization_Guide_external.pdf.