ldr and fmla instruction timing issue

Hi awesome guy,

        I have a question about the ARM Cortex-A53 platform, and I need your help!

        I have one loop with 8 ldr operations that use independent Qn registers, and another loop with 8 fmla operations that also use independent Qn registers. The code is as follows:


The addresses in X1 and X2 are on the stack. Why does the ldr loop take twice as long as the fmla loop?

I have referred to the document "Cortex_A57_Software_Optimization_Guide_external.pdf", which gives an ldr latency of 5 and an fmla latency of 10.
