ldr and fmla instruction time consumption issue.

Hi awesome guys,

I have a question about the ARM Cortex-A53 platform, and I need your help!

I have two loops: one performs 8 ldr operations that use independent Qn registers, and the other performs 8 fmla operations that also use independent Qn registers. The code is as follows:

and

The addresses in X1 and X2 are on the stack. Why does the ldr loop take twice as long as the fmla loop?

I have referred to the document "Cortex_A57_Software_Optimization_Guide_external.pdf", which lists the ldr latency as 5 cycles and the fmla latency as 10 cycles.

Zhifei Yang over 7 years ago

Try to force the LDR data address to be 64-bit aligned and retest. Any changes?

姑苏风河 over 7 years ago in reply to Zhifei Yang

We have tested this; the data addresses are all 64-bit aligned!

姑苏风河 over 7 years ago in reply to 姑苏风河

address a:0x7ff6ea9f00, b:0x7ff6ea9f10, c:0x7ff6ea9f20, d:0x7ff6ea9f30

By the way, the compiler targets AArch64.
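As a quick sanity check on the four addresses above, here is a minimal Python sketch that tests them against a few alignments. The helper name `is_aligned` is made up for this illustration, and the check only looks at the numeric addresses quoted in this post; it says nothing about how the hardware performs the access.

```python
# Addresses quoted in the post above.
addresses = {
    "a": 0x7ff6ea9f00,
    "b": 0x7ff6ea9f10,
    "c": 0x7ff6ea9f20,
    "d": 0x7ff6ea9f30,
}


def is_aligned(address: int, alignment_bytes: int) -> bool:
    """Return True if address is a multiple of alignment_bytes (a power of two)."""
    return address & (alignment_bytes - 1) == 0


for name, address in addresses.items():
    print(
        name,
        hex(address),
        "8-byte aligned:", is_aligned(address, 8),
        "16-byte aligned:", is_aligned(address, 16),
    )
```

All four addresses end in a zero hex digit, so they are multiples of 16 bytes (128 bits), which also makes them 64-bit aligned.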

Zhifei Yang over 7 years ago in reply to 姑苏风河

The FMLA instruction uses the SIMD hardware floating-point unit, so it is possible that load-store operations are slower than SIMD/FP instructions.

姑苏风河 over 7 years ago in reply to Zhifei Yang

The two pictures below are taken from the document "Cortex_A57_Software_Optimization_Guide_external.pdf". How should the latency be interpreted?

Or do these instructions behave differently on the A53?
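To make the question concrete, here is a rough Python sketch of one common way to read such numbers: latency is the time until a single instruction's result is available, while a run of independent instructions is also limited by how often a new one can be issued. The function `estimated_cycles` and both issue intervals are hypothetical placeholders for illustration only; they are not taken from the Cortex-A57 guide or from any Cortex-A53 documentation, and a real pipeline is more complicated than this model.

```python
def estimated_cycles(count: int, latency: int, issue_interval: int) -> int:
    """Estimate cycles for `count` independent instructions in a simple model.

    A new instruction starts every `issue_interval` cycles, and the last one
    still needs `latency` cycles before its result is available.
    """
    if count <= 0:
        return 0
    return (count - 1) * issue_interval + latency


# Latencies are the figures quoted in this thread (5 for ldr, 10 for fmla).
# The issue intervals are HYPOTHETICAL values chosen only to show the effect.
ldr_cycles = estimated_cycles(count=8, latency=5, issue_interval=2)
fmla_cycles = estimated_cycles(count=8, latency=10, issue_interval=1)

print("ldr sequence: ", ldr_cycles, "cycles (model)")
print("fmla sequence:", fmla_cycles, "cycles (model)")
```

Under these assumed intervals the instruction with the lower latency still takes longer overall, and the gap widens as the count grows, so a latency figure alone does not predict the loop time.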

Zhifei Yang over 7 years ago in reply to 姑苏风河

I did not see a similar instruction table in the Cortex-A53 Software Optimization Guide, so the latency numbers may differ between the Cortex-A53 and the Cortex-A57.

arthur_libin over 7 years ago in reply to Zhifei Yang

Can you show me the latency of Q-form fmla and Q-form ldr on the Cortex-A53? I am confused by the results when I test them.

arthur_libin over 7 years ago in reply to arthur_libin

You can find the instruction table in chapter 3, INSTRUCTION CHARACTERISTICS, of Cortex_A57_Software_Optimization_Guide_external.pdf.