
Cortex-A8 PLD

Good morning. I'm studying ARM assembly for the Cortex-A series. While reading the ARM documentation I found this document (Cortex-A8 fast memcpy examples), and my attention went to the PLD instruction, which preloads data into the cache. I have read about it in the ARM manuals, but I still don't understand how the offset was chosen:

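WordCopyPLD
      PLD [r1, #0x100]          ; hint: preload the cache line containing r1 + 256
      MOV r12, #16              ; inner loop count: 16 words = 64 bytes
WordCopyPLD1
      LDR r3, [r1], #4          ; load one word, then advance source by 4
      STR r3, [r0], #4          ; store one word, then advance destination by 4
      SUBS r12, r12, #1
      BNE WordCopyPLD1
      SUBS r2, r2, #0x40        ; 64 bytes copied in this outer iteration
      BNE WordCopyPLD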


Why is the offset in this case 0x100 = 256 bytes ahead? Since I read 16 words from the memory pointed to by R1, I expected it to be 4*16 = 64 bytes ahead. Why 256?
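To make the arithmetic concrete, here is the same loop sketched in C. This is a minimal sketch assuming GCC or Clang, where `__builtin_prefetch` is a compiler builtin that typically compiles to `PLD` on ARM targets; the names `word_copy_pld`, `iterations_ahead`, `WORDS_PER_ITER`, `BYTES_PER_ITER` and `PLD_DISTANCE` are hypothetical. Each outer iteration copies 16 words x 4 bytes = 64 bytes (0x40), so an offset of 0x100 = 256 bytes points 256 / 64 = 4 outer iterations ahead of the data being copied.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names mirroring the WordCopyPLD loop. */
#define WORDS_PER_ITER 16u                       /* MOV r12, #16 */
#define BYTES_PER_ITER (WORDS_PER_ITER * 4u)     /* 16 words * 4 bytes = 64 (0x40) */
#define PLD_DISTANCE   0x100u                    /* PLD [r1, #0x100] = 256 bytes */

/* Number of outer iterations between the data being copied and the
 * address being prefetched. */
static unsigned iterations_ahead(unsigned distance, unsigned bytes_per_iter)
{
    return distance / bytes_per_iter;
}

/* Copies nbytes from src to dst one 32-bit word at a time.
 * Assumes nbytes is a multiple of 64, as the assembly loop does. */
static void word_copy_pld(uint32_t *dst, const uint32_t *src, size_t nbytes)
{
    while (nbytes != 0) {
        /* Unlike the assembly, only form the prefetch address while it is
         * still inside the source buffer, to keep the C pointer math valid. */
        if (nbytes > PLD_DISTANCE)
            __builtin_prefetch((const char *)src + PLD_DISTANCE);
        for (unsigned i = 0; i < WORDS_PER_ITER; i++)  /* inner loop WordCopyPLD1 */
            *dst++ = *src++;
        nbytes -= BYTES_PER_ITER;                      /* SUBS r2, r2, #0x40 */
    }
}
```

The guard is only there because C forbids forming pointers far past the end of an array; `PLD` itself is a hint, so the assembly can issue it unconditionally.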

The same question with this example:

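NEONCopyPLD
      PLD [r1, #0xC0]           ; hint: preload the cache line containing r1 + 192
      VLDM r1!,{d0-d7}          ; load 8 x 8 bytes = 64 bytes, advance r1
      VSTM r0!,{d0-d7}          ; store 64 bytes, advance r0
      SUBS r2,r2,#0x40          ; 64 bytes copied in this iteration
      BGE NEONCopyPLD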

Why 0xC0 = 192 bytes ahead, if each Dn register is 8 bytes and I load 8 of them (64 bytes) per iteration?
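The NEON loop can be sketched in C the same way. Again this is a hedged sketch assuming GCC or Clang; `neon_style_copy_pld`, `BLOCK_BYTES` and `PLD_DISTANCE` are hypothetical names, and a `memcpy` of one 64-byte block through a local buffer stands in for `VLDM`/`VSTM` of d0-d7 (8 registers x 8 bytes = 64 bytes). With 64 bytes per iteration, 0xC0 = 192 bytes is 192 / 64 = 3 iterations ahead.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_BYTES  64u    /* d0-d7: 8 registers * 8 bytes = 64 (0x40) */
#define PLD_DISTANCE 0xC0u  /* PLD [r1, #0xC0] = 192 bytes */

/* Copies whole 64-byte blocks from src to dst and returns the number of
 * bytes copied. A tail shorter than one block is left uncopied. */
static size_t neon_style_copy_pld(unsigned char *dst, const unsigned char *src,
                                  size_t nbytes)
{
    size_t copied = 0;
    while (nbytes >= BLOCK_BYTES) {
        /* Hint only while the target address is still inside src. */
        if (nbytes > PLD_DISTANCE)
            __builtin_prefetch(src + PLD_DISTANCE);
        unsigned char block[BLOCK_BYTES];   /* stands in for d0-d7 */
        memcpy(block, src, BLOCK_BYTES);    /* VLDM r1!, {d0-d7} */
        memcpy(dst, block, BLOCK_BYTES);    /* VSTM r0!, {d0-d7} */
        src += BLOCK_BYTES;
        dst += BLOCK_BYTES;
        nbytes -= BLOCK_BYTES;              /* SUBS r2, r2, #0x40 */
        copied += BLOCK_BYTES;
    }
    return copied;
}
```

The loop condition here is "at least one full block remains"; the assembly's `BGE` test depends on how r2 is initialised before the loop, which the excerpt does not show.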

Thank you for any answer.
