Cortex A8 PLD

Good morning. I'm studying ARM assembly for the Cortex-A series. While reading the ARM documentation I found this paper (Cortex-A8, fast memcpy examples). My attention went to the PLD instruction, which preloads data into the cache. I have read about it in the ARM manuals, but I still don't understand why the offset is chosen this way:

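WordCopyPLD
      PLD [r1, #0x100]        ; preload the cache line at r1 + 256 bytes
      MOV r12, #16            ; inner loop count: 16 words = 64 bytes
WordCopyPLD1
      LDR r3, [r1], #4        ; load one word from source, then r1 += 4
      STR r3, [r0], #4        ; store it to destination, then r0 += 4
      SUBS r12, r12, #1       ; one word fewer in this 64-byte block
      BNE WordCopyPLD1
      SUBS r2, r2, #0x40      ; 64 bytes fewer left to copy
      BNE WordCopyPLD         ; repeat until the byte count in r2 reaches zero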


Why is the offset in this case 256 bytes (0x100) ahead? If I read a word from the memory pointed to by R1 16 times, I would expect the distance ahead to be 4*16 = 64 bytes. Why 256?

The same question with this example:

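NEONCopyPLD
      PLD [r1, #0xC0]         ; preload the cache line at r1 + 192 bytes
      VLDM r1!,{d0-d7}        ; load 8 x 8 = 64 bytes, then r1 += 64
      VSTM r0!,{d0-d7}        ; store 64 bytes, then r0 += 64
      SUBS r2,r2,#0x40        ; 64 bytes fewer left to copy
      BGE NEONCopyPLD         ; repeat while the count has not gone negative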

Why 192 bytes ahead, if each Dn register is 8 bytes and I load 8 registers per iteration?
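
Written out as plain arithmetic, the numbers in the two listings can be checked with a few lines of C. This is only a sketch: it assumes the 64-byte cache line of the Cortex-A8, and the helper names bytes_per_pass and lines_ahead are made up for this example.

```c
#include <assert.h>

/* Bytes consumed by one pass of a copy loop: loads per pass * bytes per load. */
static unsigned bytes_per_pass(unsigned loads, unsigned bytes_per_load)
{
    return loads * bytes_per_load;
}

/* Whole cache lines between the current read pointer and the PLD target.
 * line_bytes is an assumption supplied by the caller (64 on the Cortex-A8). */
static unsigned lines_ahead(unsigned pld_offset_bytes, unsigned line_bytes)
{
    assert(line_bytes != 0);
    return pld_offset_bytes / line_bytes;
}
```

Calling bytes_per_pass(16, 4) for the word loop and bytes_per_pass(8, 8) for the NEON loop gives 64 bytes in both cases, while lines_ahead(0x100, 64) and lines_ahead(0xC0, 64) give 4 and 3. So each PLD targets data several passes ahead of the loads, not the data of the current pass.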

Thank you for any answer.

  • Hi, I am speculating. 1st case: when the cache fills the current line (triggered by the LDR), it will also do a speculative access to the next cache line (64 bytes in size), so the PLD tells it to do the same for the line after that.

    2nd case: I have no idea.
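
The general idea of preloading a fixed distance ahead of the read pointer can be sketched in C. This is a minimal illustration, not the code from the ARM paper: copy_with_prefetch, PREFETCH_DISTANCE and CHUNK_BYTES are hypothetical names, the 256-byte distance is simply borrowed from the first listing, and __builtin_prefetch is a GCC/Clang builtin that compilers targeting ARM usually turn into a PLD.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tuning values: copy 64 bytes per pass, hint 256 bytes ahead. */
enum { CHUNK_BYTES = 64, PREFETCH_DISTANCE = 256 };

/* Copy n bytes, hinting the cache about source data needed a few passes later. */
static void copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);

    while (n >= CHUNK_BYTES) {
        /* Only form the address while it still lies inside the source buffer. */
        if (n > PREFETCH_DISTANCE)
            __builtin_prefetch(s + PREFETCH_DISTANCE);

        memcpy(d, s, CHUNK_BYTES);   /* the actual 64-byte copy for this pass */
        d += CHUNK_BYTES;
        s += CHUNK_BYTES;
        n -= CHUNK_BYTES;
    }
    memcpy(d, s, n);                 /* tail shorter than one chunk */
}
```

The prefetch is only a hint, so the copy is correct whatever distance is chosen; the distance affects speed only, and a suitable value has to be found by measurement on the target.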
