Hi all, after a long time I'm back on the forum with a question.
I'm posting it along with some pseudocode:
for (i = 0; i < 100; i++)
{
    instruction1
    instruction2
    instruction3
    ...
    instructionA: pld [r0]
    ...
    instructionB: vld1.16 {d0-d3}, [r0]!
    instructionN
}
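To make the loop's memory traffic concrete, here is a small C sketch of the arithmetic. It assumes the 32-byte line size quoted below, that each NEON D register is 64 bits, and that r0 starts on a line boundary (the names are mine, purely illustrative). Under those assumptions each vld1.16 {d0-d3},[r0]! consumes exactly one cache line, so the 100 iterations stream through 100 lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes assumed from the post: 32-byte L1 cache lines (Cortex-A9) and a
   vld1.16 {d0-d3} that loads four 64-bit NEON D registers. */
enum {
    CACHE_LINE_BYTES = 32,  /* 8 words x 4 bytes */
    D_REG_BYTES      = 8,   /* one D register is 64 bits */
    D_REGS_PER_LOAD  = 4    /* d0, d1, d2, d3 */
};

/* Compile-time check: one load consumes exactly one cache line. */
static_assert(D_REG_BYTES * D_REGS_PER_LOAD == CACHE_LINE_BYTES,
              "vld1.16 {d0-d3} should cover one 32-byte line");

/* Bytes loaded per iteration; also how far the ! write-back advances r0. */
size_t bytes_per_iteration(void) {
    return (size_t)D_REG_BYTES * D_REGS_PER_LOAD;
}

/* 16-bit elements loaded per iteration. */
size_t halfwords_per_iteration(void) {
    return bytes_per_iteration() / sizeof(int16_t);
}

/* Cache lines the loop streams through, assuming r0 starts line-aligned. */
size_t lines_for_loop(size_t iterations) {
    size_t total = iterations * bytes_per_iteration();
    return (total + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}
```

If r0 is not 32-byte aligned, every load straddles two lines and the loop touches one extra line overall.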
Let me describe my understanding of the PLD instruction; please correct me if I'm wrong.
The PLD instruction gives the processor a hint that the data at address r0 will be needed in the near future, so that it can fill a cache line with that data ahead of time and avoid a cache-miss penalty. It is only a hint, not a requirement, so the processor may sometimes ignore it. (Cache line size = 8 words = 32 bytes on the Cortex-A9 with a 32 KB cache; I know the cache sizes are configurable.)
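Since a cache fill always brings in a whole line, it may help to see which line a PLD on a given address refers to. This is a sketch assuming the 32-byte line size above; `line_base` and `line_offset` are hypothetical helper names, not part of any ARM API.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u  /* Cortex-A9 L1 line size quoted in the post */

/* Start address of the cache line holding addr. If the core honours
   "pld [r0]", it is this whole line that gets filled, not just r0. */
uintptr_t line_base(uintptr_t addr) {
    /* Masking is only valid because the line size is a power of two. */
    assert((CACHE_LINE_BYTES & (CACHE_LINE_BYTES - 1)) == 0);
    return addr & ~(uintptr_t)(CACHE_LINE_BYTES - 1);
}

/* Byte offset of addr within its cache line. */
unsigned line_offset(uintptr_t addr) {
    return (unsigned)(addr & (CACHE_LINE_BYTES - 1));
}
```

For example, `line_base(0x101F)` is `0x1000`, so addresses 0x1000 through 0x101F all map to the same line.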
I'd like to know the following:
1. How many instructions ahead of vld1.16 {d0-d3},[r0]! should pld [r0] be placed to see better performance (i.e. to avoid cache-miss penalties) on hardware like the PandaBoard? For example, 3 instructions or 4 instructions ahead?
2. Whenever the processor executes pld [r0], how many cache lines are filled with data: only one cache line, or more?
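A hedged C sketch touching both questions, using GCC/Clang's `__builtin_prefetch`, which on ARM is typically compiled to PLD for `rw = 0`. For question 1, it assumes the prefetch distance is better thought of in loop iterations (cache lines ahead) than in instructions, since the goal is to issue the hint roughly one memory latency before the data is used; `PF_DIST_BLOCKS = 4` is a made-up starting value that would have to be measured on the board. For question 2, `lines_spanned` shows that a single PLD names one address, so an unaligned 32-byte load can also need the next line; whether the core fetches further lines on its own is implementation-specific and not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES    32u  /* Cortex-A9 L1 line size from the post */
#define HALFWORDS_PER_BLOCK 16   /* 32 bytes = one vld1.16 {d0-d3} */
#define PF_DIST_BLOCKS      4    /* hypothetical; tune on the target */

/* Q1: sums n_blocks 32-byte blocks, prefetching the block needed
   PF_DIST_BLOCKS iterations from now instead of the one about to be
   loaded. __builtin_prefetch(p, 0, 3) is a read hint only. */
int64_t sum_blocks(const int16_t *src, size_t n_blocks) {
    assert(src != NULL || n_blocks == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n_blocks; i++) {
        /* Guard keeps the prefetch address inside the array. */
        if (i + PF_DIST_BLOCKS < n_blocks)
            __builtin_prefetch(src + (i + PF_DIST_BLOCKS) * HALFWORDS_PER_BLOCK, 0, 3);
        const int16_t *blk = src + i * HALFWORDS_PER_BLOCK;
        for (size_t k = 0; k < HALFWORDS_PER_BLOCK; k++)
            sum += blk[k];
    }
    return sum;
}

/* Q2: number of cache lines touched by len bytes starting at addr. */
size_t lines_spanned(uintptr_t addr, size_t len) {
    assert(len > 0);
    uintptr_t first = addr / CACHE_LINE_BYTES;
    uintptr_t last  = (addr + len - 1) / CACHE_LINE_BYTES;
    return (size_t)(last - first + 1);
}
```

In this sketch the first `PF_DIST_BLOCKS` blocks are never prefetched; a short prologue that prefetches them before the loop is one way to cover that.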
Will the same apply to PLDW with VST1.16? For example:
    PLDW [r1]
    VST1.16 {d0-d3},[r1]!
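For the store case, a comparable C sketch uses `__builtin_prefetch` with `rw = 1` (prefetch for write). Whether the compiler emits PLDW, PLD, or nothing for it depends on the target; cores with the ARMv7 multiprocessing extensions, such as the Cortex-A9, are the ones that can use PLDW. `fill_blocks` and `PF_DIST_BLOCKS` are hypothetical names, and the distance is again an untested placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PF_DIST_BLOCKS      4   /* hypothetical; tune on the target */
#define HALFWORDS_PER_BLOCK 16  /* 32 bytes = one vst1.16 {d0-d3} */

/* Fills n_blocks 32-byte blocks with value, hinting a write prefetch
   PF_DIST_BLOCKS blocks ahead of the current store. */
void fill_blocks(int16_t *dst, size_t n_blocks, int16_t value) {
    assert(dst != NULL || n_blocks == 0);
    for (size_t i = 0; i < n_blocks; i++) {
        /* rw = 1 marks the hint as "about to be written". */
        if (i + PF_DIST_BLOCKS < n_blocks)
            __builtin_prefetch(dst + (i + PF_DIST_BLOCKS) * HALFWORDS_PER_BLOCK, 1, 3);
        int16_t *blk = dst + i * HALFWORDS_PER_BLOCK;
        for (size_t k = 0; k < HALFWORDS_PER_BLOCK; k++)
            blk[k] = value;
    }
}
```

Checking the generated assembly (for example with `-S -mcpu=cortex-a9`) is the reliable way to see which prefetch instruction, if any, was emitted.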
What about PLI? How can I specify the address register for PLI, i.e. the register that holds the address of the instructions?