Is it typical for loads from and stores to zero-wait-state memory to take at least 2 cycles?

Dear All,

I expected load and store instructions that access zero-wait-state memory to take only 1 cycle each (on average, with the pipeline filled), but that does not appear to be the case. Is it typical for loads and stores to take at least 2 cycles even when the memory has zero wait states?

(By zero-wait-state memory I mean, for example, an internal RAM whose operating clock frequency is higher than that of the processor core.)

Below are the test code I used and the assembly generated for it. (I ran the test on an STM32F429ZITx board.)

  for (i=0; i<20000; i++) {
    data = test_data[i];
    test_data[20000-1-i] = data;
  }
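
For anyone who wants to reproduce the loop, here is a minimal self-contained sketch. The declarations are assumptions, since they are not shown in the fragment above: 16-bit unsigned elements (suggested by the LDRH/STRH accesses in the generated assembly) and a volatile local variable data (which would explain the store to and reload from [sp]). The helper names fill_pattern, run_test_loop, and check_mirrored are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define N 20000u  /* element count, taken from the loop bound in the post */

/* Assumed declaration: 16-bit unsigned elements. */
uint16_t test_data[N];

/* Fill the array with a known pattern so the result can be checked. */
void fill_pattern(void)
{
    for (uint32_t i = 0; i < N; i++) {
        test_data[i] = (uint16_t)i;   /* N - 1 fits in 16 bits */
    }
}

/* The loop from the post, wrapped in a function. */
void run_test_loop(void)
{
    volatile uint16_t data;   /* assumption: volatile forces the stack round trip */
    uint32_t i;               /* assumption: unsigned loop counter */

    for (i = 0; i < N; i++) {
        data = test_data[i];
        test_data[N - 1u - i] = data;
    }
}

/* After the loop, element i and element N-1-i should hold the same value. */
void check_mirrored(void)
{
    for (uint32_t i = 0; i < N / 2u; i++) {
        assert(test_data[N - 1u - i] == test_data[i]);
    }
}
```

Calling fill_pattern(), then run_test_loop(), then check_mirrored() should complete without an assertion failure: the first 10000 passes copy the lower half of the array into the upper half in reverse order, and the remaining passes write those same values back.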


Below is the generated assembly (compiled with -O3 -Otime; the compiler unrolled the loop so that each pass handles two iterations). I measured this 14-instruction loop at 36 cycles per pass, which works out to about 2.6 cycles per instruction.

0x080019E0 F8343011 LDRH     r3,[r4,r1,LSL #1]
0x080019E4 F8AD3000 STRH     r3,[sp,#0x00]
0x080019E8 F8BDC000 LDRH     r12,[sp,#0x00]
0x080019EC 1A53     SUBS     r3,r2,r1
0x080019EE F824C013 STRH     r12,[r4,r3,LSL #1]
0x080019F2 EB040341 ADD      r3,r4,r1,LSL #1
0x080019F6 885B     LDRH     r3,[r3,#0x02]
0x080019F8 F8AD3000 STRH     r3,[sp,#0x00]
0x080019FC F8BDC000 LDRH     r12,[sp,#0x00]
0x08001A00 1A43     SUBS     r3,r0,r1
0x08001A02 F824C013 STRH     r12,[r4,r3,LSL #1]
0x08001A06 1C89     ADDS     r1,r1,#2
0x08001A08 42A9     CMP      r1,r5
0x08001A0A D3E9     BCC      0x080019E0
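
The 2.6 figure is the measured cycle count divided by the instruction count. Here is a small sketch of that arithmetic, using only the numbers quoted above (36 cycles, 14 instructions, two array elements per pass of the unrolled loop). The function names are hypothetical.

```c
#include <assert.h>

/* Average cycles per instruction for one pass of the loop. */
double cycles_per_instruction(int cycles, int instructions)
{
    assert(instructions > 0);
    return (double)cycles / (double)instructions;
}

/* Cycles spent per array element copied, given the unroll factor. */
int cycles_per_element(int cycles, int elements_per_pass)
{
    assert(elements_per_pass > 0);
    return cycles / elements_per_pass;
}
```

With the measured values, cycles_per_instruction(36, 14) gives roughly 2.57 and cycles_per_element(36, 2) gives 18. Of the 14 instructions in the listing, 8 are LDRH or STRH accesses.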


