Dear All,
I expected load and store instructions that access zero-wait-state memory to take only 1 cycle each (on average, with the pipeline filled), but that does not seem to be the case. Is it typical for a load or store to take at least 2 cycles even when the memory has zero wait states?
(Here, by the zero wait state memory I mean, for example, an internal RAM with operating clock freq. larger than that of the processor core.)
The test code I used is below. (I tested this on an STM32F429ZITx board.)
for (i=0; i<20000; i++) {
    data = test_data[i];
    test_data[20000-1-i] = data;
}
Below is the generated assembly code (the compiler unrolled the loop so that each pass covers two iterations; optimization options -O3 -Otime). This 14-instruction loop is measured to take 36 cycles, which is about 2.6 cycles per instruction.
0x080019E0 F8343011 LDRH r3,[r4,r1,LSL #1]
0x080019E4 F8AD3000 STRH r3,[sp,#0x00]
0x080019E8 F8BDC000 LDRH r12,[sp,#0x00]
0x080019EC 1A53 SUBS r3,r2,r1
0x080019EE F824C013 STRH r12,[r4,r3,LSL #1]
0x080019F2 EB040341 ADD r3,r4,r1,LSL #1
0x080019F6 885B LDRH r3,[r3,#0x02]
0x080019F8 F8AD3000 STRH r3,[sp,#0x00]
0x080019FC F8BDC000 LDRH r12,[sp,#0x00]
0x08001A00 1A43 SUBS r3,r0,r1
0x08001A02 F824C013 STRH r12,[r4,r3,LSL #1]
0x08001A06 1C89 ADDS r1,r1,#2
0x08001A08 42A9 CMP r1,r5
0x08001A0A D3E9 BCC 0x080019E0
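For anyone checking the arithmetic, here is a minimal sketch of how the quoted figures work out. The function names are hypothetical and only illustrate the calculation; the inputs (36 cycles, 14 instructions, two source iterations per pass) are the ones given in the post.

```c
#include <assert.h>

/* Average cycles per instruction for one pass of a measured loop. */
double cycles_per_instruction(int cycles, int instructions)
{
    assert(instructions > 0);
    return (double)cycles / (double)instructions;
}

/* Cycles per original source-loop iteration when one pass of the
   compiled loop covers `unroll` source iterations. */
double cycles_per_source_iteration(int cycles, int unroll)
{
    assert(unroll > 0);
    return (double)cycles / (double)unroll;
}
```

Calling cycles_per_instruction(36, 14) gives roughly 2.57, and cycles_per_source_iteration(36, 2) gives 18, so each iteration of the original C loop costs about 18 cycles. Note also that 8 of the 14 instructions in the listing are loads or stores (4 LDRH and 4 STRH).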
Thanks,
Junseo
JSLEE said: (Here, by the zero wait state memory I mean, for example, an internal RAM with operating clock freq. larger than that of the processor core.)
You are using CCM RAM, right? If not, then wait states apply.
See chapter 3.3 of DDI0439C; it lists all cycle counts.
STRH r3,[sp,#0x00]
1 cycle to load and execute instruction (+ wait state of instruction memory)
1 cycle to store data (+ wait state of data memory)
MOV r0,r1
1 cycle to load and execute instruction (+ wait state of instruction memory)
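The breakdown above can be written as a small cost model. This is only a sketch of that simplified view, not the core's actual timing: the type and function names are hypothetical, applying the same rule to loads as to stores is an assumption, and it ignores branch pipeline refill and any overlap between adjacent loads and stores.

```c
#include <assert.h>

/* Hypothetical classification: memory-access instructions versus the rest. */
typedef enum { INSTR_OTHER, INSTR_LOAD_STORE } instr_kind;

/* Simplified model: every instruction costs 1 cycle plus instruction-memory
   wait states; a load/store adds 1 cycle plus data-memory wait states. */
int estimated_cycles(instr_kind kind, int instr_ws, int data_ws)
{
    assert(instr_ws >= 0 && data_ws >= 0);
    int cycles = 1 + instr_ws;
    if (kind == INSTR_LOAD_STORE)
        cycles += 1 + data_ws;
    return cycles;
}

/* Sum of the per-instruction estimates for a straight-line loop body. */
int estimated_loop_cycles(int n_other, int n_load_store,
                          int instr_ws, int data_ws)
{
    return n_other * estimated_cycles(INSTR_OTHER, instr_ws, data_ws)
         + n_load_store * estimated_cycles(INSTR_LOAD_STORE, instr_ws, data_ws);
}
```

With zero wait states, estimated_cycles(INSTR_LOAD_STORE, 0, 0) is 2 and estimated_cycles(INSTR_OTHER, 0, 0) is 1. For the loop in the original post (6 other instructions and 8 loads/stores), estimated_loop_cycles(6, 8, 0, 0) gives 22, so this simple model on its own does not account for the 36 cycles measured.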
JSLEE said: I tested this on STM32F429ZITx
Have you also tested it on anything else, to see whether it's specific to that chip?