I'm single-stepping and checking cycle counts with the DWT cycle counter on a CM4.
Question: why does the second LDR here take 6 cycles? Is it because of flash vs. SRAM? And is a flash read always 6 cycles?
And is that count in processor cycles, not some slower bus cycles, for calculating the actual time taken? [bah, more questions ...]
131 cycles_ave /= N; <-- N = 20
01000966: 683B ldr r3, [r7] <-- ok, nice 2 cycles, R7 contains SRAM address (the running counter ..)
01000968: 4A0D ldr r2, [pc, #0x34] <--- 6 cycles here .. pc+x34 = 010009A2 is flash ... Ok, my constant N load ..?
0100096a: FBA22303 umull r2, r3, r2, r3
0100096e: 091B lsrs r3, r3, #4
01000970: 603B str r3, [r7]
...........
010009a0: CCCD ldm r4!, {r0, r2, r3, r6, r7} <---- this is weird, so above it loads 0xCCCCCCCD into r2 ( ??? I sort of hoped to see a 20/x14 somewhere .. )
010009a2: CCCC ldm r4!, {r2, r3, r6, r7}
............
Overall, of course, it generates the correct result, but I'm lost as to what it's doing. Some compiler magic that needs translating for me ..
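For anyone puzzled by the same listing: the 0xCCCCCCCD constant looks like the usual "multiply by a reciprocal" replacement for a division by a constant. 0xCCCCCCCD is ceil(2^36 / 20), so the upper half of the 64-bit product shifted right by 4 equals x / 20, which would explain why no 20 (0x14) appears anywhere. The two halfwords the disassembler shows as ldm at 010009a0 and 010009a2 would then be that literal, not instructions. A minimal C sketch of the same arithmetic, assuming unsigned 32-bit operands (the names div20_magic and check_div20 are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ceil(2^36 / 20): the value the listing loads from the literal pool. */
#define DIV20_MAGIC 0xCCCCCCCDu

/* x / 20 without a divide instruction, mirroring umull + lsrs in the listing. */
static inline uint32_t div20_magic(uint32_t x)
{
    uint64_t product = (uint64_t)x * DIV20_MAGIC; /* umull r2, r3, r2, r3 */
    uint32_t high    = (uint32_t)(product >> 32); /* r3 = upper 32 bits   */
    return high >> 4;                             /* lsrs r3, r3, #4      */
}

/* Spot-check the multiply-and-shift against a plain division. */
static inline void check_div20(void)
{
    const uint32_t samples[] = {0u, 19u, 20u, 39u, 40u, 2000u, 0xFFFFFFFFu};
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        assert(div20_magic(samples[i]) == samples[i] / 20u);
    }
}
```

The total shift is 32 + 4 = 36 bits, matching the 2^36 in the constant. Compilers tend to choose this form because a multiply is usually cheaper than a divide.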
First: there is a button to insert code! And yes, normally reading from flash takes longer than reading from SRAM. But this is SoC-specific, not CM4-specific.
Yeah, I had this as an afterthought about it being SoC-specific; I need to check what the chip's documentation says about flash!
Hmm, if I reduce my processor clock, will the cycles to read flash go down as well, or will it affect the bus speed, and so the flash read speed too, with the result that this won't change?
d.ry said: if I reduce my processor clock, will the cycles to read flash go down as well
As noted, this is chip-specific - not an ARM thing.
On some chips, wait states have to be added to flash accesses at higher clock rates, so reducing the clock might allow you to reduce or remove the wait states ...
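To make the wait-state point concrete, here is a rough sketch under a simplifying assumption: the flash needs a fixed access time in nanoseconds, and the controller inserts just enough wait states for that time to fit into whole clock periods. The function name and the access-time parameter are hypothetical; the real figures and the way wait states are configured come from the chip's datasheet, not from the Cortex-M4 documentation.

```c
#include <stdint.h>

/* Wait states needed so that (1 + ws) clock periods cover the flash access time.
 * flash_access_ns is a HYPOTHETICAL datasheet figure, not taken from any real chip. */
static inline uint32_t flash_wait_states(uint32_t core_hz, uint32_t flash_access_ns)
{
    /* Clock periods needed = ceil(flash_access_ns * core_hz / 1e9). */
    uint64_t scaled = (uint64_t)flash_access_ns * core_hz;
    uint64_t cycles = (scaled + 999999999u) / 1000000000u;

    /* The first period is the access itself; the rest are wait states. */
    return cycles > 1u ? (uint32_t)(cycles - 1u) : 0u;
}
```

With an assumed 40 ns access time this gives 4 wait states at 120 MHz and none at 24 MHz. Under this model, lowering the clock reduces the cycle count of a flash read while the read takes roughly the same time in nanoseconds.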
42Bastian Schick said: There is a button to insert code!
This:
Thanks Andy,
I definitely plan to try that.
d.ry said: Yeah, I had this as an afterthought about it being SoC-specific; I need to check what the chip's documentation says about flash!
Just in case someone "drops" into this thread: the Cortex-M4 manual says:
"3.3.1 Cortex-M4 instructions
The processor implements the ARMv7-M Thumb instruction set. Table 3-1 shows the Cortex-M4 instructions and their cycle counts. The cycle counts are based on a system with zero wait states."
Different buses have different wait states. The higher the CPU clock, the higher the probability of wait states.
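On the earlier question of turning counts into time: the DWT cycle counter counts processor clock cycles, so a measured count already includes any wait-state stalls, and dividing by the core clock gives the elapsed time. A small sketch of that conversion, where the function names are invented for illustration and core_hz is assumed to be the actual (non-zero) core clock of your part:

```c
#include <stdint.h>

/* Wrap-safe difference between two reads of a free-running 32-bit cycle counter. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start; /* unsigned arithmetic wraps modulo 2^32 */
}

/* Convert core clock cycles to nanoseconds, rounded to the nearest ns.
 * core_hz must be non-zero. */
static inline uint64_t cycles_to_ns(uint32_t cycles, uint32_t core_hz)
{
    return ((uint64_t)cycles * 1000000000u + core_hz / 2u) / core_hz;
}
```

For example, the 6-cycle LDR at an assumed 100 MHz core clock would correspond to 60 ns.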
You probably meant to point to here ...
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0439b/CHDDIGAC.html
42Bastian Schick Andy Neil:
I'm reading the CM4 Technical Reference Manual, http://infocenter.arm.com/help/topic/com.arm.doc.ddi0439b/DDI0439B_cortex_m4_r0p0_trm.pdf
Again, sorry for the copy and paste:
Table 3-1
Word LDR Rd, [Rn, op2] 2 cycles (note b)
Let's ignore note b, about sequential LDRs.
If op2 is Rm, another register to be added to Rn, how can LDR take anything less than 3 cycles? When is the address, Rn + Rm, calculated?
The ARM7TDMI spec shows LDR as a 3-cycle op. What changed in ARMv7-M to make it 2 cycles and, it seems, avoid the need for a cycle to calculate the address?
d.ry said: The ARM7TDMI spec shows LDR as a 3-cycle op. What changed in ARMv7-M to make it 2 cycles and, it seems, avoid the need for a cycle to calculate the address?
You should address this question to the designers of the core.