I'm not a native English speaker, so sorry for any broken English. I intend to develop a system in which a microcontroller interfaces with an 8-bit parallel port IC. The bytes will be loaded into the microcontroller with specific timing. As documented for the Cortex-M4 at http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0439b/CHDDIGAC.html , the following example normally takes three cycles in total:
LDR R0,[R1,R2]; STR R0,[R3,#20]
What is the timing of the same example on the Cortex-M7? Does it also take three cycles?
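
In case it helps frame the question: I assume I could also measure this on real hardware with the DWT cycle counter. The following is only a minimal sketch, not tested on an actual Cortex-M7. The register addresses are the ones defined by the ARMv7-M architecture; the helper names (`cyccnt_enable`, `cyccnt_elapsed`, `cycles_to_ns`, `time_load_store`) are my own, and the clock frequency in the usage note is just an example value.

```c
#include <stdint.h>

/* Debug registers defined by the ARMv7-M architecture (Cortex-M3/M4/M7). */
#define DEMCR       (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL    (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT  (*(volatile uint32_t *)0xE0001004u)

#define DEMCR_TRCENA        (1u << 24)  /* enables the DWT unit */
#define DWT_CTRL_CYCCNTENA  (1u << 0)   /* enables the cycle counter */

/* Hypothetical helper: switch on the free-running cycle counter.
 * Assumption: some Cortex-M7 devices may need an extra unlock step
 * for the DWT first; check the device reference manual. */
static inline void cyccnt_enable(void)
{
    DEMCR |= DEMCR_TRCENA;
    DWT_CYCCNT = 0u;
    DWT_CTRL |= DWT_CTRL_CYCCNTENA;
}

/* Cycles between two counter reads. Unsigned subtraction stays
 * correct across a single 32-bit wraparound of the counter. */
static inline uint32_t cyccnt_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to nanoseconds for a given core clock in Hz. */
static inline uint32_t cycles_to_ns(uint32_t cycles, uint32_t core_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000u) / core_hz);
}

/* Hypothetical measurement: time one load followed by one store.
 * The result includes the overhead of reading the counter itself,
 * which has to be measured separately and subtracted. */
static inline uint32_t time_load_store(volatile uint32_t *src,
                                       volatile uint32_t *dst)
{
    uint32_t start = DWT_CYCCNT;
    *dst = *src;                      /* load from src, store to dst */
    uint32_t end = DWT_CYCCNT;
    return cyccnt_elapsed(start, end);
}
```

For example, `cycles_to_ns(3, 100000000u)` gives 30, so three cycles at an assumed 100 MHz core clock would be 30 ns. The compiler may not emit exactly the same `LDR`/`STR` pair as in the ARM example, so the disassembly should be checked before trusting the number.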