
Cortex-M7 load/store execution timing?

I'm not a native English speaker, so sorry for my broken English. I intend to develop a system in which the microcontroller interfaces with an 8-bit parallel port IC. The bytes must be loaded into the microcontroller with specific timing. The Cortex-M4 documentation at http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0439b/CHDDIGAC.html gives this example: LDR R0,[R1,R2]; STR R0,[R3,#20] - normally three cycles total.

Is the timing for this example the same on the Cortex-M7?

  • Hi,


    The document says that a pair of "LDR and STR" executes in 3 cycles.
    On the Cortex-M7 it is basically the same.
    However, the Cortex-M7 can execute each successive "LDR and STR" pair in 1 additional cycle.
    That is:
    1) LDR & STR                         --- 3 cycles
    2) LDR & STR; LDR & STR              --- 4 cycles
    3) LDR & STR; LDR & STR; LDR & STR;  --- 5 cycles

    These results were measured on a real evaluation board.


    Best regards,
    Yasuhiko Koumoto.
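
The three measurements above fit a simple pattern: 3 cycles for the first LDR/STR pair plus 1 cycle for each further back-to-back pair. The sketch below only restates that pattern in C so it can be extended to longer sequences. The helper name `ldr_str_cycles` and the two constants are hypothetical and come from the figures quoted in this thread, not from Arm documentation. It also assumes the pattern keeps holding beyond the three measured cases, which was not verified here.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles reported in this thread for one isolated LDR/STR pair. */
#define FIRST_PAIR_CYCLES 3u
/* Extra cycles reported for each further back-to-back pair. */
#define NEXT_PAIR_CYCLES 1u

/*
 * Hypothetical helper: estimated cycle count for `pairs` consecutive
 * LDR/STR pairs, extrapolated from the three measurements in the thread.
 * Assumption: the "+1 cycle per extra pair" pattern continues to hold.
 */
static uint32_t ldr_str_cycles(uint32_t pairs)
{
    assert(pairs >= 1u); /* the pattern is only defined for at least one pair */
    return FIRST_PAIR_CYCLES + (pairs - 1u) * NEXT_PAIR_CYCLES;
}
```

With this model, `ldr_str_cycles(1)`, `ldr_str_cycles(2)` and `ldr_str_cycles(3)` give the measured 3, 4 and 5 cycles, and four pairs would extrapolate to 6 cycles.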

  • Hi Yasuhiko Koumoto, thank you for your reply.

    Sorry if I'm wrong. From what I understand of your explanation, each successive pair of "LDR & STR" executes in 1 cycle. So the first pair of "LDR & STR" takes 3 cycles, and each successive pair takes 1.

    So, for example:

    LDR R0,[R1] ; STR R0,[R2,#1] - 3 cycles

    LDR R0,[R1] ; STR R0,[R2,#2] - 1 cycle

    LDR R0,[R1] ; STR R0,[R2,#3] - 1 cycle

    LDR R0,[R1] ; STR R0,[R2,#4] - 1 cycle

    So all of these instructions execute in 6 cycles. Does this timing also apply to the M4?

    "Neighboring load and store single instructions can pipeline their address and data phases. This enables these instructions to complete in a single execution cycle."

    I'm not good at English, and I'm a bit confused by this statement.

    So basically, do both the M4 and the M7 execute successive LDR & STR pairs in 1 cycle?

    I want to develop an application in which the microcontroller reads bytes (in bursts) from the 8-bit parallel port of an IC as fast as possible, perhaps at a rate of 80 to 100 Mbyte per second, or more if possible. The IC will be synchronized to a system clock. I have no experience with the M3, M4, or M7, but I have learned the M0 and the LPCXXXX microcontrollers and am familiar with some other types of microcontrollers. I am now studying the Cortex-M4 and M7.

    Is the M4 more than enough for my application, or do I need to use the M7 instead?
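
As a rough way to frame the M4-versus-M7 question, the target byte rate can be turned into a minimum core clock: clock (Hz) = bytes per second x cycles spent per byte. The sketch below is only that arithmetic. The helper name `min_core_clock_hz` is hypothetical, "Mbyte" is assumed to mean 10^6 bytes, and the cycles-per-byte figure is an input to be measured on real hardware, since it depends on loop overhead, memory wait states and how fast the port register can actually be read.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: minimum core clock in Hz needed to sustain
 * `bytes_per_second`, if moving one byte costs `cycles_per_byte` cycles.
 * Assumption: one byte is transferred per LDR/STR pair, and every cycle
 * the core spends per byte is already included in `cycles_per_byte`.
 */
static uint64_t min_core_clock_hz(uint64_t bytes_per_second,
                                  uint32_t cycles_per_byte)
{
    assert(cycles_per_byte >= 1u); /* a transfer cannot take zero cycles */
    return bytes_per_second * (uint64_t)cycles_per_byte;
}
```

For example, 100 Mbyte per second at an ideal 1 cycle per byte already needs a 100 MHz core clock, and 80 Mbyte per second at 3 cycles per byte needs 240 MHz, so any extra cycles per port access raise the requirement quickly.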

  • Hi azrul,


    I'm sorry, but I cannot fully answer your questions.
    I can only show the results observed on my evaluation board.
    Because I am not an implementer of the Cortex-M, I do not know more than what is written in the documents.

    "So basically, do both the M4 and the M7 execute successive LDR & STR pairs in 1 cycle?"

    No. That would apply only to the M3/M4. It also assumes zero-wait-state memory and an AHB-Lite bus. The Cortex-M7 has an AXI bus.
    If you intend to access GPIOs, they might not be accessible within 1 cycle.


    Best regards,
    Yasuhiko Koumoto.