
Cortex M3 - Conditions for IT folding

Hi folks,

Some weeks ago, I discovered the IT instruction folding mechanism supported by the Cortex-M3.

As mentioned in the 'Cortex-M3 Devices Generic User Guide', "In some situations, the processor can start executing the first instruction in an IT block while it is still executing the IT instruction. This behavior is called IT folding...".

As a result, the IT instruction effectively costs 0 cycles. Wonderful!

What I would like to know is which situations or conditions allow this behaviour, so that I can anticipate and favour it.

Before posting here, I made several unsuccessful searches on the net.

Do those conditions depend on the instruction before the IT instruction? Its alignment? Its type (16-bit or 32-bit, data processing, load/store)?

Do those conditions depend on the instruction after the IT instruction? Its alignment? Its type (16-bit or 32-bit, data processing, load/store)?

I also have some subsidiary questions, out of personal curiosity, whose answers may also help with my main question.

Based on what I know of this chip from reading a few articles, I made the following assumptions, which I would like to confirm:

Is 'IT folding' linked to the fact that the first instruction of an IT block is always marked as THEN (it is always governed by the IT condition itself, never by its inverse)?

Is 'IT folding' linked to the fact that the EPSR is not directly accessible [Cortex™-M3 Technical Reference Manual, §2.3.2]?

For this kind of simultaneous execution, I suppose that the IT instruction and another instruction need to be present in the decode stage at the same time?

But the behaviour of the fetch and decode stages is not clear to me: can a single fetch contain two 16-bit instructions, and does the decode stage then take one instruction or two?
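
To make the fetch question concrete: Thumb code is stored as 16-bit halfwords, so one aligned 32-bit word can hold two 16-bit instructions or part of a 32-bit one. The C sketch below only classifies the two halfwords of such a word from their encodings. It says nothing about when the core actually folds. The helper names are hypothetical, the word is assumed to come from little-endian code memory, and its first halfword is assumed to start an instruction (not to be the tail of a 32-bit instruction from the previous word).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A halfword starts a 32-bit Thumb-2 instruction when bits [15:11]
   are 0b11101, 0b11110 or 0b11111. Otherwise it is a 16-bit instruction. */
static bool thumb_is_32bit_prefix(uint16_t hw)
{
    return (hw >> 11) >= 0x1D;
}

/* IT is encoded as 1011 1111 firstcond mask, with mask != 0000.
   (A zero mask selects the NOP-compatible hints instead.) */
static bool thumb_is_it(uint16_t hw)
{
    return (hw & 0xFF00u) == 0xBF00u && (hw & 0x000Fu) != 0;
}

/* Hypothetical helper: true when an aligned, little-endian fetch word
   holds a 16-bit instruction followed by an IT instruction. */
static bool word_holds_insn_then_it(uint32_t word)
{
    uint16_t first  = (uint16_t)(word & 0xFFFFu); /* lower address  */
    uint16_t second = (uint16_t)(word >> 16);     /* higher address */
    return !thumb_is_32bit_prefix(first) && thumb_is_it(second);
}
```

For instance, `CMP r0, #0` (0x2800) followed by `IT EQ` (0xBF08) at a word-aligned address is stored as the word 0xBF082800, for which `word_holds_insn_then_it` returns true.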

I'm new to this kind of topic, so don't hesitate to correct me if my assumptions are wrong.

Thanks for your help.

Parents
  • I haven't looked into this in detail for a while, so I could be wrong:

    I don't think the instruction pair (16-bit Thumb instruction + IT) needs to be aligned to a 32-bit address.

    But alignment sometimes helps, because the flash might not deliver the IT instruction to the instruction queue in time if the IT instruction is in the next instruction fetch.

    (Don't forget that flash memories are usually a bit slow.) The Cortex-M3 and Cortex-M4 both have a 3-word instruction buffer, and if the instruction buffer is filled up, I think the IT fold can work, as the IT instruction can be decoded at the same time as the preceding instruction.

    regards,

    Joseph
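
The alignment point in this reply can be illustrated with a little address arithmetic. This is only a sketch, assuming instructions are fetched as aligned 32-bit words; the function name is hypothetical, and whether folding actually happens also depends on the state of the instruction buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a 16-bit instruction at `addr` and the 16-bit instruction
   right after it (at addr + 2) lie in the same aligned 32-bit word.
   `addr` is assumed to be halfword-aligned, as Thumb code always is. */
static bool pair_shares_fetch_word(uint32_t addr)
{
    return (addr & 0x3u) == 0;
}
```

When this returns false, the IT instruction arrives with the next fetch, which is the case where slow flash could deliver it too late.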

Children
  • If a double-fold is possible, that's truly amazing (and unexpected)!

    Thank you for these details. They are very helpful when few cycles are available (due to tight timing or low clock frequencies).

  • Joseph,

    Thank you for the detailed answer.

    It is clearer to me now.

    The IT instruction is often preceded by a CMP instruction.

    If the 16-bit encoding is used for the CMP, the IT should often be folded, provided it does not come shortly after a branch.

    So paying special attention to the choice of CMP encoding could favour IT folding.

    Regards,

    Rémi.
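
As a rough aid for the CMP point above, here is a minimal C sketch of when `CMP Rn, #imm` can use the 16-bit Thumb encoding. The function name is hypothetical, and it covers only the immediate form. An unshifted register-register `CMP Rn, Rm` has 16-bit encodings of its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CMP Rn, #imm has a 16-bit encoding only when Rn is a low register
   (R0-R7) and the immediate fits in 8 bits (0-255). Anything else
   needs the 32-bit CMP.W form, if the immediate is encodable at all. */
static bool cmp_imm_fits_16bit(unsigned rn, uint32_t imm)
{
    return rn <= 7 && imm <= 255;
}
```

So a comparison such as `CMP r8, #1` or `CMP r0, #256` cannot use the 16-bit form, and by the reasoning above would make a fold of the following IT less likely.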