Cortex M3 - Conditions for IT folding

Hi folks,

Some weeks ago, I discovered the IT instruction folding mechanism supported by the Cortex-M3.

As mentioned in the 'Cortex-M3 Devices Generic User Guide', "In some situations, the processor can start executing the first instruction in an IT block while it is still executing the IT instruction. This behavior is called IT folding...".

As a result, the IT instruction can effectively cost zero cycles. Wonderful!

What I would like to know is which situations or conditions allow this, so that I can anticipate and favor this behavior.

Before posting here, I made several unsuccessful searches on the net.

Are those conditions related to the instruction before the IT? Its alignment? Its type (16- or 32-bit, data processing, load/store)?

Are those conditions related to the instruction after the IT? Its alignment? Its type (16- or 32-bit, data processing, load/store)?

I also have some secondary questions, out of personal curiosity, whose answers may help with my main question.

Based on my knowledge of this chip after reading some articles, I made the following assumptions that I would like to confirm:

Is 'IT folding' linked to the fact that the first instruction of an IT block is always executed (always marked as THEN)?

Is 'IT folding' linked to the fact that the EPSR is not directly accessible [Cortex™-M3 Technical Reference Manual, §2.3.2]?

For this kind of simultaneous execution, I suppose that the IT and another instruction need to be present in the decode stage at the same time?

But the behavior of the fetch/decode stage pair is not clear to me: could a fetch contain two 16-bit instructions, with the decode stage then requesting only one or two of them?

I'm new to this kind of topic, so don't hesitate to correct me if my assumptions are wrong.

Thanks for your help.

  • Those are excellent questions; I wish there were a "helpful question" button to reward them.

    And I like the detailed answers jyiu gives on this subject.

    There is one situation that I think is not fully covered.

    A CMP instruction can, if it's 16-bit, be folded into a preceding LOAD instruction, if the LOAD instruction is 16-bit.

    An IT instruction can be folded into a preceding 16-bit instruction.

    If I understand this correctly, those two cases cannot happen at the same time; it's "either / or".

    My understanding is that if the 16-bit load instruction is located on a 32-bit aligned address, then the 16-bit CMP instruction following it will be folded into the load instruction; and the CMP may execute in one clock cycle, but the load instruction uses one clock cycle less, if it's a two-cycle instruction.

    The IT instruction will then not be folded into the 16-bit CMP instruction, because the IT instruction is not part of the 32-bit word that was fetched with CMP.

    However, if the load instruction is located on a non-32-bit aligned address, the IT instruction will be folded into the 16-bit CMP instruction.

    In other words: The fold only happens if the preceding instruction is aligned on a 32-bit address.

    (Did I understand this correctly?)
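To make the alignment reasoning above concrete, here is a minimal Python sketch. It only models the hypothesis stated in this post (that two 16-bit instructions can fold only when they sit in the same 32-bit aligned word); it is not a model of what the Cortex-M3 pipeline actually does. The function names and the example addresses are hypothetical.

```python
def same_fetch_word(addr_a: int, addr_b: int) -> bool:
    """Return True if two halfword addresses lie in the same 32-bit aligned word."""
    return (addr_a >> 2) == (addr_b >> 2)


def pairing_for_sequence(load_addr: int) -> dict:
    """For a 16-bit LOAD, 16-bit CMP, IT sequence starting at load_addr,
    report which adjacent pairs share a 32-bit word.

    Assumption (from the post, not from Arm documentation): sharing a word
    is the condition under which the second instruction could be folded.
    """
    if load_addr % 2 != 0:
        raise ValueError("Thumb instructions are halfword aligned")
    cmp_addr = load_addr + 2  # 16-bit CMP follows the 16-bit load
    it_addr = load_addr + 4   # IT follows the CMP
    return {
        "load_cmp_share_word": same_fetch_word(load_addr, cmp_addr),
        "cmp_it_share_word": same_fetch_word(cmp_addr, it_addr),
    }


# Load on a 32-bit aligned address: LOAD and CMP share a word, CMP and IT do not.
print(pairing_for_sequence(0x08000100))
# Load on a halfword-only aligned address: CMP and IT share a word instead.
print(pairing_for_sequence(0x08000102))
```

Under this hypothesis exactly one of the two pairs shares a word for any starting address, which is the "either / or" described above.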

    I haven't looked into this in detail for a while, so I could be wrong:

    I don't think it is necessary that the instruction pair (16-bit Thumb + IT) be aligned on a 32-bit address.

    But sometimes it helps because the flash might not have the IT instruction in the instruction queue in time if the IT instruction is on the next instruction fetch.

    (Don't forget that flash memories are usually a bit slow.) The Cortex-M3 and Cortex-M4 both have a 3-word instruction buffer, and if the instruction buffer is filled up I think the IT fold could work, as the IT instruction can be decoded at the same time as the preceding instruction.

    regards,

    Joseph

  • If a double-fold is possible, that's truly amazing (and unexpected)!

    Thank you for these details. They are very helpful when the cycles available are few (due to tight timing or low clock frequencies).

  • Happy that you appreciate my questions.

    A good start for my first question on this forum.

    I will try to continue in this way.

  • Joseph,

    Thank you for the detailed answer.

    It is clearer to me now.

    The IT instruction is often preceded by a CMP instruction.

    If the 16-bit encoding is used for CMP, the IT should often be folded, provided it does not come shortly after a branch.

    Therefore, paying special attention to the choice of CMP instruction could favor IT folding.
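As a small aid for that choice, here is a hedged Python sketch that checks whether a compare against an immediate can use the 16-bit Thumb encoding `CMP Rn, #imm8`, which requires a low register (R0 to R7) and an immediate in the range 0 to 255. The function name is hypothetical, and the sketch covers only the immediate form, not register-to-register compares.

```python
def cmp_imm_has_16bit_encoding(rn: int, imm: int) -> bool:
    """Return True if `CMP Rn, #imm` fits the 16-bit Thumb immediate encoding.

    That encoding has a 3-bit register field (R0-R7) and an 8-bit
    unsigned immediate (0-255). Anything else needs a 32-bit CMP.W.
    """
    return 0 <= rn <= 7 and 0 <= imm <= 255


print(cmp_imm_has_16bit_encoding(0, 255))  # low register, 8-bit immediate
print(cmp_imm_has_16bit_encoding(0, 256))  # immediate too large for imm8
print(cmp_imm_has_16bit_encoding(8, 1))    # R8 is not a low register
```

Whether the IT that follows is actually folded still depends on the conditions discussed above, so this only checks the encoding size, not the resulting timing.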

    Regards,

    Rémi.