Cortex-M3 - Conditions for IT folding

Hi folks,

A few weeks ago, I discovered the IT instruction folding mechanism supported by the Cortex-M3.

As mentioned in the 'Cortex-M3 Devices Generic User Guide': "In some situations, the processor can start executing the first instruction in an IT block while it is still executing the IT instruction. This behavior is called IT folding...".

This means that, when it is folded, an IT instruction effectively costs zero cycles. Wonderful!

What I would like to know is which situations or conditions lead to this behaviour, so that I can anticipate and encourage it.

Before posting here, I made several unsuccessful searches on the net.

Do these conditions depend on the instruction before the IT instruction? Its alignment? Its type (16-bit or 32-bit, data processing, load/store)?

Do these conditions depend on the instruction after the IT instruction? Its alignment? Its type (16-bit or 32-bit, data processing, load/store)?

I also have some subsidiary questions, partly out of personal curiosity and partly because their answers may help with my main question.

Based on what I have learned about this processor from some articles, I have made the following assumptions, which I would like to confirm:

Is 'IT folding' linked to the fact that the first instruction of an IT block is always a THEN instruction (it always uses the IT instruction's own condition, never its inverse)?

Is 'IT folding' linked to the fact that the EPSR is not directly accessible [Cortex™-M3 Technical Reference Manual, §2.3.2]?

For this kind of simultaneous execution, I suppose that the IT instruction and the other instruction must both be present in the decode stage at the same time?

However, the interaction between the fetch and decode stages is not clear to me: can a single fetch contain two 16-bit instructions, with the decode stage then consuming one or both of them?

I'm new to this kind of topic, so don't hesitate to correct me if any of my assumptions are wrong.

Thanks for your help.

  • Those are excellent questions; I wish there were a "helpful question" button to reward them.

    And I like the detailed answers jyiu has given on this subject.

    There is one situation which I think has not been fully covered.

    A 16-bit CMP instruction can be folded into a preceding load instruction, if that load is also 16-bit.

    An IT instruction can be folded into a preceding 16-bit instruction.

    If I understand this correctly, those two cases cannot happen at the same time; it's "either/or".

    My understanding is that if the 16-bit load instruction is located at a 32-bit-aligned address, the 16-bit CMP instruction that follows it will be folded into the load. The CMP may still take one clock cycle, but the load then takes one cycle less, if it is a two-cycle instruction.

    In that case, the IT instruction will not be folded into the 16-bit CMP, because the IT instruction is not part of the 32-bit word that was fetched together with the CMP.

    However, if the load instruction is located at an address that is not 32-bit aligned, the CMP starts a new 32-bit word, and the IT instruction will be folded into the 16-bit CMP.

    In other words, the fold only happens if the preceding instruction is at a 32-bit-aligned address.

    (Did I understand this correctly?)
