Cortex M3 - Conditions for IT folding

Hi folks,

Some weeks ago, I discovered the IT instruction folding mechanism supported by the Cortex-M3.

As mentioned in the 'Cortex-M3 Devices Generic User Guide', "In some situations, the processor can start executing the first instruction in an IT block while it is still executing the IT instruction. This behavior is called IT folding...".

Therefore, when folding occurs, the IT instruction effectively costs 0 cycles. Wonderful!

What I would like to know is which situations/conditions allow this, so that I can anticipate and favour this behaviour.

Before posting here, I made several unsuccessful searches on the net.

Are those conditions related to the instruction before the IT? Alignment? Type of instruction (16- or 32-bit, data processing, load/store)?

Are those conditions related to the instruction after the IT? Alignment? Type of instruction (16- or 32-bit, data processing, load/store)?

I also have some subsidiary questions, out of personal curiosity, whose answers may also help with my main question.

Based on my knowledge of this chip after reading some articles, I made the following assumptions that I would like to confirm:

Is 'IT folding' linked to the fact that the first instruction of an IT block is always executed (always marked as THEN)?

Is 'IT folding' linked to the fact that the EPSR is not directly accessible [Cortex™-M3 Technical Reference Manual, §2.3.2]?

For this kind of simultaneous execution, I suppose that the IT and another instruction need to be present in the decode stage at the same time?

But the behavior of the fetch/decode stage pair is not clear to me: could the fetch contain two 16-bit instructions, with the decode stage then requesting only one or both of them?

I'm new to this kind of topic, so don't hesitate to correct me if my assumptions are wrong.

Thanks for your help.

  • IT folding happens when:

    - the instruction preceding the IT instruction is 16-bit (and not a branch instruction), and

    - the IT instruction is already fetched in the instruction buffer. (Folding might not happen shortly after a branch, as the IT instruction might still be being fetched.)

    There is no other alignment requirement.

    Regarding your assumptions:

    > Is 'IT folding' linked to the fact that the first instruction of an IT block is always executed (always marked as THEN)?

    No. The first instruction is always "T" so that we can save a bit in the encoding of the condition.

    > Is 'IT folding' linked to the fact that the EPSR is not directly accessible [Cortex™-M3 Technical Reference Manual, §2.3.2]?

    Not as far as I know.

    >For this kind of simultaneous execution, I suppose that the IT and another instruction need to be present in the decode stage at the same time?

    Yes. That's why the preceding instruction needs to be 16-bit.

    > could the fetch contains two 16-bit instructions and then decode stage requests only one or two instructions ?

    The path from instruction fetch to the decode stage is 32 bits wide. But the second 16-bit halfword might not hold a valid instruction yet (e.g. it is still being fetched because of memory wait states). So it is possible that the IT fold cannot take place and the IT instruction needs another decode cycle later.

    regards,

    Joseph
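
The two conditions in the reply above can be written down as a small predicate. This is only a rough sketch for reasoning about code layout, not a description of the actual pipeline logic: the names `PrecedingInstr`, `it_may_fold` and `it_cycles` are hypothetical, and the 1-cycle cost used for an unfolded IT is an assumption based on the "another decode cycle" remark in the reply.

```python
from dataclasses import dataclass


@dataclass
class PrecedingInstr:
    """Hypothetical description of the instruction just before the IT."""
    width_bits: int   # 16 or 32
    is_branch: bool


def it_may_fold(prev: PrecedingInstr, it_in_fetch_buffer: bool) -> bool:
    """True when both conditions stated in the reply hold."""
    # Condition 1: the preceding instruction is 16-bit and not a branch.
    prev_ok = prev.width_bits == 16 and not prev.is_branch
    # Condition 2: the IT instruction is already in the instruction buffer
    # (may be false just after a branch or with memory wait states).
    return prev_ok and it_in_fetch_buffer


def it_cycles(prev: PrecedingInstr, it_in_fetch_buffer: bool) -> int:
    """Modelled cost of the IT: 0 cycles if folded, else 1 (assumed)."""
    return 0 if it_may_fold(prev, it_in_fetch_buffer) else 1
```

For example, `it_may_fold(PrecedingInstr(16, False), True)` models a 16-bit non-branch instruction followed by an IT that is already buffered, which is the case where folding can occur; changing the width to 32, or marking the preceding instruction as a branch, makes the predicate false.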
