Cortex-M0 instructions and core register immediate values

Hi, I have just got a Cortex-M0 (LPC1114) based dev board and am reading about the architecture and instructions. My understanding is that it supports most 16-bit Thumb instructions and a handful of 32-bit Thumb-2 instructions. If the processor fetches instructions over a 32-bit bus (I'm assuming this; I have limited knowledge of a CPU's inner workings), why doesn't it support more Thumb-2 instructions? It seems like a waste, or does it perhaps fetch two 16-bit instructions at once?

My other issue is related to my first question. I'm trying to set the core registers to immediate values using MOV. I read that you can use MOV for the lower 16 bits and MOVT for the upper 16 bits (I suppose this applies only to cores that support ARM32). However, MOVT does not seem to be supported on the Cortex-M0; arm-none-eabi-as reports: "Error: selected processor does not support Thumb mode `movt r0 , r1'". Also, according to the ARMv6-M reference manual, MOV can only set an immediate value of up to 8 bits. This all seems very strange to me. I got a hint on IRC that you are really supposed to use PC-relative addressing to set registers "directly", which I haven't read much about yet. Is there no efficient way to load a 32-bit immediate value into a register using MOV or other data-processing instructions?
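For reference, the two options mentioned above can be sketched as follows. ARMv6-M does provide `MOVS Rd, #imm8`, `LSLS Rd, Rm, #imm5` and `ADDS Rdn, #imm8`, so a 32-bit constant can be assembled one byte at a time in 7 instructions (14 bytes). The PC-relative alternative is the GNU assembler pseudo-instruction `ldr r0, =0x12345678`, which emits one 16-bit `LDR` plus a 4-byte literal-pool entry. The C code below is only a host-side model of the byte-by-byte sequence; the names `build_const32`, `movs_imm8`, `lsls_imm` and `adds_imm8` are hypothetical helpers, not part of any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model (an illustration, not target code) of this Thumb sequence:
 *   movs r0, #0x12
 *   lsls r0, r0, #8
 *   adds r0, #0x34
 *   lsls r0, r0, #8
 *   adds r0, #0x56
 *   lsls r0, r0, #8
 *   adds r0, #0x78      ; r0 = 0x12345678
 * Each helper mirrors one instruction and enforces its immediate range. */
static uint32_t movs_imm8(uint32_t imm) {
    assert(imm <= 0xFFu);               /* MOVS Rd, #imm8 takes 0..255 */
    return imm;
}

static uint32_t lsls_imm(uint32_t reg, unsigned shift) {
    assert(shift >= 1u && shift <= 31u); /* LSLS Rd, Rm, #imm5 */
    return reg << shift;
}

static uint32_t adds_imm8(uint32_t reg, uint32_t imm) {
    assert(imm <= 0xFFu);               /* ADDS Rdn, #imm8 takes 0..255 */
    return reg + imm;
}

/* Builds k from four 8-bit immediates, most significant byte first.
 * If instr_count is non-NULL it receives the number of modelled instructions. */
uint32_t build_const32(uint32_t k, unsigned *instr_count) {
    unsigned n = 0;
    uint32_t r0 = movs_imm8((k >> 24) & 0xFFu);
    n++;
    for (int shift = 16; shift >= 0; shift -= 8) {
        r0 = lsls_imm(r0, 8);
        n++;
        r0 = adds_imm8(r0, (k >> shift) & 0xFFu);
        n++;
    }
    if (instr_count) {
        *instr_count = n;
    }
    return r0;
}
```

In terms of code size, the literal load costs 6 bytes (2 for the instruction, 4 for the constant) against 14 bytes for the sequence above, which is why the PC-relative form is the usual suggestion for arbitrary 32-bit values.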

Thanks for responses!

Parents
  • Hello,

    So if the first instruction is 16-bit and is a B, does that mean the following instruction is thrown away?

    Yes.

    I ask because some 32-bit CPUs with 16-bit instructions still perform the instruction IF it is nuclear, e.g. LD R0,#0. You can see the advantage: no empty slots in the pipeline.

    Are you referring to the branch delay slot?

    Other than SH, I don't know of any 32-bit CPU with 16-bit instructions that supports a branch delay slot.

    In some cases it is useful, but it can also be a disadvantage for code size.

    This is because a compiler can only rarely fill the delay slot with a useful instruction.

    Best regards,

    Yasuhiko Koumoto.

Children
  • Dear Koumoto San,

    Since the BBC is buying over 1 million custom chips, I am wondering whether it would be difficult to alter the M0 to remove the cache-clear. I found that I could speed up SH2 code significantly by using this simple trick. This was with the GNU compiler chain, so I rewrote all of the felide-constructors to use this feature. I was able to achieve 0 NOP instructions in the whole object code of Tomb Raider on the Saturn. In most cases, it would move the result of a C call into R0. A minor thing, but it reduced the size of the object code.

    I fear I have an unnatural hatred of a single cycle being wasted, so I look very hard at an instruction set and the pipeline, as well as the cache (if any), to ensure that the chip operates without a single wasted cycle. Like the SH, the M0 always reads 32 bits of instructions at a time, so placing a B as the second of two 16-bit instructions prevents a fetched instruction from being thrown away; at least less is wasted. Of course, it isn't as simple as 32-bit boundaries. A coder would have to work through the code from its start to ensure the order, but I will do that if it gives me extra performance. As you know, games programming in the 80s and 90s relied on someone hand-optimizing a mostly C project to get 95%+ of the theoretical CPU performance in order to compete. If I have a task that takes almost all of the CPU time (CELP decoding, for example), such tricks can mean the difference between success and failure.
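As a small illustration of the placement rule described above: assuming, as this post does, that instructions are fetched as 4-byte-aligned 32-bit words, whether a taken 16-bit branch discards a fetched halfword depends only on bit 1 of its address. The helper name `branch_discards_halfword` is hypothetical and the check is a sketch of the idea, not a model of the real Cortex-M0 pipeline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper. Assumes 4-byte-aligned 32-bit instruction fetches.
 * Returns true when a 16-bit branch at branch_addr sits in the lower
 * halfword of its fetch word, so the upper halfword fetched alongside it
 * is thrown away if the branch is taken. Returns false when the branch is
 * the second halfword of the word, the placement preferred in the post. */
bool branch_discards_halfword(uint32_t branch_addr) {
    assert((branch_addr & 1u) == 0u);   /* Thumb instructions are halfword aligned */
    return (branch_addr & 2u) == 0u;
}
```

For example, a B at address 0x100 shares its fetch word with the instruction at 0x102, whereas a B at 0x102 is the last halfword of that word.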

    I thank you for your valuable time and I will try to keep my questions to a minimum.

    As I have posted, I am looking at software for the BBC Microbit, specifically an audiobook (CELP) and a language lab in which the teacher broadcasts the language to all pupils and can listen to a single pupil. The Nordic Semiconductor chip also has an M0 core, so I'm hoping that, depending on how the CPU and Bluetooth chips communicate, this CPU may also be used when it is otherwise not in use.

    If all else fails, I CAN use the ARM7TDMI found in all current SanDisk memory sticks, but that may prove difficult as they have a habit of altering control codes.

    I thank you for your patience and wish you a good day,

    With sincere thanks,

    Sean Wain Dunlevy