How long bitfields on which ARM?

I need to be able to handle long bitfields as effectively as possible. Right now I need up to 64 bits in length.

Do any of the architectures have instructions that set, clear and test individual bits in one cycle? Which ones? In particular, will the Cortex-M0+ handle it (it only implements a reduced Thumb-2 subset)? If not, which comparable core will?

What I have found confuses me. In a Thumb-2 reference card I read: "Width of bitfield. <width> + <lsb> must be <= 32." But some 5 years ago I did some programming on an STR91xF ARM9 processor, and there was some talk about l-o-n-g bit arrays that could be handled in one cycle, although this took some 1024 bytes of microcoded table. (See, I am already long afloat, in deep water! Maybe this was for all kinds of masks?)

Also, what would happen if I need to set or clear, say, bit 27 and bit 60 in one instruction? Will compilers (which?) then treat this as two full 32-bit words, as one 64-bit word, or will they handle only byte 3 and byte 7 (counting from byte 0) and do the trick on those? Is the barrel shifter part of this?

Aclassifier

Øyvind Teig | Some of my blog notes

  • Unfortunately, I've not worked with ARM9, so I do not know the mask-array feature.

    (I'll admit that it took me a while to find out that CSP is an abbreviation of Communicating Sequential Processes)

    I was thinking a bit about masks in GPIO-registers, but for some reason, I did not mention them.

    Many of NXP's microcontrollers allow you to set a mask for the GPIO pins. I mention these because some of them support 32-pin (32-bit) GPIO ports. I do not know whether this is useful to you, but in addition to the mask, the GPIO ports also have atomic set and clear registers (some allow toggling as well). So far, I believe NXP's LPC175x-LPC178x, LPC18xx, LPC43xx and LPC541xx have the quickest I/O ports that support 32 pins per port.
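
    To make the mask and set/clear idea concrete, here is a small software model in C. It is only a sketch: `gpio_model_t` and the function names are hypothetical, this is not a real register map, and the mask polarity (1 = protected) and which accesses the mask affects are assumptions that differ between parts, so check the user manual for your device.

    ```c
    #include <stdint.h>

    /* Hypothetical software model of one 32-bit GPIO port. On real hardware
       these would be memory-mapped registers with vendor-specific names. */
    typedef struct {
        uint32_t mask;  /* assumed polarity: 1 = bit is protected from masked writes */
        uint32_t pin;   /* current state of the 32 port bits */
    } gpio_model_t;

    /* Models a "set" register: only bits that are 1 in 'bits' are changed. */
    static void gpio_set(gpio_model_t *p, uint32_t bits)
    {
        p->pin |= bits;
    }

    /* Models a "clear" register: only bits that are 1 in 'bits' are changed. */
    static void gpio_clear(gpio_model_t *p, uint32_t bits)
    {
        p->pin &= ~bits;
    }

    /* Models a masked write of a full 32-bit value: protected bits keep
       their old state, all other bits take the new value. */
    static void gpio_masked_write(gpio_model_t *p, uint32_t value)
    {
        p->pin = (p->pin & p->mask) | (value & ~p->mask);
    }
    ```

    The point of the set and clear registers is that software never has to do a read-modify-write sequence itself, so an interrupt cannot slip in between the read and the write.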

    You might not need to use any pins on the microcontroller, but you could still use these registers as '32-bit RAM'. As far as I know, Microchip also makes microcontrollers that support 32-pin (32-bit) ports.

    Regarding the Cortex-M0: if you need really fast access, the Cortex-M0 might be too limited.

    By now, you probably know that ...

    • The Cortex-M0 and Cortex-M0+ instruction sets consist almost entirely of 16-bit Thumb instructions (only a handful, such as BL, are 32-bit).
    • The Cortex-M3 has all the Cortex-M0/Cortex-M0+ instructions, plus a bunch of extra instructions.
    • The Cortex-M4 has all the Cortex-M3 instructions, plus some neat DSP functions.
    • The Cortex-M4F (with floating point unit) has all the Cortex-M4 instructions + 32-bit floating point instructions.
    • The Cortex-M7 has all the Cortex-M4 instructions, plus optional 64-bit (double-precision) floating point.

    In addition, the Cortex-M7 is basically 1.63 times as fast per MHz as the Cortex-M4 (my estimation).

    If you code in assembly language, you might be able to get performance on the Cortex-M7 that's twice as fast per MHz as the same code on the Cortex-M4.

    Some of the Cortex-M4 and Cortex-M7 DSP instructions might be interesting for you as well. The UXTAB/UXTAH and SXTAB/SXTAH instructions can extract an 8- or 16-bit value from one register, zero- or sign-extend it, and add it to another register. The source register can optionally be rotated first (by 8, 16 or 24 bits).
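
    As a rough illustration of what these extend-and-add instructions compute, here is a portable C model. The function names (`ror32`, `uxtab_model`, `sxtah_model`) are made up for this sketch. On a real Cortex-M4 or Cortex-M7 each model function corresponds to a single instruction, whereas this C version is only meant to show the arithmetic.

    ```c
    #include <stdint.h>

    /* Rotate right by n bits. The instructions only allow n = 0, 8, 16 or 24. */
    static uint32_t ror32(uint32_t x, unsigned n)
    {
        n &= 31u;
        return n ? (x >> n) | (x << (32u - n)) : x;
    }

    /* Model of UXTAB: rn + zero-extended low byte of the rotated rm. */
    static uint32_t uxtab_model(uint32_t rn, uint32_t rm, unsigned rot)
    {
        return rn + (ror32(rm, rot) & 0xFFu);
    }

    /* Model of SXTAH: rn + sign-extended low halfword of the rotated rm. */
    static uint32_t sxtah_model(uint32_t rn, uint32_t rm, unsigned rot)
    {
        uint32_t half = ror32(rm, rot) & 0xFFFFu;
        /* Portable sign extension from 16 to 32 bits in unsigned arithmetic. */
        uint32_t extended = (half ^ 0x8000u) - 0x8000u;
        return rn + extended;
    }
    ```

    For example, `uxtab_model(100, 0x12345678, 8)` picks out the byte 0x56 (bits 8..15 of the source) and adds it to 100.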

    Even though the Cortex-M0 instruction set is almost entirely 16-bit, it still works on 32-bit integers. But since the instruction set does not allow for the same barrel-shifter tricks and conditional instruction execution, the code will be larger and slower.
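
    Since all of these cores have 32-bit registers, a 64-bit bitfield ends up as two 32-bit words whichever way you write it. A minimal sketch in C, assuming a hypothetical `bits64_t` type (the names are mine, not from any library):

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* A 64-bit bitfield stored as two 32-bit words:
       w[0] holds bits 0..31, w[1] holds bits 32..63. */
    typedef struct {
        uint32_t w[2];
    } bits64_t;

    /* Set one bit (0..63): select the word with bit >> 5, the position with bit & 31. */
    static void bits64_set(bits64_t *b, unsigned bit)
    {
        b->w[bit >> 5] |= UINT32_C(1) << (bit & 31u);
    }

    /* Clear one bit (0..63). */
    static void bits64_clear(bits64_t *b, unsigned bit)
    {
        b->w[bit >> 5] &= ~(UINT32_C(1) << (bit & 31u));
    }

    /* Test one bit (0..63). */
    static bool bits64_test(const bits64_t *b, unsigned bit)
    {
        return ((b->w[bit >> 5] >> (bit & 31u)) & 1u) != 0u;
    }
    ```

    Setting bit 27 and bit 60 therefore touches each word once: bit 27 lands in `w[0]` and bit 60 becomes bit 28 of `w[1]`. A compiler given a plain `uint64_t` on a 32-bit core has to split the work across two registers in much the same way.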

    However, some Cortex-M0/Cortex-M0+ implementations include Bit-Banding. Bit-Banding is an optional feature that the vendors may include if they wish. It is particularly useful when the microcontroller has more than a single core (for instance a Cortex-M4 + a Cortex-M0 core), as Bit-Banding allows for atomic operations.
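
    With Bit-Banding, every bit in the bit-band region gets its own word in an alias region, so a single store to the alias word sets or clears exactly one bit. The sketch below computes the alias address. It assumes the SRAM mapping used on the Cortex-M3/Cortex-M4 (region at 0x20000000, alias at 0x22000000); a Cortex-M0/Cortex-M0+ part with vendor-added Bit-Banding may use other addresses, so treat the two base constants as assumptions. The name `bitband_alias` is made up for this example.

    ```c
    #include <stdint.h>

    /* Assumed bases: the Cortex-M3/M4 SRAM bit-band mapping. Check your part. */
    #define BITBAND_SRAM_BASE   UINT32_C(0x20000000)
    #define BITBAND_SRAM_ALIAS  UINT32_C(0x22000000)

    /* Return the alias-word address for bit 'bit' (0..7) of the byte at 'byte_addr'.
       Each byte in the region maps to 8 alias words, i.e. 32 bytes of alias space. */
    static uint32_t bitband_alias(uint32_t byte_addr, unsigned bit)
    {
        uint32_t byte_offset = byte_addr - BITBAND_SRAM_BASE;
        return BITBAND_SRAM_ALIAS + byte_offset * 32u + bit * 4u;
    }
    ```

    On hardware that actually implements the feature, writing 1 or 0 through a `volatile uint32_t *` pointing at the returned address sets or clears that single bit. On a host PC the function is only useful for checking the address arithmetic.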
