How long bitfields on which ARM?

I need to be able to handle long bitfields as effectively as possible. Right now I need up to 64 bits in length.

Are there instructions to set, clear and test individual bits in one cycle available for some of the architectures? Which? Particularly, will the M0+ handle it (which only does reduced thumb2)? If not, which comparable?

What I find confuses me. In a Thumb-2 reference card I found "Width of bitfield. <width> + <lsb> must be <= 32." But some 5 years ago I did some programming on an STR91xF ARM9 processor, and there was some talk about l-o-n-g bit arrays that could be handled in one cycle, using some 1024 bytes of microcoded table. (See, I am already long afloat, in deep water! Maybe this was for all kinds of masks?)

Also, what would happen if I need to set or clear (say) bit 27 and bit 60 in one instruction? Will compilers (which?) then treat this as two full 32-bit words or as one 64-bit word, or will they handle only byte 3 and byte 7 (starting at byte 0) and do the trick on them? Is the barrel shifter part of this?

Aclassifier

Øyvind Teig | Some of my blog notes

  • Thank you, guys! Your answers have been very helpful! I need to learn more than I think I need to know.

    I have a CSP-type channel based scheduler (Publication details by Øyvind Teig) where signalling on a channel is done by setting a bit in a bitfield. Right now I have 39 channels (synchronous with data, asynchronous without data, i.e. signals, and finally timeout signals).

    Also, the selective choice (ALT) implementation for each CSP process needs a bitfield that matches the channel bitfield bit by bit. It holds the set of channels that are present in the ALT set, and so it also serves as the mask used to clear all those bits when one guard of the ALT is taken.

    With an 8-bit processor I have used byte_8, int_16, long_32 or long_long_64 (all used as unsigned), automatically handled with width-dependent macros. For single-bit handling there are several combinations of setting, testing and clearing with a dynamic index, and several with a constant bit index. Then there is masking with a dynamic or constant mask. In some of these cases our compiler operates directly on the bit, which I have studied, and for other cases a small dynamic bit handling library was written. And sometimes it reads all 8 bytes in, clears one bit in them and writes all 8 bytes back!-(
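To make the ALT description above concrete, here is a minimal C sketch. It assumes one bit per channel in a 64-bit word, that "clearing those bits" means clearing them in the channel bitfield, and that the lowest-numbered ready guard wins. The names `chan_set_t`, `alt_select` and `alt_clear` are hypothetical and are not taken from the scheduler itself.

```c
#include <stdint.h>

/* Hypothetical type: one bit per channel, room for up to 64 channels. */
typedef uint64_t chan_set_t;

/* Return the index of the lowest-numbered channel that is both signalled
 * (bit set in 'ready') and part of the ALT set (bit set in 'alt_mask'),
 * or -1 if no guard of the ALT is ready. */
static int alt_select(chan_set_t ready, chan_set_t alt_mask)
{
    chan_set_t hits = ready & alt_mask;
    int idx = 0;

    if (hits == 0) {
        return -1;
    }
    while ((hits & 1u) == 0) {   /* scan upwards for the first set bit */
        hits >>= 1;
        idx++;
    }
    return idx;
}

/* When one guard is taken, clear every bit that belongs to the ALT set
 * and leave all other channel bits untouched. */
static chan_set_t alt_clear(chan_set_t ready, chan_set_t alt_mask)
{
    return ready & ~alt_mask;
}
```

With 39 channels the same code would work on any width of 39 bits or more; a 64-bit word is simply the nearest standard type.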
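As a rough illustration of the single-bit cases with a dynamic index, the sketch below shows two variants in portable C: one on a `uint64_t` value and one on an array of two 32-bit words that only touches the word containing the bit. All function names are hypothetical. On a 32-bit Cortex-M a `uint64_t` occupies two registers, so whether the compiler touches one half or both depends on the compiler and optimisation level; that behaviour is an assumption to verify in the generated code, not a guarantee.

```c
#include <stdint.h>
#include <stdbool.h>

/* Variant 1: operate on a 64-bit value. idx must be in 0..63. */
static inline uint64_t bit64_set(uint64_t w, unsigned idx)
{
    return w | ((uint64_t)1 << idx);
}

static inline uint64_t bit64_clear(uint64_t w, unsigned idx)
{
    return w & ~((uint64_t)1 << idx);
}

static inline bool bit64_test(uint64_t w, unsigned idx)
{
    return ((w >> idx) & 1u) != 0;
}

/* Variant 2: two 32-bit words; only the word holding the bit is
 * read and written. idx must be in 0..63. */
static inline void words_set(uint32_t w[2], unsigned idx)
{
    w[idx >> 5] |= (uint32_t)1 << (idx & 31u);
}

static inline void words_clear(uint32_t w[2], unsigned idx)
{
    w[idx >> 5] &= ~((uint32_t)1 << (idx & 31u));
}

static inline bool words_test(const uint32_t w[2], unsigned idx)
{
    return ((w[idx >> 5] >> (idx & 31u)) & 1u) != 0;
}
```

Setting bit 27 and bit 60 with variant 2 ends up as one read-modify-write on each 32-bit word, which is the natural unit on these cores.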

    When recompiling this system for the ARM I am sure there would be special cases too. What I have learned from you is that I should disregard byte_8 and int_16 (with 39 channels those cases wouldn't have come up anyhow). I have not done any assembly coding for this (sorry, I forgot to mention it), so I would basically rely on the compiler. I think I have also learned that there would be differences between processors.

    None of you picked up on the mask(?)-array that I think was present on the STR91xF ARM9?

    May I ask what your gut feeling on M0+ vs M3/M4 architectures would be?

    Best regards

    Øyvind Teig, Trondheim, Norway

  • Unfortunately, I've not worked with ARM9, so I do not know the mask-array feature.

    (I'll admit that it took me a while to find out that CSP is an abbreviation of Communicating Sequential Processes)

    I was thinking a bit about masks in GPIO-registers, but for some reason, I did not mention them.

    Many of NXP's microcontrollers allow you to set a mask for the GPIO pins. I mention these because some of them support 32-pin (32-bit) GPIO ports. I do not know whether this is useful to you; however, in addition to the mask, the GPIO ports also have set and clear registers with atomic access (some allow toggling as well). So far, I believe NXP's LPC175x-LPC178x, LPC18xx, LPC43xx and LPC541xx have the quickest I/O ports that support 32 pins per port.

    You might not need to use any pins on the microcontroller, but you could still use these registers as '32-bit RAM'. As far as I know, Microchip also makes microcontrollers that support 32-pin (32-bit) ports.

    Regarding using the Cortex-M0; if you need real fast access, then the Cortex-M0 might be too limited.

    By now, you probably know that ...

    • The Cortex-M0 and Cortex-M0+ instruction sets consist almost entirely of 16-bit Thumb instructions (only a handful, such as BL, are 32 bits wide).
    • The Cortex-M3 has all the Cortex-M0/Cortex-M0+ instructions, plus a bunch of extra instructions.
    • The Cortex-M4 has all the Cortex-M3 instructions, plus some neat DSP functions.
    • The Cortex-M4F (with floating point unit) has all the Cortex-M4 instructions + 32-bit floating point instructions.
    • The Cortex-M7 has all the Cortex-M4 instructions + (optionally) 64-bit floating point.

    In addition, the Cortex-M7 is basically 1.63 times as fast per MHz as the Cortex-M4 (my estimation).

    If you code in assembly language, you might be able to get performance on the Cortex-M7 that is twice as fast per MHz as the same code on the Cortex-M4.

    Some of the Cortex-M4 and Cortex-M7 DSP instructions might be interesting for you as well. The UXTAB/UXTAH and SXTAB/SXTAH instructions extract an 8- or 16-bit value from one register, zero- or sign-extend it, and add it to another register. The source register can optionally be rotated first.
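For readers who have not met these instructions, the following C functions model what they compute: rotate the source right by 0, 8, 16 or 24 bits, take the low byte or halfword, extend it, and add it to the other operand. The `model_*` names are hypothetical; this is only a software model of the arithmetic, not how a compiler is guaranteed to emit the instructions.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n taken modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* UXTAB: rn + zero-extended low byte of ROR(rm, rot). */
static uint32_t model_uxtab(uint32_t rn, uint32_t rm, unsigned rot)
{
    return rn + (ror32(rm, rot) & 0xFFu);
}

/* UXTAH: rn + zero-extended low halfword of ROR(rm, rot). */
static uint32_t model_uxtah(uint32_t rn, uint32_t rm, unsigned rot)
{
    return rn + (ror32(rm, rot) & 0xFFFFu);
}

/* SXTAB: rn + sign-extended low byte of ROR(rm, rot).
 * (b ^ 0x80) - 0x80 sign-extends an 8-bit value using unsigned math. */
static uint32_t model_sxtab(uint32_t rn, uint32_t rm, unsigned rot)
{
    uint32_t b = ror32(rm, rot) & 0xFFu;
    return rn + ((b ^ 0x80u) - 0x80u);
}
```

In the real instructions the rotation is restricted to 0, 8, 16 or 24 bits, so these are byte-lane selectors rather than general bitfield extractors.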

    Even though the Cortex-M0 instruction set is almost entirely 16-bit, it still works on 32-bit integers. However, since the instruction set does not allow the same barrel-shifter tricks and conditional instruction execution, the code will be larger and slower.

    However, some Cortex-M0/Cortex-M0+ implementations include Bit-Banding. Bit-Banding is an optional feature that vendors may include if they wish. It is particularly useful when the microcontroller has more than a single core (for instance a Cortex-M4 + a Cortex-M0 core), as Bit-Banding allows for atomic operations.
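Bit-Banding maps each bit of a memory region to its own 32-bit word in an alias region, so a plain word write sets or clears one bit. The sketch below computes the alias address using the SRAM region addresses defined for the Cortex-M3/M4 (0x20000000 aliased at 0x22000000). Whether a given Cortex-M0/M0+ part has Bit-Banding at all, and at which addresses, is an assumption to check against the vendor's reference manual. The function name is hypothetical.

```c
#include <stdint.h>

/* Cortex-M3/M4 SRAM bit-band region and its alias region. */
#define SRAM_BITBAND_BASE   0x20000000u
#define SRAM_BITBAND_ALIAS  0x22000000u

/* Alias word address for bit 'bit' (0..7) of the byte at 'byte_addr'.
 * Each byte of the region occupies 32 bytes of alias space,
 * and each bit occupies one 4-byte word. */
static uint32_t bitband_alias_addr(uint32_t byte_addr, unsigned bit)
{
    return SRAM_BITBAND_ALIAS
         + (byte_addr - SRAM_BITBAND_BASE) * 32u
         + bit * 4u;
}
```

On hardware that supports it, writing 1 or 0 to `*(volatile uint32_t *)bitband_alias_addr(addr, bit)` would set or clear just that bit, without a read-modify-write sequence in software.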

  • Hi aclassifier,

    My gut feeling regarding M0+ vs M3/M4 is that the M0+ is compact and low power. Compared with the M0, the M0+ performs better because of its shorter (two-stage) pipeline and its single-cycle I/O port (used for GPIO). As for bit manipulation, MCU vendors may add their own extensions to a chip. For example, the Kinetis L series (whose CPU is an M0+) has the BME (Bit Manipulation Engine). The BME can handle both single bits and bit fields. Although I am not affiliated with Freescale, I have a good impression of the Kinetis L series.


    Best regards,
    Yasuhiko Koumoto.