How can a CLZ equivalent be achieved on Cortex-M0, where this instruction is missing?

I am looking for alternatives to this instruction.

  • Hello,

    In CMSIS, arm_math.h defines an alternative that you can use.  It is wrapped in some #defines, so if you have trouble getting it to build, make sure you have the correct definitions, or just copy and paste __CLZ into your code somewhere.

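      static __INLINE uint32_t __CLZ(
      q31_t data)
      {
        uint32_t count = 0;
        uint32_t mask = 0x80000000;
    
        /* Walk from bit 31 down until a set bit is found.
           The mask != 0 guard is added here: without it,
           data == 0 never terminates. With it, 0 yields 32. */
        while((mask != 0) && ((data & mask) == 0))
        {
          count += 1u;
          mask = mask >> 1u;
        }
    
        return (count);
    
      }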
    

    Cheers,

    Dan
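
The point about the surrounding #defines can be illustrated with a small wrapper that picks the hardware instruction when the core has it and falls back to the loop otherwise. This is only a sketch under assumptions: the name clz32 is hypothetical and not part of CMSIS, the feature test uses the ACLE macro __ARM_FEATURE_CLZ, and the builtin assumes a GCC- or Clang-compatible toolchain.

```c
#include <stdint.h>

/* Hypothetical wrapper (not a CMSIS function): count leading zeros,
   returning 32 for an input of 0 on every path. */
static inline uint32_t clz32(uint32_t x)
{
#if defined(__ARM_FEATURE_CLZ) && (defined(__GNUC__) || defined(__clang__))
    /* The core has CLZ. __builtin_clz is undefined for 0,
       so that case is handled explicitly. */
    return (x == 0u) ? 32u : (uint32_t)__builtin_clz(x);
#else
    /* No CLZ instruction (e.g. Cortex-M0): bit-by-bit fallback. */
    uint32_t count = 0u;
    uint32_t mask = 0x80000000u;

    while ((mask != 0u) && ((x & mask) == 0u)) {
        count++;
        mask >>= 1;
    }
    return count;
#endif
}
```

With this shape the calling code does not change between a Cortex-M0 build and a build for a core that has CLZ.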

  • If you are interested in better performance that beats the pants off the brute-force loop proposed above, you might use the following implementation:

    static __INLINE uint32_t __CLZ(uint32_t x) {
        extern uint8_t const log2Lkup[256];
    
        if (x >= 0x00010000U) {
            if (x >= 0x01000000U) {
                return 8U - log2Lkup[x >> 24];
            }
            else {
                return 16U - log2Lkup[x >> 16];
            }
        }
        else {
            if (x >= 0x00000100U) {
                return 24U - log2Lkup[x >> 8];
            }
            else {
                return 32U - log2Lkup[x];
            }
        }
    }
    

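    The function needs the log2 (binary logarithm) lookup table defined in a .c file. Entry i holds floor(log2(i)) + 1, that is, the number of bits needed to represent i, and entry 0 holds 0: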

    uint8_t const log2Lkup[256] = {
      0U, 1U, 2U, 2U, 3U, 3U, 3U, 3U, 4U, 4U, 4U, 4U, 4U, 4U, 4U, 4U,
      5U, 5U, 5U, 5U, 5U, 5U, 5U, 5U, 5U, 5U, 5U, 5U, 5U, 5U, 5U, 5U,
      6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U,
      6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U, 6U,
      7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U,
      7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U,
      7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U,
      7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U, 7U,
      8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U,
      8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U,
      8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U,
      8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U,
      8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U,
      8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U,
      8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U,
      8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U, 8U
    };
    

    The __CLZ() implementation proposed above has no loops and is close to deterministic: for any argument x it executes two comparisons, at most one shift, one table lookup and one subtraction, and it returns 32 for x == 0. The price is 256 bytes for the table. It is better than most other algorithms you can find online, including the methods from "Hacker's Delight" and Anderson's "Bit Twiddling Hacks" (interested folks can search for these).

    --Miro
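
To gain some confidence in the table-based version before using it, the table can be generated instead of typed, and the result compared against the bit-by-bit loop. The following is a minimal sketch, not the code from the post above: the names log2Lkup_ram, log2Lkup_init, clz_lookup, clz_loop and clz_selfcheck are hypothetical, and the table is placed in RAM here purely for convenience (on a real Cortex-M0 the const table in flash is likely the better choice).

```c
#include <stdint.h>

/* Hypothetical RAM copy of the table; the post above keeps it const. */
static uint8_t log2Lkup_ram[256];

/* Fill entry i with the bit length of i: 0 for 0, else floor(log2(i)) + 1. */
static void log2Lkup_init(void)
{
    for (uint32_t i = 0u; i < 256u; ++i) {
        uint8_t bits = 0u;
        for (uint32_t v = i; v != 0u; v >>= 1) {
            ++bits;
        }
        log2Lkup_ram[i] = bits;
    }
}

/* Same decision tree as the post above, reading the RAM table. */
static uint32_t clz_lookup(uint32_t x)
{
    if (x >= 0x00010000u) {
        return (x >= 0x01000000u) ? 8u - log2Lkup_ram[x >> 24]
                                  : 16u - log2Lkup_ram[x >> 16];
    }
    return (x >= 0x00000100u) ? 24u - log2Lkup_ram[x >> 8]
                              : 32u - log2Lkup_ram[x];
}

/* Reference: bit-by-bit count, guarded so that 0 yields 32. */
static uint32_t clz_loop(uint32_t x)
{
    uint32_t count = 0u;
    uint32_t mask = 0x80000000u;

    while ((mask != 0u) && ((x & mask) == 0u)) {
        count++;
        mask >>= 1;
    }
    return count;
}

/* Returns 1 if both versions agree on 0 and on the smallest and
   largest value for each possible position of the top set bit. */
static int clz_selfcheck(void)
{
    log2Lkup_init();

    if (clz_lookup(0u) != clz_loop(0u)) {
        return 0;
    }
    for (uint32_t bit = 0u; bit < 32u; ++bit) {
        uint32_t lo = 1u << bit;       /* only this bit set */
        uint32_t hi = lo | (lo - 1u);  /* this bit and all lower bits set */

        if (clz_lookup(lo) != 31u - bit || clz_loop(lo) != 31u - bit) {
            return 0;
        }
        if (clz_lookup(hi) != 31u - bit || clz_loop(hi) != 31u - bit) {
            return 0;
        }
    }
    return 1;
}
```

Calling clz_selfcheck() once on a host build or at startup covers the boundaries of all four branches of the decision tree, which is where an off-by-one in the table or the constants would show up.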