How can an equivalent of CLZ be achieved on the Cortex-M0, where this instruction is missing?

I am looking for alternatives to this instruction.

  • Note that, within CMSIS, the CLZ implementation as shown is never called with the value zero and does not support it.

    The Cortex-M0 supports register-based shifts, which permit reasonably fast and deterministic implementations without look-up tables, for example:

    #include <stdint.h>

    uint32_t clz(uint32_t data)
    {
      uint32_t count, shift, value;
    
      count = 31;                       //    MOVS Rc,#31
    
    #ifdef CLZ_SUPPORT_ZERO
      if(data == 0)                     //    CMP  Rd,#0
        {                               //    BNE  %1
          count = 32;                   //    MOVS Rc,#32
        }                               // 1:
    #endif
    
      for(shift=16;shift;shift>>=1)     //    MOVS Rs,#16
        {                               // 2:              <--+
          value = data >> shift;        //    MOV  Rv,Rd      |
                                        //    LSRS Rv,Rs      |
          if(value) {                   //    BEQ  %3         |
            data = value;               //    MOV  Rd,Rv     x5
            count = count - shift;      //    SUBS Rc,Rs      |
          }                             // 3:                 |
        }                               //    LSRS Rs,#1      |
                                        //    BNE  %2     ----+
    
      return count;                     //    MOV  Rd,Rc
    }                                   //    BX   lr
    

    Defining CLZ_SUPPORT_ZERO enables correct handling of the value zero, for which the function then returns 32; without it, a zero input returns 31.
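As a quick sanity check of the routine above, here is a minimal sketch that compares a compact copy of it (zero support always on) against a bit-by-bit reference. The names clz_shift, clz_naive and clz_selftest are made up for this illustration and are not part of CMSIS or of the post above.

```c
#include <assert.h>
#include <stdint.h>

/* Compact copy of the shift-based routine, with zero support always enabled. */
uint32_t clz_shift(uint32_t data)
{
    uint32_t count = (data == 0) ? 32u : 31u;
    uint32_t shift, value;

    for (shift = 16; shift; shift >>= 1) {
        value = data >> shift;
        if (value) {            /* upper part is non-zero: keep it */
            data = value;
            count -= shift;
        }
    }
    return count;
}

/* Hypothetical reference: test one bit at a time, starting from bit 31. */
uint32_t clz_naive(uint32_t data)
{
    uint32_t count = 0;
    uint32_t mask;

    for (mask = 0x80000000u; mask != 0 && (data & mask) == 0; mask >>= 1) {
        count++;
    }
    return count;
}

/* Checks zero, every single-bit value, and every "bit plus all lower bits" value. */
void clz_selftest(void)
{
    unsigned bit;

    assert(clz_shift(0) == clz_naive(0));
    for (bit = 0; bit < 32; bit++) {
        uint32_t single = (uint32_t)1 << bit;
        uint32_t filled = single | (single - 1);

        assert(clz_shift(single) == clz_naive(single));
        assert(clz_shift(filled) == clz_naive(filled));
    }
}
```

Calling clz_selftest() once on a host build (or on the target with asserts enabled) exercises every possible result from 0 to 32.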

    The best choice of implementation depends on your constraints and the data you expect. On a Cortex-M0+, depending on the input value, the code above should take 41 to 47 cycles and 28 bytes with zero support, or 38 to 44 cycles and 22 bytes without it. The CMSIS code should take between 10 and 137 cycles and require 24 bytes. Miro's suggestion should always take around 17 cycles, but consume around 308 bytes.
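To illustrate the look-up-table end of that trade-off, here is a small sketch using a 16-entry table. This is not Miro's code, which is not reproduced in this post; it is an assumed, generic variant with a much smaller table, and the names clz_nibble and clz_table are made up for this example. No cycle counts are claimed for it.

```c
#include <stdint.h>

/* clz_nibble[n] = number of leading zeros within the 4-bit value n (4 for n == 0). */
static const uint8_t clz_nibble[16] = {
    4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
};

/* Narrow down to the top non-zero nibble with three tests, then finish with the table. */
uint32_t clz_table(uint32_t data)
{
    uint32_t count = 0;

    if ((data >> 16) == 0) { count += 16; data <<= 16; }  /* top half empty   */
    if ((data >> 24) == 0) { count += 8;  data <<= 8;  }  /* top byte empty   */
    if ((data >> 28) == 0) { count += 4;  data <<= 4;  }  /* top nibble empty */

    return count + clz_nibble[data >> 28];
}
```

A zero input falls through all three tests and picks up the final 4 from the table, so it returns 32 without a special case.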

    hth

    Simon.
