This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

How to "optimize" self-defined 32bit shift-left routine ?

Dear all,
I encountered one problem about "executing self-defined 32bit shift-left routine too long"...

That is, I built my own routine to execute 32-bit shift left operation and found it takes too long(Due to some reason 32bit shift left is NOT allowed in my project currently...)

[code body]

UINT32 ShiftLeft32Bits(UINT32 Operand, UINT8 ShiftValue){
        bit     Carry;
        OperandDW = Operand;
        while(ShiftValue>0){
                Carry=0;
                if(OperandDWLo&0x8000)
                        Carry=1;
                OperandDWLo<<=1;
                OperandDWHi = (OperandDWHi<<1) | Carry;
                ShiftValue-=1;
        }
        return OperandDW;
}

[Statistics]
if "Operand = ShiftLeft32Bits(x1, 31);"
=> cost 60us ( where Operand,x1 are 4-byte xdata variables)

My question is: is it possible to "optimize" above code segments(favor speed) ? Thanks in advance...

p.s I am implementing sha256 calculation...

Parents
  • Is it some kind of strange DSP architecture with a not-so-standard C compiler?
    Hopefully, arbitrary shifts of 16-bit values are allowed? You could try the following optimization: rather than shifting 1 bit at a time in a loop, shift the 16-bit halves by the full amount and perform a multiple-bit carry. This will require several if-elseif clauses and some careful thinking but should prove faster for the general case.

Reply
  • Is it some kind of strange DSP architecture with a not-so-standard C compiler?
    Hopefully, arbitrary shifts of 16-bit values are allowed? You could try the following optimization: rather than shifting 1 bit at a time in a loop, shift the 16-bit halves by the full amount and perform a multiple-bit carry. This will require several if-elseif clauses and some careful thinking but should prove faster for the general case.

Children
  • Take a look at: http://www.keil.com/forum/docs/thread1252.asp

    I would recommend that you make proper assembler modules - rather than inline - but that should certainly give you a starting point...

  • Thanks for all your comments. After utilizing what you mentioned I got good result. Below is my routine for executing 32-bit shift left operation:

    code UINT16 AndMask16bit[8]=
    {
     0x0000,
     0x0001,0x0003,0x0007,0x000f,
     0x001f,0x003f,0x007f,
    };
    
    UINT32 ShiftLeft32Bits(UINT32 Operand, UINT8 ShiftValue)
    {
            UINT8 remain;
            OperandDW = Operand;
    
            if(ShiftValue>=16)
            {
            remain = ShiftValue - 16;
    
            // shift left 16-bit first
            OperandDWHi = OperandDWLo;
            OperandDWLo = 0;
    
            // shift left with remaining bit number (<16)
            OperandDWHi = OperandDWHi << remain;
            }
            else if(ShiftValue>=8)
            {
            remain = ShiftValue - 8;
            // shift left 8 bit first
            OperandDWHiB3 = OperandDWHiB2;
            OperandDWHiB2 = OperandDWHiB1;
            OperandDWHiB1 = OperandDWHiB0;
            OperandDWHiB0 = 0;
    
            // shift left with remaining bit number (<8)
    
            // 1. shift high 16-bit first
            OperandDWHi = OperandDWHi << remain;
    
            // 2. rotate left low-16 bit
            OperandDWLo = _irol_(OperandDWLo,remain);
    
            // 3. get the part belonging to high 16bit
            OperandDWHi |= (OperandDWLo & (AndMask16bit[remain]));
    
            // 4. get the part belonging to low-16 bit
            OperandDWLo = OperandDWLo & ~(AndMask16bit[remain]);
    
            // end...
    
            }
            else if(ShiftValue>0)
            {
            // shift left high 16bit first
            OperandDWHi = OperandDWHi << ShiftValue;
    
            // shift left low 16bit
            OperandDWLo = _irol_(OperandDWLo,ShiftValue);
    
            // get the part belonging to high 16bit
            OperandDWHi |= (OperandDWLo & (AndMask16bit[ShiftValue]));
    
            // get the part belonging to low-16 bit
            OperandDWLo = OperandDWLo & ~(AndMask16bit[ShiftValue]);
            }
    
            return  OperandDW;
    }
    

    F.Y.I

  • I must be missing something of the plot here.

    That looks like one of the ugliest shift left routines I think I've ever seen.

  • I must be missing something of the plot here.

    This would be 'Bad decision at early design stage, have to pay a high price now.'
    Why are early design decisions important? See here:
    www.astrodigital.org/.../stshorse.html

  • I found one strange thing after investigation:

    #1 In my current project 32bit shift right is ok.When I wrote:

    x = (( ByteNumLo >> 31 ) & 0x01 ) + (( z >> 31 ) & 0x01 );
    


    I found above example cost about 22us...

    #2 If using self-defined routine like previous post:

    x = (ShiftRight32Bits(ByteNumLo)&0x01)+(ShiftRight32Bits(z)&0x01);
    


    I found it cost 9us...

    Thus I think in some special cases self-defined routine is better than that in ?C?LIB_CODE even we consider it in early design...

    F.Y.I

  • And just think what it could do it in if you had written in assembler!

  • Hmmmm.... that does very much rely on the skill of the particular assembly programmer...

  • Thus I think in some special cases self-defined routine is better than that in ?C?LIB_CODE even we consider it in early design...<p>

    That's probably due to doing byte-wise shifts first.

    This, of course, results in slightly larger code _and_ slightly more cycles in certain cases, but if the shift_value is evenly distributed, the optimization lowers the average cycle count.