This discussion has been locked.

You can no longer post new replies to this discussion. If you have a question you can start a new discussion

How to "optimize" self-defined 32bit shift-left routine ?

Dear all,
I encountered one problem about "executing self-defined 32bit shift-left routine too long"...

That is, I built my own routine to execute 32-bit shift left operation and found it takes too long(Due to some reason 32bit shift left is NOT allowed in my project currently...)

[code body]

UINT32 ShiftLeft32Bits(UINT32 Operand, UINT8 ShiftValue){
        bit     Carry;
        OperandDW = Operand;
        while(ShiftValue>0){
                Carry=0;
                if(OperandDWLo&0x8000)
                        Carry=1;
                OperandDWLo<<=1;
                OperandDWHi = (OperandDWHi<<1) | Carry;
                ShiftValue-=1;
        }
        return OperandDW;
}

[Statistics]
if "Operand = ShiftLeft32Bits(x1, 31);"
=> cost 60us ( where Operand,x1 are 4-byte xdata variables)

My question is: is it possible to "optimize" above code segments(favor speed) ? Thanks in advance...

p.s I am implementing sha256 calculation...

Parents

0 Mike Kleshov over 15 years ago in reply to Christoph Franck

Is it some kind of strange DSP architecture with a not-so-standard C compiler?
Hopefully, arbitrary shifts of 16-bit values are allowed? You could try the following optimization: rather than shifting 1 bit at a time in a loop, shift the 16-bit halves by the full amount and perform a multiple-bit carry. This will require several if-elseif clauses and some careful thinking but should prove faster for the general case.
Cancel
Vote up 0 Vote down

Cancel

Reply

0 Mike Kleshov over 15 years ago in reply to Christoph Franck

Is it some kind of strange DSP architecture with a not-so-standard C compiler?
Hopefully, arbitrary shifts of 16-bit values are allowed? You could try the following optimization: rather than shifting 1 bit at a time in a loop, shift the 16-bit halves by the full amount and perform a multiple-bit carry. This will require several if-elseif clauses and some careful thinking but should prove faster for the general case.
Cancel
Vote up 0 Vote down

Cancel

Children

0 Andy Neil over 15 years ago in reply to Mike Kleshov

Take a look at: http://www.keil.com/forum/docs/thread1252.asp

I would recommend that you make proper assembler modules - rather than inline - but that should certainly give you a starting point...
Cancel
Vote up 0 Vote down

Cancel

0 Jimmy liao over 15 years ago in reply to Mike Kleshov

Thanks for all your comments. After utilizing what you mentioned I got good result. Below is my routine for executing 32-bit shift left operation:

code UINT16 AndMask16bit[8]=
{
 0x0000,
 0x0001,0x0003,0x0007,0x000f,
 0x001f,0x003f,0x007f,
};

UINT32 ShiftLeft32Bits(UINT32 Operand, UINT8 ShiftValue)
{
        UINT8 remain;
        OperandDW = Operand;

        if(ShiftValue>=16)
        {
        remain = ShiftValue - 16;

        // shift left 16-bit first
        OperandDWHi = OperandDWLo;
        OperandDWLo = 0;

        // shift left with remaining bit number (<16)
        OperandDWHi = OperandDWHi << remain;
        }
        else if(ShiftValue>=8)
        {
        remain = ShiftValue - 8;
        // shift left 8 bit first
        OperandDWHiB3 = OperandDWHiB2;
        OperandDWHiB2 = OperandDWHiB1;
        OperandDWHiB1 = OperandDWHiB0;
        OperandDWHiB0 = 0;

        // shift left with remaining bit number (<8)

        // 1. shift high 16-bit first
        OperandDWHi = OperandDWHi << remain;

        // 2. rotate left low-16 bit
        OperandDWLo = _irol_(OperandDWLo,remain);

        // 3. get the part belonging to high 16bit
        OperandDWHi |= (OperandDWLo & (AndMask16bit[remain]));

        // 4. get the part belonging to low-16 bit
        OperandDWLo = OperandDWLo & ~(AndMask16bit[remain]);

        // end...

        }
        else if(ShiftValue>0)
        {
        // shift left high 16bit first
        OperandDWHi = OperandDWHi << ShiftValue;

        // shift left low 16bit
        OperandDWLo = _irol_(OperandDWLo,ShiftValue);

        // get the part belonging to high 16bit
        OperandDWHi |= (OperandDWLo & (AndMask16bit[ShiftValue]));

        // get the part belonging to low-16 bit
        OperandDWLo = OperandDWLo & ~(AndMask16bit[ShiftValue]);
        }

        return  OperandDW;
}

F.Y.I

0 Non Keil Related over 15 years ago in reply to Jimmy liao

I must be missing something of the plot here.

That looks like one of the ugliest shift left routines I think I've ever seen.
Cancel
Vote up 0 Vote down

Cancel
0 Mike Kleshov over 15 years ago in reply to Non Keil Related

I must be missing something of the plot here.

This would be 'Bad decision at early design stage, have to pay a high price now.'
Why are early design decisions important? See here:
www.astrodigital.org/.../stshorse.html
Cancel
Vote up 0 Vote down

Cancel
0 Jimmy liao over 15 years ago in reply to Mike Kleshov
I found one strange thing after investigation:

#1 In my current project 32bit shift right is ok.When I wrote:

x = (( ByteNumLo >> 31 ) & 0x01 ) + (( z >> 31 ) & 0x01 );

I found above example cost about 22us...

#2 If using self-defined routine like previous post:

x = (ShiftRight32Bits(ByteNumLo)&0x01)+(ShiftRight32Bits(z)&0x01);

I found it cost 9us...

Thus I think in some special cases self-defined routine is better than that in ?C?LIB_CODE even we consider it in early design...

F.Y.I
Cancel
Vote up 0 Vote down

Cancel
0 Non Keil Related over 15 years ago in reply to Jimmy liao

And just think what it could do it in if you had written in assembler!
Cancel
Vote up 0 Vote down

Cancel
0 Andy Neil over 15 years ago in reply to Non Keil Related

Hmmmm.... that does very much rely on the skill of the particular assembly programmer...
Cancel
Vote up 0 Vote down

Cancel
0 Christoph Franck over 15 years ago in reply to Jimmy liao

Thus I think in some special cases self-defined routine is better than that in ?C?LIB_CODE even we consider it in early design...<p>

That's probably due to doing byte-wise shifts first.

This, of course, results in slightly larger code _and_ slightly more cycles in certain cases, but if the shift_value is evenly distributed, the optimization lowers the average cycle count.
Cancel
Vote up 0 Vote down

Cancel