We are running a survey to help us improve the experience for all of our members. If you see the survey appear, please take the time to tell us about your experience if you can.
Dear all, I encountered one problem about "executing self-defined 32bit shift-left routine too long"...
That is, I built my own routine to execute 32-bit shift left operation and found it takes too long(Due to some reason 32bit shift left is NOT allowed in my project currently...)
[code body]
UINT32 ShiftLeft32Bits(UINT32 Operand, UINT8 ShiftValue){ bit Carry; OperandDW = Operand; while(ShiftValue>0){ Carry=0; if(OperandDWLo&0x8000) Carry=1; OperandDWLo<<=1; OperandDWHi = (OperandDWHi<<1) | Carry; ShiftValue-=1; } return OperandDW; }
[Statistics] if "Operand = ShiftLeft32Bits(x1, 31);" => cost 60us ( where Operand,x1 are 4-byte xdata variables)
My question is: is it possible to "optimize" above code segments(favor speed) ? Thanks in advance...
p.s I am implementing sha256 calculation...
(Due to some reason 32bit shift left is NOT allowed in my project currently...)
Why wouldn't
operand = operand << shift_value;
be "NOT allowed"? That doesn't make sense. The shift operator is a basic component of the C language.
Anyway. If you want to "roll your own" shift routine, here are my two cents:
1. Write it in assembly. That way, at least you have direct access to the carry flag.
and/or
2. If you have a general idea of the distribution of values of shift_value, you might be able to reduce the average time consumed by the shift routine by doing byte-wise shifts first as long as shift_value is larger than 7. A left-shift of 20 bits would therefore be composed of two byte-wise left shifts and four single-bit left shifts.
Is it some kind of strange DSP architecture with a not-so-standard C compiler? Hopefully, arbitrary shifts of 16-bit values are allowed? You could try the following optimization: rather than shifting 1 bit at a time in a loop, shift the 16-bit halves by the full amount and perform a multiple-bit carry. This will require several if-elseif clauses and some careful thinking but should prove faster for the general case.
Take a look at: http://www.keil.com/forum/docs/thread1252.asp
I would recommend that you make proper assembler modules - rather than inline - but that should certainly give you a starting point...
Thanks for all your comments. After utilizing what you mentioned I got good result. Below is my routine for executing 32-bit shift left operation:
code UINT16 AndMask16bit[8]= { 0x0000, 0x0001,0x0003,0x0007,0x000f, 0x001f,0x003f,0x007f, }; UINT32 ShiftLeft32Bits(UINT32 Operand, UINT8 ShiftValue) { UINT8 remain; OperandDW = Operand; if(ShiftValue>=16) { remain = ShiftValue - 16; // shift left 16-bit first OperandDWHi = OperandDWLo; OperandDWLo = 0; // shift left with remaining bit number (<16) OperandDWHi = OperandDWHi << remain; } else if(ShiftValue>=8) { remain = ShiftValue - 8; // shift left 8 bit first OperandDWHiB3 = OperandDWHiB2; OperandDWHiB2 = OperandDWHiB1; OperandDWHiB1 = OperandDWHiB0; OperandDWHiB0 = 0; // shift left with remaining bit number (<8) // 1. shift high 16-bit first OperandDWHi = OperandDWHi << remain; // 2. rotate left low-16 bit OperandDWLo = _irol_(OperandDWLo,remain); // 3. get the part belonging to high 16bit OperandDWHi |= (OperandDWLo & (AndMask16bit[remain])); // 4. get the part belonging to low-16 bit OperandDWLo = OperandDWLo & ~(AndMask16bit[remain]); // end... } else if(ShiftValue>0) { // shift left high 16bit first OperandDWHi = OperandDWHi << ShiftValue; // shift left low 16bit OperandDWLo = _irol_(OperandDWLo,ShiftValue); // get the part belonging to high 16bit OperandDWHi |= (OperandDWLo & (AndMask16bit[ShiftValue])); // get the part belonging to low-16 bit OperandDWLo = OperandDWLo & ~(AndMask16bit[ShiftValue]); } return OperandDW; }
F.Y.I
I must be missing something of the plot here.
That looks like one of the ugliest shift left routines I think I've ever seen.
This would be 'Bad decision at early design stage, have to pay a high price now.' Why are early design decisions important? See here: www.astrodigital.org/.../stshorse.html
I found one strange thing after investigation:
#1 In my current project 32bit shift right is ok.When I wrote:
x = (( ByteNumLo >> 31 ) & 0x01 ) + (( z >> 31 ) & 0x01 );
I found above example cost about 22us...
#2 If using self-defined routine like previous post:
x = (ShiftRight32Bits(ByteNumLo)&0x01)+(ShiftRight32Bits(z)&0x01);
I found it cost 9us...
Thus I think in some special cases self-defined routine is better than that in ?C?LIB_CODE even we consider it in early design...
And just think what it could do it in if you had written in assembler!
Hmmmm.... that does very much rely on the skill of the particular assembly programmer...
Thus I think in some special cases self-defined routine is better than that in ?C?LIB_CODE even we consider it in early design...<p>
That's probably due to doing byte-wise shifts first.
This, of course, results in slightly larger code _and_ slightly more cycles in certain cases, but if the shift_value is evenly distributed, the optimization lowers the average cycle count.