We are running a survey to help us improve the experience for all of our members. If you see the survey appear, please take the time to tell us about your experience if you can.
Dear all, I encountered one problem about "executing self-defined 32bit shift-left routine too long"...
That is, I built my own routine to execute 32-bit shift left operation and found it takes too long(Due to some reason 32bit shift left is NOT allowed in my project currently...)
[code body]
UINT32 ShiftLeft32Bits(UINT32 Operand, UINT8 ShiftValue){ bit Carry; OperandDW = Operand; while(ShiftValue>0){ Carry=0; if(OperandDWLo&0x8000) Carry=1; OperandDWLo<<=1; OperandDWHi = (OperandDWHi<<1) | Carry; ShiftValue-=1; } return OperandDW; }
[Statistics] if "Operand = ShiftLeft32Bits(x1, 31);" => cost 60us ( where Operand,x1 are 4-byte xdata variables)
My question is: is it possible to "optimize" above code segments(favor speed) ? Thanks in advance...
p.s I am implementing sha256 calculation...
The approach is described here: www.8052.com/.../149030
What reason, exactly?
It's impossible to give appropriate suggestions when you don't supply all the relevant information!
What is the time "cost" of just using the standard 'C' shift operator?
(Due to some reason 32bit shift left is NOT allowed in my project currently...)
Why wouldn't
operand = operand << shift_value;
be "NOT allowed"? That doesn't make sense. The shift operator is a basic component of the C language.
Anyway. If you want to "roll your own" shift routine, here are my two cents:
1. Write it in assembly. That way, at least you have direct access to the carry flag.
and/or
2. If you have a general idea of the distribution of values of shift_value, you might be able to reduce the average time consumed by the shift routine by doing byte-wise shifts first as long as shift_value is larger than 7. A left-shift of 20 bits would therefore be composed of two byte-wise left shifts and four single-bit left shifts.
Dear all, The reason why 32-bit shift left(<<) is not allowed in my project is:
#1 our code is composed of 2 parts; one is stored in ROM and the other is in SRAM. #2 if use operator << within code then after compile/link the ROM part changed and this is not allowed because ?C?LIB_CODE is bigger than before...
* ?C?LIB_CODE is assigned in ROM part always in my project
#2 causes mismatch between original ROM part & new SRAM code; then firmware operates abnormally...
To sum up, we did not include lib for 32bit shift-left operation in the early development but now we have to use it...
Is it some kind of strange DSP architecture with a not-so-standard C compiler? Hopefully, arbitrary shifts of 16-bit values are allowed? You could try the following optimization: rather than shifting 1 bit at a time in a loop, shift the 16-bit halves by the full amount and perform a multiple-bit carry. This will require several if-elseif clauses and some careful thinking but should prove faster for the general case.
Take a look at: http://www.keil.com/forum/docs/thread1252.asp
I would recommend that you make proper assembler modules - rather than inline - but that should certainly give you a starting point...
Thanks for all your comments. After utilizing what you mentioned I got good result. Below is my routine for executing 32-bit shift left operation:
code UINT16 AndMask16bit[8]= { 0x0000, 0x0001,0x0003,0x0007,0x000f, 0x001f,0x003f,0x007f, }; UINT32 ShiftLeft32Bits(UINT32 Operand, UINT8 ShiftValue) { UINT8 remain; OperandDW = Operand; if(ShiftValue>=16) { remain = ShiftValue - 16; // shift left 16-bit first OperandDWHi = OperandDWLo; OperandDWLo = 0; // shift left with remaining bit number (<16) OperandDWHi = OperandDWHi << remain; } else if(ShiftValue>=8) { remain = ShiftValue - 8; // shift left 8 bit first OperandDWHiB3 = OperandDWHiB2; OperandDWHiB2 = OperandDWHiB1; OperandDWHiB1 = OperandDWHiB0; OperandDWHiB0 = 0; // shift left with remaining bit number (<8) // 1. shift high 16-bit first OperandDWHi = OperandDWHi << remain; // 2. rotate left low-16 bit OperandDWLo = _irol_(OperandDWLo,remain); // 3. get the part belonging to high 16bit OperandDWHi |= (OperandDWLo & (AndMask16bit[remain])); // 4. get the part belonging to low-16 bit OperandDWLo = OperandDWLo & ~(AndMask16bit[remain]); // end... } else if(ShiftValue>0) { // shift left high 16bit first OperandDWHi = OperandDWHi << ShiftValue; // shift left low 16bit OperandDWLo = _irol_(OperandDWLo,ShiftValue); // get the part belonging to high 16bit OperandDWHi |= (OperandDWLo & (AndMask16bit[ShiftValue])); // get the part belonging to low-16 bit OperandDWLo = OperandDWLo & ~(AndMask16bit[ShiftValue]); } return OperandDW; }
F.Y.I
I must be missing something of the plot here.
That looks like one of the ugliest shift left routines I think I've ever seen.
And that assignment is wrong. It's as simple as that.
You're going at this from entirely the wrong direction. The problem is that you're trying something for which an 8051 is a disastrously wrong choice of CPU architecture, and now you're trying silly things to cure the symptoms, instead of fixing the actual problem.
Splitting programs into a ROM part and variable "add-ons" is extremely tricky business in C on an 8051. I should know --- I've worked in a project where we did this in pure assembly, and that was quite a nightmare already. Getting C51 to jump through those same hoops would be even worse.
You've basically built yourself a solid wall and now you keep running head-first into it instead of considering that it might hurt less if there was a hole to pass through. Might want to call it "door".
This would be 'Bad decision at early design stage, have to pay a high price now.' Why are early design decisions important? See here: www.astrodigital.org/.../stshorse.html
I found one strange thing after investigation:
#1 In my current project 32bit shift right is ok.When I wrote:
x = (( ByteNumLo >> 31 ) & 0x01 ) + (( z >> 31 ) & 0x01 );
I found above example cost about 22us...
#2 If using self-defined routine like previous post:
x = (ShiftRight32Bits(ByteNumLo)&0x01)+(ShiftRight32Bits(z)&0x01);
I found it cost 9us...
Thus I think in some special cases self-defined routine is better than that in ?C?LIB_CODE even we consider it in early design...
I assume the the OP is running into the old problem of having responsibility without authority, i.e. it's his job to fix the problem, but he doesn't have the authority to implement the effective, straightforward solution.
And just think what it could do it in if you had written in assembler!
Hmmmm.... that does very much rely on the skill of the particular assembly programmer...
Thus I think in some special cases self-defined routine is better than that in ?C?LIB_CODE even we consider it in early design...<p>
That's probably due to doing byte-wise shifts first.
This, of course, results in slightly larger code _and_ slightly more cycles in certain cases, but if the shift_value is evenly distributed, the optimization lowers the average cycle count.