Dear all, I have run into a problem: my self-defined 32-bit shift-left routine takes too long to execute.
That is, I wrote my own routine to perform a 32-bit shift left and found that it takes too long. (For some reason, a 32-bit shift left is NOT allowed in my project at the moment...)
[code body]
UINT32 ShiftLeft32Bits(UINT32 Operand, UINT8 ShiftValue)
{
    bit Carry;

    OperandDW = Operand;
    while (ShiftValue > 0) {
        Carry = 0;
        if (OperandDWLo & 0x8000)
            Carry = 1;
        OperandDWLo <<= 1;
        OperandDWHi = (OperandDWHi << 1) | Carry;
        ShiftValue -= 1;
    }
    return OperandDW;
}
[Statistics] "Operand = ShiftLeft32Bits(x1, 31);" costs 60us (where Operand and x1 are 4-byte xdata variables)
My question is: is it possible to optimize the above code for speed? Thanks in advance...
P.S. I am implementing a SHA-256 calculation...
I must be missing something of the plot here.
This would be 'Bad decision at early design stage, have to pay a high price now.' Why are early design decisions important? See here: www.astrodigital.org/.../stshorse.html
I found one strange thing after investigating:
#1 In my current project a 32-bit shift right is OK. When I wrote:
x = (( ByteNumLo >> 31 ) & 0x01 ) + (( z >> 31 ) & 0x01 );
I found that the above example costs about 22us...
#2 Using a self-defined routine like the one in my previous post:
x = (ShiftRight32Bits(ByteNumLo)&0x01)+(ShiftRight32Bits(z)&0x01);
I found that it costs 9us...
Thus I think that in some special cases a self-defined routine is better than the one in ?C?LIB_CODE, even if we consider it at the early design stage...
F.Y.I
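For the specific expression in #1, only bit 31 of each operand is used, so one possible alternative is to test that bit with a mask instead of shifting. The sketch below is plain C with a hypothetical helper name (top_bit32 is not from the posts above); whether it actually beats the 9us routine on a given 8051 compiler would have to be measured.

```c
#include <stdint.h>

/* Hypothetical helper, not from the thread: returns bit 31 of value as 0 or 1.
 * Gives the same result as ((value >> 31) & 0x01) without a 32-bit shift. */
static uint8_t top_bit32(uint32_t value)
{
    return (value & 0x80000000UL) ? 1u : 0u;
}
```

With such a helper, the statement from #1 could be written as x = top_bit32(ByteNumLo) + top_bit32(z);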
And just think how fast it could be if you had written it in assembler!
Hmmmm.... that does very much rely on the skill of the particular assembly programmer...
Thus I think that in some special cases a self-defined routine is better than the one in ?C?LIB_CODE, even if we consider it at the early design stage...
That's probably due to doing byte-wise shifts first.
This, of course, results in slightly larger code _and_ slightly more cycles in certain cases, but if the shift_value is evenly distributed, the optimization lowers the average cycle count.
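The byte-wise idea can be sketched in portable C as follows. This is only an illustration under assumptions: the name shl32_bytewise and the byte array layout are made up for the example, and the constant shifts used to split and rejoin the value are there for portability (on an 8-bit target a union or byte pointer would normally be used instead). Whole bytes are moved first, then at most 7 single-bit positions remain, so a shift by 31 needs one byte move plus one 7-bit step instead of 31 loop iterations.

```c
#include <stdint.h>

/* Hypothetical sketch, not the library routine: shift left by whole bytes
 * first (count / 8), then by the remaining 0..7 bits (count % 8). */
static uint32_t shl32_bytewise(uint32_t value, uint8_t count)
{
    uint8_t b[4];                 /* b[0] is the least significant byte */
    uint8_t byte_shift, bit_shift;
    int i;

    if (count >= 32)
        return 0;                 /* everything is shifted out */

    /* Split into bytes (a union would avoid these shifts on an 8-bit MCU). */
    b[0] = (uint8_t)(value);
    b[1] = (uint8_t)(value >> 8);
    b[2] = (uint8_t)(value >> 16);
    b[3] = (uint8_t)(value >> 24);

    byte_shift = (uint8_t)(count >> 3);
    bit_shift  = (uint8_t)(count & 7);

    /* Step 1: move whole bytes upward, highest byte first so that no
     * source byte is overwritten before it has been copied. */
    if (byte_shift != 0) {
        for (i = 3; i >= 0; i--)
            b[i] = (i >= byte_shift) ? b[i - byte_shift] : 0;
    }

    /* Step 2: shift the remaining 1..7 bits, carrying the top bits of
     * each lower byte into the byte above it. */
    if (bit_shift != 0) {
        for (i = 3; i > 0; i--)
            b[i] = (uint8_t)((b[i] << bit_shift) | (b[i - 1] >> (8 - bit_shift)));
        b[0] = (uint8_t)(b[0] << bit_shift);
    }

    /* Rejoin the bytes into a 32-bit result. */
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}
```

The trade-off matches the remark above: the extra byte-move step adds code and costs a few cycles for small shift counts, but it caps the bit-level work at 7 positions.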