I am a big fan of the M0+, but I keep running into the problem that only the ADD, CMP and MOV instructions can use the high registers, with the exception of forming a stack frame. While access to that frame takes 2 cycles instead of 1, if a reasonable number of variables are needed, SP is the only way to go. I have spent ages trying to find an efficient assembly version of this:

void FDCT32(int *buf, int *dest, int offset, int oddBlock, int gb)
{
    a0 = buf[i]; a3 = buf[31-i]; b0 = a0 + a3; b3 = MULSHIFT32(*cptr++, a0 - a3) << (s0);
    a1 = buf[15-i]; a2 = buf[16+i]; b1 = a1 + a2; b2 = MULSHIFT32(*cptr++, a1 - a2) << (s1);
    buf[i] = b0 + b1; buf[15-i] = MULSHIFT32(*cptr, b0 - b1) << (s2);
    buf[16+i] = b2 + b3; buf[31-i] = MULSHIFT32(*cptr++, b3 - b2) << (s2);
}
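For anyone unfamiliar with MULSHIFT32 as used in the snippet: it returns the top 32 bits of the signed 64-bit product. The M0+ only has MULS (32 x 32 -> low 32 bits), so the high word has to be assembled from 16-bit partial products. Below is a minimal C sketch of that idea, following the well-known high-word multiply from Hacker's Delight. The names mulshift32_ref, mulshift32_halves and mulshift32_selfcheck are my own for illustration, this is not necessarily how my assembly routine is laid out, and I am not claiming it is the fastest form.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: top 32 bits of the signed 64-bit product.
 * Assumes >> on a negative value is an arithmetic shift (true for GCC/Clang). */
int32_t mulshift32_ref(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * y) >> 32);
}

/* Same result built from four 16x16 partial products, so every
 * intermediate fits in 32 bits (what a MULS-only core has to do). */
int32_t mulshift32_halves(int32_t x, int32_t y)
{
    int32_t xl = (int32_t)((uint32_t)x & 0xFFFFu); /* low half, 0..65535 */
    int32_t xh = x >> 16;                          /* high half, signed  */
    int32_t yl = (int32_t)((uint32_t)y & 0xFFFFu);
    int32_t yh = y >> 16;

    /* low x low can exceed INT32_MAX, so do it unsigned */
    uint32_t ll = (uint32_t)xl * (uint32_t)yl;

    /* first cross product plus the carry out of the low product */
    int32_t t  = xh * yl + (int32_t)(ll >> 16);
    int32_t w1 = t & 0xFFFF;  /* low 16 bits of t, non-negative */
    int32_t w2 = t >> 16;     /* signed upper part of t         */

    /* second cross product; the sum still fits in 32 bits */
    w1 += xl * yh;

    return xh * yh + w2 + (w1 >> 16);
}

/* Compare both versions on a few edge cases; aborts on a mismatch. */
void mulshift32_selfcheck(void)
{
    static const int32_t v[] = {
        0, 1, -1, INT32_MAX, INT32_MIN, 0x12345678, -0x12345678, 0x0000FFFF
    };
    const unsigned n = sizeof v / sizeof v[0];

    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            assert(mulshift32_halves(v[i], v[j]) == mulshift32_ref(v[i], v[j]));
}
```

The point of showing it is the register pressure: four multiplies plus the carry handling keep several temporaries live at once, which is where the low registers go.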
Now, MULSHIFT32 performs a 32-bit x 32-bit multiply, but only the top 32 bits of the result are needed. Sadly, it looks like an algorithm that produces only those top 32 bits is no faster (unless someone knows one).

In the above you will note that I need two pointer registers, i.e. buf and cptr. I DID consider using SP as the base address for cptr, but even going to those lengths I find myself running out of registers. I might add that R0-R4 are used by MULSHIFT32, so while I can use them between MULSHIFT32 calls, I'm only left with R5, R6 and R7, and at least one of them needs to be a pointer.

Do others simply define a stack frame, given that shuffling values lo-to-hi and hi-to-lo means the high registers are no faster?

I would LIKE to store the values 31 and 15 and the 3 shift values (5 bits each) in a single register but, once again, that uses a low register.

I HAVE spent a lot of time on this, as I believe a technique to deal with this snippet will answer questions/problems seen all through the code.

Many thanks.
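To make the packing idea concrete, here is a rough C sketch of what I mean by putting 31, 15 and the three shift counts into one register. All five values fit in 5 bits, so together they take 25 bits of a 32-bit word. The field order and the names pack_fields, get_field and the F_* indices are just assumptions for illustration, not something from the existing code.

```c
#include <assert.h>
#include <stdint.h>

#define FIELD_BITS 5u     /* each value is 0..31 */
#define FIELD_MASK 0x1Fu

/* Hypothetical field order inside the packed word (field 0 = bits 4:0). */
enum { F_IDX31 = 0, F_IDX15 = 1, F_S0 = 2, F_S1 = 3, F_S2 = 4 };

/* Pack five 5-bit values into the low 25 bits of one word. */
static inline uint32_t pack_fields(uint32_t idx31, uint32_t idx15,
                                   uint32_t s0, uint32_t s1, uint32_t s2)
{
    /* every input must fit in 5 bits */
    assert((idx31 | idx15 | s0 | s1 | s2) <= FIELD_MASK);

    return (idx31 << (F_IDX31 * FIELD_BITS)) |
           (idx15 << (F_IDX15 * FIELD_BITS)) |
           (s0    << (F_S0    * FIELD_BITS)) |
           (s1    << (F_S1    * FIELD_BITS)) |
           (s2    << (F_S2    * FIELD_BITS));
}

/* Extract one field: a shift followed by a mask. */
static inline uint32_t get_field(uint32_t packed, unsigned field)
{
    return (packed >> (field * FIELD_BITS)) & FIELD_MASK;
}
```

Usage would be something like packed = pack_fields(31, 15, s0, s1, s2) once outside the loop, then get_field(packed, F_S1) wherever s1 is needed. The catch is the one I mentioned: each extraction is a shift and a mask, and on the M0+ those only work on low registers, so the packed word has to be copied into a low register before anything can be pulled out of it.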