Hi, I'm looking for a way to optimize the following code:
```c
void prepareData(uint16_t* dataOut, uint16_t* dataIn, uint32_t length)
{
    uint32_t i;
    for (i = 0; i < length; i += 2) {
        dataOut[i]   = (dataIn[i+1] >> 4) & 0x03FF;
        dataOut[i+1] = (dataIn[i]   >> 4) & 0x03FF;
    }
}
```
It just swaps each pair of 16-bit words, shifts each word right by 4 and clears the upper 6 bits. I already tried the hints from http://www.keil.com/support/man/docs/armcc/armcc_cjajacch.htm, but the code gets slower with a decrementing counter.
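To make the bit manipulation concrete, here is the loop body of prepareData applied to a single pair, written as a small helper. The name prepare_pair is hypothetical (it is not from the original post), and the input values in the comment are arbitrary examples:

```c
#include <stdint.h>

/* Hypothetical helper: processes one input pair exactly as the loop body
   of prepareData does.
   Example: in = {0x1234, 0xABCD}
     out[0] = (0xABCD >> 4) & 0x03FF = 0x0ABC & 0x03FF = 0x02BC
     out[1] = (0x1234 >> 4) & 0x03FF = 0x0123 & 0x03FF = 0x0123 */
static void prepare_pair(uint16_t out[2], const uint16_t in[2])
{
    out[0] = (uint16_t)((in[1] >> 4) & 0x03FF); /* second word goes first  */
    out[1] = (uint16_t)((in[0] >> 4) & 0x03FF); /* first word goes second */
}
```

The shift by 4 already zeroes the top 4 bits of an unsigned 16-bit value; the 0x03FF mask clears the remaining 2, leaving a 10-bit result.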
It takes about 50 ms (55 ms with a decrementing counter) for a length of 350000. Target: AT91SAM9260, executing from external RAM.
This is definitely the right approach, Mike. It would be better still if the data were aligned to 8 words (32 bytes), since that is the size of a cache line in the ARM926, assuming the data cache has been enabled.
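A minimal sketch of obtaining cache-line-aligned buffers, assuming a toolchain that provides C11 `aligned_alloc`. With older compilers such as armcc, a static buffer declared with an alignment keyword (for example `__align(32)`) serves the same purpose. The helper name `alloc_cache_aligned` and the `CACHE_LINE_BYTES` constant are illustrative assumptions based on the 8-word line size mentioned above:

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache line size: 8 words = 32 bytes on the ARM926. */
#define CACHE_LINE_BYTES 32u

/* Hypothetical helper: allocates room for `length` 16-bit samples, starting
   on a cache-line boundary. C11 aligned_alloc requires the size to be a
   multiple of the alignment, so the byte count is rounded up.
   Release the buffer with free(). */
static uint16_t *alloc_cache_aligned(uint32_t length)
{
    size_t bytes = (size_t)length * sizeof(uint16_t);
    bytes = (bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES * CACHE_LINE_BYTES;
    if (bytes == 0)
        bytes = CACHE_LINE_BYTES; /* avoid a zero-size request */
    return aligned_alloc(CACHE_LINE_BYTES, bytes);
}
```

For the 350000-sample case from the question, 700000 bytes is already a multiple of 32, so no padding is added.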
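However, you can still shave a few cycles off the loop by parallelizing operations: treating each pair of 16-bit samples as one 32-bit word lets a single shift and a single mask process both samples at once. Fortunately, the task is well suited to this.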
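```c
void prepareDataMH(uint16_t* dataOut, uint16_t* dataIn, uint32_t length)
{
    int32_t i;
    uint32_t tmp;
    /* Access each pair of 16-bit samples as one 32-bit word
       (needs 4-byte aligned buffers and little-endian byte order). */
    uint32_t *dataIn_pair  = (uint32_t *)dataIn;
    uint32_t *dataOut_pair = (uint32_t *)dataOut;

    for (i = (int32_t)(length / 2) - 1; i >= 0; i--) {
        /* Shift both samples by 4 and keep 10 bits of each in one go. */
        tmp = (dataIn_pair[i] >> 4) & 0x03FF03FF;
        /* Swap the two 16-bit halves. */
        dataOut_pair[i] = (tmp >> 16) | (tmp << 16);
    }
}
```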
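Before swapping in the pair-based routine, it is worth checking that it produces the same output as the original. The sketch below does that on a host machine. It assumes little-endian byte order (the word-swapping trick in prepareDataMH relies on this too) and an even length. As a deliberate substitution, the 32-bit accesses use memcpy instead of pointer casts so the code stays within C's strict aliasing rules; compilers typically reduce a fixed-size memcpy to a single load or store, but check the generated code on your toolchain. The names prepareDataPairs and routines_agree are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Original routine from the question, used as the reference. */
static void prepareData(uint16_t *dataOut, const uint16_t *dataIn, uint32_t length)
{
    for (uint32_t i = 0; i < length; i += 2) {
        dataOut[i]     = (uint16_t)((dataIn[i + 1] >> 4) & 0x03FF);
        dataOut[i + 1] = (uint16_t)((dataIn[i]     >> 4) & 0x03FF);
    }
}

/* Pair-at-a-time variant of prepareDataMH, using memcpy for the 32-bit
   accesses. Assumes little-endian and an even length.
   Example: in = {0x1234, 0xABCD} is loaded as the word 0xABCD1234
     >> 4          -> 0x0ABCD123
     & 0x03FF03FF  -> 0x02BC0123
     swap halves   -> 0x012302BC, stored as out = {0x02BC, 0x0123} */
static void prepareDataPairs(uint16_t *dataOut, const uint16_t *dataIn, uint32_t length)
{
    for (uint32_t i = 0; i < length; i += 2) {
        uint32_t tmp;
        memcpy(&tmp, &dataIn[i], sizeof tmp);   /* load two samples */
        tmp = (tmp >> 4) & 0x03FF03FFu;         /* shift and mask both */
        tmp = (tmp >> 16) | (tmp << 16);        /* swap the halves */
        memcpy(&dataOut[i], &tmp, sizeof tmp);  /* store two samples */
    }
}

/* Hypothetical test helper: returns 1 if both routines agree on a
   pseudo-random buffer of `length` samples (even, at most 1024). */
static int routines_agree(uint32_t length)
{
    uint16_t in[1024], a[1024], b[1024];
    uint32_t seed = 12345u;

    if (length > 1024 || (length & 1u))
        return 0;
    for (uint32_t i = 0; i < length; i++) {
        seed = seed * 1103515245u + 12345u;     /* simple LCG */
        in[i] = (uint16_t)(seed >> 16);
    }
    prepareData(a, in, length);
    prepareDataPairs(b, in, length);
    return memcmp(a, b, length * sizeof(uint16_t)) == 0;
}
```

Once the outputs match on the host, the same comparison can be run on the target before timing the two versions against each other.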
Regards Marcus http://www.doulos.com/arm/