Hi, I'm looking for a way to optimize the following code:
void prepareData(uint16_t* dataOut, uint16_t* dataIn, uint32_t length)
{
    uint32_t i;
    for (i = 0; i < length; i += 2)
    {
        dataOut[i]   = (dataIn[i+1] >> 4) & 0x03FF;
        dataOut[i+1] = (dataIn[i]   >> 4) & 0x03FF;
    }
}
It just swaps each pair of 16-bit words, shifts them right by 4 and sets the upper 6 bits to 0. I already tried the hints from http://www.keil.com/support/man/docs/armcc/armcc_cjajacch.htm, but it gets slower with a decrementing counter.
It's taking about 50ms (55ms with decrementing counter) for a length of 350000. Target: AT91SAM9260, executed from external RAM.
> I tried to enable it, by setting it to 0x0005107D -> MMU and DCache enabled - but the processor then hangs.
> Is there a special proceeding to enable the data cache?
Did you set up a page table at all? The MMU needs one to work properly. Don't forget to initialize cp15,c2 (TTB). RTFTRM ;-)
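To make the page table point concrete, here is a minimal sketch of a flat (virtual = physical) first-level table built from 1 MB section descriptors in the ARMv5 format. The function name buildFlatPageTable, the SDRAM base of 0x20000000 and the 64 MB size are assumptions for illustration, so check them against your board and the TRM. Only the assumed SDRAM range is marked cacheable and bufferable; everything else, including the peripherals, stays uncached.

```c
#include <stdint.h>

/* ARMv5 first-level section descriptor fields */
#define SECTION_DESC   0x2u         /* bits[1:0] = 10: 1 MB section          */
#define SECTION_B      (1u << 2)    /* bufferable                            */
#define SECTION_C      (1u << 3)    /* cacheable                             */
#define SECTION_SBO    (1u << 4)    /* bit 4 should be one on ARMv5          */
#define SECTION_AP_RW  (3u << 10)   /* AP = 11: read/write in all modes      */

/* ASSUMPTION: external SDRAM at 0x20000000, 64 MB. Adjust for your board. */
#define RAM_BASE_MB    0x200u
#define RAM_SIZE_MB    64u

/* Fills a 4096-entry first-level table with a flat mapping, domain 0.
   The table itself must be 16 KB aligned when it is handed to the MMU. */
void buildFlatPageTable(uint32_t *table)
{
    uint32_t mb;
    for (mb = 0; mb < 4096u; mb++) {
        uint32_t desc = (mb << 20) | SECTION_AP_RW | SECTION_SBO | SECTION_DESC;
        if (mb >= RAM_BASE_MB && mb < RAM_BASE_MB + RAM_SIZE_MB)
            desc |= SECTION_C | SECTION_B;   /* cache only the RAM range */
        table[mb] = desc;
    }
}
```

Before setting the M and C bits in the control register you would still have to write the table address to cp15,c2 (TTB) and give domain 0 client access in cp15,c3. Those writes need MCR instructions and are not shown here.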
Looking at the assembler output, I am not sure if the unrolled loop is better than the single-word parallel version that I posted.
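For anyone who does not have that post at hand, a single-word parallel variant could look roughly like the sketch below. It is not necessarily identical to the version referred to above, and the name prepareDataPacked is made up for this example. The idea is to load both 16-bit values as one 32-bit word, swap the halfwords, then shift and mask both at once. The bits that leak from the upper halfword into the lower one during the shift land in bits 12..15 and are removed by the mask. memcpy is used for the 32-bit accesses to stay within the C aliasing rules; whether the compiler turns it into a single LDR/STR is an assumption you should verify in the assembler output.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name. Handles two 16-bit samples per 32-bit operation.
   Like the original loop, it expects an even length; a trailing odd
   element is left untouched. */
void prepareDataPacked(uint16_t *dataOut, const uint16_t *dataIn, uint32_t length)
{
    uint32_t i;
    for (i = 0; i + 1 < length; i += 2) {
        uint32_t w;
        memcpy(&w, &dataIn[i], sizeof w);    /* load both halfwords at once   */
        w = (w >> 16) | (w << 16);           /* swap the two halfwords        */
        w = (w >> 4) & 0x03FF03FFu;          /* shift both, keep 10 bits each */
        memcpy(&dataOut[i], &w, sizeof w);   /* store both halfwords at once  */
    }
}
```

Because the halfword swap is symmetric, the result does not depend on the byte order of the target.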
Regards Marcus http://www.doulos.com/arm/
PS: -Otime seems to be detrimental to performance (RealView Compiler) of all variants that have been posted here.