Optimizing specific code

Hi,
I'm looking for a way to optimize the following code:

void prepareData(uint16_t* dataOut, uint16_t* dataIn, uint32_t length)
{
        uint32_t i;
        for (i = 0; i < length; i += 2)
        {
                dataOut[i] = (dataIn[i+1] >> 4) & 0x03FF;
                dataOut[i+1] = (dataIn[i] >> 4) & 0x03FF;
        }
}


It just swaps each pair of 16-bit words, shifts them right by 4 and sets the upper 6 bits to 0.
I already tried the hints from http://www.keil.com/support/man/docs/armcc/armcc_cjajacch.htm , but it gets slower with a decrementing counter.

It takes about 50 ms (55 ms with a decrementing counter) for a length of 350000.
Target: AT91SAM9260, executed from external RAM.

  • This is definitely the right approach, Mike. It would be better still if the data were aligned to 8 words (32 bytes), since that is the size of a cache line on the ARM926, assuming the data cache has been enabled.

    However, you can still shave off a few cycles inside the loop by parallelizing operations. Fortunately the task is rather well suited to this.

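    /* Assumes both buffers are 4-byte aligned and length is even. */
    void prepareDataMH(uint16_t* dataOut, uint16_t* dataIn, uint32_t length)
    {
        int32_t  i;
        uint32_t tmp;
        uint32_t *dataIn_pair  = (uint32_t *)dataIn;   /* two 16-bit values per word */
        uint32_t *dataOut_pair = (uint32_t *)dataOut;

        /* Count down to zero, one 32-bit word (two values) per iteration. */
        for (i = (length/2)-1; i >= 0; i--)
        {
            /* Shift and mask both halfwords in one operation... */
            tmp             = (dataIn_pair[i] >> 4) & 0x03FF03FF;
            /* ...then swap the two halfwords. */
            dataOut_pair[i] = (tmp >> 16) | (tmp << 16);
        }
    }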
    

    Regards
    Marcus
    http://www.doulos.com/arm/
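
One way to check that the word-at-a-time idea above gives the same result as the original loop is sketched below. The names prepare_data_ref, prepare_data_pairs and outputs_match are hypothetical and not from the thread. As an assumption of this sketch, the 32-bit accesses go through memcpy rather than pointer casts, so it does not depend on the buffers being 4-byte aligned; it does assume that length is even. Whether memcpy is as fast as the cast on the AT91SAM9260 would have to be measured.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference: the original loop from the question. */
static void prepare_data_ref(uint16_t *out, const uint16_t *in, uint32_t length)
{
    assert(length % 2 == 0);              /* assumption: even number of values */
    for (uint32_t i = 0; i < length; i += 2) {
        out[i]     = (in[i + 1] >> 4) & 0x03FF;
        out[i + 1] = (in[i] >> 4) & 0x03FF;
    }
}

/* Pair-wise variant: process two 16-bit values per 32-bit word. */
static void prepare_data_pairs(uint16_t *out, const uint16_t *in, uint32_t length)
{
    assert(length % 2 == 0);              /* assumption: even number of values */
    for (uint32_t i = 0; i < length; i += 2) {
        uint32_t pair;
        memcpy(&pair, &in[i], sizeof pair);      /* load two values at once */
        pair = (pair >> 4) & 0x03FF03FFu;        /* shift and mask both halves */
        pair = (pair >> 16) | (pair << 16);      /* swap the two halves */
        memcpy(&out[i], &pair, sizeof pair);     /* store two values at once */
    }
}

/* Returns 1 if both versions agree on a small arbitrary test pattern. */
static int outputs_match(void)
{
    enum { N = 64 };
    uint16_t in[N], a[N], b[N];

    for (uint32_t i = 0; i < N; i++)
        in[i] = (uint16_t)(i * 0x9E37u + 0x1234u);   /* arbitrary bit pattern */

    prepare_data_ref(a, in, N);
    prepare_data_pairs(b, in, N);
    return memcmp(a, b, sizeof a) == 0;
}
```

For example, the input pair {0x1234, 0xABCD} becomes {0x02BC, 0x0123} with either function. Because swapping the two halves of a word is symmetric, the pair-wise variant does not depend on byte order.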
