
Unrolling a loop

Hello,

The following code computes the magnitude, sqrt(i^2 + q^2), of each element of a complex float vector. It runs on a Cortex-A53.

I compiled the code with: -O2 -mcpu=cortex-a53

I am trying to improve the performance by unrolling the loop. Each iteration now processes 4 vectors (float32 x 4).

It seems that the unrolled code runs 15% faster.

Can I improve it further by interleaving the load, calculation, and store instructions?

Thank you,

Zvika 
