Getting the compiler to consistently use LDM?

Hello,

I am trying to optimize the implementation of a FIR filter. One major improvement would be to use LDM to load all data and coefficients, minimizing the number of cycles spent on memory access.

How can I get the compiler to do this consistently?

long fir_coef[] = ...;
long dbuf[] = ...;
long out_buf;

void test(void)
{
    register long c1, c2, c3, c4;
    register long d1, d2, d3, d4;
    register long accu;

    c1 = fir_coef[0];
    c2 = fir_coef[1];
    c3 = fir_coef[2];
    c4 = fir_coef[3];
    d1 = dbuf[0];
    d2 = dbuf[1];
    d3 = dbuf[2];
    d4 = dbuf[3];

    accu = 0;
    accu += c1 * d1;
    accu += c2 * d2;
    accu += c3 * d3;
    accu += c4 * d4;

    out_buf = accu;
}

The compiler seems to use LDM only sporadically, loading two registers at a time (at -O3). Ideally I would like to see only two LDM instructions for the code above.

Can this be done in C, or is it time to get out the assembler?


  • It is generally not a good idea to declare variables with the register qualifier. The RV compiler is a very good optimizing compiler, and forcing register variables hampers its decisions.

    Your code needs 12 registers if it holds everything in the CPU context. The compiler also needs some registers for memory base pointers and interworking veneers, so it may be deciding not to load all the coefficients at once.

    If you want to go to assembly, you have a couple of alternatives: a) use the inline assembler and code a 'virtual registers' version of the algorithm, since it is a very straightforward computation, or b) write it as an assembly function and use the full set of CPU registers for the filter computation. The compiler will save the registers it needs before calling the assembly module; under the ARM procedure call standard, the assembly routine itself must preserve any callee-saved registers (r4-r11) it uses.
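
To illustrate the point about the register qualifier, here is a minimal sketch of the same four-tap computation written without it. The function name fir4 and its pointer parameters are hypothetical and not from the original post. Passing the arrays as arguments is just one assumed way to keep the base addresses in argument registers. Whether the compiler then emits exactly two LDM instructions still depends on the compiler version and options, so check the generated listing.

```c
/* Hypothetical helper (name and signature are assumptions):
   one 4-tap FIR step written without 'register', leaving
   register allocation entirely to the compiler. */
long fir4(const long *coef, const long *data)
{
    /* Group the loads per array so the compiler can, if it chooses,
       merge each group into a single multi-register load. */
    long c1 = coef[0], c2 = coef[1], c3 = coef[2], c4 = coef[3];
    long d1 = data[0], d2 = data[1], d3 = data[2], d4 = data[3];

    /* Sum of products, same arithmetic as the original test(). */
    return c1 * d1 + c2 * d2 + c3 * d3 + c4 * d4;
}
```

With the globals from the question, the call would look like out_buf = fir4(fir_coef, dbuf);. The same signature could also serve as the C-side prototype if the body is later replaced by an assembly routine.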
