Hello,
I am trying to optimize the implementation of a FIR filter. One major improvement would be to use LDM to load all the data and coefficients, which would minimize the number of cycles spent on memory access.
How can I get the compiler to do this consistently?
    long fir_coef[] = ...;
    long dbuf[] = ...;
    long out_buf;

    void test(void)
    {
        register long c1, c2, c3, c4;
        register long d1, d2, d3, d4;
        register long accu;

        c1 = fir_coef[0];
        c2 = fir_coef[1];
        c3 = fir_coef[2];
        c4 = fir_coef[3];

        d1 = dbuf[0];
        d2 = dbuf[1];
        d3 = dbuf[2];
        d4 = dbuf[3];

        accu = 0;
        accu += c1 * d1;
        accu += c2 * d2;
        accu += c3 * d3;
        accu += c4 * d4;

        out_buf = accu;
    }

With this code the compiler seems to use LDM only sporadically, loading two registers at once (at -O3). Ideally I would like to see just two LDM instructions for the above code: one for the coefficients and one for the data. Can this be done in C, or is it time to get out the assembler?
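
In case anyone wants to reproduce this, here is a self-contained sketch of the same test that compiles as is. The array sizes and the values in fir_coef and dbuf are placeholders I made up for illustration, not my real filter data. It is only meant as something to compile and inspect the generated assembly with; whether the loads end up merged into LDM will still depend on the compiler and its register allocation.

```c
/* Placeholder contents: sizes and values are assumptions for illustration. */
long fir_coef[4] = { 1, 2, 3, 4 };
long dbuf[4]     = { 10, 20, 30, 40 };
long out_buf;

void test(void)
{
    /* Load all coefficients, then all data, so each group of loads
       reads four consecutive words from one array. */
    long c1 = fir_coef[0];
    long c2 = fir_coef[1];
    long c3 = fir_coef[2];
    long c4 = fir_coef[3];

    long d1 = dbuf[0];
    long d2 = dbuf[1];
    long d3 = dbuf[2];
    long d4 = dbuf[3];

    /* Four-tap multiply-accumulate. */
    long accu = 0;
    accu += c1 * d1;
    accu += c2 * d2;
    accu += c3 * d3;
    accu += c4 * d4;

    out_buf = accu;
}
```

With these placeholder values, calling test() leaves 300 in out_buf (1*10 + 2*20 + 3*30 + 4*40), which makes it easy to check that any hand-tuned variant still computes the same result.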