Getting the compiler to consistently use LDM ?

Hello,

I am trying to optimize the implementation of a FIR filter. One major improvement would be to use LDM to load all data and coefficients, minimizing the number of cycles spent on memory access.

How can I get the compiler to do this consistently?

long fir_coef[] = ...;
long dbuf[] = ...;   /* must be an array, since it is indexed below */
long out_buf;

void test(void)
{
    register long c1, c2, c3, c4;
    register long d1, d2, d3, d4;
    register long accu;

    /* load four coefficients and four data samples */
    c1 = fir_coef[0];
    c2 = fir_coef[1];
    c3 = fir_coef[2];
    c4 = fir_coef[3];
    d1 = dbuf[0];
    d2 = dbuf[1];
    d3 = dbuf[2];
    d4 = dbuf[3];

    /* multiply-accumulate */
    accu = 0;
    accu += c1 * d1;
    accu += c2 * d2;
    accu += c3 * d3;
    accu += c4 * d4;

    out_buf = accu;
}

The compiler seems to use LDM only sporadically, loading two registers at a time (at -O3). Ideally, I would like to see only two LDM instructions in the code above.

Can this be done in C, or is it time to get out the assembler ?

Christoph Franck over 18 years ago in reply to Jonny Doin
It is not generally a good idea to declare variables with the register qualifier. The RV compiler is a very good optimizing compiler, and you will hamper its decisions when you force register variables.

I believe the compiler will ignore the register qualifier anyway. I put it there to show which variables I'd like to see put in registers.

If you want to go to assembly, you have a few alternatives: a) you can use the inline assembler and code a 'virtual registers' version of the algorithm, since it is a very straightforward computation,

Is there any explanation in the docs about these "virtual registers" ? Any time I try to use inline assembly, I keep running into

main01.c(242): warning: #d1267-D: Implicit physical register R0 should be defined as a variable

warnings, and an

main01.c(242): error #549: variable "R0" is used before its value is set

error when I try to access one of the function arguments which should be passed in R0. I believe this has something to do with "virtual registers", or do I need to do anything to be able to use the "raw" registers in my inline assembly (like all of the examples seem to do) ?

Christoph Franck over 18 years ago in reply to Christoph Franck

Ok, I think I found it. I should have checked the RealView docs earlier instead of looking at the CARM docs.

Silly me.

Christoph Franck over 18 years ago in reply to Christoph Franck

Ok, after playing around with the source code for a while, the "pure C" version of the filter runs 4% faster than the version with optimized embedded C.

Back to the drawing board.

Per Westermark over 18 years ago in reply to Christoph Franck
It might sometimes be good to do something like:

acc += *a++ * *b++; acc += *a++ * *b++; acc += *a++ * *b++; acc += *a++ * *b++;

when filtering data. A lot depends on what external loops you need, i.e. how much code is part of the filter kernel compared with the number of iterations over the sample data.
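
To make the pattern above concrete, here is a minimal sketch of a four-tap kernel written with post-incremented pointers. The function name fir4_ptr and the fixed tap count of four are assumptions for illustration only; whether the compiler turns the loads into LDM instructions still depends on the compiler and its optimization settings.

```c
/* Hypothetical helper (name and tap count are illustrative assumptions):
 * four-tap multiply-accumulate, unrolled, walking both arrays with
 * post-incremented pointers. */
long fir4_ptr(const long *a, const long *b)
{
    long acc = 0;

    acc += *a++ * *b++;
    acc += *a++ * *b++;
    acc += *a++ * *b++;
    acc += *a++ * *b++;

    return acc;
}
```

Because a and b are passed by value, the increments only affect the local copies, so the caller's pointers are unchanged after the call.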

Christoph Franck over 18 years ago in reply to Per Westermark
I unrolled the innermost loops (the filter acts on several channels of data and produces several output samples per call), since the ARM architecture does not have zero-overhead looping.

However, I found that working with incremented pointers actually slows things down, since after every output sample I need to reset the coefficient pointer to the start of the filter. Instead, I used regular indexing:

acc = a[0] * b[0]; acc += a[1] * b[1]; ...

since it does not matter to the processor whether it writes back the modified address or just uses a temporary index.
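
As a rough sketch of the indexed approach for several output samples per call: the function name fir4_block, the four-tap length, and the single-channel layout are all assumptions made for illustration, not the actual filter code. The coefficients are applied in array order, without reversal.

```c
/* Hypothetical sketch (names and layout are illustrative assumptions):
 * compute n_out output samples of a four-tap filter.
 * in[] must hold at least n_out + 3 samples. */
void fir4_block(const long *coef, const long *in, long *out, int n_out)
{
    int n;

    for (n = 0; n < n_out; n++) {
        const long *d = in + n;   /* start of the data window for this output */
        long acc;

        /* coef is never modified, so there is nothing to reset per sample */
        acc  = coef[0] * d[0];
        acc += coef[1] * d[1];
        acc += coef[2] * d[2];
        acc += coef[3] * d[3];

        out[n] = acc;
    }
}
```

With constant indices, the coefficient base address stays fixed across output samples, which is the point made above about not having to rewind a pointer.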