Code portability

Hello,
I was browsing through older posts that deal with the painful issue of portability (http://www.keil.com/forum/docs/thread8109.asp). I was (and still am) a big advocate of conforming to the C standard as much as possible, and of having a layered structure that allows "plugging in" other hardware. But I have come to change my mind recently. I am reading the "ARM System Developer's Guide" (an excellent book, by the way; I'm reading it because I want to port some C167 code to an ARM9 environment), in which chapter 5 discusses writing efficient C code for an ARM. The point is, and it is fairly well demonstrated, that even common, innocent-looking C code can be either efficient or very inefficient on an ARM depending on the specific choices made, let alone on another processor! So, if we are talking about squeezing every clock cycle out of a microcontroller, I do not believe that portability is possible without ultimately littering the code!

0 Catcus Blip over 17 years ago in reply to Jack Sprat

Ok Jack, here you go:

int checksum_v5(int *data)
{
    unsigned int i;
    int sum=0;
    for (i=0; i<64; i++)
    {
       sum += *(data++);
    }
    return sum;
}

This compiles to

checksum_v5
MOV r2,r0 ; r2 = data
MOV r0,#0 ; sum = 0
MOV r1,#0 ; i = 0
checksum_v5_loop
LDR r3,[r2],#4 ; r3 = *(data++)
ADD r1,r1,#1 ; i++
CMP r1,#0x40 ; compare i, 64
ADD r0,r3,r0 ; sum += r3
BCC checksum_v5_loop ; if (i<64) goto loop
MOV pc,r14 ; return sum

It takes three instructions to implement the for loop structure:

* An ADD to increment i
* A compare to check if i is less than 64
* A conditional branch to continue the loop if i < 64

This is not efficient. On the ARM, a loop should only use two instructions:

* A subtract to decrement the loop counter, which also sets the condition code flags on the result
* A conditional branch instruction

The key point is that the loop counter should count down to zero rather than counting up to some arbitrary limit.

Now, an improved version is this:

int checksum_v6(int *data)
{
    unsigned int i;
    int sum=0;
    for (i=64; i!=0; i--)
    {
      sum += *(data++);
    }
    return sum;
}

This compiles to

checksum_v6
MOV r2,r0 ; r2 = data
MOV r0,#0 ; sum = 0
MOV r1,#0x40 ; i = 64
checksum_v6_loop
LDR r3,[r2],#4 ; r3 = *(data++)
SUBS r1,r1,#1 ; i-- and set flags
ADD r0,r3,r0 ; sum += r3
BNE checksum_v6_loop ; if (i!=0) goto loop
MOV pc,r14 ; return sum

Say, Jack, are you going to read the manual for a change :-) :-) ;-)

0 ImPer Westermark over 17 years ago in reply to Catcus Blip

Loop unrolling?

0 Catcus Blip over 17 years ago in reply to ImPer Westermark

Yes Per, another excellent example, but I think that the example above is more powerful as it depends on the actual instruction set of the processor.
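
For reference, a minimal sketch of what unrolling the checksum loop could look like. The function name checksum_unrolled and the length parameter n are hypothetical (the examples above hard-code 64), and the sketch assumes n is a multiple of 4; any remainder is simply ignored.

```c
/* Hypothetical unrolled variant of the checksum loop from this thread.
 * Assumption: n is a multiple of 4 (64 is); leftover elements are skipped. */
int checksum_unrolled(const int *data, unsigned int n)
{
    int sum = 0;
    unsigned int blocks = n / 4;   /* one loop iteration per 4 elements */

    while (blocks != 0) {
        sum += data[0];
        sum += data[1];
        sum += data[2];
        sum += data[3];
        data += 4;                 /* advance past the 4 elements just added */
        blocks--;                  /* still counts down to zero */
    }
    return sum;
}
```

The loop overhead (decrement plus branch) is now paid once per four additions instead of once per addition, at the cost of larger code. Whether that is a net win depends on the target and the compiler.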

0 ImPer Westermark over 17 years ago in reply to Catcus Blip

Just about all processors prefer loops that decrement to zero, since zero is "magic".

In this case it takes a decrement and a conditional branch. A lot of processors have DJNZ instructions, where a hard-coded register is used so that everything fits in a single instruction.
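
As a rough illustration, the C loop shape that sits closest to a decrement-and-branch pair (SUBS/BNE on ARM, or a single DJNZ on parts such as the 8051) is a do-while that decrements and tests in one step. The function name checksum_countdown and the parameter n are hypothetical, the sketch assumes n is at least 1, and whether a given compiler really emits the short sequence depends on the compiler and its options.

```c
/* Sketch only: sum n ints with a do-while that decrements and tests at once.
 * Assumption: n >= 1, because the body runs once before the first test. */
int checksum_countdown(const int *data, unsigned int n)
{
    int sum = 0;

    do {
        sum += *data++;        /* add the current element, then advance */
    } while (--n != 0);        /* decrement the counter and test for zero */

    return sum;
}
```

Calling it with n equal to 0 would wrap the unsigned counter and run far past the end of the array, so a caller that cannot guarantee a non-zero length needs a guard in front of the loop.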

0 Jack Sprat over 17 years ago in reply to Catcus Blip

"The key point is that the loop counter should count down to zero rather than counting up to some arbitrary limit."

Given that the loop counter is not used in or after the body of the loop, the compiler is, I believe, well within its rights under the 'as if' rule to rearrange the loop to decrement rather than increment. I guess it's a quality of implementation issue.

If you decide to code loops like this to decrement rather than increment, the resulting 'C' is no less portable, so I'm not entirely sure what your point is.
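
To make that concrete, here is a small sketch in standard C with both loop shapes side by side. The names sum_up, sum_down and loops_agree are hypothetical, the length is passed in rather than fixed at 64, and it assumes the sum does not overflow an int. Since the counter is never read inside the body, both functions return the same value on any conforming compiler.

```c
/* Count-up shape, as in checksum_v5 (length passed in; hypothetical name). */
static int sum_up(const int *data, unsigned int n)
{
    unsigned int i;
    int sum = 0;

    for (i = 0; i < n; i++) {
        sum += *(data++);
    }
    return sum;
}

/* Count-down shape, as in checksum_v6 (length passed in; hypothetical name). */
static int sum_down(const int *data, unsigned int n)
{
    unsigned int i;
    int sum = 0;

    for (i = n; i != 0; i--) {
        sum += *(data++);
    }
    return sum;
}

/* Returns 1 when both loop shapes produce the same sum for the same input. */
int loops_agree(const int *data, unsigned int n)
{
    return sum_up(data, n) == sum_down(data, n);
}
```

Only the generated code can differ between the two, which is exactly the part the 'as if' rule leaves to the compiler.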

If you mean that you would have to perform this kind of manual optimisation for each platform and/or compiler you target, then I congratulate you on being able to design hardware that is only just powerful enough to work with optimal code every time.

If 'every clock cycle counts' then you have to use assembly. 'C' will always produce code that is slower and larger - the problem is that you cannot predict by exactly how much. If this matters, don't use 'C'.

"Say, Jack, are you going to read the manual for a change"

I don't use ARM.
