Large memcpy

We are using Keil C-51 V7.06.

There is a structure defined:

typedef struct
{
  unsigned char address_bus_h;
  unsigned char address_bus_l;
  unsigned char data_bus;
}instbus_raw_t;

Copying the structure with three simple assignments enlarges the code by only 9 bytes, but this is not an elegant solution:

instbus_raw.address_bus_h = instbus_raw_local.address_bus_h;
instbus_raw.address_bus_l = instbus_raw_local.address_bus_l;
instbus_raw.data_bus = instbus_raw_local.data_bus;
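Standard C also lets a whole structure be copied with a single assignment, which copies every member without naming each one. This is a minimal portable sketch that assumes the typedef from the post. The wrapper name instbus_update is hypothetical, and it has not been checked whether C51 V7.06 compiles the assignment as compactly as the three explicit assignments or calls a library copy routine instead.

```c
/* Typedef as in the original post. */
typedef struct
{
  unsigned char address_bus_h;
  unsigned char address_bus_l;
  unsigned char data_bus;
} instbus_raw_t;

instbus_raw_t instbus_raw;
instbus_raw_t instbus_raw_local;

/* Hypothetical helper: copies every member in one statement using
   C's built-in structure assignment. */
void instbus_update(void)
{
  instbus_raw = instbus_raw_local;
}
```

If members are later added to instbus_raw_t, the assignment picks them up automatically, which the three hand-written lines would not.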

Using the normal library function memcpy blows up the code by 300 bytes!

memcpy(&instbus_raw,&instbus_raw_local,sizeof(instbus_raw_t));

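Using our own function my_memcpy, the code increases by 167 bytes:

#include <stddef.h>   /* size_t */

void *(my_memcpy)(void *s1, const void *s2, size_t n)
{
  /* s1 and s2 are generic (void *) pointers; copy through char pointers */
  char *su1 = (char *)s1;
  const char *su2 = (const char *)s2;

  for (; 0 < n; --n)        /* copy n bytes, one at a time */
    *su1++ = *su2++;
  return (s1);
}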

my_memcpy(&instbus_raw,&instbus_raw_local,sizeof(instbus_raw_t));

In a project on a small chip with 2 KB of Flash, 300 bytes for copying a few bytes is considerable!

Has anyone else noticed library functions wasting resources like this?

Regards Peter

Graham Cole over 22 years ago in reply to Peter Christen

Then you will be wanting something like this:

//
//  Data Memory Copy
//
//  Author: Graham Cole
//
//  This function copies n bytes from s2 to s1.
//
//  The address of memory block s1 is a Keil data pointer in R7.
//
//  The address of memory block s2 is a Keil data pointer in R5.
//
//  The value of n is held in register R3.
//
//  The first lines of C code will suppress the UNUSED variable warning and the
//  Keil C51 compiler will optimise this code out. The return() statement is
//  compiled to an unused RET instruction.
//

#pragma ASM

    $REGUSE _data_memcpy( A, PSW, R0, R1, R3 )

#pragma ENDASM

char data *data_memcpy(char data *s1, char data *s2, unsigned char n)
{
    s1 = s1;                                        //Suppress UNUSED
    s2 = s2;                                        //Suppress UNUSED
    n  = n;                                         //Suppress UNUSED

        #pragma ASM
                                                    ;
        data_memcpy:                                ;
                                                    ;
                MOV     A,R7                        ;Load s1 into register R0.
                MOV     R0,A                        ;
                MOV     A,R5                        ;Load s2 into register R1.
                MOV     R1,A                        ;
                                                    ;
                MOV     A,R3                        ;
                JZ      ?data_memcpy_generic_end    ;If zero bytes to copy, terminate.
                                                    ;
        ?data_memcpy_loop:                          ;
                                                    ;
                MOV     A,@R1                       ;Read from pointer s2.
                MOV     @R0,A                       ;Write to pointer s1.
                                                    ;
                INC     R1                          ;Increment s2 pointer.
                INC     R0                          ;Increment s1 pointer.
                                                    ;
        ?data_memcpy_generic_skip_1:                ;
                                                    ;
                DJNZ    R3,?data_memcpy_loop        ;..and iterate.
                                                    ;
        ?data_memcpy_generic_end:                   ;
                                                    ;
                                                    ;Return with s1 still in R7.
                RET                                 ;
                                                    ;
        #pragma ENDASM


    return( 0 );                                    // Dummy return.
}

Which, of course, I have not actually tested!

Stefan Duncanson over 22 years ago in reply to Graham Cole
For a bit of amusement I compiled the following code under v7.01 optimisation level 9:

unsigned char data *my_data_memcpy(unsigned char data *dest, unsigned char data *src, unsigned char n)
{
    unsigned char data *temp = dest;

    while (n)
    {
        *temp = *src;
        temp++;
        src++;
        n--;
    }
    return (dest);
}

and found that it produces code that is two instructions shorter than Graham's data_memcpy() function.

Sadly I was unable to convince the compiler to generate a suitably short version that was also reentrant. I also noted that the code was probably less efficient than Graham's, although I didn't actually bother to count either the instruction cycles or the total opcode byte count.

Still, it does show that one only needs to resort to assembler when one has very specific requirements.

Finally I also noticed that the return(0); in Graham's function generates a MOV R7,#00H instruction as well as a RET, which is a shame.

Graham Cole over 22 years ago in reply to Stefan Duncanson
Ah, yes, sometimes I just cannot help getting into assembler... In fact, the C version could probably be slightly improved by using

do { ... }while(--n != 0);
In which case, I dare say the compiler code would be identical to my assembler.
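One detail is worth keeping when moving to the do/while form: with an unsigned char count, n == 0 would wrap to 255 on the first --n, so the loop would copy 256 bytes. The assembler above tests the count with JZ before entering the DJNZ loop, so a C version meant to match it needs the same check. This is a minimal portable sketch; the Keil data memory-space qualifier is dropped so it compiles with any C compiler, and the name copy_bytes is hypothetical.

```c
/* Hypothetical portable model of the assembler routine: test the count
   once (the JZ), then a decrement-and-branch loop (the DJNZ).
   Returns dest, as memcpy does. */
unsigned char *copy_bytes(unsigned char *dest, const unsigned char *src,
                          unsigned char n)
{
  unsigned char *d = dest;

  /* Without this check, n == 0 would wrap on the first --n
     and the loop would copy 256 bytes. */
  if (n == 0)
    return dest;

  do
  {
    *d++ = *src++;
  } while (--n != 0);

  return dest;
}
```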

It is a pity that the compiler rules for passing parameters do not allow for two generic pointers and a count entirely in registers. This is quite a common requirement and the C51 compiler itself seems to be able to override these rules.

Given that there are myriad ways of implementing memcpy() and other string.h functions, it would be very helpful for implementors to be able to write their own string.h libraries by having access to C51's special parameter passing rules. My guess is that this would not be too difficult to do, though it may require the addition of a new keyword to indicate use of the special parameter passing rules to the compiler. Such a facility could make a substantial difference to code size (as well as speed) and this could be very significant in the case of small applications.

Implementors could choose between large and fast functions or slow but compact ones. Also, such functions could then easily be made reentrant.
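As an illustration of the size/speed choice described above, here is a hypothetical pair of portable C copy routines: a compact byte loop, and a variant unrolled four times that typically spends fewer cycles on loop overhead at the cost of more code. Neither is a Keil library routine, and the actual code size and cycle counts on an 8051 would have to be measured.

```c
/* Compact: one loop, one byte per iteration. */
void copy_compact(unsigned char *dest, const unsigned char *src,
                  unsigned char n)
{
  while (n != 0)
  {
    *dest++ = *src++;
    n--;
  }
}

/* Larger but usually faster: four bytes per iteration reduce the
   loop overhead, then a tail loop copies the remaining 0..3 bytes. */
void copy_unrolled(unsigned char *dest, const unsigned char *src,
                   unsigned char n)
{
  while (n >= 4)
  {
    dest[0] = src[0];
    dest[1] = src[1];
    dest[2] = src[2];
    dest[3] = src[3];
    dest += 4;
    src += 4;
    n -= 4;
  }
  while (n != 0)
  {
    *dest++ = *src++;
    n--;
  }
}
```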

I have started a new thread on the subject of copying structures: http://www.keil.com/forum/docs/thread4380.asp#msg18741

Andy Neil over 22 years ago in reply to Stefan Duncanson

"Finally I also noticed that the return(0); in Graham's function generates a MOV R7,#00H instruction as well as a RET, which is a shame."

Like I always say: if you need some assembler, have it properly as an assembler module - don't mess about with inline assembler in 'C' source files!

Graham's function actually contains four lines whose sole function is to suppress compiler warnings.
Luckily, the compiler happens to be smart enough to spot that the 1st three are irrelevant, and optimises them out.
That just leaves the "spurious" RET.

Using the SRC directive is a great way to create 'C'-compatible assembler source - with all the right calling & naming conventions, parameter passing, etc - but once you've done that, the 'C' file is of no further use; so throw it away!

Mik Kleshov over 22 years ago in reply to Andy Neil

Maybe a bit off topic, but I just wanted to share a thought.
Take a look at the webpage of a new programming language called D:
http://www.digitalmars.com/d/overview.html
Here is a quote:

Modern compiler technology has progressed to the point where language features for the purpose of compensating for primitive compiler technology can be omitted. (An example of this would be the 'register' keyword in C, a more subtle example is the macro preprocessor in C.) We can rely on modern compiler optimization technology to not need language features necessary to get acceptable code quality out of primitive compilers.

Yet a lot of discussions around the use of C in microcontroller programming boil down to how to get more optimal code from a particular C compiler.
Is it that Keil's compilers have not caught up with the latest and greatest in compiler technology? Or am I too picky?

- mike

Jon Ward over 22 years ago in reply to Mik Kleshov

Is it that Keil's compilers have not caught up with the latest and greatest in compiler technology?

Can you give me an example (manufacturer and version) of a compiler that is the latest and greatest in technology? That way, I can let you know if we've caught up with them.

Jon

Stefan Duncanson over 22 years ago in reply to Mik Kleshov

"Or am I too picky?"

You're too picky.

In my opinion:

If you have to use assembler rather than 'C' for reasons of code size or speed then you're using the wrong hardware. The only case I can really see for using assembler is when you need to make sure the code doesn't change across compiler versions.

Mik Kleshov over 22 years ago in reply to Jon Ward
Yes, I agree, statements like these have to be supported by facts. I use the C166 compiler, and there are not too many compilers for that architecture. And from what I heard Keil's C166 is the best available.
But what I meant was that so many times when I look at the code generated by C166 I can't help but notice such obvious optimizations not performed by the compiler. Let's look at a real-world example:

#include <intrins.h>

long l[2];

long read_long_atomically(int i)
{
  long tmp;
  long *ptr;

  ptr = &l[i];
  _atomic_(0);
  tmp = *ptr;
  _endatomic_();
  return tmp;
}

Compiler listing:

MOV    R5,R8
SHL    R5,#02H
MOV    R4,#l
ADD    R4,R5
ATOMIC #02H
MOV    R6,[R4]
MOV    R7,[R4+#02H]
MOV    R4,R6
MOV    R5,R7
RET
Quite a few temporary storage registers can be eliminated in this code. It would not be unreasonable to expect the following kind of code from a modern compiler:

SHL    R8,#2
ADD    R8,#l
ATOMIC #02H
MOV    R4,[R8]
MOV    R5,[R8+#2]
RET

That's what I meant really.

- mike

Jon Ward over 22 years ago in reply to Mik Kleshov

In this case, I agree with you. However, most functions are not quite that trivial.

It's easy to create the perfect optimizing compiler if you guarantee that all functions it compiles are small and not too complex.

The problem arises when you have functions that are insanely complex. Then, the compiler still must do a good job.

As it is, small functions like the one you demonstrate would be the ones that I would first write in C (to get working) and later go back and write in assembly (if needed).

Jon

Andy Neil over 22 years ago in reply to Stefan Duncanson

"If you have to use assembler rather than 'C' for reasons of code size or speed then you're using the wrong hardware."

As I've said before, all generalisations are bad! ;-)

IF you are making extremely cheap products in extremely large volumes, the development costs are secondary to the component costs.

In such cases, you want the cheapest processor you can possibly find - and you can afford to put in a bit more development effort to squeeze the last ounce of performance, or shoe-horn the code into the smallest possible ROM.

However, I do agree that most of the questions here about inline assembler are due to misconceptions...

Drew Davis over 22 years ago in reply to Andy Neil

The only case I can really see for using assembler

Sometimes you need to poke around in assembler for some glue logic in startup code before main can even get going. Say, boot code loading an application and restarting into it, or initializing bank switch logic, or that sort of thing. C presumes a certain environment exists, and sometimes you need to do things outside of that world's "laws of physics".

Other times, you might need assembler to cope with some picky hardware. C doesn't always give you precise control over which bits change at which address on which clock cycle.

Speed and efficiency count, too. Often, there's just a couple of routines that can greatly benefit from specialized assembler, and just throwing a bigger processor at the whole project for the sake of a couple of functions is not really the right answer.

Mik Kleshov over 22 years ago in reply to Jon Ward

It's easy to create the perfect optimizing compiler if you guarantee that all functions it compiles are small and not too complex.

I'm sure a lot of users would appreciate a compiler command line option called "perform near-perfect optimization on simple functions". If it's easy, why not do that?
I seem to remember that the OpenWatcom compiler even allows the user to specify the amount of virtual memory to use in optimization. Basically the amount of available memory pretty much determines how good a job the compiler does at optimizing complex functions.
Ah, well...

- mike

Stefan Duncanson over 22 years ago in reply to Drew Davis

"Sometimes you need to poke around in assembler for some glue logic in startup code before main can even get going."

Sure, sorry, my comments were really in the context of calling hand optimised assembler routines from 'C'.

"Other times, you might need assembler to cope with some picky hardware. C doesn't always give you precise control over which bits change at which address on which clock cycle."

Well, I'd argue that this sort of thing shouldn't be done in software - chuck in a PLD or some such to move the timing burden to hardware.

"Speed and efficiency count, too. Often, there's just a couple of routines that can greatly benefit from specialized assembler, and just throwing a bigger processor at the whole project for the sake of a couple of functions is not really the right answer."

Yes, but as we've seen in this thread the speed and efficiency gains can often be made by rewriting [an existing library function, say] code in 'C' - there may be little to be gained from the move to assembler.

Outwith minor modifications to startup.a51 I can only think of one occasion I've had to use assembler on the 8051, and that was to call functions in an on-chip bootloader that required certain values in certain registers. While it was possible to do it in 'C' it fitted into the 'maintain the same code across compiler versions' category.

I also take Andy's point about low value high volume product, I've always been fortunate enough to work on high value kit where component cost isn't much of an issue, so I tend to overlook this.

Andy Neil over 22 years ago in reply to Stefan Duncanson

"I also take Andy's point about low value high volume product, I've always been fortunate enough to work on high value kit where component cost isn't much of an issue, so I tend to overlook this."

Actually, so do I!

But I've been picked up on it a number of times now, so I thought I'd just get my own back! ;-)