Shortest code for memory to memory transfer

Note: This was originally posted on 16th March 2009 at http://forums.arm.com

Hi All,
    Can anyone tell me the shortest code I can use on a Cortex-M3 to transfer a few words from one memory location to another (without using the DMA)? It should be generic (the number of words to transfer is not constant).

Regards
Subin
  • Note: This was originally posted on 16th March 2009 at http://forums.arm.com

    Subin,

    Do you mean shortest in time or shortest in space? As a canned function, the shortest in space, assuming num is never zero, is likely just:

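    __asm void wordcopyasm
    (unsigned int *dest_r0, unsigned int *src_r1, int num_r2)
    {
    loop:
       LDR r3,[r1],#4;   load a word, then advance src by 4 bytes
       STR r3,[r0],#4;   store it, then advance dest by 4 bytes
       SUBS r2,r2,#1;    one fewer word left to copy
       BNE loop;         repeat until the count reaches zero
       BX lr;
    }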


    However, both this and the shortest-in-time version can likely be obtained more easily and portably by letting the C compiler generate the code from something like:

    void wordcopy(unsigned int *dest, unsigned int *src, int num)
    {
       int i;
       for(i=0;i<num;i++) dest[i] = src[i];
    }


    Use of the 'restrict' keyword may be required to allow best optimisation where dest and src do not overlap.
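
As a minimal sketch of what that might look like (the name wordcopy_restrict, the uint32_t element type and the size_t count are assumptions of this example, not part of the code above), a C99 version with restrict-qualified pointers:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name. Copies num 32-bit words from src to dest.
   'restrict' (C99) is a promise from the caller that the two buffers
   do not overlap, which leaves the compiler free to reorder or
   combine the loads and stores. Calling it with overlapping buffers
   is undefined behaviour. */
void wordcopy_restrict(uint32_t *restrict dest,
                       const uint32_t *restrict src,
                       size_t num)
{
    for (size_t i = 0; i < num; i++) {
        dest[i] = src[i];
    }
}
```

Unlike the assembler routine, this loop also copes with num being zero. Whether the qualifier changes the generated code depends on the compiler and optimisation level, so it is worth checking the disassembly.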

    Absolute maximum performance can be achieved by eliminating or reducing the number of comparisons and branches. However, this requires knowing more about the range of word counts that must be supported. For example, if only multiples of four words are ever copied, the count only needs checking once every four words:

    __asm void wordcopyfours
    (unsigned int *dest_r0, unsigned int *src_r1, int num_r2)
    {
    loop:
       LDR r3,[r1],#4;   copy four words, post-incrementing
       STR r3,[r0],#4;   both pointers after each access
       LDR r3,[r1],#4;
       STR r3,[r0],#4;
       LDR r3,[r1],#4;
       STR r3,[r0],#4;
       LDR r3,[r1],#4;
       STR r3,[r0],#4;
       SUBS r2,r2,#4;    num counts words: a non-zero multiple of 4
       BNE loop;
       BX lr;
    }


    It is worth noting that if instruction bandwidth becomes a limiting factor, then pushing some registers onto the stack and using LDM/STM may be a better solution.
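
Since the original question asked for a generic word count, the same idea can be sketched in C with a four-word main loop and a short tail loop. This is only an illustration: the name wordcopy_blocks and the types are assumptions. Grouping the four loads ahead of the four stores gives the compiler the chance to use multi-register transfers such as LDM/STM, but nothing guarantees it will.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name. Copies num 32-bit words from src to dest,
   four words per iteration where possible. The buffers must not
   overlap (see 'restrict'). */
void wordcopy_blocks(uint32_t *restrict dest,
                     const uint32_t *restrict src,
                     size_t num)
{
    /* Main loop: one count check and one branch per four words. */
    while (num >= 4) {
        uint32_t a = src[0];
        uint32_t b = src[1];
        uint32_t c = src[2];
        uint32_t d = src[3];
        dest[0] = a;
        dest[1] = b;
        dest[2] = c;
        dest[3] = d;
        src += 4;
        dest += 4;
        num -= 4;
    }

    /* Tail loop: the remaining 0 to 3 words, one at a time. */
    while (num > 0) {
        *dest++ = *src++;
        num--;
    }
}
```

The tail loop costs a little code space, so if the count really is always a multiple of four it can be dropped.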

    hth
    s.