C51 not using dual data pointer

We assumed that this very good compiler would recognize "for" statements that move bytes from one buffer to another and use the dual data pointer for them. This seems not to be the case. From the manual I can see that the dual DPTR is used only by a limited number of library functions (strcpy, memcpy, ...).
Does anyone have experience using the dual DPTR on the Dallas 320, and do we have to set any compiler directives so that the dual DPTR is used throughout the program?

Henning

Parents
  • The benchmarks show a significant difference between "small" (on the order of 1-10) and "large" (on the order of hundreds & thousands).

    How much of this is due to the overhead of the Generic Pointers?

    Do you have corresponding benchmarks using memory-specific pointers; eg,

    void xdata *memcpy_xdata_xdata(
                   void xdata *dst,
                   void xdata *src,
                   unsigned char end,
                   unsigned int  len );
    

Children
  • The benchmarks show a significant difference between "small" (on the order of 1-10) and "large" (on the order of hundreds & thousands).

    How much of this is due to the overhead of the Generic Pointers?

    Do you have corresponding benchmarks using memory-specific pointers; eg,


    I guess we could have written a really crappy memcpy that doesn't know about the memory areas, but we didn't. :-)

    That's why you like the Keil compiler so much!

    The memcpy routine figures out which memory area you are reading and which area you are writing and invokes a specific routine to copy the data. Therefore, the routine you suggest would be faster only by 4 or 5 instructions (the ones that figure out the source and destination memory areas).

    Only the following copy routines are needed:

    CODE  -> XDATA
    CODE  -> IDATA
    XDATA -> XDATA
    XDATA -> IDATA
    IDATA -> XDATA
    IDATA -> IDATA
    

    Since there is no write line for CODE, we don't have routines to write to it.
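To make the dispatch idea concrete, here is a minimal sketch in portable C. It is not the Keil library source: the `mem_area` tags, the `gptr` struct, and `copy_dispatch` are hypothetical names, the tag values are not the encoding C51 uses for generic pointers, and the three memory areas are modeled as plain arrays so the code builds with a host compiler.

```c
#include <stddef.h>

/* Hypothetical memory-area tags; NOT the encoding C51 actually uses. */
enum mem_area { AREA_IDATA = 0, AREA_XDATA = 1, AREA_CODE = 2 };

/* Simulated address spaces (sizes chosen arbitrarily for the demo). */
static unsigned char idata_mem[256];
static unsigned char xdata_mem[1024];
static const unsigned char code_mem[16] = "ROM constants";

/* A generic pointer modeled as an area tag plus an offset. */
struct gptr { enum mem_area area; unsigned int off; };

typedef void (*copy_fn)(unsigned int dst, unsigned int src, unsigned int len);

/* One specialized routine per source/destination pair. */
static void code_to_xdata(unsigned int d, unsigned int s, unsigned int n)
{ while (n--) xdata_mem[d++] = code_mem[s++]; }
static void code_to_idata(unsigned int d, unsigned int s, unsigned int n)
{ while (n--) idata_mem[d++] = code_mem[s++]; }
static void xdata_to_xdata(unsigned int d, unsigned int s, unsigned int n)
{ while (n--) xdata_mem[d++] = xdata_mem[s++]; }
static void xdata_to_idata(unsigned int d, unsigned int s, unsigned int n)
{ while (n--) idata_mem[d++] = xdata_mem[s++]; }
static void idata_to_xdata(unsigned int d, unsigned int s, unsigned int n)
{ while (n--) xdata_mem[d++] = idata_mem[s++]; }
static void idata_to_idata(unsigned int d, unsigned int s, unsigned int n)
{ while (n--) idata_mem[d++] = idata_mem[s++]; }

/* Indexed [source][destination]; CODE is never a valid destination. */
static const copy_fn copy_table[3][3] = {
    /* src IDATA */ { idata_to_idata, idata_to_xdata, NULL },
    /* src XDATA */ { xdata_to_idata, xdata_to_xdata, NULL },
    /* src CODE  */ { code_to_idata,  code_to_xdata,  NULL },
};

/* Pick the routine once, then run it. Returns 0 on success, -1 if the
   destination cannot be written. No bounds checking in this sketch. */
int copy_dispatch(struct gptr dst, struct gptr src, unsigned int len)
{
    copy_fn fn = copy_table[src.area][dst.area];
    if (fn == NULL)
        return -1;
    fn(dst.off, src.off, len);
    return 0;
}
```

The table lookup is the only extra work compared with calling a memory-specific routine directly, which matches the "4 or 5 instructions" of overhead mentioned above.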

    The following code is generated for an XDATA to XDATA copy using the memcpy function:

    loop:
    movx  a,@dptr  ; 2 cycles
    inc   dptr     ; 2 cycles
    xch   a,r0     ; 1 cycle
    xch   a,dpl    ; 1 cycle
    xch   a,r0     ; 1 cycle
    xch   a,r4     ; 1 cycle
    xch   a,dph    ; 1 cycle
    xch   a,r4     ; 1 cycle
    movx  @dptr,a  ; 2 cycles
    inc   dptr     ; 2 cycles
    xch   a,r0     ; 1 cycle
    xch   a,dpl    ; 1 cycle
    xch   a,r0     ; 1 cycle
    xch   a,r4     ; 1 cycle
    xch   a,dph    ; 1 cycle
    xch   a,r4     ; 1 cycle
    djnz  r7,loop  ; 2 cycles
    djnz  r6,loop  ; 2 cycles
    

    This sure looked like a lot of code to me. However, casual observation and perusal of the instruction set led me to nothing that was shorter or faster. Maybe someone else will suggest a faster xdata to xdata copy routine.

    The loop portion (counting only the inner-loop DJNZ since it is taken each time) takes 22 cycles to copy 1 byte.
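For anyone puzzling over the XCH runs, here is a small host-side model in C of what the single-DPTR loop does. It is only a sketch: the `cpu` struct and function names are made up for illustration, the DJNZ counters are replaced by a plain `len` loop, and nothing here is generated by the compiler. Each group of three XCH instructions swaps one half of DPTR with a register and leaves A unchanged, so the six together swap DPTR with R4:R0.

```c
#include <stdint.h>

/* Simulated 8051 state: only the registers the loop touches. */
struct cpu {
    uint8_t a, r0, r4, dpl, dph;
    uint8_t xdata[65536];            /* stand-in for external RAM */
};

/* XCH A,reg : exchange the accumulator with a register. */
static void xch(uint8_t *a, uint8_t *reg)
{
    uint8_t t = *a;
    *a = *reg;
    *reg = t;
}

static uint16_t dptr(const struct cpu *c)
{
    return (uint16_t)((c->dph << 8) | c->dpl);
}

static void inc_dptr(struct cpu *c)
{
    uint16_t d = (uint16_t)(dptr(c) + 1);
    c->dpl = (uint8_t)(d & 0xFF);
    c->dph = (uint8_t)(d >> 8);
}

/* Six XCHs: swap DPL with R0 and DPH with R4; A ends up unchanged. */
static void swap_dptr(struct cpu *c)
{
    xch(&c->a, &c->r0); xch(&c->a, &c->dpl); xch(&c->a, &c->r0);
    xch(&c->a, &c->r4); xch(&c->a, &c->dph); xch(&c->a, &c->r4);
}

/* Copy len bytes. DPTR starts at src, R4:R0 holds dst. */
void copy_single_dptr(struct cpu *c, uint16_t dst, uint16_t src, unsigned len)
{
    c->dpl = (uint8_t)(src & 0xFF); c->dph = (uint8_t)(src >> 8);
    c->r0  = (uint8_t)(dst & 0xFF); c->r4  = (uint8_t)(dst >> 8);
    while (len--) {
        c->a = c->xdata[dptr(c)];    /* movx a,@dptr */
        inc_dptr(c);                 /* inc  dptr    */
        swap_dptr(c);                /* DPTR now points at dst */
        c->xdata[dptr(c)] = c->a;    /* movx @dptr,a */
        inc_dptr(c);                 /* inc  dptr    */
        swap_dptr(c);                /* DPTR points at src again */
    }
}
```

Twelve of the 22 cycles per byte go to those two swaps, which is exactly the work a second data pointer removes.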

    The following code is generated for the Dallas dual data pointers:

    loop:
    movx  a,@dptr   ; 2 cycles
    inc   dptr      ; 2 cycles
    inc   dps       ; 1 cycle
    movx  @dptr,a   ; 2 cycles
    inc   dptr      ; 2 cycles
    inc   dps       ; 1 cycle
    djnz  r7,loop   ; 2 cycles
    djnz  r6,loop   ; 2 cycles
    

    This looks a lot faster. The loop portion (counting only the inner-loop DJNZ) takes 12 cycles to copy 1 byte.
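The same kind of host-side model for the dual-pointer loop, again only as a sketch with made-up names (`cpu2`, `copy_dual_dptr`). It assumes that only the select bit (bit 0) of DPS matters and that INC DPS toggles it, which is how the listing above uses the register; check the device data sheet before relying on that for other parts.

```c
#include <stdint.h>

/* Simulated state for a part with two data pointers. */
struct cpu2 {
    uint8_t  a;
    uint8_t  dps;                    /* only bit 0 (pointer select) is modeled */
    uint16_t dptr[2];                /* DPTR0 and DPTR1 */
    uint8_t  xdata[65536];           /* stand-in for external RAM */
};

/* The pointer currently selected by DPS bit 0. */
static uint16_t *active_dptr(struct cpu2 *c)
{
    return &c->dptr[c->dps & 1];
}

/* INC DPS: assumed to toggle the select bit (see lead-in). */
static void inc_dps(struct cpu2 *c)
{
    c->dps = (uint8_t)((c->dps + 1) & 1);
}

/* Copy len bytes. DPTR0 holds src, DPTR1 holds dst. */
void copy_dual_dptr(struct cpu2 *c, uint16_t dst, uint16_t src, unsigned len)
{
    c->dps = 0;
    c->dptr[0] = src;
    c->dptr[1] = dst;
    while (len--) {
        c->a = c->xdata[*active_dptr(c)];   /* movx a,@dptr */
        ++*active_dptr(c);                  /* inc  dptr    */
        inc_dps(c);                         /* select DPTR1 */
        c->xdata[*active_dptr(c)] = c->a;   /* movx @dptr,a */
        ++*active_dptr(c);                  /* inc  dptr    */
        inc_dps(c);                         /* select DPTR0 */
    }
}
```

One single-cycle INC DPS replaces each six-cycle XCH swap, so the pointer switching costs 2 cycles per byte instead of 12.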

    So, assuming an apples-to-apples comparison, the Dallas Semiconductor dual data pointers let you copy data in 12/22 (about 55%) of the time a standard 8051 with only one data pointer needs. This is a BEST CASE figure.
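The arithmetic behind those figures can be checked with a few lines of C. This is just a sketch that sums the per-instruction cycle counts copied from the two listings (inner-loop DJNZ only); the function names are made up, and setup overhead before the loop is deliberately left out because the listings do not show it.

```c
#include <stddef.h>

/* Cycle counts copied from the single-DPTR listing, inner DJNZ only. */
static const unsigned single_dptr_cycles[] = {
    2, 2,               /* movx a,@dptr ; inc dptr       */
    1, 1, 1, 1, 1, 1,   /* six xch: swap DPTR with R4:R0 */
    2, 2,               /* movx @dptr,a ; inc dptr       */
    1, 1, 1, 1, 1, 1,   /* six xch: swap back            */
    2                   /* djnz r7,loop                  */
};

/* Cycle counts copied from the dual-DPTR listing, inner DJNZ only. */
static const unsigned dual_dptr_cycles[] = {
    2, 2, 1,            /* movx a,@dptr ; inc dptr ; inc dps */
    2, 2, 1,            /* movx @dptr,a ; inc dptr ; inc dps */
    2                   /* djnz r7,loop                      */
};

static unsigned sum(const unsigned *v, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

unsigned cycles_per_byte_single(void)
{
    return sum(single_dptr_cycles,
               sizeof single_dptr_cycles / sizeof single_dptr_cycles[0]);
}

unsigned cycles_per_byte_dual(void)
{
    return sum(dual_dptr_cycles,
               sizeof dual_dptr_cycles / sizeof dual_dptr_cycles[0]);
}

/* Fraction of the single-DPTR loop time that the dual-DPTR loop needs. */
double dual_to_single_time_ratio(void)
{
    return (double)cycles_per_byte_dual() / (double)cycles_per_byte_single();
}
```

The sums come out to 22 and 12 cycles per byte, so the ratio is 12/22, a little under 0.55.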

    As I pointed out in the link in my previous post, this starts to pay off once you are copying about 100 bytes.

    Jon

  • Does C51 ever use more than 2 DPTRs, if the chip supports it?

  • Yes,

    The Infineon C517 type devices are the only ones that I know of that have more than 2 data pointers. For these devices, C51 uses one PAIR of data pointers for each register bank.

    Jon