How can I optimize xdata access with C51?

Hi there!

I need to speed up access to xdata memory with C51.

I am sending some files via modem, and the transmission could be a bit faster. At present I am reaching a transmission speed of around 2 kbyte/sec, which is kind of slow.

Right now, my buffers are large arrays in xdata memory, and before a byte reaches the UART it passes through a couple of buffers (for the protocol layers). I cannot prevent that. But maybe there is some way to speed up my access to xdata.

I am not sure, but I have heard somewhere that C51 generates faster code if pointers are used instead of arrays. I didn't find anything about that in the search (maybe because of my keywords). Does anybody know more about this topic?

  • I think it means the following: imagine we have to fill an array with zeros. There are at least 2 ways to do that:

    int array[100];
    int i;
    for (i=0; i<100; i++) array[i] = 0;
    
    and
    int array[100];
    int* ptr;
    for (ptr=array+100; ptr != array; ) *--ptr = 0;
    // or something like that
    
    The code generated for '*--ptr = 0' should be more efficient than the code generated for 'array[i] = 0'.

  • "The code generated for '*--ptr = 0' should be more efficient than the code generated for 'array[i] = 0'"

    This is not necessarily so - see:
    http://www.keil.com/forum/msgpage.asp?MsgID=4108

    Note also that it's more efficient to have your for loop counting down to zero, as the DJNZ instruction can then be used.

    If you do use pointers, be sure to use memory-specific pointers.

    Make sure that the pointer or loop index is in DATA.

    You may find that it's best to use the library routines like memcpy, as they are (hopefully) pretty well optimised, or to write your own optimised version in assembler.
    Enabling extra DPTR(s) should help a bit.

    Does your processor have DMA?
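
A minimal sketch of the count-down loop and memory-specific pointer advice from this reply, untested for speed. The `xdata` and `data` keywords are Keil C51 extensions; the macros below blank them out when `__C51__` is not defined, purely so the sketch also builds with a host compiler. The names `tx_buf`, `TX_BUF_LEN` and `fill_xdata` are hypothetical and chosen only for illustration.

```c
/* xdata/data are Keil C51 extensions; blank them out on other compilers
   so the sketch still builds on a host for testing. */
#ifndef __C51__
#define xdata
#define data
#endif

#define TX_BUF_LEN 200u   /* hypothetical size; <= 255 so an 8-bit counter is enough */

unsigned char xdata tx_buf[TX_BUF_LEN];   /* hypothetical buffer in external RAM */

/* Fill n bytes at dst with value, counting down to zero.
   n must be 1..255: with n == 0 the 8-bit counter wraps and 256 bytes are written.
   The pointer is memory-specific (it can only point into xdata) and,
   like the counter, is itself stored in DATA. */
void fill_xdata(unsigned char xdata *dst, unsigned char value, unsigned char n)
{
    unsigned char xdata * data p = dst;
    unsigned char data count = n;

    do {
        *p = value;
        ++p;                    /* increment kept as its own statement */
    } while (--count != 0);     /* count-down form, a candidate for DJNZ */
}
```

A call such as `fill_xdata(tx_buf, 0xAA, 10);` sets the first ten bytes of the buffer. Whether this beats memset or an indexed loop on a given compiler version is best checked in the generated listing.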

  • Another thought:

    Can you use PDATA?
    This should be quicker, as it only needs an 8-bit address?
    If you're really desperate, you could move the PDATA page for buffers >256 bytes - but that would probably need assembler!

    Can you turn your clock frequency up!?

  • Since you need to dereference only one pointer, and since you need to keep track of a count, either method is good.

       5          char xdata X[1000];
       6          extern void fn( char );
       7          
       8          void main( void )
       9          {
      10   1        char xdata* pX;
      11   1        unsigned int i;
      12   1      
      13   1        for( i = sizeof(X), pX = X; i != 0; --i, ++pX )
      14   1          fn( *pX );
      15   1      
      16   1       
      17   1        for( i = 0; i != sizeof(X); ++i )
      18   1          fn( X[ i ] );
      19   1      }
    C51 COMPILER V6.20c  MAIN                                                                  01/11/2002 06:16:05 PAGE 2   
    
    ASSEMBLY LISTING OF GENERATED OBJECT CODE
    
    
                 ; FUNCTION main (BEGIN)
                                               ; SOURCE LINE # 8
                                               ; SOURCE LINE # 9
                                               ; SOURCE LINE # 13
    0000 750003      R     MOV     i,#03H
    0003 7500E8      R     MOV     i+01H,#0E8H
    0006 750000      R     MOV     pX,#HIGH X
    0009 750000      R     MOV     pX+01H,#LOW X
    000C         ?C0001:
    000C E500        R     MOV     A,i+01H
    000E 4500        R     ORL     A,i
    0010 601D              JZ      ?C0002
                                               ; SOURCE LINE # 14
    0012 850082      R     MOV     DPL,pX+01H
    0015 850083      R     MOV     DPH,pX
    0018 E0                MOVX    A,@DPTR
    0019 FF                MOV     R7,A
    001A 120000      E     LCALL   _fn
    001D E500        R     MOV     A,i+01H
    001F 1500        R     DEC     i+01H
    0021 7002              JNZ     ?C0008
    0023 1500        R     DEC     i
    0025         ?C0008:
    0025 0500        R     INC     pX+01H
    0027 E500        R     MOV     A,pX+01H
    0029 70E1              JNZ     ?C0001
    002B 0500        R     INC     pX
    002D         ?C0009:
    002D 80DD              SJMP    ?C0001
    002F         ?C0002:
                                               ; SOURCE LINE # 17
    002F E4                CLR     A
    0030 F500        R     MOV     i,A
    0032 F500        R     MOV     i+01H,A
    0034         ?C0004:
                                               ; SOURCE LINE # 18
    0034 7400        R     MOV     A,#LOW X
    0036 2500        R     ADD     A,i+01H
    0038 F582              MOV     DPL,A
    003A 7400        R     MOV     A,#HIGH X
    003C 3500        R     ADDC    A,i
    003E F583              MOV     DPH,A
    0040 E0                MOVX    A,@DPTR
    0041 FF                MOV     R7,A
    0042 120000      E     LCALL   _fn
    0045 0500        R     INC     i+01H
    0047 E500        R     MOV     A,i+01H
    0049 7002              JNZ     ?C0010
    004B 0500        R     INC     i
    004D         ?C0010:
    004D B4E8E4            CJNE    A,#0E8H,?C0004
    0050 E500        R     MOV     A,i
    0052 B403DF            CJNE    A,#03H,?C0004
                                               ; SOURCE LINE # 19
    0055         ?C0007:
    0055 22                RET     
                 ; FUNCTION main (END)
    

  • My guess is that your protocol layers are killing you. Replace c = X[i] with c = 0, and see if you can even measure a speed increase.
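
One way to set up that experiment is a compile-time switch, so the same loop can be timed with and without the xdata read. This is only a sketch: `SKIP_XDATA_READ`, `send_buffer`, `send_byte`, `bytes_sent` and `last_byte` are hypothetical names, and `send_byte` stands in for whatever protocol and UART path really consumes the bytes. The `xdata` macro is blanked out on non-C51 compilers so the sketch builds on a host.

```c
#ifndef __C51__
#define xdata               /* C51 extension; blanked out for host builds */
#endif

#ifndef SKIP_XDATA_READ
#define SKIP_XDATA_READ 0   /* set to 1 to run the timing experiment */
#endif

#define BUF_LEN 1000u

unsigned char xdata X[BUF_LEN];

/* Hypothetical stand-in for the protocol layers and UART driver. */
unsigned int  bytes_sent;
unsigned char last_byte;

void send_byte(unsigned char c)
{
    last_byte = c;
    ++bytes_sent;
}

/* Push the whole buffer through send_byte(), optionally without
   touching xdata at all. */
void send_buffer(void)
{
    unsigned int i;
    unsigned char c;

    for (i = 0; i != BUF_LEN; ++i) {
#if SKIP_XDATA_READ
        c = 0;              /* experiment: no xdata access */
#else
        c = X[i];           /* normal path: read from xdata */
#endif
        send_byte(c);
    }
}
```

If the throughput barely changes with `SKIP_XDATA_READ` set to 1, the xdata reads are not the bottleneck and the time is going elsewhere.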

  • You say that you have some large buffers. In that case you will have to place them in xdata.

    Presumably you are using an interrupt-driven UART driver. Consider making the buffer that is accessed by the ISR as small as possible so that it can at least be placed in pdata; that will keep your interrupts as fast as possible.

    If you can, keep your buffers down to 256 elements or less; on the 8051, 8-bit arithmetic is very much faster than 16-bit.

    I assume that you are using circular buffers. Although having your buffers in xdata may be inevitable because of their size, don't place the buffer and the control variables in one structure; it looks neat but is generally slower. If you can, place the control variables (read and write pointers/indexes, count etc.) in the fastest available memory, e.g. pdata or preferably data.

    With a circular buffer it is necessary to increment an index modulo the length of the buffer. Make your buffer 2^n elements long and C51 will convert your modulo operator to an AND mask, which is nice and quick. This is probably the main reason why, in similar contexts, I have found no real advantage in using pointers rather than indexes: a fast increment modulo 2^n is essential.

    BTW: in general I have noticed that C51 is not very clever when pre/post increments/decrements are used within an expression, and that shorter, faster code generally results from placing these increments/decrements in separate C statements. The exceptions to this rule are:

    unsigned char count;
    ...
    if ( --count )
    {
    ...
    }
    
    and
    do
    {
    ...
    } while( --count != 0 );
    
    In the above cases, the compiler uses DJNZ instruction for a very efficient implementation.
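
The circular-buffer advice in this reply (2^n length, AND mask, control variables kept apart from the storage and in fast memory) could look roughly like the sketch below. It is an illustration under assumptions, not a drop-in driver: `rb_buf`, `rb_head`, `rb_tail`, `rb_put` and `rb_get` are hypothetical names, the size of 64 is arbitrary, and nothing here protects the indexes against concurrent access from an ISR. The `xdata`/`data` macros are blanked out on non-C51 compilers so the sketch builds on a host.

```c
#ifndef __C51__
#define xdata               /* C51 extensions; blanked out for host builds */
#define data
#endif

#define RB_SIZE 64u                     /* must be a power of two, at most 256 */
#define RB_MASK (RB_SIZE - 1u)          /* replaces "% RB_SIZE" */

unsigned char xdata rb_buf[RB_SIZE];    /* storage in external RAM */
unsigned char data  rb_head;            /* write index, kept in DATA */
unsigned char data  rb_tail;            /* read index, kept in DATA */

/* Store one byte. Returns 1 on success, 0 if the buffer is full.
   One slot is always left empty so that full and empty can be told apart. */
unsigned char rb_put(unsigned char c)
{
    unsigned char next = (unsigned char)((rb_head + 1u) & RB_MASK);

    if (next == rb_tail) {
        return 0;
    }
    rb_buf[rb_head] = c;
    rb_head = next;
    return 1;
}

/* Fetch one byte into *out. Returns 1 on success, 0 if the buffer is empty. */
unsigned char rb_get(unsigned char *out)
{
    if (rb_tail == rb_head) {
        return 0;
    }
    *out = rb_buf[rb_tail];
    rb_tail = (unsigned char)((rb_tail + 1u) & RB_MASK);
    return 1;
}
```

Because the indexes are single bytes and the wrap is a mask, every index update stays in 8-bit arithmetic. With this layout the buffer holds at most `RB_SIZE - 1` bytes.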