
trouble over trouble

Hello

I'm working with the AT91RM9200 from Atmel and I have some problems with data aborts. The stack size is big enough: I tested it by setting the user stack size to zero and seeing which errors then occurred, and in that case everything was overwritten by other data. At the moment my stack size is 0x00000200. The error that occurs is that the pointer to the USART base address gets overwritten, regardless of whether usart.o is placed in the internal RAM or the external RAM. I used the watch window to see whether the pointer changes its value, but nothing happens there. I only know that the value changes because I placed a dummy in the retarget function for printf() (thanks for the hint, Per).

The whole program only includes the Ethernet init part, the USART and the retarget function, each in a separate C file. So I have one main file containing only the main routine, where I initialize everything else by calling the appropriate functions. In the while loop (main.c) I only poll whether a flag has been set by the reception of a new Ethernet frame. This works, apart from the data abort problems.
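A minimal sketch of the flag-polling structure described above, assuming the EMAC interrupt handler (not shown) sets Emac_Receive when a frame arrives. The function poll_ethernet_once and its handle_frame callback are hypothetical names used only for illustration; the flag must be volatile so the compiler re-reads it on every pass of the loop.

```c
#include <stddef.h>

/* Flag set by the EMAC receive interrupt handler (handler not shown). */
volatile unsigned int Emac_Receive = 0;

/* One pass of the main-loop poll.
   handle_frame is a hypothetical callback that processes the received frame(s).
   Returns 1 if a frame notification was consumed, 0 otherwise. */
int poll_ethernet_once(void (*handle_frame)(void))
{
    if (Emac_Receive == 0)
        return 0;

    /* Clear the flag before handling, so a frame that arrives while
       we are busy sets it again and is seen on the next pass. */
    Emac_Receive = 0;

    if (handle_frame != NULL)
        handle_frame();
    return 1;
}
```

In main() this would simply be called as while (1) { poll_ethernet_once(process_frames); }, where process_frames is whatever code walks the received descriptors.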

These problems don't always appear at the same place in the code. Is my assumption right that if I had a stack size problem, the error would always occur at the same place in the code?

When I run the whole program several times, I get one or two passes where no error occurs. Very often the error happens before I call the init function for the Ethernet part, so I don't think the error comes from executing that code. I think the cause could (or must) be very global.
If I don't include the Ethernet part, I get no data abort. But I'm not able to find the problem(s).

The header file of the Ethernet part:

#define AT91C_EMAC_TDLIST_BASE  0x21000000    /* base of the RX descriptor list (external SDRAM) */
#define ETH_PACKET_SIZE         1536          /* greater than 1518 or 1522 (VLAN) bytes */
#define NB_ETH_RX_PACKETS       10            /* number of RX buffers */
#define EMAC_RXBUF_ADD_WRAP     0x02          /* WRAP bit, set in the last descriptor of the list */
#define MII_STS_REG             0x01
#define MII_STS2_REG            0x11
#define RxPacket (AT91C_EMAC_TDLIST_BASE + (8 * NB_ETH_RX_PACKETS))   /* RX buffers follow the 8-byte descriptors */

typedef struct {
     unsigned int   RxBufAddr;     /* buffer address word (bit 1 = wrap flag) */
     unsigned int   RxBufStatus;   /* receive status word */
} *EMAC_pRX_descriptor;

void Phy_Init(void);
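A sketch of how the receive descriptor list described by this header could be initialized. It assumes the AT91RM9200 EMAC convention that the two low bits of each descriptor's address word are flags: bit 1 is the wrap bit (matching EMAC_RXBUF_ADD_WRAP) and bit 0 is the ownership bit, which must be cleared so the EMAC may fill the buffer; check both against the EMAC chapter of the datasheet. rx_descriptor_t and rx_ring_init are hypothetical names. The detail worth verifying in the real init code is that exactly the last descriptor carries the wrap bit: as far as I understand the EMAC, without it the DMA continues past the tenth descriptor and treats whatever follows (here, the first receive buffer) as further descriptors, which is one way received data could land in unexpected memory.

```c
#include <stdint.h>

/* Values copied from the header above. */
#define ETH_PACKET_SIZE      1536u
#define NB_ETH_RX_PACKETS    10u
#define EMAC_RXBUF_ADD_WRAP  0x02u

/* Assumed descriptor layout: two 32-bit words, as in EMAC_pRX_descriptor. */
typedef struct {
    uint32_t RxBufAddr;    /* buffer address; bit 1 = wrap, bit 0 = ownership (assumed) */
    uint32_t RxBufStatus;  /* written by the EMAC when a frame is received */
} rx_descriptor_t;

/* Point each descriptor at its own ETH_PACKET_SIZE buffer starting at buf_base,
   clear the ownership bit and status, and set the wrap bit on the last one only.
   Returns 0 on success, -1 if buf_base is not word aligned (the low two bits
   of the address word are needed for the flags). */
int rx_ring_init(volatile rx_descriptor_t *ring, uint32_t buf_base)
{
    uint32_t i;

    if ((buf_base & 3u) != 0u)
        return -1;

    for (i = 0; i < NB_ETH_RX_PACKETS; i++) {
        ring[i].RxBufAddr   = buf_base + i * ETH_PACKET_SIZE;  /* flags bits = 0 */
        ring[i].RxBufStatus = 0;
    }
    ring[NB_ETH_RX_PACKETS - 1u].RxBufAddr |= EMAC_RXBUF_ADD_WRAP;
    return 0;
}
```

On the target this would be called as rx_ring_init((volatile rx_descriptor_t *)AT91C_EMAC_TDLIST_BASE, RxPacket) before the descriptor list address is written to the EMAC receive queue register. With these values the ten buffers occupy 0x21000050 to 0x21003C4F.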

  • OK, my ASR register is 0x00010202 and my AASR register is 0x10000b69.

    When I look in the map file, I find:

    local symbols:

    _printf_core                0x100008bd   Thumb Code  1132  printf8.o(i._printf_core)
    i._printf_post_padding      0x10000d28   Section       38  printf8.o(i._printf_post_padding)
    

    and global symbols:

    __scatterload_zeroinit      0x100008a4   ARM Code      24  handlers.o(i.__scatterload_zeroinit)
    Region$$Table$$Base         0x10000d80   Number         0  anon$$obj.o(Region$$Table)
    

    The ASR register shows:
    - misaligned address abort status
    - code fetch
    - the last aborted access was due to the ARM920T

    That is all the information I have. So the data abort comes from the printf() function, but why?

    Johannes
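The AASR value can be matched against the map file mechanically. A minimal sketch using only the entries quoted above: a symbol covers the range [value, value + size), except that Thumb symbols have bit 0 of their value set, so _printf_core actually starts at 0x100008bc and ends just before i._printf_post_padding at 0x10000d28. The table and the function symbol_for_address are illustrative, not part of any Keil tool.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t    value;  /* "Value" column of the map file */
    uint32_t    size;   /* "Size" column of the map file */
} map_symbol_t;

/* Entries copied from the map file excerpt above. */
static const map_symbol_t map_symbols[] = {
    { "__scatterload_zeroinit", 0x100008a4u,   24u },
    { "_printf_core",           0x100008bdu, 1132u },
    { "i._printf_post_padding", 0x10000d28u,   38u },
};

/* Return the name of the symbol whose address range contains addr, or NULL. */
const char *symbol_for_address(uint32_t addr)
{
    size_t n = sizeof map_symbols / sizeof map_symbols[0];

    for (size_t i = 0; i < n; i++) {
        uint32_t start = map_symbols[i].value & ~1u;  /* bit 0 only marks Thumb code */
        if (addr >= start && addr - start < map_symbols[i].size)
            return map_symbols[i].name;
    }
    return NULL;
}
```

With these entries, symbol_for_address(0x10000b69) returns "_printf_core", which agrees with the conclusion that the abort address lies in the printf() code.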

  • Most errors occurred within the stack area: below the stack pointer, and sometimes above it. Most of the time they were in the internal RAM; only one error (in 20 tries) was in the external SDRAM. Moreover, I only get the data aborts when packets are received over Ethernet, but these packets don't use any internal SRAM, because the DMA stores them in the SDRAM.

    Best regards
    Johannes

  • Have you looked at the overwritten memory area - is there a pattern to the data there? Have you tried to send Ethernet packets with specific test data?

    You say that the Ethernet packets are stored in the SDRAM. But have you verified that you actually get data there? For example, fill the SDRAM area with a pattern so you can see that the pattern is overwritten by received data. Just believing that the Ethernet data is (should be) stored in the SDRAM area isn't enough.
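The pattern test suggested here can be sketched as two small helpers: one fills a word-aligned area with a marker value, the other counts the words that have changed and reports the first one. FILL_PATTERN, fill_pattern and count_overwritten are hypothetical names, and 0xA5A5A5A5 is an arbitrary marker chosen for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define FILL_PATTERN 0xA5A5A5A5u   /* arbitrary marker value */

/* Fill `words` 32-bit words starting at `start` (must be 4-byte aligned). */
void fill_pattern(volatile uint32_t *start, size_t words)
{
    for (size_t i = 0; i < words; i++)
        start[i] = FILL_PATTERN;
}

/* Count the words that no longer hold the pattern.
   If first is not NULL, it receives the index of the first changed word. */
size_t count_overwritten(const volatile uint32_t *start, size_t words, size_t *first)
{
    size_t hits = 0;

    for (size_t i = 0; i < words; i++) {
        if (start[i] != FILL_PATTERN) {
            if (hits == 0 && first != NULL)
                *first = i;
            hits++;
        }
    }
    return hits;
}
```

On the target, one might fill the receive buffer area (for example starting at RxPacket from the header above) or an unused part of the internal RAM before enabling reception, then check it again after a few frames; the index of the first changed word multiplied by 4 gives its byte offset from the start address.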

    Debugging means that you have theories. Then you have to prove or disprove these theories one by one until you find the problem. The big problem is to separate what you believe and what you know, and prove that what you think really is true.

  • Thanks for your answer.

    The data received from the Ethernet is stored in the SDRAM. I wrote a small program to check whether the data arrives there, and yes, the received data overwrites my pattern. There is one Ethernet register into which I wrote the address for the received data (the EMAC transfers it to the SDRAM by DMA).

    At the moment I always get a data abort when data is received over Ethernet, and the data abort address is very often within the heap area.

    But I don't know why the heap should be too small. I have only read that the stdio library uses a lot of heap.

    My stack pointer is:

    __initial_sp                             0x00201018   Data         544  rm9200.o(STACK)
    

    and the stack top:

    STACK                       0x00200018   Section     4640  rm9200.o(STACK)
    Stack_Top                   0x00201238   Number         0  rm9200.o(STACK)
    

    The data abort addresses are, for example:
    - 00203332
    - 00206572
    - 0020ACF5
    - 00206563

    They are always in the internal RAM:
    INT_RAM 0x200000 0x000FFFFF (.ANY(+RW +ZI))

    Moreover, I know that the pointer to the USART is overwritten very often; sometimes a data abort happens and sometimes not. But this happens before any data is received.

  • I tried writing a pattern to the area between the stack top and the end of the internal SRAM:

    ptr = (unsigned int *)0x00206500;
    

    When I receive data packets over Ethernet, the pattern stays almost completely intact. Sometimes I also get a data abort, but not regularly when data is received from the Ethernet; I almost never get a data abort if no data is received.

    If the address is lower (e.g. 0x00204000), I'm not able to run the program at all: it always hangs in PLLA_loop (waiting until PLL A has stabilized).

    Johannes

  • Is it correct that the heap base address is the same as the stack base address?

    Execution Region INTERN_RAM (Base: 0x00200000, Size: 0x00001298, Max: 0x00040000, ABSOLUTE)
    
        Base Addr    Size         Type   Attr      Idx    E Section Name        Object
    
        0x00200000   0x0000000c   Data   RW           13    .data               init.o
        0x0020000c   0x00000004   Data   RW           45    .data               usart.o
        0x00200010   0x00000004   Data   RW           74    .data               retarget.o
        0x00200014   0x00000060   Zero   RW          169    .bss                libspace.o(c_t.l)
        0x00200074   0x00000004   PAD
        0x00200078   0x00000000   Zero   RW            2    HEAP                rm9200.o
        0x00200078   0x00001220   Zero   RW            1    STACK               rm9200.o
    

    Johannes
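Reading this map excerpt: HEAP has size 0x00000000, so it starts at the same address as STACK (0x00200078), and the stack ends at 0x00201298, exactly where the execution region's 0x1298 bytes end. A small sketch that classifies an address against this layout, assuming the values above describe the current build; classify_address and the enum names are hypothetical.

```c
#include <stdint.h>

/* Layout copied from the INTERN_RAM execution region in the map file above. */
#define INTERN_RAM_BASE 0x00200000u
#define INTERN_RAM_SIZE 0x00001298u   /* "Size": bytes actually placed by the linker */
#define HEAP_BASE       0x00200078u
#define HEAP_SIZE       0x00000000u   /* empty heap */
#define STACK_BASE      0x00200078u
#define STACK_SIZE      0x00001220u

/* Sanity check: the stack is the last object placed in the region. */
_Static_assert(STACK_BASE + STACK_SIZE == INTERN_RAM_BASE + INTERN_RAM_SIZE,
               "stack should end where the execution region ends");

typedef enum {
    ADDR_OUTSIDE,      /* below the region base */
    ADDR_STATIC_DATA,  /* .data/.bss placed before HEAP/STACK */
    ADDR_HEAP,
    ADDR_STACK,
    ADDR_UNPLACED      /* above everything the linker placed */
} addr_class_t;

addr_class_t classify_address(uint32_t a)
{
    if (a < INTERN_RAM_BASE)
        return ADDR_OUTSIDE;
    if (a < HEAP_BASE)
        return ADDR_STATIC_DATA;
    if (a - HEAP_BASE < HEAP_SIZE)      /* never true while the heap is empty */
        return ADDR_HEAP;
    if (a - STACK_BASE < STACK_SIZE)
        return ADDR_STACK;
    return ADDR_UNPLACED;
}
```

With this layout, addresses such as 0x00203332 or 0x00205241 classify as ADDR_UNPLACED: they lie above everything the linker placed in INTERN_RAM, so no variable, heap block or stack frame of this build should live there.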

  • Another thing I don't understand: I changed the size of the internal RAM in the linker script:

    INTERN_RAM 0x200000 0x00040000
    

    And I still get a data abort at address 0x002FFFB.

    Johannes

  • Changing the size of RAM in your configuration files will not change the physical memory in the processor,
    or how the memory controller in the processor handles exceptions.

    If you have a pointer (or return address) that points to a specific address, the processor will try to jump there without the slightest knowledge of the existence of memory.

  • So the beginning of the heap does not change if I reduce the size of the internal RAM.

    How can I find out which code or function caused the data abort? At the moment I only know the address of the data abort.

    Johannes
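One way to get from the abort to the offending code, sketched under these assumptions: a short assembly stub at the data abort vector (not shown) passes LR_abt in r0 and the AASR value in r1 to a C function, and the abort-mode stack has been set up in the startup code. On ARM, after a data abort LR_abt holds the address of the aborted instruction plus 8 (in ARM and Thumb state alike), so subtracting 8 gives the instruction that made the access; that address can then be looked up in the map file. The names g_abort_pc, g_abort_addr and record_data_abort are hypothetical; storing into globals instead of calling printf() avoids re-entering code that may itself have failed.

```c
#include <stdint.h>

volatile uint32_t g_abort_pc;    /* aborted instruction: look this up in the map file */
volatile uint32_t g_abort_addr;  /* data address that caused the abort (AASR) */

/* Hypothetical C part of a data abort handler; an assembly stub at the
   abort vector is assumed to call it with r0 = LR_abt and r1 = AASR. */
void record_data_abort(uint32_t lr_abt, uint32_t aasr)
{
    g_abort_pc   = lr_abt - 8u;  /* LR_abt = aborted instruction + 8 */
    g_abort_addr = aasr;
}
```

After the abort, stop in the handler and read both variables in the watch window.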

  • What could be the problem when I write a pattern starting at address 0x00205241 like this:

    for (i = 0; i < 100; i++)
    {
        *ptr2 = 0xFF7;
        ptr2++;    /* advance to the next word */
    }
    

    (the stack top is 0x00200798), after which I'm not able to transmit any data over the USART? I always get a data abort, regardless of whether there are Ethernet packets.

    Johannes

  • If I only write:

    ptr2 = (unsigned int *)0x00205241;
    *ptr2 = 0xFF7;
    

    then I get a data abort at this address (caused by this write access).

    But why would that be a problem? The end of the internal RAM is 0x002FFFFF.

    Johannes

  • This happens because you are violating the alignment
    restrictions when reading or writing to memory. An exception will be raised if:

       read/write word32: (address & 3) != 0
       read/write word16: (address & 1) != 0
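The two rules above can be written as C helpers, together with rounding an address up to the next word boundary. This is a sketch; the function names are hypothetical. For example, 0x00205241 fails the word check and rounds up to 0x00205244, and the AASR value 0x10000b69 reported earlier in the thread is odd, so it fails even the halfword check.

```c
#include <stdint.h>

/* read/write word32 requires (address & 3) == 0 */
static inline int word_aligned(uint32_t addr)     { return (addr & 3u) == 0u; }

/* read/write word16 requires (address & 1) == 0 */
static inline int halfword_aligned(uint32_t addr) { return (addr & 1u) == 0u; }

/* Round addr up to the next multiple of 4 (unchanged if already aligned). */
static inline uint32_t align_up_word(uint32_t addr) { return (addr + 3u) & ~3u; }
```

Starting the pattern loop at align_up_word(0x00205241), i.e. 0x00205244, keeps every word access aligned.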
    

  • Thanks for the answer.

    And is it OK if the program writes data into RAM at addresses higher than the stack pointer address?

    Which functions store their values at these addresses? None of my own functions or variables point into this address range.

    Johannes

  • Well, that depends on your memory layout. Assuming you are using a current MDK version V3.15b, there is a new window called

    Call Stack Unwinder

    It works in simulation as well as in target mode (e.g. ULINK) and shows the complete call chain, with all local variables and parameters of each call level, down to the abort handler. Using this call chain, you can find out which function caused the access to memory outside the valid area. If such a call is unnamed, it is most likely a library function.

  • Yes, I saw this new window called Stack Frames.

    I've installed a function that is called when a data abort occurs, and when that happens the window only shows this function and the main function.

    The problem is that the error does not always happen when I receive Ethernet frames. Sometimes I can receive 20 frames without a problem, sometimes only 10; sometimes the printf() message is not transmitted completely (but no data abort occurs), and sometimes I get a data abort during reception.

    And the data abort addresses are very often above the stack pointer in the internal RAM, and occasionally at undefined addresses. I checked all the pointers I use to see whether any of them points to such an address, but everything is OK.

    The main function is very small: I have three global variables, one extern pointer and a little bit of code.

    volatile unsigned int Emac_Receive=0;
    volatile unsigned int overrun=0;
    volatile unsigned int zahl=0;
    extern EMAC_pRX_descriptor p_rxBD;
    

    Johannes

  • Or the functions involved are COM1_Sendchar() and fputc() (from the retarget code):

    #include <stdio.h>
    #include <rt_misc.h>
    #pragma import(__use_no_semihosting_swi)
    
    extern int COM1_Sendchar (int ch);      /* in usart.c */
    
    struct __FILE { int handle; /* Add whatever you need here */ };
    FILE __stdout;
    //FILE __stdin;
    
    int fputc(int ch, FILE *f) {
      return (COM1_Sendchar(ch));
    }
    
    
    int ferror(FILE *f) {
      /* Your implementation of ferror */
      return EOF;
    }
    
    
    void _ttywrch(int ch) {
      COM1_Sendchar(ch);
    }
    
    void _sys_exit(int return_code) {
    label:  goto label;  /* endless loop */
    }
    

    But I think this function works for nearly everyone.
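If the concern is the USART base pointer being overwritten, one option is to keep the base address out of RAM entirely by using a compile-time constant, as in this sketch of a polled COM1_Sendchar. The register layout, the TXRDY bit position and the USART0 base address 0xFFFC0000 are assumptions to check against the AT91RM9200 datasheet (COM1 might also be mapped to another USART or to the debug unit); usart_regs_t and usart_sendchar are hypothetical names.

```c
#include <stdint.h>

/* Assumed AT91RM9200 USART register layout (first eight registers only). */
typedef struct {
    uint32_t US_CR;    /* 0x00 control */
    uint32_t US_MR;    /* 0x04 mode */
    uint32_t US_IER;   /* 0x08 interrupt enable */
    uint32_t US_IDR;   /* 0x0C interrupt disable */
    uint32_t US_IMR;   /* 0x10 interrupt mask */
    uint32_t US_CSR;   /* 0x14 channel status */
    uint32_t US_RHR;   /* 0x18 receive holding */
    uint32_t US_THR;   /* 0x1C transmit holding */
} usart_regs_t;

#define US_TXRDY (1u << 1)   /* assumed TXRDY bit in US_CSR */

/* Hypothetical COM1 mapping: a constant, not a pointer variable in RAM,
   so a stray write into RAM cannot redirect it. */
#define COM1 ((volatile usart_regs_t *)(uintptr_t)0xFFFC0000u)

/* Polled transmit: wait until the transmitter can accept a byte, then write it. */
int usart_sendchar(volatile usart_regs_t *us, int ch)
{
    while ((us->US_CSR & US_TXRDY) == 0u) {
        /* busy-wait for TXRDY */
    }
    us->US_THR = (uint32_t)(unsigned char)ch;
    return ch;
}

/* Drop-in for the retarget code's extern declaration. */
int COM1_Sendchar(int ch)
{
    return usart_sendchar(COM1, ch);
}
```

Because COM1 expands to a constant, corruption in the internal RAM cannot change where the USART accesses go; if output still fails with this version, the overwrite is hitting something else.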