
Bootloader problem in ARM926ejs

  • Note: This was originally posted on 24th June 2011 at http://forums.arm.com

    (Sorry, this is my first time using the forum, so I posted an empty message by mistake at first.)
  • Note: This was originally posted on 25th June 2011 at http://forums.arm.com

    Hi Scott,
    Thanks for your reply.

    The abort I get is a Data Abort, so I located the offending instruction by looking at the address pointed to by R14 - 8.
    It was a write access to a peripheral register.

    I did some additional checks today, and I discovered that my MMU table is not always mapped the way I expect.
    I fill in a table with entries of the form
    [size], [logical address], [physical address], [attributes]
    and I was only told to make sure that the TTB base address is 16 kB aligned and the page addresses are 4 kB aligned. But after MMU table initialization, I took a peek at the entries in the TTB and the coarse TTB (CTTB), and they are not always what I expected. For example, the entries for the stack memory do not appear in the TTB, where there should be an entry pointing into the CTTB (but I have to double-check that).
    In any case, even if all my entries are in the table, I also discovered that sections larger than 1 MB have to be 1 MB aligned.
    Since I was not aware of that before, it might have led to some problems (I'll review the memory map first so that this kind of problem can't occur).
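    A minimal sketch of how one such table entry could be turned into ARMv5 first-level section descriptors on an ARM926EJ-S, with the 1 MB alignment rule checked up front instead of discovered later. The function and macro names are hypothetical (they are not from the MMU table tool used here), and the sketch only covers 1 MB sections, not coarse-table 4 kB pages:

```c
#include <stddef.h>
#include <stdint.h>

#define SECTION_SIZE   0x00100000u                     /* 1 MB */
#define SECTION_MASK   (SECTION_SIZE - 1u)
#define TTB_ENTRIES    4096u                           /* 4096 x 1 MB = 4 GB */
#define DESC_SECTION   0x2u                            /* bits[1:0] = 0b10: section */
#define DESC_BIT4      (1u << 4)                       /* written as 1 on ARM926EJ-S */
#define DESC_B         (1u << 2)                       /* bufferable */
#define DESC_C         (1u << 3)                       /* cacheable */
#define DESC_DOMAIN(d) (((uint32_t)(d) & 15u) << 5)    /* domain, bits[8:5] */
#define DESC_AP(ap)    (((uint32_t)(ap) & 3u) << 10)   /* AP, bits[11:10]; 3 = R/W in all modes when S=R=0 */

/* Fills the first-level entries for [va, va + size) with section descriptors.
 * Returns 0 on success, -1 if va, pa or size is not 1 MB aligned or the
 * range runs past the end of the 4 GB address space. */
int map_sections(uint32_t *ttb, uint32_t va, uint32_t pa, uint32_t size,
                 uint32_t attrs)
{
    if ((va | pa | size) & SECTION_MASK)
        return -1;
    uint32_t first = va >> 20;          /* one entry per 1 MB of virtual space */
    uint32_t count = size >> 20;
    if (count > TTB_ENTRIES - first)
        return -1;
    for (uint32_t i = 0; i < count; i++)
        ttb[first + i] = (pa + (i << 20)) | attrs | DESC_BIT4 | DESC_SECTION;
    return 0;
}

/* The TTB base itself must be 16 kB aligned. */
int ttb_is_aligned(const void *ttb)
{
    return ((uintptr_t)ttb & 0x3FFFu) == 0;
}
```

    With this, mapping the 4 MB code area at 0x81400000 fills entries 0x814 to 0x817, while a request such as 0x53FCC000 with a 4 kB size is rejected, because it needs a coarse table rather than a section.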

    Concerning flushing the D cache: yes, until now the caches were on.
    But today I tried setting the attribute "NCNB" (non-cacheable, non-bufferable) on the application code destination area, and I still got the same problem.

    The weirdest thing is that changing the code section's attribute from Read-Only to Read-Write seems to solve the issue (at least, I could reproduce the issue by setting the code back to Read-Only).
    What is weird is that this attribute applies to a memory area other than the one where the abort seems to occur.
    The peripheral registers are somewhere between 0x43F00000 and 0x7000000, and the code section is between 0x81400000 and 0x81800000 in RAM, so the two should be unrelated.
    But it might not be so weird if the TTB and CTTB are set up wrongly.
    Still, that does not explain why I don't get any trouble when running from the debugger.
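    Since each first-level entry covers 1 MB, the entry that governs an address is simply bits [31:20] of that address. A small sketch for reading back and checking entries after initialization, assuming the ARMv5 descriptor formats (the helper names are hypothetical):

```c
#include <stdint.h>

/* Kind of ARMv5 first-level descriptor, from bits[1:0]. */
typedef enum { L1_FAULT = 0, L1_COARSE = 1, L1_SECTION = 2, L1_FINE = 3 } l1_kind;

/* Index of the first-level entry that translates va (one entry per 1 MB). */
static inline uint32_t l1_index(uint32_t va) { return va >> 20; }

static inline l1_kind l1_type(uint32_t desc) { return (l1_kind)(desc & 3u); }

/* Section descriptor: physical address that va maps to. */
static inline uint32_t section_pa(uint32_t desc, uint32_t va)
{
    return (desc & 0xFFF00000u) | (va & 0x000FFFFFu);
}

/* Coarse descriptor: physical base of the 1 kB second-level table,
 * and the index of the 4 kB page entry inside it (va[19:12]). */
static inline uint32_t coarse_table_base(uint32_t desc) { return desc & 0xFFFFFC00u; }
static inline uint32_t coarse_index(uint32_t va) { return (va >> 12) & 0xFFu; }
```

    For example, l1_index(0x53FCC000) is 0x53F and l1_index(0x81400000) is 0x814, so the GPIO registers and the code section are governed by different entries. If the code section's Read-Only attribute really changes what happens on a peripheral access, that would suggest entries being written at the wrong index.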

    I will investigate further and post back the results.

    The truth should be in there...

    PS: Note that I have edited the boot sequence in my first post; I had swapped a few steps. It now accurately reflects what the boot sequence does. There is no access to the stacks during MMU initialization.
  • Note: This was originally posted on 27th June 2011 at http://forums.arm.com

    I've been checking a little deeper today:

      The offending instruction is:

      LDR   R12,[R3, #+0]

      And the reason is that R3 contains 0x008021A9 where it should contain the address of a GPIO register.

      When running from the debugger, at the same line, R3 contains 0x53FCC000 which is the expected value.

      

      The following table:

      static volatile struct gpio * gpioRegTbl[N_GPIO_CH] = { &GPIO1, &GPIO2, &GPIO3, &GPIO4 };

      is not properly initialized when starting from the starter, but it is properly initialized when starting from the debugger.
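    One way to catch this early, sketched under the assumption that nothing else checks it: put a hypothetical canary variable with a known non-zero initializer in .data and test it at the start of main(). If the startup code did not copy the .data initializers, the check fails before anything dereferences a pointer taken from a table like gpioRegTbl:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical canary: a non-zero initialized variable, so it lives in .data.
 * If the startup code copied the .data initializers, it holds this value
 * when main() starts; otherwise it holds whatever was left in RAM. */
#define DATA_INIT_CANARY 0xC0FFEE11u
static volatile uint32_t data_init_canary = DATA_INIT_CANARY;

/* Call first thing in main(), before using initialized pointer tables. */
bool data_section_initialized(void)
{
    return data_init_canary == DATA_INIT_CANARY;
}
```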


    I have attached my initialization code.

    If you see any mistakes, please comment.

  • Note: This was originally posted on 27th June 2011 at http://forums.arm.com

    I am still gathering clues:
    It seems that __iar_data_init2 in initarm.s79, which should be initializing the .data section, is not working properly (it didn't initialize anything).
    Still searching for the reason.
    I still couldn't confirm whether the MMU is set up completely correctly (but it seems OK now that all entries are 1 MB aligned).
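    Conceptually, the data-init step copies the initial values of .data from where they sit in the loaded image to their run-time addresses in RAM, and zeroes .bss; if those initial values are missing or misplaced in the image, the copy faithfully reproduces garbage. The sketch below shows only that idea. IAR's __iar_data_init2/__iar_data_init3 work from a linker-generated table of init records, and the linker symbol names in the comment are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy .data initializers from their load address to their run address,
 * then clear .bss. On target the arguments would come from linker-defined
 * symbols, for example (hypothetical names, they depend on the linker file):
 *   extern uint8_t __data_load[], __data_start[], __data_end[];
 *   extern uint8_t __bss_start[], __bss_end[];
 *   init_data_bss(__data_load, __data_start, __data_end - __data_start,
 *                 __bss_start, __bss_end - __bss_start);
 */
void init_data_bss(const uint8_t *data_load, uint8_t *data_run, size_t data_size,
                   uint8_t *bss_start, size_t bss_size)
{
    if (data_run != data_load)            /* nothing to copy if already in place */
        memcpy(data_run, data_load, data_size);
    memset(bss_start, 0, bss_size);
}
```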
  • Note: This was originally posted on 27th June 2011 at http://forums.arm.com

    I took a peek at the memory map to find that "__iar_data_init2" function, and all I found was "__iar_data_init3".
    So I made two modifications:
    1. Use __iar_data_init3.
    2. Modify the linker file to place .data_init and the other initialization sections next to the code section.

    It worked once. To confirm that both modifications were needed, I changed __iar_data_init3 back to __iar_data_init2, and it didn't work anymore.

    Then with only (2), it didn't work either. So I put both back, and now it doesn't work.

    But it did work once: I saw that the variable had been initialized by the __iar_data_init3 function.

  • Note: This was originally posted on 28th June 2011 at http://forums.arm.com

    Hi again.
    The problem seems to come from a bug in the program that writes the application to flash. I wrote the application another way, and it worked properly. I also tried loading the data directly into RAM from the serial port in the starter, and again it worked properly.
    My guess is that the bug does not corrupt all the data, only particular cases, and without affecting the checksum. Maybe it sometimes swaps bytes, and since the checksum is just a sum, that goes undetected.
    Thank you for your answers; they helped rule out some possibilities.
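    A quick illustration of why a plain sum misses this kind of corruption: addition does not depend on byte order, while a CRC does. The CRC-32 below is the common reflected variant (polynomial 0xEDB88320); it is a swapped-in technique for comparison, not what this loader used:

```c
#include <stddef.h>
#include <stdint.h>

/* Additive checksum: sum of all bytes modulo 2^32.
 * Reordering or swapping bytes never changes it. */
uint32_t sum32(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    while (n--)
        s += *p++;
    return s;
}

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF).
 * Position dependent, so swapped bytes change the result. */
uint32_t crc32_bitwise(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

    With the bytes 12 34 56 78 and 34 12 56 78, sum32 gives the same value for both, while the two CRC-32 values differ.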

  • Note: This was originally posted on 29th June 2011 at http://forums.arm.com

    And here it is: some misplaced code was causing the trouble. The loader had a bug and did not handle small gaps (discontinuities) in the addresses. That put some code in the wrong place, but since every byte had been written to flash, the checksum was still right (it does not depend on where the code is placed).
    It affected only a few bytes of the code (fewer than 128), so it was hard to detect.
    Case solved.
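    A sketch of the loader fix, assuming a hypothetical record format (address, data pointer, length), such as one line of an S-record or Intel HEX file: each record is written at its own address instead of being appended after the previous one, so a small gap in the addresses leaves the following bytes where they belong:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical loader record: len bytes destined for address addr. */
typedef struct {
    uint32_t addr;
    const uint8_t *data;
    size_t len;
} load_record;

/* Places every record at its own offset in an image that starts at image_base.
 * Gaps keep whatever fill the caller put in the buffer (e.g. 0xFF for erased flash).
 * Returns 0 on success, -1 if a record falls outside the image. */
int place_records(uint8_t *image, uint32_t image_base, size_t image_size,
                  const load_record *recs, size_t nrecs)
{
    for (size_t i = 0; i < nrecs; i++) {
        if (recs[i].addr < image_base)
            return -1;
        size_t off = (size_t)(recs[i].addr - image_base);
        if (off > image_size || recs[i].len > image_size - off)
            return -1;
        memcpy(image + off, recs[i].data, recs[i].len);
    }
    return 0;
}
```

    Comparing a CRC of the final flash range (read back by address) with a CRC of the expected image laid out the same way would also catch misplaced bytes; a plain sum over that range might still not, because moving bytes does not change their total.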
  • Note: This was originally posted on 25th June 2011 at http://forums.arm.com

    Are the caches on when your bootloader is copying the application from Flash to RAM?

    If yes, then you should clean the D cache (if it's write-back) and flush the I-cache before you jump to the application, because the copying happens as data.  During the copying the application area needs to be writeable.  Inconsistent cache problems can be hard or impossible to see in a debugger, because a debugger probably reads the code as data which might not match what the processor sees from the I cache.
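    The ordering described above can be sketched as a small planner that lists the cache maintenance needed between copying the application (as data) and jumping to it. The CP15 encodings in the comments are the ARM926EJ-S ones; the enum, function, and flag names are hypothetical, and actually issuing the operations takes a few lines of target-specific assembly not shown here:

```c
#include <stdbool.h>
#include <stddef.h>

/* Cache maintenance steps, with ARM926EJ-S CP15 encodings. */
typedef enum {
    OP_CLEAN_INV_DCACHE,  /* loop: MRC p15,0,r15,c7,c14,3 then BNE loop (test, clean, invalidate) */
    OP_DRAIN_WBUF,        /* MCR p15,0,Rd,c7,c10,4 (Rd should be zero) */
    OP_INV_ICACHE         /* MCR p15,0,Rd,c7,c5,0  (Rd should be zero) */
} cache_op;

/* Fills out[] (room for 3 entries) with the operations to run, in order,
 * after the copy and before the jump. Returns how many were written. */
size_t pre_jump_cache_ops(bool dcache_on, bool dcache_write_back, bool icache_on,
                          cache_op out[3])
{
    size_t n = 0;
    if (dcache_on && dcache_write_back)
        out[n++] = OP_CLEAN_INV_DCACHE;  /* copied code may exist only in dirty D-cache lines */
    out[n++] = OP_DRAIN_WBUF;            /* buffered writes must reach RAM before fetching */
    if (icache_on)
        out[n++] = OP_INV_ICACHE;        /* drop stale instructions for the destination area */
    return n;
}
```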

    Also, when the application starts, are the MMU and caches still on from the bootloader? If so, the application's "reset handler" may need to take that into account; for instance, when you disable the MMU there needs to be code at the same physical address as the virtual address it has just been executing from.
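    If the application does turn the MMU and caches off itself, the control register update and the flat-mapping condition can be sketched like this (ARM926EJ-S CP15 c1 bit positions; the helper names are hypothetical, and the actual MRC/MCR access is left to target code):

```c
#include <stdbool.h>
#include <stdint.h>

/* CP15 c1 control register bits (read: MRC p15,0,Rd,c1,c0,0; write: MCR p15,0,Rd,c1,c0,0). */
#define CTRL_M (1u << 0)   /* MMU enable */
#define CTRL_A (1u << 1)   /* alignment fault checking */
#define CTRL_C (1u << 2)   /* D-cache enable */
#define CTRL_I (1u << 12)  /* I-cache enable */

/* Value to write back to turn off the MMU and both caches, keeping all other
 * bits. Clean the D-cache before doing this. */
static inline uint32_t ctrl_disable_mmu_caches(uint32_t ctrl)
{
    return ctrl & ~(CTRL_M | CTRL_C | CTRL_I);
}

/* True if the first-level descriptor for va is a section that maps va to the
 * same physical address, i.e. the code can keep running once translation is off. */
static inline bool section_is_identity(uint32_t desc, uint32_t va)
{
    return (desc & 3u) == 2u && (desc & 0xFFF00000u) == (va & 0xFFF00000u);
}
```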

    Is your abort a data abort or a prefetch abort? You can only tell the difference by which exception vector was used. If the abort was a data abort caused by the MMU, then R14 will point a couple of instructions past the offending instruction (see the TRM or the ARM ARM for exact details), and more information (such as the address being accessed) is available in the DFAR and DFSR (again, see the docs).
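    A sketch of turning those raw values into something readable in an abort handler, assuming ARM state and the ARMv5 (ARM926EJ-S) fault status encodings; the function and enum names are hypothetical. DFSR is read with MRC p15,0,Rd,c5,c0,0 and the fault address with MRC p15,0,Rd,c6,c0,0:

```c
#include <stdint.h>

typedef enum {
    FAULT_ALIGNMENT, FAULT_TRANSLATION, FAULT_DOMAIN,
    FAULT_PERMISSION, FAULT_EXTERNAL, FAULT_OTHER
} fault_class;

/* ARM state: LR_abt = aborting instruction + 8 for a data abort, + 4 for a prefetch abort. */
static inline uint32_t data_abort_pc(uint32_t lr_abt)     { return lr_abt - 8u; }
static inline uint32_t prefetch_abort_pc(uint32_t lr_abt) { return lr_abt - 4u; }

/* DFSR bits[7:4]: domain of the access (not valid for every fault type, e.g. alignment). */
static inline uint32_t dfsr_domain(uint32_t dfsr) { return (dfsr >> 4) & 0xFu; }

/* DFSR bits[3:0]: fault status. */
fault_class dfsr_class(uint32_t dfsr)
{
    switch (dfsr & 0xFu) {
    case 0x1: case 0x3:             return FAULT_ALIGNMENT;
    case 0x5: case 0x7:             return FAULT_TRANSLATION;  /* section / page */
    case 0x9: case 0xB:             return FAULT_DOMAIN;       /* section / page */
    case 0xD: case 0xF:             return FAULT_PERMISSION;   /* section / page */
    case 0x4: case 0x6: case 0x8:
    case 0xA: case 0xC: case 0xE:   return FAULT_EXTERNAL;     /* linefetch, non-linefetch, translation */
    default:                        return FAULT_OTHER;        /* 0x0 vector, 0x2 terminal */
    }
}
```

    For instance, the R3 value 0x008021A9 reported earlier in this thread is not word aligned, so an LDR through it would show up as an alignment fault if the A bit is set, or as a translation fault if that address is unmapped.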
  • Note: This was originally posted on 26th June 2011 at http://forums.arm.com

    > Still that does not explain why I dont get any trouble when running from the debugger.

    Halting mode debug presents a "debug illusion" to the user which looks like it does the right thing, but the steps to get there are not always exactly what the processor would do if there was no debugger. There are normally a lot of hacks done by the ICE logic around cache flushing to make sure the debugger does the right thing, for example.

    Because of these background operations needed to maintain the illusion, it tends to be rather intolerant of misconfiguration of the MMU / TLB settings.