hard fault caused by scatter loader decompress routine

I am working with a Microsemi Cortex-M3 ARM using uVision, a ULINK2 and MDK 5.12, and I am getting a PRECISERR bus fault from a memory read during initialization by the scatter loader decompress routine. It is definitely trying to read a bad address. This code is created by the linker to initialize my memory/variables, so it's not something I have direct control over.

Here is the goofy thing: I found that I could get past the hard fault once if, after hitting it, I get out of the debugger and right back in. Then I can run as long as I don't try to restart by clicking the reset button in uVision.

I don't understand why this would make any difference, but it does, and it has allowed me to work for the past few weeks, although it's slow and awkward.

This weekend my luck ran out, and now I get that hard fault every single time I try to run. My trick no longer works. All I did just before this started was add 1 line of code, recompile and reload.

Here is an interesting clue. Out of desperation I recompiled my entire project and reflashed, and I was able to get to main! All I did was recompile. I did some testing, had to make one minor change to an index, recompiled, and now I am back to getting that hard fault every time. None of my tricks work either.

It's still coming from the same loop in the decompress routine, where it reads an address just below my DDR memory: 0x9FFFFEFF.

It really sounds like some kind of memory alignment issue or something like that, but the clues do not seem to be related.

Why would adding or removing a few bytes from my code cause the decompress routine to hard fault?
Why did recompiling the project allow me to get past the decompress problem once?
Why does clicking reset and trying to run the same code cause the decompress to suddenly not work?
Why would getting out of the debugger and back in allow it to do the decompression correctly one time?

Arrrgg.

I am totally stopped from doing any more development because the decompression code always causes a hard fault from the same address and memory access.

Has anyone experienced anything like this before?

steve kerich over 11 years ago in reply to Westonsupermare Pier

Hi

I have flashed it into memory and let it run, and it still gets a hard fault.

I have been doing some testing/investigating, and I think I was off in thinking it was the compiler setting. But there is something interesting going on. I wish I could post a screenshot, because that would make clear what I have found.

When the debugger starts, it writes to DDR, and ESRAM also contains the exact same data in the same order. I believe decompress is copying the data in ESRAM to the DDR, writing what will be my initialized data. It writes this to 0xA0000000 (mixed hex and ASCII):

0x0 0xD 0x1A SD_NO_ERROR . .. .

My memory map says that this address should hold my error array, whose first value is SD_NO_ERROR (ASCII). The 0 is the offset to the current address to write, the 0xD is the length of the string, and the 0x1A is ???. There are 3 hex bytes in front of every string.

I can run to main, and this is what the code initialized the DDR memory to:

SD_NO_ERROR .....

This is exactly correct. What the debugger wrote was ignored and overwritten, as it should be.

Now, if I click reset and go to decompress, I see that the 3 bytes before the string are missing from ESRAM, so when it loads the offset to write it gets 0x53 ("S") instead of 0, and a count of 3 bytes instead of 0xD:

D_NO_ERROR

If I let it continue with these bad values, it eventually causes a hard fault because it tries to write to address 0x9FFFFB5, which is below my DDR.

If I reload with the debugger and run to the start of decompress, the ESRAM looks good, with those 3 bytes before each string. I can reset over and over, and every time it looks the same and correct.

But if I allow it to write just the first string, SD_NO_ERROR, to DDR and then do a reset and run to the start of decompress, those 3 bytes are missing from ESRAM: SD_NO_ERROR is at 0x2000000 instead of those 3 bytes followed by the string.

Why writing the string to DDR would cause whatever initializes the ESRAM with the string values to be off by 3 bytes is a mystery. The code is in NVM, and so should be these initial values that are put into ESRAM, so writing them out to DDR should have no effect.

Getting out of the debugger and back in somehow fixes this data in ESRAM, because after doing that and running to decompress, the ESRAM is correct again.

The debugger is doing something to get that data into ESRAM correctly, although that should all be done by the code itself.

This is why I thought it might be a compiler switch issue, but now it looks like the act of writing the data to DDR shifts what is loaded on the next reset by 3 bytes.

I'll look further back into the scatter loader to see how that data is being put into ESRAM.

steve kerich over 11 years ago in reply to steve kerich

OK, I found where the data being put into ESRAM is coming from. The scatter loader is called first, and it calls scatter loader_null, which, believe it or not, copies the values in the DDR into the ESRAM!!!!! Holy cow. The DDR has either just been written by the debugger with the correct init values or was overwritten with the initialization values.

On the second run (after reset) and all later runs, the code will continue to copy the final initialized data from DDR into ESRAM and then copy the ESRAM data BACK INTO DDR. The 3 bytes at the beginning of each string are long gone. They are only there on the first run, when the debugger puts them into DDR. All runs of the code after that pick up whatever the DDR was initialized to.

There is no offset, string length or any other information to tell the decompress routine how much to copy and where to. It's just chaos at this point.

So the code is definitely depending on the debugger to put this proper data into DDR, so that it can be copied to ESRAM and then decompressed back into DDR memory minus the 3 bytes of string information.

This cannot possibly be how this ARM processor and code will run operationally. It cannot depend on a debugger to put that information in DDR memory.

There should be some code created by the linker to do this, not the debugger. Obviously this code is not being generated by the linker, so the proper initialization information is not made available to the decompress routine.

How does one turn that on? I need some help here because I have no idea why this missing code is not being created.

ImPer Westermark over 11 years ago in reply to steve kerich

So exactly what does your scatter file look like?

Sounds a bit like some data that in a normal build should be stored in flash is in your build stored in a RAM region (downloaded by the debugger) that will be overwritten when the program starts to run - so a second run will basically have "parts of the 'nonvolatile flash' destroyed".
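The effect ImPer describes can be illustrated with a toy host-side C model. This is only a sketch under stated assumptions: the `target_t` struct, the offsets, the value 42 and the functions `download`, `startup`, `run_app` and `value_after_boot` are all hypothetical, and real scatter-loading code (which may also decompress RW data) is far more involved. The only point is that an initial value whose load address lies in RAM survives just until the application reuses that RAM, while one stored in flash survives every reset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy target. RAM keeps its contents across a debugger
   reset because power is never cycled. */
typedef struct {
    uint8_t flash[16];
    uint8_t ram[32];
} target_t;

enum {
    VAR_EXEC = 0,  /* execution address of one initialized variable (RAM) */
    RAM_LOAD = 16  /* load address of its initial value when linked into RAM */
};

/* Debugger download: writes the initial value 42 into the load region. */
static void download(target_t *t, int load_in_ram)
{
    memset(t, 0, sizeof *t);
    if (load_in_ram)
        t->ram[RAM_LOAD] = 42;  /* only the debugger ever puts it here */
    else
        t->flash[0] = 42;       /* programmed once, survives every reset */
}

/* Startup code: copy the initial value from load to execution address. */
static void startup(target_t *t, int load_in_ram)
{
    const uint8_t *load = load_in_ram ? &t->ram[RAM_LOAD] : &t->flash[0];
    t->ram[VAR_EXEC] = *load;
}

/* Application: uses the variable and also writes other data into RAM
   that overlaps the RAM-resident copy of the initial values. */
static void run_app(target_t *t)
{
    t->ram[VAR_EXEC] += 1;
    t->ram[RAM_LOAD] = 7;
}

/* Value of the variable right after startup on boot number `boots`
   (1 = first boot after download; later boots are resets without a
   new download). */
uint8_t value_after_boot(int load_in_ram, int boots)
{
    target_t t;
    uint8_t v = 0;

    assert(boots >= 1);
    download(&t, load_in_ram);
    for (int i = 0; i < boots; i++) {
        startup(&t, load_in_ram);
        v = t.ram[VAR_EXEC];
        run_app(&t);
    }
    return v;
}
```

With the load image in flash, every boot sees 42. With it in RAM, the first boot sees 42 and the second sees 7, which mirrors the "works once after a fresh download, fails after reset" symptom in this thread.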

Tamiryan Michael over 11 years ago in reply to ImPer Westermark

Exactly - show us your scatter file. The problem is there, or in your hardware.

steve kerich over 11 years ago in reply to Tamiryan Michael

OK, that would make total sense. I am sure I have accidentally put that initial data into the DDR memory space through the use of a wildcard (*).

I didn't even think about that. I assumed that the data would be an integrated part of the code and not a separate data item. I just put every RW section into DDR. I assumed that this static data would be RO only. It should be RO data and not RW; I think treating it as RW would be a misclassification of that type of data.

Do you know what it would be called? It's the .data section, right? And .bss is my read/write variables?

0xA0000000 is where I found the scatter loader loading the initialization data, and that is where the .data sections are grouped together.

FLASH_LOAD 0x00000000 0x00080000      ; load region size_region
{
    ER_RO 0x00000000 0x80000    ; load address = execution address
    {
        *.o (RESET, +First)
        *(InRoot$$Sections)
;       startup_m2sxxx.o (.text)
        system_m2sxxx.o (.text)
;       sys_config.o (.text)
        low_level_init.o (.text)
        retarget.o (.text)
        * (+RO)
    }

    ER_RW 0x20000000 UNINIT 0x10000
    {
        startup_m2sxxx.o (STACK)
    }

}

MDDR_RAM 0xA0000000 0x10000000
{
    ER_DDR 0xA0000000 UNINIT 0x10000000  ; RW data  DDR
    {
        * (+RW +ZI)
        * (HEAP)
    }

}

Base Addr    Size         Type   Attr   Idx   E   Section Name   Object

0xa0000000   0x00001c8e   Data   RW       7       .data          accumulationnode.o
0xa0001c8e   0x00000002   Data   RW     535       .data          statemachine.o
0xa0001c90   0x00000004   Data   RW     777       .data          processidle.o
0xa0001c94   0x00000001   Data   RW    1008       .data          timerirq.o
0xa0001c95   0x00000001   PAD
0xa0001c96   0x00000002   Data   RW    1069       .data          dummy.o
0xa0001c98   0x000001e4   Data   RW    1204       .data          diskio.o
0xa0001e7c   0x00000006   Data   RW    1365       .data          ff.o
0xa0001e82   0x00000002   PAD
0xa0001e84   0x00000055   Data   RW    1435       .data          uart.o
0xa0001ed9   0x00000003   PAD
0xa0001edc   0x0000001c   Data   RW    1603       .data          system_m2sxxx.o
0xa0001ef8   0x00000004   Data   RW    1672       .data          retarget.o
0xa0001efc   0x00000008   Data   RW    1888       .data          mss_can.o
0xa0001f04   0x00000018   Data   RW    2047       .data          mss_pdma.o
0xa0001f1c   0x0000002c   Data   RW    2121       .data          mss_comblk.o
0xa0001f48   0x00004d34   Zero   RW       5       .bss           accumulationnode.o
0xa0006c7c   0x0000000c   Zero   RW    1203       .bss           diskio.o
0xa0006c88   0x000002d0   Zero   RW    1504       .bss           fault_handler.o
0xa0006f58   0x00000018   Zero   RW    1950       .bss           mss_hpdma.o

steve kerich over 11 years ago in reply to steve kerich

Moving the .data sections to NVM solved the problem. I can run to main every time now, and decompress does not even get called. I could not see the forest for the trees.

Thank you very much for your help in leading me to the answer.

Steve

Westonsupermare Pier over 11 years ago in reply to steve kerich

Isn't 0xA0000000 the base of your DDR? Why the heck are you copying stuff there, or allowing the areas to conflict?

Stuff that gets unpacked by the linker/loader needs to reside INSIDE the ROM/FLASH LOAD REGION braces.

If you are remapping 0 <-> 0xA0000000, you need to carve that space out of the linker's view so it doesn't put data over the top of it.

steve kerich over 11 years ago in reply to Westonsupermare Pier

For this build I am not remapping. We needed to get this out by our deadline this Friday, so I created a smaller, non-remapped NVM version of my code. So right now I am only using DDR for my variables and heap: just a huge hunk of RAM.

Once that delivery is done, I will go back and work on the remapped version, and you're right: my code and the variables will both be in DDR memory. I have a different scatter file for that mapping.

Since fixing this initialization problem by moving the .data sections to NVM, my mallocs no longer allocate memory. They were working OK before this change.

Do you know if there is anything in the .data section that would affect malloc?

Westonsupermare Pier over 11 years ago in reply to Westonsupermare Pier

A LOAD REGION is like a box of furniture from IKEA: all the parts need to fit inside the box for it to be shipped to you, and that is the linker's job. The loader's job is then to unpack the parts and assemble the pieces where you want the final piece of furniture to end up. The separate parts cannot occupy the same time/space as each other without distorting the fabric of the universe.

Westonsupermare Pier over 11 years ago in reply to Westonsupermare Pier

FLASH_LOAD 0x00000000 0x00080000      ; load region size_region
{
    ER_RO 0x00000000 0x80000    ; load address = execution address
    {
        *.o (RESET, +First)
        *(InRoot$$Sections)
;       startup_m2sxxx.o (.text)
        system_m2sxxx.o (.text)
;       sys_config.o (.text)
        low_level_init.o (.text)
        retarget.o (.text)
        * (+RO)
    }

    ER_RW 0x20000000 UNINIT 0x10000
    {
        startup_m2sxxx.o (STACK)
    }

;   ER_DDR 0xA0080000 0xFF80000  ; RW data  DDR (if FLASH gets copied into DDR space)

    ER_DDR 0xA0000000 0x10000000  ; RW data  DDR (if DDR doesn't clash with code space)
    {
        * (+RW +ZI)
        * (HEAP)
    }
}
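What the loader does for an execution region such as ER_DDR can be sketched in C. This is an illustration only: on the target, the C library's scatter-loading code run from __main already performs this step, and when armlink compresses RW data the load image is not a plain byte copy. `init_region` is a hypothetical helper; on the target its arguments would typically come from armlink's linker-defined symbols such as `Load$$ER_DDR$$Base`, `Image$$ER_DDR$$Base`, `Image$$ER_DDR$$RW$$Length` and `Image$$ER_DDR$$ZI$$Length`. The sketch also assumes the ZI data follows the RW data directly in the region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: initializes one execution region.
   RW bytes are copied from the load image (e.g. in flash),
   and the ZI bytes that follow them are cleared. */
void init_region(uint8_t *exec_base, const uint8_t *load_base,
                 size_t rw_len, size_t zi_len)
{
    uintptr_t load = (uintptr_t)load_base;
    uintptr_t exec = (uintptr_t)exec_base;

    /* The load image must not lie inside the region it is about to
       initialize: this is the overlap that caused the problem here. */
    assert(load + rw_len <= exec || exec + rw_len + zi_len <= load);

    memcpy(exec_base, load_base, rw_len);   /* initial values of .data */
    memset(exec_base + rw_len, 0, zi_len);  /* .bss is zero-filled */
}
```

The assertion encodes the IKEA rule: the packed parts (the load image inside FLASH_LOAD) and the assembled furniture (the execution region in DDR) must not occupy the same space.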

steve kerich over 11 years ago in reply to Westonsupermare Pier
Thanks. This is what my remap scatter file looks like.

Unfortunately, this problem with initialized variables is still a problem. I know that moving the .data sections into NVM solves the problem of having the initial values in nonvolatile memory; it does work, and it does not depend on the debugger to load the values, because they are part of NVM now.

But it seems that the linker is also placing the variables' addresses in NVM, when they must be in an RW area like ESRAM or DDR. The linker is moving all variable addresses to NVM, even for variables that are not initialized at boot time.

I found this out when, very early on, one of my routines attempted to write to a variable and got a hard fault; when I looked at the map, the variable was located in flash memory. It was part of the .data section. In fact, all variables are part of the .data section.

But if I move the .data section back to DDR, I am back to the original problem, because the initialized data is put back into DDR as well. I cannot see a way to separate the address of a variable from the values used to initialize it.

The values should be in NVM, where the scatter loader can read them and copy them into DDR, where the variable is located. The .data section seems to apply to both, which is a problem and does not make sense either.

There is a .constdata section, but it does not include the initialized variables, which are in .data.

How am I to get the initial data to nvm and the variable to RW DDR?

This is my current scatter file:

FLASH_LOAD 0x00000000 0x00080000 ; load region size_region
{
    ER_RO 0x00000000 0x40000 ; load address = execution address
    {
        *.o (RESET, +First)
        *(InRoot$$Sections)
        * (+RO)
        * (.data)
    }
    ER_RW 0x20000000 UNINIT 0x10000
    {
        startup_m2sxxx.o (STACK)
    }
}
MDDR_RAM 0xA0000000 0x10000000
{
    ER_DDR 0xA0000000 UNINIT 0x10000000 ; RW data DDR
    {
        * (+RW +ZI)
        * (HEAP)
    }
}

Westonsupermare Pier over 11 years ago in reply to steve kerich

I'll have to ponder, but if your DDR is properly initialized prior to calling __main, the C runtime / scatter loader should be able to initialize the data there.

I don't understand why your platform is so broken that this isn't being done already. Or how getting this to work properly isn't going to solve both your Friday issue, and the shadow/remap issue.

Why do you still have MDDR_RAM? You need to package ALL of the released image components in the FLASH.

Westonsupermare Pier over 11 years ago in reply to Westonsupermare Pier

In case the IKEA concept isn't working for you
www.keil.com/.../armlink_pge1362075661087.htm

steve kerich over 11 years ago in reply to Westonsupermare Pier

Yes, I do understand this already. Thank you.

The problem that I still have is that my initialized const char* strings and my initialized global variables are all in the .data section.

If I locate the .data section in NVM, then my global variables are not writable and I get a hard fault. If I put the .data section into DDR memory, then the initial values for my const char* strings are read (or rather, an attempt is made to read them) from DDR, but the initial data will not be there unless I run the debugger, which loads it there. That will not work operationally. The initial data must be in NVM, but that drags all of my global variables along with it.

I do not know how to get that initial data into nvm without having it take the initialized globals too.

from wiki:
The data area contains global and static variables used by the program that are explicitly initialized with a non-zero (or non-NULL) value. This segment can be further classified into a read-only area and read-write area. For instance, the string defined by char s[] = "hello world" in C and a C statement like int debug=1 outside the "main" would be stored in initialized read-write area. And a C statement like const char* string = "hello world" makes the string literal "hello world" to be stored in initialized read-only area and the character pointer variable string in initialized read-write area.
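The cases in that quote can be summarized in C. The comments give the kind of data each object typically becomes with armcc in a non-position-independent build (section names differ between toolchains, and GCC, for example, uses .rodata for read-only data), so treat them as a guide and check the map file. The first four names follow the quote; `counters` and `touch_rw_data` are hypothetical additions.

```c
#include <assert.h>
#include <string.h>

int debug = 1;                           /* initialized, writable: RW (.data)        */
char s[] = "hello world";                /* writable copy of the text: RW (.data)    */
const char *string = "hello world";      /* pointer: RW (.data); literal: RO data    */
const char *const fixed = "hello world"; /* pointer: RO (.constdata); literal: RO    */
int counters[4];                         /* zero-initialized: ZI (.bss)              */

/* Writes to every writable object above. `fixed` cannot be reassigned,
   which is what allows it to live in flash. */
void touch_rw_data(void)
{
    debug = 0;
    s[0] = 'H';
    string = "goodbye";
    counters[0]++;
}
```

The key line is `fixed`: the second `const` is what moves the pointer itself out of RW data, so nothing in that object needs an initial-value copy in RAM.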

steve kerich over 11 years ago in reply to steve kerich
This is an example of one of the structures that causes the problem. It should be in NVM as a constant value. Instead, if I put the .data section in DDR so that I can use my global variables, it is located in DDR along with its initialization information.

If I put .data in NVM, then this works fine, but none of my globals are writable.

If I can get these into a read-only section, then I can locate them in NVM. I would think there is some directive I can add to the declaration to force it into a specific section. Maybe.

const char* ProcessProfile_errors_Tran[END_PROCESSPROFILE_ERROR][40] = {
    {"MESSAGE_SUCCESS"},
    {"FAILED_TO_PROCESS"},
    {"FAILED_TO_SEND_MESSAGE"},
    {"FAILED_TO_RECEIVE_MESSAGE"},
    {"FAILED_INVALID_MESSAGE"},
    {"FAILED_TO_SYNC"},
    {"FAILED_NO_ACTUATOR"},
    {"FAILED_ACTUATOR_SETUP"},
    {"TEST_USER_TERMINATED"},
    {"FAILED_TO_FIND_NODE_INDEX"}
};
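As written, that declaration is an END_PROCESSPROFILE_ERROR by 40 array of pointers: each inner brace initializes the first pointer of a row and the other 39 are NULL. The pointers themselves are not const, so the whole object is writable and is emitted as RW (.data), even though the strings it points to are read-only. A minimal sketch of a const-correct version follows, assuming the enum has exactly the ten entries listed (the `enum` below and both helper functions are hypothetical). With the pointers declared `const`, the compiler can treat the table as read-only data, which the existing `* (+RO)` selector would typically place in flash. If fixed-width storage is preferred, `const char name[END_PROCESSPROFILE_ERROR][40]` would store the characters inline instead.

```c
#include <assert.h>
#include <string.h>

/* Assumption: stand-in for the original enum, with exactly ten entries. */
enum { END_PROCESSPROFILE_ERROR = 10 };

/* Array of const pointers to const strings: nothing here is writable. */
const char *const ProcessProfile_errors_Tran[END_PROCESSPROFILE_ERROR] = {
    "MESSAGE_SUCCESS",
    "FAILED_TO_PROCESS",
    "FAILED_TO_SEND_MESSAGE",
    "FAILED_TO_RECEIVE_MESSAGE",
    "FAILED_INVALID_MESSAGE",
    "FAILED_TO_SYNC",
    "FAILED_NO_ACTUATOR",
    "FAILED_ACTUATOR_SETUP",
    "TEST_USER_TERMINATED",
    "FAILED_TO_FIND_NODE_INDEX"
};

/* Hypothetical helper: name for an error code, with a bounds check. */
const char *profile_error_name(int code)
{
    if (code < 0 || code >= END_PROCESSPROFILE_ERROR)
        return "UNKNOWN";
    return ProcessProfile_errors_Tran[code];
}

/* Hypothetical helper: code for a name, or -1 if it is not in the table. */
int profile_error_code(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < END_PROCESSPROFILE_ERROR; i++)
        if (strcmp(ProcessProfile_errors_Tran[i], name) == 0)
            return i;
    return -1;
}
```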

steve kerich over 11 years ago in reply to steve kerich

This will force it:

__attribute__ ((section ("INITDATA")))
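A sketch of how that attribute could be combined with a const-correct table. The section name INITDATA comes from the post, but the attribute only names the section: the scatter file still has to place it, for example with a `* (INITDATA)` line inside ER_RO (an assumption about how it would be wired up). If the object stays non-const, anything that writes to it will fault once it lives in flash, and once it is declared `const char *const` the attribute is usually unnecessary because the table is already read-only data. `PLACE_IN_SECTION` and `example_errors` are hypothetical names; the macro expands to nothing on toolchains where a plain section name is not accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Apply the attribute only where a plain section name is accepted
   (armcc, or GCC/Clang targeting ELF); otherwise expand to nothing. */
#if defined(__ARMCC_VERSION) || defined(__ELF__)
#define PLACE_IN_SECTION(name) __attribute__((section(name)))
#else
#define PLACE_IN_SECTION(name)
#endif

/* Hypothetical table: the strings and the pointers are both const,
   so nothing in it is ever written at run time. */
PLACE_IN_SECTION("INITDATA")
const char *const example_errors[] = {
    "MESSAGE_SUCCESS",
    "FAILED_TO_PROCESS"
};

/* Number of entries, computed from the initializer. */
#define EXAMPLE_ERROR_COUNT (sizeof example_errors / sizeof example_errors[0])
```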