Baremetal program jumps to 0x200

Hello, I am trying to run a "hello world" program with C/C++ standard library support on a Morello board (hardware), using Arm Development Studio Morello Edition.

I previously followed the standalone-baremetal-readme.rst guide, which worked well (following the advice from this topic), but it did not allow the use of functions like "printf".

I tried to use examples from:

https://git.morello-project.org/morello/llvm-project-releases/-/tree/morello/baremetal-release-1.6?ref_type=heads

I ran make and then the "make-bm-image.sh" script with the "-e" flag to produce "howdy-purecap-bm-image.elf" and "howdy-morello-bm-image.elf" (I added a line to "make-bm-image.sh" to preserve a copy of the .elf file), then I loaded these in Development Studio.

It appears that the program goes to address 0x200 after executing the "MRS" instruction.

Does anyone know why that happens?

Also, in the standalone-baremetal-readme.rst guide it was necessary to specify the UART address (0x2A400000) in the program. Is it correct to assume that the examples from the baremetal-release-1.6 branch of llvm-project-releases will use that address (without the need to specify it anywhere in the program), and that the printf/cout messages will appear on the AP COM port of the Morello hardware board? Or are some adjustments necessary to achieve that?

Parents
  • My previous reply to this message was hidden so it may appear after this one.

    I just realized that connecting to "Rainier_SMP_0" or "Rainierx4 Multi-Cluster SMP" (instead of "Rainier_0") causes the LDR instruction (the first instruction of the ".pure" function that follows "_start") to jump to 0xE0002000 (and then to "curr_sp0_fiq").

    I don't know why, but at least it gives the opportunity to view register values when the issue first happens.

    This is the whole code that executes before the LDR instruction fails:

    After the jump to 0xE0002000, this is the state of the registers/memory:

    ESR_EL2 mentions "Capability tag fault". Do you know what could be the cause of it? Is it because the C0 tag is not equal to 1 for some reason?

Children
  • Ah that is indeed progress. Your first post suggests some kind of stack overflow, as CSP hits its lower bound. Clearly this is a consequence of something else going wrong, probably related to this strange stepping behaviour.

    The second post is a lot more straightforward: the function pointer C0 is null-derived, so BR C0 will cause PCC to become an invalid capability and thus cause an instruction abort right away. Of course the question is why C0 would be null-derived. Assuming the code sequence is correctly executed, this could only happen if DDC itself is null. Could you check the value of DDC_EL2?
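
When reading a syndrome register such as ESR_EL2 by hand, it can help to split it into its architectural fields first. The following is only a minimal sketch, not something taken from the Morello examples: the helper names (`esr_decode`, `esr_ec_name`) and the sample value are hypothetical, and it only covers the generic Armv8-A layout (EC in bits [31:26], IL in bit [25], ISS in bits [24:0], with the fault status code in ISS bits [5:0] for aborts). Morello-specific fault status codes, such as the capability tag fault reported above, still have to be looked up in the Morello architecture documentation.

```c
#include <stdint.h>
#include <stdio.h>

/* Generic ESR_ELx fields (Armv8-A layout). */
typedef struct {
    unsigned ec;   /* Exception Class, bits [31:26] */
    unsigned il;   /* Instruction Length, bit [25] */
    uint32_t iss;  /* Instruction Specific Syndrome, bits [24:0] */
    unsigned fsc;  /* Fault status code, ISS bits [5:0]; only meaningful for aborts */
} esr_fields;

/* Hypothetical helper: split a raw ESR_ELx value into its fields. */
esr_fields esr_decode(uint64_t esr)
{
    esr_fields f;
    f.ec  = (unsigned)((esr >> 26) & 0x3F);
    f.il  = (unsigned)((esr >> 25) & 0x1);
    f.iss = (uint32_t)(esr & 0x1FFFFFF);
    f.fsc = (unsigned)(f.iss & 0x3F);
    return f;
}

/* Hypothetical helper: name the abort-related exception classes only. */
const char *esr_ec_name(unsigned ec)
{
    switch (ec) {
    case 0x20: return "Instruction Abort from a lower Exception level";
    case 0x21: return "Instruction Abort taken without a change in Exception level";
    case 0x24: return "Data Abort from a lower Exception level";
    case 0x25: return "Data Abort taken without a change in Exception level";
    default:   return "other (see the architecture manual)";
    }
}

/* Print a short summary of a raw ESR_ELx value copied from the debugger. */
void esr_print(uint64_t esr)
{
    esr_fields f = esr_decode(esr);
    printf("EC=0x%02X (%s) IL=%u ISS=0x%07X FSC=0x%02X\n",
           f.ec, esr_ec_name(f.ec), f.il, (unsigned)f.iss, f.fsc);
}
```

Calling `esr_print()` with the ESR_EL2 value shown in the register view gives the exception class and the raw fault status code, which tells an instruction abort (as expected after a branch to an untagged capability) apart from a data abort on the LDR itself.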

  • Right, that'll be your problem. How that came to pass, I have no idea... The first step would be to check if it is valid at the very beginning of the execution. If not, there must be something going wrong with the firmware.

  • I rebooted the board and it seems that DDC_EL2 is no longer null at the beginning of execution, and the LDR instruction now executes correctly.

    I wrongly assumed that changing "Rainier_0" to "Rainier_SMP_0" or "Rainierx4 Multi-Cluster SMP" made the LDR fail. I think it was a coincidence, because switching between these 3 options no longer makes the LDR fail (following the reboot, which fixed DDC_EL2 being 0).

    But the issue where the program enters "curr_sp0_fiq" recursively after a function call is stepped over is still there.

    I set hardware breakpoints on "curr_sp0_fiq" and 0xE0002200, used F5 to step until the first function call (which was _cpu_init_hook), and pressed F6 to step over it. The breakpoint was triggered, and the register values are:

    I tried expanding the column and copying the text to see if there was anything after "following...", but there was nothing. This is what it looks like in the tooltip:

    DDC_EL2 seems to keep the same value as at the beginning of execution. I think there may be 2 separate issues: DDC sometimes being 0 (which gets fixed by rebooting the board), and this unidentified issue when stepping over a function call. ELR_EL2 points to the instruction just after the _cpu_init_hook call.

    I did another experiment, where I pressed the "continue" button (from the beginning of the "_start" function) instead of using F5 to reach the first function call. Interestingly, in this case the breakpoint is hit with ELR_EL2 holding a much higher value (part of the _get_s function), and DDC_EL2 becomes 0.

    Apologies for bombarding you with all these screenshots/reports, but it's all black magic to me, and I can't understand why such weird issues occur.