GNU GCC option to enforce 8-byte stack alignment (necessary for R52)?

Hello,

ARM support and the R52 TRM have indicated that the R52 core requires an 8-byte aligned stack (meaning the compiler must always push/pop registers in even numbers), but in my current setup the GNU GCC compiler is not adhering to that requirement. In my core simulations I see erroneous R52 behavior if the stack is not maintained that way, especially when only a single register is pushed/popped for a function.

Am I missing some GNU GCC option to force 8-byte stack alignment? Also, since ARM support stated this was a necessary requirement, shouldn't it be applied automatically by the -mtune=cortex-r52 option?

My current setup uses the ARM-recommended version gcc-arm-none-eabi-9-2020-q2-update (according to the GNU toolchain developer website)

(I have also tested the version gcc-arm-none-eabi-10-2020-q2-preview, still seeing same problems)

My GCC command has the following options:

gcc-arm-none-eabi-10-2020-q2-preview/bin/arm-none-eabi-gcc -march=armv8-r -mfpu=neon-fp-armv8 -mtune=cortex-r52 -marm -c -g -O3 -fno-inline -fno-strict-aliasing -DGCC -falign-functions=16 -falign-jumps=8 -falign-loops=8 -fomit-frame-pointer -funroll-loops -mapcs-frame -DITERATIONS=20 -save-temps -DCR52 -Werror -Wall -Dcr52 -DCORE_0 -std=c99 -o alive_CORE_0.o -c alive.c

With the command above, my compile results in an odd number of registers being pushed onto the stack (see the snippet from the .lst file):

int main (void) {
2a980: e92dd810 push {r4, fp, ip, lr, pc}   <<<=== ODD NUMBER OF REGISTERS IN PUSH INSTRUCTION

NOTE: the problem also occurs if I use -mthumb mode.

Other notes: I added -fno-inline following an internal verification decision to avoid code inlining. I also added -mapcs-frame in an attempt to fix this issue; this option seems to help a lot, but doesn't completely fix the problem.

Or is this a GNU GCC bug for the -mtune=cortex-r52 set of tuning options?

Thanks in advance for any help.

  • and R52 TRM have indicated that the R52 core requires maintaining an 8-byte aligned Stack

    I wasn't able to find anything in the TRM that states this requirement. Could you please point out the location?

    meaning compiler shall always push/pop registers in even numbers

    If the 8-byte alignment requirement refers to the ABI defined by AAPCS32, then note that the ABI requires 4-byte alignment at all times, with the additional restriction of 8-byte alignment at public interfaces.

    For example, the code here is compiled to include these instructions:

            push    {r4, r5, r6, fp, ip, lr, pc}
            sub     fp, ip, #4
            sub     sp, sp, #52

    Assuming that sp is 8-byte aligned at the entry of the function, pushing 7 registers (28 bytes) causes sp to be misaligned, but the subsequent adjustment, subtracting 52, brings the total to 80 bytes and so aligns sp back to an 8-byte boundary, all before the call to an external routine, printf.

    If an exception/interrupt arrives while the stack is misaligned (i.e. isn't on an 8-byte boundary), and the handler uses this same sp register and expects it to be 8-byte aligned, that could be a problem. I am guessing, though, that this situation must already have been handled either in the ABI, in the way the handlers are written, or in the way the sp register is chosen/banked and set up.
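
The alignment arithmetic in the reply above can be checked with a small host-side sketch. This is only an illustration under stated assumptions: the function names and the entry sp value are hypothetical, and `align_down8` shows one generic way a handler could realign sp before calling C code, not what any particular handler or the ABI actually does.

```c
#include <assert.h>
#include <stdint.h>

/* Stack pointer after pushing `nregs` 32-bit registers onto a
 * full-descending stack (each register takes 4 bytes). */
uint32_t sp_after_push(uint32_t sp, unsigned nregs)
{
    assert(sp >= 4u * nregs);   /* guard against wrap-around in this model */
    return sp - 4u * nregs;
}

/* Nonzero if `sp` sits on an 8-byte boundary. */
int is_aligned8(uint32_t sp)
{
    return (sp & 7u) == 0;
}

/* Round `sp` down to the nearest 8-byte boundary. Hypothetical helper:
 * a handler that needs 8-byte alignment could apply an adjustment like
 * this and remember the original value so it can be restored on exit. */
uint32_t align_down8(uint32_t sp)
{
    return sp & ~UINT32_C(7);
}
```

With an assumed entry value of sp = 0x20001000, `sp_after_push(sp, 7)` gives 0x20000fe4, which is only 4-byte aligned; subtracting the further 52 bytes gives 0x20000fb0, which is 8-byte aligned again.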

  • Follow-up on original issue:

    1) I had misinterpreted the "ABI compliant stack alignment" requirement and incorrectly inferred an 8-byte alignment requirement from it.

    2) The issue I was seeing in simulations tended to show up most often when the SP started on an odd-word boundary (hence the original focus on stack alignment). Those stack transactions to memory resulted in sparse write transactions, and the root cause was a bug in my slave-memory behavioral model that dropped sparse writes.

    Thanks for the clarification, which helped correct my misunderstanding.
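
For readers who hit a similar problem, here is a minimal sketch of how a behavioral memory model might apply a sparse write. The thread does not describe the actual model, so everything here is an assumption: a 64-bit data bus, one strobe bit per byte lane (in the style of AXI WSTRB), and the hypothetical names `mem_model_t` and `mem_write_beat`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUS_BYTES 8u    /* assumed 64-bit data bus */
#define MEM_BYTES 64u   /* tiny hypothetical memory, just for illustration */

typedef struct {
    uint8_t bytes[MEM_BYTES];
} mem_model_t;

/* Apply one write beat. `addr` must be bus-aligned; bit i of `strobe`
 * enables byte lane i. Lanes whose strobe bit is clear keep their old
 * contents. Returns 0 on success, -1 for a bad address. */
int mem_write_beat(mem_model_t *m, uint32_t addr,
                   const uint8_t data[BUS_BYTES], uint8_t strobe)
{
    assert(m != NULL);
    if (addr % BUS_BYTES != 0 || addr + BUS_BYTES > MEM_BYTES)
        return -1;
    for (unsigned lane = 0; lane < BUS_BYTES; lane++) {
        if (strobe & (1u << lane))
            m->bytes[addr + lane] = data[lane];
    }
    return 0;
}
```

In this sketch, a 4-byte store to an address that is word-aligned but not 8-byte aligned would arrive as a beat with strobe 0xF0 (only the upper four lanes enabled). A model that accepts only full-width beats, or that discards partially strobed ones, would lose exactly that kind of stack write.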