Cortex A9 (IMX6) : Enabling branch prediction aborts

Hello,

I am using an i.MX6 (Cortex-A9) board, and my MMU and cache configuration is as follows:
            MMU - enabled
            L1 data cache - enabled
            L1 instruction cache - enabled
            D-side prefetch - enabled
            L2 cache - disabled
            Branch prediction - disabled

With this configuration, my code runs for a long time (more than 12 hours) without any issues.
But if I enable branch prediction, the code aborts within a few seconds, at random addresses.
I do not understand what is going wrong.
Are there any special options that I have to consider when building my code?
I am using GCC as the compiler.
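
For reference, on the Cortex-A9 program flow prediction is controlled by the Z bit (bit 11) of the CP15 System Control Register (SCTLR). Below is a minimal sketch of one way to enable it, assuming bare-metal privileged code built with GCC. The function names are hypothetical, and this is not necessarily how the code in question enables it. The sketch invalidates the branch predictor array (BPIALL) before setting the Z bit and follows the register write with an ISB. The CP15 part is compiled only for ARM targets; the pure helper exists so the bit arithmetic can be checked on a host.

```c
#include <stdint.h>

/* SCTLR bit positions (ARMv7-A). */
#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_C (1u << 2)   /* data cache enable */
#define SCTLR_Z (1u << 11)  /* program flow (branch) prediction enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

/* Hypothetical helper: returns the SCTLR value with the Z bit set. */
static inline uint32_t sctlr_with_branch_prediction(uint32_t sctlr)
{
    return sctlr | SCTLR_Z;
}

#if defined(__arm__)
/* Hypothetical name; must be called in a privileged mode. */
static inline void enable_branch_prediction(void)
{
    uint32_t sctlr;
    uint32_t zero = 0;

    /* BPIALL: invalidate the entire branch predictor array. */
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 6" : : "r"(zero) : "memory");
    __asm__ volatile("dsb" ::: "memory");
    __asm__ volatile("isb" ::: "memory");

    /* Read SCTLR, set the Z bit, write it back. */
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(sctlr));
    sctlr = sctlr_with_branch_prediction(sctlr);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(sctlr) : "memory");

    /* Make the SCTLR change visible to the following instructions. */
    __asm__ volatile("isb" ::: "memory");
}
#endif
```

If the existing startup code sets the Z bit without invalidating the predictor first, that would be one thing to compare against this sketch.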


Note: My code links against a static library that was supplied by our vendor.

Any help would be great.

Thanks,

Gopu

 

  • Things to try:

    1. When did you start to see this abort issue?
    Is it due to recent software changes?

    Try checking out older code to see whether you can reproduce the issue
    (do a binary search over the commits to find out which one triggered it).
    When you review your software changes, pay attention to the following:
    a. uninitialized variables
    b. shared variables that need to be volatile (for example, declare the pointer as volatile uint32_t * if needed)
    c. missing ISB, DSB, or DMB barriers, if needed (the CPU executes out of order)


    2. Enable the L2 cache and keep the rest of the old configuration (core @ 1 GHz).
    Disable branch prediction (abort or not? re-test and confirm this).
    Enable branch prediction (abort or not? re-test and confirm this).
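
    The volatile and barrier points in the review checklist above can be sketched as follows. This is only an illustration under assumptions: the mailbox names are hypothetical, the producer is imagined to be an interrupt handler or another core, and the host fallback uses GCC's __sync_synchronize() builtin so the logic can be exercised off-target.

```c
#include <stdint.h>

#if defined(__arm__)
#define DMB() __asm__ volatile("dmb" ::: "memory")
#else
#define DMB() __sync_synchronize()  /* host fallback: full memory barrier */
#endif

/* Shared between producer and consumer, so both are volatile. */
static volatile uint32_t mailbox_data;
static volatile uint32_t mailbox_ready;

/* Producer side (e.g. an interrupt handler): publish a value. */
static void mailbox_post(uint32_t value)
{
    mailbox_data = value;
    DMB();                  /* payload must be visible before the flag */
    mailbox_ready = 1u;
}

/* Consumer side: returns 1 and stores the value if one is pending, else 0. */
static int mailbox_poll(uint32_t *out)
{
    if (mailbox_ready == 0u)
        return 0;
    DMB();                  /* read the payload only after seeing the flag */
    *out = mailbox_data;
    mailbox_ready = 0u;
    return 1;
}
```

    Without volatile, the compiler may keep mailbox_ready in a register inside a polling loop; without the DMB, the two stores may become visible to another observer in the opposite order.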
  • Today I tried reducing the core clock to 800 MHz. With this, there is no data abort. Does this mean that

    when DDR3 is accessed by the ARM core at 1 GHz, the data read by the core is faulty?
    Should I analyze further by reducing the DDR3 clock from 528 MHz to 520 MHz instead of reducing the core clock?

    I will check your suggestions as well.
  • Hello,

    I have verified the software and could not find any bug in it. I have been doing this analysis for more than a month now.
    One thing I have noticed: with branch prediction enabled, the system used to abort within a few seconds. I then kept branch prediction enabled and reduced the ARM core clock from 996 MHz to 800 MHz; now the system runs for more than 24 hours without aborting.
    My DDR runs at 528 MHz. Since it is DDR, the expected data rate is 528 MHz * 2 (1056 MT/s).

    Thanks and regards,
    Gopu
  • The i.MX6 seems to be sensitive to your software. Did you report this issue to the SoC vendor? Why have other i.MX6 users not seen the same issue? Is there any special software sequence that can trigger it? Could you share your interrupt context-switch code on GitHub? Who makes your board? If the board does not provide a clean reference clock to the i.MX6, this could be a board design issue too. Still, it is hard to draw a conclusion without the source code and the board schematic, and without consulting the SoC vendor and the board designer.
  • FYI: Some embedded firmware code is sensitive to timing and ordering.

    For example:
    Original working C code (tested on another board/chip; it does not work on a new revision of the SoC):
    *p1 = A;
    *p2 = B;
    *p3 = C;

    *p1 = A;
    /* 50 NOPs added here: still does not work. */
    *p2 = B;
    *p3 = C;

    *p1 = A;
    /* 100 NOPs added here: works. */
    *p2 = B;
    *p3 = C;


    *p1 = A;
    *p3 = C; // no delay, but the p2 and p3 lines are swapped: works too.
    *p2 = B;


    *p1 = A;
    ISB; // no delay, but an ISB added here: works too.
    *p2 = B;
    *p3 = C;
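
    As a sketch of how such a sequence is often written with an explicit barrier instead of a NOP delay: the function name and the choice of DSB are assumptions made for illustration, not something taken from the example above, which used ISB. Whether DSB, ISB, or a read-back of the register is the right fix depends on the peripheral and on the memory attributes of the mapped region. The host fallback uses GCC's __sync_synchronize() builtin.

```c
#include <stdint.h>

#if defined(__arm__)
#define DSB() __asm__ volatile("dsb" ::: "memory")
#else
#define DSB() __sync_synchronize()  /* host fallback: full memory barrier */
#endif

/* Hypothetical helper: write three registers, forcing the first write
 * to complete before the remaining two are issued. */
static inline void write_regs_ordered(volatile uint32_t *p1,
                                      volatile uint32_t *p2,
                                      volatile uint32_t *p3,
                                      uint32_t a, uint32_t b, uint32_t c)
{
    *p1 = a;
    DSB();      /* wait for the write to *p1 to complete */
    *p2 = b;
    *p3 = c;
}
```

    Unlike a fixed number of NOPs, a barrier does not depend on the core clock or on how fast the instructions happen to be fetched, so it is less likely to break when the clock speed or the SoC revision changes.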