Cortex-A9 (i.MX6): Enabling branch prediction causes aborts

Hello,

I am using an i.MX6 (Cortex-A9) board, and my MMU and cache configuration is as follows:
            mmu   - enabled
            L1 data cache - enabled
            L1 instruction cache - enabled
            D-side prefetch - enabled
            L2 cache - disabled
            Branch prediction - disabled
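
For reference, here is a minimal sketch of how most of these settings map to hardware, assuming the standard ARMv7-A CP15 System Control Register (SCTLR) layout (M = bit 0, C = bit 2, Z = bit 11, I = bit 12). The helper names are hypothetical and not taken from any BSP. D-side prefetch and the L2 cache are controlled elsewhere and are not covered. One thing that may be worth checking in startup code is that the branch predictor is invalidated (BPIALL) before the Z bit is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SCTLR bit positions (ARMv7-A). */
#define SCTLR_M (1u << 0)   /* MMU enable               */
#define SCTLR_C (1u << 2)   /* data cache enable        */
#define SCTLR_Z (1u << 11)  /* branch prediction enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

static_assert(SCTLR_Z == 0x800u, "Z must be bit 11 of SCTLR");

/* Hypothetical helper: return the SCTLR value with Z set or cleared. */
static inline uint32_t sctlr_with_branch_prediction(uint32_t sctlr, bool enable)
{
    return enable ? (sctlr | SCTLR_Z) : (sctlr & ~SCTLR_Z);
}

/* Hypothetical helper: report whether Z is set in a given SCTLR value. */
static inline bool sctlr_branch_prediction_enabled(uint32_t sctlr)
{
    return (sctlr & SCTLR_Z) != 0u;
}

#if defined(__arm__)
/* Target-only part: invalidate the predictor, then set Z (privileged mode). */
static inline void enable_branch_prediction(void)
{
    uint32_t sctlr;
    uint32_t zero = 0u;

    /* BPIALL: invalidate all branch predictor entries. */
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 6" : : "r"(zero) : "memory");
    __asm__ volatile("dsb" : : : "memory");
    __asm__ volatile("isb" : : : "memory");

    /* Read-modify-write SCTLR, then ISB so the change takes effect. */
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(sctlr));
    sctlr = sctlr_with_branch_prediction(sctlr, true);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(sctlr) : "memory");
    __asm__ volatile("isb" : : : "memory");
}
#endif
```

The bit helpers are kept separate from the CP15 access so that they can be checked on a host build; only the `#if defined(__arm__)` part touches the hardware.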

With this configuration, my code runs for a long time (more than 12 hours) without any issues.
But if I enable branch prediction, the code aborts within a few seconds, at random addresses.
I cannot work out what is going wrong.
Are there any special options I have to consider when building my code?
I am using GCC as the compiler.


Note: My code links against a static library that was supplied by our vendor.

Any help would be great.

Thanks,

Gopu

 

  • 1. Why did you disable branch prediction and the L2 cache?
    What was the original issue that you saw?
    The root cause of your original issue could be any of a lot of possibilities.
    What did you try before?

    2. How many cores have you enabled?
    Try enabling just one core if your chip supports this.

    3. What is your system configuration?
    Core clock frequency?
    Board reference clock?
    Clock for the memory interface?
    Try lowering the core clock frequency.
    Try lowering the memory interface clock frequency.

    4. Change the gcc optimization option to -O0 (no optimization).

    5. How many boards show the same issue?
    Have you tried replacing the memory module?
  • Hello,

        Please find the answers to your questions below.

    1. Why did you disable branch prediction and the L2 cache?

    What was the original issue that you saw?

    The root cause of your original issue could be any of a lot of possibilities.

       I disabled the L2 cache because execution was slower with it enabled.

       I disabled branch prediction because enabling it results in a data abort.

       So I wanted to solve the branch prediction issue before enabling the L2 cache.

     

    What did you try before?

       I tried reducing the ARM core clock frequency.

       I tried disabling optimisation.

       When a data abort occurs, it looks as if register r11 or r12 is corrupted. r11 and r12 are saved and restored in the interrupt context switch.

       But these issues do not happen when branch prediction is disabled.

    2. How many cores have you enabled?

    Try enabling just one core if your chip supports this.

          I have enabled only one core.

    3. What is your system configuration?

    Core clock frequency? - 996MHz

    Board reference clock? - 24MHz

    Clock for the memory interface? - 528MHz DDR3

    Try lowering the core clock frequency -

    Today I tried reducing the core clock to 800MHz. With this there is no data abort. Does this mean that when DDR3 is accessed by the ARM core at 1GHz, the data read by the core is faulty?
    I want to analyze this further by reducing the DDR3 clock from 528MHz to 520MHz instead of reducing the core clock.

    Try lowering the memory interface clock frequency - The default is 528MHz, but I have not tried reducing it.

    4. Change the gcc optimization option to -O0 - This is my current configuration.

    5. How many boards show the same issue? - Tried on 3 boards, and all show the same issue.

    Have you tried replacing the memory module? - Tried 3 different boards, but not a different module.

  • Things to try:

    1. When did you start to see this abort issue?
    Was it after recent software changes?

    Try checking out older code to see whether you can reproduce the issue.

    (Do a binary search over the commits to find which one triggered it.)
    When you review your software changes, pay attention to the following:
    1. Uninitialized variables
    2. Shared variables that need volatile (access them through a volatile uint32_t * where needed)
    3. Add ISB, DSB, DMB where needed (the CPU executes out of order)


    2. Enable the L2 cache and keep the rest of the old configuration (core @ 1GHz).
    Disable branch prediction (does it abort or not? re-test and confirm)
    Enable branch prediction (does it abort or not? re-test and confirm)
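
    As an illustration of review points 2 and 3 above, here is a minimal sketch of a flag shared between an interrupt handler and the main loop. All names are hypothetical, and the host fallback for DMB() exists only so that the sketch compiles off-target. Between an ISR and the main loop on a single core, the compiler-barrier effect is the part that matters; the DMB itself matters when another core or a DMA master observes the same memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__arm__) || defined(__aarch64__)
#define DMB() __asm__ volatile("dmb sy" : : : "memory")
#else
#define DMB() __sync_synchronize()  /* host fallback (GCC/Clang builtin) */
#endif

/* volatile: the compiler must re-read the flag on every access. */
static volatile bool data_ready;
static uint32_t shared_value;

/* Called from interrupt context: publish the value, then raise the flag. */
void producer_isr(uint32_t value)
{
    shared_value = value;
    DMB();               /* the value is written before the flag */
    data_ready = true;
}

/* Called from the main loop: returns true if a new value was consumed. */
bool consumer_poll(uint32_t *out)
{
    assert(out != 0);    /* host-side sanity check only */
    if (!data_ready) {
        return false;
    }
    DMB();               /* the flag is read before the value */
    *out = shared_value;
    data_ready = false;
    return true;
}
```

    Without volatile on data_ready, an optimizing compiler may hoist the flag read out of a polling loop, which can look like a timing-dependent hardware fault.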
  • Today I tried reducing the core clock to 800MHz. With this there is no data abort. Does this mean that when DDR3 is accessed by the ARM core at 1GHz, the data read by the core is faulty?
    Should I analyze this further by reducing the DDR3 clock from 528MHz to 520MHz instead of reducing the core clock?

    I will check your suggestions as well.
  • Hello,

    I have verified the software and could not find any bug in it. I have been doing this analysis for more than a month now.
    One thing I have noticed: with branch prediction enabled, the system used to abort within a few seconds. With branch prediction still enabled, I reduced the ARM core clock from 996MHz to 800MHz, and now the system runs for more than 24 hours without aborting.
    My DDR runs at 528MHz. Since it is DDR, the expected data rate is 528MHz * 2.

    Thanks and regards,
    Gopu
  • The i.MX6 seems sensitive to your software. Did you report this issue to the SoC vendor? Why do other i.MX6 users not see the same issue? Is there any special sequence in your software that could trigger it? Could you share your interrupt context switch code on GitHub? Who makes your board? If the board does not provide a clean reference clock to the i.MX6, this could be a board design issue too. Still, it is hard to reach a conclusion without the source code and the board schematic, and without consulting the SoC vendor and the board designer.
  • FYI: Some embedded firmware code is sensitive to timing and ordering.

    For example:
    Original C code that worked on another board/chip but does not work on a new SoC revision:
    *p1 = A;
    *p2 = B;
    *p3 = C;

    *p1 = A;
    ADD_50_NOP_HERE_STILL_DOES_NOT_WORK.
    *p2 = B;
    *p3 = C;

    *p1 = A;
    ADD_100_NOP_HERE_THEN_WORK.
    *p2 = B;
    *p3 = C;


    *p1 = A;
    *p3 = C; // without delay; re-order p2 and p3 lines; then works too.
    *p2 = B;


    *p1 = A;
    ISB; // add ISB; then it works too.
    *p2 = B;
    *p3 = C;
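
    The last variant can be sketched as compilable C as follows. This is only an illustration under assumptions: the macro and function names are hypothetical, the pointers are declared volatile so that the compiler keeps the stores in program order, and the host fallback barriers exist only so that the sketch builds off-target. The sketch uses DSB rather than ISB, because DSB is the barrier that architecturally waits for earlier memory accesses to complete, whereas ISB flushes the pipeline; which one a given device actually needs depends on the hardware.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__arm__) || defined(__aarch64__)
#define DSB() __asm__ volatile("dsb sy" : : : "memory")
#define ISB() __asm__ volatile("isb" : : : "memory")
#else
/* Host fallback: compiler barrier only, no hardware effect. */
#define DSB() __asm__ volatile("" : : : "memory")
#define ISB() __asm__ volatile("" : : : "memory")
#endif

/* Write three registers, forcing the first write to complete
 * before the remaining two are issued. */
static inline void write_sequence(volatile uint32_t *p1,
                                  volatile uint32_t *p2,
                                  volatile uint32_t *p3,
                                  uint32_t a, uint32_t b, uint32_t c)
{
    assert(p1 && p2 && p3);  /* host-side sanity check only */

    *p1 = a;
    DSB();   /* wait until the write to *p1 has completed */
    ISB();   /* then flush the pipeline before continuing  */
    *p2 = b;
    *p3 = c;
}
```

    If a fixed count of NOPs or a reordering of the writes also hides the problem, the barrier may only be acting as a delay, so it is worth checking the peripheral's documented timing requirements as well.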