I got really weird test results with memory barriers on Cortex-A9

  • Note: This was originally posted on 28th November 2011 at http://forums.arm.com

    And obviously the DMB is enough here, with lower overhead?


    Yes, DMB is what you want here. DSB is stronger: on top of ordering memory accesses, it stops the core from executing any later instruction until all outstanding memory accesses have completed, which is obviously more expensive.
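As a rough illustration of where the barrier sits, here is a minimal sketch of the writer side using C11 atomics instead of a hand-written `DMB`. The counter names `int1` and `int2` and the function `thread1_iteration` are placeholders, not the original test code, and the claim that a compiler targeting ARMv7 lowers this fence to a `DMB` is an assumption about the toolchain that is worth checking in the disassembly.

```c
#include <stdatomic.h>

/* Placeholder counters standing in for the two shared test variables. */
static atomic_int int1;
static atomic_int int2;

/* One pass of the writer: bump int1, fence, bump int2.
 * The fence orders the two updates as seen by other observers.
 * Assumption: on an ARMv7 target the compiler emits a DMB for it. */
static void thread1_iteration(void)
{
    atomic_fetch_add_explicit(&int1, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    atomic_fetch_add_explicit(&int2, 1, memory_order_relaxed);
}
```

The fence only orders the two increments relative to each other. It does nothing to make a reader's two loads happen at the same instant.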

    How can cpu0 get s_byt_test_int1 > s_byt_test_int2?


    Once the CPU has reordered things, what you actually end up with is:

    thread1:

    int1++;
    thread1_barrier;
    int2++;


    thread2:

    read int1;
    read int2;


    If the first thread runs a couple of times between the two reads in the second thread, the two values you read no longer belong together and the comparison blows up. Remember you have no locks here, so there is no guarantee the two threads run in lockstep.
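The interleaving described above can be replayed by hand in a single-threaded model. This is only a sketch: `struct counters`, `writer_iteration` and `observed_difference` are hypothetical names, and the barrier is left out on purpose because nothing can be reordered when one thread replays the schedule step by step.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two shared counters in the test. */
struct counters {
    int int1;
    int int2;
};

/* One complete pass of thread1: int1++, (barrier), int2++.
 * The barrier is omitted here: this model replays one fixed
 * interleaving sequentially, so there is nothing to reorder. */
static void writer_iteration(struct counters *c)
{
    c->int1++;
    c->int2++;
}

/* Replay the schedule: thread2 reads int1, thread1 then completes
 * writer_runs whole passes, and only then thread2 reads int2.
 * Returns (int2 as seen by thread2) - (int1 as seen by thread2). */
static int observed_difference(struct counters c, int writer_runs)
{
    assert(writer_runs >= 0);
    int seen_int1 = c.int1;              /* first read in thread2 */
    for (int i = 0; i < writer_runs; i++)
        writer_iteration(&c);            /* thread1 sneaks in here */
    int seen_int2 = c.int2;              /* second read in thread2 */
    return seen_int2 - seen_int1;
}
```

Starting from equal counters, `observed_difference(c, 0)` is 0, while `observed_difference(c, 2)` is 2: the reader sees int2 ahead of int1 although the writer always increments int1 first. A barrier in the writer cannot prevent this, because the skew comes from the time between the reader's two loads.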

    Putting the two variables into the same cache line gives much better results than keeping them separate in these tests (see the results of step 1 and step 2). How do you explain that?

    Sharing a cache line is more likely to make the two threads run in lockstep: the core has to acquire the cache line before it can process the load or store, and while it holds the line the other thread's load or store to that line has to wait.
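To make the two layouts concrete, here is a small sketch of how the variables could be placed in one line or in two. It assumes the 32-byte L1 cache line of the Cortex-A9 and a C11 compiler; the struct names and `line_index` are made up for illustration and are not taken from the original test.

```c
#include <stddef.h>

/* Assumption: 32-byte L1 cache lines, as on Cortex-A9. */
#define CACHE_LINE_BYTES 32

/* Both counters in one cache line: the struct starts on a line
 * boundary and int2 follows int1 directly. */
struct shared_line {
    _Alignas(CACHE_LINE_BYTES) int int1;
    int int2;
};

/* Each counter on its own cache line: both members are forced
 * onto a line boundary, so int2 starts the next line. */
struct separate_lines {
    _Alignas(CACHE_LINE_BYTES) int int1;
    _Alignas(CACHE_LINE_BYTES) int int2;
};

/* Which cache line, counted from the start of the struct,
 * a member at the given byte offset falls into. */
static size_t line_index(size_t offset)
{
    return offset / CACHE_LINE_BYTES;
}
```

With `offsetof`, `line_index` gives the same line for both members of `struct shared_line` and different lines for the members of `struct separate_lines`.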

    Secondly, you assigned a larger gap between int1 and int2 as a starting condition. In the case above, the first thread running once between the two reads is then not enough to trigger the error: it has to run three times to make int2 larger than int1, so you are less likely to hit this condition.
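The arithmetic behind "three times" can be written down directly. The sketch below assumes the starting gap was 2 (int1 ahead of int2 by two), which is inferred from the figure of three runs and not stated in the thread; `min_runs_to_invert` is a hypothetical helper.

```c
#include <assert.h>

/* Smallest number of complete writer passes that must fit between
 * the reader's load of int1 and its load of int2 before the reader
 * sees int2 > int1. Each pass adds 1 to the int2 the reader will
 * see, so the starting gap has to be exceeded by one. */
static int min_runs_to_invert(int initial_int1, int initial_int2)
{
    int gap = initial_int1 - initial_int2;
    assert(gap >= 0);   /* model only covers int1 starting at or ahead of int2 */
    return gap + 1;
}
```

With equal starting values a single pass between the reads is enough. With an assumed gap of 2, `min_runs_to_invert(2, 0)` gives 3, matching the reply, and three passes squeezed between two adjacent loads is a much rarer event.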

    Iso