ARM V7 memory barrier

Note: This was originally posted on 18th June 2011 at http://forums.arm.com

Hello everyone, I have some questions about the memory barriers that Linux implements for ARMv7.

First, let's suppose we are using a dual-core ARM Cortex-A9 CPU, for example.

What is the exact meaning of the DMB instruction?

    global int a = 0; global int b = 0;

CPU 0

    str #0x1, a
    DMB
    str #0x1, b

CPU 1

    WAIT(b==1) ; wait on flag
    DMB
    ldr r0, a

The expected result is r0 == 0x1 on CPU 1.
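
For reference, the sequence above can be approximated in portable C11. This is only a sketch under some assumptions: the names cpu0_writer, cpu1_reader and run_message_passing are made up for illustration, POSIX threads stand in for the two cores, and the barriers are C11 atomic_thread_fence calls, which GCC and Clang typically lower to a DMB on ARMv7 (worth confirming in the generated assembly for a given toolchain).

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int a;   /* payload, plays the role of "a" */
static atomic_int b;   /* flag, plays the role of "b" */

/* Stand-in for CPU 0: write the payload, fence, then set the flag. */
static void *cpu0_writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&a, 1, memory_order_relaxed);   /* str #0x1, a */
    atomic_thread_fence(memory_order_release);            /* DMB */
    atomic_store_explicit(&b, 1, memory_order_relaxed);   /* str #0x1, b */
    return NULL;
}

/* Stand-in for CPU 1: spin on the flag, fence, then read the payload. */
static void *cpu1_reader(void *arg)
{
    int *r0 = arg;
    while (atomic_load_explicit(&b, memory_order_relaxed) != 1)
        ;                                                  /* WAIT(b==1) */
    atomic_thread_fence(memory_order_acquire);             /* DMB */
    *r0 = atomic_load_explicit(&a, memory_order_relaxed);  /* ldr r0, a */
    return NULL;
}

/* Runs both sides once and returns the value the reader saw for "a". */
int run_message_passing(void)
{
    pthread_t t0, t1;
    int r0 = -1;

    atomic_store(&a, 0);
    atomic_store(&b, 0);
    pthread_create(&t1, NULL, cpu1_reader, &r0);
    pthread_create(&t0, NULL, cpu0_writer, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return r0;
}
```

Under the C11 memory model, the release fence paired with the acquire fence means a reader that has seen b == 1 must also see a == 1, so run_message_passing() returns 1. How the hardware achieves this is exactly what the questions below are about.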

Both CPU 0 and CPU 1 execute a DMB. I want to understand what the hardware actually does for each of these DMB instructions.
(1). For the DMB on CPU 0: does ARM guarantee that "str #0x1, a" is executed before "str #0x1, b"? As far as I know, this is guaranteed by DSB but not by DMB, right?
(2). Consider what CPU 0 does. It executes "str #0x1, a". Suppose "a" is already in the cache, so the cached copy of "a" is updated. If an interrupt occurs at this moment, the cache line holding "a" may be flushed to main memory; let's suppose that flush is very slow. Then the DMB instruction is executed. What does it actually do? Does DMB wait for the cache line flush of "a" to complete? Finally, CPU 0 executes "str #0x1, b". Suppose "b" is also already in the cache, so the cached copy of "b" is updated.
After that, is it possible that "a" is still in CPU 0's write buffer and has not yet reached main memory, while "b" is in a cache line and ready for use, so that CPU 1 gets a chance to read the old content of "a" from main memory? As I read the ARM ARM, DMB gives no guarantee about draining the write buffer; that is done by DSB.
(3). On CPU 1, DMB does not guarantee that the "WAIT(b==1)" instruction is executed before "ldr r0, a", right? It only guarantees that the memory access made by "WAIT(b==1)" is ordered before the one made by "ldr r0, a", right? So if CPU 1 executes "ldr r0, a" out of order, ahead of "WAIT(b==1)", how can the DMB make the read of "a" wait until after the read of "b"? If such out-of-order execution is allowed on CPU 1, "ldr r0, a" would fetch the content of "a" directly, because the DMB has not even been issued yet.
(4). In Linux kernel 2.6.35, in bnx2.c, the function bnx2_rx_int() contains a memory barrier, as shown below:

hw_cons = bnx2_get_hw_rx_cons(bnapi);
sw_cons = rxr->rx_cons;
rmb();
while (sw_cons != hw_cons) {
   .....
   ....
}

Is the rmb() there to guarantee that the CPU does not speculatively prefetch the code inside the while loop before hw_cons and sw_cons have been read?
Without this memory barrier, could the CPU touch data used inside the while loop before the program has established that it is allowed to enter the loop? (For example, sw_cons == hw_cons, yet the instructions inside the loop have already executed and touched a data structure that the sw_cons != hw_cons check is supposed to protect.)

rmb() is implemented as DSB on ARMv7, so I think DMB would not be enough here, because it only guarantees the order of memory accesses, right?
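
To make the pattern in (4) concrete, here is a minimal user-space sketch of a consumer that reads a producer index, issues a read barrier, and only then reads the entries the index covers. Everything in it is an assumption made for illustration: struct ring, ring_publish and ring_drain are hypothetical names and are not taken from bnx2.c, a C11 acquire fence stands in for rmb(), and the ring does no overflow checking.

```c
#include <stdatomic.h>

#define RING_SIZE 8u

/* Hypothetical ring: hw_cons is advanced by the producer,
 * sw_cons is private to the consumer. */
struct ring {
    int slots[RING_SIZE];
    atomic_uint hw_cons;
    unsigned sw_cons;
};

/* Producer side: fill the slot first, then publish the new index.
 * No overflow check: the caller must not get more than RING_SIZE ahead. */
static void ring_publish(struct ring *r, int value)
{
    unsigned idx = atomic_load_explicit(&r->hw_cons, memory_order_relaxed);

    r->slots[idx % RING_SIZE] = value;
    atomic_thread_fence(memory_order_release);   /* slot before index */
    atomic_store_explicit(&r->hw_cons, idx + 1, memory_order_relaxed);
}

/* Consumer side: read the index, fence, then read the slots it covers.
 * Returns the sum of the entries consumed by this call. */
static int ring_drain(struct ring *r)
{
    unsigned hw_cons = atomic_load_explicit(&r->hw_cons, memory_order_relaxed);
    unsigned sw_cons = r->sw_cons;
    int sum = 0;

    atomic_thread_fence(memory_order_acquire);   /* stands in for rmb() */
    while (sw_cons != hw_cons) {
        sum += r->slots[sw_cons % RING_SIZE];
        sw_cons++;
    }
    r->sw_cons = sw_cons;
    return sum;
}
```

In this sketch the fence orders the load of hw_cons before the loads of the slots inside the loop; whether the kernel's rmb() needs anything stronger than that ordering is the open question here.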